User:Osborne

Revision as of 16:48, 23 August 2006 by Osborne

Basic Microarray Analysis with geWorkbench

Getting Started

  1. Requirements (Windows/Mac/Linux OS, Java 1.5 installed, at least 512 MB RAM)
  2. Installation
    1. geWorkbench downloads
    2. Java downloads

Background

Correct and complete microarray analysis requires an understanding of both the actual experiment and the statistical and mathematical tools being used. The tools and techniques will vary depending on the type of experiment and on what the user hopes to learn from it. Here we describe how to go about analyzing one of the most common types of microarray experiment: differential gene expression on Affymetrix arrays. Most of the techniques described should be suitable for other types of analysis when appropriately modified, but the user is cautioned against applying them blindly to their own data.

Introduction

This tutorial walks the user through a fairly typical microarray experiment done using the Affymetrix HGU133Plus2 platform. In this case the experiment is a study of multiple myeloma resistance; it investigates 3 cell lines established from a patient resistant to glucocorticoids. The 3 cell lines are:

  1. MM.1S expresses mostly the normal receptor. (C2E3)
  2. MM.Re expresses a small amount of normal receptor, but more alternatively spliced receptor (which is non-functional) (P1414)
  3. MM.1Rl expresses very little receptor of any kind. (P1310)

Cell Line History

The goal is to get a baseline measurement of the differences in expression between the three cell lines. No large difference is expected because they all originate from the same ancestor, but splicing differences are expected.

Step 1 - Inspect your data

With the release of version 1.04, caWorkbench can read the Affymetrix CEL file format natively. However, because of data structure incompatibilities, it cannot yet do anything other than display the data unless the data is pre-processed using R (geWorkbench can directly access an R server). Viewing an image of the microarray is still useful, because it is worth checking that there are no obvious defects (streaking, etc.) on the microarray you are about to analyze. CEL files can be loaded by selecting type 'CEL' in the file loader. You should see an image like the one shown below.


CEL File Image


This image looks good, so we can continue. To be diligent, inspect all of your images for defects. Skipping this check is not usually a problem, but it may be worth doing if you run into problems later.


Preparing the Data

Affymetrix data is best imported into geWorkbench as tab-delimited files that contain 3 columns. The first column is the probeset identifier, the second is an optional annotation, and the third contains the signal data. This file format is referred to in geWorkbench as "Affymetrix File Matrix" format, and in order to be recognized by the file loading component of geWorkbench the filename should end with .exp. This is not to be confused with the Affymetrix .exp (experiment) file, which is *not* loaded by geWorkbench. While this file cannot be generated directly by Affymetrix software, the CEL files or spreadsheets that Affymetrix software generates can be converted into this format, for example with a spreadsheet program such as Excel.
  1. Generation from a spreadsheet
The easiest way to get data from a small number of microarray files is to load each file into a spreadsheet program and modify it there. Below is the result for a typical experiment.


Initial spreadsheet file

The user should modify the spreadsheet to get it to look like the picture below.


Modified spreadsheet file

It should then be saved in tab-delimited format with an .exp extension. This needs to be done for ALL microarrays in the experiment. If you are dealing with dozens or even hundreds of files, try using R to generate the files instead.
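For many arrays, the manual spreadsheet edit described above can be scripted. The following is a minimal R sketch, not a tool supplied with geWorkbench. It assumes that each array has been exported from the spreadsheet as tab-delimited text with a header row, and that the probeset identifier and signal sit in columns named `ProbeSetID` and `Signal` (hypothetical names; adjust them to match your export). Whether geWorkbench expects a header row is not stated here, so the sketch writes none. The function names `export_to_exp` and `convert_directory` are illustrative only.

```r
# Convert one tab-delimited spreadsheet export into the three-column
# layout described above: probeset ID, optional annotation, signal.
# The column names "ProbeSetID" and "Signal" are assumptions about the export.
export_to_exp <- function(in_file, out_file,
                          id_col = "ProbeSetID", signal_col = "Signal",
                          annot_col = NULL) {
  dat <- read.delim(in_file, stringsAsFactors = FALSE, check.names = FALSE)
  # Leave the annotation column empty unless a source column is named.
  annot <- if (is.null(annot_col)) rep("", nrow(dat)) else dat[[annot_col]]
  out <- data.frame(id = dat[[id_col]], annotation = annot,
                    signal = dat[[signal_col]], stringsAsFactors = FALSE)
  write.table(out, out_file, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
  invisible(out)
}

# Convert every *.txt export in a directory, writing <name>.exp beside it.
convert_directory <- function(dir_path) {
  in_files <- list.files(dir_path, pattern = "\\.txt$", full.names = TRUE)
  out_files <- sub("\\.txt$", ".exp", in_files)
  for (i in seq_along(in_files)) export_to_exp(in_files[i], out_files[i])
  invisible(out_files)
}
```

Calling `convert_directory("exports")` on a hypothetical directory of exported files would produce one .exp file per array.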


  2. From CEL files
When dealing with more than just a few microarrays, it may make more sense to do the data preparation, and perhaps some preliminary analysis, in R. R allows easy loading of an entire directory of CEL files, which can then be written out in tab-delimited format. A few R commands are enough to generate files that can be imported into geWorkbench.
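As an illustration of the CEL-file route, the sketch below uses the Bioconductor `affy` package to read every CEL file in a directory and write one three-column file per array. Several points are assumptions rather than part of this tutorial: the choice of the `affy` package, the use of MAS5 signal values (chosen here because they resemble the signal reported by Affymetrix software; `rma()` is a common alternative that also normalizes and log2-transforms the data), the absence of a header row, and the helper names `write_exp_files` and `cel_dir_to_exp`.

```r
# Write one three-column file (probeset ID, empty annotation, signal)
# per array from a probeset-by-sample expression matrix.
write_exp_files <- function(expr, out_dir) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  for (sample in colnames(expr)) {
    out <- data.frame(id = rownames(expr), annotation = "",
                      signal = expr[, sample], stringsAsFactors = FALSE)
    # Name the output after the array, dropping a trailing .CEL/.cel.
    base <- sub("\\.cel$", "", sample, ignore.case = TRUE)
    write.table(out, file.path(out_dir, paste0(base, ".exp")),
                sep = "\t", quote = FALSE,
                row.names = FALSE, col.names = FALSE)
  }
  invisible(NULL)
}

# Read all CEL files in cel_dir, compute MAS5 signal values, and export them.
# Requires the Bioconductor 'affy' package and the CDF package for the chip.
cel_dir_to_exp <- function(cel_dir, out_dir) {
  library(affy)                        # also attaches Biobase, which provides exprs()
  abatch <- ReadAffy(celfile.path = cel_dir)
  eset <- mas5(abatch)                 # assumption: MAS5 summarization
  write_exp_files(exprs(eset), out_dir)
}
```

With hypothetical directory names, `cel_dir_to_exp("cel_files", "exp_files")` would write one .exp file for each CEL file found.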


Load the data

Batch loading of Excel Files

    1. Loading of a single CEL-derived geWorkbench file
  1. Inspect the data
  2. Normalize
  3. T-Test
  4. GO Analysis
  5. Promoter Analysis
  6. Pattern Discovery
  7. Other Analysis