User:Osborne

Revision as of 14:52, 15 August 2006 by Osborne

Basic Microarray Analysis with geWorkbench

Background

Correct and complete microarray analysis requires an understanding of both the actual experiment and the statistical and mathematical tools being used. The tools and techniques will vary depending on the type of experiment and on what the user hopes to learn from it. Here we describe how to go about analyzing one of the most common types of microarray experiment: differential gene expression on Affymetrix arrays. Most of the techniques described should be suitable for other types of analysis when appropriately modified, but the user is cautioned against applying them blindly to their own data.

Introduction

Getting Started

  1. Requirements (Windows/Mac/Linux OS, Java 1.5 installed, at least 512 MB RAM)
  2. Installation
    1. geWorkbench downloads
    2. Java downloads

Preparing the Data

Affymetrix data is best imported into geWorkbench as tab-delimited files that contain three columns. The first column is the probeset identifier, the second is an optional annotation for the column, and the third contains the signal data. geWorkbench refers to this file format as "Affymetrix File Matrix" format. To be recognized by the file loading component of geWorkbench, the filename should end with .exp. This is not to be confused with the Affymetrix .exp (experiment) file, which is *not* loaded by geWorkbench. While this file cannot be generated directly by Affymetrix software, the CEL files or spreadsheets that Affymetrix software produces can be converted into this format. A spreadsheet program such as Excel is enough for this.
  1. Generation from a spreadsheet
The easiest way to get data from a small number of microarray files is to load each file into a spreadsheet program and modify it there. Below is the result for a typical experiment.

Initial spreadsheet file

The user should modify the spreadsheet to get it to look like the picture below.

Modified spreadsheet file

It should then be saved in tab-delimited format with an .exp extension. This needs to be done for ALL microarrays in the experiment. If you are dealing with dozens or even hundreds of files, try using R to generate the files instead.
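For many files, the same conversion can also be scripted. The following is a minimal sketch in Python (rather than R), not a geWorkbench tool. It assumes that each array has already been exported as a tab-delimited text file with a header row. The column names `Probe Set Name`, `Detection` and `Signal`, and the function names `convert_to_exp` and `convert_all`, are hypothetical and must be adapted to your own files. This page does not say whether geWorkbench expects a header line, so the sketch writes none. Compare the output against the modified spreadsheet shown above before loading it.

```python
import csv
from pathlib import Path

# Hypothetical column names in the exported spreadsheet; adjust to your files.
PROBE_COL = "Probe Set Name"   # probeset identifier (column 1 of the output)
ANNOT_COL = "Detection"        # optional annotation (column 2 of the output)
SIGNAL_COL = "Signal"          # signal data (column 3 of the output)


def convert_to_exp(src, dest_dir):
    """Write a three-column, tab-delimited copy of src with an .exp extension."""
    dest = Path(dest_dir) / (Path(src).stem + ".exp")
    with open(src, newline="") as fin, open(dest, "w", newline="") as fout:
        reader = csv.DictReader(fin, delimiter="\t")
        writer = csv.writer(fout, delimiter="\t", lineterminator="\n")
        for row in reader:
            # The annotation is optional: leave it empty if the column is absent.
            writer.writerow([row[PROBE_COL], row.get(ANNOT_COL, ""), row[SIGNAL_COL]])
    return dest


def convert_all(src_dir, dest_dir, pattern="*.txt"):
    """Convert every file in src_dir that matches pattern; return the new paths."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    return [convert_to_exp(p, dest_dir) for p in sorted(Path(src_dir).glob(pattern))]
```

Calling `convert_all("exported", "exp_files")` would write one .exp file per input file into the (hypothetical) `exp_files` directory, keeping the original base name of each file.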
  2. From CEL files

Load the data

Batch loading of Excel Files

    1. Loading of a single CEL derived geWorkbench file
  1. Inspect the data
  2. Normalize
  3. T-Test
  4. GO Analysis
  5. Promoter Analysis
  6. Pattern Discovery
  7. Other Analysis