Difference between revisions of "Tutorial - Filtering and Normalizing"

Line 5: Line 5:
 
* Uses of filtering and normalization
 
* Uses of filtering and normalization
 
* Types of filtering and normalization in geWorkbench
 
* Types of filtering and normalization in geWorkbench
* Example 1 - using the Affymetrix Present/Absent/Marginal filter
+
* Example 1 - Combined normalization and filtering
* Example 2 - Combined normalization and filtering
+
* Example 2 - using the Affymetrix Present/Absent/Marginal filter
  
  
 
==Background==
 
==Background==
Filtering can be used to remove low quality data or reduce the size of the dataset by removing less interesting data.  Normalization can be used to remove extraneous sources of variation between arrays, thus making the data more comparable.
+
Filtering can be used to remove low quality data or reduce the size of the dataset by removing less interesting data.  Normalization can be used to decrease the effects of systematic differences across a set of microarrays, thus making the data more comparable.
  
In geWorkbench, the effect of filtering out a value is to mark it internally as "Missing".  Many types of analysis require that missing values be dealt with properly. The '''Missing Values Filter''' allows markers that have more than a specified number of missing values to be removed.  Another option is to use the '''Missing Value Computation''' found in '''Normalizers''', which will replace missing values with imputed values.  Some analysis routines have built-in methods for replacing missing values.
+
In geWorkbench, filtering and normalization alter the loaded dataset, the original is not retained. The effect of filtering out a value is to mark it internally as "Missing".  Many types of analysis require that missing values be dealt with properly. The '''Missing Values Filter''' allows markers that have more than a specified number of missing values to be removed.  Another option is to use the '''Missing Value Computation''' found in '''Normalizers''', which will replace missing values with imputed values.  Some analysis routines have built-in methods for replacing missing values.  Normalization results in the replacement of existing data values with new values.
  
 
Direct support for methods such as RMA and GCRMA, which include a normalization step, is not directly available in geWorkbench, but its addtion is planned (as of June 2006).  Affymetrix CEL files can be processed externally to geWorkbench using a program such as RMAExpress (available for Windows computers) and then imported into geWorkbench.
 
Direct support for methods such as RMA and GCRMA, which include a normalization step, is not directly available in geWorkbench, but its addtion is planned (as of June 2006).  Affymetrix CEL files can be processed externally to geWorkbench using a program such as RMAExpress (available for Windows computers) and then imported into geWorkbench.
 
  
  
Line 29: Line 28:
 
|-
 
|-
  
|'''Affy Detection Call'''||Applicable to Affymetrix data only. Sets all measurements whose detection status is any user-defined combination of A, P or M as missing.
+
|'''Affy Detection Call'''||Applicable to Affymetrix data only. Sets all measurements whose detection status is any user-defined combination of P, A or M (Present, Absent, Marginal) as missing.
 
|-
 
|-
  
|'''Missing values''' ||Discards all markers that have “missing” measurements in at least n microarrays, where n is defined by the user. Another filter must first be applied however, in order to generate the missing values upon which this filter can operate.
+
|'''Missing values''' ||Discards all markers that have “missing” measurements in at least n microarrays, where n is defined by the user. Missing values can arise either from the original data or from the results of another filtering step.
 
|-
 
|-
 
|-
 
|-
 
|-
 
|-
  
|'''Deviation''' ||Marks as missing all markers whose measurements deviate below a given value across all microarrays.
+
|'''Deviation''' ||Marks as missing all markers whose deviation is less than a given value across all microarrays.
 
|-
 
|-
 
|-
 
|-
Line 54: Line 53:
  
  
==Preparation==
+
==Normalizers==
 
 
In this tutorial we will use the microarray dataset file '''webmatrix.exp''', available in [[Downloads]]. Please refer to [[Tutorial - Projects and Data Files]] tutorial for assistance in loading a file.
 
 
 
[[Image:Filterpanel.gif]]
 
 
 
 
 
===Filtering out data called absent in an Affymetrix file===
 
 
 
* Here the file webmatrix.exp has been loaded.  It is available on the [[Downloads]] page.  It contains unfiltered, unnormalized data.  The first microarray dataset, seen in the '''Microarray Viewer''' using the '''Absolute''' display setting, looks like this:
 
 
 
 
 
[[Image:T_Filtering_Before_Affy_85p.png]]
 
 
 
 
 
* In the Filtering Panel, select'' Affy Detection Call Filter''.
 
* Select ‘A’ (Absent) checkbox and press '''Filter.''' Values that were called '''Absent''' in the original dataset are highlighted in yellow.  Internally, the values are now marked as '''Missing'''.
 
 
 
 
 
[[Image:T_Filtering_After_Affy_85p.png]]
 
 
 
 
 
 
 
 
 
* In the Filtering Panel, select '''Missing Values Filter'''.
 
* Choose the maximum number of arrays that can have missing values before marker is removed – default is 0.
 
* Click '''Filter'''.  Markers with more than 0 missing values are removed.  Notice in the below picture that the yellow values are now gone.
 
 
 
 
 
[[Image:T_Filtering_After_MV_85p.png]]
 
 
 
==Normalize==
 
Normalization can be used to decrease the effects of systematic differences across a set of experiments. In geWorkbench, normalization results in replacing values with new values. Available geWorkbench normalization methods are as follows:''
 
 
 
  
 
{|style="border: 1px solid lightGray"
 
{|style="border: 1px solid lightGray"
Line 118: Line 84:
 
|-
 
|-
 
|}
 
|}
 +
 +
  
 
[[Image:Normalpanel.gif]]
 
[[Image:Normalpanel.gif]]
 +
 +
 +
==Preparation==
 +
 +
In these examples we will use the microarray dataset file '''webmatrix.exp''', available in [[Downloads]]. Please refer to [[Tutorial - Projects and Data Files]] tutorial for assistance in loading a file.  It contains unfiltered, unnormalized data.
 +
 +
[[Image:Filterpanel.gif]]
 +
 +
 +
  
  
Line 145: Line 123:
 
|-
 
|-
 
|}
 
|}
 +
 +
 +
 +
==Example 2: Filtering out data called absent in an Affymetrix file==
 +
  The first microarray dataset, seen in the '''Microarray Viewer''' using the '''Absolute''' display setting, looks like this:
 +
 +
 +
[[Image:T_Filtering_Before_Affy_85p.png]]
 +
 +
 +
* In the Filtering Panel, select'' Affy Detection Call Filter''.
 +
* Select ‘A’ (Absent) checkbox and press '''Filter.''' Values that were called '''Absent''' in the original dataset are highlighted in yellow.  Internally, the values are now marked as '''Missing'''.
 +
 +
 +
[[Image:T_Filtering_After_Affy_85p.png]]
 +
 +
 +
 +
 +
* In the Filtering Panel, select '''Missing Values Filter'''.
 +
* Choose the maximum number of arrays that can have missing values before marker is removed – default is 0.
 +
* Click '''Filter'''.  Markers with more than 0 missing values are removed.  Notice in the below picture that the yellow values are now gone.
 +
 +
 +
[[Image:T_Filtering_After_MV_85p.png]]

Revision as of 14:41, 8 June 2006



Outline

  • Uses of filtering and normalization
  • Types of filtering and normalization in geWorkbench
  • Example 1 - Combined normalization and filtering
  • Example 2 - using the Affymetrix Present/Absent/Marginal filter


Background

Filtering can be used to remove low quality data or reduce the size of the dataset by removing less interesting data. Normalization can be used to decrease the effects of systematic differences across a set of microarrays, thus making the data more comparable.

In geWorkbench, filtering and normalization alter the loaded dataset; the original is not retained. The effect of filtering out a value is to mark it internally as "Missing". Many types of analysis require that missing values be dealt with properly. The Missing Values Filter allows markers that have more than a specified number of missing values to be removed. Another option is to use the Missing Value Computation found in Normalizers, which replaces missing values with imputed values. Some analysis routines have built-in methods for replacing missing values. Normalization replaces existing data values with new values.

Methods such as RMA and GCRMA, which include a normalization step, are not directly supported in geWorkbench, but their addition is planned (as of June 2006). Affymetrix CEL files can be processed outside geWorkbench using a program such as RMAExpress (available for Windows computers) and then imported into geWorkbench.


Filters

Available geWorkbench filters are as follows:


{|style="border: 1px solid lightGray"
!Filter!!Description
|-
|Affy Detection Call||Applicable to Affymetrix data only. Sets all measurements whose detection status is any user-defined combination of P, A or M (Present, Absent, Marginal) as missing.
|-
|Missing values||Discards all markers that have “missing” measurements in at least n microarrays, where n is defined by the user. Missing values can arise either from the original data or from the results of another filtering step.
|-
|Deviation||Marks as missing all markers whose deviation is less than a given value across all microarrays.
|-
|Expression Threshold||Marks as missing all markers whose measurements are inside (or outside) a user-defined range.
|-
|2 Channel||Applicable to 2-channel array (Genepix) data only. Defines applicable ranges for each channel, and marks as missing all expression measurements for which either channel intensity is inside (or outside) the defined range.
|}
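
The Deviation and Expression Threshold filters described above can be illustrated with a minimal Python sketch. This is not geWorkbench code. It assumes the data is a list of marker rows (one value per microarray), that "Missing" is represented by None, that "deviation" means the population standard deviation, and that the threshold filter acts on individual values. The function names are hypothetical.

```python
import statistics

MISSING = None  # stand-in for geWorkbench's internal "Missing" flag (assumption)


def deviation_filter(matrix, min_deviation):
    """Mark a whole marker row as missing when its deviation is too small.

    "Deviation" is taken to be the population standard deviation across
    arrays; the tutorial does not say which definition is used.
    """
    filtered = []
    for row in matrix:
        present = [v for v in row if v is not MISSING]
        dev = statistics.pstdev(present) if present else 0.0
        if dev < min_deviation:
            filtered.append([MISSING] * len(row))
        else:
            filtered.append(list(row))
    return filtered


def expression_threshold_filter(matrix, low, high, mark_inside=False):
    """Mark values inside (or, by default, outside) [low, high] as missing."""
    result = []
    for row in matrix:
        new_row = []
        for v in row:
            if v is MISSING:
                new_row.append(MISSING)
                continue
            inside = low <= v <= high
            # Mark the value when its position matches the chosen mode.
            new_row.append(MISSING if inside == mark_inside else v)
        result.append(new_row)
    return result
```

Both functions return a new matrix rather than editing the input, which makes it easy to compare before and after; geWorkbench itself alters the loaded dataset.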


Normalizers

{|style="border: 1px solid lightGray"
!Normalizer!!Description
|-
|Missing value calculation||Replaces every missing value with either the mean value of that marker across all microarrays or with the mean measurement of all markers in the microarray where the missing value is observed.
|-
|Log2 Transformation||Applies a log2 transformation to all measurements in a microarray.
|-
|Threshold Normalizer||All data points whose value is less than (or greater than) a user-specified minimum (maximum) value are raised (reduced) to that minimum (maximum) value.
|-
|Marker-based Centering||Subtracts the mean (median) measurement of a marker profile from every measurement in the profile.
|-
|Array-based centering||Subtracts the mean (median) measurement of a microarray from every measurement in that microarray.
|-
|Mean-variance normalizer||For every marker profile, the mean measurement of the entire profile is subtracted from each measurement in the profile and the resulting value is divided by the standard deviation.
|}
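
The arithmetic behind several of these normalizers can be sketched in a few lines of Python. This is an illustration only, not geWorkbench's implementation. It assumes the data is a list of marker rows with one value per microarray, that missing values are represented by None, and that the mean-variance normalizer uses the sample standard deviation (the tutorial does not specify which). All function names are hypothetical.

```python
import math
import statistics


def log2_transform(matrix):
    """Apply log2 to every measurement (values must be positive)."""
    return [[math.log2(v) for v in row] for row in matrix]


def center_markers(matrix, use_median=False):
    """Subtract each marker's mean (or median) from its own profile."""
    center = statistics.median if use_median else statistics.mean
    centered = []
    for row in matrix:
        offset = center(row)
        centered.append([v - offset for v in row])
    return centered


def center_arrays(matrix, use_median=False):
    """Subtract each microarray's mean (or median) from its own values."""
    center = statistics.median if use_median else statistics.mean
    offsets = [center(col) for col in zip(*matrix)]
    return [[v - off for v, off in zip(row, offsets)] for row in matrix]


def mean_variance_normalize(matrix):
    """Per marker: subtract the mean, then divide by the standard deviation.

    Uses the sample standard deviation (an assumption); a constant row
    would give a zero divisor and is not handled here.
    """
    normalized = []
    for row in matrix:
        mu = statistics.mean(row)
        sd = statistics.stdev(row)
        normalized.append([(v - mu) / sd for v in row])
    return normalized


def impute_missing_by_marker_mean(matrix):
    """Replace None with the mean of the marker's non-missing values."""
    imputed = []
    for row in matrix:
        present = [v for v in row if v is not None]
        if not present:  # nothing to average; leave the row unchanged
            imputed.append(list(row))
            continue
        mu = statistics.mean(present)
        imputed.append([mu if v is None else v for v in row])
    return imputed
```

Note that the marker-based operations work along rows while the array-based one works along columns; which direction suits a dataset depends on whether the goal is comparable markers or comparable arrays.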


Normalpanel.gif


Preparation

In these examples we will use the microarray dataset file webmatrix.exp, available in Downloads. It contains unfiltered, unnormalized data. Please refer to the Tutorial - Projects and Data Files page for assistance in loading a file.

Filterpanel.gif



Apply Quantile Normalizer

1. In the Normalization Panel, select Quantile Normalizer.


2. Leave the averaging method at its default, Mean Profile Marker. This setting determines how missing values are handled.


3. Click Normalize. The View Area is updated to reflect normalization (after the screen has been refreshed). Note: The first value in the second row was updated from 41,394.6 to 55,779.26.

Prenormalization: Prenormalizer ed.gif
Normalized: Postnormalizera.gif
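
For readers curious why values change during this step, here is a minimal Python sketch of the standard quantile normalization idea: each array's values are replaced by the mean, across arrays, of the values sharing the same rank. It is not geWorkbench's implementation. It assumes a complete matrix of marker rows with no missing values (so the averaging-method option above is not modeled) and breaks ties by position. The function name is hypothetical.

```python
def quantile_normalize(matrix):
    """Quantile-normalize matrix[marker][array]; assumes no missing values."""
    n_markers = len(matrix)
    columns = [list(col) for col in zip(*matrix)]  # one list per microarray
    sorted_cols = [sorted(col) for col in columns]

    # Mean of the values that share each rank across all arrays.
    rank_means = [
        sum(col[rank] for col in sorted_cols) / len(sorted_cols)
        for rank in range(n_markers)
    ]

    normalized_cols = []
    for col in columns:
        # Marker indices ordered from smallest to largest value in this array.
        order = sorted(range(n_markers), key=lambda i: col[i])
        new_col = [0.0] * n_markers
        for rank, marker_index in enumerate(order):
            new_col[marker_index] = rank_means[rank]
        normalized_cols.append(new_col)

    # Transpose back to marker rows.
    return [list(row) for row in zip(*normalized_cols)]
```

After this transformation every array contains the same set of values, only in a different order, which is what makes the arrays directly comparable.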


Example 2: Filtering out data called absent in an Affymetrix file

 The first microarray dataset, seen in the Microarray Viewer using the Absolute display setting, looks like this:


T Filtering Before Affy 85p.png


  • In the Filtering Panel, select Affy Detection Call Filter.
  • Select the ‘A’ (Absent) checkbox and click Filter. Values that were called Absent in the original dataset are highlighted in yellow. Internally, the values are now marked as Missing.


T Filtering After Affy 85p.png



  • In the Filtering Panel, select Missing Values Filter.
  • Choose the maximum number of arrays that can have missing values before a marker is removed. The default is 0.
  • Click Filter. Markers with more than 0 missing values are removed. Notice in the picture below that the yellow values are now gone.


T Filtering After MV 85p.png
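
The two steps of this example can be summarized in a minimal Python sketch. It is not geWorkbench code. It assumes expression values and detection calls are held in parallel lists of marker rows, that "Missing" is represented by None, and that a marker is removed when it has more than the chosen maximum number of missing values, following the wording of the steps above. The function names and marker identifiers are hypothetical.

```python
MISSING = None  # stand-in for geWorkbench's internal "Missing" flag (assumption)


def affy_detection_call_filter(values, calls, calls_to_filter=("A",)):
    """Mark as missing every value whose detection call is selected."""
    return [
        [MISSING if call in calls_to_filter else value
         for value, call in zip(value_row, call_row)]
        for value_row, call_row in zip(values, calls)
    ]


def missing_values_filter(values, marker_ids, max_missing=0):
    """Keep only markers with at most max_missing missing values."""
    kept_ids, kept_rows = [], []
    for marker_id, row in zip(marker_ids, values):
        n_missing = sum(1 for v in row if v is MISSING)
        if n_missing <= max_missing:
            kept_ids.append(marker_id)
            kept_rows.append(row)
    return kept_ids, kept_rows


# Usage with made-up data: two markers measured on two arrays.
values = [[100.0, 200.0], [50.0, 60.0]]
calls = [["P", "A"], ["P", "M"]]
marker_ids = ["marker_1", "marker_2"]

flagged = affy_detection_call_filter(values, calls)
kept_ids, kept_rows = missing_values_filter(flagged, marker_ids)
```

In this made-up example the first marker has one Absent call, so it is flagged in the first step and removed in the second, leaving only the second marker.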