Filtering

Revision as of 18:51, 7 June 2010 by Smith


Overview

The Filters panel is located in the Command and Analysis Area in the lower right of the application. The application offers a selection of pluggable filters to assist in the preparation of data for analysis.

Filtering can be used to remove low-quality data or to reduce the size of the data set by removing less interesting data. Most geWorkbench filters allow the user to specify a minimum number or percentage of arrays that must meet that filter's criterion before the marker will be removed.

All filtering operations directly alter the loaded data set and are not reversible. A separate copy of the data is not generated. Filtering operations do not respect any marker or array sets that may be activated; filtering always acts on the entire data set.

Filter Configuration

Some filters are not loaded by default in geWorkbench. To configure which filters to load, use the Component Configuration Manager. It is available in the top menu-bar under Tools->Component Configuration.

Filters in CCM.png

Available Filters

  • Affy Detection Call - Applicable to Affymetrix data only. Filters on Present, Marginal or Absent calls.
  • Deviation - Removes markers whose sample standard deviation across all microarrays is less than a specified value.
  • Expression Threshold - Removes markers for which more than a specified number (or percentage) of arrays have values inside (or outside) a user-defined range.
  • GenePix Expression Threshold - Applicable to two-channel (GenePix) array data only. Defines a range for each channel, and removes markers for which, on more than a specified number (or percentage) of arrays, either channel intensity is inside (or outside) its defined range.
  • GenePix Flags - Removes markers for which more than a specified number (or percentage) of values match the selected flags (as flagged in GenePix software).
  • Missing Values - Removes markers that have “missing” measurements in more than a specified number (or percentage) of microarrays.

Basic Controls

Overview

Filtering-Affy detection call full.png

  • Filter (menu) - Select the desired filter.
  • Save Parameters (menu) - Select a stored set of parameter settings.
  • Filter (button) - Run the selected filter.
  • Preview - Preview the filtering action (see Filtering Preview below).
  • Save Settings - Save the current settings (see Saving Parameters below).
  • Delete Settings - Delete the currently selected parameter set.

Filtering Preview

The filtering action can be previewed so that the user can judge whether to proceed with the current parameter settings. The preview lists the markers that will be removed, along with a count of those markers. The list displays marker names and, where available, gene names, and can be searched by either.

Filtering-Preview.png

  • Search Marker - Search the list by marker name.
  • Search Gene - Search the list by gene name.
  • Filter - Perform the filtering action.
  • Cancel - Cancel the filtering action. No change is made to the data.

Saving Parameters

The current parameter settings can be saved to a named parameter set. The saved set will be displayed in the pull-down menu at upper right in the component. Any number of parameter sets can be saved. If the currently set parameters match a saved set, that set's entry will be shown in the menu.

  • Save Settings - save the current settings to a new parameter set.
  • Delete Settings - delete the currently selected parameter set from the menu.

Specific Controls for each Filter

Affymetrix Detection Call Filter

Filtering-Affy detection call.png

Certain Affymetrix data analysis software, e.g. MAS5/GCOS, produces a confidence value for the expression measurement of each probeset (marker) on each array. These confidence values (actually p-values) are used to categorize each reading as either Present, Marginal or Absent, based on fixed cutoff values.

The Detection Call Filter removes markers that have a particular call in more than a specified number, or a specified percentage, of arrays.

For example, the user might specify that a marker should be filtered out if its value is called "Absent" on more than 40% of the arrays.

Detection calls to be filtered out

These check-boxes indicate on which detection call values to filter:

  • P - Present
  • M - Marginal
  • A - Absent

Any combination of boxes may be checked. For a given marker, the filter counts the arrays on which the detection call matches any of the checked values.

Filtering Options

  • Remove the marker if the percentage of matching arrays is more than N. - If, for a given marker, the percentage of arrays whose detection call matches one of those chosen by the user exceeds the given percentage N, the marker will be removed.
  • Remove the marker if the number of matching arrays is more than N. - If, for a given marker, the number of arrays whose detection call matches one of those chosen by the user exceeds the given number N, the marker will be removed.
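
The counting rule described above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not geWorkbench code: the function names, the `use_percentage` switch and the sample calls are all assumptions made for the example.

```python
def matching_array_count(calls, selected):
    """Count the arrays whose detection call is one of the selected calls."""
    return sum(1 for call in calls if call in selected)


def should_remove(calls, selected, threshold, use_percentage=True):
    """Return True if the marker has strictly more than N matching arrays.

    calls holds one detection call per array for a single marker.
    """
    if not calls:
        return False  # assumption: a marker with no calls is left alone
    matches = matching_array_count(calls, selected)
    if use_percentage:
        return 100.0 * matches / len(calls) > threshold
    return matches > threshold


# Hypothetical marker measured on 5 arrays: P = Present, M = Marginal, A = Absent.
calls = ["A", "A", "P", "M", "A"]

# "Absent" on 3 of 5 arrays is 60%, which is more than 40%.
print(should_remove(calls, {"A"}, 40))  # True
# A and M together match 4 arrays, which is not more than 4.
print(should_remove(calls, {"A", "M"}, 4, use_percentage=False))  # False
```

Note that the comparison is strict: a marker that matches on exactly N arrays (or exactly N percent) is kept.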

Deviation Filter

Filtering-Deviation.png

This filter measures, for each marker, the sample standard deviation of the expression values across all arrays. If the sample standard deviation is less than the given "bound", then the marker will be filtered out.

  • Std. Deviation bound - markers showing a sample standard deviation below this bound will be filtered out.
  • Missing values - Before computing the standard deviation, this filter can replace any missing values in the data. The available methods are:
    • Marker Average - Replace any missing values for a particular marker with the average of its available values across all arrays.
    • Microarray Average - Replace any missing values for a particular array with the average of its available values across all markers.
    • Ignore - Do not replace any missing values.
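
As a rough illustration of how the bound and the three missing-value options interact, the following Python sketch computes the sample standard deviation per marker after optional replacement. It is not taken from geWorkbench: the function names and data are hypothetical, and two behaviors are assumptions, namely that "Ignore" computes the deviation over the available values only, and that markers with fewer than two values are left in place.

```python
import statistics


def fill_missing(matrix, method):
    """Return a copy of matrix with missing values (None) optionally replaced.

    matrix[m][a] is the value of marker m on array a.
    """
    if method == "ignore":
        return [row[:] for row in matrix]
    if method == "marker":
        filled = []
        for row in matrix:
            present = [v for v in row if v is not None]
            mean = statistics.mean(present) if present else None
            filled.append([mean if v is None else v for v in row])
        return filled
    if method == "microarray":
        means = []
        for a in range(len(matrix[0])):
            present = [row[a] for row in matrix if row[a] is not None]
            means.append(statistics.mean(present) if present else None)
        return [[means[a] if v is None else v for a, v in enumerate(row)]
                for row in matrix]
    raise ValueError(f"unknown method: {method}")


def markers_to_remove(matrix, bound, method="ignore"):
    """Return indices of markers whose sample standard deviation is below bound."""
    removed = []
    for index, row in enumerate(fill_missing(matrix, method)):
        values = [v for v in row if v is not None]
        # statistics.stdev is the sample (n - 1) standard deviation.
        if len(values) >= 2 and statistics.stdev(values) < bound:
            removed.append(index)
    return removed


# Hypothetical data: 3 markers on 4 arrays, None marks a missing value.
matrix = [
    [1.0, 1.1, None, 0.9],
    [2.0, 8.0, 5.0, 11.0],
    [3.0, None, 3.0, 3.0],
]

print(markers_to_remove(matrix, 0.5, "ignore"))      # [0, 2]
print(markers_to_remove(matrix, 0.5, "marker"))      # [0, 2]
print(markers_to_remove(matrix, 0.5, "microarray"))  # []
```

In this made-up data the choice of replacement method changes the outcome: filling from the array average introduces values far from each marker's own level, which raises the deviation above the bound.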

Expression Threshold Filter

Filtering-Expression Threshold.png

In this filter, a reference range is defined by a lower and upper bound. Values either inside or outside of this range can then be filtered out.

Threshold settings

  • Range Min - The lower bound of the range.
  • Range Max - The upper bound of the range.
  • Filter-out values
    • Inside of range - expression values that fall within the specified range count toward removing the marker.
    • Outside of range - expression values that fall outside of the specified range count toward removing the marker.

Filtering Options

  • Remove the marker if the percentage of matching arrays is more than N. - If for a given marker, the percentage of expression values meeting the range setting exceeds the given percentage N, the marker will be removed.
  • Remove the marker if the number of matching arrays is more than N. - If for a given marker, the number of expression values meeting the range setting exceeds the given number N, the marker will be removed.
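
The range test and the threshold can be combined as in the Python sketch below, which applies the filter to a small data set and returns the surviving markers. The names and data are hypothetical, and treating the range bounds as inclusive is an assumption; the documentation above does not say how values exactly on a bound are handled.

```python
def value_matches(value, range_min, range_max, inside):
    """Return True if the value meets the range setting (bounds assumed inclusive)."""
    in_range = range_min <= value <= range_max
    return in_range if inside else not in_range


def filter_markers(data, range_min, range_max, inside, threshold,
                   use_percentage=True):
    """Return the names of the markers that survive the filter.

    data maps a marker name to its expression values, one per array.
    """
    kept = []
    for marker, values in data.items():
        matches = sum(value_matches(v, range_min, range_max, inside)
                      for v in values)
        if use_percentage:
            score = 100.0 * matches / len(values) if values else 0.0
        else:
            score = matches
        if not score > threshold:  # removed only when strictly more than N
            kept.append(marker)
    return kept


# Hypothetical data set: three markers measured on four arrays.
data = {
    "marker_a": [12.0, 8.0, 20.0, 430.0],    # 3 of 4 values inside 0..50
    "marker_b": [310.0, 95.0, 120.0, 77.0],  # 0 of 4 values inside 0..50
    "marker_c": [5.0, 60.0, 41.0, 88.0],     # 2 of 4 values inside 0..50
}

# Remove markers with more than 50% of values inside the range 0..50.
print(filter_markers(data, 0.0, 50.0, inside=True, threshold=50))
# ['marker_b', 'marker_c']
```

Here marker_c sits at exactly 50%, which is not more than 50, so it is kept.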

GenePix Expression Threshold

Filtering-GenePix Expression Threshold.png

This filter supports filtering of two-channel data from the GenePix platform. Based on the chemical labels often used to differentiate the two channels, they are referred to in the component as Cy3 (green) and Cy5 (red).

The intensity value for a channel is calculated by subtracting the background measurement from the foreground measurement for the channel.

For each channel, a reference range of values is defined by setting a lower and upper bound. Filtering on values either inside or outside of this range can then be performed.

The filter considers both channels together. If for a given marker on a given array either the Cy3 channel value OR the Cy5 channel value meets its specified range requirement, it will be counted toward meeting the filtering requirement (see Filtering Options below).

Please note that GenePix expression value computation options are specified in Tools->Preferences. The default setting is (Mean F635 - Mean B635) / (Mean F532 - Mean B532). However, this filter acts on the data prior to the calculation of the relative expression values.

Threshold settings

The threshold values are real numbers.

  • Cy3 Range Min - The lower bound of the Cy3 (channel 1) range.
  • Cy3 Range Max - The upper bound of the Cy3 (channel 1) range.
  • Cy5 Range Min - The lower bound of the Cy5 (channel 2) range.
  • Cy5 Range Max - The upper bound of the Cy5 (channel 2) range.
  • Filter-out values
    • Inside of range - channel values that fall within the specified range count toward removing the marker.
    • Outside of range - channel values that fall outside of the specified range count toward removing the marker.

Filtering Options

  • Remove the marker if the percentage of matching arrays is more than N. - If for a given marker, the percentage of expression values (Cy3 or Cy5) meeting the range setting exceeds the given percentage N, the marker will be removed.
  • Remove the marker if the number of matching arrays is more than N. - If for a given marker, the number of expression values (Cy3 or Cy5) meeting the range setting exceeds the given number N, the marker will be removed.
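
The two-channel logic described in this section (background-subtracted intensity per channel, then an OR across the channels) might look like the following Python sketch. The `Spot` type, its field names, the inclusive bounds and the sample numbers are all assumptions made for illustration; they do not describe geWorkbench internals.

```python
from typing import NamedTuple


class Spot(NamedTuple):
    """Foreground and background measurements for one marker on one array."""
    cy3_fg: float
    cy3_bg: float
    cy5_fg: float
    cy5_bg: float


def meets_range(value, bounds, inside):
    """Return True if value meets the range setting (bounds assumed inclusive)."""
    low, high = bounds
    in_range = low <= value <= high
    return in_range if inside else not in_range


def spot_matches(spot, cy3_bounds, cy5_bounds, inside=True):
    """A spot matches if EITHER channel intensity meets its range setting."""
    cy3 = spot.cy3_fg - spot.cy3_bg  # intensity = foreground - background
    cy5 = spot.cy5_fg - spot.cy5_bg
    return meets_range(cy3, cy3_bounds, inside) or meets_range(cy5, cy5_bounds, inside)


def should_remove(spots, cy3_bounds, cy5_bounds, threshold,
                  inside=True, use_percentage=True):
    """Return True if strictly more than N arrays (or N percent) match."""
    if not spots:
        return False
    matches = sum(spot_matches(s, cy3_bounds, cy5_bounds, inside) for s in spots)
    if use_percentage:
        return 100.0 * matches / len(spots) > threshold
    return matches > threshold


# Hypothetical marker on four arrays.
spots = [
    Spot(500, 100, 90, 60),      # Cy3 = 400, Cy5 = 30   -> Cy5 inside 0..100
    Spot(800, 100, 900, 100),    # Cy3 = 700, Cy5 = 800  -> no match
    Spot(150, 100, 700, 100),    # Cy3 = 50,  Cy5 = 600  -> Cy3 inside 0..100
    Spot(1200, 200, 1500, 300),  # Cy3 = 1000, Cy5 = 1200 -> no match
]

print(should_remove(spots, (0, 100), (0, 100), threshold=40))  # True
print(should_remove(spots, (0, 100), (0, 100), threshold=50))  # False
```

Two of the four arrays match, so the marker is removed at a 40% threshold but kept at 50%, because the comparison is "more than N".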


GenePix Flags Filter

Filtering-GenePix Flag Filter.png

GenePix software allows individual expression values to be flagged, using integer flags with defined meanings. The flags can be assigned either directly by the software or by the user.

Please note that GenePix expression value computation options are specified in Tools->Preferences. The default setting is (Mean F635 - Mean B635) / (Mean F532 - Mean B532). However, this has no effect on the Flags filtering.


Flags

  • Filter - check-boxes to indicate on which flags to filter.
  • Flag name - Name of the flag if known.
  • Description - Description of the flag if known.
  • # of Occurences - The number of times that the given flag occurs in the data set, irrespective of marker or array.

Standard flag values were obtained from the "GenePix Pro 4.0 Tutorial", GenePix Pro 4.0 User’s Guide, Copyright 2001 Axon Instruments, Inc.

When a marker is checked against the filtering threshold, the matches for all selected flags are summed together.


Filtering Options

  • Remove the marker if the percentage of matching arrays is more than N. - If for a given marker, the percentage of expression values with the selected flags exceeds the given percentage N, the marker will be removed.
  • Remove the marker if the number of matching arrays is more than N. - If for a given marker, the number of expression values with the selected flags exceeds the given number N, the marker will be removed.
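
The sketch below illustrates, in Python, both the occurrence count shown in the Flags table and the summing of matches across selected flags. The function names and data are hypothetical, and the integer flag values used here (0, -50, -100) are placeholders for illustration rather than a statement of the GenePix flag definitions.

```python
from collections import Counter


def flag_occurrences(flags_by_marker):
    """Count how often each flag occurs in the data set, across all markers and arrays."""
    return Counter(flag for flags in flags_by_marker.values() for flag in flags)


def markers_to_remove(flags_by_marker, selected, threshold, use_percentage=True):
    """Return markers whose summed matches for the selected flags exceed N."""
    removed = []
    for marker, flags in flags_by_marker.items():
        matches = sum(1 for flag in flags if flag in selected)
        if use_percentage:
            score = 100.0 * matches / len(flags) if flags else 0.0
        else:
            score = matches
        if score > threshold:
            removed.append(marker)
    return removed


# Hypothetical data: one integer flag per array for each marker.
flags_by_marker = {
    "m1": [0, -50, -100, 0],
    "m2": [0, 0, 0, -50],
    "m3": [-100, -100, -50, 0],
}

print(flag_occurrences(flags_by_marker)[-50])                   # 3
print(markers_to_remove(flags_by_marker, {-50, -100}, 40))      # ['m1', 'm3']
print(markers_to_remove(flags_by_marker, {-50, -100}, 2,
                        use_percentage=False))                  # ['m3']
```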


Missing Values Filter

Filtering-Missing Values Filter.png

If a value is missing in the input file, it will be marked as missing in geWorkbench. Markers with more than a certain number or percentage of missing values can be removed with this filter.


Filtering Options

  • Remove the marker if the percentage of matching arrays is more than N. - If for a given marker, the percentage of arrays containing a missing expression value exceeds the given percentage N, the marker will be removed.
  • Remove the marker if the number of matching arrays is more than N. - If for a given marker, the number of arrays containing a missing expression value exceeds the given number N, the marker will be removed.
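
A minimal Python sketch of this filter follows. It assumes, purely for illustration, a tab-delimited input in which an empty field stands for a missing value; the actual file formats and the way geWorkbench marks missing values are not specified here, and all names are hypothetical.

```python
def parse_row(line):
    """Split a tab-delimited row into a marker name and its values.

    Assumption: an empty field marks a missing value, stored here as None.
    """
    name, *fields = line.rstrip("\n").split("\t")
    values = [float(field) if field.strip() else None for field in fields]
    return name, values


def should_remove(values, threshold, use_percentage=True):
    """Return True if strictly more than N arrays (or N percent) are missing."""
    if not values:
        return False
    missing = sum(1 for v in values if v is None)
    if use_percentage:
        return 100.0 * missing / len(values) > threshold
    return missing > threshold


# Hypothetical input rows: marker name followed by four array values.
lines = [
    "m1\t1.0\t\t2.0\t",         # 2 of 4 missing (50%)
    "m2\t1.0\t2.0\t3.0\t4.0",   # none missing
    "m3\t\t\t\t5.0",            # 3 of 4 missing (75%)
]

kept = []
for line in lines:
    name, values = parse_row(line)
    if not should_remove(values, threshold=50):
        kept.append(name)

print(kept)  # ['m1', 'm2']
```

With a 50% threshold, m1 is kept because exactly half of its values are missing, which is not more than N.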