Gene Ontology Term Analysis

Revision as of 17:15, 6 June 2011


Overview

The Gene Ontology project describes genes (gene products) using terms from three structured vocabularies: biological process, cellular component and molecular function.

A number of analysis methods in geWorkbench produce a list of interesting genes, for example, those differentially expressed (t-test, ANOVA), or those which show similarities in expression (Hierarchical Clustering, SOM, ARACNe). The Gene Ontology Enrichment component, also referred to as the "GO Terms" component, allows the genes in any such "changed-gene" list to be characterized using the Gene Ontology terms annotated to them. It asks whether, for any particular GO term, the fraction of genes assigned to it in the "changed-gene" list is higher than expected by chance (that is, over-represented) relative to the fraction of genes assigned to that term in the reference set. In statistical terms, the analysis tests the null hypothesis that, for any particular ontology term, there is no difference between the proportion of genes annotated to it in the reference list and the proportion annotated to it in the test list. The reference list typically comprises all genes on a microarray (after any filtering and removal of redundant entries).
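The over-representation question above can be illustrated with a one-sided hypergeometric test, a common formulation of this kind of comparison. The sketch below is only an illustration under that assumption; the function name and the toy counts are hypothetical, and it is not taken from the geWorkbench or Ontologizer code.

```python
from math import comb

def enrichment_p_value(pop_size, pop_annotated, study_size, study_annotated):
    """Upper-tail hypergeometric probability P(X >= study_annotated).

    pop_size        -- genes in the reference list
    pop_annotated   -- reference genes annotated to the term
    study_size      -- genes in the changed-gene list
    study_annotated -- changed genes annotated to the term
    """
    upper = min(pop_annotated, study_size)
    total = comb(pop_size, study_size)
    tail = sum(
        comb(pop_annotated, k) * comb(pop_size - pop_annotated, study_size - k)
        for k in range(study_annotated, upper + 1)
    )
    return tail / total

# Toy numbers (hypothetical): 20 reference genes, 5 annotated to the term;
# 4 changed genes, 3 of them annotated to the term.
p = enrichment_p_value(20, 5, 4, 3)
print(round(p, 4))
```

A small p-value indicates that the term is annotated to more of the changed genes than would be expected if the changed-gene list were drawn at random from the reference list.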

The Gene Ontology (GO Terms) analysis component in geWorkbench is built around the Ontologizer 2.0 software from Peter Robinson's group at the Charité Medical Institute of the Humboldt University in Berlin. It provides several methods for over-representation analysis, including Term-for-Term, Parent-Child, and Topology. More information about these methods can be found at the Ontologizer website at http://compbio.charite.de/index.php/ontologizer2.html, and in the descriptions and references below.

The Gene Ontology is structured as a directed acyclic graph (DAG). This has several consequences. A term can have more than one parent, and hence there can be multiple paths from the root by which a term can be reached. The Ontologizer code uses the "true path" property of the Gene Ontology in counting genes assigned to a term, by which a gene annotated to any term is considered also annotated to all that term’s parent terms. A term may thus show significant over-representation through the cumulative effects of its children rather than through genes assigned directly to it.
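The "true path" counting described above can be sketched as follows. This is a minimal illustration rather than the Ontologizer implementation; the term names, gene names, and function names are hypothetical.

```python
def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def propagate(direct_annotations, parents):
    """Map each term to the genes annotated to it directly or via a descendant."""
    genes_by_term = {}
    for gene, terms in direct_annotations.items():
        for term in terms:
            for t in {term} | ancestors(term, parents):
                genes_by_term.setdefault(t, set()).add(gene)
    return genes_by_term

# Hypothetical toy DAG: term C has two parents (A and B), both children of root.
parents = {"A": ["root"], "B": ["root"], "C": ["A", "B"]}
direct = {"gene1": ["C"], "gene2": ["A"]}

counts = propagate(direct, parents)
for term in sorted(counts):
    print(term, sorted(counts[term]))
```

In this toy example no gene is annotated directly to B or to root, yet both terms receive genes through their descendants, which is how a parent term can appear over-represented through the cumulative effect of its children.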


Note - If a marker has an annotation to a GO Term but has no gene symbol, it will not be included in the "Reference Gene" list or the "Changed Gene" list.

Gene Ontology OBO file source

By default, each time geWorkbench starts, it downloads the latest Gene Ontology OBO file from the geneontology.org website. However, a setting under the Tools item of the geWorkbench Menu Bar allows an OBO file to be loaded from local disk instead. The file is chosen using a standard file browser. After the setting has been changed, geWorkbench must be restarted before the change takes effect.
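For readers unfamiliar with the OBO format, the sketch below reads the [Term] stanzas of a small OBO fragment. It is an illustrative assumption about how such a file might be read, not the parser used by geWorkbench; the function name is hypothetical, and only the id, name, namespace, and is_a tags are handled.

```python
import io

def parse_obo_terms(lines):
    """Collect id, name, namespace and is_a parents from [Term] stanzas."""
    terms = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # Only [Term] stanzas are kept; other stanza types are skipped.
            current = {"is_a": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue
        tag, _, value = line.partition(":")
        value = value.split("!")[0].strip()  # drop trailing "! comment"
        if tag == "id":
            terms[value] = current
        elif tag == "is_a":
            current["is_a"].append(value)
        elif tag in ("name", "namespace"):
            current[tag] = value
    return terms

sample = io.StringIO("""format-version: 1.2

[Term]
id: GO:0008150
name: biological_process
namespace: biological_process

[Term]
id: GO:0009987
name: cellular process
namespace: biological_process
is_a: GO:0008150 ! biological_process
""")

terms = parse_obo_terms(sample)
print(terms["GO:0009987"])
```

The is_a entries collected here are the parent links that define the graph structure described in the Overview.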

Analysis Component GUI

Parameters

Selection

GeneOntology Analysis Selection.png

Reference Gene List

The first pull-down allows one to choose from the following sources for the reference gene list:

  • All Genes - uses all markers in the current microarray dataset.
  • From Set - if chosen, the second pull-down shows the available sets defined in the Markers component.
  • From File - if chosen, the "Load" button becomes active.

Load - If From File is chosen, the user can load a comma-separated list of markers to use as the reference set.

Text field - displays the contents of the currently loaded reference gene list, regardless of source.

Note - If a marker has an annotation to a GO Term but has no gene symbol, it will not be included in the "Reference Gene" list.
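The effect of loading a comma-separated marker list and then dropping markers without a gene symbol can be sketched as below. The marker IDs, gene symbols, and function names are hypothetical, and the sketch assumes the annotations are available as a simple marker-to-symbol mapping.

```python
def parse_marker_list(text):
    """Split comma-separated text into marker IDs, ignoring blanks and line breaks."""
    return [m.strip() for m in text.replace("\n", ",").split(",") if m.strip()]

def markers_with_symbols(markers, symbol_by_marker):
    """Keep only markers that have a non-empty gene symbol."""
    return [m for m in markers if symbol_by_marker.get(m)]

# Hypothetical file contents and annotations.
file_text = "probe_1, probe_2,probe_3\n"
symbol_by_marker = {"probe_1": "GENE1", "probe_2": "", "probe_3": "GENE3"}

loaded = parse_marker_list(file_text)
reference = markers_with_symbols(loaded, symbol_by_marker)
print(loaded)
print(reference)
```

Here probe_2 is loaded from the file but is left out of the resulting list because it has no gene symbol.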

Changed Gene List

The first pull-down allows one to choose from the following sources for the changed-gene list:

  • From Set - if chosen, the second pull-down shows the available sets defined in the Markers component.
  • From File - if chosen, the "Load" button becomes active.
  • From Result Node - if chosen, the second pull-down shows a list of available differential expression (t-test, ANOVA) result nodes from the Project Folders component.

Load - If From File is chosen, the user can load a comma-separated list of markers to use as the changed gene list.

Text field - displays the contents of the currently loaded changed-gene list, except if "From Result Node" is chosen.

Note - If a marker has an annotation to a GO Term but has no gene symbol, it will not be included in the "Changed Gene" list.

Ontology Selection

Not currently implemented. This option is intended to allow the loading of alternate ontologies besides the three that make up the Gene Ontology.

Ontologizer

GeneOntology Analysis Ontologizer2.png

Annotations

  • Use loaded annotations - If an annotation file was read in when the microarray dataset was loaded, it is displayed here.
  • Use alternate annotation file - A new or alternate annotation file can be read in by selecting this option. Currently only the Affymetrix annotation file format is supported.
  • Alternate annotation text field - if an alternate annotation file is chosen, the file name will be displayed in this field.
  • Browse - this button brings up a file browser for choosing an annotation file.

Enrichment Method

GeneOntology Analysis EnrichmentMethod.png

Multiple Testing Correction

GeneOntology Analysis MultipleTesting.png

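  • Benjamini-Hochberg - controls the false discovery rate (FDR); valid for independent or positively dependent tests.
  • Benjamini-Yekutieli - controls the FDR under arbitrary dependence between tests; more conservative than Benjamini-Hochberg.
  • Bonferroni - controls the family-wise error rate (FWER) by multiplying each p-value by the number of tests.
  • Bonferroni-Holm - a step-down variant of Bonferroni that controls the FWER and is never less powerful.
  • None (default) - no correction is applied.
  • Westfall-Young-Single-Step - resampling-based control of the FWER using a single-step adjustment.
  • Westfall-Young-Step-Down - resampling-based control of the FWER using a step-down adjustment.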
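To give a feel for how some of these corrections act on a set of p-values, the sketch below computes adjusted p-values using the textbook definitions of the Bonferroni, Bonferroni-Holm, and Benjamini-Hochberg procedures. It is an illustration only and is not taken from the Ontologizer code; the function names and example p-values are hypothetical.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def holm(pvals):
    """Step-down Bonferroni-Holm adjusted p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

def benjamini_hochberg(pvals):
    """Step-up Benjamini-Hochberg adjusted p-values (FDR)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)  # descending
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, i in enumerate(order):
        rank = m - pos  # 1-based rank in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

pvals = [0.01, 0.04, 0.03, 0.005]  # hypothetical raw p-values for four terms
for name, func in [("Bonferroni", bonferroni), ("Holm", holm),
                   ("Benjamini-Hochberg", benjamini_hochberg)]:
    print(name, [round(p, 4) for p in func(pvals)])
```

The adjusted values are returned in the same order as the input, so each can be compared directly against the chosen significance threshold.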

Example

Setup

Running a GO Terms analysis requires a list of genes to analyze (the study set, which this component calls the changed-gene list). Here, we will run a simple t-test on two classes of cell lines in the Bcell-100.exp dataset.


  1. Load Bcell-100.exp
  2. Threshold normalizer: min threshold 1.0.
  3. Log2 Normalize
  4. In Arrays, select the "Class" list of array sets and activate GC B-cell and GC-Tumor (Case).
  5. Run t-test with alpha threshold = 0.01 and using Bonferroni correction.
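The two normalization steps in the list above (steps 2 and 3) can be sketched as follows. This assumes that the threshold normalizer raises values below the minimum up to the minimum, which is an assumption made for illustration; the function names and expression values are hypothetical.

```python
import math

def threshold_min(values, minimum=1.0):
    """Raise any value below `minimum` up to `minimum` (assumed behavior)."""
    return [max(v, minimum) for v in values]

def log2_transform(values):
    """Take the base-2 logarithm of each (positive) value."""
    return [math.log2(v) for v in values]

raw = [0.25, 1.0, 8.0, 1024.0]  # hypothetical expression values for one marker
normalized = log2_transform(threshold_min(raw, minimum=1.0))
print(normalized)
```

Applying the threshold first guarantees that every value passed to the logarithm is at least 1.0, so the log2 values are never negative or undefined.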


A new marker set called "Significant Genes" is created, containing 437 markers.


The picture below shows that the "Significant Genes" set has been chosen for the Changed Gene list.


GeneOntology Analysis Setup.png


In the Ontologizer tab, the Enrichment Method is Term-for-Term (the default), and the Bonferroni multiple testing correction is selected.

GeneOntology Analysis Setup Ontologizer.png


Results

The results of running this analysis are shown on the Gene Ontology Viewer page.

References

  • Alexa A, Rahnenführer J, Lengauer T (2006). Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics 22(13), pp. 1600-1607.
  • Bauer S, Grossmann S, Vingron M, Robinson PN (2008). Ontologizer 2.0--a multifunctional tool for GO term enrichment analysis and data exploration. Bioinformatics 24(14), pp. 1650-1.
  • Falcon S, Gentleman R (2007). Using GOstats to test gene lists for GO term association. Bioinformatics 23(2), pp. 257-8.
  • Grossmann S, Bauer S, Robinson PN, Vingron M (2007). Improved detection of overrepresentation of Gene-Ontology annotations with parent child analysis. Bioinformatics 23(22), pp. 3024-31.
  • Robinson PN, Wollstein A, Böhme U, Beattie B (2004). Ontologizing gene-expression microarray data: characterizing clusters with Gene Ontology. Bioinformatics 20(6), pp. 979-81.