Marker Annotations

Latest revision as of 18:27, 22 January 2014





Overview

The Marker Annotations component enables the retrieval of biological annotation information for a collection of genes. For every gene, the following data can be retrieved:

  • Links to gene detail pages.
  • A set of pathways containing the gene.


All annotations are retrieved from remote servers of the bioDBnet resource (http://biodbnet.abcc.ncifcrf.gov/), maintained by the Advanced Biomedical Computing Center of the National Cancer Institute (NCI Frederick).

Submit Query

The Marker Annotations module will retrieve information for all markers that belong to activated marker sets, or, more precisely, for the genes corresponding to those markers:

[Image: MarkerAnnotations_Marker_set_activation.png]



In the example shown above, information will be retrieved for the genes STAT3 and CEBPB. Clicking the "Retrieve Annotations" button initiates communication with the remote server:


[Image: MarkerAnnotations_Controls.png]


While the information is being retrieved, a progress bar will be shown.


[Image: MarkerAnnotations_ProgressBar.png]
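
The selection rule described in this section (all markers in activated marker sets, resolved to their genes) can be sketched as follows. This is only an illustration, not geWorkbench code: the `MarkerSet` class, the `genes_to_query` function, and the probeset ids are hypothetical, and the mapping of markers to genes is assumed to be available as a plain dictionary.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MarkerSet:
    """Hypothetical stand-in for a marker set in the Markers component."""
    name: str
    markers: List[str]
    active: bool = False


def genes_to_query(marker_sets: List[MarkerSet],
                   marker_to_gene: Dict[str, str]) -> List[str]:
    """Return the distinct genes behind all markers in activated sets.

    Genes are listed in the order in which they are first encountered.
    Markers without a known gene are skipped (an assumption of this sketch).
    """
    genes: List[str] = []
    seen = set()
    for marker_set in marker_sets:
        if not marker_set.active:
            continue
        for marker in marker_set.markers:
            gene = marker_to_gene.get(marker)
            if gene is not None and gene not in seen:
                seen.add(gene)
                genes.append(gene)
    return genes


# Made-up probeset ids; only the gene symbols come from the example above.
marker_sets = [
    MarkerSet("set A", ["probe_1", "probe_2"], active=True),
    MarkerSet("set B", ["probe_3"], active=False),
    MarkerSet("set C", ["probe_2", "probe_4"], active=True),
]
marker_to_gene = {
    "probe_1": "STAT3",
    "probe_2": "CEBPB",
    "probe_3": "CD40",
    "probe_4": "STAT3",
}
print(genes_to_query(marker_sets, marker_to_gene))
```

Several probesets can map to the same gene, which is why the sketch collects genes rather than markers before a query would be sent.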

Pathway and Gene Annotations

The "Annotations" tab presents a summary listing of the annotations retrieved from bioDBnet:

[Image: MarkerAnnotations_Results.png]


The listing contains at least one row for each gene for which annotation information is available. If a gene is associated with more than one pathway, one row is listed for every pathway (as is the case above for STAT3). Every row displays the marker (i.e., probeset) id, the corresponding gene name, and the name of the associated pathway.
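
As a rough illustration of this row layout, the sketch below expands a gene-to-pathways mapping into one (marker, gene, pathway) row per pathway. The function name, the probeset ids, and the pathway names are hypothetical placeholders. The sketch also assumes that a gene without any pathway produces no row, which may differ from what the component actually displays.

```python
from typing import Dict, List, Tuple


def annotation_rows(marker_to_gene: Dict[str, str],
                    gene_to_pathways: Dict[str, List[str]]
                    ) -> List[Tuple[str, str, str]]:
    """Build one (marker id, gene name, pathway name) row per pathway."""
    rows = []
    for marker, gene in marker_to_gene.items():
        # A gene with several pathways contributes several rows.
        for pathway in gene_to_pathways.get(gene, []):
            rows.append((marker, gene, pathway))
    return rows


# Placeholder data, not real annotation results.
rows = annotation_rows(
    {"probe_1": "STAT3", "probe_2": "CEBPB"},
    {"STAT3": ["pathway_1", "pathway_2"], "CEBPB": ["pathway_1"]},
)
for row in rows:
    print(row)
```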

Right-clicking on a gene name shows the following menu options. Each will link to the named data source to display gene information in your browser:


[Image: MarkerAnnotations_Gene_Menu.png]


  • Go to Entrez for gene name
  • Go to CGAP for gene name
  • Go to GeneCards for gene name


Right-clicking on a pathway brings up a popup menu offering the following options:


[Image: MarkerAnnotations_Pathway_Menu.png]


  • View Diagram: available only for BioCarta pathways. geWorkbench generates a diagram of the pathway based on the BioCarta description. Selecting the "View Diagram" option will display this diagram within the "Pathway" tab.
  • View Diagram on BioCarta site: opens the BioCarta diagram in your browser, directly on the BioCarta website.
  • Add pathway genes to set: extracts the pathway genes for which there are associated probes in the microarray set currently selected by the user and places all such probes in a new marker set within the "Markers" component (by default, the marker set is named after the pathway).
  • Export genes to CSV: creates a new text file containing a listing of all pathway genes. The file format (csv = comma separated values) is compatible with Microsoft Excel.

The export file contains the following columns:

  • Marker
  • Gene
  • Entrez GeneId
  • Pathway
  • Entrez URL
  • CGAP URL
  • GeneCards URL
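
For downstream analysis outside a spreadsheet, an exported file can be read with a few lines of Python. The sketch below assumes that the file starts with a header row using exactly the column names listed above; that detail is not confirmed here, so check an actual export first. The function name and the sample values are made-up placeholders.

```python
import csv
import io

# Column names as listed above; assumed to appear as the CSV header row.
EXPECTED_COLUMNS = [
    "Marker", "Gene", "Entrez GeneId", "Pathway",
    "Entrez URL", "CGAP URL", "GeneCards URL",
]


def read_pathway_export(handle):
    """Read an exported pathway gene list into a list of dictionaries."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


# In-memory stand-in for an exported file; all values are placeholders.
sample = io.StringIO(
    "Marker,Gene,Entrez GeneId,Pathway,Entrez URL,CGAP URL,GeneCards URL\n"
    "probe_1,GENE_A,0001,pathway_1,"
    "http://example.org/entrez,http://example.org/cgap,http://example.org/genecards\n"
)
export_rows = read_pathway_export(sample)
print(export_rows[0]["Gene"], export_rows[0]["Pathway"])
```

For a real export, pass an open file instead, for example `open(path, newline="")`.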


[Image: MarkerAnnotations_h_erkPathway.png]


The image above shows a BioCarta pathway diagram, displayed after selecting the "View Diagram" option from the "Annotations" tab. The following controls are available:

  • Pathway drop-down box: located at the top left corner above the diagram, it shows the name of the currently displayed diagram. The component keeps a history of all BioCarta diagrams selected by the user; the drop-down can be used to switch among the corresponding pathway images.
  • Clear Diagram: clears the currently displayed diagram.
  • Clear History: clears the currently displayed diagram and removes all pathway history information from the pathway name drop-down box.
  • Image Snapshot: saves a snapshot of the diagram to the Workspace.
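
The behaviour of the diagram history and the two clear buttons can be modelled with a small class. This is a hypothetical sketch of the described behaviour, not the geWorkbench implementation: the class and method names are invented, and it assumes that viewing the same pathway twice does not add a second entry to the drop-down.

```python
class PathwayHistory:
    """Toy model of the pathway drop-down and the two clear buttons."""

    def __init__(self):
        self._names = []     # drop-down entries, in order of first selection
        self.current = None  # name of the displayed diagram, or None

    def view(self, name):
        """Display a diagram and remember it in the history."""
        if name not in self._names:
            self._names.append(name)
        self.current = name

    def clear_diagram(self):
        """Clear the displayed diagram but keep the history."""
        self.current = None

    def clear_history(self):
        """Clear the displayed diagram and drop all history entries."""
        self._names.clear()
        self.current = None

    @property
    def names(self):
        """Return a copy of the drop-down entries."""
        return list(self._names)


history = PathwayHistory()
history.view("h_erkPathway")
history.view("pathway_2")      # placeholder pathway name
history.view("h_erkPathway")   # switching back adds no duplicate entry
print(history.names, history.current)
```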

References

Mudunuri, U., Che, A., Yi, M. and Stephens, R.M. (2009) bioDBnet: the biological database network. Bioinformatics, 25, 555-556. http://www.ncbi.nlm.nih.gov/pubmed/19129209


BioCarta: http://www.biocarta.com

BioCarta Terms of Use: http://www.biocarta.com/legal/terms.asp