The Marker Annotations component enables the retrieval of biological annotation information for a collection of genes. For every gene, the following data can be retrieved:
- Links to gene detail pages.
- A set of pathways containing the gene.
- A set of gene-disease and gene-compound associations derived from literature articles.
All annotations are retrieved from remote servers maintained by the National Cancer Institute (NCI) using Cancer Bioinformatics Infrastructure Objects (caBIO) services. The data come from the following sources:
- Pathways: NCI's Pathway Interaction Database (PID). PID pathways come from 3 sources: BioCarta, Reactome and "NCI-Nature Curated". Information about the PID and each of the contributing sources is available at: http://pid.nci.nih.gov/userguide/database_content.shtml.
- Gene-disease/compound associations: the Cancer Gene Index (CGI) database. The reported associations are extracted from article abstracts using a combination of automatic text mining, semi-automatic verification, and manual curation. Project details are available at: https://wiki.nci.nih.gov/display/cageneindex/Creation+of+the+Cancer+Gene+Index.
The Marker Annotations module will retrieve information for all markers that belong to activated marker sets, or, more precisely, for the genes corresponding to those markers:
E.g., in the example shown above, information will be retrieved about the genes AATF, CD40, STAT3. Checkboxes at the bottom of the component's user interface can be used to specify which data source(s) to query: CGAP, CGI or both. For CGAP, the associated drop-down can be used to designate the target organism for which annotations are retrieved: human (the default) or mouse. Clicking the "Retrieve Annotations" button initiates the communication with the NCI servers:
While the information is being retrieved, progress indicators will be shown for either CGAP, CGI, or both query types, as appropriate.
The following image is taken from a query on STAT3.
Pathway and Gene Annotations
The "Annotations" tab presents a summary listing of the annotations retrieved from CGAP:
The listing contains at least one row for each gene for which annotation information is available. If a gene is associated with more than one pathway, one row is listed for every pathway (e.g., as is the case above for CD40 and STAT3). Every row displays the marker (i.e., probeset) id, the corresponding gene name and the name of the associated pathway.
Right-clicking on a gene name shows the following menu options. Each will link to the named data source to display gene information in your browser:
- Go to Entrez for gene name
- Go to CGAP for gene name
- Go to GeneCards for gene name
Clicking on a pathway brings up a popup menu offering a number of options:
- View Diagram: available only for BioCarta pathways. Such pathways are accompanied by images offering a graphical/artistic rendition of the pathway. Selecting the "View Diagram" option will display this image within the "Pathway" tab.
- Add pathway genes to set: extracts the pathway genes for which there are associated probes in the microarray set currently selected by the user and places all such probes in a new marker set within the "Markers" component (by default, the marker set is named after the pathway).
- Export genes to CSV: creates a new text file containing a listing of all pathway genes. The file format (csv = comma separated values) is compatible with Microsoft Excel.
The export file contains the following columns:
- Entrez GeneId
- Entrez URL
- CGAP URL
- GeneCards URL
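The exported file can also be processed outside the application. The following is a minimal Python sketch, assuming the file has no header row and that its columns appear in the order listed above; neither detail is confirmed here, so adjust `COLUMNS` and `has_header` to match the actual file. The function name `read_pathway_export` and the sample row are hypothetical.

```python
import csv
import io

# Column order assumed from the list above (not verified against a real export).
COLUMNS = ["Entrez GeneId", "Entrez URL", "CGAP URL", "GeneCards URL"]

def read_pathway_export(text, has_header=False):
    """Parse exported CSV text into a list of dicts keyed by COLUMNS."""
    rows = list(csv.reader(io.StringIO(text)))
    if has_header:
        rows = rows[1:]  # drop the header row if the file has one
    return [dict(zip(COLUMNS, row)) for row in rows if row]

# Made-up sample row, for illustration only.
sample = (
    "0000,http://example.org/entrez/0000,"
    "http://example.org/cgap/0000,http://example.org/genecards/0000\n"
)
records = read_pathway_export(sample)
print(records[0]["Entrez GeneId"])
```

To read a real export, pass the file contents instead of `sample`, e.g. `read_pathway_export(open(path, newline="").read())`.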
A BioCarta pathway image is displayed above after selecting the "View Diagram" option from the "Annotations" tab. The drop-down box in the top left corner above the diagram shows the name of the currently displayed diagram. The component keeps a history of all BioCarta diagrams selected by the user; the drop-down can be used to switch among the corresponding pathway images. The "Clear Diagram" button clears the currently displayed diagram. The "Clear History" button both clears the currently displayed diagram and removes all pathway history information from the pathway name drop-down box.
Cancer Gene Index
For many genes, there are hundreds of records in the CGI database. Retrieving all those records at once can be very time-consuming, especially if the query involves many genes. To avoid very long waits, the data are retrieved in 2 stages. In the first stage, at most 10 records for each association type (gene-disease/gene-compound) are fetched for each query gene. The retrieved data are displayed in the CancerGeneIndex tab.
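The staged retrieval can be pictured with a small Python sketch. This is only an illustration of the idea, not the component's actual code: the function names `fetch_initial` and `fetch_all` are hypothetical, and a plain list stands in for the remote CGI server.

```python
INITIAL_LIMIT = 10  # per association type and per gene, as described above

def fetch_initial(all_records, limit=INITIAL_LIMIT):
    """Stage 1: return at most `limit` records and whether more remain."""
    return all_records[:limit], len(all_records) > limit

def fetch_all(all_records):
    """Stage 2: return every record for the gene; nothing remains afterwards."""
    return list(all_records), False

# Hypothetical stand-in for the remote server: 25 gene-disease records.
server_records = [f"record-{i}" for i in range(25)]

shown, more_available = fetch_initial(server_records)
print(len(shown), more_available)   # 10 True

shown, more_available = fetch_all(server_records)
print(len(shown), more_available)   # 25 False
```

The `more_available` flag plays the role of the visual cue the interface gives when additional records for a gene have not yet been fetched.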
Note that cells that appear empty (...) may have information when expanded. These "hidden" entries will appear in the filtering controls above them, even when "collapsed".
The user interface is divided into two regions, sharing the same overall structure and functionality: the table on the left displays gene-disease associations, while the table on the right is used for reporting gene-compound associations. To avoid redundancy, the paragraphs that follow describe only the gene-disease table. The exact same description applies in the case of the gene-compound table.
Each table row represents an association between a query gene and a disease. The first column contains the name of the probeset associated with the query gene. For any given gene, if its corresponding probeset name appears in bold-face, this means that there are more gene-disease records on the CGI server that have yet to be fetched (beyond the 10 records acquired in the initial retrieval stage). The disease name within a row is followed by a number in parentheses. This number indicates how many records (among those fetched) support the reported association. E.g., in the image above, the first row indicates that there are 6 distinct records linking STAT3 to melanoma. A detailed listing of these 6 records is available by "expanding" the row. This can be achieved by right-clicking on the row and selecting "expand" from the ensuing popup menu:
Each detailed record contains 3 additional pieces of information:
- Role: a curator-assigned description of the kind of association being reported. The values in this column come from a controlled vocabulary (developed to support the CGI database creation effort).
- Sentence: the actual article abstract sentence used to derive the reported gene-disease association. The full sentence is displayed in the text area at the bottom portion of the interface (it is also available as a tool tip text, by mousing over the "Sentence" column).
- Pubmed: the PubMed ID of the source article. Clicking on the PubMed link opens the corresponding PubMed abstract page in the web browser (in this example, the sentence used for deriving the gene-disease association is actually the paper title):
It is also possible to link out to the NCI Thesaurus to see the definition of the disease associated with a gene; to do so, right-click on the table row for the gene-disease association and select the "Link to NCI_Thesaurus" option from the popup:
The user interface also allows ordering and filtering of the data (the latter can be very useful when many records are displayed):
- Ordering: the table rows can be sorted alphabetically by the contents of any column, by clicking on a column heading.
- Filtering: the drop down boxes that appear above the columns "Marker", "Gene", "Disease", and "Role" contain one value for each distinct entry within those columns. They can be used to select only records that contain the designated values. Of note is the drop-down associated with the "Disease" column:
The parentheses next to a disease name indicate how many distinct genes (among those included in the user query) are associated with this particular disease.
Note that these numbers are calculated using only fetched records. As mentioned above, the first stage of the information retrieval will fetch at most 10 records per gene. The remaining records associated with a gene can be retrieved from the CGI server by right-clicking on a table row corresponding to the gene and selecting the popup menu option "retrieve all". After all gene-disease association records for a given gene have been fetched, the bold-face type of its associated probeset name is removed.
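The two counts described in this section (records per association in a table row, and distinct genes per disease in the "Disease" drop-down) can be sketched in Python as follows. The gene and disease names are placeholders and the record layout is an assumption made for illustration; the sketch only shows how such counts follow from the set of fetched records.

```python
from collections import Counter, defaultdict

# Hypothetical fetched records: one (gene, disease) pair per supporting record.
fetched = [
    ("GENE_A", "disease X"),
    ("GENE_A", "disease X"),
    ("GENE_A", "disease Y"),
    ("GENE_B", "disease X"),
]

# Number shown after the disease name in a table row:
# how many fetched records support each gene-disease association.
records_per_association = Counter(fetched)

# Number shown in the "Disease" drop-down:
# how many distinct query genes are associated with each disease.
genes_by_disease = defaultdict(set)
for gene, disease in fetched:
    genes_by_disease[disease].add(gene)
genes_per_disease = {d: len(g) for d, g in genes_by_disease.items()}

print(records_per_association[("GENE_A", "disease X")])  # 2
print(genes_per_disease["disease X"])                    # 2
```

Because both counts are derived from `fetched` alone, they can change once the remaining records for a gene are retrieved.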
Finally, by clicking on the "Export" button at the bottom of the user interface, the contents of the gene-disease and the gene-compound tables displayed within the CancerGeneIndex tab can be exported as comma-separated values text files for further analysis and/or visualization by spreadsheet software.
The exported data include the following columns:
- Marker ID
- Gene Symbol
- Gene-Agent Relation
- Abstract Sentence