Cellular Networks KnowledgeBase

Revision as of 12:32, 9 October 2009




Overview

General

The Cellular Network Knowledge Base (CNKB) is a repository of protein-protein and protein-DNA interactions, which may be derived either computationally or experimentally. It captures both direct physical interactions and indirect transcriptional relationships (in which the interaction is between a transcription factor and its target gene). Each pairwise interaction has an associated confidence indicator (a value between 0 and 1) reflecting the strength of the underlying experimental or computational evidence. Details on the methodology used to construct the CNKB are available in Mani et al. 2008.

Gene interaction information from the CNKB can be used, for example, to assess the plausibility of a hypothesis of concerted molecular action represented by a gene set discovered computationally (e.g., by running a clustering analysis on a microarray set to identify groups of co-expressed genes). If the CNKB reports several direct interactions among the genes in such a set (or several common targets), this may be evidence that the gene set reflects, at some level, a real biological process.
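As a rough illustration of this kind of check, the Python sketch below counts, for a candidate gene set, the interactions whose two members both lie in the set, and the targets shared by two or more of its transcription factors. Reading "common targets" as shared protein-DNA targets is an assumption, as are the `Interaction` record and function names; the "ppi"/"pdi" codes follow the appendix. The toy gene names and values are made up and are not CNKB data.

```python
from collections import defaultdict
from typing import NamedTuple


class Interaction(NamedTuple):
    """Hypothetical record for one CNKB interaction (not a geWorkbench type)."""
    gene_a: str        # for protein-DNA: the transcription factor
    gene_b: str        # for protein-DNA: the target gene
    kind: str          # "ppi" (protein-protein) or "pdi" (protein-DNA)
    confidence: float  # confidence indicator in [0, 1]


def direct_interactions(gene_set, interactions):
    """Interactions whose two members both belong to the gene set."""
    members = set(gene_set)
    return [i for i in interactions if i.gene_a in members and i.gene_b in members]


def shared_targets(gene_set, interactions):
    """Map each target gene to the transcription factors from the set that
    regulate it, keeping only targets with two or more such regulators."""
    members = set(gene_set)
    regulators = defaultdict(set)
    for i in interactions:
        if i.kind == "pdi" and i.gene_a in members:
            regulators[i.gene_b].add(i.gene_a)
    return {target: tfs for target, tfs in regulators.items() if len(tfs) >= 2}


# Toy data with made-up gene names (not taken from the CNKB).
kb = [
    Interaction("TF1", "G3", "pdi", 0.8),
    Interaction("TF2", "G3", "pdi", 0.6),
    Interaction("TF1", "G1", "ppi", 0.9),
    Interaction("G4", "G5", "ppi", 0.7),
]
cluster = ["TF1", "TF2", "G1"]
print(len(direct_interactions(cluster, kb)))  # 1 (the TF1-G1 interaction)
print(sorted(shared_targets(cluster, kb)))    # ['G3']
```

How many direct interactions or shared targets count as "several" is a judgment call left to the user; the sketch only produces the counts.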

The CNKB component lets the user select a group of markers and specify, for each, the interaction types of interest (Protein-Protein and/or Protein-DNA). All interactions of the designated types involving the selected markers are retrieved from the CNKB and displayed, along with associated information such as GO annotations and interaction attributes, both in the CNKB component and in Cytoscape.

The CNKB graphical interface

The CNKB component displays a list of markers that have been activated in the Markers component. The user can select from these markers by double-clicking them; those selected will be added to the Selected Marker List just below.

The graphical user interface (GUI) of the CNKB component has several areas of distinct functionality (marked as 1, 2, 3 in the figure below). To use the component, first select a microarray set node in the Project Folders area of geWorkbench, then activate one or more marker sets in the Markers component. The markers in the activated marker sets appear in the "Activated Marker List" area of the CNKB GUI. From there, one or more markers can be moved into the "Selected Marker List": markers can be moved one at a time by double-clicking, or selected as a group and moved by right-clicking on the selection.

Clicking the "Refresh" button initiates communication with the CNKB database and retrieves all pairwise interactions that involve genes represented by a marker in the "Selected Marker List" (an interaction is retrieved if at least one of its two members is in that list). The CNKB stores two types of interactions: Protein-Protein and Protein-DNA (the latter indicating transcriptional relationships between transcription factors and their targets). The aggregate number of retrieved interactions (across all genes) is displayed on the Throttle Graph. Three distinct graphs are drawn, showing the aggregate for each of the two interaction types as well as their combined total.
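The retrieval rule and the per-type counts can be sketched in Python as follows. This is only an illustration under assumptions: markers are taken to be already mapped to gene names, interactions are held in a hypothetical in-memory `Interaction` record using the "ppi"/"pdi" codes from the appendix, and the function names are invented. The real component queries the CNKB database instead of a Python list.

```python
from collections import Counter
from typing import NamedTuple


class Interaction(NamedTuple):
    """Hypothetical record for one CNKB interaction."""
    gene_a: str
    gene_b: str
    kind: str          # "ppi" or "pdi"
    confidence: float  # confidence indicator in [0, 1]


def retrieve(selected_genes, knowledge_base):
    """Keep an interaction if at least one of its two members is selected."""
    selected = set(selected_genes)
    return [i for i in knowledge_base
            if i.gene_a in selected or i.gene_b in selected]


def per_gene_counts(selected_genes, retrieved):
    """For each selected gene, count the retrieved interactions of each kind
    that involve it (analogous to the Prot-Prot # and Prot-DNA # columns)."""
    counts = {g: Counter() for g in selected_genes}
    for i in retrieved:
        for g in {i.gene_a, i.gene_b}:  # a set, so a self-interaction counts once
            if g in counts:
                counts[g][i.kind] += 1
    return counts


kb = [
    Interaction("G1", "G2", "ppi", 0.9),
    Interaction("TF1", "G1", "pdi", 0.4),
    Interaction("G3", "G4", "ppi", 0.5),
]
selected = ["G1", "TF1"]
hits = retrieve(selected, kb)           # the G3-G4 interaction is not retrieved
totals = Counter(i.kind for i in hits)  # aggregate per interaction type
print(len(hits), totals["ppi"], totals["pdi"])  # 2 1 1
```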

The retrieved interactions can be assembled into a network and visualized in the Cytoscape component by clicking the "Create Network" button. The x-axis slider in the Throttle Graph sets which interactions are included in the Cytoscape network: moving the slider to a threshold value between 0 and 1 retains only interactions whose confidence is above that threshold. Interactions of either type can also be included or excluded using the checkboxes in the "Selected Marker List".
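A minimal Python sketch of this filtering step follows. The strict "above the threshold" comparison follows the text. How the per-row checkboxes combine for an interaction whose two members are both in the list is not specified here, so the rule used below (keep the interaction if either selected member has its type ticked) is an assumption, as are the record and function names.

```python
from typing import NamedTuple


class Interaction(NamedTuple):
    """Hypothetical record for one CNKB interaction."""
    gene_a: str
    gene_b: str
    kind: str          # "ppi" or "pdi"
    confidence: float  # confidence indicator in [0, 1]


def network_edges(retrieved, threshold, enabled_kinds):
    """Return the interactions that would be used to build the network.

    enabled_kinds maps each selected gene to the set of interaction kinds
    whose checkboxes are ticked in its row, e.g. {"G1": {"ppi", "pdi"}}.
    Assumed rule: keep an interaction when its confidence is strictly above
    the threshold and at least one of its members has that kind enabled.
    """
    edges = []
    for i in retrieved:
        if i.confidence <= threshold:
            continue  # at or below the slider value: dropped
        kinds_a = enabled_kinds.get(i.gene_a, set())
        kinds_b = enabled_kinds.get(i.gene_b, set())
        if i.kind in kinds_a or i.kind in kinds_b:
            edges.append(i)
    return edges


retrieved = [
    Interaction("G1", "G2", "ppi", 0.9),
    Interaction("TF1", "G1", "pdi", 0.1),
    Interaction("G1", "G3", "pdi", 0.5),
]
enabled = {"G1": {"ppi"}, "TF1": {"pdi"}}
for e in network_edges(retrieved, 0.12, enabled):
    print(e.gene_a, e.gene_b, e.kind)  # G1 G2 ppi
```

Here the TF1-G1 interaction falls below the 0.12 cutoff, and the G1-G3 protein-DNA interaction is dropped because only the protein-protein box is ticked for G1.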

Working with the CNKB graphical interface

Activated Marker List

The "Activated Marker List" contains the markers that belong to activated marker sets from the Markers component. It contains 3 columns:

Marker

The marker name, taken from the microarray set node selected in the Project Folders area.

Gene

The gene name corresponding to the marker, if known.

Type

A gene type designation, derived from the gene's GO annotation:

  1. TF - Transcription Factor,
  2. K - Kinase,
  3. P - Phosphatase, and
  4. (no entry) - type is unknown.


Selected Marker List

The Selected Marker List contains markers moved over from the Activated Marker List. In addition to the columns defined for the Activated Marker List, it adds the following:

  1. Entrez Id: The Entrez Gene ID of the gene. When present, the ID is hyperlinked to the corresponding entry in Entrez Gene.
  2. GO Term: Associated GO Term. Users can browse the GO terms associated with a gene (using an AmiGO browser-like interface) and select one of those terms for display.
  3. Prot-Prot #: number of protein-protein interactions (reported in the CNKB) involving the gene.
  4. Prot-DNA #: number of protein-DNA interactions (reported in the CNKB) involving the gene.

Additionally, each row has check-boxes that can be used to designate whether protein-protein, protein-DNA, or both types of interactions should be used.

Items added to the Selected Marker List are initially shown in an italic font, to indicate that the CNKB has not yet been queried for their interaction information. After the query has completed, they are displayed in a normal font. (Red is used in both cases.)

Items in the Selected Marker List can be removed and sent back to the Activated Marker List by double-clicking their entries.

Viewing GO Annotations

GO annotations can be viewed for any gene in the Selected Marker List. Right-click the entry under the GO Description column; a pop-up menu offers the three categories of GO annotation: Component, Function and Process. Expanding one of these categories shows the available annotations for that gene.

Preferences

Using a "preferences"-type tab, the user can control which of the columns listed above are visible in the Selected Marker List.

Throttle Graph

This interactive graph lets users "throttle" which interactions to work with (for the genes in the Selected Marker List), using the interactions' confidence indicator as the criterion. As the confidence threshold is raised, the number of interactions meeting it decreases.
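The three Throttle Graph curves can be approximated by counting, at each threshold, the interactions whose confidence exceeds it. The Python sketch below is illustrative only: the `Interaction` record, the function name and the sampled thresholds are assumptions, not geWorkbench internals.

```python
from typing import NamedTuple


class Interaction(NamedTuple):
    """Hypothetical record for one CNKB interaction."""
    gene_a: str
    gene_b: str
    kind: str          # "ppi" or "pdi"
    confidence: float  # confidence indicator in [0, 1]


def throttle_curves(retrieved, thresholds):
    """For each threshold, count the interactions whose confidence exceeds it:
    one curve per interaction type plus one for the combined total."""
    curves = {"ppi": [], "pdi": [], "total": []}
    for t in thresholds:
        above = [i for i in retrieved if i.confidence > t]
        curves["ppi"].append(sum(1 for i in above if i.kind == "ppi"))
        curves["pdi"].append(sum(1 for i in above if i.kind == "pdi"))
        curves["total"].append(len(above))
    return curves


retrieved = [
    Interaction("G1", "G2", "ppi", 0.9),
    Interaction("TF1", "G1", "pdi", 0.1),
    Interaction("G1", "G3", "pdi", 0.5),
]
print(throttle_curves(retrieved, [0.0, 0.2, 0.6, 1.0]))
# {'ppi': [1, 1, 1, 0], 'pdi': [2, 1, 0, 0], 'total': [3, 2, 1, 0]}
```

Because raising the threshold can only remove interactions, each curve is non-increasing, which matches the behaviour described above.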

Cytoscape Viewer

The network created using the selected interactions can be displayed in the Cytoscape component.

Please note that if the targets retrieved from the Knowledge Base include genes not present in the active microarray dataset, then those targets will not be displayed in the Cytoscape viewer.


The Cytoscape Viewer maintains a list of the networks it currently has loaded (see the image after step 8 of the Example below). Individual loaded networks can be deleted from this list; a deleted network can be reloaded by clicking its entry in the Project Folders component. Cytoscape controls are described more fully in the Cytoscape component tutorial.

Network Creation from Retrieved Interactions

Once the desired set of markers is present and its interaction data has been retrieved from the database, an adjacency matrix can be computed by clicking the "Create Network" button. The resulting matrix is placed in the Project Folders component under its parent microarray dataset and is visualized in the Cytoscape Viewer.
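To make the adjacency-matrix step concrete, here is a Python sketch that turns a list of kept interactions into a gene-by-gene 0/1 matrix. It also drops interactions with a member absent from the active microarray dataset, mirroring the note in the Cytoscape Viewer section. Treating protein-protein edges as symmetric and protein-DNA edges as directed (transcription factor to target) is an assumption, as are all names used; the matrix format geWorkbench actually stores in the Project Folders may differ.

```python
from typing import NamedTuple


class Interaction(NamedTuple):
    """Hypothetical record for one CNKB interaction."""
    gene_a: str        # for protein-DNA: the transcription factor
    gene_b: str        # for protein-DNA: the target gene
    kind: str          # "ppi" or "pdi"
    confidence: float  # confidence indicator in [0, 1]


def adjacency_matrix(edges, dataset_genes):
    """Build a 0/1 adjacency matrix over genes present in the dataset.

    Interactions with a member missing from the dataset are skipped.
    Assumed convention: "ppi" edges are symmetric, "pdi" edges are stored
    only in the row of the transcription factor (gene_a -> gene_b).
    """
    present = set(dataset_genes)
    kept = [e for e in edges if e.gene_a in present and e.gene_b in present]
    genes = sorted({g for e in kept for g in (e.gene_a, e.gene_b)})
    index = {g: k for k, g in enumerate(genes)}
    matrix = [[0] * len(genes) for _ in genes]
    for e in kept:
        a, b = index[e.gene_a], index[e.gene_b]
        matrix[a][b] = 1
        if e.kind == "ppi":
            matrix[b][a] = 1
    return genes, matrix


edges = [
    Interaction("G1", "G2", "ppi", 0.9),
    Interaction("TF1", "G1", "pdi", 0.5),
    Interaction("TF1", "GX", "pdi", 0.7),  # GX is not in the dataset
]
genes, matrix = adjacency_matrix(edges, ["G1", "G2", "TF1"])
print(genes)   # ['G1', 'G2', 'TF1']
print(matrix)  # [[0, 1, 0], [1, 0, 0], [1, 0, 0]]
```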


Example

1. Select markers in the Markers component to add to a marker set, and activate the set(s). Here we show two small sets of markers, set1 and Set2, which have both been activated. Their constituent markers will thus appear in the CNKB component Activated Markers list as well. Sets of markers may also have been created through many other tools in geWorkbench, such as hierarchical clustering.


Tutorial-CNKB-Markers.png


2. Move markers from the Activated Markers list to the Selected Marker List by double clicking on them.


Tutorial-CNKB-ActivatedMarkerList.png


3. (Markers in the Selected Marker List can be removed by double-clicking on them, sending them back to the Activated Marker list).


4. Mark the check-boxes for the type of interactions desired (Protein-Protein, Protein-DNA).

5. Click the Refresh button to query the database for interaction information on these markers.

Tutorial-CNKB-SelectedMarkerList.png


6. If desired, use the Throttle Graph to limit, by assigned confidence value, the interactions used to build the adjacency matrix. Here, we have selected a cutoff of 0.12 based on inspection of the distribution of confidence values.

Tutorial-CNKB-ThrottleGraph.png


The component as a whole is depicted below.

Tutorial-CNKB-AfterRetrieval.png


7. Click the Create Network button.

8. The resulting adjacency matrix is displayed in the Cytoscape component.

Tutorial-CNKB-Cytoscape.png

Appendix - Data Sources

Cellular Network Knowledge Base (CNKB) Source Descriptions & Interaction Statistics

CNKB for geWorkbench queries

The CNKB itself is composed of data from the sources shown in the following sections. However, geWorkbench queries only two types of interactions, protein-protein and protein-DNA, coded in the database as "ppi" and "pdi".

As of 7/1/2009, the following data sources match these geWorkbench queries:

  • PPI: BIND, HPRD, INTACT, INTERACTOME, MIPS.
  • PDI: BIND, INTERACTOME.
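The query-to-source mapping above can be written as a small lookup table. This Python snippet only restates the 7/1/2009 lists; the dictionary and helper names are hypothetical and are not part of geWorkbench.

```python
# Sources that matched each geWorkbench query code as of 7/1/2009
# (restated from the lists above; not read from the CNKB itself).
MATCHING_SOURCES = {
    "ppi": {"BIND", "HPRD", "INTACT", "INTERACTOME", "MIPS"},
    "pdi": {"BIND", "INTERACTOME"},
}


def sources_for(kinds):
    """Union of the sources that can contribute interactions of the given kinds."""
    return set().union(*(MATCHING_SOURCES[k] for k in kinds))


print(sorted(sources_for(["pdi"])))           # ['BIND', 'INTERACTOME']
print("MINT" in sources_for(["ppi", "pdi"]))  # False
```

Sources such as MINT, DIP, REACTOME and GENEWAYS, described below, are part of the CNKB but appear in neither list, so they do not contribute to these queries.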

B-cell lymphoma Interactome (INTERACTOME) (Dalla-Favera/Califano labs, collected by Celine Lefebvre)

  • 12,902 protein-dna interactions
  • 22,734 protein-protein interactions

Munich Information Center for Protein Sequences (MIPS)

Protein-protein interaction database at EBI (INTACT). No species restriction.

The Molecular INTeraction database (MINT)

The Biomolecular Interaction Network Database (BIND)

Database of Interacting Proteins (DIP)

Reactome knowledgebase (REACTOME)

Human Protein Reference Database (HPRD)

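Geneways (GENEWAYS)

  • GeneWays is a system for automatically extracting, analyzing, visualizing and integrating molecular pathway data from the research literature.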

References

  • Mani KM, Lefebvre C, Wang K, Lim WK, Basso K, Dalla-Favera R, Califano A. "A systems biology approach to prediction of oncogenes and molecular perturbation targets in B-cell lymphomas." Molecular Systems Biology 4:169, 2008. PubMed ID 18277385.