Cellular Networks KnowledgeBase


Revision as of 10:50, 24 June 2010




Overview

This material describes the graphical interface and functionality of the CNKB component released with geWorkbench 2.0.0. This component has been extensively enhanced from the version available in geWorkbench 1.8 and earlier.

General

The Cellular Network Knowledge Base (CNKB) is a repository of molecular interactions, both computationally and experimentally derived. Sources of interactions include publicly available databases such as BioGRID and HPRD, as well as reverse-engineered cellular regulatory interactomes developed in the lab of Dr. Andrea Califano at Columbia University.

Each pairwise interaction may have an associated confidence indicator (a value between 0 and 1) reflecting the strength of the underlying data, whether experimental or computational. Details on the methodology used to construct the CNKB are available in Mani et al. 2008.

Gene interaction information from the CNKB can be used, for example, to assess the plausibility of a hypothesis of concerted molecular action represented by a gene set that has been discovered using computational approaches (e.g., by running a clustering analysis on a microarray set to identify tandems of co-expressed genes). If the CNKB reports several direct interactions (or several common targets) among the genes in such a set, this may be evidence that the gene set reflects, at some level, a real biological process.
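The idea of checking a gene set for direct interactions and common targets can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the interaction records, the gene names and the helper functions `neighbors` and `gene_set_support` are hypothetical and are not part of geWorkbench or the CNKB.

```python
from itertools import combinations

# Hypothetical interaction records: (gene_a, gene_b, confidence).
# Gene names and confidence values are made up for illustration.
interactions = [
    ("G1", "G2", 0.9),
    ("G1", "T1", 0.7),
    ("G2", "T1", 0.6),
    ("G3", "T2", 0.4),
]

def neighbors(gene, records):
    """Return the set of genes that share an interaction with `gene`."""
    found = set()
    for a, b, _conf in records:
        if a == gene:
            found.add(b)
        elif b == gene:
            found.add(a)
    return found

def gene_set_support(gene_set, records):
    """Count interactions inside the set and find partners shared by member pairs."""
    members = set(gene_set)
    # Direct interactions: both partners belong to the gene set.
    direct = sum(1 for a, b, _ in records if a in members and b in members)
    # Common targets: partners outside the set shared by two members.
    common = {}
    for g1, g2 in combinations(sorted(members), 2):
        shared = (neighbors(g1, records) & neighbors(g2, records)) - members
        if shared:
            common[(g1, g2)] = shared
    return direct, common

direct, common = gene_set_support({"G1", "G2", "G3"}, interactions)
print(direct)  # number of interactions with both genes in the set
print(common)  # member pairs mapped to the partners they share
```

How many direct interactions or common targets count as meaningful support is a judgment the sketch does not make.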

The CNKB component allows the user to select a group of markers of interest, specify the interaction types (e.g., Protein-Protein, Protein-DNA), and choose a particular interaction data source. After a query of the Knowledge Base, results are displayed both in the CNKB component and as interaction graphs in the Cytoscape component.

The CNKB graphical interface

CNKB with markers.png


The CNKB component displays a list of markers that have been activated in the Markers component. From this list, the user can select markers for use in querying the CNKB database. This can be done either by double-clicking on desired markers, or through use of a right-click menu (see below); those selected will be added to the Selected Marker List just below.

The graphical user interface (GUI) of the CNKB component has three areas of distinct functionality: the "Activated Marker List", the "Selected Marker List", and the "Throttle Graph". To use the component, a microarray set node must first be selected in the Project Folders area of geWorkbench, and then one or more marker sets from the Markers component must be activated. The markers in those activated marker sets will appear in the "Activated Marker List" area of the CNKB GUI. From there, one or more markers can be selected and moved into the "Selected Marker List" (markers can be moved one at a time by double-clicking, or selected as a group and moved by right-clicking on the selection). Clicking the "Refresh" button initiates communication with the CNKB database, which retrieves all pairwise interactions that involve genes represented by a marker in the "Selected Marker List" (an interaction is retrieved if at least one of its two members is in the "Selected Marker List"). The aggregate number of retrieved interactions (across all genes) is displayed on the Throttle Graph.
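The retrieval rule described above (an interaction is returned if at least one of its two members is in the selected list) can be illustrated with a minimal Python sketch. The record layout, gene names and the function `retrieve` are assumptions made for this example; they do not reflect the actual CNKB servlet interface.

```python
# Hypothetical records: (gene_a, gene_b, interaction_type, confidence).
records = [
    ("TF1", "G1", "protein-dna", 0.8),
    ("G2", "TF2", "protein-dna", 0.6),
    ("G3", "G4", "protein-protein", 0.9),
]

def retrieve(selected_genes, all_records):
    """Keep every interaction in which at least one partner is selected."""
    selected = set(selected_genes)
    return [r for r in all_records if r[0] in selected or r[1] in selected]

hits = retrieve(["TF1", "TF2"], records)
print(len(hits))  # aggregate number of retrieved interactions
```

Note that the match is tested on either position of the pair, so a selected gene is found whether it is listed first or second in a record.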

The interactions can be assembled into a network and visualized in the Cytoscape component. This is achieved by clicking on the "Create Network" button. The x-axis slider in the Throttle Graph can be used to threshold which interactions will be included in the Cytoscape network; when the slider is moved to a threshold value between 0 and 1, only interactions whose confidence level is above the threshold will be retained. Interactions of a particular type can also be included or excluded using the checkboxes in the "Selected Marker List".

Working with the CNKB graphical interface

The figures in this section correspond to the example in the final section of this tutorial.

Activated Marker List

CNKB add all markers.png


The "Activated Marker List" contains the markers that belong to activated marker sets from the Markers component. It contains 3 columns:

Marker

The marker name (comes from the microarray set node selected in the Project Folders area).

Gene

The gene name corresponding to the marker, if loaded from an annotation file.

Type

A gene type designation, derived from the gene's GO annotation:

  • TF - Transcription Factor,
  • K - Kinase,
  • P - Phosphatase, and
  • (no entry) - type is unknown.

Individual markers in the Activated Marker List can be moved to the Selected Markers List by double-clicking on them.

Alternatively, a right-click menu (shown above) allows a group of highlighted markers, or all markers, to be moved to the "Selected Markers List".

Selected Marker List

Main

CNKB markers transfered.png


Markers which have been moved from the Activated Markers List to the Selected Markers List can be used to query the CNKB database. Until a query is run, the list items are shown in red and in italics.

After a query has been run against the CNKB database, the marker entries are shown in regular font, blue letters.


CNKB query result.png


Items that have been added to the Selected Markers list can be removed and sent back to the Activated Markers list by double-clicking on their entries or through a right-click menu. The right-click menu (shown below) gives the choice of moving only the highlighted markers, or all markers, back to the Activated Markers list.


CNKB Selected Marker List Right click.png


The Selected Marker List table contains the following columns:


Marker

The marker name (same as in the "Activated Marker List"). Appears italicized to indicate that interaction information has not yet been retrieved. Bold face font indicates that interaction information has been retrieved (and is displayed).

Gene

The gene name corresponding to the marker, if loaded from an annotation file (same as in the "Activated Marker List").

Gene Type

A gene type designation, derived from the gene's GO annotation. The list of type codes is the same as that given for the Type column under "Activated Marker List".

GO Annotation

The GO annotation of the gene. Right-clicking on the column brings up the GO classification of the gene across the 3 top-level GO categories: Component, Function and Process.

Hovering the mouse cursor over an entry in the GO Description column will display a short summary of the Gene Ontology terms associated with that entry.

CNKB Selected Marker List GO Anno.png


More extensive GO annotations can be viewed for any gene in the Selected Marker list by right-clicking on its entry. A pop-up menu will offer a choice of the three categories of GO annotation: Component, Function and Process. Expanding one of these terms will show the available annotations for that gene.


CNKB Selected Marker List GO Anno RC.png


Interaction Query columns

A column will display interaction results for each data source selected in the Preferences tab. It will show, for each marker, the number of interactions that meet the threshold currently set in the throttle graph slider control.
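As a rough illustration of what such a column contains, the following Python sketch counts, for each selected marker and interaction type, the interactions whose confidence meets a threshold. The data, the function name `count_by_marker` and the use of "greater than or equal to" as the comparison are assumptions for this example, not a description of the component's internals.

```python
from collections import defaultdict

# Hypothetical records: (gene_a, gene_b, interaction_type, confidence).
records = [
    ("TF1", "G1", "protein-dna", 0.8),
    ("TF1", "G2", "protein-dna", 0.3),
    ("TF1", "P1", "protein-protein", 0.5),
    ("TF2", "G1", "protein-dna", 0.6),
]

def count_by_marker(selected, all_records, threshold):
    """Per selected gene, count interactions of each type at or above threshold."""
    counts = {gene: defaultdict(int) for gene in selected}
    for a, b, itype, conf in all_records:
        if conf < threshold:
            continue  # below the slider setting, not counted
        for gene in {a, b}:
            if gene in counts:
                counts[gene][itype] += 1
    return {gene: dict(by_type) for gene, by_type in counts.items()}

table = count_by_marker(["TF1", "TF2"], records, threshold=0.5)
print(table)
```

Each inner dictionary corresponds to one row of the table, with one entry per interaction type column.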

The illustrations above depict a Protein-DNA interaction column.

Preferences

CNKB Preferences.png


Interactions Database

Change - This button allows the address of the CNKB servlet to be changed. This should not normally be necessary.

Database pulldown menu - This menu contains all data sources available via the CNKB component.

Select Version - Each data source has a version associated with it. Versions may be updates or may contain different types of interactions from a particular system. The exact contents of each data source are available on the CNKB data sources page.


CNKB Selected Interactions Types.png

Column Display Preferences

  • Marker
  • Gene
  • Gene Type
  • GO Annotation
  • Available Interaction Types - Contains a list of all interaction types used in the CNKB component that have not been already moved to the "Selected Interaction Types" list to the right. Note that this list is not specific to the particular data source chosen.
  • Selected Interaction Types - A column in the Main tab "Selected Marker List" will appear for each interaction type appearing in this list.

List entries can be moved between the "Available" and "Selected" lists by either double-clicking directly on an entry, or through use of the right and left double arrows "<<", ">>" located between them.



Network Generation Preferences

  • Restrict to genes present in microarray set - queries to the CNKB database may return interaction partners that are not members of the microarray dataset from which the original query markers were chosen. Checking this box will cause such markers NOT to be used in generating a network graph.
  • Use setting from column display preferences
    • If checked, the same interaction types that have been selected in the "Column Display Preferences" control above will be used in generating the network graph.
    • If unchecked, only those interaction types added to the "Selected Interaction Types" list will be used to construct a network graph.
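The effect of these preferences on network generation can be sketched as a simple filter. This is a hypothetical illustration: the function `edges_for_network`, its parameters and the sample data are invented for the example, and the set of allowed interaction types is passed in directly rather than read from any preference setting.

```python
# Hypothetical records: (gene_a, gene_b, interaction_type, confidence).
records = [
    ("TF1", "G1", "protein-dna", 0.8),
    ("TF1", "X9", "protein-dna", 0.7),      # X9 is not on the array
    ("TF1", "G2", "protein-protein", 0.9),
]

def edges_for_network(all_records, dataset_genes, allowed_types,
                      restrict_to_dataset=True):
    """Select the interactions that would become edges of the network graph."""
    dataset = set(dataset_genes)
    edges = []
    for a, b, itype, conf in all_records:
        if itype not in allowed_types:
            continue  # interaction type not selected for the network
        if restrict_to_dataset and not (a in dataset and b in dataset):
            continue  # a partner is absent from the microarray set
        edges.append((a, b, itype, conf))
    return edges

on_array = ["TF1", "G1", "G2"]
restricted = edges_for_network(records, on_array, {"protein-dna"})
unrestricted = edges_for_network(records, on_array, {"protein-dna"},
                                 restrict_to_dataset=False)
print(len(restricted), len(unrestricted))
```

With the restriction turned off, the interaction partner that is missing from the dataset is kept as a node in the resulting graph.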


Throttle Graph

This interactive graph allows users to "throttle" (for the genes in the Selected Markers table) which interactions to work with, using the interactions' confidence indicator as the criterion. As the required confidence threshold is increased, the number of interactions meeting this criterion decreases.
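The relationship between the threshold and the number of retained interactions can be illustrated with a short Python sketch. The confidence values and the function `throttle_curve` are hypothetical, and the sketch assumes that an interaction is retained when its confidence is greater than or equal to the threshold.

```python
def throttle_curve(confidences, steps=10):
    """Return (threshold, retained_count) pairs for evenly spaced thresholds in [0, 1]."""
    curve = []
    for i in range(steps + 1):
        threshold = i / steps
        retained = sum(1 for c in confidences if c >= threshold)
        curve.append((threshold, retained))
    return curve

# Made-up confidence values for five retrieved interactions.
confidences = [0.1, 0.22, 0.5, 0.9, 1.0]
curve = throttle_curve(confidences, steps=4)
for threshold, retained in curve:
    print(threshold, retained)
```

The retained count can only stay the same or fall as the threshold rises, which is the behavior seen when the slider is moved to the right.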


CNKB throttle slider.png


  • Right-click menu - Right-clicking on the Throttle Graph will bring up a menu which allows the graph to be customized.


CNKB throttle graph RC.png


  • Properties
  • Save as
  • Print
  • Zoom in
  • Zoom out
  • Auto Range

Example - Creating an interaction network and viewing it in Cytoscape

Once the desired set of markers is present and its interaction data has been retrieved from the database, an adjacency matrix can be computed by clicking the "Create Network" button. The resulting matrix is placed in the Project Folders component under its parent microarray dataset. The adjacency matrix is visualized in the Cytoscape Viewer.
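For readers unfamiliar with the term, the following Python sketch shows one way an adjacency matrix could be built from a list of pairwise interactions. It is a simplified illustration under several assumptions: the function `adjacency_matrix` and the sample data are hypothetical, interactions are treated as undirected, and the confidence value is stored as the edge weight. The actual format used by geWorkbench is not described here.

```python
def adjacency_matrix(records, threshold=0.0):
    """Build a symmetric matrix of confidences for interactions at or above threshold."""
    # Collect all genes so that rows and columns share one ordering.
    genes = sorted({g for a, b, _ in records for g in (a, b)})
    index = {gene: i for i, gene in enumerate(genes)}
    matrix = [[0.0] * len(genes) for _ in genes]
    for a, b, conf in records:
        if conf >= threshold:
            i, j = index[a], index[b]
            matrix[i][j] = conf
            matrix[j][i] = conf  # undirected: mirror the entry
    return genes, matrix

# Made-up interactions: (gene_a, gene_b, confidence).
records = [("A", "B", 0.9), ("B", "C", 0.1)]
genes, matrix = adjacency_matrix(records, threshold=0.22)
print(genes)
for row in matrix:
    print(row)
```

In this sketch a gene whose interactions all fall below the threshold still gets a row and column, but with no nonzero entries.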

Please note that if the targets retrieved from the Knowledge Base include markers/genes not present in the active microarray dataset, then whether those markers will be used in creating the network graph in Cytoscape is determined by the setting of the "Restrict to genes present in microarray set" checkbox under "Network Generation Preferences" on the CNKB Preferences tab.

The Cytoscape Viewer maintains a list of the networks it currently has loaded, and allows individual loaded networks to be deleted. A deleted network can be reloaded by clicking on its entry in the Project Folders component. Cytoscape controls are more fully described in the Cytoscape component tutorial.


Prerequisites

  • This example uses the Bcell-100.exp dataset available in the data/public_data directory of geWorkbench, and further described on the Download page. Briefly, this dataset is composed of 100 Affymetrix HG-U95Av2 arrays on which various B-cell lines, both normal and cancerous, were analyzed. Thus it explores a potentially wide variety of expression phenotypes.
  • Obtain the annotation file for the HG-U95Av2 array type from the Affymetrix NetAffx website (http://www.affymetrix.com/analysis/index.affx). The name will be similar to "HG_U95Av2.na30.annot.csv", where na30 is the version number. Loading the annotation file associates gene names and Gene Ontology information with the Affymetrix probeset IDs (see the geWorkbench FAQ for details on obtaining these files).

Loading the example data

  1. Load the Bcell-100.exp dataset into geWorkbench as type "Affymetrix File Matrix". (See Local Data Files).
  2. When prompted, load the annotation file.
  3. Create and activate a set of markers in the Markers component. For this example, save the file GBM_MR_Markers to disk and then load it directly into the Markers component by pushing its "Load Set" button and browsing to the file located in the geWorkbench directory data/public_data.


CNKB marker setup.png


When the set is activated by checking the box to the left of its name, the transcription factor markers will appear in the "Activated Marker List" in the CNKB component.

Setting up the query in the CNKB component

1. On the CNKB Main tab, right-clicking on the "Activated Markers" list will bring up a menu which allows one to move all activated markers or just the highlighted markers to the "Selected Markers List" for querying.

2. Select the desired data source and interaction types in the CNKB Preferences tab.

3. Now hit the "Refresh" button to perform the query against the Cellular Networks Knowledge Base database.

4. Adjust the Throttle Graph slider to set a minimum confidence requirement for the interactions that will be used to create a network. In the example images above, a value of 0.22 was used.

5. Hit the "Create Network" button.

6. The resulting adjacency matrix is displayed in the Cytoscape component.



CNKB Cytoscape display.png


Integration of Cytoscape and geWorkbench

The use of Cytoscape for network visualization is covered in detail in the Cytoscape Network Viewer tutorial. Here we show a few ways in which the Cytoscape component can be used to investigate the interaction results.

Selecting interactions (edges)

Using the mouse, a group of edges can be selected.


CNKB Cytoscape select edges.png

The selected edges are displayed in a list below the graph.


CNKB Cytoscape edges selected.png


Selecting nodes

Multiple nodes/genes can be selected by holding down the Shift key while left-clicking on individual nodes.

CNKB Cytoscape intersection.png


The markers corresponding to the selected genes will be displayed directly in the Markers component in a new subset called "Cytoscape selection". Note that this set is volatile - it displays markers corresponding to whatever nodes are currently highlighted in the network graph.

CNKB Cytoscape two genes markers.png

To make a copy of the markers in the Cytoscape Selection subset, right-click on it and select "Copy".

Options for selected nodes

Right-clicking on a particular node in the network graph brings up a menu with three options:

  • Visual Mapping Bypass
  • LinkOut
  • Add to set

Visual Mapping Bypass

CNKB Cytoscape visual mapping bypass.png


LinkOut

CNKB Cytoscape LinkOut.png


This menu option provides hyperlinks to a number of external sources of gene annotation.

Add to set

If one or more graph nodes have been selected (highlighted in yellow in figure below), the markers they directly interact with (via edges) can be copied to the default "Cytoscape selection" subset in the Markers component at lower left in the geWorkbench graphical interface.


CNKB Cytoscape add to set.png


Two options are available under "Add to Set". These are

  • Intersection - find the set of markers that have interactions (edges) with ALL selected nodes. Such markers are placed into the Markers component, in the "Cytoscape selection" subset.
  • Union - find the set of markers that have interactions (edges) with ANY of the selected nodes. Such markers are placed into the Markers component, in the "Cytoscape selection" subset.
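The difference between the two options can be shown with ordinary set operations. The following Python sketch is only an illustration: the edge list, node names and the functions `partners` and `add_to_set` are hypothetical and do not correspond to geWorkbench or Cytoscape code.

```python
# Made-up network edges as (node_a, node_b) pairs.
edges = [
    ("TF1", "G1"),
    ("TF1", "G2"),
    ("TF2", "G2"),
    ("TF2", "G3"),
]

def partners(node, edge_list):
    """Return the nodes directly connected to `node` by an edge."""
    found = set()
    for a, b in edge_list:
        if a == node:
            found.add(b)
        elif b == node:
            found.add(a)
    return found

def add_to_set(selected_nodes, edge_list, mode):
    """Combine the partners of the selected nodes by intersection or union."""
    partner_sets = [partners(node, edge_list) for node in selected_nodes]
    if not partner_sets:
        return set()
    if mode == "intersection":
        return set.intersection(*partner_sets)  # linked to ALL selected nodes
    if mode == "union":
        return set.union(*partner_sets)         # linked to ANY selected node
    raise ValueError("mode must be 'intersection' or 'union'")

common = add_to_set(["TF1", "TF2"], edges, "intersection")
combined = add_to_set(["TF1", "TF2"], edges, "union")
print(sorted(common))
print(sorted(combined))
```

With a single selected node, both options return the same set, since there is only one set of partners to combine.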


This image shows the intersection set of markers for the two selected genes:

CNKB Cytoscape intersection markers.png


This image shows the union set of markers for the two selected genes.


CNKB Cytoscape union markers list.png

Appendix - Data Sources

Please see the CNKB data page for a list of currently available data sources and types of interactions.

References

  • Mani KM, Lefebvre C, Wang K, Lim WK, Basso K, Dalla-Favera R, Califano A. "A systems biology approach to prediction of oncogenes and molecular perturbation targets in B-cell lymphomas". Molecular Systems Biology 4:169, 2008. http://www.ncbi.nlm.nih.gov/pubmed/18277385