GeWorkbench-web/Cellular Networks KnowledgeBase tmp



Overview

The Cellular Network Knowledge Base (CNKB) is a repository of molecular interactions, both computationally and experimentally derived. Sources of interactions include publicly available databases such as BIND, MINT, and Reactome, as well as reverse-engineered, cellular context-specific regulatory interactomes developed in the lab of Dr. Andrea Califano at Columbia University.

Each pairwise interaction may have an associated likelihood indicator (a value between 0 and 1) or another dataset-specific metric reflecting the strength of the underlying data, whether experimental or computational. Details on the methodology used to construct the CNKB are available in Mani et al. 2008.

Gene interaction information from the CNKB can be used, for example, to assess the plausibility of a hypothesis of concerted molecular action represented by a gene set discovered using computational approaches (e.g., by running a clustering analysis on a microarray dataset to identify groups of co-expressed genes). If the genes in such a set are reported in the CNKB to have several direct interactions (or several common targets), this may be evidence that the gene set reflects, at some level, a real biological process.
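The check described above can be sketched in a few lines of Python. This is only an illustration, not part of geWorkbench-web: the function name `gene_set_support`, the input layout (a list of gene pairs), and the choice to treat interactions as undirected are all assumptions made for the example.

```python
from itertools import combinations

def gene_set_support(gene_set, interactions):
    """Count direct interactions inside a gene set and find shared partners.

    interactions: iterable of (gene_a, gene_b) pairs (hypothetical layout),
    treated as undirected for this sketch.
    """
    genes = set(gene_set)
    neighbors = {}
    for a, b in interactions:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Direct interactions: both partners belong to the gene set.
    direct = {
        frozenset((a, b))
        for a, b in combinations(sorted(genes), 2)
        if b in neighbors.get(a, set())
    }

    # Common targets: genes outside the set linked to two or more set members.
    counts = {}
    for g in genes:
        for n in neighbors.get(g, set()) - genes:
            counts[n] = counts.get(n, 0) + 1
    common = sorted(n for n, c in counts.items() if c >= 2)
    return len(direct), common

# Placeholder gene names, not real CNKB content.
example = [("A", "B"), ("A", "T1"), ("B", "T1"), ("C", "T2")]
n_direct, shared = gene_set_support(["A", "B", "C"], example)
```

A gene set with many direct interactions or shared partners would, under this simple count, be a better candidate for reflecting a real biological process; how many count as "several" is left to the user.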

The CNKB component in geWorkbench-web allows the user to select one or more marker sets on which to query, plus an interactome and version. Results are placed in a new data node of type "CNKB" under the parent microarray dataset. Selecting the "CNKB" data node displays the query results in tabular and graphical form. The threshold value for including retrieved interactions in a generated network or export file can be adjusted using the threshold slider in the "throttle graph". Finally, small networks can be displayed in Cytoscape, or the network can be exported to a file.

Data Sources

Please see the CNKB data page for a list of currently available data sources and types of interactions.


Prerequisites

  • A microarray dataset must be loaded and selected.
  • Both queries against the CNKB database and the display of gene annotation information require that an annotation file be associated with the microarray dataset at the time it is loaded. See Local Data Files and File Formats for further information.

The CNKB Query Interface

Usage

The CNKB component appears in the list of available microarray data analysis modules when a dataset of that type has been selected in the Workspace.


CNKB web GUI v2.png


A query against the CNKB database is initiated by clicking on the "Submit" button. All pairwise interactions in the chosen interactome/version that involve any marker in the "Selected Marker Sets" are retrieved.

Details

Marker Context

If additional marker set contexts have been created to hold various sets of markers, the desired context can be selected here.

Select Marker Sets

Choose one or more marker sets with which to query the CNKB.

Select Interactome and Version

This list shows all interactomes available in the CNKB. The number of interactions present in each is shown in parentheses after the name. See also the CNKB data sources page. If multiple versions of the interactome are available, each is presented as a separate entry.

Some interactomes/versions may not yet be public. Pre-release data is password protected and not yet available for public use.

Example Query

  • Load the Bcell-100 example dataset. Here we are using the Log2 normalized version.
  • Create a marker set (using the right-click filter function in the set view) containing two genes, MYB1 and FOXM1.


CNKB filter marker set.png


These two genes were identified as B-cell master regulators of proliferation in germinal centers (Lefebvre et al., 2010). Here we have labeled the new marker set "gc".

CNKB web markerset.png


  • Select the "gc" marker set.
  • Select the "BCi" Bcell interactome (version 1.0).
  • Click "Submit".

CNKB web query setup v2.png


The query results are placed in a new "CNKB" node in the Workspace.

Selecting the CNKB result node displays the results in the CNKB Viewer.

The CNKB Results Viewer

The full query results for the two genes are shown below.


CNKB web result BCi.png


The threshold for interactions to be accepted can be adjusted using the "throttle". Below, the threshold (for this interactome it is probability) has been increased from 0.0 to 0.7 to show the detail of the lines more clearly. Increasing the threshold decreases the number of hits displayed in each cell of the tabular listing - only hits at or above the threshold value are displayed.

CNKB web result BCi 0.7.png
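The effect of the threshold on the tabular counts can be sketched as follows. This is a minimal illustration, not the geWorkbench-web implementation: the tuple layout, the function name `count_hits`, and the probeset names are hypothetical. The only rule taken from the text above is that a hit is kept when its value is at or above the threshold.

```python
from collections import Counter

def count_hits(interactions, threshold):
    """Return {(marker, interaction_type): count} for hits at or above threshold.

    interactions: iterable of (marker, interaction_type, confidence) tuples
    (hypothetical layout for this sketch).
    """
    counts = Counter()
    for marker, itype, confidence in interactions:
        if confidence >= threshold:  # "at or above" the threshold is kept
            counts[(marker, itype)] += 1
    return dict(counts)

# Placeholder data, not real query results.
hits = [
    ("probe_1", "protein-dna", 0.95),
    ("probe_1", "protein-dna", 0.40),
    ("probe_1", "protein-protein", 0.70),
    ("probe_2", "protein-dna", 0.65),
]
at_zero = count_hits(hits, 0.0)
at_07 = count_hits(hits, 0.7)
```

Raising the threshold from 0.0 to 0.7 in this example drops the 0.40 and 0.65 hits, while the hit at exactly 0.70 is retained.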


Tabular Display

The tabular display has columns for marker, gene, gene type, GO annotations, and for each interaction type present in the result. Common interaction types in CNKB data include protein-protein, protein-dna, and modulator-transcription factor.

Marker

The marker (probeset) name.

Gene

The gene name corresponding to the marker, from the array annotation file.

Gene Type

A gene type designation, derived from the gene's GO annotation, from the array annotation file:

  • TF - Transcription Factor,
  • K - Kinase,
  • P - Phosphatase, and
  • (no entry) - type is unknown.

Interactome

The CNKB Viewer has been enhanced to support queries against multiple interactomes (once compatible interactomes have been released). The interactome column shows in which interactome the listed interactions were found.

GO Annotation

The Gene Ontology (GO) annotation for the gene, based on the GO terms listed for each marker in the microarray annotation file. The term descriptions originate in a copy of the gene ontology file "go-basic.obo" downloaded to the geWorkbench-web server. The column displays the Biological Process annotations; however, there may be many more annotations than can be displayed in the available space. Hovering the mouse cursor over the field will display the remaining entries.

CNKB web hovertext.png
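To illustrate how term descriptions can be looked up from an OBO file such as "go-basic.obo", the sketch below reads [Term] stanzas and keeps only those in the biological_process namespace. It is a simplified reader written for this page, not the code used by the geWorkbench-web server; the function name is hypothetical, and it assumes stanzas are separated by blank lines and ignores most OBO features (obsolete terms, synonyms, relationships).

```python
def biological_process_names(obo_text):
    """Map GO ID -> term name for [Term] stanzas in the biological_process namespace."""
    names = {}
    for stanza in obo_text.split("\n\n"):
        lines = stanza.strip().splitlines()
        if not lines or lines[0] != "[Term]":
            continue  # skip the file header and non-term stanzas
        tags = {}
        for line in lines[1:]:
            key, sep, value = line.partition(": ")
            if sep and key not in tags:  # keep the first value of each tag
                tags[key] = value
        if tags.get("namespace") == "biological_process":
            names[tags["id"]] = tags["name"]
    return names

# Two root terms of the Gene Ontology, used here as a tiny sample.
sample = """format-version: 1.2

[Term]
id: GO:0008150
name: biological_process
namespace: biological_process

[Term]
id: GO:0003674
name: molecular_function
namespace: molecular_function
"""
bp_names = biological_process_names(sample)
```

Only the first term survives the namespace filter, which mirrors the column showing Biological Process annotations only.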


Interaction Type Result columns

A separate column will appear in the Selected Markers display for each interaction type represented in the query results. Types of interactions include Protein-Protein, Protein-DNA (e.g. TF interaction with its target genes), and modulator-TF. The numbers in the columns indicate, for each marker, the number of interactions returned by the query.

The number of interactions can be adjusted by changing the threshold using the throttle graph slider control or typing a new value into the threshold text field.

Throttle Graph

As mentioned above, the interactive viewer allows users to "throttle" which interactions to work with, using as a criterion the confidence metric specific to the interactome, e.g. probability. As the threshold is increased, the number of interactions meeting this criterion decreases, as displayed in the query result columns (e.g. "Protein-DNA") of the Selected Markers list.


The graph shows a result with three interaction types. A fourth line ("Total Distribution") depicts the sum of those three.


Here, the cutoff has been increased to a value of 0.90, and the cursor hovered at the point 0.91, with hover text showing the count of interactions remaining beyond that point. As the cursor is moved further to the right, the number of interactions remaining will decrease.

CNKB web BCi 0.9 hover.png
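The relationship between the per-type lines and the "Total Distribution" line can be sketched as a set of cumulative counts. The code below is an assumption-laden illustration: the function name `throttle_curve`, the evenly spaced cutoffs, and the "at or above" counting rule are choices made for the example, and the actual graph may be computed differently.

```python
def throttle_curve(interactions, steps=10):
    """Count interactions remaining at or above each cutoff, per type and in total.

    interactions: list of (interaction_type, confidence) with confidence in [0, 1]
    (hypothetical layout for this sketch).
    """
    cutoffs = [i / steps for i in range(steps + 1)]
    types = sorted({t for t, _ in interactions})
    curve = {
        t: [sum(1 for ty, c in interactions if ty == t and c >= x) for x in cutoffs]
        for t in types
    }
    # The total line is the sum of the per-type lines at each cutoff.
    curve["Total Distribution"] = [
        sum(curve[t][i] for t in types) for i in range(len(cutoffs))
    ]
    return cutoffs, curve

# Placeholder data, not real query results.
data = [("protein-dna", 0.95), ("protein-dna", 0.5), ("protein-protein", 0.75)]
cutoffs, curve = throttle_curve(data, steps=4)
```

Each list is non-increasing from left to right, which matches the behavior described above: moving the cursor further to the right reduces the number of interactions remaining.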

Controls

Create Network

Create a network based on the query results, as filtered by the throttle graph threshold setting. The new network is placed in the Workspace in the form of an adjacency matrix. It is summarized at the gene level if an annotation file (with probeset-to-gene annotations) was loaded. The network will be displayed in a viewer implemented using Cytoscape.js.

  • Note on network size - If the created network may be too large to display in Cytoscape, Cytoscape will offer the user the option of a tabular display instead.
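Summarizing a network at the gene level can be pictured as follows. This is a sketch under stated assumptions rather than the actual geWorkbench-web procedure: edges are treated as undirected, duplicate gene pairs keep the highest value, and a probeset whose gene symbol is "---" (or missing) falls back to its probeset ID. All names in the example are hypothetical.

```python
def summarize_to_genes(edges, probe_to_gene):
    """Collapse probeset-level edges to gene-level edges.

    edges: iterable of (probe_a, probe_b, confidence).
    probe_to_gene: dict from probeset ID to gene symbol.
    Returns {(gene_a, gene_b): confidence} with sorted gene pairs as keys.
    """
    def symbol(probe):
        gene = probe_to_gene.get(probe, "---")
        # Assumption: keep the probeset ID when no gene symbol is available.
        return probe if gene == "---" else gene

    network = {}
    for a, b, confidence in edges:
        key = tuple(sorted((symbol(a), symbol(b))))
        # Assumption: when several probesets map to one gene, keep the best value.
        network[key] = max(confidence, network.get(key, 0.0))
    return network

# Placeholder probesets and symbols, not real annotation data.
annotation = {"p1": "G1", "p2": "G1", "p3": "G2", "p4": "---"}
probe_edges = [("p1", "p3", 0.8), ("p2", "p3", 0.9), ("p4", "p3", 0.5)]
gene_network = summarize_to_genes(probe_edges, annotation)
```

Here two probesets of the same gene collapse into one edge, so the gene-level network can be considerably smaller than the probeset-level result table.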

Export

  • Export table to Excel - The full tabular data is exported to an Excel format (.xls) file in the same format as it is displayed.
  • Export interactions to SIF - Export interactions, with their interaction types, to a Cytoscape SIF format file.
  • Export interactions to ADJ - Export interactions, with their values (e.g. probability), to a geWorkbench adjacency matrix file.
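The two interaction export formats can be sketched as below. The SIF lines follow the general Cytoscape convention of source node, relationship type, and target node. The ADJ layout shown (a hub gene followed by alternating partner and value fields, tab-separated) is an assumption about the adjacency matrix format, and the exact type labels and ordering that geWorkbench-web writes may differ. Function names and data are hypothetical.

```python
def to_sif(interactions):
    """One line per interaction: source <tab> interaction type <tab> target."""
    return "\n".join(f"{a}\t{itype}\t{b}" for a, itype, b, _ in interactions)

def to_adj(interactions):
    """One line per hub gene: hub, then alternating partner and value fields.

    Assumed layout for this sketch; check an exported file for the real format.
    """
    rows = {}
    for a, _, b, value in interactions:
        rows.setdefault(a, []).append((b, value))
    lines = []
    for hub, partners in rows.items():
        fields = [hub]
        for partner, value in partners:
            fields += [partner, f"{value:g}"]
        lines.append("\t".join(fields))
    return "\n".join(lines)

# Placeholder interactions: (source, type, target, value).
example_interactions = [
    ("G1", "protein-dna", "G2", 0.9),
    ("G1", "protein-protein", "G3", 0.75),
]
sif_text = to_sif(example_interactions)
adj_text = to_adj(example_interactions)
```

The sketch makes the difference between the two exports visible: SIF keeps the interaction type but drops the value, while ADJ keeps the value but drops the type.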

Cytoscape Display of Network

The simple network viewer is implemented using the Cytoscape.js JavaScript package. Additional features are planned.

The network displayed below is from the previous example, with a threshold setting of 0.7 (probability) and using the "cose" layout manager. Nodes shared between the two hub genes have been moved slightly to make them more obvious.

CNKB Cytoscape 0.7 cose.png


Controls

Layout managers

Several different layout managers are available with which to draw the network. Often the most useful is "cose".

The layouts shown below are:

  • concentric
  • grid
  • circle
  • cose
  • breadthfirst



Export

The network will be exported to a file as an adjacency matrix using gene symbols.

Display

CNKB Cytoscape 0.7 t-test.png


  • t-test - project the results of a t-test onto the displayed network. You will be prompted to choose an existing t-test result node from the Workspace. Nodes showing overexpression are colored red, and underexpressed nodes are colored blue. The shade of red or blue becomes darker with increasing over- or under-expression.
  • reset - remove the t-test display.
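The coloring rule above can be approximated with a simple mapping from a t-statistic to a color. This is only a sketch: it assumes that a positive t-value means overexpression, scales intensity linearly against a chosen maximum, and uses a white-to-red and white-to-blue gradient. The actual color scale in geWorkbench-web is not documented here, and the function name is hypothetical.

```python
def t_value_to_color(t, max_abs_t):
    """Map a t-statistic to a hex color: red if positive, blue if negative.

    The color moves from white toward full red or blue as |t| approaches
    max_abs_t (assumed linear scale for this sketch).
    """
    if max_abs_t <= 0:
        return "#ffffff"
    strength = min(abs(t) / max_abs_t, 1.0)
    fade = int(255 * (1 - strength))  # 255 = no signal, 0 = strongest signal
    if t >= 0:
        red, green, blue = 255, fade, fade
    else:
        red, green, blue = fade, fade, 255
    return f"#{red:02x}{green:02x}{blue:02x}"

strong_up = t_value_to_color(4.0, 4.0)
mild_up = t_value_to_color(2.0, 4.0)
strong_down = t_value_to_color(-4.0, 4.0)
```

In Cytoscape.js such a value could be assigned to a node's background color, but how the viewer applies it is not covered by this sketch.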

Technical Note

For some IDs used in the CNKB database, there may be a matching marker that does not have a gene symbol. In the Affymetrix annotation file, these are indicated with a gene symbol of "---". These results are included in the CNKB results table.