The goal of the in silico challenges is the reverse engineering of gene networks from steady state and time series data. Participants are challenged to predict the directed unsigned network topology from the given in silico generated gene expression datasets.

These challenges have been provided by Daniel Marbach and his colleagues from the Laboratory of Intelligent Systems of the Swiss Federal Institute of Technology in Lausanne. The data can be freely used. Please cite the DREAM project and the following paper in your publications:

Marbach, D., Schaffter, T., Mattiussi, C. and Floreano, D. (2009) Generating Realistic in silico Gene Networks for Performance Assessment of Reverse Engineering Methods. Journal of Computational Biology. To appear.


The Three Challenges

There are three in-silico challenges corresponding to gene networks with 10, 50, and 100 genes. Predictions are assessed independently for each challenge. Thus, teams may choose to submit predictions only for one or two of the challenges. However, we encourage teams to participate in all three challenges in order to compare how well different methods perform on different network sizes.

Each challenge consists of five gold standard networks. To participate in a challenge, predictions for all five networks of that challenge must be submitted. The rationale is that this makes it possible to assess how consistently a method predicts the topology across five independent networks of the same type and size.

The Datasets

For consistency, we provide the same type of data as in the DREAM2 in-silico Challenge. For every network, the following experiments are simulated:

Heterozygous knock-down. The files *-heterozygous.tsv (the meaning of the wildcard * is explained below) contain the steady state levels for the wild-type and the heterozygous knock-down strains for each gene (+/-). Thus, for a network of size N there are N+1 experiments (wild-type plus knock-down of every gene).

Null-mutants. The files *-null-mutants.tsv contain the steady state levels for the wild-type and the null-mutant strains for each gene (-/-). Thus, for a network of size N there are N+1 experiments (wild-type plus knock-out of every gene).

Trajectories. The files *-trajectories.tsv contain time courses of the network recovering from several external perturbations. For the networks of size 50, the same number of time courses as in the DREAM2 in silico challenge is provided (23 different perturbations). For the networks of size 10 and 100, we give 4 and 46 perturbations, respectively. Each time course has 21 time points.

The * in *-heterozygous.tsv, *-null-mutants.tsv and *-trajectories.tsv stands for InSilicoSizeN-OrganismK, where N = 10, 50, or 100; Organism is Ecoli or Yeast; and K = 1 or 2 if Organism is Ecoli, and K = 1, 2, or 3 if Organism is Yeast.
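As an illustration of the naming scheme above, the following Python sketch enumerates every dataset file name it implies. The function name dataset_filenames is hypothetical, and the sketch assumes the wildcard is joined to the suffix exactly as written (for example InSilicoSize10-Ecoli1-heterozygous.tsv).

```python
from itertools import product

# Values taken from the naming scheme described in the text.
SIZES = (10, 50, 100)
NETWORKS = ("Ecoli1", "Ecoli2", "Yeast1", "Yeast2", "Yeast3")
EXPERIMENTS = ("heterozygous", "null-mutants", "trajectories")


def dataset_filenames():
    """Return all file names implied by the pattern InSilicoSizeN-OrganismK-<experiment>.tsv."""
    return [
        f"InSilicoSize{size}-{network}-{experiment}.tsv"
        for size, network, experiment in product(SIZES, NETWORKS, EXPERIMENTS)
    ]


names = dataset_filenames()
print(len(names))  # 3 sizes x 5 networks x 3 experiment types
print(names[0])
```

With three network sizes, five networks per size, and three experiment types, the list should contain 45 names.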

Note that we call these data "Ecoli" because we use a subnetwork whose topology of connections is borrowed from the E. coli gene regulatory network. Because we wanted to keep a set of perturbations similar to those of the DREAM2 In Silico Challenge, we abused notation and use the term heterozygous mutant (which should be read: the transcription rate for that gene is half the wild-type transcription rate) even for the networks with topology borrowed from E. coli. This is the "freedom" given by the in silico world; of course, E. coli is haploid, and "heterozygous" data would not make sense in real life for E. coli.

In all cases, the data correspond to noisy measurements of mRNA levels, which have been normalized such that the maximum gene expression value in a given dataset is one. These datasets can be downloaded from the DREAM3 data repository after you have registered for the challenge.
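The normalization described above amounts to dividing every value by the largest value in the dataset. The Python sketch below shows this under stated assumptions: the function name normalize_by_max is hypothetical, the example numbers are made up, and the row and column layout (rows as experiments, columns as genes) is an assumption rather than a description of the actual files.

```python
def normalize_by_max(dataset):
    """Scale all values so that the largest value in the whole dataset becomes 1.

    `dataset` is a list of rows (assumed: one row per experiment,
    one column per gene) holding non-negative expression values.
    """
    peak = max(max(row) for row in dataset)
    if peak <= 0:
        raise ValueError("dataset needs at least one positive value")
    return [[value / peak for value in row] for row in dataset]


# Made-up raw expression values, for illustration only.
raw = [[0.2, 1.5, 0.9],
       [3.0, 0.6, 1.2]]
normalized = normalize_by_max(raw)
print(normalized)
```

Because the scaling uses a single maximum per dataset, relative expression levels between genes and between experiments within that dataset are preserved.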

Submission Information

The same submission format and scoring metrics as in the DREAM2 challenges are used. However, this year all predictions must be directed and unsigned. Important: there are no self-interactions (auto-regulatory loops) in the gold standard networks.

Submit a ranked list of regulatory link predictions ordered according to the confidence you assign to the predictions, from the most reliable (first row) to the least reliable (last row) prediction. Use three tab-separated columns, as in the example below:

A \tab B \tab XYZ

where A and B are two different genes (no self-interactions). Links are directed: the gene in the first column regulates the gene in the second column. (If both A regulates B and B regulates A, then both lines should be included.) XYZ is a score between 0 and 1 that indicates the confidence level you assign to the prediction. (E.g., XYZ = 1 if gene A is deemed to regulate gene B with highest confidence and XYZ = 0 if A is deemed not to directly regulate B. See Marbach et al. (2008) for a discussion of how confidence levels could be derived from standard network predictions). All pairs omitted from the list will be considered to appear randomly ordered at the end of the list with XYZ = 0. Save the file as text, and name it:

TeamName_Challenge_Network.txt

where TeamName is the name of the team with which you registered for the challenge, Challenge is either InSilicoSize10, InSilicoSize50, or InSilicoSize100, and Network is one of the five networks of the indicated challenge (Ecoli1, ..., Yeast3). As mentioned above, to participate in a challenge you need to submit predictions for all five networks of this challenge.
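The format and naming rules above could be implemented along the following lines. This is a minimal sketch, not an official tool: the function names, the team name MyTeam, and the gene labels G1, G2, G3 are hypothetical placeholders, since the gene labels used in the datasets are not given here.

```python
CHALLENGES = ("InSilicoSize10", "InSilicoSize50", "InSilicoSize100")
NETWORKS = ("Ecoli1", "Ecoli2", "Yeast1", "Yeast2", "Yeast3")


def submission_filename(team, challenge, network):
    """Build TeamName_Challenge_Network.txt, rejecting unknown challenge or network names."""
    if challenge not in CHALLENGES:
        raise ValueError(f"unknown challenge: {challenge}")
    if network not in NETWORKS:
        raise ValueError(f"unknown network: {network}")
    return f"{team}_{challenge}_{network}.txt"


def format_submission(predictions):
    """Return the file content: one 'A<TAB>B<TAB>score' line per link, most confident first."""
    # sorted() is stable, so links with equal scores keep their given order.
    ranked = sorted(predictions, key=lambda p: p[2], reverse=True)
    lines = []
    for regulator, target, score in ranked:
        if regulator == target:
            raise ValueError("self-interactions are not allowed")
        if not 0.0 <= score <= 1.0:
            raise ValueError("confidence must lie between 0 and 1")
        lines.append(f"{regulator}\t{target}\t{score}")
    return "\n".join(lines) + "\n"


# Hypothetical gene labels and scores, for illustration only.
predictions = [("G1", "G2", 0.4), ("G2", "G1", 0.9), ("G3", "G1", 0.75)]
text = format_submission(predictions)
filename = submission_filename("MyTeam", "InSilicoSize10", "Ecoli1")
print(filename)
print(text, end="")
```

Writing `text` to a file named `filename` gives one submission file; a complete entry for a challenge needs one such file for each of its five networks.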

Scoring Metrics

We will score the results using the area under the precision versus recall curve for the whole set of link predictions for a network. For the first k predictions (ranked by score, and for predictions with the same score, taken in the order they were submitted in the prediction files), precision is defined as the fraction of those k predictions that are correct, and recall is the fraction of all true connections that appear among those k predictions. Other metrics, such as precision at 1%, 10%, 50%, and 80% recall and the area under the ROC curve, will also be evaluated.
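The definitions of precision and recall at k can be sketched in Python as follows. This is an illustration only, not the official scoring code: the function names are hypothetical, the area is computed with a simple step-wise sum (the integration rule actually used by the organizers is not stated here), and pairs omitted from the submitted list are ignored rather than appended in random order.

```python
def precision_recall_curve(ranked_links, true_links):
    """Return (precision, recall) after each of the first k predictions, k = 1..len(ranked_links).

    `ranked_links` holds (regulator, target) pairs, most confident first,
    and is assumed to contain no duplicates.
    """
    true_links = set(true_links)
    if not true_links:
        raise ValueError("the gold standard must contain at least one link")
    hits = 0
    curve = []
    for k, link in enumerate(ranked_links, start=1):
        if link in true_links:
            hits += 1
        curve.append((hits / k, hits / len(true_links)))
    return curve


def area_under_pr(curve):
    """Step-wise area: each precision value weighted by the recall gained at that step."""
    area = 0.0
    previous_recall = 0.0
    for precision, recall in curve:
        area += precision * (recall - previous_recall)
        previous_recall = recall
    return area


# Toy example with made-up links: two true links, three ranked predictions.
gold = {("A", "B"), ("B", "C")}
ranked = [("A", "B"), ("C", "A"), ("B", "C")]
curve = precision_recall_curve(ranked, gold)
area = area_under_pr(curve)
print(curve)
print(area)
```

In this toy example the first and third predictions are correct, so precision is 1 at k = 1, 1/2 at k = 2, and 2/3 at k = 3, while recall rises from 1/2 to 1.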

Teams will be ranked according to their overall performance over the five networks of a challenge.

How Were the in-silico Networks Generated?

Great care was taken to generate in-silico gene networks that are biologically plausible, both with respect to the network structure and the network dynamics. Network topologies were obtained by extracting sub-networks from the gene-to-gene interaction networks of E. coli and S. cerevisiae. Auto-regulatory interactions were removed, i.e., there are no self-interactions in the in-silico networks.

The dynamics of the networks were simulated using a detailed kinetic model based on one of several possible approaches for modeling gene regulation. Both independent and synergistic gene regulation occur in the networks.

Note that transcription and translation are modeled. However, the protein concentrations are not included in the provided datasets. As mentioned above, the datasets correspond to the mRNA concentration levels, as one would obtain from gene expression data.

Results & Additional Information

The challenge of size 10 had 29 participants, the one of size 50 had 27 participants, and the one of size 100 had 22 participants. This makes these challenges currently the most widely used gene network reverse engineering benchmark.

We would like to thank all participating teams and congratulate the team that achieved the best performance on all network sizes: Kevin Y. Yip, Roger P. Alexander, Koon-Kiu Yan, and Mark Gerstein from Yale University. You can now view the detailed results of all teams and the true structure of the networks.

The challenges have been generated with GeneNetWeaver (GNW). GNW allows one to easily generate additional benchmarks of the same type as the DREAM3 in silico challenges. GNW is available as open source at: gnw.sourceforge.net.

Additional information (the datasets without noise, the signed network structures, etc.) is available at: DREAM3 in silico challenge additional information.

Quick Links

Data Download

Gold Standards

Results

Additional information

Retrieved from "http://wiki.c2b2.columbia.edu/dream/index.php/The_DREAM3_In-Silico-Network_Challenges._Description"

This page has been accessed 4,148 times. This page was last modified 16:57, 12 December 2008.
