Genome-Scale Network Inference (DREAM2, Challenge 5)
This archival page describes the challenge exactly as it was presented to the participants. Go to the main DREAM2 Challenge 5 page to download data, view team rankings, cite this work, etc.
Synopsis
A panel of single-channel microarrays was collected for a particular microorganism, including some already published and some in-press data. The data were appropriately normalized (to the logarithmic scale). The challenge consists of reconstructing a genome-scale transcriptional network for this organism. The accuracy of network inference will be judged against Transcription Factor (TF)-target interactions verified by chromatin precipitation or by other experiments.
Dataset
This challenge dataset consists of two files. One file, data.csv, contains the experimental data, and the other, tfs.csv, lists the transcription factors.
- data.csv This file contains a 3456 genes x 300 experiments dataset. The names of both genes and experiments have been withheld, and operon information is not provided. As described above, the experiments represent both published and not-yet-released data from a variety of sources. The 3456 genes include all known and putative transcription factors and all genes whose interactions will be used for testing, as well as a number of other recognized coding sequences. This file is comma-separated and is easily imported into Excel or any other program.
- tfs.csv This file contains the indices of rows belonging to transcription factors in the matrix from data.csv, one per line.
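As a rough illustration of how the two files relate, the sketch below reads a matrix and a list of TF row indices and pairs them up. It rests on assumptions the description does not state: that data.csv has no header row, and that the indices in tfs.csv are 1-based. The helper names and the tiny in-memory stand-in data are hypothetical; check both assumptions against the real files before relying on this.

```python
import csv
import io

def load_expression(handle):
    """Read a comma-separated matrix of floats, one gene per row.

    Assumption: the file has no header row or row labels.
    """
    return [[float(x) for x in row] for row in csv.reader(handle) if row]

def load_tf_rows(handle):
    """Read one integer row index per line, skipping blank lines."""
    return [int(line) for line in handle if line.strip()]

# Tiny stand-in for data.csv (3 genes x 3 experiments) and tfs.csv.
demo_data = io.StringIO("1.0,2.0,3.0\n4.0,5.0,6.0\n7.0,8.0,9.0\n")
demo_tfs = io.StringIO("1\n3\n")

expression = load_expression(demo_data)
tf_rows = load_tf_rows(demo_tfs)

# Assumption: indices are 1-based, so row r is expression[r - 1].
tf_profiles = {r: expression[r - 1] for r in tf_rows}
```

With the real files, the two `io.StringIO` objects would be replaced by `open("data.csv", newline="")` and `open("tfs.csv")`.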
Submission Information
Submit one network prediction in one or more of the following categories: DIRECTED-UNSIGNED, DIRECTED-SIGNED. Use the three-column, tab-separated format shown in the example below:
- row#1 \tab row#2 \tab XYZ
where row#1 is the index of the row of a transcription factor and row#2 is the index of the row of one of its putative targets, in the same order as the rows in the file data.csv. XYZ is a connectivity score between 0 and 1 that indicates the confidence level you assign to the prediction that the TF at row#1 regulates the gene at row#2. The value of XYZ will differ for UNSIGNED, SIGNED-EXCITATORY and SIGNED-INHIBITORY submissions, as discussed below.
Note that row#1 has to be one of the values listed in the file tfs.csv. Interactions for which row#1 does not correspond to a transcription factor will not be judged for scoring.
Only DIRECTED networks will be accepted. If the transcription factor at row#1 regulates the transcription factor at row#2, and the transcription factor at row#2 also regulates the transcription factor at row#1, then both lines should be included. Participants whose algorithms produce UNDIRECTED networks can submit their predictions provided they directionalize their UNDIRECTED network into a DIRECTED one. To do that, simply assume that only transcription factors may be regulators. This will make all edges directed, except for edges between two transcription factors. These edges must then be submitted twice, as TF1 --> TF2 and TF2 --> TF1, in effect as if predicting a feedback loop. Naturally, this strategy will produce no false positives if the feedback loop truly exists, one false positive if only one of the two edges is real, and two false positives if neither is.
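The directionalization rule above can be sketched as follows. This is only an illustration: the function name and the `(row_a, row_b, score)` tuple layout are hypothetical, and dropping edges that touch no transcription factor is an inference from the rule that row#1 must be a TF.

```python
def directionalize(undirected_edges, tf_rows):
    """Turn undirected (row_a, row_b, score) edges into directed ones.

    Only transcription factors may be regulators, so:
    - TF -- gene becomes TF --> gene;
    - TF1 -- TF2 is emitted in both directions;
    - gene -- gene has no possible regulator and is dropped.
    """
    tf_set = set(tf_rows)
    directed = []
    for a, b, score in undirected_edges:
        a_is_tf, b_is_tf = a in tf_set, b in tf_set
        if a_is_tf and b_is_tf and a != b:
            directed.append((a, b, score))
            directed.append((b, a, score))
        elif a_is_tf:
            directed.append((a, b, score))
        elif b_is_tf:
            directed.append((b, a, score))
    return directed

# Rows 1 and 2 are TFs in this made-up example; rows 5 and 6 are not.
demo_undirected = [(1, 5, 0.9), (5, 2, 0.4), (1, 2, 0.7), (5, 6, 0.3)]
demo_directed = directionalize(demo_undirected, tf_rows=[1, 2])
```

Both copies of a TF-TF edge keep the same score here, since an undirected method gives no basis for ranking one direction above the other.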
For UNSIGNED submissions:
- XYZ is a connectivity score between 0 and 1 that indicates the confidence level you assign to the prediction that a TF regulates a target gene, regardless of the sign of the regulation. (E.g., XYZ = 1 if the pair TF-gene is deemed to be connected with highest confidence, and XYZ = 0 if the pair is deemed not to interact.) Order your predictions in decreasing order of XYZ values, i.e., from the most reliable prediction (highest XYZ value) in the first row and the least reliable prediction (lowest XYZ value) in the last row. Save the file as text, and name it:
- TeamName_UNSIGNED_GenomeScale.txt
- where TeamName is the name of the team with which you registered for the challenge.
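A minimal sketch of producing a file in the format described above: three tab-separated columns, sorted by decreasing XYZ. The function names and the tuple layout are hypothetical, and the number formatting (Python's `repr` of the float) is an assumption, since the description does not say how scores should be printed.

```python
def submission_lines(predictions):
    """Format (tf_row, target_row, score) tuples as ranked text lines."""
    for tf_row, target_row, score in predictions:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of [0, 1]: {score!r}")
    # sorted() is stable, so predictions with equal scores keep their
    # original relative order.
    ranked = sorted(predictions, key=lambda p: p[2], reverse=True)
    return [f"{tf}\t{target}\t{score!r}" for tf, target, score in ranked]

def submission_filename(team_name, category):
    """Build a file name such as TeamName_UNSIGNED_GenomeScale.txt."""
    return f"{team_name}_{category}_GenomeScale.txt"

def write_submission(team_name, category, predictions):
    """Write the ranked predictions to the conventionally named file."""
    with open(submission_filename(team_name, category), "w") as handle:
        handle.write("\n".join(submission_lines(predictions)) + "\n")

# Made-up predictions; "MyTeam" is a placeholder team name.
demo_predictions = [(1, 5, 0.25), (2, 5, 0.9), (1, 6, 0.9)]
demo_lines = submission_lines(demo_predictions)
demo_name = submission_filename("MyTeam", "UNSIGNED")
```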
For SIGNED submissions:
- Submit one network prediction for excitatory connections and one for inhibitory connections.
- For EXCITATORY connections:
- XYZ is a connectivity score between 0 and 1 that indicates the confidence level you assign to the prediction that a TF upregulates a target gene. (E.g., XYZ = 1 if the TF is deemed to upregulate the target gene with the highest confidence, and XYZ = 0 if the pair is deemed to be disconnected, or the TF is deemed to downregulate the target gene.) Order your predictions in decreasing order of XYZ values, i.e., from the most reliable prediction (highest XYZ value) in the first row, and the least reliable prediction (lowest XYZ value) in the last row. Save the file as text, and name it:
- TeamName_SIGNED_EXCITATORY_GenomeScale.txt
- where TeamName is the name of the team with which you registered for the challenge.
- For INHIBITORY connections:
- XYZ is a connectivity score between 0 and 1 that indicates the confidence level you assign to the prediction that a TF downregulates a target gene. (E.g., XYZ = 1 if the TF is deemed to downregulate the target gene with the highest confidence, and XYZ = 0 if the pair is deemed to be disconnected, or the TF is deemed to upregulate the target gene.) Order your predictions in decreasing order of XYZ values, i.e., from the most reliable prediction (highest XYZ value) in the first row and the least reliable prediction (lowest XYZ value) in the last row. Save the file as text, and name it:
- TeamName_SIGNED_INHIBITORY_GenomeScale.txt
- where TeamName is the name of the team with which you registered for the challenge.
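For methods that output a single signed score per TF-target pair, one possible way to derive the two SIGNED lists is sketched below. It assumes, purely for illustration, that the method's score lies in [-1, 1], with positive values meaning upregulation and negative values meaning downregulation; the challenge itself does not prescribe this convention, and the function name is hypothetical.

```python
def split_signed(signed_predictions):
    """Split (tf_row, target_row, signed_score) into two ranked lists.

    Assumption: signed_score is in [-1, 1]; > 0 means upregulation,
    < 0 means downregulation. Pairs with score 0 are left out of both
    lists, which is equivalent to assigning them XYZ = 0.
    """
    excitatory = [(tf, tg, s) for tf, tg, s in signed_predictions if s > 0]
    inhibitory = [(tf, tg, -s) for tf, tg, s in signed_predictions if s < 0]
    by_score = lambda p: p[2]
    return (sorted(excitatory, key=by_score, reverse=True),
            sorted(inhibitory, key=by_score, reverse=True))

# Made-up signed scores for four TF-target pairs.
demo_signed = [(1, 5, -0.8), (1, 6, 0.3), (2, 5, 0.9), (2, 6, 0.0)]
demo_excitatory, demo_inhibitory = split_signed(demo_signed)
```

Each returned list already satisfies the 0-to-1 range and the decreasing order required for its file.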
Scoring metrics
We will score the results using the area under the precision versus recall curve for the whole set of predictions. No threshold need be applied to your predictions, since even low precisions at increasing recall will contribute to the final score. All pairs omitted from the list in your prediction files will be considered to appear randomly ordered at the end of the list with XYZ = 0. For the first k predictions (ranked by connectivity score, and for predictions with the same score, taken in the order they were submitted in the prediction files), precision is defined as the fraction of those k predictions that are correct, and recall is the proportion of correct predictions out of all the possible true connections (with the appropriate sign, if the category is SIGNED). Other metrics such as precision at 1%, 10%, 50%, and 80% recall, and the area under the ROC curve will also be evaluated.
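The precision and recall definitions above can be illustrated with a short sketch. It is not the organizers' scoring code: the function names are hypothetical, the area is approximated with a simple rectangle rule (the description does not say which interpolation is used), and the random placement of omitted pairs at the end of the list is not modeled.

```python
def precision_recall(ranked_pairs, true_pairs):
    """Return (precision, recall) after each of the first k predictions.

    ranked_pairs: (tf_row, target_row) tuples, already in ranked order.
    true_pairs:   non-empty set of verified (tf_row, target_row) tuples.
    """
    if not true_pairs:
        raise ValueError("true_pairs must not be empty")
    hits = 0
    points = []
    for k, pair in enumerate(ranked_pairs, start=1):
        if pair in true_pairs:
            hits += 1
        points.append((hits / k, hits / len(true_pairs)))
    return points

def area_under_pr(points):
    """Rectangle-rule area: precision times each increase in recall."""
    area = 0.0
    previous_recall = 0.0
    for precision, recall in points:
        area += precision * (recall - previous_recall)
        previous_recall = recall
    return area

# Made-up ranking of four pairs, two of which are true interactions.
demo_ranked = [(1, 5), (1, 6), (2, 5), (2, 6)]
demo_truth = {(1, 5), (2, 5)}
demo_points = precision_recall(demo_ranked, demo_truth)
demo_area = area_under_pr(demo_points)
```

In this toy ranking the first true pair is found at k = 1 and the second at k = 3, so the area is 1 x 0.5 + (2/3) x 0.5 = 5/6.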
