geWorkbench


Multi t-test

Removed from MINDy tutorial because the ARACNe version used in MINDy does not actually check these parameter files

Advanced - setting ARACNe dataset parameters

MINDy uses the original Fixed Bandwidth implementation of ARACNe. This algorithm can use dataset-specific parameters, if they have been calculated separately, to set the Kernel Width and the Threshold. Otherwise ARACNe calculates these parameters from built-in default values, which also depend on the number of arrays in the dataset. The dataset-specific values can be calculated with the newer version of ARACNe (also called ARACNe2), which is included in geWorkbench as a separate component. ARACNe looks for two parameter files containing the fitted parameters, "config_kernel.txt" and "config_threshold.txt", and uses them if they are found. To use custom parameters in MINDy, you must therefore create these two files with a separate PREPROCESSING run of ARACNe on your dataset.

Running ARACNe in PREPROCESSING mode, with algorithm FIXED_BANDWIDTH, will create two files in the geWorkbench root directory, named according to the following template:

DatasetName_ARACNe_FBW_kernel.txt
DatasetName_ARACNe_FBW_threshold.txt

where "DatasetName" is the name of the microarray dataset for which you ran ARACNe. For example, for the Bcell-100.exp dataset, the following two files would be generated:

Bcell-100.exp_ARACNe_FBW_kernel.txt
Bcell-100.exp_ARACNe_FBW_threshold.txt

To make these files available to MINDy, rename them to "config_kernel.txt" and "config_threshold.txt".

Note that all versions of ARACNe, both standalone and within MINDy, will detect files with these default names and use their contents. You should therefore remove or rename these files before doing any other work with ARACNe or MINDy.
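The renaming step and the later cleanup can be scripted. Below is a minimal Python sketch, assuming the files sit in the geWorkbench root directory passed in as `root`; the helper names `install_params` and `remove_params` are hypothetical and not part of geWorkbench.

```python
from pathlib import Path

# Default file names that ARACNe looks for (from the text above).
CONFIG_NAMES = {"kernel": "config_kernel.txt", "threshold": "config_threshold.txt"}


def install_params(root, dataset_name):
    """Rename the PREPROCESSING output for dataset_name to the default
    config_*.txt names so that ARACNe/MINDy will pick them up."""
    root = Path(root)
    for kind, target in CONFIG_NAMES.items():
        src = root / f"{dataset_name}_ARACNe_FBW_{kind}.txt"
        src.rename(root / target)


def remove_params(root, dataset_name):
    """Rename config_*.txt back to the dataset-specific names so that later
    ARACNe/MINDy runs no longer see them and fall back to the defaults."""
    root = Path(root)
    for kind, target in CONFIG_NAMES.items():
        cfg = root / target
        if cfg.exists():
            cfg.rename(root / f"{dataset_name}_ARACNe_FBW_{kind}.txt")
```

For example, call `install_params("/path/to/geWorkbench", "Bcell-100.exp")` before running MINDy and `remove_params` with the same arguments afterwards. On Windows, `Path.rename` raises an error if a file with the target name already exists, which here acts as a guard against silently replacing another dataset's parameter files.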

Removed from MINDy tutorial because DPI not used

DPI settings (Not used in MINDy)

The Data Processing Inequality (triangle inequality) can be used to remove the effects of indirect interactions. For example, if TF1->TF2->Target, the DPI can be used to remove the indirect action of TF1 on the target. Stated another way, the DPI can be used to remove the weakest of the three interactions among any three markers. Setting the DPI Tolerance is independent of whether a DPI Target List is used.
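The triplet rule can be sketched in Python. This assumes the pairwise MI values have already been computed and are stored in a dict keyed by `frozenset` marker pairs, a representation chosen here only for illustration; `dpi_prune` is a hypothetical helper, not an ARACNe or geWorkbench function, and it applies no tolerance.

```python
from itertools import combinations


def dpi_prune(mi):
    """Apply the Data Processing Inequality to an MI network.

    mi: dict mapping frozenset({a, b}) -> mutual information of edge a-b.
    Returns a new dict without the edges flagged by the DPI.
    """
    nodes = sorted({n for edge in mi for n in edge})
    flagged = set()
    for a, b, c in combinations(nodes, 3):
        edges = [frozenset(p) for p in ((a, b), (a, c), (b, c))]
        # The DPI only applies to fully connected triplets.
        if all(e in mi for e in edges):
            weakest = min(edges, key=lambda e: mi[e])
            flagged.add(weakest)
    # Edges are flagged on the original network and removed together,
    # so the result does not depend on the order triplets are visited.
    return {e: v for e, v in mi.items() if e not in flagged}
```

With made-up MI values for the TF1->TF2->Target example, say 0.8 for TF1-TF2, 0.6 for TF2-Target and 0.3 for TF1-Target, `dpi_prune` drops the TF1-Target edge and keeps the other two.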

DPI Target List (Not used in MINDy)

The DPI target list can be used to limit the ARACNe calculation to transcriptional networks. It screens out spurious regulatory interaction signals from genes that are tightly coexpressed but are not in a regulatory relationship with each other, for example genes for proteins that are assembled into a protein complex. The eukaryotic ribosome, for example, requires the stoichiometric expression of about 80 proteins, but those proteins are not in a regulatory relationship with each other.

The specified markers will be given preferential treatment during the DPI edge-removal step. Edges originating from markers on this list will not be removed by edges originating from markers not on this list. However, for DPI calculations where all three markers are members of the list, the weakest connecting edge may still be removed.
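One reading of the target-list rule above can be expressed as a check that a DPI loop could call for each triplet. The two competing edges both run through the third marker (the mediator), so the weakest edge is protected when it touches a listed marker and the mediator is not listed. This interpretation and the helper name `dpi_may_remove` are our assumptions, not taken from the ARACNe source.

```python
def dpi_may_remove(weakest_edge, mediator, target_list):
    """Decide whether the DPI may remove weakest_edge in the triplet it forms
    with mediator (hypothetical reading of the target-list rule).

    An edge touching a listed marker is protected when the two competing
    edges originate from a mediator that is not on the list. If all three
    markers are listed, or no listed marker is involved, ordinary DPI applies.
    """
    x, y = weakest_edge
    touches_list = x in target_list or y in target_list
    if touches_list and mediator not in target_list:
        return False
    return True
```

For example, with `target_list = {"TF1", "TF2"}`, the weak edge TF1-Target can still be removed via the mediator TF2, but the edge TF1-TF2 cannot be removed via an unlisted marker.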

If a DPI Target List is used, it is suggested that it contain all markers that are annotated as transcription factors. Signaling proteins could also be included.

A comma-separated list of markers can be typed into the text field, or it can be loaded from an external file.

DPI Tolerance (Not used in MINDy)

The DPI tolerance specifies the degree of MI sampling error to be accepted, since with a finite sample size the exact MI value cannot be calculated.

The DPI tolerance is normally set between 0 and 0.15, since values larger than 0.15 yield more false positives.
See the ARACNe tutorial page and Margolin et al. 2006 for further details on use of DPI.
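To make the effect of the tolerance concrete, here is a Python sketch of the per-triplet test. It assumes the tolerance is applied as a fractional margin, so an edge is removed only if its MI is lower than both other edges by more than that fraction; consult Margolin et al. 2006 or the ARACNe source for the exact rule. `dpi_removes` is a hypothetical name.

```python
def dpi_removes(i_xy, i_xz, i_yz, tolerance=0.0):
    """Return True if edge x-y is removed by the DPI in triplet (x, y, z).

    Assumption: the tolerance is a fractional margin, so x-y is removed only
    if its MI is below BOTH other edges by more than that fraction.
    """
    if not 0.0 <= tolerance < 1.0:
        raise ValueError("tolerance must be in [0, 1)")
    margin = 1.0 - tolerance
    return i_xy < i_xz * margin and i_xy < i_yz * margin
```

With MI values of 0.50 for x-y, 0.55 for x-z and 0.60 for y-z, the edge x-y is removed at tolerance 0 but kept at tolerance 0.15, because 0.55 × 0.85 = 0.4675 is smaller than 0.50.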

Retired geWorkbench tutorial pages

These links are to components no longer part of geWorkbench - just in case they come back someday.

Reverse Engineering: http://wiki.c2b2.columbia.edu/workbench/index.php/Tutorial_-_Reverse_Engineering

Network Browser: http://wiki.c2b2.columbia.edu/workbench/index.php/Tutorial_-_Network_Browser

Synteny may come back some day but does not seem to be under active development:

Synteny: http://wiki.c2b2.columbia.edu/workbench/index.php/Tutorial_-_Synteny

Notes from Bernd on Tutorials, plus responses.

5/9/2006: T-Test

Comments on:

http://www.geworkbench.org/workbench/index.php/Tutorial_-_Differential_Expression

All in all, nothing much to complain about; great example (i.e. it works for me and I understood it, at least I think so).

Where is the Multi T Test sample?

KCS RESPONSE - I have added a multi-t-test example.

A reference for the T-test would be nice.

KCS RESPONSE - I have added a web-site link to a description of the t-test.

Why is it “T Test” and not “T-test” or something else? (it just looks strange, but I don’t know what is right)

KCS RESPONSE - I have at least made it t-Test or t-test now. It is still a bit inconsistent.

T Test analysis identifies markers with statistically significant differential expression between two sets of microarrays. The t-test (T Test???) determines for each marker whether there is a significant difference between the two groups (case and control). To perform this analysis, you must classify the panels (sets of microarrays) as “case” and “control”, set the analysis parameters and view the results in the visualization components. The T Test parameters are described in detail in the online help.
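For illustration, a per-marker t statistic can be computed as below. This sketch assumes a classic Student's two-sample t-test with pooled variance; the tutorial does not say which variant geWorkbench uses, and `pooled_t` and `rank_markers` are hypothetical names. Because every marker then has the same degrees of freedom (n1 + n2 - 2), a larger |t| always means a smaller p-value, so sorting by |t| in descending order gives the same ranking as sorting by significance value in ascending order.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(case, control):
    """Student's two-sample t statistic with pooled (sample) variance.

    Each group needs at least two arrays, and the pooled variance must be > 0.
    """
    n1, n2 = len(case), len(control)
    sp2 = ((n1 - 1) * variance(case) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(case) - mean(control)) / sqrt(sp2 * (1 / n1 + 1 / n2))


def rank_markers(data, case_arrays, control_arrays):
    """data: marker -> {array_name: expression value}.

    Returns (marker, t) pairs, most significant first (largest |t|).
    """
    scores = []
    for marker, values in data.items():
        case = [values[a] for a in case_arrays]
        control = [values[a] for a in control_arrays]
        scores.append((marker, pooled_t(case, control)))
    return sorted(scores, key=lambda s: abs(s[1]), reverse=True)
```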

I don’t have “Gene Panel” but rather “Marker” for the results.

KCS RESPONSE - fixed.

The label to the right displays the Significance value (the lower the value, the more likely different) and gene name for the displayed genes. The genes are displayed in ascending order by Significance Value.

KCS RESPONSE - please restate this as a question.

Pat, Abs, Ratio Overlapping Pages Icon: Not the T Test display. => why italic? For me they don’t do anything, does it mean they shouldn’t be there?

KCS RESPONSE - no, they don't seem to do much. I have taken out the extraneous text.

As to the functionality:

Gene height and gene width can take negative values; this is a bug! (Color Mosaic)
Pat, Abs, etc don’t do anything (see above)
Marking a gene in the Markers panel doesn’t do anything in the Volcano Plot.
I can only zoom into the plot, but not mark any spots in either of the visualization panels

5/9/2006: Clustering

Comments on:

http://www.geworkbench.org/workbench/index.php/Tutorial_-_Clustering

I guess you know that the data set is not available for download ;-)

KCS RESPONSE - it is there now (as of a few weeks ago).

"Go to the Analysis component, and select Fast Hierarchical Clustering Analysis"
- I believe something should be said somewhere about the algorithm used for fast hierarchical clustering. Not all parameters are self-explanatory.

KCS RESPONSE - there is a little problem with the algorithm right now....

Should the check box called “enable zoom” perhaps be called “enable selection”? I think the current name might be confusing.

KCS RESPONSE - good point, it is not really a zoom is it? I have entered this into Mantis.

And after this nice tutorial I am left with the question: And now what? Or, why did I do this, again?

KCS RESPONSE - I have added a bit of context at the beginning....

5/10/2006: Basics

Comments on:

http://www.geworkbench.org/workbench/index.php/Tutorial_-_Basics

The Data Management area can hold one workspace, and a workspace in turn can hold one or more projects. Projects can be used as wished {remove} to group different data sets. Each opened data file or analysis result is stored in a project [This is not really clear. Especially I would like to know something about the general concept of merging files vs. not merging files]. A workspace with all the data [what about the state of the data, especially what happened to like analyzing the data and its results?] it contains can be saved and returned to later.

The GUI provides a menu bar at top with a standard choice of commands. Many commands that are available in the menu bar are also available by right-clicking on data objects.

That is not entirely true. Usually there is an exit function under File, which is missing here.

In general I would like to see more details on the Project/File concept, as alluded to earlier. I think this is a good place for this information, and I haven’t seen it anywhere else.

RESPONSE by KCS - I have added a section about data representation which introduces files and merging. There is another section of the tutorials called Projects and Data Files which describes some of the mechanics.

5/10/2006: Project and Data Files

Comments on:

http://www.geworkbench.org/workbench/index.php/Tutorial_-_Projects_and_Data_File

KCS - These responses were added 6/2/2006.

Affymetrix File Matrix - this is the native file type created by geWorkbench
- I actually don’t know how to create this file from geWorkbench…

KCS - I have added more about merging....

By the way, when we were talking about Matlab you said you only support free software: what about Affymetrix? Are Genepix and RMA Express free as well?

KCS - Affymetrix is our primary data source. Matlab is an analysis program that we do not have ourselves.

What are Pattern Files?
What are Genotypic data Files? (It should be files, not Files; the same goes for FASTA Files, Pattern Files and others.)

We select the 10 MAS5 format text files from the directory geworkbench\data\training\cardiogenomics.med.harvard.edu, which is included in the geWorkbench download as shown in the picture below.

I don’t get the message that you show

KCS - you probably had already loaded a file of that type, so its definition did not need to be reloaded.

The merged dataset is listed in the Project folder. The data is displayed, in single array format, in the Microarray Viewer. Note we have increased the intensity slider to maximum here.
- Here you should mention that you only see the first/last array and that you can scroll through the arrays with the array slider

KCS - I have added this.

There are no Okay buttons, but rather OK

KCS - fixed.

I mentioned this somewhere else already: When you want to delete/remove a bunch of data nodes you can select them, right click them, but only one file is then removed = BUG!

KCS - You should add this to Mantis.

Ah, now I see how you can save your special geWorkbench file. Maybe you should mention here that this is actually saving the data in this particular format. (At least I assume it does so)

KCS - added more explanatory text.

For the remote upload: the difference between Open and Go is not clear to me. This is THE place where I most miss mouse-over help messages.

KCS - added explanatory text.

The first image is not correct. For me it doesn’t show all the array experiments

KCS - The image is correct as far as I know...

It is totally NON intuitive to have to right-click on a remote dataset to get additional information. It was at first not even obvious that there is additional data available…

KCS - I completely agree, this is the oddest thing....

It is interesting that you chose this example, because it seems that only four or so of all the entries actually have derived assays. ;-)

KCS - not by chance....

Maybe you want to explain what derived assays are?

KCS - Added....

Also for the remote source, I would like to know what other sources there are and what the interface should look like. I have no idea which other sources I should link, or why and how.

KCS - there are actually no other sources. This is for future reference.

Maybe this is another place to put some more information about merging files…

KCS - another mention of merging has been added.

5/10/2006: Data subset

Comments on:

http://www.geworkbench.org/workbench/index.php/Tutorial_-_Data_Subsets

I would like to see a reference to the paper/web page where you took your example from. This way the interested reader can get some insights into the biological question…

KCS RESPONSE - I have added this info to the Tutorial_-_Data page.

I don’t think the Activate/Deactivate functions under the right mouse click work.

KCS RESPONSE - Activate/Deactivate work for me.

5/10/2006: remote data

Are you sure that the remote data function is working correctly? I seem to have trouble loading some of the data…

KCS RESPONSE 6/5/2006 - I can load caArray data normally.

5/10/2006: Viewing microarray dataset

Comments on

http://www.geworkbench.org/workbench/index.php/Tutorial_-_Viewing_a_Microarray_Dataset

In the visualization panel I don’t think the alignment of properties and their corresponding names is ideal, but that is just cosmetic.

KCS RESPONSE - I don't understand the comment - what exactly are you referring to?

Why does it say “+ Intensity”?

KCS RESPONSE - If you stretch out the display horizontally a bit, you will see the color-code bar appear, which shows the color spectrum from - to + expression maxima. It is probably a bug that this disappears when screen real-estate gets tight.... I have entered it into Mantis.

Why is there a bluish bar underneath the slider of Intensity?

KCS RESPONSE - I think it is for esthetics.

When removing objects, maybe the delete button should do the same thing.

KCS RESPONSE - The same thing as what? I don't understand the comment.

The images created (right click, image snapshot) can be saved and exported (File-> export).

When analyzing sets of arrays, wouldn’t it be helpful to have a mean/median function over all spots at a specific position? This way systematic errors could be detected.

KCS RESPONSE - I am not sure what you mean. We should discuss this in person.

Expression Profiles: This is a line graph of gene[s] expression profiles across several arrays/ hybridizations. [space] Each marker is a separate color line.

KCS RESPONSE - rewrote section.

Scatter Plot: A pairwise (array vs. array and marker vs. marker) comparison and plotting of expression values. [One array serves as the reference (x-axis, set by right-clicking and selecting x-axis, dark background) and subsequent arrays are plotted against this reference in different sub images. Up to six sub images can be created.]

KCS RESPONSE - changes incorporated.

Genepix Value Computation: You can specify how to compute the value displayed for a Genepix array. The default setting is Option (Mean F635 - Mean B635) / (Mean F532 - Mean B532).

KCS RESPONSE - corrected.
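The default Genepix value is a background-subtracted ratio of the red (635 nm) and green (532 nm) channels. A minimal Python sketch for a single spot follows; the function name `genepix_ratio` and the choice to return NaN when the green foreground equals its background are assumptions for illustration, not documented geWorkbench behavior.

```python
def genepix_ratio(f635_mean, b635_mean, f532_mean, b532_mean):
    """Background-subtracted red/green ratio for one Genepix spot:
    (Mean F635 - Mean B635) / (Mean F532 - Mean B532).
    """
    denominator = f532_mean - b532_mean
    if denominator == 0:
        # Ratio is undefined; this sketch returns NaN so it can be treated as missing.
        return float("nan")
    return (f635_mean - b635_mean) / denominator
```

For example, a spot with Mean F635 = 1200, Mean B635 = 200, Mean F532 = 700 and Mean B532 = 200 gives 1000 / 500 = 2.0.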

I don’t know anything about Genepix, but I assume that everyone playing around with geWorkbench and microarrays would know this, right?

Select Relative for the visualization preference. Note that this choice will not take effect until the next time you load a data set.
- I would consider this a bug!

KCS RESPONSE - I don't know why this is so, you can enter it as an enhancement request.

Great page!

5/10/2006: Filtering and Normalization

Comments on:

http://www.geworkbench.org/workbench/index.php/Tutorial_-_Filtering_and_Normalizing

KCS RESPONSE - all corrections/suggestions below were acted on. The page was mostly rewritten, and all new screenshots supplied. The detailed example 1 was added, which explains how the normalized and filtered data set used in many of the other tutorials was created.

Affy Detection Call

Applicable to Affymetrix data only. Sets all measurements whose detection status is any user-defined combination of A, P or M as missing.

Are you sure that everyone knows A, P, or M?

4. Choose the maximum number of arrays that can have missing values before a/the marker is removed – default is 0.

Somehow I have difficulty understanding what is going on, but that could be me or my sleepiness…
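The two steps quoted above (masking values by detection call, then dropping markers with too many missing values) can be sketched as follows. P, M and A are the Affymetrix MAS5 detection calls Present, Marginal and Absent. The data layout, the helper name `affy_detection_filter`, and the reading that a marker is kept when its number of missing values is at most the chosen maximum are assumptions for illustration.

```python
def affy_detection_filter(calls, values, missing_calls, max_missing_arrays=0):
    """Mask and filter markers by Affymetrix detection call.

    calls:  marker -> list of detection calls ('P', 'M' or 'A'), one per array
    values: marker -> list of expression values, in the same array order
    missing_calls: set of calls to treat as missing, e.g. {'A'} or {'A', 'M'}
    Returns marker -> list of values, with masked values set to None, keeping
    only markers that have at most max_missing_arrays missing values.
    """
    filtered = {}
    for marker, marker_calls in calls.items():
        masked = [None if c in missing_calls else v
                  for c, v in zip(marker_calls, values[marker])]
        n_missing = sum(v is None for v in masked)
        if n_missing <= max_missing_arrays:
            filtered[marker] = masked
    return filtered
```

With the default maximum of 0, any marker that has even one masked value is removed; raising the maximum to 1 keeps markers with a single missing array.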

Normalizers:

As you know, I don’t know much about microarray analysis, but from what I understand from my friends, normalization is a BIG issue. So I would guess that much more should be done on this front. When the starting data is not optimal you can’t expect much from later analysis. Therefore I would make this a high priority.

You haven’t mentioned the quantile Normalizer in the list of normalizers. What about the houseKeeping Genes Normalizer?

The naming is inconsistent: it is called missing value computation in my program but missing value calculation on the web page.

5/11/2006: Marker Annotations

Comments on:

http://wiki.c2b2.columbia.edu/workbench/index.php/Tutorial_-_Marker_Annotations

1. The desired marker set is activated by checking its box in the Marker Sets component.

=> This should be explained in more detail. It took me some time to figure out what you meant.

KCS - I added a link to a picture of selecting markers in the Markers component.

Otherwise everything is straightforward.

There is much you can do here. It would be good if you could incorporate some of the information into geWorkbench for further analysis. Off the top of my head I can think of retrieving sequences for alignments, or combining pathway information: which common elements are within the pathways my array/experiment came up with, etc…

This obviously needs some further thoughts and discussions.

KCS - some of this is actually already in place, but more can be done....

5/11/2006: Sequence Retrieval

Comments on:

http://wiki.c2b2.columbia.edu/workbench/index.php/Tutorial_-_Sequence_Retrieval

You forgot to mention that you have to add the sequences to the project; they are not imported automatically.

KCS RESPONSE - added description of how to add sequences to a project.

When playing with the sequence features, I realized that the distinction between the visualization panel and the analysis panel is not clear enough.

KCS RESPONSE - I have added a bit more explanation to differentiate these.

There is no scroll bar for the sequences = BUG!!

KCS RESPONSE - There is a vertical scrollbar when needed. Not sure what the problem is here.

The sequence that is displayed should be marked in the window; otherwise it has no meaning.

KCS RESPONSE - this should be entered as an enhancement request.

I saw some stretches of “EEEEEEEEEEEEEEEEEEEEEE”. Are those correct? I never saw them in NCBI.

KCS RESPONSE - this is an error. The cached data is itself corrupt. We are going to switch to Santa Cruz to get data live.

Where do the promoters in the Promoter panel come from? Where are they located on the sequence?

KCS RESPONSE - not pertinent to this panel. The list is derived from JASPAR. The Promoter component will search the motifs against sequences.

Is the length of the sequence normalized? I believe that the line in the visual panel represents the sequence, but why are they all the same length?

KCS RESPONSE - they are all the same length because 4000 bp was retrieved for each.

Is this the sequence from the array or the sequence from the associated gene/predicted associated gene?

KCS RESPONSE - these are sequences +- the gene transcription start site.

When using the analysis, try changing the vertical size of the panel: you will see that the BLAST/STOP buttons disappear pretty soon, and it looks awkward.

KCS RESPONSE - I don't see this problem in the Sequence Analysis component.

How long does BLAST run? But I guess this belongs on the next page.

5/11/2006: Blast

Comments on:

http://wiki.c2b2.columbia.edu/workbench/index.php/Tutorial_-_BLAST

I don’t think it is a good idea to put the advanced options on a separate page next to all the other tools. Does this mean that those advanced options are also for HMM etc?

KCS RESPONSE - No, they disappear when HMM is selected.

Service is now called Server_Info