geWorkbench

User:Rfriedman

Revision as of 10:29, 15 August 2006

Functionality Comments

Rich, add functionality comments and new feature suggestions here.

One quick initial suggestion: geWorkbench should be able to import files in the following GCG formats: sequence, multiple sequence, and RSF.

(3/23/06) A more robust counterpart of k-means clustering for microarray analysis, one that provides statistical estimates, is described in the following papers:

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=12801869&query_hl=11&itool=pubmed_docsum

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=12184810&query_hl=11&itool=pubmed_docsum

3/30/06 I don't like using a slider to change arrays in the microarray window. The identity of an array is a fixed quantity, not a variable one. I suggest that a pull-down menu would be better for this.

4/7/06 I suggest asking "Are you sure?" when a user asks to remove a project.

4/26/06 It would be very helpful if the workbench displayed an hourglass, a watch, a sundial, or some other busy indicator while it is loading or working, for example when it is loading microarray files from a remote database.

5/25/06 I just installed version 1.03. The Windows menu says "geWorkbench 1.0", and the top of the geWorkbench GUI also says "geWorkbench 1.0". I suggest that all labels give the full version number.

5/25/06 The two tutorial sets should be included in the download automatically.

5/25/06 I would like to amend my recommendation of 4/26/06 to include an estimate of how long a task will take, so that people can use the workbench more easily.

5/25/06 When I spoke to the group, Ken stated that the intensities in the microarray viewer do not correspond to an image of the chip. If so, the name "microarray viewer" is misleading. In fact, I am not sure what the intensities and spacing in the microarray viewer correspond to.

5/25/06 I think that "Get bioassays" is a poor command name on two grounds: 1. I am not used to "bioassays" being used in place of "arrays" or "array data". 2. We are obtaining a list, rather than loading the bioassays into the program. What I think we mean, then, is "list arrays".

Additionally, it is not clear in what format the arrays are being loaded (CEL, normalized probeset intensities, etc.).

5/25/06 Some indication that work is in progress should be given while the arrays are being loaded.

5/25/06 I suggest that a dummy remote source be made available so that users can learn how to access a remote source, and that instructions for posting a remote source be made available. Doing these things will make it easier for users to use the workbench in collaborative projects.

5/30/06 The terms "marker" and "phenotype" are not optimal. In the microarray world we use "probesets" (Affymetrix) or "probes" (glass slides) instead of "markers". "Arrays" is much more informative than "phenotypes", because there can be several arrays for a phenotype, arrays can represent different patients rather than a phenotype, or arrays can correspond to points in a time series. Also, you might want to reserve "phenotype" for instances that have precise definitions in a controlled vocabulary.

5/31/06 With respect to the tabular microarray view: there is also a "probe number" for Affymetrix chips (1, 2, 3, ...) based on its position in a sort. It would be useful to have a column for that. It would also be useful to have separate, searchable columns for the following three items: 1. Probe ID. 2. Gene name. 3. Gene definition.

(If it sounds as if I am thinking of Excel here - I am).

5/31/06. I strongly recommend that there be a way to reverse filtering, by a global undo command or some other means, so that the user may try different filters.

6/14/06 Subscription to the announcements mailing list should be made an integral part of the downloading process.

6/14/06 The "expression threshold filter" instructions should be clearer. Stating "Filter values inside range" is ambiguous, in that it is not clear whether those values are kept after filtering or removed by filtering (I believe the latter is the case). I suggest the wording be changed to "Remove values inside range" or "Filter out values inside range".
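The ambiguity is easy to see in code. The sketch below is a hypothetical illustration, not geWorkbench's implementation: it assumes the filter works on a flat list of expression values and that the range bounds are inclusive, and it shows the two possible readings of "Filter values inside range" side by side.

```python
def remove_inside_range(values, low, high):
    """'Filter out' reading: drop values in [low, high], keep everything else."""
    return [v for v in values if not (low <= v <= high)]


def keep_inside_range(values, low, high):
    """'Keep' reading: retain only values in [low, high]."""
    return [v for v in values if low <= v <= high]


expression = [12.0, 85.0, 430.0, 2100.0]
print(remove_inside_range(expression, 50.0, 500.0))  # [12.0, 2100.0]
print(keep_inside_range(expression, 50.0, 500.0))    # [85.0, 430.0]
```

The same thresholds produce complementary results, which is why the label should say which one the filter does.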

6/29/06 I recently did some hierarchical clustering using Cluster 3.0. Instead of simply filtering by absolute M value, it also enables the user to retain genes whose absolute M value is at least a given threshold in a user-specifiable number of experiments. It also offers the following options:

1. % present >= X of chips (this only works if you use present/absent thresholds rather than statistical noise).

2. SD (gene vector) >= X, to remove genes with insufficient variability.

3. Max - Min >= X, another variability filter.

I can see why someone might want to use 2 or 3.
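For comparison, the Cluster 3.0-style filters listed above could be sketched as follows. This is an illustrative reimplementation, not Cluster 3.0's code: the function and parameter names are hypothetical, values are assumed to be log-transformed already, the standard deviation is assumed to be the sample (n - 1) version, and the present/absent calls for option 1 are assumed to be supplied separately.

```python
import statistics


def passes_filters(values, present_calls=None, *, min_present_fraction=None,
                   min_abs=None, min_count=None, min_sd=None, min_range=None):
    """Return True if one gene's values (one per experiment) pass every enabled filter.

    A filter is skipped when its threshold is None.
    """
    if min_present_fraction is not None:
        # Option 1: requires present/absent calls (assumed to be given as booleans).
        if present_calls is None:
            raise ValueError("present/absent calls are required for this filter")
        if sum(present_calls) / len(present_calls) < min_present_fraction:
            return False
    if min_abs is not None and min_count is not None:
        # At least min_count experiments with |value| >= min_abs.
        if sum(1 for v in values if abs(v) >= min_abs) < min_count:
            return False
    if min_sd is not None and statistics.stdev(values) < min_sd:
        # Option 2: sample standard deviation of the gene vector.
        return False
    if min_range is not None and max(values) - min(values) < min_range:
        # Option 3: Max - Min.
        return False
    return True


genes = {
    "gene_a": [0.1, -0.2, 0.05, 0.0],   # hypothetical, nearly flat
    "gene_b": [2.5, -1.8, 3.1, 0.4],    # hypothetical, highly variable
}
kept = [name for name, vals in genes.items()
        if passes_filters(vals, min_abs=1.0, min_count=2, min_sd=1.0, min_range=2.0)]
print(kept)  # ['gene_b']
```

Options 2 and 3 both measure variability; SD is less sensitive to a single outlying array than Max - Min.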

7/17/06 Tabular microarray format: the column widths in the tabular microarray format should be sufficient to accommodate the whole title of the chip.

7/17/06 The color mosaic only makes sense if the data have already had a log2 or other variance-stabilizing transformation. As it stands, an unsuspecting user may be looking at untransformed values, and this can be confusing. Furthermore, heatmaps make the most sense for log-ratio comparisons against a standard.
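As a minimal sketch of the transformation this comment has in mind, the function below converts raw intensities into log2 ratios against a reference array, so that 0 means "no change", +1 means twofold up, and -1 means twofold down, which is the symmetric scale a two-color heatmap is suited to. The choice of reference (a control array, or a per-probeset average of the control group) and the pseudocount are assumptions of this sketch.

```python
import math


def log2_ratios(sample, reference, pseudocount=1.0):
    """Per-probeset log2(sample / reference).

    The pseudocount (an assumption, not part of the original comment) avoids
    taking the log of zero for very low intensities.
    """
    return [math.log2((s + pseudocount) / (r + pseudocount))
            for s, r in zip(sample, reference)]


sample = [1023.0, 63.0, 255.0]
reference = [255.0, 255.0, 255.0]
print(log2_ratios(sample, reference))  # [2.0, -2.0, 0.0]
```

On the raw scale the first and second probesets would look very different in magnitude; on the log-ratio scale they are equal and opposite, fourfold up and fourfold down.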

8/8/06 The tabular microarray viewer should be savable as an Excel spreadsheet. In fact, all tables should be savable as Excel spreadsheets.

8/10/06 A linear or spline fit to the reference line in the scatterplot would be helpful.
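The linear case amounts to an ordinary least-squares fit to the scatterplot points, sketched below with hypothetical names (the points would typically be on a log scale). A spline fit, or a robust smoother such as loess, would need a numerical library and is not shown.

```python
def least_squares_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


print(least_squares_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]))  # (2.0, 1.0)
```

Comparing the fitted line with the y = x reference line would show at a glance whether one array is systematically brighter than the other.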

8/11/06 A good filter feature would be to show the user how many markers survive a filter and let the user accept or reject the filtering before it is applied. Also, there should be an option to enlarge (zoom in on) the heatmap. In general, the clustering functionality of Cluster 3.0 (written by our own Michiel de Hoon) and Java TreeView should be reproduced.
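One way to structure this, which would also cover the global undo requested on 5/31/06, is to keep filtering non-destructive: compute the survivors first, report the count, and only then apply. The class below is a hypothetical sketch in plain Python, not geWorkbench code; markers are represented by their IDs, and a filter is any predicate function.

```python
class FilterSession:
    """Preview a filter, accept or reject it, and undo accepted filters."""

    def __init__(self, markers):
        self.current = list(markers)
        self._history = []

    def preview(self, keep):
        """Return the markers that would survive `keep`, without applying it."""
        return [m for m in self.current if keep(m)]

    def accept(self, survivors):
        """Apply a previewed result, remembering the previous state for undo."""
        self._history.append(self.current)
        self.current = list(survivors)

    def undo(self):
        """Restore the marker set from before the last accepted filter."""
        if self._history:
            self.current = self._history.pop()


session = FilterSession(["m1", "m2", "m3", "m4"])
survivors = session.preview(lambda m: m != "m3")
print(f"{len(survivors)} of {len(session.current)} markers would survive")
if len(survivors) >= 2:  # stands in for the user's accept/reject decision
    session.accept(survivors)
session.undo()
print(session.current)  # ['m1', 'm2', 'm3', 'm4']
```

Keeping the full history of accepted states makes it possible to try several filters in sequence and back out of any of them.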

8/14/06 A word about t-tests. It is very common in the microarray field for experimentalists not to run enough replicates to get good statistics. The consensus in the statistical community is to use some variant of a Bayesian t-test, which pools variance information across genes to compensate for small sample sizes. This started with Cyber-T, but the three most used and validated methods are:

1. LIMMA (Linear Models for Microarray Data) from Gordon Smyth, based on earlier work by Terry Speed and Ingrid.

2. SAM (Significance Analysis of Microarrays) by Tibshirani and coworkers.

3. The method included in BRB ArrayTools, by Simon, Radmacher, and coworkers.

I am told that LIMMA and SAM give results similar to one another, and that BRB ArrayTools gives somewhat different results from the former two, though not necessarily inferior ones.

GeneSpring and GeneTraffic also have their own versions of this method. My anecdotal experience is that GeneTraffic does not match the results from LIMMA.

I recommend that some version(s) of the Bayesian method be included in geWorkbench. Both LIMMA and SAM are available as part of Bioconductor and could therefore be ported to geWorkbench as part of a general Bioconductor port.
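To make "pooling variance information across genes" concrete, the sketch below computes a moderated t-statistic in the style of LIMMA's empirical Bayes approach for a single probeset and two groups. The probeset's pooled variance s² (with d residual degrees of freedom) is shrunk toward a prior variance s0² (with d0 prior degrees of freedom): s̃² = (d0·s0² + d·s²) / (d0 + d), and t = (mean1 - mean2) / sqrt(s̃²·(1/n1 + 1/n2)) on d0 + d degrees of freedom. This is a simplified illustration, not LIMMA's code: d0 and s0² are passed in as assumptions, whereas LIMMA estimates them from the whole set of probesets. With d0 = 0 it reduces to the ordinary pooled two-sample t-test.

```python
import math
import statistics


def moderated_t(group1, group2, d0, s0_sq):
    """LIMMA-style moderated t for a two-group comparison of one probeset.

    d0 and s0_sq (prior degrees of freedom and prior variance) are taken as
    given here. Returns (t, degrees_of_freedom).
    """
    n1, n2 = len(group1), len(group2)
    d = n1 + n2 - 2  # residual degrees of freedom
    pooled_var = ((n1 - 1) * statistics.variance(group1)
                  + (n2 - 1) * statistics.variance(group2)) / d
    # Shrink this probeset's own variance toward the prior variance.
    moderated_var = (d0 * s0_sq + d * pooled_var) / (d0 + d)
    diff = statistics.mean(group1) - statistics.mean(group2)
    t = diff / math.sqrt(moderated_var * (1 / n1 + 1 / n2))
    return t, d0 + d


a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
print(moderated_t(a, b, d0=0.0, s0_sq=1.0))  # ordinary pooled t, about -3.674 on 4 df
print(moderated_t(a, b, d0=4.0, s0_sq=4.0))  # shrunk toward a larger prior variance, about -2.324 on 8 df
```

With only three replicates per group, a probeset whose sample variance happens to be very small would get an inflated ordinary t; shrinking toward s0² damps exactly those cases, while the extra prior degrees of freedom reward the borrowed information.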

8/14/06 I strongly recommend that the Benjamini-Hochberg false discovery rate (FDR) correction be offered as an option. In fact, all of the options in the current version of affylmGUI would find use.
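For reference, the Benjamini-Hochberg step-up procedure converts raw p-values into adjusted p-values: sort the m p-values, scale the one with rank k by m/k, then enforce monotonicity from the largest rank down and cap at 1. The function below is a minimal sketch of that procedure; it is intended to agree with R's p.adjust(p, method = "BH"), but it is an illustration rather than any package's code.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down, so each adjusted value is the
    # minimum of p * m / rank over its own rank and all larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(p))  # approximately [0.02, 0.04, 0.04, 0.02]
```

Probesets whose adjusted value is at most 0.05 would then be reported at a 5% false discovery rate.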

8/14/06 I suggest that the heat map, rather than the volcano plot, be the default display of the t-test output.

8/15/06 With respect to the multiple t-test, an obvious improvement, alluded to in the functionality write-up, is to take multiple comparisons into account. The most basic way to do this is to add terms corresponding to the variance of all of the samples studied to the denominator of the expression for t. An expression for this appears in the PowerPoint presentation that I gave to the geWorkbench development group. This correction is different from, and more fundamental than, additional multiple-comparison corrections such as Bonferroni or Benjamini-Hochberg. Indeed, to my knowledge, the microarray statistical community has not implemented Benjamini-Hochberg corrections for multiple tests (probesets, i.e. "markers") and multiple phenotypes simultaneously, so I wouldn't worry about that for geWorkbench. However, LIMMA, SAM, and BRB ArrayTools each has its own version of correction at the level of the t-test, and I suggest that at least the first two be implemented in geWorkbench.

I realize that the above paragraph might be rather cryptic. I am available to discuss the considerations involved with the development team.
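The expression from the presentation is not reproduced here. As one published example of a correction "at the level of the t-test", the sketch below computes SAM's relative difference d = (mean1 - mean2) / (s + s0), where s is the usual pooled standard error of the difference and s0 is a small positive "fudge factor" added to the denominator so that probesets with tiny variances do not dominate the list. In SAM, s0 is chosen from the data; here it is simply a parameter, which is an assumption of this sketch.

```python
import math
import statistics


def sam_d(group1, group2, s0):
    """SAM-style relative difference for one probeset, two groups."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    sum_sq = (sum((x - m1) ** 2 for x in group1)
              + sum((x - m2) ** 2 for x in group2))
    scale = (1 / n1 + 1 / n2) / (n1 + n2 - 2)
    s = math.sqrt(scale * sum_sq)  # pooled standard error of the difference in means
    return (m1 - m2) / (s + s0)


a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
print(sam_d(a, b, s0=0.0))  # equals the ordinary pooled t, about -3.674
print(sam_d(a, b, s0=0.5))  # smaller in magnitude because of the fudge factor
```

Unlike the moderated t in LIMMA, which shrinks the variance itself, SAM adds a constant to the standard error; both damp the statistic most for probesets with very small observed variance.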

Tutorials Comments

Tutorials comments go here.

The initial download should come with all of the datasets in the tutorial (the cardio set was missing when I installed) OR the tutorial should show where these can be downloaded.

3/30/06: Some mention of what the microarray viewer does should be included in the manual, i.e., that it shows a raw image of the chip.

3/30/06: What it means to merge microarray files should be stated more explicitly.

4/07/06 The tutorial should state that the chip recognition message is shown only once. Alternatively, maybe it should be shown each time, but without requiring an OK button.

4/10/06 How to save a merged Affymetrix dataset so that it can be opened again should be described more clearly. The following points (courtesy of Ken) should be mentioned and illustrated:

1. The set should be saved with an .exp suffix.

2. The set can be reopened with the file filter set to "Affymetrix Matrix file".

4/24/06 The tutorial's instructions for opening a remote site are misleading. They should state:

1. "Go" is clicked to get the list of microarray experiments.

2. "Get Bioassays" is needed to get a list of the arrays in the experiment, not to retrieve them.

3. "Open" retrieves the selected bioassays.

I found this very hard to use; I needed correspondence with Ken and a visit from Xiaoqing in order to learn how to use it.

4/26/06 I suggest that the tutorial not mention adding a new site for remote download until such sites are commonly available. Otherwise it just raises questions for the reader.

5/25/06 I suggest that the tutorial pages state which version of geWorkbench they apply to. This is implicit in the window label that appears in the screenshots, but it should also be stated on the web page that the user downloads.

5/25/06 I suggest that the tutorial pages be downloadable as a PDF file.

5/25/06 I suggest that there be a public mailing list where users can be notified of updates.

5/25/06 I suggest that the tutorial discuss what the intensities and layout in the microarray viewer correspond to.

5/30/06 Designating a group of arrays as a "case" causes its thumbtack to turn red. However, designating a group of arrays as the "control" does not change the color of the thumbtack. I suggest that the control thumbtack be colored green to distinguish it from a group whose role has not been designated. Also, the term "case" is used in clinical and epidemiological research; the corresponding term in laboratory research is "experiment".

6/13/06 It should be explained that the microarray viewer image shows values in probeset order, wrapped across successive rows, and is not an actual image of the slide.
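Assuming the viewer simply wraps the probeset-ordered values into rows of a fixed width (the width geWorkbench uses is not stated here, so `columns` below is a hypothetical parameter), the mapping from a probeset's position to its cell in the image is plain row-major order. This is why neighboring cells in the image need not be physically adjacent on the chip.

```python
def wrap_into_rows(values, columns):
    """Split values (in probeset order) into consecutive rows of `columns` cells."""
    return [values[i:i + columns] for i in range(0, len(values), columns)]


def cell_of(probeset_index, columns):
    """(row, column) of the probeset at 0-based position `probeset_index`."""
    return divmod(probeset_index, columns)


print(wrap_into_rows(["p0", "p1", "p2", "p3", "p4"], 2))  # [['p0', 'p1'], ['p2', 'p3'], ['p4']]
print(cell_of(4, 2))  # (2, 0)
```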

6/14/06 Examples of each of the different filter options should be given in the tutorial.

7/17/06 I believe that you are doing the person learning to use geWorkbench a disservice by showing the heat map instructions in the tutorial before you have shown log transformation (I am assuming there is no log transformation, because some of the numbers are so high). Heat maps are most useful relative to a standard, so the heat map should be part of a didactic example that uses log2 ratios against a standard.

7/17/06 I suggest that the instructions in the tutorial for using the scatter plot graph be more detailed and step-by-step.

8/8/06 I suggest that the difference between a project and a workspace be spelled out.

8/10/06 I suggest that the tutorial explicitly explain the meaning of the buttons needed to display the heat map in the microarray display panel.

8/10/06 The tutorial should include some discussion of where array/phenotype labels come from.

8/10/06 I suggest that there be a tutorial example of plotting the array with a subset of a few probesets.

8/10/06 An illustration of using the reference line in the scatterplot would be helpful.

8/14/06 A discussion of how to interpret heat maps and volcano plots would be helpful. I would especially appreciate this for volcano plots, because although I can read the axes, I don't really know how to interpret them. An example of filtering by both p-value and fold change should also be given.
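As a starting point for such a discussion: each point in a volcano plot is one probeset, with the log2 fold change on the x-axis and -log10(p) on the y-axis, so points far to the left or right changed a lot and points high up are statistically significant. Filtering by both criteria keeps the points in the upper-left and upper-right regions. The sketch below illustrates this; the probeset names are hypothetical, and the thresholds (p <= 0.05 and at least a twofold change, i.e. |log2 fold change| >= 1) are common conventions chosen here as assumptions, not geWorkbench defaults.

```python
import math


def volcano_coordinates(log2_fold_change, p_value):
    """(x, y) position of one probeset in a volcano plot."""
    return log2_fold_change, -math.log10(p_value)


def select_hits(results, max_p=0.05, min_abs_log2_fc=1.0):
    """Names of probesets that pass both the p-value and fold-change cutoffs."""
    return [name for name, log2_fc, p in results
            if p <= max_p and abs(log2_fc) >= min_abs_log2_fc]


results = [
    ("probe_a", 2.0, 0.001),   # large increase, significant: kept
    ("probe_b", 0.2, 0.0001),  # significant but small change: dropped
    ("probe_c", -1.5, 0.2),    # large decrease, not significant: dropped
    ("probe_d", -1.2, 0.01),   # large decrease, significant: kept
]
print(select_hits(results))  # ['probe_a', 'probe_d']
```

If many probesets are tested, the p-value cutoff would normally be applied to multiple-testing-adjusted p-values rather than raw ones.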
