User:Rfriedman

Revision as of 14:40, 12 July 2007 by Rfriedman (Functionality Comments)

Functionality Comments

Rich, add functionality comments and new feature suggestions here.

One quick initial suggestion: geWorkbench should be able to import files in the following GCG formats: sequence, multiple sequence, and RSF.

(3/23/06) A more robust counterpart of k-means clustering with statistical estimates for microarray analysis is described in the following papers:

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=12801869&query_hl=11&itool=pubmed_docsum

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=12184810&query_hl=11&itool=pubmed_docsum

3/30/06 I don't like the slider to change arrays in the microarray window. The identity of an array is a fixed, not a variable, quantity. I suggest that a pull-down menu for this would be better.

4/7/06 I suggest asking "are you sure" when a user asks to remove a project.

4/26/06 It would be very helpful if the workbench could display an hourglass, or a watch, or a sundial or something, when it is loading or working - for example when it is loading microarray files from a remote database.


5/25/06 I just installed Version 1.03. In the Windows menu it says version geWorkbench 1.0, and on top of the geWorkbench GUI it says geWorkbench 1.0. I suggest that all labels give the full workbench version.

5/25/06 The two tutorial sets should be included in the download automatically.

5/25/06 I would like to amend my recommendation of 4/26/06 to include an estimate of the time a task will take, so that people may use it more easily.

5/25/06 When I spoke to the group, Ken stated that the intensities in the microarray viewer did not correspond to an image of the chip. In that case the phrase "microarray viewer" is misleading. In fact, I am not sure what the intensities and spacing in the microarray viewer correspond to.

5/25/06 I think that "Get bioassays" is a poor command on 2 grounds: 1. I am not used to "bioassays" being used in place of "arrays" or "array data". 2. We are obtaining a list, rather than loading the bioassays into the program. What I think we mean, then, is "list arrays".

Additionally, it is not clear in what format the arrays are being loaded (CEL, normalized probeset intensities, etc.).

5/25/06 Some indication that work is in progress should be given while the arrays are being loaded.

5/25/06 I suggest that a dummy new source be made available so that users can learn how to access a remote source, and I suggest that instructions for posting a remote source be made available. Doing these things will increase the ease with which users can use the workbench in collaborative projects.

5/30/06 The terms "marker" and "phenotypes" are not optimal. In the microarray world we use "probesets" (Affymetrix) or "probes" (glass slides) instead of "markers". "Arrays" is much more informative than "phenotypes" because there can be several arrays for a phenotype, or arrays can represent different patients rather than a phenotype, or because arrays can correspond to points in a time series. Also, you might want to reserve "phenotype" for instantiations that have precise definitions in a controlled vocabulary.

5/31/06 With respect to the tabular microarray view: there is also a "probe number" for Affy chips (1, 2, 3, ...) based upon its position in a sort. It would be useful to have a column for that. It would also be useful to have separate, searchable columns for the following 3 items: 1. Probe ID. 2. Gene name. 3. Gene definition.

(If it sounds as if I am thinking of Excel here - I am).

5/31/06. I strongly recommend that there be a way to reverse filtering, by a global undo command or some other means, so that the user may try different filters.

6/14/06 Inclusion in the announcements mailing list should be made an integral part of the downloading process.

6/14/06 The "expression threshold filter" instructions should be clearer. Stating "Filter values inside range" is ambiguous in that it is not clear whether those values are left after filtering or removed by filtering (I believe that the latter is the case). I suggest the language be changed to "remove values inside range" or "filter out values inside range".
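To make the ambiguity concrete, here is a minimal sketch in Python of the two possible readings. It is an illustration only, not geWorkbench code: the function name, the `remove_inside` flag, and the choice to mark removed values as `None` are all assumptions.

```python
def filter_range(values, low, high, remove_inside=True):
    """Return a copy of values with filtered-out entries replaced by None.

    remove_inside=True  -> "remove values inside range"
    remove_inside=False -> "remove values outside range"
    (hypothetical helper, not part of geWorkbench)
    """
    result = []
    for v in values:
        inside = low <= v <= high
        drop = inside if remove_inside else not inside
        result.append(None if drop else v)
    return result


values = [5.0, 50.0, 500.0, 5000.0]
# Reading 1: values inside [10, 1000] are removed.
print(filter_range(values, 10.0, 1000.0, remove_inside=True))
# Reading 2: values inside [10, 1000] are the ones that survive.
print(filter_range(values, 10.0, 1000.0, remove_inside=False))
```

The same range gives opposite results under the two readings, which is why the GUI label should say explicitly which one applies.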

6/29/06 I recently did some hierarchical clustering using Cluster 3.0. Instead of simply filtering by absolute M value, it also enables the user to retain genes that are larger than a given M value in a user-specifiable number of experiments. It also offers the following options:

1. % present >= X of chips (this only works if you use present/absent thresholds rather than statistical noise).

2. SD (gene vector) >= X, to remove genes with insufficient variability.

3. Max-Min >= X, another variability filter.

I can see why someone might want to use 2 or 3.
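A minimal sketch of how such filters could look, in Python. This is not Cluster 3.0 or geWorkbench code; the function names, thresholds, and example values are hypothetical, and the data are assumed to be log ratios with one list per gene.

```python
import statistics


def keep_by_sd(values, min_sd):
    """Option 2: keep the gene if its sample standard deviation is >= min_sd."""
    return statistics.stdev(values) >= min_sd


def keep_by_max_min(values, min_range):
    """Option 3: keep the gene if (max - min) across arrays is >= min_range."""
    return max(values) - min(values) >= min_range


def keep_by_abs_count(values, min_abs, min_count):
    """Keep the gene if at least min_count arrays have |value| >= min_abs."""
    return sum(1 for v in values if abs(v) >= min_abs) >= min_count


# Hypothetical log-ratio data: gene name -> one value per array.
genes = {
    "geneA": [0.1, -0.2, 0.0, 0.1],   # small and flat
    "geneB": [2.5, -1.8, 0.3, 2.1],   # large and variable
    "geneC": [1.2, 1.1, 1.3, 1.2],    # large but flat
}

for name, values in genes.items():
    print(
        name,
        keep_by_sd(values, 0.5),
        keep_by_max_min(values, 1.0),
        keep_by_abs_count(values, 1.0, 3),
    )
```

In this made-up example geneC passes the absolute-value filter but fails both variability filters, which is the kind of gene that options 2 and 3 are meant to remove.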

7/17/06 Tabular microarray format - The column widths in the tabular microarray format should be sufficient to accommodate the whole title of the chip.

7/17/06 The color mosaic only makes sense if the data already has a log2 or other variance-stabilization transformation. As is, an unsuspecting user can look at real values, and this can be confusing. Furthermore, heatmaps make the most sense for log ratio comparisons versus a standard.

8/8/06 The tabular microarray viewer should be saveable as an Excel spreadsheet. All tables should be saveable as Excel spreadsheets.
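As a small illustration of what I mean by a log ratio versus a standard, here is a sketch in Python. The function name and the intensities are made up, and it assumes all intensities are strictly positive.

```python
import math


def log2_ratios(sample, reference):
    """Log2 ratio of each sample intensity to the matching reference intensity.

    Assumes both lists are the same length and contain positive values.
    """
    return [math.log2(s / r) for s, r in zip(sample, reference)]


reference = [100.0, 200.0, 4000.0]   # hypothetical standard array
sample = [200.0, 100.0, 4000.0]      # hypothetical experimental array

print(log2_ratios(sample, reference))  # [1.0, -1.0, 0.0]
```

A two-fold increase and a two-fold decrease come out as +1 and -1, symmetric around 0, regardless of how bright the probeset is in raw units. That symmetry is what makes a two-color heat map readable.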

8/10/06 A linear or spline fit to the reference line in the scatterplot would be helpful.
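For the linear case, an ordinary least-squares line through the plotted points would do. A minimal sketch in Python follows; the function name and data are hypothetical, and this says nothing about how the scatterplot is implemented in geWorkbench.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical intensities of the same probesets on two arrays.
array_1 = [1.0, 2.0, 3.0, 4.0]
array_2 = [3.0, 5.0, 7.0, 9.0]

slope, intercept = linear_fit(array_1, array_2)
print(slope, intercept)
```

Drawing the fitted line next to the reference line would show at a glance whether one array is systematically brighter than the other.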

8/11/06 A good filter feature would be to give the user the option of accepting or rejecting the filtering, based upon the number of markers that survived the filtering, prior to acceptance. Also, there should be the option of blowing up the heatmap. In general, the functionality of Cluster 3.0 (written by our own Michiel de Hoon) and Java TreeView should be reproduced for clustering.

8/14/06 A word about t-tests. It is very common in the microarray field for experimentalists not to provide sufficient numbers of replicates to get good statistics. The word in the statistical community is to use some variant of a Bayesian t-test, which pools variances of similar sample sizes to compensate for small sample size. This started with Cyber-T, but the 3 most used and validated ones are: 1. LIMMA (LInear Models for Microarray Analysis) from Gordon Smyth, based on earlier work by Terry Speed and Ingrid. 2. SAM (Significance Analysis of Microarrays) by Tibshirani and coworkers. 3. The method included in BRB-ArrayTools, by Simon, Radmacher, and coworkers.

I am told that LIMMA and SAM give similar results to one another and that BRB-ArrayTools gives somewhat different results from the former two programs, but not necessarily inferior results.

GeneSpring and GeneTraffic also have their own versions of this method. My anecdotal experience is that GeneTraffic does not match the results from LIMMA.

I recommend that some version(s) of the Bayesian method be included in geWorkbench. Both LIMMA and SAM are available as part of Bioconductor and therefore can be ported to geWorkbench as part of a general Bioconductor port.

8/14/06 I strongly recommend that the Benjamini-Hochberg false discovery rate correction be offered as an option. In fact, all of the options in the current version of affylmGUI would find use.
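To show the general idea of stabilizing the t-statistic when there are few replicates, here is a minimal sketch in Python of a SAM-style statistic, in which a small positive constant s0 is added to the standard error in the denominator. This is only an illustration: SAM estimates s0 from the data, whereas here it is a hypothetical user-supplied value, and this is not the LIMMA method or geWorkbench code.

```python
import math
import statistics


def pooled_standard_error(a, b):
    """Standard error of the difference in means, using a pooled variance."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    ss = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
    pooled_var = ss / (len(a) + len(b) - 2)
    return math.sqrt(pooled_var * (1.0 / len(a) + 1.0 / len(b)))


def t_like_statistic(a, b, s0=0.0):
    """(mean_a - mean_b) / (s + s0); s0 = 0 gives the ordinary pooled t-statistic."""
    diff = statistics.mean(a) - statistics.mean(b)
    return diff / (pooled_standard_error(a, b) + s0)


# Hypothetical probeset with 2 replicates per group and a tiny, probably
# underestimated, variance.
control = [1.00, 1.01]
treated = [1.02, 1.03]

print(t_like_statistic(control, treated))           # ordinary t, large in magnitude
print(t_like_statistic(control, treated, s0=0.1))   # stabilized, much smaller
```

With only two replicates per group, a difference of 0.02 gives an ordinary t of about -2.8 because the estimated variance happens to be tiny. The added constant keeps such probesets from dominating the list.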
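For reference, the Benjamini-Hochberg adjustment is short to implement. Below is a minimal sketch in Python; the function name and the example p-values are hypothetical, and it is not taken from affylmGUI or geWorkbench.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p_values = [0.01, 0.04, 0.03, 0.20]   # hypothetical per-probeset p-values
print(benjamini_hochberg(p_values))
```

A probeset is then called significant at a false discovery rate q if its adjusted p-value is at most q.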

8/14/06 I suggest that the heat map rather than the volcano plot be the default display on the t-test output.

8/15/06 With respect to the multiple t-test, an obvious improvement, alluded to in the functionality write-up, is to take into account multiple comparisons. The most basic way to do this is to add terms corresponding to the variance of all of the samples studied to the denominator of the expression for t. An expression for this appears in the PowerPoint presentation that I gave to the geWorkbench development group. This correction is different from, and more fundamental than, additional multiple comparison corrections (Bonferroni, Benjamini-Hochberg, etc.). Indeed, doing Benjamini-Hochberg corrections for multiple tests (probesets, i.e. "markers") and multiple phenotypes simultaneously has not been implemented by the microarray statistical community, to my knowledge. So I wouldn't worry about it for geWorkbench. However, LIMMA, SAM, and BRB-ArrayTools each has its own version of correction at the level of the t-test, and I suggest that at least the first two be implemented in geWorkbench.

I realize that the above paragraph might be rather cryptic. I am available to discuss the considerations involved with the development team.

9/12/06 It is important that geWorkbench be able to import microarray data in the 2 formats used by GEO (the Entrez Gene Expression Omnibus database). Please see the Entrez site for format information.

9/12/06 I have had anecdotal experience that if the latest version of Java is installed after geWorkbench is installed, there is a problem and geWorkbench has to be uninstalled and then reinstalled to work. Xiaoqing says that this shouldn't be the case, but I just thought I would let you know what had apparently happened.

9/13/06 I believe that the Workbench's utility would be greatly enhanced if it contained menus for submission of sequences to GenBank and microarray data to GEO. This is especially important in the latter case, where the burden of MIAME compliance is considerable. Furthermore, the time for users to specify their MIAME info is when they first read the data into the Workbench, so that the task is spread out rather than becoming a burden at the very end, when the user has to upload into GEO in order for the manuscript to be cleared for publication. The Workbench should remind the user frequently of unspecified and unannotated files.

9/28/06 A much larger assortment of chip-types should be offered.

9/28/06 A general functionality need is the ability to analyze SNP chips for loss of heterozygosity, copy number, and whole genome associations. The best package for the first 2 is dChip. I am just learning about whole genome association analysis, so I cannot make a recommendation at this time, but I expect to be able to soon. We can talk about SNP chips if and when you are ready to pursue them.

10/5/06 When I open webmatrix.exp, it immediately goes to the ARACNE module. This is confusing, because the tutorial has it go to the microarray viewer.

10/11/06 I suggest that the caption accompanying Microarray Viewer: Filtering: Missing Value Filter be changed from "Maximum number of Missing Arrays" to "Remove markers that are missing in [NUMBER BOX] arrays". It would be much clearer.


10/12/06 That the deviation bound is in raw score units should be mentioned in the GUI. Perhaps something along the lines of "Remove markers which vary by more than [NUMBER BOX] raw scores". A filter based on standard deviation would also be useful.

10/16/06 I tried the deviation filter again today and am convinced that a standard deviation filter is a more rational approach to deviation filtering than the absolute value of the deviation. This is because it is hard to specify a meaningful absolute deviation cutoff for markers, since markers vary so widely in the absolute magnitude of their mean and range. The standard deviation measure, on the other hand, scales with the magnitude of the mean and the upper and lower boundaries. It is a much more natural choice than absolute range.

4/20/07 I suggest that in the open files menu there be an "all files" option.

4/21/07 1. Put a vertical line on the left of all menu windows with arrows at the top and bottom:

/\
|
|
|
|
\/

If the user doesn't see the arrows he will know to enlarge the window. Or, as Ken pointed out when I sent him the above comment, use a scroll bar.

4/25/07 At the risk of sounding picky or avisual, the curved-around icon doesn't immediately suggest "submit" to me. How about a green light with the word "Run" under it?

4/25/07 The Blast output does not contain the colored bars denoting similarity that appear on the Blast website. Ken writes "I think those are extra services provided by the NCBI Blast website itself, they are not part of the data returned using a remote query." If this problem can be overcome it would make a big difference to users who, as a group, just LOVE those colored bars.

4/27/07 The option "Columbia BLAST server" didn't work. I got a "Connection Refused" error message. I queried the user group. Ken replied that "The Columbia BLAST service is not currently operational. There is no fixed time for its reinstatement. The reason is that the Paracel BLAST machine that we used to provide that search capability failed and was retired. We do hope to set up an interface to our cluster to run those BLAST jobs at some point." I suggest that the option of the Columbia service be removed and not restored unless and until there is actually a service corresponding to it. In the meantime users can use the NCBI service either through the web, the geWorkbench interface, or the GCG Netblast program on Cancercenter, which is especially suitable for running many successive Blast jobs as part of scripts. I set up custom databases for Blast, FASTA, and Smith-Waterman searches on Cancercenter at user request.

5/01/07 I have it set to BLAT. The text box that appears when the mouse is over the start icon is "Start Blast search" (not blast search).

5/01/07 All of the non-Blast functions (BLAT, HMM, Other Algorithms) under sequence comparison should be removed (because the functionality is not available). The Columbia server option should be removed from the BLAST menu for the same reason.

5/04/07 The start (curly arrow) and stop (rectangular sign) icons are in different places in different menus. In pattern recognition they are at the top of the page. In BLAST they are at the bottom of the page. A more uniform look would be helpful.

7/12/07 1.06 SPLASH does not have a user-entered Z-value cutoff so that the user is at the mercy of the system as to how many values are displayed.

7/12/07 1.06 The help pages should be searchable.

7/12/07 1.06 I suggest that there be better ways to save patterns: either just the patterns, or the patterns highlighted in sequences.

Tutorials Comments

Tutorials comments go here.

The initial download should come with all of the datasets in the tutorial (the cardio set was missing when I installed) OR the tutorial should show where these can be downloaded.

3/30/06: Some mention of what the microarray viewer does should be included in the manual - i.e. that it shows a raw image of the chip.

3/30/06: What it means to merge microarray files should be stated more explicitly.

4/07/06 That the chip recognition message is only shown once should be stated. Alternatively maybe it should be shown each time - but not require an okay button.

4/10/06 How to save a merged Affy dataset so that one may open it again should be described more clearly. The following points (courtesy of Ken) should be mentioned (and illustrated).

1. The set should be saved with an exp suffix.

2. The set can be reopened with the filter set to "Affymetrix Matrix file".

4/24/06 The tutorial's comments for opening a remote site are misleading. It should state:

1. Go is clicked for getting the list of microarray experiments.

2. "Get Bioassays" is necessary for getting a list of arrays in the experiment - not for retrieval.

3. "open" will retrieve the selected bioassays.

I found this very hard to use; it required correspondence with Ken and a visit from Xiaoqing in order to learn to use it.

4/26/06 I suggest that the tutorial not mention adding a new site for remote download until such sites are commonly available. Otherwise it just begs questions from the reader.

5/25/06 I suggest that the tutorial pages state to which version of geWorkbench they apply. This is implicit in the label of the window that appears in the screenshot, but it should also be on the web page that the user loads.

5/25/06 I suggest that the tutorial pages be downloadable as a PDF file.

5/25/06 I suggest that there be a public mailing list where users can be notified of updates.

5/25/06 I suggest that what the intensities and layout in the microarray viewer correspond to be discussed.

5/30/06 Designating a group of arrays a "case" causes the thumbtack to be labeled red. However, designating a group of arrays as the "control" does not change the color of the thumbtack. I suggest that the color of the thumbtack be changed to green to distinguish it from a group whose nature has not been designated. Also, the designation "case" is used in clinical and epidemiological research. The corresponding term in laboratory research is "experiment".

6/13/06 It should be explained that the microarray viewer image is in probeset order, split across each row, and is not an actual image of the slide.

6/14/06 Examples of each of the different filter options should be given in the tutorial.

7/17/06 I believe that you are doing the person learning to use geWorkbench a disservice by showing the heat map instructions in the tutorial before you have shown log transformation (or at least I am assuming that there is no log transformation, because some of the numbers are so high). Heat maps are most useful relative to a standard, and hence this should be used as part of a didactic example in which a log2 ratio standard is used.

7/17/06 I suggest that the instructions in the tutorial for using the scatter plot graph be more detailed and step-by-step.

8/8/06 I suggest that the difference between a project and a workspace be spelled out.

8/10/06 I suggest the tutorial note explicitly the meaning of the buttons necessary to display the heat map in the microarray display panel.

8/10/06 There should be some discussion as to where array/phenotype labels come from in the tutorial.

8/10/06 I suggest that there be a tutorial example of plotting the array with a subset of a few probesets.

8/10/06 An illustration of using the reference line in the scatterplot would be helpful.

8/14/06 A discussion of the interpretation of heat maps and volcano plots would be helpful. I would especially appreciate this in the case of volcano plots because, although I can read the axes, I don't really know how to interpret them. An example of filtering by p-value and by fold change should also be given.
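As a starting point for such a discussion, here is a minimal sketch in Python of the conventional volcano plot coordinates (log2 fold change on the x axis, -log10 of the p-value on the y axis), together with a combined fold-change and p-value filter. The function names, thresholds, and marker values are hypothetical, and the axes geWorkbench actually uses may differ.

```python
import math


def volcano_point(mean_case, mean_control, p_value):
    """Return (log2 fold change, -log10 p) for one marker.

    Assumes positive mean intensities and 0 < p_value <= 1.
    """
    return math.log2(mean_case / mean_control), -math.log10(p_value)


def is_hit(log2_fc, p_value, min_abs_log2_fc=1.0, max_p=0.05):
    """Keep a marker only if it is both large in effect and significant."""
    return abs(log2_fc) >= min_abs_log2_fc and p_value <= max_p


# Hypothetical markers: (name, mean in case arrays, mean in control arrays, p-value).
markers = [
    ("m1", 400.0, 100.0, 0.001),    # 4-fold up, significant
    ("m2", 110.0, 100.0, 0.0001),   # significant, but tiny fold change
    ("m3", 800.0, 100.0, 0.3),      # 8-fold up, but not significant
]

for name, case, control, p in markers:
    x, y = volcano_point(case, control, p)
    print(name, round(x, 2), round(y, 2), is_hit(x, p))
```

Markers of interest sit in the upper left and upper right corners of the plot: far from zero horizontally (large fold change) and high vertically (small p-value). In this made-up example only m1 passes both cutoffs.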

9/14/06 It would be helpful if the installation instructions specified that updates of Windows sometimes include a version of Java earlier than 1.5, and hence that the Java version should be checked. They should also state that installing geWorkbench followed by Java 1.5 does not work. It should be specifically stated that the Java installation must precede the geWorkbench installation and that if, by mistake, Java is installed after geWorkbench, geWorkbench should be uninstalled and then reinstalled in order to work. This caution is consistent with a recent problem I have had and its solution.

9/28/06 A picture of selecting the chip-type should be included in the tutorial which deals with the uploading of web-matrix2.

10/5/06 A before-and-after picture of the microarray viewer for the application of each filter type to the data, and a discussion of the color codes and the reasons for applying the filtering, would be helpful. This discussion should include screenshots of the table showing the effect of the filtering and the change in the number of genes.

10/11/06 I suggest changing "Discards all markers that have missing measurements in at least N microarrays, where N is set by the user" to "keeps all markers that have missing measurements in N or fewer arrays where N is set by the user", in order to bring the tutorial into agreement with what the program actually does and what is stated in the actual GUI.

10/12/06 I suggest that an explicit demo of the deviation filter be given in the tutorial.

10/16/06 I suggest that the explicit demo of the deviation filter include before and after shots of the tabular microarray viewer and include both absolute and standard deviation options.

4/17/07 I suggest that the tutorials be available on the gforge site.

4/18/07 That retrieving the sequences for a list of markers requires expression data to be loaded into the system should be stated explicitly in the tutorial.

4/21/07 Frequent mention should be made in the tutorial that on 15" diagonal screens the user should make sure the menu (options) window is tall enough to display all of the options and, if not, should raise the window until they are visible. Please see a complementary note that I will write today in the tutorials window.
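The "keeps" wording corresponds to the following rule, sketched here in Python. This is an illustration of the rule as I have stated it, not geWorkbench code; the function name and data are hypothetical, and missing measurements are represented as `None`.

```python
def keep_marker(values, max_missing):
    """Keep the marker if it has missing measurements in max_missing or fewer arrays."""
    missing = sum(1 for v in values if v is None)
    return missing <= max_missing


# Hypothetical data: marker name -> one measurement per array (None = missing).
markers = {
    "m1": [1.0, None, 2.0, None],   # missing in 2 arrays
    "m2": [1.0, 2.0, 3.0, None],    # missing in 1 array
    "m3": [1.0, 2.0, 3.0, 4.0],     # complete
}

n = 1
kept = [name for name, values in markers.items() if keep_marker(values, n)]
print(kept)  # ['m2', 'm3']
```

Note that under the "discards ... in at least N" wording, m2 would be discarded when N is 1, so the two descriptions differ at exactly N missing arrays.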

5/3/07 The distinction between Min Tokens and Density Tokens could be clearer. I read the paper so I think I get it, but there should be an example at this point.

5/4/07 Frequent references to the existence of the online help as supplementing the tutorial would be helpful.

5/4/07 Create session menu for SPLASH: The tutorial should explicitly state that the port must be set to 80, that any username will work, and that no password is required. On my end a different port number has appeared, and it doesn't work with that.

6/8/07 Version 1.06. When I click "Taxonomy Tree" on the blast output, I don't get anything.

7/12/07 1.06 There should be clear examples as to the meaning of the 3 SPLASH input parameters, with pointers to the variables in the paper. Also, there should be more SPLASH examples. The one given is a good first example from a technical viewpoint, but its biological interest is not meaningful to the user. An example such as the ones in the original SPLASH paper and its sequels will exhibit the power of the method better.