GeWorkbench TODO List

This page's initial contents were moved from the geWorkbench 2.2.0 release page.


geWorkbench 2.5.0 planning list

SkyBase

The lab uses the interface below to create structural models directly from sequences. The pre-built models are no longer needed; they were originally created to demonstrate the range of the model-building capability.

http://skybase.c2b2.columbia.edu/pdb60_new/nesg.php#

Documentation wish list

  • In addition to all the outstanding documentation requests,
  • Add a page documenting the format and contents, as relevant to geWorkbench, of the Affy annotation file. E.g., specify the names of the required column headers and the format of data those columns contain. This would allow people to more easily supply their own custom annotation files for geWorkbench.
  • Also, we could list the dependence of individual geWorkbench components on certain annotation columns, e.g. Entrez ID, Swiss-Prot ID, etc.
  • Find out why ARACNe only prints "10% done" to the console before finishing.
  • How are null values in the data stored and how are they displayed? What do we want?
  • What is the real use of "Only add aligned parts" in the BLAST viewer? This option concatenates non-contiguous sequence segments into one long sequence.
  • Characterize how large a dataset can be loaded in a typical geWorkbench configuration. How does memory use scale with data and annotation file size?
  • Can we increase Hierarchical Clustering number of markers limit with more memory, 64-bit? Or is it just the number of calculations that becomes limiting?

List from 2.3.0 Release

  • Should change name of "alignment" component to "blast".
  • Need better Hierarchical Clustering and SOM examples and screenshots.
  • Cytoscape tutorial - does not cover the new buttons added with Cytoscape 2.8.0, which is included in the geWorkbench 2.3.0 release.
  • The following viewers are covered in the tutorial "Viewing a Microarray Dataset" but should be broken out into separate chapters (perhaps) and used to replace older Help chapters:
    • Expression Profiles
    • Microarray Viewer
    • Scatter Plot - more material in old Help than in Wiki - synch up.
    • Tabular View
  • The following Help sections need updating.
    • Marker/Array Help is still old material and needs to be ported from the Markers and Arrays tutorials. This has not been done because the existing Help chapter is much more concise.
    • Sequences - covered also in sequence retriever.
    • Sequence Retriever - also has material in Sequences!
    • Position Histogram - covered in Pattern Discovery. Make new chapter?
  • More on Help
    • "Plots" has scatter plot and volcano plot.
    • Check whether Promoter help is in sync with the updated tutorial.
    • Disentangle Sequences and Sequence Retriever.
    • Are CEL processing files out of Color Mosaic now?


  • No Help Available yet
    • Grid Services
    • IDEA
    • SkyBase
    • SkyLine
    • GP Module components
    • Volcano Plot

Other suggestions

  • Remove ".txt" from network load filters? It loads as .adj and is there for historical reasons.
  • Remove "file" from "Files of Type" entries (?).
  • The t-test permutation code has been changed in release 2.3.0 and should be tested further.

Mention somewhere

  • For large data sets, disable EVD in CCM.
  • Cytoscape is loaded by default to avoid a window problem if loaded later.

Array merging and annotation files

  • The annotation file of the first file loaded is used. That is, if the first file has an annotation file loaded, it will be used for all merged files. If the first array file is loaded without an annotation file, then the merged dataset will not have annotation, even if an annotation file is loaded with the second or later files to be merged.
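
The rule above can be sketched as follows. This is a minimal illustration of the behavior as described in this note, not geWorkbench's actual code; the `ArrayFile` type, the `merged_annotation` function, and the file names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ArrayFile:
    """Hypothetical stand-in for a loaded array file."""
    name: str
    annotation: Optional[str] = None  # annotation file name, or None if none was loaded


def merged_annotation(files: Sequence[ArrayFile]) -> Optional[str]:
    """Return the annotation that the merged dataset would carry.

    Per the note above, only the first file loaded matters; annotation
    attached to the second or later files is ignored.
    """
    if not files:
        return None
    return files[0].annotation


# Hypothetical file names, for illustration only.
with_first = [ArrayFile("a.exp", "annot.csv"), ArrayFile("b.exp")]
without_first = [ArrayFile("a.exp"), ArrayFile("b.exp", "annot.csv")]

print(merged_annotation(with_first))
print(merged_annotation(without_first))
```

The second case is the one worth mentioning in the documentation: the annotation supplied with the later file is silently dropped.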

For 2.4.0

  • Release Cupid
  • Separate Markers and Arrays (remove tabs). Both selectors should be visible at all times. Don't need to see search all the time.
  • Fix ARACNe results object
  • Continue network representation design.
  • Need a network compare option.
  • Document exactly what the symbol type chooser does when a network is loaded.
  • Bug #1463, something about repeat result node numbering still needs documentation.

TODO Notes from release 1.8.0 wiki

This list will be reduced to the items left open after the 1.8.0 release that have still not been dealt with.

Release 1.7.0 TODO notes carried over

  1. Add MatrixREDUCE data to tutorial dataset. (not done in 1.7.0)
  2. Remove unneeded data from tutorial download. (not done in 1.7.0)
  3. Was the problem with file save corruption fixed? It affected writing out files that had been read in EXP (matrix) format. (think so but need to verify)
  4. Include a list of HG-U95 and HG-U133 transcription factors in tutorial data download or with distribution (see Nature Protocols paper). (not done in 1.7.0?)

Release 1.8.0 TODO notes

  1. ARACNe Grid - need to verify that server-side implementation includes Bcell-100 parameter files.
  2. Hierarchical Clustering - When I do hierarchical clustering, the arrays are shown ordered by the array sets activated, rather than the original order of the arrays in the dataset. Need to confirm that the labels and arrays are really staying together correctly when resorted.
  3. MatrixREDUCE shown to work on Windows but not clear if it works on Linux.

For next time

  1. ANOVA - Need to pin down exact details on algorithms - Adjusted Bonferroni, Westfall-Young, and how to explain the interpretation of the alpha value in FDR - is it the confidence in the FDR as you sometimes see mentioned? Is the reported p-value (e.g. Bonferroni) corrected or uncorrected? Check code for details.
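
As background for the corrected-versus-uncorrected question, the textbook Bonferroni adjustment multiplies each raw p-value by the number of tests and caps the result at 1. The sketch below shows only that standard definition; whether geWorkbench's ANOVA reports this adjusted value or the raw one is exactly what still needs to be checked in the code. The function name and the sample p-values are made up for illustration.

```python
from typing import List, Sequence


def bonferroni_adjust(p_values: Sequence[float]) -> List[float]:
    """Textbook Bonferroni adjustment: p_adj = min(1, p * m) for m tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Made-up raw p-values for four markers.
raw = [0.001, 0.02, 0.04, 0.5]
adjusted = bonferroni_adjust(raw)

# Comparing adjusted p-values to alpha is equivalent to comparing
# raw p-values to alpha / m.
alpha = 0.05
significant = [p_adj <= alpha for p_adj in adjusted]

print(adjusted)
print(significant)
```

If the reported value is the raw p-value, it should be compared against alpha divided by the number of tests; if it is the adjusted value, it is compared against alpha directly. The documentation should state which applies.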


Needed changes to Tutorials

Some of these were carried over from the release 1.7.0 Wiki, while others are from 1.8.0.

  1. Expression Profiles - tutorial needed.
  2. EVD - What is the EVD t-test used for, and how is it used? A histogram of t-test statistics? (not done in 1.7.0)
  3. GenePattern components need tutorials/Online Help:
    1. Need to document server settings to use GenePattern modules. Our local GenePattern server is afdev2.c2b2.columbia.edu port 9999.
    2. PCA (GenePattern) - Analysis and Viewer
    3. K-nearest neighbors (GenePattern)
    4. WV - Weighted Voting (GenePattern)
  4. Grid Services
    1. Add detail to tutorial about how caGrid v1.3 uses caTransfer?
    2. Verify that each component offering a grid service has documentation.
    3. Find out and explain how our grid services handle multiple requests e.g. to ARACNe grid service - all run at once, in separate processes?
    4. Explain exactly what is sent to grid - only selected data, or all data with a map? (not done in 1.7.0)
  5. Hierarchical Clustering
    1. Need a more top-level description of the Dendrogram component.
    2. When "Average" linkage is selected, MEV uses a "weighted" average, which reduces the weights of more distant nodes. Does geWorkbench implement any such refinement?
    3. MEV can give priority to markers or arrays (?) when drawing the clusters.
  6. Marker Annotations
    1. Add more documentation to Tutorial/Online Help about the caBIO PID sources used.
  7. Project Folders
    1. The File Open list of file types is now alphabetical. If any tutorial / help page depicts this, it should be updated. UPDATE - no view of the file types is currently depicted.
  8. Scatter Plot - There seems to be no tutorial. Online Help exists but needs to be updated to mention the enhanced "tooltip" spot identification added in release 1.7.0. UPDATE - Actually, Scatter Plot is covered in "Viewing a Microarray Dataset". Could be better.
    1. http://wiki.c2b2.columbia.edu/mantis/view.php?id=1782
    2. Details: A feature had been added to Scatter Plot to allow overlapping points to each display a tooltip. This did not work if many points were overlapping, or if there were too many points in the dataset being compared. If more than 100 points are being compared in the plot, the enhanced tooltip feature is turned off, and only one point will show a tooltip for a given location.
  9. Sequence (Viewer)- tutorial needed.
  10. SOM - The following questions are outstanding on SOM tutorials:
    1. Where did the statement about data for SOM needing to be normalized come from? Is it true?
    2. The formal definition of SOM says dimensionality where it may mean something like "dimensionality N".
    3. The online help mentions neuron and initial coordinates, but now only one set is displayed. Which is it?
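
The Scatter Plot tooltip rule described under item 8 above can be sketched as a simple threshold check, which may help when writing the Help text. This illustrates the behavior as worded in that note, not the geWorkbench source. The constant and function names are hypothetical, and showing the first overlapping point when the feature is off is an assumption; the note only says that one point shows a tooltip.

```python
from typing import List, Sequence

# Threshold quoted in the note ("more than 100 points"); the name is hypothetical.
ENHANCED_TOOLTIP_MAX_POINTS = 100


def enhanced_tooltips_enabled(num_points: int) -> bool:
    """Enhanced tooltips stay on only up to the threshold number of points."""
    return num_points <= ENHANCED_TOOLTIP_MAX_POINTS


def tooltips_at_location(overlapping: Sequence[str], num_points: int) -> List[str]:
    """Return the labels that would get a tooltip at one plot location."""
    if enhanced_tooltips_enabled(num_points):
        # Every overlapping point displays its own tooltip.
        return list(overlapping)
    # Assumption: with the feature off, the first point is the one shown.
    return list(overlapping[:1])


print(tooltips_at_location(["markerA", "markerB"], num_points=100))
print(tooltips_at_location(["markerA", "markerB"], num_points=101))
```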