GeWorkbench Release 2.0

General notes on geWorkbench release 2.0.*

Release Schedule for 2.0.0

  • geWorkbench 2.0.0 code freeze: May 24, 2010 (actual)
  • Testing concluded:
  • Final release target: June 4, 2010
  • Actual release date: June 9, 2010

Update Releases

  • geWorkbench 2.0.1 - June 25, 2010
  • geWorkbench 2.0.2 - July 16, 2010

Role Assignments

  • Release Manager – Kenneth Smith
  • Release Engineer – Thomas Garben
  • Tech Lead – Zhou Ji
  • Tester – Udo Többen, and the rest of the bunch
  • Test Manager – Udo Többen
  • Technical Writer – Mary VanGinhoven

Things to remember

  • Use Case documents - We would like to update Use Case documents as the underlying application changes. However, this has seldom been accomplished. At this time, the wiki-based tutorials often have the most up-to-date description of current functionality.

Known Issues in Release 2.0.0

  • InstallAnywhere and Norton Internet Security Sonar - Under Windows, InstallAnywhere places a file called "Install.exe" in a folder with a path like "C:\Users\ksmith\AppData\Local\Temp\I1276186086\Windows\". This file was seen to be detected and removed by "Norton Sonar", silently terminating the install. Everything below "Temp" is removed after the installation finishes.
  • Welcome Screen - If the user already has a recent version of geWorkbench installed, and has dismissed the "Welcome" screen, it will not be shown when the new geWorkbench is run. This is because it looks in the same property file in the user's .geWorkbench folder.
  • Marker Annotations (bug #2291) - Sometimes the progress bar does not go away after all records have apparently been retrieved. After this, further retrievals may fail, and geWorkbench must be restarted.
  • CNKB - Running CNKB followed by dispatching a hierarchical clustering grid job can produce an error.
  • Color Mosaic -
    • Macintosh: Sorting after t-test doesn't work.
    • Macintosh: After a t-test, the color mosaic itself does not appear immediately. The array names and the test and control labels do appear, but the mosaic appears only if the "display" button is toggled off and then on again. (Note: it should be off by default; this should be checked.)
  • KNN and WV - Macintosh: unidentified problems lead to an error message.

New Component Detail and Dependencies pages created

Annotation Dependencies - list of dependencies of particular components on particular annotation file columns.

CNKB Data - release status and available interactions for each database.

Major Code Changes in 2.0.0

  1. Synteny (a discontinued component) was removed from "Alignment" and made into a separate component.
  2. BLAT code was deleted from "Alignment". It was poorly implemented and no longer reachable.
  3. Sequence Alignment handles more databases (they were not shown in previous versions and would not work until the recent development); more than 100 Java files and jar files were removed; the GUI was improved.
  4. Ontologizer 2 command-line jar file was downloaded 4/2/2010. Internal date in the jar file is 3/10/2010.
  5. MRA and t-test were previously located in analysis component. They have now been separated out into a new component.
  6. Updates and rationalization of browser launcher code and Jmol.
  7. CNKB
  8. Filters

List of changes to GUI

Changes that will require updating of tutorials, online help and system tests.

  • New Look and Feel - ""
  • Available Analyses, Normalizers and Filters now shown in pull-down menus rather than small lists. The saved parameter sets are also now in pulldown menus.
  • New result nodes each get a unique name, formed by appending a sequential number to the test name.
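
The sequential naming of result nodes can be pictured with a minimal sketch. The class name `ResultNodeNamer`, the suffix format (a space followed by the number) and the per-test-name counter are assumptions made for illustration; the actual geWorkbench naming code may differ.

```python
from collections import defaultdict


class ResultNodeNamer:
    """Hypothetical helper: hands out unique result-node names."""

    def __init__(self):
        # Number of result nodes already created for each test name
        # (assumption: numbering is tracked separately per test name).
        self._counts = defaultdict(int)

    def next_name(self, test_name):
        # Append the next sequential number to the test name.
        self._counts[test_name] += 1
        return f"{test_name} {self._counts[test_name]}"


namer = ResultNodeNamer()
print(namer.next_name("T-Test"))  # first T-Test result node
print(namer.next_name("T-Test"))  # second T-Test result node
print(namer.next_name("ANOVA"))   # first ANOVA result node
```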

New components in release 2.0

  • SkyLine - (PDB) - A high-throughput comparative modeling pipeline. It is used to find homology models for a protein whose structure has been experimentally determined.
  • SkyBase - (FASTA) - SkyBase is a database that stores the homology models built by SkyLine analysis for all NESG PSI2 protein structures.
  • Pudge - (FASTA) - Interface to a protein structure prediction server which integrates tools used at different stages of the structural prediction process.

Other major new features in release 2.0

  • More than 250 "bug reports" were closed. These included many new features, improvements in the usability of numerous components, and actual bug fixes.
  • Java 6 - Moved from Java 5 to Java 6. geWorkbench now requires Java 6. Works on both 32 bit and 64 bit VMs (JREs).
  • Look and Feel - Switched to new, more modern Look and Feel. geWorkbench appearance now consistent across all platforms.
  • CNKB - Revamped interface to allow choice of interactome and data types.
  • File parsers - added
    • MAGE-TAB data matrix
    • GEO Soft format - added series (GSE) and curated matrix (GDS). Already had series matrix format.
  • Filtering - completely revamped - now works directly for all modes, allows specification of minimum % matching arrays before filtering occurs.
  • caBIO component updated from 4.2 to 4.3.
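
The "minimum % matching arrays" rule mentioned under Filtering above can be sketched roughly as follows. This is an illustration only: the function name, the data layout, the inclusive comparison and the example threshold filter are all assumptions, not the geWorkbench implementation.

```python
def markers_to_remove(values, matches, min_percent):
    """Return marker names whose share of matching arrays reaches min_percent.

    values: dict mapping marker name -> list of per-array values
    matches: function(value) -> True if the filter criterion matches
    min_percent: minimum percentage (0-100) of matching arrays needed
                 before a marker is filtered out (assumed inclusive)
    """
    removed = []
    for marker, row in values.items():
        if not row:
            continue  # no arrays, nothing to evaluate
        n_match = sum(1 for v in row if matches(v))
        percent = 100.0 * n_match / len(row)
        if percent >= min_percent:
            removed.append(marker)
    return removed


# Hypothetical data: three markers measured on four arrays.
data = {
    "marker_a": [5.0, 120.0, 130.0, 140.0],  # 1 of 4 arrays below 10 (25%)
    "marker_b": [2.0, 3.0, 150.0, 160.0],    # 2 of 4 arrays below 10 (50%)
    "marker_c": [1.0, 2.0, 3.0, 4.0],        # 4 of 4 arrays below 10 (100%)
}


def below_threshold(value):
    # Stand-in for an expression threshold filter criterion.
    return value < 10.0


print(markers_to_remove(data, below_threshold, min_percent=50))
```

With a minimum of 50%, only markers that match the criterion on at least half of the arrays are removed; raising the minimum makes the filter more conservative.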

Tutorial/Online Help chapters revised and included in release

  • Filtering - New tutorial written and ported to online help.
  • Normalization - New tutorial written and ported to online help.
  • CCM - New tutorial written and ported to online help.

List of other major changes

  • caArray - Improved memory usage on downloads from caArray.
  • CNKB - Can now return markers direct from CNKB without use of Cytoscape.
  • Color Mosaic - enhancements to display (bug 2147)
    • toggle array names on/off
    • search on array name, accession, or label
  • Component Configuration Manager - now can filter display list by categories: Analysis, Viewer, Normalizer, Filter
  • Cytoscape - Corrected mapping between gene names in Cytoscape display and markers in Marker Sets panel (now uses Entrez IDs).
  • Dendrogram - can now create Array subsets as well as marker subsets.
  • Markers and Arrays - Hover text available in Markers and Arrays phenotypes to visualize long names if needed.
  • Marker Annotation - search results can be saved to a text file, including relevant URLs and BioCarta pathway names.
  • File loading - Checking for "out of memory" errors during file loading.
  • GUI - in switching to new L&F, fixed many text highlighting problems that were previously seen on Macintosh only but now appeared on Windows also.
  • File parser menu - The file parser selection menu now shows valid file extensions for each type.
  • Promoter - JASPAR promoter motifs now filterable by taxon.
  • Sequence alignment (BLAST) - many enhancements, including
    • added additional databases to match those listed at NCBI
    • improved handling of results from searches containing long query sequences.

Versions of external files/components included in this release

  • gene_ontology.1_2.obo downloaded 5/24/2010 from geneontology.org.
  • Ontologizer.jar version 2.0, file released 3/10/2010, checked no further updates as of 5/24/2010. We are using the "Command line" jar file.
    • Note - On 5/31/2010, the Ontologizer "Manual" version jar file (which has a GUI) was updated. However, the command line version was still not updated.
  • Jaspar_CORE (http://jaspar.genereg.net/) SQL files last updated on server 10/2009. (/html/DOWNLOAD/jaspar_CORE/non_redundant/all_species/sql_tables)
  • JMOL - component updated to JMOL 12 RC.10.

geWorkbench 2.0.0 Grid Service URLs

External URLs

Internal URLs

These URLs are used within geWorkbench and are not resolved

geWorkbench 2.0.1 Grid Service URLs

External URLs

Internal URLs

References

  • SkyLine Citation: http://www.ncbi.nlm.nih.gov/pubmed/17154423?dopt=Abstract
    • Mirković, N., Li, Z., Parnassa, A., Murray, D. Strategies for high-throughput comparative modeling: Applications to leverage analysis in structural genomics and protein family organization. Proteins. 2007 Mar 1;66(4):766-77.
  • Article on SkyLine and SkyBase:
    • Lee, H., Li, Z., Silkov, A., Fischer, M., Petrey, D., Honig, B., Murray, D. High-throughput computational structure-based characterization of protein families: START domains and implications for structural genomics. Journal of Structural and Functional Genomics. 2010;11(1):51-9.

geWorkbench 2.0.0 Web Service URLs

External Service Requirements and Connectivity

Component | Web Service | Grid Service | External availability | Platform restrictions
Markus | yes | 2.0.0: internal only; 2.0.1: external | web service or grid service | no 64-bit Windows, no Mac
Pudge | yes | no | web service | Mac: 64-bit (OSX 10.6+) OK, 32-bit no. Windows: 64-bit no, 32-bit OK
SkyBase | no | yes | grid service |
SkyLine | no | 2.0.0: internal only*; 2.0.1: external | grid service | no 64-bit Windows
  * Markus and SkyLine are available externally via grid service using the code in 2.0.1, but not in 2.0.0.

List of Included Components

Data Management:

  • Arrays/Phenotypes
  • Markers
  • Preferences
  • Project Panel
  • Session manager - no one knows what this is - probably a SOAP interface. But it is definitely needed!

File input formats:

  • Affy File Format
  • CEL File Loader
  • Exp. Format
  • FASTA Format
  • Genepix File Format
  • PDB Structure Format
  • Tab-delimited (RMA Express Format)

Connectivity

  • caArray2 - updated to support caArray 2.3.0 in release 1.8.0 (released September 2009). The caArray client jar is NOT backwards-compatible with any previous versions.

Data filters:

  • Filtering
  • Affy Detection Call Filter
  • Deviation Filter
  • Expression Threshold Filter
  • Genepix Filter (Two channel filter)
  • Genepix Flag Filter
  • Missing Values Filter

Normalization:

  • HouseKeeping Genes Normalizer
  • Normalization
  • Log2 Transformation
  • Marker Centering Normalizer
  • Mean Variance Normalizer
  • Missing Values (Normalizer)
  • Microarray Centering Normalizer
  • Quantile Normalizer
  • Threshold Normalizer

Experiment Information:


Analysis/Visualization

  • Alignment Results
  • Analysis
  • ANOVA
  • ARACNe2 - adds Adaptive Partitioning algorithm and Preprocessing mode.
  • caBIO Pathways (this has been integrated in the Marker Annotations component)
  • Cancer Gene Index integration in the Marker Annotations component.
  • CELImageViewer
  • Cellular Networks Knowledge Base
  • Color Mosaic
  • Component Configuration Manager.
  • Cytoscape_V2_4 - updated version of Cytoscape.
  • Dendrogram
  • Expression Profiles
  • Expression Value Distribution
  • Gene Ontology Enrichment Analysis and Display
  • Hierarchical Clustering Analysis
  • genSpace collaborative framework
  • Image Viewer
  • Jmol
  • Marker Annotations
  • MarkUs - Analysis and Viewer
  • MRA - Master Regulator Analysis
  • MatrixREDUCE
  • Microarray Viewer
  • MINDy - Analysis and Viewer
  • Pattern Discovery
  • Position Histogram
  • Pudge?? - Analysis and Viewer (Browser) - if this is working (Kiran?) we should include. We can create a very simple online help file, essentially pointing to the Pudge documentation at the Honig site (Aris).
  • Promoter
  • Scatter Plot
  • Sequence
  • Sequence Alignment
  • Sequence Retriever
  • SOM Analysis
  • SOM Clusters
  • t Test Analysis
  • Tabular Microarray Viewer
  • Volcano Plot
  • GenePattern components
    • PCA (GenePattern) - Analysis and Viewer
    • K-nearest neighbors (GenePattern)
    • SVM 3.0 (GenePattern) - Analysis and Viewer - include, we need to develop online help and tutorial (Aris).
    • WV - Weighted Voting (GenePattern)

Excluded and Dropped Components

The release creation script in build.xml now explicitly includes components by name (previously it excluded components by name). The following is a list of modules known to be excluded.

Excluded components

The following components are excluded for a variety of reasons, most often due to a lack of formal requirements documentation and/or associated system test scripts. Some of them should be scheduled for inclusion in the next production release. For modules not found in the current all.xml, a path to the component is shown.

Still under development:

  • Cancer-GEMS (awaiting further development from NCI)
  • NetBoost
    • EdgeListFileFormat (NetBoost)
  • Evidence Integration
  • MEDUSA

Not actively being developed:

  • GCRMA Via R CEL Loader (in \geworkbench\src\org\geworkbench\components\parsers)
  • GSEA
  • Multi-t-test (OK, but need to understand when it would be used, e.g. after ANOVA, and if it is what we really want).
  • SMLR - Sparse Multinomial Logistic Regression - implementation by John Watkinson.
  • SVM Format (in \geworkbench\src\org\geworkbench\components\parsers) (left over from a John Watkinson project).
  • Synteny (in \geworkbench\components\alignment\src\org\geworkbench\components\alignment\client)
  • t-profiler
  • caScript


Dropped components

These components are not expected to be used again.

  • CuteNet (GeneWays)
  • Column Major Format (in \geworkbench\src\org\geworkbench\components\parsers)
  • Frequency Threshold Filter (There is a class called AllelicFrequencyThresholdFilter in \geworkbench\components\filtering\src\org\geworkbench\components\filtering)
  • GeneOntology (the original component, now replaced by geneontology2/Ontologizer2.0)
  • Genotypic File Format (in \geworkbench\src\org\geworkbench\components\parsers\genotype)
  • Network Browser (was part of Reverse Engineering - would require major rewrite to revive. PathwayDecoder is module name)
  • Pattern Discovery Algorithm (association analysis)
  • Patterns (Pattern Panel) - Omit from release - Appears to have been superseded by the Sequence component.
  • Reverse Engineering (non-ARACNE, unpublished algorithm. PathwayDecoder is module name)
  • Simulation (a student project)


Note - the original "interactions" component was dropped and reimplemented as the Cellular Networks Knowledge Base. For a brief period the new component was named "interactions2".

Externally supplied components

The following components originate external to the geWorkbench source tree:

MatrixReduce

Source

MatrixReduce source code was obtained from the Bussemaker lab and a modified copy saved under: adcvs.cu-genome.org:/cvs/magnet/matrixreduce_distribution. This modified copy contains Java API changes made to integrate with geWorkbench.

Compiling

MatrixReduce is compiled using the following commands:

  • The FitModel binary is compiled manually with gcc, using flags that tell it not to use Cygwin (-mno-cygwin), to optimize (-O2) and to unroll loops (-funroll-loops):
    • gcc -c -O2 -mno-cygwin -funroll-loops *.c
    • gcc -mno-cygwin -static nrutil.o fncs_cmns.o fncs_seqs.o fncs_tdat.o fncs_seed.o fncs_app1.o fncs_app2.o fncs_nrcs.o fncs_topo.o fncs_mylm.o fncs_bits.o FitModel.o -o FitModel -lm (for Windows and Linux)
    • gcc -mno-cygwin nrutil.o fncs_cmns.o fncs_seqs.o fncs_tdat.o fncs_seed.o fncs_app1.o fncs_app2.o fncs_nrcs.o fncs_topo.o fncs_mylm.o fncs_bits.o FitModel.o -o FitModelMac -lm (for Mac)
  • FitModel.exe bundles both the NR (Numerical Recipes) and GNU libraries.
  • API jar: The Java API jar is created with the makefile under MatrixREDUCE's top directory, using the command "make jar".

Notes

See the comment on white space in file names/paths in Mantis: http://mantis.cu-genome.org/view.php?id=1316

Aracne.jar for MINDY

Although ARACNE is a geWorkbench component, the MINDY component uses a version of ARACNE that is externally maintained. The file aracne.jar is copied directly into the geWorkbench CVS tree.

The location of the external ARACNE code is:

The version of the external ARACNE code is:

Cytoscape

Any other components?

Analysis components - external runtime dependencies

component | local | external type | username/password | relay servlet | known to work outside campus
ANOVA | yes | grid | grid_default | no | ?
ARACNe | yes | grid | grid_default | no | ?
CNKB | no | servlet | some open data | yes | ?
MINDy | yes | grid | grid_default | no | ?
GenSpace | local | grid | genSpace account | no | ?
Hierarchical Clustering | yes | grid | grid_default | no | ?
KNN | no | GenePattern | ??? | no | ?
MarkUs | no | grid | open | no | ?
MRA | local | no | - | no | not applicable
MatrixREDUCE | local | grid | grid_default | no | ?
PCA | no | GenePattern | ??? | no | ?
PUDGE | no | web | open | no | ?
SkyLine | no | grid | grid_default | no | ?
SkyBase | no | grid | grid_default | no | ?
SOM | yes | grid | grid_default | no | ?
SVM | no | GenePattern | ??? | no | ?
WV | no | GenePattern | ??? | no | ?

TODO Notes

Done

  1. Release Process
    1. System testing should be done on Installer-built releases if practical. At the least, the installer version needs to be carefully tested early on, not just at the end.
  2. Release files
    1. Release Notes and license were not included in 1.8.0 release. Add. - Done.
    2. Release Notes - need better instructions on Java requirements. - Done. Online version on installation page even better.
    3. Cardiogenomics MAS5 files were omitted from release 1.8.0. Add. - Done.
  3. Filtering
    1. Add to documentation that filtering does not respect marker set activation - it always works on all markers.
    2. The reference for Quantile Normalization (Bolstad 2003) was added to the online help. It, and any other needed references, should be added to the very light normalization tutorial. (Done - new tutorial/online help written).
  4. Analysis
    1. MINDy - should gray out entirely the MINDy unconditional calculation as changing the settings has no effect. (Done). The tutorial states this but could be made more clear.
  5. Normalization Panel - rename to "Normalization" in CCM.
  6. Tutorials
    1. Improve documentation on handling of Marker and Array sets (bug 1687) - Done post-release.
    2. Make sure each file type is fully described.
    3. Project Folders - the File Open list of file types is now alphabetical. If any tutorial / help page depicts this, it should be updated. Done - tutorial updated post-release.
    4. Normalization - two of the components are missing online help - array based centering and mean-variance normalizer - all screenshots are bad. http://wiki.c2b2.columbia.edu/mantis/view.php?id=1948 - Done - Normalization tutorial completely rewritten and ported to online help.
  7. Online Help
    1. CNKB – full update needed - done in 2.0.1.
    2. MINDy – full update done in 2.0.2.

Items deferred to a future release

  1. Califano lab enhancements
    1. wiki page needs an Evidence Integration page.
    2. Califano lab is using old AMDeC website still for Bcell interactome. Should be moved to their Wiki.

Not finished for release 2.0.0

  1. Ontologizer 2.0 - update license to include Ontologizer 2.0 BSD license terms. - done? Mention License but not terms....
  2. MATKC
    1. Update geWorkbench Roadmap periodically.
  3. ARACNe
    1. Add "pro" tips on ARACNe usage from Manjunath.
  4. Tutorials
    1. The caArray tutorial needs to be transferred to online help. It is currently under Project Panel -> Open Dataset
    2. Document how missing values are detected, stored and represented.
    3. Pudge tutorial needs to be more extensive.
    4. Update Grid Services screenshot?
    5. Note somewhere that if a subset of Markers has been activated, but is not visible (because Arrays is on top) the user may forget about the activated markers and make a mistake.
  5. Manual
    1. Update Pattern Discovery page in Manual based on revised tutorial.
    2. Add section on Marker and Array sets in chapter 3?
  6. Grid Services
    1. Expose current grid services; right now we are still only exposing geWorkbench 1.5 services.
  7. Promoter
    1. Matching algorithm needs to be given a statistical basis.
    2. More recent promoter set available, e.g. 14K set in Elkon paper.
    3. Document how upstream/downstream indications are used on display. Where do they come from, are they used correctly?
  8. Analysis Panel -> Analysis (Done).
    1. Can we correlate the various licenses to particular components. Should list in CCM?
  9. Properties files
    1. Document that all recent versions of geWorkbench use the same properties files. However, this itself can be changed.
    2. Document how the properties files work.
  10. Sequence Retriever
    1. How can we use sequence retriever outside of the context of a microarray dataset. Can we add ability to query genes by name directly, outside of microarray context?
  11. GEO Soft parsers
    1. Document valid Affy column headers: 'ABS_CALL' or 'DETECTION_CALL' and 'DETECTION_P' or 'DETECTION P-VALUE'
  12. Known java problems:
    1. EDT exception - due to background threads trying to alter GUI. See bug 2224.
  13. Hierarchical Clustering - Euclidean distance metric. - Document that if the Euclidean metric is used, the data should be normalized first. See bug 148.
  14. MeV - need a list of components that came from MEV code.
    1. t-test
  15. caArray/Marker Annotations - bug 1956 contains comments about different ways to set gene names in caArray. Needs to be looked at again.
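
For item 11 above (GEO Soft parsers), the accepted Affy column headers can be illustrated with a small sketch. The four header names come from the note itself; the function names, the case and whitespace normalization, and the example column list are assumptions for illustration, not the parser's actual code.

```python
# Header names taken from item 11 above; how the real parser normalizes
# case and whitespace is an assumption made for this sketch.
CALL_HEADERS = {"ABS_CALL", "DETECTION_CALL"}
PVALUE_HEADERS = {"DETECTION_P", "DETECTION P-VALUE"}


def classify_header(header):
    """Return 'call', 'pvalue', or None for a column header string."""
    name = header.strip().upper()
    if name in CALL_HEADERS:
        return "call"
    if name in PVALUE_HEADERS:
        return "pvalue"
    return None


def find_detection_columns(headers):
    """Map 'call' and 'pvalue' to the index of the first matching column."""
    found = {}
    for index, header in enumerate(headers):
        kind = classify_header(header)
        if kind is not None and kind not in found:
            found[kind] = index
    return found


# Hypothetical column list from a sample data table.
columns = ["ID_REF", "VALUE", "ABS_CALL", "DETECTION P-VALUE"]
print(find_detection_columns(columns))
```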

Release 1.8.0 TODO notes carried over

  1. ARACNe Grid - need to verify that server-side implementation includes Bcell-100 parameter files.
  2. ARACNe/MINDy - Need to check on migration to new Califano lab page on Wiki.
  3. caArray - Our local caArray is at afapp1.c2b2.columbia.edu port 38080 (web interface).
  4. Color Mosaic - All Markers and All Arrays checkboxes appear to be disabled - oh this is only for ANOVA display.
  5. Hierarchical Clustering - When I do hierarchical clustering, the arrays are shown ordered by the array sets activated, rather than the original order of the arrays in the dataset. Need to confirm that the labels and arrays are really staying together correctly when resorted.
  6. MatrixREDUCE shown to work on Windows but not clear if it works on Linux.

Release 1.7.0 TODO notes carried over

  1. Add MatrixReduce data to tutorial dataset. (not done in 1.7.0)
  2. Remove unneeded data from tutorial download. (not done in 1.7.0)
  3. Was problem with file save corruption fixed? It affected writing out files that had been read in in EXP (matrix) format. (think so but need to verify)
  4. Include a list of HG-U95 and HG-U133 transcription factors in tutorial data download or with distribution (see Nature Protocols paper). (not done in 1.7.0?)

For next time

  1. ANOVA - Need to pin down exact details on algorithms - Adjusted Bonferroni, Westfall-Young, and how to explain the interpretation of the alpha value in FDR - is it the confidence in the FDR as you sometimes see mentioned? Is the reported p-value (e.g. Bonferroni) corrected or uncorrected? Check code for details.

Documentation changes

Changes included in release 2.0.2 Online Help

These changes made to Wiki and transferred to Online Help.

  1. MINDy - The wiki tutorial was completely rewritten with new screenshots to match all the changes (most were made in release 1.8).

Changes included in release 2.0.1 Online Help

These changes made to Wiki and then transferred to Online Help.

  1. CNKB - material completely revised to reflect changes to component (multiple interactomes, choices of interaction types etc.).

Changes included in release 2.0.0 Online Help

These changes made to Wiki and then transferred to Online Help.

  1. CCM – update needed. Buttons have changed, L&F and highlighting has changed. Buttons removed. Complete rewrite ported from new Tutorial.
  2. Filtering - Complete rewrite ported from new Tutorial, based on new implementation.
  3. Normalization - Complete rewrite ported from new Tutorial.

Release 2.0 new material - didn't get into release Online Help

  1. MatrixREDUCE does not work for specific combinations of machines and options. This should be noted in documentation, as no solution has yet been found. The specific problems are detailed in Mantis bug #1555 "MatrixReduce cannot run":
    1. MatrixReduce can run on Mac/Linux only when Parameter Topological Pattern is set to "Load from file".
    2. MatrixReduce runs on PC either under "Load from file" or "Specify Pattern".
  2. 32 and 64 bit problems.
    1. Pudge - bug #2136 - Pudge browser can run on 64-bit mac (mac osx 10.6), but not on 32-bit mac. Tests on macs dar1 and common1 run well. Pudge does not run on 64 bit windows. Note - need to test if it will work on 32-bit JRE on 64-bit windows.
    2. Markus browser - bug #2136 - Markus browser cannot run on macs until the applet loading problem is solved. Does not run on 64 bit windows.
  3. Online Help changes needed (from system test)
    1. Promoter
      1. update URL: The following sentence is INCORRECT: The datafile used "MATRIX_DATA.txt" can be found at http://jaspar.genereg.net/html/DOWNLOAD/mySQL/JASPAR_CORE_2008/.
      2. update screenshot of Parameters tab. Sequence tab screenshot mis-sized.
      3. Remove external links.
    2. Pattern Discovery – out of date, must be replaced. (DONE in release 2.1.0)
    3. Project Panel – "open dataset" needs updating for file names (tab delimited)
    4. Pudge - incomplete
    5. Sequence Retriever – the blue markers in the first picture need to be explained.
    6. T-test – external links are shown, should not be. Same for volcano plot.

Changes to Wiki tutorials subsequent to 1.8.0 release

The relevant Online Help pages will need to be updated.

  1. Pattern Discovery tutorial completely rewritten. Switched from a DNA example back to a protein (histone) example.
  2. MINDy - advanced params screenshot and text updated. (Done - New MINDy tutorial in 2.0.2)

Completed changes to Wiki tutorials subsequent to 1.7.0 release

The relevant Online Help pages will need to be updated.


  1. Color Mosaic tutorial added, starting with material in User Manual.
  2. Cytoscape
    1. tutorial was updated to describe network create/destroy right-click menu commands and how clicking on an adjacency matrix in the Project Folders component recreates the network. mantis bug 1770.
    2. bugs 1728, 1743 and 1752 - a description of set operations and how they may produce unexpected results due to the many-to-many relationships of markers and genes was added.
  3. Grid Services - Added a "Services" section to each analysis component tutorial for which a grid service exists (except Pudge).
  4. Hierarchical Clustering - A completely new tutorial on Hierarchical Clustering was written, starting from the ANOVA tutorial result.
  5. MatrixREDUCE - tutorial was updated after the 1.7 release - may need to update online help.
  6. SOM - The SOM entry (previously part of the Clustering entry) was completely rewritten, including detailed descriptions of the parameters taken from the online-help. The SOM example also starts with the ANOVA result.
  7. Analysis - There is a new section in the tutorials for Analysis which has no matching Online Help chapter. It describes e.g. the way saved parameters are highlighted if matched.

Needed changes to Tutorials

Cumulative list starting with Release 1.7....

  1. Color Mosaic
    1. It is not clear what Sort is supposed to do.
    2. Export does not appear to work. It does not work from ANOVA.
    3. Image Snapshot does not seem to work from the main Color Mosaic but does work if displaying from ANOVA.
  2. EVD - What is the EVD t-test used for/ how is it used? A histogram of t-test statistics? (not done in 1.7.0)
  3. Gene Pattern components need tutorials/Online Help??:
    1. Need to document server settings to use GenePattern modules. Our local GenePattern server is afdev2.c2b2.columbia.edu port 9999.
    2. PCA (GenePattern) - Analysis and Viewer
    3. K-nearest neighbors (GenePattern)
    4. SVM 3.0 (GenePattern) - Analysis and Viewer - include, we need to develop online help and tutorial (Aris). http://wiki.c2b2.columbia.edu/mantis/view.php?id=474
    5. WV - Weighted Voting (GenePattern)
  4. Grid Services
    1. Add detail to tutorial about how caGrid v1.3 uses caTransfer?
    2. Verify that each component offering a grid service has documentation.
    3. Find out and explain how our grid services handle multiple requests e.g. to ARACNe grid service - all run at once, in separate processes?
    4. Explain exactly what is sent to grid - only selected data, or all data with a map? (not done in 1.7.0)
  5. Hierarchical Clustering
    1. Need a more top-level description of the Dendrogram component.
    2. when "Average" linkage is selected, MEV uses a "weighted" average, which reduces the weights of more distant nodes. Does geWorkbench implement any such refinement?
    3. MEV can give priority to markers or arrays (?) when drawing the clusters.
  6. Marker Selection in some components - The way marker filtering is done has changed to use a built in set selection feature in the MINDy viewer, rather than using activated marker sets. See bug 1673.
  7. Markus
    1. Make sure any tutorials include the final URL.
    2. tutorial is still rough. Structure should be brought more into line with others.
    3. to actually run Mark-Us, one needs a grid password. What is our policy on this?
  8. Position Histogram - There is no online documentation. How does it align sequences? (not done in 1.7.0)
  9. Pudge - did not add a services section to Pudge tutorial because not sure if it is actually used/available.
  10. Scatter plot - There seems to be no tutorial. Online Help exists but needs to be updated to mention the enhanced "tooltip" spot identification added in release 1.7.0.
    1. http://wiki.c2b2.columbia.edu/mantis/view.php?id=1782
    2. Details: A feature had been added to Scatter Plot to allow overlapping points to each display a tooltip. This did not work if many points were overlapping, or if there were too many points in the dataset being compared. If more than 100 points are being compared in the plot, the enhanced tooltip feature is turned off, and only one point will show a tooltip for a given location.
  11. Sequence (Viewer)- tutorial needed.
  12. SOM - The following questions are outstanding on SOM tutorials:
    1. Where did the statement about data for SOM needing to be normalized come from? Is it true?
    2. The formal definition of SOM says dimensionality where it may mean something like "dimensionality N".
    3. The online help mentions neuron and initial coordinates, but now only one set is displayed. Which is it?
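
The Scatter Plot tooltip rule described in item 10 above can be sketched as follows. The limit of 100 points comes from that note; the function name, the data layout, exact coordinate matching and the choice of which single point is reported are assumptions made for illustration.

```python
MAX_POINTS_FOR_ENHANCED_TOOLTIP = 100  # limit stated in item 10 above


def tooltip_labels(points, location):
    """Return the labels to show in the tooltip for one plot location.

    points: list of (label, x, y) tuples in the plot
    location: (x, y) position under the mouse pointer
    Exact coordinate equality is a simplification of real hit testing.
    """
    hits = [label for label, x, y in points if (x, y) == location]
    if len(points) > MAX_POINTS_FOR_ENHANCED_TOOLTIP:
        # Enhanced tooltips are off: only one point is reported.
        return hits[:1]
    # Enhanced tooltips are on: every overlapping point is reported.
    return hits


# Hypothetical small plot: two markers overlap at (1, 1).
small = [("m1", 1, 1), ("m2", 1, 1), ("m3", 2, 2)]
print(tooltip_labels(small, (1, 1)))

# Hypothetical large plot: same overlap, but more than 100 points in total.
large = small + [(f"x{i}", i + 10, i + 10) for i in range(100)]
print(tooltip_labels(large, (1, 1)))
```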

Tutorial and Help change status table

This table is currently just copied from release 1.8.0. It has not yet been updated for release 2.0.

component Tutorial Online-Help in synch further changes needed assigned to
Analysis yes no no no Ken
ANOVA yes yes yes no
ARACNe yes yes yes no
BLAST/Seq. Align yes yes yes no
caArray "remote data sources (caArray)" "Project Panel - Remote Data Source"  ?
CEL imager in Viewing a Microarray yes ?  ?
Cellular Network KB yes yes yes no
Clustering (SOM and HC) individual yes no  ? Ken
Color Mosaic yes yes no yes assign
Component Configuration Manager yes yes  ?  ?
Cytoscape yes yes yes no
Dataset Annotation in "Project Details" "Comments" need to synch up names
Dataset History in "Project Details" "History Panel" need to synch up names
Expression Profiles no yes no  ?
Expression Value Distribution yes yes  ?  ?
Experiment Information in "Project Details" yes no  ?
Filtering yes yes  ?  ?
Gene Ontology no no - create Ken
Gene Pattern Components Classification - KNN and WV  ? no  ?
genSpace no yes no  ?
Hierarchical Clustering yes yes no yes Ken
JMol yes yes  ? ?
Marker Annotations yes yes yes yes Ken
Markers/Phenotypes/Arrays ? yes  ? yes
Mark-Us yes yes  ? yes Ken
Master Regulator Analysis yes yes yes no
MatrixReduce yes yes no no
Menu
Microarray Viewer yes - see Viewing a microarray dataset yes  ?  ?
MINDy yes yes  ? Yes Ken
Normalizers yes yes no yes Aris
Online Help
Pattern Discovery yes yes  ? yes
Position Histogram no yes  ?  ?
Preferences ?  ?
Principal Component Analysis no no is this gene pattern?
Project Folders "Projects and Data Files" "Project Panel"  ? yes
Promoter yes yes yes no
Pudge yes yes looks yes  ?
Scatter Plot no yes no yes
Self Organizing Maps yes yes no no Ken
Sequence Panel no no assign
Sequence Retriever yes yes  ?
Services (Grid) yes no no yes Ken
t-Test yes yes  ? yes Ken
Tabular Microarray Viewer "Viewing a Microarray Dataset" yes