GeWorkbench Release 1.6


Role Assignments

  • Release Manager – Kenneth Smith
  • Release Engineer – Zhou
  • Tech Lead – Kiran Keshav
  • Tester – Bernd Jagla, and the rest of the bunch
  • Test Manager – Bernd Jagla
  • Technical Writer – Mary VanGinhoven


Final Release Date

geWorkbench v1.6 was released on October 24, 2008. Builds for Windows, Linux, and Mac OS X were created using InstallAnywhere 2008. In addition, a generic ZIP file was created that can be used on any platform. The installation packages are available at:

https://gforge.nci.nih.gov/frs/?group_id=78

Subsequent point releases

Subsequent releases in the 1.6 series were:

geWorkbench v1.6.1 November 7th, 2008.

geWorkbench v1.6.2 November 14th, 2008.

geWorkbench v1.6.3 January 8th, 2009.

Fixed in this release

There were 99 bug fixes reported in release 1.6.0. Some of the highlights are listed below.


  • Fixed a problem (caused by a change in a server-side URL) with retrieving annotations for genes in Biocarta pathway diagrams (bug 1577).
  • The default caArray server was set to the production server at NCI (array.nci.nih.gov, port 8080) (bug 1602). The URL for the staging array was updated to array-stage.nci.nih.gov.
  • An incorrect argument, COMPOSITION_BASED_STATISTICS, was being sent to NCBI's BLAST server. Previously the server simply ignored it, but after recent changes there implementing stricter checking, the argument was rejected and blastn would no longer run. According to the NCBI BLAST error message, that option should only be used for blastp or tblastn (bug 1597).
  • Corrected a problem where, when using the adjusted Bonferroni correction or the Westfall-Young correction with MaxT, only values with positive fold-changes were returned and displayed (bug 1603).
  • Added a feature whereby the user is warned before any operation that will alter the dataset, e.g. before filtering out markers, or before a log2 transformation.
  • Added a feature to allow adding a new empty marker set. This can then be used to receive markers selected interactively in Cytoscape (bug 1541).
  • Fixed a problem displaying patterns in the sequence viewer after running Pattern Discovery (SPLASH) (bug 1415).
  • Fixed a problem with displaying adjacency matrices generated by ARACNE in the Cytoscape component (bug 1449).


  • Numerous changes were made to improve responsiveness, including when
    • selecting a marker in a large dataset (bug 1346),
    • right-clicking on Project with a large dataset (bug 1337),
    • saving a workspace (bug 1525), and
    • starting an analysis (bug 1544).

The remaining bugs, not listed here in detail, primarily concerned internal issues within geWorkbench, verification of parameters and set selections before beginning a calculation, improvements to the GUI, and corrections to the grid implementations of analytical services (Hierarchical Clustering, SOM, ANOVA, etc.).
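
The BLAST fix listed above (bug 1597) amounts to sending an option only for the programs that accept it. The following is a minimal Python sketch of that idea, not the geWorkbench code (which is Java). The function name `build_blast_params` and the table `PROGRAM_SPECIFIC_OPTIONS` are hypothetical; the only rule encoded is the one quoted from the NCBI error message, that COMPOSITION_BASED_STATISTICS applies to blastp and tblastn.

```python
# Options that are valid only for certain BLAST programs. The single entry
# below comes from the NCBI error message described for bug 1597; the table
# itself is an illustrative assumption, not geWorkbench's data structure.
PROGRAM_SPECIFIC_OPTIONS = {
    "COMPOSITION_BASED_STATISTICS": {"blastp", "tblastn"},
}


def build_blast_params(program, options):
    """Return request parameters, dropping options not valid for `program`."""
    params = {"PROGRAM": program}
    for name, value in options.items():
        allowed = PROGRAM_SPECIFIC_OPTIONS.get(name)
        if allowed is not None and program not in allowed:
            # Skip the option instead of sending an argument the server rejects.
            continue
        params[name] = value
    return params
```

With this filter, a blastn request built from options that include COMPOSITION_BASED_STATISTICS would simply omit that option, while a blastp request would keep it.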

Major outstanding issues

  • CNKB has a hard-coded URL only reachable inside C2B2. A servlet mechanism is being developed to provide indirect access from outside to the database (released in v 1.6.1).
  • How to handle log-normalization of data for volcano plot [outstanding].
  • t-test p-value/t statistic display in color mosaic component [no changes].
  • getting rid of caBIO jar dependencies??? [outstanding]
  • update spreadsheet of modules and their documentation and system test status etc....[done]
  • Documentation/Tutorials outstanding for existing modules...[some new added]
  • caArray download needs annotation file....[outstanding]
  • Sequence ambiguity codes [outstanding]
  • caGrid 1.2 migration of grid services [outstanding]
  • Netboost? [Not included this release]
  • GO term component to use single GO file [not done: GO term component withdrawn]

Wish list - now or future versions

  • Copy function for Marker/Array sets. Especially for marker sets, as these may be formed by double-clicking markers into the set, but then can't preserve them into a named set...

Major Changes

  • GO Terms component - withdrawn from version 1.6. To be redesigned and included in next release.
  • Added Mindy component.
  • Color Mosaic - new right-click actions

List of Included Components

<Comment>Every included component should have a dependency sheet listing any external files, executables etc. that are required for it to function, and their expected location (geWorkbench root, data etc).</Comment>

A spreadsheet File:GeWorkbench1.6-component status.xls (NOTE this file is not the latest - see Sharepoint) showing detailed release status as of version 1.6 will be available here and on Sharepoint under Release Process.

For module dependencies, please see Additional necessary files included in distribution.


New Modules

  • Mindy

Data Management:

  • Arrays/Phenotypes
  • Markers
  • Preferences
  • Project Panel
  • Session Mgr

File input filters:

  • Affy File Format
  • CEL File Loader
  • Exp. Format
  • FASTA Format
  • Genepix File Format
  • RMA Express Format

Connectivity

  • caArray v2.1 - download data from caArray version 2.1.x


Data filters:

  • Filtering
  • Affy Detection Call Filter
  • Deviation Filter
  • Expression Threshold Filter
  • Genepix Filter (Two channel filter)
  • Genepix Flag Filter
  • Missing Values Filter
  • PDB Structure Format

Normalization:

  • HouseKeeping Genes Normalizer
  • Normalization
  • Log2 Transformation
  • Marker Centering Normalizer
  • Mean Variance Normalizer
  • Missing Values
  • Microarray Centering Normalizer
  • Quantile Normalizer
  • Threshold Normalizer

Experiment Information:


Analysis/Visualization

  • Alignment Results
  • Analysis
  • ANOVA
  • ARACNE
  • caBIO Pathways (this has been integrated in the Marker Annotations component)
  • CELImageViewer
  • Cellular Networks Knowledge Base
  • Color Mosaic
  • Dendrogram
  • Expression Profiles
  • Expression Value Distribution
  • Fast Hierarchical Clustering Analysis
  • Gene Ontology
  • Image Viewer
  • Jmol
  • Marker Annotations
  • MatrixREDUCE
  • Microarray Viewer
  • Mindy
  • Pattern Discovery
  • Patterns (Pattern Panel)
  • Position Histogram
  • Promoter
  • Scatter Plot
  • Sequence
  • Sequence Alignment
  • Sequence Retriever
  • SOM Analysis
  • SOM Clusters
  • SPLASH Patterns
  • t Test Analysis
  • Tabular Microarray Viewer
  • Volcano Plot
  • GenePattern components
    • PCA
    • Weighted Voting
    • K-nearest neighbors

Excluded Components

The release creation script in build.xml now explicitly includes components by name (previously it excluded components by name). The following is a list of modules known to be excluded.

These components are excluded for a variety of reasons, most often a lack of formal requirements documentation and/or associated system test scripts. Some of them should be scheduled for inclusion in the next production release. For modules not found in the current all.xml, a path to the component is shown.

  • GO Terms - removed in version 1.6, to be redesigned and reintroduced in the next release.
  • Master Regulator Analysis (MRA) - under development.
  • Cancer-GEMS (awaiting further development from NCI)
  • Cytoscape_V2_4 (still some problems)
  • NetBoost
  • EdgeListFileFormat (NetBoost)
  • MEDUSA
  • SkyLine
  • GeneWays
  • Evidence Integration
  • GCRMA Via R CEL Loader (in \geworkbench\src\org\geworkbench\components\parsers)
  • Multi-t-test (OK, but need to understand when it would be used, e.g. after ANOVA, and if it is what we really want).
  • Network Browser (was part of Reverse Engineering - would require major rewrite to revive. PathwayDecoder is module name)
  • SVM Format (in \geworkbench\src\org\geworkbench\components\parsers)
  • Synteny (in \geworkbench\components\alignment\src\org\geworkbench\components\alignment\client)
  • t-profiler
  • Simulation (a student project)

In addition, the following are excluded:

  • \geworkbench\lib\Simulation_libs
  • \geworkbench\lib\caArrayMageom

Dropped components

These components are not expected to be used again.

  • Pattern Discovery Algorithm (association analysis)
  • Reverse Engineering (non-ARACNE, unpublished algorithm. PathwayDecoder is module name)
  • Genotypic File Format (in \geworkbench\src\org\geworkbench\components\parsers\genotype)
  • Column Major Format (in \geworkbench\src\org\geworkbench\components\parsers)
  • Frequency Threshold Filter (There is a class called AllelicFrequencyThresholdFilter in \geworkbench\components\filtering\src\org\geworkbench\components\filtering)
  • Interactions (early version of CNKB)

Externally supplied components

The following components originate external to the geWorkbench source tree:

MatrixReduce

Source

MatrixReduce source code was obtained from the Bussemaker lab and a modified copy saved under: adcvs.cu-genome.org:/cvs/magnet/matrixreduce_distribution. This modified copy contains Java API changes made to integrate with geWorkbench.

Compiling

MatrixReduce is compiled using the following commands:

  • FitModel binary is compiled manually as follows
    • gcc -c -O2 -mno-cygwin -funroll-loops *.c
    • gcc -mno-cygwin -static nrutil.o fncs_cmns.o fncs_seqs.o fncs_tdat.o fncs_seed.o fncs_app1.o fncs_app2.o fncs_nrcs.o fncs_topo.o fncs_mylm.o fncs_bits.o FitModel.o -o FitModel -lm (for Windows and Linux)
    • gcc -mno-cygwin nrutil.o fncs_cmns.o fncs_seqs.o fncs_tdat.o fncs_seed.o fncs_app1.o fncs_app2.o fncs_nrcs.o fncs_topo.o fncs_mylm.o fncs_bits.o FitModel.o -o FitModelMac -lm (for Mac)
  • API jar: The Java API jar is created with the makefile, command "make jar".
  • The FitModel binary is compiled manually with gcc, with extra flags telling it not to use Cygwin, to optimize, and to unroll loops.
  • FitModel.exe bundles both the NR (Numerical Recipes) and GNU libraries.
  • The API jar is created with the makefile under MatrixREDUCE's top directory.

Notes

See the comment on white space in file names/paths in Mantis: http://mantis.cu-genome.org/view.php?id=1316

Aracne.jar for MINDY

Although ARACNE is a geWorkbench component, the MINDY component uses a version of ARACNE that is externally maintained. The file aracne.jar is copied directly into the geWorkbench CVS tree.

The location of the external ARACNE code is:

The version of the external ARACNE code is:

Cytoscape

Any other components?

Functionality Modifications

CVS Tag Info

  • geworkbench_1_6
  • geworkbench_1_6_1
  • geworkbench_1_6_2f (there were CVS problems, do not use any other tag for release 1.6.2)

Building the Application

Check out the new geworkbench_1_6 branch to a new directory.

For Testing

The following steps should be followed to set up geWorkbench for testing.

Release Engineer:

  • The file conf\all_release.xml needs to be updated to include the components that are part of the release.
  • Verify that the components are listed in the correct order in all_release.xml. Remember that there is a conflict between Genspace and another component whose resolution depends on the correct order. (Include details here....)
  • The target createDist within the file build.xml needs to be updated so that only components that are part of the release are copied into the file ..\cleanFolder.

Testers

After the release engineer has properly configured the two files all_release.xml and build.xml (above), the testers should do the following:

  1. Check out the release branch/tag from CVS into a new directory.
  2. Change to the new directory and run “ant createDist”. This step will create a folder named "cleanFolder" at the same level as the directory where the CVS code was extracted into. It will put into cleanFolder a new (simple) build.xml designed for running the application in test mode. Only the all_release.xml configuration file will be included in cleanFolder/conf/. (In case of doubt perform "ant clean" before the "ant createDist")
  3. Change directory to ..\cleanFolder and start the app there by running “ant run”. The application will use the all_release.xml to load components.

Release-specific versions of system tests are stored in Sharepoint: https://sharepoint.c2b2.columbia.edu/c2b2/Testing/

Procedures for running the system tests are found on the Wiki: http://wiki.c2b2.columbia.edu/informatics/index.php/System_tests#Best_practices_for_System_tests

Also, if a script fails and you believe it is a defect in geWorkbench, please check whether the defect is already described in Mantis.

If you believe there is a defect in the System Test please send e-mail to the test lead for further investigation.

Some of the System Tests may need to be updated due to changes in the GUI of geWorkbench since the previous release. Please make note of such cases and send an e-mail to the System Test lead.

For Release

  • The Release Engineer should update the date in the "version info" to the actual build date, and make sure the version number is correct.
  • To create a final distribution folder go to the new directory where the CVS code was extracted and run "ant createCleanDist". This task will clean and rebuild the application into cleanFolder.

System Testing

Table with assigned system tests. The name of the file (word document), the assigned tester, the relative location on share point and the names of the data files are given.

System test | Assigned tester | Location (relative to link) | Estimated date of completion
Anova | Min | microarrays\Analysis\anova |
Aracne | Mary | microarrays\Analysis\aracne |
House keeping gene normalizer | Christine | microarrays\Normalization\house keeping gene normalizer |
Log2 transform | Aris | microarrays\Normalization\Log2 transformation |
scatter plot | Bernd | microarrays\scatter plot |
pattern discovery | Bernd | pattern discovery |
SOM | Bernd | microarrays\Analysis\SOM |
missing value | Bernd | microarrays\Normalization\missing value computations |
2 channel threshold filter | Michael | \microarrays\filtering\2 channel threshold filter |
Dataset annotations | Christine | General\Dataset annotations |
T-test | Christine | microarrays\Analysis\t-test |
Affy detection filter | Christine | microarrays\filtering\Affy detection call filter |
MatrixReduce | Michael | microarrays\Analysis\matrix reduce |
Marker based centering | Michael | microarrays\Normalization\Marker based centering |
Color mosaic | Michael | microarrays\color mosaic |
expression profiles | Mark | microarrays\expression profiles |
Hierarchical clustering | Mark | microarrays\Analysis\Hierarchical clustering |
Mindy | Mark | microarrays\Analysis\MINDY |
Array based centering | Mark | microarrays\Normalization\array based centering |
deviation filter | Ken | microarrays\filtering\deviation filter |
Gene ontology | Ken | microarrays\Gene Ontology |
Tabular microarray viewer | Ken | microarrays\Tabular Microarray Viewer |
BLAST | Ken | sequences\analysis area\alignment\BLAST |
Preferences | Min | General\Preferences |
Genepix flags filter | Min | microarrays\filtering\Genepix flags filter |
sequence retriever | Min | microarrays\sequence retriever |
Marker sets | Mary | General\Selection |
Expression threshold filter | Mary | microarrays\filtering\Expression threshold filter |
Promoter panel | Mary | sequences\visual area\Promoter |
File formats | Zhou | General\menu\File |
Microarray viewer | Zhou | microarrays\Microarray Viewer |
Mean Variance normalizer | Zhou | microarrays\Normalization\mean variance normalizer |
PCA | Pavel | microarrays\Analysis\PCA |
caArray | Pavel | General\menu\File\caarray |
Cell imager | Pavel | microarrays\CEL imager |
Quantile normalizer | Aris | microarrays\Normalization\quantile normalization |
Cellular Network Knowledge base | Aris | microarrays\Cellular Network KB |
Marker Annotations | Aris | microarrays\marker annotations |



For results, see http://afdev/systemtest/BrowseLogs.php

Release

Date

geWorkbench 1.6 was released on October 24, 2008.

Lessons Learned

System Test Scripts

Even more time is needed to provide accurate system tests. There have been a lot of GUI changes due mostly to customer requests (normalization, filtering procedures) that could be updated before the release.

System Test Process

  • Improvements on the status page are needed to reflect changes made to the system tests. I need to be able to update/add comments to the annotations of the individual system tester. Since the system test "can" have flaws, I usually go through the comments on the status page to verify that the given comment is about a defect in geWorkbench or a defect in the system test.
  • There were problems with the naming of system tests. Currently the file name is stored in the database and used to link each test to the component tested. Users have been renaming the files, so we got multiple entries for a given component.

The link should not depend on the file name but on a variable set in the system test itself. This requires major changes.

  • We need to verify that all components are listed in the system test results page after the system version is created.
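
The naming problem described above, and the suggested fix of using a variable set inside the system test, can be shown with a small Python sketch. The record layout and the field names `file_name`, `component_id` and `status` are hypothetical, not the actual schema of the system test database.

```python
from collections import defaultdict


def group_results(results, key):
    """Group result statuses by the given record field."""
    grouped = defaultdict(list)
    for record in results:
        grouped[record[key]].append(record["status"])
    return dict(grouped)


# Two runs of the same system test; the second tester renamed the file.
results = [
    {"file_name": "anova.doc", "component_id": "ANOVA", "status": "pass"},
    {"file_name": "anova_renamed.doc", "component_id": "ANOVA", "status": "fail"},
]

by_file = group_results(results, "file_name")
by_component = group_results(results, "component_id")
```

Grouping by file name yields two separate entries for one component, whereas grouping by an identifier declared in the test itself keeps both runs under a single entry regardless of renames.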

Release Build Process

Build scripts were better automated to:

  1. create final distribution files for each platform that have the proper name, e.g. geWorkbench_v1.6.0_Windows_installer_with_JRE1.5.exe
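
As an illustration of the naming pattern in the example above, the following Python sketch assembles a distribution file name from its parts. It is not the actual build script; the function and parameter names are hypothetical, and only the one pattern shown in the example is covered.

```python
def installer_name(version, platform, jre, extension):
    """Build a distribution file name following the pattern in the example."""
    return f"geWorkbench_v{version}_{platform}_installer_with_JRE{jre}.{extension}"


windows_name = installer_name("1.6.0", "Windows", "1.5", "exe")
```

Generating the name from the version number in one place avoids hand-edited file names drifting out of step with the release.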


For Windows and Macintosh, only distributions that include JRE 1.5 were released, because problems were observed with Java 1.6:

  1. caArray connectivity via the Java API does not work under JRE 1.6.
  2. geWorkbench occasionally freezes at apparently random moments under JRE 1.6.

GUI/Functionality changes

Grid Server and urls corresponding to this release

Other post-release Notes/Suggestions

  • It was noted that in the Windows control panel Add/Remove Programs component, the program was registered just as "geWorkbench". Thus if more than one version is installed it would be hard to tell which one is the one you want to remove. The name registered in the program manager should reflect the version number. (fixed in v1.6.1).
  • Same comment for the program location/name and the program group name. These should include the version number if they do not already. (fixed in v1.6.1).
  • Also, we need to review if any conflicts can arise between different versions of geWorkbench. What if 1.6 is installed over 1.5 without uninstalling 1.5?


(updated from version 1.5)

  • Richard's comments need to be reviewed.
  • The control of calculations once launched seems uncertain. Which calculations are actually stopped when canceled? Which keep running, using CPU, even though no result will be returned? Local vs Grid?
  • GeneOntology - The Table View pane is a dead-end. (Can't associate displayed GO terms with individual markers. Can't return anything to Markers component). What about the tree view? It cannot be correlated with table view???
  • SOM zoom-in - check if working correctly.
  • MatrixReduce was reworked on-the-fly. Need to update System tests and Use-Case documents?
  • We should be able to filter out the top +- X percentage points of expression data. Currently we can only filter on various absolute values.
  • Note - any filtering operation (operation that changes the dataset) after an analysis node has been created will invalidate the analysis results. But no warning is given. (A warning has been added).
  • We should reexamine what data files are included in the distribution. If we had a live update of gene ontology files, maybe we would not even need to include them in the distribution at all. They could install themselves.
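
The percentage-based filtering suggested in the list above could work roughly as in the following Python sketch. It reflects one possible reading of that item (drop the lowest and the highest X percent of values, by rank) and is not an existing geWorkbench filter; the function name `trim_extremes` is hypothetical.

```python
def trim_extremes(values, percent):
    """Drop the lowest and highest `percent` percent of values, by rank.

    The remaining values keep their original order.
    """
    if not 0 <= percent < 50:
        raise ValueError("percent must be at least 0 and less than 50")
    cut = int(len(values) * percent / 100)
    # Indices sorted by value; the first and last `cut` are the extremes.
    order = sorted(range(len(values)), key=lambda i: values[i])
    dropped = set(order[:cut]) | set(order[len(order) - cut:])
    return [v for i, v in enumerate(values) if i not in dropped]
```

Unlike the existing filters on absolute values, the cutoffs here adapt to each dataset, because they are defined by rank and not by a fixed expression level.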