1 Role Assignments
2 Final Release Date
3 Subsequent point releases
4 Fixed in this release
5 Major outstanding issues
6 Wish list - now or future versions
7 Major Changes
8 List of Included Components
9 Excluded Components
- 9.1 Dropped components
10 Externally supplied components
11 Functionality Modifications
12 CVS Tag Info
13 Building the Application
- 13.1 For Testing
  - 13.1.1 Release Engineer:
  - 13.1.2 Testers
- 13.2 For Release
14 System Testing
15 Release
16 Other post-release Notes/Suggestions

Role Assignments

Release Manager – Kenneth Smith
Release Engineer – Zhou
Tech Lead – Kiran Keshav
Tester – Bernd Jagla, and the rest of the bunch
Test Manager – Bernd Jagla
Technical Writer – Mary VanGinhoven

Final Release Date

geWorkbench v1.6 was released on October 24, 2008. Builds for Windows, Linux and Mac OS X were created using InstallAnywhere 2008. In addition, a generic ZIP file was created which can be used on any platform. The installation packages are available at:

https://gforge.nci.nih.gov/frs/?group_id=78

Subsequent point releases

Subsequent releases in the 1.6 series were:

geWorkbench v1.6.1 November 7th, 2008.

geWorkbench v1.6.2 November 14th, 2008.

geWorkbench v1.6.3 January 8th, 2009.

Fixed in this release

There were 99 bug fixes reported in release 1.6.0. Some of the highlights are listed below.

Fixed a problem (caused by a change in a server-side URL) with retrieving annotations for genes in Biocarta pathway diagrams (bug 1577).
The default caArray server was set to the production server at NCI (array.nci.nih.gov, port 8080) (bug 1602). The URL for the staging array was updated to array-stage.nci.nih.gov.
An incorrect argument, COMPOSITION_BASED_STATISTICS, was being sent to NCBI's BLAST server for blastn searches. Previously it had simply been ignored, but after recent changes at NCBI implementing stricter checking it was rejected, and blastn would no longer run. According to the NCBI BLAST error message, that option should only be used for blastp or tblastn (bug 1597).
Corrected a problem where, when using the adjusted Bonferroni correction or the Westfall-Young correction with MaxT, only values with positive fold-changes were returned and displayed (bug 1603).
Added a feature whereby the user is warned before any operation that will alter the dataset, e.g. before filtering out markers, or before a log2 transformation.
Added a feature to allow adding a new empty marker set. This can then be used to receive markers selected interactively in Cytoscape (bug 1541).
Fixed a problem displaying patterns in the sequence viewer after running Pattern Discovery (SPLASH) (bug 1415).
Fixed a problem with displaying adjacency matrices generated by ARACNE in the Cytoscape component (bug 1449).
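
The BLAST fix above (bug 1597) amounts to sending COMPOSITION_BASED_STATISTICS only for the programs that accept it. The following is a minimal sketch of that check, not the geWorkbench code itself (which is Java). The function name, the other parameter keys and the sample values are assumptions made for illustration.

```python
# Programs for which, per the NCBI error message cited above, the option is valid.
PROGRAMS_ACCEPTING_COMPOSITION_STATS = frozenset({"blastp", "tblastn"})


def build_blast_params(program, query, database, composition_based_statistics=None):
    """Return request parameters, dropping the option for programs that reject it.

    Hypothetical helper: the PROGRAM, QUERY and DATABASE keys are illustrative.
    """
    params = {"PROGRAM": program, "QUERY": query, "DATABASE": database}
    if (composition_based_statistics is not None
            and program in PROGRAMS_ACCEPTING_COMPOSITION_STATS):
        params["COMPOSITION_BASED_STATISTICS"] = composition_based_statistics
    return params


# The option is silently dropped for blastn and kept for blastp.
blastn_params = build_blast_params("blastn", "ACGT", "nt", composition_based_statistics="on")
blastp_params = build_blast_params("blastp", "MKV", "nr", composition_based_statistics="on")
```

Filtering on the client side keeps the request valid even if the server tightens its checks again.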

Numerous changes were made to improve responsiveness, including when
- selecting a marker in a large dataset (bug 1346),
- right-clicking on Project with a large dataset (bug 1337),
- saving a workspace (bug 1525), and
- starting an analysis (bug 1544).

The remaining bugs, not listed here in detail, primarily concerned internal issues within geWorkbench, verification of parameters and set selections before beginning a calculation, improvements to the GUI, and corrections to the grid implementations of analytical services (Hierarchical Clustering, SOM, ANOVA, etc.).

Major outstanding issues

CNKB has a hard-coded URL only reachable inside C2B2. A servlet mechanism is being developed to provide indirect access from outside to the database (released in v 1.6.1).

How to handle log-normalization of data for the volcano plot [outstanding].
t-test p-value/t statistic display in the Color Mosaic component [no changes].
Getting rid of caBIO jar dependencies? [outstanding]
Update the spreadsheet of modules with their documentation and system test status [done].
Documentation/tutorials outstanding for existing modules [some new added].
caArray download needs annotation file [outstanding].
Sequence ambiguity codes [outstanding].
caGrid 1.2 migration of grid services [outstanding].
NetBoost? [not included in this release]
GO term component to use single GO file [not done: GO term component withdrawn].

Wish list - now or future versions

Copy function for Marker/Array sets. This is needed especially for marker sets, which may be formed by double-clicking markers into the set but then cannot be preserved as a named set.

Major Changes

GO Terms component - withdrawn from version 1.6. To be redesigned and included in next release.
Added Mindy component.
Color Mosaic - new right-click actions

List of Included Components

<Comment>Every included component should have a dependency sheet listing any external files, executables etc. that are required for it to function, and their expected location (geWorkbench root, data etc).</Comment>

A spreadsheet, GeWorkbench1.6-component status.xls (NOTE: this file is not the latest; see Sharepoint), showing detailed release status as of version 1.6 will be available here and on Sharepoint under Release Process.

For module dependencies, please see "Additional necessary files included in distribution".

New Modules

Mindy

Data Management:

Arrays/Phenotypes
Markers
Preferences
Project Panel
Session Mgr

File input filters:

Affy File Format
CEL File Loader
Exp. Format
FASTA Format
Genepix File Format
RMA Express Format

Connectivity

caArray v2.1 - download data from caArray version 2.1.x

Data filters:

Filtering
Affy Detection Call Filter
Deviation Filter
Expression Threshold Filter
Genepix Filter (Two channel filter)
Genepix Flag Filter
Missing Values Filter
PDB Structure Format

Normalization:

HouseKeeping Genes Normalizer
Normalization
Log2 Transformation
Marker Centering Normalizer
Mean Variance Normalizer
Missing Values
Microarray Centering Normalizer
Quantile Normalizer
Threshold Normalizer

Experiment Information:

Dataset Annotation
Dataset History
Experiment Info
Version Information

Analysis/Visualization

Alignment Results
Analysis
ANOVA
ARACNE
caBIO Pathways (this has been integrated in the Marker Annotations component)
CELImageViewer
Cellular Networks Knowledge Base
Color Mosaic
Dendrogram
Expression Profiles
Expression Value Distribution
Fast Hierarchical Clustering Analysis
Gene Ontology
Image Viewer
Jmol
Marker Annotations
MatrixREDUCE
Microarray Viewer
Mindy
Pattern Discovery
Patterns (Pattern Panel)
Position Histogram
Promoter
Scatter Plot
Sequence
Sequence Alignment
Sequence Retriever
SOM Analysis
SOM Clusters
SPLASH Patterns
t Test Analysis
Tabular Microarray Viewer
Volcano Plot
GenePattern components
- PCA
- Weighted Voting
- K-nearest neighbors

Excluded Components

The release creation script in build.xml now explicitly includes components by name (previously it excluded components by name). The following is a list of modules known to be excluded.

The following components are excluded for a variety of reasons, most often a lack of formal requirements documentation and/or associated system test scripts. Some of them should be scheduled for inclusion in the next production release. For modules not found in the current all.xml, a path to the component is shown.

GO Terms - removed in version 1.6, to be redesigned and reintroduced in the next release.
Master Regulator Analysis (MRA) - under development.
Cancer-GEMS (awaiting further development from NCI)
Cytoscape_V2_4 (still some problems)
NetBoost
EdgeListFileFormat (NetBoost)
MEDUSA
SkyLine
GeneWays
Evidence Integration
GCRMA Via R CEL Loader (in \geworkbench\src\org\geworkbench\components\parsers)
Multi-t-test (OK, but need to understand when it would be used, e.g. after ANOVA, and if it is what we really want).
Network Browser (was part of Reverse Engineering - would require major rewrite to revive. PathwayDecoder is module name)
SVM Format (in \geworkbench\src\org\geworkbench\components\parsers)
Synteny (in \geworkbench\components\alignment\src\org\geworkbench\components\alignment\client)
t-profiler
Simulation (a student project)

In addition, the following are excluded:

\geworkbench\lib\Simulation_libs
\geworkbench\lib\caArrayMageom
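
The switch from excluding components by name to including them by name, noted at the top of this section, changes what happens to a component that nobody has listed yet. The sketch below illustrates the difference only. It is not the build.xml logic, and the function names and sample lists are hypothetical.

```python
def select_by_exclusion(available, excluded):
    """Exclusion by name: ship everything that is not on the exclusion list."""
    return [name for name in available if name not in excluded]


def select_by_inclusion(available, included):
    """Inclusion by name: ship only what is on the inclusion list."""
    return [name for name in available if name in included]


# Illustrative subset of component names taken from the lists on this page.
available = ["ARACNE", "Mindy", "NetBoost", "MEDUSA"]

# MEDUSA appears on neither hypothetical list: it ships under exclusion,
# but is left out under inclusion.
shipped_by_exclusion = select_by_exclusion(available, excluded={"NetBoost"})
shipped_by_inclusion = select_by_inclusion(available, included={"ARACNE", "Mindy"})
```

With inclusion by name, forgetting to list a component leaves it out of the release instead of shipping it untested.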

Dropped components

These components are not expected to be used again.

Pattern Discovery Algorithm (association analysis)
Reverse Engineering (non-ARACNE, unpublished algorithm. PathwayDecoder is module name)
Genotypic File Format (in \geworkbench\src\org\geworkbench\components\parsers\genotype)
Column Major Format (in \geworkbench\src\org\geworkbench\components\parsers)
Frequency Threshold Filter (There is a class called AllelicFrequencyThresholdFilter in \geworkbench\components\filtering\src\org\geworkbench\components\filtering)
Interactions (early version of CNKB)

Externally supplied components

The following components originate external to the geWorkbench source tree:

MatrixReduce

Source

MatrixReduce source code was obtained from the Bussemaker lab and a modified copy saved under: adcvs.cu-genome.org:/cvs/magnet/matrixreduce_distribution. This modified copy contains Java API changes made to integrate with geWorkbench.

Compiling

MatrixReduce is compiled using the following commands:

The FitModel binary is compiled manually with gcc, using extra flags to tell it not to use Cygwin, to optimize and to unroll loops:
- gcc -c -O2 -mno-cygwin -funroll-loops *.c
- gcc -mno-cygwin -static nrutil.o fncs_cmns.o fncs_seqs.o fncs_tdat.o fncs_seed.o fncs_app1.o fncs_app2.o fncs_nrcs.o fncs_topo.o fncs_mylm.o fncs_bits.o FitModel.o -o FitModel -lm (for Windows and Linux)
- gcc -mno-cygwin nrutil.o fncs_cmns.o fncs_seqs.o fncs_tdat.o fncs_seed.o fncs_app1.o fncs_app2.o fncs_nrcs.o fncs_topo.o fncs_mylm.o fncs_bits.o FitModel.o -o FitModelMac -lm (for Mac)

FitModel.exe bundles both the NR (Numerical Recipes) and GNU libraries.

API jar: The Java API jar is created with the makefile under MatrixREDUCE's top directory, using the command "make jar".

Notes

See the comment on white space in file names/paths in Mantis: http://mantis.cu-genome.org/view.php?id=1316

Aracne.jar for MINDY

Although ARACNE is a geWorkbench component, the MINDY component uses a version of ARACNE that is externally maintained. The file aracne.jar is copied directly into the geWorkbench CVS tree.

The location of the external ARACNE code is:

The version of the external ARACNE code is:

Cytoscape

Any other components?

Functionality Modifications

CVS Tag Info

geworkbench_1_6
geworkbench_1_6_1
geworkbench_1_6_2f (there were CVS problems, do not use any other tag for release 1.6.2)

Building the Application

Check out the new geworkbench_1_6 branch to a new directory.

For Testing

The following steps should be followed to set up geWorkbench for testing.

Release Engineer:

The file conf\all_release.xml needs to be updated to include the components that are part of the release.
Verify that the components are listed in the correct order in all_release.xml. Remember that there is a conflict between Genspace and another component whose resolution depends on the correct order. (Include details here....)
The target createDist within the file build.xml needs to be updated so that only components that are part of the release are copied into the folder ..\cleanFolder.

Testers

After the release engineer has properly configured the two files all_release.xml and build.xml (above), the testers should do the following:

Check out the release branch/tag from CVS into a new directory.
Change to the new directory and run “ant createDist”. This step will create a folder named "cleanFolder" at the same level as the directory into which the CVS code was extracted. It will put into cleanFolder a new (simple) build.xml designed for running the application in test mode. Only the all_release.xml configuration file will be included in cleanFolder/conf/. (If in doubt, run "ant clean" before "ant createDist".)
Change directory to ..\cleanFolder and start the app there by running “ant run”. The application will use the all_release.xml to load components.

Release-specific versions of system tests are stored in Sharepoint: https://sharepoint.c2b2.columbia.edu/c2b2/Testing/

Procedures for running the system tests are found on the Wiki: http://wiki.c2b2.columbia.edu/informatics/index.php/System_tests#Best_practices_for_System_tests

Also, if a script fails and you believe it is a defect in geWorkbench:
- check whether the defect is already described in Mantis,
- if it is not, file a new bug in Mantis, and
- enter the bug number in the status page for the system tests (http://afdev/systemtest/BrowseLogs.php).

If you believe there is a defect in the System Test please send e-mail to the test lead for further investigation.

Some of the System Tests may need to be updated due to changes in the GUI of geWorkbench since the previous release. Please make note of such cases and send an e-mail to the System Test lead.

For Release

The Release Engineer should update the date in the "version info" to the actual build date, and make sure the version number is correct.
To create a final distribution folder go to the new directory where the CVS code was extracted and run "ant createCleanDist". This task will clean and rebuild the application into cleanFolder.

System Testing

Table of assigned system tests. The name of the file (Word document), the assigned tester, the relative location on Sharepoint and the names of the data files are given.

System test	Assigned tester	Location (relative to link)
Anova	Min	microarrays\Analysis\anova
Aracne	Mary	microarrays\Analysis\aracne
House keeping gene normalizer	Christine	microarrays\Normalization\house keeping gene normalizer
Log2 transform	Aris	microarrays\Normalization\Log2 transformation
scatter plot	Bernd	microarrays\scatter plot
pattern discovery	Bernd	pattern discovery
SOM	Bernd	microarrays\Analysis\SOM
missing value	Bernd	microarrays\Normalization\missing value computations
2 channel threshold filter	Michael	microarrays\filtering\2 channel threshold filter
Dataset annotations	Christine	General\Dataset annotations
T-test	Christine	microarrays\Analysis\t-test
Affy detection filter	Christine	microarrays\filtering\Affy detection call filter
MatrixReduce	Michael	microarrays\Analysis\matrix reduce
Marker based centering	Michael	microarrays\Normalization\Marker based centering
Color mosaic	Michael	microarrays\color mosaic
expression profiles	Mark	microarrays\expression profiles
Hierarchical clustering	Mark	microarrays\Analysis\Hierarchical clustering
Mindy	Mark	microarrays\Analysis\MINDY
Array based centering	Mark	microarrays\Normalization\array based centering
deviation filter	Ken	microarrays\filtering\deviation filter
Gene ontology	Ken	microarrays\Gene Ontology
Tabular microarray viewer	Ken	microarrays\Tabular Microarray Viewer
BLAST	Ken	sequences\analysis area\alignment\BLAST
Preferences	Min	General\Preferences
Genepix flags filter	Min	microarrays\filtering\Genepix flags filter
sequence retriever	Min	microarrays\sequence retriever
Marker sets	Mary	General\Selection
Expression threshold filter	Mary	microarrays\filtering\Expression threshold filter
Promoter panel	Mary	sequences\visual area\Promoter
File formats	Zhou	General\menu\File
Microarray viewer	Zhou	microarrays\Microarray Viewer
Mean Variance normalizer	Zhou	microarrays\Normalization\mean variance normalizer
PCA	Pavel	microarrays\Analysis\PCA
caArray	Pavel	General\menu\File\caarray
Cell imager	Pavel	microarrays\CEL imager
Quantile normalizer	Aris	microarrays\Normalization\quantile normalization
Cellular Network Knowledge base	Aris	microarrays\Cellular Network KB
Marker Annotations	Aris	microarrays\marker annotations

For results, see http://afdev/systemtest/BrowseLogs.php

Release

Date

geWorkbench 1.6 was released on October 24, 2008.

Lessons Learned

System Test Scripts

Even more time is needed to provide accurate system tests. There have been a lot of GUI changes due mostly to customer requests (normalization, filtering procedures) that could be updated before the release.

System Test Process

Improvements on the status page are needed to reflect changes made to the system tests. I need to be able to update/add comments to the annotations of the individual system tester. Since the system test "can" have flaws, I usually go through the comments on the status page to verify that the given comment is about a defect in geWorkbench or a defect in the system test.

There were problems with the naming of system tests. Currently the file name is stored in the database as the link to the component tested. Users have been renaming the files, and as a result we got multiple entries for a given component.

Instead of the file name, a variable set in the system test itself should be used to identify the component. This requires major changes.
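
The naming problem can be illustrated with a small sketch. It assumes each result record carries both the uploaded file name and an identifier declared inside the system test. The record layout, the field names and the sample file names are hypothetical and do not describe the actual status page database.

```python
from collections import defaultdict


def group_results(results, key):
    """Group system test result records by the given field."""
    groups = defaultdict(list)
    for record in results:
        groups[record[key]].append(record)
    return dict(groups)


# Hypothetical records: the same system test uploaded under two file names.
results = [
    {"filename": "aracne.doc", "component_id": "ARACNE", "tester": "Mary"},
    {"filename": "aracne_renamed.doc", "component_id": "ARACNE", "tester": "Mary"},
]

# Keyed by file name, a rename produces two entries for one component.
by_filename = group_results(results, "filename")
# Keyed by an identifier set in the test itself, the entries stay together.
by_component = group_results(results, "component_id")
```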

We need to verify that all components are listed in the system test results page after the system version is created.

Release Build Process

Build scripts were better automated to:

create final distribution files for each platform that have the proper name, e.g. geWorkbench_v1.6.0_Windows_installer_with_JRE1.5.exe
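
A naming rule of this kind can be written as a single function. The sketch below reproduces only the Windows example given above. The function name is hypothetical, and the assumption that other platforms follow the same pattern is not confirmed by this page.

```python
def distribution_file_name(version, platform, jre_version, extension):
    """Build an installer file name following the Windows example above."""
    return f"geWorkbench_v{version}_{platform}_installer_with_JRE{jre_version}{extension}"


windows_name = distribution_file_name("1.6.0", "Windows", "1.5", ".exe")
```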

For Windows and Macintosh, only distributions including the JRE 1.5 were distributed. There are observed problems with Java 1.6.

caArray connectivity via the Java API does not work under JRE 1.6.
geWorkbench occasionally freezes up at apparently random moments under JRE 1.6.

GUI/Functionality changes

Grid Server and URLs corresponding to this release

Server: cagridnode.c2b2.columbia.edu
Index Service URL: http://cagridnode.c2b2.columbia.edu:8080/v1.6/wsrf/services/DefaultIndexService
Dispatcher URL: http://cagridnode.c2b2.columbia.edu:8080/v1.6/wsrf/services/cagrid/Dispatcher

GeWorkbench Release 1.6