
Role Assignments

Release Manager – Kenneth Smith
Release Engineer – Xiaoqing Zhang
Tech Lead – Kiran Keshav
Tester – Bernd Jagla
Test Manager – Bernd Jagla
Technical Writer – Mary VanGinhoven

List of Included Components

<Comment>Every included component should have a dependency sheet listing any external files, executables etc. that are required for it to function, and their expected location (geWorkbench root, data etc).</Comment>

A spreadsheet, GeWorkbench1.5-component status.xls, showing detailed release status as of version 1.5beta is available on this wiki and on SharePoint under Release Process.

For module dependencies, please see "Additional necessary files included in distribution".

New Modules

caArray v2.0
ANOVA
ARACNE
MatrixREDUCE
Cellular Networks Knowledge Base
GenePattern components
- PCA
- Weighted Voting
- K-nearest neighbors

Data Management:

Arrays/Phenotypes
Markers
preferences
Project Panel
Session Mgr

File input filters:

Affy File Format
CEL File Loader
Exp. Format
FASTA Format
Genepix File Format

RMA Express Format

Data filters:

Filtering
Affy Detection Call Filter
Deviation Filter
Expression Threshold Filter
Genepix Filter (Two channel filter)
Genepix Flag Filter
Missing Values Filter
PDB Structure Format

Normalization:

HouseKeeping Genes Normalizer
Normalization
Log2 Transformation
Marker Centering Normalizer
Mean Variance Normalizer
Missing Values
Microarray Centering Normalizer
Quantile Normalizer
Threshold Normalizer

Experiment Information:

Dataset Annotation
Dataset History
Experiment Info
Version Information

Analysis/Visualization

Alignment Results
Analysis
caBIO Pathways (this has been integrated in the Marker Annotations component)
CELImageViewer
Color Mosaic
Dendrogram
Expression Profiles
Expression Value Distribution
Fast Hierarchical Clustering Analysis
Gene Ontology
Image Viewer
Jmol
Marker Annotations
Microarray Viewer
Pattern Discovery
Patterns (Pattern Panel)
Position Histogram
Promoter
Scatter Plot
Sequence
Sequence Alignment
Sequence Retriever
SOM Analysis
SOM Clusters
SPLASH Patterns
t Test Analysis
Tabular Microarray Viewer
Volcano Plot

Excluded Components

The release creation script in build.xml now explicitly includes components by name (previously it excluded components by name). The following is a list of modules known to be excluded.

The following components are excluded for a variety of reasons, most often a lack of formal requirements documentation and/or associated system test scripts. Some of them should be scheduled for inclusion in the next production release. For modules not found in the current all.xml, a path to the component is shown.

Cancer-GEMS (awaiting further development from NCI)
Cytoscape_V2_4 (still some problems)
Column Major Format (in \geworkbench\src\org\geworkbench\components\parsers)
NetBoost
EdgeListFileFormat (NetBoost)
MEDUSA
Mindy
SkyLine
GeneWays
Evidence Integration
Frequency Threshold Filter (There is a class called AllelicFrequencyThresholdFilter in \geworkbench\components\filtering\src\org\geworkbench\components\filtering)
GCRMA Via R CEL Loader (in \geworkbench\src\org\geworkbench\components\parsers)
Genotypic File Format (in \geworkbench\src\org\geworkbench\components\parsers\genotype)
Interactions (early version of CNKB)
Multi-t-test (OK, but need to understand when it would be used, e.g. after ANOVA, and if it is what we really want).
Network Browser (was part of Reverse Engineering - would require major rewrite to revive. PathwayDecoder is module name)
Pattern Discovery Algorithm (association analysis)
Reverse Engineering (non-ARACNE, unpublished algorithm. PathwayDecoder is module name)
SVM Format (in \geworkbench\src\org\geworkbench\components\parsers)
Synteny (in \geworkbench\components\alignment\src\org\geworkbench\components\alignment\client)
t-profiler
Simulation (a student project)

In addition, the following are excluded:

\geworkbench\lib\Simulation_libs
\geworkbench\lib\caArrayMageom
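
The difference between the old exclude-by-name approach and the new include-by-name approach mentioned at the start of this section can be illustrated with a small sketch. This is not the Ant logic in build.xml; the function names and the short component lists are hypothetical and only show why a component that nobody has listed is shipped under one scheme and left out under the other.

```python
from typing import Iterable, List, Set


def select_by_exclusion(available: Iterable[str], excluded: Set[str]) -> List[str]:
    """Old scheme: ship everything that is not explicitly excluded."""
    return sorted(name for name in available if name not in excluded)


def select_by_inclusion(available: Iterable[str], included: Set[str]) -> List[str]:
    """New scheme: ship only what is explicitly included."""
    return sorted(name for name in available if name in included)


# Hypothetical snapshot of the source tree (names taken from the lists above).
available = ["ANOVA", "ARACNE", "MEDUSA", "NetBoost", "Simulation"]

# Under exclusion, "Simulation" is shipped because nobody listed it.
shipped_old = select_by_exclusion(available, {"MEDUSA", "NetBoost"})

# Under inclusion, only the named components are shipped.
shipped_new = select_by_inclusion(available, {"ANOVA", "ARACNE"})
```

With an include list, forgetting to list a component leaves it out of the release instead of shipping it untested.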

Externally supplied components

The following components originate external to the geWorkbench source tree:

MatrixReduce

Source

MatrixReduce source code was obtained from the Bussemaker lab and a modified copy saved under: adcvs.cu-genome.org:/cvs/magnet/matrixreduce_distribution. This modified copy contains Java API changes made to integrate with geWorkbench.

Compiling

MatrixReduce is compiled using the following commands:

The FitModel binary is compiled manually as follows:
- gcc -c -O2 -mno-cygwin -funroll-loops *.c
- gcc -mno-cygwin -static nrutil.o fncs_cmns.o fncs_seqs.o fncs_tdat.o fncs_seed.o fncs_app1.o fncs_app2.o fncs_nrcs.o fncs_topo.o fncs_mylm.o fncs_bits.o FitModel.o -o FitModel -lm (for Windows and Linux)
- gcc -mno-cygwin nrutil.o fncs_cmns.o fncs_seqs.o fncs_tdat.o fncs_seed.o fncs_app1.o fncs_app2.o fncs_nrcs.o fncs_topo.o fncs_mylm.o fncs_bits.o FitModel.o -o FitModelMac -lm (for Mac)

API jar: The Java API jar is created with the makefile under MatrixREDUCE's top directory, using the command "make jar".
The FitModel binary is compiled manually with gcc, with extra flags telling it not to use Cygwin (-mno-cygwin), to optimize (-O2), and to unroll loops (-funroll-loops).
FitModel.exe bundles both the NR (Numerical Recipes) and GNU libraries.

Notes

See the comment on white space in file names/paths in Mantis: http://mantis.cu-genome.org/view.php?id=1316

Aracne.jar for MINDY

Although ARACNE is a geWorkbench component, the MINDY component uses a version of ARACNE that is externally maintained. The file aracne.jar is copied directly into the geWorkbench CVS tree.

The location of the external ARACNE code is:

The version of the external ARACNE code is:

Cytoscape

Any other components?

Other Mysteries

/lib

The entire geworkbench /lib directory is included in the distribution. We do not necessarily know what each file does, and there may be libraries there that are no longer needed. The contents should be annotated and the needed files determined.

Functionality Modifications

Annotations Panel

Integration with the new version of the caCORE API, v3.1. As part of integrating with the new caBIO API, we have also changed how the API is used. In the past, we retrieved gene annotations using the Affy probeset ID (e.g., 31335_at) as the search term. Unfortunately, the caBIO servers contain probe-based information only for the HU133 chip. To work around this problem, we now use the following search approach:
- If the input dataset was associated with an annotations file when it was opened, then we retrieve the HUGO gene symbol associated with a marker (e.g., for marker 31335_at the HUGO symbol is IGF1R) and search caBIO using this gene symbol as a query.
- If the input dataset does not have an associated annotation file, then we do the caBIO search using the marker name. In this case we are restricted, as the only markers for which we will be able to retrieve information are the ones in the HU133 chip.
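
The two-case search rule above can be sketched as follows. This is only an illustration, not the geWorkbench code: the function name and the dictionary used to stand in for a loaded annotations file are hypothetical. The fallback to the marker name when an annotations file is loaded but has no symbol for the marker is also an assumption, since the text above does not cover that case.

```python
from typing import Mapping, Optional


def cabio_search_term(marker: str,
                      annotations: Optional[Mapping[str, str]]) -> str:
    """Choose the term used to query caBIO for one marker.

    `annotations` maps marker names to HUGO gene symbols and stands in for
    an annotations file; None means no annotations file was loaded.
    """
    if annotations is not None:
        symbol = annotations.get(marker)
        if symbol:
            # Annotations file present: search by gene symbol.
            return symbol
    # No annotations file (or, by assumption, no symbol for this marker):
    # search by marker name, which only works for HU133 probes.
    return marker


with_annotations = cabio_search_term("31335_at", {"31335_at": "IGF1R"})
without_annotations = cabio_search_term("31335_at", None)
```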
Browser access to CGAP gene annotations. In the past, clicking on a gene name hyperlink would directly bring up the corresponding CGAP page. Now users are given a choice: they are asked which of the supported CGAP organisms (human or mouse) they want to retrieve information for. In the near future we plan to provide additional options here, such as searching Entrez Gene instead of CGAP.
Extract markers/genes from pathways. In the past, the only operation available for BioCarta pathways was the ability to visualize the pathway image in the caBIO Pathways component. Now, two more options are available:
- Add pathway genes to set. Selecting this option retrieves the HUGO symbols of all genes that make up the pathway. For each such symbol XXX, the application tries to find whether the currently selected microarray set has a marker whose associated gene is XXX (this works only if the microarray set has been associated with an annotations file). If one or more such markers exist, they are placed in a marker set that is named after the pathway and added to the Markers panel.
- Export genes to CSV: Information about all genes in the pathway is exported to a text file. The file contains one row per extracted gene, and each row contains two comma-separated values: (1) a gene symbol and (2) the description associated with that gene.
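
The two pathway options can be sketched as below. This is a minimal illustration under stated assumptions, not the component's implementation: the function names are hypothetical, a dictionary of marker name to gene symbol stands in for the annotations file, and the marker names other than 31335_at and the gene descriptions are placeholders.

```python
import csv
import io
from typing import Iterable, List, Mapping, Sequence, TextIO, Tuple


def markers_for_pathway(pathway_symbols: Iterable[str],
                        annotations: Mapping[str, str]) -> List[str]:
    """Return the markers whose annotated gene symbol belongs to the pathway."""
    wanted = set(pathway_symbols)
    return [marker for marker, symbol in annotations.items() if symbol in wanted]


def export_pathway_genes(genes: Sequence[Tuple[str, str]], out: TextIO) -> None:
    """Write one row per gene: gene symbol, then its description."""
    writer = csv.writer(out, lineterminator="\n")
    for symbol, description in genes:
        writer.writerow([symbol, description])


# Placeholder data for illustration only.
annotations = {"31335_at": "IGF1R", "marker_A": "GENE_A", "marker_B": "GENE_B"}
marker_set = markers_for_pathway(["IGF1R", "GENE_B"], annotations)

buffer = io.StringIO()
export_pathway_genes([("IGF1R", "description of IGF1R"),
                      ("GENE_B", "description of GENE_B")], buffer)
exported_text = buffer.getvalue()
```

If no annotations are available, the mapping is empty and the resulting marker set is empty, which matches the restriction noted in the first option.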

CVS Tag Info

geWorkbench-v1_5_0

Building the Application

Check out the new geWorkbench1_5_0 branch to a new directory.

For Testing

Go to the new directory and run "ant createDist". This step creates a folder named "cleanFolder" at the same level as the directory into which the CVS code was extracted. It puts into cleanFolder a new (simple) build.xml designed for running the application in test mode. Only the all_release.xml configuration file is included in cleanFolder/conf/.
Go to the cleanFolder directory and run "ant" to start the application; the application will use all_release.xml to load components.

For Release

To create a final distribution folder go to the new directory where the CVS code was extracted and run "ant createCleanDist". This task will clean and rebuild the application into cleanFolder.
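
For anyone scripting these steps, the two workflows can be written down as a short plan of (working directory, command) pairs. This is a hedged sketch: the function name and the example checkout path are hypothetical, and only the Ant targets and the cleanFolder location come from the text above.

```python
from pathlib import Path
from typing import List, Tuple


def build_plan(checkout_dir: str, for_release: bool) -> List[Tuple[Path, List[str]]]:
    """Return the (working directory, command) steps for a test or release build."""
    checkout = Path(checkout_dir)
    # cleanFolder is created next to the checkout directory, not inside it.
    clean_folder = checkout.parent / "cleanFolder"
    if for_release:
        # Cleans and rebuilds the application into cleanFolder.
        return [(checkout, ["ant", "createCleanDist"])]
    return [
        (checkout, ["ant", "createDist"]),  # populate cleanFolder
        (clean_folder, ["ant"]),            # start the application in test mode
    ]


test_steps = build_plan("/work/geworkbench", for_release=False)
release_steps = build_plan("/work/geworkbench", for_release=True)
```

Each step could then be executed with subprocess.run(command, cwd=directory, check=True), provided Ant is on the path.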

System Testing

Table of assigned system tests. For each test, the name of the file (a Word document), the assigned tester, the relative location on SharePoint, and the names of the data files are given.

System test	Assigned tester	Location (relative to link)	Estimated date of completion
Anova	Aris	microarrays\Analysis\anova
Aracne	Aris	microarrays\Analysis\aracne	First half of system test completed (non-grid portion) 6/2/2008 by KCS. Non-fatal bugs found with progress bar [1] and network viewing in Cytoscape [2]
House keeping gene normalizer	Aris	microarrays\Normalization\house keeping gene normalizer
Log2 transform	Bernd	microarrays\Normalization\Log2 transformation
scatter plot	Bernd	microarrays\scatter plot
pattern discovery	Bernd	pattern discovery
SOM	Christine	microarrays\Analysis\SOM
missing value	Christine	microarrays\Normalization\missing value computations
2 channel threshold filter	Christine	\microarrays\filtering\2 channel threshold filter
Dataset annotations	Dimitry	General\Dataset annotations
T-test	Dimitry	microarrays\Analysis\t-test
Affy detection filter	Dimitry	microarrays\filtering\Affy detection call filter
MatrixReduce	Ken	microarrays\Analysis\matrix reduce	A new, statically linked binary has been created that generates a Bussemaker-compliant display. However, the old system test can no longer be run using the (very minimal) test dataset. A run with a full dataset succeeds. (KCS)
Marker based centering	Ken	microarrays\Normalization\Marker based centering	Passed 6/4/2008 following median calculation fix. (KCS)
Color mosaic	Ken	microarrays\color mosaic	Passed 5/28/2008, but since then bugs have arisen while fixing the handling of the "All Markers/All Arrays" checkboxes (6/3/2008). The system test will need revision after the bugfix (All Arrays/All Markers off by default). (KCS)
expression profiles	Ken	microarrays\expression profiles	passed 5/28/2008. System test script needs slight correction. (1) how coordinates are written, (2) this is not a test of array-based centering. (KCS)
Hierarchical clustering	Kiran	microarrays\Analysis\Hierarchical clustering
Mindy	Kiran	microarrays\Analysis\MINDY
Array based centering	Kiran	microarrays\Normalization\array based centering
deviation filter	Mark	microarrays\filtering\deviation filter	passed 6/17/2008
Gene ontology	Mark	microarrays\Gene Ontology	6/17/2008 - got a different result from the one given in the script.
Tabular microarray viewer	Mark	microarrays\Tabular Microarray Viewer	passed 6/17/2008
BLAST	Mary	sequences\analysis area\alignment\BLAST
Preferences	Mary	General\Preferences
Genepix flags filter	Mike	microarrays\filtering\Genepix flags filter
sequence retriever	Mike	microarrays\sequence retriever
Marker sets	Min	General\Selection
Expression threshold filter	Min	microarrays\filtering\Expression threshold filter
Promoter panel	Min	sequences\visual area\Promoter
File formats	Pavel	General\menu\File
Microarray viewer	Pavel	microarrays\Microarray Viewer
Mean Variance normalizer	Xiaoqing	microarrays\Normalization\mean variance normalizer
PCA	Xiaoqing	microarrays\Analysis\PCA
caArray	Xiaoqing	General\menu\File\caarray	6/2/2008 - have extensively tested but not performed formal system test script. (KCS)
Cell imager	Zhou	microarrays\CEL imager
Quantile normalizer	Zhou	microarrays\Normalization\quantile normalization
Cellular Network Knowledge base	Zhou	microarrays\Cellular Network KB
Marker Annotations	Michael	microarrays\marker annotations

For results, see http://afdev/systemtest/BrowseLogs.php

Release

Date

geWorkbench 1.5 (aka geWorkbench 1.1) was released on July 3rd, 2008.

Lessons Learned

System Test Scripts

Experience with the system test scripts led to the following recommendations:

Default parameters should be stated, as they may change over time within the application, and a tester may want to repeat a portion of a test without knowing what the original conditions were.
Parameters should be restated periodically during the course of a long test script; otherwise it becomes very difficult to rerun a portion of a test without starting over. (Typically, one or a few parameters are changed with each step.)
Some of the scripts test boundary conditions (e.g. marker centering normalizer), which allowed errors in the calculation to be found. Such edge tests are very valuable.
Fully validated test scripts for new/changed components should be available before the next release.
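
As an illustration of the kind of boundary test meant above, the sketch below centers one marker's values on their median and exercises profiles of odd length, even length, and a single value. It assumes that marker centering subtracts the marker's median across arrays; the function is hypothetical and is not the geWorkbench implementation, and the notes above do not say which boundary case exposed the calculation error.

```python
from statistics import median
from typing import List


def center_on_median(values: List[float]) -> List[float]:
    """Subtract the median of the values from each value."""
    m = median(values)
    return [v - m for v in values]


# Odd number of arrays: the median is the middle value.
odd_case = center_on_median([1.0, 2.0, 3.0])

# Even number of arrays: the median is the mean of the two middle values,
# a boundary case that is easy to get wrong.
even_case = center_on_median([1.0, 2.0, 3.0, 10.0])

# Single array: the only value is its own median.
single_case = center_on_median([5.0])
```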

System Test Process

Should the person running a test script report any bugs into the bug-tracking system? Presumably yes. Problems with the script itself can be reported in the script results. (I think not all bugs seen in testing have been reported - need to review results).

Release Build Process

With this release we changed to a system whereby, in most cases, files are included only if known to be needed. This allowed us to identify some hidden dependencies.
When a new build of geWorkbench is run for the first time, genSpace creates a new file (genspace.xml) under the distribution/conf directory. This file should not be included in the distribution. If it is, the user will not be asked if he/she wants to use genSpace.
The Analysis components were arranged in alphabetical order in the file (all.xml??) so that they would be ordered in the Analysis menu.
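
The genspace.xml problem described above lends itself to an automated check before a distribution is packaged. The sketch below is one possible form, under the assumption that the file appears at conf/genspace.xml relative to the distribution root; the function name and the list of files to look for are hypothetical.

```python
from pathlib import Path
from typing import List

# Files created on first run that must not ship; paths are relative to the
# distribution root. Only genspace.xml is named in the notes above.
FIRST_RUN_FILES = ["conf/genspace.xml"]


def first_run_leftovers(dist_root: Path) -> List[Path]:
    """Return any first-run files that are present in the distribution."""
    return [dist_root / rel for rel in FIRST_RUN_FILES
            if (dist_root / rel).exists()]
```

An empty result means the distribution folder is clean in this respect; any returned path should be deleted before the release is packaged.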

GUI/Functionality changes

A new category "Documentation" should be created in Mantis and any changes to a component's GUI or functionality should be reported/noted there, so that we can remember that documentation needs to be updated.

GeWorkbench Release 1.5