GeWorkbench Release 1.5
Role Assignments
- Release Manager – Kenneth Smith
- Release Engineer – Xiaoqing Zhang
- Tech Lead – Kiran Keshav
- Tester – Bernd Jagla
- Test Manager – Bernd Jagla
- Technical Writer – Mary VanGinhoven
List of Included Components
<Comment>Every included component should have a dependency sheet listing any external files, executables etc. that are required for it to function, and their expected location (geWorkbench root, data etc).</Comment>
A spreadsheet (GeWorkbench1.5-component status.xls) showing the detailed release status as of version 1.5beta is available here and on SharePoint under Release Process.
For module dependencies, see Additional necessary files included in distribution.
New Modules
- caArray v2.0
- ANOVA
- ARACNE
- MatrixREDUCE
- Cellular Networks Knowledge Base
- GenePattern components
- PCA
- Weighted Voting
- K-nearest neighbors
Data Management:
- Arrays/Phenotypes
- Markers
- preferences
- Project Panel
- Session Mgr
File input filters:
- Affy File Format
- CEL File Loader
- Exp. Format
- FASTA Format
- Genepix File Format
- RMA Express Format
Data filters:
- Filtering
- Affy Detection Call Filter
- Deviation Filter
- Expression Threshold Filter
- Genepix Filter (Two channel filter)
- Genepix Flag Filter
- Missing Values Filter
- PDB Structure Format
Normalization:
- HouseKeeping Genes Normalizer
- Normalization
- Log2 Transformation
- Marker Centering Normalizer
- Mean Variance Normalizer
- Missing Values
- Microarray Centering Normalizer
- Quantile Normalizer
- Threshold Normalizer
Experiment Information:
- Dataset Annotation
- Dataset History
- Experiment Info
- Version Information
Analysis/Visualization:
- Alignment Results
- Analysis
- caBIO Pathways (this has been integrated in the Marker Annotations component)
- CELImageViewer
- Color Mosaic
- Dendrogram
- Expression Profiles
- Expression Value Distribution
- Fast Hierarchical Clustering Analysis
- Gene Ontology
- Image Viewer
- Jmol
- Marker Annotations
- Microarray Viewer
- Pattern Discovery
- Patterns (Pattern Panel)
- Position Histogram
- Promoter
- Scatter Plot
- Sequence
- Sequence Alignment
- Sequence Retriever
- SOM Analysis
- SOM Clusters
- SPLASH Patterns
- t Test Analysis
- Tabular Microarray Viewer
- Volcano Plot
Excluded Components
The release creation script in build.xml now explicitly includes components by name (previously it excluded components by name). The following is a list of modules known to be excluded.
The following components are excluded for a variety of reasons, most often due to a lack of formal requirements documentation and/or associated system test scripts. Some of them should be scheduled for inclusion in the next production release. For modules not found in the current all.xml, a path to the component is shown.
- Cancer-GEMS (awaiting further development from NCI)
- Cytoscape_V2_4 (still some problems)
- Column Major Format (in \geworkbench\src\org\geworkbench\components\parsers)
- NetBoost
- EdgeListFileFormat (NetBoost)
- MEDUSA
- Mindy
- SkyLine
- GeneWays
- Evidence Integration
- Frequency Threshold Filter (There is a class called AllelicFrequencyThresholdFilter in \geworkbench\components\filtering\src\org\geworkbench\components\filtering)
- GCRMA Via R CEL Loader (in \geworkbench\src\org\geworkbench\components\parsers)
- Genotypic File Format (in \geworkbench\src\org\geworkbench\components\parsers\genotype)
- Interactions (early version of CNKB)
- Multi-t-test (OK, but need to understand when it would be used, e.g. after ANOVA, and if it is what we really want).
- Network Browser (was part of Reverse Engineering - would require major rewrite to revive. PathwayDecoder is module name)
- Pattern Discovery Algorithm (association analysis)
- Reverse Engineering (non-ARACNE, unpublished algorithm. PathwayDecoder is module name)
- SVM Format (in \geworkbench\src\org\geworkbench\components\parsers)
- Synteny (in \geworkbench\components\alignment\src\org\geworkbench\components\alignment\client)
- t-profiler
- Simulation (a student project)
In addition, the following are excluded:
- \geworkbench\lib\Simulation_libs
- \geworkbench\lib\caArrayMageom
Externally supplied components
The following components originate external to the geWorkbench source tree:
MatrixReduce
Source
MatrixReduce source code was obtained from the Bussemaker lab and a modified copy saved under: adcvs.cu-genome.org:/cvs/magnet/matrixreduce_distribution. This modified copy contains Java API changes made to integrate with geWorkbench.
Compiling
MatrixReduce is compiled using the following commands:
- The FitModel binary is compiled manually as follows:
- gcc -c -O2 -mno-cygwin -funroll-loops *.c
- gcc -mno-cygwin -static nrutil.o fncs_cmns.o fncs_seqs.o fncs_tdat.o fncs_seed.o fncs_app1.o fncs_app2.o fncs_nrcs.o fncs_topo.o fncs_mylm.o fncs_bits.o FitModel.o -o FitModel -lm (for Windows and Linux)
- gcc -mno-cygwin nrutil.o fncs_cmns.o fncs_seqs.o fncs_tdat.o fncs_seed.o fncs_app1.o fncs_app2.o fncs_nrcs.o fncs_topo.o fncs_mylm.o fncs_bits.o FitModel.o -o FitModelMac -lm (for Mac)
- API jar: The Java API jar is created with the makefile under MatrixREDUCE's top directory, using the command "make jar".
- The gcc flags used for the FitModel binary tell the compiler not to use Cygwin (-mno-cygwin), to optimize (-O2), and to unroll loops (-funroll-loops).
- FitModel.exe bundles both the NR (Numerical Recipes) and GNU libraries.
Notes
See the comment on white space in file names/paths in Mantis: http://mantis.cu-genome.org/view.php?id=1316
Aracne.jar for MINDY
Although ARACNE is a geWorkbench component, the MINDY component uses a version of ARACNE that is externally maintained. The file aracne.jar is copied directly into the geWorkbench CVS tree.
The location of the external ARACNE code is:
The version of the external ARACNE code is:
Cytoscape
Any other components?
Other Mysteries
/lib
The entire geworkbench /lib directory is included in the distribution. We do not necessarily know what each file does, and it may contain libraries that are no longer needed. The contents should be annotated and the needed files determined.
Functionality Modifications
Annotations Panel
- Integration with the new version of the caCORE API, v3.1. As part of integrating with the new caBIO API, we also changed how the API is used. Previously, we retrieved gene annotations using the Affy probeset ID (e.g., 31335_at) as the search term. Unfortunately, the caBIO servers contain probe-based information only for the HU133 chip. To work around this problem, we now use the following search approach:
- If the input dataset was associated with an annotations file when it was opened, then we retrieve the HUGO gene symbol associated with a marker (e.g., for marker 31335_at the HUGO symbol is IGF1R) and search caBIO using this gene symbol as a query.
- If the input dataset does not have an associated annotation file, then we do the caBIO search using the marker name. In this case we are restricted, as the only markers for which we will be able to retrieve information are the ones in the HU133 chip.
- Browser access to CGAP gene annotations. In the past, clicking on a gene name hyperlink would directly bring up the corresponding CGAP page. Now users are given a choice: they are asked which of the supported CGAP organisms (human or mouse) they want to retrieve information for. In the near future we plan to provide additional options here, such as searching Entrez Gene instead of CGAP.
- Extract markers/genes from pathways. In the past, the only operation available for BioCarta pathways was viewing the pathway image in the caBIO Pathways component. Now, two more options are available:
- Add pathway genes to set. Selecting this option retrieves the HUGO symbols of all genes that comprise the pathway. For each such symbol XXX, the application checks whether the currently selected microarray set has a marker whose associated gene is XXX (this works only if the microarray set has been associated with an annotations file). If one or more such markers exist, they are placed in a marker set that is named after the pathway and added to the Markers panel.
- Export genes to CSV: Information about all genes in the pathway is exported to a text file. The file contains one row per extracted gene, and each row contains two comma-separated values: (1) a gene symbol, and (2) the description associated with that gene.
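The search-term choice and the two pathway options described above can be sketched roughly as follows. This is only an illustration in Python, not the geWorkbench Java code: the function names are hypothetical, the annotations file is assumed to be already loaded as a plain marker-to-symbol mapping, and no caBIO call is made.

```python
import csv
import io


def cabio_query_term(marker, annotations=None):
    """Pick the search term: the gene symbol if the marker is annotated,
    otherwise fall back to the marker name itself."""
    if annotations and marker in annotations:
        return annotations[marker]
    return marker


def markers_for_pathway(pathway_genes, annotations):
    """Return the markers whose annotated gene symbol is in the pathway."""
    genes = set(pathway_genes)
    return [m for m, symbol in annotations.items() if symbol in genes]


def export_genes_csv(genes):
    """Write one row per gene: symbol, description (comma separated)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for symbol, description in genes:
        writer.writerow([symbol, description])
    return buf.getvalue()


# 31335_at -> IGF1R is the example from the text; the second entry is made up.
annotations = {"31335_at": "IGF1R", "made_up_at": "GENE_X"}
print(cabio_query_term("31335_at", annotations))
print(markers_for_pathway(["IGF1R"], annotations))
```

Without an annotations mapping, `cabio_query_term` simply returns the marker name, which mirrors the restriction noted above for datasets opened without an annotations file.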
CVS Tag Info
geWorkbench-v1_5_0
Building the Application
Check out the new geWorkbench1_5_0 branch to a new directory.
For Testing
- Go to the new directory and run `ant createDist`. This step creates a folder named "cleanFolder" at the same level as the directory into which the CVS code was checked out. It puts into cleanFolder a new (simple) build.xml designed for running the application in test mode. Only the all_release.xml configuration file is included in cleanFolder/conf/.
- Go to the cleanFolder directory and run `ant` to start the application; the application uses all_release.xml to load components.
For Release
- To create a final distribution folder, go to the directory into which the CVS code was checked out and run `ant createCleanDist`. This task cleans and rebuilds the application into cleanFolder.
System Testing
The table below lists the assigned system tests. The name of the file (Word document), the assigned tester, the relative location on SharePoint and the names of the data files are given.
System test | Assigned tester | Location (relative to link) | Estimated date of completion |
---|---|---|---|
Anova | Aris | microarrays\Analysis\anova | |
Aracne | Aris | microarrays\Analysis\aracne | First half of system test completed (non-grid portion) 6/2/2008 by KCS. Non-fatal bugs found with progress bar [1] and network viewing in Cytoscape [2] |
House keeping gene normalizer | Aris | microarrays\Normalization\house keeping gene normalizer | |
Log2 transform | Bernd | microarrays\Normalization\Log2 transformation | |
scatter plot | Bernd | microarrays\scatter plot | |
pattern discovery | Bernd | pattern discovery | |
SOM | Christine | microarrays\Analysis\SOM | |
missing value | Christine | microarrays\Normalization\missing value computations | |
2 channel threshold filter | Christine | \microarrays\filtering\2 channel threshold filter | |
Dataset annotations | Dimitry | General\Dataset annotations | |
T-test | Dimitry | microarrays\Analysis\t-test | |
Affy detection filter | Dimitry | microarrays\filtering\Affy detection call filter | |
MatrixReduce | Ken | microarrays\Analysis\matrix reduce | A new, statically linked binary has been created that generates a Bussemaker-compliant display. However, the old system test can no longer be run using the (very minimal) test dataset. A run with a full dataset succeeds. (KCS) |
Marker based centering | Ken | microarrays\Normalization\Marker based centering | Passed 6/4/2008 following median calculation fix. (KCS) |
Color mosaic | Ken | microarrays\color mosaic | "passed 5/28/2008 - but since then, bugs have arisen during fixing handling of "All Markers/All Arrays" checkboxes (6/3/2008) - systest will need revision after bugfix (All Arrays/All Markers off by default)" (KCS) |
expression profiles | Ken | microarrays\expression profiles | passed 5/28/2008. System test script needs slight correction. (1) how coordinates are written, (2) this is not a test of array-based centering. (KCS) |
Hierarchical clustering | Kiran | microarrays\Analysis\Hierarchical clustering | |
Mindy | Kiran | microarrays\Analysis\MINDY | |
Array based centering | Kiran | microarrays\Normalization\array based centering | |
deviation filter | Mark | microarrays\filtering\deviation filter | passed 6/17/2008 |
Gene ontology | Mark | microarrays\Gene Ontology | 6/17/2008 - got different result from the script. |
Tabular microarray viewer | Mark | microarrays\Tabular Microarray Viewer | passed 6/17/2008 |
BLAST | Mary | sequences\analysis area\alignment\BLAST | |
Preferences | Mary | General\Preferences | |
Genepix flags filter | Mike | microarrays\filtering\Genepix flags filter | |
sequence retriever | Mike | microarrays\sequence retriever | |
Marker sets | Min | General\Selection | |
Expression threshold filter | Min | microarrays\filtering\Expression threshold filter | |
Promoter panel | Min | sequences\visual area\Promoter | |
File formats | Pavel | General\menu\File | |
Microarray viewer | Pavel | microarrays\Microarray Viewer | |
Mean Variance normalizer | Xiaoqing | microarrays\Normalization\mean variance normalizer | |
PCA | Xiaoqing | microarrays\Analysis\PCA | |
caArray | Xiaoqing | General\menu\File\caarray | 6/2/2008 - have extensively tested but not performed formal system test script. (KCS) |
Cell imager | Zhou | microarrays\CEL imager | |
Quantile normalizer | Zhou | microarrays\Normalization\quantile normalization | |
Cellular Network Knowledge base | Zhou | microarrays\Cellular Network KB | |
Marker Annotations | Michael | microarrays\marker annotations | |
For results, see http://afdev/systemtest/BrowseLogs.php
Release
Date
geWorkbench 1.5 (aka geWorkbench 1.1) was released on July 3rd, 2008.
Lessons Learned
System Test Scripts
Experience with the system test scripts led to the following observations and recommended changes:
- Default parameters should be stated, as they could change with time within the application, or one might want to repeat a portion of a test and no longer know what the original conditions were.
- The parameters should be periodically stated during the course of a long test script, as otherwise it becomes very difficult to rerun a portion of a test without starting over. (Typically, one or a few parameters are changed with each step).
- Some of the scripts test boundary conditions (e.g. marker centering normalizer) which allowed errors in the calculation to be found. Such edge tests are very valuable.
- Fully validated test scripts for new/changed components should be available before the next release.
System Test Process
- Should the person running a test script report any bugs into the bug-tracking system? Presumably yes. Problems with the script itself can be reported in the script results. (I think not all bugs seen in testing have been reported - need to review results).
Release Build Process
- With this release we changed to a system whereby, in most cases, files are included only if known to be needed. This allowed us to identify some hidden dependencies.
- When a new build of geWorkbench is run for the first time, genSpace creates a new file (genspace.xml) under the distribution/conf directory. This file should not be included in the distribution. If it is, the user will not be asked if he/she wants to use genSpace.
- The Analysis components were arranged in alphabetical order in the file (all.xml??) so that they would be ordered in the Analysis menu.
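The genspace.xml problem noted above is the kind of thing a small pre-release check could catch. The following is a minimal sketch, assuming the distribution root is a folder such as cleanFolder and that the file appears as conf/genspace.xml beneath it; the function name and the list of forbidden files are hypothetical and not part of the geWorkbench build.

```python
from pathlib import Path
import tempfile

# Files generated on first run that must not ship in the distribution.
FORBIDDEN = ["conf/genspace.xml"]


def find_forbidden_files(dist_root, forbidden=FORBIDDEN):
    """Return the forbidden relative paths that exist under dist_root."""
    root = Path(dist_root)
    return [rel for rel in forbidden if (root / rel).exists()]


# Demonstration on a throwaway directory standing in for the distribution.
with tempfile.TemporaryDirectory() as tmp:
    conf = Path(tmp) / "conf"
    conf.mkdir()
    clean = find_forbidden_files(tmp)   # before the application has been run
    (conf / "genspace.xml").write_text("<genspace/>")
    dirty = find_forbidden_files(tmp)   # after a first run created the file

print(clean, dirty)
```

A release script could refuse to package the folder whenever the returned list is not empty.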
GUI/Functionality changes
- A new category "Documentation" should be created in Mantis and any changes to a component's GUI or functionality should be reported/noted there, so that we can remember that documentation needs to be updated.
Other post-release Notes/Suggestions
- Richard's comments need to be reviewed.
- The control of calculations once launched seems uncertain. Which calculations are actually stopped when canceled? Which keep running, using CPU, even though no result will be returned? Local vs Grid?
- GeneOntology - The Table View pane is a dead-end. (Can't associate displayed GO terms with individual markers. Can't return anything to Markers component). What about the tree view? It cannot be correlated with table view???
- SOM zoom-in - check if working correctly.
- MatrixReduce was reworked on-the-fly. Need to update System tests and Use-Case documents?
- We should be able to filter out the top +- X percentage points of expression data. Currently we can only filter on various absolute values.
- Note - any filtering operation (operation that changes the dataset) after an analysis node has been created will invalidate the analysis results. But no warning is given.
- We should reexamine which data files are included in the distribution. If we had a live update of gene ontology files, maybe we would not need to include them in the distribution at all. They could install themselves.
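As an illustration of the percentage-based filtering suggested in the notes above, here is a minimal sketch under one possible reading of that suggestion: drop the values that fall in the top and bottom X percent of the data. The function names are hypothetical, and this is not an existing geWorkbench filter.

```python
def percentile_bounds(values, percent):
    """Return (low, high) cutoffs that leave `percent` % of the values
    in each tail, using simple rank-based cutoffs."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= percent < 50:
        raise ValueError("percent must be in [0, 50)")
    ordered = sorted(values)
    k = int(len(ordered) * percent / 100)  # number of values cut per tail
    return ordered[k], ordered[len(ordered) - 1 - k]


def filter_extremes(values, percent):
    """Keep the values within the cutoffs, preserving the original order."""
    low, high = percentile_bounds(values, percent)
    return [v for v in values if low <= v <= high]


expression = [5.0, 1.0, 9.0, 3.0, 7.0, 2.0, 8.0, 4.0, 10.0, 6.0]
print(filter_extremes(expression, 10))
```

Values tied with a cutoff are kept, so with many duplicate values fewer than X percent may be removed from a tail. A real filter would also need to decide whether the cutoffs are computed per array or across the whole dataset.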