GeWorkbench Release 1.5


Role Assignments

  • Release Manager – Kenneth Smith
  • Release Engineer – Xiaoqing Zhang
  • Tech Lead – Kiran Keshav
  • Tester – Bernd Jagla
  • Test Manager – Bernd Jagla
  • Technical Writer – Mary VanGinhoven

List of Included Components

<Comment>Every included component should have a dependency sheet listing any external files, executables etc. that are required for it to function, and their expected location (geWorkbench root, data etc).</Comment>

A spreadsheet File:GeWorkbench1.5-component status.xls showing detailed release status as of version 1.5beta is available here and on Sharepoint under Release Process.

For module dependencies, see Additional necessary files included in distribution.

New Modules

  • caArray v2.0
  • ANOVA
  • ARACNE
  • MatrixREDUCE
  • Cellular Networks Knowledge Base
  • GenePattern components
    • PCA
    • Weighted Voting
    • K-nearest neighbors

Data Management:

  • Arrays/Phenotypes
  • Markers
  • Preferences
  • Project Panel
  • Session Mgr

File input filters:

  • Affy File Format
  • CEL File Loader
  • Exp. Format
  • FASTA Format
  • Genepix File Format
  • RMA Express Format

Data filters:

  • Filtering
  • Affy Detection Call Filter
  • Deviation Filter
  • Expression Threshold Filter
  • Genepix Filter (Two channel filter)
  • Genepix Flag Filter
  • Missing Values Filter
  • PDB Structure Format

Normalization:

  • HouseKeeping Genes Normalizer
  • Normalization
  • Log2 Transformation
  • Marker Centering Normalizer
  • Mean Variance Normalizer
  • Missing Values
  • Microarray Centering Normalizer
  • Quantile Normalizer
  • Threshold Normalizer

Experiment Information:

Analysis/Visualization

  • Alignment Results
  • Analysis
  • caBIO Pathways (this has been integrated in the Marker Annotations component)
  • CELImageViewer
  • Color Mosaic
  • Dendrogram
  • Expression Profiles
  • Expression Value Distribution
  • Fast Hierarchical Clustering Analysis
  • Gene Ontology
  • Image Viewer
  • Jmol
  • Marker Annotations
  • Microarray Viewer
  • Pattern Discovery
  • Patterns (Pattern Panel)
  • Position Histogram
  • Promoter
  • Scatter Plot
  • Sequence
  • Sequence Alignment
  • Sequence Retriever
  • SOM Analysis
  • SOM Clusters
  • SPLASH Patterns
  • t Test Analysis
  • Tabular Microarray Viewer
  • Volcano Plot

Excluded Components

The release creation script in build.xml now explicitly includes components by name (previously it excluded components by name). The following is a list of modules known to be excluded.

These components are excluded for a variety of reasons, most often because they lack formal requirements documentation and/or associated system test scripts. Some of them should be scheduled for inclusion in the next production release. For modules not found in the current all.xml, a path to the component is shown.

  • Cancer-GEMS (awaiting further development from NCI)
  • Cytoscape_V2_4 (still some problems)
  • Column Major Format (in \geworkbench\src\org\geworkbench\components\parsers)
  • NetBoost
  • EdgeListFileFormat (NetBoost)
  • MEDUSA
  • Mindy
  • SkyLine
  • GeneWays
  • Evidence Integration
  • Frequency Threshold Filter (There is a class called AllelicFrequencyThresholdFilter in \geworkbench\components\filtering\src\org\geworkbench\components\filtering)
  • GCRMA Via R CEL Loader (in \geworkbench\src\org\geworkbench\components\parsers)
  • Genotypic File Format (in \geworkbench\src\org\geworkbench\components\parsers\genotype)
  • Interactions (early version of CNKB)
  • Multi-t-test (OK, but need to understand when it would be used, e.g. after ANOVA, and if it is what we really want).
  • Network Browser (was part of Reverse Engineering - would require major rewrite to revive. PathwayDecoder is module name)
  • Pattern Discovery Algorithm (association analysis)
  • Reverse Engineering (non-ARACNE, unpublished algorithm. PathwayDecoder is module name)
  • SVM Format (in \geworkbench\src\org\geworkbench\components\parsers)
  • Synteny (in \geworkbench\components\alignment\src\org\geworkbench\components\alignment\client)
  • t-profiler
  • Simulation (a student project)

In addition, the following are excluded:

  • \geworkbench\lib\Simulation_libs
  • \geworkbench\lib\caArrayMageom

Externally supplied components

The following components originate external to the geWorkbench source tree:

MatrixReduce

Source

MatrixReduce source code was obtained from the Bussemaker lab and a modified copy saved under: adcvs.cu-genome.org:/cvs/magnet/matrixreduce_distribution. This modified copy contains Java API changes made to integrate with geWorkbench.

Compiling

MatrixReduce is compiled using the following commands:

  • The FitModel binary is compiled manually with gcc. The extra flags tell gcc not to use Cygwin (-mno-cygwin), to optimize (-O2), and to unroll loops (-funroll-loops):
    • gcc -c -O2 -mno-cygwin -funroll-loops *.c
    • gcc -mno-cygwin -static nrutil.o fncs_cmns.o fncs_seqs.o fncs_tdat.o fncs_seed.o fncs_app1.o fncs_app2.o fncs_nrcs.o fncs_topo.o fncs_mylm.o fncs_bits.o FitModel.o -o FitModel -lm (for Windows and Linux)
    • gcc -mno-cygwin nrutil.o fncs_cmns.o fncs_seqs.o fncs_tdat.o fncs_seed.o fncs_app1.o fncs_app2.o fncs_nrcs.o fncs_topo.o fncs_mylm.o fncs_bits.o FitModel.o -o FitModelMac -lm (for Mac)
  • FitModel.exe bundles both the NR (Numerical Recipes) and GNU libraries.
  • API jar: the Java API jar is created with the makefile under MatrixREDUCE's top directory, using the command "make jar".

Notes

See the comment on white space in file names and paths in Mantis: http://mantis.cu-genome.org/view.php?id=1316

Aracne.jar for MINDY

Although ARACNE is a geWorkbench component, the MINDY component uses a version of ARACNE that is externally maintained. The file aracne.jar is copied directly into the geWorkbench CVS tree.

The location of the external ARACNE code is:

The version of the external ARACNE code is:

Cytoscape

Any other components?

Other Mysteries

/lib

The entire geWorkbench /lib directory is included in the distribution. We do not necessarily know what each file does, and it may contain libraries that are no longer needed. The contents should be annotated and the needed files determined.


Functionality Modifications

Annotations Panel

  • Integration with the new version of the caCORE API, v3.1. As part of integrating with the new caBIO API, we have also made a number of changes in how the API is used. In the past, we retrieved gene annotations using the Affy probeset id (e.g., 31335_at) as the search term. Unfortunately, the caBIO servers contain probe-based information only for the HU133 chip. To alleviate this problem, we now use the following search approach:
    • If the input dataset was associated with an annotations file when it was opened, then we retrieve the HUGO gene symbol associated with a marker (e.g., for marker 31335_at the HUGO symbol is IGF1R) and search caBIO using this gene symbol as the query.
    • If the input dataset does not have an associated annotations file, then we do the caBIO search using the marker name. In this case we are restricted, as the only markers for which we will be able to retrieve information are the ones on the HU133 chip.
  • Browser access to CGAP gene annotations. In the past, clicking on a gene name hyperlink would directly bring up the corresponding CGAP page. Now, users are given a choice: they are asked which of the supported CGAP organisms (human or mouse) they want to retrieve information for. In the (near) future we plan to provide additional options here, such as searching Entrez Gene instead of CGAP.
  • Extract markers/genes from pathways. In the past, the only operation available for BioCarta pathways was visualizing the pathway image in the caBIO Pathways component. Now, 2 more options are available:
    • Add pathway genes to set. Selecting this option retrieves the HUGO symbols of all genes that comprise the pathway. For each such symbol XXX, the application checks whether the currently selected microarray set has a marker whose associated gene is XXX (this works only if the microarray set has been associated with an annotations file). If one or more such markers exist, they are placed in a marker set that is named after the pathway and added to the Markers panel.
    • Export genes to CSV: Information about all genes in the pathway is exported to a text file. The file contains one row per extracted gene, and each row contains 2 comma-separated values: (1) a gene symbol, and (2) the description associated with that gene.
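
The lookup order and the two pathway operations described above can be sketched roughly as follows. This is only an illustration in Python, not the geWorkbench code (which is Java): the function names are hypothetical, a plain dictionary stands in for a loaded annotations file, and the description string is made up.

```python
import csv
import io


def cabio_query_term(marker, annotations):
    """Pick the caBIO search term for a marker.

    `annotations` maps marker id -> HUGO gene symbol, or is None when the
    dataset was opened without an annotations file (hypothetical model).
    """
    symbol = annotations.get(marker) if annotations else None
    # Fall back to the marker name when no gene symbol is known.
    return symbol if symbol else marker


def markers_for_pathway(pathway_symbols, annotations):
    """Return the markers whose annotated gene symbol belongs to the pathway."""
    wanted = set(pathway_symbols)
    return [marker for marker, symbol in annotations.items() if symbol in wanted]


def export_pathway_genes(genes, out):
    """Write one row per gene: gene symbol, then its description."""
    writer = csv.writer(out)
    for symbol, description in genes:
        writer.writerow([symbol, description])


if __name__ == "__main__":
    annotations = {"31335_at": "IGF1R"}  # single illustrative entry
    print(cabio_query_term("31335_at", annotations))
    print(cabio_query_term("31335_at", None))
    print(markers_for_pathway(["IGF1R"], annotations))
    buffer = io.StringIO()
    export_pathway_genes([("IGF1R", "placeholder description")], buffer)
    print(buffer.getvalue(), end="")
```

Using a CSV writer rather than joining strings by hand keeps descriptions that themselves contain commas in a single quoted field.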



CVS Tag Info

geWorkbench-v1_5_0

Building the Application

Check out the new geWorkbench1_5_0 branch to a new directory.

For Testing

  • Go to the new directory and run `ant createDist`. This step creates a folder named "cleanFolder" at the same level as the directory into which the CVS code was extracted. It places in cleanFolder a new (simple) build.xml designed for running the application in test mode. Only the all_release.xml configuration file is included in cleanFolder/conf/.
  • Go to the cleanFolder directory and run "ant" to start the application; the application will use the all_release.xml to load components.

For Release

  • To create a final distribution folder go to the new directory where the CVS code was extracted and run "ant createCleanDist". This task will clean and rebuild the application into cleanFolder.


System Testing

The table below lists the assigned system tests. For each test it gives the name of the test script file (a Word document), the assigned tester, the location of the script on SharePoint, and any status notes.

System test | Assigned tester | Location (relative to link) | Estimated date of completion
Anova | Aris | microarrays\Analysis\anova |
Aracne | Aris | microarrays\Analysis\aracne | First half of system test completed (non-grid portion) 6/2/2008 by KCS. Non-fatal bugs found with progress bar [1] and network viewing in Cytoscape [2]
House keeping gene normalizer | Aris | microarrays\Normalization\house keeping gene normalizer |
Log2 transform | Bernd | microarrays\Normalization\Log2 transformation |
scatter plot | Bernd | microarrays\scatter plot |
pattern discovery | Bernd | pattern discovery |
SOM | Christine | microarrays\Analysis\SOM |
missing value | Christine | microarrays\Normalization\missing value computations |
2 channel threshold filter | Christine | \microarrays\filtering\2 channel threshold filter |
Dataset annotations | Dimitry | General\Dataset annotations |
T-test | Dimitry | microarrays\Analysis\t-test |
Affy detection filter | Dimitry | microarrays\filtering\Affy detection call filter |
MatrixReduce | Ken | microarrays\Analysis\matrix reduce | A new binary has been created, statically linked, and which generates a Bussemaker-compliant display. However, the old system test no longer can be run using the (very minimal) test dataset. A run with a full dataset succeeds. (KCS)
Marker based centering | Ken | microarrays\Normalization\Marker based centering | Passed 6/4/2008 following median calculation fix. (KCS)
Color mosaic | Ken | microarrays\color mosaic | "passed 5/28/2008 - but since then, bugs have arisen during fixing handling of "All Markers/All Arrays" checkboxes (6/3/2008) - systest will need revision after bugfix (All Arrays/All Markers off by default)" (KCS)
expression profiles | Ken | microarrays\expression profiles | passed 5/28/2008. System test script needs slight correction. (1) how coordinates are written, (2) this is not a test of array-based centering. (KCS)
Hierarchical clustering | Kiran | microarrays\Analysis\Hierarchical clustering |
Mindy | Kiran | microarrays\Analysis\MINDY |
Array based centering | Kiran | microarrays\Normalization\array based centering |
deviation filter | Mark | microarrays\filtering\deviation filter | passed 6/17/2008
Gene ontology | Mark | microarrays\Gene Ontology | 6/17/2008 - got different result from the script.
Tabular microarray viewer | Mark | microarrays\Tabular Microarray Viewer | passed 6/17/2008
BLAST | Mary | sequences\analysis area\alignment\BLAST |
Preferences | Mary | General\Preferences |
Genepix flags filter | Mike | microarrays\filtering\Genepix flags filter |
sequence retriever | Mike | microarrays\sequence retriever |
Marker sets | Min | General\Selection |
Expression threshold filter | Min | microarrays\filtering\Expression threshold filter |
Promoter panel | Min | sequences\visual area\Promoter |
File formats | Pavel | General\menu\File |
Microarray viewer | Pavel | microarrays\Microarray Viewer |
Mean Variance normalizer | Xiaoqing | microarrays\Normalization\mean variance normalizer |
PCA | Xiaoqing | microarrays\Analysis\PCA |
caArray | Xiaoqing | General\menu\File\caarray | 6/2/2008 - have extensively tested but not performed formal system test script. (KCS)
Cell imager | Zhou | microarrays\CEL imager |
Quantile normalizer | Zhou | microarrays\Normalization\quantile normalization |
Cellular Network Knowledge base | Zhou | microarrays\Cellular Network KB |
Marker Annotations | Michael | microarrays\marker annotations |



For results, see http://afdev/systemtest/BrowseLogs.php


Release

Date

geWorkbench 1.5 (aka geWorkbench 1.1) was released on July 3rd, 2008.

Lessons Learned

System Test Scripts

Experience with the system test scripts led to the following recommendations:

  1. Default parameters should be stated, since they can change over time within the application, and a tester repeating a portion of a test may no longer know what the original conditions were.
  2. Parameters should be restated periodically during a long test script; otherwise it becomes very difficult to rerun a portion of a test without starting over. (Typically, one or a few parameters are changed with each step.)
  3. Some of the scripts test boundary conditions (e.g. marker centering normalizer), which allowed errors in the calculation to be found. Such edge tests are very valuable.
  4. Fully validated test scripts for new/changed components should be available before the next release.

System Test Process

  1. Should the person running a test script report any bugs into the bug-tracking system? Presumably yes. Problems with the script itself can be reported in the script results. (I think not all bugs seen in testing have been reported - need to review results).

Release Build Process

  1. With this release we changed to a system whereby, in most cases, files are included only if known to be needed. This allowed us to identify some hidden dependencies.
  2. When a new build of geWorkbench is run for the first time, genSpace creates a new file (genspace.xml) under the distribution/conf directory. This file should not be included in the distribution. If it is, the user will not be asked if he/she wants to use genSpace.
  3. The Analysis components were arranged in alphabetical order in the file (all.xml??) so that they would be ordered in the Analysis menu.
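
Points 2 and 3 above lend themselves to an automated check before packaging. The following is a minimal sketch in Python, not part of the actual build.xml: the function names are hypothetical, and the only facts taken from this page are the file name genspace.xml, its location under the distribution's conf directory, and the wish for alphabetical ordering.

```python
import tempfile
from pathlib import Path

# Files created at first run that should not ship with the distribution.
GENERATED_FILES = ("genspace.xml",)


def find_generated_files(dist_dir, names=GENERATED_FILES):
    """Return the run-time generated files present under <dist_dir>/conf."""
    conf = Path(dist_dir) / "conf"
    return sorted(path for path in (conf / name for name in names) if path.is_file())


def is_alphabetical(component_names):
    """True if the names are already in case-insensitive alphabetical order."""
    names = list(component_names)
    return names == sorted(names, key=str.lower)


if __name__ == "__main__":
    # Simulate a distribution folder that has been run once.
    with tempfile.TemporaryDirectory() as dist:
        conf_dir = Path(dist) / "conf"
        conf_dir.mkdir()
        (conf_dir / "genspace.xml").write_text("<genspace/>")
        leftovers = find_generated_files(dist)
        print([path.name for path in leftovers])
    print(is_alphabetical(["ANOVA", "ARACNE", "t Test Analysis"]))
```

A release script could refuse to package the folder whenever the first function returns a non-empty list.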

GUI/Functionality changes

  • A new category "Documentation" should be created in Mantis and any changes to a component's GUI or functionality should be reported/noted there, so that we can remember that documentation needs to be updated.


Other post-release Notes/Suggestions

  • Richard's comments need to be reviewed.
  • The control of calculations once launched seems uncertain. Which calculations are actually stopped when canceled? Which keep running, using CPU, even though no result will be returned? Local vs Grid?
  • GeneOntology - The Table View pane is a dead-end. (Can't associate displayed GO terms with individual markers. Can't return anything to Markers component). What about the tree view? It cannot be correlated with table view???
  • SOM zoom-in - check if working correctly.
  • MatrixReduce was reworked on-the-fly. Need to update System tests and Use-Case documents?
  • We should be able to filter out the top +- X percentage points of expression data. Currently we can only filter on various absolute values.
  • Note - any filtering operation (operation that changes the dataset) after an analysis node has been created will invalidate the analysis results. But no warning is given.
  • We should reexamine what data files are included in the distribution. If we had a live update of gene ontology files, maybe we would not need to include them in the distribution at all. They could install themselves.
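
The percentage-based filtering suggested in the list above could be sketched as follows. This is one possible reading of the suggestion (trim the same percentage from both the top and the bottom of the expression values), written in Python for illustration only; the function names are hypothetical and nothing here reflects an existing geWorkbench filter.

```python
def percentile_bounds(values, percent):
    """Return (low, high) cut-offs that trim `percent` % from each tail."""
    if not 0 <= percent < 50:
        raise ValueError("percent must be in the range [0, 50)")
    ordered = sorted(values)
    # Number of values to drop from each end of the sorted list.
    k = int(len(ordered) * percent / 100)
    return ordered[k], ordered[len(ordered) - 1 - k]


def trim_extremes(values, percent):
    """Keep only the values lying within the percentile cut-offs.

    Values equal to a cut-off are kept, so ties can leave slightly
    more than the nominal share of the data.
    """
    if not values:
        return []
    low, high = percentile_bounds(values, percent)
    return [v for v in values if low <= v <= high]


if __name__ == "__main__":
    expression = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    print(trim_extremes(expression, 10))
```

Unlike the existing absolute-value filters, the cut-offs here are derived from the data, so the same setting adapts to datasets with different expression ranges.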