geWorkbench Release 2.1

General notes on previous geWorkbench releases


Release Schedule for 2.1.0

  • geWorkbench 2.1.0 code freeze: August 13th, 2010 (planned)
  • System Testing concluded: August 20th, 2010
  • Final release target: September 3rd, 2010
  • Actual release date: September 10th, 2010

Role Assignments

  • Release Manager – Kenneth Smith
  • Release Engineer – Thomas Garben
  • Tech Lead – Zhou Ji
  • Tester – Udo Többen, and the rest of the bunch
  • Test Manager – Udo Többen
  • Technical Writer – Mary VanGinhoven

Things to remember

Known Issues - General

  • InstallAnywhere and Norton Internet Security SONAR - Under Windows, InstallAnywhere places a file called "Install.exe" in a folder with a path like "C:\Users\ksmith\AppData\Local\Temp\I1276186086\Windows\". Norton SONAR was seen to detect and remove this file, silently terminating the install. Everything below "Temp" is removed after the installation finishes.

Known Issues - release 2.1

See the 2.0 release page for existing issues. TODO: check how many of those have been fixed.


New Component Detail and Dependencies pages created

Annotation Dependencies - list of dependencies of particular components on particular annotation file columns.


CNKB Data - release status and available interactions for each database.

Major changes in release 2.1.0

Major Code Changes in 2.1.0

A number of components had major code cleanups:

List of changes to GUI

New components in release 2.1

  • Coefficient of Variation filter.
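
The coefficient of variation (CV) of a marker is the standard deviation of its values divided by their mean. The following is a minimal sketch of a CV-based filter, not the geWorkbench implementation. The function names, the use of the sample standard deviation, and the choice to keep markers at or above a threshold are all assumptions made for illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return stdev/|mean| for a list of at least two numbers.

    Assumption: sample standard deviation is used; the actual
    component may define this differently.
    """
    m = mean(values)
    if m == 0:
        # CV is undefined for a zero mean; treat it as unbounded here.
        return float("inf")
    return stdev(values) / abs(m)


def cv_filter(markers, min_cv):
    """Keep only markers whose CV is at least min_cv (hypothetical rule)."""
    return {
        name: values
        for name, values in markers.items()
        if coefficient_of_variation(values) >= min_cv
    }


# Hypothetical expression values across four arrays.
markers = {
    "flat_marker": [10.0, 10.0, 10.0, 10.0],
    "variable_marker": [2.0, 4.0, 6.0, 8.0],
}
kept = cv_filter(markers, min_cv=0.1)
print(sorted(kept))
```

In this sketch a marker with identical values on every array has a CV of zero and is filtered out, while a marker that varies across arrays is kept.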

Other major new features in release 2.1

  • bug 2323 and others - Enhancements to BLAST (in sequence alignment).
  • bug 2340 - New Feature - System information display.

List of other major changes

As of 8/11/2010, 37 bugs are marked resolved, closed, or closed pending documentation. 48 issues are still open, of which some will be resolved for release 2.1.0.

  • bug 2303 - Fixed GO tree display in CNKB component. The tree is now expandable.
  • bug 2242 - In filtering components, check for valid inputs.
  • bug 2190 - Fixed problem when saving a workspace that contains a large dendrogram display from hierarchical clustering.
  • bug 2305 - Fixed a problem with JMOL performance.
  • bug 2294 - Welcome screen is now version aware - it always appears the first time a new version is run.

Documentation

Documentation changes pending from previous releases

See the 2.0 release page for a long list of pending documentation changes and other TODO items.

Tutorial/Online Help chapters that need coordinated updating before release

  • Markers component - the documentation says Save is not implemented, but this appears to be incorrect and needs to be checked.


Tutorial/Online Help - coordinated changes to both completed

  • ARACNe - updated the discussion of the upper limit on arrays, which pertains only to Fixed Bandwidth. The same changes were made in Online Help.
  • MINDy - added description of how to use ARACNe preprocessing. Full resynch of Tutorial and Online Help.
  • Filtering - added Coefficient of Variation.
  • Pattern Discovery - Existing tutorial (revised during release 1.8.0) ported to Online Help for release 2.1.0.

Versions of external files/components included in this release

  • gene_ontology.1_2.obo downloaded 8/10/2010 from geneontology.org.
  • Ontologizer.jar version 2.0, file created 3/10/2010 (as seen by inspecting files in the JAR file). We are using the "Command line" jar file. No change from release 2.0.0. Checked no further updates as of 8/10/2010. http://compbio.charite.de/index.php/ontologizer2.html
    • Note - On 5/31/2010, the Ontologizer "Manual" version jar file (which has a GUI) was updated. However, the command line version was still not updated.
  • Jaspar_CORE (http://jaspar.genereg.net/) - Unchanged from release 2.0.0. SQL files last updated on server 10/2009. (/html/DOWNLOAD/jaspar_CORE/non_redundant/all_species/sql_tables)
  • JMOL - JMOL 12 RC.10. No change from release 2.0.0. Jar file last updated 5/13/2010.

System Testing Notes

How to test BLAST queries

Our BLAST component uses the NCBI URL API - that is, it sends the query to NCBI BLAST as a string of arguments embedded in a URL. We can therefore test our BLAST GUI by inspecting the resulting URL string. To see this string, a log setting must be changed. In the geWorkbench installation directory, find conf/log4j.properties. In this file, towards the bottom, add this line under "#component packages:"

  • log4j.logger.org.geworkbench.components.alignment=DEBUG
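
Once the URL string appears in the log, it can be easier to check when split into its individual arguments. The following is a minimal sketch of such a helper using only the Python standard library. The function name and the example URL are hypothetical; a real URL should be copied from the DEBUG log output.

```python
from urllib.parse import urlsplit, parse_qs


def blast_url_params(url):
    """Return the query arguments of a URL as a dict of single values."""
    query = urlsplit(url).query
    # keep_blank_values=True so that empty arguments are still listed.
    parsed = parse_qs(query, keep_blank_values=True)
    # If an argument is repeated, keep the last value given.
    return {name: values[-1] for name, values in parsed.items()}


# Hypothetical example URL, for illustration only.
example = (
    "https://blast.ncbi.nlm.nih.gov/Blast.cgi"
    "?CMD=Put&PROGRAM=blastn&DATABASE=nr&QUERY=ACGTACGTACGT"
)

for name, value in sorted(blast_url_params(example).items()):
    print(f"{name} = {value}")
```

Each argument can then be compared against the settings chosen in the BLAST GUI.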

Components

List of Included Components

Noted on SVN refresh on 8/11/2010 - a number of outdated components appear to have been dropped from SVN. See "Changes seen in SVN.png".

Data Management:

  • Arrays/Phenotypes
  • Markers
  • Preferences
  • Project Panel
  • Session manager - no one knows what this is - probably a SOAP interface. But it is definitely needed!

File input formats

  • Affy File Format
  • CEL File Loader
  • Exp. Format
  • FASTA Format
  • Genepix File Format
  • PDB Structure Format
  • Tab-delimited (RMA Express Format)

Connectivity

  • caArray2 - updated to support caArray 2.3.0 in release 1.8.0 (released September 2009). The caArray client jar is NOT backwards-compatible with any previous versions.

Data filters

  • Filtering
  • Affy Detection Call Filter
  • Coefficient of Variation (new)
  • Deviation Filter
  • Expression Threshold Filter
  • Genepix Filter (Two channel filter)
  • Genepix Flag Filter
  • Missing Values Filter

Normalization

  • HouseKeeping Genes Normalizer
  • Normalization
  • Log2 Transformation
  • Marker Centering Normalizer
  • Mean Variance Normalizer
  • Missing Values (Normalizer)
  • Microarray Centering Normalizer
  • Quantile Normalizer
  • Threshold Normalizer

Experiment Information


Analysis/Visualization

  • Alignment Results
  • Analysis
  • ANOVA
  • ARACNe2 - adds Adaptive Partitioning algorithm and Preprocessing mode.
  • caBIO Pathways (this has been integrated in the Marker Annotations component)
  • Cancer Gene Index integration in the Marker Annotations component.
  • CELImageViewer
  • Cellular Networks Knowledge Base
  • Color Mosaic
  • Component Configuration Manager.
  • Cytoscape_V2_4 - updated version of Cytoscape.
  • Dendrogram
  • Expression Profiles
  • Expression Value Distribution
  • Gene Ontology Enrichment Analysis and Display
  • Hierarchical Clustering Analysis
  • genSpace collaborative framework
  • Image Viewer
  • Jmol
  • Marker Annotations
  • MarkUs - Analysis and Viewer
  • MRA - Master Regulator Analysis
  • MatrixREDUCE
  • Microarray Viewer
  • MINDy - Analysis and Viewer
  • Pattern Discovery
  • Position Histogram
  • Pudge?? - Analysis and Viewer (Browser) - if this is working (Kiran?) we should include. We can create a very simple online help file, essentially pointing to the Pudge documentation at the Honig site (Aris).
  • Promoter
  • Scatter Plot
  • Sequence
  • Sequence Alignment
  • Sequence Retriever
  • SOM Analysis
  • SOM Clusters
  • t Test Analysis
  • Tabular Microarray Viewer
  • Volcano Plot
  • GenePattern components
    • PCA (GenePattern) - Analysis and Viewer
    • K-nearest neighbors (GenePattern)
    • SVM 3.0 (GenePattern) - Analysis and Viewer - include, we need to develop online help and tutorial (Aris).
    • WV - Weighted Voting (GenePattern)


  • GeneWays (need to update status for release 2.0) (needed for the Cytoscape viewer for ARACNe) - this component is not working; we do not have end-user documentation materials (Aris). However, GeneWays must be included for Cytoscape interaction to function, although GeneWays itself cannot be chosen as a component nor used directly.

Excluded and Dropped Components

The release creation script in build.xml now explicitly includes components by name (previously it excluded components by name). The following is a list of modules known to be excluded.

Excluded components

The following components are excluded for a variety of reasons, most often due to lack of formal requirements documentation and/or associated system test scripts. Some of them should be scheduled for inclusion in the next production release. For modules not found in the current all.xml, a path to the component is shown.

Still under development:

  • CART (GenePattern) - this component has not yet been released. It is part of another component and must be excluded manually from the final installer release build.
  • Cancer-GEMS (awaiting further development from NCI)
  • NetBoost
    • EdgeListFileFormat (NetBoost)
  • Evidence Integration
  • MEDUSA

Not actively being developed:

  • GCRMA Via R CEL Loader (in \geworkbench\src\org\geworkbench\components\parsers)
  • GSEA
  • Multi-t-test (OK, but need to understand when it would be used, e.g. after ANOVA, and if it is what we really want).
  • SMLR - Sparse Multinomial Logistic Regression - implementation by John Watkinson.
  • SVM Format (in \geworkbench\src\org\geworkbench\components\parsers) (left over from a John Watkinson project).
  • Synteny (in \geworkbench\components\alignment\src\org\geworkbench\components\alignment\client)
  • t-profiler
  • caScript

Dropped components

These components are not expected to be used again.

  • CuteNet (GeneWays)
  • Column Major Format (in \geworkbench\src\org\geworkbench\components\parsers)
  • Frequency Threshold Filter (There is a class called AllelicFrequencyThresholdFilter in \geworkbench\components\filtering\src\org\geworkbench\components\filtering)
  • GeneOntology (the original component, now replaced by geneontology2/Ontologizer2.0)
  • Genotypic File Format (in \geworkbench\src\org\geworkbench\components\parsers\genotype)
  • Network Browser (was part of Reverse Engineering - would require major rewrite to revive. PathwayDecoder is module name)
  • Pattern Discovery Algorithm (association analysis)
  • Patterns (Pattern Panel) - Omit from release - Appears to have been superseded by the Sequence component.
  • Reverse Engineering (non-ARACNE, unpublished algorithm. PathwayDecoder is module name)
  • Simulation (a student project)


Note - the original "interactions" component was dropped and reimplemented as the Cellular Networks Knowledge Base. For a brief period it was called component "interactions2".

Externally supplied components

The following components originate external to the geWorkbench source tree:

MatrixReduce

Source

MatrixReduce source code was obtained from the Bussemaker lab and a modified copy saved under: adcvs.cu-genome.org:/cvs/magnet/matrixreduce_distribution. This modified copy contains Java API changes made to integrate with geWorkbench.

Compiling

MatrixReduce is compiled using the following commands:

  • FitModel binary is compiled manually with gcc, with extra flags telling it not to use Cygwin (-mno-cygwin), to optimize (-O2) and to unroll loops (-funroll-loops), as follows:
    • gcc -c -O2 -mno-cygwin -funroll-loops *.c
    • gcc -mno-cygwin -static nrutil.o fncs_cmns.o fncs_seqs.o fncs_tdat.o fncs_seed.o fncs_app1.o fncs_app2.o fncs_nrcs.o fncs_topo.o fncs_mylm.o fncs_bits.o FitModel.o -o FitModel -lm (for Windows and Linux)
    • gcc -mno-cygwin nrutil.o fncs_cmns.o fncs_seqs.o fncs_tdat.o fncs_seed.o fncs_app1.o fncs_app2.o fncs_nrcs.o fncs_topo.o fncs_mylm.o fncs_bits.o FitModel.o -o FitModelMac -lm (for Mac)
  • FitModel.exe bundles both the NR (Numerical Recipes) and GNU libraries.
  • API jar: The Java API jar is created with the makefile under MatrixREDUCE's top directory, using the command "make jar".

Notes

See the comment on whitespace in file names/paths in Mantis: http://mantis.cu-genome.org/view.php?id=1316

Aracne.jar for MINDY

Although ARACNE is a geWorkbench component, the MINDY component uses a version of ARACNE that is externally maintained. The file aracne.jar is copied directly into the geWorkbench CVS tree.

The location of the external ARACNE code is:

The version of the external ARACNE code is:

Cytoscape

Any other components?

Analysis components - external runtime dependencies

  • Note - this table has not been updated for 2.1.
component               | local | external type | username/password | relay servlet | known to work outside campus
ANOVA                   | yes   | grid          | grid_default      | no            | ?
ARACNe                  | yes   | grid          | grid_default      | no            | ?
CNKB                    | no    | servlet       | some open data    | yes           | ?
MINDy                   | yes   | grid          | grid_default      | no            | ?
GenSpace                | local | grid          | genSpace account  | no            | ?
Hierarchical Clustering | yes   | grid          | grid_default      | no            | ?
KNN                     | no    | GenePattern   | ???               | no            | ?
MarkUs                  | no    | grid          | open              | no            | ?
MRA                     | local | no            | -                 | no            | not applicable
MatrixREDUCE            | local | grid          | grid_default      | no            | ?
PCA                     | no    | GenePattern   | ???               | no            | ?
PUDGE                   | no    | web           | open              | no            | ?
SkyLine                 | no    | grid          | grid_default      | no            | ?
SkyBase                 | no    | grid          | grid_default      | no            | ?
SOM                     | yes   | grid          | grid_default      | no            | ?
SVM                     | no    | GenePattern   | ???               | no            | ?
WV                      | no    | GenePattern   | ???               | no            | ?


Important links

http://www.psl.cs.columbia.edu/genspace/

Affymetrix Annotation Technote - Methodology

Affymetrix tabular annotation data
