GeWorkbench Release 2.4

General notes on previous geWorkbench releases

Other geWorkbench planning pages

Release 2.4.1

geWorkbench 2.4.1 was released on October 26th, 2012.

Following are the notes from the ReadMe file.

geWorkbench v2.4.1 is a small but important bug-fix release, dealing with the BLAST and ANOVA components. Due to a change in the HTML output of the NCBI BLAST server, the BLAST component in previous versions of geWorkbench can no longer parse and import BLAST results. In geWorkbench 2.4.1, a new parser using the BLAST XML result format has been implemented. This greatly reduces, but does not eliminate, the dependence on the BLAST HTML format result page. However, future changes to the HTML page format should no longer affect the basic retrieval of results. In addition, a problem was identified in the ANOVA component which caused incorrect markers to be used if a marker set was activated.
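As an illustration of why an XML-based parser is less fragile than HTML scraping, here is a minimal sketch of reading hits from BLAST XML output. It is not the geWorkbench parser. The class name BlastXmlSketch and the Hit holder are hypothetical; the element names (Hit, Hit_accession, Hit_def, Hsp_evalue) are those of the NCBI BLAST XML format. It assumes Java 8 or later.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Illustrative only: not the parser shipped with geWorkbench. */
final class BlastXmlSketch {

    /** Hypothetical holder for the fields this sketch extracts from one Hit element. */
    static final class Hit {
        final String accession;
        final String definition;
        final double bestEvalue;

        Hit(String accession, String definition, double bestEvalue) {
            this.accession = accession;
            this.definition = definition;
            this.bestEvalue = bestEvalue;
        }
    }

    /** Returns one Hit per Hit element, keeping the smallest E-value among its HSPs. */
    static List<Hit> parseHits(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Real BLAST XML names an external DTD; resolve it to an empty
            // document so that parsing does not need network access.
            builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            List<Hit> hits = new ArrayList<>();
            NodeList hitNodes = doc.getElementsByTagName("Hit");
            for (int i = 0; i < hitNodes.getLength(); i++) {
                Element hit = (Element) hitNodes.item(i);
                NodeList evalues = hit.getElementsByTagName("Hsp_evalue");
                double best = Double.POSITIVE_INFINITY;
                for (int j = 0; j < evalues.getLength(); j++) {
                    double e = Double.parseDouble(evalues.item(j).getTextContent().trim());
                    best = Math.min(best, e);
                }
                hits.add(new Hit(text(hit, "Hit_accession"), text(hit, "Hit_def"), best));
            }
            return hits;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse BLAST XML", e);
        }
    }

    /** Text of the first descendant with the given tag name, or "" if absent. */
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        // Made-up sample data, trimmed to the elements the sketch reads.
        String xml = "<BlastOutput><BlastOutput_iterations><Iteration><Iteration_hits>"
                + "<Hit><Hit_def>example protein</Hit_def><Hit_accession>P00001</Hit_accession>"
                + "<Hit_hsps><Hsp><Hsp_evalue>1e-20</Hsp_evalue></Hsp>"
                + "<Hsp><Hsp_evalue>3e-5</Hsp_evalue></Hsp></Hit_hsps></Hit>"
                + "</Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>";
        for (Hit h : parseHits(xml)) {
            System.out.println(h.accession + "\t" + h.bestEvalue + "\t" + h.definition);
        }
    }
}
```

Because the fields are looked up by element name, cosmetic changes to the HTML result page would not affect a parser of this kind.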

No changes were made to grid service URLs, as none of the bug fixes involved them.

The GenomeSpace installer was changed to look for geWorkbench in the user's home directory rather than the system Applications folder, in line with the change below.


2.4.1 Bug fixes

  • Alignment results (BLAST)
    • 3186 - Implement XML parser due to NCBI HTML format changes.
  • ANOVA
    • 3156 - Incorrect markers used if marker set activated.
  • Installation
    • 3244 - Install on Macintosh and Linux will default to user home directory. If geWorkbench is installed to the system Applications folder, and if the user does not have admin privileges on the Mac, then geWorkbench cannot write files such as the GO OBO file or the ARACNe preprocessing file to the geWorkbench root.
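
The failure mode described in 3244 (an install root that the current user cannot write to) can be detected at run time. The following is a minimal sketch of such a check, not code from geWorkbench or its installer; the class name InstallDirSketch and the fallback directory name "geworkbench-data" are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative only: picks a directory the current user can write to. */
final class InstallDirSketch {

    // Hypothetical fallback directory name, created under the user's home.
    static final String FALLBACK_NAME = "geworkbench-data";

    /**
     * Returns installRoot if it is an existing, writable directory;
     * otherwise returns a fallback location under userHome.
     */
    static Path chooseWritableDir(Path installRoot, Path userHome) {
        if (Files.isDirectory(installRoot) && Files.isWritable(installRoot)) {
            return installRoot;
        }
        return userHome.resolve(FALLBACK_NAME);
    }

    public static void main(String[] args) {
        // The install root is taken from the first argument, or the current directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        Path home = Paths.get(System.getProperty("user.home"));
        System.out.println(chooseWritableDir(root, home));
    }
}
```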

Release Schedule for 2.4.0

  • Code freeze: June 18, 2012
  • System testing started: June 19, 2012
  • System testing end target: June 28, 2012
  • System Testing concluded: June 28, 2012
  • Bug fixes concluded: July 20, 2012
  • Final release target: July 17, 2012
  • Actual release date: July 23, 2012

Role Assignments

  • Release Manager – Kenneth Smith
  • Release Engineer – Nikhil Podduturi
  • Tech Lead – Zhou Ji
  • Tester – Udo Többen, and the rest of the bunch
  • Test Manager – Udo Többen
  • Technical Writer – Mary VanGinhoven

Things to remember

  • Run the Perl script that converts the MediaWiki geWorkbench tutorial pages to the format needed for the geWorkbench Java Help system.

External components in 2.4.0

  • caArray - caArray client external v1.0 (new version, compatible ONLY with caArray 2.5.0+).
  • caBio - caBIO client 4.3 (no change).
  • caGrid - caGrid version 1.4 (no change).
  • Cytoscape - Version 2.8.2 (no change).
  • GeneOntology OBO file - Updated to the version of 6/17/2012, although geWorkbench now downloads the latest version each time.
  • JASPAR - Version released October 12, 2009 (no change). We use the following files from the JASPAR CORE SQL tables directory (http://jaspar.genereg.net/html/DOWNLOAD/jaspar_CORE/non_redundant/all_species/sql_tables/):
    • MATRIX.txt
    • MATRIX_ANNOTATION.txt
    • MATRIX_DATA.txt
  • JMol - version 12.2.24 (updated).
  • Ontologizer - Ontologizer.jar version 2.0, file released 2010-03-10 (no change).

geWorkbench 2.4.0 Grid Service URLs

The default Index Service and Dispatcher Service URLs are hard-coded in the configuration file "conf/application.properties". Updating these defaults is part of the release process: for the production version, the production URLs must be entered.
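
A release check for these defaults could look like the following minimal sketch. It assumes the file is a standard Java properties file; the property keys and the example.org URLs are hypothetical placeholders, not the actual keys or URLs used by geWorkbench.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

/** Illustrative only: reads two default URLs from properties-format text. */
final class GridDefaultsSketch {

    // Hypothetical keys; the real names in conf/application.properties may differ.
    static final String INDEX_KEY = "index.service.url";
    static final String DISPATCHER_KEY = "dispatcher.service.url";

    /** Loads properties from any reader (a FileReader in real use). */
    static Properties load(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new IllegalStateException("Could not read properties", e);
        }
        return props;
    }

    /** Returns the trimmed value for key, or fails if it is missing or blank. */
    static String requireUrl(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing default for " + key);
        }
        return value.trim();
    }

    public static void main(String[] args) {
        // Placeholder content standing in for conf/application.properties.
        String text = INDEX_KEY + "=http://index.example.org:8080/index\n"
                + DISPATCHER_KEY + "=http://dispatcher.example.org:8080/dispatcher\n";
        Properties props = load(new StringReader(text));
        System.out.println("Index:      " + requireUrl(props, INDEX_KEY));
        System.out.println("Dispatcher: " + requireUrl(props, DISPATCHER_KEY));
    }
}
```

Failing fast on a missing or blank entry makes it harder to ship a build that still has no production default configured.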

Production URLs for 2.4.0

Index and Dispatcher

Standard geWorkbench Grid Services

where ServiceName is e.g. Anova, Aracne, etc.

MarkUs and Skyline Service URLs

  • bhapp.c2b2.columbia.edu:8080/wsrf/services/cagrid/MarkUs
  • bhapp.c2b2.columbia.edu:8080/wsrf/services/cagrid/SkyLine

MarkUs and Skyline RESULT URLS

Both MarkUs and Skyline reference a web service to retrieve results. The results remain on the remote server and only the information requested is returned to geWorkbench. As a consequence, the results of these two analyses cannot be preserved indefinitely, even by saving a workspace.

The result URLs are independent of the caGrid index/dispatcher URL, but are tightly linked to bhapp.c2b2.columbia.edu.

MarkUs result URL: http://bhapp.c2b2.columbia.edu/MarkUs/cgi-bin/browse.pl?pdb_id=MUS... This is the URL of the MarkUs web site, and it will not change when we move our services around.

Skyline result URL: http://cagridnode.c2b2.columbia.edu:8080/luna/SkyLineData/output which is a proxy that forwards to bhapp.c2b2.columbia.edu:8080/SkyLineData/output

When the caGrid index service moves to a new server, we only need to change the Tomcat configuration of these two services so that they register with the new index service. No change to the geWorkbench code is needed for them to work.

Development URLs

All grid services except MarkUs and Skyline use the development index service and dispatcher, and have development grid services.


System Testing

See results at http://afdev.cgc.cpmc.columbia.edu/systemtest/BrowseLogs.php

Java Version

geWorkbench 2.4.0 was developed and tested using the Java 6 JDK and JRE. Casual testing has also been done under Java 7, and a number of incompatibilities were found and fixed. Any known remaining issues are listed below.

Note - geWorkbench 2.4.0 compiled with Java 6 was also casually tested using a Java 7 JRE runtime. Only one problem was found.


Known Incompatibilities with Java 1.7.

  • Marker Annotations - (#3072) does not receive the activated marker set; the caBIO client library conflicts with Java 7.

Known caArray issue that keeps coming up

(This section is unchanged from the entry for release 2.3.0).

There is a problem with the caArray server code, in that as long as a geWorkbench session is running, the server retains the last used username/password, if any have been submitted.

See bugs 2022, 2555.


Two problems can arise:

Problem 1

Unfortunately, that defect still exists in the new API. The only situation you will see it in is as follows:

  • User A connects to caArray via geWorkbench and enters his/her credentials.
  • An anonymous user then connects to caArray using the same geWorkbench instance. This anonymous user can still see User A's protected data.

The bug does not affect any other situations. E.g., if the users are using different instances of geWorkbench, there is no problem. If the second user is passing in a new set of credentials, it's not a problem. It is only a problem when the first user is credentialled and the second user is anonymous, and they are both connecting through the same geWorkbench one after another.

Thanks! Rashmi

Problem 2

Once a username and password have been entered and submitted to caArray, you cannot go back to using no username/password, except by restarting geWorkbench. However, you can still enter a different username/password combination. This is a property of the caArray server-side code. Thus, if you have no valid username/password and enter an incorrect one, you will need to restart geWorkbench before you can query caArray public experiments again (no login required).

Changes in release 2.4.0

Major changes

  • Significance Analysis of Microarrays (SAM) (#2986) - Addition of SAM interface to R local and grid services.
  • MRA (#2952) -
    • addition of two-FET method,
    • complete update of graphics to display multiple bar-graphs
    • bar-graph display changed from t-value to rank
  • SkyBase (#2510) - Add access to PDB-60 database. As of 7/18/2012, the databases have:
    • PDB60: 12,264 structures, 9,544,535 models.
    • NESG: 946 structures, 1,943,361 models.
  • File Parsers (#3006) - Add support for Affymetrix Gene 1.0 ST whole-transcript and Exon 1.0 ST annotation files
  • caArray (#3022) - Updated client to match new caArray version 2.5. Not backward compatible with caArray 2.4 or earlier.
  • Help System New Chapters
    • Menu Bar
    • SkyBase
    • Volcano Plot
  • t-test - changed to Apache Commons Math Library, p-values show slight changes due to improved precision.
  • Marker sets (#3025) - Remove markers from sets when filtered out from dataset.

Changes in functionality (requiring documentation updates)

(W = Wiki Updated, H = Help updated)

  • Alignment results (BLAST) - (W+H)
    • #3029 - Allow sequence hit import "Include" action to include hits from multiple result sets.
    • Added a picture.
  • CaArray - (W+H)
    • 2944 show caArray Experiment ID of selected expt.
    • 3022 upgrade to caArray 2.5 client jar file
      • Change dialog radio-button label from "Remote" to "caArray 2.5"
  • Component Configuration Manager - (W+H)
    • 2904, 3058 Allow components to belong to multiple categories.
  • File Parsers - Chapters "File Formats", "Local Data Files" (W+H).
    • 1957 Restrictions on merging of microarray files tightened.
    • 2963 When loading network or pattern file, only show valid parent nodes
    • 3027 Annotation file with just Gene Symbol not sufficient
    • 3006 Add support for Affymetrix Gene 1.0 ST whole-transcript and Exon 1.0 ST annotation files
  • Gene Ontology Analysis (W+H)
    • 3042 Prevent autoloading into Ontologizer of HuGene and HuExon 1.0 ST annotation files.
  • GenSpace
    • 2999 Registration issue regarding genSpace/Remote Workspace
  • Marker Annotations (W+H)
    • 2345 Progress window redesigned
    • 3052 CGI filtering on collapsed fields described.
  • Marker sets/arrays phenotypes (W+H)
    • 3025 Remove markers from sets when filtered out from dataset.
  • Menu items (W+H)
    • 2223 Remove Command->Sets functionality from Menu Bar.
  • Master Regulator Analysis (W+H)
    • 2952 MRA version 4 enhancements (2-FET calculation, multiple result display)
    • 3020 add ability to save image of bar code graph
    • 3021 add button to display only intersection set
  • Pattern Discovery (W+H)
    • 3010 In Exhaustive, min support label changed from percent to number
  • Project Folders (W+H)
    • 3051 Disable Save in right-click menu if component does not implement it.
    • 3067 Several types of result node have nonfunctional save methods
    • 2981 Export as tab-delim default setting
  • SAM
    • 2986 Implement new component "Significance Analysis of Microarrays (SAM)"
  • Sequence retriever (W+H)
    • 3064 display all hits if no marker selected.
    • 2985 Grey-out transcript-start choice for protein query.
  • SkyBase (W+H)
    • 2510 Add access to PDB-60 database
  • T-test (W+H)
    • 3044 For sort mode, significant genes first sorted by t-value rather than fold-change.
    • 2724 change procedure in fold change for situation where avg case or avg control is negative
    • 2989 update t-test implementation and math library

Bug fixes (no documentation change)

  • Analysis
    • 2446 problems with parameter panels
  • ANOVA
    • 3087 Anova result does not sort properly by p-value
  • ARACNe
    • 3114 The "Analyze" button is not enabled when cancel ARACNe analyze process.
    • 3105 java.lang.NoClassDefFoundError: AracneComputation
    • 3040 resampling during bootstrapping fails
    • 2885 refactoring the code of launching MINDY and ARACNE analysis
  • CaGrid
    • 2892 Out-of-memory error for ARACNE service (major revisions to ARACNe grid service)
  • Cellular Networks KB
    • 3008 CNKB does not receive marker panel selections under Java 7
    • 2961 add .txt file filter to CNKB file export
    • 2974 Exported interactome contain lines with single gene symbol
  • Color mosaic
    • 3043 Exception seen when cycling between different viewers
    • 2296 Clicking print button changes display size
    • 3011 Java 7 problem: array sets cause red display
  • Cytoscape
    • 3074 Exception when attempt to create subnetwork
    • 3094 Correlation cutoff of zero filters out all interactions
  • Dataset history
    • 2794 MRA doesn't report parameter
  • Experiment info
    • 2835 Autorefresh seems to work only under certain conditions
  • Expression profiles
    • 2980 Expression profile does not plot if have activated array set in Java 7
  • File Parsers
    • 3026 1 GO-related error message, a few mistakes
    • 3030 Internal annotations lost
    • 2888 Marker Sorting by Gene Name doesn't work properly
    • 3053 Exceptions in GO Viewer when annotation file used with superset of markers.
  • Fold Change Analysis
    • 3049 controls not monitored for parameter changes
  • Gene Ontology Viewer
    • 3018 Gene Ontology Viewer shows wrong genes, when table is sorted
  • GenSpace
    • 3093 "Add as friend" button missing
    • 3076 Exception after try to run analysis without ethernet connection
    • 2938 Once stars are present, may not be able to further update
  • Help files
    • 3088 ARACNE Analysis help content not found.
  • Hierarchical Clustering
    • 2995 Code Cleaning-up for Hierarchical Clustering
  • IDEA
    • 3002 elapsed time calculation
    • 3129 proper display of "Chromosomal Band" value.
  • Jmol
    • 2894 Update to JMOL 12.2.24 Jmol.jar
  • Marker sets/arrays phenotypes
    • 2994 Selective marker selection opens right-click window
    • 3056 EDT exception on activating large set
  • Master Regulator Analysis
    • 2969 If no network loaded, File Load button INOP
    • 2671 Network not always cleared
    • 2786 Rename tab
    • 2601 re-enable marker loading from file
    • 3079 MRA results differ depending on if markers loaded as symbols or probesets
  • MatrixReduce
    • 3089 Not all parameters saved
  • Microarray Viewer
    • 3104 Microarray Viewer does not respond to marker set selections
    • 3071 Activating marker set causes empty display in Microarray Viewer under Java 7
  • MINDy
    • 3035 Local and Grid produce different results
    • 2968 Heat map scrunched and on scroll, get third map
    • 2951, 3057 Notify user if hub or modulators not in target set.
  • Normalization panel
    • 3097 Datafile section appeared multiple times in dataset history
  • Other
    • 2983 DSPattern, the classes that implement it, and the interface that extends it are pathological.
  • Pattern Discovery
    • 2978 In result full sequence view, tooltip positions only on first sequence
    • 2759 Better enforcement of parameter settings needed
    • 2611 Problems loading files/Refreshing GUI
  • PCA
    • 2359 Saving server settings causes crash
    • 2683 fix precision in text field on 3D PCA
  • Position histogram
    • 3007 Problems with pane resizing
  • Project Folders
    • 3103 image snapshot of pathway diagram appears in closed parent node
    • 3054 Exception on save network node
    • 300 need array info
    • 2971 improve file exists warning on write
  • Promoter Panel
    • 3099 fixed problem with missing tooltip information
  • Sequence Retriever
    • 3082 sequence retriever continues after warning no markers selected
    • 3070 exception after marker set activate/deactivate cycles
  • SkyBase
    • 3108 Skybase error
  • T-test
    • 2982 Number presentation for plot and hover box not in sync
    • 3048 "data is log2 transformed" check box not hooked up to parameter saving mechanism

Other Changes to Wiki tutorials

  • Analysis Framework (#2900) updated description of triggering parameter check in text fields.
  • ARACNe - updated description of Adjacency file. (W+H)
  • t-test and Volcano Plot - updated description of how fold change is calculated in t-test (log2 values always reported).
  • Color Mosaic - updated due to change in t-test math library. (W+H)
  • MRA (#2793) - Added note that no marker sets should be activated during MRA analysis.

Components

List of Included Components

Data Management:

  • Arrays/Phenotypes
  • Markers
  • Preferences
  • Project Panel
  • Session manager - no one knows what this is - probably a SOAP interface. But it is definitely needed! (check for 2.4.0)

File input formats

  • Affy File Format
  • CEL File Loader
  • Exp. Format
  • FASTA Format
  • Genepix File Format
  • PDB Structure Format
  • Tab-delimited (RMA Express Format)

Connectivity

  • caArray2 - compatible with caArray 2.5.0 and higher. The caArray client jar is NOT backwards-compatible with any earlier versions.

Data filters

  • Filtering
  • Affy Detection Call Filter
  • Coefficient of Variation (new)
  • Deviation Filter
  • Expression Threshold Filter
  • Genepix Filter (Two channel filter)
  • Genepix Flag Filter
  • Missing Values Filter
  • Multiple Probeset Filter
  • Entrez GeneID Filter

Normalization

  • HouseKeeping Genes Normalizer
  • Normalization
  • Log2 Transformation
  • Marker Centering Normalizer
  • Mean Variance Normalizer
  • Missing Values (Normalizer)
  • Microarray Centering Normalizer
  • Quantile Normalizer
  • Threshold Normalizer

Experiment Information


Analysis/Visualization

  • Alignment Results
  • Analysis
  • ANOVA
  • ARACNe2 - adds Adaptive Partitioning algorithm and Preprocessing mode.
  • caBIO Pathways (this has been integrated in the Marker Annotations component)
  • Cancer Gene Index integration in the Marker Annotations component.
  • CELImageViewer
  • Cellular Networks Knowledge Base
  • Color Mosaic
  • Component Configuration Manager
  • Cytoscape_V2_8
  • Dendrogram
  • Expression Profiles
  • Expression Value Distribution
  • Fold-change Analysis
  • Gene Ontology Enrichment Analysis and Display
  • genSpace collaborative framework
  • Hierarchical Clustering Analysis
  • IDEA
  • Image Viewer
  • Jmol
  • Marker Annotations
  • MarkUs - Analysis and Viewer
  • MRA - Master Regulator Analysis
  • MatrixREDUCE
  • Microarray Viewer
  • MINDy - Analysis and Viewer
  • Pattern Discovery
  • Position Histogram
  • Pudge
  • Promoter
  • SAM
  • Scatter Plot
  • Sequence
  • Sequence Alignment
  • Sequence Retriever
  • SOM Analysis
  • SOM Clusters
  • t Test Analysis
  • Tabular Microarray Viewer
  • Volcano Plot
  • GenePattern components
    • PCA (GenePattern) - Analysis and Viewer
    • K-nearest neighbors (GenePattern)
    • SVM 3.0 (GenePattern) - Analysis and Viewer - include, we need to develop online help and tutorial (Aris).
    • WV - Weighted Voting (GenePattern)
    • GSEA

Excluded and Dropped Components

The release creation script in build.xml now explicitly includes components by name (previously it excluded components by name). The following is a list of modules known to be excluded.

Excluded components

The following components are excluded for a variety of reasons, most often due to lack of formal requirements documentation and/or associated system test scripts. Some of them should be scheduled for inclusion in the next production release. For modules not found in the current all.xml, a path to the component is shown.

Still under development:

  • CART (GenePattern) - this component has not yet been released. Is part of another component and must be excluded manually from the final installer release build.
  • Cancer-GEMS (awaiting further development from NCI)
  • NetBoost
    • EdgeListFileFormat (NetBoost)
  • Evidence Integration
  • MEDUSA

Not actively being developed:

  • GCRMA Via R CEL Loader (in \geworkbench\src\org\geworkbench\components\parsers)
  • Multi-t-test (OK, but need to understand when it would be used, e.g. after ANOVA, and if it is what we really want).
  • SMLR - Sparse Multinomial Logistic Regression - implementation by John Watkinson.
  • SVM Format (in \geworkbench\src\org\geworkbench\components\parsers) (left over from a John Watkinson project).
  • Synteny (in \geworkbench\components\alignment\src\org\geworkbench\components\alignment\client)
  • t-profiler
  • caScript

Dropped components

These components are not expected to be used again.

  • CuteNet (GeneWays)
  • Column Major Format (in \geworkbench\src\org\geworkbench\components\parsers)
  • Frequency Threshold Filter (There is a class called AllelicFrequencyThresholdFilter in \geworkbench\components\filtering\src\org\geworkbench\components\filtering)
  • GeneOntology (the original component, now replaced by geneontology2/Ontologizer2.0)
  • Genotypic File Format (in \geworkbench\src\org\geworkbench\components\parsers\genotype)
  • Network Browser (was part of Reverse Engineering - would require major rewrite to revive. PathwayDecoder is module name)
  • Pattern Discovery Algorithm (association analysis)
  • Patterns (Pattern Panel) - Omit from release - Appears to have been superseded by the Sequence component.
  • Reverse Engineering (non-ARACNE, unpublished algorithm. PathwayDecoder is module name)
  • Simulation (a student project)


Note - the original "interactions" component was dropped and reimplemented as the Cellular Networks Knowledge Base. In between, the component was briefly called "interactions2".

Externally supplied components

The following components originate external to the geWorkbench source tree:

MatrixReduce

Source

MatrixReduce source code was obtained from the Bussemaker lab and a modified copy saved under: adcvs.cu-genome.org:/cvs/magnet/matrixreduce_distribution. This modified copy contains Java API changes made to integrate with geWorkbench.

Compiling

MatrixReduce is compiled using the following commands:

  • FitModel binary is compiled manually with gcc, with extra flags that tell it not to use Cygwin (-mno-cygwin), to optimize (-O2), and to unroll loops (-funroll-loops):
    • gcc -c -O2 -mno-cygwin -funroll-loops *.c
    • gcc -mno-cygwin -static nrutil.o fncs_cmns.o fncs_seqs.o fncs_tdat.o fncs_seed.o fncs_app1.o fncs_app2.o fncs_nrcs.o fncs_topo.o fncs_mylm.o fncs_bits.o FitModel.o -o FitModel -lm (for Windows and Linux)
    • gcc -mno-cygwin nrutil.o fncs_cmns.o fncs_seqs.o fncs_tdat.o fncs_seed.o fncs_app1.o fncs_app2.o fncs_nrcs.o fncs_topo.o fncs_mylm.o fncs_bits.o FitModel.o -o FitModelMac -lm (for Mac)
  • FitModel.exe bundles both the NR (Numerical Recipes) and GNU libraries.
  • API jar: the Java API jar is created with the makefile under MatrixREDUCE's top directory, using the command "make jar".

Notes

See comment on white spaces in file names/paths in Mantis : http://wiki.c2b2.columbia.edu/mantis/view.php?id=1316

Aracne.jar for MINDY

Although ARACNE is a geWorkbench component, the MINDY component uses a version of ARACNE that is externally maintained. The file aracne.jar is copied directly into the geWorkbench CVS tree.

The location of the external ARACNE code is:

The version of the external ARACNE code is:

MINDy jar file for caGrid

  • Source tree is kept in the geWorkbench local CVS repository.
  • Current version is MINDY-0.3.jar
  • Compile with ant dist-jar. The final jar file will be in the "dist" directory.

Any other components?
