GeWorkbench Release 2.3

General notes on previous geWorkbench releases

Other geWorkbench planning pages

Release Schedule for 2.3.0

  • Code freeze: February 1, 2012
  • System testing started: February 2, 2012
  • System testing end target: February 10, 2012
  • System testing concluded: February 10, 2012
  • Bug fixes concluded: March 9, 2012
  • Final release target: February 28, 2012
  • Actual release date: March 16, 2012

Role Assignments

  • Release Manager – Kenneth Smith
  • Release Engineer – Zheng Ma
  • Tech Lead – Zhou Ji
  • Tester – Udo Többen, and the rest of the bunch
  • Test Manager – Udo Többen
  • Technical Writer – Mary VanGinhoven

Things to remember

  • The Perl script to convert Media Wiki geWorkbench tutorial pages to the format needed for the geWorkbench Java Help system.

Java Version

geWorkbench 2.3.0 was developed and tested using Java 6 (1.6.*).


Known Incompatibilities with Java 1.7.

geWorkbench 2.3.0 was developed and tested using the Java 6 JDK and JRE. Subsequent testing with Java 7 revealed the problems listed below. For this reason, please use only Java 6 JREs when running geWorkbench 2.3.0 or earlier.

  • CNKB - (#3008) activated markers not transferred from Markers component (fixed in development).
  • Color Mosaic - (#3011) activated array set causes color mosaic display to turn red (fixed in development).
  • Expression profiles - (#2980) activated array set causes expression profile not to be drawn (fixed in development).
  • Microarray Viewer - (#3071) no display if marker set activated (fixed in development).
  • Marker Annotations - (#3072) does not receive activated marker set, caBIO client library conflict with Java 7.


Updates to external components 2.3.0

  • caArray - caArray client external v1.0 (no change).
  • caBio - caBIO client 4.3 (no change).
  • caGrid - all services updated to caGrid version 1.4.
  • Cytoscape - updated to version 2.8.
  • GeneOntology OBO file - updated as of 2/1/2012. However, geWorkbench now downloads the latest version each time.
  • JASPAR - (checked 2/1/2012) As of this date, there has been no update to the JASPAR motif files since October 12, 2009. We use the following files from the JASPAR CORE SQL tables directory (http://jaspar.genereg.net/html/DOWNLOAD/jaspar_CORE/non_redundant/all_species/sql_tables/):
    • MATRIX.txt
    • MATRIX_ANNOTATION.txt
    • MATRIX_DATA.txt
  • JMol - version 12.0.45. A new version, 12.2.13, is available but requires code changes in geWorkbench to incorporate. Not included in 2.3.0.
  • Ontologizer - Ontologizer.jar version 2.0, file released 2010-03-10 (no change).

geWorkbench 2.3.0 Grid Service URLs

The default Index Service and Dispatcher Service are hard-coded in configuration file "conf/application.properties". Updating these defaults is part of the release process. That is, for the production version, the production URLs must be entered.

Production URLs for 2.3.0

Index and Dispatcher

Standard geWorkbench Grid Services

where ServiceName is e.g. Anova, Aracne, etc.

MarkUs and Skyline Service URLs

  • bhapp.c2b2.columbia.edu:8080/wsrf/services/cagrid/MarkUs
  • bhapp.c2b2.columbia.edu:8080/wsrf/services/cagrid/SkyLine

MarkUs and Skyline Result URLs

Both MarkUs and Skyline reference a web service to retrieve results. The results remain on the remote server and only the information requested is returned to geWorkbench. As a consequence, the results of these two analyses cannot be preserved indefinitely, even by saving a workspace.

The result URLs are independent of the caGrid index/dispatcher URLs, but they are tightly linked to bhapp.c2b2.columbia.edu.

MarkUs result URL: http://bhapp.c2b2.columbia.edu/MarkUs/cgi-bin/browse.pl?pdb_id=MUS... This is the URL of the MarkUs web site, and it will not change when we move our services around.

Skyline result URL: http://cagridnode.c2b2.columbia.edu:8080/luna/SkyLineData/output, which is a proxy forward to bhapp.c2b2.columbia.edu:8080/SkyLineData/output

When the caGrid index service moves to a new server, we only need to change the Tomcat configuration of these two services so that they register with the new index service. No change to the geWorkbench code is needed for them to work.

Development URLs

All grid services except MarkUs and Skyline use the development index service and dispatcher, and have development grid services.

System Testing

See results at http://afdev.cgc.cpmc.columbia.edu/systemtest/BrowseLogs.php


Known caArray issue that keeps coming up

There is a problem with the caArray server code, in that as long as a geWorkbench session is running, the server retains the last used username/password, if any have been submitted.

See bugs 2022, 2555.


Two problems can arise:

Problem 1

Unfortunately, that defect still exists in the new API. The only situation in which you will see it is as follows:

  • User A connects to caArray via geWorkbench and enters his/her credentials.
  • An anonymous user then connects to caArray using the same geWorkbench instance. This anonymous user can still see User A's protected data.

The bug does not affect any other situations. E.g., if the users are using different instances of geWorkbench, there is no problem. If the second user is passing in a new set of credentials, it's not a problem. It is only a problem when the first user is credentialled and the second user is anonymous, and they are both connecting through the same geWorkbench one after another.

Thanks! Rashmi

Problem 2

Once a username and password have been entered and submitted to caArray, you cannot go back to using no username/password, except by restarting geWorkbench. However, you can still enter a different username/password combination. This is a property of the caArray server-side code. Thus, if you have no valid username/password and enter an incorrect one, you will need to restart geWorkbench before you can again query caArray public experiments (which require no login).

Changes in release 2.3.0

Major changes

  • Array Sets
    • #2730 - Add ability to read in array sets from CSV file.
    • #2828 - Interpret second column of array set CSV file as set names.
  • caArray
    • #2729 - memory requirements during download were dramatically decreased. More than 500 arrays have been downloaded with no adverse impact on memory usage. The previous limit was about 100 arrays before memory was exhausted.
  • CNKB
    • #2613 - Add export of interactome direct to Project
  • Grid Services (caGrid)
    • #2788 - Upgraded to caGrid release 1.4.
    • #2861 - Data transfer from geWorkbench to Dispatcher and from Dispatcher to grid service now uses caTransfer. This allows transfer of much larger files to remote services. Not yet implemented for return direction.
  • Cytoscape
    • #2841 - upgraded to Cytoscape 2.8.
  • File Parsers
    • #2848 - GEO GDS full.soft format handled.
  • Filtering
    • #2784 - dynamic search added to preview dialog on all filters. Searches on both marker and gene symbol.
    • #2777 - "Deviation Filter" renamed to "Standard Deviation Filter".
    • #2844 - "Multiple Gene ID Filter" renamed to "Entrez Gene ID Filter".
  • GUI
    • #2743 - implement new GUI element to invoke analysis
  • IDEA
    • #2416 - Implement new component.
  • MINDy
    • #2795 - Add export of result tables to CSV format file.
  • MRA
    • #2623 - MARINa grid service added (variation on MRA, grid only).
    • #2856 - Export of MRA results table to CSV and tab-delimited format files (User-contributed code).
  • Project Folders
    • #2335 - Export microarray data to standard tab-delimited format. From right-click menu.
    • #2797 - Much faster switching between various data/result nodes for large datasets, through major code improvements.
  • Tabular Microarray Viewer
    • #2762 - Export displayed data in spreadsheet format. Allows a selected subset of data to be exported to a tab-delimited file.
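
As an illustration of the array set CSV layout mentioned under Array Sets above (#2730, #2828), the following is a minimal sketch in Python, not the geWorkbench implementation. It assumes that the first column holds array names and that the second column, when present, holds the set name. The function name read_array_sets, the fallback name "Unnamed set" and the sample array names are hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_array_sets(csv_text, default_set="Unnamed set"):
    """Group array names by set name.

    Assumed layout (not confirmed by the release notes): column 1 is the
    array name, column 2 (optional) is the name of the set it belongs to.
    """
    sets = defaultdict(list)
    for row in csv.reader(io.StringIO(csv_text)):
        cells = [cell.strip() for cell in row]
        if not cells or not cells[0]:
            continue  # skip blank lines and rows without an array name
        array_name = cells[0]
        # Fall back to a placeholder set when the second column is missing or empty.
        set_name = cells[1] if len(cells) > 1 and cells[1] else default_set
        sets[set_name].append(array_name)
    return dict(sets)


# Hypothetical sample data: three arrays assigned to two sets.
example = "GSM1,Tumor\nGSM2,Tumor\nGSM3,Normal\n"
array_sets = read_array_sets(example)
print(array_sets)
```

Grouping by the second column means that one file can define several sets at once, which is the behavior that #2828 describes.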

Other changes

  • Analysis
    • #2754 - all analyses should write timestamp to dataset history.
    • #2872 - Do not close analysis window after parameter setup error.
  • BLAST
    • #2722 - BLAST made a normal analysis component.
    • #2830 - a parsing problem in tblastx results due to changes in the HTML returned by NCBI is fixed in 2.3.0. The number from column "N" was appearing after the score in the e-value column.
    • #2876 - gap costs setting is removed for tblastx.
    • #2880 - In the results table, the number of identities rather than total aligned length was being reported under "align length".
  • caArray
    • #2769 - when more than one array is downloaded at a time, the arrays are automatically merged and the data node is given the name of the parent experiment. Previously, the name of each array was appended to create a very long data node name.
    • #2925 - experiments are now referenced internally in the caArray interface code by their unique experiment ids, not by their names. There are experiments in caArray with duplicate names.
  • CNKB
    • #2696 - Clarified effect of "restrict to genes in microarray set" during interactome export to Project.
    • #2817 - Export interactomes using tab-delimited file format
    • #2881 - Export interactome to project should use interactome name for node name
  • Color Mosaic
    • #2887 - limit size of screenshot to 100 Megapixels to avoid out-of-memory problems.
    • #2889 - Color Mosaic for t-test result incorrectly shows the original dataset when you un-select and re-select "Display" button.
  • Component Configuration Manager
    • #2668 - Cytoscape changed from required to recommended for ARACNe.
    • Cytoscape changed to loaded by default to avoid a windowing problem on first use.
  • Dataset History
    • #2870 - fixed some inconsistencies between histories recorded for local vs grid service runs.
  • Expression Value Distribution (EVD)
    • #2932 - EVD t-test was not interpreting activated array indices properly.
  • File Parsers
    • #2386 - Add ability to load Pattern Discovery "pattern" files directly into project.
    • #2731 - improvements to handling of local OBO files.
    • #2846 - preserve original file type extension in data node name.
  • Fold Change Analysis
    • #2739 - Check for error conditions in Fold Change calculation.
  • Gene Ontology Analysis and Viewer
    • #2753 - Make all columns in results tables sortable.
  • genSpace
    • #2479 - Filtering and Normalization events are now also captured, in addition to analysis events.
    • #2578 - Removing workspace comments was not working.
    • #2586 - Consistency and error checking improved on genSpace server side.
    • #2587 - Proper sizing of workflow graphs on page.
    • #2666 - Problem with remove friend fixed.
    • #2792 - Tool usage statistics not properly refreshing.
    • #2858 - Problems in workflow time window.
    • #2916 - Rating stars were not being displayed.
    • #2920 - After a friend request, the person is shown in your friend list but his or her details are not visible.
    • #2935 - Improvements to handling of workflow comments.
  • Grid Services (caGrid)
    • #2364 - catch and report out-of-memory errors from Dispatcher client.
    • #2790 - clean up memory leaks.
  • Matrix Reduce
    • #2804 - Memory leak on switching between multiple result nodes fixed.
    • #2803 - PSAM Logo diagrams from grid had parsing error.
    • #1555 - MatrixREDUCE did not work when the "Specify Pattern" option was used on Linux and Mac platforms.
  • MINDy
    • #2768 - Remove "Refresh Heat Map" button.
    • #2911, 2949 - The grid service version of MINDy was using activated marker sets rather than the target marker set selected in its own GUI.
    • #2912 - MINDy grid analysis using p-value throws NullPointerException.
    • #2967 - Bonferroni correction was calculated using all markers, not just target set.
  • MRA
    • Changed from two-sided to one-sided (right side, enrichment) FET calculation.
    • #2822 - bar graph calculated using converted p-value instead of t-value.
    • #2853 - MRA result node tooltip now shows number of master regulators.
    • #2757 - changes to export buttons.
  • Menu Bar
    • #2826 - Change "Export" to "File->Save->Dataset".
  • Logging
    • #2719 - Add timestamps for geWorkbench startup and shutdown to stdout.log and stderr.log.
  • Pattern Discovery
    • #2595 - Simplify parameter labels.
    • #2664 - Problems when invalid characters entered.
    • #2721 - Pattern Discovery component converted to regular Analysis component.
    • #2898 - Problems in error dialog when invalid parameters entered.
    • #2976 - Problem with display of motif hits across lines on full sequence view.
    • #2977 - Problem with display of motif on scrolling view.
  • Project Folders
    • #1025 - Fixed problem with representing arrays assigned to more than one set in an EXP format file.
    • #2691 - Display hover text with pattern count for pattern nodes.
  • Sequence Retriever
    • #2023 - Warn user if a query marker has no annotation.
    • #2840 - Add option to only show one transcript per start site
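
The MRA entry above notes that the FET (Fisher's exact test) calculation changed from two-sided to one-sided (right side, enrichment). The sketch below shows what a right-sided test computes for a generic 2x2 contingency table. It is a plain Python illustration under stated assumptions, not the MRA code: the function name fisher_right_tail and the cell names a, b, c, d are hypothetical, and the release notes do not say how MRA builds its table.

```python
from math import comb  # available in Python 3.8 and later


def fisher_right_tail(a, b, c, d):
    """One-sided (right tail) Fisher's exact test for the 2x2 table

        [[a, b],
         [c, d]]

    Returns the probability, with all margins fixed, of seeing a count in
    the top-left cell that is at least as large as the observed value a.
    """
    row1 = a + b          # size of the first row
    col1 = a + c          # size of the first column
    total = a + b + c + d
    largest = min(row1, col1)  # largest value the top-left cell can take
    denominator = comb(total, row1)
    # Sum the hypergeometric probabilities for x = a, a + 1, ..., largest.
    tail = sum(
        comb(col1, x) * comb(total - col1, row1 - x)
        for x in range(a, largest + 1)
    )
    return tail / denominator


p_enrichment = fisher_right_tail(3, 1, 1, 3)
print(p_enrichment)
```

Because only the upper tail is summed, a small p-value here indicates enrichment (more overlap than expected) and never depletion, which is the difference from the two-sided test.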

Changes to Wiki tutorials

  • All analysis, filtering and normalization chapters updated to reflect new dynamic-menu based access to these components and removal of the old "Command Area".
  • BLAST - all screenshots of analysis parameter setting panels were recreated. Text was updated as appropriate to describe new analysis setup and other minor changes.
  • caArray - all relevant screenshots updated because the merge button has been removed. Text updated to explain automatic merge and naming of merged set after experiment only.
  • CNKB - Update text and screenshots pertaining to interactome export to project or file.
  • Color Mosaic - update about memory limit on screenshot size.
  • Cytoscape - UniProt LinkOut workaround described.
  • Data Subsets - Arrays - Added function "Load Set" for loading array sets, plus dynamic search updated. Described using second column of arrays file ("Load Set") to hold set names. Many screenshots updated.
  • Dataset Details - pattern node hover text.
  • EVD - tutorial updated from Help, then ported back to Help.
  • Filtering - dynamic search described.
  • Fold Change - document error condition handling.
  • Gene Ontology Results Viewer - table sorting noted.
  • genSpace - details of how not-yet accepted friend requests are handled were added, as well as denied requests and canceling requests. Noted that filtering and normalization events now captured. Added detail on depiction of repeated steps in workflows (linear vs loops). Limit of 150 on displayed workflows.
  • Grid Services (caGrid) - updated to describe new caTransfer usage, new screenshots of analysis window with URLs.
  • Local Data Files - relevant text and most screen shots updated to reflect removal of "merge" radiobutton and implementation of automerge for microarray data, and to give details of new features such as loading of pattern files.
  • MenuBar - Options for saving files (exp, pdb, adj, fasta). Previously was called "Export". Now same as project right-click file save options.
  • MINDy - revise to remove "Refresh Heat Map" button, add "Export" button. Reshoot most screenshots to update those buttons and analysis framework.
  • MRA - Heavily revised to incorporate addition of MARINa, changes in export options, and changes in bar graphs. All new screenshots.
  • Pattern Discovery - All screenshots revised for new layout, from change to Analysis component and also layout cleanup.
  • Project Folders -
    • Options for saving files (exp, pdb, adj, fasta).
    • Option to save microarray to tab-delimited format.
    • Add description of how an array can be assigned to multiple sets (within one list) in an EXP format file.
  • Promoter - all screenshots updated to reflect recent GUI changes.
  • Sequence Retriever - Warn user if a query marker has no annotation. Add option to only show one transcript per start site. All screenshots updated to reflect new option.
  • Viewing a Microarray Dataset - Export displayed data in spreadsheet format.

Changes to Online Help

  • all existing "Online Help" chapters that were previously ported from the Wiki were updated as needed (essentially all of them).
  • The following wiki tutorials were newly ported to Online Help:
    • Basics - ported to "Introduction" on Online Help.
    • caArray
    • File Formats
    • Fold Change Analysis
    • Gene Ontology Analysis
    • Gene Ontology Viewer
    • Hierarchical Clustering
    • Information Panel - replaced separate Comments, History and Experiment Information entries.
    • Local Data Files
    • Project Folders
    • SOM

Components

List of Included Components

Data Management:

  • Arrays/Phenotypes
  • Markers
  • Preferences
  • Project Panel
  • Session manager - no one knows what this is - probably a SOAP interface. But it is definitely needed!

File input formats

  • Affy File Format
  • CEL File Loader
  • Exp. Format
  • FASTA Format
  • Genepix File Format
  • PDB Structure Format
  • Tab-delimited (RMA Express Format)

Connectivity

  • caArray2 - updated to support caArray 2.3.0 in release 1.8.0 (released September 2009). The caArray client jar is NOT backwards-compatible with any previous versions.

Data filters

  • Filtering
  • Affy Detection Call Filter
  • Coefficient of Variation (new)
  • Deviation Filter
  • Expression Threshold Filter
  • Genepix Filter (Two channel filter)
  • Genepix Flag Filter
  • Missing Values Filter
  • Multiple Probeset Filter
  • Entrez GeneID Filter

Normalization

  • HouseKeeping Genes Normalizer
  • Normalization
  • Log2 Transformation
  • Marker Centering Normalizer
  • Mean Variance Normalizer
  • Missing Values (Normalizer)
  • Microarray Centering Normalizer
  • Quantile Normalizer
  • Threshold Normalizer

Experiment Information


Analysis/Visualization

  • Alignment Results
  • Analysis
  • ANOVA
  • ARACNe2 - adds Adaptive Partitioning algorithm and Preprocessing mode.
  • caBIO Pathways (this has been integrated in the Marker Annotations component)
  • Cancer Gene Index integration in the Marker Annotations component.
  • CELImageViewer
  • Cellular Networks Knowledge Base
  • Color Mosaic
  • Component Configuration Manager.
  • Cytoscape_V2_8 - updated version of Cytoscape.
  • Dendrogram
  • Expression Profiles
  • Expression Value Distribution
  • Fold-change Analysis
  • Gene Ontology Enrichment Analysis and Display
  • genSpace collaborative framework
  • Hierarchical Clustering Analysis
  • IDEA
  • Image Viewer
  • Jmol
  • Marker Annotations
  • MarkUs - Analysis and Viewer
  • MRA - Master Regulator Analysis
  • MatrixREDUCE
  • Microarray Viewer
  • MINDy - Analysis and Viewer
  • Pattern Discovery
  • Position Histogram
  • Pudge
  • Promoter
  • Scatter Plot
  • Sequence
  • Sequence Alignment
  • Sequence Retriever
  • SOM Analysis
  • SOM Clusters
  • t Test Analysis
  • Tabular Microarray Viewer
  • Volcano Plot
  • GenePattern components
    • PCA (GenePattern) - Analysis and Viewer
    • K-nearest neighbors (GenePattern)
    • SVM 3.0 (GenePattern) - Analysis and Viewer - include, we need to develop online help and tutorial (Aris).
    • WV - Weighted Voting (GenePattern)
    • GSEA

Excluded and Dropped Components

The release creation script in build.xml now explicitly includes components by name (previously it excluded components by name). The following is a list of modules known to be excluded.

Excluded components

The following components are excluded for a variety of reasons, most often due to a lack of formal requirements documentation and/or associated system test scripts. Some of them should be scheduled for inclusion in the next production release. For modules not found in the current all.xml, a path to the component is shown.

Still under development:

  • CART (GenePattern) - this component has not yet been released. It is part of another component and must be excluded manually from the final installer release build.
  • Cancer-GEMS (awaiting further development from NCI)
  • NetBoost
    • EdgeListFileFormat (NetBoost)
  • Evidence Integration
  • MEDUSA

Not actively being developed:

  • GCRMA Via R CEL Loader (in \geworkbench\src\org\geworkbench\components\parsers)
  • Multi-t-test (OK, but need to understand when it would be used, e.g. after ANOVA, and if it is what we really want).
  • SMLR - Sparse Multinomial Logistic Regression - implementation by John Watkinson.
  • SVM Format (in \geworkbench\src\org\geworkbench\components\parsers) (left over from a John Watkinson project).
  • Synteny (in \geworkbench\components\alignment\src\org\geworkbench\components\alignment\client)
  • t-profiler
  • caScript

Dropped components

These components are not expected to be used again.

  • CuteNet (GeneWays)
  • Column Major Format (in \geworkbench\src\org\geworkbench\components\parsers)
  • Frequency Threshold Filter (There is a class called AllelicFrequencyThresholdFilter in \geworkbench\components\filtering\src\org\geworkbench\components\filtering)
  • GeneOntology (the original component, now replaced by geneontology2/Ontologizer2.0)
  • Genotypic File Format (in \geworkbench\src\org\geworkbench\components\parsers\genotype)
  • Network Browser (was part of Reverse Engineering - would require major rewrite to revive. PathwayDecoder is module name)
  • Pattern Discovery Algorithm (association analysis)
  • Patterns (Pattern Panel) - Omit from release - Appears to have been superseded by the Sequence component.
  • Reverse Engineering (non-ARACNE, unpublished algorithm. PathwayDecoder is module name)
  • Simulation (a student project)


Note - the original "interactions" component was dropped and reimplemented as the Cellular Networks Knowledge Base. It took a brief detour as being called component "interactions2".

Externally supplied components

The following components originate external to the geWorkbench source tree:

MatrixReduce

Source

MatrixReduce source code was obtained from the Bussemaker lab and a modified copy saved under: adcvs.cu-genome.org:/cvs/magnet/matrixreduce_distribution. This modified copy contains Java API changes made to integrate with geWorkbench.

Compiling

MatrixReduce is compiled using the following commands:

  • FitModel binary is compiled manually as follows
    • gcc -c -O2 -mno-cygwin -funroll-loops *.c
    • gcc -mno-cygwin -static nrutil.o fncs_cmns.o fncs_seqs.o fncs_tdat.o fncs_seed.o fncs_app1.o fncs_app2.o fncs_nrcs.o fncs_topo.o fncs_mylm.o fncs_bits.o FitModel.o -o FitModel -lm (for Windows and Linux)
    • gcc -mno-cygwin nrutil.o fncs_cmns.o fncs_seqs.o fncs_tdat.o fncs_seed.o fncs_app1.o fncs_app2.o fncs_nrcs.o fncs_topo.o fncs_mylm.o fncs_bits.o FitModel.o -o FitModelMac -lm (for Mac)
  • API jar: The Java API jar is created with the makefile, command "make jar".
  • FitModel binary is compiled manually with gcc, with extra flags to tell it to not use Cygwin, to optimize and to unroll loops
  • FitModel.exe bundles both the NR (Numerical Recipes) and GNU libraries.
  • The API jar is created with the makefile under MatrixREDUCE's top directory.

Notes

See the comment on white space in file names/paths in Mantis: http://mantis.cu-genome.org/view.php?id=1316

Aracne.jar for MINDY

Although ARACNE is a geWorkbench component, the MINDY component uses a version of ARACNE that is externally maintained. The file aracne.jar is copied directly into the geWorkbench CVS tree.

The location of the external ARACNE code is:

The version of the external ARACNE code is:

MINDy jar file for caGrid

  • Source tree is kept in the geWorkbench local CVS repository.
  • Current version is MINDY-0.3.jar
  • Compile with ant dist-jar. The final jar file will be in the "dist" directory.

Any other components?
