1 General notes on previous geWorkbench releases
2 Other geWorkbench planning pages
3 Release Schedule for 2.3.0
4 Role Assignments
5 Things to remember
6 Java Version
- 6.1 Known Incompatibilities with Java 1.7.
7 Updates to external components 2.3.0
8 geWorkbench 2.3.0 Grid Service URLs
- 8.1 Production URLs for 2.3.0
- 8.2 Development URLs
9 System Testing
10 Known caArray issue that keeps coming up
- 10.1 Problem 1
- 10.2 Problem 2
11 Changes in release 2.3.0
12 Components

General notes on previous geWorkbench releases

General notes, feature requests and FAQ page - This page was started with material from the time of release 1.7.0 and will be updated continually.
Links to other release pages:

Other geWorkbench planning pages

geWorkbench Development Notes - contains notes for several changes that were part of release 2.2.0.

GeWorkbench TODO List - items that need to be planned or documented.

System Test Results Log

Nikhil's page for geWorkbench web notes: http://wiki.c2b2.columbia.edu/informatics/index.php/GeWorkbench-Web

The geWorkbench Roadmap (local version) contains possible directions for future development.

caBIG has a separate geWorkbench Roadmap page that we must maintain.

caBIG/NCI also provides the official download page for geWorkbench

Release Schedule for 2.3.0

Code freeze: February 1, 2012
System testing started: February 2, 2012
System testing end target: February 10, 2012
System testing concluded: February 10, 2012
Bug fixes concluded: March 9, 2012
Final release target: February 28, 2012
Actual release date: March 16, 2012

Role Assignments

Release Manager – Kenneth Smith
Release Engineer – Zheng Ma
Tech Lead – Zhou Ji
Tester – Udo Többen, and the rest of the bunch
Test Manager – Udo Többen
Technical Writer – Mary VanGinhoven

Things to remember

Best practices for defect management - See also Aris's email of 8/20/09 on this topic.
geWorkbench Roadmap page at NCICB - keep up to date with actual plans and developments - at https://cabig-kc.nci.nih.gov/Molecular/KC/index.php/GeWorkbench_Roadmap
InstallAnywhere JRE update packs: http://www.flexerasoftware.com/products/installanywhere/files-utilities.htm
Our local caArray is at afapp1.c2b2.columbia.edu port 38080 (web interface).

The Perl script to convert MediaWiki geWorkbench tutorial pages to the format needed for the geWorkbench Java Help system.

Java Version

geWorkbench 2.3.0 was developed and tested using Java 6 (1.6.*).

Known Incompatibilities with Java 1.7.

geWorkbench 2.3.0 was developed and tested using the Java 6 JDK and JRE. Subsequent testing with Java 7 revealed a number of problems, listed below. For this reason, please use only Java 6 JREs when running geWorkbench 2.3.0 or earlier.

CNKB - (#3008) activated markers not transferred from Markers component (fixed in development).
Color Mosaic - (#3011) activated array set causes color mosaic display to turn red (fixed in development).
Expression profiles - (#2980) activated array set causes expression profile not to be drawn (fixed in development).
Microarray Viewer - (#3071) no display if marker set activated (fixed in development).
Marker Annotations - (#3072) does not receive activated marker set, caBIO client library conflict with Java 7.
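
A launcher or support script could guard against these problems by checking the JRE version string before starting geWorkbench. The following is a minimal sketch and not part of geWorkbench: the function names are hypothetical, and it assumes the standard Java version string formats ("1.6.0_31" for Java 6, "1.7.0_03" for Java 7, and "9.0.1"-style strings for Java 9 and later).

```python
import re


def java_major_version(version_string):
    """Return the major Java version from a java.version-style string."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version_string.strip())
    if match is None:
        raise ValueError("unrecognized Java version: %r" % version_string)
    first = int(match.group(1))
    # Legacy scheme (up to Java 8): "1.<major>.<minor>_<update>"
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    # Newer scheme (Java 9 and later): "<major>.<minor>.<security>"
    return first


def is_supported_for_2_3_0(version_string):
    """geWorkbench 2.3.0 was developed and tested on Java 6 only."""
    return java_major_version(version_string) == 6
```

With this sketch, a Java 6 string such as "1.6.0_31" is accepted and a Java 7 string such as "1.7.0_03" is rejected.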

Updates to external components 2.3.0

caArray - caArray client external v1.0 (no change).
caBio - caBIO client 4.3 (no change).
caGrid - all services updated to caGrid version 1.4.
Cytoscape - updated to version 2.8.
GeneOntology OBO file - updated as of 2/1/2012. However, geWorkbench now downloads the latest version each time.
JASPAR - (checked 2/1/2012) As of this date, there has been no update to the JASPAR motif files since October 12, 2009. We use the following files from the JASPAR CORE SQL tables directory (http://jaspar.genereg.net/html/DOWNLOAD/jaspar_CORE/non_redundant/all_species/sql_tables/):
- MATRIX.txt
- MATRIX_ANNOTATION.txt
- MATRIX_DATA.txt
Jmol - version 12.0.45. A newer version, 12.2.13, is available but would require code changes in geWorkbench to incorporate, so it is not included in 2.3.0.
Ontologizer - Ontologizer.jar version 2.0, file released 2010-03-10 (no change).

geWorkbench 2.3.0 Grid Service URLs

The default Index Service and Dispatcher Service are hard-coded in the configuration file "conf/application.properties". Updating these defaults is part of the release process: for the production version, the production URLs must be entered.

Production URLs for 2.3.0

Index and Dispatcher

Default Index Service: http://cagridnode.c2b2.columbia.edu:8080/v2.3.0/wsrf/services/DefaultIndexService
Default Dispatcher Service: http://cagridnode.c2b2.columbia.edu:8080/v2.3.0/wsrf/services/cagrid/Dispatcher
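
Because these defaults are changed by hand at release time, a small pre-release check can catch a build that still points at a non-production server. The sketch below is an illustration only. The property keys `index.server.url` and `dispatcher.server.url` are hypothetical (the actual key names in conf/application.properties are not given on this page), and the parser handles only simple `key=value` lines.

```python
from urllib.parse import urlparse

# Host of the production Index and Dispatcher URLs listed above.
PRODUCTION_HOST = "cagridnode.c2b2.columbia.edu"


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("!"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def non_production_keys(props, keys=("index.server.url", "dispatcher.server.url")):
    """Return the keys (hypothetical names) that are missing or not on the production host."""
    flagged = []
    for key in keys:
        host = urlparse(props.get(key, "")).hostname
        if host != PRODUCTION_HOST:
            flagged.append(key)
    return flagged


# Hypothetical excerpt in which the dispatcher still points at the development server.
sample = """
index.server.url=http://cagridnode.c2b2.columbia.edu:8080/v2.3.0/wsrf/services/DefaultIndexService
dispatcher.server.url=http://afdev.c2b2.columbia.edu:8080/wsrf/services/cagrid/Dispatcher
"""
flagged = non_production_keys(parse_properties(sample))
```

Here `flagged` contains only the dispatcher key, since its host is the development server rather than the production one.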

Standard geWorkbench Grid Services

http://geworkbench1.c2b2.columbia.edu:8080/wsrf/services/caGrid/ServiceName

where ServiceName is e.g. Anova, Aracne, etc.
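
As a small illustration of the pattern above, the URL for a given service is formed by appending its name to the common base. The helper name below is hypothetical, and Anova and Aracne are the only service names taken from this page.

```python
GRID_SERVICE_BASE = "http://geworkbench1.c2b2.columbia.edu:8080/wsrf/services/caGrid/"


def grid_service_url(service_name):
    """Build the URL of a standard geWorkbench grid service from its name."""
    if not service_name or "/" in service_name:
        raise ValueError("service name must be a single non-empty path segment")
    return GRID_SERVICE_BASE + service_name


anova_url = grid_service_url("Anova")
aracne_url = grid_service_url("Aracne")
```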

MarkUs and Skyline Service URLs

bhapp.c2b2.columbia.edu:8080/wsrf/services/cagrid/MarkUs
bhapp.c2b2.columbia.edu:8080/wsrf/services/cagrid/SkyLine

MarkUs and Skyline Result URLs

Both MarkUs and Skyline reference a web service to retrieve results. The results remain on the remote server and only the information requested is returned to geWorkbench. As a consequence, the results of these two analyses cannot be preserved indefinitely, even by saving a workspace.

The result URLs are independent of the caGrid index/dispatcher URLs, but are tightly linked to bhapp.c2b2.columbia.edu.

MarkUs result URL: http://bhapp.c2b2.columbia.edu/MarkUs/cgi-bin/browse.pl?pdb_id=MUS... This is the URL of the MarkUs web site, and it will not change when we move our services around.

Skyline result URL: http://cagridnode.c2b2.columbia.edu:8080/luna/SkyLineData/output which is a proxy forward to bhapp.c2b2.columbia.edu:8080/SkyLineData/output

When the caGrid index service moves to a new server, we only need to change the Tomcat configuration of these two services so that they register with the new index service. No change to geWorkbench code is needed for them to work.

Development URLs

All grid services except MarkUs and Skyline use the development index service and dispatcher, and have development grid services.

Default Index Service: http://afdev.c2b2.columbia.edu:8080/wsrf/services/DefaultIndexService
Default Dispatcher Service: http://afdev.c2b2.columbia.edu:8080/wsrf/services/cagrid/Dispatcher

System Testing

See results at http://afdev.cgc.cpmc.columbia.edu/systemtest/BrowseLogs.php

Known caArray issue that keeps coming up

There is a problem with the caArray server code, in that as long as a geWorkbench session is running, the server retains the last used username/password, if any have been submitted.

See bugs 2022, 2555.

Two problems can arise:

Problem 1

Unfortunately, that defect still exists in the new API. The only situation in which you will see it is as follows:

User A connects to caArray via geWorkbench and enters his/her credentials.
An anonymous user then connects to caArray using the same geWorkbench instance. This anonymous user can still see User A's protected data.

The bug does not affect any other situations. E.g., if the users are using different instances of geWorkbench, there is no problem. If the second user is passing in a new set of credentials, it's not a problem. It is only a problem when the first user is credentialled and the second user is anonymous, and they are both connecting through the same geWorkbench one after another.

Thanks! Rashmi

Problem 2

Once a username and password have been entered and submitted to caArray, you cannot go back to using no username/password, except by restarting geWorkbench. However, you can still put in a different username/password combination. This is a property of the caArray server-side code. Thus, if you have no valid username/password and enter an incorrect one, you will need to restart geWorkbench before you can query caArray public experiments again (no login required).

Changes in release 2.3.0

Major changes

Array Sets
- #2730 - Add ability to read in array sets from CSV file.
- #2828 - Interpret second column of array set CSV file as set names.
caArray
- #2729 - memory requirements during download were dramatically decreased. More than 500 arrays have been downloaded with no adverse impact on memory usage. The previous limit was about 100 arrays before memory was exhausted.
CNKB
- #2613 - Add export of interactome direct to Project
Grid Services (caGrid)
- #2788 - Upgraded to caGrid release 1.4.
- #2861 - Data transfer from geWorkbench to Dispatcher and from Dispatcher to grid service now uses caTransfer. This allows transfer of much larger files to remote services. Not yet implemented for return direction.
Cytoscape
- #2841 - upgraded to Cytoscape 2.8.
File Parsers
- #2848 - GEO GDS full.soft format handled.
Filtering
- #2784 - dynamic search added to preview dialog on all filters. Searches on both marker and gene symbol.
- #2777 - "Deviation Filter" renamed to "Standard Deviation Filter".
- #2844 - "Multiple Gene ID Filter" renamed to "Entrez Gene ID Filter".
GUI
- #2743 - implement new GUI element to invoke analysis
IDEA
- #2416 - Implement new component.
MINDy
- #2795 - Add export of result tables to CSV format file.
MRA
- #2623 - MARINa grid service added (variation on MRA, grid only).
- #2856 - Export of MRA results table to CSV and tab-delimited format files (User-contributed code).
Project Folders
- #2335 - Export microarray data to standard tab-delimited format. From right-click menu.
- #2797 - Much faster switching between various data/result nodes for large datasets, through major code improvements.
Tabular Microarray Viewer
- #2762 - Export displayed data in spreadsheet format. Allows a selected subset of data to be exported to a tab-delimited file.
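
To illustrate the array set CSV items above (#2730, #2828), here is a minimal sketch of how such a file could be grouped into sets, assuming the first column holds the array name and the second column the set name. This is not geWorkbench's parser: the function name, the sample data, and the fallback set name used when the second column is absent are all assumptions.

```python
import csv
import io


def read_array_sets(csv_text, default_set="Unnamed set"):
    """Group array names (column 1) by set name (column 2)."""
    sets = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if not row or not row[0].strip():
            continue  # skip blank lines and rows without an array name
        array_name = row[0].strip()
        has_set_name = len(row) > 1 and row[1].strip()
        # default_set is an assumed fallback, not documented geWorkbench behavior
        set_name = row[1].strip() if has_set_name else default_set
        sets.setdefault(set_name, []).append(array_name)
    return sets


# Hypothetical file contents: two sets, with "Control" holding two arrays.
sample_csv = "array_01,Control\narray_02,Treated\narray_03,Control\n"
array_sets = read_array_sets(sample_csv)
```

Because each row is handled independently, the same array name can appear on several rows with different set names and will be added to each of those sets.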

Other changes

Analysis
- #2754 - all analyses should write timestamp to dataset history.
- #2872 - Do not close analysis window after parameter setup error.
BLAST
- #2722 - BLAST made a normal analysis component.
- #2830 - a parsing problem in tblastx results due to changes in the HTML returned by NCBI is fixed in 2.3.0. The number from column "N" was appearing after the score in the e-value column.
- #2876 - gap costs setting is removed for tblastx.
- #2880 - In the results table, the number of identities rather than total aligned length was being reported under "align length".
caArray
- #2769 - when more than one array is downloaded at a time, the arrays are automatically merged and the data node is given the name of the parent experiment. Previously, the name of each array was appended to create a very long data node name.
- #2925 - experiments are now referenced internally in the caArray interface code by their unique experiment ids, not by their names. There are experiments in caArray with duplicate names.
CNKB
- #2696 - Clarified effect of "restrict to genes in microarray set" during interactome export to Project.
- #2817 - Export interactomes using tab-delimited file format
- #2881 - Export interactome to project should use interactome name for node name
Color Mosaic
- #2887 - limit size of screenshot to 100 Megapixels to avoid out-of-memory problems.
- #2889 - Color Mosaic for t-test result incorrectly shows the original dataset when you un-select and re-select "Display" button.
Component Configuration Manager
- #2668 - Cytoscape changed from required to recommended for ARACNe.
- Cytoscape changed to loaded by default to avoid a windowing problem on first use.
Dataset History
- #2870 - fixed some inconsistencies between histories recorded for local vs grid service runs.
Expression Value Distribution (EVD)
- #2932 - EVD t-test was not interpreting activated array indices properly.
File Parsers
- #2386 - Add ability to load Pattern Discovery "pattern" files directly into project.
- #2731 - improvements to handling of local OBO files.
- #2846 - preserve original file type extension in data node name.
Fold Change Analysis
- #2739 - Check for error conditions in Fold Change calculation.
Gene Ontology Analysis and Viewer
- #2753 - Make all columns in results tables sortable.
genSpace
- #2479 - Filtering and Normalization events are now also captured, in addition to analysis events.
- #2578 - Removing workspace comments was not working.
- #2586 - Consistency and error checking improved on genSpace server side.
- #2587 - Proper sizing of workflow graphs on page.
- #2666 - Problem with remove friend fixed.
- #2792 - Tool usage statistics not properly refreshing.
- #2858 - Problems in workflow time window.
- #2916 - Rating stars were not being displayed.
- #2920 - After a friend request, the person is shown in your friend list but his or her details are not visible.
- #2935 - Improvements to handling of workflow comments.
Grid Services (caGrid)
- #2364 - catch and report out-of-memory errors from Dispatcher client.
- #2790 - clean up memory leaks.
Matrix Reduce
- #2804 - Memory leak on switching between multiple result nodes fixed.
- #2803 - PSAM Logo diagrams from grid had parsing error.
- #1555 - MatrixREDUCE did not work when the "Specify Pattern" option was used on Linux and Mac platforms.
MINDy
- #2768 - Remove "Refresh Heat Map" button.
- #2911, 2949 - The grid service version of MINDy was using activated marker sets rather than the target marker set selected in its own GUI.
- #2912 - MINDy grid analysis using p-value throws NullPointerException.
- #2967 - Bonferroni correction was calculated using all markers, not just target set.
MRA
- Changed from two-sided to one-sided (right side, enrichment) FET calculation.
- #2822 - bar graph calculated using converted p-value instead of t-value.
- #2853 - MRA result node tooltip now shows number of master regulators.
- #2757 - changes to export buttons.
Menu Bar
- #2826 - Change "Export" to "File->Save->Dataset".
Logging
- #2719 - Add timestamps for geWorkbench startup and shutdown to stdout.log and stderr.log.
Pattern Discovery
- #2595 - Simplify parameter labels.
- #2664 - Problems when invalid characters entered.
- #2721 - Pattern Discovery component converted to regular Analysis component.
- #2898 - Problems in error dialog when invalid parameters entered.
- #2976 - Problem with display of motif hits across lines on full sequence view.
- #2977 - Problem with display of motif on scrolling view.
Project Folders
- #1025 - Fixed problem with representing arrays assigned to more than one set in an EXP format file.
- #2691 - Display hover text with pattern count for pattern nodes.
Sequence Retriever
- #2023 - Warn user if a query marker has no annotation.
- #2840 - Add option to only show one transcript per start site

Changes to Wiki tutorials

All analysis, filtering and normalization chapters updated to reflect new dynamic-menu based access to these components and removal of the old "Command Area".

BLAST - all screenshots of analysis parameter setting panels were recreated. Text was updated as appropriate to describe new analysis setup and other minor changes.
caArray - all relevant screenshots updated because the merge button has been removed. Text updated to explain automatic merge and naming of merged set after experiment only.
CNKB - Update text and screenshots pertaining to interactome export to project or file.
Color Mosaic - update about memory limit on screenshot size.
Cytoscape - UniProt LinkOut workaround described.
Data Subsets - Arrays - Added function "Load Set" for loading array sets, plus dynamic search updated. Described using second column of arrays file ("Load Set") to hold set names. Many screenshots updated.
Dataset Details - pattern node hover text.
EVD - tutorial updated from Help, then ported back to Help.
Filtering - dynamic search described.
Fold Change - document error condition handling.
Gene Ontology Results Viewer - table sorting noted.
genSpace - details of how not-yet accepted friend requests are handled were added, as well as denied requests and canceling requests. Noted that filtering and normalization events now captured. Added detail on depiction of repeated steps in workflows (linear vs loops). Limit of 150 on displayed workflows.
Grid Services (caGrid) - updated to describe new caTransfer usage, new screenshots of analysis window with URLs.
Local Data Files - relevant text and most screen shots updated to reflect removal of "merge" radiobutton and implementation of automerge for microarray data, and to give details of new features such as loading of pattern files.
MenuBar - Options for saving files (exp, pdb, adj, fasta). Previously was called "Export". Now same as project right-click file save options.
MINDy - revise to remove "Refresh Heat Map" button, add "Export" button. Reshoot most screenshots to update those buttons and analysis framework.
MRA - Heavily revised to incorporate addition of MARINa, changes in export options, and changes in bar graphs. All new screenshots.
Pattern Discovery - All screenshots revised for new layout, from change to Analysis component and also layout cleanup.
Project Folders -
- Options for saving files (exp, pdb, adj, fasta).
- Option to save microarray to tab-delimited format.
- Add description of how an array can be assigned to multiple sets (within one list) in an EXP format file.
Promoter - all screenshots updated to reflect recent GUI changes.
Sequence Retriever - Warn user if a query marker has no annotation. Add option to only show one transcript per start site. All screenshots updated to reflect new option.
Viewing a Microarray Dataset - Export displayed data in spreadsheet format.

Changes to Online Help

All existing "Online Help" chapters that were previously ported from the Wiki were updated as needed (essentially all of them).
The following wiki tutorials were newly ported to Online Help:
- Basics - ported to "Introduction" on Online Help.
- caArray
- File Formats
- Fold Change Analysis
- Gene Ontology Analysis
- Gene Ontology Viewer
- Hierarchical Clustering
- Information Panel - replaced separate Comments, History and Experiment Information entries.
- Local Data Files
- Project Folders
- SOM

Components

List of Included Components

Data Management:

Arrays/Phenotypes
Markers
Preferences
Project Panel
Session manager - no one knows what this is - probably a SOAP interface. But it is definitely needed!

File input formats

Affy File Format
CEL File Loader
Exp. Format
FASTA Format
Genepix File Format
PDB Structure Format
Tab-delimited (RMA Express Format)

Connectivity

caArray2 - updated to support caArray 2.3.0 in release 1.8.0 (released September 2009). The caArray client jar is NOT backwards-compatible with any previous versions.

Data filters

Filtering
Affy Detection Call Filter
Coefficient of Variation (new)
Standard Deviation Filter
Expression Threshold Filter
Genepix Filter (Two channel filter)
Genepix Flag Filter
Missing Values Filter
Multiple Probeset Filter
Entrez GeneID Filter

Normalization

HouseKeeping Genes Normalizer
Normalization
Log2 Transformation
Marker Centering Normalizer
Mean Variance Normalizer
Missing Values (Normalizer)
Microarray Centering Normalizer
Quantile Normalizer
Threshold Normalizer

Experiment Information

Dataset Annotation
Dataset History
Experiment Info
Version Information

Analysis/Visualization

Alignment Results
Analysis
ANOVA
ARACNe2 - adds Adaptive Partitioning algorithm and Preprocessing mode.
caBIO Pathways (this has been integrated in the Marker Annotations component)
Cancer Gene Index integration in the Marker Annotations component.
CELImageViewer
Cellular Networks Knowledge Base
Color Mosaic
Component Configuration Manager.
Cytoscape_V2_8 - updated version of Cytoscape.
Dendrogram
Expression Profiles
Expression Value Distribution
Fold-change Analysis
Gene Ontology Enrichment Analysis and Display
genSpace collaborative framework
Hierarchical Clustering Analysis
IDEA
Image Viewer
Jmol
Marker Annotations
MarkUs - Analysis and Viewer
MRA - Master Regulator Analysis
MatrixREDUCE
Microarray Viewer
MINDy - Analysis and Viewer
Pattern Discovery
Position Histogram
Pudge
Promoter
Scatter Plot
Sequence
Sequence Alignment
Sequence Retriever
SOM Analysis
SOM Clusters
t Test Analysis
Tabular Microarray Viewer
Volcano Plot

GenePattern components
- PCA (GenePattern) - Analysis and Viewer
- K-nearest neighbors (GenePattern)
- SVM 3.0 (GenePattern) - Analysis and Viewer - include, we need to develop online help and tutorial (Aris).
- WV - Weighted Voting (GenePattern)
- GSEA

Excluded and Dropped Components

The release creation script in build.xml now explicitly includes components by name (previously it excluded components by name). The following is a list of modules known to be excluded.

Excluded components

The following components are excluded for a variety of reasons, most often due to lack of formal requirements documentation and/or associated system test scripts. Some of them should be scheduled for inclusion in the next production release. For modules not found in the current all.xml, a path to the component is shown.

Still under development:

CART (GenePattern) - this component has not yet been released. It is part of another component and must be excluded manually from the final installer release build.
Cancer-GEMS (awaiting further development from NCI)
NetBoost
- EdgeListFileFormat (NetBoost)
Evidence Integration
MEDUSA

Not actively being developed:

GCRMA Via R CEL Loader (in \geworkbench\src\org\geworkbench\components\parsers)
Multi-t-test (OK, but need to understand when it would be used, e.g. after ANOVA, and if it is what we really want).
SMLR - Sparse Multinomial Logistic Regression - implementation by John Watkinson.
SVM Format (in \geworkbench\src\org\geworkbench\components\parsers) (left over from a John Watkinson project).
Synteny (in \geworkbench\components\alignment\src\org\geworkbench\components\alignment\client)
t-profiler
caScript

Dropped components

These components are not expected to be used again.

CuteNet (GeneWays)
Column Major Format (in \geworkbench\src\org\geworkbench\components\parsers)
Frequency Threshold Filter (There is a class called AllelicFrequencyThresholdFilter in \geworkbench\components\filtering\src\org\geworkbench\components\filtering)
GeneOntology (the original component, now replaced by geneontology2/Ontologizer2.0)
Genotypic File Format (in \geworkbench\src\org\geworkbench\components\parsers\genotype)
Network Browser (was part of Reverse Engineering - would require major rewrite to revive. PathwayDecoder is module name)
Pattern Discovery Algorithm (association analysis)
Patterns (Pattern Panel) - Omit from release - Appears to have been superseded by the Sequence component.
Reverse Engineering (non-ARACNE, unpublished algorithm. PathwayDecoder is module name)
Simulation (a student project)

Note - the original "interactions" component was dropped and reimplemented as the Cellular Networks Knowledge Base. It took a brief detour as being called component "interactions2".

Externally supplied components

The following components originate external to the geWorkbench source tree:

MatrixReduce

Source

MatrixReduce source code was obtained from the Bussemaker lab and a modified copy saved under: adcvs.cu-genome.org:/cvs/magnet/matrixreduce_distribution. This modified copy contains Java API changes made to integrate with geWorkbench.

Compiling

MatrixReduce is compiled using the following commands:

FitModel binary is compiled manually as follows
- gcc -c -O2 -mno-cygwin -funroll-loops *.c
- gcc -mno-cygwin -static nrutil.o fncs_cmns.o fncs_seqs.o fncs_tdat.o fncs_seed.o fncs_app1.o fncs_app2.o fncs_nrcs.o fncs_topo.o fncs_mylm.o fncs_bits.o FitModel.o -o FitModel -lm (for Windows and Linux)
- gcc -mno-cygwin nrutil.o fncs_cmns.o fncs_seqs.o fncs_tdat.o fncs_seed.o fncs_app1.o fncs_app2.o fncs_nrcs.o fncs_topo.o fncs_mylm.o fncs_bits.o FitModel.o -o FitModelMac -lm (for Mac)

API jar: The Java API jar is created with the makefile, command "make jar".
FitModel binary is compiled manually with gcc, with extra flags to tell it not to use Cygwin, to optimize, and to unroll loops.
FitModel.exe bundles both the NR (Numerical Recipes) and GNU libraries.

The API jar is created with the makefile under MatrixREDUCE's top directory.

Notes

See the comment on white space in file names/paths in Mantis: http://mantis.cu-genome.org/view.php?id=1316

Aracne.jar for MINDY

Although ARACNE is a geWorkbench component, the MINDY component uses a version of ARACNE that is externally maintained. The file aracne.jar is copied directly into the geWorkbench CVS tree.

The location of the external ARACNE code is:

The version of the external ARACNE code is:

MINDy jar file for caGrid

Source tree is kept in the geWorkbench local CVS repository.
Current version is MINDY-0.3.jar
Compile with ant dist-jar. The final jar file will be in the "dist" directory.

GeWorkbench Release 2.3