geWorkbench

Overview

geWorkbench version 2.0.0 was released on June 9th, 2010. We recommend that all users upgrade to the latest version. The Release Notes and downloads can be obtained from https://gforge.nci.nih.gov/frs/?group_id=78, and installation instructions can be found on the Download and Installation page of this Wiki.

geWorkbench (genomics Workbench) is a Java-based open-source platform for integrated genomics. Using a component architecture it allows individually developed plug-ins to be configured into complex bioinformatic applications. At present there are more than 70 available plug-ins supporting the visualization and analysis of gene expression and sequence data. Example use cases include:

loading data from local or remote data sources.
visualizing gene expression, molecular interaction networks, protein sequence and protein structure data in a variety of ways.
providing access to client- and server-side computational analysis tools such as t-test analysis, hierarchical clustering, self-organizing maps, regulatory network reconstruction, BLAST searches, pattern/motif discovery, etc.
validating computational hypotheses through the integration of gene and pathway annotation information from curated sources as well as through Gene Ontology enrichment analysis.

geWorkbench is the bioinformatics platform of MAGNet, the National Center for the Multi-scale Analysis of Genomic and Cellular Networks (one of the 7 National Centers for Biomedical Computing funded through the NIH Roadmap). Additionally, geWorkbench is supported by caBIG®, NCI's cancer Biomedical Informatics Grid initiative.

End-user and developer support for geWorkbench is provided through the caBIG® Molecular Analysis Tools Knowledge Center, a component of the caBIG® Enterprise Support Network.

Summary of changes in geWorkbench release 2.0.0

Major new features

Filtering - Completely revamped. Now works directly for all modes and allows specification of the minimum percentage of matching arrays required before filtering occurs.
File parsers added:
- MAGE-TAB data matrix
- GEO SOFT format - added series (GSE) and curated matrix (GDS).
Java 6 - Moved from Java 5 to Java 6. geWorkbench now requires Java 6. Works on both 32-bit and 64-bit VMs (JREs).
Look and Feel - Switched to new, more modern Look and Feel (Nimbus). geWorkbench appearance now consistent across all platforms.
caBIO component updated from 4.2 to 4.3.
Cellular Network Knowledge Base (CNKB) - Revamped interface to allow choice of interactome and data types.
More than 250 additional "bug reports" were closed. These included improvements in the usability of numerous components, and actual bug fixes.
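
The minimum-percentage rule mentioned in the Filtering entry above can be illustrated with a minimal sketch. This is not geWorkbench's own code: the class name, method names, and the "value below a cutoff" criterion are all assumptions chosen for illustration. The sketch removes a marker only when at least a given percentage of its arrays match the criterion.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of the geWorkbench code base.
class PercentFilterSketch {

    // A marker is removed when at least minPercent of its array values
    // match the criterion (here, assumed to be "value below cutoff").
    static boolean shouldRemove(double[] values, double cutoff, double minPercent) {
        if (values.length == 0) {
            return false; // nothing to evaluate, so keep the marker
        }
        int matching = 0;
        for (double v : values) {
            if (v < cutoff) {
                matching++;
            }
        }
        double percent = 100.0 * matching / values.length;
        return percent >= minPercent;
    }

    // Returns the row indices (markers) that survive the filter.
    static List<Integer> keptMarkers(double[][] matrix, double cutoff, double minPercent) {
        List<Integer> kept = new ArrayList<Integer>();
        for (int i = 0; i < matrix.length; i++) {
            if (!shouldRemove(matrix[i], cutoff, minPercent)) {
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        double[][] expression = {
            {5, 80, 90, 120},     // 25% of arrays below cutoff: kept
            {5, 8, 9, 120},       // 75% of arrays below cutoff: removed
            {200, 300, 250, 400}  // 0% of arrays below cutoff: kept
        };
        System.out.println(keptMarkers(expression, 10.0, 50.0)); // prints [0, 2]
    }
}
```

With a 50% threshold, a marker that is low on a single array is retained, while one that is low on most arrays is dropped.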

New Components

Skybase - SkyBase is a database that stores the homology models built by SkyLine analysis for all NESG PSI2 protein structures. It is queried using FASTA-format protein sequence files.
Skyline - A high-throughput comparative modeling pipeline. It creates structural homology models for protein sequences with similarity to a protein with an experimentally determined 3-D structure. The input is a PDB file. (Depends on an internal server, external use not yet enabled).
Pudge - Interface to a protein structure prediction server which integrates tools used at different stages of the structural prediction process. Modeling starts with a FASTA-format protein sequence file.

Other major changes

caArray - Improved memory usage on downloads from caArray.
CNKB - Can now return markers direct from CNKB without use of Cytoscape.
Color Mosaic - enhancements to display (bug 2147):
- toggle array names on/off
- search on array name, accession, or label
Component Configuration Manager - now can filter display list by categories: Analysis, Viewer, Normalizer, Filter.
Cytoscape - Corrected mapping between gene names in Cytoscape display and markers in Marker Sets panel (now uses Entrez IDs).
Dendrogram - can now create Array subsets as well as marker subsets.
Markers and Arrays - Hover text available in Markers and Arrays phenotypes to visualize long names if needed.
Marker Annotation - search results can be saved to a text file, including relevant URLs and BioCarta pathway names.
File loading - Checking for "out of memory" errors during file loading.
GUI - In switching to the new Look and Feel, fixed many text-highlighting problems that were previously seen only on Macintosh but had begun to appear on Windows as well.
File parser menu - The file parser selection menu now shows valid file extensions for each type.
Promoter - JASPAR promoter motifs now filterable by taxon.
Sequence alignment (BLAST) - many enhancements, including additional databases to match those listed at NCBI and improved handling of results from searches containing long query sequences.

Summary of changes in geWorkbench release 1.8.0

geWorkbench version 1.8.0 was released on November 5, 2009.

geWorkbench 1.8.0 adds one new component for calculating Gene Ontology enrichment using Ontologizer 2.0. It also has been updated to connect with the new caArray 2.3.0 Java API. However, geWorkbench 1.8.0 is not backward compatible with earlier versions of caArray.

The geWorkbench 1.8.0 release notes are available on the Release Notes page.

The geWorkbench application can be downloaded from NCI's GForge site.

Major new features in 1.8.0

Gene Ontology Enrichment - A new pair of components, called GO Terms Analysis and GO Terms Visualization, has been released. The Analysis component is built on Ontologizer 2.0. This component performs "overrepresentation analysis" on a supplied list of genes. It offers a number of advanced methods through the Ontologizer 2.0 engine.
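
As background, overrepresentation analysis is commonly based on a hypergeometric (one-sided Fisher) test: given a population of genes, some of which are annotated to a GO term, how surprising is the number of annotated genes in the supplied list? The sketch below illustrates only that basic test. It is not the Ontologizer 2.0 API, and it does not cover Ontologizer's more advanced methods; the class and parameter names are hypothetical.

```java
// Hypothetical illustration of a basic overrepresentation test.
class OverrepresentationSketch {

    // log(n!) by direct summation; adequate for gene lists of modest size.
    static double logFactorial(int n) {
        double sum = 0.0;
        for (int i = 2; i <= n; i++) {
            sum += Math.log(i);
        }
        return sum;
    }

    static double logChoose(int n, int k) {
        return logFactorial(n) - logFactorial(k) - logFactorial(n - k);
    }

    // Probability of seeing at least annotatedInStudy annotated genes when
    // studySize genes are drawn without replacement from the population.
    static double upperTailP(int populationSize, int annotatedInPopulation,
                             int studySize, int annotatedInStudy) {
        int notAnnotated = populationSize - annotatedInPopulation;
        int maxK = Math.min(studySize, annotatedInPopulation);
        double logDenominator = logChoose(populationSize, studySize);
        double p = 0.0;
        for (int k = Math.max(0, annotatedInStudy); k <= maxK; k++) {
            int drawnNotAnnotated = studySize - k;
            if (drawnNotAnnotated > notAnnotated) {
                continue; // impossible outcome, contributes zero probability
            }
            p += Math.exp(logChoose(annotatedInPopulation, k)
                    + logChoose(notAnnotated, drawnNotAnnotated)
                    - logDenominator);
        }
        return Math.min(1.0, p);
    }

    public static void main(String[] args) {
        // 10 genes in total, 5 annotated to the term, 5 in the study list,
        // and all 5 study genes annotated: only 1 of the 252 possible draws.
        System.out.println(upperTailP(10, 5, 5, 5));
    }
}
```

A small p-value suggests the term is overrepresented in the list; in practice the p-values for many terms would also need a multiple-testing correction.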

Other changes in release 1.8.0

caArray - Update caArray component to use caArray 2.3.0 Java API. Please note that geWorkbench 1.8.0 is not compatible with earlier versions of caArray.
CNKB - The network graph generated by CNKB was only showing nodes centered about a focus node. Now all accepted nodes will be displayed.
Dataset History - Additions for several modules.
Grid Services - A number of fixes to grid services were made.
Marker Annotations - Fixed a problem with retrieving marker annotations when microarray data were downloaded from caArray.
Mark-Us - JMOL dependency added for molecule display.
Promoter - Updated JASPAR motifs to the release of December 2007. Note: on October 12, 2009 a new version of JASPAR was released which made an incompatible change in the file format.
Promoter - component now displays logos using the "Schneider" method, including his "small-value correction", rather than using a previous "in-house" method.
Promoter - the displayed data now does not include the effects of the pseudo-count normalization process.
Promoter - Added ability to specify pseudocount or select previous hard-coded option of square root of number of sequences.
Promoter - Loaded TFs now are properly added to the list of available TFs.
Sequence Alignment (BLAST) - PFP filtering option removed
Usability fixes - operation of cancel buttons, progress bar.
Release Notes - Added specific installation instructions.
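
The Promoter entries above mention pseudocounts and the "Schneider" logo method with a small-sample correction. The following sketch shows one common way these quantities are computed for a single motif column. It is not the geWorkbench implementation: the class and method names are hypothetical, the equal split of the pseudocount across the four bases is an assumption, and the correction term uses the commonly cited approximation e(n) = (s - 1) / (2 ln(2) n) for an alphabet of size s and n sequences.

```java
// Hypothetical illustration for one column of a DNA motif (A, C, G, T).
class LogoColumnSketch {

    static final int ALPHABET_SIZE = 4;

    // Square root of the number of sequences, the option described
    // above as the previously hard-coded pseudocount choice.
    static double sqrtPseudocount(int numSequences) {
        return Math.sqrt(numSequences);
    }

    // Base frequencies with the pseudocount split equally across bases
    // (the equal split is an assumption). Expects at least one sequence.
    static double[] frequenciesWithPseudocount(int[] counts, double pseudocount) {
        int n = 0;
        for (int c : counts) {
            n += c;
        }
        double[] freq = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            freq[i] = (counts[i] + pseudocount / counts.length) / (n + pseudocount);
        }
        return freq;
    }

    // Information content in bits from raw counts (no pseudocount),
    // reduced by the approximate small-sample correction.
    static double informationContent(int[] counts) {
        int n = 0;
        for (int c : counts) {
            n += c;
        }
        if (n == 0) {
            return 0.0;
        }
        double entropy = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double f = (double) c / n;
                entropy -= f * Math.log(f) / Math.log(2);
            }
        }
        double correction = (ALPHABET_SIZE - 1) / (2.0 * Math.log(2) * n);
        return Math.log(ALPHABET_SIZE) / Math.log(2) - (entropy + correction);
    }

    public static void main(String[] args) {
        int[] column = {10, 0, 0, 0}; // ten sequences, all with A at this position
        System.out.println(informationContent(column));
        double[] freq = frequenciesWithPseudocount(column, sqrtPseudocount(10));
        System.out.println(freq[0] + " " + freq[1]);
    }
}
```

Keeping the two calculations separate mirrors the note above that the displayed data does not include the effects of pseudocount normalization.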

Summary of changes in geWorkbench release 1.7.0

Major new features in 1.7.0

Marker Annotations - The Marker Annotations component now includes direct access to NCI Cancer Gene Index annotations. It supplies detailed literature-based annotations on a curated set of cancer-related genes.
Grid Services - All geWorkbench grid services were updated to use caGrid v1.3, with caTransfer used for transferring large data sets.
ARACNe2 - cellular regulatory network reverse engineering - This release includes a new version of ARACNe, called ARACNe2, from the lab of Andrea Califano at Columbia University. The new version adds the option to preprocess the user's dataset to obtain optimal runtime parameters. It also adds a new algorithm, Adaptive Partitioning, for calculating the mutual information between gene expression profiles. Adaptive partitioning is much faster and is considered to be more accurate than the previous algorithm.
Component Configuration Manager (CCM) - The CCM allows individual components to be loaded and unloaded as desired, allowing geWorkbench to be customized to your needs.
genSpace collaborative framework - discovery and visualization of workflows. Implemented user registration, preferences, and enhancements to function.

Newly released analysis components in 1.7.0

MarkUs - The MarkUs component assists in the assessment of the biochemical function for a given protein structure. It serves as an interface to the Mark-Us web server at Columbia. Mark-Us identifies related protein structures and sequences, detects protein cavities, and calculates the surface electrostatic potentials and amino acid conservation profile.
MRA - The Master Regulator Analysis component attempts to identify transcription factors which control the regulation of a set of differentially expressed target genes (TGs). Differential expression is determined using a t-test on microarray gene expression profiles from 2 cellular phenotypes, e.g. experimental and control.
Pudge - Interface to a protein structure prediction server (developed in the lab of Barry Honig at Columbia University) which integrates tools used at different stages of the structural prediction process.
SVM 3.0 (GenePattern) - Support Vector Machines for classification. Provides an interface to remote execution on a GenePattern server.
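
The MRA entry above describes determining differential expression with a t-test across two phenotypes. As a rough illustration, the sketch below computes a two-sample t statistic for one gene. The source does not say which t-test variant MRA uses, so Welch's unequal-variance form is an assumption here, and all names are hypothetical.

```java
// Hypothetical illustration; the t-test variant used by MRA is not stated.
class WelchTSketch {

    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    // Unbiased sample variance; requires at least two values.
    static double sampleVariance(double[] x, double mean) {
        double sumSq = 0.0;
        for (double v : x) {
            sumSq += (v - mean) * (v - mean);
        }
        return sumSq / (x.length - 1);
    }

    // Welch's t statistic comparing expression in two phenotypes.
    static double welchT(double[] experimental, double[] control) {
        double meanE = mean(experimental);
        double meanC = mean(control);
        double standardError = Math.sqrt(
                sampleVariance(experimental, meanE) / experimental.length
                + sampleVariance(control, meanC) / control.length);
        return (meanE - meanC) / standardError;
    }

    public static void main(String[] args) {
        double[] experimental = {1.0, 2.0, 3.0};
        double[] control = {2.0, 3.0, 4.0};
        System.out.println(welchT(experimental, control));
    }
}
```

Genes with a large absolute t value would be the candidate differentially expressed targets; converting the statistic to a p-value additionally requires the t distribution, which is omitted here.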

Other changes in release 1.7.0

Analysis - Parameter saving implemented in all components. If current settings match a saved set, it is highlighted.
ARACNe - improved description of DPI in Online Help.
caArray - query filtering on Array Provider, Organism and Investigator implemented.
caArray - can now add a local annotation file to caArray data downloads.
caGrid - caGrid connectivity is now built directly into supported components rather than being a separate component itself.
caScript - The caScript editor is no longer supported.
Color Mosaic - now interactive with the Marker Sets list and Selection set.
Cytoscape - Upgrade to Cytoscape version 2.4 for network visualization and interaction.
Cytoscape - Set operations on genes being returned from Cytoscape network visualizations, via right-click menu.
Cytoscape - Changes to tag-for-visualization - e.g., now only one way, from marker set to Cytoscape, not vice-versa.
File loading - PDB protein structure files can now be loaded directly from the PDB database by structure name.
Gene Ontology file - the OBO 1.2 file format is supported.
Marker Annotations - add export to CSV file.
Marker Sets component - a set copy function was added.
MINDy - many improvements to display and results filtering - including marker set filtering.
Scatter Plot - Up to 100 overlapping points can be displayed in a single tooltip.
Various - A number of components were refactored.
Workspace saving - now works properly for all components.