Release Descriptions

Revision as of 17:27, 15 July 2013 by Floratos

Brief overview of changes in geWorkbench release 2.4.1

Version 2.4.1 is a bug fix release to deal with a few specific issues:

  • BLAST - Due to changes in the HTML output format of BLAST result pages from NCBI, results from an NCBI BLAST search could no longer be parsed into geWorkbench. geWorkbench will now rely primarily on NCBI's XML result format, which should remain more stable.
  • ANOVA - A problem with activated marker sets, which may have caused incorrect markers to be used, has been corrected.
  • Installation - The installer will now install geWorkbench to the user's home directory on all platforms.

geWorkbench 2.4.1 is only intended for use with Java 6. A few incompatibilities with Java 7 are known to exist.

View the geWorkbench 2.4.1 Release Notes.

Brief overview of changes in geWorkbench release 2.4.0

Release Date: July 23, 2012

  • Significance Analysis of Microarrays (SAM) - An interface to an R implementation of SAM has been added. The user can choose an instance of R running on his or her own desktop, or access a remote grid service version with proper authorization.
  • Master Regulator Analysis - Major updates have been made to the MRA component.
    • The FET-based method now has the option of performing two FET runs on orthogonal slices of the data to determine the regulatory mode of the candidate master regulators.
    • The graphical result display has been completely updated. It now can display multiple bar graphs for multiple master regulator candidates, and uses a rank based ordering of the bars rather than the t-statistic value.
  • SkyBase - Adds access to the much larger PDB-60 database of homology model structures. As of 7/19/2012, the databases have:
    • PDB60: 12,264 structures, 7,804,258 models.
    • NESG: 946 structures, 1,943,390 models.
  • Affymetrix Human Gene ST whole-transcript and Human Exon ST annotation files - Support has been added for Affymetrix Whole-Transcript Gene and Exon Array transcript-level annotation files. Examples include the Gene and Exon 1.0 ST arrays, and the new Gene 2.0 ST array. All whole-transcript Exon and Gene ST arrays use this format.
  • caArray - The interface to caArray can now query the newly released caArray version 2.5. However, it is not backward compatible with caArray 2.4 or earlier.
  • t-test - The t-test is now calculated using the Apache Commons Math Library. P-values may show slight changes due to improved precision.


Some, but not all, known incompatibilities with Java 7 have been corrected. geWorkbench 2.4.0 is only intended for use with Java 6.

View the geWorkbench 2.4.0 Release Notes.

Brief overview of changes in geWorkbench release 2.3.0

Release Date: March 16, 2012

geWorkbench v2.3.0 introduces significant improvements in responsiveness and memory usage, and a streamlining of the graphical interface to make using analysis, filtering, normalization and visualization components much easier.

  • The analysis, filtering and visualization components are now reached through a right-click menu directly on the data node, or through the commands menu in the upper menu bar. This allowed the removal of the dedicated "commands" area from the geWorkbench graphical interface, making more room available for the display of results.
  • Switching back and forth between large data nodes is now much faster.
  • caArray downloads have been sped up dramatically, and memory problems that limited the number of arrays that could be downloaded have been resolved. We have test-downloaded 527 arrays of type Affymetrix HT_HG-U133A in 16 minutes with no memory problems.
  • Dynamic search for marker and gene names has been added to all filtering components.
  • A number of data and result export options have been added. Microarray data can now be exported to a tab-delimited file directly, or from the tabular viewer, allowing subsets of the data to be exported.
  • Interactomes stored in the Cellular Network Knowledge Base can now be exported directly into the Project Folders component.
  • A new component, IDEA (Interactome Dysregulation Enrichment Analysis), is included.

Brief overview of changes in geWorkbench release 2.2.2

Release Date: 8/19/2011

  • In the caArray download dialog, arrays are now sorted by name.
  • The following problems in using the Gene Ontology components were fixed:
    • Marker sets returned from the Tree Browser were not appearing properly in marker selection pulldown menus (e.g. ARACNe hub marker selection).
    • After restoring a saved workspace, the reference gene list in the GO analysis component was not populated.
    • In some cases, the GO Tree Browser returned markers having no EntrezID along with the expected markers.

View the geWorkbench 2.2.2 Release Notes and the change list.

geWorkbench 2.2.2 can be downloaded from the NCI GForge site at https://gforge.nci.nih.gov/frs/?group_id=78. Installation instructions can be found on the Download and Installation page of this Wiki. The Release Notes are also available on GForge.

Brief overview of changes in geWorkbench release 2.2.1

Release Date: 7/29/2011

This release augments the recently released version 2.2.0, and includes more than 90 additional enhancements and bug fixes, with many focused on network import, display and export, sequence retrieval, and pattern discovery. It also corrects two omissions from release 2.2.0.

  • Alternate file viewer for large networks that may be too large to view in Cytoscape.
  • Improvements to display of probeset-level networks.
  • Networks can now be imported either from ARACNe adjacency matrix files or from SIF files. Networks can be represented by gene symbols, Entrez IDs, probeset names, or other identifiers.
  • Adds support for use of alternate ontology files (e.g. from the GO website) for gene ontology analysis.
  • Sequence retrieval for DNA sequences is now based on refSeq records from the UCSC refGene table, and is available for all organisms with genomes supported by UCSC.
  • The new "Fold Change" analysis component, omitted from release 2.2.0, is included.
  • A feature to overlay a t-test result onto a Cytoscape network, not functional as released in version 2.2.0, now works correctly.
  • Bonferroni correction added to the ARACNe GUI.

For a full list of changes, see Changes.

Brief overview of changes in geWorkbench release 2.2.0

geWorkbench 2.2.0 is a major release containing more than 180 new features, enhancements and bug fixes. The most important of each are summarized below.

Overview of new features

  • New network comparison and manipulation features were added to the Cytoscape component, allowing comparison of CNKB interactomes to other expression datasets, projection of t-test results onto a network, and creation of subnetworks from the result of these comparisons.
  • The Gene Ontology component can now serve as a full, standalone GO term browser.
  • Options for import and export of interaction networks (interactomes) were added.
  • Two new filters were added to give the user options to deal with many-to-many relationships between genes and markers (probesets).
  • Two new analysis components were added. One calculates differential expression fold-change, and the second provides a front-end to GenePattern GSEA (Gene Set Enrichment Analysis).
  • Improvements were made to the Master Regulator Analysis component - it can now accept any user-supplied list of "signature genes", and the "bar-code" display was altered to match the style used in recent publications.
  • A new Social Center feature was added to the genSpace component, allowing users to directly interact with each other. Improvements in workflow saving and visualization were also made.
  • A major code refactoring was completed, simplifying a number of core data structures to improve performance and prepare for the development of a new web version.
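
As a rough illustration of the fold-change component listed above, which places markers passing a threshold into one set for positive and one set for negative fold-change, here is a minimal sketch. It assumes fold-change is the log2 ratio of case to control group means on positive, linear-scale data; the release notes do not state the exact formula used, and the function name, data layout, and threshold are hypothetical.

```python
import math


def fold_change_sets(expression, case_arrays, control_arrays, threshold=1.0):
    """Split markers into (up, down) lists by log2 fold-change.

    expression maps marker -> {array name -> expression value}.
    Assumption: values are positive and on a linear scale.
    """
    up, down = [], []
    for marker, values in expression.items():
        case_mean = sum(values[a] for a in case_arrays) / len(case_arrays)
        control_mean = sum(values[a] for a in control_arrays) / len(control_arrays)
        log2_fc = math.log2(case_mean / control_mean)
        if log2_fc >= threshold:
            up.append(marker)       # positive fold-change set
        elif log2_fc <= -threshold:
            down.append(marker)     # negative fold-change set
    return up, down


expression = {
    "m1": {"a1": 8.0, "a2": 8.0, "c1": 2.0, "c2": 2.0},
    "m2": {"a1": 1.0, "a2": 1.0, "c1": 4.0, "c2": 4.0},
    "m3": {"a1": 3.0, "a2": 3.0, "c1": 3.0, "c2": 3.0},
}
up, down = fold_change_sets(expression, ["a1", "a2"], ["c1", "c2"])
```

With a threshold of 1.0 on the log2 scale, a marker needs at least a two-fold change in either direction to be placed in a set.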

New features in detail

  • CNKB
    • (#2389) Export complete CNKB interactomes (SIF or ADJ formats). The Cellular Network Knowledge Base (CNKB) stores a number of "interactomes" derived from computational analysis of various types of gene expression data. These interactomes, and other interaction datasets, can be queried in geWorkbench to find targets which interact with genes of interest. With this release, complete interaction networks stored in the CNKB can be exported to a file, in either SIF format or as an ARACNe-format adjacency matrix.
  • Cytoscape
    • (#2424) Compare differential expression results to an interaction network ("interactome"). This feature calculates the Pearson's correlation coefficient for the expression profiles of each pair of nodes connected by an edge in an interaction network. Only those portions of the network which have a correlation coefficient above a user-set threshold will be displayed.
    • (#2424) Create a new subnetwork containing only edges exceeding the calculated correlation threshold.
    • (#2429) From an existing network, create a subnetwork containing only nodes in a marker set defined in the Markers component.
  • File Parsers
    • (#2388) Import an ARACNe adjacency matrix from a file. Either gene symbols or probeset names can be used.
  • Filters
    • Multiple probeset per gene filter (#2444) - For genes with multiple probesets (markers), retain only the probeset with: (a) highest coefficient of variation, (b) highest mean, or (c) highest median expression across all arrays.
    • Multiple Entrez GeneID Filter (#2445) - Filter out markers which are annotated to (a) no Entrez gene id, or (b) multiple Entrez gene ids.
  • Fold-change Analysis
    • (#2431) This is a new component that performs fold-change analysis and places markers that pass the specified threshold into two new sets in the Markers component, one for markers with positive fold-change, and the other for those with negative fold-change.
  • Gene Ontology
    • The Gene Ontology component is now always available when a microarray dataset has been loaded along with its annotation file. The GO Tree can be browsed or searched for any term. The markers annotated to any term can be returned to a new set in the Markers component.
    • (#1875) The most recent Gene Ontology OBO file is now downloaded automatically from the internet when geWorkbench is started, with the option to instead load a specified OBO file from disk.
  • GenePattern GSEA
    • A front-end to GSEA running on any GenePattern server has been added.
  • genSpace
    • A new social networking feature, the Social Center, has been implemented, allowing users to directly interact with friends or create networks (chat and share).
    • Users can now create their own Workflow Repository, where they can collect and comment on their favorite workflows.
    • The GUIs for Workflow Visualization, Real Time Workflow Suggestion and Workflow Statistics have been updated.
  • Master Regulator Analysis (#2523 and others)
    • Allow any user-supplied list of markers to be used for the phenotype signature.
    • Bar-code graph revised to match style of published work.
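
The correlation-based network comparison described under Cytoscape above can be sketched as follows. This is a minimal illustration, assuming the signed Pearson coefficient is compared directly against the threshold (whether geWorkbench uses the signed or absolute value is not stated here); the function names and example profiles are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = math.sqrt(var_x * var_y)
    # A constant profile has no defined correlation; treat it as 0.0 here.
    return cov / denom if denom else 0.0


def filter_edges(edges, profiles, threshold):
    """Keep only edges whose endpoint profiles correlate above threshold."""
    kept = []
    for a, b in edges:
        r = pearson(profiles[a], profiles[b])
        if r > threshold:
            kept.append((a, b, r))
    return kept


profiles = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],
    "C": [4.0, 3.0, 2.0, 1.0],
    "D": [1.0, 3.0, 2.0, 4.0],
}
edges = [("A", "B"), ("A", "C"), ("A", "D")]
subnetwork = filter_edges(edges, profiles, threshold=0.9)
```

The returned list plays the role of the subnetwork: it contains only the edges that exceed the threshold, together with their coefficients.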

Other Enhancements/Fixes

  • ARACNe
    • (#2366) - In ARACNe, bootstrapping is re-enabled but only single threaded.
    • (#2482) - ARACNe results can now be pruned to retain only highest MI edge per gene-gene pair, or return all edges.
  • BLAST (#2419) - Continued improvements to BLAST interface to match NCBI website functionality and to improve usability.
  • CCM - "Sequence Analysis" has been retitled as "BLAST Analysis", and "Alignment Viewer" has been retitled as "BLAST Alignment Viewer".
  • Gene Ontology Viewer (#2391) - When a marker set is returned for a GO term, the set is given the term name.
  • GEO Soft (#2402, #2462, #2465) - GEO Soft parsers improved to handle various special cases - multiple platforms, missing values, mixed sample and data matrix files...
  • Grid Services (#1773) - Simplified grid service activation (removed one radio button).
  • JMOL (#2505) - Updated to JMOL version 12.0.35.
  • Markers/Arrays component (#2430) - Dynamic filtering of displayed marker or array list as search term is entered.
  • MarkUs
    • (#2500) - Add ability to retrieve prior MarkUs jobs by job id
    • (#2509) - Add private key option to MarkUs job submission
  • MINDy (#2214) - Corrected sign of modulation effect in table displays.
  • Pattern Discovery (#2119) - Corrected display problems in Pattern Discovery related to regular expressions and use of substitution matrices.
  • Preferences (#2393) - Added ability to reorder data sorted by marker name, gene name, or original order (set in preferences, affects all components).
  • Sequence Retriever (#2518) - Fixed problem with obtaining name of latest human genome build from Santa Cruz.
  • Tabular Microarray Viewer (#2253) - Tabular Microarray Viewer now allows adjustable precision in display, and choice of fixed or scientific notation.
  • t-test (#1626) - Changed math package used in order to correct precision problem with p-value calculation at very small p-values.
  • Volcano Plot (#2492) - extreme point color range corrected.
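
The ARACNe pruning option above (retain only the highest MI edge per gene-gene pair) can be sketched in a few lines. This is an illustration under stated assumptions: edges are given at the probeset level as (probeset, probeset, MI) tuples, gene pairs are treated as unordered, and the function name and identifiers are hypothetical rather than taken from geWorkbench.

```python
def prune_to_best_edge(edges, probeset_to_gene):
    """Keep, for each unordered gene pair, the edge with the highest MI.

    edges is a list of (probeset_a, probeset_b, mi) tuples.
    """
    best = {}
    for probe_a, probe_b, mi in edges:
        # frozenset makes (G1, G2) and (G2, G1) the same key.
        pair = frozenset((probeset_to_gene[probe_a], probeset_to_gene[probe_b]))
        if pair not in best or mi > best[pair][2]:
            best[pair] = (probe_a, probe_b, mi)
    return list(best.values())


probeset_to_gene = {"p1": "G1", "p2": "G1", "p3": "G2", "p4": "G3"}
edges = [
    ("p1", "p3", 0.2),
    ("p2", "p3", 0.5),
    ("p3", "p1", 0.1),
    ("p4", "p3", 0.3),
]
pruned = prune_to_best_edge(edges, probeset_to_gene)
```

Returning all edges instead corresponds to skipping this step entirely.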

Changes to documentation (Tutorial sections)

Note - not all of the new features described above have been documented as of the release date, but they will be added as quickly as possible to the tutorial chapters on this website.

  • ARACNe
    • Corrected and improved descriptions of DPI Tolerance and DPI Target List.
    • Added Technical Notes section and usage notes.
    • Mapped relationship of ARACNe command line options -s and -l to ARACNe in geWorkbench.
    • Rewrote introduction.
  • CNKB - all screenshots on tutorial page updated to reflect
    • Interactome and version details display.
    • Illustration of new display options in Cytoscape (see below).
  • Cytoscape - Added new section to show all new features for release 2.2
    • Correlation Overlay,
    • t-test overlay,
    • subnetwork creation,
    • interaction type coloring etc.
  • Data Subsets - Arrays
    • Many new and revised screenshots added.
    • Added new full section on visual properties editor.
    • Now call the collections of sets "Lists" (pulldown menu entries).
    • Much material was previously in a separate "Examples" section; this has now all been moved into the primary description of each menu item.
  • Data Subsets - Markers
    • All the same changes made to "Data Subsets - Arrays" were also made to "Data Subsets - Markers".
    • Material unique to the Markers component was added as needed.
  • File Formats - Error dialog offering 3 choices of action when duplicate entries are encountered in an annotation file (#1624).
  • Local Data files - full update, including
    • updated descriptions of GEO files,
    • updated screenshots and descriptions,
    • added additional material about file browser, merging, annotation files...
  • Menu Bar - New tutorial written to cover all actions available in the top level menu bar.
  • Pattern Discovery - all screenshots replaced. Changes reflect:
    • Fixed display of regular expression matches in Full Sequence view after running with "Exact" unchecked.
    • Better illustration of multiple pattern displays in viewer.
    • In Advanced tab, matrix and threshold settings now disabled when "Exact" is checked.
  • Projects
    • Improved descriptions and added additional screenshots throughout
    • Added missing options such as RCSB PDB.
    • Added new section on Workspaces.
  • SVM - all new chapter written. It is updated to reflect small changes to the Test tab GUI.
  • t-test
    • Fully revised all sections.
    • Added sections on Volcano Plot and Color Mosaic

Brief overview of changes in geWorkbench release 2.1.0

Release Date: September 10, 2010.

  • BLAST
    • A major upgrade of the built-in BLAST interface now provides almost all query options available on the NCBI BLAST website.
    • geWorkbench can retrieve full or partial sequences for BLAST hits. A recent change at NCBI caused this to stop working; this is fixed in this release.
  • Filtering
    • A new Coefficient of Variation data filter has been added. This scales expression profile standard deviations by their means, so that profiles can be filtered on a directly comparable measure of variation.
  • Gene Ontology expandable tree views were added to the
    • Gene Ontology Enrichment viewer
    • Cellular Network Knowledge Base (CNKB) viewer
  • System Information tool
    • A menu item was added which provides system information such as Java memory allocated and used, path to the current JRE, and Operating System details.
  • Arrays component
    • The members of an array set can now be saved as a list to a file on disk, matching functionality already present for markers.
  • Online Help chapter updates (help files built-in to geWorkbench)
    • BLAST (Sequence Alignment component) – fully revised.
    • Filtering - added section for Coefficient of Variation filter.
    • MINDy - added section on using ARACNe preprocessing.
    • Pattern Discovery – fully revised.
  • Cytoscape component
    • Updated to Cytoscape version 2.7.0.
  • Bugs
    • A number of bugs were fixed; full details are available in the Release Notes.
  • Refactoring
    • A project of ongoing refactoring and simplification was continued in order to enhance long-term maintainability and performance of the code.
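
The Coefficient of Variation filter above scales each profile's standard deviation by its mean. A minimal sketch of that calculation follows, assuming the sample standard deviation is used and that markers below a minimum CV are removed; both choices, along with the function names and the threshold, are assumptions for illustration and not confirmed details of the geWorkbench filter.

```python
import statistics


def coefficient_of_variation(values):
    """Return stdev / |mean|, or None when the mean is zero."""
    mean = statistics.mean(values)
    if mean == 0:
        return None  # CV is undefined for a zero mean
    return statistics.stdev(values) / abs(mean)


def filter_by_cv(profiles, min_cv):
    """Keep markers whose CV is defined and at least min_cv."""
    kept = {}
    for marker, values in profiles.items():
        cv = coefficient_of_variation(values)
        if cv is not None and cv >= min_cv:
            kept[marker] = values
    return kept


profiles = {
    "m1": [2.0, 4.0, 6.0],     # CV = 2 / 4 = 0.5
    "m2": [10.0, 10.0, 10.0],  # CV = 0
    "m3": [1.0, 1.0, 4.0],     # CV = sqrt(3) / 2
}
kept = filter_by_cv(profiles, min_cv=0.4)
```

Dividing by the mean is what makes the measure comparable across markers with very different expression levels.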

Brief overview of changes in geWorkbench release 2.0.2

Release Date: July 16, 2010.

  • Fixed problem with genSpace logging.
  • Fully revised Online Help chapter for MINDy.

Brief overview of changes in geWorkbench release 2.0.1

Release Date: June 25, 2010

  • Fixed a problem with caGrid connectivity.
  • Fully revised Online Help chapter for the Cellular Networks Knowledge Base (CNKB) component.

Brief overview of changes in geWorkbench release 2.0.0

Release Date: June 9th, 2010

Major new features

  • Filtering - completely revamped - now works directly for all modes, allows specification of minimum % matching arrays before filtering occurs.
  • File parsers added:
    • MAGE-TAB data matrix
    • GEO Soft format - added series (GSE) and curated matrix (GDS).
  • Java 6 - Moved from Java 5 to Java 6. geWorkbench now requires Java 6. Works on both 32 bit and 64 bit VMs (JREs).
  • Look and Feel - Switched to new, more modern Look and Feel (Nimbus). geWorkbench appearance now consistent across all platforms.
  • caBIO component updated from 4.2 to 4.3.
  • Cellular Network Knowledge Base (CNKB) - Revamped interface to allow choice of interactome and data types.
  • More than 250 additional "bug reports" were closed. These included improvements in the usability of numerous components, and actual bug fixes.

New Components

  • SkyBase - SkyBase is a database that stores the homology models built by SkyLine analysis for all NESG PSI2 protein structures. It is queried using FASTA-format protein sequence files.
  • Skyline - A high-throughput comparative modeling pipeline. It creates structural homology models for protein sequences with similarity to a protein with an experimentally determined 3-D structure. The input is a PDB file. (Depends on an internal server, external use not yet enabled).
  • Pudge - Interface to a protein structure prediction server which integrates tools used at different stages of the structural prediction process. Modeling starts with a FASTA-format protein sequence file.


Other major changes

  • caArray - Improved memory usage on downloads from caArray.
  • CNKB - Can now return markers direct from CNKB without use of Cytoscape.
  • Color Mosaic - enhancements to display (bug 2147):
    • toggle array names on/off
    • search on array name, accession, or label
  • Component Configuration Manager - now can filter display list by categories: Analysis, Viewer, Normalizer, Filter.
  • Cytoscape - Corrected mapping between gene names in Cytoscape display and markers in Marker Sets panel (now uses Entrez IDs).
  • Dendrogram - can now create Array subsets as well as marker subsets.
  • Markers and Arrays - Hover text available in Markers and Arrays phenotypes to visualize long names if needed.
  • Marker Annotation - search results can be saved to a text file, including relevant URLs and BioCarta pathway names.
  • File loading - Checking for "out of memory" errors during file loading.
  • GUI - in switching to new Look and Feel, fixed many text highlighting problems that were previously seen on Macintosh only but now appeared on Windows also.
  • File parser menu - The file parser selection menu now shows valid file extensions for each type.
  • Promoter - JASPAR promoter motifs now filterable by taxon.
  • Sequence alignment (BLAST) - many enhancements, including additional databases to match those listed at NCBI, and improved handling of results from searches containing long query sequences.

Brief overview of changes in geWorkbench release 1.8.0

geWorkbench version 1.8.0 was released on November 5, 2009.

geWorkbench 1.8.0 adds one new component for calculating Gene Ontology enrichment using Ontologizer 2.0. It also has been updated to connect with the new caArray 2.3.0 Java API. However, geWorkbench 1.8.0 is not backward compatible with earlier versions of caArray.

The geWorkbench 1.8.0 release notes are available at Release Notes.

The geWorkbench application can be downloaded from NCI's GForge site.


Major new features in 1.8.0

Gene Ontology Enrichment - A new pair of components, called GO Terms Analysis and GO Terms Visualization, has been released. The Analysis component is built on Ontologizer 2.0. This component performs "overrepresentation analysis" on a supplied list of genes. It offers a number of advanced methods through the Ontologizer 2.0 engine.


Other changes in release 1.8.0

  1. caArray - Update caArray component to use caArray 2.3.0 Java API. Please note that geWorkbench 1.8.0 is not compatible with earlier versions of caArray.
  2. CNKB - The network graph generated by CNKB was only showing nodes centered about a focus node. Now all accepted nodes will be displayed.
  3. Dataset History - Additions for several modules.
  4. Grid Services - A number of fixes to grid services were made.
  5. Marker Annotations - Fixed a problem with retrieving marker annotations when microarray data was downloaded from caArray.
  6. Mark-Us - JMOL dependency added for molecule display.
  7. Promoter - Updated JASPAR motifs to the release of December 2007. Note: on October 12, 2009, a new version of JASPAR was released which made an incompatible change in the file format.
  8. Promoter - component now displays logos using the "Schneider" method, including his "small-value correction", rather than using a previous "in-house" method.
  9. Promoter - the displayed data now does not include the effects of the pseudo-count normalization process.
  10. Promoter - Added ability to specify pseudocount or select previous hard-coded option of square root of number of sequences.
  11. Promoter - Loaded TFs now are properly added to the list of available TFs.
  12. Sequence Alignment (BLAST) - PFP filtering option removed
  13. Usability fixes - operation of cancel buttons, progress bar.
  14. Release Notes - Added specific installation instructions.
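
The Promoter pseudocount option in items 9 and 10 above can be illustrated with a small sketch. It assumes the pseudocount is split equally across the four bases and added to the observed counts before normalizing, which is one common convention; the release notes do not say how the Promoter component distributes it. The function name is hypothetical, and the default reproduces the square-root-of-sequence-count option mentioned above.

```python
import math


def position_frequencies(counts, pseudocount=None):
    """Convert base counts at one motif position into frequencies.

    counts maps base -> observed count. If pseudocount is None, the
    square root of the number of sequences is used.
    Assumption: the pseudocount is shared equally among the bases.
    """
    n = sum(counts.values())
    if pseudocount is None:
        pseudocount = math.sqrt(n)
    share = pseudocount / len(counts)
    return {
        base: (count + share) / (n + pseudocount)
        for base, count in counts.items()
    }


counts = {"A": 16, "C": 0, "G": 0, "T": 0}
smoothed = position_frequencies(counts)                # sqrt(16) = 4
unsmoothed = position_frequencies(counts, pseudocount=0)
```

Passing a pseudocount of 0 returns the raw frequencies, which corresponds to displaying the data without the effect of the normalization.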

Brief overview of changes in geWorkbench release 1.7.0

Major new features in 1.7.0

  • Marker Annotations - The Marker Annotations component now includes direct access to NCI Cancer Gene Index annotations. It supplies detailed literature-based annotations on a curated set of cancer-related genes.
  • Grid Services - All geWorkbench grid services were updated to use caGrid v1.3, with caTransfer used for transferring large data sets.
  • ARACNe2 - cellular regulatory network reverse engineering - This release includes a new version of ARACNe, called ARACNe2, from the lab of Andrea Califano at Columbia University. The new version adds the option to preprocess the user's dataset to obtain optimal runtime parameters. It also adds a new algorithm, Adaptive Partitioning, for calculating the mutual information between gene expression profiles. Adaptive partitioning is much faster and is considered to be more accurate than the previous algorithm.
  • Component Configuration Manager (CCM) - The CCM allows individual components to be loaded and unloaded as desired, allowing geWorkbench to be customized to your needs.
  • genSpace collaborative framework - discovery and visualization of workflows. Implemented user registration, preferences, and enhancements to function.

Newly released analysis components in 1.7.0

  • MarkUs - The MarkUs component assists in the assessment of the biochemical function for a given protein structure. It serves as an interface to the Mark-Us web server at Columbia. Mark-Us identifies related protein structures and sequences, detects protein cavities, and calculates the surface electrostatic potentials and amino acid conservation profile.
  • MRA - The Master Regulator Analysis component attempts to identify transcription factors which control the regulation of a set of differentially expressed target genes (TGs). Differential expression is determined using a t-test on microarray gene expression profiles from 2 cellular phenotypes, e.g. experimental and control.
  • Pudge - Interface to a protein structure prediction server (developed in the lab of Barry Honig at Columbia University) which integrates tools used at different stages of the structural prediction process.
  • SVM 3.0 (GenePattern) - Support Vector Machines for classification. Provides an interface to remote execution on a GenePattern server.

Other changes in release 1.7.0

  • Analysis - Parameter saving implemented in all components. If current settings match a saved set, it is highlighted.
  • ARACNe - improved description of DPI in Online Help.
  • caArray - query filtering on Array Provider, Organism and Investigator implemented.
  • caArray - can now add a local annotation file to caArray data downloads.
  • caGrid - caGrid connectivity is now built directly in to supported components rather than being a separate component itself.
  • caScript - The caScript editor is no longer supported.
  • Color Mosaic - now interactive with the Marker Sets list and Selection set.
  • Cytoscape - Upgrade to Cytoscape version 2.4 for network visualization and interaction.
  • Cytoscape - Set operations on genes being returned from Cytoscape network visualizations, via right-click menu.
  • Cytoscape - Changes to tag-for-visualization - e.g., now only one way, from marker set to Cytoscape, not vice-versa.
  • File loading - PDB protein structure files can now be loaded directly from the PDB database by structure name.
  • Gene Ontology file - the OBO 1.2 file format is supported.
  • Marker Annotations - add export to CSV file.
  • Marker Sets component - a set copy function was added.
  • MINDy - many improvements to display and results filtering - including marker set filtering.
  • Scatter Plot - Up to 100 overlapping points can be displayed in a single tooltip.
  • Various - A number of components were refactored.
  • Workspace saving - now works properly for all components.