Home

Revision as of 18:13, 23 May 2011 by Smith (Changes to documentation)

Quick Start

See the geWorkbench Quick Start guide to begin using geWorkbench right away. We are continuing to develop new material for this guide.

Overview

Welcome to geWorkbench. The current version is 2.1.0, released September 10th, 2010.


The latest Release Notes and downloads can be obtained from https://gforge.nci.nih.gov/frs/?group_id=78. Installation instructions can be found on the Download and Installation page of this Wiki.


geWorkbench (genomics Workbench) is a Java-based open-source platform for integrated genomics. Using a component architecture, it allows individually developed plug-ins to be configured into complex bioinformatic applications. At present there are more than 70 available plug-ins supporting the visualization and analysis of gene expression and sequence data. Example use cases include:

  • loading data from local or remote data sources.
  • visualizing gene expression, molecular interaction networks, protein sequence and protein structure data in a variety of ways.
  • providing access to client- and server-side computational analysis tools such as t-test analysis, hierarchical clustering, self-organizing maps, regulatory network reconstruction, BLAST searches, pattern/motif discovery, etc.
  • validating computational hypotheses through the integration of gene and pathway annotation information from curated sources as well as through Gene Ontology enrichment analysis.


geWorkbench is the bioinformatics platform of MAGNet, the National Center for the Multi-scale Analysis of Genomic and Cellular Networks (one of the 7 National Centers for Biomedical Computing funded through the NIH Roadmap). Additionally, geWorkbench is supported by caBIG®, NCI's cancer Biomedical Informatics Grid initiative.


End-user and developer support for geWorkbench is provided through the caBIG® Molecular Analysis Tools Knowledge Center, a component of the caBIG® Enterprise Support Network.

Graphical User Interface

[Screenshot: geWorkbench graphical user interface with the Cytoscape component (GeWB GUI Cytoscape.png)]



Summary of changes in geWorkbench release 2.2.0

Expected release date: May 23rd/24th, 2011.

New Features

  • CNKB
    • (#2389) Export complete CNKB interactomes (SIF or ADJ formats).
  • Cytoscape
    • (#2424) Calculate the Pearson correlation coefficient for the expression profiles of each pair of nodes connected by an edge in an interaction network. Filter the displayed edges by the magnitude of the correlation coefficient.
    • (#2424) Create a new subnetwork containing only edges exceeding the calculated correlation threshold.
    • (#2429) From an existing network, create a subnetwork containing only nodes in a marker set defined in the Markers component.
  • File Parsers
    • (#2388) Import an ARACNe adjacency matrix from a file.
  • Filters
    • (#2444) Multiple Probesets per Gene Filter: For genes with multiple probesets (markers), retain only the one probeset with (a) the highest coefficient of variation, (b) the highest mean, or (c) the highest median.
    • (#2445) Multiple Entrez GeneID Filter: Filter out markers that are annotated to (a) no Entrez gene ID, or (b) multiple Entrez gene IDs.
  • Fold-change Analysis
    • (#2431) This is a new component that performs fold-change analysis and places markers that pass the specified threshold into two new sets in the Markers component: one for markers with positive fold-change, and the other for those with negative fold-change.
  • Gene Ontology
    • The Gene Ontology component is now always available when a microarray dataset has been loaded along with its annotation file. The GO Tree can be browsed or searched for any term. The markers annotated to any term can be returned to a new set in the Markers component.
    • (#1875) The most recent Gene Ontology OBO file is now downloaded automatically from the internet when geWorkbench starts, with an option to use a specified file from disk instead.
  • GenePattern GSEA
    • A front-end to GSEA running on any GenePattern server has been added.
  • Master Regulator Analysis (#2523 and others)
    • Master Regulator Analysis component fully revised.
    • Now allows any list of markers to be used for the phenotype signature.
    • Bar code graph revised to match style of published work.
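
To illustrate the Cytoscape correlation filter (#2424) listed above, here is a minimal Python sketch that computes the Pearson correlation coefficient for each edge and keeps only the edges whose magnitude meets a threshold. It is not geWorkbench code. The function names `pearson` and `filter_edges`, the marker names, the expression values, the `>=` cutoff, and the handling of constant profiles are all assumptions made for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: a constant profile is treated as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def filter_edges(edges, profiles, threshold):
    """Return (node_a, node_b, r) for every edge with |r| >= threshold."""
    kept = []
    for node_a, node_b in edges:
        r = pearson(profiles[node_a], profiles[node_b])
        if abs(r) >= threshold:
            kept.append((node_a, node_b, r))
    return kept


# Hypothetical markers and expression values, for illustration only.
profiles = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [2.0, 4.0, 6.0, 8.0],    # rises with G1
    "G3": [4.0, 3.0, 2.0, 1.0],    # falls as G1 rises
    "G4": [1.0, -1.0, -1.0, 1.0],  # no linear relationship with G1
}
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4")]

# The surviving edges form the new subnetwork.
subnetwork = filter_edges(edges, profiles, threshold=0.8)
```

Because the filter uses the magnitude of the coefficient, both the strongly correlated G1-G2 edge and the strongly anti-correlated G1-G3 edge are kept, while the G1-G4 edge is dropped.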

Other Enhancements/Fixes

  • ARACNe
    • (#2366) Bootstrapping is re-enabled, but only single-threaded.
    • (#2482) ARACNe results can now be pruned to retain only the highest-MI edge per gene-gene pair, or to return all edges.
  • BLAST (#2419) Continued improvements to BLAST interface to match NCBI website functionality and to improve usability.
  • CCM - "Sequence Analysis" has been retitled "BLAST Analysis", and "Alignment Viewer" has been retitled "BLAST Alignment Viewer".
  • Gene Ontology Viewer (#2391) When a marker set is returned for a GO term, the set is given the term name.
  • GEO Soft (#2402, #2462, #2465) GEO Soft parsers improved to handle various special cases - multiple platforms, missing values, mixed sample and data matrix files...
  • Grid Services (#1773) Simplified grid service activation (removed one radio button).
  • JMOL (#2505) updated to JMOL version 12.0.35.
  • Markers/Arrays component (#2430) dynamic filtering of displayed marker or array list as search term is entered.
  • MarkUs
    • (#2500) Added ability to retrieve prior MarkUs jobs by job ID.
    • (#2509) Added private key option to MarkUs job submission.
  • MINDy (#2214) Corrected sign of modulation effect in table displays.
  • Pattern Discovery (#2119) corrected display problems in Pattern Discovery related to regular expressions and use of substitution matrices.
  • Preferences (#2393) Added ability to reorder data sorted by marker name, gene name, or original order (set in preferences, affects all components).
  • Sequence Retriever (#2518) fixed problem with obtaining name of latest human genome build from Santa Cruz.
  • Tabular Microarray Viewer (#2253) now allows adjustable precision in display, and a choice of fixed or scientific notation.
  • t-test (#1626) changed math package used in order to correct precision problem with p-value calculation at very small p-values.
  • Volcano Plot (#2492) extreme point color range corrected.
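
The ARACNe pruning option (#2482) in the list above can be pictured with the following Python sketch, which keeps only the edge with the highest mutual information (MI) for each gene-gene pair. This is an illustration under stated assumptions, not the geWorkbench implementation. The function name `prune_to_highest_mi`, the gene names, and the MI values are hypothetical, and treating edges as undirected is an assumption.

```python
def prune_to_highest_mi(edges):
    """Keep one edge per gene-gene pair: the one with the highest MI.

    edges is an iterable of (gene_a, gene_b, mi) tuples. Several edges can
    share a gene pair, for example when a gene has multiple probesets.
    """
    best = {}
    for gene_a, gene_b, mi in edges:
        # Assumption: edges are undirected, so (a, b) and (b, a) are one pair.
        key = tuple(sorted((gene_a, gene_b)))
        if key not in best or mi > best[key][2]:
            best[key] = (gene_a, gene_b, mi)
    # Sort by descending MI so the strongest edges come first.
    return sorted(best.values(), key=lambda edge: edge[2], reverse=True)


# Hypothetical edges, for illustration only.
all_edges = [
    ("geneA", "geneB", 0.42),
    ("geneA", "geneB", 0.61),
    ("geneB", "geneA", 0.30),
    ("geneC", "geneD", 0.55),
]
pruned = prune_to_highest_mi(all_edges)
```

With these values, the three geneA-geneB edges collapse to the single edge with MI 0.61, and the geneC-geneD edge is kept unchanged. Returning all edges corresponds to skipping this step.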

Changes to documentation

  • ARACNe
    • Corrected and improved descriptions of DPI Tolerance and DPI Target List.
    • Added Technical Notes section and usage notes.
    • Mapped relationship of ARACNe command line options -s and -l to ARACNe in geWorkbench.
    • Rewrote introduction.
  • CNKB - all screenshots on the tutorial page were updated to reflect:
    • Avoiding multiple markers from the same gene, because they all get the same query results, and the interaction totals reflect all of the duplicates.
    • New default color scheme of Cytoscape 2.7
    • New right-click menu options in Cytoscape 2.7.
  • Cytoscape - Added a new section showing all new features for release 2.2: Correlation Overlay, t-test overlay, subnetworks, etc.
  • Data Subsets - Arrays
    • Many new and revised screenshots added.
    • Added new full section on visual properties editor.
    • Now call the collections of sets "Lists" (pulldown menu entries).
    • Much material was previously in a separate "Examples" section; this has now all been moved into the primary description of each menu item.
  • Data Subsets - Markers
    • All the same changes made to "Data Subsets - Arrays" were also made to "Data Subsets - Markers".
    • Material unique to the Markers component was added as needed.
  • File Formats - Bug 1624 - "duplicate entries in annotation file should be reported" - added new section to "File Formats" page showing screenshot of error dialog offering 3 choices of action.
  • Local Data files - full update, including:
    • Updated descriptions of GEO files.
    • Updated screenshots and descriptions.
    • Additional material about the file browser, merging, annotation files, etc.
  • Menu Bar - New tutorial written to cover all actions available in the top level menu bar.
  • MINDy - In the tutorial and Online Help, the description of Table->Discretized Scores->Score value incorrectly described a range of -1 to +1 for the raw delta MI scores. The actual bounds on MI are zero to +infinity, so this phrase was removed in both sources.
  • Pattern Discovery - all screenshots replaced. Changes reflect:
    • Fixed display of regular expression matches in Full Sequence view after running with "Exact" unchecked.
    • In Advanced tab, matrix and threshold settings now disabled when "Exact" is checked.
    • Old screenshots misspelled "Exhaust.".
    • Better illustration of multiple pattern displays in viewer.
    • Once most screenshots had been replaced for the reasons above, the rest were replaced as well, because the difference between the new and old "Look and Feel" was visible.
  • Projects
    • Improved descriptions and added additional screenshots throughout.
    • Added missing options such as RCSB PDB.
    • Added a new section on Workspaces.
  • SVM - all new chapter written. It is updated to reflect small changes to the Test tab GUI in development.
  • t-test
    • Fully revised all sections.
    • Added sections on Volcano Plot and Color Mosaic.

Summary of changes in geWorkbench release 2.1.0

Release Date: September 10, 2010.

  • BLAST
    • A major upgrade of the built-in BLAST interface now provides almost all query options available on the NCBI BLAST website.
    • geWorkbench can retrieve full or partial sequences for BLAST hits. A recent change at NCBI caused this to stop working. This is fixed in this release.
  • Filtering
    • A new Coefficient of Variation data filter has been added. This scales expression profile standard deviations by their means, so that profiles can be filtered on a directly comparable measure of variation.
  • Gene Ontology expandable tree views were added to the
    • Gene Ontology Enrichment viewer
    • Cellular Network Knowledge Base (CNKB) viewer
  • System Information tool
    • A menu item was added which provides system information such as Java memory allocated and used, path to the current JRE, and Operating System details.
  • Arrays component
    • The members of an array set can now be saved as a list to a file on disk, matching functionality already present for markers.
  • Online Help chapter updates (help files built into geWorkbench)
    • BLAST (Sequence Alignment component) – fully revised.
    • Filtering - added section for Coefficient of Variation filter.
    • MINDy - added section on using ARACNe preprocessing.
    • Pattern Discovery – fully revised.
  • Cytoscape component
    • Updated to Cytoscape version 2.7.0.
  • Bugs
    • A number of bugs were fixed; full details are available in the Release Notes.
  • Refactoring
    • A project of ongoing refactoring and simplification was continued in order to enhance long-term maintainability and performance of the code.
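
The Coefficient of Variation filter in the list above scales each profile's standard deviation by its mean. The Python sketch below shows that calculation and one possible way to filter on it. It is a sketch under stated assumptions rather than the geWorkbench implementation. The function names, the marker names, the expression values, the use of the sample standard deviation, and the rule of keeping markers at or above a cutoff are all assumptions.

```python
import statistics


def coefficient_of_variation(profile):
    """Standard deviation of a profile divided by its mean."""
    mean = statistics.mean(profile)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    # Assumption: sample standard deviation (n - 1 in the denominator).
    return statistics.stdev(profile) / abs(mean)


def filter_by_cv(profiles, min_cv):
    """Return the names of markers whose CV is at least min_cv."""
    return [
        name
        for name, profile in profiles.items()
        if coefficient_of_variation(profile) >= min_cv
    ]


# Hypothetical markers and expression values, for illustration only.
profiles = {
    "m1": [10.0, 10.0, 10.0, 10.0],    # constant profile, CV = 0
    "m2": [5.0, 10.0, 15.0, 20.0],     # large spread relative to its mean
    "m3": [100.0, 101.0, 99.0, 100.0], # high expression, very small spread
}
kept = filter_by_cv(profiles, min_cv=0.1)
```

In this example only m2 passes the cutoff. Marker m3 has much higher expression values than m2, but its variation relative to its mean is small, which is why scaling by the mean makes profiles directly comparable.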

Summary of changes in geWorkbench release 2.0.2

Release Date: July 16, 2010.

  • Fixed problem with genSpace logging.
  • Fully revised Online Help chapter for MINDy.

Summary of changes in geWorkbench release 2.0.1

Release Date: June 25, 2010

  • Fixed a problem with caGrid connectivity.
  • Fully revised Online Help chapter for the Cellular Networks Knowledge Base (CNKB) component.

Summary of changes in geWorkbench release 2.0.0

Release Date: June 9th, 2010

Major new features

  • Filtering - completely revamped - now works directly for all modes, allows specification of minimum % matching arrays before filtering occurs.
  • File parsers added:
    • MAGE-TAB data matrix
    • GEO Soft format - added series (GSE) and curated matrix (GDS).
  • Java 6 - Moved from Java 5 to Java 6. geWorkbench now requires Java 6. Works on both 32 bit and 64 bit VMs (JREs).
  • Look and Feel - Switched to new, more modern Look and Feel (Nimbus). geWorkbench appearance now consistent across all platforms.
  • caBIO component updated from 4.2 to 4.3.
  • Cellular Network Knowledge Base (CNKB) - Revamped interface to allow choice of interactome and data types.
  • More than 250 additional "bug reports" were closed. These included improvements in the usability of numerous components, and actual bug fixes.

New Components

  • SkyBase - SkyBase is a database that stores the homology models built by SkyLine analysis for all NESG PSI2 protein structures. It is queried using FASTA-format protein sequence files.
  • SkyLine - A high-throughput comparative modeling pipeline. It creates structural homology models for protein sequences with similarity to a protein with an experimentally determined 3-D structure. The input is a PDB file. (Depends on an internal server; external use is not yet enabled.)
  • Pudge - Interface to a protein structure prediction server which integrates tools used at different stages of the structural prediction process. Modeling starts with a FASTA-format protein sequence file.


Other major changes

  • caArray - Improved memory usage on downloads from caArray.
  • CNKB - Can now return markers direct from CNKB without use of Cytoscape.
  • Color Mosaic - enhancements to display (bug 2147):
    • toggle array names on/off
    • search on array name, accession, or label
  • Component Configuration Manager - now can filter display list by categories: Analysis, Viewer, Normalizer, Filter.
  • Cytoscape - Corrected mapping between gene names in Cytoscape display and markers in Marker Sets panel (now uses Entrez IDs).
  • Dendrogram - can now create Array subsets as well as marker subsets.
  • Markers and Arrays - Hover text available in Markers and Arrays phenotypes to visualize long names if needed.
  • Marker Annotation - search results can be saved to a text file, including relevant URLs and BioCarta pathway names.
  • File loading - Now checks for "out of memory" errors during file loading.
  • GUI - in switching to new Look and Feel, fixed many text highlighting problems that were previously seen on Macintosh only but now appeared on Windows also.
  • File parser menu - The file parser selection menu now shows valid file extensions for each type.
  • Promoter - JASPAR promoter motifs now filterable by taxon.
  • Sequence alignment (BLAST) - many enhancements, including additional databases to match those listed at NCBI and improved handling of results from searches containing long query sequences.


See also the list of changes in previous releases.