Overview

Introduction

geWorkbench is an open-source bioinformatics platform that offers a comprehensive and extensible collection of tools for the management, analysis, visualization and annotation of biomedical data. Many kinds of analysis are supported. For microarray data, these include filtering and normalization, basic statistical analyses, clustering, and network reverse engineering, along with many common visualization tools. For sequence data, there are routines such as BLAST, pattern detection, transcription factor mapping, and syntenic region analysis. In addition, genomic sequences around markers of interest found in microarray experiments can be retrieved easily and used, for example, for promoter/transcription factor analysis.

Specific types of data supported include:

  • Microarray Gene Expression
    • GEO SOFT: Series, Series Matrix, and Annotated Matrix (GDS)
    • MAGE-TAB data matrix
    • Affymetrix GCOS/MAS5
    • Matrix format (geWorkbench)
    • Tab-delimited (e.g. RMAExpress)
    • GenePix
  • Microarray Gene Expression Annotation file support
    • Affymetrix 3' Expression
    • Affymetrix WT Gene/Exon ST (transcript-level), including Gene Array 1.0/2.0 ST and Exon 1.0 ST
  • DNA and Protein Sequences
    • FASTA
  • Pathways
    • BioCarta
  • Molecular structure - prediction, annotation and display
  • Sequence Patterns
    • Regular Expressions
  • Gene Ontology
  • Regulatory Networks
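
To make two of the data types above concrete, the following is a minimal sketch, in Python, of reading FASTA records and scanning them with a regular-expression sequence pattern. It is not geWorkbench code: the function names parse_fasta and find_pattern, the example records, and the TATA-like pattern are all assumptions chosen for illustration.

```python
import re
from typing import Dict, Iterable, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {record_id: sequence} (hypothetical helper)."""
    records: Dict[str, List[str]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the id;
            # fall back to a generated id if the header is empty.
            tokens = line[1:].split()
            current = tokens[0] if tokens else f"record_{len(records) + 1}"
            records[current] = []
        elif current is not None:
            records[current].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def find_pattern(sequences: Dict[str, str], pattern: str) -> List[Tuple[str, int, str]]:
    """Return (record_id, 0-based start, matched text) for each non-overlapping match."""
    regex = re.compile(pattern)
    hits = []
    for name, seq in sequences.items():
        for match in regex.finditer(seq):
            hits.append((name, match.start(), match.group()))
    return hits


# Made-up example records, not real biological data.
EXAMPLE_FASTA = """\
>seq1 hypothetical example record
ACGTATAAAAGGC
TTGC
>seq2 another hypothetical record
GGGTATATAAC
"""

sequences = parse_fasta(EXAMPLE_FASTA.splitlines())
# Illustrative TATA-like pattern written as a regular expression.
hits = find_pattern(sequences, r"TATA[AT]A[AT]")
for record_id, start, text in hits:
    print(record_id, start, text)
```

Each hit records which sequence matched and where, which is the kind of link between a sequence and a pattern that the list above implies.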

Most importantly, geWorkbench provides an environment that supports moving seamlessly from one data type to another, e.g. from gene expression to sequences to patterns.


geWorkbench as an interface to external data and computational resources

geWorkbench provides access to a variety of external data sources, including:

  • Microarray gene expression repositories (caArray)
  • BLAST (NCBI)
  • Gene annotation pages (via bioDBnet)
  • Protein and DNA sequence retrieval (UC Santa Cruz and EBI)
  • Pathway diagrams (BioCarta)

geWorkbench also provides a gateway to several computational services currently hosted on Columbia servers and clusters, including:

  • Pattern Discovery
  • Pudge - protein structure modeling
  • SkyBase - database of molecular models


Basic Layout of the Graphical User Interface

The graphical user interface for geWorkbench is divided into three major sections, used for:

  1. Projects - Data management (upper left)
  2. Visualization tools (upper right)
  3. Marker and Array/Phenotype set selection and management (lower left)


[Figure: the full geWorkbench GUI with a color mosaic display]


The Data Management area holds a single workspace, and a workspace in turn can hold one or more projects. Projects can be used in whatever way the user wishes to group data sets. Each opened data file or analysis result is stored in a project. A workspace and all the data it contains can be saved and reopened later.
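
The containment described above (one workspace, one or more projects, each holding data files and analysis results) can be pictured with a minimal Python sketch. This is only a conceptual model, not geWorkbench's actual classes or its saved-workspace format: the names Workspace, Project and DataItem, the JSON serialization, and the example entries are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class DataItem:
    """An opened data file or an analysis result (hypothetical model)."""
    name: str
    kind: str  # e.g. "data file" or "analysis result"


@dataclass
class Project:
    """A user-defined grouping of data items."""
    name: str
    items: List[DataItem] = field(default_factory=list)


@dataclass
class Workspace:
    """The single top-level container holding one or more projects."""
    projects: List[Project] = field(default_factory=list)

    def save(self) -> str:
        # Serialize the whole workspace, including every project and item.
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def load(cls, text: str) -> "Workspace":
        # Rebuild the nested objects from the serialized form.
        raw = json.loads(text)
        return cls(
            projects=[
                Project(p["name"], [DataItem(**item) for item in p["items"]])
                for p in raw["projects"]
            ]
        )


# Made-up example content.
workspace = Workspace(
    projects=[
        Project(
            "Example project",
            [
                DataItem("expression_matrix.txt", "data file"),
                DataItem("hierarchical clustering", "analysis result"),
            ],
        )
    ]
)

saved = workspace.save()
restored = Workspace.load(saved)
print(restored == workspace)
```

Saving and reloading the workspace as a unit mirrors the statement that a workspace is saved together with all the data it contains.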

The GUI provides a menu bar at the top with a standard set of commands. Many of the commands in the menu bar are also available by right-clicking on data objects.


Developing for geWorkbench

geWorkbench is built on a plug-in framework that allows new modules to be developed with relative ease. A repository will be maintained for community-developed modules. Developers can take advantage of all the existing capabilities for data management and visualization, and can thus concentrate their development efforts on the novel aspects of their own project.