geWorkbench

Revision as of 19:06, 27 February 2006

Introduction

geWorkbench is an open-source bioinformatics platform that offers a comprehensive and extensible collection of tools for the management, analysis, visualization and annotation of biomedical data. Many kinds of analysis are supported. For microarrays, there are filtering and normalization, basic statistical analyses, clustering, and network reverse engineering, as well as many common visualization tools. For sequence data there are routines such as BLAST, pattern detection, transcription factor mapping, and syntenic region analysis. Furthermore, genomic sequences around markers of interest found in microarray experiments can be easily retrieved and, for example, used for promoter/transcription factor analysis.

Specific types of data supported include:

Microarray Gene Expression
- Affymetrix GCOS/MAS5
- Matrix format (geWorkbench)
- RMAExpress
- GenePix
DNA and Protein Sequences
- FASTA
Pathways
- BioCarta
Patterns
- Regular Expressions
Gene Ontology
Networks

Most importantly, geWorkbench provides an environment that supports moving seamlessly from one data type to another, e.g. from gene expression to sequences to patterns.

Developing for geWorkbench

geWorkbench has been designed using a plug-in framework which allows new modules to be developed with relative ease. A repository will be maintained for community-developed modules. Developers can take advantage of all the existing capabilities for data management and visualization, and thus concentrate development efforts on the more important, novel aspects of their project.

geWorkbench as an interface to external data and computational resources

geWorkbench provides access to a variety of external data sources, including:

Microarray gene expression repositories (caArray)
Gene annotation pages (via CGAP)
DNA sequence retrieval
Pathway diagrams (BioCarta)

geWorkbench also provides a gateway to several computational services currently hosted on Columbia servers and clusters, including:

BLAST
Pattern Discovery
Synteny

Basic Layout of the Graphical User Interface

The graphical user interface for geWorkbench is divided into four major sections:

1. Projects - Data management (upper left)

2. Marker and Array/Phenotype set selection and management (lower left)

3. Visualization tools (upper right)

4. Analytical tools (lower right)

The Data Management area can hold one workspace, and a workspace in turn can hold one or more projects. Projects can be used as desired to group different data sets. Each opened data file or analysis result is stored in a project. A workspace and all the data it contains can be saved and reopened later.
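The containment described above (one workspace, one or more projects, each holding data files and analysis results) can be pictured with a minimal sketch. This is not geWorkbench code: the class names `Workspace`, `Project` and `DataNode`, and the file names used, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """An opened data file or an analysis result (hypothetical name)."""
    name: str
    kind: str  # e.g. "microarray", "sequence", "analysis result"


@dataclass
class Project:
    """Groups related data sets, as a project does in the Data Management area."""
    name: str
    nodes: list = field(default_factory=list)

    def add(self, node):
        self.nodes.append(node)
        return node


@dataclass
class Workspace:
    """The single top-level container; it holds one or more projects."""
    projects: list = field(default_factory=list)

    def new_project(self, name):
        project = Project(name)
        self.projects.append(project)
        return project

    def all_nodes(self):
        # Flatten every project's contents, in project order.
        return [node for project in self.projects for node in project.nodes]


# Illustrative data only; these file names are assumptions.
workspace = Workspace()
expression_project = workspace.new_project("Expression study")
expression_project.add(DataNode("expression_matrix.txt", "microarray"))
expression_project.add(DataNode("clustering result", "analysis result"))
sequence_project = workspace.new_project("Promoter analysis")
sequence_project.add(DataNode("upstream_regions.fasta", "sequence"))
```

Saving a workspace would then amount to serializing the single `Workspace` object together with everything it contains.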

The GUI provides a menu bar at the top with a standard choice of commands. Many commands that are available in the menu bar are also available by right-clicking on data objects.

Component Interoperability

The most important design goal of geWorkbench is to allow data produced or altered in one module to be easily transferred to other modules for successive analysis steps. There are two places that hold shared data - the Project component (1), and the Set Selection component (2). While the Project component holds files and various types of analysis result sets, the Set Selection component groups markers or arrays/phenotypes into sets. These sets can then be selected so that further analysis is restricted to that particular subset of data. For example, several analysis components produce lists of markers, and each such new list is placed into the Markers component as a new marker set. An example of using a phenotype set is to group microarrays by their disease state.
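The role of marker sets and array/phenotype sets can be sketched as follows. This is an illustrative model only, not the geWorkbench implementation: the data values, the set names, and the functions `high_expression_markers` and `subset` are all assumptions. One step produces a list of markers, the list is stored as a named set, and a later step runs on only the selected markers and arrays.

```python
# Toy expression matrix: marker -> {array -> value}. All values are made up.
expression = {
    "gene_a": {"array_1": 2, "array_2": 4, "array_3": 6},
    "gene_b": {"array_1": 10, "array_2": 12, "array_3": 14},
    "gene_c": {"array_1": 1, "array_2": 1, "array_3": 1},
}

# Shared, named sets, standing in for the Set Selection component.
marker_sets = {}
array_sets = {}


def high_expression_markers(data, threshold):
    """A stand-in analysis step: return markers whose mean value exceeds threshold."""
    return [
        marker
        for marker, row in data.items()
        if sum(row.values()) / len(row) > threshold
    ]


def subset(data, markers, arrays):
    """Restrict the matrix to the selected markers and arrays."""
    return {m: {a: data[m][a] for a in arrays} for m in markers}


# An analysis produces a marker list, which is stored as a new marker set.
marker_sets["high expression"] = high_expression_markers(expression, threshold=3)

# A phenotype set groups arrays, for example by disease state.
array_sets["disease"] = ["array_1", "array_3"]

# A later analysis step sees only the selected subset of the data.
selected = subset(expression, marker_sets["high expression"], array_sets["disease"])
```

Because the sets are stored by name in one shared place, any later step can pick them up without knowing which component produced them.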

A key feature of the GUI is that the modules displayed in the Visualization (3) and Analysis (4) areas depend on the type of data currently selected in the Project area (1). Thus you will see a different set of choices (tabs) when a microarray data set is selected, as compared to when a DNA or protein sequence file is selected. When a new data file is loaded, or an analysis produces a new data set, not only is it added to the Project area, but an appropriate viewer in the Visualization area is automatically selected.
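One way to picture this behavior is a lookup keyed by data type. The sketch below is hypothetical: the table `COMPONENTS`, the functions `tabs_for` and `default_viewer`, and the assignment of particular tabs to particular data types are assumptions made for illustration, not a description of how geWorkbench is implemented.

```python
# Hypothetical mapping: data type -> components offered in each GUI area.
COMPONENTS = {
    "microarray": {
        "visualization": ["Microarray Viewer", "Dendrogram"],
        "analysis": ["Filtering", "Normalization", "Clustering"],
    },
    "sequence": {
        "visualization": ["Sequence Viewer"],
        "analysis": ["BLAST", "Pattern Discovery", "Synteny"],
    },
}


def tabs_for(data_type, area):
    """Return the tabs shown in one area for the currently selected data type."""
    return COMPONENTS.get(data_type, {}).get(area, [])


def default_viewer(data_type):
    """Pick the viewer to select automatically when new data of this type appears."""
    viewers = tabs_for(data_type, "visualization")
    return viewers[0] if viewers else None


# Selecting a different kind of data changes which tabs are offered.
microarray_tabs = tabs_for("microarray", "analysis")
sequence_tabs = tabs_for("sequence", "analysis")
```

In this sketch an unrecognized data type simply yields no tabs and no default viewer.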

