Difference between revisions of "Grid Services"

 
In cooperation with caBIG(R), the National Cancer Institute's Cancer Biomedical Informatics Grid program, a number of the geWorkbench analysis components have also been adapted to run as services on caGrid, the primary infrastructure component of caBIG.  In accordance with caBIG principles, each has a well-defined object design and a public application programming interface (API) via which data can be exchanged.  Annotations describing each service, object and parameter are stored in the caDSR (NCI's Cancer Data Standards Repository), using standard vocabulary terms available from the Enterprise Vocabulary Services (EVS).
 
Some services are implemented only remotely.
  
Each geWorkbench analysis component that has an associated grid service will show a '''Services''' tab in the [[Analysis_Framework | Analysis]] framework, adjacent to the '''Parameters''' tab.

caGrid 1.4 is used for all services.

'''Note''' - Grid services are only supported for the most recent major release of geWorkbench.
 
==Services tab==
 
[[Image:Grid_services_empty.png]]

* '''Local''' - When the "Local" radio button is selected, the calculation is performed directly within geWorkbench, provided that a local implementation is available.  Some analyses have no local implementation.
* '''Change Index Service''' - Index Services maintain lists of available grid services.  geWorkbench is delivered with the URL of a Columbia Index Service preconfigured, which provides access to demonstration grid service implementations.
* '''Change Dispatcher''' - The Dispatcher is a geWorkbench server-side component that provides connectivity between geWorkbench and caGrid.  geWorkbench is delivered with the URL of a Columbia Dispatcher Service preconfigured.

===Search Grid Services===
[[Image:Grid_service_ANOVA.png]]

When the '''Search Grid Services''' button is pushed, the list of available services of the desired type will be retrieved from the specified index service.  The list will appear in the area below, with each available service preceded by a radio button.  The desired remote grid service can be selected using these radio buttons.

===Service Details===
[[Image:Grid_service_ANOVA_selected.png|{{ImageMaxWidth}}]]

Once a particular grid service has been selected (via its radio button), the details of the service will be displayed in the lower window.

'''Note''' - For grid services hosted inside the Columbia firewall, the service metadata cannot be retrieved from outside the firewall.  Instead, generic metadata will be displayed.  You may see a delay of up to 30 seconds while geWorkbench attempts to retrieve the service metadata, before a timeout occurs and the generic metadata is substituted.
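
The timeout-and-fallback behaviour described in the note can be illustrated with a short Python sketch.  It is not geWorkbench's actual code: the function name, the fetch_metadata callable and the GENERIC_METADATA placeholder are hypothetical, and only the 30-second limit is taken from the text above.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# Hypothetical placeholder shown when the real metadata cannot be retrieved.
GENERIC_METADATA = {"description": "generic service metadata"}


def metadata_or_generic(fetch_metadata, url, timeout_s=30.0):
    """Return fetch_metadata(url), or GENERIC_METADATA on timeout or network error."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fetch_metadata, url)
    try:
        # Wait at most timeout_s seconds for the metadata request.
        return future.result(timeout=timeout_s)
    except (FutureTimeout, OSError):
        # Timed out or unreachable (e.g. blocked by a firewall): substitute generic metadata.
        return GENERIC_METADATA
    finally:
        # Do not block the caller on a request that is still hanging.
        pool.shutdown(wait=False)
```

Here fetch_metadata stands for whatever call actually retrieves the service metadata; passing it in as an argument keeps the sketch independent of any particular network library.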

==Production URLs for geWorkbench 2.6.0, 2.5.1 and 2.5.0==
The Index and Dispatcher service URLs for geWorkbench 2.6.0 are the same as those for geWorkbench 2.5.0 and 2.5.1.

===Index and Dispatcher===
* Default Index Service: http://cagridnode.c2b2.columbia.edu:8080/v2.5.0/wsrf/services/DefaultIndexService
* Default Dispatcher Service: http://cagridnode.c2b2.columbia.edu:8080/v2.5.0/wsrf/services/cagrid/Dispatcher

===Standard geWorkbench Grid Services===
* http://geworkbench1.c2b2.columbia.edu:8080/wsrf/services/cagrid/ServiceName

where ServiceName is the name of the analysis service, e.g. Anova or Aracne.
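
As an illustration of the URL pattern above, the following Python sketch builds the address of a standard grid service from its name.  The base URL is copied from the list above; the function name and the validation rule are assumptions made for this example only.

```python
# Base URL copied from the list above; the service name is appended to it.
GRID_SERVICE_BASE = "http://geworkbench1.c2b2.columbia.edu:8080/wsrf/services/cagrid/"


def grid_service_url(service_name):
    """Build the URL of a standard geWorkbench grid service (hypothetical helper)."""
    name = service_name.strip()
    # Reject empty names and names that would change the URL path.
    if not name or "/" in name:
        raise ValueError(f"not a valid service name: {service_name!r}")
    return GRID_SERVICE_BASE + name


print(grid_service_url("Anova"))
print(grid_service_url("Aracne"))
```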

===MarkUs and SkyLine Service URLs===
* bhapp.c2b2.columbia.edu:8080/wsrf/services/cagrid/MarkUs
* bhapp.c2b2.columbia.edu:8080/wsrf/services/cagrid/SkyLine

==Production URLs for geWorkbench 2.4.0 and 2.4.1==
'''Note''' - The URLs for previous versions are no longer reachable because the proxy server does not forward them.  The information below is for reference only.

* Default Index Service: http://cagridnode.c2b2.columbia.edu:8080/v2.4.0/wsrf/services/DefaultIndexService
* Default Dispatcher Service: http://cagridnode.c2b2.columbia.edu:8080/v2.4.0/wsrf/services/cagrid/Dispatcher

===Standard geWorkbench Grid Services===
* http://geworkbench2.c2b2.columbia.edu:8080/wsrf/services/cagrid/ServiceName

==Running a grid job==
# The Index Service and Dispatcher URLs are set to the default geWorkbench services.  If needed, choose an appropriate alternate Index Service and/or Dispatcher Service.
# Push the '''Search Grid Services''' button.
# Select an available grid service.
# Return to the '''Parameters''' tab and, when ready, push the '''Analyze''' button.

Some services require login credentials.  If so, a dialog will appear asking for a Username and Password.  If you have the appropriate credentials for the service you have selected, enter them and push OK.

[[Image:Grid_services_Username.png]]

A message may appear indicating that the job is being submitted.

[[Image:T_Grid_Services_Submitting_request.png]]

While the job is running, a node marked "Pending" is placed in the [[Workspace]], preceded by an hourglass icon.  Note that the progress bar that appears when analyses are run locally within geWorkbench does not appear for grid jobs.

[[Image:Grid_services_pending.png]]

==Further aspects of running grid jobs==
# Once started, a grid job runs independently of geWorkbench.  The Dispatcher component cooperates with geWorkbench to track job status.  A geWorkbench workspace containing running grid jobs can be saved and later restored.  When the saved workspace is reloaded, geWorkbench resumes monitoring the jobs for completion and retrieves the finished results if they are available.
# Once a grid job has been started, its execution cannot be canceled from within geWorkbench.  However, the "Pending" node can be removed from the [[Workspace]].  In this case, geWorkbench will not receive any results when the calculation completes.
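
The save-and-resume behaviour described above can be pictured with the following conceptual Python sketch.  It does not reflect how geWorkbench or the Dispatcher is implemented: the JSON file stands in for a saved workspace, and get_status, fetch_result and the "pending"/"done" status values are hypothetical.

```python
import json
import time


def save_pending(path, job_ids):
    """Persist the IDs of pending grid jobs (stands in for saving a workspace)."""
    with open(path, "w") as fh:
        json.dump(sorted(job_ids), fh)


def resume_monitoring(path, get_status, fetch_result, poll_s=5.0, max_polls=None):
    """Reload pending job IDs and poll until each job is done or max_polls is reached.

    get_status(job_id) is assumed to return "pending" or "done".
    Returns (results, still_pending).
    """
    with open(path) as fh:
        pending = set(json.load(fh))
    results = {}
    polls = 0
    while pending and (max_polls is None or polls < max_polls):
        # Iterate over a sorted copy so finished jobs can be dropped from the set.
        for job_id in sorted(pending):
            if get_status(job_id) == "done":
                results[job_id] = fetch_result(job_id)
                pending.discard(job_id)
        polls += 1
        if pending:
            time.sleep(poll_s)
    return results, pending
```

In this picture, removing a "Pending" node corresponds to dropping its ID from the saved list: the remote job still runs to completion, but nothing asks for its result.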

==Dispatcher==
The geWorkbench Dispatcher Service is separate from geWorkbench and serves as an intermediary between geWorkbench and grid services.

As of release 2.3.0, outbound data (from geWorkbench to the grid service) is moved using caTransfer, both from geWorkbench to the Dispatcher and from the Dispatcher to the grid service.  Returning data is transferred over both hops as a base-64 encoded string, which for very large data sets can lead to an out-of-memory error.  Replacing the base-64 transfer with caTransfer for returning data as well is being investigated.

If an out-of-memory error occurs, the error message will read:

 Out-of-memory error: Java heap space
 It is advisable to restart geWorkbench.
 You may also wish to increase the geWorkbench memory size.

In this case, as the message states, it is advisable to restart geWorkbench.  Instructions for increasing the amount of memory available to Java are available in the [[FAQ| geWorkbench FAQ]].  For Java memory requests larger than about 1.5 GB, running geWorkbench using a 64-bit Java Runtime Environment (JRE) will be necessary.
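
One reason base-64 transfer is costly for large results is that it represents every 3 bytes of data as 4 characters, so the encoded string is about one third larger than the data itself, and the string must be held in memory as a whole.  The Python sketch below only demonstrates this size ratio; the payload is a stand-in for result data, and the helper name is an assumption made for this example.

```python
import base64


def base64_size(n_bytes):
    """Length in characters of the padded base-64 encoding of n_bytes bytes."""
    return 4 * ((n_bytes + 2) // 3)


payload = bytes(3000)  # 3000 zero bytes as a stand-in for returned result data
encoded = base64.b64encode(payload)

# The measured length agrees with the 4-characters-per-3-bytes rule.
assert len(encoded) == base64_size(len(payload))

print(len(payload), "bytes ->", len(encoded), "base-64 characters")
```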

==List of grid services==
Some analyses can be run either locally within geWorkbench or via an external grid or web service.  Some grid services hosted at Columbia require a username and password.  The following table summarizes this information for the geWorkbench components that can use grid services.

{|border="1" class="tablesorter"
! Component || Local availability || Remote service type || Grid username/password required
|-
|ANOVA || yes || grid || yes
|-
|ARACNe || yes || grid || yes
|-
|Hierarchical Clustering || yes || grid || yes
|-
|MarkUs || no || web or grid || no
|-
|MINDy || yes || grid || yes
|-
|MatrixREDUCE || yes || grid || yes
|-
|SkyBase || no || grid || no
|-
|SkyLine || no || grid || yes
|-
|SOM || yes || grid || yes
|}

==Technical Details on Data Transfer==
When a grid analysis is started, two things are sent to the grid service: a view of the data defined by the activated array and marker sets, and the entire parent data set.
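
The idea of a "view" can be illustrated with a minimal Python sketch.  It assumes, for the purpose of the example only, that the parent data set is a matrix with one row per marker and one column per array; the function and variable names are hypothetical and do not correspond to geWorkbench classes.

```python
def make_view(matrix, marker_ids, array_ids, active_markers, active_arrays):
    """Restrict the parent matrix to the activated markers (rows) and arrays (columns)."""
    rows = [i for i, m in enumerate(marker_ids) if m in active_markers]
    cols = [j for j, a in enumerate(array_ids) if a in active_arrays]
    return {
        "markers": [marker_ids[i] for i in rows],
        "arrays": [array_ids[j] for j in cols],
        "values": [[matrix[i][j] for j in cols] for i in rows],
    }


# Parent data set: 2 markers x 3 arrays.
parent = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]
view = make_view(parent, ["m1", "m2"], ["a1", "a2", "a3"],
                 active_markers={"m2"}, active_arrays={"a1", "a3"})
print(view["values"])  # [[4.0, 6.0]]
```

The view only selects rows and columns; the parent matrix itself is left unchanged, which matches the statement that both the view and the full parent data set are sent.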

Latest revision as of 18:20, 6 March 2015



==Overview==
Most analytic routines available in geWorkbench (e.g. clustering and the t-test) are implemented directly in the geWorkbench desktop application code and run on your local PC.
