Most analytic routines available in geWorkbench (e.g. clustering, t-test) are implemented directly in the geWorkbench desktop application code and run on your local computer.
In cooperation with caBIG(R), the National Cancer Institute's Cancer Biomedical Informatics Grid program, a number of the geWorkbench analysis components have also been adapted to run as services on caGrid, the primary infrastructure component of caBIG. In accordance with caBIG principles, each has a well-defined object design and a public application programming interface (API) via which data can be exchanged. Annotations describing each service, object and parameter are stored in the caDSR (NCI's Cancer Data Standards Repository), using standard vocabulary terms available from the Enterprise Vocabulary Services (EVS).
Some analyses are available only as remote services, with no local implementation.
Each geWorkbench analysis component that has an associated grid service will show a Services tab in the Analysis framework, adjacent to the Parameters tab.
As of geWorkbench release 2.3.0, caGrid 1.4 is used for all services.
Grid services are only supported for the two most recent releases of geWorkbench at any one time.
- Local - When the "Local" radio button is selected, the calculation will be performed directly within geWorkbench, if available. Some analyses have no local implementation.
- Change Index Service - Index Services maintain lists of available grid services. geWorkbench is delivered with the URL of a Columbia Index Service preconfigured, which provides access to demonstration grid service implementations.
- Change Dispatcher - The Dispatcher is a geWorkbench server-side component which provides connectivity between geWorkbench and caGrid. geWorkbench is delivered with the URL of a Columbia Dispatcher Service preconfigured.
Search Grid Services
When the Search Grid Services button is pushed, the list of available services of the desired type will be retrieved from the specified index service. The list will appear in the area below, with each available service preceded by a radio button. The desired remote grid service can be selected using these radio buttons.
Once a particular grid service has been selected (via its radio button), the details of the service will be displayed in the lower window.
Note - for grid services hosted inside the Columbia firewall, the service metadata cannot be retrieved from outside the firewall. Instead, generic metadata will be displayed. You may see a delay of up to 30 seconds while geWorkbench attempts to retrieve the service metadata, before a timeout occurs and the generic metadata is substituted.
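The timeout-and-substitute behavior described in the note can be pictured with a small sketch. This is not geWorkbench's own code: `metadata_with_fallback`, `GENERIC_METADATA` and `slow_fetch` are hypothetical names, and only the 30-second limit comes from the text above.

```python
import concurrent.futures
import time

# Hypothetical placeholder shown when the real metadata cannot be retrieved.
GENERIC_METADATA = {"source": "generic", "description": "Generic grid service metadata"}


def metadata_with_fallback(fetch, timeout_s=30.0):
    """Return fetch() if it finishes within timeout_s seconds, else generic metadata."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fetch)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # The service did not answer in time (e.g. it sits behind a firewall).
        return dict(GENERIC_METADATA)
    finally:
        # Do not make the caller wait for a request that is still hanging.
        pool.shutdown(wait=False)


def slow_fetch():
    """Stand-in for a metadata request that does not answer in time."""
    time.sleep(0.5)
    return {"source": "service"}


if __name__ == "__main__":
    print(metadata_with_fallback(slow_fetch, timeout_s=0.05))
    print(metadata_with_fallback(lambda: {"source": "service"}))
```

The short timeout in the usage lines only keeps the demonstration fast; with the default of 30 seconds the caller would wait as long as the note describes before the generic metadata appears.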
Production URLs for geWorkbench 2.4.0 and 2.4.1
- Default Index Service: http://cagridnode.c2b2.columbia.edu:8080/v2.4.0/wsrf/services/DefaultIndexService
- Default Dispatcher Service: http://cagridnode.c2b2.columbia.edu:8080/v2.4.0/wsrf/services/cagrid/Dispatcher
Standard geWorkbench Grid Services
Production URLs for geWorkbench 2.3.0
Index and Dispatcher
- Default Index Service: http://cagridnode.c2b2.columbia.edu:8080/v2.3.0/wsrf/services/DefaultIndexService
- Default Dispatcher Service: http://cagridnode.c2b2.columbia.edu:8080/v2.3.0/wsrf/services/cagrid/Dispatcher
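The production URLs listed above differ only in the version segment of the path, so they can be summarized in a small lookup. The sketch below is illustrative only: `default_service_urls` and `PATH_VERSION` are hypothetical names, and the mapping covers just the releases listed on this page.

```python
BASE = "http://cagridnode.c2b2.columbia.edu:8080"

# Path version used by each release, as listed above
# (releases 2.4.0 and 2.4.1 share the v2.4.0 path).
PATH_VERSION = {"2.3.0": "v2.3.0", "2.4.0": "v2.4.0", "2.4.1": "v2.4.0"}


def default_service_urls(release):
    """Return the default Index Service and Dispatcher URLs for a listed release."""
    try:
        version = PATH_VERSION[release]
    except KeyError:
        raise ValueError(f"no production URLs listed for release {release!r}") from None
    root = f"{BASE}/{version}/wsrf/services"
    return {
        "index": f"{root}/DefaultIndexService",
        "dispatcher": f"{root}/cagrid/Dispatcher",
    }


if __name__ == "__main__":
    for name, url in default_service_urls("2.4.1").items():
        print(name, url)
```

Releases not listed here raise an error rather than guessing at a path, since nothing on this page says how other versions are laid out.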
Standard geWorkbench Grid Services
Running a grid job
- The Index Service and Dispatcher URLs are set to default geWorkbench services. If needed, choose an appropriate alternate Index service and/or Dispatcher service.
- Push the Search Grid Services button.
- Select an available grid service.
- Return to the Parameters tab, and when ready, push the Analyze button.
Some services require login credentials. If so, a dialog will appear asking for a Username and Password. If you possess the appropriate credentials for the service you have selected, enter them here and push OK.
A message may appear indicating that the job is being submitted.
While the job is running, a node marked "Pending" will be placed in the Project Folders component, preceded by an hourglass icon. Note that the progress bar that appears when analyses are run locally within geWorkbench will not appear for grid jobs.
Further aspects of running grid jobs
- The grid job, once started, is independent of geWorkbench. The dispatcher component cooperates with geWorkbench to track job status. A geWorkbench workspace containing running grid jobs can be saved and later restored. At the time that the saved workspace is reloaded, geWorkbench will resume monitoring the job for completion, and retrieve the finished results if available.
- Once a grid job has been started, its execution cannot be canceled from within geWorkbench. However, the "Pending" node can be removed from the Project Folders component. In this case, geWorkbench will not receive any results when the calculation actually completes.
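The save-and-resume behavior described in the points above can be sketched as follows. This is a conceptual illustration under stated assumptions, not the actual geWorkbench or Dispatcher implementation: the `PendingJob` fields, the JSON workspace format and the `check_status` callback are all hypothetical.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class PendingJob:
    job_id: str          # hypothetical identifier returned at submission
    dispatcher_url: str  # hypothetical: where to ask about this job
    status: str = "pending"


def save_workspace(jobs):
    """Serialize pending jobs so they survive closing the application."""
    return json.dumps([asdict(job) for job in jobs])


def restore_and_poll(saved, check_status):
    """Reload saved jobs and ask the dispatcher (check_status) about each one."""
    jobs = [PendingJob(**record) for record in json.loads(saved)]
    for job in jobs:
        # check_status stands in for a status query to the dispatcher.
        job.status = check_status(job)
    return jobs


if __name__ == "__main__":
    saved = save_workspace([PendingJob("job-1", "http://example.invalid/Dispatcher")])
    # Later, after the workspace is reloaded, the job has finished on the grid.
    for job in restore_and_poll(saved, lambda job: "done"):
        print(job.job_id, job.status)
```

The point of the sketch is that the job record, not the running calculation, is what the workspace stores; removing that record corresponds to deleting the "Pending" node, after which no results can be matched to it.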
The geWorkbench Dispatcher Service is separate from geWorkbench and serves as an intermediary between geWorkbench and grid services.
As of release 2.3.0, outbound data (from geWorkbench to the grid service) is moved using caTransfer, both from geWorkbench to the Dispatcher, and from the Dispatcher to the grid service. Returning data is transferred for both hops as a base-64 encoded string, which for very large data sets can lead to an out-of-memory error. Full replacement with caTransfer is being investigated.
The error message will read:
Out-of-memory error: Java heap space It is advisable to restart geWorkbench. You may also wish to increase the geWorkbench memory size.
In this case, as the message states, it is advisable to restart geWorkbench. Instructions for increasing the amount of memory available to Java are available in the geWorkbench FAQ. For Java memory requests larger than about 1.5 GB, running geWorkbench using a 64-bit Java Runtime Environment (JRE) will be necessary.
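To see why Base64 transfer inflates memory use, the encoded size can be estimated directly: padded Base64 produces four characters for every three input bytes, so the text is about one third larger than the raw result. The helper name `base64_length` and the 600 MB figure below are illustrative assumptions, and the receiving side may additionally need to hold the encoded and decoded forms at the same time.

```python
import base64


def base64_length(n_bytes):
    """Length in characters of the padded Base64 encoding of n_bytes bytes."""
    return 4 * ((n_bytes + 2) // 3)


if __name__ == "__main__":
    # Check the formula against the standard library on a small payload.
    payload = bytes(300)
    assert len(base64.b64encode(payload)) == base64_length(300)
    # Estimated encoded size of a hypothetical 600 MB result.
    print(base64_length(600_000_000))
```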
List of grid services
Some analyses can be run either locally within geWorkbench or via an external grid or web service, and some grid services hosted at Columbia require a username and password. The following table summarizes this information for geWorkbench components that can utilize grid services.
|Component|Available locally|Remote service type|Grid username/password required|
|MarkUs|no|web or grid|no|
Technical Details on Data Transfer
When a grid analysis is started, a view of the data defined by the activated array and marker sets is sent to the grid service, as well as the entire parent data set.
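As a rough illustration of what a "view defined by the activated array and marker sets" means, the sketch below selects the rows (markers) and columns (arrays) that are activated from a parent data set. The function `build_view` and the list-of-lists layout are assumptions made for the example; the text above does not say how geWorkbench represents the view internally or on the wire.

```python
def build_view(matrix, marker_names, array_names, active_markers, active_arrays):
    """Return the sub-matrix restricted to activated markers (rows) and arrays (columns)."""
    rows = [i for i, name in enumerate(marker_names) if name in active_markers]
    cols = [j for j, name in enumerate(array_names) if name in active_arrays]
    return [[matrix[i][j] for j in cols] for i in rows]


if __name__ == "__main__":
    parent = [[1, 2, 3],
              [4, 5, 6]]
    view = build_view(parent, ["m1", "m2"], ["a1", "a2", "a3"], {"m2"}, {"a1", "a3"})
    print(view)    # the selected view
    print(parent)  # the parent data set is sent as well, unchanged
```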