TeraGrid


see also: Working Groups, TeraGrid, TeraGrid User Support, CaGrid, CaGrid/TeraGrid/geWorkbench_Integration.

Overview

Biomedical research is becoming an increasingly collaborative undertaking. Parallel advances in biotechnology and informatics are creating new possibilities for discovery as well as increased demands for information sharing and exchange capabilities. To date, most existing databases and analytical tools have been developed independently, with tremendous variability in data rules, processes, vocabularies, and representations. No unifying architecture has emerged to support interoperability among these databases, knowledge stores, and software tools. An overarching infrastructure is urgently needed to support the technological and lexical standards upon which such interoperability critically depends.

In particular, the need to accelerate the translation of basic research discoveries into new clinical therapies demands that the channels for communication, data exchange, and collaboration among cancer centers, at all points along the basic-to-clinical spectrum, be significantly expanded. Recognizing the major national impact that a true networking of cancer centers can achieve, the National Cancer Institute (NCI) has introduced the cancer Biomedical Informatics Grid (caBIG) initiative to address these issues in the cancer research community.

The caBIG initiative will expedite the cancer research community’s access to key bioinformatics platforms. In partnership with that community, caBIG will create a common, extensible informatics platform that integrates diverse data types and supports interoperable analytic tools. This platform will allow research groups to tap into the rich collection of emerging cancer research data while supporting their individual investigations. The participation of multiple cooperating Cancer Centers in the earliest pilot stages of this effort ensures that the user community’s needs will be appropriately addressed and that the stakeholders in the enterprise will embrace the emerging vocabulary harmonizations and data exchange standards.

Exposing data-intensive or computationally intensive operations to caBIG as grid service operations can require a significant amount of work if they are to be implemented efficiently. Such implementations would traditionally make use of high-performance cluster tools. These tools are often fairly low level and would be difficult to make caBIG compatible. TeraGrid was identified as a potential solution for providing high-throughput computational analysis services.

TeraGrid

TeraGrid is an open scientific discovery infrastructure combining leadership class resources at nine partner sites to create an integrated, persistent computational resource. Access to TeraGrid resources is granted to credentialed users through the issuance of compute resource allocations.

caGrid

caBIG is a voluntary virtual informatics infrastructure that connects data, research tools, scientists, and organizations to leverage their combined strengths and expertise in an open federated environment with widely accepted standards and shared tools. The underlying service-oriented infrastructure that supports caBIG is referred to as caGrid.

caGrid Gateway Service

A caGrid Gateway Service is a grid service running in the caGrid environment that acts as a bridge or proxy to TeraGrid resources for a subset of users. Created for a specific scientific scenario, the service abstracts away the details of leveraging TeraGrid for performance-intensive operations. It relieves users of the need to understand the complexities of TeraGrid, and it bridges the separate security domains so that its users are not required to obtain TeraGrid accounts.
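The bridging of security domains described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the pilot's implementation: it assumes the gateway submits every job under a single shared TeraGrid account and keeps an audit trail tying each job back to the caGrid identity that requested it. `SHARED_TERAGRID_ACCOUNT`, `AuditRecord`, `GatewayAuditLog`, and the identity strings are all hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

# Hypothetical shared account used on the TeraGrid side (an assumption).
SHARED_TERAGRID_ACCOUNT = "gateway_shared_account"


@dataclass(frozen=True)
class AuditRecord:
    """One job submission, attributed to the caGrid user who requested it."""
    cagrid_identity: str
    teragrid_account: str
    job_id: str
    submitted_at: datetime


@dataclass
class GatewayAuditLog:
    """In-memory audit trail; a real service would persist these records."""
    records: List[AuditRecord] = field(default_factory=list)

    def record_submission(self, cagrid_identity: str, job_id: str) -> AuditRecord:
        # The job runs under the shared account, so the caGrid identity
        # is stored alongside it to keep usage attributable per user.
        record = AuditRecord(
            cagrid_identity=cagrid_identity,
            teragrid_account=SHARED_TERAGRID_ACCOUNT,
            job_id=job_id,
            submitted_at=datetime.now(timezone.utc),
        )
        self.records.append(record)
        return record

    def jobs_for(self, cagrid_identity: str) -> List[str]:
        """List the job ids submitted on behalf of one caGrid user."""
        return [r.job_id for r in self.records if r.cagrid_identity == cagrid_identity]
```

With a log like this, the gateway can report which caGrid user was responsible for each job even though none of those users holds a TeraGrid account.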

Anatomy of a caGrid Gateway Service

A caGrid Gateway Service, from a user's perspective, is just a standard caGrid service. It meets all of the standard caBIG requirements: it uses caDSR-registered data types in its service interface, runs with caBIG-trusted service credentials, and registers itself with the caGrid Index Service.

From a service developer's perspective, the gateway service is generally more complex than standard caGrid services. Whereas a standard service would call into an existing API (Figure 1), a gateway service is responsible for mapping client requests into job submissions to TeraGrid and for providing appropriate abstractions through which its clients interact indirectly with those jobs. This includes transforming client input data objects into appropriate job submission parameters, and transforming job results into appropriate result data objects. The service also has the added complexity of operating as a bridge between two security domains (caGrid and TeraGrid), and of managing the auditing and reporting requirements necessary to map between user credentials in the separate domains. The details of these components are described in this wiki.
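The request-to-job mapping described above can be sketched as follows. This is a hedged illustration only: every name (`AnalysisRequest`, `AnalysisResult`, `GatewayService`, `fake_backend`) is hypothetical, the "job" is a local function standing in for a TeraGrid submission, and it runs synchronously, whereas a real gateway would presumably queue the job and let clients poll for completion.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class AnalysisRequest:
    """Hypothetical client input object (stands in for a registered data type)."""
    dataset_name: str
    values: List[float]


@dataclass(frozen=True)
class AnalysisResult:
    """Hypothetical result object returned to the client."""
    dataset_name: str
    mean: float


def request_to_job_params(request: AnalysisRequest) -> Dict[str, str]:
    """Transform the client input object into flat job submission parameters."""
    return {
        "name": request.dataset_name,
        "values": ",".join(repr(v) for v in request.values),
    }


def job_output_to_result(params: Dict[str, str], raw_output: str) -> AnalysisResult:
    """Transform raw job output back into a typed result object."""
    return AnalysisResult(dataset_name=params["name"], mean=float(raw_output))


def fake_backend(params: Dict[str, str]) -> str:
    """Stand-in for a remote job: computes the mean of the values locally."""
    values = [float(v) for v in params["values"].split(",")]
    return str(sum(values) / len(values))


class GatewayService:
    """Maps client requests to jobs and hands back opaque job handles."""

    def __init__(self, run_job: Callable[[Dict[str, str]], str]) -> None:
        self._run_job = run_job
        self._jobs: Dict[str, Tuple[Dict[str, str], str]] = {}

    def submit(self, request: AnalysisRequest) -> str:
        params = request_to_job_params(request)
        handle = uuid.uuid4().hex
        # Runs immediately in this sketch; no queueing or polling is modelled.
        self._jobs[handle] = (params, self._run_job(params))
        return handle

    def get_result(self, handle: str) -> AnalysisResult:
        params, raw_output = self._jobs[handle]
        return job_output_to_result(params, raw_output)
```

The client only ever sees the typed request, the handle, and the typed result. The job parameters and raw output stay inside the service, which is the abstraction the paragraph above describes.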

References

For more information, see

Creating a caGrid Gateway Service

Follow the step-by-step Cook Book for detailed instructions on how to create your own gateway service.

Example

For details on the demo example, see geWorkbench caGrid/TeraGrid Pilot Effort. For more information on geWorkbench, see
