CaGrid 0.5

General

Visit the website here.

caGrid differs from a basic grid infrastructure in the special emphasis it gives to data modelling and semantics. All data types used on the grid must be formally described (as an .xsd schema registered in the GME) and curated (in the caDSR).

From the caBIG overview guide: "As caDSR and EVS define properties, relationships, and semantics of caBIG data types, the GME defines the syntax of the XML serialization of them". To me, this means:

  1. UML models (the data types in the models) are stored in the caDSR.
  2. The .xsd files uploaded to a GME describe the XML serialization of the objects. Defining a datatype in an .xsd file, however, does not allow you to use the object on the grid until it has been registered in the caDSR (not pertinent to caGrid 0.5).

caGrid contains tools to create/deploy grid services to be used on caGrid. The services can be one of two types:

  • data services
  • analytical services

Setup

Environment

  • Download the caGrid archive from here.
  • Unzip this into a new folder in your eclipse workspace. This folder should be named something like cagrid-0.5.3. We will refer to this as $INSTALLATION_DIR.
  • Create a new project in eclipse (discussed here) and give it the same name as the folder created in the previous step. Eclipse will auto-configure the appropriate files.
  • In eclipse, set the compiler compliance to 1.4 (Right click on $INSTALLATION_DIR and select Properties->Java Compiler). You must do this because cagrid-0.5.3 relies on apache axis, which has package names that contain the word enum. This is a reserved keyword in Java 5 (it is a type). See article here.
  • Open a command line prompt and cd to $INSTALLATION_DIR. You cannot execute the next steps from within eclipse because some ant tasks are dependent on others, and eclipse will not refresh the project until the primary ant task has completed.
  • Make sure you have ant installed, and $ANT_HOME/bin is on the $PATH.
  • Run the command export JAVA_HOME=your java 1.4.xx JDK. You must do this because Java 5 gives an error like "NoClassDefFoundError: org.apache.xpath.XPathAPI". The version of Xalan shipped with caGrid-0.5.3 is not compatible with Java 5, so we must use Java 1.4.2_xx. Specifically, use 1.4.2_04 (see bug here). The export command is used in a Linux-style shell, Cygwin included. Under Cygwin, this command looks like export JAVA_HOME=C:/Program\ Files/Java/j2sdk1.4.2_04 ('\ ' is used to escape spaces). At the Windows command prompt, this looks like set JAVA_HOME=C:\Program Files\Java\j2sdk1.4.2_04.
  • Run ant -f bootstrap.xml. Follow the prompts and:
    • Install Tomcat if you do not have it installed.
    • Install Globus if you do not have it installed.
    • Install Ogsa-dai if you do not have it installed.
  • When the gui pops up and asks you to manually get certain jars, ignore this and click Next because you already have them (they are in CAGRID_HOME/ogsadai-5.0/externals).
  • When asked to deploy globus select your tomcat instance.
  • When asked to deploy OGSA-DAI select your tomcat as the webcontainer to deploy ogsa-dai to.
  • Refresh the project (in eclipse, single click on your caGrid project and press F5).
  • Copy hsqldb.jar from here to eclipse_workspace/your ca-grid-project/ogsadai-5.0/drivers.
  • cd to $INSTALLATION_DIR/ogsadai-5.0 and run ant guiDeployTestFactory.
  • In eclipse, you can point to your tomcat installation. If you use the tomcat instance created from the caGrid scripts, this would be:
    • point to $INSTALLATION_DIR/jakarta-tomcat-5.0.30.
    • If you see an error complaining about "no temp directory found", create a temp directory in $CATALINA_HOME.
    • In $CATALINA_HOME/conf/Catalina/tomcat-users.xml add the following roles and user:
  
  <role rolename="admin"/>
  <role rolename="manager"/>
  <user username="admin" password="kzootio" roles="admin,manager"/>
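
Because so much of this setup depends on running under a 1.4.2 JDK, it can help to check the active Java version before running any ant task. The following is only a sketch: java_version_ok is a hypothetical helper name, and it assumes the first line printed by java -version has the usual form java version "1.4.2_04".

```shell
# Sketch: succeed only if the given "java -version" line reports a 1.4.2 JDK.
# java_version_ok is a hypothetical helper, not part of caGrid.
java_version_ok() {
    case "$1" in
        *'"1.4.2_'*) return 0 ;;   # e.g. java version "1.4.2_04"
        *)           return 1 ;;   # anything else, including Java 5
    esac
}

# Usage (java -version writes to stderr, hence 2>&1):
#   java_version_ok "$("$JAVA_HOME/bin/java" -version 2>&1 | head -n 1)" \
#       || echo "JAVA_HOME does not point at a 1.4.2 JDK"
```

The check takes the version line as an argument rather than calling java itself, so it can be tried out without a JDK installed.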
  

Configuring Clients to trust caGrid

We need to configure the environment so our clients will trust the caGrid infrastructure. To do this, do the following:

  • Make a directory .globus/certificates in $HOME and copy $INSTALLATION_DIR/cacert.pem to $HOME/.globus/certificates, then rename cacert.pem to nci_cacert.1.
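
The step above can be sketched as a small shell function. The function name trust_cagrid_ca and its two arguments are assumptions made for illustration. The paths and file names are the ones given in this section.

```shell
# Sketch: install the caGrid CA certificate where the clients look for it.
# trust_cagrid_ca is a hypothetical helper name.
trust_cagrid_ca() {
    install_dir="$1"   # your $INSTALLATION_DIR
    home_dir="$2"      # normally $HOME
    cert_dir="$home_dir/.globus/certificates"

    if [ ! -f "$install_dir/cacert.pem" ]; then
        echo "cacert.pem not found in $install_dir" >&2
        return 1
    fi
    mkdir -p "$cert_dir" || return 1
    # Copy and rename in one step: cacert.pem -> nci_cacert.1
    cp "$install_dir/cacert.pem" "$cert_dir/nci_cacert.1"
}

# Usage: trust_cagrid_ca "$INSTALLATION_DIR" "$HOME"
```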

Creating A Service

(commands are run from the command prompt unless otherwise specified)

  • Run:
    • export GLOBUS_LOCATION=$INSTALLATION_DIR/ogsa-3.2.1
    • export OGSADAI_LOCATION=$INSTALLATION_DIR/ogsadai-5.0
    • export CATALINA_HOME=$INSTALLATION_DIR/jakarta-tomcat-5.0.30
    • export PATH=$PATH:$CATALINA_HOME/bin:$GLOBUS_LOCATION/bin
  • cd to cagrid/caGRID.
  • Create the file analytical.properties and add entries like:

analytical.skeleton.namespace.domain=http\://analysis.cabig.columbia.edu
analytical.skeleton.package.dir=edu/columbia/cabig/analysis/foobar
analytical.skeleton.service.name=FooBar
analytical.skeleton.destination.dir=C\:java/apps/eclipse_workspace/FooBar
analytical.skeleton.package=edu.columbia.cabig.analysis.foobar
analytical.transient=no

  • Run ant -Danalytical.properties=analytical.properties createAnalyticalService.
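
The environment variables exported at the start of this section are needed again when deploying, so they can be collected into one helper. This is a sketch only: cagrid_env is a hypothetical function name, and it assumes Tomcat, Globus and OGSA-DAI were all installed under $INSTALLATION_DIR by the bootstrap script, with the directory names used on this page.

```shell
# Sketch: set the environment used by the caGrid 0.5.3 ant targets.
# cagrid_env is a hypothetical helper; the directory names below assume
# everything was installed under the caGrid installation directory.
cagrid_env() {
    INSTALLATION_DIR="$1"
    GLOBUS_LOCATION="$INSTALLATION_DIR/ogsa-3.2.1"
    OGSADAI_LOCATION="$INSTALLATION_DIR/ogsadai-5.0"
    CATALINA_HOME="$INSTALLATION_DIR/jakarta-tomcat-5.0.30"
    PATH="$PATH:$CATALINA_HOME/bin:$GLOBUS_LOCATION/bin"
    export INSTALLATION_DIR GLOBUS_LOCATION OGSADAI_LOCATION CATALINA_HOME PATH
}

# Usage: cagrid_env /path/to/eclipse_workspace/cagrid-0.5.3
```

Calling the function more than once in the same shell appends the bin directories to PATH again, which is harmless but untidy.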

Running ant createAnalyticalService without specifying analytical.properties seems to hang: the ant task fails to respond after asking you to specify a directory for your service. This approach also doesn't seem to generate all the appropriate files. Instead, use the analytical portal. That is:

  • run ant analyticalPortal. You will see a gui pop up. Create a new service (do not put 'Service' in the name). Make sure that your package dir contains "." and not "/".
    • This generates the following in $SERVICE_DIR (directory of your service):
      • FooBarI - interface
      • FooBarClient - implements FooBarI
      • FooBarImpl - implements FooBarI
      • FooBarProvider - extends AnalyticalServiceProvider, delegates to FooBarImpl
  • cd to $INSTALLATION_DIR/cagrid/caGRID/gme and, in gme-view-globus-config.xml, change the gridService element to:

  <gridService serviceId="http://137.187.67.37:80/ogsa/services/cagrid/gme" />

Adding Methods to Your Service

  • cd to $INSTALLATION_DIR/cagrid/caGRID and run ant analyticalPortal.
  • Complete the steps in the gui to add a method.
    • To use the cagrid datatypes in your methods (that is, to be able to discover schemas), you must first be authenticated. See details here.
  • You should see a build directory created in your $SERVICE_DIR.

Implementing Your Service

  • The class you want to edit is src/package.dir.of.your.service/YourServiceImpl.java.

Deploying Your Service

When you deploy a grid service, the following files are edited in tomcat (if your service is Algorithm):

  • jakarta-tomcat-5.0.30/webapps/ogsa/WEB-INF/etc/_cagrid_Algorithm/_cagrid_Algorithm.wsdd
  • jakarta-tomcat-5.0.30/webapps/ogsa/WEB-INF/server-config.wsdd

This means if you want to undeploy a grid service, remove:

  • jakarta-tomcat-5.0.30/webapps/ogsa/Algorithm
  • jakarta-tomcat-5.0.30/webapps/ogsa/WEB-INF/etc/_cagrid_Algorithm/
  • the element from the file jakarta-tomcat-5.0.30/webapps/ogsa/WEB-INF/server-config.wsdd beginning with:

<service name="cagrid/Algorithm1" provider="Handler" style="wrapped" use="literal">

I found this by grepping for Algorithm1_service.wsdl in jakarta-tomcat-5.0.30. See grep for details.
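
The undeploy steps above can be sketched as a shell function. undeploy_service is a hypothetical name, not a caGrid tool. It removes the two directories and then only prints where the matching service element sits in server-config.wsdd. Deleting that element is still done by hand, since the element spans several lines.

```shell
# Sketch: remove a deployed service's files and locate its <service> element.
# undeploy_service is a hypothetical helper, not part of caGrid.
undeploy_service() {
    catalina_home="$1"   # e.g. $INSTALLATION_DIR/jakarta-tomcat-5.0.30
    service="$2"         # e.g. Algorithm

    # Refuse empty arguments so rm -rf can never hit the whole ogsa webapp.
    if [ -z "$catalina_home" ] || [ -z "$service" ]; then
        echo "usage: undeploy_service CATALINA_HOME SERVICE" >&2
        return 1
    fi
    ogsa="$catalina_home/webapps/ogsa"

    rm -rf "$ogsa/$service" "$ogsa/WEB-INF/etc/_cagrid_$service"

    # Print line number and text of the element to delete manually.
    # The pattern is a prefix match, so Algorithm also finds cagrid/Algorithm1.
    grep -n "<service name=\"cagrid/$service" "$ogsa/WEB-INF/server-config.wsdd" || true
}

# Usage: undeploy_service "$CATALINA_HOME" Algorithm
```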

  • If not already set, run:
    • export GLOBUS_LOCATION=$INSTALLATION_DIR/ogsa-3.2.1
    • export OGSADAI_LOCATION=$INSTALLATION_DIR/ogsadai-5.0
    • export CATALINA_HOME=$INSTALLATION_DIR/jakarta-tomcat-5.0.30
  • cd to $SERVICE_DIR and run ant deploy. This will place your service in $CATALINA_HOME/webapps/ogsa/schema/cagrid/.

NOTE: If you built the service using eclipse (see below), before deploying make sure you:

  1. build the project in eclipse to resolve any compilation errors.
  2. go to the Project menu and uncheck Build Automatically.
  3. delete the entire build directory (from the Navigator view). You can still keep the directory build/stubs on the build path.

The reason for this is that the ant deploy task includes a java compile step, so we want to let the ant task build the service for us and put the files in the appropriate directories.

NOTE: If you are doing this from the command line, you have to tell the java compiler to accept code with assertions (since we are using javac 1.4). To do this, I added the source attribute to every javac task in the build.xml file of my service. This looks like:

<javac srcdir="${build.stubs}" destdir="${build.dest}" debug="${debug}" deprecation="${deprecation}" classpathref="classpath" source="1.4">

  • To verify this has been deployed, start tomcat, select the ogsa application, and look for your service in the list of available services.

Building Your Service (In eclipse ... to run the client)

Generate the service (as discussed above) in the eclipse workspace. Then create a New Java Project in eclipse and use the name of the service you created above.

Set your compiler compliance setting to 1.4.

  • Add the generated /build/stubs to the build path (if not already added).
  • Remove build/classes from the build path.
  • Create a build.properties in your home directory (or get a copy of this from Kiran).
  • cd to your service and run maven. (I want to set up a maven repo here where I can store some jars. I think maven will download most of these from ibiblio, but in case ibiblio does not have some of the jars in the future, having our own repo could be a good idea). NOTE: Move this to maven 2 when you get a chance.
    • The following jars are needed to run the client, and are downloaded when running the above maven command.
      • axis.jar
      • jaxrpc.jar
      • wsdl4j.jar
      • ogsa.jar
      • cog-jglobus.jar
      • cog-axis.jar
      • saaj.jar
      • commons-discovery.jar
      • xmlsec.jar
      • xalan.jar

In eclipse, you will see some build errors. Create the variable MAVEN_REPO in eclipse (right click on the project, select Java Build Path, under the libraries tab select Add Variables).

Global Model Exchange

Connect to NCI's GME

  • To connect to NCI's GME, you must:
    • register for a grid user account here
    • after receiving notification of your account creation, cd to $INSTALLATION_DIR/cagrid/caGRID and run ant gumsPortal.
    • select Credential Management, then Create Proxy.

Now you should be able to discover schemas in the analyticalPortal (see above).

BUG: A bug found here when trying to create a proxy (with ant gumsPortal) is that the Apache XML Security library used by ogsa accesses the field org/apache/xpath/compiler/FunctionTable.m_functions. This field was public in older versions of xalan but is private in the newer versions, which are packaged with newer j2sdk releases. Moral of the story: use j2sdk 1.4.2_04.

Creating A "Local" GME

  • cd to $INSTALLATION_DIR/cagrid/caGRID.
  • Run ant createGMEService (you will not find this task in the build.xml in that directory; it is imported from gme-utils.xml).
  • cd to the GME_LOCATION and edit the file etc/gme-globus-config.xml. Specifically, edit lines 4, 5, 35, 56, 59. These are:

<MobiusNetworkServiceDescriptor serviceType="GME" hostname="http://localhost:8080/ogsa/services/cagrid/gme">

<serviceIdentifier serviceId="http://localhost:8080/ogsa/services/cagrid/gme"/>

<serviceRegistry serviceId="localhost" registryClass="org.projectmobius.common.DefaultRegistry"/>

<username>floratos</username>

<password>kzoot</password>

  • cd to the GME_LOCATION and run ant deploy.
  • to test that you have configured this correctly, you must

NOTE: If you see an error stating something to the effect of a proxy file not being found, then tomcat is looking for the proxy in a location other than where the proxy is created. After some searching, I found my proxy in /c/Documents and Settings/keshav/Local Settings/Temp/1. You can resolve this error by copying this to $CATALINA_HOME/temp.

Adding a Namespace to your GME

Namespaces have definitions of datatypes that are available for use (I know, this is pretty vague).

  • cd to $INSTALLATION_DIR/cagrid/caGRID and run ant gmeViewer.
  • right click and Add A Grid Service (in the Grid Service Id field).
  • add http://localhost:8080/ogsa/services/cagrid/gme.
  • add a namespace to this. The default namespace is cagrid.nci.nih.gov, so you could add something like cagrid.geworkbench.columbia.edu.
  • Right click on this grid service id and Publish Schema. Select your .xsd file. More information on the format of xsd files (and the corresponding java) can be found below.
  • NOTE: If you are getting an unsuccessful compilation when trying to use your datatypes, this is probably due to the fact that the namespace for your datatypes in your schema file (.xsd) is different from the namespace in the .gwsdl (which in turn generates the .wsdl file) file. For instance, I had:

Normalization/schema/cagrid/Normalization.gwsdl

<definitions name="Normalization"
             targetNamespace="http://cagrid.geworkbench.columbia.edu/Normalization"
             xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:ana="http://cagrid.nci.nih.gov/1/Analytical"
             xmlns:cadsr="http://cagrid.nci.nih.gov/1/CaDSRExtract"
             xmlns:common="http://cagrid.nci.nih.gov/1/CommonServiceMetadata"
             xmlns:gwsdl="http://www.gridforum.org/namespaces/2003/03/gridWSDLExtensions"
             xmlns:ns10="gme://cagrid.geworkbench.columbia.edu/1/expression3"
             xmlns:ogsi="http://www.gridforum.org/namespaces/2003/03/OGSI"
             xmlns:sd="http://www.gridforum.org/namespaces/2003/03/serviceData"
             xmlns:tns="http://cagrid.geworkbench.columbia.edu/Normalization"
             xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 <import location="./expression3.xsd" namespace="gme://cagrid.geworkbench.columbia.edu/1/expression3"/>
 <import location="../../ogsi/ogsi.gwsdl" namespace="http://www.gridforum.org/namespaces/2003/03/OGSI"/>
 <import location="../types/Common/CommonServiceMetadata.xsd" namespace="http://cagrid.nci.nih.gov/1/CommonServiceMetadata"/>
 <import location="../types/AnalyticalServices/AnalyticalServiceMetadata.xsd" namespace="http://cagrid.nci.nih.gov/1/AnalyticalServiceMetadata"/>

Normalization/schema/cagrid/expression3.xsd

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="cagrid.geworkbench.columbia.edu/1/expression3"  xmlns:mobius="cagrid.geworkbench.columbia.edu/1/expression3" targetNamespace="cagrid.geworkbench.columbia.edu/1/expression3" elementFormDefault="qualified" attributeFormDefault="unqualified">

This caused a compilation error because the namespaces in expression3.xsd did not match the namespace declared in the .gwsdl file, xmlns:ns10="gme://cagrid.geworkbench.columbia.edu/1/expression3". For some reason, the wsdl file prefixes the namespace with gme://. To get around this, I changed the namespaces in expression3.xsd to contain the prefix gme://. This now looks like:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="gme://cagrid.geworkbench.columbia.edu/1/expression3" xmlns:mobius="gme://cagrid.geworkbench.columbia.edu/1/expression3" 
targetNamespace="gme://cagrid.geworkbench.columbia.edu/1/expression3" 
elementFormDefault="qualified" attributeFormDefault="unqualified">
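
A mismatch like this can be caught before compiling by comparing the two files directly. The following is a rough sketch under stated assumptions: check_gme_namespace is a hypothetical helper, the targetNamespace attribute is assumed to sit on a single line of the .xsd, and the .gwsdl is assumed to import the schema with a namespace="..." attribute as in the excerpt above.

```shell
# Sketch: check that the targetNamespace of an .xsd is imported by a .gwsdl.
# check_gme_namespace is a hypothetical helper, not part of caGrid.
check_gme_namespace() {
    xsd="$1"
    gwsdl="$2"

    # Pull the first targetNamespace="..." value out of the schema.
    ns=$(sed -n 's/.*targetNamespace="\([^"]*\)".*/\1/p' "$xsd" | head -n 1)
    if [ -z "$ns" ]; then
        echo "no targetNamespace found in $xsd" >&2
        return 2
    fi

    # -F matches a fixed string, so dots in the URI are not regex syntax.
    # The leading space keeps targetNamespace="..." from matching.
    if grep -qF " namespace=\"$ns\"" "$gwsdl"; then
        echo "ok: $ns"
    else
        echo "mismatch: $gwsdl does not import $ns" >&2
        return 1
    fi
}

# Usage: check_gme_namespace schema/cagrid/expression3.xsd schema/cagrid/Normalization.gwsdl
```

This is a plain text comparison rather than real XML parsing, so it is only a quick sanity check.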

caGrid Applications

RProteomics Service

  • cd to RProteomics and run ant build. This is an example of what a service contains.

All services have:

  • /build/schema/cagrid/RProteomics/types/Common/CommonServiceMetadata - every service must provide this
  • /build/schema/cagrid/RProteomics/types/AnalyticalServices/AnalyticalServiceMetadata - depending on the type of service
  • types are in /build/stubs/edu/duke/cabig/rproteomics/bean/RProteomics/bean

Mapping Types from .xsd to Java

An excerpt from scanFeatures.xsd:

<xs:complexType name="valueEnumerationType">
   <xs:annotation>
     <xs:documentation>See the enumeration element</xs:documentation>
   </xs:annotation>
   <xs:sequence>
     <xs:element name="value" type="xs:string" minOccurs="0" maxOccurs="unbounded">
       <xs:annotation>
         <xs:documentation>A single feature value</xs:documentation>
       </xs:annotation>
     </xs:element>
   </xs:sequence>
   <xs:attribute name="type" type="dataTypeType" use="required">
     <xs:annotation>
       <xs:documentation>The data type of the values</xs:documentation>
     </xs:annotation>
   </xs:attribute>
   <xs:attribute name="count" type="xs:int" use="required">
     <xs:annotation>
       <xs:documentation>The number of values in the enumeration.</xs:documentation>
     </xs:annotation>
   </xs:attribute>
 </xs:complexType>

In java, this translates to:

public class ValueEnumerationType  implements java.io.Serializable {
   private java.lang.String[] value;  // element, minOccurs="0" maxOccurs="unbounded"
   private edu.duke.cabig.rproteomics.bean.DataTypeType type;  // attribute
   private int count;  // attribute
   // ... (rest of the generated class omitted)
}

When using my own schema, I get a message saying that I need to define types. The reference implementation seems to put the types in the build/types directory, but this directory is generated, so it doesn't make sense to define them there; they must be defined somewhere else.

Browser

  • cagrid-browser - cd to $INSTALLATION_DIR/caGrid/Applications/cagrid-browser.
    • Run ant all.
    • This will deploy the caGrid browser on Tomcat. Start Tomcat. To log in, you must configure clients to "trust" the caGrid infrastructure (see setup above).

OGSA

This is just a specification telling us how to create services, deploy them, and invoke them ... the actual implementation of this is Globus.

Core Services provided (by Globus):

  • Index Service - used to register services

OGSA-DAI

Implementation of OGSA-DAI, another standard for data services.

  • middleware used to expose data resources such as databases (relational, xml).

caDSR

  • files of interest:
    • /your-service/etc/analytical/analyticalSDE.xml


Outstanding Issues

  • cagrid 0.5/1.0 does not support multidimensional arrays. It uses apache axis 1.0 (cagrid 1.0 uses a slightly modified version of this), which cannot parse multidimensional arrays. More precisely, apache axis can support multidimensional arrays if they are parameters to a method, but not if they are part of a java bean (which is the situation we have when developing services with the cagrid toolkit). See discussion here: http://mail-archives.apache.org/mod_mbox/ws-axis-user/200210.mbox/%3C016e01c27f02$f4563df0$1c00a8c0@MARTINJ%3E