Meng's transition

cluster jobs

Several geworkbench components rely on a backend service that submits jobs to SGE clusters. The software involved is of various types and is mostly NOT part of the geworkbench code repository, but from the point of view of the geworkbench application the components all look alike as cluster jobs. They are:

  • aracne (c++, matlab, perl) ~cagrid/r/aracne
  • marina (matlab) ~cagrid/matlab/marina
  • demand (R) ~cagrid/r/demand
  • viper (R) ~cagrid/r/viper
  • skyline (perl) location defined by data.directory in cagrid1.4 skyline.properties

In geworkbench, many of the scripts submitted to the cluster are created dynamically from the control parameters and input data.

Most of them depend on the particular software being installed on the machine from which the script is submitted to the cluster. That machine is afdev in most cases (possibly all).

Typically, the created script is saved in a sub-directory called 'bin' or 'script', and the corresponding results are saved in a sub-directory called 'run'.

Many of them handle the transfer of input data files to the web service in a similar way.
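
The 'bin' and 'run' layout can be sketched in Java as follows. This is only an illustration, not the code used by the geworkbench services: the class name JobScriptSketch, its method names, the job name and the command passed in are all hypothetical, and the "#$" lines are standard SGE qsub directives included on the assumption that a generated script might carry them.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: writes a generated job script under <jobDir>/bin
// and points the job's output at <jobDir>/run.
final class JobScriptSketch {

    /** Builds the script text: SGE directives followed by the command to run. */
    static String buildScript(String jobName, Path runDir, String command) {
        return String.join("\n",
                "#!/bin/bash",
                "#$ -S /bin/bash",            // shell that interprets the job
                "#$ -N " + jobName,           // job name shown by qstat
                "#$ -o " + runDir,            // directory for the job's stdout
                "#$ -e " + runDir,            // directory for the job's stderr
                "cd \"" + runDir + "\"",      // result files are written in 'run'
                command,
                "");
    }

    /** Creates the 'bin' and 'run' sub-directories and saves the script in 'bin'. */
    static Path writeJobScript(Path jobDir, String jobName, String command) {
        try {
            Path binDir = Files.createDirectories(jobDir.resolve("bin"));
            Path runDir = Files.createDirectories(jobDir.resolve("run"));
            Path script = binDir.resolve(jobName + ".sh");
            String text = buildScript(jobName, runDir, command);
            Files.write(script, text.getBytes(StandardCharsets.UTF_8));
            return script;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller would pass a per-job directory and the command assembled from the control parameters, for example JobScriptSketch.writeJobScript(jobDir, "demo", "echo hello"), and then hand the returned path to qsub.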

web services

The web services developed and maintained by Meng are all Spring Framework based, WSDL contract-first web services. The libraries required by the Java clients are axiom-api-1.2.10.jar and axis2-kernel-1.5.3.jar.

In MTOM/XOP, the binary data is attached outside the SOAP XML envelope under a MIME header and is referenced by a cid in the xop node of the SOAP XML envelope. Since the binary data is not part of the SOAP envelope, it does not have to be loaded or parsed in memory. To enable MTOM in a web service, taking aracne as an example:

  • enable mtom in tomcat, cagrid@afdev:apache-tomcat-5.5.35/bin/catalina.sh, line 196:
JAVA_OPTS="$JAVA_OPTS -Dsaaj.use.mimepull=true"
  • enable mtom in service, aracne.service: src/main/webapp/WEB-INF/aracne-servlet.xml, line 24:
<property name="mtomEnabled" value="true"/>
  • enable mtom on client, geworkbench-web: src/main/java/org/geworkbenchweb/plugins/aracne/AracneAxisClient.java, lines 83-87:
serviceOptions.setProperty( Constants.Configuration.ENABLE_MTOM, Constants.VALUE_TRUE );
serviceOptions.setProperty( Constants.Configuration.ATTACHMENT_TEMP_DIR, System.getProperty("java.io.tmpdir") );
serviceOptions.setProperty( Constants.Configuration.CACHE_ATTACHMENTS, Constants.VALUE_TRUE );
serviceOptions.setProperty( Constants.Configuration.FILE_SIZE_THRESHOLD, "1024" );
  • use DataHandler to attach binary data without loading it into memory, AracneAxisClient.java line 51:
OMText textData = omFactory.createOMText(new DataHandler(new FileDataSource(dataFile)), true);
  • define binary data type, schema.xsd, line 18:
<element name="expFile" type="base64Binary" xmime:expectedContentTypes="application/octet-stream"/>

aracne

(used by geworkbench web only)

This service depends on the ARACNE program being properly set up, along with the software platforms it needs (e.g. Perl), on the same server where the service is deployed.

WSDL url: http://afdev.c2b2.columbia.edu:9090/aracne-server/services/aracne.wsdl (see also http://forum.spring.io/forum/spring-projects/web-services/24952-getting-the-wsdl-from-a-web-service-endpoint)

The ARACNE computation includes three major parts:

  1. preprocessing (create the 'parameter config' result): MATLAB
  2. the ARACNE algorithm: C++ program, pre-built and ready to run
  3. postprocessing (create consensus network): perl script
  • service code https://github.com/geworkbench-group/aracne.service
    • for development, import the project from git to eclipse
    • to build and deploy: run "mvn package", then copy the war file (aracne-server.war) to tomcat's webapps directory
      • note that the default name of the war file is ${artifactId}-${version} if you don't explicitly specify <finalName>
  • mantis issue http://wiki.c2b2.columbia.edu/mantis/view.php?id=3749
  • how to submit a cluster job? Example: qsub my_script.sh. The details can be found in the service code; they are not documented anywhere else.
  • what account is used to do this in geworkbench web? What permissions are required for the account and for the machine to run qsub, and what else is needed to make it work?
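
The qsub call mentioned in the list above could be wrapped in Java roughly as follows. This is a hedged sketch, not the code in aracne.service: the class QsubSketch and its methods are hypothetical, and the job id parsing assumes the usual SGE confirmation line of the form: Your job 12345 ("my_script.sh") has been submitted. Array jobs and site-specific wrappers print different text and would need their own handling.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around "qsub my_script.sh".
final class QsubSketch {

    // Assumed format of the SGE confirmation line; see the note above.
    private static final Pattern SUBMITTED = Pattern.compile("Your job (\\d+) ");

    /** Builds the argument list: qsub, optional extra arguments, then the script. */
    static List<String> qsubCommand(String scriptPath, String... extraArgs) {
        List<String> command = new ArrayList<>();
        command.add("qsub");
        command.addAll(Arrays.asList(extraArgs));
        command.add(scriptPath);
        return command;
    }

    /** Returns the job id found in qsub's output, or null if none is present. */
    static String parseJobId(String qsubOutput) {
        Matcher matcher = SUBMITTED.matcher(qsubOutput);
        return matcher.find() ? matcher.group(1) : null;
    }

    /** Runs qsub on the submitting machine and returns the job id it reports. */
    static String submit(String scriptPath) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(qsubCommand(scriptPath))
                .redirectErrorStream(true)   // merge stderr into stdout
                .start();
        StringBuilder output = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
        }
        int exitCode = process.waitFor();
        String jobId = parseJobId(output.toString());
        if (exitCode != 0 || jobId == null) {
            throw new IOException("qsub failed (exit " + exitCode + "): " + output);
        }
        return jobId;
    }
}
```

submit only works where qsub is on the PATH of the account running the service, which is exactly what the open questions about account and permissions need to settle.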

demand

viper

index service

This is a simplistic index service for other services (demand and viper).

pudge

marina

The desktop and the web version invoke the same MARINA code.

  • contact person: Mukesh
  • web version: submits a cluster job. The MATLAB code must be present on afdev for the application to access. See the 'cluster jobs' section above.
  • desktop version: invoked through a caGrid service

markus

  • contact person: Nacho
  • remote server information: http connection to Honig lab's markus web server

skyline

  • contact person: Hunjoong
  • backend: invoke the perl script batch_leverage.pl.

skybase

  • contact person: Hunjoong
  • geworkbench queries the skybase database directly rather than through the web interface developed by their group; it plays the same role as that interface.
  • Before querying the database, it executes web_blast.pl, which is the same script used by Honig lab's skybase website.

vaadin 7

  • done: the application (geworkbench web) was converted to vaadin 7 at the architecture level, and functions as expected
  • not done: (1) certain features and plugins are not working; (2) the code is in a branch that is out of sync with the master branch, because many changes have been made since the vaadin 7 conversion work.

geworkbench1 and geworkbench2

We found that two versions could not both be configured to work properly from outside, because of the combination of the caTransfer feature and the request redirection from cagridnode. Therefore, there is little point in maintaining two versions.

  • how to set proxy during release: modify http.conf file on cagridnode.

gmail account for geworkbench web 'admin'

This account handles the user registration confirmation messages, forgot-password requests, etc. The actual password should be changed after deployment to production.

genomespace project

http://www.genomespace.org/

  • code base:
    • geworkbench component: genomespace
    • geworkbench launch helper. This is a web start application. The code is not in any version control system. Meng emailed me the latest version today (6/17/2014).
      • It is deployed on http://www.c2b2.columbia.edu/geworkbench/ (the same server as the c2b2 web site), and is needed by genomespace users to launch the geworkbench application. Thus, it may need to be updated when we release a new version of geworkbench. The tricky part encountered before was testing it on different operating systems and different browsers.
  • To launch the geWorkbench launcher from the genomespace web site, Java security permission must be granted for both genomespace.org and www.c2b2.columbia.edu. See technical information at http://java.com/en/download/help/jcp_security.xml
  • The current deployment (as of Jan. 23, 2015) is at /ifs/www/vhosts/www.c2b2.columbia.edu/html/geworkbench, five files in total: launchHelp.jar, launchHelp.jar.Feb14 (I believe this is just a backup), index.html (just a symbolic link to index.php), index.php, and launchHelp.jnlp. For a new release, I think the only deployed file that needs to be updated is launchHelp.jar. This jar file, which is also in the zip file Meng left, contains four class files, which are obviously supposed to be updated, one gif file (no need to update), and a directory META-INF containing three files (C2B2.DSA, C2B2.SF, MANIFEST.MF), which seem to be signature files and which I believe must be updated.
  • outside contact: Broad Institute

upgrade cagrid server

This task was pending work by the admin group.

geworkbench collaboration (remote workspace) feature

  • server-side settings (development and 'production'): web server? database?
  1. source code: login.c2b2.columbia.edu:/cvs/magnet/geworkbench module dev/collab
  2. web server: http://genspace.cs.columbia.edu:8080/axis2/services
  3. database: server genspace.cs.columbia.edu, dbname genspace, username/password can be found in code
  • see README under dev/collab for instructions on setting up this web service on any server including afdev.