CaIntegrator
This page provides a quick introduction to caIntegrator, along with notes on its architecture, installation, and known bugs.
caIntegrator - Getting Started/Overview
http://gforge.nci.nih.gov/frs/?group_id=154
The caIntegrator knowledge framework provides researchers with the ability to perform ad hoc querying and reporting across multiple domains. The overall goal of the caIntegrator project is to provide a framework with the infrastructural components needed to develop enterprise level translational applications such as Rembrandt and I-SPY. In terms of the bigger picture, the goals are to:
- adopt caIntegrator as a warehouse to store analysis results from clinical studies involving genotypic/expression data
- adopt caArray to store the raw array data (this has yet to be decided, as the Cancer Center currently uses GeneTraffic for this purpose and we may decide to go with that option instead)
- adopt caTissue for managing the storage of tissue data
- adopt geWorkbench for facilitating analyses and for providing access to finding data stored in caIntegrator
- build a caBIG-compatible generic framework that allows retrieval and transformation of data from a variety of heterogeneous data sources that house:
- microarray data
- genomic data
- tissue array
- imaging and clinical data
- build a user-centric, high-performance search, retrieval and analysis platform for translational data:
- build an analytical tool that allows clinicians, scientists, and biostatisticians to conduct translational analysis of study-specific data in a user-friendly manner
- caIntegrator-derived applications:
- Rembrandt
- I-SPY
- CGEMS
- DCEG/EAGLE
Architecture
This application framework is an n-tier, service-oriented architecture comprising pluggable web-based graphical user interfaces, a business object layer, server components that process queries and result sets, a data access layer, and a robust data warehouse.
- caIntegrator Architecture Guiding Principles
- build a framework with the infrastructural components needed to develop enterprise level translational applications such as Rembrandt, I-SPY, and CGEMS
- driven by user requirements
- user-friendly for a wide range of audiences (physician scientists, programmers, statisticians)
- standards-based and pattern-driven
- extensible and scalable
- reuse/extend existing open-source technologies
- caBIG silver-level compatibility
- caIntegrator Architecture Summary
- n-tiered architecture (J2EE)
- rich user-friendly web tier (Struts, XML/XSL, AJAX)
- clinical-genomics service layer that handles both fine and coarse grained, strongly typed objects
- scalable run-time analysis service (JMS/R-Server/R-Binary)
- High Performance Query Service (multi-threaded query processing/ hybrid star schema)
- remote interface with WebGenome (EJB)
Hardware and Software Requirements
Java Software Development Kit (JDK) version 1.5.0_04
http://java.sun.com/j2se/1.5.0/download.jsp
JBoss Container (recommended: JBoss version 4.0.4)
http://labs.jboss.com/jbossas/downloads
Jakarta Ant version 1.6.2
http://archive.apache.org/dist/ant/binaries/
Oracle 9i Release 2 (9.2.0.5)
http://www.oracle.com
caIntegrator v1.0
http://gforge.nci.nih.gov/frs/?group_id=154
caIntegrator WGS 1.2 Source Bundle
http://gforge.nci.nih.gov/frs/?group_id=154
Weka 3.4.10 Data Mining Software
http://www.cs.waikato.ac.nz/~ml/weka/index.html
Installation Notes
Create Database and Load Seed Data
- Check to make sure that database is running and can be connected to:
C:\>tnsping biodb1_adora
TNS Ping Utility for 32-bit Windows: Version 9.2.0.1.0 - Production on 05-JUN-2007 19:03:42
Copyright (c) 1997 Oracle Corporation. All rights reserved.
Used parameter files:
C:\OraClient92\network\admin\sqlnet.ora
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = ADORA)(PORT = 1521))) (CONNECT_DATA = (SID = BIODB1) (SERVER = DEDICATED)))
OK (20 msec)
- Logged into Oracle9iR2 on ADORA with DBA account
- Created a tablespace and created user "integrator" with this tablespace as default
- Downloaded the wgs_db.zip file from the caIntegrator gForge site specified above
- Unzipped the file and moved it to D:\Michael on ADORA
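The tablespace and user creation noted above is not shown as commands. The following is a minimal sketch of what it might look like; the function name, datafile path, size, quota, and grants are all assumptions chosen for illustration and do not come from these notes. The function only prints SQL, so a DBA can review it before running it. The `&integrator_password` substitution variable makes SQL*Plus prompt for the password instead of storing it in the script.

```shell
#!/bin/sh
# Prints SQL that creates a tablespace and a user that defaults to it.
# Arguments: user name, tablespace name, datafile path (all chosen by the DBA).
# SIZE, AUTOEXTEND, QUOTA and the GRANT list are assumptions, not taken from the notes.
create_user_sql() {
    user=$1; tablespace=$2; datafile=$3
    cat <<EOF
CREATE TABLESPACE $tablespace DATAFILE '$datafile' SIZE 500M AUTOEXTEND ON;
CREATE USER $user IDENTIFIED BY &integrator_password DEFAULT TABLESPACE $tablespace QUOTA UNLIMITED ON $tablespace;
GRANT CONNECT, RESOURCE TO $user;
EOF
}

# Example (hypothetical tablespace name and datafile path):
#   create_user_sql integrator integrator_ts 'D:\oradata\integrator01.dbf' > create_user.sql
#   sqlplus system@biodb2 @create_user.sql
```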
D:\Michael>imp integrator/<password>@biodb2 file=wgs.dmp log=wgs.log full=y
Import: Release 9.2.0.5.0 - Production on Tue May 29 18:20:37 2007
Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
Connected to: Oracle9i Enterprise Edition Release 9.2.0.5.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.5.0 - Production
Export file created by EXPORT:V09.02.00 via conventional path
import done in WE8MSWIN1252 character set and AL16UTF16 NCHAR character set
. importing CGEMSQA's objects into INTEGRATOR
. . importing table "CHR_START_END" 26 rows imported
. . importing table "DNA_SPECIMEN" 0 rows imported
. . importing table "GENE_ALIAS" 71418 rows imported
. . importing table "GENE_DIM" 26850 rows imported
. . importing table "GENE_SNP_ASSO" 698058 rows imported
. . importing table "GENOTYPE_FACT" 9 rows imported
. . importing table "GENOTYPE_STATUS_LU" 2 rows imported
. . importing table "HISTOLOGY" 0 rows imported
. . importing table "SNPID_GENE_MAP" 586388 rows imported
. . importing table "SNP_ANALYSIS_FINDING_FACT" 10 rows imported
. . importing table "SNP_ANALYSIS_GROUP" 28 rows imported
. . importing table "SNP_ASSAY" 1617414 rows imported
. . importing table "SNP_ASSOCIATION_ANALYSIS" 10 rows imported
. . importing table "SNP_DIM" 1062062 rows imported
. . importing table "SNP_FREQUENCY_FACT" 3 rows imported
. . importing table "SNP_MAP" 647002 rows imported
. . importing table "SNP_PANEL" 4 rows imported
. . importing table "SPECIMEN" 9454 rows imported
. . importing table "STDPT_ANALYSIS_GRP_AS" 22922 rows imported
. . importing table "STUDY_DIM" 3 rows imported
. . importing table "STUDY_PANEL_ASSO" 5 rows imported
. . importing table "STUDY_PARTICIPANT" 6902 rows imported
. . importing table "STUDY_POPULATION" 12 rows imported
. . importing table "STUDY_STDPOPUPLATION_ASSO" 12 rows imported
. . importing table "STUDY_TIMECOURSE_DIM" 0 rows imported
Import terminated successfully without warnings.
Result:
- 25 tables
- 4 views
- 61 indexes
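The table count above can also be read from the import log (wgs.log) instead of being counted by hand. This is a minimal sketch; the function name `summarize_imp_log` is hypothetical, and it assumes the log holds one "importing table" entry per line, as in the transcript.

```shell
#!/bin/sh
# Summarises an Oracle imp log: how many tables were imported and whether
# imp reported a clean finish. Argument: path to the log file.
summarize_imp_log() {
    log=$1
    # Count lines that record a table import.
    tables=$(grep -c 'importing table' "$log")
    if grep -q 'Import terminated successfully without warnings' "$log"; then
        status=ok
    else
        status=check-log
    fi
    echo "tables=$tables status=$status"
}

# Example: summarize_imp_log wgs.log
```

If the log matches the transcript above, the table count should agree with the 25 tables listed in the result.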
Download required software packages and install (as root)
(1) downloaded JBoss 4.0.4 to /opt/downloads and unzipped it in /opt to create directory "jboss-4.0.4"
(2) downloaded Jakarta Ant 1.6.2 to /opt/downloads and unzipped it in /opt to create directory "ant-1.6.2"
(3) downloaded caIntegrator v1.0 to /opt/downloads and unzipped it in /opt to create directory "caintegrator"
(4) downloaded caIntegrator WGS 1.2 Source bundle and unzipped it in /opt to yield four new zip files:
caintegrator-analysis-commons.zip
caintegrator-application-commons.zip
caintegrator-spec.zip
cgems.zip
(5) downloaded Weka 3.4.10 to /opt/downloads and unzipped it in /opt to create directory "weka-3-4-10"
Step 1: Building caintegrator-analysis-commons
[rmhonig@afdev opt]# export JAVA_HOME=/opt/java
[rmhonig@afdev caintegrator-analysis-commons]# /opt/ant-1.6.2/bin/ant build_dependency
Buildfile: build.xml
jar_check:
warning:
build_jar:
[delete] Deleting directory /opt/caintegrator-analysis-commons/bin
[mkdir] Created dir: /opt/caintegrator-analysis-commons/bin
[javac] Compiling 53 source files to /opt/caintegrator-analysis-commons/bin
[jar] Building jar: /opt/caintegrator-analysis-commons/caintegrator-analysis-commons.jar
[delete] Deleting directory /opt/caintegrator-analysis-commons/bin
[mkdir] Created dir: /opt/caintegrator-analysis-commons/bin
[javac] Compiling 53 source files to /opt/caintegrator-analysis-commons/bin
[jar] Building jar: /opt/caintegrator-analysis-commons/caintegrator-analysis-commons.jar
build_dependency:
[echo]
[echo] Artifacts copied to ../artifacts
[echo]
[copy] Copying 1 file to /opt/artifacts
BUILD SUCCESSFUL
Total time: 5 seconds
*** CONFIRM *** caintegrator-analysis-commons.jar was successfully created under /opt/artifacts directory
Step 2: Building caintegrator-spec
- [rmhonig@afdev weka-3-4-10]# cp weka.jar /opt/caintegrator-spec/deployed_jars
- [rmhonig@afdev weka-3-4-10]# cd /opt/caintegrator-spec/deployed_jars
- [rmhonig@afdev opt]# cd caintegrator-spec
[rmhonig@afdev caintegrator-spec]# /opt/ant-1.6.2/bin/ant build_dependency
Buildfile: build.xml
jar_check:
warning:
config_application_context:
[copy] Copying 1 file to /opt/caintegrator-spec
build_jar_anthill:
[mkdir] Created dir: /opt/caintegrator-spec/bin
[javac] Compiling 278 source files to /opt/caintegrator-spec/bin
[javac] /opt/caintegrator-spec/src/gov/nih/nci/caintegrator/domain/annotation/snp/bean/PlatformTechnology.java:42: warning: unmappable character for encoding UTF8
[javac] * The SNPlex� Genotyping System enables the simultaneous genotyping of up to 48 SNPs (single nucleotide
[javac] ^
[javac] /opt/caintegrator-spec/src/gov/nih/nci/caintegrator/domain/common/bean/Measurement.java:46: warning: unmappable character for encoding UTF8
[javac] * such as ml, kg, mm, m/s, �F, etc.
[javac] ^
[javac] /opt/caintegrator-spec/src/gov/nih/nci/caintegrator/domain/common/bean/Measurement.java:53: warning: unmappable character for encoding UTF8
[javac] * such as ml, kg, mm, m/s, �F, etc.
[javac] ^
[javac] /opt/caintegrator-spec/src/gov/nih/nci/caintegrator/domain/finding/clinical/breastCancer/bean/BreastCancerClinicalFinding.java:113: warning: unmappable character for encoding UTF8
[javac] * Estrogen Receptor Status � Total Score Total Score = ER_PS+ ER_IS Considered Allred Score; = 3 is
[javac] ^
[javac] /opt/caintegrator-spec/src/gov/nih/nci/caintegrator/domain/finding/clinical/breastCancer/bean/BreastCancerClinicalFinding.java:120: warning: unmappable character for encoding UTF8
[javac] * Estrogen Receptor Status � Total Score Total Score = ER_PS+ ER_IS Considered Allred Score; = 3 is
[javac] ^
[javac] /opt/caintegrator-spec/src/gov/nih/nci/caintegrator/domain/finding/clinical/breastCancer/bean/BreastCancerClinicalFinding.java:288: warning: unmappable character for encoding UTF8
[javac] * Size of Largest Palpable Node (cm) � Clinical Assessment at Baseline
[javac] ^
[javac] /opt/caintegrator-spec/src/gov/nih/nci/caintegrator/domain/finding/clinical/breastCancer/bean/BreastCancerClinicalFinding.java:293: warning: unmappable character for encoding UTF8
[javac] * Size of Largest Palpable Node (cm) � Clinical Assessment at Baseline
[javac] ^
[javac] /opt/caintegrator-spec/src/gov/nih/nci/caintegrator/domain/finding/clinical/breastCancer/bean/BreastCancerClinicalFinding.java:394: warning: unmappable character for encoding UTF8
[javac] * Progesterone Receptor Status � Total Score Total Score = PgR_PgS+ PgR_IS Considered Allred Score;
[javac] ^
[javac] /opt/caintegrator-spec/src/gov/nih/nci/caintegrator/domain/finding/clinical/breastCancer/bean/BreastCancerClinicalFinding.java:401: warning: unmappable character for encoding UTF8
[javac] * Progesterone Receptor Status � Total Score Total Score = PgR_PgS+ PgR_IS Considered Allred Score;
[javac] ^
[javac] /opt/caintegrator-spec/src/gov/nih/nci/caintegrator/domain/study/bean/ProcedureName.java:97: warning: unmappable character for encoding UTF8
[javac] * the health of the heart�s major pumping chambers.
[javac] ^
[javac] /opt/caintegrator-spec/src/gov/nih/nci/caintegrator/domain/study/bean/ProcedureName.java:106: warning: unmappable character for encoding UTF8
[javac] * history � an account of the symptoms as experienced by the patient. Together with the medical history,
[javac] ^
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[javac] 11 warnings
[jar] Building jar: /opt/caintegrator-spec/caintegrator-spec.jar
build_dependency:
[echo]
[echo] Artifacts copied to ../artifacts
[echo]
[copy] Copying 1 file to /opt/artifacts
BUILD SUCCESSFUL
Total time: 11 seconds
*** CONFIRM *** caintegrator-spec.jar was successfully created under /opt/artifacts directory
Step 3: Building caintegrator-application-commons
[rmhonig@afdev opt]# cd caintegrator-application-commons
[rmhonig@afdev caintegrator-application-commons]# /opt/ant-1.6.2/bin/ant build_dependency
Buildfile: build.xml
build_jar_anthill:
[mkdir] Created dir: /opt/caintegrator-application-commons/bin
[javac] Compiling 70 source files to /opt/caintegrator-application-commons/bin
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[jar] Building jar: /opt/caintegrator-application-commons/caintegrator-application-commons.jar
retrieve_deployment_artifacts:
[copy] Copying 1 file to /opt/artifacts
[copy] Copying 1 file to /opt/artifacts
build_dependency:
[echo]
[echo] Artifacts copied to ../artifacts
[echo]
[copy] Copying 1 file to /opt/artifacts
BUILD SUCCESSFUL
Total time: 3 seconds
*** CONFIRM *** caintegrator-application-commons.jar was successfully created under /opt/artifacts directory
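Steps 1 to 3 repeat the same pattern: run `ant build_dependency` in a module directory, then confirm that the jar appeared under the artifacts directory. A minimal sketch of that loop follows. The function name `build_commons` and the `ANT` and `BASE` variables are hypothetical; their defaults are the paths used in these notes. It assumes JAVA_HOME is already exported and that weka.jar has already been copied into caintegrator-spec/deployed_jars as in Step 2.

```shell
#!/bin/sh
# Builds the three library modules in the order used in these notes and
# confirms that each jar landed in the shared artifacts directory.
# ANT and BASE default to the paths from the notes; override for other layouts.
ANT=${ANT:-/opt/ant-1.6.2/bin/ant}
BASE=${BASE:-/opt}

build_commons() {
    for module in caintegrator-analysis-commons caintegrator-spec caintegrator-application-commons; do
        # Run the build in a subshell so the caller's working directory is unchanged.
        ( cd "$BASE/$module" && "$ANT" build_dependency ) || {
            echo "build failed: $module" >&2
            return 1
        }
        # The CONFIRM step: the jar must exist and be non-empty.
        if [ ! -s "$BASE/artifacts/$module.jar" ]; then
            echo "missing artifact: $module.jar" >&2
            return 1
        fi
        echo "confirmed: $BASE/artifacts/$module.jar"
    done
}

# Example: build_commons
```

Stopping at the first failure keeps a later module from being built against a missing jar.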
Step 4: Building cgems.war
[rmhonig@afdev ~]# cd /opt/cgems/
[rmhonig@afdev cgems]# /opt/ant-1.6.2/bin/ant build_war_anthill
Buildfile: build.xml
config_application_context:
[copy] Copying 1 file to /opt/cgems
[move] Moving 1 files to /opt/cgems/src
config_common_security_module:
[echo] Configuring Common Security Module
[echo] Setting ApplicationSecurityConfig.xml
[copy] Copying 1 file to /opt/cgems/csm_deploy
[echo] Setting cgems.hibernate.cfg.xml
[copy] Copying 1 file to /opt/cgems/csm_deploy
[echo] Configuring oracle-ds.xml
[copy] Copying 1 file to /opt/cgems/csm_deploy
[echo] Configuring properties-service.xml
[copy] Copying 1 file to /opt/cgems/csm_deploy
[replaceregexp] The following file is missing: '/opt/cgems/csm_deploy/properties-service.xml'
[echo] Configuring login-config.xml
[copy] Copying 1 file to /opt/cgems/csm_deploy
configure_cgems-properties-service:
[echo] Setting caIntegratorConfig.xml
[copy] Copying 1 file to /opt/cgems/caintegrator_deploy
[echo] Configuring properties-service.xml
[copy] Copying 1 file to /opt/cgems/caintegrator_deploy
[copy] Copying 1 file to /opt/cgems/caintegrator_deploy
[copy] Copying 1 file to /opt/cgems/caintegrator_deploy
[copy] Copying 1 file to /opt/cgems/caintegrator_deploy
deploy_artifacts:
[copy] Copying 1 file to /opt/artifacts
[copy] Copying 1 file to /opt/artifacts
[copy] Copying 1 file to /opt/artifacts
[copy] Copying 1 file to /opt/artifacts
[copy] Copying 1 file to /opt/artifacts
[copy] Copying 1 file to /opt/artifacts
[copy] Copying 1 file to /opt/artifacts
[copy] Copying 1 file to /opt/artifacts
[copy] Copying 1 file to /opt/artifacts
[copy] Copying 1 file to /opt/artifacts
build_war_anthill:
[mkdir] Created dir: /opt/cgems/bin
[javac] Compiling 49 source files to /opt/cgems/bin
[copy] Copying 54 files to /opt/cgems/WebRoot/WEB-INF/classes
[copy] Copying 3 files to /opt/cgems/WebRoot/WEB-INF/classes
[war] Building war: /opt/cgems/cgems.war
[war] Warning: selected war files include a WEB-INF/web.xml which will be ignored (please use webxml attribute to war task)
[copy] Copying 1 file to /opt/artifacts
BUILD SUCCESSFUL
Total time: 13 seconds
*** CONFIRM ***
(1) cgems.war was successfully created under /opt/artifacts directory
(2) ApplicationSecurityConfig.xml was successfully created under the /opt/artifacts directory
(3) oracle-ds.xml was successfully created under the /opt/artifacts directory
(4) properties-service.xml was successfully created under the /opt/artifacts directory
(5) login-config.xml was successfully created under the /opt/artifacts directory
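The Step 4 build log includes a "[replaceregexp] The following file is missing" message, so it may be worth checking all five outputs in one pass rather than by eye. This is a minimal sketch; the function name `missing_artifacts` is hypothetical and the file list is the one confirmed above.

```shell
#!/bin/sh
# Prints the name of every expected Step 4 output that is absent or empty in
# the given directory. Prints nothing and returns 0 when all five are present.
missing_artifacts() {
    dir=$1
    missing=0
    for f in cgems.war ApplicationSecurityConfig.xml oracle-ds.xml properties-service.xml login-config.xml; do
        if [ ! -s "$dir/$f" ]; then
            echo "$f"
            missing=1
        fi
    done
    return $missing
}

# Example: missing_artifacts /opt/artifacts || echo "rerun the build"
```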
Step 5: Configure JBOSS for WGS CGEMS application
- Modify /opt/artifacts/oracle-ds.xml with database information (IP address, db_instance, user-name, password)
<datasources>
  <local-tx-datasource>
    <jndi-name>cgems</jndi-name>
    <connection-url>jdbc:oracle:thin:@156.111.188.180:1521:BIODB2</connection-url>
    <user-name>integrator</user-name>
    <password>XXXXXXXXX</password>
    <driver-class>oracle.jdbc.driver.OracleDriver</driver-class>
    <exception-sorter-class-name>org.jboss.resource.adapter.jdbc.vendor.OracleExceptionSorter</exception-sorter-class-name>
  </local-tx-datasource>
</datasources>
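The four values to edit (IP address, database instance, user name, password) all sit in two or three elements, with host, port, and SID joined as `jdbc:oracle:thin:@host:port:SID`. As an alternative to hand-editing, the file could be generated from those values. This is a minimal sketch; the function name `render_oracle_ds` is hypothetical, and it assumes the values contain no characters that need XML escaping (such as `&` or `<`).

```shell
#!/bin/sh
# Writes a JBoss datasource descriptor for an Oracle thin-driver connection
# to stdout. Arguments: host, port, SID, user, password.
# The JNDI name "cgems" and the class names are copied from the notes.
# No XML escaping is done, so special characters in the password will break the file.
render_oracle_ds() {
    host=$1; port=$2; sid=$3; user=$4; password=$5
    cat <<EOF
<datasources>
  <local-tx-datasource>
    <jndi-name>cgems</jndi-name>
    <connection-url>jdbc:oracle:thin:@$host:$port:$sid</connection-url>
    <user-name>$user</user-name>
    <password>$password</password>
    <driver-class>oracle.jdbc.driver.OracleDriver</driver-class>
    <exception-sorter-class-name>org.jboss.resource.adapter.jdbc.vendor.OracleExceptionSorter</exception-sorter-class-name>
  </local-tx-datasource>
</datasources>
EOF
}

# Example: render_oracle_ds 156.111.188.180 1521 BIODB2 integrator "$DB_PASSWORD" > oracle-ds.xml
```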
- # cp oracle-ds.xml /opt/jboss-4.0.4/server/default/deploy/
- # mkdir caintegrator/externalized_properties_folder
- # cp mail.properties zip.properties ../caintegrator/externalized_properties_folder/
- copied the following to /opt/jboss-4.0.4/server/default/deploy/properties-service.xml
<attribute name="Properties">
  gov.nih.nci.cgems.zip.properties=/opt/caintegrator/externalized_properties_folder/zip.properties
  gov.nih.nci.cgems.mail.properties=/opt/caintegrator/externalized_properties_folder/mail.properties
  gov.nih.nci.caintegrator.configFile=/opt/caintegrator/externalized_properties_folder/caIntegratorConfig.xml
</attribute>
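The Properties attribute points at three files, while the copy command in this step names only mail.properties and zip.properties, so it may help to confirm that every referenced path exists before starting JBoss. This is a minimal sketch; the function name `check_property_files` is hypothetical, and it assumes each entry sits on its own line in the form key=/absolute/path.

```shell
#!/bin/sh
# Scans a properties-service.xml style file for "key=/absolute/path" lines and
# prints each key whose file does not exist. Returns non-zero if any is missing.
check_property_files() {
    rc=0
    # The "|| [ -n ... ]" clause also handles a final line without a newline.
    while read -r line || [ -n "$line" ]; do
        case $line in
            *=/*) ;;            # looks like key=/absolute/path
            *) continue ;;      # blank lines, XML tags, anything else
        esac
        key=${line%%=*}
        path=${line#*=}
        if [ ! -f "$path" ]; then
            echo "missing: $key -> $path"
            rc=1
        fi
    done < "$1"
    return $rc
}

# Example: check_property_files /opt/jboss-4.0.4/server/default/deploy/properties-service.xml
```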
Step 6: Deploy WGS CGEMS application under JBoss
- [rmhonig@afdev artifacts]# cp cgems.war /opt/jboss-4.0.4/server/default/deploy/
- [rmhonig@afdev bin]# export JAVA_HOME=/opt/java
- [rmhonig@afdev ~]# cd /opt/jboss-4.0.4/bin
- [rmhonig@afdev bin]# nohup ./run.sh & (to start JBoss)
- Point the browser to http://afdev:8080/cgems/ to reach the CGEMS About page.
- [rmhonig@afdev bin]# ./shutdown.sh -S (to stop JBoss)
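JBoss may need some time after run.sh starts before the CGEMS page responds, so a scripted deployment could poll the URL rather than open the browser immediately. This is a minimal sketch of a generic retry helper; the function name `wait_until` is hypothetical, and the example in the comment assumes curl is installed, with an attempt count and delay chosen arbitrarily.

```shell
#!/bin/sh
# Retries a command until it succeeds or the attempt limit is reached.
# Arguments: maximum attempts, delay in seconds between attempts, then the command.
wait_until() {
    attempts=$1; delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example (30 tries, 2 seconds apart):
#   wait_until 30 2 curl -fsS -o /dev/null http://afdev:8080/cgems/ && echo "CGEMS is up"
```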