caArray
The caArray Open File dialog
geWorkbench can retrieve gene expression data from remote instances of the NCICB's caArray database. These may be instances maintained by the NCI itself, or instances installed locally at your own or another institution. You can maintain settings for any number of different caArray installations.
Recent Changes
- The caArray 2.5.0 communication client code is not backward compatible with that used in caArray 2.4 or earlier versions. As of geWorkbench 2.4.0, only communication with caArray 2.5 installations is directly supported.
- With geWorkbench 2.4.0, this dialog was renamed from "Remote" to "caArray 2.5" to more accurately reflect its current function.
- As of release 2.3.0, improvements in memory management allow large numbers of arrays to be downloaded. Downloads of more than 500 arrays at one time have been tested without incident.
- Multiple arrays downloaded at the same time are automatically merged together into a single data set.
Connecting with caArray
The "Open File" dialog can be reached by selecting the Workspace node and then either right-clicking it or choosing "File->Open File(s)" from the top menu bar.
Click the caArray 2.5 radio button. The Open File dialog window expands to include remote caArray sources, as shown here.
Connection and Query Controls
The controls at the bottom of the remote file dialog are:
- caArray (Source) menu - Shows a list of caArray instances that have been configured. Entries for the "Production" and "Stage" instances of caArray maintained by NCI are preconfigured.
- Go button - downloads a list of all available experiments from the remote source.
- Filtering - The list of available experiments can be "filtered" to show only those matching particular criteria.
- The available filtering options are:
- Categories: Experiment
- Field Selection: Array Provider, Organism, Principal Investigator
- Values:
- Array Provider: Affymetrix, Agilent, GenePix, Illumina, ImaGene, NimbleGen, ScanArray, UCSF Spot.
- Organism: many entries, including human, mouse, and fly.
- Principal Investigator: PIs of listed public experiments in the particular instance of caArray.
- Add A New Profile button - Opens the Data Source Definition Page used to add a new instance of caArray.
- Edit Profile button - Edit the currently selected profile.
- Delete Profile button - Remove the currently selected profile.
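The filter narrows the experiment list to entries whose chosen field matches one value. The Python sketch below illustrates that matching logic on an in-memory list. It is not the caArray or geWorkbench API: the `Experiment` record, its attribute names, and `filter_experiments` are hypothetical, and this guide does not say whether geWorkbench filters locally or asks the caArray server to do it.

```python
from dataclasses import dataclass

# Hypothetical experiment record; the attribute names mirror the filter
# fields described above and are not taken from the caArray API.
@dataclass(frozen=True)
class Experiment:
    experiment_id: str
    array_provider: str
    organism: str
    principal_investigator: str

# Map each user-facing filter field to the record attribute it checks.
FILTER_FIELDS = {
    "Array Provider": "array_provider",
    "Organism": "organism",
    "Principal Investigator": "principal_investigator",
}

def filter_experiments(experiments, field, value):
    """Return the experiments whose selected field equals value."""
    if field not in FILTER_FIELDS:
        raise ValueError(f"unsupported filter field: {field}")
    attr = FILTER_FIELDS[field]
    return [e for e in experiments if getattr(e, attr) == value]

# Usage with made-up placeholder records.
experiments = [
    Experiment("EXP-1", "Affymetrix", "Homo sapiens", "PI A"),
    Experiment("EXP-2", "Illumina", "Mus musculus", "PI B"),
]
human = filter_experiments(experiments, "Organism", "Homo sapiens")
print([e.experiment_id for e in human])  # ['EXP-1']
```

Keeping the field-to-attribute mapping in one table means an unsupported field is rejected explicitly rather than silently matching nothing.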
Experiment Selection and Download Controls
- caArray Experiments - This pane will contain the list of experiments retrieved from a particular instance of caArray.
- Experiment Information - clicking on any experiment in the list will show its details here.
- Number of Assays - The number of hybridizations for which data is available in this experiment. This is shown only after "Show Arrays" has been pushed.
- Show Arrays - Retrieve the full list of available hybridizations for the selected experiment.
- Open - Download the hybridizations which have been selected in the list.
- Cancel - Dismiss the dialog.
Loading data from an instance of caArray
Setting up the connection
On the Open File dialog, you can use:
- Add a New Profile to set up a new connection to an instance of caArray.
- Edit Profile to change the settings of an existing profile.
The settings for the production instance of caArray at the NCI are shown here:
- Profile Name - assigns a name to the profile
- Protocol - Method by which to communicate with the remote caArray server.
- HTTP - this is the only protocol currently supported.
- HTTPS - not used.
- RMI - not used.
- Hostname - the address (host name) of the desired caArray server.
- Port - the port on the remote caArray server at which caArray is available.
- Username and Password (optional) - caArray supports retrieval of data from private experiments. These fields allow the user to provide his or her caArray credentials to gain access to any private experiments to which he or she has rights. No username or password is needed for accessing public experiments.
- Note - Once a username and password have been entered and submitted to caArray, you cannot return to anonymous access (no username/password) without restarting geWorkbench. However, you can still enter a different username/password combination. This behavior is a property of the caArray server-side code.
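The profile fields and the credential rule in the note above can be modeled as a small state machine. The Python sketch below is hypothetical throughout: `CaArrayProfile`, `CaArraySession`, and their methods are illustrative names, not part of geWorkbench or caArray, and the host name and port in the usage lines are placeholders rather than a real caArray server. It encodes only what this page states: HTTP is the only supported protocol, credentials are optional, a different username/password may be entered at any time, and going back to anonymous access requires a restart.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Per the text above, HTTP is the only protocol currently supported;
# HTTPS and RMI appear in the dialog but are not used.
SUPPORTED_PROTOCOLS = {"HTTP"}

@dataclass
class CaArrayProfile:
    """Hypothetical record mirroring the fields of the profile dialog."""
    profile_name: str
    hostname: str
    port: int
    protocol: str = "HTTP"

    def validate(self) -> None:
        if self.protocol not in SUPPORTED_PROTOCOLS:
            raise ValueError(f"unsupported protocol: {self.protocol}")
        if not 0 < self.port < 65536:
            raise ValueError(f"invalid TCP port: {self.port}")

class CaArraySession:
    """Hypothetical model of the credential rule described in the note."""

    def __init__(self, profile: CaArrayProfile):
        profile.validate()
        self.profile = profile
        # None means anonymous access, which reaches public experiments only.
        self.credentials: Optional[Tuple[str, str]] = None

    def set_credentials(self, username: str, password: str) -> None:
        # Switching to a different username/password is always allowed.
        self.credentials = (username, password)

    def clear_credentials(self) -> None:
        # Once credentials were submitted, anonymous access needs a restart.
        if self.credentials is not None:
            raise RuntimeError("restart geWorkbench to return to anonymous access")

# Usage with placeholder values (not a real caArray server).
session = CaArraySession(CaArrayProfile("Local", "caarray.example.org", 8080))
session.set_credentials("alice", "secret")
session.set_credentials("bob", "secret2")
print(session.credentials[0])  # bob
```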
Searching and viewing available experiments
If you click on the Go button next to the caArray data source at the bottom of the dialog, all available caArray experiments at that location will be displayed.
Alternatively, you can restrict the list to particular kinds of experiments by pushing the Filter button. Here, experiments of type "Human" are selected.
The figure below shows the matching entries in the database.
Experiment ID - Note that the caArray Experiment ID of the selected experiment is shown in the upper right-hand corner of the dialog. This same ID will be seen e.g. if you browse caArray through its own web interface.
Viewing the list of arrays available in a given experiment
Select an experiment and push the Show Arrays button to see the individual array datasets available for download for that experiment.
You can also right-click directly on an experiment and select Show Arrays.
(As of geWorkbench v2.2.2, the list of arrays will be displayed sorted alphabetically.)
Downloading selected array datasets
Now we will select four of the arrays of type HG-U133A and push the Open button to begin the download. Multiple selections will automatically be merged into a single dataset in geWorkbench.
You will be prompted to select the quantitation type from those available for the experiment. Although all quantitation types available for the dataset are listed, usually only the signal channel should be used. Here we select CHPSignal (Affymetrix derived data CHP file):
You will also be prompted to load an annotation file, if desired. Here we choose the Affymetrix HG-U133A annotation file.
As of geWorkbench 2.4.0, you will also be prompted to specify the annotation file type.
The two supported annotation file types are:
- Affymetrix 3' Expression (CSV format)
- Affymetrix WT Gene/Exon ST, transcript level (CSV format) - This parser supports both the Gene 1.0 ST and Gene 2.0 ST annotation files (same format).
Here we select the 3' Expression format.
A progress bar will track the download process:
The resulting data set will appear in the Workspace. The downloaded dataset is named after the experiment as shown in caArray.
If multiple arrays were downloaded, they are merged into a single dataset, and the individual arrays are listed in the Arrays component:
Cannot merge different array types
geWorkbench does not currently support merging data from arrays of different types, e.g. HG-U133A with HG-U133B. If such a merge is requested by accident, the following error will appear:
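The merge rule described in this section (arrays downloaded together become one dataset, but only if they share an array type) can be sketched in Python as follows. This is a hypothetical illustration, not geWorkbench's merge code: `ArrayData`, `MergedDataset`, and `merge_arrays` are made-up names. It also assumes that arrays of the same type share the same probe set and takes the probe order from the first array.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ArrayData:
    """Hypothetical per-array record: probe ID -> quantitation value."""
    name: str
    array_type: str
    values: Dict[str, float]

@dataclass
class MergedDataset:
    name: str
    array_type: str
    array_names: List[str]
    matrix: Dict[str, List[float]]  # probe ID -> one value per array

def merge_arrays(dataset_name: str, arrays: List[ArrayData]) -> MergedDataset:
    """Merge arrays into one dataset, refusing mixed array types."""
    if not arrays:
        raise ValueError("no arrays selected")
    types = sorted({a.array_type for a in arrays})
    if len(types) > 1:
        raise ValueError("cannot merge different array types: " + ", ".join(types))
    # Assumes arrays of the same type share the same probe set;
    # probe order follows the first array.
    probes = list(arrays[0].values)
    matrix = {p: [a.values[p] for a in arrays] for p in probes}
    return MergedDataset(dataset_name, types[0], [a.name for a in arrays], matrix)

# Usage with made-up values.
a1 = ArrayData("array_1", "HG-U133A", {"probe_1": 5.0, "probe_2": 7.5})
a2 = ArrayData("array_2", "HG-U133A", {"probe_1": 6.0, "probe_2": 8.0})
merged = merge_arrays("Example experiment", [a1, a2])
print(merged.matrix["probe_1"])  # [5.0, 6.0]
```

Checking the set of array types before touching any values means a mixed selection fails immediately with a clear message instead of producing a partially merged dataset.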