Difference between revisions of "CaArray"

(The Remote Open File dialog (caArray))
 
(55 intermediate revisions by the same user not shown)
Line 2: Line 2:
  
  
=The Remote Open File dialog (caArray)=
+
=The caArray Open File dialog=
 +
 
 +
 
 
geWorkbench can retrieve gene expression data from remote instances of the NCICB's caArray database.  These may be copies maintained by the NCI itself, or copies maintained locally at your or another institution.  You can maintain settings for any number of different caArray installations.
 
geWorkbench can retrieve gene expression data from remote instances of the NCICB's caArray database.  These may be copies maintained by the NCI itself, or copies maintained locally at your or another institution.  You can maintain settings for any number of different caArray installations.
  
To connect to a remote caArray, right-click on '''Project''' in the geWorkbench main window,
+
==Recent Changes==
 +
* The caArray 2.5.0 communication client code is not backward compatible with that used in caArray 2.4 or earlier versions.  With the release of geWorkbench 2.4.0, only communication with caArray 2.5 installations will be directly supported.
  
 +
* With geWorkbench 2.4.0, this dialog was renamed from "Remote" to "caArray 2.5" to more accurately reflect its current function.
  
[[Image:T_ProjectFolders_OpenFiles.png]]
+
* As of release 2.3.0, improvements in memory management allow large numbers of arrays to be downloaded.  Downloads of more than 500 arrays at one time have been tested without incident.
  
 +
* Multiple arrays downloaded at the same time are automatically merged together into a single data set.
  
which will bring up the '''Open File(s)''' dialog.  Click the '''Remote''' radio button.  The '''Open File''' dialog window will be expanded to include remote caArray sources as shown here.
 
  
 +
==Connecting with caArray==
  
[[Image:T_OpenFile_Remote.png]]
+
The "Open File" dialog can be reached by selecting the [[Workspace]] node and then either right-clicking, or using "File->Open File(s)" in the top menu bar.
 +
 
 +
 
 +
[[Image:Workspace_Open_Files.png]]
 +
 
 +
Click the '''caArray 2.5''' radio button.  The '''Open File''' dialog window will be expanded to include remote caArray sources as shown here.
 +
 
 +
 
 +
[[Image:Open_File_caArray.png]]
  
 
==Connection and Query Controls==
 
==Connection and Query Controls==
Line 21: Line 34:
 
* '''caArray (Source)''' menu - Shows a list of caArray instances that have been configured.  Entries for the "Production" and "Stage" instances of caArray maintained by NCI are preconfigured.
 
* '''caArray (Source)''' menu - Shows a list of caArray instances that have been configured.  Entries for the "Production" and "Stage" instances of caArray maintained by NCI are preconfigured.
 
* '''Go''' button - downloads a list of all available experiments from the remote source.
 
* '''Go''' button - downloads a list of all available experiments from the remote source.
* '''Filtering''' - The list of available experiments can be "filtered" to show only those matching particular criteria. The available options are:
+
* '''Filtering''' - The list of available experiments can be "filtered" to show only those matching particular criteria.  
** Categories: Experiment
+
 
** Field Selection: Array Provider, Organism, Principal Investigator
+
[[Image:T_OpenFile_Remote_Filtered_human_plus.png]]
** Values:
+
 
*** Array Provider: Affymetrix, Agilent, GenePix, Illumina, ImaGene, Niblegen, ScanArray, UCSF Spot.
+
* The available filtering options are:
*** Organism: many entries, including human, mouse, fly etc...
+
** '''Categories''': Experiment
*** Principal Investigator: PIs of listed public experiments in the particular instance of caArray.
+
** '''Field Selection''': Array Provider, Organism, Principal Investigator
 +
** '''Values''':
 +
*** '''Array Provider''': Affymetrix, Agilent, GenePix, Illumina, ImaGene, Niblegen, ScanArray, UCSF Spot.
 +
*** '''Organism''': many entries, including human, mouse, fly etc...
 +
*** '''Principal Investigator''': PIs of listed public experiments in the particular instance of caArray.
 
* '''Add A New Profile''' button - Opens the Data Source Definition Page used to add a new instance of caArray.
 
* '''Add A New Profile''' button - Opens the Data Source Definition Page used to add a new instance of caArray.
 
* '''Edit Profile''' button - Edit the currently selected profile.
 
* '''Edit Profile''' button - Edit the currently selected profile.
Line 44: Line 61:
 
==Setting up the connection==
 
==Setting up the connection==
  
You can '''Add a New Resource''' or '''Edit''' existing connection settings to set up a connection to an instance of caArray.   The configuration for connecting to the production instance of caArray at the NCI is shown here:
+
On the '''Open File''' dialog, you can use
 +
* '''Add a New Profile''' to set up a new connection to an instance of caArray. Use
 +
* '''Edit Profile''' to change the settings of an existing profile.
 +
 
 +
The settings for the production instance of caArray at the NCI are shown here:
  
 
[[Image:T_OpenFile_Remote_Edit_Profile.png]]
 
[[Image:T_OpenFile_Remote_Edit_Profile.png]]
 +
 +
* '''Profile Name''' - assigns a name to the profile
 +
* '''Protocol''' - Method by which to communicate with the remote caArray server.
 +
** '''HTTP'''  - this is the only protocol currently supported.
 +
** '''HTTPS''' - not used.
 +
** '''RMI''' - not used.
 +
* '''Hostname''' - the URL for the desired caArray server.
 +
* '''Port''' - the port on the remote caArray server at which caArray is available.
 +
* '''Username''' and '''Password''' (optional) - caArray supports retrieval of data from private experiments.  These fields allow the user to provide his or her caArray credentials to gain access to any private experiments to which he or she has rights.  No username or password is needed for accessing public experiments.
 +
** '''Note''' -  Once a username and password have been entered and submitted to caArray, you cannot go back to using no username/password, except by restarting geWorkbench.  However you can still put in a different username/password combination.  This is a property of the caArray server-side code.
  
 
==Searching and viewing available experiments==
 
==Searching and viewing available experiments==
  
If you click on the red '''Go''' button next to the caArray data source at the bottom of the dialog, all available caArray experiments at that location will be displayed.   
+
If you click on the '''Go''' button next to the caArray data source at the bottom of the dialog, all available caArray experiments at that location will be displayed.   
  
 
Instead, you can select only particular kinds of experiments by pushing the '''Filter''' button.  Here we show experiments of type "Human" being selected.
 
Instead, you can select only particular kinds of experiments by pushing the '''Filter''' button.  Here we show experiments of type "Human" being selected.
  
  
[[Image:T_OpenFile_Remote_Filtered_human_plus.png]]
+
[[Image:OpenFile_Remote_Filtered_human.png]]
 +
 
 +
 
 +
The below figure shows the matching entries in the database. 
 +
 
 +
'''Experiment ID''' - Note that the '''caArray Experiment ID''' of the selected experiment is shown in the upper right-hand corner of the dialog.  This same ID will be seen e.g. if you browse caArray through its own web interface.
 +
 
  
 +
[[Image:Open_File_caArray_Human.png]]
  
And here are the resulting entries in the database:
+
==Viewing the list of arrays available in a given experiment==
  
 +
Select an experiment and push the '''Show Arrays''' button to see the individual array datasets available for download for that experiment.
  
[[Image:T_OpenFile_Remote_Filtered_human.png]]
+
You can also right-click directly on an experiment and select '''Show Arrays'''.
  
 +
(As of geWorkbench v2.2.2, the list of arrays will be displayed sorted alphabetically.)
  
Select an experiment and push the '''Show Arrays''' button to see the individual array datasets available for download for this experiment.
 
  
[[Image:T_OpenFile_Remote_GliomaExpt.png]]
+
[[Image:Open_File_caArray_show_arrays.png]]
  
 
==Downloading select array datasets==
 
==Downloading select array datasets==
Now we will select four of the arrays of type HG-U133A and push the '''Open''' button to begin the download.  Dont' forget to click the '''Merge''' button first if desired to merge the data into a single dataset.
+
Now we will select four of the arrays of type HG-U133A and push the '''Open''' button to begin the download.  Multiple selections will automatically be merged into a single dataset in geWorkbench.
  
  
[[Image:T_OpenFile_Remote_GliomaExpt_choose.png]]
+
[[Image:Open_File_caArray_download_selected.png]]
  
  
  
You will be prompted to select the quantitation type from those available for the experiment.  Here we select CHP Signal:
+
You will be prompted to select the quantitation type from those available for the experiment.  Although all available quantitation types available for the dataset are listed, usually only the signal channel should be used.  Here we select CHPSignal (Affymetrix derived data CHP file):
  
 
[[Image:T_OpenFile_Remote_QuantiationSelectionChoices.png]]
 
[[Image:T_OpenFile_Remote_QuantiationSelectionChoices.png]]
 +
 +
 +
You will also be prompted to load an annotation file, if desired.  Here we choose the Affymetrix HG-U133A annotation file.
 +
 +
 +
[[Image:OpenFile_Remote_Annotation_Glioma.png]]
 +
 +
 +
As of geWorkbench 2.4.0, you will also be prompted to specify the type of annotation file type. 
 +
 +
The two supported annotation file types are
 +
* '''Affymetrix 3' Expression''' (CSV format)
 +
* '''Affymetrix WT Gene/Exon ST, transcript level''' (CSV-format) - This parser supports both the Gene 1.0 ST and Gene 2.0 ST annotation files (same format).
 +
 +
 +
Here we select the 3' Expression format.
 +
 +
 +
[[Image:Open_File_Select_Annotation_Type.png]]
 +
  
  
 
A progress bar will track the download process:
 
A progress bar will track the download process:
  
[[Image:T_OpenFile_Remote_ProgressBar.png]]
+
[[Image:OpenFile_Remote_Loading_GliomaExpt.png]]
 +
 
 +
 
 +
 
 +
The resulting data set will appear in the [[Workspace]].
 +
The downloaded dataset is name for the experiment name shown in caArray.
 +
 
 +
If multiple arrays were downloaded, they were merged into a single dataset, and the individual arrays are listed in the Arrays component:
 +
 
  
 +
[[Image:Open_File_caArray_merged_node.png]]
  
 +
==Cannot merge different array types==
  
The resulting data set will appear in the Project Folders component:
+
geworkbench does not currently support merging data from arrays of different types, e.g. HG-U-133A with HG-U133B.  If by accident such a merge is requested, the following error will appear:
  
  
[[Image:T_OpenFile_Remote_MergedSet.png]]
+
[[Image:OpenFile_Remote_Open_Merge_error.png]]

Latest revision as of 15:04, 22 April 2014




The caArray Open File dialog

geWorkbench can retrieve gene expression data from remote instances of the NCICB's caArray database. These may be copies maintained by the NCI itself, or copies maintained locally at your or another institution. You can maintain settings for any number of different caArray installations.

Recent Changes

  • The caArray 2.5.0 communication client code is not backward compatible with that used in caArray 2.4 or earlier versions. With the release of geWorkbench 2.4.0, only communication with caArray 2.5 installations will be directly supported.
  • With geWorkbench 2.4.0, this dialog was renamed from "Remote" to "caArray 2.5" to more accurately reflect its current function.
  • As of release 2.3.0, improvements in memory management allow large numbers of arrays to be downloaded. Downloads of more than 500 arrays at one time have been tested without incident.
  • Multiple arrays downloaded at the same time are automatically merged together into a single data set.


Connecting with caArray

The "Open File" dialog can be reached by selecting the Workspace node and then either right-clicking, or using "File->Open File(s)" in the top menu bar.


Workspace Open Files.png

Click the caArray 2.5 radio button. The Open File dialog window will be expanded to include remote caArray sources as shown here.


Open File caArray.png

Connection and Query Controls

The buttons at the bottom of the remote file dialog are:

  • caArray (Source) menu - Shows a list of caArray instances that have been configured. Entries for the "Production" and "Stage" instances of caArray maintained by NCI are preconfigured.
  • Go button - downloads a list of all available experiments from the remote source.
  • Filtering - The list of available experiments can be "filtered" to show only those matching particular criteria.

T OpenFile Remote Filtered human plus.png

  • The available filtering options are:
    • Categories: Experiment
    • Field Selection: Array Provider, Organism, Principal Investigator
    • Values:
      • Array Provider: Affymetrix, Agilent, GenePix, Illumina, ImaGene, NimbleGen, ScanArray, UCSF Spot.
      • Organism: many entries, including human, mouse, and fly.
      • Principal Investigator: PIs of listed public experiments in the particular instance of caArray.
  • Add A New Profile button - Opens the Data Source Definition Page used to add a new instance of caArray.
  • Edit Profile button - Edit the currently selected profile.
  • Delete Profile - Remove the currently selected profile.
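
The filtering described above (one field, one value, applied to the list of experiments) can be sketched in a few lines of Python. This is only an illustration of the idea, not geWorkbench or caArray code: the function name `filter_experiments`, the dictionary layout, and the three sample experiments are all made up for the example.

```python
# Hypothetical sketch: filter a list of experiments on one field/value pair.
# The field names mirror the "Field Selection" options listed above; the
# experiment records themselves are invented sample data.

FILTER_FIELDS = ("Array Provider", "Organism", "Principal Investigator")


def filter_experiments(experiments, field, value):
    """Return the experiments whose `field` matches `value` (case-insensitive)."""
    if field not in FILTER_FIELDS:
        raise ValueError(f"Unsupported filter field: {field!r}")
    wanted = value.strip().lower()
    return [
        exp for exp in experiments
        if str(exp.get(field, "")).strip().lower() == wanted
    ]


# Invented sample data, for illustration only.
experiments = [
    {"Experiment ID": "demo-0001", "Organism": "human", "Array Provider": "Affymetrix"},
    {"Experiment ID": "demo-0002", "Organism": "mouse", "Array Provider": "Agilent"},
    {"Experiment ID": "demo-0003", "Organism": "human", "Array Provider": "Illumina"},
]

humans = filter_experiments(experiments, "Organism", "Human")
print([exp["Experiment ID"] for exp in humans])
```

Matching is case-insensitive here purely as a convenience; how the real dialog compares values is not stated in this page.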

Experiment Selection and Download Controls

  • caArray Experiments - This pane will contain the list of experiments retrieved from a particular instance of caArray.
  • Experiment Information - clicking on any experiment in the list will show its details here.
  • Number of Assays - The number of hybridizations for which data is available in this experiment. This value is only available after "Show Arrays" has been pushed.
  • Show Arrays - Retrieve the full list of available hybridizations for the selected experiment.
  • Open - Download the hybridizations which have been selected in the list.
  • Cancel - Dismiss the Remote dialog.

Loading data from an instance of caArray

Setting up the connection

On the Open File dialog, you can use:

  • Add a New Profile to set up a new connection to an instance of caArray.
  • Edit Profile to change the settings of an existing profile.

The settings for the production instance of caArray at the NCI are shown here:

T OpenFile Remote Edit Profile.png

  • Profile Name - assigns a name to the profile
  • Protocol - Method by which to communicate with the remote caArray server.
    • HTTP - this is the only protocol currently supported.
    • HTTPS - not used.
    • RMI - not used.
  • Hostname - the URL for the desired caArray server.
  • Port - the port on the remote caArray server at which caArray is available.
  • Username and Password (optional) - caArray supports retrieval of data from private experiments. These fields allow the user to provide his or her caArray credentials to gain access to any private experiments to which he or she has rights. No username or password is needed for accessing public experiments.
    • Note - Once a username and password have been entered and submitted to caArray, you cannot go back to using no username/password except by restarting geWorkbench. However, you can still enter a different username/password combination. This is a property of the caArray server-side code.
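
As a rough illustration of the profile fields listed above, the following Python sketch models one profile and checks it for obvious mistakes. It is not how geWorkbench stores profiles: the class name `CaArrayProfile`, the `validate` method, and the host `caarray.example.org` with port 8080 are all hypothetical and are used only to make the example self-contained.

```python
from dataclasses import dataclass
from typing import List, Optional

# Per the list above, HTTP is the only protocol currently supported;
# HTTPS and RMI appear in the dialog but are not used.
SUPPORTED_PROTOCOLS = {"HTTP"}


@dataclass(frozen=True)
class CaArrayProfile:
    """Hypothetical stand-in for one saved caArray connection profile."""
    profile_name: str
    hostname: str
    port: int
    protocol: str = "HTTP"
    username: Optional[str] = None  # only needed for private experiments
    password: Optional[str] = None

    @property
    def is_anonymous(self) -> bool:
        """True when no credentials are set (public experiments only)."""
        return self.username is None and self.password is None

    def validate(self) -> List[str]:
        """Return a list of human-readable problems; empty means the profile looks usable."""
        problems = []
        if not self.profile_name.strip():
            problems.append("profile name is empty")
        if self.protocol.upper() not in SUPPORTED_PROTOCOLS:
            problems.append(f"protocol {self.protocol!r} is not supported")
        if not self.hostname.strip():
            problems.append("hostname is empty")
        if not 1 <= self.port <= 65535:
            problems.append(f"port {self.port} is out of range")
        if (self.username is None) != (self.password is None):
            problems.append("username and password must be given together")
        return problems


# Placeholder host and port, not a real caArray installation.
profile = CaArrayProfile("Example", "caarray.example.org", 8080)
print(profile.validate(), profile.is_anonymous)
```

Leaving both credential fields empty corresponds to the public-only access described above; the sketch treats a username without a password (or the reverse) as a mistake, which is an assumption of the example rather than documented behavior.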

Searching and viewing available experiments

If you click on the Go button next to the caArray data source at the bottom of the dialog, all available caArray experiments at that location will be displayed.

Instead, you can select only particular kinds of experiments by pushing the Filter button. Here we show experiments of type "Human" being selected.


OpenFile Remote Filtered human.png


The figure below shows the matching entries in the database.

Experiment ID - Note that the caArray Experiment ID of the selected experiment is shown in the upper right-hand corner of the dialog. The same ID is shown, for example, when you browse caArray through its own web interface.


Open File caArray Human.png

Viewing the list of arrays available in a given experiment

Select an experiment and push the Show Arrays button to see the individual array datasets available for download for that experiment.

You can also right-click directly on an experiment and select Show Arrays.

(As of geWorkbench v2.2.2, the list of arrays will be displayed sorted alphabetically.)


Open File caArray show arrays.png

Downloading select array datasets

Now we will select four of the arrays of type HG-U133A and push the Open button to begin the download. Multiple selections will automatically be merged into a single dataset in geWorkbench.


Open File caArray download selected.png


You will be prompted to select the quantitation type from those available for the experiment. Although all quantitation types available for the dataset are listed, usually only the signal channel should be used. Here we select CHPSignal (Affymetrix derived data CHP file):

T OpenFile Remote QuantiationSelectionChoices.png


You will also be prompted to load an annotation file, if desired. Here we choose the Affymetrix HG-U133A annotation file.


OpenFile Remote Annotation Glioma.png


As of geWorkbench 2.4.0, you will also be prompted to specify the annotation file type.

The two supported annotation file types are:

  • Affymetrix 3' Expression (CSV format)
  • Affymetrix WT Gene/Exon ST, transcript level (CSV format) - This parser supports both the Gene 1.0 ST and Gene 2.0 ST annotation files (same format).


Here we select the 3' Expression format.


Open File Select Annotation Type.png


A progress bar will track the download process:

OpenFile Remote Loading GliomaExpt.png


The resulting data set will appear in the Workspace. The downloaded dataset is named after the experiment name shown in caArray.

If multiple arrays were downloaded, they were merged into a single dataset, and the individual arrays are listed in the Arrays component:


Open File caArray merged node.png

Cannot merge different array types

geWorkbench does not currently support merging data from arrays of different types, e.g. HG-U133A with HG-U133B. If by accident such a merge is requested, the following error will appear:


OpenFile Remote Open Merge error.png
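
The restriction on mixing array types can be pictured as a simple check run over the selection before any merge. The sketch below is a hypothetical illustration in Python, not the geWorkbench implementation: the function names, the `(array name, array type)` tuple layout, and the sample array names are assumptions made for the example.

```python
from collections import defaultdict


def group_by_array_type(selected):
    """Group selected arrays by type; `selected` is a list of (name, array_type) pairs."""
    groups = defaultdict(list)
    for name, array_type in selected:
        groups[array_type].append(name)
    return dict(groups)


def check_mergeable(selected):
    """Return the sorted array names if all share one type, else raise ValueError."""
    groups = group_by_array_type(selected)
    if len(groups) > 1:
        raise ValueError(
            "Cannot merge arrays of different types: " + ", ".join(sorted(groups))
        )
    return sorted(name for names in groups.values() for name in names)


# Invented selection: two arrays of one type merge, a mixed selection does not.
same_type = [("array_02", "HG-U133A"), ("array_01", "HG-U133A")]
mixed = same_type + [("array_03", "HG-U133B")]

print(check_mergeable(same_type))
try:
    check_mergeable(mixed)
except ValueError as err:
    print(err)
```

Grouping first, rather than comparing each array to the first one, also tells the user which types were mixed, so the selection can be split into one download per array type.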