Difference between revisions of "Workspace"

(Overview)
 
(115 intermediate revisions by 3 users not shown)
Line 1: Line 1:
 
{{TutorialsTopNav}}
 
{{TutorialsTopNav}}
  
==Outline==
+
==Overview==
In this tutorial, you will learn how to:
 
  
*Create a new Project.
 
*Load microarray data.
 
*Merge data from several loaded microarray experiments.
 
*Rename a project and/or project node.
 
*Remove a project and/or project node.
 
*Save project files that you have created.
 
*Load, add, and/or modify remote data.
 
  
 +
The Workspace is located in the upper-left-hand corner of the application. It is used to contain open data files and store analysis results during a geWorkbench session. When geWorkbench is launched, an empty Workspace folder is displayed.
  
  
==Supported data formats==
+
[[Image:Workspace.png]]
*Microarray
 
**Affymetrix MAS5/GCOS files - produced by the Affymetrix data analysis programs.
 
**Affymetrix File Matrix - a spreadsheet-type multi-experiment format; this is the native file type created by geWorkbench from merged datasets.
 
**Tab-delimited text (RMAExpress file) - A simple columnar file format, produced by the program RMAExpress.
 
**Affy Excel or txt data file - formats for single Affymetrix experiments (not supported).
 
**Genepix files - Produced by a popular analysis program for two-color microarrays.
 
*Other
 
**FASTA files. DNA or amino-acid sequence files in FASTA format.
 
**Pattern files - sequence motifs produced using the Pattern Discovery component of geWorkbench.
 
**Genotypic data files - (not supported).
 
  
  
==Data organization==
+
The workspace as a whole, with all its projects and data nodes, can be saved and restored.  However, only one workspace can be open at one time.  Creating a new workspace or loading a saved workspace will overwrite the current workspace.
  
===Workspaces and Projects===
 
In the '''Project Folders''' component there is a top-level object called a workspace.  The workspace can contain one or more separate projects, and each project can contain opened data files and analysis results.  The workspace as a whole, with all its projects and data nodes, can be saved and restored.  Projects allow data to be grouped, for example by experiment.  A project can contain many different types of data, for example microarray data, FASTA sequence files and graphical images.
 
  
===Microarray data and merging===
 
A file from disk or from the network is opened within a given project.  Creation of a new project is described below.  When working with microarray data, all data to be analyzed must be present within one data node in a project.  If the data exists as multiple files containing results from single arrays, the data must be merged into a single node before it can be used.  geWorkbench can perform this merging step either at the time data is read in, or later in a separate step.  Once merged, such a dataset can be saved to disk; it will be saved in the geWorkbench matrix file format.
 
  
 +
* To view the next level down in the hierarchy, click on the “+” icon to expand the branch.
  
==Limitations==
+
* To collapse a branch, click on the “-” icon.
Only one data node can be selected at one time.  If you wish to save a data node to a file, in most cases you must specify a file type extension, such as ".exp" for the geWorkbench merged file matrix format, or ".fasta" for a sequence file.  At present, the only type of remote data source which can be opened is NCICB's caArray database.  The remote file open feature is not multi-threaded, so you cannot perform other tasks in geWorkbench while downloading remote files.
 
  
  
 +
The Workspace may contain several heterologous datasets. These datasets can include input (source) data and derived data (results) associated with an experiment as well as image files. Source data can be loaded from the user’s local storage or from remote servers.  Loading datasets into the geWorkbench Workspace does not change their physical storage locations.
  
==Creating a new project and loading microarray data files==
+
'''Note''' - The top menu-bar items [[Menu_Bar#File|File]] and [[Menu_Bar#Edit|Edit]] also apply to items in the Workspace.  They offer many of the same options shown below, except e.g. Microarray merging is only available from the top level [[Menu_Bar#File|File]] menu.
  
 +
==Workspace Menu Options==
  
In this example, we will load 10 individual Affymetrix MAS5 format files, and merge them into a single dataset.  The origin of these files is described in the section [[Tutorial_-_Data]].
+
Right-clicking on the Workspace node gives a menu with the following options
  
All data must belong to a project.  Right-click on the '''Workspace''' entry in the '''Project Folders''' window at upper left to create a new project.
 
  
[[Image:T_NewProject.png]]
+
[[Image:Workspace_right_click_menu.png]]
  
  
 +
===Open Files===
 +
* Loading data from local files is covered in the chapter [[Local_Data_Files | Local Data Files]]
 +
* Retrieving data from remote sources (caArray) is covered in [[Remote_Data_Sources | Remote Data Sources]]
  
Next, right-click on the '''New Project''' entry and select '''Open Files'''.
 
  
[[Image:T_OpenFiles.png]]
+
===Open PDB File from RCSB Protein Data Bank===
 +
If '''Open PDB File from RCSB Protein Data Bank''' is chosen, a dialog box appears.
  
  
Here, we will select file type '''Affymetrix GCOS/MAS5''' as shown.
+
[[Image:Project_Folders_Project_Open_RCSB_PDB.png]]
  
Make sure to check the '''Merge files''' checkbox.  This will create the merged data node as the files are read in.
 
  
We will select 10 MAS5 format text files from the directory geworkbench\data\training\cardiogenomics.med.harvard.edu, which is included in the geWorkbench download.  
+
Type in the name of a PDB structure entry and it will be retrieved from the RCSB Protein Data Bank and loaded into geWorkbench.
  
Click '''Open'''.
+
==Workspace Data Node Menu Options==
  
[[Image:T_OpenFile_CardioMerge.png]]
+
Right-clicking on a data node will produce a popup menu with the following options:
  
  
 +
[[Image:Workspace_data_node_right_click.png]]
  
You may see the message "The chip type HG_U95Av2 is recognized..."
 
  
[[Image:T_OpenFile_ChipRecog.png]]
+
===Save===
  
 +
Save the currently selected data node.  Saving is implemented for at least the data types listed below.  If saving of a particular type has not been implemented, the "Save" option will be disabled (grayed out).
  
 +
* '''Microarray gene expression''' -  data is saved using the geWorkbench ".exp" format, regardless of the original format. This allows saving e.g. a merged dataset, and/or any array and marker sets that may have been created.
 +
* '''FASTA''' - saved in FASTA format (.fasta).
 +
* '''PDB''' - saved in PDB format (.pdb).
 +
* '''Network''' - saved using the Adjacency Matrix "ADJ" format (.adj).
 +
* '''t-test result''' - saved as comma separated value (.csv) file.
 +
* '''Image''' - saved as PNG file (.png).
  
The merged dataset is listed in the Project folder. The data is displayed, in single array format, in the '''Microarray Viewer'''. Note we have increased the intensity slider to maximum here. You can scroll through the arrays from first to last using the slider. The display in the '''Microarray Viewer''' is by marker in the linear order the markers appear in the data file.  It does not correspond in any way to a physical picture or representation of the actual 2-D microarray.
+
For each file type, a file browser with a filter for the target file type extension (e.g. .fasta) will be opened.
  
[[Image:T_FullApp_MergedData.png]]
+
===Export to tab-delim===
 +
This option will only appear for microarray gene expression datasets.  It allows the microarray dataset to be exported in a spreadsheet format, as a tab-delimited text file.  The first row contains array names and the first column contains the marker names.
  
 +
This export format does not preserve array or marker sets that may have been defined in geWorkbench for the dataset.  However, it can be used to save a copy of e.g. merged, filtered, and/or normalized data in a format easily used by other programs.
  
==Merging microarray datafiles after they have already been loaded.==
+
When exporting, the file save dialog will display the name of the dataset, minus any recognized file-type suffixes that may be present (e.g. .soft).
  
If Affymetrix data files are not merged at the time they are read in, they can also be merged later, as long as they are from the same chip type.
+
===Rename===
  
 +
A dialog box will appear in which a new name can be entered.
  
'''1.''' Select the read-in data files that you want to merge.
 
  
'''2.''' Click on '''File''' in the menu bar, and choose '''Merge Datasets'''.
+
[[Image:Workspace_Rename_Node.png]]
  
The picture shows the resulting merged dataset created from several individual data files.
+
===Remove===
  
[[Image:T_ProjectFolder_MergeIndivid.png]]
+
The selected data nodes and any child data nodes will be removed.  Multiple selections can be made.
  
 +
==Data Node Hover-text Information==
  
The result is a new data node containing the merged data.  The original data nodes are still present.
+
For microarray datasets, adjacency matrices (network nodes), sequence and pattern nodes, moving the mouse cursor over the data nodes will display additional details about a dataset.
  
[[Image:T_ProjectFolder_IndividMerged.png]]
+
Microarray datasets: hover text displays number of markers and arrays.
  
  
==Renaming a project or a data node==
+
[[Image:Dataset_hover_microarray.png]]
  
  
===Renaming a project===
+
Adjacency Matrix: hover text displays number of nodes and edges in the network.
  
'''1.''' Right-click on '''Project''' folder.
 
  
'''2.''' Select '''Rename'''.
+
[[Image:Workspace_Dataset_hover_network.png]]
  
  
[[Image:T_ProjectFolder_RenameProject.png]]
+
Sequence node:  
  
  
'''3.''' In the pop-up screen rename your project.
+
[[Image:Workspace_hover_sequences.png]]
  
'''4.''' Click on the '''OK''' button
 
  
 +
Pattern node:
  
===Renaming a project data node===
 
  
'''1.''' Right-click on a Project Folder data node.
+
[[Image:Workspace_Pattern_Hover.png]]
  
'''2.''' Select '''Rename'''.
+
==Workspaces==
  
[[Image:T_ProjectFolder_RenameDataset.png]]
+
===Saving the Workspace===
  
 +
Saving the workspace saves all its data to a file on disk.  The workspace can later be reloaded to resume work. 
  
 +
====Special considerations on saving and restoring workspaces====
 +
* '''Versions''' - Workspaces in general may not be compatible across different versions of geWorkbench.
 +
* '''Loaded components''' - The selection of which components are loaded, as configured in the [[Component_Configuration_Manager| CCM]], is not saved with the workspace; it is maintained separately.
 +
* '''Changes to loaded components''' - If a workspace is saved, and then changes are made to which components are loaded in the [[Component_Configuration_Manager| CCM]], then in rare cases problems may occur when one attempts to reload the saved workspace. 
  
'''3.''' In the pop-up screen rename your data node.
 
  
[[Image:T_ProjectFolder_RenameDataset2.png]]
+
[[Image:File_Save_Workspace.png]]
  
 +
===Opening a Saved Workspace===
  
'''4.''' Click on the '''OK''' button.
+
Select '''File->Open->Workspace'''.
  
 +
Only one workspace at a time can be loaded in geWorkbench.  Opening a saved workspace will destroy the existing workspace.  For this reason, if you opt to open a workspace, you will be prompted as to whether to save the existing workspace first.
  
  
==Removing a project or a data node==
+
[[Image:File_Open_Workspace.png]]
  
===Removing a project===
 
  
'''1.''' Right-click on '''Project''' folder.
+
If you choose to save the existing workspace, a dialog box will appear in which you can choose the location and file name for the saved workspace.
  
'''2.''' Select '''Remove'''.
 
  
 +
[[Image:File_Save_Workspace_Dialog.png]]
  
 +
===Creating a New Workspace===
  
===Removing a project data node===
+
Only one workspace at a time can be loaded in geWorkbench.  Creating a new workspace will destroy the existing workspace.  For this reason, if you opt to create a new workspace, you will be prompted as to whether to save the existing workspace first.
  
'''1.''' Right-click on the data node.
+
A new workspace can only be created from the top level menu bar.
  
'''2.''' Select '''Remove'''.
 
  
 +
[[Image:File_New_Workspace.png]]
  
==Saving a data node to a file==
 
  
It is here that, among other things, you can create the matrix multi-experiment file format used by geWorkbench from a merged dataset.
+
Select '''File->New->Workspace'''.
 
 
'''1.''' Right-click on data node that you want to save.
 
 
 
'''2.''' Click '''Save'''.
 
 
 
[[Image:T_ProjectFolder_SaveNode.png]]
 
 
 
 
 
A standard file '''Save''' screen will come up.
 
 
 
'''3.''' Choose a location.
 
 
 
'''4.''' Enter a name.  Be careful to include an appropriate file type extension, as this is not added automatically.  For example, for the merged multi-experiment matrix file type you should include the extension ".exp" in the filename.
 
 
 
'''5.''' Click on the '''Save''' button.
 
 
 
 
 
==Working with remote data sources==
 
 
 
===The remote Open File dialog===
 
geWorkbench can retrieve data from certain remote data sources; currently only instances of the NCICB's caArray database are supported.  The Open File dialog allows remote sources to be added to the list of those available either manually or through discovery using grid services.  Entries (locations, parameters) for non-grid services can be edited.
 
 
 
As before, right-click on '''Project''' which will bring up the '''Open File''' dialog.  Click the '''Remote''' radio button.  The '''Open File''' dialog window will be expanded to include remote sources.
 
 
 
Note the distinction between the "Open File" button, which opens a local or remote file, and the "Go" button, described below, which connects to a chosen remote resource to allow browsing.
 
 
 
[[Image:(T)MEditRemoteData.png]]
 
 
 
Four additional buttons appear.  They are:
 
 
 
'''caArray''' button - lists remote resources.
 
 
 
'''Go''' button - connects to the selected remote source.
 
 
 
'''Add A New Resource''' button - Opens the Data Source Definition Page used to add a remote data source.
 
 
 
'''Edit''' button - Edits remote source parameters.
 
 
 
 
 
===Loading data from a remote instance of caArray===
 
 
 
Click on the '''Go''' button next to the caArray data source at the bottom of the dialog.  All available caArray experiments at that location will be displayed.
 
 
 
[[Image:T_ProjectFolder_caArrayExpts.png]]
 
 
 
Note that the type of experiment data provided here in caArray is of type "derived bioassay".  This is data that has been processed from raw data, for example using RMA.
 
 
 
Select an experiment that has derived bioassays.  Here we depict the experiment ending in *99049.  The number of derived bioassays, 12, is displayed, along with the experiment information.  (A new dataset, "Public Rembrandt", has subsequently been added, which would also be good to use for experimenting with caArray data download.  It has 53 bioassays available.)
 
 
 
To retrieve the bioassays themselves, right click on the experiment and press '''Get bioassays'''.  This will download the list of available bioassays into geWorkbench.
 
 
 
[[Image:T_ProjectFolder_GetRemoteBioassays.png]]
 
 
 
 
 
To actually retrieve bioassay data, select one or more desired arrays and push the '''Open''' button.  (Although below we show retrieving multiple array datasets, for demonstration purposes you might want to first select just one, as each can take several minutes to download).
 
 
 
You can either select the merge option here, or wait until all data has been successfully downloaded and perform the merge later.
 
 
 
Note that downloading data from a remote resource is not multi-threaded, so you will not be able to perform other actions in geWorkbench while the data is downloaded.
 
 
 
[[Image:T_ProjectFolder_OpenRemoteBioassays.png]]
 
 
 
===To add a remote source===
 
 
 
(Note - currently only caArray data sources are supported).
 
 
 
'''1.''' Click on the '''Add A New Resource''' button.
 
 
 
[[Image:(T)MRemoteData2.png]]
 
This is the Data Source Definition Page
 
 
 
'''2.''' Fill in the Data Source definition page. URL and Short Name are required fields.
 
 
 
'''3.''' Click on the OK button.
 
 
 
The configuration is set up to automatically reflect your additional Data Source.
 
 
 
 
 
===To modify a remote source===
 
 
 
The specification of the remote resource can be edited.
 
 
 
'''1.''' Click on the '''Edit''' button at the bottom of the '''Open File''' dialog.
 
 
 
'''2.''' Make the changes that you need.
 
 
 
'''3.''' Click on the '''OK''' button
 
 
 
[[Image:T_ProjectPanel_EditRemote.png]]
 

Latest revision as of 17:07, 10 January 2014



Overview

The Workspace is located in the upper-left-hand corner of the application. It is used to contain open data files and store analysis results during a geWorkbench session. When geWorkbench is launched, an empty Workspace folder is displayed.


Workspace.png


The workspace as a whole, with all its projects and data nodes, can be saved and restored. However, only one workspace can be open at one time. Creating a new workspace or loading a saved workspace will overwrite the current workspace.


  • To view the next level down in the hierarchy, click on the “+” icon to expand the branch.
  • To collapse a branch, click on the “-” icon.


The Workspace may contain several heterogeneous datasets. These datasets can include input (source) data and derived data (results) associated with an experiment as well as image files. Source data can be loaded from the user’s local storage or from remote servers. Loading datasets into the geWorkbench Workspace does not change their physical storage locations.

Note - The top menu-bar items File and Edit also apply to items in the Workspace. They offer many of the same options shown below; some options, such as microarray merging, are available only from the top-level File menu.

Workspace Menu Options

Right-clicking on the Workspace node gives a menu with the following options:


Workspace right click menu.png


Open Files


Open PDB File from RCSB Protein Data Bank

If Open PDB File from RCSB Protein Data Bank is chosen, a dialog box appears.


Project Folders Project Open RCSB PDB.png


Type in the name of a PDB structure entry and it will be retrieved from the RCSB Protein Data Bank and loaded into geWorkbench.

Workspace Data Node Menu Options

Right-clicking on a data node will produce a popup menu with the following options:


Workspace data node right click.png


Save

Save the currently selected data node. Saving is implemented for at least the data types listed below. If saving of a particular type has not been implemented, the "Save" option will be disabled (grayed out).

  • Microarray gene expression - data is saved using the geWorkbench ".exp" format, regardless of the original format. This allows saving e.g. a merged dataset, and/or any array and marker sets that may have been created.
  • FASTA - saved in FASTA format (.fasta).
  • PDB - saved in PDB format (.pdb).
  • Network - saved using the Adjacency Matrix "ADJ" format (.adj).
  • t-test result - saved as comma separated value (.csv) file.
  • Image - saved as PNG file (.png).

For each file type, a file browser with a filter for the target file type extension (e.g. .fasta) will be opened.
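As a rough illustration of the type-to-extension pairing listed above, the following Python sketch appends the expected extension to a file name when it is missing. The extensions come from the list above; the dictionary keys and the function name `with_save_extension` are hypothetical and are not part of geWorkbench.

```python
from pathlib import Path

# Extensions as listed above; the keys are hypothetical labels chosen for this sketch.
SAVE_EXTENSIONS = {
    "microarray": ".exp",
    "fasta": ".fasta",
    "pdb": ".pdb",
    "network": ".adj",
    "t-test": ".csv",
    "image": ".png",
}


def with_save_extension(filename: str, node_type: str) -> str:
    """Return filename ending in the extension for node_type, appending it if missing."""
    ext = SAVE_EXTENSIONS[node_type]
    if Path(filename).suffix.lower() == ext:
        return filename  # already carries the expected extension
    return filename + ext


print(with_save_extension("merged", "microarray"))  # merged.exp
print(with_save_extension("seqs.fasta", "fasta"))   # seqs.fasta
```

A lookup table keeps the pairing in one place, so a script that post-processes saved files can check names against the same list.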

Export to tab-delim

This option will only appear for microarray gene expression datasets. It allows the microarray dataset to be exported in a spreadsheet format, as a tab-delimited text file. The first row contains array names and the first column contains the marker names.

This export format does not preserve array or marker sets that may have been defined in geWorkbench for the dataset. However, it can be used to save a copy of e.g. merged, filtered, and/or normalized data in a format easily used by other programs.
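A file with the layout described above (array names in the first row, marker names in the first column) can be read by other programs with very little code. The Python sketch below is one way to do so. It assumes that all data cells are numeric and that the top-left corner cell can be ignored; neither is stated in the text. The sample content and the function name `read_tab_delim` are made up for illustration.

```python
import csv
import io


def read_tab_delim(handle):
    """Parse a tab-delimited expression export into (array_names, values_by_marker)."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    array_names = header[1:]  # skip the corner cell above the marker column
    values_by_marker = {}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        marker, cells = row[0], row[1:]
        values_by_marker[marker] = [float(c) for c in cells]  # assumes numeric cells
    return array_names, values_by_marker


# Made-up sample in the described layout.
sample = "Marker\tArray1\tArray2\n1000_at\t12.5\t13.0\n1001_at\t7.25\t6.5\n"
arrays, data = read_tab_delim(io.StringIO(sample))
print(arrays)           # ['Array1', 'Array2']
print(data["1001_at"])  # [7.25, 6.5]
```

For a real file, pass an open file handle (for example `open(path, newline="")`) instead of the in-memory sample.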

When exporting, the file save dialog will display the name of the dataset, minus any recognized file-type suffixes that may be present (e.g. .soft).
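The suffix handling described above can be pictured with the short Python sketch below. The set of recognized suffixes is an assumption: only ".soft" is named in the text, and the other entries, like the function name `default_export_name`, are placeholders for illustration.

```python
# Hypothetical set of recognized suffixes; only ".soft" is mentioned in the text.
RECOGNIZED_SUFFIXES = (".soft", ".exp", ".txt")


def default_export_name(dataset_name: str) -> str:
    """Return the dataset name with one recognized file-type suffix removed, if present."""
    lowered = dataset_name.lower()
    for suffix in RECOGNIZED_SUFFIXES:
        if lowered.endswith(suffix):
            return dataset_name[: -len(suffix)]
    return dataset_name


print(default_export_name("GDS1234.soft"))  # GDS1234
print(default_export_name("mydata"))        # mydata
```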

Rename

A dialog box will appear in which a new name can be entered.


Workspace Rename Node.png

Remove

The selected data nodes and any child data nodes will be removed. Multiple selections can be made.

Data Node Hover-text Information

Moving the mouse cursor over a microarray dataset, adjacency matrix (network node), sequence node, or pattern node displays additional details about that dataset.

Microarray datasets: hover text displays number of markers and arrays.


Dataset hover microarray.png


Adjacency Matrix: hover text displays number of nodes and edges in the network.


Workspace Dataset hover network.png
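To make the two numbers in the network hover text concrete, the Python sketch below counts nodes and edges for a small made-up network. It assumes edges are undirected and that duplicate pairs count once; the text does not say how geWorkbench counts them, so treat this as an illustration of the idea only. The function name `network_summary` and the gene names are hypothetical.

```python
def network_summary(edges):
    """Count distinct nodes and distinct undirected edges in an iterable of (a, b) pairs."""
    nodes = set()
    unique_edges = set()
    for a, b in edges:
        nodes.update((a, b))
        unique_edges.add(frozenset((a, b)))  # (a, b) and (b, a) are the same edge
    return len(nodes), len(unique_edges)


# Made-up network: the first two pairs describe the same undirected edge.
edges = [("geneA", "geneB"), ("geneB", "geneA"), ("geneA", "geneC")]
print(network_summary(edges))  # (3, 2)
```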


Sequence node:


Workspace hover sequences.png


Pattern node:


Workspace Pattern Hover.png

Workspaces

Saving the Workspace

Saving the workspace saves all its data to a file on disk. The workspace can later be reloaded to resume work.

Special considerations on saving and restoring workspaces

  • Versions - Workspaces in general may not be compatible across different versions of geWorkbench.
  • Loaded components - The selection of which components are loaded, as configured in the CCM, is not saved with the workspace; it is maintained separately.
  • Changes to loaded components - If a workspace is saved, and then changes are made to which components are loaded in the CCM, then in rare cases problems may occur when one attempts to reload the saved workspace.


File Save Workspace.png

Opening a Saved Workspace

Select File->Open->Workspace.

Only one workspace at a time can be loaded in geWorkbench. Opening a saved workspace will destroy the existing workspace. For this reason, if you opt to open a workspace, you will be prompted as to whether to save the existing workspace first.


File Open Workspace.png


If you choose to save the existing workspace, a dialog box will appear in which you can choose the location and file name for the saved workspace.


File Save Workspace Dialog.png

Creating a New Workspace

Only one workspace at a time can be loaded in geWorkbench. Creating a new workspace will destroy the existing workspace. For this reason, if you opt to create a new workspace, you will be prompted as to whether to save the existing workspace first.

A new workspace can only be created from the top level menu bar.


File New Workspace.png


Select File->New->Workspace.