Difference between revisions of "Tutorials"
Revision as of 19:56, 27 January 2006
I. Getting Started
With geWorkbench you can work with both microarray gene expression data and gene or protein sequences. Many kinds of analysis are supported. For microarrays, there are filtering and normalization, basic statistical analyses, clustering, and network reverse engineering, as well as many common visualization tools. For sequence data, there are routines such as BLAST, pattern detection, transcription factor mapping, and syntenic region analysis. Furthermore, genomic sequences around markers of interest found in microarray experiments can be easily retrieved and used, for example, for promoter/TF analysis.
geWorkbench is designed from the ground up to be extensible. New modules can be programmed to interact directly with its framework, or existing code can be wrapped in a geWorkbench adaptor to allow seamless communications with the framework and other modules.
To start using geWorkbench, you must supply initial data files. For microarray data, several formats are currently supported, including MAS5/GCOS text files, GenePix files, and a simple, geWorkbench-specific matrix format. In the next section, we will show how to read in MAS5 format files and write out a matrix file. For sequence data, FASTA format files are accepted.
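For readers unfamiliar with the FASTA format mentioned above, the following is a minimal Python sketch of how such a file is laid out and read: each record starts with a `>` header line, followed by one or more lines of sequence. This is an illustration only and is not geWorkbench's own parser; the function name `parse_fasta` and the example records are hypothetical.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into a mapping of record header -> sequence."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record begins; everything after ">" is the header.
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            # Sequence lines are concatenated onto the current record.
            records[header] += line
    return records


# Hypothetical example data, not taken from the geWorkbench distribution.
example = """>seq1 hypothetical example record
ACGTACGT
ACGT
>seq2
TTGACA
""".splitlines()

sequences = parse_fasta(example)
for name, seq in sequences.items():
    print(name, len(seq))
```

A sequence may be wrapped over several lines, which is why the sketch joins lines until the next header appears.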
II. Loading Data
When first started, geWorkbench appears as follows:
Right-click on the Workspace entry in the Project Folders window at upper left to create a new project.
Next, right-click on the new project entry and select Open Files.
Here we will select 10 MAS5 format text files from the directory geworkbench\data\training\cardiogenomics.med.harvard.edu:
The chip type HG_U95 is recognized...
The data that has been read in is displayed in the Microarray Panel. Note that we have increased the intensity slider to maximum here.
We can now assign phenotypes to each chip. We will place the phenotypes in the default group; however, you can create new phenotype groups by clicking the New button on the Phenotype Panel at lower left.
Here we select the arrays which contain samples from the congestive cardiomyopathy disease state...
After similarly labeling the remaining arrays as "Normal"....
We will now specify the disease arrays as the "Case". The remaining "Normal" arrays are labeled as control by default. The type is set by left-clicking on the thumbtack in front of the phenotype name...
A red thumbtack indicates the arrays have been specified as "Case".
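The labeling steps above can be pictured as a small data model: each phenotype groups a set of arrays, and every phenotype counts as control unless it is explicitly marked as case. The Python sketch below is only a conceptual illustration; the class `Phenotype`, the function `array_roles`, and the array names are hypothetical and do not reflect geWorkbench's internal representation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Phenotype:
    name: str
    arrays: List[str]
    is_case: bool = False  # control unless explicitly marked as case


def array_roles(phenotypes: List[Phenotype]) -> Dict[str, str]:
    """Map each array name to "case" or "control" based on its phenotype."""
    roles: Dict[str, str] = {}
    for phenotype in phenotypes:
        for array in phenotype.arrays:
            roles[array] = "case" if phenotype.is_case else "control"
    return roles


# Hypothetical array names standing in for the loaded MAS5 files.
phenotypes = [
    Phenotype("Congestive cardiomyopathy", ["array_01", "array_02"]),
    Phenotype("Normal", ["array_03", "array_04"]),
]

# Rough equivalent of clicking the thumbtack on the disease phenotype.
phenotypes[0].is_case = True

roles = array_roles(phenotypes)
print(roles)
```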
We can also rename the merged dataset by clicking on its entry in the Project Panel.
Here we will call it CCMP.
With the datasets merged, classified and named, we can save the dataset for future use. We will call it cardiomyopathy.exp (.exp is the default extension for the geWorkbench matrix file type).
The default display of microarray data is an absolute display. We can change it to a relative display by selecting Tools:Preferences from the top menu bar. Here we have first removed the dataset from the project so that we can read it back in with the new preferences applied.
Here we select one of the several types of relative display available....
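The tutorial does not define how the relative display types are computed. As a rough illustration of the difference between absolute and relative values, the Python sketch below scales each marker's values by that marker's mean across arrays. Whether this corresponds to any of the relative display options in geWorkbench is an assumption; the function name `relative_to_row_mean` and the sample numbers are hypothetical.

```python
from typing import List


def relative_to_row_mean(matrix: List[List[float]]) -> List[List[float]]:
    """Divide each value by the mean of its row (one row per marker)."""
    result: List[List[float]] = []
    for row in matrix:
        mean = sum(row) / len(row)
        if mean == 0:
            # Avoid division by zero for markers with no signal.
            result.append([0.0 for _ in row])
        else:
            result.append([value / mean for value in row])
    return result


# Hypothetical absolute expression values: rows are markers, columns are arrays.
absolute = [
    [100.0, 200.0, 300.0],
    [10.0, 10.0, 10.0],
]

relative = relative_to_row_mean(absolute)
print(relative)
```

With values expressed this way, a marker that is constant across arrays maps to 1.0 everywhere, while markers that vary stand out regardless of their absolute intensity.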
Returning to the Open File dialog as we did before, by right-clicking on the project entry, we select the cardiomyopathy.exp file we previously saved...
This results in the following colorful display of the array data for the first array...