geWorkbench - User contributions [en]

Workspace

2006-05-02T15:34:06Z

Daly: /* Supported data formats */

{{TutorialsTopNav}}

==Outline==
In this tutorial, you will learn how to:

*Create a new Project.
*Load microarray data.
*Merge data from several loaded microarray experiments.
*Rename a project and/or project node.
*Remove a project and/or project node.
*Save project files that you have created.
*Load, add, and/or modify remote data.

==Supported data formats==
*Microarray
**Affymetrix MAS5/GCOS Files.
**Affymetrix File Matrix - this is the native file type created by geWorkbench.
**RMA Express File - RMA Express is a sophisticated tool for combining data from multiple Affymetrix chips.
**Affy Excel or txt data file.
**Genepix Files - Genepix is an analysis program for two-color arrays.
*Other
**FASTA Files. DNA or protein sequence files in FASTA format.
**Pattern Files.
**Genotypic data Files.

==Loading data files into a project==

In this example, we will load 10 individual Affymetrix MAS5 format files, and merge them into a single dataset.

All data must belong to a project. Right-click on the '''Workspace''' entry in the '''Project Folders''' window at upper left to create a new project.

[[Image:T_NewProject.png]]

Next, right-click on the '''New Project''' entry and select '''Open Files'''.

[[Image:T_OpenFiles.png]]

Here, we will select file type '''Affymetrix GCOS/MAS5''' as shown.

Make sure to check the '''Merge files''' checkbox.

We select 10 MAS5 format text files from the directory geworkbench\data\training\cardiogenomics.med.harvard.edu, which is included in the geWorkbench download.

Click '''Open'''.

[[Image:T_OpenFile_CardioMerge.png]]

The chip type HG_U95Av2 is recognized...

[[Image:T_OpenFile_ChipRecog.png]]

The merged dataset is listed in the Project folder. The data is displayed, in single array format, in the '''Microarray Viewer'''. Note we have increased the intensity slider to maximum here.

[[Image:T_FullApp_MergedData.png]]

==Merging microarray data files after they have already been loaded==

If Affymetrix data files are not merged at the time they are read in, they can also be merged later, as long as they are from the same chip type.

'''1.''' Select the read-in data files that you want to merge.

'''2.''' Click on '''File''' in the menu bar, and choose '''Merge Datasets'''.

The picture shows the resulting merged dataset created from several individual data files.

[[Image:T_ProjectFolder_MergeIndivid.png]]

The result is a new data node containing the merged data. The original data nodes are still present.

[[Image:T_ProjectFolder_IndividMerged.png]]
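Conceptually, merging combines several single-array tables into one marker-by-array matrix. The following is a minimal sketch in Python, not geWorkbench's own code. It assumes each loaded array is held as a dictionary from marker ID to expression value (a hypothetical representation), and it treats "same chip type" as "exactly the same set of marker IDs", which is also an assumption.

```python
def merge_arrays(arrays):
    """Merge {array_name: {marker_id: value}} into a marker-by-array matrix.

    Hypothetical illustration: arrays count as the same chip type when they
    carry exactly the same set of marker IDs.
    """
    if not arrays:
        raise ValueError("no arrays to merge")
    names = list(arrays)
    markers = sorted(arrays[names[0]])
    for name in names[1:]:
        if set(arrays[name]) != set(markers):
            raise ValueError(f"{name} does not match the chip type of {names[0]}")
    # One row per marker, one column per array, in the order the arrays were given.
    matrix = {m: [arrays[n][m] for n in names] for m in markers}
    return names, matrix
```

The input dictionaries are left untouched, which mirrors the behavior described above: the original data nodes remain present after the merge.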

==Renaming a project or a data node==

===Renaming a project===

'''1.''' Right-click on the '''Project''' folder.

'''2.''' Select '''Rename'''.

[[Image:T_ProjectFolder_RenameProject.png]]

'''3.''' In the pop-up window, rename your project.

'''4.''' Click on the '''Okay''' button.

===Renaming a project data node===

'''1.''' Right-click on a Project Folder data node.

'''2.''' Select '''Rename'''.

[[Image:T_ProjectFolder_RenameDataset.png]]

'''3.''' In the pop-up screen rename your data node.

[[Image:T_ProjectFolder_RenameDataset2.png]]

'''4.''' Click on the '''Okay''' button.

==Removing a project or a data node==

===Removing a project===

'''1.''' Right-click on the '''Project''' folder.

'''2.''' Select '''Remove'''.

===Removing a project data node===

'''1.''' Right-click on the data node.

'''2.''' Select '''Remove'''.

==Saving a data node to a file==

'''1.''' Right-click on the data node that you want to save.

'''2.''' Click '''Save'''.

[[Image:T_ProjectFolder_SaveNode.png]]

A standard file '''Save''' screen will come up.

'''3.''' Choose a location.

'''4.''' Enter a name.

'''5.''' Click on the '''Save''' button.

==Working with remote data sources==

===The remote Open File dialog===
geWorkbench can retrieve data from certain remote data sources, for example instances of the NCI's caArray database. The Open File dialog allows remote sources to be added to the list of those available either manually or through discovery using grid services. Entries (locations, parameters) for non-grid services can be edited.

As before, right-click on the project and select '''Open Files''' to bring up the '''Open File''' dialog. Click the '''Remote''' radio button. The dialog window will expand to include remote sources.

[[Image:(T)MEditRemoteData.png]]

Four additional buttons appear. They are:

'''caArray''' button - Gives you a listing of your Remote Resources.

'''Go''' button - Accesses the Remote Source that you selected.

'''Add A New Resource''' button - Opens the Data Source Definition Page used to add Remote Data.

'''Edit''' button - Edits Remote Source Parameters.

===Loading data from a remote instance of caArray===

Click on the Go button next to the caArray data source at the bottom of the dialog. All available caArray experiments will be displayed.

[[Image:T_ProjectFolder_caArrayExpts.png]]

Select an experiment that has bioassays. Here we depict the experiment ending in *99049. The number of derived bioassays, 12, is displayed, along with the experiment information. (A new dataset, "Public Rembrandt", has subsequently been added, which would also be good to use for experimenting with caArray data download. It has 53 bioassays available.)

To retrieve the bioassays themselves, right-click on the experiment and press '''Get bioassays'''. This will download the list of available bioassays into geWorkbench.

[[Image:T_ProjectFolder_GetRemoteBioassays.png]]

To actually retrieve bioassay data, select the desired arrays and push the '''Open''' button. (Although below we show retrieving multiple array datasets, you might want to first select just one, as each can take several minutes to download).

[[Image:T_ProjectFolder_OpenRemoteBioassays.png]]

===To add a remote source===

'''1.''' Click on the '''Add A New Resource''' button.

[[Image:(T)MRemoteData2.png]]
This is the Data Source Definition Page.

'''2.''' Fill in the Data Source definition page. URL and Short Name are required fields.

'''3.''' Click on the OK button.

The configuration is automatically updated to reflect your additional data source.

===To modify a remote source===

The specification of the remote resource can be edited.

'''1.''' Click on the '''Edit''' button at the bottom of the '''Open File''' dialog.

'''2.''' Make the changes that you need.

'''3.''' Click on the '''OK''' button.

[[Image:T_ProjectPanel_EditRemote.png]]

User:Daly

2006-04-03T16:21:36Z

Daly:

= Overview=

geWorkbench is an open-source bioinformatics platform that offers a comprehensive and extensible collection of tools for the management, analysis, visualization and annotation of biomedical data. It aims to provide researchers with a centralized repository for the analysis and visualization of gene expression data, gene sequences and protein sequences.

'''What can you do with geWorkbench?'''

* Use one program that integrates with multiple existing bioinformatics modules for analysis and visualization.

* Access remote servers and clusters to run computationally intensive calculations more quickly.

* Streamline data analysis with:
** flexible import options that support merging files from various sources ''(insert possible data source ie RMA express).''
** support for a variety of genomic data including microarrays, sequences, pathways, networks, alignments and phenotypes.
** access analyses with biological annotations from the National Cancer Institute.

* Community: Insert developer benefit ( plugin)

The diagram illustrates the use of geWorkbench by researchers.

[[Image:slide1.gif]]

For detailed system documentation, please see the Documentation section (INSERT LINK).

= Tutorial=

===Welcome! ===

Welcome to the geWorkbench tutorial. In the following tutorials you will learn how to use geWorkbench. This is an example-based, illustrated guide for both beginners and experienced users. This tutorial was last updated in '''DATE''' and reflects changes to geWorkbench through '''DATE'''.

===Using the Tutorial ===

These tutorials assume that you have successfully completed the installation instructions. If you haven't installed the entire program, please go to [[Download]] and follow the installation instructions.

The tutorials are organized into a number of separate topics. Sample data sets are used throughout the tutorials. Each tutorial is self-contained and does not depend on any other tutorial for sample data. Sample data is located '''''(INSERT LINK or Instructions on how to download)'''''

'''(we should have sample data labeled Tutorial #)'''

Within the tutorials there are two basic types of text:
*Text that looks like this explains topics.

# Text in numbered steps gives instructions for you to follow using the tutorial data files.

''' insert instruction on how to navigate between tutorials'''.

==Getting Started==
* Starting the application
* GUI elements
** Panels
** Navigation
== Loading Data==
* Data formats

== Working with Marker and Phenotype Panels==
In this tutorial, you will:
* Become familiar with the use of panels in geWorkbench
* Create active phenotype panels
* Classify the panels you created

Before you can continue, geWorkbench should be running. For help with this, please refer to the '''''Getting Started''''' (insertlink) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Creating Panels===

When working with microarrays, geWorkbench uses the term '''''marker''''' to refer to a gene probe (in other cases, it can refer to individual items from other data sets, such as sequences).
'''''Phenotype''''' refers to any user-defined grouping of microarrays. These microarrays will often share some common property that in most cases is phenotypic, although this is not a requirement. For example, one such “phenotype” might be a single experiment on a tumor tissue sample, with a second “phenotype” defined as a collection of experiments performed on normal tissue samples.

====Assign Panel====

1. In the Phenotype panel, select the following arrays, which contain samples from the congestive cardiomyopathy disease state:
JB-ccmp_0120.txt, JB-ccmp_0218.txt, JB-ccmp0718.txt, JB-ccmp0811.txt, JB-ccmp1003.txt, JB-ccmp_1109.tx

2. Right-click, select '''Add to Panel'''.

3. Enter "Cardio" in the input box and click OK.

[[Image:T_PanelLabelCardio.png]]

4. Next, similarly label the following arrays as "Normal" (''repeat steps 2 & 3''):
JB-n_0106.txt , JB-n_0821.txt, JB-n_0915.txt, JB-n_1303.txt

5. Check the boxes next to the panel names to mark these groups of arrays as "Active". Various analysis and visualization components can be set to use or display only activated arrays or markers.

[[Image:T_PhenotypesPriorToCase.png]]

====Classify panel====

For statistical tests such as the t-test, Case and Control groups can be specified.

# Left-click on the thumb-tack icon in front of the phenotype name.
# Select Case to specify the disease arrays as the "Case". The remaining "Normal" arrays are by default labeled control.
[[Image:T_PhenotypeSettingCase.png]]

A red thumbtack indicates the arrays have been specified as "Case".

[[Image:T_PhenotypeCaseSet.png]]

== Visualize Gene Expression==
In this tutorial, you will:

* Get acquainted with the various geWorkbench visualization tools
* View a dataset in geWorkbench
* Modify the visualization preference settings

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file '''''cardiomyopathy.exp''''' used in this tutorial is from the Load Data (insertlink) tutorial.

===Visualization Tools===

Visualization tools provide a view of the chip(s) under investigation and can be used to assess the quality of the data. Active gene and phenotype panels (insertlink) restrict the number of markers/arrays displayed. The images created can be saved and exported. Online help describes in detail how to manipulate the visualization components.

{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|'''Microarray View''': Used to inspect each separate microarray using the Array scroll bar.||[[Image: Ema.png]]
|-
|-
|'''Tabular Microarray Panel''': Presents the numerical values of the expression measurements in a table format. One row is created per individual marker/probe and one column per microarray.||[[Image:Etab.png]]
|-
|-
|-
|'''Color Mosaic''': Heat maps for microarray expression data, organized by phenotypic or gene groupings.||[[Image:Ed_cm.png]]
|-
|-
|-
|'''Expression Profiles''': A line graph of gene expression profiles across several arrays/hybridizations. Each marker is drawn as a separately colored line.||[[Image:Eep.png]]
|-
|-
|-
|'''Scatter Plot''': A pairwise (''array vs. array and marker vs. marker'') comparison and plotting of expression values.||[[Image:Esp.png]]
|-
|-
|-
|-
|}

'''View a dataset'''
# Select '''''cardiomyopathy.exp''''' in the Project Panel.
# Select the '''''Microarray Panel''''' visualization component in the View Area at the top-right section of the interface.
# Deselect the All Markers checkbox to display the entire dataset.

Note: The '''''All Arrays''''' and '''''All Markers''''' checkboxes determine which data points are included in the display.
If neither is checked, then the entire data set is shown. The All Arrays control is useful when working with data sets comprising multiple arrays. In this case, only those arrays that are included in a currently activated phenotype panel will be displayed.

[[Image: Allm.png]]

=== Preferences===
The Preferences selection in the Tools menu allows users to specify how certain aspects of the system will behave. Once set, preferences are applied at once and persist between application sessions.

'''Modifying Settings'''

1. From the main menu, click on '''Tools>Preferences'''.

2. In the Preferences pop-up window, you can define settings for:
*Text Editor: The editor selected will be used to open and inspect data sets loaded in a project. Notepad is the default setting.
* Visualization: The color scheme to be applied to color mosaic images.
**Absolute: (default) Let M = max{|min|, |max|} over all expression measurements, across all arrays. If expression value x > 0, assign it the red spectrum x / M * 256. If expression value x is negative, assign it to the green spectrum -x / M * 256.
** Relative: This is similar to the setting for Absolute, but each marker is mean-variance normalized first.
* Genepix Value Computation: You can specify how to compute the value displayed for Genepix arrays. The default setting is the option (Mean F635 - Mean B635) / (Mean F532 - Mean B532).

Select '''Relative''' for the visualization preference.

3. Click on '''OK'''.

[[Image: Preferences.gif]]
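The two color schemes described in step 2 can be sketched in Python as follows. This is an illustration of the stated formulas only, not geWorkbench's implementation. The function names are hypothetical, clamping the channel to 255 is an assumption made so the result fits an 8-bit color channel, and the use of the population standard deviation in the Relative scheme is likewise an assumption.

```python
from statistics import mean, pstdev


def absolute_colors(matrix):
    """Map each value to an (R, G, B) tuple using the Absolute scheme.

    matrix: list of marker rows, each a list of values (one per array).
    M is the largest absolute value over all measurements and all arrays.
    """
    values = [x for row in matrix for x in row]
    m = max(abs(min(values)), abs(max(values)))

    def color(x):
        if m == 0:
            return (0, 0, 0)
        # x / M * 256 as in the text; clamped to 255 (assumption) for 8-bit channels.
        level = min(255, int(abs(x) / m * 256))
        return (level, 0, 0) if x > 0 else (0, level, 0)

    return [[color(x) for x in row] for row in matrix]


def relative_colors(matrix):
    """Mean-variance normalize each marker row, then apply the Absolute scheme."""
    normalized = []
    for row in matrix:
        mu, sd = mean(row), pstdev(row)  # population SD is an assumption
        normalized.append([(x - mu) / sd if sd else 0.0 for x in row])
    return absolute_colors(normalized)
```

With the Relative scheme, a marker with small absolute values can still reach full color intensity, because each marker is rescaled before colors are assigned.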

== Filter and Normalize Data==
In this tutorial, you will:
* Get acquainted with the various filters and normalizers available in geWorkbench
* Apply a filter and normalizer on a tutorial dataset

Before you can continue, geWorkbench should be running. For help with installation, please refer to the '''''Getting Started''''' (INSERT LINK) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Filter===

Filtering can be used to screen out missing data points, remove low quality data or reduce the size of the dataset by removing less interesting data. Available geWorkbench filters are as follows:

{|style="border: 1px solid lightGray"
!Filter||Description
|-
|'''Affy Detection Call'''||Applicable to Affymetrix data only. Sets as missing all measurements whose detection status is any user-defined combination of A, P or M.
|-
|'''Missing values'''||Discards all markers that have “missing” measurements in at least n microarrays, where n is defined by the user. Another filter must first be applied, however, in order to generate the missing values upon which this filter can operate.
|-
|'''Deviation'''||Sets as missing all markers whose measurements deviate by less than a given value across all microarrays.
|-
|'''Expression Threshold'''||Sets as missing all markers whose measurements are inside (or outside) a user-defined range.
|-
|'''2 Channel'''||Applicable to 2-channel array (Genepix) data only. Defines applicable ranges for each channel, and sets as missing all values for which either channel intensity is inside (or outside) the defined range.
|}

[[Image:Filterpanel.gif]]

Perform the following steps to filter out data called absent in an Affymetrix file:
# In the Filtering Panel, select ''Affy Detection Call Filter''.
# Select the ‘A’ (Absent) checkbox and click '''Filter'''. Values that were removed (marked as missing) are highlighted in yellow.
# In the Filtering Panel, select '''Missing Values Filter'''.
# Choose the maximum number of arrays that can have missing values before a marker is removed – default is 0.
# Click '''Filter'''. Markers with more than 0 missing values are removed. You’ll notice the yellow values are gone.

{|style="border: 1px solid lightGray"
!'''Affy Detection Call Filter'''||'''Missing Values Filter'''||
|-
|-|-
|-
| [[Image: Filtered.gif]]||[[Image:Mvfilter.gif]]
|-
|}
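The two-stage filtering just performed can be sketched in Python. This is only an illustration under stated assumptions: the data layout (a dictionary from marker ID to a list of (value, detection call) pairs), the function names, and the use of None for a missing value are all hypothetical. The second function follows the wording of the steps above, removing markers with more than the chosen number of missing values.

```python
def affy_detection_call_filter(data, calls_to_mask):
    """Mark as missing (None) every measurement whose detection call is masked.

    data: {marker_id: [(value, call), ...]} with call one of "P", "M", "A".
    """
    return {
        marker: [(None if call in calls_to_mask else value, call)
                 for value, call in row]
        for marker, row in data.items()
    }


def missing_values_filter(data, max_missing=0):
    """Keep only markers with at most max_missing missing measurements."""
    return {
        marker: row
        for marker, row in data.items()
        if sum(value is None for value, _ in row) <= max_missing
    }
```

Running the first filter alone removes no markers. It only creates the missing values on which the second filter then operates.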

===Normalize===
Normalization can be used to decrease the effects of systematic differences across a set of experiments.
In geWorkbench, normalization replaces the original values with new values. Available geWorkbench normalization methods are as follows:

{|style="border: 1px solid lightGray"
!Normalizer||Description||
|-
|-

|Missing value calculation ||Replaces every missing value with either the mean value of that marker across all microarrays or with the mean measurement of all markers in the microarray where the missing value is observed
|-
|-
|Log2 Transformation ||Applies a log2 transformation to all measurements in a microarray
|-
|-
|-
|Threshold Normalizer ||All data points whose value is less than (or greater than) a user-specified minimum (maximum) value are raised (reduced) to that minimum (maximum) value
|-
|-
|-
|-
|Marker-based Centering ||Subtracts the mean (median) measurement of a marker profile from every measurement in the profile
|-
|-
|-
|Array-based centering ||Subtracts the mean (median) measurement of a microarray from every measurement in that microarray
|-
|-
|-
|Mean-variance normalizer ||For every marker profile, the mean measurement of the entire profile is subtracted from each measurement in the profile and the resulting value is divided by the standard deviation
|-
|-
|}
[[Image:Normalpanel.gif]]
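Several of the normalizers in the table are simple element-wise or row/column operations. The sketch below shows them in Python on a matrix with one row per marker and one column per microarray. It is an illustration of the table's descriptions, not geWorkbench code. The function names are hypothetical, only the mean variant of centering is shown, and the population standard deviation is assumed for the mean-variance normalizer.

```python
import math
from statistics import mean, pstdev


def log2_transform(matrix):
    """Apply a log2 transformation to every measurement."""
    return [[math.log2(x) for x in row] for row in matrix]


def threshold_normalize(matrix, minimum=None, maximum=None):
    """Raise values below minimum to it and reduce values above maximum to it."""
    def clip(x):
        if minimum is not None and x < minimum:
            return minimum
        if maximum is not None and x > maximum:
            return maximum
        return x
    return [[clip(x) for x in row] for row in matrix]


def center_markers(matrix):
    """Subtract each marker's (row) mean from every measurement in that row."""
    result = []
    for row in matrix:
        mu = mean(row)
        result.append([x - mu for x in row])
    return result


def center_arrays(matrix):
    """Subtract each microarray's (column) mean from every measurement in it."""
    col_means = [mean(col) for col in zip(*matrix)]
    return [[x - m for x, m in zip(row, col_means)] for row in matrix]


def mean_variance_normalize(matrix):
    """Per marker: subtract the row mean, then divide by the row's SD."""
    result = []
    for row in matrix:
        mu, sd = mean(row), pstdev(row)  # population SD is an assumption
        result.append([(x - mu) / sd if sd else 0.0 for x in row])
    return result
```

Each function returns a new matrix, so the calls can be chained, for example a log2 transformation followed by array-based centering.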

'''Apply Quantile Normalizer'''

1. In the Normalization Panel, select ''Quantile Normalizer''.

2. Leave the default averaging method of ''Mean Profile Marker'' to indicate handling of missing values.

3. Click '''Normalize'''. The View Area is updated to reflect normalization (after the screen has been refreshed).
Note: The first value in the second row was updated from ''41,394.6'' to ''55,779.26''.

{|style="border: 1px solid lightGray"
!PRENORMALIZATION||NORMALIZED||
|-
|-|-
|-
| [[Image:Prenormalizer_ed.gif ]] || [[Image: Postnormalizera.gif ]]
|-
|-
|-
|}
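For readers unfamiliar with quantile normalization, the sketch below shows the textbook form of the technique in Python: every array is forced to share the same distribution of values, namely the mean of the sorted values at each rank. This is not taken from the geWorkbench source. It assumes a complete matrix (no missing values, so the averaging-method option above does not come into play), breaks ties by position, and uses a hypothetical function name.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a matrix (rows = markers, columns = microarrays).

    Textbook variant: the value of rank r in each array is replaced by the
    mean of the rank-r values across all arrays. Assumes no missing values.
    """
    columns = [list(col) for col in zip(*matrix)]
    n_rows = len(matrix)
    sorted_cols = [sorted(col) for col in columns]
    # Mean of the values that share the same rank across all arrays.
    rank_means = [sum(col[i] for col in sorted_cols) / len(columns)
                  for i in range(n_rows)]
    normalized_cols = []
    for col in columns:
        order = sorted(range(n_rows), key=lambda i: col[i])
        new_col = [0.0] * n_rows
        for rank, i in enumerate(order):
            new_col[i] = rank_means[rank]
        normalized_cols.append(new_col)
    # Transpose back to one row per marker.
    return [list(row) for row in zip(*normalized_cols)]
```

After normalization every array contains the same set of values, only arranged according to that array's original ranking of its markers.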

== Clustering Gene Expression Data==
*Hierarchical Clustering
* Self Organizing Map (SOM)

== Differential Expression ==
In this tutorial, you will:

* Get acquainted with the T Test and Multi T Test
* Apply a T Test and Multi T Test

Before you can continue, geWorkbench should be running. For help with installation, please refer to the Getting Started (INSERT LINK) section. Load the tutorial file cardiomyopathy.exp. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===T Test===

T Test analysis identifies markers with statistically significant differential expression between sets of microarrays. The t-test determines for each marker if there is a significant difference between the two groups (case and control). To perform this analysis, you must classify the panels, set the analysis parameters and view the results in the visualization components. The T Test parameters are described in detail in online help.

'''Classify the Panels'''

1. Mark the '''Cardio''' phenotype as 'Case'. By default, panels are marked as control. Panels classified as Case are shown with a red thumbtack icon.
* Right-click on the '''Cardio''' phenotype.
* Select '''Classification'''>'''Case'''.
2. Activate the '''Normal''' and '''Cardio''' arrays by selecting the checkboxes next to the panel names.

[[Image:T_PhenotypeSettingCase.png]]

'''Set Analysis Parameters'''
# From the Analysis Panel, select '''T-Test Analysis'''.
# Set the parameter values below and click on '''Analyze'''.

* Alpha-corrections tab: Just Alpha.
* P-Value Parameters tab: p-values based on t-distribution. Note that the default alpha (critical p-value) is set to 0.01.
* Degree of Freedom tab: Welch approximation - unequal group variances.

[[Image:Ttest.gif]] 
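The "Welch approximation - unequal group variances" option corresponds to the standard Welch t-test. As a rough sketch of what is computed for each marker, the Python function below returns the t statistic and the Welch-Satterthwaite degrees of freedom for one marker's case and control values. The function name is hypothetical and this is the textbook formula rather than code from geWorkbench. Converting t and the degrees of freedom to a p-value requires the t-distribution, which is omitted here.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(case, control):
    """Welch's t statistic and approximate degrees of freedom for one marker.

    case, control: lists of expression values (at least two values each,
    and not all identical within both groups).
    """
    n1, n2 = len(case), len(control)
    v1, v2 = variance(case), variance(control)  # sample variances
    se2 = v1 / n1 + v2 / n2                      # squared standard error
    t = (mean(case) - mean(control)) / sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df
```

A marker is then reported as significant when the p-value derived from t and the degrees of freedom falls below the chosen alpha (0.01 by default, as noted above).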

'''T-Test Results'''

{|style="border: 1px solid lightGray"
!|| ||
|-
|-
| Markers which met the significance test are included in a new gene panel called “Significant Genes”. || [[Image:E_ttestgpanel.png]]
|-
|-
| An ancillary dataset is created in the project window. || [[Image:Ed_ttestproj.png]]
|-
|}

The results of the T-Test can be seen in the Color Mosaic panel and the Volcano Plot.

{|style="border: 1px solid lightGray"
!VOLCANO PLOT||COLOR MOSAIC||
|-
|-|-
|-
|-
| [[Image:Vplot.png]] || [[Image:Ed_cm.png]]
|-
|-
|-
| Clicking on any of the spots highlights the selected marker in the Marker Panel. * Insert another description ||
* The label to the right displays the Significance value (the lower the value, the more likely the groups differ) and gene name for the displayed genes. The genes are displayed in ascending order by Significance value.

* Gene height and width values can be altered to modify the display.

* The intensity slider is used to modify the intensity of the color coding.

* Accession: Includes the accession number in the label.

* Printer Icon: Prints the displayed image.

* Display: Must be toggled on to display data.

* ''Pat, Abs, Ratio Overlapping Pages Icon: Not the T Test display.''

|-
|-
|}

=== Multi T Test===
'''(IN PROGRESS)'''

== Regulatory Network ==
In this tutorial, you will:

* Create a gene network in Reverse Engineering
* View the network in Cytoscape
* Create a Marker panel from the Cytoscape network

Before you can continue, geWorkbench should be running. The file [[Webmatrix.EXP]] is used in this tutorial. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Reverse Engineering===

The Reverse Engineering component is used to analyze a large amount of microarray data to reverse engineer the underlying gene regulatory network. The details of this algorithm are described in "ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context", Califano et al. (http://arxiv.org/abs/q-bio.MN/0410037).

'''Create a Network'''
{|style="border: 1px solid lightGray"
|1. In the Gene Panel, enter '''1973_s_at''' in the '''Find Next''' text box. The list box will navigate to '''1973_s_at'''; click on that gene.||[[Image:E_panel.png]]
|-
|2. Go to the Reverse Engineering component in the View pane. You should be in the first tab, “Profiler”, and in “Basic”. The marker '''1973_s_at''' should be listed in the Hub Gene box.||[[Image:E_RE1.PNG]]
|-
|3. Click on '''Analyze 2D'''. A list of markers, sorted by interaction strength, is returned below the '''Search Box'''.||
|-
|4. Select all the genes with score >= 5 by clicking on the first marker, '''[12.53] 37724_at''', and then using Shift+Down Arrow to highlight the values >= 5.||[[Image: E_re2.png]]
|-
|5. Click '''Create Network'''. The network created is displayed in the Cytoscape view.||
|}

'''View the Network in Cytoscape'''

A full description of the capabilities and functionality of Cytoscape can be found at http://www.cytoscape.org/.

# In the Cytoscape Layout menu, select '''yFiles/ Organic''' to modify the network display.
# In this network image, select the central marker. It should turn yellow. [[Image:E_network.png]]
# In the Select menu, select Nodes>First neighbors of selected nodes>Shape>Triangle. The first neighbors are updated from a square shape to a triangle.
# Ctrl+mouse select the network. The selected nodes are yellow. These selected genes are returned to the Gene Panel with the name Selected Genes[Cytoscape].

== Enrichment Analysis==

In this tutorial, you will:

* Map markers to Gene Ontology category definitions

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

'''Gene Ontology'''

The Gene Ontology component categorizes genes in terms of the functions their products perform, their cellular localization, or their involvement in high-level biological processes. The actual category definitions are provided by the Gene Ontology Consortium (http://www.geneontology.org), which we use to generate a tree structure that approximates the Directed Acyclic Graph structure used by the Consortium. The Gene Ontology parameters are described in detail in online help.

1. Create a Marker panel with the following markers:
* AFFX-hum_alu_at
* AFFX-LysX-M_at
* AFFX-LysX-5_at
* AFFX-LysX-3_at

For assistance in creating a marker panel, please refer to Working with Markers and Phenotype Tutorial (INSERT LINK).

2. Click '''Map List.''' As the application creates a tree for the selected ontology, a pop-up window will display the progress. The mapping is based on GO mapping Annotation provided by Affymetrix in their annotation files.[[Image:E_map.png]]

3. The resulting nodes in the mapped tree will display two numbers separated by a forward slash, such as '''cell (1/6058)'''.
*The first number is the number of genes from the active marker panel mapped to a specific category and all its child nodes.
*The second number, after the slash, is the corresponding count for the reference list (or for the entire chip, depending on the reference list check-box selection and on whether a reference list has been loaded from the file system).

[[Image:E_goon.png]]
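The two numbers on each tree node can be reproduced with a short recursive count. The Python sketch below is an illustration only. The tree and annotation dictionaries, the function names, and the toy gene IDs are hypothetical, and counting each gene once per category (even if it is annotated to several child nodes) is an assumption about how the mappings are tallied.

```python
def genes_under(category, children, annotations):
    """Set of genes annotated to a category or to any of its descendants.

    children:    {category: [child categories]}
    annotations: {category: set of genes annotated directly to it}
    """
    genes = set(annotations.get(category, ()))
    for child in children.get(category, ()):
        genes |= genes_under(child, children, annotations)
    return genes


def node_label(category, children, annotations, panel, reference):
    """Build a label such as 'cell (1/6058)' for one tree node."""
    genes = genes_under(category, children, annotations)
    return f"{category} ({len(genes & panel)}/{len(genes & reference)})"
```

For example, with a toy tree in which "cell" has the children "membrane" and "nucleus", a panel gene annotated only to "membrane" is still counted in the label for "cell".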

The Table View and P Value Trends provide an alternate view of the mappings obtained in the Tree View.

{|style="border: 1px solid lightGray"
!TABLE VIEW||P VALUE TREND||
|-
|-|-
|-
| [[Image:E_gotable.png]]||[[Image:E_pvalueT.png]]
|-
| '''Recompute:''' Re-computes the statistics based on the current widget settings (p-value metric, Min#).||

'''Plot:''' Draws the plot based on the selections in the Tree View and the Step Size widget setting.

|-

|}

== Sequence Analysis==
* Sequence Retrieval
* Sequence Homology Analysis
** Blast
** Other

==Pattern Discovery==
* Position Histogram

== Promoter Analysis==

==Integrated Annotation Information==
In this tutorial, you will:

* rrrrr
* rrrrr

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

Perform the following steps to view annotations of a dataset (''not recommended for large sets of genes''):

1. In the Marker Annotations in the View pane, click '''Use Panels.'''

A table of gene names and links to pathway diagrams is returned if available from the NCI CGAP database.

2. Clicking on gene names returns annotations from the NCI CGAP database in your default web browser window.

3. Clicking on the pathway diagram name will display a pathway diagram in the caBIO Pathways panel in the View pane.
In the caBIO Pathways viewer, pathway gene symbols can in turn be clicked to display their gene information in the default web browser window.

Overview

2006-03-23T18:26:58Z

Daly: /* Introduction */

==Introduction==

geWorkbench is an open-source bioinformatics platform that offers a comprehensive and extensible collection of tools for the management, analysis, visualization and annotation of biomedical data. Many kinds of analysis are supported - for microarrays, there are filtering and normalization, basic statistical analyses, clustering, network reverse engineering, as well as many common visualization tools. For sequence data there are routines such as BLAST, pattern detection, transcription factor mapping, and syntenic region analysis. Furthermore, genomic sequences around markers of interest found in microarray experiments can be easily retrieved and, for example, used for promoter/transcription factor analysis.

Specific types of data supported include:

*Microarray Gene Expression
**Affymetrix GCOS/MAS5
**Matrix format (geWorkbench)
**RMAExpress
**GenePix
*DNA and Protein Sequences
**FASTA
*Pathways
**BioCarta
*Patterns
**Regular Expressions
*Gene Ontology
*Networks

Most importantly, geWorkbench provides an environment which supports moving from one data type to another in a seamless fashion, e.g. from gene expression to sequences to patterns.

[[Image:slide1.gif]]

==Developing for geWorkbench==

geWorkbench has been designed using a plug-in framework which allows new modules to be developed with relative ease. A repository will be maintained for community-developed modules. Developers can take advantage of all the existing capabilities for data management and visualization, and thus concentrate development efforts on the more important, novel aspects of their project.

==geWorkbench as an interface to external data and computational resources==

geWorkbench provides access to a variety of external data sources, including:
*Microarray gene expression repositories (caArray)
*Gene annotation pages (via CGAP)
*DNA sequence retrieval
*Pathway diagrams (BioCarta)

geWorkbench also provides a gateway to several computational services currently hosted on Columbia servers and clusters, including:
*BLAST
*Pattern Discovery
*Synteny

==Basic Layout of the Graphical User Interface==

The graphical user interface for geWorkbench is divided into four major sections:

1. Projects - Data management (upper left)

2. Marker and Array/Phenotype set selection and management (lower left)

3. Visualization tools (upper right)

4. Analytical tools (lower right)

[[Image:(T)MGettingStarted.png]]

The Data Management area can hold one workspace, and a workspace in turn can hold one or more projects. Projects can be used as desired to group different data sets. Each opened data file or analysis result is stored in a project. A workspace and all the data it contains can be saved and returned to later.
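The containment relationship described here (one workspace, one or more projects, each holding data nodes) can be pictured with a small Python sketch. The class and method names are hypothetical and chosen only for illustration. They do not reflect geWorkbench's actual classes.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """An opened data file or an analysis result (hypothetical model)."""
    name: str


@dataclass
class Project:
    """A named group of data nodes."""
    name: str
    nodes: list = field(default_factory=list)

    def add_node(self, name):
        node = DataNode(name)
        self.nodes.append(node)
        return node


@dataclass
class Workspace:
    """The single top-level container; holds one or more projects."""
    projects: list = field(default_factory=list)

    def add_project(self, name="New Project"):
        project = Project(name)
        self.projects.append(project)
        return project
```

Under this model, every data node is reachable only through a project, which matches the rule that all data must belong to a project.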

The GUI provides a menu bar at top with a standard choice of commands. Many commands that are available in the menu bar are also available by right-clicking on data objects.

Plugins

2006-03-09T15:45:35Z

Daly: /* Microarray Visualization */

The geWorkbench platform employs a component repository infrastructure to manage a large collection of pluggable components that can be used to customize the application's graphical user interface. This (ever growing) list of plug-in components covers a wide range of functionality for a number of different genomic data modalities.

__TOC__

==Microarray Visualization==
{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|-
|width="150"|Color Mosaic||Heat maps for microarray expression data, organized by phenotypic or gene groupings ([[media:Color-mosaic.png|screenshot]]).
|-
|-

|width="150"|Dendrogram||Tree-structured diagrams reflecting the results of hierarchical clustering analysis ([[media:Dendrogram.png|screenshot]]).
|-
|-
|width="150"|Expression Profiles||Line graph of gene expression profiles across several arrays/hybridizations ([[media:Expression-profile.png|screenshot]]).
|-
|-
|width="150"|Expression Value Distribution||Distribution plot of marker expression values across one or more microarrays.
|-
|-

|width="150"|Microarray Viewer||Color-gradient representation of gene expression values ([[media:Microarray-panel.png|screenshot]]).
|-
|-

|width="150"|Scatter Plot || Pairwise (array vs. array and marker vs. marker) comparison and plotting of expression values ([[media:Scatterplot.png|screenshot]]).
|-
|-

|width="150"|SOM Clusters Viewer ||Visualization of gene clusters produced by the self-organizing maps analysis ([[media:Somcluster.png|screenshot]]).
|-
|-

|width="150"|Tabular Microarray Viewer|| Spreadsheet view of all expression measurements in an experiment, one row per individual marker/probe and one column per microarray ([[media:Tabular.png|screenshot]]).
|-
|}

==Data Management==
{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|-
|width="150"|Marker Component ||Definition of data views consisting of marker subgroups. The views control the amount of data displayed.
|-
|-
|width="150"|Phenotype/Array Component ||Definition of data views consisting of microarray subgroups. The views control the amount of data displayed.
|-
|-
|}

==Normalizers==
{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|width="150"|Array-Based Centering||Subtraction of the mean or median measurement of a microarray from every measurement in that microarray.
|-
|-
|width="150"|Marker-Based Centering||Subtraction of the mean or median measurement of a marker profile from every measurement in the profile.
|-
|-
|width="150"|Mean-Variance Normalizer||Transformation of expression measurements to standard units: for every marker, the mean measurement of the marker profile (across all microarrays in an experiment) is subtracted from each measurement in the profile and the resulting value is divided by the standard deviation of the profile.
|-
|-
|width="150"|Missing Value Calculation||Replacement of missing values with consensus values.
|-
|-
|width="150"|Threshold Normalizer ||Adjustment of values that fall outside a user-specified threshold.
|-
|width="150"|Quantile || Expression measurements in each microarray are adjusted so that the distribution of values is the same across all microarrays in an experiment.
|-
|-
|width="150"|Housekeeping || Normalization of all measurements in a microarray through division by the average expression value of a (user defined) set of housekeeping genes.
|-
|}
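
The Mean-Variance Normalizer described above can be sketched as follows. This is a minimal illustration and not geWorkbench code: the function name <code>mean_variance_normalize</code> is hypothetical, the data is assumed to be a list of marker profiles (one row per marker, one value per microarray), and the use of the population standard deviation is an assumption, since the description does not say which estimator is used.

```python
from statistics import mean, pstdev


def mean_variance_normalize(matrix):
    """Convert each marker profile (row) to standard units.

    matrix: list of rows, one row per marker, one value per microarray.
    Assumption: the population standard deviation is used.
    """
    normalized = []
    for profile in matrix:
        mu = mean(profile)
        sigma = pstdev(profile)
        if sigma == 0:
            # A constant profile has no spread; map it to zeros
            # instead of dividing by zero (an assumption of this sketch).
            normalized.append([0.0 for _ in profile])
        else:
            normalized.append([(x - mu) / sigma for x in profile])
    return normalized


if __name__ == "__main__":
    for row in mean_variance_normalize([[1.0, 2.0, 3.0], [4.0, 4.0, 4.0]]):
        print(row)
```

After this transformation each non-constant marker profile has mean 0 and standard deviation 1 across the microarrays in the experiment.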

== Filters==

{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|-
|width="150"|Affy Detection Call||''(Affymetrix data only)'' Filtering of measurements based on the value of their "detection call" attribute.
|-
|-
|width="150"|Deviation|| Filtering of markers with low dynamic range.
|-
|-
|width="150"|Expression Threshold|| Elimination of measurements that fall outside a range of expression values.
|-
|-
|width="150"|2-channel Threshold||''(Genepix data only)'' Same as the "Expression Threshold" filter, but different threshold ranges can be specified for each channel.
|-
|-
|width="150"|Genepix Flag Filter|| ''(Genepix data only)'' Filtering of measurements based on the value of their "Flags" attribute.
|-
|}
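
The Expression Threshold and Deviation filters listed above might be sketched as below. The function names, the use of <code>None</code> to mark an eliminated measurement, and the use of the standard deviation as the measure of dynamic range are all assumptions made for illustration; the table does not specify how geWorkbench implements them.

```python
from statistics import pstdev


def expression_threshold_filter(matrix, low, high):
    """Mark measurements outside [low, high] as missing.

    Assumption: an eliminated measurement is represented by None.
    """
    return [[x if low <= x <= high else None for x in row] for row in matrix]


def deviation_filter(matrix, min_deviation):
    """Keep only marker rows whose spread is at least min_deviation.

    Assumption: dynamic range is measured by the standard deviation.
    """
    return [row for row in matrix if pstdev(row) >= min_deviation]


if __name__ == "__main__":
    data = [[5, 50, 500], [1, 1, 1], [1, 5, 9]]
    print(expression_threshold_filter(data, 10, 100))
    print(deviation_filter(data, 1.0))
```

The first function works on individual measurements, while the second removes whole markers, which mirrors the difference between the two filter descriptions.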

==Annotation==

{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|width="150"|Dataset History||Log of data transformations induced by data-modifying operations.
|-
|-
|width="150"|Dataset Annotation||Free-text box used to annotate data, images and results. Such annotations persist across application invocations and can be used as an online "lab notebook".
|-
|-
|width="150"|Experiment Information||Microarray machine parameters used in an experiment run. If available, high-level experiment information (e.g., the purpose of the experiment) is also displayed.
|-
|-
|width="150"|Marker Annotations||Retrieval of gene and pathway information for markers on a microarray.
|-
|-
|width="150"|caBIO Pathway Listing||Visualization of BioCarta pathway diagrams ([[media:Cabiopathway.png|screenshot]]).
|-
|width="150"|Gene Ontology||Enrichment analysis of selected groups of genes against Gene Ontology (http://www.geneontology.org) annotations ([[media:Go_Terms_Panel.png|screenshot]]).
|-
|-
|}

==Network Generation==

{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|width="150"|ARACNE Reverse Engineering|| Analysis of large amounts of microarray data (typically 100-500 microarrays) to reverse engineer the underlying gene regulatory networks ([[media:Reverseengineering.png|screenshot]]).
|-
|-
|-
|width="150"|Cytoscape||Visualization of the gene regulatory network created in Reverse Engineering using [http://www.cytoscape.org/ Cytoscape 2.0] ([[media:Cytoscape.png|screenshot]]).
|-
|}

==Analysis==

{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|width="150"|Hierarchical Clustering||Clustering of markers and microarrays into hierarchical binary trees. The resulting structures can be visualized in the Dendrogram plugin.
|-
|-
|width="150"|Self Organizing Map (SOM)|| Clustering of markers using self organizing maps. The resulting clusters can be visualized in the SOM Clusters Viewer plugin.
|-
|-
|width="150"|T Test||Identification of markers with statistically significant differential expression between sets of microarrays. T-testing is used for the determination of significance ([[media:Volcanoplot.png|screenshot]]).
|-

|}
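
As an illustration of the T Test entry above, the following sketch computes a t statistic for one marker measured in two sets of microarrays. The table does not say which variant of the t-test geWorkbench applies; Welch's unequal-variance form is used here purely as an assumption, and the function name <code>welch_t_statistic</code> is hypothetical. The p-value is not computed because it requires the t distribution, which is not in the Python standard library.

```python
from math import sqrt
from statistics import mean, variance


def welch_t_statistic(group_a, group_b):
    """Welch's t statistic for one marker across two sets of microarrays.

    Each group needs at least two measurements so that the sample
    variance is defined.
    """
    var_a = variance(group_a)  # sample variance (n - 1 denominator)
    var_b = variance(group_b)
    standard_error = sqrt(var_a / len(group_a) + var_b / len(group_b))
    if standard_error == 0:
        raise ValueError("both groups are constant; t is undefined")
    return (mean(group_a) - mean(group_b)) / standard_error


if __name__ == "__main__":
    print(welch_t_statistic([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

A large absolute value of the statistic suggests that the marker is differentially expressed between the two sets; significance would then be judged from the corresponding p-value.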

== Sequence Analysis & Visualization ==
{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|-
|width="150"|Sequence Alignment||Server-based versions of BLAST and Smith-Waterman alignment ([[media:Synteny Blastresults.png|screenshot]]).
|-
|-
|width="150"|Synteny|| Comparison of sequence similarity between two genomic regions. The comparison results are represented as a dot matrix augmented with detailed annotation for both regions ([[media:Synteny Dotmatrix.png|screenshot]]).
|-
|-
|width="150"|Promoter Analysis||Identification of putative transcription factor binding sites in DNA sequences ([[media:Promoterpanel.png|screenshot]]). The analysis uses the profiles in the [http://jaspar.cgb.ki.se/cgi-bin/jaspar_db.pl JASPAR Transcription Factor Binding Profile Database].
|-
|-
|width="150"|Pattern Discovery|| Discovery of sequence motifs in sets of DNA and protein sequences.
|-
|-
|width="150"|Position Histogram || Visualization of results from the Pattern Discovery plugin. Motif/pattern support is plotted against relative sequence position of the motif match ([[media:Histogram.png|screenshot]]).
|-
|-
|width="150"|Sequence Panel || Visualization of results from the Pattern Discovery plugin, displaying the motif match location over each sequence from the input data set.
|-
|}

Plugins

2006-03-09T15:44:53Z

Daly: /* Microarray Visualization */

The geWorkbench platform employs a component repository infrastructure to manage a large collection of pluggable components that can be used to customize the application's graphical user interface. This ever-growing list of plug-in components covers a wide range of functionality for a number of different genomic data modalities.

__TOC__

==Microarray Visualization==
{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|-
|width="150"|Color Mosaic||Heat maps for microarray expression data, organized by phenotypic or gene groupings ([[media:Color-mosaic.png|screenshot]]).
|-
|-

|width="150"|Dendrogram||Tree-structured diagrams reflecting the results of hierarchical clustering analysis ([[media:Dendrogram.png|screenshot]]).
|-
|-
|width="150"|Expression Profiles||Line graph of gene expression profiles across several arrays/hybridizations ([[media:Expression-profile.png|screenshot]]).
|-
|-
|width="150"|Expression Value Distribution||Distribution plot of marker expression values across one or more microarrays.
|-
|-

|width="150"|Microarray Viewer||Color-gradient representation of gene expression values ([[media:Microarray-panel.png|screenshot]]).
|-
|-

|width="150"|Scatter Plot || Pairwise (array vs. array and marker vs. marker) comparison and plotting of expression values ([[media:Scatterplot.png|screenshot]]).
|-
|-

|width="150"|SOM Clusters Viewer ||Visualization of gene clusters produced by the self-organizing maps analysis ([[media:Somcluster.png|screenshot]]).
|-
|-

|width="150"|Tabular Microarray Panel|| Spreadsheet view of all expression measurements in an experiment, one row per individual marker/probe and one column per microarray ([[media:Tabular.png|screenshot]]).
|-
|}

==Data Management==
{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|-
|width="150"|Marker Component ||Definition of data views consisting of marker subgroups. The views control the amount of data displayed.
|-
|-
|width="150"|Phenotype/Array Component ||Definition of data views consisting of microarray subgroups. The views control the amount of data displayed.
|-
|-
|}

==Normalizers==
{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|width="150"|Array-Based Centering||Subtraction of the mean or median measurement of a microarray from every measurement in that microarray.
|-
|-
|width="150"|Marker-Based Centering||Subtraction of the mean or median measurement of a marker profile from every measurement in the profile.
|-
|-
|width="150"|Mean-Variance Normalizer||Transformation of expression measurements to standard units: for every marker, the mean measurement of the marker profile (across all microarrays in an experiment) is subtracted from each measurement in the profile and the resulting value is divided by the standard deviation of the profile.
|-
|-
|width="150"|Missing Value Calculation||Replacement of missing values with consensus values.
|-
|-
|width="150"|Threshold Normalizer ||Adjustment of values that fall outside a user-specified threshold.
|-
|width="150"|Quantile || Expression measurements in each microarray are adjusted so that the distribution of values is the same across all microarrays in an experiment.
|-
|-
|width="150"|Housekeeping || Normalization of all measurements in a microarray through division by the average expression value of a (user defined) set of housekeeping genes.
|-
|}
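
The Quantile entry above says that every microarray ends up with the same distribution of values. One common way to achieve this is sketched below; it is an illustration under stated assumptions, not the geWorkbench implementation. The function name is hypothetical, the shared distribution is taken to be the mean of the sorted values across arrays, all arrays are assumed to have the same length with no missing values, and ties are broken by position.

```python
def quantile_normalize(arrays):
    """Give every microarray the same distribution of values.

    arrays: list of microarrays, each a list of measurements of equal length.
    Assumption: the shared distribution is the mean, across arrays, of the
    k-th smallest value.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: average of the k-th smallest values.
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]

    result = []
    for a in arrays:
        # Indices of this array ordered from smallest to largest value.
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        result.append(out)
    return result


if __name__ == "__main__":
    print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
```

Each measurement keeps its rank within its own microarray, but its value is replaced by the reference value for that rank.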

== Filters==

{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|-
|width="150"|Affy Detection Call||''(Affymetrix data only)'' Filtering of measurements based on the value of their "detection call" attribute.
|-
|-
|width="150"|Deviation|| Filtering of markers with low dynamic range.
|-
|-
|width="150"|Expression Threshold|| Elimination of measurements that fall outside a range of expression values.
|-
|-
|width="150"|2-channel Threshold||''(Genepix data only)'' Same as the "Expression Threshold" filter, but different threshold ranges can be specified for each channel.
|-
|-
|width="150"|Genepix Flag Filter|| ''(Genepix data only)'' Filtering of measurements based on the value of their "Flags" attribute.
|-
|}

==Annotation==

{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|width="150"|Dataset History||Log of data transformations induced by data-modifying operations.
|-
|-
|width="150"|Dataset Annotation||Free-text box used to annotate data, images and results. Such annotations persist across application invocations and can be used as an online "lab notebook".
|-
|-
|width="150"|Experiment Information||Microarray machine parameters used in an experiment run. If available, high-level experiment information (e.g., the purpose of the experiment) is also displayed.
|-
|-
|width="150"|Marker Annotations||Retrieval of gene and pathway information for markers on a microarray.
|-
|-
|width="150"|caBIO Pathway Listing||Visualization of BioCarta pathway diagrams ([[media:Cabiopathway.png|screenshot]]).
|-
|width="150"|Gene Ontology||Enrichment analysis of selected groups of genes against Gene Ontology (http://www.geneontology.org) annotations ([[media:Go_Terms_Panel.png|screenshot]]).
|-
|-
|}

==Network Generation==

{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|width="150"|ARACNE Reverse Engineering|| Analysis of large amounts of microarray data (typically 100-500 microarrays) to reverse engineer the underlying gene regulatory networks ([[media:Reverseengineering.png|screenshot]]).
|-
|-
|-
|width="150"|Cytoscape||Visualization of the gene regulatory network created in Reverse Engineering using [http://www.cytoscape.org/ Cytoscape 2.0] ([[media:Cytoscape.png|screenshot]]).
|-
|}

==Analysis==

{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|width="150"|Hierarchical Clustering||Clustering of markers and microarrays into hierarchical binary trees. The resulting structures can be visualized in the Dendrogram plugin.
|-
|-
|width="150"|Self Organizing Map (SOM)|| Clustering of markers using self organizing maps. The resulting clusters can be visualized in the SOM Clusters Viewer plugin.
|-
|-
|width="150"|T Test||Identification of markers with statistically significant differential expression between sets of microarrays. T-testing is used for the determination of significance ([[media:Volcanoplot.png|screenshot]]).
|-

|}

== Sequence Analysis & Visualization ==
{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|-
|width="150"|Sequence Alignment||Server-based versions of BLAST and Smith-Waterman alignment ([[media:Synteny Blastresults.png|screenshot]]).
|-
|-
|width="150"|Synteny|| Comparison of sequence similarity between two genomic regions. The comparison results are represented as a dot matrix augmented with detailed annotation for both regions ([[media:Synteny Dotmatrix.png|screenshot]]).
|-
|-
|width="150"|Promoter Analysis||Identification of putative transcription factor binding sites in DNA sequences ([[media:Promoterpanel.png|screenshot]]). The analysis uses the profiles in the [http://jaspar.cgb.ki.se/cgi-bin/jaspar_db.pl JASPAR Transcription Factor Binding Profile Database].
|-
|-
|width="150"|Pattern Discovery|| Discovery of sequence motifs in sets of DNA and protein sequences.
|-
|-
|width="150"|Position Histogram || Visualization of results from the Pattern Discovery plugin. Motif/pattern support is plotted against relative sequence position of the motif match ([[media:Histogram.png|screenshot]]).
|-
|-
|width="150"|Sequence Panel || Visualization of results from the Pattern Discovery plugin, displaying the motif match location over each sequence from the input data set.
|-
|}

Plugins

2006-03-06T20:21:03Z

Daly: /* Sequence Analysis & Visualization */

The geWorkbench platform employs a component repository infrastructure to manage a large collection of pluggable components that can be used to customize the application's graphical user interface. This ever-growing list of plug-in components covers a wide range of functionality for a number of different genomic data modalities.

__TOC__

==Microarray Visualization==
{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|-
|width="150"|Color Mosaic||Heat maps for microarray expression data, organized by phenotypic or gene groupings ([[media:Color-mosaic.png|screenshot]]).
|-
|-

|width="150"|Dendrogram||Tree-structured diagrams reflecting the results of hierarchical clustering analysis ([[media:Dendrogram.png|screenshot]]).
|-
|-
|width="150"|Expression Profiles||Line graph of gene expression profiles across several arrays/hybridizations ([[media:Expression-profile.png|screenshot]]).
|-
|-
|width="150"|Expression Value Distribution||Distribution plot of marker expression values across one or more microarrays.
|-
|-

|width="150"|Microarray Panel||Color-gradient representation of gene expression values ([[media:Microarray-panel.png|screenshot]]).
|-
|-

|width="150"|Scatter Plot || Pairwise (array vs. array and marker vs. marker) comparison and plotting of expression values ([[media:Scatterplot.png|screenshot]]).
|-
|-

|width="150"|SOM Clusters Viewer ||Visualization of gene clusters produced by the self-organizing maps analysis ([[media:Somcluster.png|screenshot]]).
|-
|-

|width="150"|Tabular Microarray Panel|| Spreadsheet view of all expression measurements in an experiment, one row per individual marker/probe and one column per microarray ([[media:Tabular.png|screenshot]]).
|-
|}

==Data Management==
{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|-
|width="150"|Marker Component ||Definition of data views consisting of marker subgroups. The views control the amount of data displayed.
|-
|-
|width="150"|Phenotype/Array Component ||Definition of data views consisting of microarray subgroups. The views control the amount of data displayed.
|-
|-
|}

==Normalizers==
{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|width="150"|Array-Based Centering||Subtraction of the mean or median measurement of a microarray from every measurement in that microarray.
|-
|-
|width="150"|Marker-Based Centering||Subtraction of the mean or median measurement of a marker profile from every measurement in the profile.
|-
|-
|width="150"|Mean-Variance Normalizer||Transformation of expression measurements to standard units: for every marker, the mean measurement of the marker profile (across all microarrays in an experiment) is subtracted from each measurement in the profile and the resulting value is divided by the standard deviation of the profile.
|-
|-
|width="150"|Missing Value Calculation||Replacement of missing values with consensus values.
|-
|-
|width="150"|Threshold Normalizer ||Adjustment of values that fall outside a user-specified threshold.
|-
|width="150"|Quantile || Expression measurements in each microarray are adjusted so that the distribution of values is the same across all microarrays in an experiment.
|-
|-
|width="150"|Housekeeping || Normalization of all measurements in a microarray through division by the average expression value of a (user defined) set of housekeeping genes.
|-
|}
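
The Housekeeping normalizer above divides every measurement in a microarray by the average expression of a user-defined set of housekeeping genes. A minimal sketch follows; the function name and the representation of the housekeeping set as a list of row indices are assumptions made for illustration only.

```python
from statistics import mean


def housekeeping_normalize(array, housekeeping_indices):
    """Divide every measurement in one microarray by the average
    expression of the chosen housekeeping genes.

    array: measurements for one microarray, one value per marker.
    housekeeping_indices: positions of the housekeeping markers
    (a hypothetical way of representing the user-defined set).
    """
    reference = mean([array[i] for i in housekeeping_indices])
    if reference == 0:
        raise ValueError("housekeeping average is zero; cannot normalize")
    return [x / reference for x in array]


if __name__ == "__main__":
    print(housekeeping_normalize([2.0, 6.0, 8.0], [0, 1]))
```

Because each microarray is scaled by its own housekeeping average, the normalized values are expressed relative to genes that are expected to be stably expressed.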

== Filters==

{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|-
|width="150"|Affy Detection Call||''(Affymetrix data only)'' Filtering of measurements based on the value of their "detection call" attribute.
|-
|-
|width="150"|Deviation|| Filtering of markers with low dynamic range.
|-
|-
|width="150"|Expression Threshold|| Elimination of measurements that fall outside a range of expression values.
|-
|-
|width="150"|2-channel Threshold||''(Genepix data only)'' Same as the "Expression Threshold" filter, but different threshold ranges can be specified for each channel.
|-
|-
|width="150"|Genepix Flag Filter|| ''(Genepix data only)'' Filtering of measurements based on the value of their "Flags" attribute.
|-
|}

==Annotation==

{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|width="150"|Dataset History||Log of data transformations induced by data-modifying operations.
|-
|-
|width="150"|Dataset Annotation||Free-text box used to annotate data, images and results. Such annotations persist across application invocations and can be used as an online "lab notebook".
|-
|-
|width="150"|Experiment Information||Microarray machine parameters used in an experiment run. If available, high-level experiment information (e.g., the purpose of the experiment) is also displayed.
|-
|-
|width="150"|Marker Annotations||Retrieval of gene and pathway information for markers on a microarray.
|-
|-
|width="150"|caBIO Pathway Listing||Visualization of BioCarta pathway diagrams ([[media:Cabiopathway.png|screenshot]]).
|-
|width="150"|Gene Ontology||Enrichment analysis of selected groups of genes against Gene Ontology (http://www.geneontology.org) annotations ([[media:Go_Terms_Panel.png|screenshot]]).
|-
|-
|}

==Network Generation==

{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|width="150"|ARACNE Reverse Engineering|| Analysis of large amounts of microarray data (typically 100-500 microarrays) to reverse engineer the underlying gene regulatory networks ([[media:Reverseengineering.png|screenshot]]).
|-
|-
|-
|width="150"|Cytoscape||Visualization of the gene regulatory network created in Reverse Engineering using [http://www.cytoscape.org/ Cytoscape 2.0] ([[media:Cytoscape.png|screenshot]]).
|-
|}

==Analysis==

{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|width="150"|Hierarchical Clustering||Clustering of markers and microarrays into hierarchical binary trees. The resulting structures can be visualized in the Dendrogram plugin.
|-
|-
|width="150"|Self Organizing Map (SOM)|| Clustering of markers using self organizing maps. The resulting clusters can be visualized in the SOM Clusters Viewer plugin.
|-
|-
|width="150"|T Test||Identification of markers with statistically significant differential expression between sets of microarrays. T-testing is used for the determination of significance ([[media:Volcanoplot.png|screenshot]]).
|-

|}

== Sequence Analysis & Visualization ==
{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|-
|width="150"|Sequence Alignment||Server-based versions of BLAST and Smith-Waterman alignment ([[media:Synteny Blastresults.png|screenshot]]).
|-
|-
|width="150"|Synteny|| Comparison of sequence similarity between two genomic regions. The comparison results are represented as a dot matrix augmented with detailed annotation for both regions ([[media:Synteny Dotmatrix.png|screenshot]]).
|-
|-
|width="150"|Promoter Analysis||Identification of putative transcription factor binding sites in DNA sequences ([[media:Promoterpanel.png|screenshot]]). The analysis uses the profiles in the [http://jaspar.cgb.ki.se/cgi-bin/jaspar_db.pl JASPAR Transcription Factor Binding Profile Database].
|-
|-
|width="150"|Pattern Discovery|| Discovery of sequence motifs in sets of DNA and protein sequences.
|-
|-
|width="150"|Position Histogram || Visualization of results from the Pattern Discovery plugin. Motif/pattern support is plotted against relative sequence position of the motif match ([[media:Histogram.png|screenshot]]).
|-
|-
|width="150"|Sequence Panel || Visualization of results from the Pattern Discovery plugin, displaying the motif match location over each sequence from the input data set.
|-
|}

Plugins

2006-03-06T17:04:18Z

Daly: /* Data Management */

The geWorkbench platform employs a component repository infrastructure to manage a large collection of pluggable components that can be used to customize the application's graphical user interface. This ever-growing list of plug-in components covers a wide range of functionality for a number of different genomic data modalities.

__TOC__

==Microarray Visualization==
{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|-
|width="150"|Color Mosaic||Heat maps for microarray expression data, organized by phenotypic or gene groupings ([[media:Color-mosaic.png|screenshot]]).
|-
|-

|width="150"|Dendrogram||Tree-structured diagrams reflecting the results of hierarchical clustering analysis ([[media:Dendrogram.png|screenshot]]).
|-
|-
|width="150"|Expression Profiles||Line graph of gene expression profiles across several arrays/hybridizations ([[media:Expression-profile.png|screenshot]]).
|-
|-
|width="150"|Expression Value Distribution||Distribution plot of marker expression values across one or more microarrays.
|-
|-

|width="150"|Microarray Panel||Color-gradient representation of gene expression values ([[media:Microarray-panel.png|screenshot]]).
|-
|-

|width="150"|Scatter Plot || Pairwise (array vs. array and marker vs. marker) comparison and plotting of expression values ([[media:Scatterplot.png|screenshot]]).
|-
|-

|width="150"|SOM Clusters Viewer ||Visualization of gene clusters produced by the self-organizing maps analysis ([[media:Somcluster.png|screenshot]]).
|-
|-

|width="150"|Tabular Microarray Panel|| Spreadsheet view of all expression measurements in an experiment, one row per individual marker/probe and one column per microarray ([[media:Tabular.png|screenshot]]).
|-
|}

==Data Management==
{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|-
|width="150"|Marker Component ||Definition of data views consisting of marker subgroups. The views control the amount of data displayed.
|-
|-
|width="150"|Phenotype/Array Component ||Definition of data views consisting of microarray subgroups. The views control the amount of data displayed.
|-
|-
|}

==Normalizers==
{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|width="150"|Array-Based Centering||Subtraction of the mean or median measurement of a microarray from every measurement in that microarray.
|-
|-
|width="150"|Marker-Based Centering||Subtraction of the mean or median measurement of a marker profile from every measurement in the profile.
|-
|-
|width="150"|Mean-Variance Normalizer||Transformation of expression measurements to standard units: for every marker, the mean measurement of the marker profile (across all microarrays in an experiment) is subtracted from each measurement in the profile and the resulting value is divided by the standard deviation of the profile.
|-
|-
|width="150"|Missing Value Calculation||Replacement of missing values with consensus values.
|-
|-
|width="150"|Threshold Normalizer ||Adjustment of values that fall outside a user-specified threshold.
|-
|width="150"|Quantile || Expression measurements in each microarray are adjusted so that the distribution of values is the same across all microarrays in an experiment.
|-
|-
|width="150"|Housekeeping || Normalization of all measurements in a microarray through division by the average expression value of a (user defined) set of housekeeping genes.
|-
|}
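
Array-Based Centering, as described above, subtracts the mean or median of a microarray from each of its measurements. The sketch below is illustrative only; the function name <code>center_array</code> and the <code>use_median</code> flag are hypothetical names, not geWorkbench parameters.

```python
from statistics import mean, median


def center_array(array, use_median=False):
    """Subtract the mean (or median) of one microarray from each of
    its measurements.

    use_median is a hypothetical switch between the two options
    mentioned in the plugin description.
    """
    center = median(array) if use_median else mean(array)
    return [x - center for x in array]


if __name__ == "__main__":
    print(center_array([1.0, 2.0, 6.0]))
    print(center_array([1.0, 2.0, 6.0], use_median=True))
```

Marker-Based Centering is the same operation applied to a marker profile (one marker across all microarrays) instead of to one microarray.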

== Filters==

{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|-
|width="150"|Affy Detection Call||''(Affymetrix data only)'' Filtering of measurements based on the value of their "detection call" attribute.
|-
|-
|width="150"|Deviation|| Filtering of markers with low dynamic range.
|-
|-
|width="150"|Expression Threshold|| Elimination of measurements that fall outside a range of expression values.
|-
|-
|width="150"|2-channel Threshold||''(Genepix data only)'' Same as the "Expression Threshold" filter, but different threshold ranges can be specified for each channel.
|-
|-
|width="150"|Genepix Flag Filter|| ''(Genepix data only)'' Filtering of measurements based on the value of their "Flags" attribute.
|-
|}

==Annotation==

{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|width="150"|Dataset History||Log of data transformations induced by data-modifying operations.
|-
|-
|width="150"|Dataset Annotation||Free-text box used to annotate data, images and results. Such annotations persist across application invocations and can be used as an online "lab notebook".
|-
|-
|width="150"|Experiment Information||Microarray machine parameters used in an experiment run. If available, high-level experiment information (e.g., the purpose of the experiment) is also displayed.
|-
|-
|width="150"|Marker Annotations||Retrieval of gene and pathway information for markers on a microarray.
|-
|-
|width="150"|caBIO Pathway Listing||Visualization of BioCarta pathway diagrams ([[media:Cabiopathway.png|screenshot]]).
|-
|width="150"|Gene Ontology||Enrichment analysis of selected groups of genes against Gene Ontology (http://www.geneontology.org) annotations ([[media:Go_Terms_Panel.png|screenshot]]).
|-
|-
|}

==Network Generation==

{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|width="150"|ARACNE Reverse Engineering|| Analysis of large amounts of microarray data (typically 100-500 microarrays) to reverse engineer the underlying gene regulatory networks ([[media:Reverseengineering.png|screenshot]]).
|-
|-
|-
|width="150"|Cytoscape||Visualization of the gene regulatory network created in Reverse Engineering using [http://www.cytoscape.org/ Cytoscape 2.0] ([[media:Cytoscape.png|screenshot]]).
|-
|}

==Analysis==

{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|width="150"|Hierarchical Clustering||Clustering of markers and microarrays into hierarchical binary trees. The resulting structures can be visualized in the Dendrogram plugin.
|-
|-
|width="150"|Self Organizing Map (SOM)|| Clustering of markers using self organizing maps. The resulting clusters can be visualized in the SOM Clusters Viewer plugin.
|-
|-
|width="150"|T Test||Identification of markers with statistically significant differential expression between sets of microarrays. T-testing is used for the determination of significance ([[media:Volcanoplot.png|screenshot]]).
|-

|}

== Sequence Analysis & Visualization ==
{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|-
|width="150"|Sequence Alignment||Server-based versions of BLAST and Smith-Waterman alignment ([[media:Synteny Blastresults.png|screenshot]]).
|-
|-
|width="150"|Synteny|| Comparison of sequence similarity between two genomic regions. The comparison results are represented as a dot matrix augmented with detailed annotation for both regions ([[media:Synteny Dotmatrix.png|screenshot]]).
|-
|-
|width="150"|Promoter Analysis||Identification of putative transcription factor binding sites in DNA sequences ([[media:Promoterpanel.png|screenshot]]). The analysis uses the profiles in the [http://jaspar.cgb.ki.se/cgi-bin/jaspar_db.pl JASPAR Transcription Factor Binding Profile Database].
|-
|-
|width="150"|Pattern Discovery || Discovery of sequence motifs in sets of DNA and protein sequences.
|-
|-
|width="150"|Position Histogram || Visualization of results from the Pattern Discovery plugin. Motif/pattern support is plotted against relative sequence position of the motif match ([[media:Histogram.png|screenshot]]).
|-
|-
|width="150"|Sequence Panel || Visualization of results from the Pattern Discovery plugin, displaying the motif match location over each sequence from the input data set.
|-
|}

Plugins

2006-03-06T17:03:30Z

Daly: /* Annotation */

The geWorkbench platform employs a component repository infrastructure to manage a large collection of pluggable components that can be used to customize the application's graphical user interface. This ever-growing list of plug-in components covers a wide range of functionality for a number of different genomic data modalities.

__TOC__

==Microarray Visualization==
{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|-
|width="150"|Color Mosaic||Heat maps for microarray expression data, organized by phenotypic or gene groupings ([[media:Color-mosaic.png|screenshot]]).
|-
|-

|width="150"|Dendrogram||Tree-structured diagrams reflecting the results of hierarchical clustering analysis ([[media:Dendrogram.png|screenshot]]).
|-
|-
|width="150"|Expression Profiles||Line graph of gene expression profiles across several arrays/hybridizations ([[media:Expression-profile.png|screenshot]]).
|-
|-
|width="150"|Expression Value Distribution||Distribution plot of marker expression values across one or more microarrays.
|-
|-

|width="150"|Microarray Panel||Color-gradient representation of gene expression values ([[media:Microarray-panel.png|screenshot]]).
|-
|-

|width="150"|Scatter Plot || Pairwise (array vs. array and marker vs. marker) comparison and plotting of expression values ([[media:Scatterplot.png|screenshot]]).
|-
|-

|width="150"|SOM Clusters Viewer ||Visualization of gene clusters produced by the self-organizing maps analysis ([[media:Somcluster.png|screenshot]]).
|-
|-

|width="150"|Tabular Microarray Panel|| Spreadsheet view of all expression measurements in an experiment, one row per individual marker/probe and one column per microarray ([[media:Tabular.png|screenshot]]).
|-
|}

==Data Management==
{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|-
|width="150"|Gene Panel ||Definition of data views consisting of marker subgroups. The views control the amount of data displayed.
|-
|-
|width="150"|Phenotype Panel ||Definition of data views consisting of microarray subgroups. The views control the amount of data displayed.
|-
|-
|}

==Normalizers==
{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|width="150"|Array-Based Centering||Subtraction of the mean or median measurement of a microarray from every measurement in that microarray.
|-
|-
|width="150"|Marker-Based Centering||Subtraction of the mean or median measurement of a marker profile from every measurement in the profile.
|-
|-
|width="150"|Mean-Variance Normalizer||Transformation of expression measurements to standard units: for every marker, the mean measurement of the marker profile (across all microarrays in an experiment) is subtracted from each measurement in the profile and the resulting value is divided by the standard deviation of the profile.
|-
|-
|width="150"|Missing Value Calculation||Replacement of missing values with consensus values.
|-
|-
|width="150"|Threshold Normalizer ||Adjustment of values that fall outside a user-specified threshold.
|-
|width="150"|Quantile || Expression measurements in each microarray are adjusted so that the distribution of values is the same across all microarrays in an experiment.
|-
|-
|width="150"|Housekeeping || Normalization of all measurements in a microarray through division by the average expression value of a (user defined) set of housekeeping genes.
|-
|}
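
The Mean-Variance Normalizer description above can be illustrated with a minimal Python sketch. This is not geWorkbench code: the function names and the list-of-lists data layout are assumptions made for illustration, and the choice of the population standard deviation (rather than the sample standard deviation) is also an assumption, since the description does not say which one the plugin uses.

```python
from statistics import mean, pstdev


def mean_variance_normalize(profile):
    """Convert one marker profile to standard units: (x - mean) / SD.

    `profile` holds the measurements of a single marker across all
    microarrays. The population SD is an assumption of this sketch.
    """
    mu = mean(profile)
    sigma = pstdev(profile)
    if sigma == 0:
        # A constant profile has no spread; return zeros instead of dividing by 0.
        return [0.0 for _ in profile]
    return [(x - mu) / sigma for x in profile]


def normalize_markers(matrix):
    """Apply the normalization to every marker (one row per marker)."""
    return [mean_variance_normalize(row) for row in matrix]
```

After this transformation every non-constant marker profile has mean 0 and standard deviation 1, which makes profiles with very different absolute intensities comparable.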

== Filters==

{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|-
|width="150"|Affy Detection Call||''(Affymetrix data only)'' Filtering of measurements based on the value of their "detection call" attribute.
|-
|-
|width="150"|Deviation|| Filtering of markers with low dynamic range.
|-
|-
|width="150"|Expression Threshold|| Elimination of measurements that fall outside a range of expression values.
|-
|-
|width="150"|2-channel Threshold||''(Genepix data only)'' Same as the "Expression Threshold" filter, but different threshold ranges can be specified for each channel.
|-
|-
|width="150"|Genepix Flag Filter|| ''(Genepix data only)'' Filtering of measurements based on the value of their "Flags" attribute.
|-
|}

==Annotation==

{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|width="150"|Dataset History||Log of data transformations induced by data-modifying operations.
|-
|-
|width="150"|Dataset Annotation||Free text format box used to annotate data, images and results. Such annotations persist across application invocations and can be used as an online "lab notebook".
|-
|-
|width="150"|Experiment Information||Microarray machine parameters used in an experiment run. If available, high-level experiment information (e.g., purpose of the experiment) is also displayed.
|-
|-
|width="150"|Marker Annotations||Retrieval of gene and pathway information for markers on a microarray.
|-
|-
|width="150"|caBIO Pathway Listing||Visualization of BioCarta pathway diagrams ([[media:Cabiopathway.png|screenshot]]).
|-
|width="150"|Gene Ontology||Enrichment analysis of selected groups of genes against Gene Ontology (http://www.geneontology.org) annotations ([[media:Go_Terms_Panel.png|screenshot]]).
|-
|-
|}

==Network Generation==

{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|width="150"|ARACNE Reverse Engineering|| Analysis of a large amount of microarray data (typically 100-500 microarrays) to reverse engineer underlying gene regulatory networks ([[media:Reverseengineering.png|screenshot]]).
|-
|-
|-
|width="150"|Cytoscape||Visualization of the gene regulatory network created in Reverse Engineering, using [http://www.cytoscape.org/ Cytoscape 2.0] ([[media:Cytoscape.png|screenshot]]).
|-
|}

==Analysis==

{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|width="150"|Hierarchical Clustering||Clustering of markers and microarrays into hierarchical binary trees. The resulting structures can be visualized in the Dendrogram plugin.
|-
|-
|width="150"|Self Organizing Map (SOM)|| Clustering of markers using self organizing maps. The resulting clusters can be visualized in the SOM Clusters Viewer plugin.
|-
|-
|width="150"|T Test||Identification of markers with statistically significant differential expression between sets of microarrays. T-testing is used for the determination of significance ([[media:Volcanoplot.png|screenshot]]).
|-

|}

== Sequence Analysis & Visualization ==
{|style="border: 1px solid lightGray"
!Plugin||Description
|-
|-
|width="150"|Sequence Alignment||Server-based versions of BLAST and Smith-Waterman alignment ([[media:Synteny Blastresults.png|screenshot]]).
|-
|-
|width="150"|Synteny|| Comparison of sequence similarity between two genomic regions. The comparison results are represented as a dot matrix augmented with detailed annotation for both regions ([[media:Synteny Dotmatrix.png|screenshot]]).
|-
|-
|width="150"|Promoter Analysis||Identification of putative transcription factor binding sites in DNA sequences ([[media:Promoterpanel.png|screenshot]]). The analysis uses the profiles in the [http://jaspar.cgb.ki.se/cgi-bin/jaspar_db.pl JASPAR Transcription Factor Binding Profile Database].
|-
|-
|width="150"|Pattern Discovery || Discovery of sequence motifs in sets of DNA and protein sequences.
|-
|-
|width="150"|Position Histogram || Visualization of results from the Pattern Discovery plugin. Motif/pattern support is plotted against relative sequence position of the motif match ([[media:Histogram.png|screenshot]]).
|-
|-
|width="150"|Sequence Panel || Visualization of results from the Pattern Discovery plugin, displaying the motif match location over each sequence from the input data set.
|-
|}

File:E deac rightclick.png

2006-02-28T17:46:13Z

Daly:

File:E deactive.png

2006-02-28T17:44:04Z

Daly:

File:E set describ.png

2006-02-28T16:41:17Z

Daly:

File:E savepanel.png

2006-02-28T16:39:42Z

Daly:

File:E renamepanel.png

2006-02-28T16:39:07Z

Daly:

File:E classifypanel.png

2006-02-28T16:38:05Z

Daly:

User:Daly

2006-02-24T20:47:32Z

Daly: /* Enrichment Analysis */

= Overview=

geWorkbench is an open-source bioinformatics platform that offers a comprehensive and extendible collection of tools for the management, analysis, visualization and annotation of biomedical data. This tool is aimed at providing researchers a centralized repository for the data analysis and visualization of gene expression data, gene sequences and protein sequences.

'''What can you do with geWorkbench?'''

* Use one program that integrates with multiple existing bioinformatics modules for analysis and visualization.

* Access remote servers and clusters for the performance of computationally intensive calculations - quicker!

* Streamline data analysis with:
** flexible import options that support merging files from various sources ''(insert possible data source ie RMA express).''
** support for a variety of genomic data including microarrays, sequences, pathways, networks, alignments and phenotypes.
** access analyses with biological annotations from the National Cancer Institute.

* Community: Insert developer benefit ( plugin)

The diagram illustrates the use of geWorkbench by Researchers.

[[Image:slide1.gif]]

For detailed system documentation, please see the Documentation section (INSERT LINK).

= Tutorial=

===Welcome! ===

Welcome to the geWorkbench tutorial. In the following tutorials you will learn how to use geWorkbench. This is an example based, illustrated guide for both beginners and experienced users. This tutorial was last updated in '''DATE''' and reflects changes to geWorkbench through '''DATE'''.

===Using the Tutorial ===

These tutorials assume that you have successfully completed the installation instructions. If you haven't installed the entire program, please go to [[Download]] and follow the installation instructions.

The tutorials are organized into a number of separate topics. Sample data sets are used throughout the tutorials. Each tutorial is self-contained and does not depend on any other tutorial for sample data. Sample data is located '''''(INSERT LINK or Instructions on how to download)'''''

'''(we should have sample data labeled Tutorial #)'''

Within the tutorials there are two basic types of text:
*Text that looks like this explains topics. 

# Text in numbered steps is instructions for you to follow using tutorial data files.

''' insert instruction on how to navigate between tutorials'''.

==Getting Started== MV
* Starting the application
* GUI elements
** Panels
** Navigation
== Loading Data== MV
* Data formats

== Working with Marker and Phenotype Panels==
In this tutorial, you will:
* Become familiar with the use of panels in geWorkbench
* Create active phenotype panels
* Classify the panels you created

Before you can continue, geWorkbench should be running. For help with this, please refer to the '''''Getting Started''''' (insertlink) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Creating Panels===

When working with microarrays, geWorkbench uses the term '''''marker''''' to refer to a gene probe (in other cases, it can be individual items from other data sets, such as sequences).
'''''Phenotype''''' refers to any user-defined grouping of microarrays. These microarrays will often share some common property that in most cases is phenotypic, although this is not a requirement. For example, one such “phenotype” might be a single experiment on a tumor tissue sample, with a second “phenotype” defined as a collection of experiments performed on normal tissue samples.

====Assign Panel====

1. In the Phenotype panel, select the following arrays, which contain samples from the congestive cardiomyopathy disease state:
JB-ccmp_0120.txt, JB-ccmp_0218.txt, JB-ccmp0718.txt, JB-ccmp0811.txt, JB-ccmp1003.txt, JB-ccmp_1109.tx

2. Right-click, select '''Add to Panel'''.

3. Enter "Cardio" in the input box and click OK.

[[Image:T_PanelLabelCardio.png]]

4. Next, similarly label the following arrays as "Normal" (''repeat steps 2 & 3''):
JB-n_0106.txt , JB-n_0821.txt, JB-n_0915.txt, JB-n_1303.txt

5. Select the checkboxes next to the panel names to indicate that these groups of arrays are "Active". Various analysis and visualization components can be set to only use/display activated arrays or markers.

[[Image:T_PhenotypesPriorToCase.png]]

====Classify panel====

For statistical tests such as the t-test, Case and Control groups can be specified.

# Left-click on the thumb-tack icon in front of the phenotype name.
# Select Case to specify the disease arrays as the "Case". The remaining "Normal" arrays are by default labeled control.
[[Image:T_PhenotypeSettingCase.png]]

A red thumbtack indicates the arrays have been specified as "Case".

[[Image:T_PhenotypeCaseSet.png]]

== Visualize Gene Expression==
In this tutorial, you will:

* Get acquainted with the various geWorkbench visualization tools
* View a dataset in geWorkbench
* Modify the visualization preference settings

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file '''''cardiomyopathy.exp''''' used in this tutorial is from the Load Data (insertlink) tutorial.

===Visualization Tools===

Visualization tools provide a view of the chip(s) under investigation and can be used for ascertaining the quality of the data. Active gene and phenotype panels (insertlink) restrict the number of markers/arrays displayed. The images created can be saved and exported. A detailed description of how to manipulate visualization components is given in the online help.

{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|'''Microarray View''': Used to inspect each separate microarray using the Array scroll bar.||[[Image: Ema.png]]
|-
|-
|'''Tabular Microarray Panel''': Presents the numerical values of the expression measurements in a table format. One row is created per individual marker/probe and one column per microarray.||[[Image:Etab.png]]
|-
|-
|-
|'''Color Mosaic''': Heat maps for microarray expression data, organized by phenotypic or gene groupings.||[[Image:Ed_cm.png]]
|-
|-
|-
|'''Expression Profiles''': This is a line graph of gene expression profiles across several arrays/hybridizations. Each marker is drawn as a separately colored line.||[[Image:Eep.png]]
|-
|-
|-
|'''Scatter Plot''': A pairwise (''array vs. array and marker vs. marker'') comparison and plotting of expression values.||[[Image:Esp.png]]
|-
|-
|-
|-
|}

'''View a dataset'''
# Select '''''cardiomyopathy.exp''''' in the Project Panel.
# Select the '''''Microarray Panel''''' visualization component in the View Area at the top-right section of the interface.
# Deselect the All Markers checkbox to display the entire dataset.

Note: The '''''All Arrays''''' and '''''All Markers''''' checkboxes determine which data points are included in the display.
If neither is checked, then the entire data set is shown. The All Arrays control is useful when working with data sets comprising multiple arrays. In this case, only those arrays that are included in a currently activated phenotype panel will be displayed.

[[Image: Allm.png]]

=== Preferences===
The Preferences selection in the Tools menu allows users to specify how certain aspects of the system will behave. Once set, preferences take effect immediately and persist between application sessions.

'''Modifying Settings'''

1. From the main menu, click on '''Tools>Preferences'''.

2. In the Preferences pop-up window, you can define settings for:
*Text Editor: The editor selected will be used to open and inspect data sets loaded in a project. Notepad is the default setting.
* Visualization: The color scheme to be applied to color mosaic images.
**Absolute: (default) Let M = max{|min|, |max|} over all expression measurements, across all arrays. If expression value x > 0, assign it the red spectrum x / M * 256. If expression value x is negative, assign it to the green spectrum -x / M * 256.
** Relative: This is similar to the setting for Absolute, but each marker is mean-variance normalized first.
* Genepix Value Computation: You can specify how to compute the value displayed for Genepix arrays. The default setting is the option (Mean F635 - Mean B635) / (Mean F532 - Mean B532).

Select '''Relative''' for the visualization preference.

3. Click on '''OK'''.

[[Image: Preferences.gif]]
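
The Absolute color scheme described above can be sketched in a few lines of Python. This is only an illustration of the stated formula, not the geWorkbench implementation: the function names are hypothetical, and clamping the result to the 0-255 range of an 8-bit color channel is an assumption (the formula x / M * 256 reaches 256 when x equals M).

```python
def scale_factor(matrix):
    """M = max(|min|, |max|) over all expression measurements in all arrays."""
    values = [v for row in matrix for v in row]
    return max(abs(min(values)), abs(max(values)))


def absolute_color(x, m):
    """Map expression value x to an (r, g, b) tuple under the Absolute scheme.

    Positive values go to the red channel, negative values to the green
    channel. Clamping to 255 is an assumption of this sketch.
    """
    if m == 0 or x == 0:
        return (0, 0, 0)
    intensity = min(255, int(abs(x) / m * 256))
    if x > 0:
        return (intensity, 0, 0)
    return (0, intensity, 0)
```

Under the Relative setting, the same mapping would be applied after each marker has been mean-variance normalized.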

== Filter and Normalize Data==
In this tutorial, you will:
* Get acquainted with the various filters and normalizers available in geWorkbench
* Apply a filter and normalizer on a tutorial dataset

Before you can continue, geWorkbench should be running. For help with installation, please refer to the '''''Getting Started''''' (INSERT LINK) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Filter===

Filtering can be used to screen out missing data points, remove low quality data or reduce the size of the dataset by removing less interesting data. Available geWorkbench filters are as follows:

{|style="border: 1px solid lightGray"
!Filter||Description
|-

|'''Affy Detection Call'''||Applicable to Affymetrix data only. Sets all measurements whose detection status is any user-defined combination of A, P or M as missing.
|-

|'''Missing values''' ||Discards all markers that have “missing” measurements in at least n microarrays, where n is defined by the user. However, another filter must first be applied in order to generate the missing values upon which this filter can operate.
|-
|-
|-

|'''Deviation''' ||Sets as missing all markers whose deviation across all microarrays is below a given value.
|-
|-
|-

|'''Expression Threshold''' ||Sets all markers whose measurements are inside (or outside) a user-defined range as missing.
|-
|-
|-

|'''2 Channel''' ||Applicable to 2-channel arrays (Genepix) data only. Defines applicable ranges for each channel, and sets all values for which either channel intensity is inside (or outside) the defined range as missing.
|-
|-
|-
|}

[[Image:Filterpanel.gif]]

Perform the following steps to filter out data called absent in an Affymetrix file:
# In the Filtering Panel, select ''Affy Detection Call Filter''.
# Select the ‘A’ (Absent) checkbox and click '''Filter'''. Values that were removed (marked as missing) are highlighted in yellow.
# In the Filtering Panel, select '''Missing Values Filter'''.
# Choose the maximum number of arrays that can have missing values before a marker is removed (the default is 0).
# Click '''Filter'''. Markers with more than 0 missing values are removed. You’ll notice the yellow values are gone.

{|style="border: 1px solid lightGray"
!'''Affy Detection Call Filter'''||'''Missing Values Filter'''
|-
| [[Image: Filtered.gif]]||[[Image:Mvfilter.gif]]
|-
|}
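
The two filtering steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not geWorkbench code: the function names are hypothetical, the dataset is modeled as a dictionary mapping each marker to a list of (value, detection call) pairs, and a missing measurement is represented by <code>None</code>. The second function follows the wording of the steps (a marker is removed when it has more than the allowed number of missing values).

```python
def detection_call_filter(data, calls_to_remove):
    """Mark as missing (None) every measurement whose call is in calls_to_remove.

    `data` maps marker name -> list of (value, call) pairs, one per array,
    where call is "P", "M" or "A". This layout is an assumption of the sketch.
    """
    return {
        marker: [None if call in calls_to_remove else value
                 for value, call in measurements]
        for marker, measurements in data.items()
    }


def missing_values_filter(data, max_missing=0):
    """Keep only markers with at most max_missing missing measurements."""
    return {
        marker: values
        for marker, values in data.items()
        if sum(v is None for v in values) <= max_missing
    }
```

For example, with hypothetical markers <code>m1</code> and <code>m2</code>:

```python
data = {
    "m1": [(10.0, "P"), (12.0, "P")],
    "m2": [(3.0, "A"), (8.0, "P")],
}
flagged = detection_call_filter(data, {"A"})
kept = missing_values_filter(flagged, max_missing=0)
print(sorted(kept))
```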

===Normalize===
Normalization can be used to decrease the effects of systematic differences across a set of experiments.
In geWorkbench, normalization results in replacing values with new values. Available geWorkbench normalization methods are as follows:

{|style="border: 1px solid lightGray"
!Normalizer||Description
|-

|Missing value calculation ||Replaces every missing value with either the mean value of that marker across all microarrays or with the mean measurement of all markers in the microarray where the missing value is observed
|-
|-
|Log2 Transformation ||Applies a log2 transformation to all measurements in a microarray
|-
|-
|-
|Threshold Normalizer ||All data points whose value is less than (or greater than) a user-specified minimum (maximum) value are raised (reduced) to that minimum (maximum) value
|-
|-
|-
|-
|Marker-based Centering ||Subtracts the mean (median) measurement of a marker profile from every measurement in the profile
|-
|-
|-
|Array-based centering ||Subtracts the mean (median) measurement of a microarray from every measurement in that microarray
|-
|-
|-
|Mean-variance normalizer ||For every marker profile, the mean measurement of the entire profile is subtracted from each measurement in the profile and the resulting value is divided by the standard deviation
|-
|-
|}
[[Image:Normalpanel.gif]]
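
Two of the normalizers in the table, Missing value calculation (using the marker mean) and the Threshold Normalizer, can be sketched in Python as below. The function names and the use of <code>None</code> for a missing value are assumptions made for illustration; this is not the geWorkbench implementation.

```python
from statistics import mean


def impute_with_marker_mean(profile):
    """Replace each missing value (None) with the mean of the marker's
    observed values across all microarrays."""
    present = [v for v in profile if v is not None]
    if not present:
        # Nothing observed for this marker, so there is no mean to fill in.
        return list(profile)
    fill = mean(present)
    return [fill if v is None else v for v in profile]


def threshold_normalize(profile, minimum=None, maximum=None):
    """Raise values below `minimum` to it and reduce values above `maximum` to it."""
    result = []
    for v in profile:
        if minimum is not None and v < minimum:
            v = minimum
        if maximum is not None and v > maximum:
            v = maximum
        result.append(v)
    return result
```

The alternative described in the table, filling with the mean of all markers in the same microarray, would apply the same logic along the array axis rather than the marker axis.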

'''Apply Quantile Normalizer'''

1. In the Normalization Panel, select ''Quantile Normalizer''.

2. Leave the default averaging method of ''Mean Profile Marker'' to indicate handling of missing values.

3. Click '''Normalize'''. The View Area is updated to reflect normalization (after the screen has been refreshed).
Note: The first value in the second row was updated from ''41,394.6'' to ''55,779.26''.

{|style="border: 1px solid lightGray"
!PRENORMALIZATION||NORMALIZED
|-
| [[Image:Prenormalizer_ed.gif]] || [[Image:Postnormalizera.gif]]
|-
|-
|-
|}
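
The idea behind quantile normalization (making the distribution of values the same in every microarray) can be sketched in Python as follows. This is the textbook form of the technique, assuming complete data of equal length in every array; the function name is hypothetical, ties are broken by position, and the ''Mean Profile Marker'' handling of missing values mentioned above is not modeled.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of microarrays.

    `arrays` is a list of equal-length lists, one per microarray, with the
    markers in the same order in each. Every value is replaced by the mean,
    across arrays, of the values that share its rank.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Mean of the i-th smallest value over all arrays.
    rank_means = [sum(s[i] for s in sorted_arrays) / len(arrays) for i in range(n)]

    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # marker indices by rank
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized
```

After the transformation, every array contains exactly the same set of values, while the rank order of the markers within each array is preserved.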

== Clustering Gene Expression Data== ken
*Hierarchical Clustering
* Self Organizing Map (SOM)

== Differential Expression ==
In this tutorial, you will:

* Get acquainted with the T Test and Multi T Test
* Apply a T Test and Multi T Test

Before you can continue, geWorkbench should be running. For help with installation, please refer to the Getting Started (INSERT LINK) section. Load the tutorial file cardiomyopathy.exp. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===T Test===

T Test analysis identifies markers with statistically significant differential expression between sets of microarrays. The t-test determines for each marker whether there is a significant difference between the two groups (case and control). To perform this analysis, you must classify the panels, set the analysis parameters and view the results in the visualization components. A detailed description of the T Test parameters is given in the online help.

'''Classify the Panels'''

1. Mark the '''Cardio''' phenotype as 'Case'. By default, panels are marked as control. Panels classified as Case are shown with a red thumbtack icon.
* Right-click on '''Cardio''' phenotype.
* Select '''Classification'''>'''Case'''.
2. Activate the arrays '''Normal''' and '''Cardio''' by selecting the checkboxes next to the panel name.

[[Image:T_PhenotypeSettingCase.png]]

'''Set Analysis Parameters'''
# From the Analysis Panel, select '''T-Test Analysis'''.
# Set the parameter values listed below and click on '''Analyze'''.

* Alpha-corrections tab: Just Alpha.
* P-Value Parameters tab: p-values based on t-distribution. Note that the default alpha (critical p-value) is set to 0.01.
* Degree of Freedom tab: Welch approximation - unequal group variances.

[[Image:Ttest.gif]] 
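
The "Welch approximation - unequal group variances" setting corresponds to the standard Welch t-test. A minimal Python sketch of the statistic and its approximate degrees of freedom for a single marker is shown below. The function name is hypothetical and this is not taken from the geWorkbench source; converting the statistic to a p-value requires the t-distribution and is left out here.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(case, control):
    """Return (t, df) for Welch's unequal-variance t-test.

    `case` and `control` are the expression values of one marker in the
    two groups. Each group needs at least two values, and at least one
    group must have non-zero variance.
    """
    n1, n2 = len(case), len(control)
    se1 = variance(case) / n1      # squared standard error of the case mean
    se2 = variance(control) / n2   # squared standard error of the control mean
    t = (mean(case) - mean(control)) / sqrt(se1 + se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df
```

A marker would then be reported as significant when the p-value derived from t and df falls below the chosen alpha (0.01 by default, as noted above).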

'''T-Test Results'''

{|style="border: 1px solid lightGray"
!|| ||
|-
|-
| Markers which met the significance test are included in a new gene panel called “Significant Genes”. || [[Image:E_ttestgpanel.png]]
|-
|-
| An ancillary dataset is created in the project window. || [[Image:Ed_ttestproj.png]]
|-
|}

The results of the T-Test can be seen in the Color Mosaic panel and the Volcano Plot.

{|style="border: 1px solid lightGray"
!VOLCANO PLOT||COLOR MOSAIC
|-
| [[Image:Vplot.png]] || [[Image:Ed_cm.png]]
|-
|-
|-
| Clicking on any of the spots highlights the marker selected in the Marker Panel. * Insert another description ||
* The label to the right displays the Significance value (the lower the value, the more likely the expression differs) and the gene name for the displayed genes. The genes are displayed in ascending order by Significance value.

* Gene height and width values can be altered to modify the display.

* The intensity slider is used to modify the intensity of the color codings.

* Accession: Includes the accession number in the label.

* Printer Icon: Prints the displayed image.

* Display: Must be toggled on to display data.

* ''Pat, Abs, Ratio Overlapping Pages Icon: Not the T Test display.''

|-
|-
|}

=== Multi T Test=== '''(IN PROGRESS)'''

== Regulatory Network ==
In this tutorial, you will:

* Create a gene network in Reverse Engineering
* View the network in Cytoscape
* Create a Marker panel from the Cytoscape network

Before you can continue, geWorkbench should be running. The file [[Webmatrix.EXP]] is used in this tutorial. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Reverse Engineering===

The Reverse Engineering component is used to analyze a large amount of microarray data to reverse engineer the underlying gene regulatory network. The details of this algorithm are described in "ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context", Califano et al. (http://arxiv.org/abs/q-bio.MN/0410037).

'''Create a Network'''
{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|1. In the Gene Panel, enter '''1973_s_at''' in the '''Find Next''' text box. The list box will navigate to '''1973_s_at'''; click on that gene.||[[Image:E_panel.png]]

|-
|2. Go to the Reverse Engineering component in the View pane. You should be in the first tab “Profiler” and in “Basic”. The marker '''1973_s_at''' should be listed in the Hub Gene box.||[[Image:E_RE1.PNG]]
|-
|3. Click on '''Analyze 2D'''. A list of markers is returned sorted by interaction strength below '''Search Box'''. ||
|-
|-
|-
|4. Select all the genes with score >= 5 by clicking on the first marker '''[12.53] 37724_at''', using Shift-down arrow to highlight the values >= 5.||[[Image: E_re2.png]]
|-
|-
|-
|5. Click '''Create Network'''. The network created is displayed in the Cytoscape view.||
|-
|-
|}

'''View the Network in Cytoscape'''

A full description of the capabilities and functionality of Cytoscape can be found at http://www.cytoscape.org/.

# In the Cytoscape Layout menu, select '''yFiles/ Organic''' to modify the network display.
# In this network image, select the central marker. It should turn yellow. [[Image:E_network.png]]
# In the Select menu, select Nodes>First neighbors of selected nodes>Shape>Triangle. The first neighbors are changed from squares to triangles.
# Ctrl+ mouse select the network. The selected nodes turn yellow. The selected genes are returned to the Gene Panel under the name Selected Genes[Cytoscape].

== Enrichment Analysis==

In this tutorial, you will:

* Map markers to Gene Ontology category definitions

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

'''Gene Ontology'''

The Gene Ontology component categorizes genes in terms of the functions their products perform, their cellular localization, or their involvement in high-level biological processes. The actual category definitions are provided by the Gene Ontology Consortium (http://www.geneontology.org); geWorkbench uses them to generate a tree structure that approximates the Directed Acyclic Graph structure used by the Consortium. A detailed description of the Gene Ontology parameters is given in the online help.

1. Create a Marker panel with the following markers:
* AFFX-hum_alu_at
* AFFX-LysX-M_at
* AFFX-LysX-5_at
* AFFX-LysX-3_at

For assistance in creating a marker panel, please refer to Working with Markers and Phenotype Tutorial (INSERT LINK).

2. Click '''Map List'''. As the application creates a tree for the selected ontology, a pop-up window will display the progress. The mapping is based on the GO mapping annotation provided by Affymetrix in their annotation files. [[Image:E_map.png]]

3. The resulting nodes in the mapped tree will display two numbers separated by a forward slash, such as '''cell (1/6058)'''.
*The first number is the number of genes from the active marker panel that are mapped to a specific category and all its child nodes.
*The second number, after the slash, is the corresponding count for the reference list (or for the entire chip, depending on the reference list checkbox selection and on whether a reference list has been loaded from the file system).

[[Image:E_goon.png]]
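
The two numbers shown on each tree node can be illustrated with a small Python sketch. Everything here is hypothetical: the function names, the dictionary-based tree, and the toy categories and gene names are made up for illustration, and counting each gene once per subtree (a set union) is an assumption about how the counts are formed.

```python
def subtree_genes(category, children, direct):
    """Collect the genes mapped to `category` or to any of its child nodes.

    `children` maps a category to its child categories; `direct` maps a
    category to the set of genes annotated directly to it.
    """
    genes = set(direct.get(category, ()))
    for child in children.get(category, ()):
        genes |= subtree_genes(child, children, direct)
    return genes


def node_label(category, children, panel_direct, reference_direct):
    """Build a label of the form 'category (selected/reference)'."""
    selected = len(subtree_genes(category, children, panel_direct))
    reference = len(subtree_genes(category, children, reference_direct))
    return f"{category} ({selected}/{reference})"
```

For a toy tree in which "cell" has the children "nucleus" and "membrane":

```python
children = {"cell": ["nucleus", "membrane"]}
panel = {"nucleus": {"g1"}}
reference = {"cell": {"g9"}, "nucleus": {"g1", "g2"}, "membrane": {"g2", "g3"}}
print(node_label("cell", children, panel, reference))
```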

The Table View and P Value Trends provide an alternate view of the mappings obtained in the Tree View.

{|style="border: 1px solid lightGray"
!TABLE VIEW||P VALUE TREND||
|-
|-|-
|-
| [[Image:E_gotable.png]]||[[Image:E_pvalueT.png]]
|-
| ''' Recompute: ''' This re-computes the statistics based on the current widget settings ( pvalue metric, Min#)||

'''Plot:''' The ‘Plot’ will draw the plot based on selections in the Tree View and widget settings in the Step Size.

|--

|}

== Sequence Analysis== ken
* Sequence Retrieval
* Sequence Homology Analysis
** Blast
** Other

==Pattern Discovery== ken
* Position Histogram

== Promoter Analysis==

==Integrated Annotation Information==
In this tutorial, you will:

* rrrrr
* rrrrr

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

Perform the following steps to view annotations of a dataset (''not recommended for large sets of genes''):

1. In the Marker Annotations component in the View pane, click '''Use Panels'''.

A table of gene names and links to pathway diagrams is returned if available from the NCI CGAP database.

2. Clicking on gene names returns annotations from the NCI CGAP database in your default web browser window.

3. Clicking on the pathway diagram name will display a pathway diagram in the caBIO Pathways panel in the View pane.
In the caBIO Pathways viewer, pathway gene symbols can in turn be clicked to display their gene information in the default web browser window.

User:Daly

2006-02-24T20:47:09Z

Daly: /* Enrichment Analysis */

= Overview=

geWorkbench is an open-source bioinformatics platform that offers a comprehensive and extendible collection of tools for the management, analysis, visualization and annotation of biomedical data. This tool is aimed at providing researchers a centralized repository for the data analysis and visualization of gene expression data, gene sequences and protein sequences.

'''What can you do with geWorkbench?'''

* Use one program that integrates with multiple existing bioinformatics modules for analysis and visualization.

* Access remote servers and clusters for the performance of computationally intensive calculations - quicker!

* Streamline data analysis with:
** flexible import options that support merging files from various sources ''(insert possible data source ie RMA express).''
** support for a variety of genomic data including microarrays, sequences, pathways, networks, alignments and phenotypes.
** access analyses with biological annotations from the National Cancer Institute.

* Community: Insert developer benefit ( plugin)

The diagram illustrates the use of geWorkbench by Researchers.

[[Image:slide1.gif]]

For detailed system documentation, please see the Documentation section (INSERT LINK).

= Tutorial=

===Welcome! ===

Welcome to the geWorkbench tutorial. In the following tutorials you will learn how to use geWorkbench. This is an example based, illustrated guide for both beginners and experienced users. This tutorial was last updated in '''DATE''' and reflects changes to geWorkbench through '''DATE'''.

===Using the Tutorial ===

These tutorials assume that you have successfully completed the installation instructions. If you haven't installed the entire program, please go to [[Download]] and follow the installation instructions.

The tutorials are organized into a number of separate topics. Sample data sets are used throughout the tutorials. Each tutorial is self-contained and does not depend on any other tutorial for sample data. Sample data is located '''''(INSERT LINK or Instructions on how to download)'''''

'''(we should have sample data labeled Tutorial #)'''

Within the tutorials there are two basic types of text:
*Text that looks like this explains topics. 

# Text in numbered steps is instructions for you to follow using tutorial data files.

''' insert instruction on how to navigate between tutorials'''.

==Getting Started== MV
* Starting the application
* GUI elements
** Panels
** Navigation
== Loading Data== MV
* Data formats

== Working with Marker and Phenotype Panels==
In this tutorial, you will:
* Become familiar with the use of panels in geWorkbench
* Create active phenotype panels
* Classify the panels you created

Before you can continue, geWorkbench should be running. For help with this, please refer to the '''''Getting Started''''' (insertlink) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Creating Panels===

When working with microarrays, geWorkbench uses the term '''''marker''''' to refer to a gene probe (in other cases, it can be individual items from other data sets, such as sequences).
'''''Phenotype''''' refers to any user-defined grouping of microarrays. These microarrays will often share some common property that in most cases is phenotypic, although this is not a requirement. For example, one such “phenotype” might be a single experiment on a tumor tissue sample, with a second “phenotype” defined as a collection of experiments performed on normal tissue samples.

====Assign Panel====

1. In the Phenotype panel, select the following arrays, which contain samples from the congestive cardiomyopathy disease state:
JB-ccmp_0120.txt, JB-ccmp_0218.txt, JB-ccmp0718.txt, JB-ccmp0811.txt, JB-ccmp1003.txt, JB-ccmp_1109.tx

2. Right-click, select '''Add to Panel'''.

3. Enter "Cardio" in the input box and click OK.

[[Image:T_PanelLabelCardio.png]]

4. Next, similarly label the following arrays as "Normal" (''repeat steps 2 & 3''):
JB-n_0106.txt , JB-n_0821.txt, JB-n_0915.txt, JB-n_1303.txt

5. Select the checkboxes next to the panel names to indicate that these groups of arrays are "Active". Various analysis and visualization components can be set to only use/display activated arrays or markers.

[[Image:T_PhenotypesPriorToCase.png]]

====Classify panel====

For statistical tests such as the t-test, Case and Control groups can be specified.

# Left-click on the thumb-tack icon in front of the phenotype name.
# Select Case to specify the disease arrays as the "Case". The remaining "Normal" arrays are by default labeled control.
[[Image:T_PhenotypeSettingCase.png]]

A red thumbtack indicates the arrays have been specified as "Case".

[[Image:T_PhenotypeCaseSet.png]]

== Visualize Gene Expression==
In this tutorial, you will:

* Get acquainted with the various geWorkbench visualization tools
* View a dataset in geWorkbench
* Modify the visualization preference settings

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. This tutorial uses the file '''''cardiomyopathy.exp''''' from the Load Data (insertlink) tutorial.

===Visualization Tools===

Visualization tools provide a view of the chip(s) under investigation and can be used for ascertaining the quality of the data. Active gene and phenotype panels (insertlink) restrict the number of markers/arrays displayed. The images created can be saved and exported. A detailed description of how to manipulate the visualization components is available in the online help.

{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|'''Microarray View''': Used to inspect each separate microarray using the Array scroll bar.||[[Image: Ema.png]]
|-
|-
|'''Tabular Microarray Panel''': Presents the numerical values of the expression measurements in a table format. One row is created per individual marker/probe and one column per microarray.||[[Image:Etab.png]]
|-
|-
|-
|'''Color Mosaic''': Heat maps for microarray expression data, organized by phenotypic or gene groupings.||[[Image:Ed_cm.png]]
|-
|-
|-
|'''Expression Profiles''': A line graph of gene expression profiles across several arrays/hybridizations. Each marker is drawn as a separate colored line.||[[Image:Eep.png]]
|-
|-
|-
|'''Scatter Plot''': A pairwise (''array vs. array and marker vs. marker'') comparison and plotting of expression values.||[[Image:Esp.png]]
|-
|-
|-
|-
|}

'''View a dataset'''
# Select '''''cardiomyopathy.exp''''' in the Project Panel.
# Select the '''''Microarray Panel''''' visualization component in the View Area at the top-right section of the interface.
# Deselect the All Markers checkbox to display the entire dataset.

Note: The '''''All Arrays''''' and '''''All Markers''''' check boxes determine which data points are included in the display.
If neither is checked, then the entire data set is shown. The All Arrays control is useful when working with data sets comprising multiple arrays. In this case, only those arrays that are included in a currently activated phenotype panel will be displayed.

[[Image: Allm.png]]

=== Preferences===
The Preferences selection in the Tools menu lets users specify how certain aspects of the system behave. Preferences take effect immediately and persist between application sessions.

'''Modifying Settings'''

1. From the main menu, click on '''Tools>Preferences'''.

2. In the Preferences pop-up window, you can define settings for:
*Text Editor: The editor selected will be used to open and inspect data sets loaded in a project. Notepad is the default setting.
* Visualization: The color scheme to be applied to color mosaic images.
**Absolute: (default) Let M = max{|min|, |max|} over all expression measurements, across all arrays. If an expression value x is positive, it is assigned the red intensity x / M * 256. If x is negative, it is assigned the green intensity -x / M * 256.
** Relative: Similar to Absolute, but each marker is mean-variance normalized first.
* Genepix Value Computation: Specifies how the value displayed for a Genepix array is computed. The default setting is the option (Mean F635 - Mean B635) / (Mean F532 - Mean B532).

Select '''Relative''' for the visualization preference.

3. Click on '''OK'''.

[[Image: Preferences.gif]]
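
The Absolute color scheme described above can be sketched in a few lines of Python. This is only an illustration of the stated formula, not geWorkbench's own code: the function names are hypothetical, and clamping the result to 255 (so that it fits an 8-bit color channel) is an assumption that the text does not state.

```python
def scale_factor(values):
    """Return M = max(|min|, |max|) over all expression measurements."""
    return max(abs(min(values)), abs(max(values)))


def absolute_color(x, m):
    """Map an expression value x to an (r, g, b) tuple.

    Positive values use the red channel, negative values the green
    channel, each with intensity |x| / M * 256.
    """
    if m == 0:
        return (0, 0, 0)
    # Assumption: clamp to 255 so the value fits an 8-bit channel.
    level = min(255, int(abs(x) / m * 256))
    return (level, 0, 0) if x > 0 else (0, level, 0)


values = [-200.0, -50.0, 0.0, 100.0, 400.0]
M = scale_factor(values)
colors = [absolute_color(v, M) for v in values]
```

With these sample values M is 400, so the value 100 receives a quarter of the red range and the value -200 half of the green range.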

== Filter and Normalize Data==
In this tutorial, you will:
* Get acquainted with the various filters and normalizers available in geWorkbench
* Apply a filter and normalizer on a tutorial dataset

Before you continue, geWorkbench should be running. For help with installation, please refer to the '''''Getting Started''''' (INSERT LINK) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Filter===

Filtering can be used to screen out missing data points, remove low quality data or reduce the size of the dataset by removing less interesting data. Available geWorkbench filters are as follows:

{|style="border: 1px solid lightGray"
!Filter||Description||
|-
|-|-
|-

|'''Affy Detection Call'''||Applicable to Affymetrix data only. Sets all measurements whose detection status is any user-defined combination of A, P or M as missing.
|-

|'''Missing values''' ||Discards all markers that have “missing” measurements in at least n microarrays, where n is defined by the user. Another filter must first be applied however, in order to generate the missing values upon which this filter can operate.
|-
|-
|-

|'''Deviation''' ||Sets all markers whose measurements deviate below a given value across all microarrays as missing.
|-
|-
|-

|'''Expression Threshold''' ||Sets all markers whose measurements are inside (or outside) a user-defined range as missing.
|-
|-
|-

|'''2 Channel''' ||Applicable to 2-channel arrays (Genepix) data only. Defines applicable ranges for each channel, and sets all values for which either channel intensity is inside (or outside) the defined range as missing.
|-
|-
|-
|}

[[Image:Filterpanel.gif]]

Perform the following steps to filter out data called absent in an Affymetrix file:
# In the Filtering Panel, select '''Affy Detection Call Filter'''.
# Select the ‘A’ (Absent) check box and click '''Filter'''. Values that were removed (marked as missing) are highlighted in yellow.
# In the Filtering Panel, select '''Missing Values Filter'''.
# Choose the maximum number of arrays that can have missing values before a marker is removed (the default is 0).
# Click '''Filter'''. Markers with more than 0 missing values are removed, and the yellow-highlighted values are gone.

{|style="border: 1px solid lightGray"
!'''Affy Detection Call Filter'''||'''Missing Values Filter'''||
|-
|-|-
|-
| [[Image: Filtered.gif]]||[[Image:Mvfilter.gif]]
|-
|}
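
The two filtering steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not geWorkbench's implementation: the function names and the sample data are hypothetical, a missing value is represented by <code>None</code>, and each row holds one marker with one column per array.

```python
def detection_call_filter(values, calls, remove=("A",)):
    """Mark measurements whose detection call is in `remove` as missing."""
    return [
        [None if call in remove else value
         for value, call in zip(value_row, call_row)]
        for value_row, call_row in zip(values, calls)
    ]


def missing_values_filter(markers, values, max_missing=0):
    """Keep only markers with at most `max_missing` missing measurements."""
    kept = []
    for name, row in zip(markers, values):
        if sum(v is None for v in row) <= max_missing:
            kept.append((name, row))
    return kept


# Hypothetical data: three markers measured on three arrays.
markers = ["m1", "m2", "m3"]
values = [[10.0, 12.0, 11.0], [5.0, 6.0, 7.0], [1.0, 2.0, 3.0]]
calls = [["P", "P", "P"], ["P", "A", "P"], ["A", "A", "M"]]

masked = detection_call_filter(values, calls)
kept = missing_values_filter(markers, masked, max_missing=0)
```

With the default of 0, only the marker that has no Absent call on any array survives the second filter.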

===Normalize===
Normalization can be used to decrease the effects of systematic differences across a set of experiments.
In geWorkbench, normalization replaces the original values with the normalized values. The available normalization methods are as follows:

{|style="border: 1px solid lightGray"
!Normalizer||Description||
|-
|-

|Missing value calculation ||Replaces every missing value with either the mean value of that marker across all microarrays or with the mean measurement of all markers in the microarray where the missing value is observed
|-
|-
|Log2 Transformation ||Applies a log2 transformation to all measurements in a microarray
|-
|-
|-
|Threshold Normalizer ||All data points whose value is less than (or greater than) a user-specified minimum (maximum) value are raised (reduced) to that minimum (maximum) value
|-
|-
|-
|-
|Marker-based Centering ||Subtracts the mean (median) measurement of a marker profile from every measurement in the profile
|-
|-
|-
|Array-based centering ||Subtracts the mean (median) measurement of a microarray from every measurement in that microarray
|-
|-
|-
|Mean-variance normalizer ||For every marker profile, the mean measurement of the entire profile is subtracted from each measurement in the profile and the resulting value is divided by the standard deviation
|-
|-
|}
[[Image:Normalpanel.gif]]
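
To make the table concrete, here is a hedged Python sketch of four of the normalizers, each applied to a single profile (a list of numbers). The function names are hypothetical, and the use of the sample standard deviation in the mean-variance normalizer is an assumption; the text does not say whether the sample or population form is used.

```python
import math
import statistics


def log2_transform(profile):
    """Apply a log2 transformation to every measurement."""
    return [math.log2(v) for v in profile]


def threshold(profile, minimum=None, maximum=None):
    """Raise values below `minimum` and reduce values above `maximum`."""
    out = []
    for v in profile:
        if minimum is not None and v < minimum:
            v = minimum
        if maximum is not None and v > maximum:
            v = maximum
        out.append(v)
    return out


def center(profile, use_median=False):
    """Subtract the mean (or median) of the profile from each value."""
    mid = statistics.median(profile) if use_median else statistics.fmean(profile)
    return [v - mid for v in profile]


def mean_variance(profile):
    """Subtract the mean and divide by the standard deviation."""
    mean = statistics.fmean(profile)
    sd = statistics.stdev(profile)  # assumption: sample standard deviation
    return [(v - mean) / sd for v in profile]
```

Passing a marker profile (one row) to <code>center</code> corresponds to marker-based centering, while passing the measurements of one microarray (one column) corresponds to array-based centering.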

'''Apply Quantile Normalizer'''

1. In the Normalization Panel, select ''Quantile Normalizer''.

2. Leave the averaging method at its default, ''Mean Profile Marker'', which determines how missing values are handled.

3. Click '''Normalize'''. The View Area is updated to reflect normalization (after the screen has been refreshed).
Note: The first value in the second row was updated from ''41,394.6'' to ''55,779.26''.

{|style="border: 1px solid lightGray"
!PRENORMALIZATION||NORMALIZED||
|-
|-|-
|-
| [[Image:Prenormalizer_ed.gif ]] || [[Image: Postnormalizera.gif ]]
|-
|-
|-
|}
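
For readers unfamiliar with quantile normalization, the following sketch shows the textbook form of the technique: each array is sorted, the values at each rank are averaged across arrays, and every measurement is replaced by the average for its rank. It assumes complete data with no ties and is not necessarily how geWorkbench's Quantile Normalizer handles missing values; the function name is hypothetical.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of equal-length arrays.

    Each inner list holds the measurements of one microarray.
    Assumes no missing values; ties are broken by position.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Mean of the values that share the same rank across all arrays.
    rank_means = [
        sum(s[rank] for s in sorted_arrays) / len(arrays) for rank in range(n)
    ]
    result = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        normalized = [0.0] * n
        for rank, index in enumerate(order):
            normalized[index] = rank_means[rank]
        result.append(normalized)
    return result


arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(arrays)
```

After normalization every array contains the same set of values (here 1.5, 3.5 and 5.5), arranged according to the original ranking within that array.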

== Clustering Gene Expression Data== ken
*Hierarchical Clustering
* Self Organizing Map (SOM)

== Differential Expression ==
In this tutorial, you will:

* Get acquainted with the T Test and Multi T Test
* Apply a T Test and Multi T Test

Before you continue, geWorkbench should be running. For help with installation, please refer to the Getting Started (INSERT LINK) section. Load the tutorial file cardiomyopathy.exp. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===T Test===

T Test analysis identifies markers with statistically significant differential expression between sets of microarrays. For each marker, the t-test determines whether there is a significant difference between the two groups (case and control). To perform this analysis, you must classify the panels, set the analysis parameters and view the results in the visualization components. A detailed description of the T Test parameters is available in the online help.

'''Classify the Panels'''

1. Mark the '''Cardio''' phenotype as 'Case'. By default, panels are marked as Control. Panels classified as Case are shown with a red thumbtack icon.
* Right-click on '''Cardio''' phenotype.
* Select '''Classification'''>'''Case'''.
2. Activate the arrays '''Normal''' and '''Cardio''' by selecting the checkboxes next to the panel name.

[[Image:T_PhenotypeSettingCase.png]]

'''Set Analysis Parameters'''
# From the Analysis Panel, select '''T-Test Analysis'''.
# Set the parameters listed below and click '''Analyze'''.

* Alpha-corrections tab: Just Alpha.
* P-Value Parameters tab: p-values based on t-distribution. Note that the default alpha (critical p-value) is set to 0.01.
* Degree of Freedom tab: Welch approximation - unequal group variances.

[[Image:Ttest.gif]] 
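
The "Welch approximation" option refers to the unequal-variance form of the t-test. The sketch below computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom for one marker; it is a textbook illustration with a hypothetical function name and hypothetical data, not geWorkbench's code. Converting the statistic to a p-value additionally requires the t-distribution, which is omitted here.

```python
import statistics


def welch_t(case, control):
    """Return (t, degrees_of_freedom) for Welch's unequal-variance t-test."""
    mean_case = statistics.fmean(case)
    mean_control = statistics.fmean(control)
    # Per-group sample variance divided by group size.
    vn_case = statistics.variance(case) / len(case)
    vn_control = statistics.variance(control) / len(control)
    t = (mean_case - mean_control) / (vn_case + vn_control) ** 0.5
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (vn_case + vn_control) ** 2 / (
        vn_case ** 2 / (len(case) - 1) + vn_control ** 2 / (len(control) - 1)
    )
    return t, df


# Hypothetical expression values of one marker in each group.
t_stat, dof = welch_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
```

The degrees of freedom are generally not an integer, which is expected for this approximation.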

'''T-Test Results'''

{|style="border: 1px solid lightGray"
!|| ||
|-
|-
| Markers which met the significance test are included in a new gene panel called “Significant Genes”. || [[Image:E_ttestgpanel.png]]
|-
|-
| An ancillary dataset is created in the project window. || [[Image:Ed_ttestproj.png]]
|-
|}

The values of the T-Test can be seen in the Color Mosaic panel and the Volcano Plot.

{|style="border: 1px solid lightGray"
!VOLCANO PLOT||COLOR MOSAIC||
|-
|-|-
|-
|-
| [[Image:Vplot.png]] || [[Image:Ed_cm.png]]
|-
|-
|-
| Clicking on any of the spots highlights the marker selected in the Marker Panel. * Insert another description ||
* The label to the right displays the significance value (the lower the value, the more likely the two groups differ) and the gene name for each displayed gene. The genes are displayed in ascending order by significance value.

* Gene height and width values can be altered to modify the display.

* The intensity slider is used to modify the intensity of the color codings.

* Accession: Includes the accession number in the label.

* Printer Icon: Prints the displayed image.

* Display: Must be toggled on to display data.

* ''Pat, Abs, Ratio Overlapping Pages Icon: Not the T Test display.''

|-
|-
|}

=== Multi T Test=== '''(IN PROGRESS)'''

== Regulatory Network ==
In this tutorial, you will:

* Create a gene network in Reverse Engineering
* View the network in Cytoscape
* Create a Marker panel from the Cytoscape network

Before you continue, geWorkbench should be running. The file [[Webmatrix.EXP]] is used in this tutorial. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Reverse Engineering===

The Reverse Engineering component analyzes a large amount of microarray data to reverse engineer the underlying gene regulatory network. The details of this algorithm are described in "ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context", Califano et al. (http://arxiv.org/abs/q-bio.MN/0410037).

'''Create a Network'''
{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|1. In the Gene Panel, enter '''1973_s_at''' in the '''Find Next''' text box. The list box will scroll to '''1973_s_at'''; click on that gene.||[[Image:E_panel.png]]

|-
|2. Go to the Reverse Engineering component in the View pane. You should be in the first tab, “Profiler”, and in “Basic”. The marker '''1973_s_at''' should be listed in the Hub Gene box.||[[Image:E_RE1.PNG]]
|-
|3. Click on '''Analyze 2D'''. A list of markers is returned sorted by interaction strength below '''Search Box'''. ||
|-
|-
|-
|4. Select all the genes with score >= 5 by clicking on the first marker '''[12.53] 37724_at''', using Shift-down arrow to highlight the values >= 5.||[[Image: E_re2.png]]
|-
|-
|-
|5. Click '''Create Network'''. The network created is displayed in the Cytoscape view.||
|-
|-
|}
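
Conceptually, steps 3 to 5 rank the markers by their interaction strength with the hub gene, keep those at or above a cutoff, and connect each of them to the hub. The sketch below illustrates that selection only; it does not compute ARACNE scores, and the function name, the markers <code>marker_b</code> and <code>marker_c</code>, and their scores are hypothetical.

```python
def build_hub_network(hub, scored_markers, min_score=5.0):
    """Return hub-to-marker edges for markers scoring at least `min_score`.

    `scored_markers` is a list of (marker, score) pairs; the edges are
    returned in descending order of score.
    """
    ranked = sorted(scored_markers, key=lambda pair: pair[1], reverse=True)
    return [(hub, marker) for marker, score in ranked if score >= min_score]


# 37724_at and its score come from the tutorial; the rest is made up.
scores = [("marker_b", 7.10), ("37724_at", 12.53), ("marker_c", 4.20)]
edges = build_hub_network("1973_s_at", scores)
```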

'''View the Network in Cytoscape'''

A full description of the capabilities and functionality of Cytoscape can be found at http://www.cytoscape.org/.

# In the Cytoscape Layout menu, select '''yFiles/ Organic''' to modify the network display.
# In this network image, select the central marker. It should turn yellow. [[Image:E_network.png]]
# In the Select menu, select Nodes>First neighbors of selected nodes>Shape>Triangle. The first neighbors change from squares to triangles.
# Hold Ctrl and use the mouse to select the network. The selected nodes turn yellow. The selected genes are returned to the Gene Panel under the name Selected Genes[Cytoscape].
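
The "first neighbors of selected nodes" operation used above picks every node that shares an edge with a selected node. A minimal sketch of the idea, using a hypothetical function name and a made-up edge list rather than Cytoscape's API:

```python
def first_neighbors(edges, selected):
    """Return the nodes directly connected to any node in `selected`."""
    selected = set(selected)
    neighbors = set()
    for a, b in edges:
        if a in selected and b not in selected:
            neighbors.add(b)
        elif b in selected and a not in selected:
            neighbors.add(a)
    return neighbors


# Hypothetical network: a hub with two neighbors, one of which has its own.
edges = [("hub", "g1"), ("hub", "g2"), ("g2", "g3")]
neighbors = first_neighbors(edges, {"hub"})
```

Here <code>g3</code> is not returned, because it is two edges away from the selected hub.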

== Enrichment Analysis==

In this tutorial, you will:

* Map markers to Gene Ontology category definitions

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

'''Gene Ontology'''

The Gene Ontology component categorizes genes in terms of the function their products perform, their cellular localization, or their involvement in high-level biological processes. The category definitions are provided by the Gene Ontology Consortium (http://www.geneontology.org); geWorkbench uses them to generate a tree structure that approximates the Directed Acyclic Graph structure used by the Consortium. A detailed description of the Gene Ontology parameters is available in the online help.

1. Create a Marker panel with the following markers:
* AFFX-hum_alu_at
* AFFX-LysX-M_at
* AFFX-LysX-5_at
* AFFX-LysX-3_at

For assistance in creating a marker panel, please refer to Working with Markers and Phenotype Tutorial (INSERT LINK).

2. Click '''Map List.''' As the application creates a tree for the selected ontology, a pop-up window will display the progress. The mapping is based on GO mapping Annotation provided by Affymetrix in their annotation files.[[Image:E_map.png]]

3. The resulting nodes in the mapped tree display two numbers separated by a forward slash, such as '''cell(1/6058)'''.
*The first number is the number of genes from the active marker panel that are mapped to a specific category or to any of its child nodes.
*The second number, after the slash, is the same count for the reference list (or for the entire chip, depending on the reference list check box selection and on whether a reference list has been loaded from the file system).

[[Image:E_goon.png]]
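
The two numbers in each node label can be reproduced with a small recursive count over the category tree. The sketch below is an illustration only: the function names, the toy tree, and the gene identifiers are hypothetical, and the real mapping is taken from the Affymetrix annotation files.

```python
def genes_under(term, children, annotations):
    """Return the genes annotated to `term` or to any of its descendants."""
    genes = set(annotations.get(term, ()))
    for child in children.get(term, ()):
        genes |= genes_under(child, children, annotations)
    return genes


def node_label(term, children, annotations, panel, reference):
    """Format a tree node as name(panel count/reference count)."""
    mapped = genes_under(term, children, annotations)
    return f"{term}({len(mapped & panel)}/{len(mapped & reference)})"


# Toy tree and annotations, for illustration only.
children = {"cell": ["membrane", "nucleus"], "nucleus": ["chromosome"]}
annotations = {
    "cell": ["g1"],
    "membrane": ["g2"],
    "nucleus": ["g3"],
    "chromosome": ["g4", "g5"],
}
panel = {"g2", "g4"}
reference = {"g1", "g2", "g3", "g4", "g5", "g6"}

label = node_label("cell", children, annotations, panel, reference)
```

Because the counts include all child nodes, a parent category never shows smaller numbers than any of its children.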

The Table View and P Value Trends provide an alternate view of the mappings obtained in the Tree View.

{|style="border: 1px solid lightGray"
!TABLE VIEW||P VALUE TREND||
|-
|-|-
|-
| [[Image:E_gotable.png]]||[[Image:E_pvalueT.png]]
|-
| ''' Recompute: ''' This re-computes the statistics based on the current widget settings ( pvalue metric, Min#)||

'''Plot:''' The ‘Plot’ will draw the plot based on selections in the Tree View and widget settings in the Step Size.

|--

|}

== Sequence Analysis== ken
* Sequence Retrieval
* Sequence Homology Analysis
** Blast
** Other

==Pattern Discovery== ken
* Position Histogram

== Promoter Analysis==

==Integrated Annotation Information==
In this tutorial, you will:

* rrrrr
* rrrrr

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

Perform the following steps to view annotations of a dataset ('''''not recommended for large sets of genes'''''):

1. In the Marker Annotations in the View pane, click '''Use Panels.'''

A table of gene names and links to pathway diagrams is returned if available from the NCI CGAP database.

2. Clicking on gene names returns annotations from the NCI CGAP database in your default web browser window.

3. Clicking on the pathway diagram name displays the pathway diagram in the caBIO Pathways panel in the View pane.
In the caBIO Pathways viewer, pathway gene symbols can in turn be clicked to display their gene information in the default web browser window.

User:Daly

2006-02-24T20:01:00Z

Daly:

= Overview=

geWorkbench is an open-source bioinformatics platform that offers a comprehensive and extendible collection of tools for the management, analysis, visualization and annotation of biomedical data. This tool is aimed at providing researchers a centralized repository for the data analysis and visualization of gene expression data, gene sequences and protein sequences.

'''What can you do with geWorkbench?'''

* Use one program that integrates with multiple existing bioinformatics modules for analysis and visualization.

* Access remote servers and clusters for the performance of computationally intensive calculations - quicker!

* Streamline data analysis with:
** flexible import options that support merging files from various sources ''(insert possible data source ie RMA express).''
** support for a variety of genomic data including microarrays, sequences, pathways, networks, alignments and phenotypes.
** access analyses with biological annotations from the National Cancer Institute.

* Community: Insert developer benefit ( plugin)

The diagram illustrates the use of geWorkbench by Researchers.

[[Image:slide1.gif]]

For detailed system documentation, please see the Documentation section (INSERT LINK).

= Tutorial=

===Welcome! ===

Welcome to the geWorkbench tutorial. In the following tutorials you will learn how to use geWorkbench. This is an example based, illustrated guide for both beginners and experienced users. This tutorial was last updated in '''DATE''' and reflects changes to geWorkbench through '''DATE'''.

===Using the Tutorial ===

These tutorials assume that you have successfully completed the installation instructions. If you haven't installed the entire program, please go to [[Download]] and follow the installation instructions.

The tutorials are organized into a number of separate topics, and sample data sets are used throughout. Each tutorial is self-contained and does not depend on any other tutorial for sample data. Sample data is located '''''(INSERT LINK or Instructions on how to download)'''''

'''(we should have sample data labeled Tutorial #)'''

Within the tutorials there are two basic types of text:
*Text that looks like this explains topics. 

# Text in numbered steps is instructions for you to follow using tutorial data files.

''' insert instruction on how to navigate between tutorials'''.

==Getting Started== MV
* Starting the application
* GUI elements
** Panels
** Navigation
== Loading Data== MV
* Data formats

== Working with Marker and Phenotype Panels==
In this tutorial, you will:
* Become familiar with the use of panels in geWorkbench
* Create active phenotype panels
* Classify the panels you created

Before you continue, geWorkbench should be running. For help with this, please refer to the '''''Getting Started''''' (insertlink) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Creating Panels===

When working with microarrays, geWorkbench uses the term '''''marker''''' to refer to a gene probe (in other cases, it can be an individual item from another kind of data set, such as a sequence).
'''''Phenotype''''' refers to any user-defined grouping of microarrays. These microarrays will often share some common property that in most cases is phenotypic, although this is not a requirement. For example, one such “phenotype” might be a single experiment on a tumor tissue sample, with a second “phenotype” defined as a collection of experiments performed on normal tissue samples.

====Assign Panel====

1. In the Phenotype panel, select the following arrays, which contain samples from the congestive cardiomyopathy disease state:
JB-ccmp_0120.txt, JB-ccmp_0218.txt, JB-ccmp0718.txt, JB-ccmp0811.txt, JB-ccmp1003.txt, JB-ccmp_1109.tx

2. Right-click, select '''Add to Panel'''.

3. Enter "Cardio" in the input box and click OK.

[[Image:T_PanelLabelCardio.png]]

4. Next, label the following arrays as "Normal" in the same way (''repeat steps 2 & 3''):
JB-n_0106.txt , JB-n_0821.txt, JB-n_0915.txt, JB-n_1303.txt

5. Select the check boxes next to the panel names to mark these groups of arrays as "Active". Various analysis and visualization components can be set to use or display only activated arrays or markers.

[[Image:T_PhenotypesPriorToCase.png]]

====Classify panel====

For statistical tests such as the t-test, Case and Control groups can be specified.

# Left-click on the thumb-tack icon in front of the phenotype name.
# Select Case to specify the disease arrays as the "Case". The remaining "Normal" arrays are by default labeled control.
[[Image:T_PhenotypeSettingCase.png]]

A red thumbtack indicates the arrays have been specified as "Case".

[[Image:T_PhenotypeCaseSet.png]]
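
As a mental model, a phenotype panel can be thought of as a named group of arrays with an "active" flag and a Case/Control classification. The sketch below captures that model; the class and function names are hypothetical and do not reflect geWorkbench's internal data structures.

```python
from dataclasses import dataclass, field


@dataclass
class PhenotypePanel:
    name: str
    arrays: list = field(default_factory=list)
    active: bool = False
    classification: str = "Control"  # panels are Control by default


def arrays_for_analysis(panels):
    """Split the arrays of all active panels into case and control lists."""
    case, control = [], []
    for panel in panels:
        if not panel.active:
            continue
        target = case if panel.classification == "Case" else control
        target.extend(panel.arrays)
    return case, control


cardio = PhenotypePanel(
    "Cardio", ["JB-ccmp_0120.txt", "JB-ccmp_0218.txt"], active=True
)
normal = PhenotypePanel("Normal", ["JB-n_0106.txt", "JB-n_0821.txt"], active=True)
cardio.classification = "Case"  # equivalent to setting the red thumbtack

case_arrays, control_arrays = arrays_for_analysis([cardio, normal])
```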

== Visualize Gene Expression==
In this tutorial, you will:

* Get acquainted with the various geWorkbench visualization tools
* View a dataset in geWorkbench
* Modify the visualization preference settings

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. This tutorial uses the file '''''cardiomyopathy.exp''''' from the Load Data (insertlink) tutorial.

===Visualization Tools===

Visualization tools provide a view of the chip(s) under investigation and can be used for ascertaining the quality of the data. Active gene and phenotype panels (insertlink) restrict the number of markers/arrays displayed. The images created can be saved and exported. A detailed description of how to manipulate the visualization components is available in the online help.

{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|'''Microarray View''': Used to inspect each separate microarray using the Array scroll bar.||[[Image: Ema.png]]
|-
|-
|'''Tabular Microarray Panel''': Presents the numerical values of the expression measurements in a table format. One row is created per individual marker/probe and one column per microarray.||[[Image:Etab.png]]
|-
|-
|-
|'''Color Mosaic''': Heat maps for microarray expression data, organized by phenotypic or gene groupings.||[[Image:Ed_cm.png]]
|-
|-
|-
|'''Expression Profiles''': A line graph of gene expression profiles across several arrays/hybridizations. Each marker is drawn as a separate colored line.||[[Image:Eep.png]]
|-
|-
|-
|'''Scatter Plot''': A pairwise (''array vs. array and marker vs. marker'') comparison and plotting of expression values.||[[Image:Esp.png]]
|-
|-
|-
|-
|}

'''View a dataset'''
# Select '''''cardiomyopathy.exp''''' in the Project Panel.
# Select the '''''Microarray Panel''''' visualization component in the View Area at the top-right section of the interface.
# Deselect the All Markers checkbox to display the entire dataset.

Note: The '''''All Arrays''''' and '''''All Markers''''' check boxes determine which data points are included in the display.
If neither is checked, then the entire data set is shown. The All Arrays control is useful when working with data sets comprising multiple arrays. In this case, only those arrays that are included in a currently activated phenotype panel will be displayed.

[[Image: Allm.png]]

=== Preferences===
The Preferences selection in the Tools menu lets users specify how certain aspects of the system behave. Preferences take effect immediately and persist between application sessions.

'''Modifying Settings'''

1. From the main menu, click on '''Tools>Preferences'''.

2. In the Preferences pop-up window, you can define settings for:
*Text Editor: The editor selected will be used to open and inspect data sets loaded in a project. Notepad is the default setting.
* Visualization: The color scheme to be applied to color mosaic images.
**Absolute: (default) Let M = max{|min|, |max|} over all expression measurements, across all arrays. If an expression value x is positive, it is assigned the red intensity x / M * 256. If x is negative, it is assigned the green intensity -x / M * 256.
** Relative: Similar to Absolute, but each marker is mean-variance normalized first.
* Genepix Value Computation: Specifies how the value displayed for a Genepix array is computed. The default setting is the option (Mean F635 - Mean B635) / (Mean F532 - Mean B532).

Select '''Relative''' for the visualization preference.

3. Click on '''OK'''.

[[Image: Preferences.gif]]
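
The default Genepix value computation listed above is a ratio of background-corrected mean intensities in the two channels. A minimal sketch of that formula follows; the function and parameter names are hypothetical, and returning <code>None</code> when the denominator is zero is an assumption about how such spots might be treated.

```python
def genepix_value(f635_mean, b635_mean, f532_mean, b532_mean):
    """Compute (Mean F635 - Mean B635) / (Mean F532 - Mean B532)."""
    denominator = f532_mean - b532_mean
    if denominator == 0:
        # Assumption: treat the spot as missing rather than divide by zero.
        return None
    return (f635_mean - b635_mean) / denominator


# Hypothetical foreground (F) and background (B) means for one spot.
ratio = genepix_value(1200.0, 200.0, 700.0, 200.0)
```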

== Filter and Normalize Data==
In this tutorial, you will:
* Get acquainted with the various filters and normalizers available in geWorkbench
* Apply a filter and normalizer on a tutorial dataset

Before you continue, geWorkbench should be running. For help with installation, please refer to the '''''Getting Started''''' (INSERT LINK) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Filter===

Filtering can be used to screen out missing data points, remove low quality data or reduce the size of the dataset by removing less interesting data. Available geWorkbench filters are as follows:

{|style="border: 1px solid lightGray"
!Filter||Description||
|-
|-|-
|-

|'''Affy Detection Call'''||Applicable to Affymetrix data only. Sets all measurements whose detection status is any user-defined combination of A, P or M as missing.
|-

|'''Missing values''' ||Discards all markers that have “missing” measurements in at least n microarrays, where n is defined by the user. Another filter must first be applied however, in order to generate the missing values upon which this filter can operate.
|-
|-
|-

|'''Deviation''' ||Sets all markers whose measurements deviate below a given value across all microarrays as missing.
|-
|-
|-

|'''Expression Threshold''' ||Sets all markers whose measurements are inside (or outside) a user-defined range as missing.
|-
|-
|-

|'''2 Channel''' ||Applicable to 2-channel arrays (Genepix) data only. Defines applicable ranges for each channel, and sets all values for which either channel intensity is inside (or outside) the defined range as missing.
|-
|-
|-
|}

[[Image:Filterpanel.gif]]

Perform the following steps to filter out data called absent in an Affymetrix file:
# In the Filtering Panel, select '''Affy Detection Call Filter'''.
# Select the ‘A’ (Absent) check box and click '''Filter'''. Values that were removed (marked as missing) are highlighted in yellow.
# In the Filtering Panel, select '''Missing Values Filter'''.
# Choose the maximum number of arrays that can have missing values before a marker is removed (the default is 0).
# Click '''Filter'''. Markers with more than 0 missing values are removed, and the yellow-highlighted values are gone.

{|style="border: 1px solid lightGray"
!'''Affy Detection Call Filter'''||'''Missing Values Filter'''||
|-
|-|-
|-
| [[Image: Filtered.gif]]||[[Image:Mvfilter.gif]]
|-
|}
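
The Expression Threshold filter from the table above can be pictured as follows. This is a sketch under stated assumptions, not geWorkbench's code: the function name is hypothetical, the range bounds are treated as inclusive, and missing values are represented by <code>None</code>.

```python
def expression_threshold_filter(values, low, high, remove_inside=False):
    """Mark measurements inside (or outside) the range [low, high] as missing.

    By default values outside the range are removed; set `remove_inside`
    to remove the values inside the range instead.
    """
    def is_removed(v):
        inside = low <= v <= high  # assumption: bounds are inclusive
        return inside if remove_inside else not inside

    return [[None if is_removed(v) else v for v in row] for row in values]


values = [[5.0, 50.0, 500.0]]
outside_removed = expression_threshold_filter(values, 10.0, 100.0)
inside_removed = expression_threshold_filter(values, 10.0, 100.0, remove_inside=True)
```

A Missing Values filter applied afterwards can then discard the markers that have accumulated too many missing measurements.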

===Normalize===
Normalization can be used to decrease the effects of systematic differences across a set of experiments.
In geWorkbench, normalization replaces the original values with the normalized values. The available normalization methods are as follows:

{|style="border: 1px solid lightGray"
!Normalizer||Description||
|-
|-

|Missing value calculation ||Replaces every missing value with either the mean value of that marker across all microarrays or with the mean measurement of all markers in the microarray where the missing value is observed
|-
|-
|Log2 Transformation ||Applies a log2 transformation to all measurements in a microarray
|-
|-
|-
|Threshold Normalizer ||All data points whose value is less than (or greater than) a user-specified minimum (maximum) value are raised (reduced) to that minimum (maximum) value
|-
|-
|-
|-
|Marker-based Centering ||Subtracts the mean (median) measurement of a marker profile from every measurement in the profile
|-
|-
|-
|Array-based centering ||Subtracts the mean (median) measurement of a microarray from every measurement in that microarray
|-
|-
|-
|Mean-variance normalizer ||For every marker profile, the mean measurement of the entire profile is subtracted from each measurement in the profile and the resulting value is divided by the standard deviation
|-
|-
|}
[[Image:Normalpanel.gif]]
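
The Missing value calculation normalizer offers two ways to fill a gap: the mean of the marker across all microarrays, or the mean of all markers on the affected microarray. The sketch below shows both options; the function name and the <code>by</code> parameter are hypothetical, rows are markers, columns are arrays, and <code>None</code> marks a missing value.

```python
def impute_missing(values, by="marker"):
    """Replace missing values with a marker mean or an array mean."""
    def mean(xs):
        present = [x for x in xs if x is not None]
        return sum(present) / len(present) if present else None

    marker_means = [mean(row) for row in values]
    array_means = [mean(column) for column in zip(*values)]
    return [
        [
            (marker_means[i] if by == "marker" else array_means[j])
            if v is None else v
            for j, v in enumerate(row)
        ]
        for i, row in enumerate(values)
    ]


values = [[1.0, None, 3.0], [4.0, 6.0, 8.0]]
by_marker = impute_missing(values, by="marker")
by_array = impute_missing(values, by="array")
```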

'''Apply Quantile Normalizer'''

1. In the Normalization Panel, select ''Quantile Normalizer''.

2. Leave the averaging method at its default, ''Mean Profile Marker'', which determines how missing values are handled.

3. Click '''Normalize'''. The View Area is updated to reflect normalization (after the screen has been refreshed).
Note: The first value in the second row was updated from ''41,394.6'' to ''55,779.26''.

{|style="border: 1px solid lightGray"
!PRENORMALIZATION||NORMALIZED||
|-
|-|-
|-
| [[Image:Prenormalizer_ed.gif ]] || [[Image: Postnormalizera.gif ]]
|-
|-
|-
|}

== Clustering Gene Expression Data== ken
*Hierarchical Clustering
* Self Organizing Map (SOM)

== Differential Expression ==
In this tutorial, you will:

* Get acquainted with the T Test and Multi T Test
* Apply a T Test and Multi T Test

Before you continue, geWorkbench should be running. For help with installation, please refer to the Getting Started (INSERT LINK) section. Load the tutorial file cardiomyopathy.exp. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===T Test===

T Test analysis identifies markers with statistically significant differential expression between sets of microarrays. For each marker, the t-test determines whether there is a significant difference between the two groups (case and control). To perform this analysis, you must classify the panels, set the analysis parameters and view the results in the visualization components. A detailed description of the T Test parameters is available in the online help.

'''Classify the Panels'''

1. Mark the '''Cardio''' phenotype as 'Case'. By default, panels are marked as Control. Panels classified as Case are shown with a red thumbtack icon.
* Right-click on '''Cardio''' phenotype.
* Select '''Classification'''>'''Case'''.
2. Activate the arrays '''Normal''' and '''Cardio''' by selecting the checkboxes next to the panel name.

[[Image:T_PhenotypeSettingCase.png]]

'''Set Analysis Parameters'''
# From the Analysis Panel, select '''T-Test Analysis'''.
# Set the parameters listed below and click '''Analyze'''.

* Alpha-corrections tab: Just Alpha.
* P-Value Parameters tab: p-values based on t-distribution. Note that the default alpha (critical p-value) is set to 0.01.
* Degree of Freedom tab: Welch approximation - unequal group variances.

[[Image:Ttest.gif]] 

'''T-Test Results'''

{|style="border: 1px solid lightGray"
!|| ||
|-
|-
| Markers which met the significance test are included in a new gene panel called “Significant Genes”. || [[Image:E_ttestgpanel.png]]
|-
|-
| An ancillary dataset is created in the project window. || [[Image:Ed_ttestproj.png]]
|-
|}
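
The "Significant Genes" panel can be thought of as the markers whose p-value falls below the chosen alpha, listed in ascending order of significance value. The sketch below illustrates this with the "Just Alpha" setting (no multiple-testing correction); the function name and the p-values are hypothetical, and the use of a strict comparison against alpha is an assumption.

```python
def significant_genes(p_values, alpha=0.01):
    """Return (marker, p-value) pairs with p < alpha, smallest p first."""
    hits = [(marker, p) for marker, p in p_values.items() if p < alpha]
    return sorted(hits, key=lambda pair: pair[1])


# Hypothetical p-values for three markers.
p_values = {"1973_s_at": 0.004, "37724_at": 0.0002, "AFFX-hum_alu_at": 0.3}
panel = significant_genes(p_values)
```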

The values of the T-Test can be seen in the Color Mosaic panel and the Volcano Plot.

{|style="border: 1px solid lightGray"
!VOLCANO PLOT||COLOR MOSAIC||
|-
|-|-
|-
|-
| [[Image:Vplot.png]] || [[Image:Ed_cm.png]]
|-
|-
|-
| Clicking on any of the spots highlights the marker selected in the Marker Panel. * Insert another description ||
* The label to the right displays the significance value (the lower the value, the more likely the expression differs between the groups) and the gene name for each displayed gene. The genes are displayed in ascending order of significance value.

* Gene height and width values can be altered to modify the display.

* The intensity slider is used to modify the intensity of the color codings.

* Accession: Includes the accession number in the label.

* Printer Icon: Prints the displayed image.

* Display: Must be toggled on to display data.

* ''Pat, Abs, Ratio Overlapping Pages Icon: Not the T Test display.''

|-
|-
|}

=== Multi T Test=== '''(IN PROGRESS)'''

== Regulatory Network ==
In this tutorial, you will:

* Create a gene network in Reverse Engineering
* View the network in Cytoscape
* Create a Marker panel from the Cytoscape network

Before you continue, geWorkbench should be running. The file [[Webmatrix.EXP]] is used in this tutorial. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Reverse Engineering===

The Reverse Engineering component analyzes a large amount of microarray data to reverse engineer the underlying gene regulatory network. The details of this algorithm are described in "ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context", Califano et al. (http://arxiv.org/abs/q-bio.MN/0410037).

'''Create a Network'''
{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|1. In the Gene Panel, enter '''1973_s_at''' in the '''Find Next''' text box. The list box will scroll to '''1973_s_at'''; click on that gene.||[[Image:E_panel.png]]

|-
|2. Go to the Reverse Engineering component in the View pane. You should be in the first tab, “Profiler”, and in “Basic”. The marker '''1973_s_at''' should be listed in the Hub Gene box.||[[Image:E_RE1.PNG]]
|-
|3. Click '''Analyze 2D'''. A list of markers, sorted by interaction strength, is returned below the '''Search Box'''. ||
|-
|-
|-
|4. Select all the genes with score >= 5 by clicking on the first marker, '''[12.53] 37724_at''', and then using Shift+Down Arrow to extend the selection through the values >= 5.||[[Image: E_re2.png]]
|-
|-
|-
|5. Click '''Create Network'''. The resulting network is displayed in the Cytoscape view.||
|-
|-
|}

'''View the Network in Cytoscape'''

A full description of the capabilities and functionality of Cytoscape can be found at http://www.cytoscape.org/.

# In the Cytoscape Layout menu, select '''yFiles/ Organic''' to modify the network display.
# In this network image, select the central marker. It should turn yellow. [[Image:E_network.png]]
# In the Select menu, select Nodes>First neighbors of selected nodes>Shape>Triangle. The first neighbors are changed from squares to triangles.
# Ctrl+mouse select the network. The selected nodes turn yellow. The selected genes are returned to the Gene Panel under the name Selected Genes[Cytoscape].

== Enrichment Analysis==

In this tutorial, you will:

* Map markers to Gene Ontology category definitions

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

'''Gene Ontology'''

The Gene Ontology component categorizes genes in terms of the functions their products perform, their cellular localization, or their involvement in high-level biological processes. The category definitions are provided by the Gene Ontology Consortium (http://www.geneontology.org), and geWorkbench uses them to generate a tree structure that approximates the Directed Acyclic Graph structure used by the Consortium. A detailed description of the Gene Ontology parameters is available in the online help.

1. Create a Marker panel with the following markers:
* AFFX-hum_alu_at
* AFFX-LysX-M_at
* AFFX-LysX-5_at
* AFFX-LysX-3_at

For assistance in creating a marker panel, please refer to Working with Markers and Phenotype Tutorial (INSERT LINK).

2. Click '''Map List'''. As the application creates a tree for the selected ontology, a pop-up window displays the progress. The mapping is based on the GO annotations provided by Affymetrix in their annotation files. [[Image:E_map.png]]

3. The resulting nodes in the mapped tree display two numbers separated by a forward slash, such as '''cell(1/6058)'''.
*The first number is the number of genes from the active marker panel that map to the category or to any of its child nodes.
*The second number is the corresponding count for the reference list (or for the entire chip, depending on the reference list checkbox selection and on whether a reference list has been loaded from the file system).

[[Image:E_goon.png]]
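
The two numbers in a node label can be reproduced with a small tree traversal. The sketch below is illustrative only and is not taken from geWorkbench: the toy ontology, the gene identifiers, and the function name <code>count_mapped</code> are hypothetical, and it assumes that a gene mapped to several child categories is counted once.

```python
def count_mapped(category, children, direct_genes, gene_list):
    """Count genes from gene_list mapped to category or to any of its descendants."""
    found = set()
    visited = set()
    stack = [category]
    while stack:
        node = stack.pop()
        if node in visited:
            # The real ontology is a DAG, so a node can be reached by several paths.
            continue
        visited.add(node)
        found |= direct_genes.get(node, set()) & gene_list
        stack.extend(children.get(node, []))
    return len(found)


# Hypothetical toy ontology: category -> child categories, category -> genes.
children = {"cell": ["membrane", "nucleus"], "membrane": [], "nucleus": []}
direct_genes = {"cell": {"g1"}, "membrane": {"g2", "g3"}, "nucleus": {"g3", "g4"}}

panel = {"g2", "g9"}                  # active marker panel
reference = {"g1", "g2", "g3", "g4"}  # reference list or entire chip

label = "cell({}/{})".format(
    count_mapped("cell", children, direct_genes, panel),
    count_mapped("cell", children, direct_genes, reference),
)
print(label)
```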

The Table View and P Value Trend views provide an alternate view of the mappings obtained in the Tree View.

{|style="border: 1px solid lightGray"
!TABLE VIEW||P VALUE TREND||
|-
|-|-
|-
| [[Image:E_gotable.png]]||[[Image:E_pvalueT.png]]
|-
| '''Recompute:''' Recomputes the statistics based on the current widget settings (p-value metric, Min#).||

'''Plot:''' Draws the plot based on the selections in the Tree View and the Step Size setting.

|--

|}

== Sequence Analysis== ken
* Sequence Retrieval
* Sequence Homology Analysis
** Blast
** Other

==Pattern Discovery== ken
* Position Histogram

== Promoter Analysis==

==Integrated Annotation Information==
In this tutorial, you will:

* rrrrr
* rrrrr

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

Perform the following steps to view annotations of a dataset (''not recommended for large sets of genes''):

1. In the Marker Annotations component in the View pane, click '''Use Panels'''.

A table of gene names and links to pathway diagrams is returned if available from the NCI CGAP database.

2. Click a gene name to open its annotations from the NCI CGAP database in your default web browser.

3. Click a pathway diagram name to display the pathway diagram in the caBIO Pathways panel in the View pane. In the caBIO Pathways viewer, the gene symbols in the pathway can in turn be clicked to display their gene information in the default web browser.

File:E panel.png

2006-02-24T20:00:20Z

Daly:

User:Daly

2006-02-24T19:41:31Z

Daly: /* Enrichment Analysis */

= Overview=

geWorkbench is an open-source bioinformatics platform that offers a comprehensive and extensible collection of tools for the management, analysis, visualization and annotation of biomedical data. It is aimed at providing researchers with a centralized repository for the analysis and visualization of gene expression data, gene sequences and protein sequences.

'''What can you do with geWorkbench?'''

* Use one program that integrates with multiple existing bioinformatics modules for analysis and visualization.

* Access remote servers and clusters for the performance of computationally intensive calculations - quicker!

* Streamline data analysis with:
** flexible import options that support merging files from various sources ''(insert possible data source ie RMA express).''
** support for a variety of genomic data including microarrays, sequences, pathways, networks, alignments and phenotypes.
** access analyses with biological annotations from the National Cancer Institute.

* Community: Insert developer benefit ( plugin)

The diagram illustrates the use of geWorkbench by Researchers.

[[Image:slide1.gif]]

For detailed system documentation, please see the Documentation section (INSERT LINK).

= Tutorial=

===Welcome! ===

Welcome to the geWorkbench tutorial. In the following tutorials you will learn how to use geWorkbench. This is an example-based, illustrated guide for both beginners and experienced users. This tutorial was last updated in '''DATE''' and reflects changes to geWorkbench through '''DATE'''.

===Using the Tutorial ===

These tutorials assume that you have successfully completed the installation instructions. If you haven't installed the entire program, please go to [[Download]] and follow the installation instructions.

The tutorials are organized into a number of separate topics. Sample data sets are used throughout the tutorials. Each tutorial is self-contained and does not depend on any other tutorial for sample data. Sample data is located '''''(INSERT LINK or Instructions on how to download)'''''

'''(we should have sample data labeled Tutorial #)'''

Within the tutorials there are two basic types of text:
*Text that looks like this explains topics. 

# Text in numbered steps is instructions for you to follow using tutorial data files.

''' insert instruction on how to navigate between tutorials'''.

==Getting Started== MV
* Starting the application
* GUI elements
** Panels
** Navigation
== Loading Data== MV
* Data formats

== Working with Marker and Phenotype Panels==
In this tutorial, you will:
* Become familiar with the use of panels in geWorkbench
* Create active phenotype panels
* Classify the panels you created

Before you continue, geWorkbench should be running. For help with this, please refer to the '''''Getting Started''''' (insertlink) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Creating Panels===

When working with microarrays, geWorkbench uses the term '''''marker''''' to refer to a gene probe (in other cases, it can refer to individual items from other data sets, such as sequences).
'''''Phenotype''''' refers to any user-defined grouping of microarrays. These microarrays will often share some common property that in most cases is phenotypic, although this is not a requirement. For example, one such “phenotype” might be a single experiment on a tumor tissue sample, with a second “phenotype” defined as a collection of experiments performed on normal tissue samples.

====Assign Panel====

1. In the Phenotype panel, select the following arrays, which contain samples from the congestive cardiomyopathy disease state:
JB-ccmp_0120.txt, JB-ccmp_0218.txt, JB-ccmp0718.txt, JB-ccmp0811.txt, JB-ccmp1003.txt, JB-ccmp_1109.tx

2. Right-click, select '''Add to Panel'''.

3. Enter "Cardio" in the input box and click OK.

[[Image:T_PanelLabelCardio.png]]

4. Next, similarly label the following arrays as "Normal" (''repeat steps 2 & 3'').
JB-n_0106.txt , JB-n_0821.txt, JB-n_0915.txt, JB-n_1303.txt

5. Select the checkboxes next to the panel names to mark these groups of arrays as "Active". Various analysis and visualization components can be set to use or display only activated arrays or markers.

[[Image:T_PhenotypesPriorToCase.png]]

====Classify panel====

For statistical tests such as the t-test, Case and Control groups can be specified.

# Left-click on the thumb-tack icon in front of the phenotype name.
# Select Case to specify the disease arrays as the "Case". The remaining "Normal" arrays are by default labeled control.
[[Image:T_PhenotypeSettingCase.png]]

A red thumbtack indicates the arrays have been specified as "Case".

[[Image:T_PhenotypeCaseSet.png]]

== Visualize Gene Expression==
In this tutorial, you will:

* Get acquainted with the various geWorkbench visualization tools
* View a dataset in geWorkbench
* Modify the visualization preference settings

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file '''''cardiomyopathy.exp''''' used in this tutorial is the one from the Load Data (insertlink) tutorial.

===Visualization Tools===

Visualization tools provide a view of the chip(s) under investigation and can be used to assess the quality of the data. Active gene and phenotype panels (insertlink) restrict the number of markers/arrays displayed. The images created can be saved and exported. A detailed description of how to use the visualization components is available in the online help.

{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|'''Microarray View''': Used to inspect each separate microarray using the Array scroll bar.||[[Image: Ema.png]]
|-
|-
|'''Tabular Microarray Panel''': Presents the numerical values of the expression measurements in a table format. One row is created per individual marker/probe and one column per microarray.||[[Image:Etab.png]]
|-
|-
|-
|'''Color Mosaic''': Heat maps for microarray expression data, organized by phenotypic or gene groupings.||[[Image:Ed_cm.png]]
|-
|-
|-
|'''Expression Profiles''': A line graph of gene expression profiles across several arrays/hybridizations. Each marker is drawn as a separate colored line.||[[Image:Eep.png]]
|-
|-
|-
|'''Scatter Plot''': A pairwise (''array vs. array and marker vs. marker'') comparison and plotting of expression values.||[[Image:Esp.png]]
|-
|-
|-
|-
|}

'''View a dataset'''
# Select '''''cardiomyopathy.exp''''' in the Project Panel.
# Select the '''''Microarray Panel''''' visualization component in the View Area at the top-right section of the interface.
# Deselect the All Markers checkbox to display the entire dataset.

Note: The '''''All Arrays''''' and '''''All Markers''''' checkboxes determine which data points are included in the display.
If neither is checked, then the entire data set is shown. The All Arrays control is useful when working with data sets comprising multiple arrays. In this case, only those arrays that are included in a currently activated phenotype panel will be displayed.

[[Image: Allm.png]]

=== Preferences===
The Preferences selection in the Tools menu allows users to specify how certain aspects of the system will behave. Once set, preferences are applied at once and persist between application sessions.

'''Modifying Settings'''

1. From the main menu, click on '''Tools>Preferences'''.

2. In the Preferences pop-up window, you can define settings for:
*Text Editor: The editor selected will be used to open and inspect data sets loaded in a project. Notepad is the default setting.
* Visualization: The color scheme to be applied to color mosaic images.
**Absolute (default): Let M = max{|min|, |max|} over all expression measurements across all arrays. A positive expression value x is assigned a red intensity of x / M * 256; a negative expression value x is assigned a green intensity of -x / M * 256.
** Relative: This is similar to the setting for Absolute, but each marker is mean-variance normalized first.
* Genepix Value Computation: You can specify how to compute the value displayed for Genepix arrays. The default setting is Option (Mean F635 - Mean B635) / (Mean F532 - Mean B532).

Select '''Relative''' for the visualization preference.

3. Click on '''OK'''.

[[Image: Preferences.gif]]
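
The Absolute color scheme described above can be written out as a short function. This is a minimal sketch in Python, not geWorkbench's implementation: the function names are hypothetical, the colors are returned as (red, green, blue) tuples, and the result is clamped to 255 on the assumption that 8-bit color channels are used (the formula alone would give 256 when x equals M).

```python
def scale_max(rows):
    """M = max(|min|, |max|) over all expression measurements across all arrays."""
    values = [v for row in rows for v in row]
    return max(abs(min(values)), abs(max(values)))


def absolute_color(x, m):
    """Map expression value x to an (r, g, b) tuple under the Absolute scheme."""
    if m == 0:
        return (0, 0, 0)
    # Clamp to 255 (assumption: 8-bit channels); x == M would otherwise give 256.
    level = min(255, int(abs(x) / m * 256))
    # Positive values use the red spectrum, negative values the green spectrum.
    return (level, 0, 0) if x > 0 else (0, level, 0)


# Hypothetical data: one row per marker, one column per array.
rows = [[100.0, -50.0], [200.0, 25.0]]
m = scale_max(rows)
print([[absolute_color(x, m) for x in row] for row in rows])
```

For the Relative setting, the same mapping would be applied after each marker's profile has been mean-variance normalized.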

== Filter and Normalize Data==
In this tutorial, you will:
* Get acquainted with the various filters and normalizers available in geWorkbench
* Apply a filter and normalizer on a tutorial dataset

Before you continue, geWorkbench should be running. For help with installation, please refer to the '''''Getting Started''''' (INSERT LINK) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Filter===

Filtering can be used to screen out missing data points, remove low quality data or reduce the size of the dataset by removing less interesting data. Available geWorkbench filters are as follows:

{|style="border: 1px solid lightGray"
!Filter||Description||
|-
|-|-
|-

|'''Affy Detection Call'''||Applicable to Affymetrix data only. Sets all measurements whose detection status is any user-defined combination of A, P or M as missing.
|-

|'''Missing values''' ||Discards all markers that have “missing” measurements in at least n microarrays, where n is defined by the user. Another filter must first be applied however, in order to generate the missing values upon which this filter can operate.
|-
|-
|-

|'''Deviation''' ||Sets all markers whose measurements deviate below a given value across all microarrays as missing.
|-
|-
|-

|'''Expression Threshold''' ||Sets all markers whose measurements are inside (or outside) a user-defined range as missing.
|-
|-
|-

|'''2 Channel''' ||Applicable to 2-channel arrays (Genepix) data only. Defines applicable ranges for each channel, and sets all values for which either channel intensity is inside (or outside) the defined range as missing.
|-
|-
|-
|}

[[Image:Filterpanel.gif]]

Perform the following steps to filter out data called absent in an Affymetrix file:
# In the Filtering Panel, select ''Affy Detection Call Filter''.
# Select the ‘A’ (Absent) checkbox and click '''Filter'''. Values that were removed (marked as missing) are highlighted in yellow.
# In the Filtering Panel, select '''Missing Values Filter'''.
# Choose the maximum number of arrays that can have missing values before a marker is removed (the default is 0).
# Click '''Filter'''. Markers with more than 0 missing values are removed. You’ll notice that the yellow values are gone.

{|style="border: 1px solid lightGray"
!'''Affy Detection Call Filter'''||'''Missing Values Filter'''||
|-
|-|-
|-
| [[Image: Filtered.gif]]||[[Image:Mvfilter.gif]]
|-
|}
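
The two filtering steps above can be sketched in a few lines of Python. This is an illustration of the logic only, not geWorkbench code: the function names, the use of <code>None</code> for a missing value, and the sample data are assumptions made for the example.

```python
def affy_detection_filter(values, calls, flagged=("A",)):
    """Mark as missing (None) every measurement whose detection call is flagged."""
    return [
        [None if call in flagged else value for value, call in zip(vrow, crow)]
        for vrow, crow in zip(values, calls)
    ]


def missing_values_filter(markers, values, max_missing=0):
    """Keep only markers with at most max_missing missing measurements."""
    return [
        (marker, row)
        for marker, row in zip(markers, values)
        if sum(v is None for v in row) <= max_missing
    ]


# Hypothetical data: one row per marker, one column per array.
markers = ["m1", "m2"]
values = [[10.0, 20.0], [5.0, 7.0]]
calls = [["P", "P"], ["A", "P"]]  # Affymetrix detection calls: P, M or A

filtered = affy_detection_filter(values, calls)
kept = missing_values_filter(markers, filtered, max_missing=0)
print(filtered)
print(kept)
```

With the default of 0, any marker that has at least one missing value after the first filter is dropped by the second.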

===Normalize===
Normalization can be used to decrease the effects of systematic differences across a set of experiments.
In geWorkbench, normalization replaces the original values with new values. The available geWorkbench normalization methods are as follows:

{|style="border: 1px solid lightGray"
!Normalizer||Description||
|-
|-

|Missing value calculation ||Replaces every missing value with either the mean value of that marker across all microarrays or with the mean measurement of all markers in the microarray where the missing value is observed
|-
|-
|Log2 Transformation ||Applies a log2 transformation to all measurements in a microarray
|-
|-
|-
|Threshold Normalizer ||All data points whose value is less than (or greater than) a user-specified minimum (maximum) value are raised (reduced) to that minimum (maximum) value
|-
|-
|-
|-
|Marker-based Centering ||Subtracts the mean (median) measurement of a marker profile from every measurement in the profile
|-
|-
|-
|Array-based centering ||Subtracts the mean (median) measurement of a microarray from every measurement in that microarray
|-
|-
|-
|Mean-variance normalizer ||For every marker profile, the mean measurement of the entire profile is subtracted from each measurement in the profile and the resulting value is divided by the standard deviation
|-
|-
|}
[[Image:Normalpanel.gif]]
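
The normalizers in the table above are simple element-wise operations, which the following Python sketch spells out. It is illustrative only: the function names are hypothetical, missing values are represented as <code>None</code>, the log2 transformation assumes strictly positive measurements, and the mean-variance normalizer uses the population standard deviation because the table does not say which form is used.

```python
import math
from statistics import mean, median, pstdev


def fill_missing(profile):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in profile if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in profile]


def log2_transform(values):
    """Apply a log2 transformation; all values must be strictly positive."""
    return [math.log2(v) for v in values]


def threshold(values, minimum, maximum):
    """Raise values below minimum to minimum and reduce values above maximum to maximum."""
    return [min(max(v, minimum), maximum) for v in values]


def center(values, use_median=False):
    """Subtract the mean (or median) of a marker profile or array from every value."""
    c = median(values) if use_median else mean(values)
    return [v - c for v in values]


def mean_variance(profile):
    """Subtract the profile mean and divide by the standard deviation (must be non-zero)."""
    mu, sd = mean(profile), pstdev(profile)
    return [(v - mu) / sd for v in profile]


print(fill_missing([2.0, None, 4.0]))
print(log2_transform([1.0, 8.0]))
print(threshold([5.0, 50.0, 500.0], 10.0, 100.0))
print(center([1.0, 2.0, 6.0]))
print(mean_variance([2.0, 4.0]))
```

Passing a marker's profile to <code>center</code> gives marker-based centering; passing one array's measurements gives array-based centering.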

'''Apply Quantile Normalizer'''

1. In the Normalization Panel, select ''Quantile Normalizer''.

2. Leave the averaging method at its default, ''Mean Profile Marker''; this setting determines how missing values are handled.

3. Click '''Normalize'''. The View Area is updated to reflect normalization (after the screen has been refreshed).
Note: The first value in the second row was updated from ''41,394.6'' to ''55,779.26''.

{|style="border: 1px solid lightGray"
!PRENORMALIZATION||NORMALIZED||
|-
|-|-
|-
| [[Image:Prenormalizer_ed.gif ]] || [[Image: Postnormalizera.gif ]].
|-
|-
|-
|}
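
For readers unfamiliar with quantile normalization, the sketch below shows the standard textbook form of the technique in Python. It is an assumption that geWorkbench's Quantile Normalizer follows this scheme; the function name is hypothetical, ties are not given special treatment, and missing values (the ''Mean Profile Marker'' setting) are not modeled.

```python
def quantile_normalize(arrays):
    """Give every array the same value distribution.

    arrays: list of equal-length lists, one list of measurements per microarray.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Mean of the values that share the same rank across all arrays.
    rank_means = [sum(s[i] for s in sorted_arrays) / len(arrays) for i in range(n)]
    result = []
    for a in arrays:
        # Marker indices ordered from the lowest to the highest value in this array.
        order = sorted(range(n), key=lambda i: a[i])
        normalized = [0.0] * n
        for rank, idx in enumerate(order):
            normalized[idx] = rank_means[rank]
        result.append(normalized)
    return result


# Hypothetical data: two arrays with three markers each.
print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
```

After normalization each array contains the same set of values, so a value can change noticeably even when its rank within the array does not.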

== Clustering Gene Expression Data== ken
*Hierarchical Clustering
* Self Organizing Map (SOM)

== Differential Expression ==
In this tutorial, you will:

* Get acquainted with the T Test and Multi T Test
* Apply a T Test and Multi T Test

Before you continue, geWorkbench should be running. For help with installation, please refer to the Getting Started (INSERT LINK) section. Load the tutorial file cardiomyopathy.exp. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===T Test===

T Test analysis identifies markers with statistically significant differential expression between sets of microarrays. For each marker, the t-test determines whether there is a significant difference between the two groups (case and control). To perform this analysis, you must classify the panels, set the analysis parameters, and view the results in the visualization components. A detailed description of the T Test parameters is available in the online help.

'''Classify the Panels'''

1. Mark the '''Cardio''' phenotype as 'Case'. By default, panels are marked as control. Panels classified as Case are shown with a red thumbtack icon.
* Right-click on the '''Cardio''' phenotype.
* Select '''Classification'''>'''Case'''.
2. Activate the '''Normal''' and '''Cardio''' panels by selecting the checkboxes next to the panel names.

[[Image:T_PhenotypeSettingCase.png]]

'''Set Analysis Parameters'''
# From the Analysis Panel, select '''T-Test Analysis'''.
# Set the parameter values listed below and click '''Analyze'''.

* Alpha-corrections tab: Just Alpha.
* P-Value Parameters tab: p-values based on t-distribution. Note that the default alpha (critical p-value) is set to 0.01.
* Degree of Freedom tab: Welch approximation - unequal group variances.

[[Image:Ttest.gif]] 

'''T-Test Results'''

{|style="border: 1px solid lightGray"
!|| ||
|-
|-
| Markers which met the significance test are included in a new gene panel called “Significant Genes”. || [[Image:E_ttestgpanel.png]]
|-
|-
| An ancillary dataset is created in the project window. || [[Image:Ed_ttestproj.png]]
|-
|}

The values of the T-Test can be seen in the Color Mosaic panel and the Volcano Plot.

{|style="border: 1px solid lightGray"
!VOLCANO PLOT||COLOR MOSAIC||
|-
|-|-
|-
|-
| [[Image:Vplot.png]] || [[Image:Ed_cm.png]]
|-
|-
|-
| Clicking on any of the spots highlights the marker selected in the Marker Panel. * Insert another description ||
* The label to the right displays the significance value (the lower the value, the more likely the expression differs between the groups) and the gene name for each displayed gene. The genes are displayed in ascending order of significance value.

* Gene height and width values can be altered to modify the display.

* The intensity slider is used to modify the intensity of the color codings.

* Accession: Includes the accession number in the label.

* Printer Icon: Prints the displayed image.

* Display: Must be toggled on to display data.

* ''Pat, Abs, Ratio Overlapping Pages Icon: Not the T Test display.''

|-
|-
|}

=== Multi T Test=== '''(IN PROGRESS)'''

== Regulatory Network ==
In this tutorial, you will:

* Create a gene network in Reverse Engineering
* View the network in Cytoscape
* Create a Marker panel from the Cytoscape network

Before you continue, geWorkbench should be running. The file [[Webmatrix.EXP]] is used in this tutorial. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Reverse Engineering===

The Reverse Engineering component analyzes a large amount of microarray data to reverse engineer the underlying gene regulatory network. The details of this algorithm are described in "ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context", Califano et al. (http://arxiv.org/abs/q-bio.MN/0410037).

'''Create a Network'''
{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|1. In the Gene Panel, enter '''1973_s_at''' in the '''Find Next''' text box. The list box will scroll to '''1973_s_at'''; click on that gene.||[[Image: E_panel.PNG]]

|-
|2. Go to the Reverse Engineering component in the View pane. You should be in the first tab, “Profiler”, and in “Basic”. The marker '''1973_s_at''' should be listed in the Hub Gene box.||[[Image:E_RE1.PNG]]
|-
|3. Click '''Analyze 2D'''. A list of markers, sorted by interaction strength, is returned below the '''Search Box'''. ||
|-
|-
|-
|4. Select all the genes with score >= 5 by clicking on the first marker, '''[12.53] 37724_at''', and then using Shift+Down Arrow to extend the selection through the values >= 5.||[[Image: E_re2.png]]
|-
|-
|-
|5. Click '''Create Network'''. The resulting network is displayed in the Cytoscape view.||
|-
|-
|}

'''View the Network in Cytoscape'''

A full description of the capabilities and functionality of Cytoscape can be found at http://www.cytoscape.org/.

# In the Cytoscape Layout menu, select '''yFiles/ Organic''' to modify the network display.
# In this network image, select the central marker. It should turn yellow. [[Image:E_network.png]]
# In the Select menu, select Nodes>First neighbors of selected nodes>Shape>Triangle. The first neighbors are changed from squares to triangles.
# Ctrl+mouse select the network. The selected nodes turn yellow. The selected genes are returned to the Gene Panel under the name Selected Genes[Cytoscape].

== Enrichment Analysis==

In this tutorial, you will:

* Map markers to Gene Ontology category definitions

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

'''Gene Ontology'''

The Gene Ontology component categorizes genes in terms of the functions their products perform, their cellular localization, or their involvement in high-level biological processes. The category definitions are provided by the Gene Ontology Consortium (http://www.geneontology.org), and geWorkbench uses them to generate a tree structure that approximates the Directed Acyclic Graph structure used by the Consortium. A detailed description of the Gene Ontology parameters is available in the online help.

1. Create a Marker panel with the following markers:
* AFFX-hum_alu_at
* AFFX-LysX-M_at
* AFFX-LysX-5_at
* AFFX-LysX-3_at

For assistance in creating a marker panel, please refer to Working with Markers and Phenotype Tutorial (INSERT LINK).

2. Click '''Map List'''. As the application creates a tree for the selected ontology, a pop-up window displays the progress. The mapping is based on the GO annotations provided by Affymetrix in their annotation files. [[Image:E_map.png]]

3. The resulting nodes in the mapped tree display two numbers separated by a forward slash, such as '''cell(1/6058)'''.
*The first number is the number of genes from the active marker panel that map to the category or to any of its child nodes.
*The second number is the corresponding count for the reference list (or for the entire chip, depending on the reference list checkbox selection and on whether a reference list has been loaded from the file system).

[[Image:E_goon.png]]

The Table View and P Value Trend views provide an alternate view of the mappings obtained in the Tree View.

{|style="border: 1px solid lightGray"
!TABLE VIEW||P VALUE TREND||
|-
|-|-
|-
| [[Image:E_gotable.png]]||[[Image:E_pvalueT.png]]
|-
| '''Recompute:''' Recomputes the statistics based on the current widget settings (p-value metric, Min#).||

'''Plot:''' Draws the plot based on the selections in the Tree View and the Step Size setting.

|--

|}

== Sequence Analysis== ken
* Sequence Retrieval
* Sequence Homology Analysis
** Blast
** Other

==Pattern Discovery== ken
* Position Histogram

== Promoter Analysis==

==Integrated Annotation Information==
In this tutorial, you will:

* rrrrr
* rrrrr

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

Perform the following steps to view annotations of a dataset (''not recommended for large sets of genes''):

1. In the Marker Annotations component in the View pane, click '''Use Panels'''.

A table of gene names and links to pathway diagrams is returned if available from the NCI CGAP database.

2. Click a gene name to open its annotations from the NCI CGAP database in your default web browser.

3. Click a pathway diagram name to display the pathway diagram in the caBIO Pathways panel in the View pane. In the caBIO Pathways viewer, the gene symbols in the pathway can in turn be clicked to display their gene information in the default web browser.

User:Daly

2006-02-24T19:33:08Z

Daly: /* Enrichment Analysis */

= Overview=

geWorkbench is an open-source bioinformatics platform that offers a comprehensive and extensible collection of tools for the management, analysis, visualization and annotation of biomedical data. It is aimed at providing researchers with a centralized repository for the analysis and visualization of gene expression data, gene sequences and protein sequences.

'''What can you do with geWorkbench?'''

* Use one program that integrates with multiple existing bioinformatics modules for analysis and visualization.

* Access remote servers and clusters for the performance of computationally intensive calculations - quicker!

* Streamline data analysis with:
** flexible import options that support merging files from various sources ''(insert possible data source ie RMA express).''
** support for a variety of genomic data including microarrays, sequences, pathways, networks, alignments and phenotypes.
** access analyses with biological annotations from the National Cancer Institute.

* Community: Insert developer benefit ( plugin)

The diagram illustrates the use of geWorkbench by Researchers.

[[Image:slide1.gif]]

For detailed system documentation, please see the Documentation section (INSERT LINK).

= Tutorial=

===Welcome! ===

Welcome to the geWorkbench tutorial. In the following tutorials you will learn how to use geWorkbench. This is an example-based, illustrated guide for both beginners and experienced users. This tutorial was last updated in '''DATE''' and reflects changes to geWorkbench through '''DATE'''.

===Using the Tutorial ===

These tutorials assume that you have successfully completed the installation instructions. If you haven't installed the entire program, please go to [[Download]] and follow the installation instructions.

The tutorials are organized into a number of separate topics. Sample data sets are used throughout the tutorials. Each tutorial is self-contained and does not depend on any other tutorial for sample data. Sample data is located '''''(INSERT LINK or Instructions on how to download)'''''

'''(we should have sample data labeled Tutorial #)'''

Within the tutorials there are two basic types of text:
*Text that looks like this explains topics. 

# Text in numbered steps is instructions for you to follow using tutorial data files.

''' insert instruction on how to navigate between tutorials'''.

==Getting Started== MV
* Starting the application
* GUI elements
** Panels
** Navigation
== Loading Data== MV
* Data formats

== Working with Marker and Phenotype Panels==
In this tutorial, you will:
* Become familiar with the use of panels in geWorkbench
* Create active phenotype panels
* Classify the panels you created

Before you continue, geWorkbench should be running. For help with this, please refer to the '''''Getting Started''''' (insertlink) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Creating Panels===

When working with microarrays, geWorkbench uses the term '''''marker''''' to refer to a gene probe (in other cases, it can refer to individual items from other data sets, such as sequences).
'''''Phenotype''''' refers to any user-defined grouping of microarrays. These microarrays will often share some common property that in most cases is phenotypic, although this is not a requirement. For example, one such “phenotype” might be a single experiment on a tumor tissue sample, with a second “phenotype” defined as a collection of experiments performed on normal tissue samples.

====Assign Panel====

1. In the Phenotype panel, select the following arrays, which contain samples from the congestive cardiomyopathy disease state:
JB-ccmp_0120.txt, JB-ccmp_0218.txt, JB-ccmp0718.txt, JB-ccmp0811.txt, JB-ccmp1003.txt, JB-ccmp_1109.tx

2. Right-click, select '''Add to Panel'''.

3. Enter "Cardio" in the input box and click OK.

[[Image:T_PanelLabelCardio.png]]

4. Next, similarly label the following arrays as "Normal" (''repeat steps 2 & 3'').
JB-n_0106.txt , JB-n_0821.txt, JB-n_0915.txt, JB-n_1303.txt

5. Check the boxes next to the panel names to mark these groups of arrays as "Active". Various analysis and visualization components can be set to use or display only activated arrays or markers.

[[Image:T_PhenotypesPriorToCase.png]]

====Classify panel====

For statistical tests such as the t-test, Case and Control groups can be specified.

# Left-click on the thumb-tack icon in front of the phenotype name.
# Select Case to specify the disease arrays as the "Case". The remaining "Normal" arrays are by default labeled control.
[[Image:T_PhenotypeSettingCase.png]]

A red thumbtack indicates the arrays have been specified as "Case".

[[Image:T_PhenotypeCaseSet.png]]

== Visualize Gene Expression==
In this tutorial, you will:

* Get acquainted with the various geWorkbench visualization tools
* View a dataset in geWorkbench
* Modify the visualization preference settings

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file '''''cardiomyopathy.exp''''' used in this tutorial is from the Load Data (insertlink) tutorial.

===Visualization Tools===

Visualization tools provide a view of the chip(s) under investigation and can be used for ascertaining the quality of the data. Active gene and phenotype panels (insertlink) restrict the number of markers/arrays displayed. The images created can be saved and exported. A detailed description of how to manipulate the visualization components is available in the online help.

{|style="border: 1px solid lightGray"
|-
|'''Microarray View''': Used to inspect each microarray separately using the Array scroll bar.||[[Image: Ema.png]]
|-
|'''Tabular Microarray Panel''': Presents the numerical values of the expression measurements in a table format. One row is created per individual marker/probe and one column per microarray.||[[Image:Etab.png]]
|-
|'''Color Mosaic''': Heat maps for microarray expression data, organized by phenotypic or gene groupings.||[[Image:Ed_cm.png]]
|-
|'''Expression Profiles''': A line graph of gene expression profiles across several arrays/hybridizations. Each marker is drawn as a separately colored line.||[[Image:Eep.png]]
|-
|'''Scatter Plot''': A pairwise (''array vs. array and marker vs. marker'') comparison and plotting of expression values.||[[Image:Esp.png]]
|}

'''View a dataset'''
# Select '''''cardiomyopathy.exp''''' in the Project Panel.
# Select the '''''Microarray Panel''''' visualization component in the View Area at the top-right section of the interface.
# Deselect the All Markers checkbox to display the entire dataset.

Note: The '''''All Arrays''''' and '''''All Markers''''' checkboxes determine which data points are included in the display.
If neither is checked, then the entire data set is shown. The All Arrays control is useful when working with data sets comprising multiple arrays. In this case, only those arrays that are included in a currently activated phenotype panel will be displayed.

[[Image: Allm.png]]

=== Preferences===
The Preferences selection in the Tools menu allows users to specify how certain aspects of the system will behave. Once set, preferences are applied at once and persist between application sessions.

'''Modifying Settings'''

1. From the main menu, click on '''Tools>Preferences'''.

2. In the Preferences pop-up window, you can define settings for:
*Text Editor: The editor selected will be used to open and inspect data sets loaded in a project. Notepad is the default setting.
* Visualization: The color scheme to be applied to color mosaic images.
**Absolute: (default) Let M = max{|min|, |max|} over all expression measurements, across all arrays. If expression value x > 0, assign it the red spectrum x / M * 256. If expression value x is negative, assign it to the green spectrum -x / M * 256.
** Relative: This is similar to the setting for Absolute, but each marker is mean-variance normalized first.
* Genepix Value Computation: You can specify how to compute the value displayed for a Genepix array. The default setting is the option (Mean F635 - Mean B635) / (Mean F532 - Mean B532).

Select '''Relative''' for the visualization preference.

3. Click on '''OK'''.

[[Image: Preferences.gif]]
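The ''Absolute'' color rule listed under the Visualization preference can be sketched in a few lines of Python. This is only an illustration of the formula as stated, not geWorkbench's own code; the function name <code>absolute_color</code>, the sample values, and the clamping of the result to 255 are assumptions made for the example.

```python
def absolute_color(x, m):
    """Map expression value x to an (r, g, b) tuple using the Absolute rule.

    m is M = max(|min|, |max|) over all measurements (assumed to be > 0).
    """
    # x / M * 256 reaches 256 when |x| == M, so clamp to the 8-bit maximum
    # (the clamp is an assumption of this sketch).
    intensity = min(255, int(abs(x) / m * 256))
    if x > 0:
        return (intensity, 0, 0)   # positive values use the red spectrum
    if x < 0:
        return (0, intensity, 0)   # negative values use the green spectrum
    return (0, 0, 0)


# Hypothetical expression measurements across all arrays
values = [-200.0, -50.0, 0.0, 100.0, 400.0]
M = max(abs(min(values)), abs(max(values)))
colors = [absolute_color(v, M) for v in values]
print(colors)
```

Under the ''Relative'' setting, each marker would be mean-variance normalized before the same mapping is applied.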

== Filter and Normalize Data==
In this tutorial, you will:
* Get acquainted with the various filters and normalizers available in geWorkbench
* Apply a filter and normalizer on a tutorial dataset

Before you can continue, geWorkbench should be running. For help with installation, please refer to the '''''Getting Started''''' (INSERT LINK) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Filter===

Filtering can be used to screen out missing data points, remove low quality data or reduce the size of the dataset by removing less interesting data. Available geWorkbench filters are as follows:

{|style="border: 1px solid lightGray"
!Filter||Description
|-
|'''Affy Detection Call'''||Applicable to Affymetrix data only. Sets as missing all measurements whose detection status is any user-defined combination of A, P or M.
|-
|'''Missing values'''||Discards all markers that have “missing” measurements in at least n microarrays, where n is defined by the user. Another filter must be applied first, however, to generate the missing values on which this filter operates.
|-
|'''Deviation'''||Sets as missing all markers whose measurements have a deviation below a given value across all microarrays.
|-
|'''Expression Threshold'''||Sets as missing all markers whose measurements are inside (or outside) a user-defined range.
|-
|'''2 Channel'''||Applicable to 2-channel array (Genepix) data only. Defines applicable ranges for each channel, and sets as missing all values for which either channel intensity is inside (or outside) the defined range.
|}

[[Image:Filterpanel.gif]]

Perform the following steps to filter out data called absent in an Affymetrix file:
# In the Filtering Panel, select '''Affy Detection Call Filter'''.
# Select the ‘A’ (Absent) checkbox and click '''Filter'''. Values that were removed (marked as missing) are highlighted in yellow.
# In the Filtering Panel, select '''Missing Values Filter'''.
# Choose the maximum number of arrays that can have missing values before a marker is removed (the default is 0).
# Click '''Filter'''. Markers with more than 0 missing values are removed. You will notice that the yellow values are gone.

{|style="border: 1px solid lightGray"
!'''Affy Detection Call Filter'''||'''Missing Values Filter'''||
|-
|-|-
|-
| [[Image: Filtered.gif]]||[[Image:Mvfilter.gif]]
|-
|}
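The two-step filtering above can be sketched as follows. This is a minimal illustration, not geWorkbench code: the function names, the use of <code>None</code> as the "missing" flag, and the sample data are all assumptions. The second function follows the wording of the steps (a marker is removed when it has more missing values than the chosen maximum).

```python
MISSING = None  # stand-in for a measurement flagged as missing (assumption)


def detection_call_filter(values, calls, flagged=("A",)):
    """Mark as missing every measurement whose detection call is in `flagged`."""
    return [
        [MISSING if call in flagged else value for value, call in zip(vrow, crow)]
        for vrow, crow in zip(values, calls)
    ]


def missing_values_filter(markers, values, max_missing=0):
    """Keep only markers with at most `max_missing` missing measurements."""
    kept = []
    for name, row in zip(markers, values):
        if sum(v is MISSING for v in row) <= max_missing:
            kept.append((name, row))
    return kept


# Hypothetical data: three markers (rows) measured on three arrays (columns)
markers = ["m1", "m2", "m3"]
values = [[10.0, 12.0, 11.0], [5.0, 300.0, 7.0], [80.0, 90.0, 85.0]]
calls = [["P", "P", "P"], ["A", "P", "A"], ["P", "M", "P"]]

flagged = detection_call_filter(values, calls)           # step 1: flag 'A' calls
kept = missing_values_filter(markers, flagged, max_missing=0)  # step 2: drop markers
print([name for name, _ in kept])
```

Running the detection call filter first matters: without it there are no missing values for the second filter to act on.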

===Normalize===
Normalization can be used to decrease the effects of systematic differences across a set of experiments.
In geWorkbench, normalization replaces the original values with new values. Available geWorkbench normalization methods are as follows:

{|style="border: 1px solid lightGray"
!Normalizer||Description||
|-
|-

|Missing value calculation ||Replaces every missing value with either the mean value of that marker across all microarrays or with the mean measurement of all markers in the microarray where the missing value is observed
|-
|-
|Log2 Transformation ||Applies a log2 transformation to all measurements in a microarray
|-
|-
|-
|Threshold Normalizer ||All data points whose value is less than (or greater than) a user-specified minimum (maximum) value are raised (reduced) to that minimum (maximum) value
|-
|-
|-
|-
|Marker-based Centering ||Subtracts the mean (median) measurement of a marker profile from every measurement in the profile
|-
|-
|-
|Array-based centering ||Subtracts the mean (median) measurement of a microarray from every measurement in that microarray
|-
|-
|-
|Mean-variance normalizer ||For every marker profile, the mean measurement of the entire profile is subtracted from each measurement in the profile and the resulting value is divided by the standard deviation
|-
|-
|}
[[Image:Normalpanel.gif]]
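Several of the normalizers in the table reduce to one-line transformations of a marker profile (or of an array). The sketch below illustrates them in plain Python; the function names and the sample profile are assumptions, and the use of the sample standard deviation in the mean-variance normalizer is a choice made here, since the text does not say which estimator geWorkbench uses.

```python
import math
import statistics


def log2_transform(row):
    """Log2 transformation of every measurement (values must be > 0)."""
    return [math.log2(v) for v in row]


def threshold(row, minimum=None, maximum=None):
    """Raise values below `minimum` and reduce values above `maximum`."""
    out = []
    for v in row:
        if minimum is not None and v < minimum:
            v = minimum
        if maximum is not None and v > maximum:
            v = maximum
        out.append(v)
    return out


def center(row, use_median=False):
    """Subtract the mean (or median) of the row from every measurement."""
    c = statistics.median(row) if use_median else statistics.mean(row)
    return [v - c for v in row]


def mean_variance(row):
    """Subtract the mean, then divide by the (sample) standard deviation."""
    mu = statistics.mean(row)
    sd = statistics.stdev(row)  # sample SD: an assumption of this sketch
    return [(v - mu) / sd for v in row]


profile = [2.0, 4.0, 8.0, 16.0, 32.0]   # hypothetical marker profile
logged = log2_transform(profile)
clipped = threshold(profile, minimum=4.0, maximum=16.0)
centered = center(logged)
zscores = mean_variance(logged)
print(logged, clipped, centered)
```

Passing a marker profile to <code>center</code> gives marker-based centering; passing the measurements of one microarray gives array-based centering.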

'''Apply Quantile Normalizer'''

1. In the Normalization Panel, select ''Quantile Normalizer''.

2. Leave the default averaging method of ''Mean Profile Marker'' to indicate handling of missing values.

3. Click '''Normalize'''. The View Area is updated to reflect normalization (after the screen has been refreshed).
Note: The first value in the second row was updated from ''41,394.6'' to ''55,779.26''.

{|style="border: 1px solid lightGray"
!PRENORMALIZATION||NORMALIZED||
|-
|-|-
|-
| [[Image:Prenormalizer_ed.gif ]] || [[Image: Postnormalizera.gif ]]
|-
|-
|-
|}
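For readers unfamiliar with quantile normalization, the following sketch shows the textbook form of the technique: each array's values are replaced by the mean of the values that share the same rank across all arrays. It is not taken from geWorkbench; the function name and data are hypothetical, ties are not treated specially, and the missing-value handling selected in step 2 is not modeled.

```python
def quantile_normalize(arrays):
    """Textbook quantile normalization.

    `arrays` is a list of equal-length lists, one list per microarray.
    Returns new lists; the input is left unchanged.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Mean of the k-th smallest value across all arrays, for each rank k
    rank_means = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]

    result = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # indices, smallest value first
        normalized = [0.0] * n
        for rank, i in enumerate(order):
            normalized[i] = rank_means[rank]
        result.append(normalized)
    return result


# Two hypothetical arrays with three markers each
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(arrays)
print(normalized)
```

After normalization every array contains the same set of values, only in a different order, which is why individual entries in the tabular view change.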

== Clustering Gene Expression Data== ken
*Hierarchical Clustering
* Self Organizing Map (SOM)

== Differential Expression ==
In this tutorial, you will:

* Get acquainted with the T Test and Multi T Test
* Apply a T Test and Multi T Test

Before you can continue, geWorkbench should be running. For help with installation, please refer to the Getting Started (INSERT LINK) section. Load the tutorial file cardiomyopathy.exp. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===T Test===

T Test analysis identifies markers with statistically significant differential expression between sets of microarrays. For each marker, the t-test determines whether there is a significant difference between the two groups (case and control). To perform this analysis, you must classify the panels, set the analysis parameters and view the results in the visualization components. A detailed description of the T Test parameters is available in the online help.

'''Classify the Panels'''

1. Mark the '''Cardio''' phenotype as 'Case'. By default, panels are marked as control. Panels classified as Case are shown with a red thumbtack icon.
* Right-click on '''Cardio''' phenotype.
* Select '''Classification'''>'''Case'''.
2. Activate the '''Normal''' and '''Cardio''' arrays by selecting the checkboxes next to the panel names.

[[Image:T_PhenotypeSettingCase.png]]

'''Set Analysis Parameters'''
# From the Analysis Panel, select '''T-Test Analysis'''.
# Set the parameter values listed below and click on '''Analyze'''.

* Alpha-corrections tab: Just Alpha.
* P-Value Parameters tab: p-values based on t-distribution. Note that the default alpha (critical p-value) is set to 0.01.
* Degree of Freedom tab: Welch approximation - unequal group variances.

[[Image:Ttest.gif]] 
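The "Welch approximation - unequal group variances" option corresponds to the standard Welch t-test. The sketch below computes the t statistic and the Welch-Satterthwaite degrees of freedom for a single marker; it is an illustration of the standard formulas, not geWorkbench's implementation, and the function name and sample values are hypothetical. The p-value, which would be read from the t-distribution with <code>df</code> degrees of freedom and compared against alpha, is not computed here.

```python
import math
import statistics


def welch_t(case, control):
    """Return (t, df) for Welch's unequal-variance t-test on one marker."""
    n1, n2 = len(case), len(control)
    v1, v2 = statistics.variance(case), statistics.variance(control)  # sample variances
    se2 = v1 / n1 + v2 / n2  # squared standard error of the difference in means
    t = (statistics.mean(case) - statistics.mean(control)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df


# Hypothetical expression values of one marker in the two groups
case = [8.0, 9.0, 10.0, 11.0, 12.0]
control = [4.0, 5.0, 6.0]
t, df = welch_t(case, control)
print(round(t, 4), round(df, 4))
```

In the tutorial the same calculation is repeated for every marker, and markers whose p-value falls below alpha (0.01 by default) are reported as significant.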

'''T-Test Results'''

{|style="border: 1px solid lightGray"
!|| ||
|-
|-
| Markers which met the significance test are included in a new gene panel called “Significant Genes”. || [[Image:E_ttestgpanel.png]]
|-
|-
| An ancillary dataset is created in the project window. || [[Image:Ed_ttestproj.png]]
|-
|}

The values of the T-Test can be seen in the Color Mosaic panel and the Volcano Plot.

{|style="border: 1px solid lightGray"
!VOLCANO PLOT||COLOR MOSAIC||
|-
|-|-
|-
|-
| [[Image:Vplot.png]] || [[Image:Ed_cm.png]]
|-
|-
|-
| Clicking on any of the spots highlights the selected marker in the Marker Panel. * Insert another description ||
* The label to the right displays the significance value (the lower the value, the more likely the expression differs between groups) and the gene name for each displayed gene. The genes are displayed in ascending order of significance value.

* Gene height and width values can be altered to modify the display.

* The intensity slider is used to modify the intensity of the color coding.

* Accession: Includes the accession number in the label.

* Printer Icon: Prints the displayed image.

* Display: Must be toggled on to display data.

* ''Pat, Abs, Ratio Overlapping Pages Icon: Not the T Test display.''

|-
|-
|}

=== Multi T Test=== '''(IN PROGRESS)'''

== Regulatory Network ==
In this tutorial, you will:

* Create a gene network in Reverse Engineering
* View the network in Cytoscape
* Create a Marker panel from the Cytoscape network

Before you can continue, geWorkbench should be running. The file [[Webmatrix.EXP]] is used in this tutorial. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Reverse Engineering===

The Reverse Engineering component is used to analyze a large amount of microarray data to reverse engineer the underlying gene regulatory network. The details of this algorithm are described in "ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context", Califano et al. (http://arxiv.org/abs/q-bio.MN/0410037).

'''Create a Network'''
{|style="border: 1px solid lightGray"
|-
|1. In the Gene Panel, enter '''1973_s_at''' in the '''Find Next''' text box. The list box will navigate to '''1973_s_at'''; click on that gene.||[[Image: E_panel.PNG]]
|-
|2. Go to the Reverse Engineering component in the View pane. You should be in the first tab, “Profiler”, and in “Basic”. The marker '''1973_s_at''' should be listed in the Hub Gene box.||[[Image:E_RE1.PNG]]
|-
|3. Click on '''Analyze 2D'''. A list of markers, sorted by interaction strength, is returned below the '''Search Box'''.||
|-
|4. Select all the genes with score >= 5 by clicking on the first marker, '''[12.53] 37724_at''', and then using Shift+Down Arrow to extend the selection through the values >= 5.||[[Image: E_re2.png]]
|-
|5. Click '''Create Network'''. The network created is displayed in the Cytoscape view.||
|}
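The selection made in step 4 (all markers with score >= 5) can be expressed as a simple filter over the result list. The sketch below assumes each entry has the form shown in the tutorial, a score in square brackets followed by the marker name; only the first entry comes from the tutorial, while the other entries, the function name, and the regular expression are hypothetical.

```python
import re

# "[score] marker" as displayed in the result list, e.g. "[12.53] 37724_at"
ENTRY = re.compile(r"\[(?P<score>[0-9.]+)\]\s+(?P<marker>\S+)")


def markers_above(entries, cutoff=5.0):
    """Return the marker names whose interaction score is >= cutoff."""
    selected = []
    for entry in entries:
        match = ENTRY.match(entry)
        if match and float(match.group("score")) >= cutoff:
            selected.append(match.group("marker"))
    return selected


entries = [
    "[12.53] 37724_at",           # from the tutorial
    "[6.10] hypothetical_1_at",   # made-up entry for illustration
    "[4.99] hypothetical_2_at",   # made-up entry for illustration
]
selected = markers_above(entries)
print(selected)
```

Because the list is already sorted by interaction strength, the selected markers always form a contiguous block at the top, which is why a Shift-selection works in the interface.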

'''View the Network in Cytoscape'''

A full description of the capabilities and functionality of Cytoscape can be found at http://www.cytoscape.org/.

# In the Cytoscape Layout menu, select '''yFiles/ Organic''' to modify the network display.
# In this network image, select the central marker. It should turn yellow. [[Image:E_network.png]]
# In the Select menu, select Nodes>First neighbors of selected nodes>Shape>Triangle. The first neighbors change from squares to triangles.
# Ctrl+mouse-select the network. The selected nodes turn yellow. The selected genes are returned to the Gene Panel under the name Selected Genes[Cytoscape].

== Enrichment Analysis==

In this tutorial, you will:

* Gene Ontology

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

'''Gene Ontology'''

The Gene Ontology component categorizes genes in terms of the function their products perform, their cellular localization, or their involvement in high-level biological processes. The category definitions are provided by the Gene Ontology Consortium (http://www.geneontology.org), and geWorkbench uses them to generate a tree structure that approximates the Directed Acyclic Graph structure used by the Consortium. A detailed description of the Gene Ontology parameters is available in the online help.

1. Create a Marker panel with the following markers:
* AFFX-hum_alu_at
* AFFX-LysX-M_at
* AFFX-LysX-5_at
* AFFX-LysX-3_at

For assistance in creating a marker panel, please refer to Working with Markers and Phenotype Tutorial (INSERT LINK).

2. Click '''Map List'''. As the application creates a tree for the selected ontology, a pop-up window will display the progress. The mapping is based on the GO annotation provided by Affymetrix in their annotation files.[[Image:E_map.png]]

3. The resulting nodes in the mapped tree display two numbers separated by a forward slash, such as '''cell(1/6058)'''.
*The first number is the number of genes from the active marker panel that map to a specific category and all its child nodes.
*The second number, after the slash, is the corresponding count for the reference list (or for the entire chip, depending on the reference list check-box selection and on whether a reference list has been loaded from the file system).

[[Image:E_goon.png]]
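The two numbers in a node label can be reproduced with a small recursive count over the category tree. The sketch below is only an illustration of the counting rule described in step 3; the category names, gene identifiers, and function names are invented, and geWorkbench's actual data structures are not known from this tutorial.

```python
def genes_under(category, children, annotations):
    """Set of genes annotated to `category` or to any of its descendants."""
    genes = set(annotations.get(category, ()))
    for child in children.get(category, ()):
        genes |= genes_under(child, children, annotations)
    return genes


def node_label(category, children, annotations, panel, reference):
    """Build a label of the form name(panel count/reference count)."""
    mapped = genes_under(category, children, annotations)
    return f"{category}({len(mapped & panel)}/{len(mapped & reference)})"


# Hypothetical miniature ontology and annotation
children = {"cell": ["membrane", "nucleus"]}
annotations = {"cell": ["g1"], "membrane": ["g2", "g3"], "nucleus": ["g3", "g4"]}
reference = {"g1", "g2", "g3", "g4", "g5"}   # e.g. every gene on the chip
panel = {"g3", "g5"}                          # genes in the active marker panel

label = node_label("cell", children, annotations, panel, reference)
print(label)
```

Using sets means a gene annotated under two child categories (such as <code>g3</code> here) is counted only once at the parent, which matters because the underlying ontology is a Directed Acyclic Graph rather than a strict tree.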

The Table View and P Value Trend provide alternate views of the mappings obtained in the Tree View.

{|style="border: 1px solid lightGray"
!TABLE VIEW||P VALUE TREND||
|-
|-|-
|-
| [[Image:E_gotable.png]]||[[Image:E_pvalueT.png]]
|-
| '''Recompute:''' Recomputes the statistics based on the current widget settings (p-value metric, Min#).||

'''Plot:''' Draws the plot based on the selections in the Tree View and the Step Size widget setting.

|-

|}

== Sequence Analysis== ken
* Sequence Retrieval
* Sequence Homology Analysis
** Blast
** Other

==Pattern Discovery== ken
* Position Histogram

== Promoter Analysis==

==Integrated Annotation Information==
In this tutorial, you will:

* rrrrr
* rrrrr

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

Perform the following steps to view annotations of a dataset ('''''not recommended for large sets of genes'''''):

1. In the Marker Annotations in the View pane, click '''Use Panels.'''

A table of gene names and links to pathway diagrams is returned if available from the NCI CGAP database.

2. Clicking on gene names returns annotations from the NCI CGAP database in your default web browser window.

3. Clicking on the pathway diagram name displays a pathway diagram in the caBIO Pathways panel in the View pane.
In the caBIO pathways viewer, pathway gene symbols can in turn be clicked to display their gene information in the default web browser window.

User:Daly

2006-02-24T19:23:18Z

Daly: /* Enrichment Analysis */

= Overview=

geWorkbench is an open-source bioinformatics platform that offers a comprehensive and extendible collection of tools for the management, analysis, visualization and annotation of biomedical data. This tool is aimed at providing researchers a centralized repository for the data analysis and visualization of gene expression data, gene sequences and protein sequences.

'''What can you do with geWorkbench?'''

* Use one program that integrates with multiple existing bioinformatics modules for analysis and visualization.

* Access remote servers and clusters for the performance of computationally intensive calculations - quicker!

* Streamline data analysis with:
** flexible import options that support merging files from various sources ''(insert possible data source ie RMA express).''
** support for a variety of genomic data including microarrays, sequences, pathways, networks, alignments and phenotypes.
** access analyses with biological annotations from the National Cancer Institute.

* Community: Insert developer benefit ( plugin)

The diagram illustrates the use of geWorkbench by Researchers.

[[Image:slide1.gif]]

For detailed system documentation, please see the Documentation section (INSERT LINK).

= Tutorial=

===Welcome! ===

Welcome to the geWorkbench tutorial. In the following tutorials you will learn how to use geWorkbench. This is an example-based, illustrated guide for both beginners and experienced users. This tutorial was last updated in '''DATE''' and reflects changes to geWorkbench through '''DATE'''.

===Using the Tutorial ===

These tutorials assume that you have successfully completed the installation instructions. If you haven't installed the entire program, please go to [[Download]] and follow the installation instructions.

The tutorials are organized into a number of separate topics. Sample data sets are used throughout the tutorials. Each tutorial is self-contained and does not depend on any other tutorial for sample data. Sample data is located '''''(INSERT LINK or Instructions on how to download)'''''

'''(we should have sample data labeled Tutorial #)'''

Within the tutorials there are two basic types of text:
*Text that looks like this explains topics. 

# Text in numbered steps is instructions for you to follow using tutorial data files.

''' insert instruction on how to navigate between tutorials'''.

==Getting Started== MV
* Starting the application
* GUI elements
** Panels
** Navigation
== Loading Data== MV
* Data formats

== Working with Marker and Phenotype Panels==
In this tutorial, you will:
* Become familiar with the use of panels in geWorkbench
* Create active phenotype panels
* Classify the panels you created

Before you can continue, geWorkbench should be running. For help with this, please refer to the '''''Getting Started''''' (insertlink) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Creating Panels===

When working with microarrays, geWorkbench uses the term '''''marker''''' to refer to a gene probe (in other cases, it can refer to individual items from other data sets, such as sequences).
'''''Phenotype''''' refers to any user-defined grouping of microarrays. These microarrays will often share some common property that in most cases is phenotypic, although this is not a requirement. For example, one such “phenotype” might be a single experiment on a tumor tissue sample, with a second “phenotype” defined as a collection of experiments performed on normal tissue samples.

====Assign Panel====

1. In the Phenotype panel, select the following arrays, which contain samples from the congestive cardiomyopathy disease state:
JB-ccmp_0120.txt, JB-ccmp_0218.txt, JB-ccmp0718.txt, JB-ccmp0811.txt, JB-ccmp1003.txt, JB-ccmp_1109.txt

2. Right-click, select '''Add to Panel'''.

3. Enter "Cardio" in the input box and click OK.

[[Image:T_PanelLabelCardio.png]]

4. Next, similarly label the following arrays as "Normal" (''repeat steps 2 & 3'').
JB-n_0106.txt , JB-n_0821.txt, JB-n_0915.txt, JB-n_1303.txt

5. Check the boxes next to the panel names to mark these groups of arrays as "Active". Various analysis and visualization components can be set to use or display only activated arrays or markers.

[[Image:T_PhenotypesPriorToCase.png]]

====Classify panel====

For statistical tests such as the t-test, Case and Control groups can be specified.

# Left-click on the thumb-tack icon in front of the phenotype name.
# Select Case to specify the disease arrays as the "Case". The remaining "Normal" arrays are by default labeled control.
[[Image:T_PhenotypeSettingCase.png]]

A red thumbtack indicates the arrays have been specified as "Case".

[[Image:T_PhenotypeCaseSet.png]]

== Visualize Gene Expression==
In this tutorial, you will:

* Get acquainted with the various geWorkbench visualization tools
* View a dataset in geWorkbench
* Modify the visualization preference settings

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file '''''cardiomyopathy.exp''''' used in this tutorial is from the Load Data (insertlink) tutorial.

===Visualization Tools===

Visualization tools provide a view of the chip(s) under investigation and can be used for ascertaining the quality of the data. Active gene and phenotype panels (insertlink) restrict the number of markers/arrays displayed. The images created can be saved and exported. A detailed description of how to manipulate the visualization components is available in the online help.

{|style="border: 1px solid lightGray"
|-
|'''Microarray View''': Used to inspect each microarray separately using the Array scroll bar.||[[Image: Ema.png]]
|-
|'''Tabular Microarray Panel''': Presents the numerical values of the expression measurements in a table format. One row is created per individual marker/probe and one column per microarray.||[[Image:Etab.png]]
|-
|'''Color Mosaic''': Heat maps for microarray expression data, organized by phenotypic or gene groupings.||[[Image:Ed_cm.png]]
|-
|'''Expression Profiles''': A line graph of gene expression profiles across several arrays/hybridizations. Each marker is drawn as a separately colored line.||[[Image:Eep.png]]
|-
|'''Scatter Plot''': A pairwise (''array vs. array and marker vs. marker'') comparison and plotting of expression values.||[[Image:Esp.png]]
|}

'''View a dataset'''
# Select '''''cardiomyopathy.exp''''' in the Project Panel.
# Select the '''''Microarray Panel''''' visualization component in the View Area at the top-right section of the interface.
# Deselect the All Markers checkbox to display the entire dataset.

Note: The '''''All Arrays''''' and '''''All Markers''''' checkboxes determine which data points are included in the display.
If neither is checked, then the entire data set is shown. The All Arrays control is useful when working with data sets comprising multiple arrays. In this case, only those arrays that are included in a currently activated phenotype panel will be displayed.

[[Image: Allm.png]]

=== Preferences===
The Preferences selection in the Tools menu allows users to specify how certain aspects of the system will behave. Once set, preferences are applied at once and persist between application sessions.

'''Modifying Settings'''

1. From the main menu, click on '''Tools>Preferences'''.

2. In the Preferences pop-up window, you can define settings for:
*Text Editor: The editor selected will be used to open and inspect data sets loaded in a project. Notepad is the default setting.
* Visualization: The color scheme to be applied to color mosaic images.
**Absolute: (default) Let M = max{|min|, |max|} over all expression measurements, across all arrays. If expression value x > 0, assign it the red spectrum x / M * 256. If expression value x is negative, assign it to the green spectrum -x / M * 256.
** Relative: This is similar to the setting for Absolute, but each marker is mean-variance normalized first.
* Genepix Value Computation: You can specify how to compute the value displayed for a Genepix array. The default setting is the option (Mean F635 - Mean B635) / (Mean F532 - Mean B532).

Select '''Relative''' for the visualization preference.

3. Click on '''OK'''.

[[Image: Preferences.gif]]

== Filter and Normalize Data==
In this tutorial, you will:
* Get acquainted with the various filters and normalizers available in geWorkbench
* Apply a filter and normalizer on a tutorial dataset

Before you can continue, geWorkbench should be running. For help with installation, please refer to the '''''Getting Started''''' (INSERT LINK) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Filter===

Filtering can be used to screen out missing data points, remove low quality data or reduce the size of the dataset by removing less interesting data. Available geWorkbench filters are as follows:

{|style="border: 1px solid lightGray"
!Filter||Description
|-
|'''Affy Detection Call'''||Applicable to Affymetrix data only. Sets as missing all measurements whose detection status is any user-defined combination of A, P or M.
|-
|'''Missing values'''||Discards all markers that have “missing” measurements in at least n microarrays, where n is defined by the user. Another filter must be applied first, however, to generate the missing values on which this filter operates.
|-
|'''Deviation'''||Sets as missing all markers whose measurements have a deviation below a given value across all microarrays.
|-
|'''Expression Threshold'''||Sets as missing all markers whose measurements are inside (or outside) a user-defined range.
|-
|'''2 Channel'''||Applicable to 2-channel array (Genepix) data only. Defines applicable ranges for each channel, and sets as missing all values for which either channel intensity is inside (or outside) the defined range.
|}

[[Image:Filterpanel.gif]]

Perform the following steps to filter out data called absent in an Affymetrix file:
# In the Filtering Panel, select '''Affy Detection Call Filter'''.
# Select the ‘A’ (Absent) checkbox and click '''Filter'''. Values that were removed (marked as missing) are highlighted in yellow.
# In the Filtering Panel, select '''Missing Values Filter'''.
# Choose the maximum number of arrays that can have missing values before a marker is removed (the default is 0).
# Click '''Filter'''. Markers with more than 0 missing values are removed. You will notice that the yellow values are gone.

{|style="border: 1px solid lightGray"
!'''Affy Detection Call Filter'''||'''Missing Values Filter'''||
|-
|-|-
|-
| [[Image: Filtered.gif]]||[[Image:Mvfilter.gif]]
|-
|}

===Normalize===
Normalization can be used to decrease the effects of systematic differences across a set of experiments.
In geWorkbench, normalization replaces the original values with new values. Available geWorkbench normalization methods are as follows:

{|style="border: 1px solid lightGray"
!Normalizer||Description||
|-
|-

|Missing value calculation ||Replaces every missing value with either the mean value of that marker across all microarrays or with the mean measurement of all markers in the microarray where the missing value is observed
|-
|-
|Log2 Transformation ||Applies a log2 transformation to all measurements in a microarray
|-
|-
|-
|Threshold Normalizer ||All data points whose value is less than (or greater than) a user-specified minimum (maximum) value are raised (reduced) to that minimum (maximum) value
|-
|-
|-
|-
|Marker-based Centering ||Subtracts the mean (median) measurement of a marker profile from every measurement in the profile
|-
|-
|-
|Array-based centering ||Subtracts the mean (median) measurement of a microarray from every measurement in that microarray
|-
|-
|-
|Mean-variance normalizer ||For every marker profile, the mean measurement of the entire profile is subtracted from each measurement in the profile and the resulting value is divided by the standard deviation
|-
|-
|}
[[Image:Normalpanel.gif]]

'''Apply Quantile Normalizer'''

1. In the Normalization Panel, select ''Quantile Normalizer''.

2. Leave the default averaging method of ''Mean Profile Marker'' to indicate handling of missing values.

3. Click '''Normalize'''. The View Area is updated to reflect normalization (after the screen has been refreshed).
Note: The first value in the second row was updated from ''41,394.6'' to ''55,779.26''.

{|style="border: 1px solid lightGray"
!PRENORMALIZATION||NORMALIZED||
|-
|-|-
|-
| [[Image:Prenormalizer_ed.gif ]] || [[Image: Postnormalizera.gif ]]
|-
|-
|-
|}

== Clustering Gene Expression Data== ken
*Hierarchical Clustering
* Self Organizing Map (SOM)

== Differential Expression ==
In this tutorial, you will:

* Get acquainted with the T Test and Multi T Test
* Apply a T Test and Multi T Test

Before you can continue, geWorkbench should be running. For help with installation, please refer to the Getting Started (INSERT LINK) section. Load the tutorial file cardiomyopathy.exp. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===T Test===

T Test analysis identifies markers with statistically significant differential expression between sets of microarrays. For each marker, the t-test determines whether there is a significant difference between the two groups (case and control). To perform this analysis, you must classify the panels, set the analysis parameters and view the results in the visualization components. A detailed description of the T Test parameters is available in the online help.

'''Classify the Panels'''

1. Mark the '''Cardio''' phenotype as 'Case'. By default, panels are marked as control. Panels classified as Case are shown with a red thumbtack icon.
* Right-click on '''Cardio''' phenotype.
* Select '''Classification'''>'''Case'''.
2. Activate the '''Normal''' and '''Cardio''' arrays by selecting the checkboxes next to the panel names.

[[Image:T_PhenotypeSettingCase.png]]

'''Set Analysis Parameters'''
# From the Analysis Panel, select '''T-Test Analysis'''.
# Set the parameter values listed below and click '''Analyze'''.

* Alpha-corrections tab: Just Alpha.
* P-Value Parameters tab: p-values based on t-distribution. Note that the default alpha (critical p-value) is set to 0.01.
* Degree of Freedom tab: Welch approximation - unequal group variances.

[[Image:Ttest.gif]] 
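
The ''Welch approximation'' setting corresponds to the standard unequal-variance t-test. The sketch below shows, for a single marker, how the t statistic and the Welch-Satterthwaite degrees of freedom are computed. It is an illustration under stated assumptions rather than the application's code: the function name and the sample values are made up, and converting t and the degrees of freedom to a p-value needs a t-distribution CDF, which is not in the Python standard library and is left out here.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(case, control):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom) for one marker."""
    n1, n2 = len(case), len(control)
    # Squared standard errors of the two group means (sample variances).
    se1 = variance(case) / n1
    se2 = variance(control) / n2
    t = (mean(case) - mean(control)) / sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df

# Expression of one marker in three Case arrays and four Control arrays (made-up values).
t, df = welch_t([10.0, 12.0, 14.0], [1.0, 2.0, 3.0, 4.0])
```

A marker is then called significant when the p-value derived from t and df falls below the chosen alpha (0.01 by default in this tutorial).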

'''T-Test Results'''

{|style="border: 1px solid lightGray"
!|| ||
|-
|-
| Markers that meet the significance test are placed in a new gene panel called “Significant Genes”. || [[Image:E_ttestgpanel.png]]
|-
|-
| An ancillary dataset is created in the project window. || [[Image:Ed_ttestproj.png]]
|-
|}

The values of the T-Test can be seen in the Color Mosaic panel and the Volcano Plot.

{|style="border: 1px solid lightGray"
!VOLCANO PLOT||COLOR MOSAIC||
|-
|-|-
|-
|-
| [[Image:Vplot.png]] || [[Image:Ed_cm.png]]
|-
|-
|-
| Clicking on any of the spots highlights the marker selected in the Marker Panel. * Insert another description ||
* The label to the right displays the Significance value (the lower the value, the more likely the marker is differentially expressed) and the gene name for each displayed gene. The genes are displayed in ascending order by Significance value.

* Gene height and width values can be altered to modify the display.

* The intensity slider is used to modify the intensity of the color codings.

* Accession: Includes the accession number in the label.

* Printer Icon: Prints the displayed image.

* Display: Must be toggled on to display data.

* ''Pat, Abs, Ratio Overlapping Pages Icon: Not the T Test display.''

|-
|-
|}

=== Multi T Test=== '''(IN PROGRESS)'''

== Regulatory Network ==
In this tutorial, you will:

* Create a gene network in Reverse Engineering
* View the network in Cytoscape
* Create a Marker panel from the Cytoscape network

Before you continue, geWorkbench should be running. The file [[Webmatrix.EXP]] is used in this tutorial. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Reverse Engineering===

The Reverse Engineering component analyzes a large amount of microarray data to reverse engineer the underlying gene regulatory network. The details of this algorithm are described in "ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context", Califano et al. (http://arxiv.org/abs/q-bio.MN/0410037).
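
ARACNE scores candidate interactions by the mutual information between pairs of expression profiles. As a rough illustration of that quantity only, the sketch below estimates mutual information with a simple equal-width histogram. This is a deliberate simplification and not the estimator used by ARACNE or by geWorkbench; the function names and the bin count are assumptions made for the example.

```python
from collections import Counter
from math import log

def discretize(values, bins):
    """Assign each value to one of `bins` equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # a constant profile falls into a single bin
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(x, y, bins=3):
    """Histogram estimate of the mutual information (in nats) between two profiles."""
    bx, by = discretize(x, bins), discretize(y, bins)
    n = len(x)
    px, py, pxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    # Sum of p(a,b) * log(p(a,b) / (p(a) * p(b))) over the observed bin pairs.
    return sum((c / n) * log((c * n) / (px[a] * py[b])) for (a, b), c in pxy.items())

hub = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]          # made-up hub gene profile
mi_same = mutual_information(hub, hub)         # a profile is maximally informative about itself
mi_flat = mutual_information(hub, [7.0] * 6)   # a constant profile carries no information
```

Higher scores indicate stronger statistical dependence between two markers, which is the sense in which the marker list in the next section is sorted by interaction strength.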

'''Create a Network'''
{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|1. In the Gene Panel, enter '''1973_s_at''' in the '''Find Next''' text box. The list box scrolls to '''1973_s_at'''; click on that gene.||[[Image: E_panel.PNG]]

|-
|2. Go to the Reverse Engineering component in the View pane. You should be in the first tab, “Profiler”, and in “Basic”. The marker '''1973_s_at''' should be listed in the Hub Gene box.||[[Image:E_RE1.PNG]]
|-
|3. Click on '''Analyze 2D'''. A list of markers is returned sorted by interaction strength below '''Search Box'''. ||
|-
|-
|-
|4. Select all the genes with score >= 5 by clicking on the first marker, '''[12.53] 37724_at''', and then pressing Shift+Down Arrow until all values >= 5 are highlighted.||[[Image: E_re2.png]]
|-
|-
|-
|5. Click '''Create Network'''. The resulting network is displayed in the Cytoscape view.||
|-
|-
|}

'''View the Network in Cytoscape'''

A full description of the capabilities and functionality of Cytoscape can be found at http://www.cytoscape.org/.

# In the Cytoscape Layout menu, select '''yFiles/ Organic''' to modify the network display.
# In this network image, select the central marker. It should turn yellow. [[Image:E_network.png]]
# In the Select menu, select Nodes>First neighbors of selected nodes>Shape>Triangle. The first neighbors change from a square shape to a triangle.
# Ctrl+mouse select the network. The selected nodes turn yellow. The selected genes are returned to the Gene Panel under the name Selected Genes[Cytoscape].

== Enrichment Analysis==

In this tutorial, you will:

* Gene Ontology

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

'''Gene Ontology'''

The Gene Ontology component categorizes genes in terms of the functions their products perform, their cellular localization, or their involvement in high-level biological processes. The category definitions are provided by the Gene Ontology Consortium (http://www.geneontology.org), and geWorkbench uses them to generate a tree structure that approximates the Directed Acyclic Graph structure used by the Consortium. A detailed description of the Gene Ontology parameters is available in the online help.

1. Create a Marker panel with the following markers:
* AFFX-hum_alu_at
* AFFX-LysX-M_at
* AFFX-LysX-5_at
* AFFX-LysX-3_at
For assistance in creating a marker panel, please refer to Working with Markers and Phenotype Tutorial (INSERT LINK).

2. Click '''Map List'''. As the application creates a tree for the selected ontology, a pop-up window displays the progress. The mapping is based on the GO annotations provided by Affymetrix in their annotation files.[[Image:E_map.png]]

3. Each node in the mapped tree displays two numbers separated by a forward slash, such as '''cell(1/6058)'''.
*The first number is the number of genes from the active marker panel that map to the category or to any of its child nodes.
*The second number is the corresponding count for the reference list (or for the entire chip, depending on the reference list checkbox selection and on whether a reference list has been loaded from the file system).

[[Image:E_goon.png]]
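
The two numbers on each node can be reproduced with a short recursive count. The Python sketch below is only an illustration: the tiny tree, the gene identifiers and the function names are hypothetical, and counting a gene once even when it is annotated to several child categories is an assumption about how the application counts.

```python
def genes_under(term, children, annotations):
    """Genes annotated to `term` or to any category below it in the tree."""
    genes = set(annotations.get(term, ()))
    for child in children.get(term, ()):
        genes |= genes_under(child, children, annotations)
    return genes

def node_label(term, children, annotations, panel, reference):
    """Build a label of the form term(panel count/reference count)."""
    covered = genes_under(term, children, annotations)
    return f"{term}({len(covered & panel)}/{len(covered & reference)})"

# Hypothetical three-node tree with direct annotations per category.
children = {"cell": ["membrane", "nucleus"]}
annotations = {"cell": {"g1"}, "membrane": {"g2", "g3"}, "nucleus": {"g3", "g4"}}
panel = {"g2"}                                # genes in the active marker panel
reference = {"g1", "g2", "g3", "g4", "g5"}    # reference list or entire chip

label = node_label("cell", children, annotations, panel, reference)
```

Because the counts include all child nodes, the numbers shown on a parent category are never smaller than those on any of its children.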

The Table View and P Value Trend provide alternate views of the mappings obtained in the Tree View.

{|style="border: 1px solid lightGray"
!TABLE VIEW||P VALUE TREND||
|-
|-|-
|-
| [[Image:E_gotable.png]]||[[Image:E_pvalueT.png]]
|-
| '''Recompute:''' Re-computes the statistics based on the current widget settings (p-value metric, Min#).||

'''Plot:''' Draws the plot based on the selections in the Tree View and the Step Size widget setting.

|--

|}

== Sequence Analysis== ken
* Sequence Retrieval
* Sequence Homology Analysis
** Blast
** Other

==Pattern Discovery== ken
* Position Histogram

== Promoter Analysis==

==Integrated Annotation Information==
In this tutorial, you will:

* rrrrr
* rrrrr

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

Perform the following steps to view annotations of a dataset (''not recommended for large sets of genes''):

1. In the Marker Annotations component in the View pane, click '''Use Panels'''.

A table of gene names and links to pathway diagrams is returned if available from the NCI CGAP database.

2. Clicking on gene names returns annotations from the NCI CGAP database in your default web browser window.

3. Clicking on a pathway diagram name displays the pathway diagram in the caBIO Pathways panel in the View pane.
In the caBIO Pathways viewer, the gene symbols in the pathway can in turn be clicked to display their gene information in the default web browser window.

User:Daly

2006-02-24T19:20:27Z

Daly: /* Enrichment Analysis */

= Overview=

geWorkbench is an open-source bioinformatics platform that offers a comprehensive and extendible collection of tools for the management, analysis, visualization and annotation of biomedical data. This tool is aimed at providing researchers a centralized repository for the data analysis and visualization of gene expression data, gene sequences and protein sequences.

'''What can you do with geWorkbench?'''

* Use one program that integrates with multiple existing bioinformatics modules for analysis and visualization.

* Access remote servers and clusters to run computationally intensive calculations faster.

* Streamline data analysis with:
** flexible import options that support merging files from various sources ''(insert possible data source ie RMA express).''
** support for a variety of genomic data including microarrays, sequences, pathways, networks, alignments and phenotypes.
** access analyses with biological annotations from the National Cancer Institute.

* Community: Insert developer benefit ( plugin)

The diagram illustrates the use of geWorkbench by Researchers.

[[Image:slide1.gif]]

For detailed system documentation, please see the Documentation section (INSERT LINK).

= Tutorial=

===Welcome! ===

Welcome to the geWorkbench tutorial. In the following tutorials you will learn how to use geWorkbench. This is an example-based, illustrated guide for both beginners and experienced users. This tutorial was last updated in '''DATE''' and reflects changes to geWorkbench through '''DATE'''.

===Using the Tutorial ===

These tutorials assume that you have successfully completed the installation instructions. If you haven't installed the entire program, please go to [[Download]] and follow the installation instructions.

The tutorials are organized into a number of separate topics. Sample data sets are used throughout the tutorials. Each tutorial is self-contained and does not depend on any other tutorial for sample data. Sample data is located '''''(INSERT LINK or Instructions on how to download)'''''

'''(we should have sample data labeled Tutorial #)'''

Within the tutorials there are two basic types of text:
*Text that looks like this explains topics. 

# Text in numbered steps is instructions for you to follow using tutorial data files.

''' insert instruction on how to navigate between tutorials'''.

==Getting Started== MV
* Starting the application
* GUI elements
** Panels
** Navigation
== Loading Data== MV
* Data formats

== Working with Marker and Phenotype Panels==
In this tutorial, you will:
* Become familiar with the use of panels in geWorkbench
* Create active phenotype panels
* Classify the panels you created

Before you continue, geWorkbench should be running. For help with this, please refer to the '''''Getting Started''''' (insertlink) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Creating Panels===

When working with microarrays, geWorkbench uses the term '''''marker''''' to refer to a gene probe (in other cases, it can refer to individual items from other data sets, such as sequences).
'''''Phenotype''''' refers to any user-defined grouping of microarrays. These microarrays will often share some common property that in most cases is phenotypic, although this is not a requirement. For example, one such “phenotype” might be a single experiment on a tumor tissue sample, with a second “phenotype” defined as a collection of experiments performed on normal tissue samples.

====Assign Panel====

1. In the Phenotype panel, select the following arrays, which contain samples from the congestive cardiomyopathy disease state:
JB-ccmp_0120.txt, JB-ccmp_0218.txt, JB-ccmp0718.txt, JB-ccmp0811.txt, JB-ccmp1003.txt, JB-ccmp_1109.tx

2. Right-click, select '''Add to Panel'''.

3. Enter "Cardio" in the input box and click OK.

[[Image:T_PanelLabelCardio.png]]

4. Next, similarly label the following arrays as "Normal" (''repeat steps 2 & 3'').
JB-n_0106.txt , JB-n_0821.txt, JB-n_0915.txt, JB-n_1303.txt

5. Select the checkboxes next to the panel names to mark these groups of arrays as "Active". Various analysis and visualization components can be set to use or display only activated arrays or markers.

[[Image:T_PhenotypesPriorToCase.png]]

====Classify panel====

For statistical tests such as the t-test, Case and Control groups can be specified.

# Left-click on the thumb-tack icon in front of the phenotype name.
# Select Case to specify the disease arrays as the "Case". The remaining "Normal" arrays are by default labeled control.
[[Image:T_PhenotypeSettingCase.png]]

A red thumbtack indicates the arrays have been specified as "Case".

[[Image:T_PhenotypeCaseSet.png]]

== Visualize Gene Expression==
In this tutorial, you will:

* Get acquainted with the various geWorkbench visualization tools
* View a dataset in geWorkbench
* Modify the visualization preference settings

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file '''''cardiomyopathy.exp''''' used in this tutorial is the one from the Load Data (insertlink) tutorial.

===Visualization Tools===

Visualization tools provide a view of the chip(s) under investigation and can be used to assess the quality of the data. Active gene and phenotype panels (insertlink) restrict the number of markers/arrays displayed. The images created can be saved and exported. A detailed description of how to manipulate the visualization components is available in the online help.

{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|'''Microarray View''': Used to inspect each separate microarray using the Array scroll bar.||[[Image: Ema.png]]
|-
|-
|'''Tabular Microarray Panel''': Presents the numerical values of the expression measurements in a table format. One row is created per individual marker/probe and one column per microarray.||[[Image:Etab.png]]
|-
|-
|-
|'''Color Mosaic''': Heat maps for microarray expression data, organized by phenotypic or gene groupings.||[[Image:Ed_cm.png]]
|-
|-
|-
|'''Expression Profiles''': A line graph of gene expression profiles across several arrays/hybridizations. Each marker is drawn as a separate colored line.||[[Image:Eep.png]]
|-
|-
|-
|'''Scatter Plot''': A pairwise (''array vs. array and marker vs. marker'') comparison and plotting of expression values.||[[Image:Esp.png]]
|-
|-
|-
|-
|}

'''View a dataset'''
# Select '''''cardiomyopathy.exp''''' in the Project Panel.
# Select the '''''Microarray Panel''''' visualization component in the View Area at the top-right section of the interface.
# Deselect the All Markers checkbox to display the entire dataset.

Note: The '''''All Arrays''''' and '''''All Markers''''' checkboxes determine which data points are included in the display.
If neither is checked, then the entire data set is shown. The All Arrays control is useful when working with data sets comprising multiple arrays. In this case, only those arrays that are included in a currently activated phenotype panel will be displayed.

[[Image: Allm.png]]

=== Preferences===
The Preferences selection in the Tools menu lets users specify how certain aspects of the system behave. Once set, preferences take effect immediately and persist between application sessions.

'''Modifying Settings'''

1. From the main menu, click on '''Tools>Preferences'''.

2. In the Preferences pop-up window, you can define settings for:
*Text Editor: The editor selected will be used to open and inspect data sets loaded in a project. Notepad is the default setting.
* Visualization: The color scheme to be applied to color mosaic images.
**Absolute: (default) Let M = max{|min|, |max|} over all expression measurements, across all arrays. If expression value x > 0, assign it to the red spectrum at x / M * 256. If expression value x is negative, assign it to the green spectrum at -x / M * 256.
** Relative: This is similar to the setting for Absolute, but each marker is mean-variance normalized first.
* Genepix Value Computation: You can specify how to compute the value displayed for Genepix arrays. The default setting is the option (Mean F635 - Mean B635) / (Mean F532 - Mean B532).
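
The ''Absolute'' color rule can be written out as a short Python sketch. It follows the formula given above; the function name is hypothetical, and clamping the result to 255 (so that x = M still fits an 8-bit color channel) and drawing a value of exactly 0 as black are assumptions, since the tutorial does not specify those edge cases.

```python
def absolute_color(x, m):
    """Map expression value x to an (R, G, B) tuple using the Absolute scheme."""
    if m == 0:
        return (0, 0, 0)  # all measurements are zero; nothing to scale against
    level = min(255, int(abs(x) / m * 256))  # assumption: clamp to the 8-bit maximum
    return (level, 0, 0) if x > 0 else (0, level, 0)

values = [-200.0, 50.0, 400.0]                 # made-up expression values
m = max(abs(min(values)), abs(max(values)))    # M = max{|min|, |max|}
colors = [absolute_color(v, m) for v in values]
```

With the ''Relative'' setting, the same mapping would be applied after each marker has been mean-variance normalized.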

Select '''Relative''' for the visualization preference.

3. Click on '''OK'''.

[[Image: Preferences.gif]]
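
As a worked example of the default Genepix Value Computation, the ratio (Mean F635 - Mean B635) / (Mean F532 - Mean B532) can be computed as follows. The function name and the spot intensities are made up, and returning no value when the denominator is zero is an assumption rather than documented geWorkbench behavior.

```python
def genepix_ratio(f635_mean, b635_mean, f532_mean, b532_mean):
    """Background-corrected 635 nm signal divided by background-corrected 532 nm signal."""
    denominator = f532_mean - b532_mean
    if denominator == 0:
        return None  # assumption: treat the spot as missing
    return (f635_mean - b635_mean) / denominator

# One spot: foreground and background means for each channel (made-up values).
ratio = genepix_ratio(1200.0, 200.0, 700.0, 200.0)
```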

== Filter and Normalize Data==
In this tutorial, you will:
* Get acquainted with the various filters and normalizers available in geWorkbench
* Apply a filter and normalizer on a tutorial dataset

Before you continue, geWorkbench should be running. For help with installation, please refer to the '''''Getting Started''''' (INSERT LINK) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Filter===

Filtering can be used to screen out missing data points, remove low quality data or reduce the size of the dataset by removing less interesting data. Available geWorkbench filters are as follows:

{|style="border: 1px solid lightGray"
!Filter||Description||
|-
|-|-
|-

|'''Affy Detection Call'''||Applicable to Affymetrix data only. Sets all measurements whose detection status is any user-defined combination of A, P or M as missing.
|-

|'''Missing values''' ||Discards all markers that have “missing” measurements in at least n microarrays, where n is defined by the user. Another filter must first be applied however, in order to generate the missing values upon which this filter can operate.
|-
|-
|-

|'''Deviation''' ||Sets all markers whose measurements deviate below a given value across all microarrays as missing.
|-
|-
|-

|'''Expression Threshold''' ||Sets all markers whose measurements are inside (or outside) a user-defined range as missing.
|-
|-
|-

|'''2 Channel''' ||Applicable to 2-channel arrays (Genepix) data only. Defines applicable ranges for each channel, and sets all values for which either channel intensity is inside (or outside) the defined range as missing.
|-
|-
|-
|}

[[Image:Filterpanel.gif]]

Perform the following steps to filter out data called absent in an Affymetrix file:
# In the Filtering Panel, select ''Affy Detection Call Filter''.
# Select the ‘A’ (Absent) checkbox and click '''Filter'''. Values that were removed (marked as missing) are highlighted in yellow.
# In the Filtering Panel, select '''Missing Values Filter'''.
# Choose the maximum number of arrays that can have missing values before a marker is removed (the default is 0).
# Click '''Filter'''. Markers with more than 0 missing values are removed. You’ll notice that the yellow values are gone.

{|style="border: 1px solid lightGray"
!'''Affy Detection Call Filter'''||'''Missing Values Filter'''||
|-
|-|-
|-
| [[Image: Filtered.gif]]||[[Image:Mvfilter.gif]]
|-
|}
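
The two filtering steps above can be sketched in Python as a two-stage pipeline: first mask measurements by detection call, then drop markers with too many missing values. The data layout (a dictionary of marker name to a list of value/call pairs), the marker names and the function names are all hypothetical and chosen only for illustration.

```python
def affy_detection_call_filter(data, calls_to_mask):
    """Mark as missing (None) every measurement whose detection call is in calls_to_mask."""
    return {marker: [None if call in calls_to_mask else value for value, call in row]
            for marker, row in data.items()}

def missing_values_filter(data, max_missing=0):
    """Keep only markers with at most max_missing missing measurements."""
    return {marker: row for marker, row in data.items()
            if sum(v is None for v in row) <= max_missing}

# Two markers measured on two arrays; each entry is (value, detection call).
data = {
    "marker_1": [(120.0, "P"), (95.0, "P")],
    "marker_2": [(30.0, "A"), (88.0, "P")],
}
masked = affy_detection_call_filter(data, {"A"})      # step 2: mask Absent calls
filtered = missing_values_filter(masked, max_missing=0)  # steps 4 and 5
```

This mirrors why the Missing Values filter needs another filter to run first: on its own there are no missing values for it to act on.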

===Normalize===
Normalization can be used to decrease the effects of systematic differences across a set of experiments.
In geWorkbench, normalization replaces the existing values with new values. Available geWorkbench normalization methods are as follows:

{|style="border: 1px solid lightGray"
!Normalizer||Description||
|-
|-

|Missing value calculation ||Replaces every missing value with either the mean value of that marker across all microarrays or with the mean measurement of all markers in the microarray where the missing value is observed
|-
|-
|Log2 Transformation ||Applies a log2 transformation to all measurements in a microarray
|-
|-
|-
|Threshold Normalizer ||All data points whose value is less than (or greater than) a user-specified minimum (maximum) value are raised (reduced) to that minimum (maximum) value
|-
|-
|-
|-
|Marker-based Centering ||Subtracts the mean (median) measurement of a marker profile from every measurement in the profile
|-
|-
|-
|Array-based centering ||Subtracts the mean (median) measurement of a microarray from every measurement in that microarray
|-
|-
|-
|Mean-variance normalizer ||For every marker profile, the mean measurement of the entire profile is subtracted from each measurement in the profile and the resulting value is divided by the standard deviation
|-
|-
|}
[[Image:Normalpanel.gif]]
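
Several of the normalizers in the table are simple element-wise operations. The Python sketch below illustrates three of them on a single marker profile. The function names are hypothetical, and the example only follows the descriptions in the table; it is not taken from the geWorkbench source.

```python
from statistics import mean, median

def impute_missing(values):
    """Missing value calculation: replace each missing value with the mean of the present ones."""
    present = [v for v in values if v is not None]
    fill = mean(present)
    return [fill if v is None else v for v in values]

def threshold_normalize(values, minimum=None, maximum=None):
    """Threshold Normalizer: raise values below minimum, reduce values above maximum."""
    out = []
    for v in values:
        if minimum is not None and v < minimum:
            v = minimum
        if maximum is not None and v > maximum:
            v = maximum
        out.append(v)
    return out

def center(values, use_median=False):
    """Marker- or array-based centering: subtract the mean (or median) from every value."""
    c = median(values) if use_median else mean(values)
    return [v - c for v in values]

profile = [2.0, None, 4.0, 6.0]                      # made-up marker profile, one value missing
imputed = impute_missing(profile)
clamped = threshold_normalize(imputed, minimum=3.0)
centered = center(clamped)
```

Whether a list passed to `center` holds one marker across arrays or one array across markers decides whether the result is marker-based or array-based centering.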

'''Apply Quantile Normalizer'''

1. In the Normalization Panel, select ''Quantile Normalizer''.

2. Leave the default averaging method, ''Mean Profile Marker'', which determines how missing values are handled.

3. Click '''Normalize'''. The View Area is updated to reflect normalization (after the screen has been refreshed).
Note: The first value in the second row was updated from ''41,394.6'' to ''55,779.26''.

{|style="border: 1px solid lightGray"
!PRENORMALIZATION||NORMALIZED
|-
| [[Image:Prenormalizer_ed.gif]] || [[Image:Postnormalizera.gif]]
|}

== Clustering Gene Expression Data== ken
*Hierarchical Clustering
* Self Organizing Map (SOM)

== Differential Expression ==
In this tutorial, you will:

* Get acquainted with the T Test and Multi T Test
* Apply a T Test and Multi T Test

Before you continue, geWorkbench should be running. For help with installation, please refer to the Getting Started (INSERT LINK) section. Load the tutorial file cardiomyopathy.exp. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===T Test===

T Test analysis identifies markers with statistically significant differential expression between sets of microarrays. For each marker, the t-test determines whether there is a significant difference between the two groups (case and control). To perform this analysis, you must classify the panels, set the analysis parameters and view the results in the visualization components. A detailed description of the T Test parameters is available in the online help.

'''Classify the Panels'''

1. Mark the '''Cardio''' phenotype as 'Case'. By default, panels are marked as control. Panels classified as Case are shown with a red thumbtack icon.
* Right-click on the '''Cardio''' phenotype.
* Select '''Classification'''>'''Case'''.
2. Activate the '''Normal''' and '''Cardio''' panels by selecting the checkboxes next to the panel names.

[[Image:T_PhenotypeSettingCase.png]]

'''Set Analysis Parameters'''
# From the Analysis Panel, select '''T-Test Analysis'''.
# Set the parameter values listed below and click '''Analyze'''.

* Alpha-corrections tab: Just Alpha.
* P-Value Parameters tab: p-values based on t-distribution. Note that the default alpha (critical p-value) is set to 0.01.
* Degree of Freedom tab: Welch approximation - unequal group variances.

[[Image:Ttest.gif]] 

'''T-Test Results'''

{|style="border: 1px solid lightGray"
!|| ||
|-
|-
| Markers that meet the significance test are placed in a new gene panel called “Significant Genes”. || [[Image:E_ttestgpanel.png]]
|-
|-
| An ancillary dataset is created in the project window. || [[Image:Ed_ttestproj.png]]
|-
|}

The values of the T-Test can be seen in the Color Mosaic panel and the Volcano Plot.

{|style="border: 1px solid lightGray"
!VOLCANO PLOT||COLOR MOSAIC||
|-
|-|-
|-
|-
| [[Image:Vplot.png]] || [[Image:Ed_cm.png]]
|-
|-
|-
| Clicking on any of the spots highlights the marker selected in the Marker Panel. * Insert another description ||
* The label to the right displays the Significance value (the lower the value, the more likely the marker is differentially expressed) and the gene name for each displayed gene. The genes are displayed in ascending order by Significance value.

* Gene height and width values can be altered to modify the display.

* The intensity slider is used to modify the intensity of the color codings.

* Accession: Includes the accession number in the label.

* Printer Icon: Prints the displayed image.

* Display: Must be toggled on to display data.

* ''Pat, Abs, Ratio Overlapping Pages Icon: Not the T Test display.''

|-
|-
|}
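
The way the “Significant Genes” panel and the ascending ordering relate to the alpha setting can be sketched as follows. The marker names and p-values are made up, the function name is hypothetical, and the use of a strict comparison (p < alpha) is an assumption; the tutorial only states that the default alpha is 0.01.

```python
def significant_markers(p_values, alpha=0.01):
    """Return (marker, p-value) pairs below alpha, sorted by ascending p-value."""
    hits = [(marker, p) for marker, p in p_values.items() if p < alpha]
    return sorted(hits, key=lambda item: item[1])

# Hypothetical per-marker p-values from a T Test run.
p_values = {"marker_a": 0.2, "marker_b": 0.004, "marker_c": 0.0005}
panel = significant_markers(p_values)
```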

=== Multi T Test=== '''(IN PROGRESS)'''

== Regulatory Network ==
In this tutorial, you will:

* Create a gene network in Reverse Engineering
* View the network in Cytoscape
* Create a Marker panel from the Cytoscape network

Before you continue, geWorkbench should be running. The file [[Webmatrix.EXP]] is used in this tutorial. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Reverse Engineering===

The Reverse Engineering component analyzes a large amount of microarray data to reverse engineer the underlying gene regulatory network. The details of this algorithm are described in "ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context", Califano et al. (http://arxiv.org/abs/q-bio.MN/0410037).

'''Create a Network'''
{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|1. In the Gene Panel, enter '''1973_s_at''' in the '''Find Next''' text box. The list box scrolls to '''1973_s_at'''; click on that gene.||[[Image: E_panel.PNG]]

|-
|2. Go to the Reverse Engineering component in the View pane. You should be in the first tab, “Profiler”, and in “Basic”. The marker '''1973_s_at''' should be listed in the Hub Gene box.||[[Image:E_RE1.PNG]]
|-
|3. Click on '''Analyze 2D'''. A list of markers is returned sorted by interaction strength below '''Search Box'''. ||
|-
|-
|-
|4. Select all the genes with score >= 5 by clicking on the first marker, '''[12.53] 37724_at''', and then pressing Shift+Down Arrow until all values >= 5 are highlighted.||[[Image: E_re2.png]]
|-
|-
|-
|5. Click '''Create Network'''. The resulting network is displayed in the Cytoscape view.||
|-
|-
|}

'''View the Network in Cytoscape'''

A full description of the capabilities and functionality of Cytoscape can be found at http://www.cytoscape.org/.

# In the Cytoscape Layout menu, select '''yFiles/ Organic''' to modify the network display.
# In this network image, select the central marker. It should turn yellow. [[Image:E_network.png]]
# In the Select menu, select Nodes>First neighbors of selected nodes>Shape>Triangle. The first neighbors change from a square shape to a triangle.
# Ctrl+mouse select the network. The selected nodes turn yellow. The selected genes are returned to the Gene Panel under the name Selected Genes[Cytoscape].

== Enrichment Analysis==

In this tutorial, you will:

* Gene Ontology

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

'''Gene Ontology'''

The Gene Ontology component categorizes genes in terms of the functions their products perform, their cellular localization, or their involvement in high-level biological processes. The category definitions are provided by the Gene Ontology Consortium (http://www.geneontology.org), and geWorkbench uses them to generate a tree structure that approximates the Directed Acyclic Graph structure used by the Consortium.

1. Create a Marker panel with the following markers:
* AFFX-hum_alu_at
* AFFX-LysX-M_at
* AFFX-LysX-5_at
* AFFX-LysX-3_at
For assistance in creating a marker panel, please refer to Working with Markers and Phenotype Tutorial (INSERT LINK).

2. Click '''Map List'''. As the application creates a tree for the selected ontology, a pop-up window displays the progress. The mapping is based on the GO annotations provided by Affymetrix in their annotation files.[[Image:E_map.png]]

3. Each node in the mapped tree displays two numbers separated by a forward slash, such as '''cell(1/6058)'''.
*The first number is the number of genes from the active marker panel that map to the category or to any of its child nodes.
*The second number is the corresponding count for the reference list (or for the entire chip, depending on the reference list checkbox selection and on whether a reference list has been loaded from the file system).

[[Image:E_goon.png]]

The Table View and P Value Trend provide alternate views of the mappings obtained in the Tree View.

{|style="border: 1px solid lightGray"
!TABLE VIEW||P VALUE TREND||
|-
|-|-
|-
| [[Image:E_gotable.png]]||[[Image:E_pvalueT.png]]
|-
| '''Recompute:''' Re-computes the statistics based on the current widget settings (p-value metric, Min#).||

'''Plot:''' Draws the plot based on the selections in the Tree View and the Step Size widget setting.

|--

|}

== Sequence Analysis== ken
* Sequence Retrieval
* Sequence Homology Analysis
** Blast
** Other

==Pattern Discovery== ken
* Position Histogram

== Promoter Analysis==

==Integrated Annotation Information==
In this tutorial, you will:

* rrrrr
* rrrrr

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

Perform the following steps to view annotations of a dataset (''not recommended for large sets of genes''):

1. In the Marker Annotations component in the View pane, click '''Use Panels'''.

A table of gene names and links to pathway diagrams is returned if available from the NCI CGAP database.

2. Clicking on gene names returns annotations from the NCI CGAP database in your default web browser window.

3. Clicking on a pathway diagram name displays the pathway diagram in the caBIO Pathways panel in the View pane.
In the caBIO Pathways viewer, the gene symbols in the pathway can in turn be clicked to display their gene information in the default web browser window.

User:Daly

2006-02-24T19:19:22Z

Daly: /* Enrichment Analysis */

= Overview=

geWorkbench is an open-source bioinformatics platform that offers a comprehensive and extendible collection of tools for the management, analysis, visualization and annotation of biomedical data. This tool is aimed at providing researchers a centralized repository for the data analysis and visualization of gene expression data, gene sequences and protein sequences.

'''What can you do with geWorkbench?'''

* Use one program that integrates with multiple existing bioinformatics modules for analysis and visualization.

* Access remote servers and clusters to run computationally intensive calculations faster.

* Streamline data analysis with:
** flexible import options that support merging files from various sources ''(insert possible data source ie RMA express).''
** support for a variety of genomic data including microarrays, sequences, pathways, networks, alignments and phenotypes.
** access analyses with biological annotations from the National Cancer Institute.

* Community: Insert developer benefit ( plugin)

The diagram illustrates the use of geWorkbench by Researchers.

[[Image:slide1.gif]]

For detailed system documentation, please see the Documentation section (INSERT LINK).

= Tutorial=

===Welcome! ===

Welcome to the geWorkbench tutorial. In the following tutorials you will learn how to use geWorkbench. This is an example-based, illustrated guide for both beginners and experienced users. This tutorial was last updated in '''DATE''' and reflects changes to geWorkbench through '''DATE'''.

===Using the Tutorial ===

These tutorials assume that you have successfully completed the installation instructions. If you haven't installed the entire program, please go to [[Download]] and follow the installation instructions.

The tutorials are organized into a number of separate topics. Sample data sets are used throughout the tutorials. Each tutorial is self-contained and does not depend on any other tutorial for sample data. Sample data is located '''''(INSERT LINK or Instructions on how to download)'''''

'''(we should have sample data labeled Tutorial #)'''

Within the tutorials there are two basic types of text:
*Text that looks like this explains topics. 

# Text in numbered steps is instructions for you to follow using tutorial data files.

''' insert instruction on how to navigate between tutorials'''.

==Getting Started== MV
* Starting the application
* GUI elements
** Panels
** Navigation
== Loading Data== MV
* Data formats

== Working with Marker and Phenotype Panels==
In this tutorial, you will:
* Become familiar with the use of panels in geWorkbench
* Create active phenotype panels
* Classify the panels you created

Before you continue, geWorkbench should be running. For help with this, please refer to the '''''Getting Started''''' (insertlink) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Creating Panels===

When working with microarrays, geWorkbench uses the term '''''marker''''' to refer to a gene probe (in other cases, it can refer to individual items from other data sets, such as sequences).
'''''Phenotype''''' refers to any user-defined grouping of microarrays. These microarrays will often share some common property that in most cases is phenotypic, although this is not a requirement. For example, one such “phenotype” might be a single experiment on a tumor tissue sample, with a second “phenotype” defined as a collection of experiments performed on normal tissue samples.

====Assign Panel====

1. In the Phenotype panel, select the following arrays, which contain samples from the congestive cardiomyopathy disease state:
JB-ccmp_0120.txt, JB-ccmp_0218.txt, JB-ccmp0718.txt, JB-ccmp0811.txt, JB-ccmp1003.txt, JB-ccmp_1109.tx

2. Right-click, select '''Add to Panel'''.

3. Enter "Cardio" in the input box and click OK.

[[Image:T_PanelLabelCardio.png]]

4. Next, similarly label the following arrays as "Normal" (''repeat steps 2 & 3'').
JB-n_0106.txt , JB-n_0821.txt, JB-n_0915.txt, JB-n_1303.txt

5. Select the checkboxes next to the panel names to indicate that these groups of arrays are "Active". Various analysis and visualization components can be set to use or display only activated arrays or markers.

[[Image:T_PhenotypesPriorToCase.png]]

====Classify panel====

For statistical tests such as the t-test, Case and Control groups can be specified.

# Left-click on the thumb-tack icon in front of the phenotype name.
# Select Case to specify the disease arrays as the "Case". The remaining "Normal" arrays are by default labeled control.
[[Image:T_PhenotypeSettingCase.png]]

A red thumbtack indicates the arrays have been specified as "Case".

[[Image:T_PhenotypeCaseSet.png]]

== Visualize Gene Expression==
In this tutorial, you will:

* Get acquainted with the various geWorkbench visualization tools
* View a dataset in geWorkbench
* Modify the visualization preference settings

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file '''''cardiomyopathy.exp''''' used in this tutorial is the one from the Load Data (insertlink) tutorial.

===Visualization Tools===

Visualization tools provide a view of the chip(s) under investigation and can be used for ascertaining the quality of the data. Active gene and phenotype panels (insertlink) restrict the number of markers/arrays displayed. The images created can be saved and exported. A detailed description of how to manipulate the visualization components is available in the online help.

{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|'''Microarray View''': Used to inspect each separate microarray using the Array scroll bar.||[[Image: Ema.png]]
|-
|-
|'''Tabular Microarray Panel''': Presents the numerical values of the expression measurements in a table format. One row is created per individual marker/probe and one column per microarray.||[[Image:Etab.png]]
|-
|-
|-
|'''Color Mosaic''': Heat maps for microarray expression data, organized by phenotypic or gene groupings.||[[Image:Ed_cm.png]]
|-
|-
|-
|'''Expression Profiles''': A line graph of gene expression profiles across several arrays/hybridizations. Each marker is drawn as a separately colored line.||[[Image:Eep.png]]
|-
|-
|-
|'''Scatter Plot''': A pairwise (''array vs. array and marker vs. marker'') comparison and plotting of expression values.||[[Image:Esp.png]]
|-
|-
|-
|-
|}

'''View a dataset'''
# Select '''''cardiomyopathy.exp''''' in the Project Panel.
# Select the '''''Microarray Panel''''' visualization component in the View Area at the top-right section of the interface.
# Deselect the All Markers checkbox to display the entire dataset.

Note: The '''''All Arrays''''' and '''''All Markers''''' checkboxes determine which data points are included in the display.
If neither is checked, then the entire data set is shown. The All Arrays control is useful when working with data sets comprising multiple arrays. In this case, only those arrays that are included in a currently activated phenotype panel will be displayed.

[[Image: Allm.png]]

=== Preferences===
The Preferences selection in the Tools menu allows users to specify how certain aspects of the system will behave. Once set, preferences are applied at once and persist between application sessions.

'''Modifying Settings'''

1. From the main menu, click on '''Tools>Preferences'''.

2. In the Preferences pop-up window, you can define settings for:
*Text Editor: The editor selected will be used to open and inspect data sets loaded in a project. Notepad is the default setting.
* Visualization: The color scheme to be applied to color mosaic images.
**Absolute: (default) Let M = max{|min|, |max|} over all expression measurements, across all arrays. If expression value x > 0, assign it the red spectrum x / M * 256. If expression value x is negative, assign it to the green spectrum -x / M * 256.
** Relative: This is similar to the setting for Absolute, but each marker is mean-variance normalized first.
* Genepix Value Computation: You can specify how to compute the value displayed for Genepix arrays. The default setting is the option (Mean F635 - Mean B635) / (Mean F532 - Mean B532).

Select '''Relative''' for the visualization preference.

3. Click on '''OK'''.

[[Image: Preferences.gif]]
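
The Absolute color scheme described above can be sketched in a few lines of Python. This is only an illustration of the stated formula, not geWorkbench's own code: the function names <code>max_abs</code> and <code>absolute_color</code>, the (red, green, blue) tuple representation, and the clamping of the result to 255 are assumptions made for this example.

```python
def max_abs(values):
    """M = max{|min|, |max|} over all expression measurements."""
    return max(abs(min(values)), abs(max(values)))


def absolute_color(x, m, scale=256):
    """Map an expression value x to a hypothetical (red, green, blue) tuple.

    Positive values go to the red channel (x / M * 256) and negative
    values to the green channel (-x / M * 256). Clamping to scale - 1
    is an assumption so that the result fits in one byte.
    """
    if m <= 0:
        return (0, 0, 0)
    level = min(int(abs(x) / m * scale), scale - 1)
    if x > 0:
        return (level, 0, 0)
    return (0, level, 0)


measurements = [-50.0, 0.0, 100.0]
m = max_abs(measurements)
colors = [absolute_color(x, m) for x in measurements]
```

For the Relative scheme, each marker profile would first be mean-variance normalized and then passed through the same mapping.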

== Filter and Normalize Data==
In this tutorial, you will:
* Get acquainted with the various filters and normalizers available in geWorkbench
* Apply a filter and normalizer on a tutorial dataset

Before you can continue, geWorkbench should be running. For help with installation, please refer to the '''''Getting Started''''' (INSERT LINK) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Filter===

Filtering can be used to screen out missing data points, remove low quality data or reduce the size of the dataset by removing less interesting data. Available geWorkbench filters are as follows:

{|style="border: 1px solid lightGray"
!Filter||Description||
|-
|-|-
|-

|'''Affy Detection Call'''||Applicable to Affymetrix data only. Sets all measurements whose detection status is any user-defined combination of A, P or M as missing.
|-

|'''Missing values''' ||Discards all markers that have “missing” measurements in at least n microarrays, where n is defined by the user. Another filter must first be applied however, in order to generate the missing values upon which this filter can operate.
|-
|-
|-

|'''Deviation''' ||Sets all markers whose measurements deviate below a given value across all microarrays as missing.
|-
|-
|-

|'''Expression Threshold''' ||Sets all markers whose measurements are inside (or outside) a user-defined range as missing.
|-
|-
|-

|'''2 Channel''' ||Applicable to 2-channel arrays (Genepix) data only. Defines applicable ranges for each channel, and sets all values for which either channel intensity is inside (or outside) the defined range as missing.
|-
|-
|-
|}

[[Image:Filterpanel.gif]]

Perform the following steps to filter out data called absent in an Affymetrix file:
# In the Filtering Panel, select ''Affy Detection Call Filter''.
# Select the ‘A’ (Absent) checkbox and click '''Filter'''. Values that were removed (marked as missing) are highlighted in yellow.
# In the Filtering Panel, select '''Missing Values Filter'''.
# Choose the maximum number of arrays that can have missing values before a marker is removed (the default is 0).
# Click '''Filter'''. Markers with more than 0 missing values are removed. You’ll notice that the yellow values are gone.

{|style="border: 1px solid lightGray"
!'''Affy Detection Call Filter'''||'''Missing Values Filter'''||
|-
|-|-
|-
| [[Image: Filtered.gif]]||[[Image:Mvfilter.gif]]
|-
|}
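
The two filtering steps above can be mimicked on a small in-memory data set. The following is a minimal sketch, assuming a hypothetical representation in which each marker maps to a list of (value, detection call) pairs and a missing value is stored as <code>None</code>; the function names are invented for the example and are not part of geWorkbench. It follows the tutorial steps, in which a marker is removed when it has more than the allowed number of missing values.

```python
def affy_detection_call_filter(data, calls_to_mask):
    """Set every measurement whose detection call is in calls_to_mask to missing (None)."""
    return {
        marker: [None if call in calls_to_mask else value for value, call in row]
        for marker, row in data.items()
    }


def missing_values_filter(data, max_missing=0):
    """Keep only markers with at most max_missing missing measurements."""
    return {
        marker: row
        for marker, row in data.items()
        if sum(value is None for value in row) <= max_missing
    }


# Hypothetical two-array data set: marker -> [(value, call), ...]
example = {
    "marker_1": [(10.0, "P"), (20.0, "P")],
    "marker_2": [(5.0, "A"), (7.0, "P")],
}
masked = affy_detection_call_filter(example, {"A"})
kept = missing_values_filter(masked, max_missing=0)
```

With the default of 0, only <code>marker_1</code> survives, because <code>marker_2</code> has one value that was called Absent.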

===Normalize===
Normalization can be used to decrease the effects of systematic differences across a set of experiments.
In geWorkbench, normalization replaces the original values with new values. Available geWorkbench normalization methods are as follows:

{|style="border: 1px solid lightGray"
!Normalizer||Description||
|-
|-

|Missing value calculation ||Replaces every missing value with either the mean value of that marker across all microarrays or with the mean measurement of all markers in the microarray where the missing value is observed
|-
|-
|Log2 Transformation ||Applies a log2 transformation to all measurements in a microarray
|-
|-
|-
|Threshold Normalizer ||All data points whose value is less than (or greater than) a user-specified minimum (maximum) value are raised (reduced) to that minimum (maximum) value
|-
|-
|-
|-
|Marker-based Centering ||Subtracts the mean (median) measurement of a marker profile from every measurement in the profile
|-
|-
|-
|Array-based centering ||Subtracts the mean (median) measurement of a microarray from every measurement in that microarray
|-
|-
|-
|Mean-variance normalizer ||For every marker profile, the mean measurement of the entire profile is subtracted from each measurement in the profile and the resulting value is divided by the standard deviation
|-
|-
|}
[[Image:Normalpanel.gif]]
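
Several of the normalizers in the table reduce to short per-profile calculations. The sketch below illustrates them on plain Python lists; the function names are hypothetical, and the use of the sample standard deviation in the mean-variance normalizer is an assumption, since the table does not say which estimator geWorkbench uses.

```python
import math
import statistics


def log2_transform(values):
    """Apply a log2 transformation to every (positive) measurement."""
    return [math.log2(v) for v in values]


def threshold_normalize(values, minimum=None, maximum=None):
    """Raise values below minimum to minimum and reduce values above maximum to maximum."""
    result = []
    for v in values:
        if minimum is not None and v < minimum:
            v = minimum
        if maximum is not None and v > maximum:
            v = maximum
        result.append(v)
    return result


def center(values, use_median=False):
    """Subtract the mean (or median) of a marker profile or array from each value."""
    middle = statistics.median(values) if use_median else statistics.mean(values)
    return [v - middle for v in values]


def mean_variance_normalize(profile):
    """Subtract the profile mean and divide by the (sample) standard deviation."""
    mean = statistics.mean(profile)
    sd = statistics.stdev(profile)  # assumption: sample standard deviation
    return [(v - mean) / sd for v in profile]
```

The same <code>center</code> function covers both marker-based and array-based centering; the only difference is whether it is given one marker's values across arrays or one array's values across markers.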

'''Apply Quantile Normalizer'''

1. In the Normalization Panel, select ''Quantile Normalizer''.

2. Leave the default averaging method of ''Mean Profile Marker'' to indicate handling of missing values.

3. Click '''Normalize'''. The View Area is updated to reflect normalization (after the screen has been refreshed).
Note: The first value in the second row was updated from ''41,394.6'' to ''55,779.26''.

{|style="border: 1px solid lightGray"
!PRENORMALIZATION||NORMALIZED||
|-
|-|-
|-
| [[Image:Prenormalizer_ed.gif ]] || [[Image: Postnormalizera.gif ]].
|-
|-
|-
|}
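
For readers unfamiliar with quantile normalization, the following sketch shows the textbook form of the technique: every array is sorted, the values at each rank are averaged across arrays, and each original value is replaced by the average for its rank. It is not taken from geWorkbench, whose implementation may differ; in particular it assumes arrays of equal length with no missing values and no tied values, and the function name is hypothetical.

```python
def quantile_normalize(arrays):
    """Textbook quantile normalization of a list of equally long arrays."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Mean of the values that share the same rank across all arrays.
    rank_means = [
        sum(s[rank] for s in sorted_arrays) / len(arrays) for rank in range(n)
    ]
    normalized_arrays = []
    for a in arrays:
        # order[rank] is the index of the value holding that rank in this array.
        order = sorted(range(n), key=lambda i: a[i])
        normalized = [0.0] * n
        for rank, index in enumerate(order):
            normalized[index] = rank_means[rank]
        normalized_arrays.append(normalized)
    return normalized_arrays


normalized = quantile_normalize([[5, 2, 3], [4, 1, 6]])
```

After normalization, every array contains the same set of values (here 1.5, 3.5 and 5.5), arranged according to the original ranking within each array.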

== Clustering Gene Expression Data== ken
*Hierarchical Clustering
* Self Organizing Map (SOM)

== Differential Expression ==
In this tutorial, you will:

* Get acquainted with the T Test and Multi T Test
* Apply a T Test and Multi T Test

Before you can continue, geWorkbench should be running. For help with installation, please refer to the Getting Started (INSERT LINK) section. Load the tutorial file cardiomyopathy.exp. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===T Test===

T Test analysis identifies markers with statistically significant differential expression between sets of microarrays. The t-test determines for each marker whether there is a significant difference between the two groups (case and control). To perform this analysis, you must classify the panels, set the analysis parameters and view the results in the visualization components. A detailed description of the T Test parameters is available in the online help.

'''Classify the Panels'''

1. Mark the '''Cardio''' phenotype as 'Case'. By default, panels are marked as control. Panels classified as Case are shown with a red thumbtack icon.
* Right-click on '''Cardio''' phenotype.
* Select '''Classification'''>'''Case'''.
2. Activate the arrays '''Normal''' and '''Cardio''' by selecting the checkboxes next to the panel name.

[[Image:T_PhenotypeSettingCase.png]]

'''Set Analysis Parameters'''
# From the Analysis Panel, select '''T-Test Analysis'''.
# Set the parameter values listed below and click on '''Analyze'''.

* Alpha-corrections tab: Just Alpha.
* P-Value Parameters tab: p-values based on t-distribution. Note that the default alpha (critical p-value) is set to 0.01.
* Degree of Freedom tab: Welch approximation - unequal group variances.

[[Image:Ttest.gif]] 
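
The Welch approximation selected above can be illustrated for a single marker. The sketch below computes the t statistic and the Welch-Satterthwaite degrees of freedom for one marker's case and control values; it is a generic illustration of the statistical method rather than geWorkbench's code, and the function name and sample values are hypothetical.

```python
import math
import statistics


def welch_t(case, control):
    """Return (t, degrees of freedom) for two groups with unequal variances."""
    n1, n2 = len(case), len(control)
    v1, v2 = statistics.variance(case), statistics.variance(control)
    se2 = v1 / n1 + v2 / n2  # squared standard error of the difference in means
    t = (statistics.mean(case) - statistics.mean(control)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df


# Hypothetical expression values for one marker.
t_stat, dof = welch_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
```

The p-value is then read from the t-distribution with <code>dof</code> degrees of freedom and compared with alpha (0.01 by default). If SciPy is available, <code>scipy.stats.ttest_ind(case, control, equal_var=False)</code> performs the same test and also returns the p-value.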

'''T-Test Results'''

{|style="border: 1px solid lightGray"
!|| ||
|-
|-
| Markers which met the significance test are included in a new gene panel called “Significant Genes”. || [[Image:E_ttestgpanel.png]]
|-
|-
| An ancillary dataset is created in the project window. || [[Image:Ed_ttestproj.png]]
|-
|}

The values of the T-Test can be seen in the Color Mosaic panel and the Volcano Plot.

{|style="border: 1px solid lightGray"
!VOLCANO PLOT||COLOR MOSAIC||
|-
|-|-
|-
|-
| [[Image:Vplot.png]] || [[Image:Ed_cm.png]]
|-
|-
|-
| Clicking on any of the spots highlights the marker selected in the Marker Panel. * Insert another description ||
* The label to the right displays the Significance value (the lower the value, the more likely the groups differ) and gene name for the displayed genes. The genes are displayed in ascending order by Significance value.

* Gene height and width values can be altered to modify the display.

* The intensity slider is used to modify the intensity of the color codings.

* Accession: Includes the accession number in the label.

* Printer Icon: Prints the displayed image.

* Display: Must be toggled on to display data.

* ''Pat, Abs, Ratio Overlapping Pages Icon: Not the T Test display.''

|-
|-
|}

=== Multi T Test=== '''(IN PROGRESS)'''

== Regulatory Network ==
In this tutorial, you will:

* Create a gene network in Reverse Engineering
* View the network in Cytoscape
* Create a Marker panel from the Cytoscape network

Before you can continue, geWorkbench should be running. The file [[Webmatrix.EXP]] is used in this tutorial. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Reverse Engineering===

The Reverse Engineering component is used to analyze a large amount of microarray data to reverse engineer the underlying gene regulatory network. The details of this algorithm are described in "ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context", Califano et al. (http://arxiv.org/abs/q-bio.MN/0410037).

'''Create a Network'''
{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|1. In the Gene Panel, enter '''1973_s_at''' in the '''Find Next''' text box. The list box will navigate to '''1973_s_at'''; click on that gene.||[[Image: E_panel.PNG]]

|-
|2. Go to the Reverse Engineering component in the View pane. You should be in the first tab “Profiler” and in “Basic”. The marker '''1973_s_at''' should be listed in the Hub Gene box.||[[Image:E_RE1.PNG]]
|-
|3. Click on '''Analyze 2D'''. A list of markers, sorted by interaction strength, is returned below the '''Search Box'''. ||
|-
|-
|-
|4. Select all the genes with score >= 5 by clicking on the first marker '''[12.53] 37724_at''', using Shift-down arrow to highlight the values >= 5.||[[Image: E_re2.png]]
|-
|-
|-
|5. Click '''Create Network'''. The network created is displayed in the Cytoscape view.||
|-
|-
|}

'''View the Network in Cytoscape'''

A full description of the capabilities and functionality of Cytoscape can be found at http://www.cytoscape.org/.

# In the Cytoscape Layout menu, select '''yFiles/ Organic''' to modify the network display.
# In this network image, select the central marker. It should turn yellow. [[Image:E_network.png]]
# In the Select menu, select Nodes>First neighbors of selected nodes>Shape>Triangle. The first neighbors are updated from a square shape to a triangle.
# Ctrl+mouse select the network. The selected nodes turn yellow. The selected genes are returned to the Gene Panel with the name Selected Genes[Cytoscape].

== Enrichment Analysis==

In this tutorial, you will:

* Gene Ontology

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

'''Gene Ontology'''

The Gene Ontology component categorizes genes in terms of the functions their products perform, their cellular localization, or their involvement in high-level biological processes. The category definitions are provided by the Gene Ontology Consortium (http://www.geneontology.org), which geWorkbench uses to generate a tree structure that approximates the Directed Acyclic Graph structure used by the Consortium.

1. Create a Marker panel with the following markers:
* AFFX-hum_alu_at
* AFFX-LysX-M_at
* AFFX-LysX-5_at
* AFFX-LysX-3_at
For assistance in creating a marker panel, please refer to Working with Markers and Phenotype Tutorial (INSERT LINK).

2. Click '''Map List'''. As the application creates a tree for the selected ontology, a pop-up window will display the progress. The mapping is based on the GO annotation provided by Affymetrix in their annotation files.[[Image:E_map.png]]

3. The resulting nodes in the mapped tree will display two numbers separated by a forward slash, such as '''cell(1/6058)'''.
*The first number is the number of genes from the active marker panel that are mapped to a specific category and all its child nodes.
*The second number, after the slash, is the corresponding count for the reference list (or for the entire chip, depending on the reference list checkbox selection and on whether a reference list has been loaded from the file system).

[[Image:E_goon.png]]
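
The two numbers shown on each node can be understood as counts over a category and all of its descendants. The following is a minimal sketch of that counting on a toy tree; the category names, gene identifiers, dictionaries and function names are all hypothetical and serve only to illustrate the idea, not how geWorkbench stores its annotations.

```python
def descendants(children, term):
    """Return the set containing term and every category below it in the tree."""
    seen = {term}
    stack = [term]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def count_mapped(children, gene_terms, term, genes):
    """Count the genes annotated to term or to any of its child nodes."""
    terms = descendants(children, term)
    return sum(1 for gene in genes if terms & gene_terms.get(gene, set()))


# Toy category tree and annotations (hypothetical).
children = {"cell": ["nucleus", "membrane"], "nucleus": ["nucleolus"]}
gene_terms = {"g1": {"nucleolus"}, "g2": {"membrane"}, "g3": {"other"}}

active_panel = ["g1", "g3"]
reference = ["g1", "g2", "g3"]
label = "cell({}/{})".format(
    count_mapped(children, gene_terms, "cell", active_panel),
    count_mapped(children, gene_terms, "cell", reference),
)
```

Here <code>label</code> becomes <code>cell(1/2)</code>: one gene from the active panel and two genes from the reference list fall under "cell" or one of its child categories.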

The Table View and P Value Trends provide alternate views of the mappings obtained in the Tree View.

{|style="border: 1px solid lightGray"
!TABLE VIEW||P VALUE TREND||
|-
|-|-
|-
| [[Image:E_gotable.png]]||[[Image:E_pvalueT.png]]
|-
| ''' Recompute: ''' This re-computes the statistics based on the current widget settings||

'''Plot:''' The ‘Plot’ will draw the plot based on selections in the Tree View and widget settings in the Step Size.

|--

|}

== Sequence Analysis== ken
* Sequence Retrieval
* Sequence Homology Analysis
** Blast
** Other

==Pattern Discovery== ken
* Position Histogram

== Promoter Analysis==

==Integrated Annotation Information==
In this tutorial, you will:

* rrrrr
* rrrrr

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

Perform the following steps to view annotations of a dataset (''not recommended for large sets of genes''):

1. In the Marker Annotations in the View pane, click '''Use Panels.'''

A table of gene names and links to pathway diagrams is returned if available from the NCI CGAP database.

2. Clicking on gene names returns annotations from the NCI CGAP database in your default web browser window.

3. Clicking on the pathway diagram name will display a pathway diagram in the caBIO Pathways panel in the View pane.
In the caBIO Pathways viewer, the gene symbols in a pathway can in turn be clicked to display their gene information in the default web browser window.

User:Daly

2006-02-24T19:16:15Z

Daly: /* Enrichment Analysis */

= Overview=

geWorkbench is an open-source bioinformatics platform that offers a comprehensive and extendible collection of tools for the management, analysis, visualization and annotation of biomedical data. This tool is aimed at providing researchers a centralized repository for the data analysis and visualization of gene expression data, gene sequences and protein sequences.

'''What can you do with geWorkbench?'''

* Use one program that integrates with multiple existing bioinformatics modules for analysis and visualization.

* Access remote servers and clusters for the performance of computationally intensive calculations - quicker!

* Streamline data analysis with:
** flexible import options that support merging files from various sources ''(insert possible data source ie RMA express).''
** support for a variety of genomic data including microarrays, sequences, pathways, networks, alignments and phenotypes.
** access analyses with biological annotations from the National Cancer Institute.

* Community: Insert developer benefit ( plugin)

The diagram illustrates the use of geWorkbench by Researchers.

[[Image:slide1.gif]]

For detailed system documentation, please see the Documentation section (INSERT LINK).

= Tutorial=

===Welcome! ===

Welcome to the geWorkbench tutorial. In the following tutorials you will learn how to use geWorkbench. This is an example-based, illustrated guide for both beginners and experienced users. This tutorial was last updated in '''DATE''' and reflects changes to geWorkbench through '''DATE'''.

===Using the Tutorial ===

These tutorials assume that you have successfully completed the installation instructions. If you haven't installed the entire program, please go to [[Download]] and follow the installation instructions.

The tutorials are organized into a number of separate topics. Sample data sets are used throughout the tutorials. Each tutorial is self-contained and does not depend on any other tutorial for sample data. Sample data is located '''''(INSERT LINK or Instructions on how to download)'''''

'''(we should have sample data labeled Tutorial #)'''

Within the tutorials there are two basic types of text: 
*Text that looks like this explains topics. 

# Text in numbered steps is instructions for you to follow using tutorial data files.

''' insert instruction on how to navigate between tutorials'''.

==Getting Started== MV
* Starting the application
* GUI elements
** Panels
** Navigation
== Loading Data== MV
* Data formats

== Working with Marker and Phenotype Panels==
In this tutorial, you will:
* Become familiar with the use of panels in geWorkbench
* Create active phenotype panels
* Classify the panels you created

Before you can continue, geWorkbench should be running. For help with this, please refer to the '''''Getting Started''''' (insertlink) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Creating Panels===

When working with microarrays, geWorkbench uses the term '''''marker''''' to refer to a gene probe (in other cases, it can refer to individual items from other data sets, such as sequences).
'''''Phenotype''''' refers to any user-defined grouping of microarrays. These microarrays will often share some common property that in most cases is phenotypic, although this is not a requirement. For example, one such “phenotype” might be a single experiment on a tumor tissue sample, with a second “phenotype” defined as a collection of experiments performed on normal tissue samples.

====Assign Panel====

1. In the Phenotype panel, select the following arrays, which contain samples from the congestive cardiomyopathy disease state:
JB-ccmp_0120.txt, JB-ccmp_0218.txt, JB-ccmp0718.txt, JB-ccmp0811.txt, JB-ccmp1003.txt, JB-ccmp_1109.tx

2. Right-click, select '''Add to Panel'''.

3. Enter "Cardio" in the input box and click OK.

[[Image:T_PanelLabelCardio.png]]

4. Next, similarly label the following arrays as "Normal" (''repeat steps 2 & 3'').
JB-n_0106.txt , JB-n_0821.txt, JB-n_0915.txt, JB-n_1303.txt

5. Select the checkboxes next to the panel names to indicate that these groups of arrays are "Active". Various analysis and visualization components can be set to use or display only activated arrays or markers.

[[Image:T_PhenotypesPriorToCase.png]]

====Classify panel====

For statistical tests such as the t-test, Case and Control groups can be specified.

# Left-click on the thumb-tack icon in front of the phenotype name.
# Select Case to specify the disease arrays as the "Case". The remaining "Normal" arrays are by default labeled control.
[[Image:T_PhenotypeSettingCase.png]]

A red thumbtack indicates the arrays have been specified as "Case".

[[Image:T_PhenotypeCaseSet.png]]

== Visualize Gene Expression==
In this tutorial, you will:

* Get acquainted with the various geWorkbench visualization tools
* View a dataset in geWorkbench
* Modify the visualization preference settings

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file '''''cardiomyopathy.exp''''' used in this tutorial is the one from the Load Data (insertlink) tutorial.

===Visualization Tools===

Visualization tools provide a view of the chip(s) under investigation and can be used for ascertaining the quality of the data. Active gene and phenotype panels (insertlink) restrict the number of markers/arrays displayed. The images created can be saved and exported. A detailed description of how to manipulate the visualization components is available in the online help.

{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|'''Microarray View''': Used to inspect each separate microarray using the Array scroll bar.||[[Image: Ema.png]]
|-
|-
|'''Tabular Microarray Panel''': Presents the numerical values of the expression measurements in a table format. One row is created per individual marker/probe and one column per microarray.||[[Image:Etab.png]]
|-
|-
|-
|'''Color Mosaic''': Heat maps for microarray expression data, organized by phenotypic or gene groupings.||[[Image:Ed_cm.png]]
|-
|-
|-
|'''Expression Profiles''': A line graph of gene expression profiles across several arrays/hybridizations. Each marker is drawn as a separately colored line.||[[Image:Eep.png]]
|-
|-
|-
|'''Scatter Plot''': A pairwise (''array vs. array and marker vs. marker'') comparison and plotting of expression values.||[[Image:Esp.png]]
|-
|-
|-
|-
|}

'''View a dataset'''
# Select '''''cardiomyopathy.exp''''' in the Project Panel.
# Select the '''''Microarray Panel''''' visualization component in the View Area at the top-right section of the interface.
# Deselect the All Markers checkbox to display the entire dataset.

Note: The '''''All Arrays''''' and '''''All Markers''''' checkboxes determine which data points are included in the display.
If neither is checked, then the entire data set is shown. The All Arrays control is useful when working with data sets comprising multiple arrays. In this case, only those arrays that are included in a currently activated phenotype panel will be displayed.

[[Image: Allm.png]]

=== Preferences===
The Preferences selection in the Tools menu allows users to specify how certain aspects of the system will behave. Once set, preferences are applied at once and persist between application sessions.

'''Modifying Settings'''

1. From the main menu, click on '''Tools>Preferences'''.

2. In the Preferences pop-up window, you can define settings for:
*Text Editor: The editor selected will be used to open and inspect data sets loaded in a project. Notepad is the default setting.
* Visualization: The color scheme to be applied to color mosaic images.
**Absolute: (default) Let M = max{|min|, |max|} over all expression measurements, across all arrays. If expression value x > 0, assign it the red spectrum x / M * 256. If expression value x is negative, assign it to the green spectrum -x / M * 256.
** Relative: This is similar to the setting for Absolute, but each marker is mean-variance normalized first.
* Genepix Value Computation: You can specify how to compute the value displayed for Genepix arrays. The default setting is the option (Mean F635 - Mean B635) / (Mean F532 - Mean B532).

Select '''Relative''' for the visualization preference.

3. Click on '''OK'''.

[[Image: Preferences.gif]]

== Filter and Normalize Data==
In this tutorial, you will:
* Get acquainted with the various filters and normalizers available in geWorkbench
* Apply a filter and normalizer on a tutorial dataset

Before you can continue, geWorkbench should be running. For help with installation, please refer to the '''''Getting Started''''' (INSERT LINK) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Filter===

Filtering can be used to screen out missing data points, remove low quality data or reduce the size of the dataset by removing less interesting data. Available geWorkbench filters are as follows:

{|style="border: 1px solid lightGray"
!Filter||Description||
|-
|-|-
|-

|'''Affy Detection Call'''||Applicable to Affymetrix data only. Sets all measurements whose detection status is any user-defined combination of A, P or M as missing.
|-

|'''Missing values''' ||Discards all markers that have “missing” measurements in at least n microarrays, where n is defined by the user. Another filter must first be applied however, in order to generate the missing values upon which this filter can operate.
|-
|-
|-

|'''Deviation''' ||Sets all markers whose measurements deviate below a given value across all microarrays as missing.
|-
|-
|-

|'''Expression Threshold''' ||Sets all markers whose measurements are inside (or outside) a user-defined range as missing.
|-
|-
|-

|'''2 Channel''' ||Applicable to 2-channel arrays (Genepix) data only. Defines applicable ranges for each channel, and sets all values for which either channel intensity is inside (or outside) the defined range as missing.
|-
|-
|-
|}

[[Image:Filterpanel.gif]]

Perform the following steps to filter out data called absent in an Affymetrix file:
# In the Filtering Panel, select ''Affy Detection Call Filter''.
# Select the ‘A’ (Absent) checkbox and click '''Filter'''. Values that were removed (marked as missing) are highlighted in yellow.
# In the Filtering Panel, select '''Missing Values Filter'''.
# Choose the maximum number of arrays that can have missing values before a marker is removed (the default is 0).
# Click '''Filter'''. Markers with more than 0 missing values are removed. You’ll notice that the yellow values are gone.

{|style="border: 1px solid lightGray"
!'''Affy Detection Call Filter'''||'''Missing Values Filter'''||
|-
|-|-
|-
| [[Image: Filtered.gif]]||[[Image:Mvfilter.gif]]
|-
|}

===Normalize===
Normalization can be used to decrease the effects of systematic differences across a set of experiments.
In geWorkbench, normalization replaces the original values with new values. Available geWorkbench normalization methods are as follows:

{|style="border: 1px solid lightGray"
!Normalizer||Description||
|-
|-

|Missing value calculation ||Replaces every missing value with either the mean value of that marker across all microarrays or with the mean measurement of all markers in the microarray where the missing value is observed
|-
|-
|Log2 Transformation ||Applies a log2 transformation to all measurements in a microarray
|-
|-
|-
|Threshold Normalizer ||All data points whose value is less than (or greater than) a user-specified minimum (maximum) value are raised (reduced) to that minimum (maximum) value
|-
|-
|-
|-
|Marker-based Centering ||Subtracts the mean (median) measurement of a marker profile from every measurement in the profile
|-
|-
|-
|Array-based centering ||Subtracts the mean (median) measurement of a microarray from every measurement in that microarray
|-
|-
|-
|Mean-variance normalizer ||For every marker profile, the mean measurement of the entire profile is subtracted from each measurement in the profile and the resulting value is divided by the standard deviation
|-
|-
|}
[[Image:Normalpanel.gif]]

'''Apply Quantile Normalizer'''

1. In the Normalization Panel, select ''Quantile Normalizer''.

2. Leave the default averaging method of ''Mean Profile Marker'' to indicate handling of missing values.

3. Click '''Normalize'''. The View Area is updated to reflect normalization (after the screen has been refreshed).
Note: The first value in the second row was updated from ''41,394.6'' to ''55,779.26''.

{|style="border: 1px solid lightGray"
!PRENORMALIZATION||NORMALIZED||
|-
|-|-
|-
| [[Image:Prenormalizer_ed.gif ]] || [[Image: Postnormalizera.gif ]].
|-
|-
|-
|}

== Clustering Gene Expression Data== ken
*Hierarchical Clustering
* Self Organizing Map (SOM)

== Differential Expression ==
In this tutorial, you will:

* Get acquainted with the T Test and Multi T Test
* Apply a T Test and Multi T Test

Before you can continue, geWorkbench should be running. For help with installation, please refer to the Getting Started (INSERT LINK) section. Load the tutorial file cardiomyopathy.exp. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===T Test===

T Test analysis identifies markers with statistically significant differential expression between sets of microarrays. The t-test determines for each marker whether there is a significant difference between the two groups (case and control). To perform this analysis, you must classify the panels, set the analysis parameters and view the results in the visualization components. A detailed description of the T Test parameters is available in the online help.

'''Classify the Panels'''

1. Mark the '''Cardio''' phenotype as 'Case'. By default, panels are marked as control. Panels classified as Case are shown with a red thumbtack icon.
* Right-click on the '''Cardio''' phenotype.
* Select '''Classification'''>'''Case'''.
2. Activate the '''Normal''' and '''Cardio''' panels by selecting the checkboxes next to the panel names.

[[Image:T_PhenotypeSettingCase.png]]

'''Set Analysis Parameters'''
# From the Analysis Panel, select '''T-Test Analysis'''.
# Set the parameter values listed below and click '''Analyze'''.

* Alpha-corrections tab: Just Alpha.
* P-Value Parameters tab: p-values based on t-distribution. Note that the default alpha (critical p-value) is set to 0.01.
* Degree of Freedom tab: Welch approximation - unequal group variances.

[[Image:Ttest.gif]] 
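
To make the "Welch approximation - unequal group variances" setting more concrete, here is a minimal Python sketch of the textbook Welch t statistic and the Welch-Satterthwaite degrees of freedom for a single marker. The function name and the sample values are hypothetical, and this is not taken from the geWorkbench source; converting the statistic into a p-value additionally requires the t-distribution, which a statistics library would supply.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(case, control):
    """Return (t statistic, approximate degrees of freedom) for two groups."""
    n1, n2 = len(case), len(control)
    # Per-group sample variance divided by group size.
    a = variance(case) / n1
    b = variance(control) / n2
    t = (mean(case) - mean(control)) / sqrt(a + b)
    # Welch-Satterthwaite approximation for unequal group variances.
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# Hypothetical expression values for one marker in the Case and Control panels.
t_stat, dof = welch_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
```

A marker would then be called significant when the p-value derived from the statistic and the degrees of freedom falls below the chosen alpha (0.01 by default in this tutorial).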

'''T-Test Results'''

{|style="border: 1px solid lightGray"
!|| ||
|-
|-
| Markers which met the significance test are included in a new gene panel called “Significant Genes”. || [[Image:E_ttestgpanel.png]]
|-
|-
| An ancillary dataset is created in the project window. || [[Image:Ed_ttestproj.png]]
|-
|}

The values of the T-Test can be seen in the Color Mosaic panel and the Volcano Plot.

{|style="border: 1px solid lightGray"
!VOLCANO PLOT||COLOR MOSAIC||
|-
|-|-
|-
|-
| [[Image:Vplot.png]] || [[Image:Ed_cm.png]]
|-
|-
|-
| Clicking on any of the spots highlights the marker selected in the Marker Panel. * Insert another description ||
* The label to the right displays the Significance value (the lower the value, the more likely the marker is differentially expressed) and gene name for the displayed genes. The genes are displayed in ascending order by Significance Value.

* Gene height and width values can be altered to modify the display.

* The intensity slider is used to modify the intensity of the color codings.

* Accession: Includes the accession number in the label.

* Printer Icon: Prints the displayed image.

* Display: Must be toggled on to display data.

* ''Pat, Abs, Ratio Overlapping Pages Icon: Not the T Test display.''

|-
|-
|}

=== Multi T Test=== '''(IN PROGRESS)'''

== Regulatory Network ==
In this tutorial, you will:

* Create a gene network in Reverse Engineering
* View the network in Cytoscape
* Create a Marker panel from the Cytoscape network

Before you continue, geWorkbench should be running. The file [[Webmatrix.EXP]] is used in this tutorial. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Reverse Engineering===

The Reverse Engineering component is used to analyze a large amount of microarray data to reverse engineer the underlying gene regulatory network. The details of this algorithm are described in "ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context", Califano et al. (http://arxiv.org/abs/q-bio.MN/0410037).

'''Create a Network'''
{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|1. In the Gene Panel, enter '''1973_s_at''' in the '''Find Next''' text box. The list box will navigate to '''1973_s_at'''; click on that gene.||[[Image: E_panel.PNG]]

|-
|2. Go to the Reverse Engineering component in the View pane. You should be in the first tab, “Profiler”, and in “Basic”. The marker '''1973_s_at''' should be listed in the Hub Gene box.||[[Image:E_RE1.PNG]]
|-
|3. Click on '''Analyze 2D'''. A list of markers, sorted by interaction strength, is returned below the '''Search Box'''. ||
|-
|-
|-
|4. Select all the genes with score >= 5 by clicking on the first marker, '''[12.53] 37724_at''', and then using Shift+Down Arrow to highlight the values >= 5.||[[Image: E_re2.png]]
|-
|-
|-
|5. Click '''Create Network'''. The resulting network is displayed in the Cytoscape view.||
|-
|-
|}

'''View the Network in Cytoscape'''

A full description of the capabilities and functionality of Cytoscape can be found at http://www.cytoscape.org/.

# In the Cytoscape Layout menu, select '''yFiles/ Organic''' to modify the network display.
# In this network image, select the central marker. It should turn yellow. [[Image:E_network.png]]
# In the Select menu, select Nodes>First neighbors of selected nodes>Shape>Triangle. The first neighbors are changed from a square shape to a triangle.
# Hold Ctrl and use the mouse to select the network. The selected nodes turn yellow. The selected genes are returned to the Gene Panel under the name Selected Genes[Cytoscape].

== Enrichment Analysis==

In this tutorial, you will:

* Gene Ontology

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

'''Gene Ontology'''

The Gene Ontology component categorizes genes by the function their products perform, their cellular localization, or their involvement in high-level biological processes. The category definitions are provided by the Gene Ontology Consortium (http://www.geneontology.org), and geWorkbench uses them to generate a tree structure that approximates the Directed Acyclic Graph structure used by the Consortium.

1. Create a Marker panel with the following markers:
* AFFX-hum_alu_at
* AFFX-LysX-M_at
* AFFX-LysX-5_at
* AFFX-LysX-3_at
For assistance in creating a marker panel, please refer to Working with Markers and Phenotype Tutorial (INSERT LINK).

2. Click '''Map List'''. As the application creates a tree for the selected ontology, a pop-up window will display the progress. The mapping is based on the GO annotation provided by Affymetrix in their annotation files.[[Image:E_map.png]]

3. The resulting nodes in the mapped tree will display two numbers separated by a forward slash, such as '''cell(1/6058)'''.
*The first number is the number of genes from the active marker panel that are mapped to a specific category and all its child nodes.
*The second number, after the slash, is the corresponding count for the reference list (or for the entire chip, depending on the reference list check-box selection and on whether a reference list has been loaded from the file system).

[[Image:E_goon.png]]
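
The two numbers shown on each node can be pictured with a short Python sketch. It counts the genes annotated to a category or to any of its child nodes, once for the active marker panel and once for the reference list. All names and the toy ontology below are hypothetical and only illustrate the idea; the actual geWorkbench data structures are not described in this tutorial.

```python
def count_mapped(category, children, annotations, genes):
    """Count genes annotated to `category` or to any node beneath it."""
    visited, stack, hits = set(), [category], set()
    while stack:
        term = stack.pop()
        if term in visited:
            # A term can be reached through several parents; visit it once.
            continue
        visited.add(term)
        hits |= annotations.get(term, set()) & genes
        stack.extend(children.get(term, []))
    return len(hits)


def node_label(category, children, annotations, panel, reference):
    """Build a label of the form name(panel count/reference count)."""
    a = count_mapped(category, children, annotations, panel)
    b = count_mapped(category, children, annotations, reference)
    return f"{category}({a}/{b})"


# Hypothetical toy ontology and annotations.
children = {"cell": ["membrane", "nucleus"]}
annotations = {"cell": {"g1"}, "membrane": {"g2", "g3"}, "nucleus": {"g3", "g4"}}
label = node_label("cell", children, annotations,
                   panel={"g2", "g9"}, reference={"g1", "g2", "g3", "g4", "g5"})
```

Collecting the hits in a set means a gene annotated to several child nodes is counted only once for the parent category.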

The Table View and P Value Trends provide alternate views of the mappings obtained in the Tree View.

{|style="border: 1px solid lightGray"
!TABLE VIEW||P VALUE TREND||
|-
|-|-
|-
| [[Image:E_gotable.png]]||[[Image:E_pvalueT.png]]
|-
| ||

'''Plot:''' The ‘Plot’ will draw the plot based on selections in the Tree View and widget settings in the ‘P-Value setting’ tab.

|--

|}

== Sequence Analysis== ken
* Sequence Retrieval
* Sequence Homology Analysis
** Blast
** Other

==Pattern Discovery== ken
* Position Histogram

== Promoter Analysis==

==Integrated Annotation Information==
In this tutorial, you will:

* rrrrr
* rrrrr

Before you can continue, geworkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) used in the this tutorial.

Perform the following steps to view annotations of a dataset ('''''not recommended for large sets of genes'''''):

1. In the Marker Annotations component in the View pane, click '''Use Panels'''.

A table of gene names and links to pathway diagrams is returned if available from the NCI CGAP database.

2. Clicking on gene names returns annotations from the NCI CGAP database in your default web browser window.

3. Clicking on the pathway diagram name will display a pathway diagram in the caBIO Pathways panel in the View pane.
In the caBIO Pathways viewer, the gene symbols in the pathway can in turn be clicked to display their gene information in the default web browser window.

User:Daly

2006-02-24T19:12:47Z

Daly: /* Enrichment Analysis */

= Overview=

geWorkbench is an open-source bioinformatics platform that offers a comprehensive and extensible collection of tools for the management, analysis, visualization and annotation of biomedical data. It aims to provide researchers with a centralized repository for the analysis and visualization of gene expression data, gene sequences and protein sequences.

'''What can you do with geWorkbench?'''

* Use one program that integrates with multiple existing bioinformatics modules for analysis and visualization.

* Access remote servers and clusters for the performance of computationally intensive calculations - quicker!

* Streamline data analysis with:
** flexible import options that support merging files from various sources ''(insert possible data source ie RMA express).''
** support for a variety of genomic data including microarrays, sequences, pathways, networks, alignments and phenotypes.
** access analyses with biological annotations from the National Cancer Institute.

* Community: Insert developer benefit ( plugin)

The diagram illustrates the use of geWorkbench by Researchers.

[[Image:slide1.gif]]

For detailed system documentation, please see the Documentation section (INSERT LINK).

= Tutorial=

===Welcome! ===

Welcome to the geWorkbench tutorial. In the following tutorials you will learn how to use geWorkbench. This is an example-based, illustrated guide for both beginners and experienced users. This tutorial was last updated in '''DATE''' and reflects changes to geWorkbench through '''DATE'''.

===Using the Tutorial ===

These tutorials assume that you have successfully completed the installation instructions. If you haven't installed the entire program, please go to [[Download]] and follow the installation instructions.

The tutorials are organized into a number of separate topics. Sample data sets are used throughout the tutorials. Each tutorial is self-contained and does not depend on any other tutorial for sample data. Sample data is located '''''(INSERT LINK or Instructions on how to download)'''''

'''(we should have sample data labeled Tutorial #)'''

Within the tutorials there are two basic types of text:
*Text that looks like this explains topics. 

# Text in numbered steps is instructions for you to follow using tutorial data files.

''' insert instruction on how to navigate between tutorials'''.

==Getting Started== MV
* Starting the application
* GUI elements
** Panels
** Navigation
== Loading Data== MV
* Data formats

== Working with Marker and Phenotype Panels==
In this tutorial, you will:
* Become familiar with the use of panels in geWorkbench
* Create active phenotype panels
* Classify the panels you created

Before you continue, geWorkbench should be running. For help with this, please refer to the '''''Getting Started''''' (insertlink) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Creating Panels===

When working with microarrays, geWorkbench uses the term '''''marker''''' to refer to a gene probe (in other cases, it can be individual items from other data sets, such as sequences).
'''''Phenotype''''' refers to any user-defined grouping of microarrays. These microarrays will often share some common property that in most cases is phenotypic, although this is not a requirement. For example, one such “phenotype” might be a single experiment on a tumor tissue sample, with a second “phenotype” defined as a collection of experiments performed on normal tissue samples.

====Assign Panel====

1. In the Phenotype panel, select the following arrays, which contain samples from the congestive cardiomyopathy disease state:
JB-ccmp_0120.txt, JB-ccmp_0218.txt, JB-ccmp0718.txt, JB-ccmp0811.txt, JB-ccmp1003.txt, JB-ccmp_1109.tx

2. Right-click, select '''Add to Panel'''.

3. Enter "Cardio" in the input box and click OK.

[[Image:T_PanelLabelCardio.png]]

4. Next, similarly label the following arrays as "Normal" (''repeat steps 2 & 3''):
JB-n_0106.txt , JB-n_0821.txt, JB-n_0915.txt, JB-n_1303.txt

5. Select the checkboxes next to the panel names to mark these groups of arrays as "Active". Various analysis and visualization components can be set to use or display only activated arrays or markers.

[[Image:T_PhenotypesPriorToCase.png]]

====Classify panel====

For statistical tests such as the t-test, Case and Control groups can be specified.

# Left-click on the thumb-tack icon in front of the phenotype name.
# Select Case to specify the disease arrays as the "Case". The remaining "Normal" arrays are by default labeled control.
[[Image:T_PhenotypeSettingCase.png]]

A red thumbtack indicates the arrays have been specified as "Case".

[[Image:T_PhenotypeCaseSet.png]]

== Visualize Gene Expression==
In this tutorial, you will:

* Get acquainted with the various geWorkbench visualization tools
* View a dataset in geWorkbench
* Modify the visualization preference settings

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file '''''cardiomyopathy.exp''''' used in this tutorial is from the Load Data (insertlink) tutorial.

===Visualization Tools===

Visualization tools provide a view of the chip(s) under investigation and can be used for ascertaining the quality of the data. Active gene and phenotype panels (insertlink) restrict the number of markers/arrays displayed. The images created can be saved and exported. The online help describes in detail how to manipulate the visualization components.

{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|'''Microarray View''': Used to inspect each separate microarray using the Array scroll bar.||[[Image: Ema.png]]
|-
|-
|'''Tabular Microarray Panel''': Presents the numerical values of the expression measurements in a table format. One row is created per individual marker/probe and one column per microarray.||[[Image:Etab.png]]
|-
|-
|-
|'''Color Mosaic''': Heat maps for microarray expression data, organized by phenotypic or gene groupings.||[[Image:Ed_cm.png]]
|-
|-
|-
|'''Expression Profiles''': A line graph of gene expression profiles across several arrays/hybridizations. Each marker is drawn as a separately colored line.||[[Image:Eep.png]]
|-
|-
|-
|'''Scatter Plot''': A pairwise (''array vs. array and marker vs. marker'') comparison and plotting of expression values.||[[Image:Esp.png]]
|-
|-
|-
|-
|}

'''View a dataset'''
# Select '''''cardiomyopathy.exp''''' in the Project Panel.
# Select the '''''Microarray Panel''''' visualization component in the View Area at the top-right section of the interface.
# Deselect the All Markers checkbox to display the entire dataset.

Note: The '''''All Arrays''''' and '''''All Markers''''' checkboxes determine which data points are included in the display.
If neither is checked, then the entire data set is shown. The All Arrays control is useful when working with data sets comprising multiple arrays. In this case, only those arrays that are included in a currently activated phenotype panel will be displayed.

[[Image: Allm.png]]

=== Preferences===
The Preferences selection in the Tools menu allows users to specify how certain aspects of the system will behave. Once set, preferences are applied at once and persist between application sessions.

'''Modifying Settings'''

1. From the main menu, click on '''Tools>Preferences'''.

2. In the Preferences pop-up window, you can define settings for:
*Text Editor: The editor selected will be used to open and inspect data sets loaded in a project. Notepad is the default setting.
* Visualization: The color scheme to be applied to color mosaic images.
**Absolute: (default) Let M = max{|min|, |max|} over all expression measurements, across all arrays. If expression value x > 0, assign it the red spectrum x / M * 256. If expression value x is negative, assign it to the green spectrum -x / M * 256.
** Relative: This is similar to the setting for Absolute, but each marker is mean-variance normalized first.
* Genepix Value Computation: You can specify how to compute the value displayed for Genepix arrays. The default setting is the option (Mean F635 - Mean B635) / (Mean F532 - Mean B532).

Select '''Relative''' for the visualization preference.

3. Click on '''OK'''.

[[Image: Preferences.gif]]
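
The '''Absolute''' color rule described above can be sketched in a few lines of Python. This is only an illustration of the stated formula, not geWorkbench code: the function names are hypothetical, colors are returned as (red, green, blue) tuples, and clamping the result to 255 is an added assumption so that the largest measurement still fits into an 8-bit channel.

```python
def max_abs(values):
    """M = max{|min|, |max|} over all measurements in all arrays."""
    flat = [x for row in values for x in row]
    return max(abs(min(flat)), abs(max(flat)))


def absolute_color(x, m):
    """Map expression value x to an (r, g, b) tuple using x / M * 256."""
    if m == 0:
        return (0, 0, 0)  # all measurements are zero; nothing to scale
    level = min(255, int(abs(x) / m * 256))  # clamp: an assumption, see above
    # Positive values go to the red spectrum, negative values to the green one.
    return (level, 0, 0) if x > 0 else (0, level, 0)


# Hypothetical measurements: two arrays, two markers each.
data = [[-50.0, 100.0], [25.0, 0.0]]
m = max_abs(data)
colors = [[absolute_color(x, m) for x in row] for row in data]
```

For the '''Relative''' setting, the same mapping would be applied after each marker has been mean-variance normalized.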

== Filter and Normalize Data==
In this tutorial, you will:
* Get acquainted with the various filters and normalizers available in geWorkbench
* Apply a filter and normalizer on a tutorial dataset

Before you continue, geWorkbench should be running. For help with installation, please refer to the '''''Getting Started''''' (INSERT LINK) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Filter===

Filtering can be used to screen out missing data points, remove low quality data or reduce the size of the dataset by removing less interesting data. Available geWorkbench filters are as follows:

{|style="border: 1px solid lightGray"
!Filter||Description||
|-
|-|-
|-

|'''Affy Detection Call'''||Applicable to Affymetrix data only. Sets all measurements whose detection status is any user-defined combination of A, P or M as missing.
|-

|'''Missing values''' ||Discards all markers that have “missing” measurements in at least n microarrays, where n is defined by the user. Another filter must first be applied however, in order to generate the missing values upon which this filter can operate.
|-
|-
|-

|'''Deviation''' ||Sets as missing all markers whose measurements deviate by less than a given value across all microarrays.
|-
|-
|-

|'''Expression Threshold''' ||Sets all markers whose measurements are inside (or outside) a user-defined range as missing.
|-
|-
|-

|'''2 Channel''' ||Applicable to 2-channel arrays (Genepix) data only. Defines applicable ranges for each channel, and sets all values for which either channel intensity is inside (or outside) the defined range as missing.
|-
|-
|-
|}

[[Image:Filterpanel.gif]]

Perform the following steps to filter out data called absent in an Affymetrix file:
# In the Filtering Panel, select ''Affy Detection Call Filter''.
# Select the ‘A’ (Absent) checkbox and click '''Filter'''. Values that were removed (marked as missing) are highlighted in yellow.
# In the Filtering Panel, select '''Missing Values Filter'''.
# Choose the maximum number of arrays that can have missing values before a marker is removed; the default is 0.
# Click '''Filter'''. Markers with more than 0 missing values are removed, so the yellow values are gone.

{|style="border: 1px solid lightGray"
!'''Affy Detection Call Filter'''||'''Missing Values Filter'''||
|-
|-|-
|-
| [[Image: Filtered.gif]]||[[Image:Mvfilter.gif]]
|-
|}
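
The two filtering steps above can be sketched in Python as follows. This is a simplified illustration with hypothetical function names and data, not the geWorkbench implementation: measurements are stored per marker, missing values are represented by None, and a marker is dropped when it has more missing values than the chosen maximum (as in the steps above).

```python
def detection_call_filter(values, calls, drop=("A",)):
    """Mark as missing (None) every measurement whose detection call is in `drop`."""
    return {
        marker: [None if c in drop else v for v, c in zip(row, calls[marker])]
        for marker, row in values.items()
    }


def missing_values_filter(values, max_missing=0):
    """Keep only markers with at most `max_missing` missing measurements."""
    return {
        marker: row
        for marker, row in values.items()
        if sum(v is None for v in row) <= max_missing
    }


# Hypothetical data: two markers measured on two arrays, with detection calls.
values = {"m1": [10.0, 20.0], "m2": [5.0, 7.0]}
calls = {"m1": ["P", "P"], "m2": ["A", "P"]}
masked = detection_call_filter(values, calls)
kept = missing_values_filter(masked, max_missing=0)
```

Running the Missing Values filter on its own would remove nothing, because no measurement has been marked as missing yet; this is why another filter must be applied first.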

===Normalize===
Normalization can be used to decrease the effects of systematic differences across a set of experiments.
In geWorkbench, normalization replaces the existing values with new values. The available geWorkbench normalization methods are as follows:

{|style="border: 1px solid lightGray"
!Normalizer||Description||
|-
|-

|Missing value calculation ||Replaces every missing value with either the mean value of that marker across all microarrays or with the mean measurement of all markers in the microarray where the missing value is observed
|-
|-
|Log2 Transformation ||Applies a log2 transformation to all measurements in a microarray
|-
|-
|-
|Threshold Normalizer ||All data points whose value is less than (or greater than) a user-specified minimum (maximum) value are raised (reduced) to that minimum (maximum) value
|-
|-
|-
|-
|Marker-based Centering ||Subtracts the mean (median) measurement of a marker profile from every measurement in the profile
|-
|-
|-
|Array-based centering ||Subtracts the mean (median) measurement of a microarray from every measurement in that microarray
|-
|-
|-
|Mean-variance normalizer ||For every marker profile, the mean measurement of the entire profile is subtracted from each measurement in the profile and the resulting value is divided by the standard deviation
|-
|-
|}
[[Image:Normalpanel.gif]]
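
As an illustration of the centering and mean-variance rows in the table, the following Python sketch applies both operations to a single marker profile. The function names are hypothetical, and the use of the sample standard deviation is an assumption, since the table does not say which form of the standard deviation geWorkbench uses.

```python
from statistics import mean, median, stdev


def center(profile, use_median=False):
    """Subtract the mean (or median) of a profile from each of its measurements."""
    c = median(profile) if use_median else mean(profile)
    return [x - c for x in profile]


def mean_variance_normalize(profile):
    """Subtract the profile mean, then divide by the profile standard deviation."""
    mu = mean(profile)
    sd = stdev(profile)  # sample standard deviation: an assumption
    if sd == 0:
        return [0.0 for _ in profile]  # constant profile; avoid division by zero
    return [(x - mu) / sd for x in profile]


# Hypothetical marker profile measured on three arrays.
centered = center([1.0, 2.0, 6.0], use_median=True)
scaled = mean_variance_normalize([1.0, 2.0, 3.0])
```

Array-based centering is the same operation applied to all the measurements of one microarray instead of to one marker profile.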

'''Apply Quantile Normalizer'''

1. In the Normalization Panel, select ''Quantile Normalizer''.

2. Leave the default averaging method, ''Mean Profile Marker'', selected; it determines how missing values are handled.

3. Click '''Normalize'''. The View Area is updated to reflect the normalization (after the screen has been refreshed).
Note: The first value in the second row was updated from ''41,394.6'' to ''55,779.26''.

{|style="border: 1px solid lightGray"
!PRENORMALIZATION||NORMALIZED
|-
| [[Image:Prenormalizer_ed.gif]] || [[Image:Postnormalizera.gif]]
|}

== Clustering Gene Expression Data== ken
*Hierarchical Clustering
* Self Organizing Map (SOM)

== Differential Expression ==
In this tutorial, you will:

* Get acquainted with the T Test and Multi T Test
* Apply a T Test and Multi T Test

Before you continue, geWorkbench should be running. For help with installation, please refer to the Getting Started (INSERT LINK) section. Load the tutorial file cardiomyopathy.exp. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===T Test===

T Test analysis identifies markers with statistically significant differential expression between sets of microarrays. For each marker, the t-test determines whether there is a significant difference between the two groups (case and control). To perform this analysis, you must classify the panels, set the analysis parameters and view the results in the visualization components. The T Test parameters are described in detail in the online help.

'''Classify the Panels'''

1. Mark the '''Cardio''' phenotype as 'Case'. By default, panels are marked as control. Panels classified as Case are shown with a red thumbtack icon.
* Right-click on the '''Cardio''' phenotype.
* Select '''Classification'''>'''Case'''.
2. Activate the '''Normal''' and '''Cardio''' panels by selecting the checkboxes next to the panel names.

[[Image:T_PhenotypeSettingCase.png]]

'''Set Analysis Parameters'''
# From the Analysis Panel, select '''T-Test Analysis'''.
# Set the parameter values listed below and click '''Analyze'''.

* Alpha-corrections tab: Just Alpha.
* P-Value Parameters tab: p-values based on t-distribution. Note that the default alpha (critical p-value) is set to 0.01.
* Degree of Freedom tab: Welch approximation - unequal group variances.

[[Image:Ttest.gif]] 

'''T-Test Results'''

{|style="border: 1px solid lightGray"
!|| ||
|-
|-
| Markers which met the significance test are included in a new gene panel called “Significant Genes”. || [[Image:E_ttestgpanel.png]]
|-
|-
| An ancillary dataset is created in the project window. || [[Image:Ed_ttestproj.png]]
|-
|}

The values of the T-Test can be seen in the Color Mosaic panel and the Volcano Plot.

{|style="border: 1px solid lightGray"
!VOLCANO PLOT||COLOR MOSAIC||
|-
|-|-
|-
|-
| [[Image:Vplot.png]] || [[Image:Ed_cm.png]]
|-
|-
|-
| Clicking on any of the spots highlights the marker selected in the Marker Panel. * Insert another description ||
* The label to the right displays the Significance value (the lower the value, the more likely the marker is differentially expressed) and gene name for the displayed genes. The genes are displayed in ascending order by Significance Value.

* Gene height and width values can be altered to modify the display.

* The intensity slider is used to modify the intensity of the color codings.

* Accession: Includes the accession number in the label.

* Printer Icon: Prints the displayed image.

* Display: Must be toggled on to display data.

* ''Pat, Abs, Ratio Overlapping Pages Icon: Not the T Test display.''

|-
|-
|}

=== Multi T Test=== '''(IN PROGRESS)'''

== Regulatory Network ==
In this tutorial, you will:

* Create a gene network in Reverse Engineering
* View the network in Cytoscape
* Create a Marker panel from the Cytoscape network

Before you continue, geWorkbench should be running. The file [[Webmatrix.EXP]] is used in this tutorial. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Reverse Engineering===

The Reverse Engineering component is used to analyze a large amount of microarray data to reverse engineer the underlying gene regulatory network. The details of this algorithm are described in "ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context", Califano et al. (http://arxiv.org/abs/q-bio.MN/0410037).

'''Create a Network'''
{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|1. In the Gene Panel, enter '''1973_s_at''' in the '''Find Next''' text box. The list box will navigate to '''1973_s_at'''; click on that gene.||[[Image: E_panel.PNG]]

|-
|2. Go to the Reverse Engineering component in the View pane. You should be in the first tab, “Profiler”, and in “Basic”. The marker '''1973_s_at''' should be listed in the Hub Gene box.||[[Image:E_RE1.PNG]]
|-
|3. Click on '''Analyze 2D'''. A list of markers, sorted by interaction strength, is returned below the '''Search Box'''. ||
|-
|-
|-
|4. Select all the genes with score >= 5 by clicking on the first marker, '''[12.53] 37724_at''', and then using Shift+Down Arrow to highlight the values >= 5.||[[Image: E_re2.png]]
|-
|-
|-
|5. Click '''Create Network'''. The resulting network is displayed in the Cytoscape view.||
|-
|-
|}

'''View the Network in Cytoscape'''

A full description of the capabilities and functionality of Cytoscape can be found at http://www.cytoscape.org/.

# In the Cytoscape Layout menu, select '''yFiles/ Organic''' to modify the network display.
# In this network image, select the central marker. It should turn yellow. [[Image:E_network.png]]
# In the Select menu, select Nodes>First neighbors of selected nodes>Shape>Triangle. The first neighbors are changed from a square shape to a triangle.
# Hold Ctrl and use the mouse to select the network. The selected nodes turn yellow. The selected genes are returned to the Gene Panel under the name Selected Genes[Cytoscape].

== Enrichment Analysis==

In this tutorial, you will:

* Gene Ontology

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

'''Gene Ontology'''

The Gene Ontology component categorizes genes by the function their products perform, their cellular localization, or their involvement in high-level biological processes. The category definitions are provided by the Gene Ontology Consortium (http://www.geneontology.org), and geWorkbench uses them to generate a tree structure that approximates the Directed Acyclic Graph structure used by the Consortium.

1. Create a Marker panel with the following markers:
* AFFX-hum_alu_at
* AFFX-LysX-M_at
* AFFX-LysX-5_at
* AFFX-LysX-3_at
For assistance in creating a marker panel, please refer to Working with Markers and Phenotype Tutorial (INSERT LINK).

2. Click '''Map List'''. As the application creates a tree for the selected ontology, a pop-up window will display the progress. The mapping is based on the GO annotation provided by Affymetrix in their annotation files.[[Image:E_map.png]]

3. The resulting nodes in the mapped tree will display two numbers separated by a forward slash, such as '''cell(1/6058)'''.
*The first number is the number of genes from the active marker panel that are mapped to a specific category and all its child nodes.
*The second number, after the slash, is the corresponding count for the reference list (or for the entire chip, depending on the reference list check-box selection and on whether a reference list has been loaded from the file system).

[[Image:E_goon.png]]

The Table View and P Value Trends provide alternate views of the mappings obtained in the Tree View.

{|style="border: 1px solid lightGray"
!TABLE VIEW||P VALUE TREND||
|-
|-|-
|-
| [[Image:E_gotable.png]]||[[Image:E_pvalueT.png]]
|-
| insert ||'''Slider:''' The slider on the right is a log-scaled slider that specifies a minimum P-Value for every profile to have for it to be shown in the display

'''Step-size:''' The ‘Step Size’ widget allows for setting the number of genes in increments to be used for the P-Value plot (the data points on the x-axis to be used). This allows for smoothing of the curve especially for very large gene list sizes.

'''Plot:''' The ‘Plot’ will redraw the plot based on selections in the Tree View and widget settings in the ‘P-Value setting’ tab.

'''Save Profiles:''' The ‘Save Profile’ allows for the computed profiles to be saved to the file-system as tab-delimited values
|--

|}

== Sequence Analysis== ken
* Sequence Retrieval
* Sequence Homology Analysis
** Blast
** Other

==Pattern Discovery== ken
* Position Histogram

== Promoter Analysis==

==Integrated Annotation Information==
In this tutorial, you will:

* rrrrr
* rrrrr

Before you can continue, geworkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) used in the this tutorial.

Perform the following steps to view annotations of a dataset ('''''not recommended for large sets of genes'''''):

1. In the Marker Annotations component in the View pane, click '''Use Panels'''.

A table of gene names and links to pathway diagrams is returned if available from the NCI CGAP database.

2. Clicking on gene names returns annotations from the NCI CGAP database in your default web browser window.

3. Clicking on the pathway diagram name will display a pathway diagram in the caBIO Pathways panel in the View pane.
In the caBIO Pathways viewer, the gene symbols in the pathway can in turn be clicked to display their gene information in the default web browser window.

User:Daly

2006-02-24T19:11:57Z

Daly: /* Enrichment Analysis */

= Overview=

geWorkbench is an open-source bioinformatics platform that offers a comprehensive and extensible collection of tools for the management, analysis, visualization and annotation of biomedical data. It aims to provide researchers with a centralized repository for the analysis and visualization of gene expression data, gene sequences and protein sequences.

'''What can you do with geWorkbench?'''

* Use one program that integrates with multiple existing bioinformatics modules for analysis and visualization.

* Access remote servers and clusters for the performance of computationally intensive calculations - quicker!

* Streamline data analysis with:
** flexible import options that support merging files from various sources ''(insert possible data source ie RMA express).''
** support for a variety of genomic data including microarrays, sequences, pathways, networks, alignments and phenotypes.
** access analyses with biological annotations from the National Cancer Institute.

* Community: Insert developer benefit ( plugin)

The diagram illustrates the use of geWorkbench by Researchers.

[[Image:slide1.gif]]

For detailed system documentation, please see the Documentation section (INSERT LINK).

= Tutorial=

===Welcome! ===

Welcome to the geWorkbench tutorial. In the following tutorials you will learn how to use geWorkbench. This is an example-based, illustrated guide for both beginners and experienced users. This tutorial was last updated in '''DATE''' and reflects changes to geWorkbench through '''DATE'''.

===Using the Tutorial ===

These tutorials assume that you have successfully completed the installation instructions. If you haven't installed the entire program, please go to [[Download]] and follow the installation instructions.

The tutorials are organized into a number of separate topics. Sample data sets are used throughout the tutorials. Each tutorial is self-contained and does not depend on any other tutorial for sample data. Sample data is located '''''(INSERT LINK or Instructions on how to download)'''''

'''(we should have sample data labeled Tutorial #)'''

Within the tutorials there are two basic types of text:
*Text that looks like this explains topics. 

# Text in numbered steps is instructions for you to follow using tutorial data files.

''' insert instruction on how to navigate between tutorials'''.

==Getting Started== MV
* Starting the application
* GUI elements
** Panels
** Navigation
== Loading Data== MV
* Data formats

== Working with Marker and Phenotype Panels==
In this tutorial, you will:
* Become familiar with the use of panels in geWorkbench
* Create active phenotype panels
* Classify the panels you created

Before you can continue, geWorkbench should be running. For help with this, please refer to the '''''Getting Started''''' (insertlink) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Creating Panels===

When working with microarrays, geWorkbench uses the term '''''marker''''' to refer to a gene probe (in other cases, it can be individual items from other data sets, such as sequences).
'''''Phenotype''''' refers to any user-defined grouping of microarrays. These microarrays will often share some common property that in most cases is phenotypic, although this is not a requirement. For example, one such “phenotype” might be a single experiment on a tumor tissue sample, with a second “phenotype” defined as a collection of experiments performed on normal tissue samples.

====Assign Panel====

1. In the Phenotype panel, select the following arrays, which contain samples from the congestive cardiomyopathy disease state:
JB-ccmp_0120.txt, JB-ccmp_0218.txt, JB-ccmp0718.txt, JB-ccmp0811.txt, JB-ccmp1003.txt, JB-ccmp_1109.tx

2. Right-click, select '''Add to Panel'''.

3. Enter "Cardio" in the input box and click OK.

[[Image:T_PanelLabelCardio.png]]

4. Next, similarly label the following arrays as "Normal" (''repeat steps 2 & 3'').
JB-n_0106.txt , JB-n_0821.txt, JB-n_0915.txt, JB-n_1303.txt

5. Select the check boxes next to the panel names to indicate that these groups of arrays are "Active". Various analysis and visualization components can be set to use or display only activated arrays or markers.

[[Image:T_PhenotypesPriorToCase.png]]

====Classify panel====

For statistical tests such as the t-test, Case and Control groups can be specified.

# Left-click on the thumb-tack icon in front of the phenotype name.
# Select Case to specify the disease arrays as the "Case". The remaining "Normal" arrays are by default labeled control.
[[Image:T_PhenotypeSettingCase.png]]

A red thumbtack indicates the arrays have been specified as "Case".

[[Image:T_PhenotypeCaseSet.png]]

== Visualize Gene Expression==
In this tutorial, you will:

* Get acquainted with the various geWorkbench visualization tools
* View a dataset in geWorkbench
* Modify the visualization preference settings

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file '''''cardiomyopathy.exp''''' used in this tutorial is from the Load Data (insertlink) tutorial.

===Visualization Tools===

Visualization tools provide a view of the chip(s) under investigation and can be used for ascertaining the quality of the data. Active gene and phenotype panels (insertlink) restrict the number of markers/arrays displayed. The images created can be saved and exported. A detailed description of how to manipulate the visualization components is available in the online help.

{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|'''Microarray View''': Used to inspect each separate microarray using the Array scroll bar.||[[Image: Ema.png]]
|-
|-
|'''Tabular Microarray Panel''': Presents the numerical values of the expression measurements in a table format. One row is created per individual marker/probe and one column per microarray.||[[Image:Etab.png]]
|-
|-
|-
|'''Color Mosaic''': Heat maps for microarray expression data, organized by phenotypic or gene groupings.||[[Image:Ed_cm.png]]
|-
|-
|-
|'''Expression Profiles''': A line graph of gene expression profiles across several arrays/hybridizations. Each marker is drawn as a separately colored line.||[[Image:Eep.png]]
|-
|-
|-
|'''Scatter Plot''': A pairwise (''array vs. array and marker vs. marker'') comparison and plotting of expression values.||[[Image:Esp.png]]
|-
|-
|-
|-
|}

'''View a dataset'''
# Select '''''cardiomyopathy.exp''''' in the Project Panel.
# Select the '''''Microarray Panel''''' visualization component in the View Area at the top-right section of the interface.
# Deselect the All Markers checkbox to display the entire dataset.

Note: The '''''All Arrays''''' and '''''All Markers''''' checkboxes determine which data points are included in the display.
If neither is checked, then the entire data set is shown. The All Arrays control is useful when working with data sets comprising multiple arrays. In this case, only those arrays that are included in a currently activated phenotype panel will be displayed.

[[Image: Allm.png]]

=== Preferences===
The Preferences selection in the Tools menu allows users to specify how certain aspects of the system will behave. Once set, your preferences are applied immediately and persist between application sessions.

'''Modifying Settings'''

1. From the main menu, click on '''Tools>Preferences'''.

2. In the Preferences pop-up window, you can define settings for:
*Text Editor: The editor selected will be used to open and inspect data sets loaded in a project. Notepad is the default setting.
* Visualization: The color scheme to be applied to color mosaic images.
**Absolute: (default) Let M = max{|min|, |max|} over all expression measurements, across all arrays. If expression value x > 0, assign it the red spectrum x / M * 256. If expression value x is negative, assign it to the green spectrum -x / M * 256.
** Relative: This is similar to the setting for Absolute, but each marker is mean-variance normalized first.
* Genepix Value Computation: You can specify how to compute the value displayed for Genepix arrays. The default setting is Option (Mean F635 - Mean B635) / (Mean F532 - Mean B532).

Select '''Relative''' for the visualization preference.

3. Click on '''OK'''.

[[Image: Preferences.gif]]
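
The Absolute color scheme described in step 2 can be sketched in a few lines of Python. This only illustrates the stated formula and is not geWorkbench's own code: the function name <code>absolute_color</code>, the clamping of the channel level to 255, and the handling of all-zero data are assumptions.

```python
def absolute_color(x, values):
    """Return an (R, G, B) tuple for expression value x under the Absolute scheme.

    values: all expression measurements, across all arrays.
    """
    # M = max{|min|, |max|} over all measurements
    m = max(abs(min(values)), abs(max(values)))
    if m == 0:
        return (0, 0, 0)  # assumption: all-zero data is drawn black
    # |x| / M * 256, clamped to the 8-bit maximum (assumption)
    level = min(255, int(abs(x) / m * 256))
    if x > 0:
        return (level, 0, 0)  # positive values use the red spectrum
    if x < 0:
        return (0, level, 0)  # negative values use the green spectrum
    return (0, 0, 0)
```

With measurements ranging from -50 to 100, M is 100, so a value of -50 maps to a half-intensity green.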

== Filter and Normalize Data==
In this tutorial, you will:
* Get acquainted with the various filters and normalizers available in geWorkbench
* Apply a filter and normalizer on a tutorial dataset

Before you can continue, geWorkbench should be running. For help with installation, please refer to the '''''Getting Started''''' (INSERT LINK) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Filter===

Filtering can be used to screen out missing data points, remove low quality data or reduce the size of the dataset by removing less interesting data. Available geWorkbench filters are as follows:

{|style="border: 1px solid lightGray"
!Filter||Description||
|-
|-|-
|-

|'''Affy Detection Call'''||Applicable to Affymetrix data only. Sets all measurements whose detection status is any user-defined combination of A, P or M as missing.
|-

|'''Missing values''' ||Discards all markers that have “missing” measurements in at least n microarrays, where n is defined by the user. Another filter must first be applied however, in order to generate the missing values upon which this filter can operate.
|-
|-
|-

|'''Deviation''' ||Sets all markers whose measurements deviate below a given value across all microarrays as missing.
|-
|-
|-

|'''Expression Threshold''' ||Sets all markers whose measurements are inside (or outside) a user-defined range as missing.
|-
|-
|-

|'''2 Channel''' ||Applicable to 2-channel arrays (Genepix) data only. Defines applicable ranges for each channel, and sets all values for which either channel intensity is inside (or outside) the defined range as missing.
|-
|-
|-
|}

[[Image:Filterpanel.gif]]

Perform the following steps to filter out data called absent in an Affymetrix file:
# In the Filtering Panel, select '''Affy Detection Call Filter'''.
# Select the ‘A’ (Absent) checkbox and click '''Filter'''. Values that were removed (marked as missing) are highlighted in yellow.
# In the Filtering Panel, select '''Missing Values Filter'''.
# Choose the maximum number of arrays that can have missing values before a marker is removed (the default is 0).
# Click '''Filter'''. Markers with more than 0 missing values are removed. You’ll notice that the yellow values are gone.
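
The two filtering steps above can be sketched as follows. This is a minimal illustration, assuming each marker holds a list of (value, detection call) pairs and that a missing value is represented by <code>None</code>. The function names and the sample data are hypothetical and are not part of geWorkbench.

```python
def affy_detection_filter(data, calls_to_remove=("A",)):
    """Mark as missing (None) every measurement whose detection call is selected."""
    return {
        marker: [None if call in calls_to_remove else value
                 for value, call in measurements]
        for marker, measurements in data.items()
    }


def missing_values_filter(data, max_missing=0):
    """Drop every marker that has more than max_missing missing measurements."""
    return {
        marker: values
        for marker, values in data.items()
        if sum(v is None for v in values) <= max_missing
    }


# Hypothetical data: marker -> [(value, detection call), ...], one pair per array
raw = {
    "marker_1": [(120.5, "P"), (98.1, "P"), (110.0, "M")],
    "marker_2": [(15.2, "A"), (80.3, "P"), (75.9, "P")],
}
flagged = affy_detection_filter(raw)      # marker_2 now has one missing value
kept = missing_values_filter(flagged, 0)  # marker_2 is removed
```

The sketch follows the steps as written, where a marker is removed once it exceeds the chosen maximum number of missing values.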

{|style="border: 1px solid lightGray"
!'''Affy Detection Call Filter'''||'''Missing Values Filter'''||
|-
|-|-
|-
| [[Image: Filtered.gif]]||[[Image:Mvfilter.gif]]
|-
|}

===Normalize===
Normalization can be used to decrease the effects of systematic differences across a set of experiments.
In geWorkbench, normalization replaces the original values with new values. Available geWorkbench normalization methods are as follows:

{|style="border: 1px solid lightGray"
!Normalizer||Description||
|-
|-

|Missing value calculation ||Replaces every missing value with either the mean value of that marker across all microarrays or with the mean measurement of all markers in the microarray where the missing value is observed
|-
|-
|Log2 Transformation ||Applies a log2 transformation to all measurements in a microarray
|-
|-
|-
|Threshold Normalizer ||All data points whose value is less than (or greater than) a user-specified minimum (maximum) value are raised (reduced) to that minimum (maximum) value
|-
|-
|-
|-
|Marker-based Centering ||Subtracts the mean (median) measurement of a marker profile from every measurement in the profile
|-
|-
|-
|Array-based centering ||Subtracts the mean (median) measurement of a microarray from every measurement in that microarray
|-
|-
|-
|Mean-variance normalizer ||For every marker profile, the mean measurement of the entire profile is subtracted from each measurement in the profile and the resulting value is divided by the standard deviation
|-
|-
|}
[[Image:Normalpanel.gif]]
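
As an illustration of the last row of the table, the mean-variance normalizer can be sketched as below. The sketch assumes that a marker profile is a plain list of numbers. The tutorial does not say whether the population or the sample standard deviation is used, so the population form here is an assumption, as is the treatment of a flat profile.

```python
import statistics


def mean_variance_normalize(profile):
    """Subtract the profile mean from each value, then divide by the standard deviation."""
    mean = statistics.fmean(profile)
    sd = statistics.pstdev(profile)  # assumption: population standard deviation
    if sd == 0:
        return [0.0 for _ in profile]  # assumption: a flat profile maps to zeros
    return [(x - mean) / sd for x in profile]


normalized = mean_variance_normalize([2.0, 4.0, 6.0])
```

After normalization the profile has mean 0 and standard deviation 1, which makes markers with very different expression levels comparable.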

'''Apply Quantile Normalizer'''

1. In the Normalization Panel, select ''Quantile Normalizer''.

2. Leave the default averaging method of ''Mean Profile Marker'' to indicate handling of missing values.

3. Click '''Normalize'''. The View Area is updated to reflect normalization (after the screen has been refreshed).
Note: The first value in the second row was updated from ''41,394.6'' to ''55,779.26''.

{|style="border: 1px solid lightGray"
!PRENORMALIZATION||NORMALIZED||
|-
|-|-
|-
| [[Image:Prenormalizer_ed.gif ]] || [[Image: Postnormalizera.gif ]]
|-
|-
|-
|}
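
The tutorial does not describe how the Quantile Normalizer works internally. The sketch below shows the standard quantile normalization technique, assuming no missing values and no special treatment of ties. It is not necessarily how geWorkbench implements the normalizer, and the function name is hypothetical.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length arrays (one list of values per microarray)."""
    n = len(arrays[0])
    # Sort each array, then average the values found at each rank
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(s[i] for s in sorted_arrays) / len(arrays) for i in range(n)]
    result = []
    for a in arrays:
        # order[r] is the index of the r-th smallest value in this array
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        result.append(out)
    return result


normalized = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
```

Every array ends up with the same distribution of values, while each marker keeps its rank within its own array.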

== Clustering Gene Expression Data== ken
*Hierarchical Clustering
* Self Organizing Map (SOM)

== Differential Expression ==
In this tutorial, you will:

* Get acquainted with the T Test and Multi T Test
* Apply a T Test and Multi T Test

Before you can continue, geWorkbench should be running. For help with installation, please refer to the Getting Started (INSERT LINK) section. Load the tutorial file cardiomyopathy.exp. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===T Test===

T Test analysis identifies markers with statistically significant differential expression between sets of microarrays. For each marker, the t-test determines whether there is a significant difference between the two groups (case and control). To perform this analysis, you must classify the panels, set the analysis parameters and view the results in the visualization components. A detailed description of the T Test parameters is available in the online help.

'''Classify the Panels'''

1. Mark the '''Cardio''' phenotype as 'Case'. By default, panels are marked as control. Panels classified as Case are shown with a red thumbtack icon.
* Right-click on '''Cardio''' phenotype.
* Select '''Classification'''>'''Case'''.
2. Activate the arrays '''Normal''' and '''Cardio''' by selecting the checkboxes next to the panel name.

[[Image:T_PhenotypeSettingCase.png]]

'''Set Analysis Parameters'''
# From the Analysis Panel, select '''T-Test Analysis'''.
# Set the parameter values listed below and click on '''Analyze'''.

* Alpha-corrections tab: Just Alpha.
* P-Value Parameters tab: p-values based on t-distribution. Note that the default alpha (critical p-value) is set to 0.01.
* Degree of Freedom tab: Welch approximation - unequal group variances.

[[Image:Ttest.gif]] 
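
For readers who want to see what the Welch option computes, the sketch below calculates the t statistic and the Welch-Satterthwaite degrees of freedom for a single marker. It uses the textbook formulas and is not taken from geWorkbench. The function name and the sample values are hypothetical, and the p-value lookup against the t-distribution is left out.

```python
import math
import statistics


def welch_t(case, control):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom) for two groups."""
    n1, n2 = len(case), len(control)
    # Variance of each group mean, based on the sample variance
    v1 = statistics.variance(case) / n1
    v2 = statistics.variance(control) / n2
    t = (statistics.fmean(case) - statistics.fmean(control)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical expression values for one marker in the Case and Control arrays
t, df = welch_t([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0])
```

A marker is reported as significant when the p-value derived from t and df falls below the chosen alpha (0.01 by default).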

'''T-Test Results'''

{|style="border: 1px solid lightGray"
!|| ||
|-
|-
| Markers which met the significance test are included in a new gene panel called “Significant Genes”. || [[Image:E_ttestgpanel.png]]
|-
|-
| An ancillary dataset is created in the project window. || [[Image:Ed_ttestproj.png]]
|-
|}

The values of the T-Test can be seen in the Color Mosaic panel and the Volcano Plot.

{|style="border: 1px solid lightGray"
!VOLCANO PLOT||COLOR MOSAIC||
|-
|-|-
|-
|-
| [[Image:Vplot.png]] || [[Image:Ed_cm.png]]
|-
|-
|-
| Clicking on any of the spots highlights the marker selected in the Marker Panel. * Insert another description ||
* The label to the right displays the Significance value (the lower the value, the more likely the groups differ) and the gene name for the displayed genes. The genes are displayed in ascending order by Significance Value.

* Gene height and width values can be altered to modify the display.

* The intensity slider is used to modify the intensity of the color codings.

* Accession: Includes the accession number in the label.

* Printer Icon: Prints the displayed image.

* Display: Must be toggled on to display data.

* ''Pat, Abs, Ratio Overlapping Pages Icon: Not the T Test display.''

|-
|-
|}

=== Multi T Test=== '''(IN PROGRESS)'''

== Regulatory Network ==
In this tutorial, you will:

* Create a gene network in Reverse Engineering
* View the network in Cytoscape
* Create a Marker panel from the Cytoscape network

Before you can continue, geWorkbench should be running. The file [[Webmatrix.EXP]] is used in this tutorial. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Reverse Engineering===

The Reverse Engineering component is used to analyze a large amount of microarray data to reverse engineer the underlying gene regulatory network. The details of this algorithm are described in "ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context", Califano et al. (http://arxiv.org/abs/q-bio.MN/0410037).

'''Create a Network'''
{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|1. In the Gene Panel, enter '''1973_s_at''' in the '''Find Next''' text box. The list box will navigate to '''1973_s_at'''; click on that gene.||[[Image: E_panel.PNG]]

|-
|2. Go to the Reverse Engineering component in the View pane. You should be in the first tab “Profiler” and in “Basic”. The marker '''1973_s_at''' should be listed in the Hub Gene box.||[[Image:E_RE1.PNG]]
|-
|3. Click on '''Analyze 2D'''. A list of markers, sorted by interaction strength, is returned below the '''Search Box'''. ||
|-
|-
|-
|4. Select all the genes with score >= 5 by clicking on the first marker '''[12.53] 37724_at''', using Shift-down arrow to highlight the values >= 5.||[[Image: E_re2.png]]
|-
|-
|-
|5. Click '''Create Network'''. The network created is displayed in the Cytoscape view.||
|-
|-
|}

'''View the Network in Cytoscape'''

A full description of the capabilities and functionality of Cytoscape can be found at http://www.cytoscape.org/.

# In the Cytoscape Layout menu, select '''yFiles/ Organic''' to modify the network display.
# In this network image, select the central marker. It should turn yellow. [[Image:E_network.png]]
# In the Select menu, select Nodes>First neighbors of selected nodes>Shape>Triangle. The first neighbors are changed from a square shape to a triangle.
# Ctrl+mouse select the network. The selected nodes turn yellow. The selected genes are returned to the Gene Panel under the name Selected Genes[Cytoscape].

== Enrichment Analysis==

In this tutorial, you will:

* Gene Ontology

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

'''Gene Ontology'''

The Gene Ontology component categorizes genes in terms of the function their products perform, their cellular localization or their involvement in high-level biological processes. The actual category definitions are provided by the Gene Ontology Consortium (http://www.geneontology.org), which we use to generate a tree structure that approximates the Directed Acyclic Graph structure used by the Consortium.

1. Create a Marker panel with the following markers:
* AFFX-hum_alu_at
* AFFX-LysX-M_at
* AFFX-LysX-5_at
* AFFX-LysX-3_at
For assistance in creating a marker panel, please refer to Working with Markers and Phenotype Tutorial (INSERT LINK).

2. Click '''Map List.''' As the application creates a tree for the selected ontology, a pop-up window will display the progress. The mapping is based on the GO mapping annotation provided by Affymetrix in their annotation files.[[Image:E_map.png]]

3. The resulting nodes in the mapped tree will display two numbers separated by a forward slash, such as '''cell(1/6058)'''.
*The first number is the number of genes from the active marker panel that are mapped to a specific category and all its child nodes.
*The second number, after the slash, is the same count for the reference list (or for the entire chip, depending on the reference list check-box selection and on whether a reference list is loaded from the file system).

[[Image:E_goon.png]]
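
The first number in each node label counts genes mapped to a category or to any of its child nodes. A minimal sketch of that roll-up is shown below. The category names, gene names and data structures are hypothetical, and counting each gene only once per category is an assumption rather than documented geWorkbench behavior.

```python
def genes_under(category, children, direct_genes):
    """Return the set of genes mapped to a category or to any of its descendants."""
    genes = set(direct_genes.get(category, ()))
    for child in children.get(category, ()):
        genes |= genes_under(child, children, direct_genes)
    return genes


# Hypothetical tree (category -> child categories) and direct gene mappings
children = {"cell": ["membrane", "intracellular"], "intracellular": ["nucleus"]}
direct_genes = {"membrane": {"geneA"}, "nucleus": {"geneA", "geneB"}}

count = len(genes_under("cell", children, direct_genes))
```

Using sets means a gene that appears under several child categories is counted once for the parent.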

The Table View and P Value Trend provide alternate views of the mappings obtained in the Tree View.

{|style="border: 1px solid lightGray"
!TABLE VIEW||P VALUE TREND||
|-
|-|-
|-
| [[Image:E_gotable.png]]||[[Image:E_pvalueT.png]]
|-
| insert ||Slider: The slider on the right is a log-scaled slider that specifies the minimum P-Value a profile must have in order to be shown in the display.

Step-size: The ‘Step Size’ widget allows for setting the number of genes in increments to be used for the P-Value plot (the data points on the x-axis to be used). This allows for smoothing of the curve especially for very large gene list sizes.

Plot: The ‘Plot’ will redraw the plot based on selections in the Tree View and widget settings in the ‘P-Value setting’ tab.

Save Profiles: The ‘Save Profile’ allows for the computed profiles to be saved to the file-system as tab-delimited values
|--

|}

== Sequence Analysis== ken
* Sequence Retrieval
* Sequence Homology Analysis
** Blast
** Other

==Pattern Discovery== ken
* Position Histogram

== Promoter Analysis==

==Integrated Annotation Information==
In this tutorial, you will:

* rrrrr
* rrrrr

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

Perform the following steps to view annotations of a dataset ('''''Not recommended for large sets of genes'''''):

1. In the Marker Annotations in the View pane, click '''Use Panels.'''

A table of gene names and links to pathway diagrams is returned if available from the NCI CGAP database.

2. Clicking on gene names returns annotations from the NCI CGAP database in your default web browser window.

3. Clicking on the pathway diagram name will display a pathway diagram in the caBIO Pathways panel in the View pane.
In the caBIO pathways viewer, pathway gene symbols can in turn be clicked to display their gene information in the default web browser window.

User:Daly

2006-02-24T19:11:01Z

Daly: /* Enrichment Analysis */

= Overview=

geWorkbench is an open-source bioinformatics platform that offers a comprehensive and extendible collection of tools for the management, analysis, visualization and annotation of biomedical data. This tool is aimed at providing researchers a centralized repository for the data analysis and visualization of gene expression data, gene sequences and protein sequences.

'''What can you do with geWorkbench?'''

* Use one program that integrates with multiple existing bioinformatics modules for analysis and visualization.

* Access remote servers and clusters for the performance of computationally intensive calculations - quicker!

* Streamline data analysis with:
** flexible import options that support merging files from various sources ''(insert possible data source ie RMA express).''
** support for a variety of genomic data including microarrays, sequences, pathways, networks, alignments and phenotypes.
** access analyses with biological annotations from the National Cancer Institute.

* Community: Insert developer benefit ( plugin)

The diagram illustrates the use of geWorkbench by Researchers.

[[Image:slide1.gif]]

For detailed system documentation, please see the Documentation section (INSERT LINK).

= Tutorial=

===Welcome! ===

Welcome to the geWorkbench tutorial. In the following tutorials you will learn how to use geWorkbench. This is an example-based, illustrated guide for both beginners and experienced users. This tutorial was last updated in '''DATE''' and reflects changes to geWorkbench through '''DATE'''.

===Using the Tutorial ===

These tutorials assume that you have successfully completed the installation instructions. If you haven't installed the entire program, please go to [[Download]] and follow the installation instructions.

The tutorials are organized into a number of separate topics. Sample data sets are used throughout the tutorials. Each tutorial is self-contained and does not depend on any other tutorial for sample data. Sample data is located '''''(INSERT LINK or Instructions on how to download)'''''

'''(we should have sample data labeled Tutorial #)'''

Within the tutorials there are two basic types of text:
*Text that looks like this explains topics. 

# Text in numbered steps is instructions for you to follow using tutorial data files.

''' insert instruction on how to navigate between tutorials'''.

==Getting Started== MV
* Starting the application
* GUI elements
** Panels
** Navigation
== Loading Data== MV
* Data formats

== Working with Marker and Phenotype Panels==
In this tutorial, you will:
* Become familiar with the use of panels in geWorkbench
* Create active phenotype panels
* Classify the panels you created

Before you can continue, geWorkbench should be running. For help with this, please refer to the '''''Getting Started''''' (insertlink) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Creating Panels===

When working with microarrays, geWorkbench uses the term '''''marker''''' to refer to a gene probe (in other cases, it can be individual items from other data sets, such as sequences).
'''''Phenotype''''' refers to any user-defined grouping of microarrays. These microarrays will often share some common property that in most cases is phenotypic, although this is not a requirement. For example, one such “phenotype” might be a single experiment on a tumor tissue sample, with a second “phenotype” defined as a collection of experiments performed on normal tissue samples.

====Assign Panel====

1. In the Phenotype panel, select the following arrays, which contain samples from the congestive cardiomyopathy disease state:
JB-ccmp_0120.txt, JB-ccmp_0218.txt, JB-ccmp0718.txt, JB-ccmp0811.txt, JB-ccmp1003.txt, JB-ccmp_1109.tx

2. Right-click, select '''Add to Panel'''.

3. Enter "Cardio" in the input box and click OK.

[[Image:T_PanelLabelCardio.png]]

4. Next, similarly label the following arrays as "Normal" (''repeat steps 2 & 3'').
JB-n_0106.txt , JB-n_0821.txt, JB-n_0915.txt, JB-n_1303.txt

5. Select the check boxes next to the panel names to indicate that these groups of arrays are "Active". Various analysis and visualization components can be set to use or display only activated arrays or markers.

[[Image:T_PhenotypesPriorToCase.png]]

====Classify panel====

For statistical tests such as the t-test, Case and Control groups can be specified.

# Left-click on the thumb-tack icon in front of the phenotype name.
# Select Case to specify the disease arrays as the "Case". The remaining "Normal" arrays are by default labeled control.
[[Image:T_PhenotypeSettingCase.png]]

A red thumbtack indicates the arrays have been specified as "Case".

[[Image:T_PhenotypeCaseSet.png]]

== Visualize Gene Expression==
In this tutorial, you will:

* Get acquainted with the various geWorkbench visualization tools
* View a dataset in geWorkbench
* Modify the visualization preference settings

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file '''''cardiomyopathy.exp''''' used in this tutorial is from the Load Data (insertlink) tutorial.

===Visualization Tools===

Visualization tools provide a view of the chip(s) under investigation and can be used for ascertaining the quality of the data. Active gene and phenotype panels (insertlink) restrict the number of markers/arrays displayed. The images created can be saved and exported. A detailed description of how to manipulate the visualization components is available in the online help.

{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|'''Microarray View''': Used to inspect each separate microarray using the Array scroll bar.||[[Image: Ema.png]]
|-
|-
|'''Tabular Microarray Panel''': Presents the numerical values of the expression measurements in a table format. One row is created per individual marker/probe and one column per microarray.||[[Image:Etab.png]]
|-
|-
|-
|'''Color Mosaic''': Heat maps for microarray expression data, organized by phenotypic or gene groupings.||[[Image:Ed_cm.png]]
|-
|-
|-
|'''Expression Profiles''': A line graph of gene expression profiles across several arrays/hybridizations. Each marker is drawn as a separately colored line.||[[Image:Eep.png]]
|-
|-
|-
|'''Scatter Plot''': A pairwise (''array vs. array and marker vs. marker'') comparison and plotting of expression values.||[[Image:Esp.png]]
|-
|-
|-
|-
|}

'''View a dataset'''
# Select '''''cardiomyopathy.exp''''' in the Project Panel.
# Select the '''''Microarray Panel''''' visualization component in the View Area at the top-right section of the interface.
# Deselect the All Markers checkbox to display the entire dataset.

Note: The '''''All Arrays''''' and '''''All Markers''''' checkboxes determine which data points are included in the display.
If neither is checked, then the entire data set is shown. The All Arrays control is useful when working with data sets comprising multiple arrays. In this case, only those arrays that are included in a currently activated phenotype panel will be displayed.

[[Image: Allm.png]]

=== Preferences===
The Preferences selection in the Tools menu allows users to specify how certain aspects of the system will behave. Once set, your preferences are applied immediately and persist between application sessions.

'''Modifying Settings'''

1. From the main menu, click on '''Tools>Preferences'''.

2. In the Preferences pop-up window, you can define settings for:
*Text Editor: The editor selected will be used to open and inspect data sets loaded in a project. Notepad is the default setting.
* Visualization: The color scheme to be applied to color mosaic images.
**Absolute: (default) Let M = max{|min|, |max|} over all expression measurements, across all arrays. If expression value x > 0, assign it the red spectrum x / M * 256. If expression value x is negative, assign it to the green spectrum -x / M * 256.
** Relative: This is similar to the setting for Absolute, but each marker is mean-variance normalized first.
* Genepix Value Computation: You can specify how to compute the value displayed for Genepix arrays. The default setting is Option (Mean F635 - Mean B635) / (Mean F532 - Mean B532).

Select '''Relative''' for the visualization preference.

3. Click on '''OK'''.

[[Image: Preferences.gif]]
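
The default Genepix value computation listed in step 2 is a ratio of two background-corrected channel intensities. A minimal sketch follows, assuming that F and B denote the foreground and background means of the 635 and 532 channels. The function name and the choice to return <code>None</code> for a zero denominator are assumptions.

```python
def genepix_ratio(mean_f635, mean_b635, mean_f532, mean_b532):
    """Compute (Mean F635 - Mean B635) / (Mean F532 - Mean B532)."""
    denominator = mean_f532 - mean_b532
    if denominator == 0:
        return None  # assumption: an undefined ratio is treated as missing
    return (mean_f635 - mean_b635) / denominator


ratio = genepix_ratio(1200.0, 200.0, 700.0, 200.0)
```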

== Filter and Normalize Data==
In this tutorial, you will:
* Get acquainted with the various filters and normalizers available in geWorkbench
* Apply a filter and normalizer on a tutorial dataset

Before you can continue, geWorkbench should be running. For help with installation, please refer to the '''''Getting Started''''' (INSERT LINK) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Filter===

Filtering can be used to screen out missing data points, remove low quality data or reduce the size of the dataset by removing less interesting data. Available geWorkbench filters are as follows:

{|style="border: 1px solid lightGray"
!Filter||Description||
|-
|-|-
|-

|'''Affy Detection Call'''||Applicable to Affymetrix data only. Sets all measurements whose detection status is any user-defined combination of A, P or M as missing.
|-

|'''Missing values''' ||Discards all markers that have “missing” measurements in at least n microarrays, where n is defined by the user. Another filter must first be applied however, in order to generate the missing values upon which this filter can operate.
|-
|-
|-

|'''Deviation''' ||Sets all markers whose measurements deviate below a given value across all microarrays as missing.
|-
|-
|-

|'''Expression Threshold''' ||Sets all markers whose measurements are inside (or outside) a user-defined range as missing.
|-
|-
|-

|'''2 Channel''' ||Applicable to 2-channel arrays (Genepix) data only. Defines applicable ranges for each channel, and sets all values for which either channel intensity is inside (or outside) the defined range as missing.
|-
|-
|-
|}

[[Image:Filterpanel.gif]]

Perform the following steps to filter out data called absent in an Affymetrix file:
# In the Filtering Panel, select '''Affy Detection Call Filter'''.
# Select the ‘A’ (Absent) checkbox and click '''Filter'''. Values that were removed (marked as missing) are highlighted in yellow.
# In the Filtering Panel, select '''Missing Values Filter'''.
# Choose the maximum number of arrays that can have missing values before a marker is removed (the default is 0).
# Click '''Filter'''. Markers with more than 0 missing values are removed. You’ll notice that the yellow values are gone.

{|style="border: 1px solid lightGray"
!'''Affy Detection Call Filter'''||'''Missing Values Filter'''||
|-
|-|-
|-
| [[Image: Filtered.gif]]||[[Image:Mvfilter.gif]]
|-
|}

===Normalize===
Normalization can be used to decrease the effects of systematic differences across a set of experiments.
In geWorkbench, normalization replaces the original values with new values. Available geWorkbench normalization methods are as follows:

{|style="border: 1px solid lightGray"
!Normalizer||Description||
|-
|-

|Missing value calculation ||Replaces every missing value with either the mean value of that marker across all microarrays or with the mean measurement of all markers in the microarray where the missing value is observed
|-
|-
|Log2 Transformation ||Applies a log2 transformation to all measurements in a microarray
|-
|-
|-
|Threshold Normalizer ||All data points whose value is less than (or greater than) a user-specified minimum (maximum) value are raised (reduced) to that minimum (maximum) value
|-
|-
|-
|-
|Marker-based Centering ||Subtracts the mean (median) measurement of a marker profile from every measurement in the profile
|-
|-
|-
|Array-based centering ||Subtracts the mean (median) measurement of a microarray from every measurement in that microarray
|-
|-
|-
|Mean-variance normalizer ||For every marker profile, the mean measurement of the entire profile is subtracted from each measurement in the profile and the resulting value is divided by the standard deviation
|-
|-
|}
[[Image:Normalpanel.gif]]
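
The Threshold Normalizer row of the table amounts to clipping values to a user-defined range. A minimal sketch is given below, assuming an array is a plain list of numbers and that either bound may be left unset. The function and parameter names are hypothetical.

```python
def threshold_normalize(values, minimum=None, maximum=None):
    """Raise values below minimum to minimum and reduce values above maximum to maximum."""
    result = []
    for v in values:
        if minimum is not None and v < minimum:
            v = minimum
        if maximum is not None and v > maximum:
            v = maximum
        result.append(v)
    return result


clipped = threshold_normalize([5.0, 50.0, 5000.0], minimum=20.0, maximum=1000.0)
```

Raising very low values to a floor in this way also avoids taking the logarithm of zero or negative numbers if a Log2 Transformation is applied afterwards.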

'''Apply Quantile Normalizer'''

1. In the Normalization Panel, select ''Quantile Normalizer''.

2. Leave the default averaging method of ''Mean Profile Marker'' to indicate handling of missing values.

3. Click '''Normalize'''. The View Area is updated to reflect normalization (after the screen has been refreshed).
Note: The first value in the second row was update from ''41,394.6'' to ''55,779.26''.

{|style="border: 1px solid lightGray"
!PRENORMALIZATION||NORMALIZED
|-
| [[Image:Prenormalizer_ed.gif]] || [[Image:Postnormalizera.gif]]
|}

== Clustering Gene Expression Data== ken
*Hierarchical Clustering
* Self Organizing Map (SOM)

== Differential Expression ==
In this tutorial, you will:

* Get acquainted with the T Test and Multi T Test
* Apply a T Test and Multi T Test

Before you continue, geWorkbench should be running. For help with installation, please refer to the Getting Started (INSERT LINK) section. Load the tutorial file cardiomyopathy.exp. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===T Test===

The T Test analysis identifies markers with statistically significant differential expression between sets of microarrays. For each marker, the t-test determines whether there is a significant difference between the two groups (case and control). To perform this analysis, you must classify the panels, set the analysis parameters, and view the results in the visualization components. A detailed description of the T Test parameters is available in the online help.

'''Classify the Panels'''

1. Mark the '''Cardio''' phenotype as "Case". By default, panels are marked as control. Panels classified as Case are shown with a red thumbtack icon.
* Right-click on the '''Cardio''' phenotype.
* Select '''Classification'''>'''Case'''.
2. Activate the '''Normal''' and '''Cardio''' panels by selecting the checkboxes next to the panel names.

[[Image:T_PhenotypeSettingCase.png]]

'''Set Analysis Parameters'''
# From the Analysis Panel, select '''T-Test Analysis'''.
# Set the following parameter values and click '''Analyze'''.

* Alpha-corrections tab: Just Alpha.
* P-Value Parameters tab: p-values based on t-distribution. Note that the default alpha (critical p-value) is set to 0.01.
* Degree of Freedom tab: Welch approximation - unequal group variances.

[[Image:Ttest.gif]] 
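
The "Welch approximation - unequal group variances" setting corresponds to a well-known form of the t-test. The sketch below computes the textbook Welch t statistic and the Welch-Satterthwaite degrees of freedom for one marker. It is offered as an illustration under that assumption, not as geWorkbench's actual implementation; the function name <code>welch_t</code> and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(case, control):
    """Return (t, df) for two groups with possibly unequal variances.

    Textbook Welch formulas; each group needs at least two values
    and a non-zero combined variance.
    """
    n1, n2 = len(case), len(control)
    v1, v2 = variance(case), variance(control)  # sample variances (n - 1)
    se2 = v1 / n1 + v2 / n2                     # squared standard error
    t = (mean(case) - mean(control)) / sqrt(se2)
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

# Hypothetical expression values of one marker in the two groups.
t, df = welch_t([5.0, 6.0, 7.0], [1.0, 2.0, 3.0])
```

The p-value would then be read from the t-distribution with <code>df</code> degrees of freedom and compared with the alpha of 0.01. That step is left out here because the Python standard library has no t-distribution function.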

'''T-Test Results'''

{|style="border: 1px solid lightGray"
|-
| Markers that met the significance test are included in a new gene panel called “Significant Genes”. || [[Image:E_ttestgpanel.png]]
|-
| An ancillary dataset is created in the project window. || [[Image:Ed_ttestproj.png]]
|}

The T-Test results can be viewed in the Color Mosaic panel and the Volcano Plot.

{|style="border: 1px solid lightGray"
!VOLCANO PLOT||COLOR MOSAIC
|-
| [[Image:Vplot.png]] || [[Image:Ed_cm.png]]
|-
| Clicking on any of the spots highlights the selected marker in the Marker Panel. * Insert another description ||
* The label to the right displays the significance value (the lower the value, the more likely the marker is differentially expressed) and the gene name for the displayed genes. The genes are displayed in ascending order by significance value.
* Gene height and width values can be altered to modify the display.
* The intensity slider is used to modify the intensity of the color coding.
* Accession: Includes the accession number in the label.
* Printer Icon: Prints the displayed image.
* Display: Must be toggled on to display data.
* ''Pat, Abs, Ratio Overlapping Pages Icon: Not the T Test display.''
|}

=== Multi T Test=== '''(IN PROGRESS)'''

== Regulatory Network ==
In this tutorial, you will:

* Create a gene network in Reverse Engineering
* View the network in Cytoscape
* Create a Marker panel from the Cytoscape network

Before you continue, geWorkbench should be running. The file [[Webmatrix.EXP]] is used in this tutorial. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Reverse Engineering===

The Reverse Engineering component is used to analyze a large amount of microarray data in order to reverse engineer the underlying gene regulatory network. The details of this algorithm are described in "ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context", Califano et al. (http://arxiv.org/abs/q-bio.MN/0410037).

'''Create a Network'''
{|style="border: 1px solid lightGray"
|-
|1. In the Gene Panel, enter '''1973_s_at''' in the '''Find Next''' text box. The list box will navigate to '''1973_s_at'''; click on that gene.||[[Image:E_panel.PNG]]
|-
|2. Go to the Reverse Engineering component in the View pane. You should be in the first tab, “Profiler”, and in “Basic”. The marker '''1973_s_at''' should be listed in the Hub Gene box.||[[Image:E_RE1.PNG]]
|-
|3. Click '''Analyze 2D'''. A list of markers, sorted by interaction strength, is returned below the '''Search Box'''.||
|-
|4. Select all the genes with a score >= 5 by clicking on the first marker, '''[12.53] 37724_at''', and then using Shift+Down Arrow to highlight the values >= 5.||[[Image:E_re2.png]]
|-
|5. Click '''Create Network'''. The resulting network is displayed in the Cytoscape view.||
|}

'''View the Network in Cytoscape'''

A full description of the capabilities and functionality of Cytoscape can be found at http://www.cytoscape.org/.

# In the Cytoscape Layout menu, select '''yFiles/ Organic''' to modify the network display.
# In this network image, select the central marker. It should turn yellow. [[Image:E_network.png]]
# In the Select menu, select Nodes>First neighbors of selected nodes>Shape>Triangle. The first neighbors are updated from a square shape to a triangle.
# Hold Ctrl and use the mouse to select nodes in the network. The selected nodes are yellow. The selected genes are returned to the Gene Panel under the name Selected Genes[Cytoscape].

== Enrichment Analysis==

In this tutorial, you will:

* Gene Ontology

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

'''Gene Ontology'''

The Gene Ontology component categorizes genes in terms of the functions their products perform, their cellular localization, or their involvement in high-level biological processes. The category definitions are provided by the Gene Ontology Consortium (http://www.geneontology.org), which geWorkbench uses to generate a tree structure that approximates the Directed Acyclic Graph structure used by the Consortium.

1. Create a Marker panel with the following markers:
* AFFX-hum_alu_at
* AFFX-LysX-M_at
* AFFX-LysX-5_at
* AFFX-LysX-3_at
For assistance in creating a marker panel, please refer to the Working with Marker and Phenotype Panels tutorial (INSERT LINK).

2. Click '''Map List'''. As the application creates a tree for the selected ontology, a pop-up window displays the progress. The mapping is based on the GO annotations provided by Affymetrix in their annotation files. [[Image:E_map.png]]

3. Each node in the mapped tree displays two numbers separated by a forward slash, such as '''cell(1/6058)'''.
*The first number is the number of genes from the active marker panel that are mapped to a specific category and all its child nodes.
*The second number is the corresponding count for the reference list (or for the entire chip, depending on the reference list checkbox selection and whether a reference list has been loaded from the file system).

[[Image:E_goon.png]]

The Table View and P Value Trend provide alternate views of the mappings obtained in the Tree View.

{|style="border: 1px solid lightGray"
!TABLE VIEW||P VALUE TREND
|-
| [[Image:E_gotable.png]]||[[Image:E_pvalueT.png]]
|-
| [[Image:E_gotable.png]]||Slider: The slider on the right is log-scaled and specifies the minimum P-Value that a profile must have in order to be shown in the display.

Step Size: The ‘Step Size’ widget sets the increment, in number of genes, used for the P-Value plot (that is, the data points used on the x-axis). This smooths the curve, especially for very large gene lists.

Plot: ‘Plot’ redraws the plot based on the selections in the Tree View and the widget settings in the ‘P-Value setting’ tab.

Save Profiles: ‘Save Profile’ saves the computed profiles to the file system as tab-delimited values.
|}

== Sequence Analysis== ken
* Sequence Retrieval
* Sequence Homology Analysis
** Blast
** Other

==Pattern Discovery== ken
* Position Histogram

== Promoter Analysis==

==Integrated Annotation Information==
In this tutorial, you will:

* rrrrr
* rrrrr

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

Perform the following steps to view annotations of a dataset ('''''not recommended for large sets of genes'''''):

1. In the Marker Annotations component in the View pane, click '''Use Panels'''.

A table of gene names and links to pathway diagrams is returned, if available from the NCI CGAP database.

2. Clicking on a gene name opens its annotations from the NCI CGAP database in your default web browser.

3. Clicking on a pathway diagram name displays the pathway diagram in the caBIO Pathways panel in the View pane.
In the caBIO Pathways viewer, pathway gene symbols can in turn be clicked to display their gene information in the default web browser.

User:Daly

2006-02-24T19:00:07Z

Daly: /* Enrichment Analysis */

= Overview=

geWorkbench is an open-source bioinformatics platform that offers a comprehensive and extendible collection of tools for the management, analysis, visualization and annotation of biomedical data. This tool is aimed at providing researchers a centralized repository for the data analysis and visualization of gene expression data, gene sequences and protein sequences.

'''What can you do with geWorkbench?'''

* Use one program that integrates with multiple existing bioinformatics modules for analysis and visualization.

* Access remote servers and clusters to run computationally intensive calculations more quickly.

* Streamline data analysis with:
** flexible import options that support merging files from various sources ''(insert possible data source ie RMA express).''
** support for a variety of genomic data including microarrays, sequences, pathways, networks, alignments and phenotypes.
** access analyses with biological annotations from the National Cancer Institute.

* Community: Insert developer benefit ( plugin)

The diagram illustrates the use of geWorkbench by researchers.

[[Image:slide1.gif]]

For detailed system documentation, please see the Documentation section (INSERT LINK).

= Tutorial=

===Welcome! ===

Welcome to the geWorkbench tutorial. In the following tutorials you will learn how to use geWorkbench. This is an example-based, illustrated guide for both beginners and experienced users. This tutorial was last updated in '''DATE''' and reflects changes to geWorkbench through '''DATE'''.

===Using the Tutorial ===

These tutorials assume that you have successfully completed the installation instructions. If you haven't installed the entire program, please go to [[Download]] and follow the installation instructions.

The tutorials are organized into a number of separate topics. Sample data sets are used throughout the tutorials. Each tutorial is self-contained and does not depend on any other tutorial for sample data. Sample data is located '''''(INSERT LINK or Instructions on how to download)'''''

'''(we should have sample data labeled Tutorial #)'''

Within the tutorials there are two basic types of text:
*Text that looks like this explains topics.

# Text in numbered steps gives instructions for you to follow using the tutorial data files.

''' insert instruction on how to navigate between tutorials'''.

==Getting Started== MV
* Starting the application
* GUI elements
** Panels
** Navigation
== Loading Data== MV
* Data formats

== Working with Marker and Phenotype Panels==
In this tutorial, you will:
* Become familiar with the use of panels in geWorkbench
* Create active phenotype panels
* Classify the panels you created

Before you continue, geWorkbench should be running. For help with this, please refer to the '''''Getting Started''''' (insertlink) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Creating Panels===

When working with microarrays, geWorkbench uses the term '''''marker''''' to refer to a gene probe (in other cases, a marker can be an individual item from another kind of data set, such as a sequence).
'''''Phenotype''''' refers to any user-defined grouping of microarrays. These microarrays will often share some common property that in most cases is phenotypic, although this is not a requirement. For example, one such “phenotype” might be a single experiment on a tumor tissue sample, with a second “phenotype” defined as a collection of experiments performed on normal tissue samples.

====Assign Panel====

1. In the Phenotype panel, select the following arrays, which contain samples from the congestive cardiomyopathy disease state:
JB-ccmp_0120.txt, JB-ccmp_0218.txt, JB-ccmp0718.txt, JB-ccmp0811.txt, JB-ccmp1003.txt, JB-ccmp_1109.tx

2. Right-click, select '''Add to Panel'''.

3. Enter "Cardio" in the input box and click OK.

[[Image:T_PanelLabelCardio.png]]

4. Next, similarly label the following arrays as "Normal" (''repeat steps 2 & 3''):
JB-n_0106.txt, JB-n_0821.txt, JB-n_0915.txt, JB-n_1303.txt

5. Select the checkboxes next to the panel names to mark these groups of arrays as "Active". Various analysis and visualization components can be set to use or display only activated arrays or markers.

[[Image:T_PhenotypesPriorToCase.png]]

====Classify panel====

For statistical tests such as the t-test, Case and Control groups can be specified.

# Left-click on the thumb-tack icon in front of the phenotype name.
# Select Case to specify the disease arrays as the "Case". The remaining "Normal" arrays are by default labeled control.
[[Image:T_PhenotypeSettingCase.png]]

A red thumbtack indicates the arrays have been specified as "Case".

[[Image:T_PhenotypeCaseSet.png]]

== Visualize Gene Expression==
In this tutorial, you will:

* Get acquainted with the various geWorkbench visualization tools
* View a dataset in geWorkbench
* Modify the visualization preference settings

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file '''''cardiomyopathy.exp''''' used in this tutorial is the one from the Load Data (insertlink) tutorial.

===Visualization Tools===

Visualization tools provide a view of the chip(s) under investigation and can be used to assess the quality of the data. Active gene and phenotype panels (insertlink) restrict the number of markers/arrays displayed. The images created can be saved and exported. A detailed description of how to manipulate the visualization components is available in the online help.

{|style="border: 1px solid lightGray"
|-
|'''Microarray View''': Used to inspect each microarray separately, using the Array scroll bar.||[[Image:Ema.png]]
|-
|'''Tabular Microarray Panel''': Presents the numerical values of the expression measurements in a table format. One row is created per individual marker/probe and one column per microarray.||[[Image:Etab.png]]
|-
|'''Color Mosaic''': Heat maps for microarray expression data, organized by phenotypic or gene groupings.||[[Image:Ed_cm.png]]
|-
|'''Expression Profiles''': A line graph of gene expression profiles across several arrays/hybridizations. Each marker is drawn as a separately colored line.||[[Image:Eep.png]]
|-
|'''Scatter Plot''': A pairwise (''array vs. array and marker vs. marker'') comparison and plot of expression values.||[[Image:Esp.png]]
|}

'''View a dataset'''
# Select '''''cardiomyopathy.exp''''' in the Project Panel.
# Select the '''''Microarray Panel''''' visualization component in the View Area at the top-right section of the interface.
# Deselect the All Markers checkbox to display the entire dataset.

Note: The '''''All Arrays''''' and '''''All Markers''''' checkboxes determine which data points are included in the display.
If neither is checked, the entire data set is shown. The All Arrays control is useful when working with data sets comprising multiple arrays. In this case, only those arrays that are included in a currently activated phenotype panel will be displayed.

[[Image: Allm.png]]

=== Preferences===
The Preferences selection in the Tools menu allows users to specify how certain aspects of the system behave. Once set, preferences take effect immediately and persist between application sessions.

'''Modifying Settings'''

1. From the main menu, click on '''Tools>Preferences'''.

2. In the Preferences pop-up window, you can define settings for:
*Text Editor: The editor selected will be used to open and inspect data sets loaded in a project. Notepad is the default setting.
* Visualization: The color scheme to be applied to color mosaic images.
**Absolute: (default) Let M = max{|min|, |max|} over all expression measurements, across all arrays. If an expression value x is positive, it is assigned the red intensity x / M * 256. If x is negative, it is assigned the green intensity -x / M * 256.
** Relative: Similar to Absolute, but each marker is mean-variance normalized first.
* Genepix Value Computation: You can specify how to compute the value displayed for Genepix arrays. The default setting is the option (Mean F635 - Mean B635) / (Mean F532 - Mean B532).

Select '''Relative''' for the visualization preference.

3. Click on '''OK'''.

[[Image: Preferences.gif]]
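
The Absolute color scheme described above can be sketched in a few lines of Python. This is an illustration of the stated formula only: the function name <code>absolute_color</code>, the (red, green, blue) tuple representation, and the clamping of the result to 255 are assumptions, not details taken from geWorkbench.

```python
def absolute_color(x, m):
    """Map expression value x to an (r, g, b) tuple, given m = max(|min|, |max|).

    Assumes m > 0. The level is clamped to 255 so that x == +/-m still
    fits in 8 bits (an assumption; the text only gives x / M * 256).
    """
    level = min(255, int(abs(x) / m * 256))
    if x > 0:
        return (level, 0, 0)   # positive values use the red spectrum
    if x < 0:
        return (0, level, 0)   # negative values use the green spectrum
    return (0, 0, 0)

# Hypothetical measurements pooled over all arrays.
values = [-200.0, 0.0, 50.0, 100.0]
m = max(abs(min(values)), abs(max(values)))
colors = [absolute_color(v, m) for v in values]
```

Under the Relative scheme, the same mapping would be applied after each marker has been mean-variance normalized.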

== Filter and Normalize Data==
In this tutorial, you will:
* Get acquainted with the various filters and normalizers available in geWorkbench
* Apply a filter and normalizer on a tutorial dataset

Before you continue, geWorkbench should be running. For help with installation, please refer to the '''''Getting Started''''' (INSERT LINK) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Filter===

Filtering can be used to screen out missing data points, remove low quality data or reduce the size of the dataset by removing less interesting data. Available geWorkbench filters are as follows:

{|style="border: 1px solid lightGray"
!Filter||Description
|-
|'''Affy Detection Call'''||Applicable to Affymetrix data only. Sets as missing all measurements whose detection status is any user-defined combination of A, P or M.
|-
|'''Missing values'''||Discards all markers that have “missing” measurements in at least n microarrays, where n is defined by the user. Another filter must be applied first, however, in order to generate the missing values on which this filter operates.
|-
|'''Deviation'''||Sets as missing all markers whose deviation across all microarrays is below a given value.
|-
|'''Expression Threshold'''||Sets as missing all markers whose measurements are inside (or outside) a user-defined range.
|-
|'''2 Channel'''||Applicable to 2-channel array (Genepix) data only. Defines applicable ranges for each channel, and sets as missing all values for which either channel intensity is inside (or outside) the defined range.
|}

[[Image:Filterpanel.gif]]

Perform the following steps to filter out data called absent in an Affymetrix file:
# In the Filtering Panel, select ''Affy Detection Call Filter''.
# Select the ‘A’ (Absent) checkbox and click '''Filter'''. Values that were removed (marked as missing) are highlighted in yellow.
# In the Filtering Panel, select '''Missing Values Filter'''.
# Choose the maximum number of arrays that can have missing values before a marker is removed (the default is 0).
# Click '''Filter'''. Markers with more than 0 missing values are removed, and the yellow-highlighted values are no longer displayed.

{|style="border: 1px solid lightGray"
!'''Affy Detection Call Filter'''||'''Missing Values Filter'''
|-
| [[Image:Filtered.gif]]||[[Image:Mvfilter.gif]]
|}
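
The two-stage filtering performed in the steps above can be sketched in Python as follows. This is a minimal illustration, not geWorkbench code: the function names, the marker identifiers and the toy values are hypothetical, and <code>None</code> stands for a missing value. The second function follows the wording of the steps (a marker is kept if it has at most <code>max_missing</code> missing values).

```python
def detection_call_filter(values, calls, masked=("A",)):
    """Set every measurement whose detection call is in `masked` to None."""
    return [[None if c in masked else v for v, c in zip(vrow, crow)]
            for vrow, crow in zip(values, calls)]

def missing_values_filter(values, marker_ids, max_missing=0):
    """Keep only markers with at most `max_missing` missing measurements."""
    kept = [(mid, row) for mid, row in zip(marker_ids, values)
            if sum(v is None for v in row) <= max_missing]
    return [mid for mid, _ in kept], [row for _, row in kept]

# Toy data: 3 markers x 2 arrays, with Affymetrix-style calls (P, A, M).
markers = ["m1", "m2", "m3"]
values = [[10.0, 12.0], [3.0, 40.0], [7.0, 9.0]]
calls = [["P", "P"], ["A", "P"], ["M", "P"]]

masked = detection_call_filter(values, calls)            # step 2: mark 'A' as missing
kept_ids, kept_values = missing_values_filter(masked, markers)  # step 5: drop markers
```

Here the marker with an Absent call loses one measurement in the first stage and is then discarded in the second, which is why the missing values filter has an effect only after another filter has produced missing values.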

===Normalize===
Normalization can be used to decrease the effects of systematic differences across a set of experiments.
In geWorkbench, normalization replaces the original values with new values. The available geWorkbench normalization methods are as follows:

{|style="border: 1px solid lightGray"
!Normalizer||Description
|-
|Missing value calculation||Replaces every missing value with either the mean value of that marker across all microarrays or the mean measurement of all markers in the microarray where the missing value is observed.
|-
|Log2 Transformation||Applies a log2 transformation to all measurements in a microarray.
|-
|Threshold Normalizer||Raises every data point below a user-specified minimum to that minimum, and reduces every data point above a user-specified maximum to that maximum.
|-
|Marker-based Centering||Subtracts the mean (or median) measurement of a marker profile from every measurement in the profile.
|-
|Array-based centering||Subtracts the mean (or median) measurement of a microarray from every measurement in that microarray.
|-
|Mean-variance normalizer||For every marker profile, subtracts the mean measurement of the entire profile from each measurement in the profile and divides the result by the standard deviation.
|}
[[Image:Normalpanel.gif]]
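
Several of the normalizers in the table are simple element-wise operations on one profile (the values of one marker across arrays, or of one array across markers). The Python sketch below illustrates four of them. The function names and sample profile are hypothetical, and the use of the sample standard deviation in the mean-variance step is an assumption, since the table does not say which form is used.

```python
from math import log2
from statistics import mean, stdev

def log2_transform(row):
    """Log2 transformation; all values must be positive."""
    return [log2(v) for v in row]

def threshold(row, minimum, maximum):
    """Raise values below `minimum` and reduce values above `maximum`."""
    return [min(max(v, minimum), maximum) for v in row]

def center(row):
    """Mean centering; pass a marker profile or an array profile."""
    m = mean(row)
    return [v - m for v in row]

def mean_variance(row):
    """Subtract the mean, then divide by the sample standard deviation."""
    m, s = mean(row), stdev(row)
    return [(v - m) / s for v in row]

profile = [2.0, 4.0, 8.0, 10.0]           # hypothetical marker profile
logged = log2_transform([2.0, 4.0, 8.0])
clipped = threshold(profile, 3.0, 9.0)
centered = center(profile)
scaled = mean_variance(profile)
```

Marker-based and array-based centering differ only in which profile is passed to <code>center</code>; a median-based variant would use <code>statistics.median</code> in place of <code>mean</code>.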

'''Apply Quantile Normalizer'''

1. In the Normalization Panel, select ''Quantile Normalizer''.

2. Leave the default averaging method, ''Mean Profile Marker'', which determines how missing values are handled.

3. Click '''Normalize'''. The View Area is updated to reflect the normalization once the screen has been refreshed.
Note: The first value in the second row was updated from ''41,394.6'' to ''55,779.26''.

{|style="border: 1px solid lightGray"
!PRENORMALIZATION||NORMALIZED
|-
| [[Image:Prenormalizer_ed.gif]] || [[Image:Postnormalizera.gif]]
|}
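
This tutorial does not describe how the Quantile Normalizer works internally. As background, the sketch below shows the textbook form of quantile normalization, in which every array is forced to share the same distribution of values. It is an assumption that geWorkbench follows this general scheme; the sketch ignores ties and missing values (so the ''Mean Profile Marker'' option is not modeled), and the function name and data are hypothetical.

```python
def quantile_normalize(arrays):
    """Textbook quantile normalization.

    arrays: list of equal-length lists, one list of measurements per microarray.
    Each value is replaced by the mean, across arrays, of the values that
    share its rank. Ties and missing values are not handled.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(s[r] for s in sorted_arrays) / len(arrays) for r in range(n)]
    result = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # indices, smallest value first
        normalized = [0.0] * n
        for rank, i in enumerate(order):
            normalized[i] = rank_means[rank]
        result.append(normalized)
    return result

# Two hypothetical arrays with three markers each.
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(arrays)
```

After normalization both arrays contain the same set of values (the rank means), arranged according to each array's original ordering.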

== Clustering Gene Expression Data== ken
*Hierarchical Clustering
* Self Organizing Map (SOM)

== Differential Expression ==
In this tutorial, you will:

* Get acquainted with the T Test and Multi T Test
* Apply a T Test and Multi T Test

Before you continue, geWorkbench should be running. For help with installation, please refer to the Getting Started (INSERT LINK) section. Load the tutorial file cardiomyopathy.exp. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===T Test===

The T Test analysis identifies markers with statistically significant differential expression between sets of microarrays. For each marker, the t-test determines whether there is a significant difference between the two groups (case and control). To perform this analysis, you must classify the panels, set the analysis parameters, and view the results in the visualization components. A detailed description of the T Test parameters is available in the online help.

'''Classify the Panels'''

1. Mark the '''Cardio''' phenotype as "Case". By default, panels are marked as control. Panels classified as Case are shown with a red thumbtack icon.
* Right-click on the '''Cardio''' phenotype.
* Select '''Classification'''>'''Case'''.
2. Activate the '''Normal''' and '''Cardio''' panels by selecting the checkboxes next to the panel names.

[[Image:T_PhenotypeSettingCase.png]]

'''Set Analysis Parameters'''
# From the Analysis Panel, select '''T-Test Analysis'''.
# Set the following parameter values and click '''Analyze'''.

* Alpha-corrections tab: Just Alpha.
* P-Value Parameters tab: p-values based on t-distribution. Note that the default alpha (critical p-value) is set to 0.01.
* Degree of Freedom tab: Welch approximation - unequal group variances.

[[Image:Ttest.gif]] 

'''T-Test Results'''

{|style="border: 1px solid lightGray"
|-
| Markers that met the significance test are included in a new gene panel called “Significant Genes”. || [[Image:E_ttestgpanel.png]]
|-
| An ancillary dataset is created in the project window. || [[Image:Ed_ttestproj.png]]
|}

The T-Test results can be viewed in the Color Mosaic panel and the Volcano Plot.

{|style="border: 1px solid lightGray"
!VOLCANO PLOT||COLOR MOSAIC
|-
| [[Image:Vplot.png]] || [[Image:Ed_cm.png]]
|-
| Clicking on any of the spots highlights the selected marker in the Marker Panel. * Insert another description ||
* The label to the right displays the significance value (the lower the value, the more likely the marker is differentially expressed) and the gene name for the displayed genes. The genes are displayed in ascending order by significance value.
* Gene height and width values can be altered to modify the display.
* The intensity slider is used to modify the intensity of the color coding.
* Accession: Includes the accession number in the label.
* Printer Icon: Prints the displayed image.
* Display: Must be toggled on to display data.
* ''Pat, Abs, Ratio Overlapping Pages Icon: Not the T Test display.''
|}

=== Multi T Test=== '''(IN PROGRESS)'''

== Regulatory Network ==
In this tutorial, you will:

* Create a gene network in Reverse Engineering
* View the network in Cytoscape
* Create a Marker panel from the Cytoscape network

Before you continue, geWorkbench should be running. The file [[Webmatrix.EXP]] is used in this tutorial. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Reverse Engineering===

The Reverse Engineering component is used to analyze a large amount of microarray data in order to reverse engineer the underlying gene regulatory network. The details of this algorithm are described in "ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context", Califano et al. (http://arxiv.org/abs/q-bio.MN/0410037).

'''Create a Network'''
{|style="border: 1px solid lightGray"
|-
|1. In the Gene Panel, enter '''1973_s_at''' in the '''Find Next''' text box. The list box will navigate to '''1973_s_at'''; click on that gene.||[[Image:E_panel.PNG]]
|-
|2. Go to the Reverse Engineering component in the View pane. You should be in the first tab, “Profiler”, and in “Basic”. The marker '''1973_s_at''' should be listed in the Hub Gene box.||[[Image:E_RE1.PNG]]
|-
|3. Click '''Analyze 2D'''. A list of markers, sorted by interaction strength, is returned below the '''Search Box'''.||
|-
|4. Select all the genes with a score >= 5 by clicking on the first marker, '''[12.53] 37724_at''', and then using Shift+Down Arrow to highlight the values >= 5.||[[Image:E_re2.png]]
|-
|5. Click '''Create Network'''. The resulting network is displayed in the Cytoscape view.||
|}

'''View the Network in Cytoscape'''

A full description of the capabilities and functionality of Cytoscape can be found at http://www.cytoscape.org/.

# In the Cytoscape Layout menu, select '''yFiles/ Organic''' to modify the network display.
# In this network image, select the central marker. It should turn yellow. [[Image:E_network.png]]
# In the Select menu, select Nodes>First neighbors of selected nodes>Shape>Triangle. The first neighbors are updated from a square shape to a triangle.
# Hold Ctrl and use the mouse to select nodes in the network. The selected nodes are yellow. The selected genes are returned to the Gene Panel under the name Selected Genes[Cytoscape].

== Enrichment Analysis==

In this tutorial, you will:

* Gene Ontology

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

'''Gene Ontology'''

The Gene Ontology component categorizes genes in terms of the functions their products perform, their cellular localization, or their involvement in high-level biological processes. The category definitions are provided by the Gene Ontology Consortium (http://www.geneontology.org), which geWorkbench uses to generate a tree structure that approximates the Directed Acyclic Graph structure used by the Consortium.

1. Create a Marker panel with the following markers:
* AFFX-hum_alu_at
* AFFX-LysX-M_at
* AFFX-LysX-5_at
* AFFX-LysX-3_at
For assistance in creating a marker panel, please refer to the Working with Marker and Phenotype Panels tutorial (INSERT LINK).

2. Click '''Map List'''. As the application creates a tree for the selected ontology, a pop-up window displays the progress. The mapping is based on the GO annotations provided by Affymetrix in their annotation files. [[Image:E_map.png]]

3. Each node in the mapped tree displays two numbers separated by a forward slash, such as '''cell(1/6058)'''.
*The first number is the number of genes from the active marker panel that are mapped to a specific category and all its child nodes.
*The second number is the corresponding count for the reference list (or for the entire chip, depending on the reference list checkbox selection and whether a reference list has been loaded from the file system).

[[Image:E_goon.png]]
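
The two numbers shown on each tree node can be reproduced with a small Python sketch: collect the category and all its child nodes, gather the genes annotated to any of them, and count how many fall in the marker panel and in the reference list. The ontology fragment, gene identifiers and function names below are invented for illustration and do not come from the GO annotation files or from geWorkbench.

```python
def descendants(children, term):
    """Return `term` together with every term reachable below it."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(children.get(t, []))
    return seen

def node_counts(children, annotations, term, panel, reference):
    """Return (panel count, reference count) for `term` and its child nodes."""
    terms = descendants(children, term)
    genes = {g for t in terms for g in annotations.get(t, ())}
    return len(genes & panel), len(genes & reference)

# Hypothetical ontology fragment and gene-to-term annotations.
children = {"cell": ["membrane", "nucleus"], "nucleus": ["nucleolus"]}
annotations = {"membrane": {"g1", "g2"}, "nucleus": {"g3"}, "nucleolus": {"g2", "g4"}}
reference = {"g1", "g2", "g3", "g4", "g5"}   # e.g. the entire chip
panel = {"g2", "g5"}                          # the active marker panel

cell_counts = node_counts(children, annotations, "cell", panel, reference)
label = "cell(%d/%d)" % cell_counts
```

Using sets means a gene annotated to several child categories is counted only once for the parent node. Whether geWorkbench counts such genes once or several times is not stated in this tutorial.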

The Table View and P Value Trend provide alternate views of the mappings obtained in the Tree View.

{|style="border: 1px solid lightGray"
!TABLE VIEW||P VALUE TREND
|-
| [[Image:E_gotable.png]]||[[Image:E_pvalueT.png]]
|}

== Sequence Analysis== ken
* Sequence Retrieval
* Sequence Homology Analysis
** Blast
** Other

==Pattern Discovery== ken
* Position Histogram

== Promoter Analysis==

==Integrated Annotation Information==
In this tutorial, you will:

* rrrrr
* rrrrr

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

Perform the following steps to view annotations of a dataset ('''''not recommended for large sets of genes'''''):

1. In the Marker Annotations component in the View pane, click '''Use Panels'''.

A table of gene names and links to pathway diagrams is returned, if available from the NCI CGAP database.

2. Clicking on a gene name opens its annotations from the NCI CGAP database in your default web browser.

3. Clicking on a pathway diagram name displays the pathway diagram in the caBIO Pathways panel in the View pane.
In the caBIO Pathways viewer, pathway gene symbols can in turn be clicked to display their gene information in the default web browser.

User:Daly

2006-02-24T18:59:47Z

Daly: /* Enrichment Analysis */

= Overview=

geWorkbench is an open-source bioinformatics platform that offers a comprehensive and extendible collection of tools for the management, analysis, visualization and annotation of biomedical data. This tool is aimed at providing researchers a centralized repository for the data analysis and visualization of gene expression data, gene sequences and protein sequences.

'''What can you do with geWorkbench?'''

* Use one program that integrates with multiple existing bioinformatics modules for analysis and visualization.

* Access remote servers and clusters to run computationally intensive calculations more quickly.

* Streamline data analysis with:
** flexible import options that support merging files from various sources ''(insert possible data source ie RMA express).''
** support for a variety of genomic data including microarrays, sequences, pathways, networks, alignments and phenotypes.
** access analyses with biological annotations from the National Cancer Institute.

* Community: Insert developer benefit ( plugin)

The diagram illustrates the use of geWorkbench by researchers.

[[Image:slide1.gif]]

For detailed system documentation, please see the Documentation section (INSERT LINK).

= Tutorial=

===Welcome! ===

Welcome to the geWorkbench tutorial. In the following tutorials you will learn how to use geWorkbench. This is an example-based, illustrated guide for both beginners and experienced users. This tutorial was last updated in '''DATE''' and reflects changes to geWorkbench through '''DATE'''.

===Using the Tutorial ===

These tutorials assume that you have successfully completed the installation instructions. If you haven't installed the entire program, please go to [[Download]] and follow the installation instructions.

The tutorials are organized into a number of separate topics. Sample data sets are used throughout the tutorials. Each tutorial is self-contained and does not depend on any other tutorial for sample data. Sample data is located '''''(INSERT LINK or Instructions on how to download)'''''

'''(we should have sample data labeled Tutorial #)'''

Within the tutorials there are two basic types of text:
*Text that looks like this explains topics. 

# Text in numbered steps is instructions for you to follow using tutorial data files.

''' insert instruction on how to navigate between tutorials'''.

==Getting Started== MV
* Starting the application
* GUI elements
** Panels
** Navigation
== Loading Data== MV
* Data formats

== Working with Marker and Phenotype Panels==
In this tutorial, you will:
* Become familiar with the use of panels in geWorkbench
* Create active phenotype panels
* Classify the panels you created

Before you continue, geWorkbench should be running. For help with this, please refer to the '''''Getting Started''''' (insertlink) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Creating Panels===

When working with microarrays, geWorkbench uses the term '''''marker''''' to refer to a gene probe (in other cases, it can be individual items from other data sets, such as sequences).
'''''Phenotype''''' refers to any user-defined grouping of microarrays. These microarrays will often share some common property that in most cases is phenotypic, although this is not a requirement. For example, one such “phenotype” might be a single experiment on a tumor tissue sample, with a second “phenotype” defined as a collection of experiments performed on normal tissue samples.

====Assign Panel====

1. In the Phenotype panel, select the following arrays, which contain samples from the congestive cardiomyopathy disease state:
JB-ccmp_0120.txt, JB-ccmp_0218.txt, JB-ccmp0718.txt, JB-ccmp0811.txt, JB-ccmp1003.txt, JB-ccmp_1109.tx

2. Right-click, select '''Add to Panel'''.

3. Enter "Cardio" in the input box and click OK.

[[Image:T_PanelLabelCardio.png]]

4. Next, similarly label the following arrays as "Normal" (''repeat steps 2 & 3'').
JB-n_0106.txt , JB-n_0821.txt, JB-n_0915.txt, JB-n_1303.txt

5. Select the checkboxes next to the panel names to mark these groups of arrays as "Active". Various analysis and visualization components can be set to use or display only activated arrays or markers.

[[Image:T_PhenotypesPriorToCase.png]]

====Classify panel====

For statistical tests such as the t-test, Case and Control groups can be specified.

# Left-click on the thumb-tack icon in front of the phenotype name.
# Select Case to specify the disease arrays as the "Case". The remaining "Normal" arrays are by default labeled control.
[[Image:T_PhenotypeSettingCase.png]]

A red thumbtack indicates the arrays have been specified as "Case".

[[Image:T_PhenotypeCaseSet.png]]

== Visualize Gene Expression==
In this tutorial, you will:

* Get acquainted with the various geWorkbench visualization tools
* View a dataset in geWorkbench
* Modify the visualization preference settings

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file '''''cardiomyopathy.exp''''' used in this tutorial is from the Load Data (insertlink) tutorial.

===Visualization Tools===

Visualization tools provide a view of the chip(s) under investigation and can be used for ascertaining the quality of the data. Active gene and phenotype panels (insertlink) restrict the number of markers/arrays displayed. The images created can be saved and exported. How to manipulate the visualization components is described in detail in the online help.

{|style="border: 1px solid lightGray"
|-
|'''Microarray View''': Used to inspect each separate microarray using the Array scroll bar.||[[Image: Ema.png]]
|-
|'''Tabular Microarray Panel''': Presents the numerical values of the expression measurements in a table format. One row is created per individual marker/probe and one column per microarray.||[[Image:Etab.png]]
|-
|'''Color Mosaic''': Heat maps for microarray expression data, organized by phenotypic or gene groupings.||[[Image:Ed_cm.png]]
|-
|'''Expression Profiles''': A line graph of gene expression profiles across several arrays/hybridizations. Each marker is drawn as a separately colored line.||[[Image:Eep.png]]
|-
|'''Scatter Plot''': A pairwise (''array vs. array and marker vs. marker'') comparison and plotting of expression values.||[[Image:Esp.png]]
|}

'''View a dataset'''
# Select '''''cardiomyopathy.exp''''' in the Project Panel.
# Select the '''''Microarray Panel''''' visualization component in the View Area at the top-right section of the interface.
# Deselect the All Markers checkbox to display the entire dataset.

Note: The '''''All Arrays''''' and '''''All Markers''''' checkboxes determine which data points are included in the display.
If neither is checked, then the entire data set is shown. The All Arrays control is useful when working with data sets comprising multiple arrays. In this case, only those arrays that are included in a currently activated phenotype panel will be displayed.

[[Image: Allm.png]]

=== Preferences===
The Preferences selection in the Tools menu allows users to specify how certain aspects of the system will behave. Once set, preferences persist between application sessions and are applied immediately.

'''Modifying Settings'''

1. From the main menu, click on '''Tools>Preferences'''.

2. In the Preferences pop-up window, you can define settings for:
*Text Editor: The editor selected will be used to open and inspect data sets loaded in a project. Notepad is the default setting.
* Visualization: The color scheme to be applied to color mosaic images.
**Absolute: (default) Let M = max{|min|, |max|} over all expression measurements, across all arrays. If expression value x > 0, assign it the red spectrum x / M * 256. If expression value x is negative, assign it to the green spectrum -x / M * 256.
** Relative: This is similar to the setting for Absolute, but each marker is mean-variance normalized first.
* Genepix Value Computation: You can specify how to compute the value displayed for Genepix arrays. The default setting is Option (Mean F635 - Mean B635) / (Mean F532 - Mean B532).

Select '''Relative''' for the visualization preference.

3. Click on '''OK'''.

[[Image: Preferences.gif]]
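
The '''Absolute''' color scheme described above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not geWorkbench's own code: the function names are hypothetical, expression data is assumed to be a list of rows (one per marker), and clamping the result to 255 is an added assumption so that the largest value fits an 8-bit color channel.

```python
def absolute_color(x, m):
    """Return an (r, g, b) tuple for expression value x under the Absolute scheme."""
    if m == 0:
        return (0, 0, 0)  # every measurement is zero, so there is nothing to color
    # Assumption: clamp to 255 so that |x| == M still fits an 8-bit channel.
    level = min(255, int(abs(x) / m * 256))
    if x > 0:
        return (level, 0, 0)  # positive values use the red spectrum
    return (0, level, 0)      # negative values use the green spectrum


def color_mosaic(matrix):
    """Color every measurement; M is taken over all values across all arrays."""
    flat = [value for row in matrix for value in row]
    m = max(abs(min(flat)), abs(max(flat)))
    return [[absolute_color(value, m) for value in row] for row in matrix]


if __name__ == "__main__":
    for row in color_mosaic([[-2.0, 1.0], [4.0, 0.0]]):
        print(row)
```

Under the '''Relative''' setting, the same mapping would be applied after each marker has been mean-variance normalized.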

== Filter and Normalize Data==
In this tutorial, you will:
* Get acquainted with the various filters and normalizers available in geWorkbench
* Apply a filter and normalizer on a tutorial dataset

Before you continue, geWorkbench should be running. For help with installation, please refer to the '''''Getting Started''''' (INSERT LINK) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Filter===

Filtering can be used to screen out missing data points, remove low quality data or reduce the size of the dataset by removing less interesting data. Available geWorkbench filters are as follows:

{|style="border: 1px solid lightGray"
!Filter||Description
|-
|'''Affy Detection Call'''||Applicable to Affymetrix data only. Sets as missing all measurements whose detection status is any user-defined combination of A, P or M.
|-
|'''Missing values'''||Discards all markers that have “missing” measurements in at least n microarrays, where n is defined by the user. Another filter must be applied first, however, to generate the missing values on which this filter operates.
|-
|'''Deviation'''||Sets as missing all markers whose measurements deviate by less than a given value across all microarrays.
|-
|'''Expression Threshold'''||Sets as missing all markers whose measurements are inside (or outside) a user-defined range.
|-
|'''2 Channel'''||Applicable to 2-channel array (Genepix) data only. Defines applicable ranges for each channel, and sets as missing all values for which either channel intensity is inside (or outside) the defined range.
|}

[[Image:Filterpanel.gif]]

Perform the following steps to filter out data called absent in an Affymetrix file:
# In the Filtering Panel, select ''Affy Detection Call Filter''.
# Select the ‘A’ (Absent) checkbox and click '''Filter'''. Values that were removed (marked as missing) are highlighted in yellow.
# In the Filtering Panel, select '''Missing Values Filter'''.
# Choose the maximum number of arrays that can have missing values before a marker is removed (the default is 0).
# Click '''Filter'''. Markers with more than 0 missing values are removed. You’ll notice the yellow values are gone.

{|style="border: 1px solid lightGray"
!'''Affy Detection Call Filter'''||'''Missing Values Filter'''||
|-
|-|-
|-
| [[Image: Filtered.gif]]||[[Image:Mvfilter.gif]]
|-
|}
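
The two-stage filtering performed in the steps above can be sketched as follows. This is a minimal illustration rather than geWorkbench's implementation: the function names and the data layout (one row per marker, one column per array, with a parallel table of detection calls) are assumptions, and missing values are represented by <code>None</code>. The marker is removed when it has more than the chosen maximum number of missing values, following the wording of the steps.

```python
def detection_call_filter(values, calls, flagged=("A",)):
    """Mark as missing (None) every measurement whose detection call is flagged."""
    return [
        [None if call in flagged else value for value, call in zip(value_row, call_row)]
        for value_row, call_row in zip(values, calls)
    ]


def missing_values_filter(markers, values, max_missing=0):
    """Drop markers that have more than max_missing missing measurements."""
    kept = [
        (marker, row)
        for marker, row in zip(markers, values)
        if sum(value is None for value in row) <= max_missing
    ]
    return [marker for marker, _ in kept], [row for _, row in kept]


if __name__ == "__main__":
    # Hypothetical toy data: three markers measured on three arrays.
    markers = ["g1", "g2", "g3"]
    values = [[10, 20, 30], [5, 6, 7], [1, 2, 3]]
    calls = [["P", "P", "P"], ["A", "P", "P"], ["A", "A", "M"]]

    flagged = detection_call_filter(values, calls)
    print(missing_values_filter(markers, flagged, max_missing=0))
```

Running the second filter without the first would remove nothing, because no value has yet been marked as missing.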

===Normalize===
Normalization can be used to decrease the effects of systematic differences across a set of experiments.
In geWorkbench, normalization replaces the original values with new values. Available geWorkbench normalization methods are as follows:

{|style="border: 1px solid lightGray"
!Normalizer||Description||
|-
|-

|Missing value calculation ||Replaces every missing value with either the mean value of that marker across all microarrays or with the mean measurement of all markers in the microarray where the missing value is observed
|-
|-
|Log2 Transformation ||Applies a log2 transformation to all measurements in a microarray
|-
|-
|-
|Threshold Normalizer ||All data points whose value is less than (or greater than) a user-specified minimum (maximum) value are raised (reduced) to that minimum (maximum) value
|-
|-
|-
|-
|Marker-based Centering ||Subtracts the mean (median) measurement of a marker profile from every measurement in the profile
|-
|-
|-
|Array-based centering ||Subtracts the mean (median) measurement of a microarray from every measurement in that microarray
|-
|-
|-
|Mean-variance normalizer ||For every marker profile, the mean measurement of the entire profile is subtracted from each measurement in the profile and the resulting value is divided by the standard deviation
|-
|-
|}
[[Image:Normalpanel.gif]]
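
Several of the normalizers in the table reduce to short arithmetic rules. The sketch below illustrates them in Python under stated assumptions: data is a list of rows (one per marker, one column per array) with no missing values, the function names are hypothetical, and the mean-variance normalizer uses the sample standard deviation because the table does not say which estimator geWorkbench applies.

```python
import math
from statistics import mean, median, stdev


def log2_transform(matrix):
    """Apply a log2 transformation to every (positive) measurement."""
    return [[math.log2(value) for value in row] for row in matrix]


def threshold(matrix, minimum, maximum):
    """Raise values below minimum to minimum and reduce values above maximum to maximum."""
    return [[min(max(value, minimum), maximum) for value in row] for row in matrix]


def center_markers(matrix, use_median=False):
    """Subtract each marker profile's mean (or median) from that profile."""
    center = median if use_median else mean
    return [[value - center(row) for value in row] for row in matrix]


def center_arrays(matrix, use_median=False):
    """Subtract each array's mean (or median) from every measurement in that array."""
    center = median if use_median else mean
    offsets = [center(column) for column in zip(*matrix)]
    return [[value - offset for value, offset in zip(row, offsets)] for row in matrix]


def mean_variance(matrix):
    """Per marker: subtract the profile mean, then divide by the standard deviation."""
    # Assumption: sample standard deviation; profiles must not be constant.
    return [[(value - mean(row)) / stdev(row) for value in row] for row in matrix]


if __name__ == "__main__":
    print(log2_transform([[1, 2, 8]]))
    print(threshold([[-5, 50, 500]], 0, 100))
    print(center_arrays([[1, 10], [3, 30]]))
    print(mean_variance([[1, 2, 3]]))
```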

'''Apply Quantile Normalizer'''

1. In the Normalization Panel, select ''Quantile Normalizer''.

2. Leave the default averaging method of ''Mean Profile Marker'' to indicate handling of missing values.

3. Click '''Normalize'''. The View Area is updated to reflect normalization (after the screen has been refreshed).
Note: The first value in the second row was updated from ''41,394.6'' to ''55,779.26''.

{|style="border: 1px solid lightGray"
!PRENORMALIZATION||NORMALIZED||
|-
|-|-
|-
| [[Image:Prenormalizer_ed.gif ]] || [[Image: Postnormalizera.gif ]]
|-
|-
|-
|}
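
For readers unfamiliar with quantile normalization, the following Python sketch shows the textbook form of the technique: each array's values are ranked, the values at each rank are averaged across arrays, and every measurement is replaced by the average for its rank. It is not taken from geWorkbench; the function name and data layout (one row per marker, one column per array) are assumptions, missing values are not handled, and tied values are simply ranked in order of appearance.

```python
def quantile_normalize(matrix):
    """Give every array (column) the same distribution of values."""
    n_markers = len(matrix)
    n_arrays = len(matrix[0])
    columns = [[matrix[i][j] for i in range(n_markers)] for j in range(n_arrays)]

    # Mean of the values found at each rank across all arrays.
    sorted_columns = [sorted(column) for column in columns]
    rank_means = [
        sum(column[rank] for column in sorted_columns) / n_arrays
        for rank in range(n_markers)
    ]

    # Replace each measurement by the mean that belongs to its rank in its array.
    result = [[0.0] * n_arrays for _ in range(n_markers)]
    for j, column in enumerate(columns):
        order = sorted(range(n_markers), key=lambda i: column[i])
        for rank, i in enumerate(order):
            result[i][j] = rank_means[rank]
    return result


if __name__ == "__main__":
    # Hypothetical toy data: four markers measured on three arrays.
    data = [[5, 4, 3], [2, 1, 4], [3, 3, 6], [4, 2, 8]]
    for row in quantile_normalize(data):
        print(row)
```

After normalization every array contains the same set of values, only in a different order, which is why individual measurements can change substantially.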

== Clustering Gene Expression Data== ken
*Hierarchical Clustering
* Self Organizing Map (SOM)

== Differential Expression ==
In this tutorial, you will:

* Get acquainted with the T Test and Multi T Test
* Apply a T Test and Multi T Test

Before you continue, geWorkbench should be running. For help with installation, please refer to the Getting Started (INSERT LINK) section. Load the tutorial file cardiomyopathy.exp. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===T Test===

T Test analysis identifies markers with statistically significant differential expression between sets of microarrays. The t-test determines for each marker whether there is a significant difference between the two groups (case and control). To perform this analysis, you must classify the panels, set the analysis parameters and view the results in the visualization components. The T Test parameters are described in detail in the online help.

'''Classify the Panels'''

1. Mark the '''Cardio''' phenotype as 'Case'. By default, panels are marked as control. Panels classified as Case are shown with a red thumbtack icon.
* Right-click on '''Cardio''' phenotype.
* Select '''Classification'''>'''Case'''.
2. Activate the arrays '''Normal''' and '''Cardio''' by selecting the checkboxes next to the panel name.

[[Image:T_PhenotypeSettingCase.png]]

'''Set Analysis Parameters'''
# From the Analysis Panel, select '''T-Test Analysis'''.
# Set the parameter values listed below and click '''Analyze'''.

* Alpha-corrections tab: Just Alpha.
* P-Value Parameters tab: p-values based on t-distribution. Note that the default alpha (critical p-value) is set to 0.01.
* Degree of Freedom tab: Welch approximation - unequal group variances.

[[Image:Ttest.gif]] 
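
The Welch approximation selected above does not assume equal variances in the case and control groups. The Python sketch below computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom for a single marker. It illustrates the standard formulas only and is not geWorkbench's code; the function name and the sample values are hypothetical. Converting the statistic to a p-value additionally requires the t-distribution, which is omitted here.

```python
import math
from statistics import mean, variance


def welch_t(case, control):
    """Return (t, degrees_of_freedom) for two groups with unequal variances."""
    n1, n2 = len(case), len(control)
    # Squared standard error contributed by each group (sample variance / n).
    se1 = variance(case) / n1
    se2 = variance(control) / n2
    t = (mean(case) - mean(control)) / math.sqrt(se1 + se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical expression values for one marker.
    case = [1, 2, 3, 4, 5]
    control = [2, 4, 6, 8, 10]
    t, df = welch_t(case, control)
    print(f"t = {t:.4f}, df = {df:.4f}")
```

A marker is reported as significant when the p-value derived from <code>t</code> and <code>df</code> falls below the chosen alpha (0.01 by default).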

'''T-Test Results'''

{|style="border: 1px solid lightGray"
!|| ||
|-
|-
| Markers which met the significance test are included in a new gene panel called “Significant Genes”. || [[Image:E_ttestgpanel.png]]
|-
|-
| Ancillary dataset is created in the project window. || [[Image:Ed_ttestproj.png]]
|-
|}

The values of the T-Test can be seen in the Color Mosaic panel and the Volcano Plot.

{|style="border: 1px solid lightGray"
!VOLCANO PLOT||COLOR MOSAIC||
|-
|-|-
|-
|-
| [[Image:Vplot.png]] || [[Image:Ed_cm.png]]
|-
|-
|-
| Clicking on any of the spots highlights the marker selected in the Marker Panel. * Insert another description ||
* The label to the right displays the Significance value (the lower the value, the more likely the expression differs) and the gene name for the displayed genes. The genes are displayed in ascending order by Significance value.

* Gene height and width values can be altered to modify the display.

* The intensity slider is used to modify the intensity of the color codings.

* Accession: Includes the accession number in the label.

* Printer Icon: Prints the displayed image.

* Display: Must be toggled on to display data.

* ''Pat, Abs, Ratio Overlapping Pages Icon: Not the T Test display.''

|-
|-
|}

=== Multi T Test=== '''(IN PROGRESS)'''

== Regulatory Network ==
In this tutorial, you will:

* Create a gene network in Reverse Engineering
* View the network in Cytoscape
* Create a Marker panel from the Cytoscape network

Before you continue, geWorkbench should be running. The file [[Webmatrix.EXP]] is used in this tutorial. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Reverse Engineering===

The Reverse Engineering component is used to analyze a large amount of microarray data to reverse engineer the underlying gene regulatory network. The details of this algorithm are described in "ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context", Califano et al. (http://arxiv.org/abs/q-bio.MN/0410037).

'''Create a Network'''
{|style="border: 1px solid lightGray"
|-
|1. In the Gene Panel, enter '''1973_s_at''' in the '''Find Next''' text box. The list box will navigate to '''1973_s_at'''; click on that gene.||[[Image: E_panel.PNG]]
|-
|2. Go to the Reverse Engineering component in the View pane. You should be in the first tab, “Profiler”, and in “Basic”. The marker '''1973_s_at''' should be listed in the Hub Gene box.||[[Image:E_RE1.PNG]]
|-
|3. Click on '''Analyze 2D'''. A list of markers, sorted by interaction strength, is returned below the '''Search Box'''.||
|-
|4. Select all the genes with score >= 5 by clicking on the first marker, '''[12.53] 37724_at''', and using Shift-down arrow to highlight the values >= 5.||[[Image: E_re2.png]]
|-
|5. Click '''Create Network'''. The network created is displayed in the Cytoscape view.||
|}

'''View the Network in Cytoscape'''

A full description of the capabilities and functionality of Cytoscape can be found at http://www.cytoscape.org/.

# In the Cytoscape Layout menu, select '''yFiles/ Organic''' to modify the network display.
# In this network image, select the central marker. It should turn yellow. [[Image:E_network.png]]
# In the Select menu, select Nodes>First neighbors of selected nodes>Shape>Triangle. The first neighbors are changed from a square shape to a triangle.
# Hold Ctrl and use the mouse to select the network. The selected nodes are shown in yellow. The selected genes are returned to the Gene Panel under the name Selected Genes[Cytoscape].

== Enrichment Analysis==

In this tutorial, you will:

* Gene Ontology

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

'''Gene Ontology'''

The Gene Ontology component categorizes genes in terms of the functions their products perform, their cellular localization, or their involvement in high-level biological processes. The category definitions are provided by the Gene Ontology Consortium (http://www.geneontology.org), and geWorkbench uses them to generate a tree structure that approximates the Directed Acyclic Graph structure used by the Consortium.

1. Create a Marker panel with the following markers:
* AFFX-hum_alu_at
* AFFX-LysX-M_at
* AFFX-LysX-5_at
* AFFX-LysX-3_at
For assistance in creating a marker panel, please refer to Working with Markers and Phenotype Tutorial (INSERT LINK).

2. Click '''Map List.''' As the application creates a tree for the selected ontology, a pop-up window will display the progress. The mapping is based on the GO mapping annotation provided by Affymetrix in their annotation files. [[Image:E_map.png]]

3. The resulting nodes in the mapped tree will display two numbers separated by a forward slash, such as '''cell(1/6058)'''.
*The first number is the number of genes from the active marker panel that are mapped to a specific category and all its child nodes.
*The second number, after the slash, is the corresponding count for the reference list (or for the entire chip, depending on the reference list check-box selection and on whether a reference list has been loaded from the file system).

[[Image:E_goon.png]]
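
The two numbers shown on each tree node can be illustrated with a small Python sketch. Everything in it is hypothetical: the function names, the toy category tree, and the gene annotations are invented for illustration and do not reflect geWorkbench's internal data structures. A gene is counted for a node if it is annotated to that category or to any category below it.

```python
def descendants(children, term):
    """Return the term together with every term below it in the tree."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(children.get(current, []))
    return found


def count_mapped(genes, annotations, children, term):
    """Count genes annotated to the term or to any of its child nodes."""
    terms = descendants(children, term)
    return sum(1 for gene in genes if annotations.get(gene, set()) & terms)


def node_label(term, panel, reference, annotations, children):
    """Build a label of the form term(panel count/reference count)."""
    panel_count = count_mapped(panel, annotations, children, term)
    reference_count = count_mapped(reference, annotations, children, term)
    return f"{term}({panel_count}/{reference_count})"


if __name__ == "__main__":
    # Hypothetical category tree and gene annotations.
    children = {"cell": ["nucleus", "membrane"], "nucleus": ["nucleolus"]}
    annotations = {
        "geneA": {"nucleolus"},
        "geneB": {"membrane"},
        "geneC": {"cell"},
        "geneD": {"other"},
    }
    panel = ["geneA"]
    reference = ["geneA", "geneB", "geneC", "geneD"]
    print(node_label("cell", panel, reference, annotations, children))
```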

The Table View and P Value Trends provide an alternate view of the mappings obtained in the Tree View.

{|style="border: 1px solid lightGray"
!TABLE VIEW||P VALUE TREND||
|-
|-|-
|-
| [[Image:E_gotable.png]]||[[Image:E_pvalueT.png]]
|-
|-
|}

== Sequence Analysis== ken
* Sequence Retrieval
* Sequence Homology Analysis
** Blast
** Other

==Pattern Discovery== ken
* Position Histogram

== Promoter Analysis==

==Integrated Annotation Information==
In this tutorial, you will:

* rrrrr
* rrrrr

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

Perform the following steps to view annotations of a dataset ('''''not recommended for large sets of genes'''''):

1. In the Marker Annotations in the View pane, click '''Use Panels.'''

A table of gene names and links to pathway diagrams is returned if available from the NCI CGAP database.

2. Clicking on gene names returns annotations from the NCI CGAP database in your default web browser window.

3. Clicking on the pathway diagram name will display a pathway diagram in the caBIO Pathways panel in the View pane.
In the caBIO Pathways viewer, pathway gene symbols can in turn be clicked to display their gene information in the default web browser window.

User:Daly

2006-02-24T18:58:06Z

Daly:

= Overview=

geWorkbench is an open-source bioinformatics platform that offers a comprehensive and extensible collection of tools for the management, analysis, visualization and annotation of biomedical data. It aims to give researchers a centralized environment for the analysis and visualization of gene expression data, gene sequences and protein sequences.

'''What can you do with geWorkbench?'''

* Use one program that integrates with multiple existing bioinformatics modules for analysis and visualization.

* Access remote servers and clusters to run computationally intensive calculations more quickly.

* Streamline data analysis with:
** flexible import options that support merging files from various sources ''(insert possible data source ie RMA express).''
** support for a variety of genomic data including microarrays, sequences, pathways, networks, alignments and phenotypes.
** access analyses with biological annotations from the National Cancer Institute.

* Community: Insert developer benefit ( plugin)

The diagram illustrates the use of geWorkbench by Researchers.

[[Image:slide1.gif]]

For detailed system documentation, please see the Documentation section (INSERT LINK).

= Tutorial=

===Welcome! ===

Welcome to the geWorkbench tutorial. In the following tutorials you will learn how to use geWorkbench. This is an example-based, illustrated guide for both beginners and experienced users. This tutorial was last updated in '''DATE''' and reflects changes to geWorkbench through '''DATE'''.

===Using the Tutorial ===

These tutorials assume that you have successfully completed the installation instructions. If you haven't installed the entire program, please go to [[Download]] and follow the installation instructions.

The tutorials are organized into a number of separate topics. Sample data sets are used throughout the tutorials. Each tutorial is self-contained and does not depend on any other tutorial for sample data. Sample data is located '''''(INSERT LINK or Instructions on how to download)'''''

'''(we should have sample data labeled Tutorial #)'''

Within the tutorials there are two basic types of text:
*Text that looks like this explains topics. 

# Text in numbered steps is instructions for you to follow using tutorial data files.

''' insert instruction on how to navigate between tutorials'''.

==Getting Started== MV
* Starting the application
* GUI elements
** Panels
** Navigation
== Loading Data== MV
* Data formats

== Working with Marker and Phenotype Panels==
In this tutorial, you will:
* Become familiar with the use of panels in geWorkbench
* Create active phenotype panels
* Classify the panels you created

Before you continue, geWorkbench should be running. For help with this, please refer to the '''''Getting Started''''' (insertlink) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Creating Panels===

When working with microarrays, geWorkbench uses the term '''''marker''''' to refer to a gene probe (in other cases, it can be individual items from other data sets, such as sequences).
'''''Phenotype''''' refers to any user-defined grouping of microarrays. These microarrays will often share some common property that in most cases is phenotypic, although this is not a requirement. For example, one such “phenotype” might be a single experiment on a tumor tissue sample, with a second “phenotype” defined as a collection of experiments performed on normal tissue samples.

====Assign Panel====

1. In the Phenotype panel, select the following arrays, which contain samples from the congestive cardiomyopathy disease state:
JB-ccmp_0120.txt, JB-ccmp_0218.txt, JB-ccmp0718.txt, JB-ccmp0811.txt, JB-ccmp1003.txt, JB-ccmp_1109.tx

2. Right-click, select '''Add to Panel'''.

3. Enter "Cardio" in the input box and click OK.

[[Image:T_PanelLabelCardio.png]]

4. Next, similarly label the following arrays as "Normal" (''repeat steps 2 & 3'').
JB-n_0106.txt , JB-n_0821.txt, JB-n_0915.txt, JB-n_1303.txt

5. Select the checkboxes next to the panel names to mark these groups of arrays as "Active". Various analysis and visualization components can be set to use or display only activated arrays or markers.

[[Image:T_PhenotypesPriorToCase.png]]

====Classify panel====

For statistical tests such as the t-test, Case and Control groups can be specified.

# Left-click on the thumb-tack icon in front of the phenotype name.
# Select Case to specify the disease arrays as the "Case". The remaining "Normal" arrays are by default labeled control.
[[Image:T_PhenotypeSettingCase.png]]

A red thumbtack indicates the arrays have been specified as "Case".

[[Image:T_PhenotypeCaseSet.png]]

== Visualize Gene Expression==
In this tutorial, you will:

* Get acquainted with the various geWorkbench visualization tools
* View a dataset in geWorkbench
* Modify the visualization preference settings

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file '''''cardiomyopathy.exp''''' used in this tutorial is from the Load Data (insertlink) tutorial.

===Visualization Tools===

Visualization tools provide a view of the chip(s) under investigation and can be used for ascertaining the quality of the data. Active gene and phenotype panels (insertlink) restrict the number of markers/arrays displayed. The images created can be saved and exported. How to manipulate the visualization components is described in detail in the online help.

{|style="border: 1px solid lightGray"
|-
|'''Microarray View''': Used to inspect each separate microarray using the Array scroll bar.||[[Image: Ema.png]]
|-
|'''Tabular Microarray Panel''': Presents the numerical values of the expression measurements in a table format. One row is created per individual marker/probe and one column per microarray.||[[Image:Etab.png]]
|-
|'''Color Mosaic''': Heat maps for microarray expression data, organized by phenotypic or gene groupings.||[[Image:Ed_cm.png]]
|-
|'''Expression Profiles''': A line graph of gene expression profiles across several arrays/hybridizations. Each marker is drawn as a separately colored line.||[[Image:Eep.png]]
|-
|'''Scatter Plot''': A pairwise (''array vs. array and marker vs. marker'') comparison and plotting of expression values.||[[Image:Esp.png]]
|}

'''View a dataset'''
# Select '''''cardiomyopathy.exp''''' in the Project Panel.
# Select the '''''Microarray Panel''''' visualization component in the View Area at the top-right section of the interface.
# Deselect the All Markers checkbox to display the entire dataset.

Note: The '''''All Arrays''''' and '''''All Markers''''' checkboxes determine which data points are included in the display.
If neither is checked, then the entire data set is shown. The All Arrays control is useful when working with data sets comprising multiple arrays. In this case, only those arrays that are included in a currently activated phenotype panel will be displayed.

[[Image: Allm.png]]

=== Preferences===
The Preferences selection in the Tools menu allows users to specify how certain aspects of the system will behave. Once set, preferences persist between application sessions and are applied immediately.

'''Modifying Settings'''

1. From the main menu, click on '''Tools>Preferences'''.

2. In the Preferences pop-up window, you can define settings for:
*Text Editor: The editor selected will be used to open and inspect data sets loaded in a project. Notepad is the default setting.
* Visualization: The color scheme to be applied to color mosaic images.
**Absolute: (default) Let M = max{|min|, |max|} over all expression measurements, across all arrays. If expression value x > 0, assign it the red spectrum x / M * 256. If expression value x is negative, assign it to the green spectrum -x / M * 256.
** Relative: This is similar to the setting for Absolute, but each marker is mean-variance normalized first.
* Genepix Value Computation: You can specify how to compute the value displayed for Genepix arrays. The default setting is Option (Mean F635 - Mean B635) / (Mean F532 - Mean B532).

Select '''Relative''' for the visualization preference.

3. Click on '''OK'''.

[[Image: Preferences.gif]]
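
The default Genepix value computation listed above is a ratio of background-subtracted channel intensities. A minimal Python sketch of that formula follows; the function and parameter names are hypothetical, and returning <code>None</code> when the denominator is zero is an added assumption, since the text does not say how geWorkbench treats that case.

```python
def genepix_ratio(f635_mean, b635_mean, f532_mean, b532_mean):
    """Compute (Mean F635 - Mean B635) / (Mean F532 - Mean B532) for one spot."""
    denominator = f532_mean - b532_mean
    if denominator == 0:
        # Assumption: treat the value as missing when the 532 channel
        # has no signal above background.
        return None
    return (f635_mean - b635_mean) / denominator


if __name__ == "__main__":
    # Hypothetical foreground and background intensities for one spot.
    print(genepix_ratio(1200, 200, 700, 200))
```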

== Filter and Normalize Data==
In this tutorial, you will:
* Get acquainted with the various filters and normalizers available in geWorkbench
* Apply a filter and normalizer on a tutorial dataset

Before you continue, geWorkbench should be running. For help with installation, please refer to the '''''Getting Started''''' (INSERT LINK) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Filter===

Filtering can be used to screen out missing data points, remove low quality data or reduce the size of the dataset by removing less interesting data. Available geWorkbench filters are as follows:

{|style="border: 1px solid lightGray"
!Filter||Description
|-
|'''Affy Detection Call'''||Applicable to Affymetrix data only. Sets as missing all measurements whose detection status is any user-defined combination of A, P or M.
|-
|'''Missing values'''||Discards all markers that have “missing” measurements in at least n microarrays, where n is defined by the user. Another filter must be applied first, however, to generate the missing values on which this filter operates.
|-
|'''Deviation'''||Sets as missing all markers whose measurements deviate by less than a given value across all microarrays.
|-
|'''Expression Threshold'''||Sets as missing all markers whose measurements are inside (or outside) a user-defined range.
|-
|'''2 Channel'''||Applicable to 2-channel array (Genepix) data only. Defines applicable ranges for each channel, and sets as missing all values for which either channel intensity is inside (or outside) the defined range.
|}

[[Image:Filterpanel.gif]]
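
The Deviation and Expression Threshold filters from the table can be sketched as below. The sketch rests on several assumptions that the table does not settle: "deviation" is read as the standard deviation of a marker's profile across arrays (the population form is used here), the Expression Threshold filter is applied to individual measurements, and missing values are represented by <code>None</code>. The function names are hypothetical.

```python
from statistics import pstdev


def deviation_filter(values, min_deviation):
    """Mark a whole marker as missing when its profile varies less than min_deviation."""
    return [
        [None] * len(row) if pstdev(row) < min_deviation else list(row)
        for row in values
    ]


def expression_threshold_filter(values, low, high, remove_inside=False):
    """Mark measurements as missing when they fall outside (or inside) [low, high]."""
    def flagged(value):
        inside = low <= value <= high
        return inside if remove_inside else not inside

    return [[None if flagged(value) else value for value in row] for row in values]


if __name__ == "__main__":
    # Hypothetical toy data: two markers measured on three arrays.
    data = [[10, 10, 11], [5, 50, 500]]
    print(deviation_filter(data, min_deviation=1))
    print(expression_threshold_filter(data, low=20, high=100))
```

Either filter only marks values as missing; the Missing Values filter is what removes the affected markers from the dataset.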

Perform the following steps to filter out data called absent in an Affymetrix file:
# In the Filtering Panel, select ''Affy Detection Call Filter''.
# Select the ‘A’ (Absent) checkbox and click '''Filter'''. Values that were removed (marked as missing) are highlighted in yellow.
# In the Filtering Panel, select '''Missing Values Filter'''.
# Choose the maximum number of arrays that can have missing values before a marker is removed (the default is 0).
# Click '''Filter'''. Markers with more than 0 missing values are removed. You’ll notice the yellow values are gone.

{|style="border: 1px solid lightGray"
!'''Affy Detection Call Filter'''||'''Missing Values Filter'''||
|-
|-|-
|-
| [[Image: Filtered.gif]]||[[Image:Mvfilter.gif]]
|-
|}

===Normalize===
Normalization can be used to decrease the effects of systematic differences across a set of experiments.
In geWorkbench, normalization replaces the original values with new values. Available geWorkbench normalization methods are as follows:

{|style="border: 1px solid lightGray"
!Normalizer||Description||
|-
|-

|Missing value calculation ||Replaces every missing value with either the mean value of that marker across all microarrays or with the mean measurement of all markers in the microarray where the missing value is observed
|-
|-
|Log2 Transformation ||Applies a log2 transformation to all measurements in a microarray
|-
|-
|-
|Threshold Normalizer ||All data points whose value is less than (or greater than) a user-specified minimum (maximum) value are raised (reduced) to that minimum (maximum) value
|-
|-
|-
|-
|Marker-based Centering ||Subtracts the mean (median) measurement of a marker profile from every measurement in the profile
|-
|-
|-
|Array-based centering ||Subtracts the mean (median) measurement of a microarray from every measurement in that microarray
|-
|-
|-
|Mean-variance normalizer ||For every marker profile, the mean measurement of the entire profile is subtracted from each measurement in the profile and the resulting value is divided by the standard deviation
|-
|-
|}
[[Image:Normalpanel.gif]]
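
The Missing value calculation normalizer offers two ways of filling a gap: the mean of the marker across all arrays, or the mean of all markers on the array where the gap occurs. The Python sketch below shows both options under stated assumptions: data is a list of rows (one per marker, one column per array), missing values are <code>None</code>, every row and column has at least one observed value, and the function name and <code>by</code> parameter are hypothetical.

```python
from statistics import mean


def impute_missing(matrix, by="marker"):
    """Replace None with the marker (row) mean or the array (column) mean."""
    row_means = [mean(v for v in row if v is not None) for row in matrix]
    column_means = [mean(v for v in column if v is not None) for column in zip(*matrix)]
    result = []
    for i, row in enumerate(matrix):
        filled = []
        for j, value in enumerate(row):
            if value is None:
                value = row_means[i] if by == "marker" else column_means[j]
            filled.append(value)
        result.append(filled)
    return result


if __name__ == "__main__":
    # Hypothetical toy data with one gap per marker.
    data = [[1, None, 3], [4, 6, None]]
    print(impute_missing(data, by="marker"))
    print(impute_missing(data, by="array"))
```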

'''Apply Quantile Normalizer'''

1. In the Normalization Panel, select ''Quantile Normalizer''.

2. Leave the default averaging method, ''Mean Profile Marker'', which determines how missing values are handled.

3. Click '''Normalize'''. The View Area is updated to reflect the normalization (after the screen has been refreshed).
Note: The first value in the second row was updated from ''41,394.6'' to ''55,779.26''.

{|style="border: 1px solid lightGray"
!PRENORMALIZATION||NORMALIZED||
|-
|-|-
|-
| [[Image:Prenormalizer_ed.gif ]] || [[Image: Postnormalizera.gif ]]
|-
|-
|-
|}
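
For readers who want to see what a quantile normalizer does to the numbers, the following is a minimal sketch of the textbook procedure: sort each array, average the values at each rank across arrays, and write those averages back in each array's original order. It assumes there are no missing values and no ties, so it does not reproduce the ''Mean Profile Marker'' option, and it is not taken from the geWorkbench source.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of equal-length arrays (one list per microarray)."""
    n_markers = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Mean of the k-th smallest value across all arrays.
    rank_means = [
        sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n_markers)
    ]
    result = []
    for a in arrays:
        # Marker indices ordered from smallest to largest measurement.
        order = sorted(range(n_markers), key=lambda i: a[i])
        normalized = [0.0] * n_markers
        for rank, idx in enumerate(order):
            normalized[idx] = rank_means[rank]
        result.append(normalized)
    return result


# Two microarrays with three markers each (made-up values).
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(arrays)
print(normalized)
```

After normalization every array contains the same set of values, only in a different order, which is why individual measurements can change noticeably.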

== Clustering Gene Expression Data== ken
*Hierarchical Clustering
* Self Organizing Map (SOM)

== Differential Expression ==
In this tutorial, you will:

* Get acquainted with the T Test and Multi T Test
* Apply a T Test and Multi T Test

Before you continue, geWorkbench should be running. For help with installation, please refer to the Getting Started (INSERT LINK) section. Load the tutorial file cardiomyopathy.exp. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===T Test===

T Test analysis identifies markers with statistically significant differential expression between sets of microarrays. For each marker, the t-test determines whether there is a significant difference between the two groups (case and control). To perform this analysis, you must classify the panels, set the analysis parameters and view the results in the visualization components. The T Test parameters are described in detail in the online help.

'''Classify the Panels'''

1. Mark the '''Cardio''' phenotype as 'Case'. By default, panels are marked as Control. Panels classified as Case are shown with a red thumbtack icon.
* Right-click on '''Cardio''' phenotype.
* Select '''Classification'''>'''Case'''.
2. Activate the '''Normal''' and '''Cardio''' panels by selecting the checkboxes next to the panel names.

[[Image:T_PhenotypeSettingCase.png]]

'''Set Analysis Parameters'''
# From the Analysis Panel, select '''T-Test Analysis'''.
# Set the parameter values listed below and click '''Analyze'''.

* Alpha-corrections tab: Just Alpha.
* P-Value Parameters tab: p-values based on t-distribution. Note that the default alpha (critical p-value) is set to 0.01.
* Degree of Freedom tab: Welch approximation - unequal group variances.

[[Image:Ttest.gif]] 
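
The Welch approximation selected above can be illustrated with a short calculation. The sketch below computes the t statistic and the Welch-Satterthwaite degrees of freedom for a single marker using only the Python standard library. The function name and the sample values are hypothetical, and this is not geWorkbench's implementation. Turning the statistic into a p-value additionally requires the t-distribution, for example via `scipy.stats.ttest_ind(case, control, equal_var=False)`.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(case, control):
    """Return (t statistic, degrees of freedom) for unequal group variances."""
    n1, n2 = len(case), len(control)
    # Squared standard error contributed by each group.
    v1 = variance(case) / n1
    v2 = variance(control) / n2
    t = (mean(case) - mean(control)) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Expression values of one marker in the Case and Control arrays (made up).
case = [10.0, 12.0, 14.0]
control = [1.0, 2.0, 3.0]
t, df = welch_t(case, control)
print(round(t, 3), round(df, 3))
```

Because the degrees of freedom are estimated from the two group variances, they are usually not a whole number.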

'''T-Test Results'''

{|style="border: 1px solid lightGray"
!|| ||
|-
|-
| Markers which met the significance test are included in a new gene panel called “Significant Genes”. || [[Image:E_ttestgpanel.png]]
|-
|-
| An ancillary dataset is created in the project window. || [[Image:Ed_ttestproj.png]]
|-
|}

The values of the T-Test can be seen in the Color Mosaic panel and the Volcano Plot.

{|style="border: 1px solid lightGray"
!VOLCANO PLOT||COLOR MOSAIC||
|-
|-|-
|-
|-
| [[Image:Vplot.png]] || [[Image:Ed_cm.png]]
|-
|-
|-
| Clicking on any of the spots highlights the marker selected in the Marker Panel. * Insert another description ||
* The label to the right displays the Significance value (the lower the value, the more likely the marker is differentially expressed) and the gene name for the displayed genes. The genes are displayed in ascending order by Significance value.

* Gene height and width values can be altered to modify the display.

* The intensity slider is used to modify the intensity of the color codings.

* Accession: Includes the accession number in the label.

* Printer Icon: Prints the displayed image.

* Display: Must be toggled on to display data.

* ''Pat, Abs, Ratio Overlapping Pages Icon: Not the T Test display.''

|-
|-
|}

=== Multi T Test=== '''(IN PROGRESS)'''

== Regulatory Network ==
In this tutorial, you will:

* Create a gene network in Reverse Engineering
* View the network in Cytoscape
* Create a Marker panel from the Cytoscape network

Before you continue, geWorkbench should be running. The file [[Webmatrix.EXP]] is used in this tutorial. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Reverse Engineering===

The Reverse Engineering component is used to analyze a large amount of microarray data to reverse engineer the underlying gene regulatory network. The details of this algorithm are described in "ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context", Califano et al. (http://arxiv.org/abs/q-bio.MN/0410037).

'''Create a Network'''
{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|1. In the Gene Panel, enter '''1973_s_at''' in the '''Find Next''' text box. The list box will navigate to '''1973_s_at'''; click on that gene.||[[Image: E_panel.PNG]]
|-
|2. Go to the Reverse Engineering component in the View pane. You should be in the first tab, “Profiler”, with “Basic” selected. The marker '''1973_s_at''' should be listed in the Hub Gene box.||[[Image:E_RE1.PNG]]
|-
|3. Click on '''Analyze 2D'''. A list of markers, sorted by interaction strength, is returned below the '''Search Box'''. ||
|-
|4. Select all the genes with score >= 5 by clicking on the first marker '''[12.53] 37724_at''' and then using Shift+Down Arrow to highlight the values >= 5.||[[Image: E_re2.png]]
|-
|5. Click '''Create Network'''. The network created is displayed in the Cytoscape view.||
|-
|-
|}

'''View the Network in Cytoscape'''

A full description of the capabilities and functionality of Cytoscape can be found at http://www.cytoscape.org/.

# In the Cytoscape Layout menu, select '''yFiles/ Organic''' to modify the network display.
# In this network image, select the central marker. It should turn yellow. [[Image:E_network.png]]
# In the Select menu, select Nodes>First neighbors of selected nodes>Shape>Triangle. The first neighbors are changed from a square shape to a triangle.
# Ctrl+mouse select the network. The selected nodes are yellow. The selected genes are returned to the Gene Panel with the name Selected Genes[Cytoscape].

== Enrichment Analysis==

In this tutorial, you will:

* Work with the Gene Ontology component

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

'''Gene Ontology'''

The Gene Ontology component categorizes genes in terms of the function their products perform, their cellular localization, or their involvement in high-level biological processes. The category definitions are provided by the Gene Ontology Consortium (http://www.geneontology.org). geWorkbench uses them to generate a tree structure that approximates the Directed Acyclic Graph structure used by the Consortium.

1. Create a Marker panel with the following markers:
* AFFX-hum_alu_at
* AFFX-LysX-M_at
* AFFX-LysX-5_at
* AFFX-LysX-3_at
For assistance in creating a marker panel, please refer to Working with Markers and Phenotype Tutorial (INSERT LINK).

2. Click '''Map List'''. As the application creates a tree for the selected ontology, a pop-up window will display the progress. The mapping is based on the GO annotation provided by Affymetrix in their annotation files. [[Image:E_map.png]]

3. The resulting nodes in the mapped tree will display two numbers separated by a forward slash, such as '''cell(1/6058)'''.
*The first number is the number of genes from the active marker panel that are mapped to a specific category or to any of its child nodes.
*The second number, after the slash, is the corresponding count for the reference list (or for the entire chip, depending on the reference list check-box selection and on whether a reference list has been loaded from the file system).

[[Image:E_goon.png]]
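
The two numbers in a node label can be reproduced with a small recursive count. The sketch below is an illustration only: the tree, the gene identifiers and the function names are all hypothetical, and geWorkbench's own mapping code may differ (for example in how it treats genes annotated to several child categories, which are counted once here).

```python
def genes_under(term, children, annotations):
    """Return the set of genes mapped to `term` or to any of its descendants."""
    genes = set(annotations.get(term, ()))
    for child in children.get(term, ()):
        genes |= genes_under(child, children, annotations)
    return genes


def node_label(term, children, annotations, panel, reference):
    """Build a label of the form term(panel count/reference count)."""
    mapped = genes_under(term, children, annotations)
    return f"{term}({len(mapped & panel)}/{len(mapped & reference)})"


# Hypothetical ontology fragment and annotations.
children = {"cell": ["nucleus", "membrane"]}
annotations = {
    "cell": {"g1"},
    "nucleus": {"g2", "g3"},
    "membrane": {"g3", "g4"},
}
panel = {"g2"}                                # active marker panel
reference = {"g1", "g2", "g3", "g4", "g5"}    # reference list or whole chip

cell_label = node_label("cell", children, annotations, panel, reference)
nucleus_label = node_label("nucleus", children, annotations, panel, reference)
print(cell_label, nucleus_label)
```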

{|style="border: 1px solid lightGray"
!TABLE VIEW||P VALUE TREND||
|-
|-|-
|-
| [[Image:E_gotable.png]]||[[Image:E_pvalueT.png]]
|-
|-
|}

== Sequence Analysis== ken
* Sequence Retrieval
* Sequence Homology Analysis
** Blast
** Other

==Pattern Discovery== ken
* Position Histogram

== Promoter Analysis==

==Integrated Annotation Information==
In this tutorial, you will:

* rrrrr
* rrrrr

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

Perform the following steps to view annotations of a dataset ('''''not recommended for large sets of genes'''''):

1. In the Marker Annotations component in the View pane, click '''Use Panels'''.

A table of gene names and links to pathway diagrams is returned if available from the NCI CGAP database.

2. Clicking on gene names returns annotations from the NCI CGAP database in your default web browser window.

3. Clicking on the pathway diagram name will display a pathway diagram in the caBIO Pathways panel in the View pane.
In the caBIO Pathways viewer, pathway gene symbols can in turn be clicked to display their gene information in the default web browser window.

File:E pvalueT.png

2006-02-24T18:57:23Z

Daly:

User:Daly

2006-02-24T18:52:17Z

Daly:

= Overview=

geWorkbench is an open-source bioinformatics platform that offers a comprehensive and extendible collection of tools for the management, analysis, visualization and annotation of biomedical data. This tool is aimed at providing researchers a centralized repository for the data analysis and visualization of gene expression data, gene sequences and protein sequences.

'''What can you do with geWorkbench?'''

* Use one program that integrates with multiple existing bioinformatics modules for analysis and visualization.

* Access remote servers and clusters to perform computationally intensive calculations more quickly.

* Streamline data analysis with:
** flexible import options that support merging files from various sources ''(insert possible data source ie RMA express).''
** support for a variety of genomic data including microarrays, sequences, pathways, networks, alignments and phenotypes.
** access analyses with biological annotations from the National Cancer Institute.

* Community: Insert developer benefit ( plugin)

The diagram illustrates the use of geWorkbench by Researchers.

[[Image:slide1.gif]]

For detailed system documentation, please see the Documentation section (INSERT LINK).

= Tutorial=

===Welcome! ===

Welcome to the geWorkbench tutorial. In the following tutorials you will learn how to use geWorkbench. This is an example-based, illustrated guide for both beginners and experienced users. This tutorial was last updated in '''DATE''' and reflects changes to geWorkbench through '''DATE'''.

===Using the Tutorial ===

These tutorials assume that you have successfully completed the installation instructions. If you haven't installed the entire program, please go to [[Download]] and follow the installation instructions.

The tutorials are organized into a number of separate topics. Sample data sets are used throughout the tutorials. Each tutorial is self-contained and does not depend on any other tutorial for sample data. Sample data is located '''''(INSERT LINK or Instructions on how to download)'''''

'''(we should have sample data labeled Tutorial #)'''

Within the tutorials there are two basic types of text:
*Text that looks like this explains topics. 

# Text in numbered steps is instructions for you to follow using tutorial data files.

''' insert instruction on how to navigate between tutorials'''.

==Getting Started== MV
* Starting the application
* GUI elements
** Panels
** Navigation
== Loading Data== MV
* Data formats

== Working with Marker and Phenotype Panels==
In this tutorial, you will:
* Become familiar with the use of panels in geWorkbench
* Create active phenotype panels
* Classify the panels you created

Before you continue, geWorkbench should be running. For help with this, please refer to the '''''Getting Started''''' (insertlink) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Creating Panels===

When working with microarrays, geWorkbench uses the term '''''marker''''' to refer to a gene probe (in other cases, it can be individual items from other data sets, such as sequences).
'''''Phenotype''''' refers to any user-defined grouping of microarrays. These microarrays will often share some common property that in most cases is phenotypic, although this is not a requirement. For example, one such “phenotype” might be a single experiment on a tumor tissue sample, with a second “phenotype” defined as a collection of experiments performed on normal tissue samples.

====Assign Panel====

1. In the Phenotype panel, select the following arrays, which contain samples from the congestive cardiomyopathy disease state:
JB-ccmp_0120.txt, JB-ccmp_0218.txt, JB-ccmp0718.txt, JB-ccmp0811.txt, JB-ccmp1003.txt, JB-ccmp_1109.tx

2. Right-click, select '''Add to Panel'''.

3. Enter "Cardio" in the input box and click OK.

[[Image:T_PanelLabelCardio.png]]

4. Next, similarly label the following arrays as "Normal" (''repeat steps 2 & 3''):
JB-n_0106.txt , JB-n_0821.txt, JB-n_0915.txt, JB-n_1303.txt

5. Select the checkboxes next to the panel names to mark these groups of arrays as "Active". Various analysis and visualization components can be set to use or display only activated arrays or markers.

[[Image:T_PhenotypesPriorToCase.png]]

====Classify panel====

For statistical tests such as the t-test, Case and Control groups can be specified.

# Left-click on the thumb-tack icon in front of the phenotype name.
# Select Case to specify the disease arrays as the "Case". The remaining "Normal" arrays are by default labeled control.
[[Image:T_PhenotypeSettingCase.png]]

A red thumbtack indicates the arrays have been specified as "Case".

[[Image:T_PhenotypeCaseSet.png]]

== Visualize Gene Expression==
In this tutorial, you will:

* Get acquainted with the various geWorkbench visualization tools
* View a dataset in geWorkbench
* Modify the visualization preference settings

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file '''''cardiomyopathy.exp''''' used in this tutorial is from the Load Data (insertlink) tutorial.

===Visualization Tools===

Visualization tools provide a view of the chip(s) under investigation and can be used for ascertaining the quality of the data. Active gene and phenotype panels (insertlink) restrict the number of markers/arrays displayed. The images created can be saved and exported. How to manipulate the visualization components is described in detail in the online help.

{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|'''Microarray View''': Used to inspect each separate microarray using the Array scroll bar.||[[Image: Ema.png]]
|-
|-
|'''Tabular Microarray Panel''': Presents the numerical values of the expression measurements in a table format. One row is created per individual marker/probe and one column per microarray.||[[Image:Etab.png]]
|-
|-
|-
|'''Color Mosaic''': Heat maps for microarray expression data, organized by phenotypic or gene groupings.||[[Image:Ed_cm.png]]
|-
|-
|-
|'''Expression Profiles''': A line graph of gene expression profiles across several arrays/hybridizations. Each marker is drawn as a line of a separate color.||[[Image:Eep.png]]
|-
|-
|-
|'''Scatter Plot''': A pairwise (''array vs. array and marker vs. marker'') comparison and plotting of expression values.||[[Image:Esp.png]]
|-
|-
|-
|-
|}

'''View a dataset'''
# Select '''''cardiomyopathy.exp''''' in the Project Panel.
# Select the '''''Microarray Panel''''' visualization component in the View Area at the top-right section of the interface.
# Deselect the All Markers checkbox to display the entire dataset.

Note: The '''''All Arrays''''' and '''''All Markers''''' checkboxes determine which data points are included in the display.
If neither is checked, then the entire data set is shown. The All Arrays control is useful when working with data sets comprising multiple arrays. In this case, only those arrays that are included in a currently activated phenotype panel will be displayed.

[[Image: Allm.png]]

=== Preferences===
The Preferences selection in the Tools menu allows users to specify how certain aspects of the system will behave. Once set, your preferences are applied immediately and persist between application sessions.

'''Modifying Settings'''

1. From the main menu, click on '''Tools>Preferences'''.

2. In the Preferences pop-up window, you can define settings for:
*Text Editor: The editor selected will be used to open and inspect data sets loaded in a project. Notepad is the default setting.
* Visualization: The color scheme to be applied to color mosaic images.
**Absolute: (default) Let M = max{|min|, |max|} over all expression measurements, across all arrays. If expression value x is positive, assign it the red intensity x / M * 256. If expression value x is negative, assign it the green intensity -x / M * 256.
** Relative: This is similar to the setting for Absolute, but each marker is mean-variance normalized first.
* Genepix Value Computation: You can specify how to compute the value displayed for Genepix arrays. The default setting is Option (Mean F635 - Mean B635) / (Mean F532 - Mean B532).

Select '''Relative''' for the visualization preference.

3. Click on '''OK'''.

[[Image: Preferences.gif]]
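
The Absolute color scheme described in step 2 amounts to a simple mapping from an expression value to a red or green intensity. The following sketch shows that mapping under two assumptions that the text does not state: the result is truncated to an integer, and it is capped at 255 so that it fits an 8-bit color channel. The function name and sample values are hypothetical.

```python
def absolute_color(x, m):
    """Map expression value x to an (r, g, b) tuple using the Absolute scheme."""
    # assumption: truncate to an integer and cap at 255 (8-bit channel)
    level = min(255, int(abs(x) / m * 256))
    if x > 0:
        return (level, 0, 0)   # positive values: red
    return (0, level, 0)       # negative values: green (zero stays black)


# Made-up expression measurements across all arrays.
values = [-200.0, 0.0, 50.0, 100.0]
m = max(abs(min(values)), abs(max(values)))   # M = max{|min|, |max|}
colors = [absolute_color(x, m) for x in values]
print(colors)
```

For the Relative scheme, the same mapping would be applied after each marker has been mean-variance normalized.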

== Filter and Normalize Data==
In this tutorial, you will:
* Get acquainted with the various filters and normalizers available in geWorkbench
* Apply a filter and normalizer on a tutorial dataset

Before you continue, geWorkbench should be running. For help with installation, please refer to the '''''Getting Started''''' (INSERT LINK) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Filter===

Filtering can be used to screen out missing data points, remove low quality data or reduce the size of the dataset by removing less interesting data. Available geWorkbench filters are as follows:

{|style="border: 1px solid lightGray"
!Filter||Description||
|-
|-|-
|-

|'''Affy Detection Call'''||Applicable to Affymetrix data only. Sets all measurements whose detection status is any user-defined combination of A, P or M as missing.
|-

|'''Missing values''' ||Discards all markers that have “missing” measurements in at least n microarrays, where n is defined by the user. Another filter must first be applied however, in order to generate the missing values upon which this filter can operate.
|-
|-
|-

|'''Deviation''' ||Sets all markers whose measurements deviate below a given value across all microarrays as missing.
|-
|-
|-

|'''Expression Threshold''' ||Sets all markers whose measurements are inside (or outside) a user-defined range as missing.
|-
|-
|-

|'''2 Channel''' ||Applicable to 2-channel arrays (Genepix) data only. Defines applicable ranges for each channel, and sets all values for which either channel intensity is inside (or outside) the defined range as missing.
|-
|-
|-
|}

[[Image:Filterpanel.gif]]

Perform the following steps to filter out data called absent in an Affymetrix file:
# In the Filtering Panel, select ''Affy Detection Call Filter''.
# Select the ‘A’ (Absent) checkbox and click '''Filter'''. Values that were removed (marked as missing) are highlighted in yellow.
# In the Filtering Panel, select '''Missing Values Filter'''.
# Choose the maximum number of arrays that can have missing values before a marker is removed; the default is 0.
# Click '''Filter'''. Markers with more than 0 missing values are removed, so the yellow values are gone.

{|style="border: 1px solid lightGray"
!'''Affy Detection Call Filter'''||'''Missing Values Filter'''||
|-
|-|-
|-
| [[Image: Filtered.gif]]||[[Image:Mvfilter.gif]]
|-
|}
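
The two filtering steps above can be pictured as two passes over the data: the first marks measurements as missing according to their detection call, and the second drops markers with too many missing measurements. The sketch below follows the wording of the steps (a marker is removed when it has more missing values than the chosen maximum). The function names and the data are hypothetical, and missing values are represented by `None` purely for illustration.

```python
def detection_call_filter(values, calls, remove=("A",)):
    """Mark as missing (None) every measurement whose detection call is in `remove`."""
    return [
        [None if call in remove else value for value, call in zip(value_row, call_row)]
        for value_row, call_row in zip(values, calls)
    ]


def missing_values_filter(markers, values, max_missing=0):
    """Keep only the markers with at most `max_missing` missing measurements."""
    kept = []
    for marker, row in zip(markers, values):
        if sum(v is None for v in row) <= max_missing:
            kept.append(marker)
    return kept


# Three markers measured on two arrays, with Affymetrix detection calls
# (P = Present, A = Absent, M = Marginal). All values are made up.
markers = ["marker_1", "marker_2", "marker_3"]
values = [[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]]
calls = [["P", "P"], ["A", "P"], ["M", "A"]]

filtered = detection_call_filter(values, calls)
remaining = missing_values_filter(markers, filtered)
print(filtered)
print(remaining)
```

With the default maximum of 0, only the marker that was called Present on every array survives.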

===Normalize===
Normalization can be used to decrease the effects of systematic differences across a set of experiments.
In geWorkbench, normalization replaces the original values with the normalized values. The available normalization methods are as follows:

{|style="border: 1px solid lightGray"
!Normalizer||Description||
|-
|-

|Missing value calculation ||Replaces every missing value with either the mean value of that marker across all microarrays or with the mean measurement of all markers in the microarray where the missing value is observed
|-
|-
|Log2 Transformation ||Applies a log2 transformation to all measurements in a microarray
|-
|-
|-
|Threshold Normalizer ||All data points whose value is less than (or greater than) a user-specified minimum (maximum) value are raised (reduced) to that minimum (maximum) value
|-
|-
|-
|-
|Marker-based Centering ||Subtracts the mean (median) measurement of a marker profile from every measurement in the profile
|-
|-
|-
|Array-based centering ||Subtracts the mean (median) measurement of a microarray from every measurement in that microarray
|-
|-
|-
|Mean-variance normalizer ||For every marker profile, the mean measurement of the entire profile is subtracted from each measurement in the profile and the resulting value is divided by the standard deviation
|-
|-
|}
[[Image:Normalpanel.gif]]
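
The first three rows of the table (missing value calculation, log2 transformation and threshold normalizer) are simple per-value operations. The sketch below shows one possible reading of them. The function names, the use of `None` for missing values and the example limits are assumptions made for illustration, not geWorkbench settings.

```python
from math import log2
from statistics import mean


def fill_missing(profile):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in profile if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in profile]


def threshold(values, minimum=None, maximum=None):
    """Raise values below `minimum` and reduce values above `maximum`."""
    out = []
    for v in values:
        if minimum is not None and v < minimum:
            v = minimum
        if maximum is not None and v > maximum:
            v = maximum
        out.append(v)
    return out


def log2_transform(values):
    """Apply a log2 transformation; every value must be positive."""
    return [log2(v) for v in values]


filled = fill_missing([4.0, None, 8.0])
clamped = threshold([-5.0, 8.0, 40000.0], minimum=1.0, maximum=16384.0)
logged = log2_transform(clamped)
print(filled, clamped, logged)
```

Applying a minimum threshold before the log2 transformation is one way to avoid taking the logarithm of zero or negative measurements, which is undefined.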

'''Apply Quantile Normalizer'''

1. In the Normalization Panel, select ''Quantile Normalizer''.

2. Leave the default averaging method, ''Mean Profile Marker'', which determines how missing values are handled.

3. Click '''Normalize'''. The View Area is updated to reflect the normalization (after the screen has been refreshed).
Note: The first value in the second row was updated from ''41,394.6'' to ''55,779.26''.

{|style="border: 1px solid lightGray"
!PRENORMALIZATION||NORMALIZED||
|-
|-|-
|-
| [[Image:Prenormalizer_ed.gif ]] || [[Image: Postnormalizera.gif ]]
|-
|-
|-
|}

== Clustering Gene Expression Data== ken
*Hierarchical Clustering
* Self Organizing Map (SOM)

== Differential Expression ==
In this tutorial, you will:

* Get acquainted with the T Test and Multi T Test
* Apply a T Test and Multi T Test

Before you continue, geWorkbench should be running. For help with installation, please refer to the Getting Started (INSERT LINK) section. Load the tutorial file cardiomyopathy.exp. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===T Test===

T Test analysis identifies markers with statistically significant differential expression between sets of microarrays. For each marker, the t-test determines whether there is a significant difference between the two groups (case and control). To perform this analysis, you must classify the panels, set the analysis parameters and view the results in the visualization components. The T Test parameters are described in detail in the online help.

'''Classify the Panels'''

1. Mark the '''Cardio''' phenotype as 'Case'. By default, panels are marked as Control. Panels classified as Case are shown with a red thumbtack icon.
* Right-click on '''Cardio''' phenotype.
* Select '''Classification'''>'''Case'''.
2. Activate the '''Normal''' and '''Cardio''' panels by selecting the checkboxes next to the panel names.

[[Image:T_PhenotypeSettingCase.png]]

'''Set Analysis Parameters'''
# From the Analysis Panel, select '''T-Test Analysis'''.
# Set the parameter values listed below and click '''Analyze'''.

* Alpha-corrections tab: Just Alpha.
* P-Value Parameters tab: p-values based on t-distribution. Note that the default alpha (critical p-value) is set to 0.01.
* Degree of Freedom tab: Welch approximation - unequal group variances.

[[Image:Ttest.gif]] 
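
With the ''Just Alpha'' setting, no multiple-testing correction is applied, so deciding which markers are significant comes down to comparing each p-value with alpha. The sketch below illustrates that comparison with made-up marker names and p-values. Whether a p-value exactly equal to alpha counts as significant is an assumption here, and the function name is hypothetical.

```python
def significant_markers(p_values, alpha=0.01):
    """Return the markers with p-value <= alpha, most significant first."""
    # assumption: a p-value equal to alpha is treated as significant
    hits = [(p, marker) for marker, p in p_values.items() if p <= alpha]
    return [marker for p, marker in sorted(hits)]


# Hypothetical per-marker p-values from a t-test.
p_values = {
    "marker_a": 0.0004,
    "marker_b": 0.2500,
    "marker_c": 0.0090,
    "marker_d": 0.0110,
}
selected = significant_markers(p_values)
print(selected)
```

Sorting by p-value mirrors the ascending order by Significance value used in the Color Mosaic display.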

'''T-Test Results'''

{|style="border: 1px solid lightGray"
!|| ||
|-
|-
| Markers which met the significance test are included in a new gene panel called “Significant Genes”. || [[Image:E_ttestgpanel.png]]
|-
|-
| An ancillary dataset is created in the project window. || [[Image:Ed_ttestproj.png]]
|-
|}

The values of the T-Test can be seen in the Color Mosaic panel and the Volcano Plot.

{|style="border: 1px solid lightGray"
!VOLCANO PLOT||COLOR MOSAIC||
|-
|-|-
|-
|-
| [[Image:Vplot.png]] || [[Image:Ed_cm.png]]
|-
|-
|-
| Clicking on any of the spots highlights the marker selected in the Marker Panel. * Insert another description ||
* The label to the right displays the Significance value (the lower the value, the more likely the marker is differentially expressed) and the gene name for the displayed genes. The genes are displayed in ascending order by Significance value.

* Gene height and width values can be altered to modify the display.

* The intensity slider is used to modify the intensity of the color codings.

* Accession: Includes the accession number in the label.

* Printer Icon: Prints the displayed image.

* Display: Must be toggled on to display data.

* ''Pat, Abs, Ratio Overlapping Pages Icon: Not the T Test display.''

|-
|-
|}

=== Multi T Test=== '''(IN PROGRESS)'''

== Regulatory Network ==
In this tutorial, you will:

* Create a gene network in Reverse Engineering
* View the network in Cytoscape
* Create a Marker panel from the Cytoscape network

Before you continue, geWorkbench should be running. The file [[Webmatrix.EXP]] is used in this tutorial. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Reverse Engineering===

The Reverse Engineering component is used to analyze a large amount of microarray data to reverse engineer the underlying gene regulatory network. The details of this algorithm are described in "ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context", Califano et al. (http://arxiv.org/abs/q-bio.MN/0410037).

'''Create a Network'''
{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|1. In the Gene Panel, enter '''1973_s_at''' in the '''Find Next''' text box. The list box will navigate to '''1973_s_at'''; click on that gene.||[[Image: E_panel.PNG]]
|-
|2. Go to the Reverse Engineering component in the View pane. You should be in the first tab, “Profiler”, with “Basic” selected. The marker '''1973_s_at''' should be listed in the Hub Gene box.||[[Image:E_RE1.PNG]]
|-
|3. Click on '''Analyze 2D'''. A list of markers, sorted by interaction strength, is returned below the '''Search Box'''. ||
|-
|4. Select all the genes with score >= 5 by clicking on the first marker '''[12.53] 37724_at''' and then using Shift+Down Arrow to highlight the values >= 5.||[[Image: E_re2.png]]
|-
|5. Click '''Create Network'''. The network created is displayed in the Cytoscape view.||
|-
|-
|}
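
The selection made by hand in step 4 is a simple cutoff on the interaction score. As a sketch, the same selection could be expressed as follows, assuming each list entry has the form shown in the tutorial ('''[12.53] 37724_at'''). Apart from that one entry, the labels below are hypothetical placeholders, and the parsing pattern is an assumption about the label format.

```python
import re

# assumption: every label looks like "[score] marker_id"
LABEL = re.compile(r"^\[(?P<score>[-+]?\d+(?:\.\d+)?)\]\s+(?P<marker>\S+)$")


def select_by_score(labels, cutoff=5.0):
    """Return the marker IDs whose interaction score is at least `cutoff`."""
    selected = []
    for label in labels:
        match = LABEL.match(label.strip())
        if match and float(match.group("score")) >= cutoff:
            selected.append(match.group("marker"))
    return selected


labels = [
    "[12.53] 37724_at",
    "[6.10] hypothetical_a_at",   # placeholder entry
    "[4.75] hypothetical_b_at",   # placeholder entry
]
hub_neighbors = select_by_score(labels)
print(hub_neighbors)
```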

'''View the Network in Cytoscape'''

A full description of the capabilities and functionality of Cytoscape can be found at http://www.cytoscape.org/.

# In the Cytoscape Layout menu, select '''yFiles/ Organic''' to modify the network display.
# In this network image, select the central marker. It should turn yellow. [[Image:E_network.png]]
# In the Select menu, select Nodes>First neighbors of selected nodes>Shape>Triangle. The first neighbors are changed from a square shape to a triangle.
# Ctrl+mouse select the network. The selected nodes are yellow. The selected genes are returned to the Gene Panel with the name Selected Genes[Cytoscape].

== Enrichment Analysis==

In this tutorial, you will:

* Work with the Gene Ontology component

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

'''Gene Ontology'''

The Gene Ontology component categorizes genes in terms of the function their products perform, their cellular localization, or their involvement in high-level biological processes. The category definitions are provided by the Gene Ontology Consortium (http://www.geneontology.org). geWorkbench uses them to generate a tree structure that approximates the Directed Acyclic Graph structure used by the Consortium.

1. Create a Marker panel with the following markers:
* AFFX-hum_alu_at
* AFFX-LysX-M_at
* AFFX-LysX-5_at
* AFFX-LysX-3_at
For assistance in creating a marker panel, please refer to Working with Markers and Phenotype Tutorial (INSERT LINK).

2. Click '''Map List'''. As the application creates a tree for the selected ontology, a pop-up window will display the progress. The mapping is based on the GO annotation provided by Affymetrix in their annotation files. [[Image:E_map.png]]

3. The resulting nodes in the mapped tree will display two numbers separated by a forward slash, such as '''cell(1/6058)'''.
*The first number is the number of genes from the active marker panel that are mapped to a specific category or to any of its child nodes.
*The second number, after the slash, is the corresponding count for the reference list (or for the entire chip, depending on the reference list check-box selection and on whether a reference list has been loaded from the file system).

[[Image:E_goon.png]]

{|style="border: 1px solid lightGray"
!TABLE VIEW||P VALUE TREND||
|-
|-|-
|-
| [[Image:E_gotable.png]]||[[Image:E_pvalue.png]]
|-
|-
|}

== Sequence Analysis== ken
* Sequence Retrieval
* Sequence Homology Analysis
** Blast
** Other

==Pattern Discovery== ken
* Position Histogram

== Promoter Analysis==

==Integrated Annotation Information==
In this tutorial, you will:

* rrrrr
* rrrrr

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

Perform the following steps to view annotations of a dataset ('''''not recommended for large sets of genes'''''):

1. In the Marker Annotations component in the View pane, click '''Use Panels'''.

A table of gene names and links to pathway diagrams is returned if available from the NCI CGAP database.

2. Clicking on gene names returns annotations from the NCI CGAP database in your default web browser window.

3. Clicking on the pathway diagram name will display a pathway diagram in the caBIO Pathways panel in the View pane.
In the caBIO Pathways viewer, pathway gene symbols can in turn be clicked to display their gene information in the default web browser window.

File:E gotable.png

2006-02-24T18:51:40Z

Daly:

User:Daly

2006-02-24T18:51:26Z

Daly:

= Overview=

geWorkbench is an open-source bioinformatics platform that offers a comprehensive and extendible collection of tools for the management, analysis, visualization and annotation of biomedical data. This tool is aimed at providing researchers a centralized repository for the data analysis and visualization of gene expression data, gene sequences and protein sequences.

'''What can you do with geWorkbench?'''

* Use one program that integrates with multiple existing bioinformatics modules for analysis and visualization.

* Access remote servers and clusters to perform computationally intensive calculations more quickly.

* Streamline data analysis with:
** flexible import options that support merging files from various sources ''(insert possible data source ie RMA express).''
** support for a variety of genomic data including microarrays, sequences, pathways, networks, alignments and phenotypes.
** access analyses with biological annotations from the National Cancer Institute.

* Community: Insert developer benefit ( plugin)

The diagram illustrates the use of geWorkbench by Researchers.

[[Image:slide1.gif]]

For detailed system documentation, please see the Documentation section (INSERT LINK).

= Tutorial=

===Welcome! ===

Welcome to the geWorkbench tutorial. In the following tutorials you will learn how to use geWorkbench. This is an example-based, illustrated guide for both beginners and experienced users. This tutorial was last updated in '''DATE''' and reflects changes to geWorkbench through '''DATE'''.

===Using the Tutorial ===

These tutorials assume that you have successfully completed the installation instructions. If you haven't installed the entire program, please go to [[Download]] and follow the installation instructions.

The tutorials are organized into a number of separate topics. Sample data sets are used throughout the tutorials. Each tutorial is self-contained and does not depend on any other tutorial for sample data. Sample data is located '''''(INSERT LINK or Instructions on how to download)'''''

'''(we should have sample data labeled Tutorial #)'''

Within the tutorials there are two basic types of text:
*Text that looks like this explains topics.

# Text in numbered steps gives instructions for you to follow using the tutorial data files.

''' insert instruction on how to navigate between tutorials'''.

==Getting Started== <!-- MV -->
* Starting the application
* GUI elements
** Panels
** Navigation

== Loading Data== <!-- MV -->
* Data formats

== Working with Marker and Phenotype Panels==
In this tutorial, you will:
* Become familiar with the use of panels in geWorkbench
* Create active phenotype panels
* Classify the panels you created

Before you can continue, geWorkbench must be running. For help with this, please refer to the '''''Getting Started''''' (insertlink) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Creating Panels===

When working with microarrays, geWorkbench uses the term '''''marker''''' to refer to a gene probe (in other cases, a marker can be an individual item from another kind of data set, such as a sequence).
'''''Phenotype''''' refers to any user-defined grouping of microarrays. These microarrays will often share some common property that in most cases is phenotypic, although this is not a requirement. For example, one such “phenotype” might be a single experiment on a tumor tissue sample, with a second “phenotype” defined as a collection of experiments performed on normal tissue samples.

====Assign Panel====

1. In the Phenotype panel, select the following arrays, which contain samples from the congestive cardiomyopathy disease state:
JB-ccmp_0120.txt, JB-ccmp_0218.txt, JB-ccmp0718.txt, JB-ccmp0811.txt, JB-ccmp1003.txt, JB-ccmp_1109.tx

2. Right-click, select '''Add to Panel'''.

3. Enter "Cardio" in the input box and click OK.

[[Image:T_PanelLabelCardio.png]]

4. Next, label the following arrays as "Normal" in the same way (''repeat steps 2 & 3'').
JB-n_0106.txt, JB-n_0821.txt, JB-n_0915.txt, JB-n_1303.txt

5. Select the checkboxes next to the panel names to mark these groups of arrays as "Active". Various analysis and visualization components can be set to use or display only activated arrays or markers.

[[Image:T_PhenotypesPriorToCase.png]]

====Classify panel====

For statistical tests such as the t-test, Case and Control groups can be specified.

# Left-click on the thumb-tack icon in front of the phenotype name.
# Select Case to specify the disease arrays as the "Case". The remaining "Normal" arrays are by default labeled control.
[[Image:T_PhenotypeSettingCase.png]]

A red thumbtack indicates the arrays have been specified as "Case".

[[Image:T_PhenotypeCaseSet.png]]

== Visualize Gene Expression==
In this tutorial, you will:

* Get acquainted with the various geWorkbench visualization tools
* View a dataset in geWorkbench
* Modify the visualization preference settings

Before you can continue, geWorkbench must be running. For help with this, please refer to the Getting Started (insertlink) section. This tutorial uses the file '''''cardiomyopathy.exp''''' from the Load Data (insertlink) tutorial.

===Visualization Tools===

Visualization tools provide a view of the chip(s) under investigation and can be used to assess the quality of the data. Active gene and phenotype panels (insertlink) restrict the number of markers/arrays displayed. The images created can be saved and exported. The online help describes in detail how to manipulate the visualization components.

{|style="border: 1px solid lightGray"
|-
|'''Microarray View''': Used to inspect each microarray separately, using the Array scroll bar.||[[Image: Ema.png]]
|-
|'''Tabular Microarray Panel''': Presents the numerical values of the expression measurements in table format, with one row per marker/probe and one column per microarray.||[[Image:Etab.png]]
|-
|'''Color Mosaic''': Heat maps of microarray expression data, organized by phenotypic or gene groupings.||[[Image:Ed_cm.png]]
|-
|'''Expression Profiles''': A line graph of gene expression profiles across several arrays/hybridizations. Each marker is drawn as a separately colored line.||[[Image:Eep.png]]
|-
|'''Scatter Plot''': A pairwise comparison and plot of expression values (''array vs. array and marker vs. marker'').||[[Image:Esp.png]]
|}

'''View a dataset'''
# Select '''''cardiomyopathy.exp''''' in the Project Panel.
# Select the '''''Microarray Panel''''' visualization component in the View Area at the top right of the interface.
# Deselect the All Markers checkbox to display the entire dataset.

Note: The '''''All Arrays''''' and '''''All Markers''''' checkboxes determine which data points are included in the display.
If neither is checked, the entire data set is shown. The All Arrays control is useful when working with data sets comprising multiple arrays. In this case, only those arrays that are included in a currently activated phenotype panel are displayed.

[[Image: Allm.png]]

=== Preferences===
The Preferences selection in the Tools menu lets you specify how certain aspects of the system behave. Once set, preferences are applied at once and persist between application sessions.

'''Modifying Settings'''

1. From the main menu, click on '''Tools>Preferences'''.

2. In the Preferences pop-up window, you can define settings for:
*Text Editor: The editor selected will be used to open and inspect data sets loaded in a project. Notepad is the default setting.
* Visualization: The color scheme to be applied to color mosaic images.
**Absolute: (default) Let M = max{|min|, |max|} over all expression measurements, across all arrays. If expression value x > 0, assign it the red spectrum x / M * 256. If expression value x is negative, assign it to the green spectrum -x / M * 256.
** Relative: This is similar to the setting for Absolute, but each marker is mean-variance normalized first.
* Genepix Value Computation: You can specify how to compute the value displayed for Genepix arrays. The default setting is (Mean F635 - Mean B635) / (Mean F532 - Mean B532).

Select '''Relative''' for the visualization preference.

3. Click on '''OK'''.

[[Image: Preferences.gif]]
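
The Absolute color scheme and the default Genepix value computation described in step 2 can be sketched in a few lines of Python. This is only an illustration of the formulas quoted above, not geWorkbench's own code: the function names `absolute_color` and `genepix_ratio`, the (red, green, blue) tuple, and the clamping of the intensity to 255 are all assumptions made for the example.

```python
def absolute_color(x, all_values):
    """Map one expression value to an (r, g, b) tuple with the Absolute scheme.

    Hypothetical helper: M is the largest absolute value over all measurements.
    Positive values scale into red, negative values into green.
    """
    m = max(abs(min(all_values)), abs(max(all_values)))
    if m == 0:
        return (0, 0, 0)
    # x / M * 256 can reach 256, so clamp to the 8-bit maximum (an assumption).
    intensity = min(255, int(abs(x) / m * 256))
    if x > 0:
        return (intensity, 0, 0)
    if x < 0:
        return (0, intensity, 0)
    return (0, 0, 0)


def genepix_ratio(mean_f635, mean_b635, mean_f532, mean_b532):
    """Default Genepix value: background-corrected 635 signal over 532 signal."""
    return (mean_f635 - mean_b635) / (mean_f532 - mean_b532)


values = [-200.0, -50.0, 0.0, 100.0, 400.0]
colors = [absolute_color(v, values) for v in values]
ratio = genepix_ratio(1200.0, 200.0, 700.0, 200.0)
```

The Relative scheme would apply the same mapping after each marker has been mean-variance normalized.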

== Filter and Normalize Data==
In this tutorial, you will:
* Get acquainted with the various filters and normalizers available in geWorkbench
* Apply a filter and normalizer on a tutorial dataset

Before you can continue, geWorkbench must be running. For help with installation, please refer to the '''''Getting Started''''' (INSERT LINK) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Filter===

Filtering can be used to screen out missing data points, remove low quality data or reduce the size of the dataset by removing less interesting data. Available geWorkbench filters are as follows:

{|style="border: 1px solid lightGray"
!Filter!!Description
|-
|'''Affy Detection Call'''||Applicable to Affymetrix data only. Sets as missing all measurements whose detection status is any user-defined combination of A, P or M.
|-
|'''Missing values''' ||Discards all markers that have “missing” measurements in at least n microarrays, where n is defined by the user. Another filter must be applied first, however, to generate the missing values on which this filter operates.
|-
|'''Deviation''' ||Sets as missing all markers whose measurements deviate by less than a given value across all microarrays.
|-
|'''Expression Threshold''' ||Sets as missing all markers whose measurements are inside (or outside) a user-defined range.
|-
|'''2 Channel''' ||Applicable to 2-channel array (Genepix) data only. Defines applicable ranges for each channel, and sets as missing all values for which either channel intensity is inside (or outside) the defined range.
|}

[[Image:Filterpanel.gif]]

Perform the following steps to filter out data called absent in an Affymetrix file:
# In the Filtering Panel, select ''Affy Detection Call Filter''.
# Select the ‘A’ (Absent) checkbox and click '''Filter'''. Values that were removed (marked as missing) are highlighted in yellow.
# In the Filtering Panel, select '''Missing Values Filter'''.
# Choose the maximum number of arrays that can have missing values before a marker is removed (the default is 0).
# Click '''Filter'''. Markers with more than 0 missing values are removed, and the yellow-highlighted values disappear.

{|style="border: 1px solid lightGray"
!'''Affy Detection Call Filter'''!!'''Missing Values Filter'''
|-
| [[Image: Filtered.gif]]||[[Image:Mvfilter.gif]]
|}
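
The two-stage filtering performed in the steps above can be sketched as follows. This is a minimal illustration and not geWorkbench's implementation: the function names, the use of `None` as the "missing" marker, and the probe names and values are hypothetical. The second function follows the wording of the steps (a marker is removed when it has more than the chosen maximum number of missing values).

```python
MISSING = None  # hypothetical stand-in for a measurement marked as missing


def affy_detection_call_filter(values, calls, filtered_calls=("A",)):
    """Set to MISSING every measurement whose detection call is filtered."""
    return [
        [MISSING if call in filtered_calls else value
         for value, call in zip(value_row, call_row)]
        for value_row, call_row in zip(values, calls)
    ]


def missing_values_filter(markers, values, max_missing=0):
    """Keep only markers with at most max_missing MISSING measurements."""
    kept = [
        (marker, row)
        for marker, row in zip(markers, values)
        if sum(v is MISSING for v in row) <= max_missing
    ]
    return [marker for marker, _ in kept], [row for _, row in kept]


# One row per marker, one column per microarray (made-up data).
markers = ["probe_1", "probe_2", "probe_3"]
values = [[120.0, 95.5, 101.2], [15.0, 230.1, 18.3], [310.4, 298.7, 305.0]]
calls = [["P", "P", "P"], ["A", "P", "A"], ["P", "M", "P"]]

masked = affy_detection_call_filter(values, calls)
kept_markers, kept_values = missing_values_filter(markers, masked, max_missing=0)
```

Here `probe_2` has two Absent calls, so both of those values become missing in the first stage and the whole marker is dropped in the second.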

===Normalize===
Normalization can be used to decrease the effects of systematic differences across a set of experiments.
In geWorkbench, normalization replaces the original values with new ones. The available geWorkbench normalization methods are as follows:

{|style="border: 1px solid lightGray"
!Normalizer!!Description
|-
|Missing value calculation ||Replaces every missing value with either the mean value of that marker across all microarrays or the mean measurement of all markers in the microarray where the missing value is observed.
|-
|Log2 Transformation ||Applies a log2 transformation to all measurements in a microarray.
|-
|Threshold Normalizer ||Raises every data point below a user-specified minimum to that minimum, and reduces every data point above a user-specified maximum to that maximum.
|-
|Marker-based Centering ||Subtracts the mean (or median) measurement of a marker profile from every measurement in the profile.
|-
|Array-based centering ||Subtracts the mean (or median) measurement of a microarray from every measurement in that microarray.
|-
|Mean-variance normalizer ||For every marker profile, subtracts the mean measurement of the entire profile from each measurement in the profile and divides the result by the standard deviation.
|}
[[Image:Normalpanel.gif]]
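
Several of the normalizers in the table reduce to one-line transformations. The sketch below shows them on plain Python lists; the function names are hypothetical, and the use of the sample standard deviation in the mean-variance normalizer is an assumption, since the table does not say which form geWorkbench uses.

```python
import math
import statistics


def log2_transform(array):
    """Apply a log2 transformation to every measurement in one microarray."""
    return [math.log2(v) for v in array]


def threshold_normalize(array, minimum=None, maximum=None):
    """Raise values below minimum and reduce values above maximum."""
    result = []
    for v in array:
        if minimum is not None and v < minimum:
            v = minimum
        if maximum is not None and v > maximum:
            v = maximum
        result.append(v)
    return result


def center(values, use_median=False):
    """Subtract the mean (or median) of a marker profile or of an array."""
    c = statistics.median(values) if use_median else statistics.mean(values)
    return [v - c for v in values]


def mean_variance_normalize(profile):
    """Subtract the profile mean and divide by the standard deviation."""
    mean = statistics.mean(profile)
    sd = statistics.stdev(profile)  # sample standard deviation (assumption)
    return [(v - mean) / sd for v in profile]


logged = log2_transform([1.0, 2.0, 8.0])
clipped = threshold_normalize([5.0, 50.0, 500.0], minimum=10.0, maximum=100.0)
centered = center([2.0, 4.0, 6.0])
scaled = mean_variance_normalize([2.0, 4.0, 6.0])
```

Marker-based and array-based centering use the same `center` function; the only difference is whether it is given a row (one marker across arrays) or a column (one array across markers).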

'''Apply Quantile Normalizer'''

1. In the Normalization Panel, select ''Quantile Normalizer''.

2. Leave the averaging method at its default, ''Mean Profile Marker'', which determines how missing values are handled.

3. Click '''Normalize'''. The View Area is updated to reflect the normalization (after the screen has been refreshed).
Note: The first value in the second row was updated from ''41,394.6'' to ''55,779.26''.

{|style="border: 1px solid lightGray"
!PRENORMALIZATION!!NORMALIZED
|-
| [[Image:Prenormalizer_ed.gif ]] || [[Image: Postnormalizera.gif ]]
|}
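
The tutorial does not describe how the Quantile Normalizer works internally. For orientation, the sketch below shows the commonly used textbook form of quantile normalization, in which every array is forced to share the same distribution of values. It is an assumption that geWorkbench follows this form; the function name is hypothetical, ties are broken by position, and the handling of missing values (the ''Mean Profile Marker'' option) is not modelled.

```python
def quantile_normalize(arrays):
    """Textbook quantile normalization.

    arrays: list of equal-length lists, one list per microarray.
    Each value is replaced by the mean, across arrays, of the values
    that hold the same rank within their own array.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [
        sum(s[rank] for s in sorted_arrays) / len(arrays) for rank in range(n)
    ]
    result = []
    for a in arrays:
        # Indices of a, ordered from smallest to largest value.
        order = sorted(range(n), key=lambda i: a[i])
        normalized = [0.0] * n
        for rank, index in enumerate(order):
            normalized[index] = rank_means[rank]
        result.append(normalized)
    return result


arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(arrays)
```

After normalization both arrays contain exactly the values 1.5, 3.5 and 5.5, placed according to each marker's original rank within its array.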

== Clustering Gene Expression Data== <!-- ken -->
*Hierarchical Clustering
* Self Organizing Map (SOM)

== Differential Expression ==
In this tutorial, you will:

* Get acquainted with the T Test and Multi T Test
* Apply a T Test and Multi T Test

Before you can continue, geWorkbench must be running. For help with installation, please refer to the Getting Started (INSERT LINK) section. Load the tutorial file cardiomyopathy.exp. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===T Test===

T Test analysis identifies markers with statistically significant differential expression between sets of microarrays. For each marker, the t-test determines whether there is a significant difference between the two groups (case and control). To perform this analysis, you classify the panels, set the analysis parameters, and view the results in the visualization components. The online help describes the T Test parameters in detail.

'''Classify the Panels'''

1. Mark the '''Cardio''' phenotype as 'Case'. By default, panels are marked as control. Panels classified as Case are shown with a red thumbtack icon.
* Right-click on '''Cardio''' phenotype.
* Select '''Classification'''>'''Case'''.
2. Activate the arrays '''Normal''' and '''Cardio''' by selecting the checkboxes next to the panel name.

[[Image:T_PhenotypeSettingCase.png]]

'''Set Analysis Parameters'''
# From the Analysis Panel, select '''T-Test Analysis'''.
# Set the parameter values listed below and click '''Analyze'''.

* Alpha-corrections tab: Just Alpha.
* P-Value Parameters tab: p-values based on t-distribution. Note that the default alpha (critical p-value) is set to 0.01.
* Degree of Freedom tab: Welch approximation - unequal group variances.

[[Image:Ttest.gif]] 
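
The "Welch approximation - unequal group variances" option corresponds to the standard Welch t-test. As a rough sketch of what is computed for each marker, the function below returns the Welch t statistic and the Welch-Satterthwaite degrees of freedom for one marker's case and control values. The function name and the sample values are hypothetical, and this is not geWorkbench's code; converting the statistic into a p-value requires the t-distribution from a statistics library and is left out here.

```python
import math
import statistics


def welch_t(case, control):
    """Return (t, df) for Welch's unequal-variance t-test on two groups."""
    n1, n2 = len(case), len(control)
    m1, m2 = statistics.mean(case), statistics.mean(control)
    v1, v2 = statistics.variance(case), statistics.variance(control)
    se2 = v1 / n1 + v2 / n2  # squared standard error of the mean difference
    t = (m1 - m2) / math.sqrt(se2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df


# Expression values of a single marker in each group (made-up numbers).
case = [10.0, 12.0, 14.0]
control = [5.0, 6.0, 7.0]
t, df = welch_t(case, control)
```

A marker would be reported as significant when the two-sided p-value for `t` with `df` degrees of freedom falls below the chosen alpha (0.01 by default in this tutorial).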

'''T-Test Results'''

{|style="border: 1px solid lightGray"
|-
| Markers that met the significance test are included in a new gene panel called “Significant Genes”. || [[Image:E_ttestgpanel.png]]
|-
| An ancillary dataset is created in the project window. || [[Image:Ed_ttestproj.png]]
|}

The values of the T-Test can be seen in the Color Mosaic panel and the Volcano Plot.

{|style="border: 1px solid lightGray"
!VOLCANO PLOT!!COLOR MOSAIC
|-
| [[Image:Vplot.png]] || [[Image:Ed_cm.png]]
|-
| Clicking on any spot highlights the selected marker in the Marker Panel. ''(Insert another description)'' ||
* The label to the right displays the significance value (the lower the value, the more likely the expression differs) and the gene name for each displayed gene. Genes are displayed in ascending order of significance value.
* Gene height and width values can be altered to modify the display.
* The intensity slider modifies the intensity of the color coding.
* Accession: Includes the accession number in the label.
* Printer Icon: Prints the displayed image.
* Display: Must be toggled on to display data.
* ''Pat, Abs, Ratio Overlapping Pages Icon: Not the T Test display.''
|}

=== Multi T Test===
'''(IN PROGRESS)'''

== Regulatory Network ==
In this tutorial, you will:

* Create a gene network in Reverse Engineering
* View the network in Cytoscape
* Create a Marker panel from the Cytoscape network

Before you can continue, geWorkbench must be running. This tutorial uses the file [[Webmatrix.EXP]]. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Reverse Engineering===

The Reverse Engineering component analyzes a large amount of microarray data to reverse engineer the underlying gene regulatory network. The details of the algorithm are described in "ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context", Califano et al. (http://arxiv.org/abs/q-bio.MN/0410037).

'''Create a Network'''
{|style="border: 1px solid lightGray"
|-
|1. In the Gene Panel, enter '''1973_s_at''' in the '''Find Next''' text box. The list box scrolls to '''1973_s_at'''; click on that gene.||[[Image: E_panel.PNG]]
|-
|2. Go to the Reverse Engineering component in the View pane. You should be on the first tab, “Profiler”, with “Basic” selected. The marker '''1973_s_at''' should be listed in the Hub Gene box.||[[Image:E_RE1.PNG]]
|-
|3. Click '''Analyze 2D'''. A list of markers, sorted by interaction strength, is returned below the '''Search Box'''. ||
|-
|4. Select all the genes with a score >= 5: click on the first marker, '''[12.53] 37724_at''', then use Shift+Down Arrow to extend the selection through the values >= 5.||[[Image: E_re2.png]]
|-
|5. Click '''Create Network'''. The resulting network is displayed in the Cytoscape view.||
|}

'''View the Network in Cytoscape'''

A full description of the capabilities and functionality of Cytoscape can be found at http://www.cytoscape.org/.

# In the Cytoscape Layout menu, select '''yFiles/ Organic''' to modify the network display.
# In the network image, select the central marker. It should turn yellow. [[Image:E_network.png]]
# In the Select menu, select Nodes>First neighbors of selected nodes>Shape>Triangle. The first neighbors change from squares to triangles.
# Ctrl+mouse select the network. The selected nodes turn yellow. The selected genes are returned to the Gene Panel under the name Selected Genes[Cytoscape].

== Enrichment Analysis==

In this tutorial, you will:

* Use the Gene Ontology component

Before you can continue, geWorkbench must be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

'''Gene Ontology'''

The Gene Ontology component categorizes genes by the functions their products perform, their cellular localization, or their involvement in high-level biological processes. The category definitions are provided by the Gene Ontology Consortium (http://www.geneontology.org ); geWorkbench uses them to generate a tree structure that approximates the Directed Acyclic Graph structure used by the Consortium.

1. Create a Marker panel with the following markers:
* AFFX-hum_alu_at
* AFFX-LysX-M_at
* AFFX-LysX-5_at
* AFFX-LysX-3_at
For assistance in creating a marker panel, please refer to Working with Markers and Phenotype Tutorial (INSERT LINK).

2. Click '''Map List'''. While the application builds a tree for the selected ontology, a pop-up window displays the progress. The mapping is based on the GO annotations that Affymetrix provides in its annotation files.[[Image:E_map.png]]

3. Each node in the mapped tree displays two numbers separated by a forward slash, such as '''cell(1/6058)'''.
*The first number is the number of genes from the active marker panel that map to that category or any of its child nodes.
*The number after the slash is the corresponding count for the reference list (or for the entire chip, depending on the reference list checkbox and on whether a reference list has been loaded from the file system).

[[Image:E_goon.png]]
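
The two numbers in a node label can be thought of as counts over a subtree of the ontology. The sketch below illustrates the idea on a tiny, made-up tree; the category names, gene identifiers, dictionaries and the function name `count_mapped` are all hypothetical, and counting each gene only once per subtree is an assumption rather than documented geWorkbench behaviour.

```python
def count_mapped(node, children, annotations, genes):
    """Count the genes in `genes` annotated to `node` or any descendant."""
    def collect(n):
        found = set(annotations.get(n, ())) & genes
        for child in children.get(n, ()):
            found |= collect(child)
        return found
    return len(collect(node))


# Hypothetical ontology fragment: category -> child categories.
children = {"cell": ["membrane", "nucleus"]}
# Hypothetical annotations: category -> genes mapped directly to it.
annotations = {
    "cell": ["g1"],
    "membrane": ["g2", "g3"],
    "nucleus": ["g3", "g4"],
}
panel = {"g3"}                        # genes in the active marker panel
reference = {"g1", "g2", "g3", "g4"}  # reference list (or the entire chip)

label = "cell({}/{})".format(
    count_mapped("cell", children, annotations, panel),
    count_mapped("cell", children, annotations, reference),
)
```

With this data the label for the root category is `cell(1/4)`: one panel gene and four reference genes map to "cell" or one of its children.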

{|style="border: 1px solid lightGray"
!TABLE VIEW!!P VALUE TREND
|-
|5. ||[[Image:E_pvalue.png]]
|}

== Sequence Analysis== <!-- ken -->
* Sequence Retrieval
* Sequence Homology Analysis
** Blast
** Other

==Pattern Discovery== <!-- ken -->
* Position Histogram

== Promoter Analysis==

==Integrated Annotation Information==
In this tutorial, you will:

* rrrrr
* rrrrr

Before you can continue, geWorkbench must be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

Perform the following steps to view annotations of a dataset (''not recommended for large sets of genes''):

1. In the Marker Annotations component in the View pane, click '''Use Panels'''.

A table of gene names and links to pathway diagrams is returned if available from the NCI CGAP database.

2. Clicking on a gene name opens its annotations from the NCI CGAP database in your default web browser.

3. Clicking on a pathway diagram name displays the pathway diagram in the caBIO Pathways panel in the View pane.
In the caBIO pathways viewer, pathway gene symbols can in turn be clicked to display their gene information in the default web browser.

User:Daly

2006-02-24T18:49:14Z

Daly:

= Overview=

geWorkbench is an open-source bioinformatics platform that offers a comprehensive and extendible collection of tools for the management, analysis, visualization and annotation of biomedical data. It aims to provide researchers with a centralized repository for the analysis and visualization of gene expression data, gene sequences and protein sequences.

'''What can you do with geWorkbench?'''

* Use one program that integrates with multiple existing bioinformatics modules for analysis and visualization.

* Access remote servers and clusters to run computationally intensive calculations more quickly.

* Streamline data analysis with:
** flexible import options that support merging files from various sources ''(insert possible data source ie RMA express).''
** support for a variety of genomic data including microarrays, sequences, pathways, networks, alignments and phenotypes.
** access analyses with biological annotations from the National Cancer Institute.

* Community: Insert developer benefit ( plugin)

The diagram below illustrates how researchers use geWorkbench.

[[Image:slide1.gif]]

For detailed system documentation, please see the Documentation section (INSERT LINK).

= Tutorial=

===Welcome! ===

Welcome to the geWorkbench tutorial. In the following tutorials you will learn how to use geWorkbench. This is an example-based, illustrated guide for both beginners and experienced users. This tutorial was last updated in '''DATE''' and reflects changes to geWorkbench through '''DATE'''.

===Using the Tutorial ===

These tutorials assume that you have successfully completed the installation instructions. If you haven't installed the entire program, please go to [[Download]] and follow the installation instructions.

The tutorials are organized into a number of separate topics. Sample data sets are used throughout the tutorials. Each tutorial is self-contained and does not depend on any other tutorial for sample data. Sample data is located '''''(INSERT LINK or Instructions on how to download)'''''

'''(we should have sample data labeled Tutorial #)'''

Within the tutorials there are two basic types of text:
*Text that looks like this explains topics.

# Text in numbered steps gives instructions for you to follow using the tutorial data files.

''' insert instruction on how to navigate between tutorials'''.

==Getting Started== <!-- MV -->
* Starting the application
* GUI elements
** Panels
** Navigation

== Loading Data== <!-- MV -->
* Data formats

== Working with Marker and Phenotype Panels==
In this tutorial, you will:
* Become familiar with the use of panels in geWorkbench
* Create active phenotype panels
* Classify the panels you created

Before you can continue, geWorkbench must be running. For help with this, please refer to the '''''Getting Started''''' (insertlink) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Creating Panels===

When working with microarrays, geWorkbench uses the term '''''marker''''' to refer to a gene probe (in other cases, a marker can be an individual item from another kind of data set, such as a sequence).
'''''Phenotype''''' refers to any user-defined grouping of microarrays. These microarrays will often share some common property that in most cases is phenotypic, although this is not a requirement. For example, one such “phenotype” might be a single experiment on a tumor tissue sample, with a second “phenotype” defined as a collection of experiments performed on normal tissue samples.

====Assign Panel====

1. In the Phenotype panel, select the following arrays, which contain samples from the congestive cardiomyopathy disease state:
JB-ccmp_0120.txt, JB-ccmp_0218.txt, JB-ccmp0718.txt, JB-ccmp0811.txt, JB-ccmp1003.txt, JB-ccmp_1109.tx

2. Right-click, select '''Add to Panel'''.

3. Enter "Cardio" in the input box and click OK.

[[Image:T_PanelLabelCardio.png]]

4. Next, label the following arrays as "Normal" in the same way (''repeat steps 2 & 3'').
JB-n_0106.txt, JB-n_0821.txt, JB-n_0915.txt, JB-n_1303.txt

5. Select the checkboxes next to the panel names to mark these groups of arrays as "Active". Various analysis and visualization components can be set to use or display only activated arrays or markers.

[[Image:T_PhenotypesPriorToCase.png]]

====Classify panel====

For statistical tests such as the t-test, Case and Control groups can be specified.

# Left-click on the thumb-tack icon in front of the phenotype name.
# Select Case to specify the disease arrays as the "Case". The remaining "Normal" arrays are by default labeled control.
[[Image:T_PhenotypeSettingCase.png]]

A red thumbtack indicates the arrays have been specified as "Case".

[[Image:T_PhenotypeCaseSet.png]]

== Visualize Gene Expression==
In this tutorial, you will:

* Get acquainted with the various geWorkbench visualization tools
* View a dataset in geWorkbench
* Modify the visualization preference settings

Before you can continue, geWorkbench must be running. For help with this, please refer to the Getting Started (insertlink) section. This tutorial uses the file '''''cardiomyopathy.exp''''' from the Load Data (insertlink) tutorial.

===Visualization Tools===

Visualization tools provide a view of the chip(s) under investigation and can be used to assess the quality of the data. Active gene and phenotype panels (insertlink) restrict the number of markers/arrays displayed. The images created can be saved and exported. The online help describes in detail how to manipulate the visualization components.

{|style="border: 1px solid lightGray"
|-
|'''Microarray View''': Used to inspect each microarray separately, using the Array scroll bar.||[[Image: Ema.png]]
|-
|'''Tabular Microarray Panel''': Presents the numerical values of the expression measurements in table format, with one row per marker/probe and one column per microarray.||[[Image:Etab.png]]
|-
|'''Color Mosaic''': Heat maps of microarray expression data, organized by phenotypic or gene groupings.||[[Image:Ed_cm.png]]
|-
|'''Expression Profiles''': A line graph of gene expression profiles across several arrays/hybridizations. Each marker is drawn as a separately colored line.||[[Image:Eep.png]]
|-
|'''Scatter Plot''': A pairwise comparison and plot of expression values (''array vs. array and marker vs. marker'').||[[Image:Esp.png]]
|}

'''View a dataset'''
# Select '''''cardiomyopathy.exp''''' in the Project Panel.
# Select the '''''Microarray Panel''''' visualization component in the View Area at the top right of the interface.
# Deselect the All Markers checkbox to display the entire dataset.

Note: The '''''All Arrays''''' and '''''All Markers''''' checkboxes determine which data points are included in the display.
If neither is checked, the entire data set is shown. The All Arrays control is useful when working with data sets comprising multiple arrays. In this case, only those arrays that are included in a currently activated phenotype panel are displayed.

[[Image: Allm.png]]

=== Preferences===
The Preferences selection in the Tools menu lets you specify how certain aspects of the system behave. Once set, preferences are applied at once and persist between application sessions.

'''Modifying Settings'''

1. From the main menu, click on '''Tools>Preferences'''.

2. In the Preferences pop-up window, you can define settings for:
*Text Editor: The editor selected will be used to open and inspect data sets loaded in a project. Notepad is the default setting.
* Visualization: The color scheme to be applied to color mosaic images.
**Absolute: (default) Let M = max{|min|, |max|} over all expression measurements, across all arrays. If expression value x > 0, assign it the red spectrum x / M * 256. If expression value x is negative, assign it to the green spectrum -x / M * 256.
** Relative: This is similar to the setting for Absolute, but each marker is mean-variance normalized first.
* Genepix Value Computation: You can specify how to compute the value displayed for Genepix arrays. The default setting is (Mean F635 - Mean B635) / (Mean F532 - Mean B532).

Select '''Relative''' for the visualization preference.

3. Click on '''OK'''.

[[Image: Preferences.gif]]

== Filter and Normalize Data==
In this tutorial, you will:
* Get acquainted with the various filters and normalizers available in geWorkbench
* Apply a filter and normalizer on a tutorial dataset

Before you can continue, geWorkbench must be running. For help with installation, please refer to the '''''Getting Started''''' (INSERT LINK) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Filter===

Filtering can be used to screen out missing data points, remove low quality data or reduce the size of the dataset by removing less interesting data. Available geWorkbench filters are as follows:

{|style="border: 1px solid lightGray"
!Filter!!Description
|-
|'''Affy Detection Call'''||Applicable to Affymetrix data only. Sets as missing all measurements whose detection status is any user-defined combination of A, P or M.
|-
|'''Missing values''' ||Discards all markers that have “missing” measurements in at least n microarrays, where n is defined by the user. Another filter must be applied first, however, to generate the missing values on which this filter operates.
|-
|'''Deviation''' ||Sets as missing all markers whose measurements deviate by less than a given value across all microarrays.
|-
|'''Expression Threshold''' ||Sets as missing all markers whose measurements are inside (or outside) a user-defined range.
|-
|'''2 Channel''' ||Applicable to 2-channel array (Genepix) data only. Defines applicable ranges for each channel, and sets as missing all values for which either channel intensity is inside (or outside) the defined range.
|}

[[Image:Filterpanel.gif]]

Perform the following steps to filter out data called absent in an Affymetrix file:
# In the Filtering Panel, select ''Affy Detection Call Filter''.
# Select the ‘A’ (Absent) checkbox and click '''Filter'''. Values that were removed (marked as missing) are highlighted in yellow.
# In the Filtering Panel, select '''Missing Values Filter'''.
# Choose the maximum number of arrays that can have missing values before a marker is removed (the default is 0).
# Click '''Filter'''. Markers with more than 0 missing values are removed, and the yellow-highlighted values disappear.

{|style="border: 1px solid lightGray"
!'''Affy Detection Call Filter'''!!'''Missing Values Filter'''
|-
| [[Image: Filtered.gif]]||[[Image:Mvfilter.gif]]
|}

===Normalize===
Normalization can be used to decrease the effects of systematic differences across a set of experiments.
In geWorkbench, normalization replaces the original values with new ones. The available geWorkbench normalization methods are as follows:

{|style="border: 1px solid lightGray"
!Normalizer!!Description
|-
|Missing value calculation ||Replaces every missing value with either the mean value of that marker across all microarrays or the mean measurement of all markers in the microarray where the missing value is observed.
|-
|Log2 Transformation ||Applies a log2 transformation to all measurements in a microarray.
|-
|Threshold Normalizer ||Raises every data point below a user-specified minimum to that minimum, and reduces every data point above a user-specified maximum to that maximum.
|-
|Marker-based Centering ||Subtracts the mean (or median) measurement of a marker profile from every measurement in the profile.
|-
|Array-based centering ||Subtracts the mean (or median) measurement of a microarray from every measurement in that microarray.
|-
|Mean-variance normalizer ||For every marker profile, subtracts the mean measurement of the entire profile from each measurement in the profile and divides the result by the standard deviation.
|}
[[Image:Normalpanel.gif]]

'''Apply Quantile Normalizer'''

1. In the Normalization Panel, select ''Quantile Normalizer''.

2. Leave the averaging method at its default, ''Mean Profile Marker'', which determines how missing values are handled.

3. Click '''Normalize'''. The View Area is updated to reflect the normalization (after the screen has been refreshed).
Note: The first value in the second row was updated from ''41,394.6'' to ''55,779.26''.

{|style="border: 1px solid lightGray"
!PRENORMALIZATION!!NORMALIZED
|-
| [[Image:Prenormalizer_ed.gif ]] || [[Image: Postnormalizera.gif ]]
|}

== Clustering Gene Expression Data== <!-- ken -->
*Hierarchical Clustering
* Self Organizing Map (SOM)

== Differential Expression ==
In this tutorial, you will:

* Get acquainted with the T Test and Multi T Test
* Apply a T Test and Multi T Test

Before you continue, geWorkbench should be running. For help with installation, please refer to the Getting Started (INSERT LINK) section. Load the tutorial file cardiomyopathy.exp. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===T Test===

T Test analysis identifies markers with statistically significant differential expression between sets of microarrays. For each marker, the t-test determines whether there is a significant difference between the two groups (case and control). To perform this analysis, you must classify the panels, set the analysis parameters, and view the results in the visualization components. A detailed description of the T Test parameters is available in the online help.

'''Classify the Panels'''

1. Mark the '''Cardio''' phenotype as 'Case'. By default, panels are marked as Control. Panels classified as Case are shown with a red thumbtack icon.
* Right-click on the '''Cardio''' phenotype.
* Select '''Classification'''>'''Case'''.
2. Activate the '''Normal''' and '''Cardio''' panels by selecting the checkboxes next to the panel names.

[[Image:T_PhenotypeSettingCase.png]]

'''Set Analysis Parameters'''
# From the Analysis Panel, select '''T-Test Analysis'''.
# Set the parameter values listed below and click '''Analyze'''.

* Alpha-corrections tab: Just Alpha.
* P-Value Parameters tab: p-values based on t-distribution. Note that the default alpha (critical p-value) is set to 0.01.
* Degree of Freedom tab: Welch approximation - unequal group variances.

[[Image:Ttest.gif]] 
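
To make the "Welch approximation - unequal group variances" option more concrete, the following minimal Python sketch computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom for a single marker. It is the textbook formula, not geWorkbench's own code; the function name and the sample values are hypothetical. Converting t and the degrees of freedom into a p-value requires the t-distribution and is left out here.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(case, control):
    """Return (t, df) for Welch's unequal-variance t-test on one marker.

    Each group needs at least two values, and at least one group must
    have non-zero variance (otherwise the standard error is zero).
    """
    n1, n2 = len(case), len(control)
    v1, v2 = variance(case), variance(control)  # sample variances (n - 1)
    se2 = v1 / n1 + v2 / n2                      # squared standard error
    t = (mean(case) - mean(control)) / sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

# Made-up expression values for one marker in the two groups.
case = [1.0, 2.0, 3.0, 4.0, 5.0]
control = [2.0, 4.0, 6.0, 8.0, 10.0]
t, df = welch_t(case, control)
print(round(t, 4), round(df, 4))  # -1.8974 5.8824
```

A marker is reported as significant when the p-value derived from t and df falls below the chosen alpha (0.01 by default in the panel above).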

'''T-Test Results'''

{|style="border: 1px solid lightGray"
|-
| Markers that met the significance test are included in a new gene panel called “Significant Genes”. || [[Image:E_ttestgpanel.png]]
|-
| An ancillary dataset is created in the project window. || [[Image:Ed_ttestproj.png]]
|}

The values of the T-Test can be seen in the Color Mosaic panel and the Volcano Plot.

{|style="border: 1px solid lightGray"
!VOLCANO PLOT||COLOR MOSAIC
|-
| [[Image:Vplot.png]] || [[Image:Ed_cm.png]]
|-
|
* Clicking on any of the spots highlights the selected marker in the Marker Panel.
* Insert another description
|
* The label to the right displays the Significance value (the lower the value, the more likely the expression differs) and the gene name for the displayed genes. The genes are displayed in ascending order by Significance value.
* Gene height and width values can be altered to modify the display.
* The intensity slider is used to modify the intensity of the color coding.
* Accession: Includes the accession number in the label.
* Printer Icon: Prints the displayed image.
* Display: Must be toggled on to display data.
* ''Pat, Abs, Ratio Overlapping Pages Icon: Not the T Test display.''
|}

=== Multi T Test=== '''(IN PROGRESS)'''

== Regulatory Network ==
In this tutorial, you will:

* Create a gene network in Reverse Engineering
* View the network in Cytoscape
* Create a Marker panel from the Cytoscape network

Before you continue, geWorkbench should be running. The file [[Webmatrix.EXP]] is used in this tutorial. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Reverse Engineering===

The Reverse Engineering component is used to analyze a large amount of microarray data in order to reverse engineer the underlying gene regulatory network. The details of this algorithm are described in "ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context", Califano et al. (http://arxiv.org/abs/q-bio.MN/0410037).

'''Create a Network'''
{|style="border: 1px solid lightGray"
|-
|1. In the Gene Panel, enter '''1973_s_at''' in the '''Find Next''' text box. The list box navigates to '''1973_s_at'''; click on that gene.||[[Image: E_panel.PNG]]
|-
|2. Go to the Reverse Engineering component in the View pane. You should be in the first tab, “Profiler”, and in “Basic”. The marker '''1973_s_at''' should be listed in the Hub Gene box.||[[Image:E_RE1.PNG]]
|-
|3. Click on '''Analyze 2D'''. A list of markers, sorted by interaction strength, is returned below the '''Search Box'''.||
|-
|4. Select all the genes with score >= 5 by clicking on the first marker, '''[12.53] 37724_at''', and then using Shift+Down Arrow to extend the selection over all values >= 5.||[[Image: E_re2.png]]
|-
|5. Click '''Create Network'''. The resulting network is displayed in the Cytoscape view.||
|}

'''View the Network in Cytoscape'''

A full description of the capabilities and functionality of Cytoscape can be found at http://www.cytoscape.org/.

# In the Cytoscape Layout menu, select '''yFiles/ Organic''' to modify the network display.
# In the network image, select the central marker. It should turn yellow. [[Image:E_network.png]]
# In the Select menu, select Nodes>First neighbors of selected nodes>Shape>Triangle. The first neighbors are updated from a square shape to a triangle.
# Ctrl+mouse select the network. The selected nodes are yellow. The selected genes are returned to the Gene Panel under the name Selected Genes[Cytoscape].

== Enrichment Analysis==

In this tutorial, you will:

* Gene Ontology

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

'''Gene Ontology'''

The Gene Ontology component categorizes genes in terms of the functions their products perform, their cellular localization, or their involvement in high-level biological processes. The category definitions are provided by the Gene Ontology Consortium (http://www.geneontology.org), which we use to generate a tree structure that approximates the Directed Acyclic Graph structure used by the Consortium.

1. Create a Marker panel with the following markers:
* AFFX-hum_alu_at
* AFFX-LysX-M_at
* AFFX-LysX-5_at
* AFFX-LysX-3_at
For assistance in creating a marker panel, please refer to Working with Markers and Phenotype Tutorial (INSERT LINK).

2. Click '''Map List'''. While the application creates a tree for the selected ontology, a pop-up window displays the progress. The mapping is based on the GO annotations provided by Affymetrix in their annotation files. [[Image:E_map.png]]

3. The resulting nodes in the mapped tree display two numbers separated by a forward slash, such as '''cell(1/6058)'''.
*The first number is the number of genes from the active marker panel that map to a specific category or to any of its child nodes.
*The second number, after the slash, is the same count for the reference list (or for the entire chip, depending on the reference list check-box selection and on whether a reference list has been loaded from the file system).

[[Image:E_goon.png]]

[[Image:E_pvalue.png]]

== Sequence Analysis== ken
* Sequence Retrieval
* Sequence Homology Analysis
** Blast
** Other

==Pattern Discovery== ken
* Position Histogram

== Promoter Analysis==

==Integrated Annotation Information==
In this tutorial, you will:

* rrrrr
* rrrrr

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

Perform the following steps to view annotations of a dataset ('''''Not recommended for large sets of genes'''''):

1. In the Marker Annotations in the View pane, click '''Use Panels.'''

A table of gene names and links to pathway diagrams is returned if available from the NCI CGAP database.

2. Clicking on gene names returns annotations from the NCI CGAP database in your default web browser window.

3. Clicking on the pathway diagram name displays the pathway diagram in the caBIO Pathways panel in the View pane.
In the caBIO Pathways viewer, the gene symbols in the pathway can in turn be clicked to display their gene information in the default web browser window.

File:E pvalue.png

2006-02-24T18:45:48Z

Daly:

User:Daly

2006-02-24T18:38:23Z

Daly: /* Enrichment Analysis */

= Overview=

geWorkbench is an open-source bioinformatics platform that offers a comprehensive and extendible collection of tools for the management, analysis, visualization and annotation of biomedical data. This tool is aimed at providing researchers a centralized repository for the data analysis and visualization of gene expression data, gene sequences and protein sequences.

'''What can you do with geWorkbench?'''

* Use one program that integrates with multiple existing bioinformatics modules for analysis and visualization.

* Access remote servers and clusters to run computationally intensive calculations more quickly.

* Streamline data analysis with:
** flexible import options that support merging files from various sources ''(insert possible data source ie RMA express).''
** support for a variety of genomic data including microarrays, sequences, pathways, networks, alignments and phenotypes.
** access analyses with biological annotations from the National Cancer Institute.

* Community: Insert developer benefit ( plugin)

The diagram illustrates the use of geWorkbench by Researchers.

[[Image:slide1.gif]]

For detailed system documentation, please see the Documentation section (INSERT LINK).

= Tutorial=

===Welcome! ===

Welcome to the geWorkbench tutorial. In the following tutorials you will learn how to use geWorkbench. This is an example-based, illustrated guide for both beginners and experienced users. This tutorial was last updated in '''DATE''' and reflects changes to geWorkbench through '''DATE'''.

===Using the Tutorial ===

These tutorials assume that you have successfully completed the installation instructions. If you haven't installed the entire program, please go to [[Download]] and follow the installation instructions.

The tutorials are organized into a number of separate topics. Sample data sets are used throughout the tutorials. Each tutorial is self-contained and does not depend on any other tutorial for sample data. Sample data is located '''''(INSERT LINK or Instructions on how to download)'''''

'''(we should have sample data labeled Tutorial #)'''

Within the tutorials there are two basic types of text:
*Text that looks like this explains topics. 

# Text in numbered steps is instructions for you to follow using tutorial data files.

''' insert instruction on how to navigate between tutorials'''.

==Getting Started== MV
* Starting the application
* GUI elements
** Panels
** Navigation
== Loading Data== MV
* Data formats

== Working with Marker and Phenotype Panels==
In this tutorial, you will:
* Become familiar with the use of panels in geWorkbench
* Create active phenotype panels
* Classify the panels you created

Before you continue, geWorkbench should be running. For help with this, please refer to the '''''Getting Started''''' (insertlink) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Creating Panels===

When working with microarrays, geWorkbench uses the term '''''marker''''' to refer to a gene probe (in other cases, it can refer to individual items from other data sets, such as sequences).
'''''Phenotype''''' refers to any user-defined grouping of microarrays. These microarrays will often share some common property that in most cases is phenotypic, although this is not a requirement. For example, one such “phenotype” might be a single experiment on a tumor tissue sample, with a second “phenotype” defined as a collection of experiments performed on normal tissue samples.

====Assign Panel====

1. In the Phenotype panel, select the following arrays, which contain samples from the congestive cardiomyopathy disease state:
JB-ccmp_0120.txt, JB-ccmp_0218.txt, JB-ccmp0718.txt, JB-ccmp0811.txt, JB-ccmp1003.txt, JB-ccmp_1109.tx

2. Right-click, select '''Add to Panel'''.

3. Enter "Cardio" in the input box and click OK.

[[Image:T_PanelLabelCardio.png]]

4. Next, similarly label the following arrays as "Normal" (''repeat steps 2 & 3'').
JB-n_0106.txt , JB-n_0821.txt, JB-n_0915.txt, JB-n_1303.txt

5. Select the checkboxes next to the panel names to indicate that these groups of arrays are "Active". Various analysis and visualization components can be set to use or display only activated arrays or markers.

[[Image:T_PhenotypesPriorToCase.png]]

====Classify panel====

For statistical tests such as the t-test, Case and Control groups can be specified.

# Left-click on the thumb-tack icon in front of the phenotype name.
# Select Case to specify the disease arrays as the "Case". The remaining "Normal" arrays are by default labeled control.
[[Image:T_PhenotypeSettingCase.png]]

A red thumbtack indicates the arrays have been specified as "Case".

[[Image:T_PhenotypeCaseSet.png]]

== Visualize Gene Expression==
In this tutorial, you will:

* Get acquainted with the various geWorkbench visualization tools
* View a dataset in geWorkbench
* Modify the visualization preference settings

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file '''''cardiomyopathy.exp''''' used in this tutorial is from the Load Data (insertlink) tutorial.

===Visualization Tools===

Visualization tools provide a view of the chip(s) under investigation and can be used to assess the quality of the data. Active gene and phenotype panels (insertlink) restrict the number of markers/arrays displayed. The images created can be saved and exported. A detailed description of how to manipulate the visualization components is available in the online help.

{|style="border: 1px solid lightGray"
|-
|'''Microarray View''': Used to inspect each microarray separately using the Array scroll bar.||[[Image: Ema.png]]
|-
|'''Tabular Microarray Panel''': Presents the numerical values of the expression measurements in table format, with one row per marker/probe and one column per microarray.||[[Image:Etab.png]]
|-
|'''Color Mosaic''': Heat maps of microarray expression data, organized by phenotypic or gene groupings.||[[Image:Ed_cm.png]]
|-
|'''Expression Profiles''': A line graph of gene expression profiles across several arrays/hybridizations. Each marker is drawn as a separately colored line.||[[Image:Eep.png]]
|-
|'''Scatter Plot''': A pairwise (''array vs. array and marker vs. marker'') comparison and plot of expression values.||[[Image:Esp.png]]
|}

'''View a dataset'''
# Select '''''cardiomyopathy.exp''''' in the Project Panel.
# Select the '''''Microarray Panel''''' visualization component in the View Area at the top-right section of the interface.
# Deselect the All Markers checkbox to display the entire dataset.

Note: The '''''All Arrays''''' and '''''All Markers''''' checkboxes determine which data points are included in the display.
If neither is checked, the entire data set is shown. The All Arrays control is useful when working with data sets comprising multiple arrays. In this case, only those arrays that are included in a currently activated phenotype panel will be displayed.

[[Image: Allm.png]]

=== Preferences===
The Preferences selection in the Tools menu allows users to specify how certain aspects of the system behave. Once set, preferences are applied at once and persist between application sessions.

'''Modifying Settings'''

1. From the main menu, click on '''Tools>Preferences'''.

2. In the Preferences pop-up window, you can define settings for:
*Text Editor: The editor selected will be used to open and inspect data sets loaded in a project. Notepad is the default setting.
* Visualization: The color scheme to be applied to color mosaic images.
**Absolute: (default) Let M = max{|min|, |max|} over all expression measurements, across all arrays. If expression value x > 0, assign it the red spectrum x / M * 256. If expression value x is negative, assign it to the green spectrum -x / M * 256.
** Relative: This is similar to the setting for Absolute, but each marker is mean-variance normalized first.
* Genepix Value Computation: You can specify how to compute the value displayed for Genepix arrays. The default setting is Option (Mean F635 - Mean B635) / (Mean F532 - Mean B532).

Select '''Relative''' for the visualization preference.

3. Click on '''OK'''.

[[Image: Preferences.gif]]
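
The Absolute color scheme described above can be illustrated with a short Python sketch. It follows the rule as stated (M is the largest absolute expression value; positive values map to red and negative values to green, scaled by x / M * 256). The function name is hypothetical, and two details are assumptions that the text does not cover: the result is clamped to 255 so that it fits an 8-bit color channel, and a value of exactly zero is drawn as black.

```python
def absolute_color(x, m):
    """Map expression value x to an (R, G, B) tuple, given M = max(|min|, |max|)."""
    # Assumption: clamp to 255, because x / M * 256 reaches 256 when |x| == M.
    level = min(255, int(abs(x) / m * 256))
    if x > 0:
        return (level, 0, 0)   # red spectrum for positive values
    if x < 0:
        return (0, level, 0)   # green spectrum for negative values
    return (0, 0, 0)           # assumption: zero is drawn as black

# Made-up expression values across all arrays.
values = [-50.0, 0.0, 25.0, 100.0]
m = max(abs(min(values)), abs(max(values)))  # M = 100.0
for v in values:
    print(v, absolute_color(v, m))
```

For the Relative setting, each marker would first be mean-variance normalized and the same mapping would then be applied to the normalized values.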

== Filter and Normalize Data==
In this tutorial, you will:
* Get acquainted with the various filters and normalizers available in geWorkbench
* Apply a filter and normalizer on a tutorial dataset

Before you continue, geWorkbench should be running. For help with installation, please refer to the '''''Getting Started''''' (INSERT LINK) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Filter===

Filtering can be used to screen out missing data points, remove low quality data or reduce the size of the dataset by removing less interesting data. Available geWorkbench filters are as follows:

{|style="border: 1px solid lightGray"
!Filter||Description
|-
|'''Affy Detection Call'''||Applicable to Affymetrix data only. Sets as missing all measurements whose detection status is any user-defined combination of A, P or M.
|-
|'''Missing values''' ||Discards all markers that have “missing” measurements in at least n microarrays, where n is defined by the user. Another filter must be applied first, however, to generate the missing values on which this filter operates.
|-
|'''Deviation''' ||Sets as missing all markers whose measurements deviate below a given value across all microarrays.
|-
|'''Expression Threshold''' ||Sets as missing all markers whose measurements are inside (or outside) a user-defined range.
|-
|'''2 Channel''' ||Applicable to 2-channel array (Genepix) data only. Defines applicable ranges for each channel, and sets as missing all values for which either channel intensity is inside (or outside) the defined range.
|}

[[Image:Filterpanel.gif]]

Perform the following steps to filter out data called absent in an Affymetrix file:
# In the Filtering Panel, select ''Affy Detection Call Filter''.
# Select the ‘A’ (Absent) checkbox and click '''Filter'''. Values that were removed (marked as missing) are highlighted in yellow.
# In the Filtering Panel, select '''Missing Values Filter'''.
# Choose the maximum number of arrays that can have missing values before a marker is removed (the default is 0).
# Click '''Filter'''. Markers with more than 0 missing values are removed. You’ll notice that the yellow values are gone.

{|style="border: 1px solid lightGray"
!'''Affy Detection Call Filter'''||'''Missing Values Filter'''
|-
| [[Image: Filtered.gif]]||[[Image:Mvfilter.gif]]
|-
|}
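
The two-step procedure above (mark Absent calls as missing, then drop markers with too many missing values) can be sketched as follows. This is an illustrative Python outline, not the geWorkbench implementation: the function names, the marker names and the data layout (one row per marker, one column per array) are assumptions. It follows the tutorial steps, in which a marker is removed when it has more than the allowed number of missing values.

```python
MISSING = None  # assumption: a missing measurement is represented by None

def detection_call_filter(values, calls, remove=("A",)):
    """Mark as missing every measurement whose detection call is in `remove`."""
    return [
        [MISSING if call in remove else value for value, call in zip(vrow, crow)]
        for vrow, crow in zip(values, calls)
    ]

def missing_values_filter(markers, values, max_missing=0):
    """Keep only the markers with at most `max_missing` missing measurements."""
    return [
        (name, row)
        for name, row in zip(markers, values)
        if sum(v is MISSING for v in row) <= max_missing
    ]

# Three hypothetical markers measured on two arrays, with A/P/M detection calls.
markers = ["m1", "m2", "m3"]
values = [[10.0, 12.0], [5.0, 7.0], [3.0, 4.0]]
calls = [["P", "P"], ["A", "P"], ["M", "A"]]

filtered = detection_call_filter(values, calls)
print(filtered)                                   # Absent calls are now None
print(missing_values_filter(markers, filtered))   # only m1 survives with max_missing=0
```

With `max_missing=1`, all three markers in this example would be kept, because none of them has more than one missing value.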

===Normalize===
Normalization can be used to decrease the effects of systematic differences across a set of experiments.
In geWorkbench, normalization replaces the original values with the normalized values. The available normalization methods are as follows:

{|style="border: 1px solid lightGray"
!Normalizer||Description
|-
|Missing value calculation ||Replaces every missing value with either the mean value of that marker across all microarrays or the mean measurement of all markers in the microarray where the missing value is observed.
|-
|Log2 Transformation ||Applies a log2 transformation to all measurements in a microarray.
|-
|Threshold Normalizer ||All data points whose value is less than a user-specified minimum are raised to that minimum, and all data points whose value is greater than a user-specified maximum are reduced to that maximum.
|-
|Marker-based Centering ||Subtracts the mean (or median) measurement of a marker profile from every measurement in the profile.
|-
|Array-based Centering ||Subtracts the mean (or median) measurement of a microarray from every measurement in that microarray.
|-
|Mean-variance Normalizer ||For every marker profile, the mean measurement of the entire profile is subtracted from each measurement in the profile, and the resulting value is divided by the standard deviation of the profile.
|}
[[Image:Normalpanel.gif]]
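
As a rough illustration of three of the simpler normalizers in the table (threshold, log2 transformation and missing value calculation using the marker mean), here is a small Python sketch. The function names and the use of None for a missing value are assumptions made for the example; geWorkbench's own behavior at the edges (for example, non-positive values before a log2 transformation) is not described in this tutorial.

```python
from math import log2

def threshold(values, minimum=None, maximum=None):
    """Raise values below `minimum` to it and reduce values above `maximum` to it."""
    out = []
    for v in values:
        if minimum is not None and v < minimum:
            v = minimum
        if maximum is not None and v > maximum:
            v = maximum
        out.append(v)
    return out

def log2_transform(values):
    """Apply log2 to every measurement; all values must be positive."""
    return [log2(v) for v in values]

def fill_missing_with_marker_mean(profile):
    """Replace None entries with the mean of the observed values of the marker."""
    present = [v for v in profile if v is not None]
    marker_mean = sum(present) / len(present)
    return [marker_mean if v is None else v for v in profile]

print(threshold([5.0, 50.0, 5000.0], minimum=10.0, maximum=1000.0))  # [10.0, 50.0, 1000.0]
print(log2_transform([1.0, 8.0]))                                    # [0.0, 3.0]
print(fill_missing_with_marker_mean([2.0, None, 4.0]))               # [2.0, 3.0, 4.0]
```

Applying a minimum threshold before the log2 transformation is one common way to avoid taking the logarithm of zero or negative values.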

'''Apply Quantile Normalizer'''

1. In the Normalization Panel, select ''Quantile Normalizer''.

2. Leave the averaging method at its default, ''Mean Profile Marker'', which determines how missing values are handled.

3. Click '''Normalize'''. The View Area is updated to reflect the normalization (after the screen has been refreshed).
Note: The first value in the second row was updated from ''41,394.6'' to ''55,779.26''.

{|style="border: 1px solid lightGray"
!PRENORMALIZATION||NORMALIZED
|-
| [[Image:Prenormalizer_ed.gif ]] || [[Image: Postnormalizera.gif ]]
|}
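
For readers unfamiliar with quantile normalization, the sketch below shows the standard technique in Python: every array is sorted, the values at each rank are averaged across arrays, and each original value is replaced by the average for its rank. Whether geWorkbench's Quantile Normalizer matches this in every detail is an assumption; in particular, the sketch assumes arrays of equal length with no missing values and breaks ties by position.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of arrays (equal length, no missing values)."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Mean of the values found at each rank, taken across all arrays.
    rank_means = [sum(s[i] for s in sorted_arrays) / len(arrays) for i in range(n)]
    result = []
    for a in arrays:
        # Indices of this array's values, ordered from smallest to largest.
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        result.append(out)
    return result

# Two made-up arrays with three markers each.
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(arrays))  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization, every array contains the same set of values, so the arrays share an identical distribution and differ only in which marker holds which value.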

== Clustering Gene Expression Data== ken
*Hierarchical Clustering
* Self Organizing Map (SOM)

== Differential Expression ==
In this tutorial, you will:

* Get acquainted with the T Test and Multi T Test
* Apply a T Test and Multi T Test

Before you continue, geWorkbench should be running. For help with installation, please refer to the Getting Started (INSERT LINK) section. Load the tutorial file cardiomyopathy.exp. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===T Test===

T Test analysis identifies markers with statistically significant differential expression between sets of microarrays. For each marker, the t-test determines whether there is a significant difference between the two groups (case and control). To perform this analysis, you must classify the panels, set the analysis parameters, and view the results in the visualization components. A detailed description of the T Test parameters is available in the online help.

'''Classify the Panels'''

1. Mark the '''Cardio''' phenotype as 'Case'. By default, panels are marked as Control. Panels classified as Case are shown with a red thumbtack icon.
* Right-click on the '''Cardio''' phenotype.
* Select '''Classification'''>'''Case'''.
2. Activate the '''Normal''' and '''Cardio''' panels by selecting the checkboxes next to the panel names.

[[Image:T_PhenotypeSettingCase.png]]

'''Set Analysis Parameters'''
# From the Analysis Panel, select '''T-Test Analysis'''.
# Set the parameter values listed below and click '''Analyze'''.

* Alpha-corrections tab: Just Alpha.
* P-Value Parameters tab: p-values based on t-distribution. Note that the default alpha (critical p-value) is set to 0.01.
* Degree of Freedom tab: Welch approximation - unequal group variances.

[[Image:Ttest.gif]] 

'''T-Test Results'''

{|style="border: 1px solid lightGray"
|-
| Markers that met the significance test are included in a new gene panel called “Significant Genes”. || [[Image:E_ttestgpanel.png]]
|-
| An ancillary dataset is created in the project window. || [[Image:Ed_ttestproj.png]]
|}

The values of the T-Test can be seen in the Color Mosaic panel and the Volcano Plot.

{|style="border: 1px solid lightGray"
!VOLCANO PLOT||COLOR MOSAIC
|-
| [[Image:Vplot.png]] || [[Image:Ed_cm.png]]
|-
|
* Clicking on any of the spots highlights the selected marker in the Marker Panel.
* Insert another description
|
* The label to the right displays the Significance value (the lower the value, the more likely the expression differs) and the gene name for the displayed genes. The genes are displayed in ascending order by Significance value.
* Gene height and width values can be altered to modify the display.
* The intensity slider is used to modify the intensity of the color coding.
* Accession: Includes the accession number in the label.
* Printer Icon: Prints the displayed image.
* Display: Must be toggled on to display data.
* ''Pat, Abs, Ratio Overlapping Pages Icon: Not the T Test display.''
|}

=== Multi T Test=== '''(IN PROGRESS)'''

== Regulatory Network ==
In this tutorial, you will:

* Create a gene network in Reverse Engineering
* View the network in Cytoscape
* Create a Marker panel from the Cytoscape network

Before you continue, geWorkbench should be running. The file [[Webmatrix.EXP]] is used in this tutorial. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Reverse Engineering===

The Reverse Engineering component is used to analyze a large amount of microarray data in order to reverse engineer the underlying gene regulatory network. The details of this algorithm are described in "ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context", Califano et al. (http://arxiv.org/abs/q-bio.MN/0410037).

'''Create a Network'''
{|style="border: 1px solid lightGray"
|-
|1. In the Gene Panel, enter '''1973_s_at''' in the '''Find Next''' text box. The list box navigates to '''1973_s_at'''; click on that gene.||[[Image: E_panel.PNG]]
|-
|2. Go to the Reverse Engineering component in the View pane. You should be in the first tab, “Profiler”, and in “Basic”. The marker '''1973_s_at''' should be listed in the Hub Gene box.||[[Image:E_RE1.PNG]]
|-
|3. Click on '''Analyze 2D'''. A list of markers, sorted by interaction strength, is returned below the '''Search Box'''.||
|-
|4. Select all the genes with score >= 5 by clicking on the first marker, '''[12.53] 37724_at''', and then using Shift+Down Arrow to extend the selection over all values >= 5.||[[Image: E_re2.png]]
|-
|5. Click '''Create Network'''. The resulting network is displayed in the Cytoscape view.||
|}

'''View the Network in Cytoscape'''

A full description of the capabilities and functionality of Cytoscape can be found at http://www.cytoscape.org/.

# In the Cytoscape Layout menu, select '''yFiles/ Organic''' to modify the network display.
# In the network image, select the central marker. It should turn yellow. [[Image:E_network.png]]
# In the Select menu, select Nodes>First neighbors of selected nodes>Shape>Triangle. The first neighbors are updated from a square shape to a triangle.
# Ctrl+mouse select the network. The selected nodes are yellow. The selected genes are returned to the Gene Panel under the name Selected Genes[Cytoscape].

== Enrichment Analysis==

In this tutorial, you will:

* Gene Ontology

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

'''Gene Ontology'''

The Gene Ontology component categorizes genes in terms of the functions their products perform, their cellular localization, or their involvement in high-level biological processes. The category definitions are provided by the Gene Ontology Consortium (http://www.geneontology.org), which we use to generate a tree structure that approximates the Directed Acyclic Graph structure used by the Consortium.

1. Create a Marker panel with the following markers:
* AFFX-hum_alu_at
* AFFX-LysX-M_at
* AFFX-LysX-5_at
* AFFX-LysX-3_at
For assistance in creating a marker panel, please refer to Working with Markers and Phenotype Tutorial (INSERT LINK).

2. Click '''Map List'''. While the application creates a tree for the selected ontology, a pop-up window displays the progress. The mapping is based on the GO annotations provided by Affymetrix in their annotation files. [[Image:E_map.png]]

3. The resulting nodes in the mapped tree display two numbers separated by a forward slash, such as '''cell(1/6058)'''.
*The first number is the number of genes from the active marker panel that map to a specific category or to any of its child nodes.
*The second number, after the slash, is the same count for the reference list (or for the entire chip, depending on the reference list check-box selection and on whether a reference list has been loaded from the file system).

[[Image:E_goon.png]]
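
The two numbers shown on each tree node can be reproduced with a small counting routine. The Python sketch below is only an illustration of the idea (count the genes annotated to a category or any of its descendants, once for the active panel and once for the reference list); the tree, the gene names and the function names are all hypothetical and are not taken from geWorkbench or from the Affymetrix annotation files.

```python
def descendants(term, children):
    """Return the set containing `term` and all of its descendant categories."""
    seen = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(children.get(t, []))
    return seen

def node_label(term, children, annotations, panel, reference):
    """Build a label such as 'cell(1/3)' for one node of the mapped tree."""
    terms = descendants(term, children)
    # Genes annotated to the category itself or to any of its child nodes.
    mapped = {gene for gene, gene_terms in annotations.items() if terms & set(gene_terms)}
    return f"{term}({len(mapped & panel)}/{len(mapped & reference)})"

# Hypothetical category tree and gene annotations.
children = {"cell": ["nucleus", "membrane"]}
annotations = {
    "g1": ["nucleus"],
    "g2": ["membrane"],
    "g3": ["cell"],
    "g4": ["other"],
}
panel = {"g1"}                          # active marker panel
reference = {"g1", "g2", "g3", "g4"}    # reference list (or the entire chip)

print(node_label("cell", children, annotations, panel, reference))  # cell(1/3)
```

A gene annotated to several child categories is counted only once for the parent, because the counts are taken over a set of genes rather than a list of annotations.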

== Sequence Analysis== ken
* Sequence Retrieval
* Sequence Homology Analysis
** Blast
** Other

==Pattern Discovery== ken
* Position Histogram

== Promoter Analysis==

==Integrated Annotation Information==
In this tutorial, you will:

* rrrrr
* rrrrr

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

Perform the following steps to view annotations of a dataset ('''''Not recommended for large sets of genes'''''):

1. In the Marker Annotations in the View pane, click '''Use Panels.'''

A table of gene names and links to pathway diagrams is returned if available from the NCI CGAP database.

2. Clicking on gene names returns annotations from the NCI CGAP database in your default web browser window.

3. Clicking on the pathway diagram name displays the pathway diagram in the caBIO Pathways panel in the View pane.
In the caBIO Pathways viewer, the gene symbols in the pathway can in turn be clicked to display their gene information in the default web browser window.

User:Daly

2006-02-24T18:35:53Z

Daly: /* Enrichment Analysis */

= Overview=

geWorkbench is an open-source bioinformatics platform that offers a comprehensive and extendible collection of tools for the management, analysis, visualization and annotation of biomedical data. This tool is aimed at providing researchers a centralized repository for the data analysis and visualization of gene expression data, gene sequences and protein sequences.

'''What can you do with geWorkbench?'''

* Use one program that integrates with multiple existing bioinformatics modules for analysis and visualization.

* Access remote servers and clusters to run computationally intensive calculations more quickly.

* Streamline data analysis with:
** flexible import options that support merging files from various sources ''(insert possible data source ie RMA express).''
** support for a variety of genomic data including microarrays, sequences, pathways, networks, alignments and phenotypes.
** access analyses with biological annotations from the National Cancer Institute.

* Community: Insert developer benefit ( plugin)

The diagram illustrates the use of geWorkbench by Researchers.

[[Image:slide1.gif]]

For detailed system documentation, please see the Documentation section (INSERT LINK).

= Tutorial=

===Welcome! ===

Welcome to the geWorkbench tutorial. In the following tutorials you will learn how to use geWorkbench. This is an example-based, illustrated guide for both beginners and experienced users. This tutorial was last updated in '''DATE''' and reflects changes to geWorkbench through '''DATE'''.

===Using the Tutorial ===

These tutorials assume that you have successfully completed the installation instructions. If you haven't installed the entire program, please go to [[Download]] and follow the installation instructions.

The tutorials are organized into a number of separate topics. Sample data sets are used throughout the tutorials. Each tutorial is self-contained and does not depend on any other tutorial for sample data. Sample data is located '''''(INSERT LINK or Instructions on how to download)'''''

'''(we should have sample data labeled Tutorial #)'''

Within the tutorials there are two basic types of text:
*Text that looks like this explains topics. 

# Text in numbered steps is instructions for you to follow using tutorial data files.

''' insert instruction on how to navigate between tutorials'''.

==Getting Started== MV
* Starting the application
* GUI elements
** Panels
** Navigation
== Loading Data== MV
* Data formats

== Working with Marker and Phenotype Panels==
In this tutorial, you will:
* Become familiar with the use of panels in geWorkbench
* Create active phenotype panels
* Classify the panels you created

Before you continue, geWorkbench should be running. For help with this, please refer to the '''''Getting Started''''' (insertlink) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Creating Panels===

When working with microarrays, geWorkbench uses the term '''''marker''''' to refer to a gene probe (in other cases, it can refer to individual items from other data sets, such as sequences).
'''''Phenotype''''' refers to any user-defined grouping of microarrays. These microarrays will often share some common property that in most cases is phenotypic, although this is not a requirement. For example, one such “phenotype” might be a single experiment on a tumor tissue sample, with a second “phenotype” defined as a collection of experiments performed on normal tissue samples.

====Assign Panel====

1. In the Phenotype panel, select the following arrays, which contain samples from the congestive cardiomyopathy disease state:
JB-ccmp_0120.txt, JB-ccmp_0218.txt, JB-ccmp0718.txt, JB-ccmp0811.txt, JB-ccmp1003.txt, JB-ccmp_1109.tx

2. Right-click, select '''Add to Panel'''.

3. Enter "Cardio" in the input box and click OK.

[[Image:T_PanelLabelCardio.png]]

4. Next, similarly label the following arrays as "Normal" (''repeat steps 2 & 3'').
JB-n_0106.txt , JB-n_0821.txt, JB-n_0915.txt, JB-n_1303.txt

5. Select the checkboxes next to the panel names to indicate that these groups of arrays are "Active". Various analysis and visualization components can be set to use or display only activated arrays or markers.

[[Image:T_PhenotypesPriorToCase.png]]

====Classify panel====

For statistical tests such as the t-test, Case and Control groups can be specified.

# Left-click on the thumb-tack icon in front of the phenotype name.
# Select Case to specify the disease arrays as the "Case". The remaining "Normal" arrays are by default labeled control.
[[Image:T_PhenotypeSettingCase.png]]

A red thumbtack indicates the arrays have been specified as "Case".

[[Image:T_PhenotypeCaseSet.png]]

== Visualize Gene Expression==
In this tutorial, you will:

* Get acquainted with the various geWorkbench visualization tools
* View a dataset in geWorkbench
* Modify the visualization preference settings

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file '''''cardiomyopathy.exp''''' used in this tutorial is from the Load Data (insertlink) tutorial.

===Visualization Tools===

Visualization tools provide a view of the chip(s) under investigation and can be used for ascertaining the quality of the data. Active gene and phenotype panels (insertlink) restrict the number of markers/arrays displayed. The images created can be saved and exported. A detailed description of how to manipulate visualization components is available in the online help.

{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|'''Microarray View''': Used to inspect each separate microarray using the Array scroll bar.||[[Image: Ema.png]]
|-
|-
|'''Tabular Microarray Panel''': Presents the numerical values of the expression measurements in a table format. One row is created per individual marker/probe and one column per microarray.||[[Image:Etab.png]]
|-
|-
|-
|'''Color Mosaic''': Heat maps for microarray expression data, organized by phenotypic or gene groupings.||[[Image:Ed_cm.png]]
|-
|-
|-
|'''Expression Profiles''': A line graph of gene expression profiles across several arrays/hybridizations. Each marker is drawn as a separately colored line.||[[Image:Eep.png]]
|-
|-
|-
|'''Scatter Plot''': A pairwise (''array vs. array and marker vs. marker'') comparison and plotting of expression values.||[[Image:Esp.png]]
|-
|-
|-
|-
|}

'''View a dataset'''
# Select '''''cardiomyopathy.exp''''' in the Project Panel.
# Select the '''''Microarray Panel''''' visualization component in the View Area at the top-right section of the interface.
# Deselect the All Markers checkbox to display the entire dataset.

Note: The '''''All Arrays''''' and '''''All Markers''''' checkboxes determine which data points are included in the display.
If neither is checked, then the entire data set is shown. The All Arrays control is useful when working with data sets comprising multiple arrays. In this case, only those arrays that are included in a currently activated phenotype panel will be displayed.

[[Image: Allm.png]]

=== Preferences===
The Preferences selection in the Tools menu allows users to specify how certain aspects of the system will behave. Once set, preferences are applied at once and persist between application sessions.

'''Modifying Settings'''

1. From the main menu, click on '''Tools>Preferences'''.

2. In the Preferences pop-up window, you can define settings for:
*Text Editor: The editor selected will be used to open and inspect data sets loaded in a project. Notepad is the default setting.
* Visualization: The color scheme to be applied to color mosaic images.
**Absolute: (default) Let M = max{|min|, |max|} over all expression measurements, across all arrays. If expression value x > 0, assign it the red spectrum x / M * 256. If expression value x is negative, assign it to the green spectrum -x / M * 256.
** Relative: This is similar to the setting for Absolute, but each marker is mean-variance normalized first.
* Genepix Value Computation: You can specify how to compute the value displayed for a Genepix array. The default setting is Option (Mean F635 - Mean B635) / (Mean F532 - Mean B532).

Select '''Relative''' for the visualization preference.

3. Click on '''OK'''.

[[Image: Preferences.gif]]
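
The '''Absolute''' color mapping described above can be sketched in a few lines of Python. This is only an illustration of the stated formula, not geWorkbench's own code: the function name <code>absolute_color</code>, the RGB-triple output, and the clamp to 255 (so that x = M does not overflow an 8-bit channel) are assumptions made for the example.

```python
def absolute_color(x, all_values):
    """Map expression value x to a hypothetical (R, G, B) triple.

    M = max(|min|, |max|) over all expression measurements.
    Positive values go to the red channel, negative values to green.
    """
    m = max(abs(min(all_values)), abs(max(all_values)))
    if m == 0 or x == 0:
        return (0, 0, 0)
    # x / M * 256 as in the text; clamped to 255 (an assumption).
    level = min(255, int(abs(x) / m * 256))
    return (level, 0, 0) if x > 0 else (0, level, 0)


values = [-50, 0, 100]
print(absolute_color(100, values))   # strongest red
print(absolute_color(-50, values))   # mid-intensity green
```

Under the '''Relative''' setting, the same mapping would be applied after each marker has been mean-variance normalized.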

== Filter and Normalize Data==
In this tutorial, you will:
* Get acquainted with the various filters and normalizers available in geWorkbench
* Apply a filter and normalizer on a tutorial dataset

Before you continue, geWorkbench should be running. For help with installation, please refer to the '''''Getting Started''''' (INSERT LINK) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Filter===

Filtering can be used to screen out missing data points, remove low quality data or reduce the size of the dataset by removing less interesting data. Available geWorkbench filters are as follows:

{|style="border: 1px solid lightGray"
!Filter||Description||
|-
|-|-
|-

|'''Affy Detection Call'''||Applicable to Affymetrix data only. Sets all measurements whose detection status is any user-defined combination of A, P or M as missing.
|-

|'''Missing values''' ||Discards all markers that have “missing” measurements in at least n microarrays, where n is defined by the user. Another filter must first be applied however, in order to generate the missing values upon which this filter can operate.
|-
|-
|-

|'''Deviation''' ||Sets all markers whose measurements deviate below a given value across all microarrays as missing.
|-
|-
|-

|'''Expression Threshold''' ||Sets all markers whose measurements are inside (or outside) a user-defined range as missing.
|-
|-
|-

|'''2 Channel''' ||Applicable to 2-channel arrays (Genepix) data only. Defines applicable ranges for each channel, and sets all values for which either channel intensity is inside (or outside) the defined range as missing.
|-
|-
|-
|}

[[Image:Filterpanel.gif]]

Perform the following steps to filter out data called absent in an Affymetrix file:
# In the Filtering Panel, select ''Affy Detection Call Filter''.
# Select the ‘A’ (Absent) checkbox and click '''Filter'''. Values that were removed (marked as missing) are highlighted in yellow.
# In the Filtering Panel, select '''Missing Values Filter'''.
# Choose the maximum number of arrays that can have missing values before a marker is removed (the default is 0).
# Click '''Filter'''. Markers with more than 0 missing values are removed. You’ll notice the yellow values are gone.

{|style="border: 1px solid lightGray"
!'''Affy Detection Call Filter'''||'''Missing Values Filter'''||
|-
|-|-
|-
| [[Image: Filtered.gif]]||[[Image:Mvfilter.gif]]
|-
|}
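
The two-step filtering performed above can be sketched as follows. This is a minimal sketch, assuming a hypothetical in-memory layout (a dictionary from marker name to a list of <code>(value, detection_call)</code> pairs, one pair per array) and using <code>None</code> to stand for "missing"; the function names are illustrative and are not part of geWorkbench.

```python
MISSING = None  # assumed stand-in for a value marked as missing


def detection_call_filter(data, calls_to_mask):
    """Mark every measurement whose detection call is in calls_to_mask as missing."""
    return {
        marker: [MISSING if call in calls_to_mask else value
                 for value, call in row]
        for marker, row in data.items()
    }


def missing_values_filter(data, max_missing=0):
    """Drop markers with more than max_missing missing measurements."""
    return {
        marker: row
        for marker, row in data.items()
        if sum(v is MISSING for v in row) <= max_missing
    }


# Hypothetical marker names and values, for illustration only.
raw = {
    "marker_a": [(120.0, "P"), (95.0, "P")],
    "marker_b": [(15.0, "A"), (80.0, "P")],
}
masked = detection_call_filter(raw, {"A"})
kept = missing_values_filter(masked, max_missing=0)
print(sorted(kept))  # only the marker with no Absent calls remains
```

Note that the second filter has nothing to remove unless an earlier filter has first marked some values as missing, which is why the steps are applied in this order.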

===Normalize===
Normalization can be used to decrease the effects of systematic differences across a set of experiments.
In geWorkbench, normalization replaces the original values with new values. Available geWorkbench normalization methods are as follows:

{|style="border: 1px solid lightGray"
!Normalizer||Description||
|-
|-

|Missing value calculation ||Replaces every missing value with either the mean value of that marker across all microarrays or with the mean measurement of all markers in the microarray where the missing value is observed
|-
|-
|Log2 Transformation ||Applies a log2 transformation to all measurements in a microarray
|-
|-
|-
|Threshold Normalizer ||All data points whose value is less than (or greater than) a user-specified minimum (maximum) value are raised (reduced) to that minimum (maximum) value
|-
|-
|-
|-
|Marker-based Centering ||Subtracts the mean (median) measurement of a marker profile from every measurement in the profile
|-
|-
|-
|Array-based centering ||Subtracts the mean (median) measurement of a microarray from every measurement in that microarray
|-
|-
|-
|Mean-variance normalizer ||For every marker profile, the mean measurement of the entire profile is subtracted from each measurement in the profile and the resulting value is divided by the standard deviation
|-
|-
|}
[[Image:Normalpanel.gif]]
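
Two of the normalizers in the table, marker-based centering and the mean-variance normalizer, can be sketched with Python's standard <code>statistics</code> module. This is a sketch under stated assumptions: a marker profile is taken to be a plain list of numbers, the sample standard deviation is used (the table does not say whether the sample or population form applies), and a profile with zero deviation is returned as all zeros. The function names are illustrative.

```python
import statistics


def center_profile(profile, use_median=False):
    """Subtract the mean (or median) of the profile from every measurement."""
    center = statistics.median(profile) if use_median else statistics.mean(profile)
    return [v - center for v in profile]


def mean_variance_normalize(profile):
    """Subtract the profile mean, then divide by the standard deviation."""
    mean = statistics.mean(profile)
    sd = statistics.stdev(profile)  # sample standard deviation (assumption)
    if sd == 0:
        return [0.0 for _ in profile]  # flat profile: nothing to scale
    return [(v - mean) / sd for v in profile]


print(center_profile([1, 2, 3]))
print(mean_variance_normalize([1, 2, 3]))
```

Array-based centering is the same operation as <code>center_profile</code>, applied to the measurements of one microarray instead of one marker.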

'''Apply Quantile Normalizer'''

1. In the Normalization Panel, select ''Quantile Normalizer''.

2. Leave the default averaging method of ''Mean Profile Marker'' to indicate handling of missing values.

3. Click '''Normalize'''. The View Area is updated to reflect normalization (after the screen has been refreshed).
Note: The first value in the second row was updated from ''41,394.6'' to ''55,779.26''.

{|style="border: 1px solid lightGray"
!PRENORMALIZATION||NORMALIZED||
|-
|-|-
|-
| [[Image:Prenormalizer_ed.gif ]] || [[Image: Postnormalizera.gif ]]
|-
|-
|-
|}
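
For readers unfamiliar with quantile normalization, the textbook form of the technique can be sketched as below. This is the general method, not necessarily the exact procedure geWorkbench applies: it assumes no missing values and breaks ties by position, and the function name is illustrative.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of equal-length arrays (one list per microarray).

    Each value is replaced by the mean, across arrays, of the values
    that share its rank.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Mean of the i-th smallest value across all arrays.
    rank_means = [sum(s[i] for s in sorted_arrays) / len(arrays) for i in range(n)]

    result = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # indices from smallest to largest
        normalized = [0.0] * n
        for rank, idx in enumerate(order):
            normalized[idx] = rank_means[rank]
        result.append(normalized)
    return result


print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
```

After normalization every array has the same set of values, only arranged according to each array's own ranking, which is why individual measurements can change noticeably.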

== Clustering Gene Expression Data== ken
*Hierarchical Clustering
* Self Organizing Map (SOM)

== Differential Expression ==
In this tutorial, you will:

* Get acquainted with the T Test and Multi T Test
* Apply a T Test and Multi T Test

Before you continue, geWorkbench should be running. For help with installation, please refer to the Getting Started (INSERT LINK) section. Load the tutorial file cardiomyopathy.exp. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===T Test===

T Test analysis identifies markers with statistically significant differential expression between sets of microarrays. The t-test determines for each marker whether there is a significant difference between the two groups (case and control). To perform this analysis, you must classify the panels, set the analysis parameters and view the results in the visualization components. A detailed description of the T Test parameters is available in the online help.

'''Classify the Panels'''

1. Mark the '''Cardio''' phenotype as 'Case'. By default, panels are marked as control. Panels classified as Case are shown with a red thumbtack icon.
* Right-click on '''Cardio''' phenotype.
* Select '''Classification'''>'''Case'''.
2. Activate the arrays '''Normal''' and '''Cardio''' by selecting the checkboxes next to the panel name.

[[Image:T_PhenotypeSettingCase.png]]

'''Set Analysis Parameters'''
# From the Analysis Panel, select '''T-Test Analysis'''.
# Set the parameter values listed below and click on '''Analyze'''.

* Alpha-corrections tab: Just Alpha.
* P-Value Parameters tab: p-values based on t-distribution. Note that the default alpha (critical p-value) is set to 0.01.
* Degree of Freedom tab: Welch approximation - unequal group variances.

[[Image:Ttest.gif]] 
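
To make the "Welch approximation - unequal group variances" option concrete, the sketch below computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom for a single marker. It is an illustration of the standard formulas, not geWorkbench's implementation; the function name and the sample values are hypothetical. Converting t and the degrees of freedom into a p-value requires the t-distribution, which is left out here because the Python standard library does not provide it.

```python
import math
import statistics


def welch_t(case, control):
    """Return (t, degrees_of_freedom) for two groups with unequal variances."""
    n1, n2 = len(case), len(control)
    v1, v2 = statistics.variance(case), statistics.variance(control)
    se2 = v1 / n1 + v2 / n2  # squared standard error of the mean difference
    t = (statistics.mean(case) - statistics.mean(control)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df


# Hypothetical expression values for one marker.
t, df = welch_t([2, 4, 6], [1, 2, 3])
print(round(t, 3), round(df, 3))
```

A marker would then be reported as significant when its p-value falls below the chosen alpha (0.01 by default, as noted above).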

'''T-Test Results'''

{|style="border: 1px solid lightGray"
!|| ||
|-
|-
| Markers which met the significance test are included in a new gene panel called “Significant Genes”. || [[Image:E_ttestgpanel.png]]
|-
|-
| An ancillary dataset is created in the project window. || [[Image:Ed_ttestproj.png]]
|-
|}

The values of the T-Test can be seen in the Color Mosaic panel and the Volcano Plot.

{|style="border: 1px solid lightGray"
!VOLCANO PLOT||COLOR MOSAIC||
|-
|-|-
|-
|-
| [[Image:Vplot.png]] || [[Image:Ed_cm.png]]
|-
|-
|-
| Clicking on any of the spots highlights the marker selected in the Marker Panel. * Insert another description ||
* The label to the right displays the Significance value (the lower the value, the more likely the expression differs) and gene name for the displayed genes. The genes are displayed in ascending order by Significance Value.

* Gene height and width values can be altered to modify the display.

* The intensity slider is used to modify the intensity of the color codings.

* Accession: Includes the accession number in the label.

* Printer Icon: Prints the displayed image.

* Display: Must be toggled on to display data.

* ''Pat, Abs, Ratio Overlapping Pages Icon: Not the T Test display.''

|-
|-
|}

=== Multi T Test=== '''(IN PROGRESS)'''

== Regulatory Network ==
In this tutorial, you will:

* Create a gene network in Reverse Engineering
* View the network in Cytoscape
* Create a Marker panel from the Cytoscape network

Before you continue, geWorkbench should be running. The file [[Webmatrix.EXP]] is used in this tutorial. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Reverse Engineering===

The Reverse Engineering component is used to analyze a large amount of microarray data to reverse engineer the underlying gene regulatory network. The details of this algorithm are described in "ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context", Califano et al. (http://arxiv.org/abs/q-bio.MN/0410037).

'''Create a Network'''
{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|1. In the Gene Panel, enter '''1973_s_at''' in the '''Find Next''' text box. The list box will navigate to '''1973_s_at'''; click on that gene.||[[Image: E_panel.PNG]]

|-
|2. Go to the Reverse Engineering component in the View pane. You should be in the first tab, “Profiler”, and in “Basic”. The marker '''1973_s_at''' should be listed in the Hub Gene box.||[[Image:E_RE1.PNG]]
|-
|3. Click on '''Analyze 2D'''. A list of markers is returned sorted by interaction strength below '''Search Box'''. ||
|-
|-
|-
|4. Select all the genes with score >= 5 by clicking on the first marker '''[12.53] 37724_at''', using Shift-down arrow to highlight the values >= 5.||[[Image: E_re2.png]]
|-
|-
|-
|5. Click '''Create Network'''. The network created is displayed in the Cytoscape view.||
|-
|-
|}
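
The selection made in step 4 (keep every marker whose interaction score is at least 5) can be expressed as a short sketch. It assumes the result list is available as text entries in the <code>[score] marker</code> form shown above; the parsing pattern, the function name, and every marker name other than 37724_at are hypothetical.

```python
import re

# Matches entries such as "[12.53] 37724_at" (format assumed from the example above).
ENTRY = re.compile(r"\[\s*([0-9.]+)\s*\]\s*(\S+)")


def select_by_score(entries, threshold=5.0):
    """Return the marker names whose score is >= threshold, in list order."""
    selected = []
    for entry in entries:
        match = ENTRY.match(entry.strip())
        if match and float(match.group(1)) >= threshold:
            selected.append(match.group(2))
    return selected


results = ["[12.53] 37724_at", "[5.00] example_a_at", "[4.99] example_b_at"]
print(select_by_score(results))
```

Because the list is sorted by interaction strength, the selected markers always form a contiguous block at the top, which is why Shift-selecting from the first entry works in the user interface.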

'''View the Network in Cytoscape'''

A full description of the capabilities and functionality of Cytoscape can be found at http://www.cytoscape.org/.

# In the Cytoscape Layout menu, select '''yFiles/ Organic''' to modify the network display.
# In this network image, select the central marker. It should turn yellow. [[Image:E_network.png]]
# In the Select menu, select Nodes>First neighbors of selected nodes>Shape>Triangle. The first neighbors are updated from a square shape to a triangle.
# Ctrl+mouse select the network. The selected nodes are yellow. These selected genes are returned to the Gene Panel with the name Selected Genes[Cytoscape].

== Enrichment Analysis==

In this tutorial, you will:

* Gene Ontology

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

'''Gene Ontology'''

The Gene Ontology component categorizes genes in terms of the functions their products perform, their cellular localization, or their involvement in high-level biological processes. The actual category definitions are provided by the Gene Ontology Consortium (http://www.geneontology.org), which we use to generate a tree structure that approximates the Directed Acyclic Graph structure used by the Consortium.

1. Create a Marker panel with the following markers:
* AFFX-hum_alu_at
* AFFX-LysX-M_at
* AFFX-LysX-5_at
* AFFX-LysX-3_at
For assistance in creating a marker panel, please refer to Working with Markers and Phenotype Tutorial (INSERT LINK).

2. Click '''Map List.''' As the application creates a tree for the selected ontology, a pop-up window will display the progress. [[Image:E_map.png]]

3. The resulting nodes in the mapped tree will display two numbers separated by a forward slash, such as '''cell(1/6058)'''.
*The first number is the number of genes from the active marker panel that are mapped to a specific category and all its child nodes.
*The second number, after the slash, is the same count for the reference list (or for the entire chip, depending on the reference list checkbox selection and whether a reference list has been loaded from the file system).

[[Image:E_goon.png]]
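
The two numbers shown on each tree node can be understood through the following sketch, which counts genes mapped to a category and all of its child nodes. The data layout (a dictionary of child categories and a dictionary of direct annotations), the function names, and the gene and category names are all hypothetical; counting each gene once per node is also an assumption.

```python
def genes_under(category, children, annotations):
    """Collect genes annotated to a category or to any of its descendants."""
    genes = set(annotations.get(category, ()))
    for child in children.get(category, ()):
        genes |= genes_under(child, children, annotations)
    return genes


def node_label(category, children, annotations, panel, reference):
    """Build a 'name(panel_count/reference_count)' label for one tree node."""
    mapped = genes_under(category, children, annotations)
    return f"{category}({len(mapped & panel)}/{len(mapped & reference)})"


# Hypothetical two-level tree and annotations.
children = {"cell": ["membrane"]}
annotations = {"cell": {"g1"}, "membrane": {"g2", "g3"}}
panel = {"g2"}
reference = {"g1", "g2", "g3", "g4"}
print(node_label("cell", children, annotations, panel, reference))
```

Because child nodes are included, the counts on a parent node are never smaller than the counts on any of its children.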

== Sequence Analysis== ken
* Sequence Retrieval
* Sequence Homology Analysis
** Blast
** Other

==Pattern Discovery== ken
* Position Histogram

== Promoter Analysis==

==Integrated Annotation Information==
In this tutorial, you will:

* rrrrr
* rrrrr

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

Perform the following steps to view annotations of a dataset ('''''not recommended for large sets of genes'''''):

1. In the Marker Annotations in the View pane, click '''Use Panels.'''

A table of gene names and links to pathway diagrams is returned if available from the NCI CGAP database.

2. Clicking on gene names returns annotations from the NCI CGAP database in your default web browser window.

3. Clicking on the pathway diagram name will display a pathway diagram in the caBIO Pathways panel in the View pane.
In the caBIO Pathways viewer, pathway gene symbols can in turn be clicked to display their gene information in the default web browser window.

User:Daly

2006-02-24T18:35:12Z

Daly: /* Enrichment Analysis */

= Overview=

geWorkbench is an open-source bioinformatics platform that offers a comprehensive and extendible collection of tools for the management, analysis, visualization and annotation of biomedical data. This tool is aimed at providing researchers a centralized repository for the data analysis and visualization of gene expression data, gene sequences and protein sequences.

'''What can you do with geWorkbench?'''

* Use one program that integrates with multiple existing bioinformatics modules for analysis and visualization.

* Access remote servers and clusters for the performance of computationally intensive calculations - quicker!

* Streamline data analysis with:
** flexible import options that support merging files from various sources ''(insert possible data source ie RMA express).''
** support for a variety of genomic data including microarrays, sequences, pathways, networks, alignments and phenotypes.
** access analyses with biological annotations from the National Cancer Institute.

* Community: Insert developer benefit ( plugin)

The diagram illustrates the use of geWorkbench by Researchers.

[[Image:slide1.gif]]

For detailed system documentation, please see the Documentation section (INSERT LINK).

= Tutorial=

===Welcome! ===

Welcome to the geWorkbench tutorial. In the following tutorials you will learn how to use geWorkbench. This is an example-based, illustrated guide for both beginners and experienced users. This tutorial was last updated in '''DATE''' and reflects changes to geWorkbench through '''DATE'''.

===Using the Tutorial ===

These tutorials assume that you have successfully completed the installation instructions. If you haven't installed the entire program, please go to [[Download]] and follow the installation instructions.

The tutorials are organized into a number of separate topics. Sample data sets are used throughout the tutorials. Each tutorial is self-contained and does not depend on any other tutorial for sample data. Sample data is located '''''(INSERT LINK or Instructions on how to download)'''''

'''(we should have sample data labeled Tutorial #)'''

Within the tutorials there are two basic types of text:
*Text that looks like this explains topics.

# Text in numbered steps gives instructions for you to follow using the tutorial data files.

''' insert instruction on how to navigate between tutorials'''.

==Getting Started== MV
* Starting the application
* GUI elements
** Panels
** Navigation
== Loading Data== MV
* Data formats

== Working with Marker and Phenotype Panels==
In this tutorial, you will:
* Become familiar with the use of panels in geWorkbench
* Create active phenotype panels
* Classify the panels you created

Before you continue, geWorkbench should be running. For help with this, please refer to the '''''Getting Started''''' (insertlink) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Creating Panels===

When working with microarrays, geWorkbench uses the term '''''marker''''' to refer to a gene probe (in other cases, it can refer to individual items from other data sets, such as sequences).
'''''Phenotype''''' refers to any user-defined grouping of microarrays. These microarrays will often share some common property that in most cases is phenotypic, although this is not a requirement. For example, one such “phenotype” might be a single experiment on a tumor tissue sample, with a second “phenotype” defined as a collection of experiments performed on normal tissue samples.

====Assign Panel====

1. In the Phenotype panel, select the following arrays, which contain samples from the congestive cardiomyopathy disease state:
JB-ccmp_0120.txt, JB-ccmp_0218.txt, JB-ccmp0718.txt, JB-ccmp0811.txt, JB-ccmp1003.txt, JB-ccmp_1109.tx

2. Right-click, select '''Add to Panel'''.

3. Enter "Cardio" in the input box and click OK.

[[Image:T_PanelLabelCardio.png]]

4. Next, similarly label the following arrays as "Normal" (''repeat steps 2 & 3'').
JB-n_0106.txt , JB-n_0821.txt, JB-n_0915.txt, JB-n_1303.txt

5. Select the checkboxes next to the panel names to indicate that these groups of arrays are "Active". Various analysis and visualization components can be set to use or display only activated arrays or markers.

[[Image:T_PhenotypesPriorToCase.png]]

====Classify panel====

For statistical tests such as the t-test, Case and Control groups can be specified.

# Left-click on the thumb-tack icon in front of the phenotype name.
# Select Case to specify the disease arrays as the "Case". The remaining "Normal" arrays are by default labeled control.
[[Image:T_PhenotypeSettingCase.png]]

A red thumbtack indicates the arrays have been specified as "Case".

[[Image:T_PhenotypeCaseSet.png]]

== Visualize Gene Expression==
In this tutorial, you will:

* Get acquainted with the various geWorkbench visualization tools
* View a dataset in geWorkbench
* Modify the visualization preference settings

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file '''''cardiomyopathy.exp''''' used in this tutorial is from the Load Data (insertlink) tutorial.

===Visualization Tools===

Visualization tools provide a view of the chip(s) under investigation and can be used for ascertaining the quality of the data. Active gene and phenotype panels (insertlink) restrict the number of markers/arrays displayed. The images created can be saved and exported. A detailed description of how to manipulate visualization components is available in the online help.

{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|'''Microarray View''': Used to inspect each separate microarray using the Array scroll bar.||[[Image: Ema.png]]
|-
|-
|'''Tabular Microarray Panel''': Presents the numerical values of the expression measurements in a table format. One row is created per individual marker/probe and one column per microarray.||[[Image:Etab.png]]
|-
|-
|-
|'''Color Mosaic''': Heat maps for microarray expression data, organized by phenotypic or gene groupings.||[[Image:Ed_cm.png]]
|-
|-
|-
|'''Expression Profiles''': A line graph of gene expression profiles across several arrays/hybridizations. Each marker is drawn as a separately colored line.||[[Image:Eep.png]]
|-
|-
|-
|'''Scatter Plot''': A pairwise (''array vs. array and marker vs. marker'') comparison and plotting of expression values.||[[Image:Esp.png]]
|-
|-
|-
|-
|}

'''View a dataset'''
# Select '''''cardiomyopathy.exp''''' in the Project Panel.
# Select the '''''Microarray Panel''''' visualization component in the View Area at the top-right section of the interface.
# Deselect the All Markers checkbox to display the entire dataset.

Note: The '''''All Arrays''''' and '''''All Markers''''' checkboxes determine which data points are included in the display.
If neither is checked, then the entire data set is shown. The All Arrays control is useful when working with data sets comprising multiple arrays. In this case, only those arrays that are included in a currently activated phenotype panel will be displayed.

[[Image: Allm.png]]

=== Preferences===
The Preferences selection in the Tools menu allows users to specify how certain aspects of the system will behave. Once set, preferences are applied at once and persist between application sessions.

'''Modifying Settings'''

1. From the main menu, click on '''Tools>Preferences'''.

2. In the Preferences pop-up window, you can define settings for:
*Text Editor: The editor selected will be used to open and inspect data sets loaded in a project. Notepad is the default setting.
* Visualization: The color scheme to be applied to color mosaic images.
**Absolute: (default) Let M = max{|min|, |max|} over all expression measurements, across all arrays. If expression value x > 0, assign it the red spectrum x / M * 256. If expression value x is negative, assign it to the green spectrum -x / M * 256.
** Relative: This is similar to the setting for Absolute, but each marker is mean-variance normalized first.
* Genepix Value Computation: You can specify how to compute the value displayed for a Genepix array. The default setting is Option (Mean F635 - Mean B635) / (Mean F532 - Mean B532).

Select '''Relative''' for the visualization preference.

3. Click on '''OK'''.

[[Image: Preferences.gif]]
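
The default Genepix value computation listed above is a background-corrected ratio of the two channels. A minimal sketch follows; the function and parameter names are illustrative, and returning <code>None</code> when the denominator is zero is an assumption rather than documented geWorkbench behavior.

```python
def genepix_ratio(mean_f635, mean_b635, mean_f532, mean_b532):
    """Compute (Mean F635 - Mean B635) / (Mean F532 - Mean B532) for one spot."""
    denominator = mean_f532 - mean_b532
    if denominator == 0:
        return None  # ratio undefined; treated here as missing (assumption)
    return (mean_f635 - mean_b635) / denominator


# Hypothetical foreground/background intensities for one spot.
print(genepix_ratio(1200, 200, 700, 200))
```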

== Filter and Normalize Data==
In this tutorial, you will:
* Get acquainted with the various filters and normalizers available in geWorkbench
* Apply a filter and normalizer on a tutorial dataset

Before you continue, geWorkbench should be running. For help with installation, please refer to the '''''Getting Started''''' (INSERT LINK) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Filter===

Filtering can be used to screen out missing data points, remove low quality data or reduce the size of the dataset by removing less interesting data. Available geWorkbench filters are as follows:

{|style="border: 1px solid lightGray"
!Filter||Description||
|-
|-|-
|-

|'''Affy Detection Call'''||Applicable to Affymetrix data only. Sets all measurements whose detection status is any user-defined combination of A, P or M as missing.
|-

|'''Missing values''' ||Discards all markers that have “missing” measurements in at least n microarrays, where n is defined by the user. Another filter must first be applied however, in order to generate the missing values upon which this filter can operate.
|-
|-
|-

|'''Deviation''' ||Sets all markers whose measurements deviate below a given value across all microarrays as missing.
|-
|-
|-

|'''Expression Threshold''' ||Sets all markers whose measurements are inside (or outside) a user-defined range as missing.
|-
|-
|-

|'''2 Channel''' ||Applicable to 2-channel arrays (Genepix) data only. Defines applicable ranges for each channel, and sets all values for which either channel intensity is inside (or outside) the defined range as missing.
|-
|-
|-
|}

[[Image:Filterpanel.gif]]

Perform the following steps to filter out data called absent in an Affymetrix file:
# In the Filtering Panel, select ''Affy Detection Call Filter''.
# Select the ‘A’ (Absent) checkbox and click '''Filter'''. Values that were removed (marked as missing) are highlighted in yellow.
# In the Filtering Panel, select '''Missing Values Filter'''.
# Choose the maximum number of arrays that can have missing values before a marker is removed (the default is 0).
# Click '''Filter'''. Markers with more than 0 missing values are removed. You’ll notice the yellow values are gone.

{|style="border: 1px solid lightGray"
!'''Affy Detection Call Filter'''||'''Missing Values Filter'''||
|-
|-|-
|-
| [[Image: Filtered.gif]]||[[Image:Mvfilter.gif]]
|-
|}
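
The Expression Threshold filter from the table above can be illustrated in the same spirit. This sketch assumes the filter acts on individual measurements of one marker, that the range bounds are inclusive, and that <code>None</code> stands for "missing"; none of these details are stated in the table, and the function name is illustrative.

```python
def expression_threshold_filter(row, low, high, mask_inside=True):
    """Mark measurements inside (or outside) the range [low, high] as missing."""
    result = []
    for value in row:
        inside = low <= value <= high
        # Mask the value when its position relative to the range matches the chosen mode.
        result.append(None if inside == mask_inside else value)
    return result


print(expression_threshold_filter([5, 50, 500], 10, 100))
print(expression_threshold_filter([5, 50, 500], 10, 100, mask_inside=False))
```

Values masked this way can then be removed with the Missing Values filter or replaced with the Missing value calculation normalizer.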

===Normalize===
Normalization can be used to decrease the effects of systematic differences across a set of experiments.
In geWorkbench, normalization replaces the original values with new values. Available geWorkbench normalization methods are as follows:

{|style="border: 1px solid lightGray"
!Normalizer||Description||
|-
|-

|Missing value calculation ||Replaces every missing value with either the mean value of that marker across all microarrays or with the mean measurement of all markers in the microarray where the missing value is observed
|-
|-
|Log2 Transformation ||Applies a log2 transformation to all measurements in a microarray
|-
|-
|-
|Threshold Normalizer ||All data points whose value is less than (or greater than) a user-specified minimum (maximum) value are raised (reduced) to that minimum (maximum) value
|-
|-
|-
|-
|Marker-based Centering ||Subtracts the mean (median) measurement of a marker profile from every measurement in the profile
|-
|-
|-
|Array-based centering ||Subtracts the mean (median) measurement of a microarray from every measurement in that microarray
|-
|-
|-
|Mean-variance normalizer ||For every marker profile, the mean measurement of the entire profile is subtracted from each measurement in the profile and the resulting value is divided by the standard deviation
|-
|-
|}
[[Image:Normalpanel.gif]]
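
Three of the simpler normalizers in the table (Threshold Normalizer, Log2 Transformation, and Missing value calculation using the marker mean) are sketched below. The function names are illustrative, values are plain Python lists, and <code>None</code> is assumed to represent a missing value.

```python
import math


def threshold_normalize(values, minimum=None, maximum=None):
    """Raise values below minimum to minimum and reduce values above maximum to maximum."""
    result = []
    for v in values:
        if minimum is not None and v < minimum:
            v = minimum
        if maximum is not None and v > maximum:
            v = maximum
        result.append(v)
    return result


def log2_transform(values):
    """Apply a log2 transformation; every value must be positive."""
    return [math.log2(v) for v in values]


def fill_missing_with_marker_mean(profile):
    """Replace each missing value with the mean of the marker's present values."""
    present = [v for v in profile if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in profile]


print(threshold_normalize([5, 50, 5000], minimum=20, maximum=1000))
print(log2_transform([1, 8]))
print(fill_missing_with_marker_mean([2, None, 4]))
```

Since the logarithm is undefined for zero and negative values, applying a threshold with a positive minimum before the log2 transformation is one way to keep the transformation well defined.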

'''Apply Quantile Normalizer'''

1. In the Normalization Panel, select ''Quantile Normalizer''.

2. Leave the default averaging method of ''Mean Profile Marker'' to indicate handling of missing values.

3. Click '''Normalize'''. The View Area is updated to reflect normalization (after the screen has been refreshed).
Note: The first value in the second row was updated from ''41,394.6'' to ''55,779.26''.

{|style="border: 1px solid lightGray"
!PRENORMALIZATION||NORMALIZED||
|-
|-|-
|-
| [[Image:Prenormalizer_ed.gif ]] || [[Image: Postnormalizera.gif ]]
|-
|-
|-
|}

== Clustering Gene Expression Data== ken
*Hierarchical Clustering
* Self Organizing Map (SOM)

== Differential Expression ==
In this tutorial, you will:

* Get acquainted with the T Test and Multi T Test
* Apply a T Test and Multi T Test

Before you continue, geWorkbench should be running. For help with installation, please refer to the Getting Started (INSERT LINK) section. Load the tutorial file cardiomyopathy.exp. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===T Test===

T Test analysis identifies markers with statistically significant differential expression between sets of microarrays. The t-test determines for each marker whether there is a significant difference between the two groups (case and control). To perform this analysis, you must classify the panels, set the analysis parameters and view the results in the visualization components. A detailed description of the T Test parameters is available in the online help.

'''Classify the Panels'''

1. Mark the '''Cardio''' phenotype as 'Case'. By default, panels are marked as control. Panels classified as Case are shown with a red thumbtack icon.
* Right-click on '''Cardio''' phenotype.
* Select '''Classification'''>'''Case'''.
2. Activate the arrays '''Normal''' and '''Cardio''' by selecting the checkboxes next to the panel name.

[[Image:T_PhenotypeSettingCase.png]]

'''Set Analysis Parameters'''
# From the Analysis Panel, select '''T-Test Analysis'''.
# Set the parameter values listed below and click '''Analyze'''.

* Alpha-corrections tab: Just Alpha.
* P-Value Parameters tab: p-values based on t-distribution. Note that the default alpha (critical p-value) is set to 0.01.
* Degree of Freedom tab: Welch approximation - unequal group variances.

[[Image:Ttest.gif]] 
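
To make the ''Welch approximation - unequal group variances'' option more concrete, the sketch below computes the standard Welch t statistic and the Welch-Satterthwaite degrees of freedom for a single marker. The function name <code>welch_t</code> and the sample values are hypothetical; the code illustrates the general method and is not taken from geWorkbench.

```python
import math
from statistics import mean, variance


def welch_t(case, control):
    """Return (t, degrees_of_freedom) for two groups with unequal variances."""
    n1, n2 = len(case), len(control)
    v1, v2 = variance(case), variance(control)  # sample variances
    # Squared standard error of the difference between the group means.
    se2 = v1 / n1 + v2 / n2
    t = (mean(case) - mean(control)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df


# Made-up expression values for one marker in the Case and Control groups.
t, df = welch_t([2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0])
```

The p-value is then read from the t-distribution with <code>df</code> degrees of freedom and compared against the chosen alpha (0.01 by default, as noted above).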

'''T-Test Results'''

{|style="border: 1px solid lightGray"
!|| ||
|-
|-
| Markers which met the significance test are included in a new gene panel called “Significant Genes”. || [[Image:E_ttestgpanel.png]]
|-
|-
| An ancillary dataset is created in the project window. || [[Image:Ed_ttestproj.png]]
|-
|}

The values of the T-Test can be seen in the Color Mosaic panel and the Volcano Plot.

{|style="border: 1px solid lightGray"
!VOLCANO PLOT||COLOR MOSAIC||
|-
|-|-
|-
|-
| [[Image:Vplot.png]] || [[Image:Ed_cm.png]]
|-
|-
|-
| Clicking on any of the spots highlights the marker selected in the Marker Panel. * Insert another description ||
* The label to the right displays the significance value (the lower the value, the more likely the marker is differentially expressed) and the gene name for the displayed genes. The genes are displayed in ascending order of significance value.

* Gene height and width values can be altered to modify the display.

* The intensity slider is used to modify the intensity of the color codings.

* Accession: Includes the accession number in the label.

* Printer Icon: Prints the displayed image.

* Display: Must be toggled on to display data.

* ''Pat, Abs, Ratio Overlapping Pages Icon: Not the T Test display.''

|-
|-
|}

=== Multi T Test=== '''(IN PROGRESS)'''

== Regulatory Network ==
In this tutorial, you will:

* Create a gene network in Reverse Engineering
* View the network in Cytoscape
* Create a Marker panel from the Cytoscape network

Before you can continue, geWorkbench should be running. The file [[Webmatrix.EXP]] is used in this tutorial. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Reverse Engineering===

The Reverse Engineering component is used to analyze a large amount of microarray data to reverse engineer the underlying gene regulatory network. The details of this algorithm are described in "ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context", Califano et al. (http://arxiv.org/abs/q-bio.MN/0410037).

'''Create a Network'''
{|style="border: 1px solid lightGray"
|-
|1. In the Gene Panel, enter '''1973_s_at''' in the '''Find Next''' text box. The list box will navigate to '''1973_s_at'''; click on that gene.||[[Image: E_panel.PNG]]
|-
|2. Go to the Reverse Engineering component in the View pane. You should be in the first tab “Profiler” and in “Basic”. The marker '''1973_s_at''' should be listed in the Hub Gene box.||[[Image:E_RE1.PNG]]
|-
|3. Click on '''Analyze 2D'''. A list of markers, sorted by interaction strength, is returned below the '''Search Box'''. ||
|-
|4. Select all the genes with score >= 5 by clicking on the first marker '''[12.53] 37724_at''' and then using Shift+Down Arrow to extend the selection over all values >= 5.||[[Image: E_re2.png]]
|-
|5. Click '''Create Network'''. The resulting network is displayed in the Cytoscape view.||
|}

'''View the Network in Cytoscape'''

A full description of the capabilities and functionality of Cytoscape can be found at http://www.cytoscape.org/.

# In the Cytoscape Layout menu, select '''yFiles/ Organic''' to modify the network display.
# In this network image, select the central marker. It should turn yellow. [[Image:E_network.png]]
# In the Select menu, select Nodes>First neighbors of selected nodes>Shape>Triangle. The first neighbors are updated from a square shape to a triangle.
# Hold Ctrl and use the mouse to select the network. The selected nodes are yellow. The selected genes are returned to the Gene Panel with the name Selected Genes[Cytoscape].

== Enrichment Analysis==

In this tutorial, you will:

* Gene Ontology

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

'''Gene Ontology'''

The Gene Ontology component categorizes genes in terms of the functions their products perform, their cellular localization, or their involvement in high-level biological processes. The category definitions are provided by the Gene Ontology Consortium (http://www.geneontology.org), and we use them to generate a tree structure that approximates the Directed Acyclic Graph structure used by the Consortium.

1. Create a Marker panel with the following markers:
* AFFX-hum_alu_at
* AFFX-LysX-M_at
* AFFX-LysX-5_at
* AFFX-LysX-3_at
For assistance in creating a marker panel, please refer to Working with Markers and Phenotype Tutorial (INSERT LINK).

2. Click '''Map List.''' As the application creates a tree for the selected ontology, a pop-up window will display the progress. [[Image:E_map.png]]

3. The resulting nodes in the mapped tree will display two numbers separated by a forward slash, such as '''cell(1/6058)'''. The first number is the number of genes from the active marker panel that map to a specific category and all its child nodes. The second number, after the slash, is the same count for the reference list (or for the entire chip, depending on the reference list check-box selection and on whether a reference list is loaded from the file system).

[[Image:E_goon.png]]
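
The two numbers in a node label can be pictured as counts over a category and all of its child nodes. The sketch below is an assumption-laden illustration: the tree, the annotations, the gene names and the function <code>mapped_genes</code> are all made up, and counting each gene once per subtree is an assumption rather than documented geWorkbench behavior.

```python
def mapped_genes(node, children, annotations, gene_list):
    """Collect the genes from gene_list annotated to node or any of its descendants."""
    genes = set(annotations.get(node, ())) & gene_list
    for child in children.get(node, ()):
        genes |= mapped_genes(child, children, annotations, gene_list)
    return genes


# Hypothetical ontology tree and direct gene annotations.
children = {"cell": ["membrane", "nucleus"], "nucleus": ["nucleolus"]}
annotations = {
    "cell": {"g1"},
    "membrane": {"g2"},
    "nucleus": {"g3"},
    "nucleolus": {"g3", "g4"},
}

panel = {"g3", "g4"}                  # active marker panel
reference = {"g1", "g2", "g3", "g4"}  # reference list or entire chip

in_panel = len(mapped_genes("cell", children, annotations, panel))
in_reference = len(mapped_genes("cell", children, annotations, reference))
label = f"cell({in_panel}/{in_reference})"
```

In this toy example the label reads <code>cell(2/4)</code>: two panel genes fall under ''cell'', against four genes in the reference list.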

== Sequence Analysis== ken
* Sequence Retrieval
* Sequence Homology Analysis
** Blast
** Other

==Pattern Discovery== ken
* Position Histogram

== Promoter Analysis==

==Integrated Annotation Information==
In this tutorial, you will:

* rrrrr
* rrrrr

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

Perform the following steps to view annotations of a dataset ('''''not recommended for large sets of genes'''''):

1. In the Marker Annotations in the View pane, click '''Use Panels.'''

A table of gene names and links to pathway diagrams is returned if available from the NCI CGAP database.

2. Clicking on gene names returns annotations from the NCI CGAP database in your default web browser window.

3. Clicking on the pathway diagram name displays the pathway diagram in the caBIO Pathways panel in the View pane.
In the caBIO Pathways viewer, pathway gene symbols can in turn be clicked to display their gene information in the default web browser window.

User:Daly

2006-02-24T18:34:43Z

Daly:

= Overview=

geWorkbench is an open-source bioinformatics platform that offers a comprehensive and extendible collection of tools for the management, analysis, visualization and annotation of biomedical data. This tool is aimed at providing researchers a centralized repository for the data analysis and visualization of gene expression data, gene sequences and protein sequences.

'''What can you do with geWorkbench?'''

* Use one program that integrates with multiple existing bioinformatics modules for analysis and visualization.

* Access remote servers and clusters for the performance of computationally intensive calculations - quicker!

* Streamline data analysis with:
** flexible import options that support merging files from various sources ''(insert possible data source ie RMA express).''
** support for a variety of genomic data including microarrays, sequences, pathways, networks, alignments and phenotypes.
** access analyses with biological annotations from the National Cancer Institute.

* Community: Insert developer benefit ( plugin)

The diagram illustrates the use of geWorkbench by Researchers.

[[Image:slide1.gif]]

For detailed system documentation, please see the Documentation section (INSERT LINK).

= Tutorial=

===Welcome! ===

Welcome to the geWorkbench tutorial. In the following tutorials you will learn how to use geWorkbench. This is an example based, illustrated guide for both beginners and experienced users. This tutorial was last updated in '''DATE''' and reflects changes to geWorkbench through '''DATE'''.

===Using the Tutorial ===

These tutorials assume that you have successfully completed the installation instructions. If you haven't installed the entire program, please go to [[Download]] and follow the installation instructions.

The tutorials are organized into a number of separate topics. Sample data sets are used throughout the tutorials. Each tutorial is self-contained and does not depend on any other tutorial for sample data. Sample data is located '''''(INSERT LINK or Instructions on how to download)'''''

'''(we should have sample data labeled Tutorial #)'''

Within the tutorials there are two basic types of text:
*Text that looks like this explains topics. 

# Text in numbered steps is instructions for you to follow using tutorial data files.

''' insert instruction on how to navigate between tutorials'''.

==Getting Started== MV
* Starting the application
* GUI elements
** Panels
** Navigation
== Loading Data== MV
* Data formats

== Working with Marker and Phenotype Panels==
In this tutorial, you will:
* Become familiar with the use of panels in geWorkbench
* Create active phenotype panels
* Classify the panels you created

Before you can continue, geWorkbench should be running. For help with this, please refer to the '''''Getting Started''''' (insertlink) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Creating Panels===

When working with microarrays, geWorkbench uses the term '''''marker''''' to refer to a gene probe (in other cases, it can be individual items from other data sets, such as sequences).
'''''Phenotype''''' refers to any user-defined grouping of microarrays. These microarrays will often share some common property that in most cases is phenotypic, although this is not a requirement. For example, one such “phenotype” might be a single experiment on a tumor tissue sample, with a second “phenotype” defined as a collection of experiments performed on normal tissue samples.

====Assign Panel====

1. In the Phenotype panel, select the following arrays, which contain samples from the congestive cardiomyopathy disease state:
JB-ccmp_0120.txt, JB-ccmp_0218.txt, JB-ccmp0718.txt, JB-ccmp0811.txt, JB-ccmp1003.txt, JB-ccmp_1109.tx

2. Right-click, select '''Add to Panel'''.

3. Enter "Cardio" in the input box and click OK.

[[Image:T_PanelLabelCardio.png]]

4. Next, similarly label the following arrays as "Normal" (''repeat steps 2 & 3''):
JB-n_0106.txt , JB-n_0821.txt, JB-n_0915.txt, JB-n_1303.txt

5. Select the checkboxes next to the panel names to mark these groups of arrays as "Active". Various analysis and visualization components can be set to only use/display activated arrays or markers.

[[Image:T_PhenotypesPriorToCase.png]]

====Classify panel====

For statistical tests such as the t-test, Case and Control groups can be specified.

# Left-click on the thumb-tack icon in front of the phenotype name.
# Select Case to specify the disease arrays as the "Case". The remaining "Normal" arrays are by default labeled control.
[[Image:T_PhenotypeSettingCase.png]]

A red thumbtack indicates the arrays have been specified as "Case".

[[Image:T_PhenotypeCaseSet.png]]

== Visualize Gene Expression==
In this tutorial, you will:

* Get acquainted with the various geWorkbench visualization tools
* View a dataset in geWorkbench
* Modify the visualization preference settings

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file '''''cardiomyopathy.exp''''' used in this tutorial is from the Load Data (insertlink) tutorial.

===Visualization Tools===

Visualization tools provide a view of the chip(s) under investigation and can be used for ascertaining the quality of the data. Active gene and phenotype panels (insertlink) restrict the number of markers/arrays displayed. The images created can be saved and exported. The online help describes in detail how to manipulate the visualization components.

{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|'''Microarray View''': Used to inspect each separate microarray using the Array scroll bar.||[[Image: Ema.png]]
|-
|-
|'''Tabular Microarray Panel''': Presents the numerical values of the expression measurements in a table format. One row is created per individual marker/probe and one column per microarray.||[[Image:Etab.png]]
|-
|-
|-
|'''Color Mosaic''': Heat maps for microarray expression data, organized by phenotypic or gene groupings.||[[Image:Ed_cm.png]]
|-
|-
|-
|'''Expression Profiles''': A line graph of gene expression profiles across several arrays/hybridizations. Each marker is drawn as a separately colored line.||[[Image:Eep.png]]
|-
|-
|-
|'''Scatter Plot''': A pairwise (''array vs. array and marker vs. marker'') comparison and plotting of expression values.||[[Image:Esp.png]]
|-
|-
|-
|-
|}

'''View a dataset'''
# Select '''''cardiomyopathy.exp''''' in the Project Panel.
# Select the '''''Microarray Panel''''' visualization component in the View Area at the top-right section of the interface.
# Deselect the All Markers checkbox to display the entire dataset.

Note: The '''''All Arrays''''' and '''''All Markers''''' checkboxes determine which data points are included in the display.
If neither is checked, then the entire data set is shown. The All Arrays control is useful when working with data sets comprising multiple arrays. In this case, only those arrays that are included in a currently activated phenotype panel will be displayed.

[[Image: Allm.png]]

=== Preferences===
The Preferences selection in the Tools menu allows users to specify how certain aspects of the system will behave. Once set, your preferences take effect immediately and persist between application sessions.

'''Modifying Settings'''

1. From the main menu, click on '''Tools>Preferences'''.

2. In the Preferences pop-up window, you can define settings for:
*Text Editor: The editor selected will be used to open and inspect data sets loaded in a project. Notepad is the default setting.
* Visualization: The color scheme to be applied to color mosaic images.
**Absolute: (default) Let M = max{|min|, |max|} over all expression measurements, across all arrays. If expression value x > 0, assign it the red spectrum x / M * 256. If expression value x is negative, assign it to the green spectrum -x / M * 256.
** Relative: This is similar to the setting for Absolute, but each marker is mean-variance normalized first.
* Genepix Value Computation: You can specify how to compute the value displayed for a Genepix array. The default setting is the option (Mean F635 - Mean B635) / (Mean F532 - Mean B532).

Select '''Relative''' for the visualization preference.

3. Click on '''OK'''.

[[Image: Preferences.gif]]
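
The ''Absolute'' color scheme described in step 2 can be sketched as follows. The function names <code>absolute_scale</code> and <code>color_level</code> and the sample values are hypothetical. How geWorkbench treats a value of exactly zero, or caps the result for an 8-bit color channel, is not stated above, so the zero case below is an assumption.

```python
def absolute_scale(values):
    """Return M = max{|min|, |max|} over all measurements, across all arrays."""
    flat = [v for row in values for v in row]
    return max(abs(min(flat)), abs(max(flat)))


def color_level(x, m):
    """Map an expression value to a (channel, level) pair following the Absolute rule."""
    if x > 0:
        return ("red", x / m * 256)
    if x < 0:
        return ("green", -x / m * 256)
    return ("none", 0.0)  # assumption: zero gets no color


# Made-up expression matrix: one row per marker, one column per array.
values = [[-50.0, 100.0], [25.0, -200.0]]
m = absolute_scale(values)
levels = [[color_level(x, m) for x in row] for row in values]
```

For the ''Relative'' scheme, each marker profile would first be mean-variance normalized and the same mapping applied to the normalized values.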

== Filter and Normalize Data==
In this tutorial, you will:
* Get acquainted with the various filters and normalizers available in geWorkbench
* Apply a filter and normalizer on a tutorial dataset

Before you can continue, geWorkbench should be running. For help with installation, please refer to the '''''Getting Started''''' (INSERT LINK) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Filter===

Filtering can be used to screen out missing data points, remove low quality data or reduce the size of the dataset by removing less interesting data. Available geWorkbench filters are as follows:

{|style="border: 1px solid lightGray"
!Filter||Description||
|-
|-|-
|-

|'''Affy Detection Call'''||Applicable to Affymetrix data only. Sets all measurements whose detection status is any user-defined combination of A, P or M as missing.
|-

|'''Missing values''' ||Discards all markers that have “missing” measurements in at least n microarrays, where n is defined by the user. Another filter must first be applied however, in order to generate the missing values upon which this filter can operate.
|-
|-
|-

|'''Deviation''' ||Sets all markers whose measurements deviate below a given value across all microarrays as missing.
|-
|-
|-

|'''Expression Threshold''' ||Sets all markers whose measurements are inside (or outside) a user-defined range as missing.
|-
|-
|-

|'''2 Channel''' ||Applicable to 2-channel arrays (Genepix) data only. Defines applicable ranges for each channel, and sets all values for which either channel intensity is inside (or outside) the defined range as missing.
|-
|-
|-
|}

[[Image:Filterpanel.gif]]

Perform the following steps to filter out data called absent in an Affymetrix file:
# In the Filtering Panel, select ''Affy Detection Call Filter''.
# Select the ‘A’ (Absent) checkbox and click '''Filter'''. Values that were removed (marked as missing) are highlighted in yellow.
# In the Filtering Panel, select '''Missing Values Filter'''.
# Choose the maximum number of arrays that can have missing values before a marker is removed; the default is 0.
# Click '''Filter'''. Markers with more than 0 missing values are removed. You will notice that the yellow values are gone.

{|style="border: 1px solid lightGray"
!'''Affy Detection Call Filter'''||'''Missing Values Filter'''||
|-
|-|-
|-
| [[Image: Filtered.gif]]||[[Image:Mvfilter.gif]]
|-
|}
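
The two filtering steps above can be pictured with the following sketch. All names and data are hypothetical: <code>detection_call_filter</code> marks measurements as missing (<code>None</code>) based on their detection call, and <code>missing_values_filter</code> then drops markers that have more missing measurements than allowed, following the wording of the steps ("more than 0 missing values are removed"). It is an illustration of the idea, not geWorkbench code.

```python
def detection_call_filter(values, calls, masked=("A",)):
    """Replace each measurement whose detection call is in `masked` with None."""
    return [
        [None if call in masked else value for value, call in zip(value_row, call_row)]
        for value_row, call_row in zip(values, calls)
    ]


def missing_values_filter(markers, values, max_missing=0):
    """Keep only markers with at most `max_missing` missing measurements."""
    return [
        (marker, row)
        for marker, row in zip(markers, values)
        if sum(v is None for v in row) <= max_missing
    ]


# Made-up data: three markers measured on two arrays, with detection calls.
markers = ["marker_1", "marker_2", "marker_3"]
values = [[10.0, 20.0], [5.0, 7.0], [3.0, 9.0]]
calls = [["P", "P"], ["A", "P"], ["M", "A"]]

flagged = detection_call_filter(values, calls)   # step 2: mark Absent calls as missing
kept = missing_values_filter(markers, flagged)   # step 5: default maximum of 0
```

With the default maximum of 0, only the marker with no Absent calls survives the second filter.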

===Normalize===
Normalization can be used to decrease the effects of systematic differences across a set of experiments.
In geWorkbench, normalization replaces the original values with new values. The available geWorkbench normalization methods are as follows:

{|style="border: 1px solid lightGray"
!Normalizer||Description||
|-
|-

|Missing value calculation ||Replaces every missing value with either the mean value of that marker across all microarrays or with the mean measurement of all markers in the microarray where the missing value is observed
|-
|-
|Log2 Transformation ||Applies a log2 transformation to all measurements in a microarray
|-
|-
|-
|Threshold Normalizer ||All data points whose value is less than (or greater than) a user-specified minimum (maximum) value are raised (reduced) to that minimum (maximum) value
|-
|-
|-
|-
|Marker-based Centering ||Subtracts the mean (median) measurement of a marker profile from every measurement in the profile
|-
|-
|-
|Array-based centering ||Subtracts the mean (median) measurement of a microarray from every measurement in that microarray
|-
|-
|-
|Mean-variance normalizer ||For every marker profile, the mean measurement of the entire profile is subtracted from each measurement in the profile and the resulting value is divided by the standard deviation
|-
|-
|}
[[Image:Normalpanel.gif]]
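
Several of the normalizers in the table reduce to a few lines of arithmetic. The sketch below illustrates four of them on plain Python lists. The function names are hypothetical, and the use of the sample standard deviation in <code>mean_variance</code> is an assumption, since the table does not say which form geWorkbench uses.

```python
import math
from statistics import mean, stdev


def log2_transform(array):
    """Apply a log2 transformation to all measurements in a microarray."""
    return [math.log2(v) for v in array]


def threshold(array, minimum, maximum):
    """Raise values below `minimum` and reduce values above `maximum` to those limits."""
    return [min(max(v, minimum), maximum) for v in array]


def center_marker(profile):
    """Subtract the mean of a marker profile from every measurement in it."""
    m = mean(profile)
    return [v - m for v in profile]


def mean_variance(profile):
    """Subtract the profile mean, then divide by the (sample) standard deviation."""
    m, s = mean(profile), stdev(profile)
    return [(v - m) / s for v in profile]


# Made-up measurements.
logged = log2_transform([1.0, 8.0])
clipped = threshold([5.0, 50.0, 500.0], minimum=10.0, maximum=100.0)
centered = center_marker([1.0, 2.0, 6.0])
scaled = mean_variance([1.0, 2.0, 3.0])
```

Array-based centering works the same way as <code>center_marker</code>, applied to the measurements of one microarray instead of one marker profile.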

'''Apply Quantile Normalizer'''

1. In the Normalization Panel, select ''Quantile Normalizer''.

2. Leave the default averaging method of ''Mean Profile Marker'' to indicate handling of missing values.

3. Click '''Normalize'''. The View Area is updated to reflect normalization (after the screen has been refreshed).
Note: The first value in the second row was updated from ''41,394.6'' to ''55,779.26''.

{|style="border: 1px solid lightGray"
!PRENORMALIZATION||NORMALIZED
|-
| [[Image:Prenormalizer_ed.gif]] || [[Image:Postnormalizera.gif]]
|}

== Clustering Gene Expression Data== ken
*Hierarchical Clustering
* Self Organizing Map (SOM)

== Differential Expression ==
In this tutorial, you will:

* Get acquainted with the T Test and Multi T Test
* Apply a T Test and Multi T Test

Before you can continue, geWorkbench should be running. For help with installation, please refer to the Getting Started (INSERT LINK) section. Load the tutorial file cardiomyopathy.exp. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===T Test===

T Test analysis identifies markers with statistically significant differential expression between sets of microarrays. For each marker, the t-test determines whether there is a significant difference between the two groups (case and control). To perform this analysis, you must classify the panels, set the analysis parameters and view the results in the visualization components. The T Test parameters are described in detail in the online help.

'''Classify the Panels'''

1. Mark the '''Cardio''' phenotype as 'Case'. By default, panels are marked as control. Panels classified as Case are shown with a red thumbtack icon.
* Right-click on '''Cardio''' phenotype.
* Select '''Classification'''>'''Case'''.
2. Activate the arrays '''Normal''' and '''Cardio''' by selecting the checkboxes next to the panel name.

[[Image:T_PhenotypeSettingCase.png]]

'''Set Analysis Parameters'''
# From the Analysis Panel, select '''T-Test Analysis'''.
# Set the parameter values listed below and click '''Analyze'''.

* Alpha-corrections tab: Just Alpha.
* P-Value Parameters tab: p-values based on t-distribution. Note that the default alpha (critical p-value) is set to 0.01.
* Degree of Freedom tab: Welch approximation - unequal group variances.

[[Image:Ttest.gif]] 

'''T-Test Results'''

{|style="border: 1px solid lightGray"
!|| ||
|-
|-
| Markers which met the significance test are included in a new gene panel called “Significant Genes”. || [[Image:E_ttestgpanel.png]]
|-
|-
| An ancillary dataset is created in the project window. || [[Image:Ed_ttestproj.png]]
|-
|}

The values of the T-Test can be seen in the Color Mosaic panel and the Volcano Plot.

{|style="border: 1px solid lightGray"
!VOLCANO PLOT||COLOR MOSAIC||
|-
|-|-
|-
|-
| [[Image:Vplot.png]] || [[Image:Ed_cm.png]]
|-
|-
|-
| Clicking on any of the spots highlights the marker selected in the Marker Panel. * Insert another description ||
* The label to the right displays the significance value (the lower the value, the more likely the marker is differentially expressed) and the gene name for the displayed genes. The genes are displayed in ascending order of significance value.

* Gene height and width values can be altered to modify the display.

* The intensity slider is used to modify the intensity of the color codings.

* Accession: Includes the accession number in the label.

* Printer Icon: Prints the displayed image.

* Display: Must be toggled on to display data.

* ''Pat, Abs, Ratio Overlapping Pages Icon: Not the T Test display.''

|-
|-
|}

=== Multi T Test=== '''(IN PROGRESS)'''

== Regulatory Network ==
In this tutorial, you will:

* Create a gene network in Reverse Engineering
* View the network in Cytoscape
* Create a Marker panel from the Cytoscape network

Before you can continue, geWorkbench should be running. The file [[Webmatrix.EXP]] is used in this tutorial. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Reverse Engineering===

The Reverse Engineering component is used to analyze a large amount of microarray data to reverse engineer the underlying gene regulatory network. The details of this algorithm are described in "ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context", Califano et al. (http://arxiv.org/abs/q-bio.MN/0410037).

'''Create a Network'''
{|style="border: 1px solid lightGray"
|-
|1. In the Gene Panel, enter '''1973_s_at''' in the '''Find Next''' text box. The list box will navigate to '''1973_s_at'''; click on that gene.||[[Image: E_panel.PNG]]
|-
|2. Go to the Reverse Engineering component in the View pane. You should be in the first tab “Profiler” and in “Basic”. The marker '''1973_s_at''' should be listed in the Hub Gene box.||[[Image:E_RE1.PNG]]
|-
|3. Click on '''Analyze 2D'''. A list of markers, sorted by interaction strength, is returned below the '''Search Box'''. ||
|-
|4. Select all the genes with score >= 5 by clicking on the first marker '''[12.53] 37724_at''' and then using Shift+Down Arrow to extend the selection over all values >= 5.||[[Image: E_re2.png]]
|-
|5. Click '''Create Network'''. The resulting network is displayed in the Cytoscape view.||
|}

'''View the Network in Cytoscape'''

A full description of the capabilities and functionality of Cytoscape can be found at http://www.cytoscape.org/.

# In the Cytoscape Layout menu, select '''yFiles/ Organic''' to modify the network display.
# In this network image, select the central marker. It should turn yellow. [[Image:E_network.png]]
# In the Select menu, select Nodes>First neighbors of selected nodes>Shape>Triangle. The first neighbors are updated from a square shape to a triangle.
# Hold Ctrl and use the mouse to select the network. The selected nodes are yellow. The selected genes are returned to the Gene Panel with the name Selected Genes[Cytoscape].

== Enrichment Analysis==

In this tutorial, you will:

* Gene Ontology

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

'''Gene Ontology'''

The Gene Ontology component categorizes genes in terms of the functions their products perform, their cellular localization, or their involvement in high-level biological processes. The category definitions are provided by the Gene Ontology Consortium (http://www.geneontology.org), and we use them to generate a tree structure that approximates the Directed Acyclic Graph structure used by the Consortium.

1. Create a Marker panel with the following markers:
* AFFX-hum_alu_at
* AFFX-LysX-M_at
* AFFX-LysX-5_at
* AFFX-LysX-3_at
For assistance in creating a marker panel, please refer to Working with Markers and Phenotype Tutorial (INSERT LINK).

2. Click '''Map List.''' As the application creates a tree for the selected ontology, a pop-up window will display the progress. [[Image:E_map.png]]

3. The resulting nodes in the mapped tree will display two numbers separated by a forward slash, such as '''cell(1/6058)'''. The first number is the number of genes from the active marker panel that map to a specific category and all its child nodes. The second number, after the slash, is the same count for the reference list (or for the entire chip, depending on the reference list check-box selection and on whether a reference list is loaded from the file system).

[[Image:E_goon.png]]

== Sequence Analysis== ken
* Sequence Retrieval
* Sequence Homology Analysis
** Blast
** Other

==Pattern Discovery== ken
* Position Histogram

== Promoter Analysis==

==Integrated Annotation Information==
In this tutorial, you will:

* rrrrr
* rrrrr

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

Perform the following steps to view annotations of a dataset ('''''not recommended for large sets of genes'''''):

1. In the Marker Annotations in the View pane, click '''Use Panels.'''

A table of gene names and links to pathway diagrams is returned if available from the NCI CGAP database.

2. Clicking on gene names returns annotations from the NCI CGAP database in your default web browser window.

3. Clicking on the pathway diagram name displays the pathway diagram in the caBIO Pathways panel in the View pane.
In the caBIO Pathways viewer, pathway gene symbols can in turn be clicked to display their gene information in the default web browser window.

File:E goon.png

2006-02-24T18:34:11Z

Daly:

User:Daly

2006-02-24T18:29:38Z

Daly: /* Enrichment Analysis */

= Overview=

geWorkbench is an open-source bioinformatics platform that offers a comprehensive and extendible collection of tools for the management, analysis, visualization and annotation of biomedical data. This tool is aimed at providing researchers a centralized repository for the data analysis and visualization of gene expression data, gene sequences and protein sequences.

'''What can you do with geWorkbench?'''

* Use one program that integrates with multiple existing bioinformatics modules for analysis and visualization.

* Access remote servers and clusters for the performance of computationally intensive calculations - quicker!

* Streamline data analysis with:
** flexible import options that support merging files from various sources ''(insert possible data source ie RMA express).''
** support for a variety of genomic data including microarrays, sequences, pathways, networks, alignments and phenotypes.
** access analyses with biological annotations from the National Cancer Institute.

* Community: Insert developer benefit ( plugin)

The diagram illustrates the use of geWorkbench by Researchers.

[[Image:slide1.gif]]

For detailed system documentation, please see the Documentation section (INSERT LINK).

= Tutorial=

===Welcome! ===

Welcome to the geWorkbench tutorial. In the following tutorials you will learn how to use geWorkbench. This is an example based, illustrated guide for both beginners and experienced users. This tutorial was last updated in '''DATE''' and reflects changes to geWorkbench through '''DATE'''.

===Using the Tutorial ===

These tutorials assume that you have successfully completed the installation instructions. If you haven't installed the entire program, please go to [[Download]] and follow the installation instructions.

The tutorials are organized into a number of separate topics. Sample data sets are used throughout the tutorials. Each tutorial is self-contained and does not depend on any other tutorial for sample data. Sample data is located '''''(INSERT LINK or Instructions on how to download)'''''

'''(we should have sample data labeled Tutorial #)'''

Within the tutorials there are two basic types of text:
*Text that looks like this explains topics. 

# Text in numbered steps is instructions for you to follow using tutorial data files.

''' insert instruction on how to navigate between tutorials'''.

==Getting Started== MV
* Starting the application
* GUI elements
** Panels
** Navigation
== Loading Data== MV
* Data formats

== Working with Marker and Phenotype Panels==
In this tutorial, you will:
* Become familiar with the use of panels in geWorkbench
* Create active phenotype panels
* Classify the panels you created

Before you can continue, geWorkbench should be running. For help with this, please refer to the '''''Getting Started''''' (insertlink) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Creating Panels===

When working with microarrays, geWorkbench uses the term '''''marker''''' to refer to a gene probe (in other cases, it can be individual items from other data sets, such as sequences).
'''''Phenotype''''' refers to any user-defined grouping of microarrays. These microarrays will often share some common property that in most cases is phenotypic, although this is not a requirement. For example, one such “phenotype” might be a single experiment on a tumor tissue sample, with a second “phenotype” defined as a collection of experiments performed on normal tissue samples.

====Assign Panel====

1. In the Phenotype panel, select the following arrays, which contain samples from the congestive cardiomyopathy disease state:
JB-ccmp_0120.txt, JB-ccmp_0218.txt, JB-ccmp0718.txt, JB-ccmp0811.txt, JB-ccmp1003.txt, JB-ccmp_1109.tx

2. Right-click, select '''Add to Panel'''.

3. Enter "Cardio" in the input box and click OK.

[[Image:T_PanelLabelCardio.png]]

4. Next, similarly label the following arrays as "Normal" (''repeat steps 2 & 3''):
JB-n_0106.txt , JB-n_0821.txt, JB-n_0915.txt, JB-n_1303.txt

5. Select the checkboxes next to the panel names to mark these groups of arrays as "Active". Various analysis and visualization components can be set to only use/display activated arrays or markers.

[[Image:T_PhenotypesPriorToCase.png]]

====Classify panel====

For statistical tests such as the t-test, Case and Control groups can be specified.

# Left-click on the thumb-tack icon in front of the phenotype name.
# Select Case to specify the disease arrays as the "Case". The remaining "Normal" arrays are by default labeled control.
[[Image:T_PhenotypeSettingCase.png]]

A red thumbtack indicates the arrays have been specified as "Case".

[[Image:T_PhenotypeCaseSet.png]]

== Visualize Gene Expression==
In this tutorial, you will:

* Get acquainted with the various geWorkbench visualization tools
* View a dataset in geWorkbench
* Modify the visualization preference settings

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file '''''cardiomyopathy.exp''''' used in this tutorial is from the Load Data (insertlink) tutorial.

===Visualization Tools===

Visualization tools provide a view of the chip(s) under investigation and can be used to assess the quality of the data. Active gene and phenotype panels (insertlink) restrict the number of markers/arrays displayed. The images created can be saved and exported. A detailed description of how to manipulate the visualization components is available in the online help.

{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|'''Microarray View''': Used to inspect each separate microarray using the Array scroll bar.||[[Image: Ema.png]]
|-
|-
|'''Tabular Microarray Panel''': Presents the numerical values of the expression measurements in a table format. One row is created per individual marker/probe and one column per microarray.||[[Image:Etab.png]]
|-
|-
|-
|'''Color Mosaic''': Heat maps for microarray expression data, organized by phenotypic or gene groupings.||[[Image:Ed_cm.png]]
|-
|-
|-
|'''Expression Profiles''': A line graph of gene expression profiles across several arrays/hybridizations. Each marker is drawn as a line in a separate color.||[[Image:Eep.png]]
|-
|-
|-
|'''Scatter Plot''': A pairwise (''array vs. array and marker vs. marker'') comparison and plotting of expression values.||[[Image:Esp.png]]
|-
|-
|-
|-
|}

'''View a dataset'''
# Select '''''cardiomyopathy.exp''''' in the Project Panel.
# Select the '''''Microarray Panel''''' visualization component in the View Area at the top-right section of the interface.
# Deselect the All Markers checkbox to display the entire dataset.

Note: The '''''All Arrays''''' and '''''All Markers''''' checkboxes determine which data points are included in the display.
If neither is checked, then the entire data set is shown. The All Arrays control is useful when working with data sets comprising multiple arrays. In this case, only those arrays that are included in a currently activated phenotype panel will be displayed.

[[Image: Allm.png]]

=== Preferences===
The Preferences selection in the Tools menu allows users to specify how certain aspects of the system will behave. Once set, preferences are applied at once and persist between application sessions.

'''Modifying Settings'''

1. From the main menu, click on '''Tools>Preferences'''.

2. In the Preferences pop-up window, you can define settings for:
*Text Editor: The editor selected will be used to open and inspect data sets loaded in a project. Notepad is the default setting.
* Visualization: The color scheme to be applied to color mosaic images.
**Absolute: (default) Let M = max{|min|, |max|} over all expression measurements, across all arrays. If expression value x > 0, assign it the red spectrum x / M * 256. If expression value x is negative, assign it to the green spectrum -x / M * 256.
** Relative: This is similar to the setting for Absolute, but each marker is mean-variance normalized first.
* Genepix Value Computation: You can specify how to compute the value displayed for a Genepix array. The default setting is the option (Mean F635 - Mean B635) / (Mean F532 - Mean B532).

Select '''Relative''' for the visualization preference.

3. Click on '''OK'''.

[[Image: Preferences.gif]]
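
The Absolute color scheme described above can be sketched in a few lines of Python. This is only an illustration of the stated formula, not geWorkbench's own code: the function names are hypothetical, and clamping the result to 255 (so that it fits an 8-bit color channel) and drawing a value of exactly zero as black are assumptions that the text does not spell out.

```python
def scale_factor(values):
    """M = max{|min|, |max|} over all expression measurements."""
    return max(abs(min(values)), abs(max(values)))


def absolute_color(x, m):
    """Return an (r, g, b) tuple for expression value x under the Absolute scheme.

    Positive values go to the red channel, negative values to the green channel.
    Clamping to 255 and mapping zero to black are assumptions of this sketch.
    """
    if m <= 0 or x == 0:
        return (0, 0, 0)
    level = min(255, int(abs(x) / m * 256))
    if x > 0:
        return (level, 0, 0)
    return (0, level, 0)


# Hypothetical measurements pooled across all arrays.
values = [-200.0, -50.0, 0.0, 100.0, 400.0]
M = scale_factor(values)
colors = [absolute_color(v, M) for v in values]
```

Because M is computed once over every array, the same expression value always maps to the same color intensity, whichever array it comes from.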

== Filter and Normalize Data==
In this tutorial, you will:
* Get acquainted with the various filters and normalizers available in geWorkbench
* Apply a filter and normalizer on a tutorial dataset

Before you continue, geWorkbench should be running. For help with installation, please refer to the '''''Getting Started''''' (INSERT LINK) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Filter===

Filtering can be used to screen out missing data points, remove low quality data or reduce the size of the dataset by removing less interesting data. Available geWorkbench filters are as follows:

{|style="border: 1px solid lightGray"
!Filter||Description||
|-
|-|-
|-

|'''Affy Detection Call'''||Applicable to Affymetrix data only. Sets all measurements whose detection status is any user-defined combination of A, P or M as missing.
|-

|'''Missing values''' ||Discards all markers that have “missing” measurements in more than n microarrays, where n is defined by the user. Another filter must be applied first, however, to generate the missing values on which this filter operates.
|-
|-
|-

|'''Deviation''' ||Sets as missing all markers whose deviation across all microarrays is below a given value.
|-
|-
|-

|'''Expression Threshold''' ||Sets all markers whose measurements are inside (or outside) a user-defined range as missing.
|-
|-
|-

|'''2 Channel''' ||Applicable to 2-channel arrays (Genepix) data only. Defines applicable ranges for each channel, and sets all values for which either channel intensity is inside (or outside) the defined range as missing.
|-
|-
|-
|}

[[Image:Filterpanel.gif]]

Perform the following steps to filter out data called absent in an Affymetrix file:
# In the Filtering Panel, select '''Affy Detection Call Filter'''.
# Select the ‘A’ (Absent) checkbox and click '''Filter'''. Values that were removed (marked as missing) are highlighted in yellow.
# In the Filtering Panel, select '''Missing Values Filter'''.
# Choose the maximum number of arrays that can have missing values before a marker is removed (the default is 0).
# Click '''Filter'''. Markers with more than 0 missing values are removed. You’ll notice that the yellow values are gone.

{|style="border: 1px solid lightGray"
!'''Affy Detection Call Filter'''||'''Missing Values Filter'''||
|-
|-|-
|-
| [[Image: Filtered.gif]]||[[Image:Mvfilter.gif]]
|-
|}
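
The two-stage filtering performed in the steps above can be sketched as follows. This is a minimal illustration, not geWorkbench code: the data layout (a dictionary of marker name to a list of value/detection-call pairs), the function names, and the probe names are all hypothetical. The second function follows the tutorial steps, where a marker is removed when it has more missing values than the chosen maximum.

```python
def affy_detection_call_filter(data, calls_to_mask):
    """Mark as missing (None) every measurement whose detection call is in calls_to_mask."""
    return {
        marker: [(None if call in calls_to_mask else value, call)
                 for value, call in measurements]
        for marker, measurements in data.items()
    }


def missing_values_filter(data, max_missing=0):
    """Keep only markers with at most max_missing missing measurements."""
    kept = {}
    for marker, measurements in data.items():
        n_missing = sum(1 for value, _ in measurements if value is None)
        if n_missing <= max_missing:
            kept[marker] = measurements
    return kept


# Hypothetical data: three markers measured on three arrays.
data = {
    "probe_1": [(120.5, "P"), (98.0, "P"), (110.2, "M")],
    "probe_2": [(15.1, "A"), (80.3, "P"), (77.7, "P")],
    "probe_3": [(5.0, "A"), (4.2, "A"), (6.1, "A")],
}

masked = affy_detection_call_filter(data, {"A"})
strict = missing_values_filter(masked, max_missing=0)
lenient = missing_values_filter(masked, max_missing=1)
```

With the default maximum of 0, any marker that has even one Absent call is dropped; raising the maximum to 1 keeps markers that are Absent on a single array.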

===Normalize===
Normalization can be used to decrease the effects of systematic differences across a set of experiments.
In geWorkbench, normalization replaces the original values with new values. Available geWorkbench normalization methods are as follows:

{|style="border: 1px solid lightGray"
!Normalizer||Description||
|-
|-

|Missing value calculation ||Replaces every missing value with either the mean value of that marker across all microarrays or with the mean measurement of all markers in the microarray where the missing value is observed
|-
|-
|Log2 Transformation ||Applies a log2 transformation to all measurements in a microarray
|-
|-
|-
|Threshold Normalizer ||All data points whose value is less than (or greater than) a user-specified minimum (maximum) value are raised (reduced) to that minimum (maximum) value
|-
|-
|-
|-
|Marker-based Centering ||Subtracts the mean (median) measurement of a marker profile from every measurement in the profile
|-
|-
|-
|Array-based centering ||Subtracts the mean (median) measurement of a microarray from every measurement in that microarray
|-
|-
|-
|Mean-variance normalizer ||For every marker profile, the mean measurement of the entire profile is subtracted from each measurement in the profile and the resulting value is divided by the standard deviation
|-
|-
|}
[[Image:Normalpanel.gif]]
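
Three of the normalizers in the table can be sketched in plain Python to make their descriptions concrete. The function names are hypothetical, and the use of the sample standard deviation in the mean-variance normalizer is an assumption; the text does not say which estimator geWorkbench uses.

```python
import math
from statistics import mean, stdev


def log2_transform(array):
    """Apply a log2 transformation to every measurement in one microarray."""
    return [math.log2(v) for v in array]


def threshold_normalize(values, minimum=None, maximum=None):
    """Raise values below minimum to minimum and reduce values above maximum to maximum."""
    result = []
    for v in values:
        if minimum is not None and v < minimum:
            v = minimum
        if maximum is not None and v > maximum:
            v = maximum
        result.append(v)
    return result


def mean_variance_normalize(profile):
    """Subtract the profile mean from each measurement and divide by the standard deviation.

    Uses the sample standard deviation (an assumption of this sketch).
    """
    mu = mean(profile)
    sd = stdev(profile)
    if sd == 0:
        return [0.0 for _ in profile]
    return [(v - mu) / sd for v in profile]


logged = log2_transform([1.0, 8.0, 1024.0])
clipped = threshold_normalize([5.0, 50.0, 500.0], minimum=10.0, maximum=100.0)
scaled = mean_variance_normalize([2.0, 4.0, 6.0])
```

Note that the log2 transformation is only defined for positive values, so in practice a threshold normalizer with a positive minimum is often applied first.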

'''Apply Quantile Normalizer'''

1. In the Normalization Panel, select ''Quantile Normalizer''.

2. Leave the averaging method at its default, ''Mean Profile Marker'', which determines how missing values are handled.

3. Click '''Normalize'''. The View Area is updated to reflect normalization (after the screen has been refreshed).
Note: The first value in the second row was updated from ''41,394.6'' to ''55,779.26''.

{|style="border: 1px solid lightGray"
!PRENORMALIZATION||NORMALIZED||
|-
|-|-
|-
| [[Image:Prenormalizer_ed.gif ]] || [[Image: Postnormalizera.gif ]]
|-
|-
|-
|}
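
For readers unfamiliar with quantile normalization, the sketch below shows the textbook form of the technique: each array is sorted, the values at each rank are averaged across arrays, and every measurement is replaced by the average for its rank. It is not taken from geWorkbench. In particular, it does not handle missing values (the role of the averaging method option above), and ties are broken by position, which is an assumption of this sketch.

```python
def quantile_normalize(arrays):
    """Give every array the same distribution: the mean of the sorted arrays, rank by rank.

    arrays is a list of equal-length lists, one list per microarray.
    """
    n_markers = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [
        sum(s[rank] for s in sorted_arrays) / len(arrays)
        for rank in range(n_markers)
    ]
    normalized = []
    for a in arrays:
        # Indices of the markers in ascending order of their value on this array.
        order = sorted(range(n_markers), key=lambda i: a[i])
        out = [0.0] * n_markers
        for rank, index in enumerate(order):
            out[index] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical data: two arrays, three markers each.
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(arrays)
```

After normalization every array contains the same set of values, which is why individual measurements can change substantially, as in the example noted above.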

== Clustering Gene Expression Data== ken
*Hierarchical Clustering
* Self Organizing Map (SOM)

== Differential Expression ==
In this tutorial, you will:

* Get acquainted with the T Test and Multi T Test
* Apply a T Test and Multi T Test

Before you continue, geWorkbench should be running. For help with installation, please refer to the Getting Started (INSERT LINK) section. Load the tutorial file cardiomyopathy.exp. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===T Test===

T Test analysis identifies markers with statistically significant differential expression between sets of microarrays. For each marker, the t-test determines whether there is a significant difference between the two groups (case and control). To perform this analysis, you must classify the panels, set the analysis parameters and view the results in the visualization components. A detailed description of the T Test parameters is available in the online help.

'''Classify the Panels'''

1. Mark the '''Cardio''' phenotype as 'Case'. By default, panels are marked as Control. Panels classified as Case are shown with a red thumbtack icon.
* Right-click on '''Cardio''' phenotype.
* Select '''Classification'''>'''Case'''.
2. Activate the arrays '''Normal''' and '''Cardio''' by selecting the checkboxes next to the panel name.

[[Image:T_PhenotypeSettingCase.png]]

'''Set Analysis Parameters'''
# From the Analysis Panel, select '''T-Test Analysis'''.
# Set the parameter values listed below and click on '''Analyze'''.

* Alpha-corrections tab: Just Alpha.
* P-Value Parameters tab: p-values based on t-distribution. Note that the default alpha (critical p-value) is set to 0.01.
* Degree of Freedom tab: Welch approximation - unequal group variances.

[[Image:Ttest.gif]] 
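
The Welch approximation selected above corresponds to the standard unequal-variance t-test. The sketch below computes the t statistic and the Welch-Satterthwaite degrees of freedom for a single marker. It is an illustration of the standard formulas rather than geWorkbench's implementation; the function name and the sample values are hypothetical, and the conversion of the statistic to a p-value (which requires the t-distribution) is deliberately left out.

```python
import math
from statistics import mean, variance


def welch_t(case, control):
    """Return (t, df) for the unequal-variance (Welch) t-test on one marker."""
    n1, n2 = len(case), len(control)
    # Squared standard error contributed by each group.
    se1 = variance(case) / n1
    se2 = variance(control) / n2
    t = (mean(case) - mean(control)) / math.sqrt(se1 + se2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical expression values for one marker.
case_values = [10.0, 12.0, 14.0]
control_values = [1.0, 2.0, 3.0, 4.0, 5.0]
t_stat, dof = welch_t(case_values, control_values)
```

The degrees of freedom are generally not an integer and never exceed n1 + n2 - 2, the value used when equal group variances are assumed.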

'''T-Test Results'''

{|style="border: 1px solid lightGray"
!|| ||
|-
|-
| Markers which met the significance test are included in a new gene panel called “Significant Genes”. || [[Image:E_ttestgpanel.png]]
|-
|-
| An ancillary dataset is created in the project window. || [[Image:Ed_ttestproj.png]]
|-
|}

The values of the T-Test can be seen in the Color Mosaic panel and the Volcano Plot.

{|style="border: 1px solid lightGray"
!VOLCANO PLOT||COLOR MOSAIC||
|-
|-|-
|-
|-
| [[Image:Vplot.png]] || [[Image:Ed_cm.png]]
|-
|-
|-
| Clicking on any of the spots highlights the marker selected in the Marker Panel. * Insert another description ||
* The label to the right displays the Significance value (the lower the value, the more likely the marker is differentially expressed) and the gene name for each displayed gene. The genes are displayed in ascending order by Significance Value.

* Gene height and width values can be altered to modify the display.

* The intensity slider is used to modify the intensity of the color codings.

* Accession: Includes the accession number in the label.

* Printer Icon: Prints the displayed image.

* Display: Must be toggled on to display data.

* ''Pat, Abs, Ratio Overlapping Pages Icon: Not the T Test display.''

|-
|-
|}

=== Multi T Test=== '''(IN PROGRESS)'''

== Regulatory Network ==
In this tutorial, you will:

* Create a gene network in Reverse Engineering
* View the network in Cytoscape
* Create a Marker panel from the Cytoscape network

Before you continue, geWorkbench should be running. The file [[Webmatrix.EXP]] is used in this tutorial. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Reverse Engineering===

The Reverse Engineering component is used to analyze a large amount of microarray data to reverse engineer the underlying gene regulatory network. The details of this algorithm are described in "ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context", Califano et al. (http://arxiv.org/abs/q-bio.MN/0410037).

'''Create a Network'''
{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|1. In the Gene Panel, enter '''1973_s_at''' in the '''Find Next''' text box. The list box will scroll to '''1973_s_at'''; click on that gene.||[[Image: E_panel.PNG]]

|-
|2. Go to the Reverse Engineering component in the View pane. You should be in the first tab, “Profiler”, and in “Basic”. The marker '''1973_s_at''' should be listed in the Hub Gene box.||[[Image:E_RE1.PNG]]
|-
|3. Click on '''Analyze 2D'''. A list of markers is returned sorted by interaction strength below '''Search Box'''. ||
|-
|-
|-
|4. Select all the genes with score >= 5 by clicking on the first marker '''[12.53] 37724_at''', using Shift-down arrow to highlight the values >= 5.||[[Image: E_re2.png]]
|-
|-
|-
|5. Click '''Create Network'''. The network created is displayed in the Cytoscape view.||
|-
|-
|}
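
The selection made in step 4 amounts to keeping every marker whose interaction score is at least 5. As a small illustration, the sketch below parses result labels of the form shown in that step and applies the cutoff. Only the first label comes from this tutorial; the other two labels, the function name and the regular expression are hypothetical.

```python
import re

# Matches labels such as "[12.53] 37724_at": a bracketed score followed by a marker name.
LABEL_PATTERN = re.compile(r"^\[(?P<score>[-+]?\d+(?:\.\d+)?)\]\s+(?P<marker>\S+)$")


def select_by_score(labels, threshold=5.0):
    """Return the marker names whose score is greater than or equal to threshold."""
    selected = []
    for label in labels:
        match = LABEL_PATTERN.match(label.strip())
        if match and float(match.group("score")) >= threshold:
            selected.append(match.group("marker"))
    return selected


labels = [
    "[12.53] 37724_at",          # from the tutorial
    "[7.10] hypothetical_a_at",  # made-up example
    "[4.95] hypothetical_b_at",  # made-up example
]
hub_neighbors = select_by_score(labels, threshold=5.0)
```

Because the list is sorted by interaction strength, the markers that pass the cutoff always form a contiguous block at the top, which is why Shift-selecting from the first entry works in the interface.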

'''View the Network in Cytoscape'''

A full description of the capabilities and functionality of Cytoscape can be found at http://www.cytoscape.org/.

# In the Cytoscape Layout menu, select '''yFiles/ Organic''' to modify the network display.
# In this network image, select the central marker. It should turn yellow. [[Image:E_network.png]]
# In the Select menu, select Nodes>First neighbors of selected nodes>Shape>Triangle. The first neighbors are changed from a square shape to a triangle.
# Ctrl+mouse select the network. The selected nodes turn yellow. The selected genes are returned to the Gene Panel under the name Selected Genes[Cytoscape].

== Enrichment Analysis==

In this tutorial, you will:

* Gene Ontology

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

'''Gene Ontology'''

The Gene Ontology component categorizes genes in terms of the functions their products perform, their cellular localization, or their involvement in high-level biological processes. The category definitions are provided by the Gene Ontology Consortium (http://www.geneontology.org), which geWorkbench uses to generate a tree structure that approximates the Directed Acyclic Graph structure used by the Consortium.

1. Create a Marker panel with the following markers:
* AFFX-hum_alu_at
* AFFX-LysX-M_at
* AFFX-LysX-5_at
* AFFX-LysX-3_at
For assistance in creating a marker panel, please refer to Working with Markers and Phenotype Tutorial (INSERT LINK).

2. Click '''Map List.''' As the application creates a tree for the selected ontology, a pop-up window will display the progress. [[Image:E_map.png]]

3. The resulting nodes in the mapped tree display two numbers separated by a forward slash, such as '''cell(1/6058)'''. The first number is the number of genes from the active marker panel that are mapped to a specific category or any of its child nodes. The second number, after the slash, is the corresponding count for the reference list (or for the entire chip, depending on the reference list checkbox selection and on whether a reference list has been loaded from the file system).

[[Image:E_go.png]]
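
The two numbers in each node label can be reproduced with a short recursive count. The sketch below uses a toy ontology; the category names other than "cell", the gene identifiers, and the function names are all hypothetical, and the real component works from the Gene Ontology Consortium definitions rather than from hand-written dictionaries.

```python
def genes_under(category, children, annotations):
    """Collect the genes annotated to a category or to any of its descendants."""
    genes = set(annotations.get(category, ()))
    for child in children.get(category, ()):
        genes |= genes_under(child, children, annotations)
    return genes


def node_label(category, children, annotations, panel, reference):
    """Build a label of the form name(panel count/reference count)."""
    mapped = genes_under(category, children, annotations)
    return f"{category}({len(mapped & panel)}/{len(mapped & reference)})"


# Toy ontology: "cell" has two child categories.
children = {"cell": ["membrane", "nucleus"], "membrane": [], "nucleus": []}
annotations = {
    "cell": {"g1"},
    "membrane": {"g2", "g3"},
    "nucleus": {"g3", "g4"},
}
panel = {"g2", "g9"}                        # active marker panel
reference = {"g1", "g2", "g3", "g4", "g5"}  # reference list or entire chip

cell_label = node_label("cell", children, annotations, panel, reference)
membrane_label = node_label("membrane", children, annotations, panel, reference)
```

Using sets means that a gene annotated to several child categories (g3 here) is counted only once at the parent node.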

== Sequence Analysis== ken
* Sequence Retrieval
* Sequence Homology Analysis
** Blast
** Other

==Pattern Discovery== ken
* Position Histogram

== Promoter Analysis==

==Integrated Annotation Information==
In this tutorial, you will:

* rrrrr
* rrrrr

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

Perform the following steps to view annotations of a dataset (''not recommended for large sets of genes''):

1. In the Marker Annotations component in the View pane, click '''Use Panels.'''

A table of gene names and links to pathway diagrams is returned if available from the NCI CGAP database.

2. Clicking on gene names returns annotations from the NCI CGAP database in your default web browser window.

3. Clicking on a pathway diagram name displays the pathway diagram in the caBIO Pathways panel in the View pane.
In the caBIO Pathways viewer, pathway gene symbols can in turn be clicked to display their gene information in the default web browser window.

User:Daly

2006-02-24T18:28:12Z

Daly:

= Overview=

geWorkbench is an open-source bioinformatics platform that offers a comprehensive and extendible collection of tools for the management, analysis, visualization and annotation of biomedical data. This tool is aimed at providing researchers a centralized repository for the data analysis and visualization of gene expression data, gene sequences and protein sequences.

'''What can you do with geWorkbench?'''

* Use one program that integrates with multiple existing bioinformatics modules for analysis and visualization.

* Access remote servers and clusters for the performance of computationally intensive calculations - quicker!

* Streamline data analysis with:
** flexible import options that support merging files from various sources ''(insert possible data source ie RMA express).''
** support for a variety of genomic data including microarrays, sequences, pathways, networks, alignments and phenotypes.
** access analyses with biological annotations from the National Cancer Institute.

* Community: Insert developer benefit ( plugin)

The diagram illustrates the use of geWorkbench by Researchers.

[[Image:slide1.gif]]

For detailed system documentation, please see the Documentation section (INSERT LINK).

= Tutorial=

===Welcome! ===

Welcome to the geWorkbench tutorial. In the following tutorials you will learn how to use geWorkbench. This is an example-based, illustrated guide for both beginners and experienced users. This tutorial was last updated in '''DATE''' and reflects changes to geWorkbench through '''DATE'''.

===Using the Tutorial ===

These tutorials assume that you have successfully completed the installation instructions. If you haven't installed the entire program, please go to [[Download]] and follow the installation instructions.

The tutorials are organized into a number of separate topics. Sample data sets are used throughout the tutorials. Each tutorial is self-contained and does not depend on any other tutorial for sample data. Sample data is located '''''(INSERT LINK or Instructions on how to download)'''''

'''(we should have sample data labeled Tutorial #)'''

Within the tutorials there are two basic types of text:
*Text that looks like this explains topics. 

# Text in numbered steps is instructions for you to follow using tutorial data files.

''' insert instruction on how to navigate between tutorials'''.

==Getting Started== MV
* Starting the application
* GUI elements
** Panels
** Navigation
== Loading Data== MV
* Data formats

== Working with Marker and Phenotype Panels==
In this tutorial, you will:
* Become familiar with the use of panels in geWorkbench
* Create active phenotype panels
* Classify the panels you created

Before you continue, geWorkbench should be running. For help with this, please refer to the '''''Getting Started''''' (insertlink) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Creating Panels===

When working with microarrays, geWorkbench uses the term '''''marker''''' to refer to a gene probe (in other contexts, a marker can be an individual item from another kind of data set, such as a sequence).
'''''Phenotype''''' refers to any user-defined grouping of microarrays. These microarrays will often share some common property that in most cases is phenotypic, although this is not a requirement. For example, one such “phenotype” might be a single experiment on a tumor tissue sample, with a second “phenotype” defined as a collection of experiments performed on normal tissue samples.

====Assign Panel====

1. In the Phenotype panel, select the following arrays, which contain samples from the congestive cardiomyopathy disease state:
JB-ccmp_0120.txt, JB-ccmp_0218.txt, JB-ccmp0718.txt, JB-ccmp0811.txt, JB-ccmp1003.txt, JB-ccmp_1109.tx

2. Right-click, select '''Add to Panel'''.

3. Enter "Cardio" in the input box and click OK.

[[Image:T_PanelLabelCardio.png]]

4. Next, label the following arrays as "Normal" in the same way ('' repeat steps 2 & 3 '').
JB-n_0106.txt , JB-n_0821.txt, JB-n_0915.txt, JB-n_1303.txt

5. Select the checkboxes next to the panel names to mark these groups of arrays as "Active". Various analysis and visualization components can be set to use or display only activated arrays or markers.

[[Image:T_PhenotypesPriorToCase.png]]

====Classify panel====

For statistical tests such as the t-test, Case and Control groups can be specified.

# Left-click on the thumb-tack icon in front of the phenotype name.
# Select Case to specify the disease arrays as the "Case". The remaining "Normal" arrays are by default labeled control.
[[Image:T_PhenotypeSettingCase.png]]

A red thumbtack indicates the arrays have been specified as "Case".

[[Image:T_PhenotypeCaseSet.png]]

== Visualize Gene Expression==
In this tutorial, you will:

* Get acquainted with the various geWorkbench visualization tools
* View a dataset in geWorkbench
* Modify the visualization preference settings

Before you continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file '''''cardiomyopathy.exp''''' used in this tutorial is from the Load Data (insertlink) tutorial.

===Visualization Tools===

Visualization tools provide a view of the chip(s) under investigation and can be used to assess the quality of the data. Active gene and phenotype panels (insertlink) restrict the number of markers/arrays displayed. The images created can be saved and exported. A detailed description of how to manipulate the visualization components is available in the online help.

{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|'''Microarray View''': Used to inspect each separate microarray using the Array scroll bar.||[[Image: Ema.png]]
|-
|-
|'''Tabular Microarray Panel''': Presents the numerical values of the expression measurements in a table format. One row is created per individual marker/probe and one column per microarray.||[[Image:Etab.png]]
|-
|-
|-
|'''Color Mosaic''': Heat maps for microarray expression data, organized by phenotypic or gene groupings.||[[Image:Ed_cm.png]]
|-
|-
|-
|'''Expression Profiles''': A line graph of gene expression profiles across several arrays/hybridizations. Each marker is drawn as a line in a separate color.||[[Image:Eep.png]]
|-
|-
|-
|'''Scatter Plot''': A pairwise (''array vs. array and marker vs. marker'') comparison and plotting of expression values.||[[Image:Esp.png]]
|-
|-
|-
|-
|}

'''View a dataset'''
# Select '''''cardiomyopathy.exp''''' in the Project Panel.
# Select the '''''Microarray Panel''''' visualization component in the View Area at the top-right section of the interface.
# Deselect the All Markers checkbox to display the entire dataset.

Note: The '''''All Arrays''''' and '''''All Markers''''' checkboxes determine which data points are included in the display.
If neither is checked, then the entire data set is shown. The All Arrays control is useful when working with data sets comprising multiple arrays. In this case, only those arrays that are included in a currently activated phenotype panel will be displayed.

[[Image: Allm.png]]

=== Preferences===
The Preferences selection in the Tools menu allows users to specify how certain aspects of the system will behave. Once set, preferences are applied at once and persist between application sessions.

'''Modifying Settings'''

1. From the main menu, click on '''Tools>Preferences'''.

2. In the Preferences pop-up window, you can define settings for:
*Text Editor: The editor selected will be used to open and inspect data sets loaded in a project. Notepad is the default setting.
* Visualization: The color scheme to be applied to color mosaic images.
**Absolute: (default) Let M = max{|min|, |max|} over all expression measurements, across all arrays. If expression value x > 0, assign it the red spectrum x / M * 256. If expression value x is negative, assign it to the green spectrum -x / M * 256.
** Relative: This is similar to the setting for Absolute, but each marker is mean-variance normalized first.
* Genepix Value Computation: You can specify how to compute the value displayed for a Genepix array. The default setting is the option (Mean F635 - Mean B635) / (Mean F532 - Mean B532).

Select '''Relative''' for the visualization preference.

3. Click on '''OK'''.

[[Image: Preferences.gif]]
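
To make the Relative setting chosen in this exercise concrete, the sketch below first mean-variance normalizes each marker and then applies the red/green mapping described for Absolute. It is an illustration only: the function names are hypothetical, and the use of the sample standard deviation, the clamping of intensities to 255, and the drawing of zero as black are assumptions not stated in the text.

```python
from statistics import mean, stdev


def normalize_marker(profile):
    """Mean-variance normalize one marker profile (sample standard deviation assumed)."""
    mu = mean(profile)
    sd = stdev(profile)
    if sd == 0:
        return [0.0 for _ in profile]
    return [(v - mu) / sd for v in profile]


def relative_colors(matrix):
    """Return an (r, g, b) tuple per measurement; rows of matrix are markers."""
    normalized = [normalize_marker(row) for row in matrix]
    m = max(abs(v) for row in normalized for v in row)
    colors = []
    for row in normalized:
        color_row = []
        for v in row:
            level = 0 if m == 0 else min(255, int(abs(v) / m * 256))
            if v > 0:
                color_row.append((level, 0, 0))   # above the marker mean: red
            elif v < 0:
                color_row.append((0, level, 0))   # below the marker mean: green
            else:
                color_row.append((0, 0, 0))
        colors.append(color_row)
    return colors


# Hypothetical data: two markers measured on three arrays.
matrix = [[1.0, 2.0, 3.0], [10.0, 10.0, 10.0]]
mosaic = relative_colors(matrix)
```

Under this scheme a marker's color reflects how far each measurement lies from that marker's own mean, so markers with very different overall expression levels become directly comparable in the mosaic.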

== Filter and Normalize Data==
In this tutorial, you will:
* Get acquainted with the various filters and normalizers available in geWorkbench
* Apply a filter and normalizer on a tutorial dataset

Before you continue, geWorkbench should be running. For help with installation, please refer to the '''''Getting Started''''' (INSERT LINK) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Filter===

Filtering can be used to screen out missing data points, remove low quality data or reduce the size of the dataset by removing less interesting data. Available geWorkbench filters are as follows:

{|style="border: 1px solid lightGray"
!Filter||Description||
|-
|-|-
|-

|'''Affy Detection Call'''||Applicable to Affymetrix data only. Sets all measurements whose detection status is any user-defined combination of A, P or M as missing.
|-

|'''Missing values''' ||Discards all markers that have “missing” measurements in more than n microarrays, where n is defined by the user. Another filter must be applied first, however, to generate the missing values on which this filter operates.
|-
|-
|-

|'''Deviation''' ||Sets as missing all markers whose deviation across all microarrays is below a given value.
|-
|-
|-

|'''Expression Threshold''' ||Sets all markers whose measurements are inside (or outside) a user-defined range as missing.
|-
|-
|-

|'''2 Channel''' ||Applicable to 2-channel arrays (Genepix) data only. Defines applicable ranges for each channel, and sets all values for which either channel intensity is inside (or outside) the defined range as missing.
|-
|-
|-
|}

[[Image:Filterpanel.gif]]

Perform the following steps to filter out data called absent in an Affymetrix file:
# In the Filtering Panel, select '''Affy Detection Call Filter'''.
# Select the ‘A’ (Absent) checkbox and click '''Filter'''. Values that were removed (marked as missing) are highlighted in yellow.
# In the Filtering Panel, select '''Missing Values Filter'''.
# Choose the maximum number of arrays that can have missing values before a marker is removed (the default is 0).
# Click '''Filter'''. Markers with more than 0 missing values are removed. You’ll notice that the yellow values are gone.

{|style="border: 1px solid lightGray"
!'''Affy Detection Call Filter'''||'''Missing Values Filter'''||
|-
|-|-
|-
| [[Image: Filtered.gif]]||[[Image:Mvfilter.gif]]
|-
|}

===Normalize===
Normalization can be used to decrease the effects of systematic differences across a set of experiments.
In geWorkbench, normalization replaces the original values with new values. Available geWorkbench normalization methods are as follows:

{|style="border: 1px solid lightGray"
!Normalizer||Description||
|-
|-

|Missing value calculation ||Replaces every missing value with either the mean value of that marker across all microarrays or with the mean measurement of all markers in the microarray where the missing value is observed
|-
|-
|Log2 Transformation ||Applies a log2 transformation to all measurements in a microarray
|-
|-
|-
|Threshold Normalizer ||All data points whose value is less than (or greater than) a user-specified minimum (maximum) value are raised (reduced) to that minimum (maximum) value
|-
|-
|-
|-
|Marker-based Centering ||Subtracts the mean (median) measurement of a marker profile from every measurement in the profile
|-
|-
|-
|Array-based centering ||Subtracts the mean (median) measurement of a microarray from every measurement in that microarray
|-
|-
|-
|Mean-variance normalizer ||For every marker profile, the mean measurement of the entire profile is subtracted from each measurement in the profile and the resulting value is divided by the standard deviation
|-
|-
|}
[[Image:Normalpanel.gif]]
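
The difference between marker-based and array-based centering is only the direction in which the mean (or median) is taken. The sketch below shows both on a small matrix whose rows are markers and whose columns are arrays. The function names and the data are hypothetical; this is an illustration of the table descriptions, not geWorkbench code.

```python
from statistics import mean, median


def center_markers(matrix, use_median=False):
    """Subtract each marker's (row's) mean or median from every measurement in that row."""
    center = median if use_median else mean
    return [[v - center(row) for v in row] for row in matrix]


def center_arrays(matrix, use_median=False):
    """Subtract each array's (column's) mean or median from every measurement in that column."""
    center = median if use_median else mean
    n_arrays = len(matrix[0])
    offsets = [center([row[j] for row in matrix]) for j in range(n_arrays)]
    return [[row[j] - offsets[j] for j in range(n_arrays)] for row in matrix]


# Hypothetical data: two markers (rows) measured on three arrays (columns).
matrix = [[1.0, 2.0, 6.0], [4.0, 4.0, 10.0]]
by_marker_mean = center_markers(matrix)
by_marker_median = center_markers(matrix, use_median=True)
by_array_mean = center_arrays(matrix)
```

Marker-based centering removes differences in baseline expression between genes, whereas array-based centering removes overall intensity differences between chips.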

'''Apply Quantile Normalizer'''

1. In the Normalization Panel, select ''Quantile Normalizer''.

2. Leave the averaging method at its default, ''Mean Profile Marker'', which determines how missing values are handled.

3. Click '''Normalize'''. The View Area is updated to reflect normalization (after the screen has been refreshed).
Note: The first value in the second row was updated from ''41,394.6'' to ''55,779.26''.

{|style="border: 1px solid lightGray"
!PRENORMALIZATION||NORMALIZED||
|-
|-|-
|-
| [[Image:Prenormalizer_ed.gif ]] || [[Image: Postnormalizera.gif ]]
|-
|-
|-
|}

== Clustering Gene Expression Data== ken
*Hierarchical Clustering
* Self Organizing Map (SOM)

== Differential Expression ==
In this tutorial, you will:

* Get acquainted with the T Test and Multi T Test
* Apply a T Test and Multi T Test

Before you continue, geWorkbench should be running. For help with installation, please refer to the Getting Started (INSERT LINK) section. Load the tutorial file cardiomyopathy.exp. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===T Test===

T Test analysis identifies markers with statistically significant differential expression between sets of microarrays. For each marker, the t-test determines whether there is a significant difference between the two groups (case and control). To perform this analysis, you must classify the panels, set the analysis parameters and view the results in the visualization components. A detailed description of the T Test parameters is available in the online help.

'''Classify the Panels'''

1. Mark the '''Cardio''' phenotype as 'Case'. By default, panels are marked as Control. Panels classified as Case are shown with a red thumbtack icon.
* Right-click on '''Cardio''' phenotype.
* Select '''Classification'''>'''Case'''.
2. Activate the arrays '''Normal''' and '''Cardio''' by selecting the checkboxes next to the panel name.

[[Image:T_PhenotypeSettingCase.png]]

'''Set Analysis Parameters'''
# From the Analysis Panel, select '''T-Test Analysis'''.
# Set the parameter values listed below and click on '''Analyze'''.

* Alpha-corrections tab: Just Alpha.
* P-Value Parameters tab: p-values based on t-distribution. Note that the default alpha (critical p-value) is set to 0.01.
* Degree of Freedom tab: Welch approximation - unequal group variances.

[[Image:Ttest.gif]] 

'''T-Test Results'''

{|style="border: 1px solid lightGray"
!|| ||
|-
|-
| Markers which met the significance test are included in a new gene panel called “Significant Genes”. || [[Image:E_ttestgpanel.png]]
|-
|-
| An ancillary dataset is created in the project window. || [[Image:Ed_ttestproj.png]]
|-
|}
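
With the "Just Alpha" setting, no multiple-testing correction is applied, so building the significant-gene panel reduces to comparing each marker's p-value with alpha. The sketch below illustrates this. The function name, the marker names and the p-values are hypothetical, and treating a p-value exactly equal to alpha as not significant is an assumption of this sketch.

```python
def significant_markers(p_values, alpha=0.01):
    """Return (marker, p-value) pairs with p < alpha, most significant first."""
    hits = [(marker, p) for marker, p in p_values.items() if p < alpha]
    return sorted(hits, key=lambda pair: pair[1])


# Hypothetical per-marker p-values from a t-test.
p_values = {
    "marker_a": 0.0004,
    "marker_b": 0.2500,
    "marker_c": 0.0090,
    "marker_d": 0.0100,
}
panel = significant_markers(p_values, alpha=0.01)
```

Sorting by ascending p-value puts the markers with the strongest evidence of differential expression at the top of the list.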

The values of the T-Test can be seen in the Color Mosaic panel and the Volcano Plot.

{|style="border: 1px solid lightGray"
!VOLCANO PLOT||COLOR MOSAIC||
|-
|-|-
|-
|-
| [[Image:Vplot.png]] || [[Image:Ed_cm.png]]
|-
|-
|-
| Clicking on any of the spots highlights the marker selected in the Marker Panel. * Insert another description ||
* The label to the right displays the Significance value (the lower the value, the more likely the marker is differentially expressed) and the gene name for each displayed gene. The genes are displayed in ascending order by Significance Value.

* Gene height and width values can be altered to modify the display.

* The intensity slider is used to modify the intensity of the color codings.

* Accession: Includes the accession number in the label.

* Printer Icon: Prints the displayed image.

* Display: Must be toggled on to display data.

* ''Pat, Abs, Ratio Overlapping Pages Icon: Not the T Test display.''

|-
|-
|}

=== Multi T Test=== '''(IN PROGRESS)'''

== Regulatory Network ==
In this tutorial, you will:

* Create a gene network in Reverse Engineering
* View the network in Cytoscape
* Create a Marker panel from the Cytoscape network

Before you can continue, geWorkbench should be running. The file [[Webmatrix.EXP]] is used in this tutorial. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Reverse Engineering===

The Reverse Engineering component is used to analyze a large amount of microarray data to reverse engineer the underlying gene regulatory network. The details of this algorithm are described in "ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context", Califano et al. (http://arxiv.org/abs/q-bio.MN/0410037).

'''Create a Network'''
{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|1. In the Gene Panel, enter '''1973_s_at''' in the '''Find Next''' text box. The list box will navigate to '''1973_s_at'''; click on that gene.||[[Image: E_panel.PNG]]

|-
|2. Go to the Reverse Engineering component in the View pane. You should be in the first tab, “Profiler”, and in “Basic”. The marker '''1973_s_at''' should be listed in the Hub Gene box.||[[Image:E_RE1.PNG]]
|-
|3. Click on '''Analyze 2D'''. A list of markers, sorted by interaction strength, is returned below the '''Search Box'''. ||
|-
|-
|-
|4. Select all the genes with score >= 5 by clicking on the first marker, '''[12.53] 37724_at''', then using Shift+Down Arrow to extend the selection through the values >= 5.||[[Image: E_re2.png]]
|-
|-
|-
|5. Click '''Create Network'''. The network created is displayed in the Cytoscape view.||
|-
|-
|}

'''View the Network in Cytoscape'''

A full description of the capabilities and functionality of Cytoscape can be found at http://www.cytoscape.org/.

# In the Cytoscape Layout menu, select '''yFiles/ Organic''' to modify the network display.
# In this network image, select the central marker. It should turn yellow. [[Image:E_network.png]]
# In the Select menu, select Nodes>First neighbors of selected nodes>Shape>Triangle. The first neighbors are updated from a square shape to a triangle.
# Hold Ctrl and use the mouse to select the network. The selected nodes turn yellow. The selected genes are returned to the Gene Panel under the name Selected Genes[Cytoscape].

== Enrichment Analysis==

In this tutorial, you will:

* Gene Ontology

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

'''Gene Ontology'''

The Gene Ontology component categorizes genes in terms of the functions their products perform, their cellular localization, or their involvement in high-level biological processes. The category definitions are provided by the Gene Ontology Consortium (http://www.geneontology.org), which we use to generate a tree structure that approximates the Directed Acyclic Graph structure used by the Consortium.

1. Create a Marker panel with the following markers:
* AFFX-hum_alu_at
* AFFX-LysX-M_at
* AFFX-LysX-5_at
* AFFX-LysX-3_at
For assistance in creating a marker panel, please refer to Working with Markers and Phenotype Tutorial (INSERT LINK).

2. Click '''Map List.''' The application creates a tree for the selected ontology. [[Image:E_map.png]]

3. The resulting nodes in the mapped tree display two numbers separated by a forward slash, such as '''cell(1/6058)'''. The first number is the number of genes from the active marker panel that map to a specific category or any of its child nodes. The second number, after the slash, is the same count for the reference list (or for the entire chip, depending on the reference list checkbox selection and whether a reference list has been loaded from the file system).

[[Image:E_go.png]]

== Sequence Analysis== ken
* Sequence Retrieval
* Sequence Homology Analysis
** Blast
** Other

==Pattern Discovery== ken
* Position Histogram

== Promoter Analysis==

==Integrated Annotation Information==
In this tutorial, you will:

* rrrrr
* rrrrr

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

Perform the following steps to view annotations of a dataset ('''''Not recommended for large sets of genes'''''):

1. In the Marker Annotations in the View pane, click '''Use Panels.'''

A table of gene names and links to pathway diagrams is returned if available from the NCI CGAP database.

2. Clicking on gene names returns annotations from the NCI CGAP database in your default web browser window.

3. Clicking on the pathway diagram name will display a pathway diagram in the caBIO Pathways panel in the View pane.
In the caBIO Pathways viewer, pathway gene symbols can in turn be clicked to display their gene information in the default web browser window.

File:E map.png

2006-02-24T18:27:43Z

Daly:

User:Daly

2006-02-24T17:15:08Z

Daly: /* Enrichment Analysis */

= Overview=

geWorkbench is an open-source bioinformatics platform that offers a comprehensive and extendible collection of tools for the management, analysis, visualization and annotation of biomedical data. This tool is aimed at providing researchers a centralized repository for the data analysis and visualization of gene expression data, gene sequences and protein sequences.

'''What can you do with geWorkbench?'''

* Use one program that integrates with multiple existing bioinformatics modules for analysis and visualization.

* Access remote servers and clusters to run computationally intensive calculations more quickly.

* Streamline data analysis with:
** flexible import options that support merging files from various sources ''(insert possible data source ie RMA express).''
** support for a variety of genomic data including microarrays, sequences, pathways, networks, alignments and phenotypes.
** access analyses with biological annotations from the National Cancer Institute.

* Community: Insert developer benefit ( plugin)

The diagram illustrates the use of geWorkbench by Researchers.

[[Image:slide1.gif]]

For detailed system documentation, please see the Documentation section (INSERT LINK).

= Tutorial=

===Welcome! ===

Welcome to the geWorkbench tutorial. In the following tutorials you will learn how to use geWorkbench. This is an example-based, illustrated guide for both beginners and experienced users. This tutorial was last updated in '''DATE''' and reflects changes to geWorkbench through '''DATE'''.

===Using the Tutorial ===

These tutorials assume that you have successfully completed the installation instructions. If you haven't installed the entire program, please go to [[Download]] and follow the installation instructions.

The tutorials are organized into a number of separate topics. Sample data sets are used throughout the tutorials. Each tutorial is self-contained and does not depend on any other tutorial for sample data. Sample data is located '''''(INSERT LINK or Instructions on how to download)'''''

'''(we should have sample data labeled Tutorial #)'''

Within the tutorials there are two basic types of text:
*Text that looks like this explains topics. 

# Text in numbered steps is instructions for you to follow using tutorial data files.

''' insert instruction on how to navigate between tutorials'''.

==Getting Started== MV
* Starting the application
* GUI elements
** Panels
** Navigation
== Loading Data== MV
* Data formats

== Working with Marker and Phenotype Panels==
In this tutorial, you will:
* Become familiar with the use of panels in geWorkbench
* Create active phenotype panels
* Classify the panels you created

Before you can continue, geWorkbench should be running. For help with this, please refer to the '''''Getting Started''''' (insertlink) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Creating Panels===

When working with microarrays, geWorkbench uses the term '''''marker''''' to refer to a gene probe (in other cases, it can be individual items from other data sets, such as sequences).
'''''Phenotype''''' refers to any user-defined grouping of microarrays. These microarrays will often share some common property that in most cases is phenotypic, although this is not a requirement. For example, one such “phenotype” might be a single experiment on a tumor tissue sample, with a second “phenotype” defined as a collection of experiments performed on normal tissue samples.

====Assign Panel====

1. In the Phenotype panel, select the following arrays, which contain samples from the congestive cardiomyopathy disease state.
JB-ccmp_0120.txt, JB-ccmp_0218.txt, JB-ccmp0718.txt, JB-ccmp0811.txt, JB-ccmp1003.txt, JB-ccmp_1109.tx

2. Right-click, select '''Add to Panel'''.

3. Enter "Cardio" in the input box and click OK.

[[Image:T_PanelLabelCardio.png]]

4. Next, similarly label the following arrays as "Normal" (''repeat steps 2 & 3'').
JB-n_0106.txt , JB-n_0821.txt, JB-n_0915.txt, JB-n_1303.txt

5. Select the checkboxes next to the panel names to indicate that these groups of arrays are "Active". Various analysis and visualization components can be set to only use/display activated arrays or markers.

[[Image:T_PhenotypesPriorToCase.png]]

====Classify panel====

For statistical tests such as the t-test, Case and Control groups can be specified.

# Left-click on the thumb-tack icon in front of the phenotype name.
# Select Case to specify the disease arrays as the "Case". The remaining "Normal" arrays are by default labeled control.
[[Image:T_PhenotypeSettingCase.png]]

A red thumbtack indicates the arrays have been specified as "Case".

[[Image:T_PhenotypeCaseSet.png]]

== Visualize Gene Expression==
In this tutorial, you will:

* Get acquainted with the various geWorkbench visualization tools
* View a dataset in geWorkbench
* Modify the visualization preference settings

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file '''''cardiomyopathy.exp''''' used in this tutorial is from the Load Data (insertlink) tutorial.

===Visualization Tools===

Visualization tools provide a view of the chip(s) under investigation and can be used for ascertaining the quality of the data. Active gene and phenotype panels (insertlink) restrict the number of markers/arrays displayed. The images created can be saved and exported. A detailed description of how to manipulate the visualization components is available in the online help.

{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|'''Microarray View''': Used to inspect each separate microarray using the Array scroll bar.||[[Image: Ema.png]]
|-
|-
|'''Tabular Microarray Panel''': Presents the numerical values of the expression measurements in a table format. One row is created per individual marker/probe and one column per microarray.||[[Image:Etab.png]]
|-
|-
|-
|'''Color Mosaic''': Heat maps for microarray expression data, organized by phenotypic or gene groupings.||[[Image:Ed_cm.png]]
|-
|-
|-
|'''Expression Profiles''': A line graph of gene expression profiles across several arrays/hybridizations. Each marker is drawn as a line of a separate color.||[[Image:Eep.png]]
|-
|-
|-
|'''Scatter Plot''': A pairwise (''array vs. array and marker vs. marker'') comparison and plotting of expression values.||[[Image:Esp.png]]
|-
|-
|-
|-
|}

'''View a dataset'''
# Select '''''cardiomyopathy.exp''''' in the Project Panel.
# Select the '''''Microarray Panel''''' visualization component in the View Area at the top-right section of the interface.
# Deselect the All Markers checkbox to display the entire dataset.

Note: The '''''All Arrays''''' and '''''All Markers''''' checkboxes determine which data points are included in the display.
If neither is checked, then the entire data set is shown. The All Arrays control is useful when working with data sets comprising multiple arrays. In this case, only those arrays that are included in a currently activated phenotype panel will be displayed.

[[Image: Allm.png]]

=== Preferences===
The Preferences selection in the Tools menu allows users to specify how certain aspects of the system behave. Once set, preferences persist between application sessions and take effect immediately.

'''Modifying Settings'''

1. From the main menu, click on '''Tools>Preferences'''.

2. In the Preferences pop-up window, you can define settings for:
*Text Editor: The editor selected will be used to open and inspect data sets loaded in a project. Notepad is the default setting.
* Visualization: The color scheme to be applied to color mosaic images.
**Absolute: (default) Let M = max{|min|, |max|} over all expression measurements, across all arrays. If an expression value x is positive, it is assigned the red intensity x / M * 256. If x is negative, it is assigned the green intensity -x / M * 256.
** Relative: This is similar to the setting for Absolute, but each marker is mean-variance normalized first.
* Genepix Value Computation: You can specify how to compute the value displayed for Genepix arrays. The default setting is the option (Mean F635 - Mean B635) / (Mean F532 - Mean B532).

Select '''Relative''' for the visualization preference.

3. Click on '''OK'''.

[[Image: Preferences.gif]]
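
The Absolute color scheme described under Visualization can be sketched in a few lines of Python. This is only an illustration of the stated formula, not the geWorkbench implementation: the function names are hypothetical, and clamping the result to 255 and rendering a value of 0 as black are assumptions made here so that the result fits an 8-bit color channel.

```python
def max_abs(values):
    """M = max{|min|, |max|} over all expression measurements."""
    return max(abs(min(values)), abs(max(values)))


def absolute_color(x, m):
    """Return an (R, G, B) tuple for expression value x under the Absolute scheme.

    Positive values map to the red channel, negative values to the green
    channel, with intensity |x| / M * 256 (clamped to 255: an assumption).
    """
    if m <= 0:
        return (0, 0, 0)
    level = min(255, int(abs(x) / m * 256))
    if x > 0:
        return (level, 0, 0)
    if x < 0:
        return (0, level, 0)
    return (0, 0, 0)  # assumption: a value of exactly 0 is drawn black


measurements = [-50.0, 0.0, 100.0, 200.0]
M = max_abs(measurements)
colors = [absolute_color(x, M) for x in measurements]
```

The Relative scheme would apply the same mapping after each marker has been mean-variance normalized.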

== Filter and Normalize Data==
In this tutorial, you will:
* Get acquainted with the various filters and normalizers available in geWorkbench
* Apply a filter and normalizer on a tutorial dataset

Before you can continue, geWorkbench should be running. For help with installation, please refer to the '''''Getting Started''''' (INSERT LINK) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Filter===

Filtering can be used to screen out missing data points, remove low quality data or reduce the size of the dataset by removing less interesting data. Available geWorkbench filters are as follows:

{|style="border: 1px solid lightGray"
!Filter||Description||
|-
|-|-
|-

|'''Affy Detection Call'''||Applicable to Affymetrix data only. Sets all measurements whose detection status is any user-defined combination of A, P or M as missing.
|-

|'''Missing values''' ||Discards all markers that have “missing” measurements in at least n microarrays, where n is defined by the user. Another filter must first be applied however, in order to generate the missing values upon which this filter can operate.
|-
|-
|-

|'''Deviation''' ||Sets all markers whose measurements deviate below a given value across all microarrays as missing.
|-
|-
|-

|'''Expression Threshold''' ||Sets all markers whose measurements are inside (or outside) a user-defined range as missing.
|-
|-
|-

|'''2 Channel''' ||Applicable to 2-channel arrays (Genepix) data only. Defines applicable ranges for each channel, and sets all values for which either channel intensity is inside (or outside) the defined range as missing.
|-
|-
|-
|}

[[Image:Filterpanel.gif]]

Perform the following steps to filter out data called absent in an Affymetrix file:
# In the Filtering Panel, select ''Affy Detection Call Filter''.
# Select the ‘A’ (Absent) checkbox and click '''Filter'''. Values that were removed (marked as missing) are highlighted in yellow.
# In the Filtering Panel, select '''Missing Values Filter'''.
# Choose the maximum number of arrays that can have missing values before a marker is removed (the default is 0).
# Click '''Filter'''. Markers with more than 0 missing values are removed. You’ll notice the yellow values are gone.
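
The two filtering steps above can be sketched in Python as follows. This is a minimal illustration rather than geWorkbench code: the function names, the use of <code>None</code> for a missing value, and the row-per-marker layout are all assumptions made for the example.

```python
MISSING = None  # assumption: a missing measurement is represented as None


def affy_detection_filter(values, calls, drop_calls=("A",)):
    """Mark measurements as missing when their detection call is in drop_calls.

    values and calls are lists of rows (one row per marker, one column per array).
    """
    return [
        [MISSING if call in drop_calls else value
         for value, call in zip(value_row, call_row)]
        for value_row, call_row in zip(values, calls)
    ]


def missing_values_filter(marker_ids, values, max_missing=0):
    """Keep only markers with at most max_missing missing measurements."""
    kept = [
        (marker, row)
        for marker, row in zip(marker_ids, values)
        if sum(v is MISSING for v in row) <= max_missing
    ]
    return [m for m, _ in kept], [r for _, r in kept]


# Hypothetical three-marker, two-array dataset.
markers = ["m1", "m2", "m3"]
values = [[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]]
calls = [["P", "P"], ["A", "P"], ["A", "A"]]

masked = affy_detection_filter(values, calls)
kept_ids, kept_rows = missing_values_filter(markers, masked, max_missing=0)
```

With the default of 0, only markers with no Absent calls survive; raising <code>max_missing</code> to 1 would also keep <code>m2</code>.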

{|style="border: 1px solid lightGray"
!'''Affy Detection Call Filter'''||'''Missing Values Filter'''||
|-
|-|-
|-
| [[Image: Filtered.gif]]||[[Image:Mvfilter.gif]]
|-
|}

===Normalize===
Normalization can be used to decrease the effects of systematic differences across a set of experiments.
In geWorkbench, normalization results in replacing values with new values. Available geWorkbench normalization methods are as follows:

{|style="border: 1px solid lightGray"
!Normalizer||Description||
|-
|-

|Missing value calculation ||Replaces every missing value with either the mean value of that marker across all microarrays or with the mean measurement of all markers in the microarray where the missing value is observed
|-
|-
|Log2 Transformation ||Applies a log2 transformation to all measurements in a microarray
|-
|-
|-
|Threshold Normalizer ||All data points whose value is less than (or greater than) a user-specified minimum (maximum) value are raised (reduced) to that minimum (maximum) value
|-
|-
|-
|-
|Marker-based Centering ||Subtracts the mean (median) measurement of a marker profile from every measurement in the profile
|-
|-
|-
|Array-based centering ||Subtracts the mean (median) measurement of a microarray from every measurement in that microarray
|-
|-
|-
|Mean-variance normalizer ||For every marker profile, the mean measurement of the entire profile is subtracted from each measurement in the profile and the resulting value is divided by the standard deviation
|-
|-
|}
[[Image:Normalpanel.gif]]
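
Several of the normalizers in the table are simple per-row or per-column transformations. The sketch below illustrates four of them on a single list of measurements. The function names are hypothetical, and the use of the sample standard deviation in the mean-variance normalizer is an assumption, since the table does not say which form is used.

```python
import math
from statistics import mean, median, stdev


def log2_transform(measurements):
    """Apply a log2 transformation to every measurement."""
    return [math.log2(v) for v in measurements]


def threshold(measurements, minimum=None, maximum=None):
    """Raise values below minimum to minimum and reduce values above maximum to maximum."""
    out = []
    for v in measurements:
        if minimum is not None and v < minimum:
            v = minimum
        if maximum is not None and v > maximum:
            v = maximum
        out.append(v)
    return out


def center(measurements, use_median=False):
    """Subtract the mean (or median) of the profile from every measurement."""
    c = median(measurements) if use_median else mean(measurements)
    return [v - c for v in measurements]


def mean_variance(profile):
    """Subtract the profile mean and divide by the standard deviation."""
    m = mean(profile)
    s = stdev(profile)  # assumption: sample standard deviation (n - 1)
    if s == 0:
        return [0.0 for _ in profile]  # assumption: a flat profile maps to zeros
    return [(v - m) / s for v in profile]
```

Marker-based centering applies <code>center</code> to each marker's profile across arrays, while array-based centering applies it to each array's column of measurements.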

'''Apply Quantile Normalizer'''

1. In the Normalization Panel, select ''Quantile Normalizer''.

2. Leave the default averaging method of ''Mean Profile Marker'' to indicate handling of missing values.

3. Click '''Normalize'''. The View Area is updated to reflect normalization (after the screen has been refreshed).
Note: The first value in the second row was updated from ''41,394.6'' to ''55,779.26''.

{|style="border: 1px solid lightGray"
!PRENORMALIZATION||NORMALIZED||
|-
|-|-
|-
| [[Image:Prenormalizer_ed.gif ]] || [[Image: Postnormalizera.gif ]]
|-
|-
|-
|}
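
For readers unfamiliar with quantile normalization, the following sketch shows the textbook form of the technique: each array's values are replaced by the mean of the values that share the same rank across all arrays. It is an illustration only. The geWorkbench Quantile Normalizer may differ in detail, and this sketch ignores ties and missing values (which the ''Mean Profile Marker'' option handles in the application).

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of arrays (each a list of equal length).

    Returns new arrays in which every array has the same value distribution:
    the k-th smallest value of each array becomes the mean of the k-th
    smallest values across all arrays.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [
        sum(sa[k] for sa in sorted_arrays) / len(arrays) for k in range(n)
    ]
    normalized = []
    for a in arrays:
        # Indices of this array's markers, ordered from smallest to largest value.
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical two-array, three-marker example.
result = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
```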

== Clustering Gene Expression Data== ken
*Hierarchical Clustering
* Self Organizing Map (SOM)

== Differential Expression ==
In this tutorial, you will:

* Get acquainted with the T Test and Multi T Test
* Apply a T Test and Multi T Test

Before you can continue, geWorkbench should be running. For help with installation, please refer to the Getting Started (INSERT LINK) section. Load the tutorial file cardiomyopathy.exp. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===T Test===

T Test analysis identifies markers with statistically significant differential expression between sets of microarrays. For each marker, the t-test determines whether there is a significant difference between the two groups (case and control). To perform this analysis, you must classify the panels, set the analysis parameters and view the results in the visualization components. A detailed description of the T Test parameters is available in the online help.

'''Classify the Panels'''

1. Mark the '''Cardio''' phenotype as 'Case'. By default, panels are marked as Control. Panels classified as Case are shown with a red thumbtack icon.
* Right-click on the '''Cardio''' phenotype.
* Select '''Classification'''>'''Case'''.
2. Activate the '''Normal''' and '''Cardio''' panels by selecting the checkboxes next to the panel names.

[[Image:T_PhenotypeSettingCase.png]]

'''Set Analysis Parameters'''
# From the Analysis Panel, select '''T-Test Analysis'''.
# Set the parameter values listed below and click '''Analyze'''.

* Alpha-corrections tab: Just Alpha.
* P-Value Parameters tab: p-values based on t-distribution. Note that the default alpha (critical p-value) is set to 0.01.
* Degree of Freedom tab: Welch approximation - unequal group variances.
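
To make the Welch option concrete, the sketch below computes the t statistic and the Welch-Satterthwaite degrees of freedom for one marker, given its case and control measurements. It is a minimal illustration using only the Python standard library; the function name is hypothetical and this is not the geWorkbench implementation. Converting the statistic to a p-value additionally requires the t-distribution, which is omitted here.

```python
import math
from statistics import mean, variance


def welch_t(case, control):
    """Return (t, df) for Welch's t-test with unequal group variances."""
    n1, n2 = len(case), len(control)
    v1, v2 = variance(case), variance(control)  # sample variances (n - 1)
    se2 = v1 / n1 + v2 / n2  # squared standard error of the mean difference
    t = (mean(case) - mean(control)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df


# Hypothetical expression values for one marker.
t_stat, dof = welch_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
```

A marker would be reported as significant when the p-value derived from <code>t_stat</code> and <code>dof</code> falls below the chosen alpha (0.01 by default).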

[[Image:Ttest.gif]] 

'''T-Test Results'''

{|style="border: 1px solid lightGray"
!|| ||
|-
|-
| Markers which met the significance test are included in a new gene panel called “Significant Genes”. || [[Image:E_ttestgpanel.png]]
|-
|-
| An ancillary dataset is created in the project window. || [[Image:Ed_ttestproj.png]]
|-
|}

The values of the T-Test can be seen in the Color Mosaic panel and the Volcano Plot.

{|style="border: 1px solid lightGray"
!VOLCANO PLOT||COLOR MOSAIC||
|-
|-|-
|-
|-
| [[Image:Vplot.png]] || [[Image:Ed_cm.png]]
|-
|-
|-
| Clicking on any of the spots highlights the marker selected in the Marker Panel. * Insert another description ||
* The label to the right displays the Significance value (the lower the value, the more likely the expression differs) and gene name for the displayed genes. The genes are displayed in ascending order by Significance value.

* Gene height and width values can be altered to modify the display.

* The intensity slider is used to modify the intensity of the color codings.

* Accession: Includes the accession number in the label.

* Printer Icon: Prints the displayed image.

* Display: Must be toggled on to display data.

* ''Pat, Abs, Ratio Overlapping Pages Icon: Not the T Test display.''

|-
|-
|}

=== Multi T Test=== '''(IN PROGRESS)'''

== Regulatory Network ==
In this tutorial, you will:

* Create a gene network in Reverse Engineering
* View the network in Cytoscape
* Create a Marker panel from the Cytoscape network

Before you can continue, geWorkbench should be running. The file [[Webmatrix.EXP]] is used in this tutorial. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Reverse Engineering===

The Reverse Engineering component is used to analyze a large amount of microarray data to reverse engineer the underlying gene regulatory network. The details of this algorithm are described in "ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context", Califano et al. (http://arxiv.org/abs/q-bio.MN/0410037).

'''Create a Network'''
{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|1. In the Gene Panel, enter '''1973_s_at''' in the '''Find Next''' text box. The list box will navigate to '''1973_s_at'''; click on that gene.||[[Image: E_panel.PNG]]

|-
|2. Go to the Reverse Engineering component in the View pane. You should be in the first tab, “Profiler”, and in “Basic”. The marker '''1973_s_at''' should be listed in the Hub Gene box.||[[Image:E_RE1.PNG]]
|-
|3. Click on '''Analyze 2D'''. A list of markers, sorted by interaction strength, is returned below the '''Search Box'''. ||
|-
|-
|-
|4. Select all the genes with score >= 5 by clicking on the first marker, '''[12.53] 37724_at''', then using Shift+Down Arrow to extend the selection through the values >= 5.||[[Image: E_re2.png]]
|-
|-
|-
|5. Click '''Create Network'''. The network created is displayed in the Cytoscape view.||
|-
|-
|}
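
The selection made by hand in step 4 amounts to keeping the markers whose interaction score meets a cutoff. As a rough illustration, the sketch below parses entries in the <code>[score] marker</code> form shown in the results list and keeps those with a score of at least 5. The function name and all entries other than <code>[12.53] 37724_at</code> are hypothetical.

```python
import re

# Matches entries such as "[12.53] 37724_at".
_ENTRY = re.compile(r"^\[(?P<score>[-+]?\d+(?:\.\d+)?)\]\s+(?P<marker>\S+)$")


def select_by_score(entries, cutoff=5.0):
    """Return marker names whose score is >= cutoff, preserving list order."""
    selected = []
    for entry in entries:
        match = _ENTRY.match(entry.strip())
        if match and float(match.group("score")) >= cutoff:
            selected.append(match.group("marker"))
    return selected


results = [
    "[12.53] 37724_at",
    "[5.00] hypothetical_a_at",  # hypothetical entry
    "[4.99] hypothetical_b_at",  # hypothetical entry
]
hub_neighbors = select_by_score(results, cutoff=5.0)
```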

'''View the Network in Cytoscape'''

A full description of the capabilities and functionality of Cytoscape can be found at http://www.cytoscape.org/.

# In the Cytoscape Layout menu, select '''yFiles/ Organic''' to modify the network display.
# In this network image, select the central marker. It should turn yellow. [[Image:E_network.png]]
# In the Select menu, select Nodes>First neighbors of selected nodes>Shape>Triangle. The first neighbors are updated from a square shape to a triangle.
# Hold Ctrl and use the mouse to select the network. The selected nodes turn yellow. The selected genes are returned to the Gene Panel under the name Selected Genes[Cytoscape].

== Enrichment Analysis==

In this tutorial, you will:

* Gene Ontology

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

'''Gene Ontology'''

The Gene Ontology component categorizes genes in terms of the functions their products perform, their cellular localization, or their involvement in high-level biological processes. The category definitions are provided by the Gene Ontology Consortium (http://www.geneontology.org), which we use to generate a tree structure that approximates the Directed Acyclic Graph structure used by the Consortium.

1. Create a Marker panel with the following markers:
* AFFX-hum_alu_at
* AFFX-LysX-M_at
* AFFX-LysX-5_at
* AFFX-LysX-3_at
For assistance in creating a marker panel, please refer to Working with Markers and Phenotype Tutorial (INSERT LINK).

2. Click '''Map List.''' The application creates a tree for the selected ontology.

3. The resulting nodes in the mapped tree display two numbers separated by a forward slash, such as '''cell(1/6058)'''. The first number is the number of genes from the active marker panel that map to a specific category or any of its child nodes. The second number, after the slash, is the same count for the reference list (or for the entire chip, depending on the reference list checkbox selection and whether a reference list has been loaded from the file system).

[[Image:E_go.png]]
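
The two numbers shown on each tree node can be reproduced with a small counting routine. The sketch below is an illustration under assumed data structures, not the component's actual code: <code>children</code> maps a category to its child categories, <code>annotations</code> maps a gene to the categories it is annotated with, and all names and sample data are hypothetical.

```python
def descendants(children, node):
    """Return the set containing node and all categories below it."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(children.get(current, ()))
    return seen


def count_mapped(children, annotations, genes, node):
    """Count genes annotated with node or any of its child categories."""
    categories = descendants(children, node)
    return sum(1 for g in genes if annotations.get(g, set()) & categories)


def node_label(children, annotations, panel, reference, node):
    """Build a label of the form name(panel count/reference count)."""
    return "{}({}/{})".format(
        node,
        count_mapped(children, annotations, panel, node),
        count_mapped(children, annotations, reference, node),
    )


# Hypothetical ontology fragment and annotations.
children = {"cell": ["nucleus", "membrane"], "nucleus": ["nucleolus"]}
annotations = {
    "g1": {"nucleolus"},
    "g2": {"membrane"},
    "g3": {"cell"},
    "g4": {"other"},
}
label = node_label(children, annotations, ["g1"], ["g1", "g2", "g3", "g4"], "cell")
```

Each gene is counted once per node even if it is annotated with several categories beneath that node.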

== Sequence Analysis== ken
* Sequence Retrieval
* Sequence Homology Analysis
** Blast
** Other

==Pattern Discovery== ken
* Position Histogram

== Promoter Analysis==

==Integrated Annotation Information==
In this tutorial, you will:

* rrrrr
* rrrrr

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

Perform the following steps to view annotations of a dataset ('''''Not recommended for large sets of genes'''''):

1. In the Marker Annotations in the View pane, click '''Use Panels.'''

A table of gene names and links to pathway diagrams is returned if available from the NCI CGAP database.

2. Clicking on gene names returns annotations from the NCI CGAP database in your default web browser window.

3. Clicking on the pathway diagram name will display a pathway diagram in the caBIO Pathways panel in the View pane.
In the caBIO Pathways viewer, pathway gene symbols can in turn be clicked to display their gene information in the default web browser window.

User:Daly

2006-02-24T17:14:50Z

Daly: /* Enrichment Analysis */

= Overview=

geWorkbench is an open-source bioinformatics platform that offers a comprehensive and extendible collection of tools for the management, analysis, visualization and annotation of biomedical data. This tool is aimed at providing researchers a centralized repository for the data analysis and visualization of gene expression data, gene sequences and protein sequences.

'''What can you do with geWorkbench?'''

* Use one program that integrates with multiple existing bioinformatics modules for analysis and visualization.

* Access remote servers and clusters to run computationally intensive calculations more quickly.

* Streamline data analysis with:
** flexible import options that support merging files from various sources ''(insert possible data source ie RMA express).''
** support for a variety of genomic data including microarrays, sequences, pathways, networks, alignments and phenotypes.
** access analyses with biological annotations from the National Cancer Institute.

* Community: Insert developer benefit ( plugin)

The diagram illustrates the use of geWorkbench by Researchers.

[[Image:slide1.gif]]

For detailed system documentation, please see the Documentation section (INSERT LINK).

= Tutorial=

===Welcome! ===

Welcome to the geWorkbench tutorial. In the following tutorials you will learn how to use geWorkbench. This is an example-based, illustrated guide for both beginners and experienced users. This tutorial was last updated in '''DATE''' and reflects changes to geWorkbench through '''DATE'''.

===Using the Tutorial ===

These tutorials assume that you have successfully completed the installation instructions. If you haven't installed the entire program, please go to [[Download]] and follow the installation instructions.

The tutorials are organized into a number of separate topics. Sample data sets are used throughout the tutorials. Each tutorial is self-contained and does not depend on any other tutorial for sample data. Sample data is located '''''(INSERT LINK or Instructions on how to download)'''''

'''(we should have sample data labeled Tutorial #)'''

Within the tutorials there are two basic types of text:
*Text that looks like this explains topics. 

# Text in numbered steps is instructions for you to follow using tutorial data files.

''' insert instruction on how to navigate between tutorials'''.

==Getting Started== MV
* Starting the application
* GUI elements
** Panels
** Navigation
== Loading Data== MV
* Data formats

== Working with Marker and Phenotype Panels==
In this tutorial, you will:
* Become familiar with the use of panels in geWorkbench
* Create active phenotype panels
* Classify the panels you created

Before you can continue, geWorkbench should be running. For help with this, please refer to the '''''Getting Started''''' (insertlink) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Creating Panels===

When working with microarrays, geWorkbench uses the term '''''marker''''' to refer to a gene probe (in other cases, it can be individual items from other data sets, such as sequences).
'''''Phenotype''''' refers to any user-defined grouping of microarrays. These microarrays will often share some common property that in most cases is phenotypic, although this is not a requirement. For example, one such “phenotype” might be a single experiment on a tumor tissue sample, with a second “phenotype” defined as a collection of experiments performed on normal tissue samples.

====Assign Panel====

1. In the Phenotype panel, select the following arrays, which contain samples from the congestive cardiomyopathy disease state.
JB-ccmp_0120.txt, JB-ccmp_0218.txt, JB-ccmp0718.txt, JB-ccmp0811.txt, JB-ccmp1003.txt, JB-ccmp_1109.tx

2. Right-click, select '''Add to Panel'''.

3. Enter "Cardio" in the input box and click OK.

[[Image:T_PanelLabelCardio.png]]

4. Next, similarly label the following arrays as "Normal" (''repeat steps 2 & 3'').
JB-n_0106.txt , JB-n_0821.txt, JB-n_0915.txt, JB-n_1303.txt

5. Select the checkboxes next to the panel names to indicate that these groups of arrays are "Active". Various analysis and visualization components can be set to only use/display activated arrays or markers.

[[Image:T_PhenotypesPriorToCase.png]]

====Classify panel====

For statistical tests such as the t-test, Case and Control groups can be specified.

# Left-click on the thumb-tack icon in front of the phenotype name.
# Select Case to specify the disease arrays as the "Case". The remaining "Normal" arrays are by default labeled control.
[[Image:T_PhenotypeSettingCase.png]]

A red thumbtack indicates the arrays have been specified as "Case".

[[Image:T_PhenotypeCaseSet.png]]

== Visualize Gene Expression==
In this tutorial, you will:

* Get acquainted with the various geWorkbench visualization tools
* View a dataset in geWorkbench
* Modify the visualization preference settings

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file '''''cardiomyopathy.exp''''' used in this tutorial is from the Load Data (insertlink) tutorial.

===Visualization Tools===

Visualization tools provide a view of the chip(s) under investigation and can be used for ascertaining the quality of the data. Active gene and phenotype panels (insertlink) restrict the number of markers/arrays displayed. The images created can be saved and exported. A detailed description of how to manipulate the visualization components is available in the online help.

{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|'''Microarray View''': Used to inspect each separate microarray using the Array scroll bar.||[[Image: Ema.png]]
|-
|-
|'''Tabular Microarray Panel''': Presents the numerical values of the expression measurements in a table format. One row is created per individual marker/probe and one column per microarray.||[[Image:Etab.png]]
|-
|-
|-
|'''Color Mosaic''': Heat maps for microarray expression data, organized by phenotypic or gene groupings.||[[Image:Ed_cm.png]]
|-
|-
|-
|'''Expression Profiles''': A line graph of gene expression profiles across several arrays/hybridizations. Each marker is drawn as a separately colored line.||[[Image:Eep.png]]
|-
|-
|-
|'''Scatter Plot''': A pairwise (''array vs. array and marker vs. marker'') comparison and plotting of expression values.||[[Image:Esp.png]]
|-
|-
|-
|-
|}

'''View a dataset'''
# Select '''''cardiomyopathy.exp''''' in the Project Panel.
# Select the '''''Microarray Panel''''' visualization component in the View Area at the top-right section of the interface.
# Deselect the All Markers checkbox to display the entire dataset.

Note: The '''All Arrays''' and '''All Markers''' checkboxes determine which data points are included in the display.
If neither is checked, then the entire data set is shown. The All Arrays control is useful when working with data sets comprising multiple arrays. In this case, only those arrays that are included in a currently activated phenotype panel will be displayed.

[[Image: Allm.png]]

=== Preferences===
The Preferences selection in the Tools menu allows users to specify how certain aspects of the system will behave. Once set, your preferences take effect immediately and persist between application sessions.

'''Modifying Settings'''

1. From the main menu, click on '''Tools>Preferences'''.

2. In the Preferences pop-up window, you can define settings for:
*Text Editor: The editor selected will be used to open and inspect data sets loaded in a project. Notepad is the default setting.
* Visualization: The color scheme to be applied to color mosaic images.
**Absolute: (default) Let M = max{|min|, |max|} over all expression measurements, across all arrays. If expression value x > 0, assign it the red spectrum x / M * 256. If expression value x is negative, assign it to the green spectrum -x / M * 256.
** Relative: This is similar to the setting for Absolute, but each marker is mean-variance normalized first.
* Genepix Value Computation: You can specify how to compute the value displayed for Genepix arrays. The default option is (Mean F635 - Mean B635) / (Mean F532 - Mean B532).

Select '''Relative''' for the visualization preference.

3. Click on '''OK'''.

[[Image: Preferences.gif]]
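
The Absolute color scheme described above can be sketched in a few lines of Python. This is only an illustration of the stated formula, not geWorkbench's own code: the function names are hypothetical, and clamping the result to 255 (so that a value equal to M still fits in an 8-bit color channel) is an assumption.

```python
def color_scale(values):
    """Return M = max(|min|, |max|) over all expression measurements."""
    return max(abs(min(values)), abs(max(values)))


def absolute_color(x, m):
    """Map one expression value to an (r, g, b) tuple under the Absolute scheme."""
    if m <= 0:
        return (0, 0, 0)  # degenerate data set: everything is drawn black
    # |x| / M * 256, clamped to the 0..255 range (the clamp is an assumption).
    level = min(255, int(abs(x) / m * 256))
    if x > 0:
        return (level, 0, 0)  # positive values use the red spectrum
    if x < 0:
        return (0, level, 0)  # negative values use the green spectrum
    return (0, 0, 0)


values = [-200.0, -50.0, 0.0, 100.0, 400.0]
m = color_scale(values)
colors = [absolute_color(v, m) for v in values]
```

For the Relative scheme, the same mapping would be applied after each marker has been mean-variance normalized.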

== Filter and Normalize Data==
In this tutorial, you will:
* Get acquainted with the various filters and normalizers available in geWorkbench
* Apply a filter and normalizer on a tutorial dataset

Before you can continue, geWorkbench should be running. For help with installation, please refer to the '''''Getting Started''''' (INSERT LINK) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Filter===

Filtering can be used to screen out missing data points, remove low quality data or reduce the size of the dataset by removing less interesting data. Available geWorkbench filters are as follows:

{|style="border: 1px solid lightGray"
!Filter||Description||
|-
|-|-
|-

|'''Affy Detection Call'''||Applicable to Affymetrix data only. Sets all measurements whose detection status is any user-defined combination of A, P or M as missing.
|-

|'''Missing values''' ||Discards all markers that have “missing” measurements in at least n microarrays, where n is defined by the user. Another filter must first be applied however, in order to generate the missing values upon which this filter can operate.
|-
|-
|-

|'''Deviation''' ||Sets all markers whose measurements deviate below a given value across all microarrays as missing.
|-
|-
|-

|'''Expression Threshold''' ||Sets all markers whose measurements are inside (or outside) a user-defined range as missing.
|-
|-
|-

|'''2 Channel''' ||Applicable to 2-channel arrays (Genepix) data only. Defines applicable ranges for each channel, and sets all values for which either channel intensity is inside (or outside) the defined range as missing.
|-
|-
|-
|}

[[Image:Filterpanel.gif]]

Perform the following steps to filter out data called absent in an Affymetrix file:
# In the Filtering Panel, select '''Affy Detection Call Filter'''.
# Select the ‘A’ (Absent) checkbox and click '''Filter'''. Values that were removed (marked as missing) are highlighted in yellow.
# In the Filtering Panel, select '''Missing Values Filter'''.
# Choose the maximum number of arrays that can have missing values before a marker is removed; the default is 0.
# Click '''Filter'''. Markers with more than 0 missing values are removed. You’ll notice the yellow values are gone.

{|style="border: 1px solid lightGray"
!'''Affy Detection Call Filter'''||'''Missing Values Filter'''||
|-
|-|-
|-
| [[Image: Filtered.gif]]||[[Image:Mvfilter.gif]]
|-
|}
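
The two-step filtering procedure above can be sketched as follows. This is a minimal illustration, not the geWorkbench implementation: the data layout (a dictionary of marker names to lists of value/detection-call pairs), the function names, and the use of <code>None</code> for a missing value are all assumptions. The second filter follows the wording of the steps, so a marker is kept when it has at most <code>max_missing</code> missing values.

```python
def detection_call_filter(data, calls_to_mask):
    """Replace every value whose detection call is in calls_to_mask with None."""
    return {
        marker: [None if call in calls_to_mask else value
                 for value, call in measurements]
        for marker, measurements in data.items()
    }


def missing_values_filter(data, max_missing=0):
    """Keep only markers with at most max_missing missing (None) values."""
    return {
        marker: values
        for marker, values in data.items()
        if sum(v is None for v in values) <= max_missing
    }


# Hypothetical data: three markers measured on three arrays.
data = {
    "marker_1": [(120.5, "P"), (98.0, "P"), (110.2, "M")],
    "marker_2": [(15.1, "A"), (80.3, "P"), (77.7, "P")],
    "marker_3": [(9.9, "A"), (12.4, "A"), (300.0, "P")],
}
masked = detection_call_filter(data, {"A"})          # step 2: mask Absent calls
kept = missing_values_filter(masked, max_missing=0)  # step 5: drop incomplete markers
```

With the default of 0, only markers that have no Absent call on any array survive the second step.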

===Normalize===
Normalization can be used to decrease the effects of systematic differences across a set of experiments.
In geWorkbench, normalization replaces the original values with new values. Available geWorkbench normalization methods are as follows:

{|style="border: 1px solid lightGray"
!Normalizer||Description||
|-
|-

|Missing value calculation ||Replaces every missing value with either the mean value of that marker across all microarrays or with the mean measurement of all markers in the microarray where the missing value is observed
|-
|-
|Log2 Transformation ||Applies a log2 transformation to all measurements in a microarray
|-
|-
|-
|Threshold Normalizer ||All data points whose value is less than (or greater than) a user-specified minimum (maximum) value are raised (reduced) to that minimum (maximum) value
|-
|-
|-
|-
|Marker-based Centering ||Subtracts the mean (median) measurement of a marker profile from every measurement in the profile
|-
|-
|-
|Array-based centering ||Subtracts the mean (median) measurement of a microarray from every measurement in that microarray
|-
|-
|-
|Mean-variance normalizer ||For every marker profile, the mean measurement of the entire profile is subtracted from each measurement in the profile and the resulting value is divided by the standard deviation
|-
|-
|}
[[Image:Normalpanel.gif]]

'''Apply Quantile Normalizer'''

1. In the Normalization Panel, select ''Quantile Normalizer''.

2. Leave the default averaging method of ''Mean Profile Marker'', which determines how missing values are handled.

3. Click '''Normalize'''. The View Area is updated to reflect the normalization (after the screen has been refreshed).
Note: The first value in the second row was updated from ''41,394.6'' to ''55,779.26''.

{|style="border: 1px solid lightGray"
!PRENORMALIZATION||NORMALIZED||
|-
|-|-
|-
| [[Image:Prenormalizer_ed.gif ]] || [[Image: Postnormalizera.gif ]]
|-
|-
|-
|}
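
For readers unfamiliar with quantile normalization, the sketch below shows the textbook form of the technique: every array is forced to share the same distribution by replacing each value with the mean of the values that hold the same rank across all arrays. It is an assumption that geWorkbench follows this scheme in detail. The function name is hypothetical, ties are broken by position, and missing values (which the averaging method option above addresses) are not handled here.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of equal-length lists, one list per microarray."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Mean of the values holding rank i in each array.
    rank_means = [sum(s[i] for s in sorted_arrays) / len(arrays) for i in range(n)]

    result = []
    for a in arrays:
        # Marker indices ordered from lowest to highest value on this array.
        order = sorted(range(n), key=lambda i: a[i])
        normalized = [0.0] * n
        for rank, idx in enumerate(order):
            normalized[idx] = rank_means[rank]
        result.append(normalized)
    return result


# Hypothetical data: three arrays, four markers each.
arrays = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(arrays)
```

After normalization every array contains the same set of values, and only their assignment to markers differs from array to array.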

== Clustering Gene Expression Data== ken
*Hierarchical Clustering
* Self Organizing Map (SOM)

== Differential Expression ==
In this tutorial, you will:

* Get acquainted with the T Test and Multi T Test
* Apply a T Test and Multi T Test

Before you can continue, geWorkbench should be running. For help with installation, please refer to the Getting Started (INSERT LINK) section. Load the tutorial file cardiomyopathy.exp. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===T Test===

T Test analysis identifies markers with statistically significant differential expression between sets of microarrays. For each marker, the t-test determines whether there is a significant difference between the two groups (case and control). To perform this analysis, you must classify the panels, set the analysis parameters and view the results in the visualization components. The online help describes the T Test parameters in detail.

'''Classify the Panels'''

1. Mark the '''Cardio''' phenotype as 'Case'. By default, panels are marked as control. Panels classified as Case are shown with a red thumbtack icon.
* Right-click on '''Cardio''' phenotype.
* Select '''Classification'''>'''Case'''.
2. Activate the arrays '''Normal''' and '''Cardio''' by selecting the checkboxes next to the panel name.

[[Image:T_PhenotypeSettingCase.png]]

'''Set Analysis Parameters'''
# From the Analysis Panel, select '''T-Test Analysis'''.
# Set the parameter values listed below and click on '''Analyze'''.

* Alpha-corrections tab: Just Alpha.
* P-Value Parameters tab: p-values based on t-distribution. Note that the default alpha (critical p-value) is set to 0.01.
* Degree of Freedom tab: Welch approximation - unequal group variances.

[[Image:Ttest.gif]] 
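
The "Welch approximation - unequal group variances" setting refers to the standard Welch t-test. As a rough sketch of what is computed for each marker (not geWorkbench's code; the function name and sample values are hypothetical), the t statistic and the Welch-Satterthwaite degrees of freedom can be obtained as follows. Looking up the p-value from the t-distribution and comparing it with alpha is omitted here.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(case, control):
    """Return (t statistic, approximate degrees of freedom) for one marker."""
    n1, n2 = len(case), len(control)
    v1, v2 = variance(case), variance(control)  # sample variances
    se2 = v1 / n1 + v2 / n2                     # squared standard error
    t = (mean(case) - mean(control)) / sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df


# Hypothetical expression values of one marker in the two groups.
case = [10.0, 12.0, 14.0, 16.0]
control = [5.0, 6.0, 7.0, 8.0, 9.0]
t, df = welch_t(case, control)
```

A marker is reported as significant when the p-value derived from <code>t</code> and <code>df</code> falls below the chosen alpha (0.01 by default, as noted above).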

'''T-Test Results'''

{|style="border: 1px solid lightGray"
!|| ||
|-
|-
| Markers which met the significance test are included in a new gene panel called “Significant Genes”. || [[Image:E_ttestgpanel.png]]
|-
|-
| Ancillary dataset is created in the project window. || [[Image:Ed_ttestproj.png]]
|-
|}

The values of the T-Test can be seen in the Color Mosaic panel and the Volcano Plot.

{|style="border: 1px solid lightGray"
!VOLCANO PLOT||COLOR MOSAIC||
|-
|-|-
|-
|-
| [[Image:Vplot.png]] || [[Image:Ed_cm.png]]
|-
|-
|-
| Clicking on any of the spots highlights the marker selected in the Marker Panel. * Insert another description ||
* The label to the right displays the significance value (the lower the value, the more likely the marker is differentially expressed) and the gene name for each displayed gene. The genes are displayed in ascending order of significance value.

* Gene height and width values can be altered to modify the display.

* The intensity slider is used to modify the intensity of the color codings.

* Accession: Includes the accession number in the label.

* Printer Icon: Prints the displayed image.

* Display: Must be toggled on to display data.

* ''Pat, Abs, Ratio Overlapping Pages Icon: Not the T Test display.''

|-
|-
|}

=== Multi T Test=== '''(IN PROGRESS)'''

== Regulatory Network ==
In this tutorial, you will:

* Create a gene network in Reverse Engineering
* View the network in Cytoscape
* Create a Marker panel from the Cytoscape network

Before you can continue, geWorkbench should be running. The file [[Webmatrix.EXP]] is used in this tutorial. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Reverse Engineering===

The Reverse Engineering component analyzes a large amount of microarray data to reverse engineer the underlying gene regulatory network. The details of this algorithm are described in "ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context", Califano et al. (http://arxiv.org/abs/q-bio.MN/0410037).

'''Create a Network'''
{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|1. In the Gene Panel, enter '''1973_s_at''' in the '''Find Next''' text box. The list box will navigate to '''1973_s_at'''; click on that gene.||[[Image: E_panel.PNG]]

|-
|2. Go to the Reverse Engineering component in the View pane. You should be in the first tab, “Profiler”, and in “Basic”. The marker '''1973_s_at''' should be listed in the Hub Gene box.||[[Image:E_RE1.PNG]]
|-
|3. Click on '''Analyze 2D'''. A list of markers is returned sorted by interaction strength below '''Search Box'''. ||
|-
|-
|-
|4. Select all the genes with score >= 5 by clicking on the first marker, '''[12.53] 37724_at''', and then using Shift+Down Arrow to extend the selection to all values >= 5.||[[Image: E_re2.png]]
|-
|-
|-
|5. Click '''Create Network'''. The network created is displayed in the Cytoscape view.||
|-
|-
|}

'''View the Network in Cytoscape'''

A full description of the capabilities and functionality of Cytoscape can be found at http://www.cytoscape.org/.

# In the Cytoscape Layout menu, select '''yFiles/ Organic''' to modify the network display.
# In this network image, select the central marker. It should turn yellow. [[Image:E_network.png]]
# In the Select menu, select Nodes>First neighbors of selected nodes>Shape>Triangle. The first neighbors are updated from a square shape to a triangle.
# Ctrl+mouse select the network. The selected nodes turn yellow. The selected genes are returned to the Gene Panel under the name Selected Genes[Cytoscape].

== Enrichment Analysis==

In this tutorial, you will:

* Gene Ontology

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

'''Gene Ontology'''

The Gene Ontology component categorizes genes in terms of the function their products perform, their cellular localization, or their involvement in high-level biological processes. The category definitions are provided by the Gene Ontology Consortium (http://www.geneontology.org), and geWorkbench uses them to generate a tree structure that approximates the Directed Acyclic Graph structure used by the Consortium.

1. Create a Marker panel with the following markers:
* AFFX-hum_alu_at
* AFFX-LysX-M_at
* AFFX-LysX-5_at
* AFFX-LysX-3_at
For assistance in creating a marker panel, please refer to Working with Markers and Phenotype Tutorial (INSERT LINK).
2. Click '''Map List.''' The application creates a tree for the selected ontology.
3. Each node in the mapped tree displays two numbers separated by a forward slash, such as '''cell(1/6058)'''. The first number is the number of genes from the active marker panel that map to that category and all of its child nodes. The second number is the corresponding count for the reference list (or for the entire chip, depending on the reference list checkbox selection and on whether a reference list has been loaded from the file system).

[[Image:E_go.png]]
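
The two counts shown on each node can be illustrated with a small sketch. The category names, gene identifiers and function names below are hypothetical, and counting each gene only once per subtree (even when it is annotated to several child categories) is an assumption rather than documented geWorkbench behavior.

```python
def genes_in_subtree(category, children, direct):
    """Collect the genes annotated to a category or to any of its descendants."""
    genes = set(direct.get(category, ()))
    for child in children.get(category, ()):
        genes |= genes_in_subtree(child, children, direct)
    return genes


def node_label(category, children, direct, panel, reference):
    """Format a tree node as 'name (panel count/reference count)'."""
    genes = genes_in_subtree(category, children, direct)
    return f"{category} ({len(genes & panel)}/{len(genes & reference)})"


# Hypothetical ontology fragment and annotations.
children = {"cell": ["membrane", "nucleus"], "membrane": [], "nucleus": []}
direct = {"cell": {"g1"}, "membrane": {"g2", "g3"}, "nucleus": {"g3", "g4"}}
reference = {"g1", "g2", "g3", "g4", "g5"}  # e.g. every gene on the chip
panel = {"g3", "g5"}                        # the active marker panel

label = node_label("cell", children, direct, panel, reference)
```

Here the "cell" node covers four reference genes through itself and its children, one of which is in the active panel.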

== Sequence Analysis== ken
* Sequence Retrieval
* Sequence Homology Analysis
** Blast
** Other

==Pattern Discovery== ken
* Position Histogram

== Promoter Analysis==

==Integrated Annotation Information==
In this tutorial, you will:

* rrrrr
* rrrrr

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

Perform the following steps to view annotations of a dataset (''not recommended for large sets of genes''):

1. In the Marker Annotations in the View pane, click '''Use Panels.'''

A table of gene names and links to pathway diagrams is returned if available from the NCI CGAP database.

2. Clicking on gene names returns annotations from the NCI CGAP database in your default web browser window.

3. Clicking on a pathway diagram name displays the pathway diagram in the caBIO Pathways panel in the View pane.
In the caBIO Pathways viewer, pathway gene symbols can in turn be clicked to display their gene information in the default web browser window.

User:Daly

2006-02-24T17:14:22Z

Daly: /* Enrichment Analysis */

= Overview=

geWorkbench is an open-source bioinformatics platform that offers a comprehensive and extendible collection of tools for the management, analysis, visualization and annotation of biomedical data. This tool is aimed at providing researchers a centralized repository for the data analysis and visualization of gene expression data, gene sequences and protein sequences.

'''What can you do with geWorkbench?'''

* Use one program that integrates with multiple existing bioinformatics modules for analysis and visualization.

* Access remote servers and clusters for the performance of computationally intensive calculations - quicker!

* Streamline data analysis with:
** flexible import options that support merging files from various sources ''(insert possible data source ie RMA express).''
** support for a variety of genomic data including microarrays, sequences, pathways, networks, alignments and phenotypes.
** access analyses with biological annotations from the National Cancer Institute.

* Community: Insert developer benefit ( plugin)

The diagram illustrates the use of geWorkbench by Researchers.

[[Image:slide1.gif]]

For detailed system documentation, please see the Documentation section (INSERT LINK).

= Tutorial=

===Welcome! ===

Welcome to the geWorkbench tutorial. In the following tutorials you will learn how to use geWorkbench. This is an example-based, illustrated guide for both beginners and experienced users. This tutorial was last updated in '''DATE''' and reflects changes to geWorkbench through '''DATE'''.

===Using the Tutorial ===

These tutorials assume that you have successfully completed the installation instructions. If you haven't installed the entire program, please go to [[Download]] and follow the installation instructions.

The tutorials are organized into a number of separate topics. Sample data sets are used throughout the tutorials. Each tutorial is self-contained and does not depend on any other tutorial for sample data. Sample data is located '''''(INSERT LINK or Instructions on how to download)'''''

'''(we should have sample data labeled Tutorial #)'''

Within the tutorials there are two basic types of text:
*Text that looks like this explains topics. 

# Text in numbered steps is instructions for you to follow using tutorial data files.

''' insert instruction on how to navigate between tutorials'''.

==Getting Started== MV
* Starting the application
* GUI elements
** Panels
** Navigation
== Loading Data== MV
* Data formats

== Working with Marker and Phenotype Panels==
In this tutorial, you will:
* Become familiar with the use of panels in geWorkbench
* Create active phenotype panels
* Classify the panels you created

Before you can continue, geWorkbench should be running. For help with this, please refer to the '''''Getting Started''''' (insertlink) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Creating Panels===

When working with microarrays, geWorkbench uses the term '''''marker''''' to refer to a gene probe (in other cases, it can be individual items from other data sets, such as sequences).
'''''Phenotype''''' refers to any user-defined grouping of microarrays. These microarrays will often share some common property that in most cases is phenotypic, although this is not a requirement. For example, one such “phenotype” might be a single experiment on a tumor tissue sample, with a second “phenotype” defined as a collection of experiments performed on normal tissue samples.

====Assign Panel====

1. In the Phenotype panel, select the following arrays, which contain samples from the congestive cardiomyopathy disease state:
JB-ccmp_0120.txt, JB-ccmp_0218.txt, JB-ccmp0718.txt, JB-ccmp0811.txt, JB-ccmp1003.txt, JB-ccmp_1109.tx

2. Right-click, select '''Add to Panel'''.

3. Enter "Cardio" in the input box and click OK.

[[Image:T_PanelLabelCardio.png]]

4. Next, similarly label the following arrays as "Normal" (''repeat steps 2 & 3''):
JB-n_0106.txt , JB-n_0821.txt, JB-n_0915.txt, JB-n_1303.txt

5. Select the checkboxes next to the panel names to mark these groups of arrays as "Active". Various analysis and visualization components can be set to use or display only activated arrays or markers.

[[Image:T_PhenotypesPriorToCase.png]]

====Classify panel====

For statistical tests such as the t-test, Case and Control groups can be specified.

# Left-click on the thumb-tack icon in front of the phenotype name.
# Select Case to specify the disease arrays as the "Case". The remaining "Normal" arrays are by default labeled control.
[[Image:T_PhenotypeSettingCase.png]]

A red thumbtack indicates the arrays have been specified as "Case".

[[Image:T_PhenotypeCaseSet.png]]

== Visualize Gene Expression==
In this tutorial, you will:

* Get acquainted with the various geWorkbench visualization tools
* View a dataset in geWorkbench
* Modify the visualization preference settings

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file '''''cardiomyopathy.exp''''' used in this tutorial comes from the Load Data (insertlink) tutorial.

===Visualization Tools===

Visualization tools provide a view of the chip(s) under investigation and can be used to assess the quality of the data. Active gene and phenotype panels (insertlink) restrict the number of markers/arrays displayed. The images created can be saved and exported. The online help describes in detail how to manipulate the visualization components.

{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|'''Microarray View''': Used to inspect each separate microarray using the Array scroll bar.||[[Image: Ema.png]]
|-
|-
|'''Tabular Microarray Panel''': Presents the numerical values of the expression measurements in a table format. One row is created per individual marker/probe and one column per microarray.||[[Image:Etab.png]]
|-
|-
|-
|'''Color Mosaic''': Heat maps for microarray expression data, organized by phenotypic or gene groupings.||[[Image:Ed_cm.png]]
|-
|-
|-
|'''Expression Profiles''': A line graph of gene expression profiles across several arrays/hybridizations. Each marker is drawn as a separately colored line.||[[Image:Eep.png]]
|-
|-
|-
|'''Scatter Plot''': A pairwise (''array vs. array and marker vs. marker'') comparison and plotting of expression values.||[[Image:Esp.png]]
|-
|-
|-
|-
|}

'''View a dataset'''
# Select '''''cardiomyopathy.exp''''' in the Project Panel.
# Select the '''''Microarray Panel''''' visualization component in the View Area at the top-right section of the interface.
# Deselect the All Markers checkbox to display the entire dataset.

Note: The '''All Arrays''' and '''All Markers''' checkboxes determine which data points are included in the display.
If neither is checked, then the entire data set is shown. The All Arrays control is useful when working with data sets comprising multiple arrays. In this case, only those arrays that are included in a currently activated phenotype panel will be displayed.

[[Image: Allm.png]]

=== Preferences===
The Preferences selection in the Tools menu allows users to specify how certain aspects of the system will behave. Once set, your preferences take effect immediately and persist between application sessions.

'''Modifying Settings'''

1. From the main menu, click on '''Tools>Preferences'''.

2. In the Preferences pop-up window, you can define settings for:
*Text Editor: The editor selected will be used to open and inspect data sets loaded in a project. Notepad is the default setting.
* Visualization: The color scheme to be applied to color mosaic images.
**Absolute: (default) Let M = max{|min|, |max|} over all expression measurements, across all arrays. If expression value x > 0, assign it the red spectrum x / M * 256. If expression value x is negative, assign it to the green spectrum -x / M * 256.
** Relative: This is similar to the setting for Absolute, but each marker is mean-variance normalized first.
* Genepix Value Computation: You can specify how to compute the value displayed for Genepix arrays. The default option is (Mean F635 - Mean B635) / (Mean F532 - Mean B532).

Select '''Relative''' for the visualization preference.

3. Click on '''OK'''.

[[Image: Preferences.gif]]
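
The default Genepix value computation listed above is a background-subtracted ratio of the two channels. A minimal sketch follows, assuming a spot is represented as a dictionary of per-channel means; the key names, the function name, and the choice to return <code>None</code> when the denominator is zero are illustrative assumptions, not geWorkbench behavior.

```python
def genepix_value(spot):
    """Compute (Mean F635 - Mean B635) / (Mean F532 - Mean B532) for one spot."""
    numerator = spot["F635 Mean"] - spot["B635 Mean"]
    denominator = spot["F532 Mean"] - spot["B532 Mean"]
    if denominator == 0:
        return None  # ratio undefined; treated as missing in this sketch
    return numerator / denominator


# Hypothetical spot: foreground (F) and background (B) means per channel.
spot = {"F635 Mean": 1200.0, "B635 Mean": 200.0,
        "F532 Mean": 700.0, "B532 Mean": 200.0}
ratio = genepix_value(spot)
```

In this example the background-corrected 635 channel is twice as bright as the 532 channel, so the displayed value would be 2.0.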

== Filter and Normalize Data==
In this tutorial, you will:
* Get acquainted with the various filters and normalizers available in geWorkbench
* Apply a filter and normalizer on a tutorial dataset

Before you can continue, geWorkbench should be running. For help with installation, please refer to the '''''Getting Started''''' (INSERT LINK) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Filter===

Filtering can be used to screen out missing data points, remove low quality data or reduce the size of the dataset by removing less interesting data. Available geWorkbench filters are as follows:

{|style="border: 1px solid lightGray"
!Filter||Description||
|-
|-|-
|-

|'''Affy Detection Call'''||Applicable to Affymetrix data only. Sets all measurements whose detection status is any user-defined combination of A, P or M as missing.
|-

|'''Missing values''' ||Discards all markers that have “missing” measurements in at least n microarrays, where n is defined by the user. Another filter must first be applied however, in order to generate the missing values upon which this filter can operate.
|-
|-
|-

|'''Deviation''' ||Sets all markers whose measurements deviate below a given value across all microarrays as missing.
|-
|-
|-

|'''Expression Threshold''' ||Sets all markers whose measurements are inside (or outside) a user-defined range as missing.
|-
|-
|-

|'''2 Channel''' ||Applicable to 2-channel arrays (Genepix) data only. Defines applicable ranges for each channel, and sets all values for which either channel intensity is inside (or outside) the defined range as missing.
|-
|-
|-
|}

[[Image:Filterpanel.gif]]

Perform the following steps to filter out data called absent in an Affymetrix file:
# In the Filtering Panel, select '''Affy Detection Call Filter'''.
# Select the ‘A’ (Absent) checkbox and click '''Filter'''. Values that were removed (marked as missing) are highlighted in yellow.
# In the Filtering Panel, select '''Missing Values Filter'''.
# Choose the maximum number of arrays that can have missing values before a marker is removed; the default is 0.
# Click '''Filter'''. Markers with more than 0 missing values are removed. You’ll notice the yellow values are gone.

{|style="border: 1px solid lightGray"
!'''Affy Detection Call Filter'''||'''Missing Values Filter'''||
|-
|-|-
|-
| [[Image: Filtered.gif]]||[[Image:Mvfilter.gif]]
|-
|}
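
The Expression Threshold and Deviation filters from the table above can be sketched in the same spirit. This is one possible reading of the table, under stated assumptions: missing values are represented as <code>None</code>, the threshold filter masks individual measurements, and the deviation filter uses the sample standard deviation of a marker across all arrays. The function names are hypothetical.

```python
from statistics import stdev


def expression_threshold_filter(values, low, high, mask_inside=True):
    """Mask values inside [low, high] (or outside it when mask_inside is False)."""
    return [
        None if v is None or (low <= v <= high) == mask_inside else v
        for v in values
    ]


def deviation_filter(data, min_deviation):
    """Mask every measurement of markers whose deviation is below min_deviation."""
    return {
        marker: [None] * len(values) if stdev(values) < min_deviation else values
        for marker, values in data.items()
    }


# Hypothetical data: one nearly constant marker and one variable marker.
data = {"flat": [100.0, 101.0, 99.0], "variable": [50.0, 400.0, 900.0]}
filtered = deviation_filter(data, min_deviation=10.0)
low_masked = expression_threshold_filter(data["variable"], 0.0, 100.0)
```

Either filter only marks values as missing; the Missing Values filter is then needed to actually remove the affected markers.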

===Normalize===
Normalization can be used to decrease the effects of systematic differences across a set of experiments.
In geWorkbench, normalization replaces the original values with new values. Available geWorkbench normalization methods are as follows:

{|style="border: 1px solid lightGray"
!Normalizer||Description||
|-
|-

|Missing value calculation ||Replaces every missing value with either the mean value of that marker across all microarrays or with the mean measurement of all markers in the microarray where the missing value is observed
|-
|-
|Log2 Transformation ||Applies a log2 transformation to all measurements in a microarray
|-
|-
|-
|Threshold Normalizer ||All data points whose value is less than (or greater than) a user-specified minimum (maximum) value are raised (reduced) to that minimum (maximum) value
|-
|-
|-
|-
|Marker-based Centering ||Subtracts the mean (median) measurement of a marker profile from every measurement in the profile
|-
|-
|-
|Array-based centering ||Subtracts the mean (median) measurement of a microarray from every measurement in that microarray
|-
|-
|-
|Mean-variance normalizer ||For every marker profile, the mean measurement of the entire profile is subtracted from each measurement in the profile and the resulting value is divided by the standard deviation
|-
|-
|}
[[Image:Normalpanel.gif]]
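
Several of the normalizers in the table are simple per-profile or per-array transformations. The sketch below illustrates four of them on a plain list of numbers; the function names are hypothetical, the use of the sample standard deviation is an assumption, and missing values are not handled.

```python
from math import log2
from statistics import mean, median, stdev


def log2_transform(values):
    """Log2 Transformation: apply log2 to every measurement."""
    return [log2(v) for v in values]


def threshold(values, minimum=None, maximum=None):
    """Threshold Normalizer: raise or reduce values to the given bounds."""
    result = []
    for v in values:
        if minimum is not None and v < minimum:
            v = minimum
        if maximum is not None and v > maximum:
            v = maximum
        result.append(v)
    return result


def center(values, use_median=False):
    """Centering: subtract the mean (or median) of the list from every value."""
    c = median(values) if use_median else mean(values)
    return [v - c for v in values]


def mean_variance(values):
    """Mean-variance normalizer: subtract the mean, divide by the standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


profile = [1.0, 2.0, 4.0, 8.0]  # hypothetical marker profile across four arrays
logged = log2_transform(profile)
clipped = threshold(profile, minimum=2.0, maximum=4.0)
centered = center(logged)
z = mean_variance(logged)
```

Passing a marker profile to <code>center</code> gives marker-based centering, while passing all measurements of a single microarray gives array-based centering.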

'''Apply Quantile Normalizer'''

1. In the Normalization Panel, select ''Quantile Normalizer''.

2. Leave the default averaging method of ''Mean Profile Marker'', which determines how missing values are handled.

3. Click '''Normalize'''. The View Area is updated to reflect the normalization (after the screen has been refreshed).
Note: The first value in the second row was updated from ''41,394.6'' to ''55,779.26''.

{|style="border: 1px solid lightGray"
!PRENORMALIZATION||NORMALIZED||
|-
|-|-
|-
| [[Image:Prenormalizer_ed.gif ]] || [[Image: Postnormalizera.gif ]]
|-
|-
|-
|}

== Clustering Gene Expression Data== ken
*Hierarchical Clustering
* Self Organizing Map (SOM)

== Differential Expression ==
In this tutorial, you will:

* Get acquainted with the T Test and Multi T Test
* Apply a T Test and Multi T Test

Before you can continue, geWorkbench should be running. For help with installation, please refer to the Getting Started (INSERT LINK) section. Load the tutorial file cardiomyopathy.exp. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===T Test===

T Test analysis identifies markers with statistically significant differential expression between sets of microarrays. For each marker, the t-test determines whether there is a significant difference between the two groups (case and control). To perform this analysis, you must classify the panels, set the analysis parameters and view the results in the visualization components. The online help describes the T Test parameters in detail.

'''Classify the Panels'''

1. Mark the '''Cardio''' phenotype as 'Case'. By default, panels are marked as control. Panels classified as Case are shown with a red thumbtack icon.
* Right-click on '''Cardio''' phenotype.
* Select '''Classification'''>'''Case'''.
2. Activate the arrays '''Normal''' and '''Cardio''' by selecting the checkboxes next to the panel name.

[[Image:T_PhenotypeSettingCase.png]]

'''Set Analysis Parameters'''
# From the Analysis Panel, select '''T-Test Analysis'''.
# Set the parameter values listed below and click on '''Analyze'''.

* Alpha-corrections tab: Just Alpha.
* P-Value Parameters tab: p-values based on t-distribution. Note that the default alpha (critical p-value) is set to 0.01.
* Degree of Freedom tab: Welch approximation - unequal group variances.

[[Image:Ttest.gif]] 

'''T-Test Results'''

{|style="border: 1px solid lightGray"
!|| ||
|-
|-
| Markers which met the significance test are included in a new gene panel called “Significant Genes”. || [[Image:E_ttestgpanel.png]]
|-
|-
| Ancillary dataset is created in the project window. || [[Image:Ed_ttestproj.png]]
|-
|}

The values of the T-Test can be seen in the Color Mosaic panel and the Volcano Plot.

{|style="border: 1px solid lightGray"
!VOLCANO PLOT||COLOR MOSAIC||
|-
|-|-
|-
|-
| [[Image:Vplot.png]] || [[Image:Ed_cm.png]]
|-
|-
|-
| Clicking on any of the spots highlights the marker selected in the Marker Panel. * Insert another description ||
* The label to the right displays the significance value (the lower the value, the more likely the marker is differentially expressed) and the gene name for each displayed gene. The genes are displayed in ascending order of significance value.

* Gene height and width values can be altered to modify the display.

* The intensity slider is used to modify the intensity of the color codings.

* Accession: Includes the accession number in the label.

* Printer Icon: Prints the displayed image.

* Display: Must be toggled on to display data.

* ''Pat, Abs, Ratio Overlapping Pages Icon: Not the T Test display.''

|-
|-
|}

=== Multi T Test=== '''(IN PROGRESS)'''

== Regulatory Network ==
In this tutorial, you will:

* Create a gene network in Reverse Engineering
* View the network in Cytoscape
* Create a Marker panel from the Cytoscape network

Before you can continue, geWorkbench should be running. The file [[Webmatrix.EXP]] is used in this tutorial. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Reverse Engineering===

The Reverse Engineering component analyzes a large amount of microarray data to reverse engineer the underlying gene regulatory network. The details of this algorithm are described in "ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context", Califano et al. (http://arxiv.org/abs/q-bio.MN/0410037).

'''Create a Network'''
{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|1. In the Gene Panel, enter '''1973_s_at''' in the '''Find Next''' text box. The list box will navigate to '''1973_s_at'''; click on that gene.||[[Image: E_panel.PNG]]

|-
|2. Go to the Reverse Engineering component in the View pane. You should be in the first tab, “Profiler”, and in “Basic”. The marker '''1973_s_at''' should be listed in the Hub Gene box.||[[Image:E_RE1.PNG]]
|-
|3. Click on '''Analyze 2D'''. A list of markers, sorted by interaction strength, is returned below the '''Search Box'''. ||
|-
|-
|-
|4. Select all the genes with a score >= 5 by clicking on the first marker, '''[12.53] 37724_at''', and then using Shift+Down Arrow to extend the selection to all values >= 5.||[[Image: E_re2.png]]
|-
|-
|-
|5. Click '''Create Network'''. The network created is displayed in the Cytoscape view.||
|-
|-
|}

'''View the Network in Cytoscape'''

A full description of the capabilities and functionality of Cytoscape can be found at http://www.cytoscape.org/.

# In the Cytoscape Layout menu, select '''yFiles/ Organic''' to modify the network display.
# In this network image, select the central marker. It should turn yellow. [[Image:E_network.png]]
# In the Select menu, select Nodes>First neighbors of selected nodes>Shape>Triangle. The first neighbors are updated from a square shape to a triangle.
# Ctrl+mouse-select the network. The selected nodes are yellow. The selected genes are returned to the Gene Panel under the name Selected Genes[Cytoscape].

== Enrichment Analysis==

In this tutorial, you will:

* Gene Ontology

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

'''Gene Ontology'''

The Gene Ontology component categorizes genes in terms of the functions their products perform, their cellular localization, or their involvement in high-level biological processes. The category definitions are provided by the Gene Ontology Consortium (http://www.geneontology.org), which we use to generate a tree structure that approximates the Directed Acyclic Graph structure used by the Consortium.

# Create a Marker panel with the following markers:
#* AFFX-hum_alu_at
#* AFFX-LysX-M_at
#* AFFX-LysX-5_at
#* AFFX-LysX-3_at
#: For assistance in creating a marker panel, please refer to the Working with Markers and Phenotype Tutorial (INSERT LINK).
# Click '''Map List'''. The application creates a tree for the selected ontology.
# The resulting nodes in the mapped tree display two numbers separated by a forward slash, such as '''cell(1/6058)'''. The first number is the number of genes from the active marker panel that are mapped to a specific category and all its child nodes. The second number is the corresponding count for the reference list (or for the entire chip, depending on the reference list check-box selection and on whether a reference list has been loaded from the file system).

[[Image:E_go.png]]

== Sequence Analysis== ken
* Sequence Retrieval
* Sequence Homology Analysis
** Blast
** Other

==Pattern Discovery== ken
* Position Histogram

== Promoter Analysis==

==Integrated Annotation Information==
In this tutorial, you will:

* rrrrr
* rrrrr

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

Perform the following steps to view annotations of a dataset ('''''not recommended for large sets of genes'''''):

1. In the Marker Annotations in the View pane, click '''Use Panels.'''

A table of gene names and links to pathway diagrams is returned if available from the NCI CGAP database.

2. Clicking on gene names returns annotations from the NCI CGAP database in your default web browser window.

3. Clicking on the pathway diagram name will display a pathway diagram in the caBIO Pathways panel in the View pane.
In the caBIO Pathways viewer, pathway gene symbols can in turn be clicked to display their gene information in the default web browser window.

User:Daly

2006-02-24T17:13:30Z

Daly: /* Enrichment Analysis */

= Overview=

geWorkbench is an open-source bioinformatics platform that offers a comprehensive and extendible collection of tools for the management, analysis, visualization and annotation of biomedical data. This tool is aimed at providing researchers a centralized repository for the data analysis and visualization of gene expression data, gene sequences and protein sequences.

'''What can you do with geWorkbench?'''

* Use one program that integrates with multiple existing bioinformatics modules for analysis and visualization.

* Access remote servers and clusters to perform computationally intensive calculations more quickly.

* Streamline data analysis with:
** flexible import options that support merging files from various sources ''(insert possible data source ie RMA express).''
** support for a variety of genomic data including microarrays, sequences, pathways, networks, alignments and phenotypes.
** access analyses with biological annotations from the National Cancer Institute.

* Community: Insert developer benefit ( plugin)

The diagram illustrates the use of geWorkbench by Researchers.

[[Image:slide1.gif]]

For detailed system documentation, please see the Documentation section (INSERT LINK).

= Tutorial=

===Welcome! ===

Welcome to the geWorkbench tutorial. In the following tutorials you will learn how to use geWorkbench. This is an example based, illustrated guide for both beginners and experienced users. This tutorial was last updated in '''DATE''' and reflects changes to geWorkbench through '''DATE'''.

===Using the Tutorial ===

These tutorials assume that you have successfully completed the installation instructions. If you haven't installed the entire program, please go to [[Download]] and follow the installation instructions.

The tutorials are organized into a number of separate topics. Sample data sets are used throughout the tutorials. Each tutorial is self-contained and does not depend on any other tutorial for sample data. Sample data is located '''''(INSERT LINK or Instructions on how to download)'''''

'''(we should have sample data labeled Tutorial #)'''

Within the tutorials there are two basic types of text:
*Text that looks like this explains topics. 

# Text in numbered steps is instructions for you to follow using tutorial data files.

''' insert instruction on how to navigate between tutorials'''.

==Getting Started== MV
* Starting the application
* GUI elements
** Panels
** Navigation
== Loading Data== MV
* Data formats

== Working with Marker and Phenotype Panels==
In this tutorial, you will:
* Become familiar with the use of panels in geWorkbench
* Create active phenotype panels
* Classify the panels you created

Before you can continue, geWorkbench should be running. For help with this, please refer to the '''''Getting Started''''' (insertlink) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Creating Panels===

When working with microarrays, geWorkbench uses the term '''''marker''''' to refer to a gene probe (in other cases, it can refer to individual items from other data sets, such as sequences).
'''''Phenotype''''' refers to any user-defined grouping of microarrays. These microarrays will often share some common property that in most cases is phenotypic, although this is not a requirement. For example, one such “phenotype” might be a single experiment on a tumor tissue sample, with a second “phenotype” defined as a collection of experiments performed on normal tissue samples.

====Assign Panel====

1. In the Phenotype panel, select the following arrays, which contain samples from the congestive cardiomyopathy disease state:
JB-ccmp_0120.txt, JB-ccmp_0218.txt, JB-ccmp0718.txt, JB-ccmp0811.txt, JB-ccmp1003.txt, JB-ccmp_1109.tx

2. Right-click, select '''Add to Panel'''.

3. Enter "Cardio" in the input box and click OK.

[[Image:T_PanelLabelCardio.png]]

4. Next, similarly label the following arrays as "Normal" ('' repeat steps 2 & 3 '').
JB-n_0106.txt , JB-n_0821.txt, JB-n_0915.txt, JB-n_1303.txt

5. Select the checkboxes next to the panel names to mark these groups of arrays as "Active". Various analysis and visualization components can be set to use or display only activated arrays or markers.

[[Image:T_PhenotypesPriorToCase.png]]

====Classify panel====

For statistical tests such as the t-test, Case and Control groups can be specified.

# Left-click on the thumb-tack icon in front of the phenotype name.
# Select Case to specify the disease arrays as the "Case". The remaining "Normal" arrays are by default labeled control.
[[Image:T_PhenotypeSettingCase.png]]

A red thumbtack indicates the arrays have been specified as "Case".

[[Image:T_PhenotypeCaseSet.png]]

== Visualize Gene Expression==
In this tutorial, you will:

* Get acquainted with the various geWorkbench visualization tools
* View a dataset in geWorkbench
* Modify the visualization preference settings

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file '''''cardiomyopathy.exp''''' used in this tutorial is from the Load Data (insertlink) tutorial.

===Visualization Tools===

Visualization tools provide a view of the chip(s) under investigation and can be used for ascertaining the quality of the data. Active gene and phenotype panels (insertlink) restrict the number of markers/arrays displayed. The images created can be saved and exported. A detailed description of how to manipulate the visualization components is available in the online help.

{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|'''Microarray View''': Used to inspect each separate microarray using the Array scroll bar.||[[Image: Ema.png]]
|-
|-
|'''Tabular Microarray Panel''': Presents the numerical values of the expression measurements in a table format. One row is created per individual marker/probe and one column per microarray.||[[Image:Etab.png]]
|-
|-
|-
|'''Color Mosaic''': Heat maps for microarray expression data, organized by phenotypic or gene groupings.||[[Image:Ed_cm.png]]
|-
|-
|-
|'''Expression Profiles''': A line graph of gene expression profiles across several arrays/hybridizations. Each marker is drawn as a separately colored line.||[[Image:Eep.png]]
|-
|-
|-
|'''Scatter Plot''': A pairwise (''array vs. array and marker vs. marker'') comparison and plotting of expression values.||[[Image:Esp.png]]
|-
|-
|-
|-
|}

'''View a dataset'''
# Select '''''cardiomyopathy.exp''''' in the Project Panel.
# Select the '''''Microarray Panel''''' visualization component in the View Area at the top-right section of the interface.
# Deselect the All Markers checkbox to display the entire dataset.

Note: The '''''All Arrays''''' and '''''All Markers''''' checkboxes determine which data points are included in the display.
If neither is checked, then the entire data set is shown. The All Arrays control is useful when working with data sets comprising multiple arrays. In this case, only those arrays that are included in a currently activated phenotype panel will be displayed.

[[Image: Allm.png]]

=== Preferences===
The Preferences selection in the Tools menu allows users to specify how certain aspects of the system will behave. Once set, your preferences are applied at once and persist between application sessions.

'''Modifying Settings'''

1. From the main menu, click on '''Tools>Preferences'''.

2. In the Preferences pop-up window, you can define settings for:
*Text Editor: The editor selected will be used to open and inspect data sets loaded in a project. Notepad is the default setting.
* Visualization: The color scheme to be applied to color mosaic images.
**Absolute: (default) Let M = max{|min|, |max|} over all expression measurements, across all arrays. If expression value x > 0, assign it the red spectrum x / M * 256. If expression value x is negative, assign it to the green spectrum -x / M * 256.
** Relative: This is similar to the setting for Absolute, but each marker is mean-variance normalized first.
* Genepix Value Computation: You can specify how to compute the value displayed for a Genepix array. The default setting is the option (Mean F635 - Mean B635) / (Mean F532 - Mean B532).

Select '''Relative''' for the visualization preference.

3. Click on '''OK'''.

[[Image: Preferences.gif]]
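
The Absolute color scheme described in step 2 can be sketched in a few lines of Python. This is only an illustration of the stated formula, not geWorkbench's own code: the function names are hypothetical, and clamping the channel to 255 and drawing a value of 0 as black are assumptions, since the tutorial does not say how those cases are handled.

```python
def scale_factor(values):
    """M = max{|min|, |max|} over all expression measurements."""
    return max(abs(min(values)), abs(max(values)))


def absolute_color(x, m):
    """Return an (R, G, B) tuple for expression value x under the Absolute scheme.

    Positive values use the red channel, negative values the green channel.
    Clamping to 255 is an assumption made for this sketch.
    """
    if m <= 0:
        return (0, 0, 0)
    level = min(255, int(abs(x) / m * 256))
    if x > 0:
        return (level, 0, 0)
    return (0, level, 0)


measurements = [-50.0, 0.0, 100.0, 200.0]
m = scale_factor(measurements)
colors = [absolute_color(x, m) for x in measurements]
```

For the Relative scheme, the same mapping would be applied after each marker has been mean-variance normalized.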

== Filter and Normalize Data==
In this tutorial, you will:
* Get acquainted with the various filters and normalizers available in geWorkbench
* Apply a filter and normalizer on a tutorial dataset

Before you can continue, geWorkbench should be running. For help with installation, please refer to the '''''Getting Started''''' (INSERT LINK) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Filter===

Filtering can be used to screen out missing data points, remove low quality data or reduce the size of the dataset by removing less interesting data. Available geWorkbench filters are as follows:

{|style="border: 1px solid lightGray"
!Filter||Description||
|-
|-|-
|-

|'''Affy Detection Call'''||Applicable to Affymetrix data only. Sets all measurements whose detection status is any user-defined combination of A, P or M as missing.
|-

|'''Missing values''' ||Discards all markers that have “missing” measurements in at least n microarrays, where n is defined by the user. Another filter must first be applied however, in order to generate the missing values upon which this filter can operate.
|-
|-
|-

|'''Deviation''' ||Sets all markers whose measurements deviate below a given value across all microarrays as missing.
|-
|-
|-

|'''Expression Threshold''' ||Sets all markers whose measurements are inside (or outside) a user-defined range as missing.
|-
|-
|-

|'''2 Channel''' ||Applicable to 2-channel arrays (Genepix) data only. Defines applicable ranges for each channel, and sets all values for which either channel intensity is inside (or outside) the defined range as missing.
|-
|-
|-
|}

[[Image:Filterpanel.gif]]

Perform the following steps to filter out data called absent in an Affymetrix file:
# In the Filtering Panel, select ''Affy Detection Call Filter''.
# Select the ‘A’ (Absent) checkbox and click '''Filter'''. Values that were removed (marked as missing) are highlighted in yellow.
# In the Filtering Panel, select '''Missing Values Filter'''.
# Choose the maximum number of arrays that can have missing values before a marker is removed (the default is 0).
# Click '''Filter'''. Markers with more than 0 missing values are removed. You’ll notice the yellow values are gone.

{|style="border: 1px solid lightGray"
!'''Affy Detection Call Filter'''||'''Missing Values Filter'''||
|-
|-|-
|-
| [[Image: Filtered.gif]]||[[Image:Mvfilter.gif]]
|-
|}
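
The two-stage filtering above (mask by detection call, then drop markers with too many missing values) can be sketched as follows. This is a minimal illustration, not geWorkbench's implementation: the data layout (a dictionary of marker name to a list of value/call pairs), the function names, and the sample markers are all hypothetical.

```python
def affy_detection_filter(data, calls_to_mask=("A",)):
    """Set measurements whose detection call is in calls_to_mask to None (missing)."""
    return {
        marker: [None if call in calls_to_mask else value for value, call in row]
        for marker, row in data.items()
    }


def missing_values_filter(masked, max_missing=0):
    """Keep only markers with at most max_missing missing measurements."""
    return {
        marker: row
        for marker, row in masked.items()
        if sum(value is None for value in row) <= max_missing
    }


# Hypothetical two-array dataset: (expression value, detection call).
data = {
    "marker_1": [(10.0, "P"), (12.0, "P")],
    "marker_2": [(5.0, "A"), (7.0, "P")],
}
masked = affy_detection_filter(data)
filtered = missing_values_filter(masked, max_missing=0)
```

With the default of 0, any marker with even one masked value is dropped, which is why the yellow (missing) values disappear after the second filter.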

===Normalize===
Normalization can be used to decrease the effects of systematic differences across a set of experiments.
In geWorkbench, normalization replaces the existing values with new values. The available geWorkbench normalization methods are as follows:

{|style="border: 1px solid lightGray"
!Normalizer||Description||
|-
|-

|Missing value calculation ||Replaces every missing value with either the mean value of that marker across all microarrays or with the mean measurement of all markers in the microarray where the missing value is observed
|-
|-
|Log2 Transformation ||Applies a log2 transformation to all measurements in a microarray
|-
|-
|-
|Threshold Normalizer ||All data points whose value is less than (or greater than) a user-specified minimum (maximum) value are raised (reduced) to that minimum (maximum) value
|-
|-
|-
|-
|Marker-based Centering ||Subtracts the mean (median) measurement of a marker profile from every measurement in the profile
|-
|-
|-
|Array-based centering ||Subtracts the mean (median) measurement of a microarray from every measurement in that microarray
|-
|-
|-
|Mean-variance normalizer ||For every marker profile, the mean measurement of the entire profile is subtracted from each measurement in the profile and the resulting value is divided by the standard deviation
|-
|-
|}
[[Image:Normalpanel.gif]]
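
Several of the normalizers in the table are simple element-wise transformations. The sketch below illustrates four of them on a single list of measurements. The function names are hypothetical, and the use of the sample standard deviation in the mean-variance normalizer is an assumption, since the table does not say which form geWorkbench uses.

```python
import math
import statistics


def log2_transform(values):
    """Apply a log2 transformation to every measurement."""
    return [math.log2(v) for v in values]


def threshold_normalize(values, minimum, maximum):
    """Raise values below minimum to minimum and reduce values above maximum to maximum."""
    return [min(max(v, minimum), maximum) for v in values]


def center(values, use_median=False):
    """Subtract the mean (or median) of the profile from every measurement."""
    middle = statistics.median(values) if use_median else statistics.mean(values)
    return [v - middle for v in values]


def mean_variance_normalize(profile):
    """Subtract the profile mean and divide by the standard deviation.

    The sample standard deviation is an assumption made for this sketch.
    """
    mean = statistics.mean(profile)
    sd = statistics.stdev(profile)
    return [(v - mean) / sd for v in profile]
```

Marker-based centering applies `center` to each marker's profile across arrays, while array-based centering applies it to all measurements within one microarray.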

'''Apply Quantile Normalizer'''

1. In the Normalization Panel, select ''Quantile Normalizer''.

2. Leave the default averaging method of ''Mean Profile Marker'' to indicate how missing values are handled.

3. Click '''Normalize'''. The View Area is updated to reflect normalization (after the screen has been refreshed).
Note: The first value in the second row was updated from ''41,394.6'' to ''55,779.26''.

{|style="border: 1px solid lightGray"
!PRENORMALIZATION||NORMALIZED||
|-
|-|-
|-
| [[Image:Prenormalizer_ed.gif ]] || [[Image: Postnormalizera.gif ]].
|-
|-
|-
|}
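
For readers unfamiliar with quantile normalization, the following is a sketch of the textbook procedure: sort each array, average the sorted values rank by rank, and write the averages back in each array's original order. It is an assumption that geWorkbench's Quantile Normalizer follows this general scheme; the sketch ignores ties and missing values, and the function name is hypothetical.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of equal-length arrays (one list per microarray)."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Mean of the values sharing the same rank across all arrays.
    rank_means = [
        sum(s[rank] for s in sorted_arrays) / len(arrays) for rank in range(n)
    ]
    result = []
    for a in arrays:
        # Indices of a's values in ascending order of value.
        order = sorted(range(n), key=lambda i: a[i])
        normalized = [0.0] * n
        for rank, index in enumerate(order):
            normalized[index] = rank_means[rank]
        result.append(normalized)
    return result


normalized = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
```

After normalization every array has the same distribution of values, which is why individual measurements can change substantially, as in the note above.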

== Clustering Gene Expression Data== ken
*Hierarchical Clustering
* Self Organizing Map (SOM)

== Differential Expression ==
In this tutorial, you will:

* Get acquainted with the T Test and Multi T Test
* Apply a T Test and Multi T Test

Before you can continue, geWorkbench should be running. For help with installation, please refer to the Getting Started (INSERT LINK) section. Load the tutorial file cardiomyopathy.exp. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===T Test===

T Test analysis identifies markers with statistically significant differential expression between sets of microarrays. For each marker, the t-test determines whether there is a significant difference between the two groups (case and control). To perform this analysis, you must classify the panels, set the analysis parameters and view the results in the visualization components. A detailed description of the T Test parameters is available in the online help.

'''Classify the Panels'''

1. Mark the '''Cardio''' phenotype as 'Case'. By default, panels are marked as control. Panels classified as Case are shown with a red thumbtack icon.
* Right-click on '''Cardio''' phenotype.
* Select '''Classification'''>'''Case'''.
2. Activate the arrays '''Normal''' and '''Cardio''' by selecting the checkboxes next to the panel name.

[[Image:T_PhenotypeSettingCase.png]]

'''Set Analysis Parameters'''
# From the Analysis Panel, select '''T-Test Analysis'''.
# Set the parameter values listed below and click '''Analyze'''.

* Alpha-corrections tab: Just Alpha.
* P-Value Parameters tab: p-values based on t-distribution. Note that the default alpha (critical p-value) is set to 0.01.
* Degree of Freedom tab: Welch approximation - unequal group variances.

[[Image:Ttest.gif]] 
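
The Welch approximation selected above does not assume equal variances in the case and control groups. The sketch below computes the standard Welch t statistic and the Welch-Satterthwaite degrees of freedom for one marker. It illustrates the textbook formulas only and is not geWorkbench's code: the function name and sample values are hypothetical, and converting t to a p-value (which needs the t-distribution) is left out.

```python
import math
import statistics


def welch_t(case, control):
    """Return (t statistic, degrees of freedom) for Welch's unequal-variance t-test."""
    n1, n2 = len(case), len(control)
    v1 = statistics.variance(case) / n1      # sample variance over group size
    v2 = statistics.variance(control) / n2
    t = (statistics.mean(case) - statistics.mean(control)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical expression values for a single marker.
t, df = welch_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
```

A marker is reported as significant when the p-value derived from t and df falls below the chosen alpha (0.01 by default in the panel above).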

'''T-Test Results'''

{|style="border: 1px solid lightGray"
!|| ||
|-
|-
| Markers which met the significance test are included in a new gene panel called “Significant Genes”. || [[Image:E_ttestgpanel.png]]
|-
|-
| Ancillary dataset is created in the project window. || [[Image:Ed_ttestproj.png]]
|-
|}

The values of the T-Test can be seen in the Color Mosaic panel and the Volcano Plot.

{|style="border: 1px solid lightGray"
!VOLCANO PLOT||COLOR MOSAIC||
|-
|-|-
|-
|-
| [[Image:Vplot.png]] || [[Image:Ed_cm.png]]
|-
|-
|-
| Clicking on any of the spots highlights the marker selected in the Marker Panel. * Insert another description ||
* The label to the right displays the significance value (the lower the value, the more likely the expression differs between groups) and the gene name for the displayed genes. The genes are displayed in ascending order by significance value.

* Gene height and width values can be altered to modify the display.

* The intensity slider is used to modify the intensity of the color codings.

* Accession: Includes the accession number in the label.

* Printer Icon: Prints the displayed image.

* Display: Must be toggled on to display data.

* ''Pat, Abs, Ratio Overlapping Pages Icon: Not the T Test display.''

|-
|-
|}

=== Multi T Test=== '''(IN PROGRESS)'''

== Regulatory Network ==
In this tutorial, you will:

* Create a gene network in Reverse Engineering
* View the network in Cytoscape
* Create a Marker panel from the Cytoscape network

Before you can continue, geWorkbench should be running. The file [[Webmatrix.EXP]] is used in this tutorial. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Reverse Engineering===

The Reverse Engineering component analyzes large amounts of microarray data to reverse engineer the underlying gene regulatory network. The algorithm is described in "ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context", Califano et al. (http://arxiv.org/abs/q-bio.MN/0410037).

'''Create a Network'''
{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|1. In the Gene Panel, enter '''1973_s_at''' in the '''Find Next''' text box. The list box will navigate to '''1973_s_at'''; click on that gene.||[[Image: E_panel.PNG]]

|-
|2. Go to the Reverse Engineering component in the View pane. You should be in the first tab, “Profiler”, and in “Basic”. The marker '''1973_s_at''' should be listed in the Hub Gene box.||[[Image:E_RE1.PNG]]
|-
|3. Click on '''Analyze 2D'''. A list of markers, sorted by interaction strength, is returned below the '''Search Box'''. ||
|-
|-
|-
|4. Select all the genes with a score >= 5 by clicking on the first marker, '''[12.53] 37724_at''', and then using Shift+Down Arrow to extend the selection to all values >= 5.||[[Image: E_re2.png]]
|-
|-
|-
|5. Click '''Create Network'''. The network created is displayed in the Cytoscape view.||
|-
|-
|}
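
The selection in step 4 amounts to keeping every entry whose bracketed score is at least 5. As a rough illustration, the sketch below parses entries of the form shown in the list box and applies that cutoff. The function name is hypothetical, and apart from '''[12.53] 37724_at''' the sample entries are made up for the example.

```python
def select_by_score(entries, cutoff=5.0):
    """Return marker names from "[score] marker" entries with score >= cutoff."""
    selected = []
    for entry in entries:
        score_text, marker = entry.split("]", 1)
        score = float(score_text.lstrip("["))
        if score >= cutoff:
            selected.append(marker.strip())
    return selected


# Only the first entry comes from the tutorial; the others are hypothetical.
entries = ["[12.53] 37724_at", "[5.00] example_a_at", "[4.99] example_b_at"]
neighbors = select_by_score(entries)
```

The selected markers, together with the hub gene, are the nodes that appear in the network created in step 5.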

'''View the Network in Cytoscape'''

A full description of the capabilities and functionality of Cytoscape can be found at http://www.cytoscape.org/.

# In the Cytoscape Layout menu, select '''yFiles/ Organic''' to modify the network display.
# In this network image, select the central marker. It should turn yellow. [[Image:E_network.png]]
# In the Select menu, select Nodes>First neighbors of selected nodes>Shape>Triangle. The first neighbors are updated from a square shape to a triangle.
# Ctrl+mouse-select the network. The selected nodes are yellow. The selected genes are returned to the Gene Panel under the name Selected Genes[Cytoscape].

== Enrichment Analysis==

In this tutorial, you will:

* Gene Ontology

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

'''Gene Ontology'''

The Gene Ontology component categorizes genes in terms of the functions their products perform, their cellular localization, or their involvement in high-level biological processes. The category definitions are provided by the Gene Ontology Consortium (http://www.geneontology.org), which we use to generate a tree structure that approximates the Directed Acyclic Graph structure used by the Consortium.

# Create a Marker panel with the following markers:
#* AFFX-hum_alu_at
#* AFFX-LysX-M_at
#* AFFX-LysX-5_at
#* AFFX-LysX-3_at
#: For assistance in creating a marker panel, please refer to the Working with Markers and Phenotype Tutorial (INSERT LINK).
# Click '''Map List'''. The application creates a tree for the selected ontology.
# The resulting nodes in the mapped tree display two numbers separated by a forward slash, such as '''cell(1/6058)'''. The first number is the number of genes from the active marker panel that are mapped to a specific category and all its child nodes. The second number is the corresponding count for the reference list (or for the entire chip, depending on the reference list check-box selection and on whether a reference list has been loaded from the file system).

[[Image:E_go.png]]
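
The two numbers shown on each tree node can be illustrated with a small sketch: for a category, count the genes annotated to that category or to any of its child nodes, once for the active marker panel and once for the reference list. The tree, the annotations, the gene names and the function names below are all hypothetical; this is not how geWorkbench stores the ontology.

```python
def descendants(tree, node):
    """Return the set containing node and all categories below it."""
    seen = {node}
    stack = [node]
    while stack:
        for child in tree.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def node_label(tree, annotations, node, panel, reference):
    """Build a label such as "cell(1/3)" for one ontology node."""
    terms = descendants(tree, node)

    def count(genes):
        return sum(1 for g in genes if annotations.get(g, set()) & terms)

    return f"{node}({count(panel)}/{count(reference)})"


# Hypothetical ontology fragment and gene annotations.
tree = {"cell": ["nucleus", "membrane"]}
annotations = {
    "gene_1": {"nucleus"},
    "gene_2": {"membrane"},
    "gene_3": {"cell"},
    "gene_4": set(),
}
label = node_label(tree, annotations, "cell", ["gene_1"], ["gene_1", "gene_2", "gene_3", "gene_4"])
```

Because child categories are included, a gene annotated only to a deeper term still contributes to the count of every ancestor node.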

== Sequence Analysis== ken
* Sequence Retrieval
* Sequence Homology Analysis
** Blast
** Other

==Pattern Discovery== ken
* Position Histogram

== Promoter Analysis==

==Integrated Annotation Information==
In this tutorial, you will:

* rrrrr
* rrrrr

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

Perform the following steps to view annotations of a dataset ('''''not recommended for large sets of genes'''''):

1. In the Marker Annotations in the View pane, click '''Use Panels.'''

A table of gene names and links to pathway diagrams is returned if available from the NCI CGAP database.

2. Clicking on gene names returns annotations from the NCI CGAP database in your default web browser window.

3. Clicking on the pathway diagram name will display a pathway diagram in the caBIO Pathways panel in the View pane.
In the caBIO Pathways viewer, pathway gene symbols can in turn be clicked to display their gene information in the default web browser window.

User:Daly

2006-02-24T17:13:11Z

Daly: /* Enrichment Analysis */

= Overview=

geWorkbench is an open-source bioinformatics platform that offers a comprehensive and extendible collection of tools for the management, analysis, visualization and annotation of biomedical data. This tool is aimed at providing researchers a centralized repository for the data analysis and visualization of gene expression data, gene sequences and protein sequences.

'''What can you do with geWorkbench?'''

* Use one program that integrates with multiple existing bioinformatics modules for analysis and visualization.

* Access remote servers and clusters to perform computationally intensive calculations more quickly.

* Streamline data analysis with:
** flexible import options that support merging files from various sources ''(insert possible data source ie RMA express).''
** support for a variety of genomic data including microarrays, sequences, pathways, networks, alignments and phenotypes.
** access analyses with biological annotations from the National Cancer Institute.

* Community: Insert developer benefit ( plugin)

The diagram illustrates the use of geWorkbench by Researchers.

[[Image:slide1.gif]]

For detailed system documentation, please see the Documentation section (INSERT LINK).

= Tutorial=

===Welcome! ===

Welcome to the geWorkbench tutorial. In the following tutorials you will learn how to use geWorkbench. This is an example based, illustrated guide for both beginners and experienced users. This tutorial was last updated in '''DATE''' and reflects changes to geWorkbench through '''DATE'''.

===Using the Tutorial ===

These tutorials assume that you have successfully completed the installation instructions. If you haven't installed the entire program, please go to [[Download]] and follow the installation instructions.

The tutorials are organized into a number of separate topics. Sample data sets are used throughout the tutorials. Each tutorial is self-contained and does not depend on any other tutorial for sample data. Sample data is located '''''(INSERT LINK or Instructions on how to download)'''''

'''(we should have sample data labeled Tutorial #)'''

Within the tutorials there are two basic types of text:
*Text that looks like this explains topics. 

# Text in numbered steps is instructions for you to follow using tutorial data files.

''' insert instruction on how to navigate between tutorials'''.

==Getting Started== MV
* Starting the application
* GUI elements
** Panels
** Navigation
== Loading Data== MV
* Data formats

== Working with Marker and Phenotype Panels==
In this tutorial, you will:
* Become familiar with the use of panels in geWorkbench
* Create active phenotype panels
* Classify the panels you created

Before you can continue, geWorkbench should be running. For help with this, please refer to the '''''Getting Started''''' (insertlink) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Creating Panels===

When working with microarrays, geWorkbench uses the term '''''marker''''' to refer to a gene probe (in other cases, it can refer to individual items from other data sets, such as sequences).
'''''Phenotype''''' refers to any user-defined grouping of microarrays. These microarrays will often share some common property that in most cases is phenotypic, although this is not a requirement. For example, one such “phenotype” might be a single experiment on a tumor tissue sample, with a second “phenotype” defined as a collection of experiments performed on normal tissue samples.

====Assign Panel====

1. In the Phenotype panel, select the following arrays, which contain samples from the congestive cardiomyopathy disease state:
JB-ccmp_0120.txt, JB-ccmp_0218.txt, JB-ccmp0718.txt, JB-ccmp0811.txt, JB-ccmp1003.txt, JB-ccmp_1109.tx

2. Right-click, select '''Add to Panel'''.

3. Enter "Cardio" in the input box and click OK.

[[Image:T_PanelLabelCardio.png]]

4. Next, similarly label the following arrays as "Normal" ('' repeat steps 2 & 3 '').
JB-n_0106.txt , JB-n_0821.txt, JB-n_0915.txt, JB-n_1303.txt

5. Select the checkboxes next to the panel names to mark these groups of arrays as "Active". Various analysis and visualization components can be set to use or display only activated arrays or markers.

[[Image:T_PhenotypesPriorToCase.png]]

====Classify panel====

For statistical tests such as the t-test, Case and Control groups can be specified.

# Left-click on the thumb-tack icon in front of the phenotype name.
# Select Case to specify the disease arrays as the "Case". The remaining "Normal" arrays are by default labeled control.
[[Image:T_PhenotypeSettingCase.png]]

A red thumbtack indicates the arrays have been specified as "Case".

[[Image:T_PhenotypeCaseSet.png]]

== Visualize Gene Expression==
In this tutorial, you will:

* Get acquainted with the various geWorkbench visualization tools
* View a dataset in geWorkbench
* Modify the visualization preference settings

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file '''''cardiomyopathy.exp''''' used in this tutorial is from the Load Data (insertlink) tutorial.

===Visualization Tools===

Visualization tools provide a view of the chip(s) under investigation and can be used for ascertaining the quality of the data. Active gene and phenotype panels (insertlink) restrict the number of markers/arrays displayed. The images created can be saved and exported. A detailed description of how to manipulate the visualization components is available in the online help.

{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|'''Microarray View''': Used to inspect each separate microarray using the Array scroll bar.||[[Image: Ema.png]]
|-
|-
|'''Tabular Microarray Panel''': Presents the numerical values of the expression measurements in a table format. One row is created per individual marker/probe and one column per microarray.||[[Image:Etab.png]]
|-
|-
|-
|'''Color Mosaic''': Heat maps for microarray expression data, organized by phenotypic or gene groupings.||[[Image:Ed_cm.png]]
|-
|-
|-
|'''Expression Profiles''': A line graph of gene expression profiles across several arrays/hybridizations. Each marker is drawn as a separately colored line.||[[Image:Eep.png]]
|-
|-
|-
|'''Scatter Plot''': A pairwise (''array vs. array and marker vs. marker'') comparison and plotting of expression values.||[[Image:Esp.png]]
|-
|-
|-
|-
|}

'''View a dataset'''
# Select '''''cardiomyopathy.exp''''' in the Project Panel.
# Select the '''''Microarray Panel''''' visualization component in the View Area at the top-right section of the interface.
# Deselect the All Markers checkbox to display the entire dataset.

Note: The '''''All Arrays''''' and '''''All Markers''''' checkboxes determine which data points are included in the display.
If neither is checked, then the entire data set is shown. The All Arrays control is useful when working with data sets comprising multiple arrays. In this case, only those arrays that are included in a currently activated phenotype panel will be displayed.

[[Image: Allm.png]]

=== Preferences===
The Preferences selection in the Tools menu allows users to specify how certain aspects of the system will behave. Once set, your preferences are applied at once and persist between application sessions.

'''Modifying Settings'''

1. From the main menu, click on '''Tools>Preferences'''.

2. In the Preferences pop-up window, you can define settings for:
*Text Editor: The editor selected will be used to open and inspect data sets loaded in a project. Notepad is the default setting.
* Visualization: The color scheme to be applied to color mosaic images.
**Absolute: (default) Let M = max{|min|, |max|} over all expression measurements, across all arrays. If expression value x > 0, assign it the red spectrum x / M * 256. If expression value x is negative, assign it to the green spectrum -x / M * 256.
** Relative: This is similar to the setting for Absolute, but each marker is mean-variance normalized first.
* Genepix Value Computation: You can specify how to compute the value displayed for Genepix arrays. The default setting is the option (Mean F635 - Mean B635) / (Mean F532 - Mean B532).

Select '''Relative''' for the visualization preference.

3. Click on '''OK'''.

[[Image: Preferences.gif]]
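
The Absolute color scheme described above can be sketched in a few lines of Python. This is only an illustration of the stated formula, not the geWorkbench implementation: the function names are hypothetical, and clamping the scaled value to 255 and rendering zero as black are assumptions that the text does not specify.

```python
def scale_factor(values):
    """M = max(|min|, |max|) over all expression measurements."""
    return max(abs(min(values)), abs(max(values)))


def absolute_color(x, m):
    """Map expression value x to an (R, G, B) tuple under the Absolute scheme."""
    if m <= 0:
        return (0, 0, 0)
    # Scale by 256 as in the text, then clamp to the 8-bit maximum (assumption).
    level = min(255, int(abs(x) / m * 256))
    # Positive values use the red spectrum, negative values the green spectrum.
    return (level, 0, 0) if x > 0 else (0, level, 0)


values = [-200.0, -50.0, 0.0, 100.0, 400.0]
m = scale_factor(values)
colors = [absolute_color(v, m) for v in values]
```

Under the Relative scheme, each marker profile would be mean-variance normalized before the same mapping is applied.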

== Filter and Normalize Data==
In this tutorial, you will:
* Get acquainted with the various filters and normalizers available in geWorkbench
* Apply a filter and normalizer on a tutorial dataset

Before you can continue, geWorkbench should be running. For help with installation, please refer to the '''''Getting Started''''' (INSERT LINK) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Filter===

Filtering can be used to screen out missing data points, remove low quality data or reduce the size of the dataset by removing less interesting data. Available geWorkbench filters are as follows:

{|style="border: 1px solid lightGray"
!Filter||Description||
|-
|-|-
|-

|'''Affy Detection Call'''||Applicable to Affymetrix data only. Sets all measurements whose detection status is any user-defined combination of A, P or M as missing.
|-

|'''Missing values''' ||Discards all markers that have “missing” measurements in at least n microarrays, where n is defined by the user. Another filter must first be applied however, in order to generate the missing values upon which this filter can operate.
|-
|-
|-

|'''Deviation''' ||Sets as missing all markers whose measurements deviate by less than a given value across all microarrays.
|-
|-
|-

|'''Expression Threshold''' ||Sets all markers whose measurements are inside (or outside) a user-defined range as missing.
|-
|-
|-

|'''2 Channel''' ||Applicable to 2-channel arrays (Genepix) data only. Defines applicable ranges for each channel, and sets all values for which either channel intensity is inside (or outside) the defined range as missing.
|-
|-
|-
|}

[[Image:Filterpanel.gif]]

Perform the following steps to filter out data called absent in an Affymetrix file:
# In the Filtering Panel, select '''Affy Detection Call Filter'''.
# Select the ‘A’ (Absent) checkbox and click '''Filter'''. Values that were removed (marked as missing) are highlighted in yellow.
# In the Filtering Panel, select '''Missing Values Filter'''.
# Choose the maximum number of arrays that can have missing values before a marker is removed; the default is 0.
# Click '''Filter'''. Markers with more than 0 missing values are removed. You’ll notice the yellow values are gone.

{|style="border: 1px solid lightGray"
!'''Affy Detection Call Filter'''||'''Missing Values Filter'''||
|-
|-|-
|-
| [[Image: Filtered.gif]]||[[Image:Mvfilter.gif]]
|-
|}
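
The two-step filtering above can be sketched as follows. This is a minimal illustration in Python, not geWorkbench code: the function names, the use of <code>None</code> for a missing value, and the sample data are all assumptions. The Missing Values step follows the numbered steps above (a marker is removed when it has more than the allowed number of missing values).

```python
MISSING = None  # hypothetical marker for a missing measurement


def detection_call_filter(values, calls, drop_calls):
    """Mark a measurement as missing when its detection call is in drop_calls."""
    return [
        [MISSING if call in drop_calls else value
         for value, call in zip(value_row, call_row)]
        for value_row, call_row in zip(values, calls)
    ]


def missing_values_filter(markers, values, max_missing=0):
    """Keep only markers with at most max_missing missing measurements."""
    kept = [
        (name, row) for name, row in zip(markers, values)
        if sum(v is MISSING for v in row) <= max_missing
    ]
    return [name for name, _ in kept], [row for _, row in kept]


# Rows are markers, columns are arrays (made-up numbers).
markers = ["m1", "m2", "m3"]
values = [[120.0, 80.0, 95.0], [15.0, 300.0, 310.0], [50.0, 60.0, 70.0]]
calls = [["P", "P", "P"], ["A", "P", "P"], ["P", "M", "P"]]

masked = detection_call_filter(values, calls, {"A"})
kept_markers, kept_values = missing_values_filter(markers, masked, max_missing=0)
```

Here only <code>m2</code> has a measurement called Absent, so it is the only marker removed by the second filter.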

===Normalize===
Normalization can be used to decrease the effects of systematic differences across a set of experiments.
In geWorkbench, normalization replaces the original values with new values. Available geWorkbench normalization methods are as follows:

{|style="border: 1px solid lightGray"
!Normalizer||Description||
|-
|-

|Missing value calculation ||Replaces every missing value with either the mean value of that marker across all microarrays or with the mean measurement of all markers in the microarray where the missing value is observed
|-
|-
|Log2 Transformation ||Applies a log2 transformation to all measurements in a microarray
|-
|-
|-
|Threshold Normalizer ||All data points whose value is less than (or greater than) a user-specified minimum (maximum) value are raised (reduced) to that minimum (maximum) value
|-
|-
|-
|-
|Marker-based Centering ||Subtracts the mean (median) measurement of a marker profile from every measurement in the profile
|-
|-
|-
|Array-based centering ||Subtracts the mean (median) measurement of a microarray from every measurement in that microarray
|-
|-
|-
|Mean-variance normalizer ||For every marker profile, the mean measurement of the entire profile is subtracted from each measurement in the profile and the resulting value is divided by the standard deviation
|-
|-
|}
[[Image:Normalpanel.gif]]
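
Several of the normalizers in the table can be written compactly. The sketch below is illustrative Python only: the function names are hypothetical, each marker profile is assumed to be a list of values across arrays with no missing entries, and the choice of the population standard deviation for the mean-variance normalizer is an assumption, since the text does not say which estimator is used.

```python
import math
from statistics import mean, median, pstdev


def log2_transform(profile):
    """Log2 Transformation: apply log2 to every measurement."""
    return [math.log2(v) for v in profile]


def threshold(profile, minimum=None, maximum=None):
    """Threshold Normalizer: raise/reduce values to the given bounds."""
    out = []
    for v in profile:
        if minimum is not None and v < minimum:
            v = minimum
        if maximum is not None and v > maximum:
            v = maximum
        out.append(v)
    return out


def center(profile, use_median=False):
    """Marker-based Centering: subtract the mean (or median) of the profile."""
    c = median(profile) if use_median else mean(profile)
    return [v - c for v in profile]


def center_arrays(matrix, use_median=False):
    """Array-based centering: center each column (array) of a marker x array matrix."""
    columns = [center(list(col), use_median) for col in zip(*matrix)]
    return [list(row) for row in zip(*columns)]


def mean_variance(profile):
    """Mean-variance normalizer: subtract the mean, divide by the standard deviation."""
    mu, sd = mean(profile), pstdev(profile)
    return [(v - mu) / sd for v in profile] if sd > 0 else [0.0] * len(profile)


logged = log2_transform([2.0, 4.0, 8.0, 16.0, 32.0])
centered = center(logged)
z_scores = mean_variance(logged)
clipped = threshold([5.0, 50.0, 500.0], minimum=10.0, maximum=100.0)
by_array = center_arrays([[1.0, 10.0], [3.0, 30.0]])
```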

'''Apply Quantile Normalizer'''

1. In the Normalization Panel, select ''Quantile Normalizer''.

2. Leave the default averaging method of ''Mean Profile Marker'' to indicate handling of missing values.

3. Click '''Normalize'''. The View Area is updated to reflect normalization (after the screen has been refreshed).
Note: The first value in the second row was updated from ''41,394.6'' to ''55,779.26''.

{|style="border: 1px solid lightGray"
!PRENORMALIZATION||NORMALIZED||
|-
|-|-
|-
| [[Image:Prenormalizer_ed.gif ]] || [[Image: Postnormalizera.gif ]]
|-
|-
|-
|}
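
For readers unfamiliar with quantile normalization, the general technique can be sketched as below. This is the textbook procedure (sort each array, average across arrays at each rank, write the averages back by rank), not the geWorkbench implementation: the function name is hypothetical, missing values are assumed to have been handled already, and ties are broken by position rather than averaged.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a marker x array matrix (rows: markers, columns: arrays)."""
    n_rows = len(matrix)
    columns = [list(col) for col in zip(*matrix)]
    sorted_cols = [sorted(col) for col in columns]
    # Mean of the r-th smallest value across all arrays.
    rank_means = [
        sum(sc[r] for sc in sorted_cols) / len(sorted_cols)
        for r in range(n_rows)
    ]
    out_cols = []
    for col in columns:
        # Indices of this array's values in ascending order.
        order = sorted(range(n_rows), key=lambda i: col[i])
        new_col = [0.0] * n_rows
        for rank, i in enumerate(order):
            new_col[i] = rank_means[rank]
        out_cols.append(new_col)
    return [list(row) for row in zip(*out_cols)]


matrix = [[2.0, 4.0], [6.0, 8.0], [4.0, 12.0]]
normalized = quantile_normalize(matrix)
```

After normalization, every array contains the same set of values, so the arrays share an identical distribution while each marker keeps its rank within its array.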

== Clustering Gene Expression Data== ken
*Hierarchical Clustering
* Self Organizing Map (SOM)

== Differential Expression ==
In this tutorial, you will:

* Get acquainted with the T Test and Multi T Test
* Apply a T Test and Multi T Test

Before you can continue, geWorkbench should be running. For help with installation, please refer to the Getting Started (INSERT LINK) section. Load the tutorial file cardiomyopathy.exp. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===T Test===

T Test analysis identifies markers with statistically significant differential expression between sets of microarrays. The t-test determines for each marker whether there is a significant difference between the two groups (case and control). To perform this analysis, you must classify the panels, set the analysis parameters and view the results in the visualization components. A detailed description of the T Test parameters is available in the online help.

'''Classify the Panels'''

1. Mark the '''Cardio''' phenotype as 'Case'. By default, panels are marked as control. Panels classified as Case are shown with a red thumbtack icon.
* Right-click on the '''Cardio''' phenotype.
* Select '''Classification'''>'''Case'''.
2. Activate the '''Normal''' and '''Cardio''' panels by selecting the checkboxes next to the panel names.

[[Image:T_PhenotypeSettingCase.png]]

'''Set Analysis Parameters'''
# From the Analysis Panel, select '''T-Test Analysis'''.
# Set the parameter values listed below and click on '''Analyze'''.

* Alpha-corrections tab: Just Alpha.
* P-Value Parameters tab: p-values based on t-distribution. Note that the default alpha (critical p-value) is set to 0.01.
* Degree of Freedom tab: Welch approximation - unequal group variances.

[[Image:Ttest.gif]] 

'''T-Test Results'''

{|style="border: 1px solid lightGray"
!|| ||
|-
|-
| Markers which met the significance test are included in a new gene panel called “Significant Genes”. || [[Image:E_ttestgpanel.png]]
|-
|-
| An ancillary dataset is created in the project window. || [[Image:Ed_ttestproj.png]]
|-
|}

The values of the T-Test can be seen in the Color Mosaic panel and the Volcano Plot.

{|style="border: 1px solid lightGray"
!VOLCANO PLOT||COLOR MOSAIC||
|-
|-|-
|-
|-
| [[Image:Vplot.png]] || [[Image:Ed_cm.png]]
|-
|-
|-
| Clicking on any of the spots highlights the marker selected in the Marker Panel. * Insert another description ||
* The label to the right displays the significance value (the lower the value, the more likely the expression differs between the groups) and gene name for the displayed genes. The genes are displayed in ascending order by significance value.

* Gene height and width values can be altered to modify the display.

* The intensity slider is used to modify the intensity of the color codings.

* Accession: Includes the accession number in the label.

* Printer Icon: Prints the displayed image.

* Display: Must be toggled on to display data.

* ''Pat, Abs, Ratio Overlapping Pages Icon: Not the T Test display.''

|-
|-
|}

=== Multi T Test=== '''(IN PROGRESS)'''

== Regulatory Network ==
In this tutorial, you will:

* Create a gene network in Reverse Engineering
* View the network in Cytoscape
* Create a Marker panel from the Cytoscape network

Before you can continue, geWorkbench should be running. The file [[Webmatrix.EXP]] is used in this tutorial. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Reverse Engineering===

The Reverse Engineering component is used to analyze a large amount of microarray data to reverse engineer the underlying gene regulatory network. The details of this algorithm are described in "ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context", Califano et al. (http://arxiv.org/abs/q-bio.MN/0410037).

'''Create a Network'''
{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|1. In the Gene Panel, enter '''1973_s_at''' in the '''Find Next''' text box. The list box will navigate to '''1973_s_at'''; click on that gene.||[[Image: E_panel.PNG]]

|-
|2. Go to the Reverse Engineering component in the View pane. You should be in the first tab, “Profiler”, and in “Basic”. The marker '''1973_s_at''' should be listed in the Hub Gene box.||[[Image:E_RE1.PNG]]
|-
|3. Click on '''Analyze 2D'''. A list of markers, sorted by interaction strength, is returned below the '''Search Box'''. ||
|-
|-
|-
|4. Select all the genes with score >= 5 by clicking on the first marker '''[12.53] 37724_at''', using Shift-down arrow to highlight the values >= 5.||[[Image: E_re2.png]]
|-
|-
|-
|5. Click '''Create Network'''. The network created is displayed in the Cytoscape view.||
|-
|-
|}

'''View the Network in Cytoscape'''

A full description of the capabilities and functionality of Cytoscape can be found at http://www.cytoscape.org/.

# In the Cytoscape Layout menu, select '''yFiles/ Organic''' to modify the network display.
# In this network image, select the central marker. It should turn yellow. [[Image:E_network.png]]
# In the Select menu, select Nodes>First neighbors of selected nodes>Shape>Triangle. The first neighbors are updated from a square shape to a triangle.
# Ctrl+mouse select the network. The selected nodes are yellow. The selected genes are returned to the Gene Panel with the name Selected Genes[Cytoscape].

== Enrichment Analysis==

In this tutorial, you will:

* Gene Ontology

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

'''Gene Ontology'''

The Gene Ontology component categorizes genes in terms of the functions their products perform, their cellular localization, or their involvement in high-level biological processes. The category definitions are provided by the Gene Ontology Consortium (http://www.geneontology.org), and we use them to generate a tree structure that approximates the Directed Acyclic Graph structure used by the Consortium.

# Create a Marker panel with the following markers:
#* AFFX-hum_alu_at
#* AFFX-LysX-M_at
#* AFFX-LysX-5_at
#* AFFX-LysX-3_at
#: For assistance in creating a marker panel, please refer to the Working with Marker and Phenotype Panels tutorial (INSERT LINK).
# Click '''Map List.''' The application creates a tree for the selected ontology.
# The resulting nodes in the mapped tree display two numbers separated by a forward slash, such as '''cell(1/6058)'''. The first number is the number of genes from the active marker panel that map to a specific category or any of its child nodes. The second number is the corresponding count for the reference list (or for the entire chip, depending on the reference list checkbox selection and on whether a reference list has been loaded from the file system).

[[Image:E_go.png]]

== Sequence Analysis== ken
* Sequence Retrieval
* Sequence Homology Analysis
** Blast
** Other

==Pattern Discovery== ken
* Position Histogram

== Promoter Analysis==

==Integrated Annotation Information==
In this tutorial, you will:

* rrrrr
* rrrrr

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

Perform the following steps to view annotations of a dataset (''not recommended for large sets of genes''):

1. In the Marker Annotations component in the View pane, click '''Use Panels'''.

A table of gene names and links to pathway diagrams is returned if available from the NCI CGAP database.

2. Clicking on gene names returns annotations from the NCI CGAP database in your default web browser window.

3. Clicking on the pathway diagram name will display a pathway diagram in the caBIO Pathways panel in the View pane.
In the caBIO Pathways viewer, pathway gene symbols can in turn be clicked to display their gene information in the default web browser window.

User:Daly

2006-02-24T17:10:08Z

Daly: /* Enrichment Analysis */

= Overview=

geWorkbench is an open-source bioinformatics platform that offers a comprehensive and extendible collection of tools for the management, analysis, visualization and annotation of biomedical data. This tool is aimed at providing researchers a centralized repository for the data analysis and visualization of gene expression data, gene sequences and protein sequences.

'''What can you do with geWorkbench?'''

* Use one program that integrates with multiple existing bioinformatics modules for analysis and visualization.

* Access remote servers and clusters for the performance of computationally intensive calculations - quicker!

* Streamline data analysis with:
** flexible import options that support merging files from various sources ''(insert possible data source ie RMA express).''
** support for a variety of genomic data including microarrays, sequences, pathways, networks, alignments and phenotypes.
** access analyses with biological annotations from the National Cancer Institute.

* Community: Insert developer benefit ( plugin)

The diagram illustrates the use of geWorkbench by Researchers.

[[Image:slide1.gif]]

For detailed system documentation, please see the Documentation section (INSERT LINK).

= Tutorial=

===Welcome! ===

Welcome to the geWorkbench tutorial. In the following tutorials you will learn how to use geWorkbench. This is an example based, illustrated guide for both beginners and experienced users. This tutorial was last updated in '''DATE''' and reflects changes to geWorkbench through '''DATE'''.

===Using the Tutorial ===

These tutorials assume that you have successfully completed the installation instructions. If you haven't installed the entire program, please go to [[Download]] and follow the installation instructions.

The tutorials are organized into a number of separate topics. Sample data sets are used throughout the tutorials. Each tutorial is self-contained and does not depend on any other tutorial for sample data. Sample data is located '''''(INSERT LINK or Instructions on how to download)'''''

'''(we should have sample data labeled Tutorial #)'''

Within the tutorials there are two basic types of text:
*Text that looks like this explains topics. 

# Text in numbered steps is instructions for you to follow using tutorial data files.

''' insert instruction on how to navigate between tutorials'''.

==Getting Started== MV
* Starting the application
* GUI elements
** Panels
** Navigation
== Loading Data== MV
* Data formats

== Working with Marker and Phenotype Panels==
In this tutorial, you will:
* Become familiar with the use of panels in geWorkbench
* Create active phenotype panels
* Classify the panels you created

Before you can continue, geWorkbench should be running. For help with this, please refer to the '''''Getting Started''''' (insertlink) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Creating Panels===

When working with microarrays, geWorkbench uses the term '''''marker''''' to refer to a gene probe (in other cases, it can be individual items from other data sets, such as sequences).
'''''Phenotype''''' refers to any user-defined grouping of microarrays. These microarrays will often share some common property that in most cases is phenotypic, although this is not a requirement. For example, one such “phenotype” might be a single experiment on a tumor tissue sample, with a second “phenotype” defined as a collection of experiments performed on normal tissue samples.

====Assign Panel====

1. In the Phenotype panel, select the following arrays, which contain samples from the congestive cardiomyopathy disease state.
JB-ccmp_0120.txt, JB-ccmp_0218.txt, JB-ccmp0718.txt, JB-ccmp0811.txt, JB-ccmp1003.txt, JB-ccmp_1109.tx

2. Right-click, select '''Add to Panel'''.

3. Enter "Cardio" in the input box and click OK.

[[Image:T_PanelLabelCardio.png]]

4. Next, similarly label the following arrays as "Normal" (''repeat steps 2 & 3'').
JB-n_0106.txt , JB-n_0821.txt, JB-n_0915.txt, JB-n_1303.txt

5. Select the checkboxes next to the panel names to indicate that these groups of arrays are "Active". Various analysis and visualization components can be set to only use/display activated arrays or markers.

[[Image:T_PhenotypesPriorToCase.png]]

====Classify panel====

For statistical tests such as the t-test, Case and Control groups can be specified.

# Left-click on the thumb-tack icon in front of the phenotype name.
# Select Case to specify the disease arrays as the "Case". The remaining "Normal" arrays are by default labeled control.
[[Image:T_PhenotypeSettingCase.png]]

A red thumbtack indicates the arrays have been specified as "Case".

[[Image:T_PhenotypeCaseSet.png]]

== Visualize Gene Expression==
In this tutorial, you will:

* Get acquainted with the various geWorkbench visualization tools
* View a dataset in geWorkbench
* Modify the visualization preference settings

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file '''''cardiomyopathy.exp''''' used in this tutorial is from the Load Data (insertlink) tutorial.

===Visualization Tools===

Visualization tools provide a view of the chip(s) under investigation and can be used for ascertaining the quality of the data. Active gene and phenotype panels (insertlink) restrict the number of markers/arrays displayed. The images created can be saved and exported. A detailed description of how to manipulate visualization components is available in the online help.

{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|'''Microarray View''': Used to inspect each separate microarray using the Array scroll bar.||[[Image: Ema.png]]
|-
|-
|'''Tabular Microarray Panel''': Presents the numerical values of the expression measurements in a table format. One row is created per individual marker/probe and one column per microarray.||[[Image:Etab.png]]
|-
|-
|-
|'''Color Mosaic''': Heat maps for microarray expression data, organized by phenotypic or gene groupings.||[[Image:Ed_cm.png]]
|-
|-
|-
|'''Expression Profiles''': A line graph of gene expression profiles across several arrays/hybridizations. Each marker is drawn as a separately colored line.||[[Image:Eep.png]]
|-
|-
|-
|'''Scatter Plot''': A pairwise (''array vs. array and marker vs. marker'') comparison and plotting of expression values.||[[Image:Esp.png]]
|-
|-
|-
|-
|}

'''View a dataset'''
# Select '''''cardiomyopathy.exp''''' in the Project Panel.
# Select the '''''Microarray Panel''''' visualization component in the View Area at the top-right section of the interface.
# Deselect the All Markers checkbox to display the entire dataset.

Note: The '''''All Arrays''''' and '''''All Markers''''' checkboxes determine which data points are included in the display.
If neither is checked, then the entire data set is shown. The All Arrays control is useful when working with data sets comprising multiple arrays. In this case, only those arrays that are included in a currently activated phenotype panel will be displayed.

[[Image: Allm.png]]

=== Preferences===
The Preferences selection in the Tools menu allows users to specify how certain aspects of the system will behave. Once set, your preferences are applied at once and persist between application sessions.

'''Modifying Settings'''

1. From the main menu, click on '''Tools>Preferences'''.

2. In the Preferences pop-up window, you can define settings for:
*Text Editor: The editor selected will be used to open and inspect data sets loaded in a project. Notepad is the default setting.
* Visualization: The color scheme to be applied to color mosaic images.
**Absolute: (default) Let M = max{|min|, |max|} over all expression measurements, across all arrays. If expression value x > 0, assign it the red spectrum x / M * 256. If expression value x is negative, assign it to the green spectrum -x / M * 256.
** Relative: This is similar to the setting for Absolute, but each marker is mean-variance normalized first.
* Genepix Value Computation: You can specify how to compute the value displayed for Genepix arrays. The default setting is the option (Mean F635 - Mean B635) / (Mean F532 - Mean B532).

Select '''Relative''' for the visualization preference.

3. Click on '''OK'''.

[[Image: Preferences.gif]]
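
The default Genepix value computation listed above is a ratio of background-subtracted channel means. A minimal Python sketch of that formula follows; the function and parameter names are hypothetical, and returning <code>None</code> when the denominator is zero is an assumption, since the text does not say how that case is treated.

```python
def genepix_ratio(mean_f635, mean_b635, mean_f532, mean_b532):
    """(Mean F635 - Mean B635) / (Mean F532 - Mean B532) for one spot."""
    denominator = mean_f532 - mean_b532
    if denominator == 0:
        return None  # assumption: treat an undefined ratio as missing
    return (mean_f635 - mean_b635) / denominator


ratio = genepix_ratio(1200.0, 200.0, 700.0, 200.0)
undefined = genepix_ratio(1200.0, 200.0, 200.0, 200.0)
```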

== Filter and Normalize Data==
In this tutorial, you will:
* Get acquainted with the various filters and normalizers available in geWorkbench
* Apply a filter and normalizer on a tutorial dataset

Before you can continue, geWorkbench should be running. For help with installation, please refer to the '''''Getting Started''''' (INSERT LINK) section. Load the tutorial file '''cardiomyopathy.exp'''. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Filter===

Filtering can be used to screen out missing data points, remove low quality data or reduce the size of the dataset by removing less interesting data. Available geWorkbench filters are as follows:

{|style="border: 1px solid lightGray"
!Filter||Description||
|-
|-|-
|-

|'''Affy Detection Call'''||Applicable to Affymetrix data only. Sets all measurements whose detection status is any user-defined combination of A, P or M as missing.
|-

|'''Missing values''' ||Discards all markers that have “missing” measurements in at least n microarrays, where n is defined by the user. Another filter must first be applied however, in order to generate the missing values upon which this filter can operate.
|-
|-
|-

|'''Deviation''' ||Sets as missing all markers whose measurements deviate by less than a given value across all microarrays.
|-
|-
|-

|'''Expression Threshold''' ||Sets all markers whose measurements are inside (or outside) a user-defined range as missing.
|-
|-
|-

|'''2 Channel''' ||Applicable to 2-channel arrays (Genepix) data only. Defines applicable ranges for each channel, and sets all values for which either channel intensity is inside (or outside) the defined range as missing.
|-
|-
|-
|}

[[Image:Filterpanel.gif]]

Perform the following steps to filter out data called absent in an Affymetrix file:
# In the Filtering Panel, select '''Affy Detection Call Filter'''.
# Select the ‘A’ (Absent) checkbox and click '''Filter'''. Values that were removed (marked as missing) are highlighted in yellow.
# In the Filtering Panel, select '''Missing Values Filter'''.
# Choose the maximum number of arrays that can have missing values before a marker is removed; the default is 0.
# Click '''Filter'''. Markers with more than 0 missing values are removed. You’ll notice the yellow values are gone.

{|style="border: 1px solid lightGray"
!'''Affy Detection Call Filter'''||'''Missing Values Filter'''||
|-
|-|-
|-
| [[Image: Filtered.gif]]||[[Image:Mvfilter.gif]]
|-
|}
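
The Deviation and Expression Threshold filters from the table earlier in this section can be sketched in the same spirit. This Python fragment is illustrative only: the function names and the <code>None</code> missing marker are hypothetical, the Deviation filter is assumed to compare the population standard deviation of a marker profile against the bound, and the Expression Threshold filter is assumed to act on individual measurements.

```python
from statistics import pstdev

MISSING = None  # hypothetical marker for a missing measurement


def deviation_filter(values, min_deviation):
    """Mark a whole marker profile as missing when its deviation is below the bound."""
    return [
        [MISSING] * len(row) if pstdev(row) < min_deviation else list(row)
        for row in values
    ]


def expression_threshold_filter(values, low, high, remove_inside=False):
    """Mark measurements inside (or, by default, outside) [low, high] as missing."""
    def drop(v):
        inside = low <= v <= high
        return inside if remove_inside else not inside

    return [[MISSING if drop(v) else v for v in row] for row in values]


# Rows are markers, columns are arrays (made-up numbers).
values = [[100.0, 101.0, 99.0], [50.0, 500.0, 950.0]]
flat_removed = deviation_filter(values, min_deviation=10.0)
range_limited = expression_threshold_filter(values, low=60.0, high=900.0)
```

Either result can then be passed to a Missing Values filter to discard the affected markers.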

===Normalize===
Normalization can be used to decrease the effects of systematic differences across a set of experiments.
In geWorkbench, normalization replaces the original values with new values. Available geWorkbench normalization methods are as follows:

{|style="border: 1px solid lightGray"
!Normalizer||Description||
|-
|-

|Missing value calculation ||Replaces every missing value with either the mean value of that marker across all microarrays or with the mean measurement of all markers in the microarray where the missing value is observed
|-
|-
|Log2 Transformation ||Applies a log2 transformation to all measurements in a microarray
|-
|-
|-
|Threshold Normalizer ||All data points whose value is less than (or greater than) a user-specified minimum (maximum) value are raised (reduced) to that minimum (maximum) value
|-
|-
|-
|-
|Marker-based Centering ||Subtracts the mean (median) measurement of a marker profile from every measurement in the profile
|-
|-
|-
|Array-based centering ||Subtracts the mean (median) measurement of a microarray from every measurement in that microarray
|-
|-
|-
|Mean-variance normalizer ||For every marker profile, the mean measurement of the entire profile is subtracted from each measurement in the profile and the resulting value is divided by the standard deviation
|-
|-
|}
[[Image:Normalpanel.gif]]

'''Apply Quantile Normalizer'''

1. In the Normalization Panel, select ''Quantile Normalizer''.

2. Leave the default averaging method of ''Mean Profile Marker'' to indicate handling of missing values.

3. Click '''Normalize'''. The View Area is updated to reflect normalization (after the screen has been refreshed).
Note: The first value in the second row was updated from ''41,394.6'' to ''55,779.26''.

{|style="border: 1px solid lightGray"
!PRENORMALIZATION||NORMALIZED||
|-
|-|-
|-
| [[Image:Prenormalizer_ed.gif ]] || [[Image: Postnormalizera.gif ]]
|-
|-
|-
|}

== Clustering Gene Expression Data== ken
*Hierarchical Clustering
* Self Organizing Map (SOM)

== Differential Expression ==
In this tutorial, you will:

* Get acquainted with the T Test and Multi T Test
* Apply a T Test and Multi T Test

Before you can continue, geWorkbench should be running. For help with installation, please refer to the Getting Started (INSERT LINK) section. Load the tutorial file cardiomyopathy.exp. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===T Test===

T Test analysis identifies markers with statistically significant differential expression between sets of microarrays. The t-test determines for each marker whether there is a significant difference between the two groups (case and control). To perform this analysis, you must classify the panels, set the analysis parameters and view the results in the visualization components. A detailed description of the T Test parameters is available in the online help.

'''Classify the Panels'''

1. Mark the '''Cardio''' phenotype as 'Case'. By default, panels are marked as control. Panels classified as Case are shown with a red thumbtack icon.
* Right-click on the '''Cardio''' phenotype.
* Select '''Classification'''>'''Case'''.
2. Activate the '''Normal''' and '''Cardio''' panels by selecting the checkboxes next to the panel names.

[[Image:T_PhenotypeSettingCase.png]]

'''Set Analysis Parameters'''
# From the Analysis Panel, select '''T-Test Analysis'''.
# Set the parameter values listed below and click on '''Analyze'''.

* Alpha-corrections tab: Just Alpha.
* P-Value Parameters tab: p-values based on t-distribution. Note that the default alpha (critical p-value) is set to 0.01.
* Degree of Freedom tab: Welch approximation - unequal group variances.

[[Image:Ttest.gif]] 
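
The Welch option selected above corresponds to the standard unequal-variance t-test. As a rough sketch of what is computed for each marker (this is textbook Welch, not geWorkbench's code; the function name and sample values are made up), the statistic and its approximate degrees of freedom are:

```python
import math
from statistics import mean, variance


def welch_t(case, control):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom) for one marker."""
    n1, n2 = len(case), len(control)
    # Squared standard errors of the two group means (sample variances).
    v1, v2 = variance(case) / n1, variance(control) / n2
    t = (mean(case) - mean(control)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


case = [10.0, 12.0, 14.0]          # hypothetical "Cardio" measurements
control = [4.0, 5.0, 6.0, 5.0]     # hypothetical "Normal" measurements
t_stat, dof = welch_t(case, control)
```

The p-value is then read from a t-distribution with <code>dof</code> degrees of freedom and compared with the critical value alpha (0.01 by default); a marker whose p-value falls below alpha is reported as significant.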

'''T-Test Results'''

{|style="border: 1px solid lightGray"
!|| ||
|-
|-
| Markers which met the significance test are included in a new gene panel called “Significant Genes”. || [[Image:E_ttestgpanel.png]]
|-
|-
| An ancillary dataset is created in the project window. || [[Image:Ed_ttestproj.png]]
|-
|}

The values of the T-Test can be seen in the Color Mosaic panel and the Volcano Plot.

{|style="border: 1px solid lightGray"
!VOLCANO PLOT||COLOR MOSAIC||
|-
|-|-
|-
|-
| [[Image:Vplot.png]] || [[Image:Ed_cm.png]]
|-
|-
|-
| Clicking on any of the spots highlights the marker selected in the Marker Panel. * Insert another description ||
* The label to the right displays the significance value (the lower the value, the more likely the expression differs between the groups) and gene name for the displayed genes. The genes are displayed in ascending order by significance value.

* Gene height and width values can be altered to modify the display.

* The intensity slider is used to modify the intensity of the color codings.

* Accession: Includes the accession number in the label.

* Printer Icon: Prints the displayed image.

* Display: Must be toggled on to display data.

* ''Pat, Abs, Ratio Overlapping Pages Icon: Not the T Test display.''

|-
|-
|}

=== Multi T Test=== '''(IN PROGRESS)'''

== Regulatory Network ==
In this tutorial, you will:

* Create a gene network in Reverse Engineering
* View the network in Cytoscape
* Create a Marker panel from the Cytoscape network

Before you can continue, geWorkbench should be running. The file [[Webmatrix.EXP]] is used in this tutorial. Please refer to the Load Data (insertlink) tutorial if you need assistance loading this file.

===Reverse Engineering===

The Reverse Engineering component is used to analyze a large amount of microarray data to reverse engineer the underlying gene regulatory network. The details of this algorithm are described in "ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context", Califano et al. (http://arxiv.org/abs/q-bio.MN/0410037).

'''Create a Network'''
{|style="border: 1px solid lightGray"
||||
|-
|-
|-

|1. In the Gene Panel, enter '''1973_s_at''' in the '''Find Next''' text box. The list box will navigate to '''1973_s_at'''; click on that gene.||[[Image: E_panel.PNG]]

|-
|2. Go to the Reverse Engineering component in the View pane. You should be in the first tab, “Profiler”, and in “Basic”. The marker '''1973_s_at''' should be listed in the Hub Gene box.||[[Image:E_RE1.PNG]]
|-
|3. Click on '''Analyze 2D'''. A list of markers, sorted by interaction strength, is returned below the '''Search Box'''. ||
|-
|-
|-
|4. Select all the genes with score >= 5 by clicking on the first marker '''[12.53] 37724_at''', using Shift-down arrow to highlight the values >= 5.||[[Image: E_re2.png]]
|-
|-
|-
|5. Click '''Create Network'''. The network created is displayed in the Cytoscape view.||
|-
|-
|}

'''View the Network in Cytoscape'''

A full description of the capabilities and functionality of Cytoscape can be found at http://www.cytoscape.org/.

# In the Cytoscape Layout menu, select '''yFiles/ Organic''' to modify the network display.
# In this network image, select the central marker. It should turn yellow. [[Image:E_network.png]]
# In the Select menu, select Nodes>First neighbors of selected nodes>Shape>Triangle. The first neighbors are updated from a square shape to a triangle.
# Ctrl+mouse select the network. The selected nodes are yellow. The selected genes are returned to the Gene Panel with the name Selected Genes[Cytoscape].

== Enrichment Analysis==

In this tutorial, you will:

* Gene Ontology

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

'''Gene Ontology'''

The Gene Ontology component categorizes genes in terms of the function their products perform, their cellular localization, or their involvement in high-level biological processes. The actual category definitions are provided by the Gene Ontology Consortium (http://www.geneontology.org), which geWorkbench uses to generate a tree structure that approximates the Directed Acyclic Graph structure used by the Consortium.

# Create a Marker panel with the following markers:
# Click '''Map List'''. The application creates a tree for the selected ontology.
# The resulting nodes in the mapped tree display two numbers separated by a forward slash, such as '''cell(1/6058)'''. The first number is the number of genes from the active marker panel that are mapped to a specific category and all its child nodes. The second number, after the slash, is the same count for the reference list (or for the entire chip, depending on the reference list check-box selection and on whether a reference list has been loaded from the file system).

[[Image:E_go.png]]
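
The two numbers shown on each tree node can be thought of as counts over a category and everything beneath it. The sketch below illustrates that counting on a toy tree. The category tree, the gene identifiers, and the function names are all hypothetical, and counting each gene once per category (even when it is mapped to several child nodes) is an assumption rather than a documented geWorkbench behavior.

```python
# Illustrative sketch only; not geWorkbench's implementation.

# Hypothetical category tree: parent -> list of child categories.
children = {
    "cell": ["intracellular", "membrane"],
    "intracellular": ["nucleus"],
}

# Hypothetical direct annotations: category -> genes mapped to it.
annotations = {
    "cell": {"g1"},
    "nucleus": {"g2", "g3"},
    "membrane": {"g3", "g4"},
}


def genes_under(term):
    """Genes mapped to `term` or to any category below it in the tree."""
    found = set(annotations.get(term, ()))
    for child in children.get(term, ()):
        found |= genes_under(child)
    return found


def node_label(term, panel, reference):
    """Format a tree node as 'term(panel count/reference count)'."""
    mapped = genes_under(term)
    return f"{term}({len(mapped & panel)}/{len(mapped & reference)})"


panel = {"g2", "g9"}                        # active marker panel
reference = {"g1", "g2", "g3", "g4", "g5"}  # reference list or whole chip

print(node_label("cell", panel, reference))      # cell(1/4)
print(node_label("membrane", panel, reference))  # membrane(0/2)
```

In this toy example one panel gene falls under "cell" through its descendant "nucleus", while four reference genes fall under "cell" in total.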

==Sequence Analysis==
* Sequence Retrieval
* Sequence Homology Analysis
** Blast
** Other

==Pattern Discovery==
* Position Histogram

== Promoter Analysis==

==Integrated Annotation Information==
In this tutorial, you will:

* View gene annotations from the NCI CGAP database.
* View pathway diagrams in the caBIO Pathways panel.

Before you can continue, geWorkbench should be running. For help with this, please refer to the Getting Started (insertlink) section. The file (INSERT FILE NAME) is used in this tutorial.

Perform the following steps to view annotations of a dataset ('''''not recommended for large sets of genes'''''):

1. In the Marker Annotations component in the View pane, click '''Use Panels'''.

A table of gene names and links to pathway diagrams is returned, if available from the NCI CGAP database.

2. Clicking on a gene name returns annotations from the NCI CGAP database in your default web browser window.

3. Clicking on a pathway diagram name displays the pathway diagram in the caBIO Pathways panel in the View pane. In the caBIO Pathways viewer, the gene symbols in the pathway can in turn be clicked to display their gene information in the default web browser window.