Array Sets
Revision as of 18:01, 16 February 2011
Contents
- 1 Overview of Marker and Array Sets
- 2 Common Principles of Operation of Marker and Array Sets
- 3 Controls
- 4 Examples
Overview of Marker and Array Sets
The Markers/Arrays component, located at lower left in the geWorkbench graphical interface, allows the user to define and use sets of arrays and markers for a number of purposes.
As used in geWorkbench, the term "marker" includes genes, probes/probesets, and individual sequences, depending on the type of data loaded. Sets of markers can be returned by various analysis routines. For example, the t-test returns a list of markers showing significant differential expression, and after hierarchical clustering, the markers in a subtree of the resulting dendrogram can be saved to a list.
Sets of microarrays can be used to group arrays in a meaningful fashion for statistical analysis. For example, two such groups (phenotypes) might be the diseased and normal states of a tissue from which samples have been taken. geWorkbench uses the terms "Case" and "Control" to categorize these; in a biological setting the equivalent would be "Experimental" vs. "Control".
This chapter discusses the use of sets of microarrays. Please see the chapter Data Subsets - Markers for a discussion of the use of Marker sets.
The figure below shows the Arrays/Phenotypes component located below the Project Folders component in geWorkbench. The Markers component is located in the same space, under a separate tab.
Common Principles of Operation of Marker and Array Sets
Rather than using all arrays or all markers in a data set for a particular analysis or visualization, the user may wish to restrict the input to a subset.
Activating Sets of Markers and Arrays
In the Markers and the Arrays components, sets of markers and arrays can be defined by the user. Such sets are also created as the outcome of various analysis methods. Adjacent to each set in the graphical interface is a checkbox. Checking this box activates the subset.
- Activating a set restricts many geWorkbench components to using as input only the markers or arrays that are in one or more such activated sets.
- Marker Sets
  - If no Marker set is active, all Markers are used.
  - If at least one Marker set is activated, affected components will only use markers that are in activated sets.
- Array Sets
  - If no Array set is active, all Arrays are used.
  - If at least one Array set is activated, affected components will only use arrays that are in activated sets.
- Selection set - this is a special, default set. One is present in both the Markers component and the Arrays component. The "Selection" set has the following properties:
  - The "Selection" set cannot be deleted.
  - Double-clicking on a marker or array entry in the upper pane list will add that entry to the default "Selection" set. Double-clicking the same entry again will remove it from the default set.
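The activation rule described in this list can be summarized in a few lines of code. The following is a minimal Python sketch, not geWorkbench code; the function name `effective_items` and the example array names are hypothetical and chosen only for illustration.

```python
def effective_items(all_items, sets, active):
    """Return the items an affected component would use as input.

    all_items: every marker (or array) name in the data set, in order.
    sets:      mapping of set name -> collection of member names.
    active:    names of the sets whose checkbox is checked.
    """
    if not active:
        # No set is active: all markers (or arrays) are used.
        return list(all_items)
    # At least one set is active: use only items that are in
    # one or more of the activated sets.
    allowed = set()
    for name in active:
        allowed.update(sets[name])
    return [item for item in all_items if item in allowed]


arrays = ["a1", "a2", "a3", "a4"]  # hypothetical array names
array_sets = {"CCMP": {"a1", "a2"}, "Normal": {"a3"}}

print(effective_items(arrays, array_sets, active=[]))
print(effective_items(arrays, array_sets, active=["CCMP"]))
print(effective_items(arrays, array_sets, active=["CCMP", "Normal"]))
```

With no active set the full list is returned; activating more sets widens the input to the union of their members.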
 
Number of members displayed
The number of members in a set is given inside square brackets just to the right of the set name.
Controls
Upper Pane
The list in the upper pane of the Arrays component shows the arrays loaded in the current data set.
The upper pane of the Arrays/Phenotypes component has the following controls:
- Search text field - Search for arrays by typing in a name or portion of a name. As one types, the first array matching the entry so far will be highlighted. In some cases, however, the Find Next button must be pushed to find a match. If the typed entry matches no arrays, it will be displayed in red.
- Find Next button - find the next array matching the typed entry.
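The search behavior can be pictured as a simple substring scan over the array list. This is only a sketch under stated assumptions: the function `find_next` and the array names are hypothetical, and case-insensitive matching is an assumption, since the text above does not say how geWorkbench compares names.

```python
def find_next(names, query, start=0):
    """Return the index of the first name at or after `start` that
    contains `query`, or None if no such name exists."""
    needle = query.lower()  # assumption: matching ignores case
    for i in range(start, len(names)):
        if needle in names[i].lower():
            return i
    return None


names = ["JB-ccmp_1", "JB-ccmp_2", "JB-n_1"]  # hypothetical array names

first = find_next(names, "ccmp")               # first match while typing
second = find_next(names, "ccmp", first + 1)   # what "Find Next" would do
missing = find_next(names, "xyz")              # no match (entry shown in red)
print(first, second, missing)
```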
Selecting one or more arrays in the list and then right-clicking gives the following choices in a pop-up menu:
- Add to Set - Add the selected arrays to a new or existing subset.
- Clear Selection - Clear the contents of the default "Selection" array set.
Lower Pane
The lower pane of the Arrays/Phenotypes component has the following controls:
- Array/Phenotype Sets menu - Select which named list of array sets to display. Each list can contain a different arrangement of the arrays into sets. Note - The Affymetrix File Matrix microarray data file format, native to geWorkbench, supports multiple such lists of array sets being defined and saved.
- New button - Create a new list for array sets.
If you right-click on a subset, a menu with the following choices appears:
- Rename - Rename the selected set.
- Copy - Make a copy of the selected set.
- Activate - Activate the selected set (see explanation in previous section). This can also be done directly by checking the check box before the set's name.
- Deactivate - Deactivate the selected set (see explanation in previous section). This can also be done directly by unchecking the check box before the set's name.
- Delete - Delete the selected set.
- Combine - Combine the selected sets into a new set. Methods are:
  - Union - Include all arrays that appear in one or more of the selected sets.
  - Intersection - Include only arrays that are present in each of the selected sets.
  - XOR - Include arrays that are present in one and only one of the selected sets. Note that with more than two sets this differs from the logic gate XOR function, which, chained across several inputs, is true whenever an odd number of inputs are true. An array present in three of the selected sets, for example, is excluded here.
 
- Print - Print the names of the arrays contained in the selected set(s) to a printer.
- Visual Properties - Change the color and shape of points representing arrays in graphical components, e.g. in the Scatter Plot. See details in following section.
- Classification - Designate the experimental class of an array set, chosen from: Case, Control, Test and Ignore.
- Save - Save the chosen set of arrays as a simple list (CSV format, one array per line) to a file on disk. If more than one array set is highlighted, two choices are offered:
  - Merge into one set - The arrays in all highlighted sets are merged into a single list and written out to a file. A file browser window appears that allows the user to specify the location and name of the new file.
  - Save as multiple sets - Each highlighted array set will be saved to a separate file, using the set name as the new file name. A file browser will appear which will allow the user to specify where to save the new files.
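The three Combine methods in the menu above map directly onto set operations. The following Python sketch is illustrative only; the function `combine` and the array names are hypothetical and are not part of geWorkbench. It also shows how the "one and only one" rule differs from chaining the logical XOR operator when three sets are selected.

```python
from collections import Counter


def combine(selected_sets, method):
    """Combine several array sets into a new set of array names."""
    sets = [set(s) for s in selected_sets]
    if method == "union":
        # Arrays that appear in one or more of the selected sets.
        return set().union(*sets)
    if method == "intersection":
        # Arrays that are present in each of the selected sets.
        return set.intersection(*sets) if sets else set()
    if method == "xor":
        # Arrays that are present in one and only one of the selected sets.
        counts = Counter(name for s in sets for name in s)
        return {name for name, n in counts.items() if n == 1}
    raise ValueError(f"unknown method: {method}")


a = {"x", "y"}            # hypothetical array names
b = {"y", "z"}
c = {"y", "z", "w"}

print(sorted(combine([a, b, c], "union")))         # ['w', 'x', 'y', 'z']
print(sorted(combine([a, b, c], "intersection")))  # ['y']
print(sorted(combine([a, b, c], "xor")))           # ['w', 'x']
print(sorted(a ^ b ^ c))                           # ['w', 'x', 'y']
```

The last line uses Python's symmetric difference operator, which behaves like a chained logic gate XOR: "y" is in all three sets (an odd count), so it survives there, whereas the "one and only one" rule drops it.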
 
Examples
In this tutorial we will start with the same data files that were used in the Local Data Files tutorial. Load the ten individual MAS5 data files as shown there in the section "Loading microarray data files - local". Be sure to check the "merge" option.
Using the default Selection set
Adding an array
Double-clicking on an array in the upper list will add it to the default "Selection" set in the lower pane.
Removing an array
Double-clicking on the upper list entry again will remove an array from the default "Selection" set. More generally, for any array in the "Selection" set, double-clicking on its entry in the upper list will remove it from the set.
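The add/remove behavior of double-clicking amounts to a toggle. A minimal sketch, assuming the "Selection" set is modeled as a plain Python list; the function name `toggle_selection` and the array name are hypothetical:

```python
def toggle_selection(selection, name):
    """Mimic a double-click on `name` in the upper list: add the entry
    to the "Selection" set if absent, remove it if already present."""
    if name in selection:
        selection.remove(name)
    else:
        selection.append(name)
    return selection


selection = []
toggle_selection(selection, "JB-ccmp_1")  # first double-click adds it
print(selection)
toggle_selection(selection, "JB-ccmp_1")  # second double-click removes it
print(selection)
```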
Assigning arrays to a new or existing set
We will place the new sets of arrays in the "Default" list. However, you can create a new list by pushing the New button in Array/Phenotype Sets at lower left.
First, we will select and label arrays which contain samples from the congestive cardiomyopathy disease state:
1. In the Arrays/Phenotypes component, select the six arrays beginning with JB-ccmp, which represent the samples from the congestive cardiomyopathy disease state.
2. Right-click,  select Add to Set.  In the dialog box, you can enter the name of either an existing set, or of a new set to be created.
3. Enter the new subset name "CCMP" in the input box and click OK.
4. Next, select the arrays beginning with JB-n and add them to a new set named "Normal" (repeat steps 2 and 3).
The Array/Phenotype Sets component will now show the two sets added. Note that the number of arrays in each set is shown in square brackets to the right of the set name.
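The steps above amount to grouping arrays by name prefix and then displaying each set with its member count. The sketch below is illustrative only: the helper names `add_to_set` and `label` are hypothetical, and the array names (including the number of "JB-n" arrays) are placeholders rather than the actual file names from the tutorial data.

```python
def add_to_set(sets, set_name, selected):
    """Add the selected arrays to a new or existing set."""
    members = sets.setdefault(set_name, [])  # creates the set if it is new
    for name in selected:
        if name not in members:
            members.append(name)


def label(sets, set_name):
    """Format a set name with its member count in square brackets."""
    return f"{set_name} [{len(sets[set_name])}]"


# Placeholder array names: six with prefix JB-ccmp, four with prefix JB-n.
arrays = [f"JB-ccmp_{i}" for i in range(1, 7)] + [f"JB-n_{i}" for i in range(1, 5)]

sets = {}
add_to_set(sets, "CCMP", [a for a in arrays if a.startswith("JB-ccmp")])
add_to_set(sets, "Normal", [a for a in arrays if a.startswith("JB-n")])

print(label(sets, "CCMP"))    # CCMP [6]
print(label(sets, "Normal"))  # Normal [4]
```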
Adding arrays to an existing set - shortcut
If you wish to add additional arrays to an existing set, you can avoid typing in its name again in the dialog box by first selecting the target set in the lower pane. Then right-click on a selection of arrays above and select "Add to Set" from the pop-up menu. The name of the existing set will appear in the "Add to Set" dialog.
Removing an array from a set
One or more arrays can be removed from a set by highlighting them and then right-clicking. A menu will appear with the option "Remove from Set".
Manipulating array sets
Right-clicking on an array set produces a menu with actions that can be applied to it, as already described in the Controls section. A few will be demonstrated in more detail in the following sections.
Activate / Deactivate sets
Checking the box next to one or more set names "activates" those sets.
Sets can be activated or deactivated in either of two ways:
- by checking or unchecking the boxes directly, or
- by highlighting one or more sets in the list and choosing "Activate" or "Deactivate" in the right-click menu.
Classifying a set
For statistical tests such as the t-test, sets can be classified using several preset labels, e.g. "Case" and "Control".
- Left-click on the thumbtack icon in front of the phenotype name.
- Select a classification, choosing from "Case", "Control", "Test", and "Ignore". The default classification is "Control".
A red thumbtack indicates the arrays have been specified as "Case".
The thumbtack color key for classification is shown at the bottom of the Arrays component. The classifications and their colors are:
- Case - red
- Control - no color
- Test - green
- Ignore - gray.
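The classification labels and their thumbtack colors form a small lookup table, with "Control" as the default. A minimal sketch, assuming classifications are kept in a dictionary keyed by set name; the names `classify` and `thumbtack_color` are hypothetical and do not come from geWorkbench:

```python
# Color key as listed above; None stands for "no color".
THUMBTACK_COLORS = {
    "Case": "red",
    "Control": None,
    "Test": "green",
    "Ignore": "gray",
}
DEFAULT_CLASS = "Control"


def classify(classes, set_name, label):
    """Record the classification of an array set."""
    if label not in THUMBTACK_COLORS:
        raise ValueError(f"unknown classification: {label}")
    classes[set_name] = label


def thumbtack_color(classes, set_name):
    """Return the thumbtack color for a set; unclassified sets are 'Control'."""
    return THUMBTACK_COLORS[classes.get(set_name, DEFAULT_CLASS)]


classes = {}
classify(classes, "CCMP", "Case")
print(thumbtack_color(classes, "CCMP"))    # red
print(thumbtack_color(classes, "Normal"))  # None (default "Control")
```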
Save
If more than one array set is highlighted, the right-click menu will provide two options for saving array sets.
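The two save options can be sketched as follows. This is not geWorkbench code: the function names are hypothetical, the `.csv` file extension is an assumption, and dropping duplicate array names when merging is also an assumption, since the text does not say how arrays shared between sets are handled.

```python
import tempfile
from pathlib import Path


def save_merged(sets, set_names, path):
    """Merge the highlighted sets into one list and write it to one file,
    one array name per line."""
    merged = []
    for set_name in set_names:
        for array in sets[set_name]:
            if array not in merged:  # assumption: duplicates are dropped
                merged.append(array)
    Path(path).write_text("".join(f"{a}\n" for a in merged))
    return merged


def save_multiple(sets, set_names, directory):
    """Write each highlighted set to its own file, named after the set."""
    paths = []
    for set_name in set_names:
        path = Path(directory) / f"{set_name}.csv"  # assumed extension
        path.write_text("".join(f"{a}\n" for a in sets[set_name]))
        paths.append(path)
    return paths


sets = {"CCMP": ["a1", "a2"], "Normal": ["a2", "a3"]}  # hypothetical data

with tempfile.TemporaryDirectory() as folder:
    save_merged(sets, ["CCMP", "Normal"], Path(folder) / "merged.csv")
    files = save_multiple(sets, ["CCMP", "Normal"], folder)
    print(sorted(p.name for p in files))
```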
Visual Properties
Selecting the "Visual Properties" menu item, here for the array set "GC B-cell", causes a properties editor to appear. In it, the shape and color of points representing arrays in various geWorkbench graphical components can be globally altered.
The color chooser:
The shape chooser:
Chooser showing selection of a green "plus" shape to represent the array set.
After the visual properties of the "GC B-cell" set have been altered, we can view their appearance in, for example, the Scatter Plot component. Here we have activated both the "GC B-cell" set and the "non-GC B-cell" set. The former now uses the green plus signs, whereas the latter uses a system-assigned default color and shape.
Creating a new list to contain sets
Pushing the "New" button will bring up a dialog box in which the name of a new list can be entered.
In turn, once the new list is created, a new collection of sets can be created within it.
Working with multiple lists of array sets
There can be different groupings of the same arrays in the Arrays/Phenotypes component. The example data file "BCell-100.exp" predefines 3 such lists of sets. After loading this file of type "Affymetrix File Matrix" into geWorkbench, the following lists of sets can be seen in the Arrays/Phenotypes group pulldown menu.
- Default
- Class
- Source - short
- Source - detailed
Each such list can contain a different arrangement of the arrays into subsets.
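One way to picture this arrangement is as a two-level mapping: list name, then set name, then member arrays. The sketch below is only an illustration of that idea. The set names and array names inside each list are hypothetical and are not taken from "BCell-100.exp", and `new_list` is a hypothetical helper mirroring the New button.

```python
# list name -> set name -> member arrays (all inner names are hypothetical)
set_lists = {
    "Default": {},
    "Class": {
        "class A": ["arr1", "arr2"],
        "class B": ["arr3", "arr4"],
    },
    "Source - short": {
        "source X": ["arr1", "arr3"],
        "source Y": ["arr2", "arr4"],
    },
}


def new_list(set_lists, name):
    """Create an empty list of sets if the name is not already in use."""
    return set_lists.setdefault(name, {})


def arrays_in(set_lists, list_name):
    """Return every array that appears in any set of the given list."""
    return sorted({a for members in set_lists[list_name].values() for a in members})


# The same four arrays are grouped differently in the two lists.
print(arrays_in(set_lists, "Class"))
print(arrays_in(set_lists, "Source - short"))

new_list(set_lists, "My grouping")
print(list(set_lists))
```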
If we choose the list called "Class", the following sets of arrays are displayed:
If instead we choose the list "Source - short", a different division into subsets of the same arrays is seen:



























