Array Sets
Contents
- 1 Overview of Marker and Array Sets
- 2 Common Principles of Operation of Marker and Array Sets
- 3 Controls
- 4 Examples
- 4.1 Add an array to the default set by double-clicking
- 4.2 Removing an array from a subset
- 4.3 Assigning arrays to a new or existing subset
- 4.4 Adding arrays to an existing subset - shortcut
- 4.5 Manipulating array subsets
- 4.6 Activating subsets
- 4.7 Classifying a subset
- 4.8 Creating a new set to contain subsets
- 4.9 Example of working with multiple array sets
Overview of Marker and Array Sets
The Markers/Arrays component, located at lower left in the geWorkbench graphical interface, allows the user to define and use sets of arrays and markers for a number of purposes.
As used in geWorkbench, the term "marker" includes genes, probes/probesets, and individual sequences, depending on the type of data loaded. Sets of markers can be returned by various analysis routines. For example, the t-test returns a list of markers showing significant differential expression, and after hierarchical clustering, the markers in a subtree of the resulting dendrogram can be saved to a list.
Sets of microarrays can be used to group arrays in a meaningful fashion for statistical analysis, for example by phenotype. Two such phenotypes might be the diseased and normal states of a tissue from which samples have been taken. geWorkbench uses the terms "Case" and "Control" to categorize these; in a biological setting the equivalent terms would be "Experimental" and "Control".
This chapter discusses the use of sets of microarrays. Please see the chapter Data Subsets - Markers for a discussion of the use of Marker sets.
The figure below shows the Arrays/Phenotypes component located below the Project Folders component in geWorkbench. The Markers component is located in the same space, under a separate tab.
Common Principles of Operation of Marker and Array Sets
Rather than using all arrays or all markers in a data set for a particular analysis or visualization, the user may wish to restrict those used to only some subset.
Activating Sets of Markers and Arrays
In the Markers and Arrays components, sets of markers and arrays can be defined by the user, and are also created as the outcome of various analysis methods. Adjacent to each set in the graphical interface is a checkbox. Checking this box activates the set.
- Activating a set restricts many geWorkbench components to using as input only the markers or arrays that are in such activated sets.
- Marker Sets
  - If no Marker set is active, all Markers are used.
  - If at least one Marker set is activated, affected components will only use markers that are in activated sets.
- Array Sets
  - If no Array set is active, all Arrays are used.
  - If at least one Array set is activated, affected components will only use arrays that are in activated sets.
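The activation rule above can be pictured with a short Python sketch. It is only an illustration, not geWorkbench code: the function name `effective_items` and the array names are invented, and it assumes that when several sets are active a component uses every item found in any of them.

```python
def effective_items(all_items, sets, active_names):
    """Return the items a component would use, given the activated sets.

    all_items    -- every array (or marker) in the data set, in display order
    sets         -- dict mapping a set name to the collection of items in it
    active_names -- names of the sets whose checkbox is ticked
    """
    if not active_names:
        # No set is active: nothing is filtered out.
        return list(all_items)
    # At least one set is active: keep items found in any active set
    # (assumption: several active sets are pooled together).
    allowed = set()
    for name in active_names:
        allowed.update(sets[name])
    return [item for item in all_items if item in allowed]


arrays = ["A1", "A2", "A3", "A4"]                       # hypothetical array names
array_sets = {"CCMP": {"A1", "A2"}, "Normal": {"A3"}}

print(effective_items(arrays, array_sets, []))                  # all four arrays
print(effective_items(arrays, array_sets, ["CCMP"]))            # A1 and A2 only
print(effective_items(arrays, array_sets, ["CCMP", "Normal"]))  # A1, A2 and A3
```

Note that array "A4" belongs to no set, so it is used only when nothing is activated.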
Controls
Upper Pane
The list in the upper pane of the Arrays component shows the arrays loaded in the current data set.
The upper pane of the Arrays/Phenotypes component has the following controls:
- Search text field - Search for arrays by typing in a name or portion of a name. As one types, the first array matching the entry so far will be highlighted. In some cases, however, the Find Next button must be pushed to find a match. If the typed entry matches no arrays, it will be displayed in red.
- Find Next button - Find the next array matching the typed entry.
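The search behavior can be approximated by the following Python sketch. The helper `find_next` and the array names are hypothetical, and two details are assumptions rather than documented behavior: matching is treated as a case-insensitive substring test, and the search wraps around to the top of the list.

```python
def find_next(names, query, start=0):
    """Return the index of the first name at or after `start` that contains
    `query`, wrapping around to the top of the list; None if nothing matches."""
    if not query:
        return None
    needle = query.lower()
    count = len(names)
    for offset in range(count):
        index = (start + offset) % count
        if needle in names[index].lower():
            return index
    return None


names = ["JB-ccmp_01", "JB-ccmp_02", "JB-n_01", "JB-n_02"]  # hypothetical names

first = find_next(names, "n_")               # match highlighted while typing
second = find_next(names, "n_", first + 1)   # what Find Next would move to
missing = find_next(names, "xyz")            # no match: the entry would turn red
print(first, second, missing)
```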
Selecting one or more arrays in the list and then right-clicking gives the following choices in a pop-up menu:
- Add to Set - Add the selected arrays to a new or existing subset.
- Clear Selection - Remove the highlighting from the selected arrays.
Lower Pane
The lower pane of the Arrays/Phenotypes component has the following controls:
- Array/Phenotype Sets menu - Select which named list of array sets to display. Each can contain a different arrangement of the arrays into subsets.
- New button - Create a new array set.
If you right-click on a subset, a menu with the following choices appears:
- Rename - Rename the selected set.
- Copy - Make a copy of the selected set.
- Activate - Activate the selected set. This can also be done directly by checking the check box before its entry.
- Deactivate - Deactivate the selected set. This can also be done directly by unchecking the check box before its entry.
- Delete - Delete the selected set.
- Combine - Combine the selected sets into a new set. Methods are:
  - Union - Add all arrays from all selected sets.
  - Intersection - Add only arrays that are in every selected set.
  - XOR - Add arrays that are in exactly one of the selected sets.
- Print - Print the selected set of arrays.
- Visual Properties - Change the color and shape of points representing arrays in graphical components, e.g. in the Scatter Plot.
- Classification - Designate the experimental class of an array set, chosen from: Case, Control, Test and Ignore.
- Save - Save the chosen set of arrays as a simple list (CSV format, one array per line) to a file on disk.
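The three Combine methods listed above correspond to ordinary set operations. The Python sketch below is illustrative only: the function `combine` and the array names are invented, and XOR is implemented as described in the list, keeping arrays that occur in exactly one selected set.

```python
from collections import Counter


def combine(selected_sets, method):
    """Combine several array subsets into one new set of array names."""
    selected_sets = [set(s) for s in selected_sets]
    if not selected_sets:
        return set()
    if method == "union":
        return set().union(*selected_sets)
    if method == "intersection":
        return set.intersection(*selected_sets)
    if method == "xor":
        # Count in how many subsets each array occurs; keep those seen once.
        counts = Counter(name for s in selected_sets for name in s)
        return {name for name, n in counts.items() if n == 1}
    raise ValueError(f"unknown method: {method!r}")


a = {"A1", "A2", "A3"}   # hypothetical array names
b = {"A2", "A3", "A4"}
c = {"A3", "A5"}

print(sorted(combine([a, b, c], "union")))         # ['A1', 'A2', 'A3', 'A4', 'A5']
print(sorted(combine([a, b, c], "intersection")))  # ['A3']
print(sorted(combine([a, b, c], "xor")))           # ['A1', 'A4', 'A5']
```

With more than two sets, "in exactly one set" is not the same as chaining a pairwise symmetric difference: `a ^ b ^ c` would also contain "A3", which occurs in all three sets.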
Examples
In this tutorial we will start with the same data files that were used in Tutorial - Local Data Files. Load the ten individual MAS5 data files as shown there in the section "Loading microarray data files - local".
Add an array to the default set by double-clicking
Removing an array from a subset
One or more arrays can be removed from a set by highlighting them and then right-clicking. A menu will appear with the option "Remove from Set".
Assigning arrays to a new or existing subset
We will place the new subsets of arrays in the "Default" set. However, you can create a new set by pushing the New button under Array/Phenotype Sets at lower left.
First, we will select and label arrays which contain samples from the congestive cardiomyopathy disease state:
1. In the Arrays/Phenotypes component, select the six arrays beginning with JB-ccmp, which represent the samples from the congestive cardiomyopathy disease state.
2. Right-click, select Add to Set. In the dialog box, you can enter the name of either an existing subset, or of a new subset to be created.
3. Enter the new subset name "CCMP" in the input box and click OK.
4. Next, select the arrays beginning with JB-n and label them "Normal" in the same way (repeat steps 2 and 3).
The Array/Phenotype Sets component will now show the two subsets added:
Adding arrays to an existing subset - shortcut
If you wish to add additional arrays to an existing subset, you can avoid having to type in its name again in the dialog box by first selecting the target subset in the lower pane. Then right-click on a selection of arrays above and select "Add to Set" from the pop-up menu. The name of the existing subset will appear in the "Add to Set" dialog.
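As a rough model of "Add to Set", the Python sketch below treats a set as a mapping from subset names to lists of arrays. Entering a new name creates a subset, and entering an existing name extends it. The function `add_to_set` and the exact array names are hypothetical. The six JB-ccmp arrays follow the tutorial, while the four JB-n arrays are assumed for illustration.

```python
def add_to_set(subsets, name, selected_arrays):
    """Add arrays to the subset called `name`, creating it if necessary."""
    members = subsets.setdefault(name, [])
    for array in selected_arrays:
        if array not in members:   # assumption: an array is listed only once
            members.append(array)
    return members


# Hypothetical array names standing in for the loaded MAS5 files.
arrays = [f"JB-ccmp_{i}" for i in range(1, 7)] + [f"JB-n_{i}" for i in range(1, 5)]

default_set = {}  # subset name -> list of array names
add_to_set(default_set, "CCMP", [a for a in arrays if a.startswith("JB-ccmp")])
add_to_set(default_set, "Normal", [a for a in arrays if a.startswith("JB-n")][:3])
# Later: add one more array to the existing "Normal" subset.
add_to_set(default_set, "Normal", ["JB-n_4"])

print({name: len(members) for name, members in default_set.items()})
```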
Manipulating array subsets
Right-clicking on an array subset produces a menu with actions that can be applied to it, as already described in the Controls section. A few will be demonstrated in more detail in the following sections.
Activating subsets
The check boxes next to the subset name can be checked to indicate that a subset of arrays is "Active". Various analysis and visualization components can be set to only use/display activated arrays or markers.
Classifying a subset
For statistical tests such as the t-test, Case and Control sets can be specified.
- Left-click on the thumbtack icon in front of the phenotype name.
- Select Case to specify the disease arrays as the "Case". The remaining "Normal" arrays are labeled "Control" by default.
A red thumbtack indicates the arrays have been specified as "Case".
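The effect of classification on a two-group test can be sketched in Python as follows. The names `ArrayClass` and `group_by_class` and the array names are invented for illustration. The four class labels come from the Controls section, and subsets with no explicit class are treated as Control, as described above.

```python
from enum import Enum


class ArrayClass(Enum):
    CASE = "Case"
    CONTROL = "Control"
    TEST = "Test"
    IGNORE = "Ignore"


def group_by_class(subsets, classification):
    """Group array names by the class assigned to their subset.

    Subsets without an explicit class are treated as Control.
    """
    groups = {cls: [] for cls in ArrayClass}
    for name, members in subsets.items():
        cls = classification.get(name, ArrayClass.CONTROL)
        groups[cls].extend(members)
    return groups


subsets = {
    "CCMP": ["JB-ccmp_1", "JB-ccmp_2"],   # hypothetical array names
    "Normal": ["JB-n_1", "JB-n_2"],
}
classification = {"CCMP": ArrayClass.CASE}  # "Normal" is left unclassified

groups = group_by_class(subsets, classification)
print(groups[ArrayClass.CASE])     # arrays passed to the test as Case
print(groups[ArrayClass.CONTROL])  # arrays passed to the test as Control
```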
Creating a new set to contain subsets
Pushing the "New" button will bring up a dialog box in which the name of a new set can be entered.
In turn, once the new set is created, a new collection of subsets can be created within it.
Example of working with multiple array sets
The Arrays/Phenotypes and Markers components can each hold several different groupings of the same arrays or markers. The example data file "BCell-100.exp" contains several predefined set groupings. After loading this file into geWorkbench as type "Affymetrix File Matrix", the following sets are listed in the Arrays/Phenotypes pulldown menu:
- Default
- Class
- Source - short
- Source - detailed
Each such set can contain a different arrangement of the arrays into subsets.
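One way to picture this organization is a two-level mapping: each named set maps subset names to arrays. The Python sketch below is an assumption-laden illustration, not the geWorkbench data model. The set names come from the list above, while the subset names, the array names, the helper functions, and the rejection of duplicate set names are all invented.

```python
# set name -> {subset name -> array names}; subset and array names are invented
groupings = {
    "Default": {},
    "Class": {
        "group 1": ["array_1", "array_2"],
        "group 2": ["array_3", "array_4"],
    },
    "Source - short": {
        "site A": ["array_1", "array_3"],
        "site B": ["array_2", "array_4"],
    },
}


def new_set(groupings, name):
    """Create an empty named set, as the New button would.

    Rejecting duplicate names is an assumption of this sketch.
    """
    if name in groupings:
        raise ValueError(f"a set named {name!r} already exists")
    groupings[name] = {}
    return groupings[name]


def arrays_in(groupings, set_name):
    """All arrays that appear in any subset of the chosen set."""
    return sorted({a for members in groupings[set_name].values() for a in members})


# The same four arrays are divided differently by each set.
print(arrays_in(groupings, "Class") == arrays_in(groupings, "Source - short"))
new_set(groupings, "My grouping")
print(list(groupings))
```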
If we choose the set called "Class", the following subsets of arrays are displayed:
If instead we choose the set "Source - short", a different division into subsets of the same arrays is seen: