Tutorial Data
Latest revision as of 15:19, 11 March 2015
Tutorial data files
The file Bcell-100.zip contains the file Bcell-100.exp, a geWorkbench data matrix format file containing 100 B-cell experiments run on Affymetrix U95-Av2 chips. It contains the same data as the file "webmatrix2.exp", which was used in earlier versions of the tutorials, but the array groups have been given more descriptive names. The data have been processed using the Affymetrix MAS5 normalization routine. They come from the lab of Dr. Ricardo Dalla-Favera at the Institute for Cancer Research at Columbia University. The 100 arrays represent B-cell experiments drawn from a variety of conditions, including cancerous and normal, germinal center and non-germinal center.
Bcell-100_log2.zip contains the Bcell-100 data normalized with the Threshold Normalizer, with the minimum value set to 1.0, and then log2 transformed.
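The two processing steps described for Bcell-100_log2 can be sketched in a few lines of Python. This is only an illustration, not geWorkbench's own code: it assumes that the Threshold Normalizer raises every value below the minimum up to that minimum, and the function name and sample values are invented for the example.

```python
import math


def threshold_then_log2(values, minimum=1.0):
    """Raise every value below `minimum` to `minimum`, then take log2.

    Assumption: this mirrors a threshold normalization with only a
    minimum cutoff, followed by a log2 transformation.
    """
    return [math.log2(max(v, minimum)) for v in values]


# Hypothetical expression values, not taken from Bcell-100.exp.
raw = [0.25, 1.0, 8.0, -3.5, 1024.0]
print(threshold_then_log2(raw))  # [0.0, 0.0, 3.0, 0.0, 10.0]
```

Setting the minimum to 1.0 before the transformation means that no value is zero or negative, so log2 is always defined and the smallest possible result is 0.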
The file Tutorial_data.zip (3.586 MB) contains data files in several different formats, useful for the tutorials or for just trying out geWorkbench. It contains the following files:
- cardiogenomics.med.harvard.edu/ - A directory containing 10 individual MAS5/GCOS format data files.
- webmatrix2_quantile_log2_dev1.2_mv0.exp - A geWorkbench "exp" format matrix file containing filtered, normalized data. This data originally derives from the file "webmatrix.exp", but one group of columns has been rearranged so that each condition (phenotype) is kept in one block (webmatrix2.exp).
- NM_024426-Wilms.fasta - A GenBank nucleotide sequence file.
- NP_077744-Wilms.fasta - A GenBank protein sequence file.
- H1H5_HistoneDB_NHGRI.fasta - Contains H1 and H5 histone sequences from the NHGRI.
- cluster_tree_total_pearsons_84_markers.csv - Contains a list of 84 markers derived from hierarchical clustering of webmatrix2.
- 640f84ClusterPearsonsSeqs.fasta - Contains FASTA sequences for 64 of the 84 markers in the list above.
- cluster_tree_total_pearsons_64of84_markers.csv - Contains the marker set for the 64 sequences in the file above.
The following files are from older examples that are being phased out.
- ClusterTree38_Sequences.fasta - Contains sequences derived from hierarchical clustering (old example).
- cluster_tree_12markers.csv - Contains a list of markers derived from hierarchical clustering (old example).
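To check how many sequences one of the FASTA files listed above contains, a minimal reader such as the following may be enough. It is a sketch that assumes standard FASTA layout (a header line starting with ">" followed by one or more sequence lines); the function name and the two records are hypothetical and are not taken from the tutorial files.

```python
def read_fasta(lines):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    records = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    # Join multi-line sequences into a single string per record.
    return {h: "".join(parts) for h, parts in records.items()}


# Hypothetical two-record example; real input would come from a file,
# e.g. read_fasta(open("some_file.fasta")).
example = """>seq1 hypothetical record
ACGT
ACGT
>seq2 another hypothetical record
TTGA
""".splitlines()

records = read_fasta(example)
print(len(records))  # 2
```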
More about Bcell-100.exp
Among the different B-cell lines represented in Bcell-100.exp are the following, with their abbreviations as used in the dataset:
Cancerous
- CLL/P-CLL - B cell chronic lymphocytic leukemia (the P prefix indicates a purified sample; CLL without the prefix is unpurified)
- BL - Burkitt lymphoma
- DLCD/DLCL - purified DLBCL
- FL - follicular lymphoma
- HCL - hairy cell leukemia
- PEL - primary effusion lymphoma
- D - DLBCL
Here DLBCL stands for diffuse large B cell lymphoma.
Non-cancerous
- CB - centroblasts
- CC - centrocytes
- M - memory B cells
- N - naive B cells
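When working with the array groups in a script, the abbreviations above can be kept in a small lookup table. The sketch below only encodes the two lists given here; the function name is hypothetical, and how the abbreviations appear inside the actual array names in Bcell-100.exp is not assumed.

```python
# Abbreviations as listed above, grouped by phenotype category.
CANCEROUS = {
    "CLL": "B cell chronic lymphocytic leukemia",
    "P-CLL": "B cell chronic lymphocytic leukemia (purified)",
    "BL": "Burkitt lymphoma",
    "DLCD": "diffuse large B cell lymphoma (purified)",
    "DLCL": "diffuse large B cell lymphoma (purified)",
    "FL": "follicular lymphoma",
    "HCL": "hairy cell leukemia",
    "PEL": "primary effusion lymphoma",
    "D": "diffuse large B cell lymphoma",
}
NON_CANCEROUS = {
    "CB": "centroblasts",
    "CC": "centrocytes",
    "M": "memory B cells",
    "N": "naive B cells",
}


def describe(abbrev):
    """Return (category, description) for an abbreviation from the lists above."""
    if abbrev in CANCEROUS:
        return ("cancerous", CANCEROUS[abbrev])
    if abbrev in NON_CANCEROUS:
        return ("non-cancerous", NON_CANCEROUS[abbrev])
    return ("unknown", abbrev)


print(describe("CB"))  # ('non-cancerous', 'centroblasts')
print(describe("BL"))  # ('cancerous', 'Burkitt lymphoma')
```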
About the cardiogenomics microarray dataset
These example MAS5 format data files were obtained from the following site at Harvard University:
http://cardiogenomics.med.harvard.edu/project-detail?project_id=229
A number of MAS5 format data files are available there.
The specific project is the "Belgium Dataset of Aortic Stenosis, Congestive Cardiomyopathy and Normal LV Function", and the data is downloadable from:
http://cardiogenomics.med.harvard.edu/groups/proj1/pages/download_Hs-belgium.html
An abstract describing the study that produced these data is also available at:
http://cardiogenomics.med.harvard.edu/groups/proj2/pages/Hs-belgium_home.html