Creates a GCT file from a set of CEL files from Affymetrix ST arrays.
Author: David Eby, Broad Institute
Contact: gp-help@broadinstitute.org
Algorithm Version:
Please note that version 0.14 is currently available only in beta on servers hosted by the GenePattern team. We are working to release updates that will be available for use on all platforms. Feel free to contact us with any questions.
This module creates a gene expression dataset from a set of CEL files for Affymetrix ST arrays. It is similar to ExpressionFileCreator, which operates on CEL files from the older 3' biased IVT-based Affymetrix arrays. The conversion is done using the Robust Multi-array Average (RMA) algorithm as provided by the 'oligo' package in Bioconductor. The result is a matrix containing one intensity value per probe set per sample in the GCT file format.
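As a rough illustration of the output layout described above, the following sketch writes a small matrix in GCT version 1.2 form: a `#1.2` version line, a line with the row and column counts, a header row, then one tab-delimited row per probe set. The function name `write_gct`, the probe set IDs, and the sample names are hypothetical and are not taken from the module, which is implemented in R.

```python
import io


def write_gct(rows, sample_names, out):
    """Write (name, description, values) rows as GCT 1.2 text to a file-like object."""
    out.write("#1.2\n")
    # Second line: number of data rows, then number of sample columns.
    out.write(f"{len(rows)}\t{len(sample_names)}\n")
    out.write("\t".join(["Name", "Description", *sample_names]) + "\n")
    for name, description, values in rows:
        if len(values) != len(sample_names):
            raise ValueError(f"row {name!r} has {len(values)} values, "
                             f"expected {len(sample_names)}")
        cells = [name, description, *(f"{v:g}" for v in values)]
        out.write("\t".join(cells) + "\n")


# Hypothetical probe set IDs and intensity values, for illustration only.
demo_rows = [
    ("probeset_1", "na", [6.12, 6.4]),
    ("probeset_2", "na", [9.5, 9.25]),
]
buf = io.StringIO()
write_gct(demo_rows, ["sample_A", "sample_B"], buf)
gct_text = buf.getvalue()
print(gct_text)
```

Each data row carries exactly one value per sample, matching the "one intensity value per probe set per sample" shape of the module's result.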
Note that the RMA algorithm will log-transform the data during processing. This may affect downstream processing by other modules, some of which will produce erroneous results with log-transformed data unless adjustments are made. For example, the ComparativeMarkerSelection module has a parameter that must be set for it to accept and adjust for log-transformed data.
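If a downstream tool expects linear-scale intensities, one possible adjustment is to reverse the transform before passing the data on. The sketch below assumes the values are on the log2 scale, which is the scale RMA results are normally reported on; the helper name `unlog2` and the toy values are hypothetical. Where a module offers its own setting for log-transformed input, using that setting is usually the simpler route.

```python
def unlog2(matrix):
    """Return a copy of a log2-scale matrix converted back to linear scale."""
    return [[2.0 ** value for value in row] for row in matrix]


# Toy log2-scale values (rows are probe sets, columns are samples).
log2_values = [
    [3.0, 10.0],
    [0.0, 1.0],
]
linear_values = unlog2(log2_values)
print(linear_values)
```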
Multiple CEL files can be uploaded directly to the input file parameter for processing. The parameter also accepts CELs packaged as a ZIP or TAR bundle or supplied as a directory input if your GenePattern server is configured to allow it. You can provide multiple ZIPs, TARs, or directory inputs as well, or mix all of these forms. The CEL files can be compressed in GZ format and the TAR bundles can be in GZ, XZ, or BZ2 format. Any directory inputs will be recursively searched for CEL files (uncompressed or in GZ format) to include in the dataset; ZIPs and TARs in these inputs will not be included, however.
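The directory-search rule above can be approximated as follows. This is only a sketch of the described behavior, not the module's code: the function name `find_cel_files` is hypothetical, and matching the `.cel` and `.cel.gz` extensions case-insensitively is an assumption.

```python
import os
import tempfile

# Assumed extensions: uncompressed CEL files and GZ-compressed CEL files.
CEL_SUFFIXES = (".cel", ".cel.gz")


def find_cel_files(root):
    """Recursively collect CEL files under root, ignoring ZIP and TAR bundles."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Bundles such as .zip or .tar.gz never match these suffixes,
            # so archives found inside a directory input are skipped.
            if name.lower().endswith(CEL_SUFFIXES):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "batch1"))
    for rel in ("a.CEL", os.path.join("batch1", "b.CEL.gz"),
                "bundle.zip", "more.tar.gz"):
        open(os.path.join(tmp, rel), "w").close()
    demo_found = [os.path.relpath(p, tmp) for p in find_cel_files(tmp)]

print(demo_found)
```

Sorting the result gives a reproducible order in this sketch; the module's own processing order is not specified here.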
You can supply an optional CLM file listing the CEL files to be included in the dataset, their order, their phenotypic categories, and their alternate sample names. Note that any files submitted for a job but not listed in the CLM file will be excluded from the dataset. The column order of the dataset will match the order of the CLM listing. If no CLM file is provided, the CEL file names will be used as sample names and the column order will match the module's processing order. That order can be somewhat unpredictable, so a CLM file is recommended when order is important.
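The effect of a CLM file on sample selection, naming, and ordering can be sketched as below. The sketch assumes one tab-delimited scan, sample, and class per line and an exact match between the scan column and the submitted file name. The helper names are hypothetical, and silently skipping CLM entries with no matching file is an assumption rather than documented module behavior.

```python
def parse_clm(text):
    """Parse CLM text into (scan, sample, class) tuples, skipping blank lines."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        scan, sample, cls = line.split("\t")[:3]
        entries.append((scan.strip(), sample.strip(), cls.strip()))
    return entries


def apply_clm(entries, submitted_files):
    """Keep only submitted files listed in the CLM, in CLM order."""
    available = set(submitted_files)
    # Assumption: CLM entries without a matching submitted file are skipped.
    return [entry for entry in entries if entry[0] in available]


# Hypothetical CLM content and submitted file names.
clm_text = "c.CEL\ttumor_1\ttumor\na.CEL\tnormal_1\tnormal\n"
submitted = ["a.CEL", "b.CEL", "c.CEL"]

columns = apply_clm(parse_clm(clm_text), submitted)
sample_names = [sample for _scan, sample, _cls in columns]
class_labels = [cls for _scan, _sample, cls in columns]
print(sample_names, class_labels)
```

Here `b.CEL` is dropped because the CLM does not list it, and the remaining columns follow the CLM order rather than the submission order.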
Carvalho BS and Irizarry RA (2010). "A Framework for Oligonucleotide Microarray Preprocessing." Bioinformatics. ISSN 1367-4803.
Carvalho BS and Irizarry RA (2014). "Package 'oligo'" documentation from Bioconductor 2.14.
Name | Description |
---|---|
input file * | One or more Affymetrix ST CEL files either uploaded directly, packaged into a ZIP or TAR bundle, or supplied through a directory input. The CEL files can be in GZ format and the TAR can be in GZ, XZ, or BZ2 format. The parameter will accept multiple inputs in any of these forms. |
normalize * | Whether to normalize data using quantile normalization. |
background correct * | Whether to perform background correction. |
clm file | A tab-delimited text file containing one scan, sample, and class per line. |
annotate probes * | Whether to annotate probes with the gene symbol and description. |
output file base * | The base name of the output file(s). File extensions will be added automatically. |
* - required
[Yet to be posted]
Requires R 3.1.3 and a set of R package dependencies from CRAN and Bioconductor. R 3.1.3 must be installed and configured by the GenePattern administrator before this module can be installed [Instructions yet to be posted. Will link to an updated version of our Admin Guide on the subject]. The package dependencies will be automatically installed when the module is installed.
Task Type: Preprocess & Utilities
CPU Type:
Operating System: any
Language: R
Version | Release Date | Description |
---|---|---|
0.14 | 2015-10-22 | Updated to make use of the R package installer. |