Microarrays and Bioinformatics
Experiments and Databases
Gene expression values from microarray experiments can be represented as heat maps to visualize the result of data analysis.
The advent of inexpensive microarray experiments created several specific bioinformatics challenges:
Experimental Design
Due to the biological complexity of gene expression, the considerations of experimental design that are discussed in the expression profiling article are of critical importance if statistically and biologically valid conclusions are to be drawn from the data.
There are three main elements to consider when designing a microarray experiment. First, replication of the biological samples is essential for drawing conclusions from the experiment. Second, technical replicates (two RNA samples obtained from each experimental unit) help to ensure precision and allow for testing differences within treatment groups. The technical replicates may be two independent RNA extractions or two aliquots of the same extraction. Third, spots of each cDNA clone or oligonucleotide are present as replicates (at least duplicates) on the microarray slide, to provide a measure of technical precision in each hybridization. It is critical that information about the sample preparation and handling is discussed, in order to help identify the independent units in the experiment and to avoid inflated estimates of statistical significance.[14]
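The point about independent units can be made concrete with a small Python sketch. It is an illustration only: the sample labels and log2 intensities are hypothetical, and Welch's t statistic is used simply as a familiar test. Technical and spot replicates are first averaged within each biological unit, and the result is contrasted with pooling every replicate as if it were independent.

```python
from math import sqrt
from statistics import mean, stdev

def collapse_to_units(measurements):
    """Average all technical/spot replicates within each biological unit.

    `measurements` maps a (hypothetical) biological-unit label to the list
    of log2 intensities measured on that unit; one value per unit is returned.
    """
    return {unit: mean(values) for unit, values in measurements.items()}

def welch_t(a, b):
    """Welch's two-sample t statistic for two lists of independent values."""
    va, vb = stdev(a) ** 2, stdev(b) ** 2
    return (mean(a) - mean(b)) / sqrt(va / len(a) + vb / len(b))

# Hypothetical data for one gene: 3 animals per group, 2 technical replicates each.
treated = {"t1": [8.1, 8.3], "t2": [7.6, 7.8], "t3": [8.9, 8.7]}
control = {"c1": [7.2, 7.0], "c2": [7.9, 7.7], "c3": [6.8, 7.1]}

units_t = list(collapse_to_units(treated).values())
units_c = list(collapse_to_units(control).values())
print("independent units per group:", len(units_t), len(units_c))
print("t on biological units:", round(welch_t(units_t, units_c), 2))

# Treating every replicate as independent doubles n and overstates significance.
pooled_t = [v for vs in treated.values() for v in vs]
pooled_c = [v for vs in control.values() for v in vs]
print("t on pooled replicates:", round(welch_t(pooled_t, pooled_c), 2))
```

With these numbers the pooled analysis gives a noticeably larger t statistic than the analysis on the three true experimental units per group, which is exactly the inflation the design guidance warns against.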
Standardization
Microarray data is difficult to exchange due to the lack of standardization in arrays. This presents an interoperability problem in bioinformatics. Various grass-roots open-source projects are trying to ease the exchange and analysis of data produced with non-proprietary chips:
- For example, the "Minimum Information About a Microarray Experiment" (MIAME) checklist helps define the level of detail that should be reported and is being adopted by many journals as a requirement for the submission of papers incorporating microarray results. However, MIAME does not describe a format for the information, so while many formats can support the MIAME requirements, as of 2007 no format permitted verification of complete semantic compliance.
- The "MicroArray Quality Control (MAQC) Project" is being conducted by the US Food and Drug Administration (FDA) to develop standards and quality control metrics which will eventually allow the use of microarray data in drug discovery, clinical practice and regulatory decision-making.[15]
- The MicroArray and Gene Expression Data (MGED) group is working on the standardization of the representation of gene expression data and relevant annotations.
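To illustrate why MIAME compliance cannot be verified from a file's structure alone, the Python sketch below checks only whether required fields are present. The key names are hypothetical paraphrases of the six elements in the MIAME checklist (raw data, processed data, sample annotation, experimental design, array annotation, protocols); MIAME itself does not define any field names or file format.

```python
# Hypothetical keys paraphrasing the six MIAME elements; MIAME prescribes
# content, not these names or any particular format.
MIAME_ELEMENTS = (
    "raw_data",
    "processed_data",
    "sample_annotation",
    "experimental_design",
    "array_annotation",
    "protocols",
)

def missing_miame_elements(record):
    """Return MIAME elements that are absent or empty in a metadata dict.

    This is only a presence check: a field containing "n/a" passes,
    which is why a syntactic check cannot prove semantic compliance.
    """
    return [key for key in MIAME_ELEMENTS if not record.get(key)]

# Hypothetical submission record.
submission = {
    "raw_data": "scans/*.cel",
    "processed_data": "normalized_matrix.tsv",
    "sample_annotation": "n/a",  # passes the check, yet carries no information
    "experimental_design": "treated vs control, 3 biological replicates",
    "protocols": "Trizol extraction; aminoallyl labelling",
}
print(missing_miame_elements(submission))  # ['array_annotation']
```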
Statistical analysis
The analysis of DNA microarrays poses a large number of statistical problems, including the normalization of the data. There are dozens of proposed normalization methods in the published literature, some of which are platform-specific; as in many other cases where authorities disagree, a sound conservative approach is to try a number of popular normalization methods and compare the conclusions reached: how sensitive are the main conclusions to the method chosen?
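That sensitivity check can be sketched in Python under simplifying assumptions. Two common normalizations, global median scaling and quantile normalization (chosen here only as examples), are applied to the same hypothetical case and control arrays, and the overlap between their top-ranked genes is reported. Ties in quantile normalization are broken by position, a simplification of full implementations.

```python
from statistics import median

def median_scale(arrays):
    """Global normalization: divide every value by its own array's median."""
    return [[v / median(a) for v in a] for a in arrays]

def quantile_normalize(arrays):
    """Quantile normalization: give every array the same distribution.

    Ties are broken by position (a simplification).
    """
    n = len(arrays[0])
    orders = [sorted(range(n), key=a.__getitem__) for a in arrays]
    # Reference distribution: mean of the sorted values at each rank.
    reference = [
        sum(a[o[rank]] for a, o in zip(arrays, orders)) / len(arrays)
        for rank in range(n)
    ]
    normalized = []
    for order in orders:
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized

def top_genes(case, control, genes, k):
    """Names of the k genes with the largest case/control ratio."""
    ratio = {g: c / r for g, c, r in zip(genes, case, control)}
    return set(sorted(ratio, key=ratio.get, reverse=True)[:k])

def agreement(case, control, genes, k=2):
    """Fraction of the top-k genes shared by the two normalizations."""
    a = top_genes(*median_scale([case, control]), genes, k)
    b = top_genes(*quantile_normalize([case, control]), genes, k)
    return len(a & b) / k

# Hypothetical raw intensities for one case and one control array.
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
case = [120.0, 300.0, 450.0, 900.0, 2000.0, 5000.0]
control = [100.0, 350.0, 300.0, 1000.0, 800.0, 5200.0]
print("top-2 agreement:", agreement(case, control, genes))
```

An agreement well below 1 on real data would signal that the main conclusions depend on the normalization chosen.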
Experimenters must also account for multiple comparisons: even if the statistical P-value assigned to a gene indicates that its differential expression is extremely unlikely to be due to chance rather than to treatment effects, the very large number of genes on an array makes it likely that some genes appear differentially expressed purely by chance (false positives), while overly strict corrections for this risk missing genes that truly are differentially expressed (false negatives).
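A widely used way to control the expected proportion of false positives among the genes called significant is the Benjamini-Hochberg false discovery rate (FDR) procedure. The Python sketch below implements that procedure; it is one option among several, not a method prescribed by this article, and the p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, fdr=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k with p_(k) <= (k / m) * fdr; reject the k smallest.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            cutoff = rank
    return sorted(order[:cutoff])

# Hypothetical per-gene p-values from ten tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

naive = [i for i, p in enumerate(pvals) if p <= 0.05]
bonferroni = [i for i, p in enumerate(pvals) if p <= 0.05 / len(pvals)]
print("naive:", naive)
print("Bonferroni:", bonferroni)
print("Benjamini-Hochberg:", benjamini_hochberg(pvals))
```

With these ten p-values, a naive 0.05 cut-off calls five genes, Bonferroni keeps one, and Benjamini-Hochberg keeps two. On a real array with tens of thousands of genes the gap between the naive and corrected lists is far larger.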
Statistical methods tailored to microarray analyses have recently become available that assess statistical power based on the variation present in the data and the number of experimental replicates, and can help minimize type I and type II errors in the analyses.[16]
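The basic relationship between variability, effect size and number of replicates can be sketched with the textbook normal approximation for a two-group comparison, n = 2(z₁₋α/2 + z_power)²σ²/δ² per group. This is a generic approximation, not the method of the cited work; the Bonferroni adjustment across genes and the parameter values below are illustrative assumptions, and the normal approximation slightly underestimates n when only a few replicates are needed.

```python
from math import ceil
from statistics import NormalDist

def replicates_per_group(sigma, delta, n_genes, alpha=0.05, power=0.8):
    """Approximate biological replicates per group for one gene.

    sigma: SD of log2 expression between biological replicates (assumed known).
    delta: log2 fold change to detect.
    The per-gene alpha is Bonferroni-adjusted (alpha / n_genes), an assumption.
    """
    a = alpha / n_genes
    z = NormalDist()  # standard normal
    n = 2 * (z.inv_cdf(1 - a / 2) + z.inv_cdf(power)) ** 2 * sigma ** 2 / delta ** 2
    return ceil(n)

# Hypothetical: detect a two-fold change (delta = 1 on the log2 scale).
print(replicates_per_group(sigma=0.5, delta=1.0, n_genes=1))
print(replicates_per_group(sigma=0.5, delta=1.0, n_genes=10_000))
```

Testing one gene needs about four replicates per group, while holding the same error rate across ten thousand genes needs several times more, which is why power must be assessed in the context of the whole array.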
A basic difference between microarray data analysis and much traditional biomedical research is the dimensionality of the data. A large clinical study might collect 100 data items per patient for thousands of patients. A medium-size microarray study will obtain many thousands of numbers per sample for perhaps a hundred samples. Many analysis techniques treat each sample as a single point in a space with thousands of dimensions, then attempt by various techniques to reduce the dimensionality of the data to something humans can visualize.[17]
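Principal component analysis (PCA) is one common way to perform this reduction. The pure-Python sketch below, using hypothetical simulated data, treats each sample as a point in gene space and computes its coordinates on the first two principal components. It uses power iteration on the samples-by-samples inner-product matrix, which stays small when there are far fewer samples than genes; a real analysis would use a linear-algebra library instead.

```python
import random

def center_genes(X):
    """Subtract each gene's mean across samples (X is samples x genes)."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[v - m for v, m in zip(row, means)] for row in X]

def sample_gram(X):
    """Samples x samples matrix of inner products; small when genes >> samples."""
    return [[sum(a * b for a, b in zip(r, s)) for s in X] for r in X]

def dominant_eigen(K, iters=500, seed=0):
    """Power iteration for the largest eigenpair of a symmetric PSD matrix.

    Convergence slows when the top two eigenvalues are close.
    """
    rng = random.Random(seed)
    v = [rng.random() for _ in K]
    for _ in range(iters):
        w = [sum(k * x for k, x in zip(row, v)) for row in K]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    Kv = [sum(k * x for k, x in zip(row, v)) for row in K]
    lam = sum(a * b for a, b in zip(v, Kv))  # Rayleigh quotient
    return lam, v

def pca_scores(X, n_components=2):
    """Coordinates of each sample on the leading principal components.

    For centered X, eigenvectors of X X^T scaled by sqrt(eigenvalue) are the PC scores.
    """
    K = sample_gram(center_genes(X))
    columns = []
    for _ in range(n_components):
        lam, v = dominant_eigen(K)
        columns.append([max(lam, 0.0) ** 0.5 * x for x in v])
        # Deflate K so the next pass finds the next component.
        K = [[K[i][j] - lam * v[i] * v[j] for j in range(len(v))]
             for i in range(len(v))]
    return [list(point) for point in zip(*columns)]

# Hypothetical data: 6 samples x 50 genes; genes 0-24 are raised by 2
# in samples 0-2, plus small random noise.
rng = random.Random(1)
X = [[rng.gauss(0.0, 0.1) + (2.0 if s < 3 and g < 25 else 0.0)
      for g in range(50)] for s in range(6)]
points = pca_scores(X)
for s, (pc1, pc2) in enumerate(points):
    print(f"sample {s}: PC1={pc1:+.2f}  PC2={pc2:+.2f}")
```

The two groups of samples separate along the first component, turning a 50-dimensional point cloud into a picture a person can inspect.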
Relation between probe and gene
The relation between a probe and the mRNA that it is expected to detect is problematic. On the one hand, some mRNAs may cross-hybridize to probes in the array that are supposed to detect another mRNA. On the other hand, probes that are designed to detect the mRNA of a particular gene may be relying on genomic EST information that is incorrectly associated with that gene.
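A simple way to look for the first problem is to compare each probe sequence against other transcripts. The Python sketch below is deliberately simplified. It flags non-target transcripts that share a long contiguous stretch with a probe. The sequences, gene names and the 15-base threshold are hypothetical, and real cross-hybridization also depends on partial matches and on hybridization conditions.

```python
def longest_shared_run(a, b):
    """Length of the longest common substring of a and b (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

def possible_cross_hybridizers(probe, target_id, transcripts, min_run=15):
    """IDs of non-target transcripts sharing >= min_run contiguous bases with the probe.

    Assumes probe and transcripts are written in the same orientation;
    min_run is an arbitrary, hypothetical threshold.
    """
    return sorted(
        tid for tid, seq in transcripts.items()
        if tid != target_id and longest_shared_run(probe, seq) >= min_run
    )

probe = "ACGTACGTTAGCCGATCGATGGCA"  # hypothetical 24-mer
transcripts = {
    "geneA": "TTT" + probe + "GGG",         # intended target
    "geneB": "CCCC" + probe[:16] + "AAAA",  # shares a 16-base stretch
    "geneC": "GATTACAGATTACAGATTACA",       # unrelated
}
print(possible_cross_hybridizers(probe, "geneA", transcripts))  # ['geneB']
```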
Data Warehousing
Microarray data is more useful when it can be compared with other, similar datasets. The sheer volume of the data (in bytes), the specialized formats and reporting standards (such as MIAME), and the curation effort associated with the datasets require specialized databases to store the data.
For more information about specific microarray databases, see Microarray Databases below.
Detailed Microarray Experiment
Figure: steps involved in a microarray experiment (some steps omitted).
The following is an example of a DNA microarray experiment, detailing one particular case step by step to better explain how such experiments work, while noting possible alternatives.
- The two samples to be compared (pairwise comparison) are grown or acquired. In this example, they are a treated sample (case) and an untreated sample (control).
- The nucleic acid of interest is purified: this can be all RNA for expression profiling, DNA for comparative hybridization, or DNA/RNA bound to a particular protein which is immunoprecipitated (ChIP-on-chip) for epigenetic or regulation studies. In this example, total RNA (total because it includes both nuclear and cytoplasmic RNA) is isolated by guanidinium thiocyanate-phenol-chloroform extraction (e.g. Trizol), which isolates most RNA (whereas column methods have a cut-off of 200 nucleotides) and, if done correctly, gives better purity.
- The purified RNA is analysed for quality (by capillary electrophoresis) and quantity (for example with a NanoDrop spectrophotometer); if enough material (>1 μg) is present, the experiment can continue.
- The labelled product is generated via reverse transcription (RT), sometimes followed by an optional PCR amplification. The RNA is reverse transcribed with either polyT primers, which prime only mRNA, or random primers, which prime all RNA (mostly rRNA). For miRNA microarrays, an oligonucleotide is ligated to the purified small RNA (isolated with a fractionator), which is then reverse transcribed and amplified. The label is added either in the RT step or, if amplification is used, in an additional step after amplification.
  The strand that is labelled depends on the microarray: if the label is added with the RT mix, the labelled cDNA is the template (antisense) strand, so the probes must be on the sense strand (unless they are negative controls).
  The label is typically fluorescent; only one machine uses radiolabels. Labelling can be direct (not used) or indirect, which requires a coupling stage. The coupling stage can occur before hybridization (two-channel arrays), using aminoallyl-UTP and NHS amine-reactive dyes (such as cyanine dyes), or after hybridization (single-channel arrays), using biotin and labelled streptavidin. The modified nucleotides (typically a 1 aaUTP : 4 TTP mix) are incorporated enzymatically at a lower rate than normal nucleotides, typically resulting in about one modified nucleotide every 60 bases (measured with a spectrophotometer). The aminoallyl-labelled DNA (aaDNA) is then purified with a column, using a phosphate buffer rather than Tris, because Tris contains amine groups that would react with the dye. The aminoallyl group is an amine group on a long linker attached to the nucleobase, which reacts with the reactive dye.
  A dye flip is a type of replicate used to remove dye effects in two-channel arrays: on one slide one sample is labelled with Cy3 and the other with Cy5, and on a second slide the labels are reversed. In this example, the RNA is reverse transcribed in the presence of aminoallyl-UTP added to the RT mix.
- The labelled samples are then mixed with a proprietary hybridization solution, which may contain SDS, SSC, dextran sulfate, a blocking agent (such as Cot-1 DNA, salmon sperm DNA, calf thymus DNA, polyA or polyT), Denhardt's solution and formamide.
- This mix is denatured and added through a pinhole to the microarray, which can be either a gene chip (holes in the back) or a glass microarray covered by a chamber, called a mixer, that contains two pinholes and is sealed to the slide around its perimeter.
- The holes are sealed and the microarray is hybridized, either in a hybridization oven, where the microarray is mixed by rotation, or in a mixer, where the microarray is mixed by alternating pressure at the pinholes.
- After an overnight hybridization, all non-specific binding is washed off (with SDS and SSC).
- The microarray is dried and scanned in a special machine in which a laser excites the dye and a detector measures its emission.
- The image is gridded with a template, and the intensities of the features (each feature is made up of several pixels) are quantified.
- The raw data are normalized. The simplest method is to subtract the background intensity and then scale the intensities so that either the total intensity of the features on each channel, or the intensity of a reference gene, is equal; the t-value is then calculated for all the intensities. More sophisticated methods include z-ratio, loess and lowess regression, and RMA (Robust Multi-array Average) for Affymetrix chips (single-channel, silicon chip, in situ synthesised short oligonucleotides).
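Several numeric steps in this example can be sketched in Python. The code below is a minimal sketch, not a protocol. It estimates RNA yield from an A260 reading using the conventional factor of 40 µg/mL per absorbance unit, estimates labelling density from dye and base absorbances, performs the simplest normalization described above (background subtraction and equalizing channel totals), and combines a dye-swap pair. The extinction coefficients and 260 nm correction factors are values commonly quoted in indirect labelling protocols and are assumptions here; all intensities are hypothetical.

```python
from math import log2

# RNA quantity check: 1 A260 unit (1 cm path) ~ 40 ug/mL of RNA.
RNA_UG_PER_ML_PER_A260 = 40.0

def rna_yield_ug(a260, volume_ul, dilution=1.0):
    """Total RNA in ug from an A260 reading of a (possibly diluted) sample."""
    conc_ug_per_ml = a260 * RNA_UG_PER_ML_PER_A260 * dilution
    return conc_ug_per_ml * volume_ul / 1000.0

# Labelling density; assumed constants, check the dye supplier's values.
EXTINCTION = {"Cy3": 150_000, "Cy5": 250_000}  # M^-1 cm^-1 at 550 / 650 nm
CORRECTION_260 = {"Cy3": 0.08, "Cy5": 0.05}    # dye absorbance leaking into A260
EXTINCTION_BASE = 6_600                        # assumed average per cDNA base

def bases_per_dye(a260, a_dye, dye):
    """Estimate how many bases separate incorporated dye molecules."""
    a_base = a260 - a_dye * CORRECTION_260[dye]
    return (a_base * EXTINCTION[dye]) / (a_dye * EXTINCTION_BASE)

def log_ratios(red, green, red_bg, green_bg, floor=1.0):
    """Background-subtract, equalize channel totals, return log2(Cy5/Cy3).

    red = Cy5 feature intensities, green = Cy3; values are floored to stay positive.
    """
    r = [max(f - b, floor) for f, b in zip(red, red_bg)]
    g = [max(f - b, floor) for f, b in zip(green, green_bg)]
    scale = sum(r) / sum(g)  # total-intensity normalization
    return [log2(ri / (gi * scale)) for ri, gi in zip(r, g)]

def combine_dye_swap(m_forward, m_reverse):
    """Forward slide: case in Cy5; reverse slide: case in Cy3.

    If M_fwd = effect + bias and M_rev = -effect + bias, half the
    difference recovers the effect and cancels a constant dye bias.
    """
    return [(f - r) / 2 for f, r in zip(m_forward, m_reverse)]

yield_ug = rna_yield_ug(a260=0.25, volume_ul=30, dilution=10)
print(f"RNA yield: {yield_ug:.1f} ug, enough: {yield_ug > 1.0}")
print(f"bases per dye: {bases_per_dye(0.7, 0.25, 'Cy3'):.0f}")
forward = log_ratios([5200, 900], [1400, 950], [200, 100], [200, 100])
reverse = log_ratios([1500, 930], [4800, 880], [200, 100], [200, 100])
print(combine_dye_swap(forward, reverse))
```

The half-difference only removes a dye bias that is the same on both slides; gene-specific dye effects that differ between slides are not corrected by this simple combination.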
References
- Gibson G, Muse SV. A Primer of Genome Science. ISBN 0-87893-232-1.
- Chomczynski P, Sacchi N (2006). "Single-step method of RNA isolation by acid guanidinium thiocyanate-phenol-chloroform extraction: twenty-something years on". Nature Protocols 1: 581–585.
- Sambrook J, Russell DW (2001). Molecular Cloning: A Laboratory Manual, 3rd edition. Cold Spring Harbor Laboratory Press.
- [14] Churchill GA (2002). "Fundamentals of experimental design for cDNA microarrays". Nature Genetics 32 (Suppl.): 490. doi:10.1038/ng1031.
- [15] NCTR Center for Toxicoinformatics - MAQC Project.
- [16] Wei C, Li J, Bumgarner RE (2004). "Sample size for detecting differentially expressed genes in microarray experiments". BMC Genomics 5: 87. doi:10.1186/1471-2164-5-87. PMID 15533245.
- [17] Wouters L, Göhlmann HW, Bijnens L, Kass SU, Molenberghs G, Lewi PJ (2003). "Graphical exploration of gene expression data: a comparative study of three multivariate methods". Biometrics 59: 1131–1139. doi:10.1111/j.0006-341X.2003.00130.x.
Microarray Databases
The term microarray database is usually used to describe a repository containing microarray gene expression data. The key features of a microarray database are to store the measurement data, manage a searchable index, and make the data available to other applications for analysis and interpretation (either directly, or via user downloads).
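At its simplest, the searchable index can be an inverted index that maps metadata words to experiment identifiers. The Python sketch below assumes hypothetical experiment IDs and free-text descriptions. A real repository would index much richer, structured metadata.

```python
from collections import defaultdict

def build_index(experiments):
    """Map each lower-cased metadata word to the IDs of experiments containing it."""
    index = defaultdict(set)
    for exp_id, description in experiments.items():
        for word in description.lower().split():
            index[word.strip(".,;:()")].add(exp_id)
    return index

def search(index, query):
    """IDs of experiments whose metadata contain every query word."""
    words = [w.lower() for w in query.split()]
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for w in words[1:]:
        hits &= index.get(w, set())
    return hits

# Hypothetical experiment records.
experiments = {
    "EXP-1": "Mouse liver, treated vs control",
    "EXP-2": "Human liver tumour samples",
    "EXP-3": "Mouse brain time course",
}
index = build_index(experiments)
print(search(index, "mouse liver"))  # {'EXP-1'}
```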
Microarray databases can fall into two distinct classes:
- A peer-reviewed public repository that adheres to academic or industry standards and is designed to be used by many analysis applications and groups. Good examples are the Gene Expression Omnibus from NCBI and ArrayExpress from EBI.
- A specialized repository associated primarily with the brand of a particular entity (lab, company, university, consortium, group), an application suite, a topic, or an analysis method, whether it is commercial, non-profit, or academic. These databases may be characterized by one or more of the following:
  - A subscription or license may be needed to gain full access.
  - The content may come primarily from a specific group (e.g. SMD or UPSC-BASE).
  - There may be limits on who can use the data, and for what purpose.
  - Special permission may be required to submit new data, or there may be no obvious process at all.
  - Only certain applications may be equipped to use the data, often also associated with the same entity (for example, caArray at NCI is specialized for caBIG).
  - Further processing or reformatting of the data may be required for standard applications or analysis.
  - They claim to address the 'urgent need' for a standard, centralized repository for microarray data (see, for example, YMD, last updated in 2003).
  - They claim an incremental improvement over one of the public repositories.
  - They are a meta-analysis application that claims to be a database but instead incorporates studies from one or more public databases (e.g. Gemma uses GEO studies).
Some of the best-known public, curated microarray databases are:
| Database | Scope | Microarray experiment sets | Sample profiles | As of date |
|---|---|---|---|---|
| Gene Expression Omnibus - NCBI | Any curated MIAME-compliant molecular abundance study | 8094 | 205148 | March 11, 2008 |
| Stanford Microarray Database | ? | 12742 | ? | April 1, 2007 |
| ArrayExpress at EBI | Any curated MIAME- or MINSEQE-compliant transcriptomics data | 4194 | 110731 | May 2008 |
| UPenn RAD database | MIAME-compliant public and private studies, associated with ArrayExpress | ~100 | ~2500 | Sept. 1, 2007 |
| UNC Microarray database | ? | ~31 | 2093 | April 1, 2007 |
| MUSC database | ? | ~45 | 555 | April 1, 2007 |
| caArray at NCI | Cancer data, prepared for analysis on caBIG | 41 | 1741 | November 15, 2006 |
| UPSC-BASE | Data generated by microarray analysis within Umeå Plant Science Centre (UPSC) | ~100 | ? | November 15, 2007 |
This article is licensed under the GNU Free Documentation License. It uses material from Wikipedia Encyclopedia article "DNA Microarray"