Bioinformatics Encyclopedia


Microarrays and Bioinformatics
Experiments and Databases



Gene expression values from microarray experiments can be represented as heat maps to visualize the result of data analysis.

The advent of inexpensive microarray experiments created several specific bioinformatics challenges:


Experimental Design

Due to the biological complexity of gene expression, the considerations of experimental design that are discussed in the expression profiling article are of critical importance if statistically and biologically valid conclusions are to be drawn from the data.

There are three main elements to consider when designing a microarray experiment. First, replication of the biological samples is essential for drawing conclusions from the experiment. Second, technical replicates (two RNA samples obtained from each experimental unit) help to ensure precision and allow for testing differences within treatment groups. The technical replicates may be two independent RNA extractions or two aliquots of the same extraction. Third, spots of each cDNA clone or oligonucleotide are present as replicates (at least duplicates) on the microarray slide, to provide a measure of technical precision in each hybridization. It is critical that information about sample preparation and handling is reported, in order to help identify the independent units in the experiment and to avoid inflated estimates of statistical significance.[14]
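
The three levels of replication can be illustrated with a minimal sketch. It assumes a hypothetical record layout of (biological unit, technical replicate, spot, log intensity) for one gene; the function name and the sample values are illustrative only. Duplicate spots are averaged within each hybridization, then technical replicates are averaged within each biological unit, so that only the biological units count as independent observations.

```python
from collections import defaultdict
from statistics import mean

def collapse_to_biological_units(measurements):
    """measurements: iterable of (bio_unit, tech_rep, spot, log_intensity)."""
    # Level 1: average duplicate spots within each hybridization.
    by_hyb = defaultdict(list)
    for bio_unit, tech_rep, _spot, value in measurements:
        by_hyb[(bio_unit, tech_rep)].append(value)
    # Level 2: average technical replicates within each biological unit.
    by_unit = defaultdict(list)
    for (bio_unit, _tech_rep), values in by_hyb.items():
        by_unit[bio_unit].append(mean(values))
    return {unit: mean(values) for unit, values in by_unit.items()}

# Hypothetical log2 intensities for one gene.
example = [
    ("mouse1", "A", 1, 8.0), ("mouse1", "A", 2, 8.2),
    ("mouse1", "B", 1, 7.8), ("mouse1", "B", 2, 8.0),
    ("mouse2", "A", 1, 9.0), ("mouse2", "A", 2, 9.4),
]
per_unit = collapse_to_biological_units(example)
```

Here six raw measurements collapse to two independent values, one per animal. Treating all six as independent would overstate the sample size and inflate the apparent significance.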

Standardization

Microarray data is difficult to exchange due to the lack of standardization in arrays. This presents an interoperability problem in bioinformatics. Various grass-roots open-source projects are trying to ease the exchange and analysis of data produced with non-proprietary chips:

  • For example, the "Minimum Information About a Microarray Experiment" (MIAME) checklist helps define the level of detail that should exist and is being adopted by many journals as a requirement for the submission of papers incorporating microarray results. But MIAME does not describe the format for the information, so while many formats can support the MIAME requirements, as of 2007 no format permits verification of complete semantic compliance.
  • The "MicroArray Quality Control (MAQC) Project" is being conducted by the US Food and Drug Administration (FDA) to develop standards and quality control metrics which will eventually allow the use of microarray data in drug discovery, clinical practice and regulatory decision-making.[15]
  • The MicroArray and Gene Expression Data (MGED) group is working on the standardization of the representation of gene expression data and relevant annotations.

Statistical analysis

The analysis of DNA microarrays poses a large number of statistical problems, including the normalization of the data. There are dozens of proposed normalization methods in the published literature, some of which are platform specific. As in many other cases where authorities disagree, a sound conservative approach is to try a number of popular normalization methods and compare the conclusions reached: how sensitive are the main conclusions to the method chosen?
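
A sensitivity check of this kind might be sketched as follows. The two methods used here, median centering and quantile normalization, are illustrative choices rather than recommendations from the article, and the intensities are made-up toy values. The sketch normalizes the same data both ways and compares which genes rank as most changed.

```python
from statistics import median

def median_center(sample):
    """Shift one sample so that its median log-intensity is zero."""
    m = median(sample)
    return [x - m for x in sample]

def quantile_normalize(samples):
    """Give every sample the same distribution (ties are not handled specially)."""
    n_genes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [sum(s[k] for s in sorted_samples) / len(samples)
                 for k in range(n_genes)]
    normalized = []
    for s in samples:
        order = sorted(range(n_genes), key=lambda i: s[i])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = reference[rank]
        normalized.append(out)
    return normalized

def top_genes(control, treated, k):
    """Indices of the k genes with the largest absolute difference."""
    diffs = [abs(t - c) for c, t in zip(control, treated)]
    ranked = sorted(range(len(diffs)), key=lambda i: diffs[i], reverse=True)
    return set(ranked[:k])

# Toy log2 intensities for five genes (hypothetical values).
control = [5.0, 6.0, 7.0, 8.0, 9.0]
treated = [6.0, 7.1, 11.0, 9.2, 7.5]

top_median = top_genes(median_center(control), median_center(treated), k=2)
qn_control, qn_treated = quantile_normalize([control, treated])
top_quantile = top_genes(qn_control, qn_treated, k=2)
overlap = len(top_median & top_quantile) / 2  # fraction of shared top genes
```

In this toy case both methods select the same two genes, so the conclusion does not depend on the method. A low overlap on real data would suggest that the result is an artifact of the normalization.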

Also, experimenters must account for multiple comparisons: even if the statistical P-value assigned to a gene indicates that it is extremely unlikely that differential expression of this gene was due to random rather than treatment effects, the very high number of genes on an array makes it likely that the apparent differential expression of some genes is a false positive, and that some truly changed genes are missed as false negatives. Statistical methods tailored to microarray analyses have recently become available that assess statistical power based on the variation present in the data and the number of experimental replicates, and can help minimize type I and type II errors in the analyses.[16]
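
One widely used correction for multiple comparisons is the Benjamini-Hochberg procedure, which controls the false discovery rate. The article does not name a specific method, so the following is only a minimal sketch of that one option, with made-up P-values.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical per-gene P-values.
p = [0.001, 0.008, 0.039, 0.041, 0.60]
q = benjamini_hochberg(p)
significant = [i for i, value in enumerate(q) if value <= 0.05]
```

Four of the five raw P-values fall below 0.05, but only the first two genes remain significant after adjustment. On an array with thousands of genes the gap between the two counts is typically much larger.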

A basic difference between microarray data analysis and much traditional biomedical research is the dimensionality of the data. A large clinical study might collect 100 data items per patient for thousands of patients. A medium-size microarray study will obtain many thousands of numbers per sample for perhaps a hundred samples. Many analysis techniques treat each sample as a single point in a space with thousands of dimensions, then attempt by various techniques to reduce the dimensionality of the data to something humans can visualize. [17]
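
Principal component analysis is one common dimensionality reduction technique; the article does not single out a method, so it is used here purely as an illustration. The sketch below treats each sample as a point in gene space and projects it onto the first principal component, found by power iteration. The function name, the start vector and the toy data are assumptions made for the example.

```python
import math

def first_pc_scores(samples, iterations=50):
    """Project samples (rows) onto their first principal component."""
    n, p = len(samples), len(samples[0])
    # Center each gene (column) so the component captures variance, not the mean.
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    x = [[row[j] - means[j] for j in range(p)] for row in samples]
    # Power iteration on X^T X from an arbitrary non-uniform start vector.
    v = [float(j + 1) for j in range(p)]
    for _ in range(iterations):
        xv = [sum(row[j] * v[j] for j in range(p)) for row in x]      # X v
        v = [sum(x[i][j] * xv[i] for i in range(n)) for j in range(p)]  # X^T (X v)
        norm = math.sqrt(sum(c * c for c in v))
        if norm == 0.0:
            raise ValueError("no variance along the start direction")
        v = [c / norm for c in v]
    return [sum(row[j] * v[j] for j in range(p)) for row in x]

# Four samples measured on three genes (toy values); gene 3 never varies.
samples = [
    [1.0, 2.0, 5.0],
    [2.0, 4.0, 5.0],
    [3.0, 6.0, 5.0],
    [4.0, 8.0, 5.0],
]
scores = first_pc_scores(samples)
```

Each sample is reduced from three numbers to one score, and the scores preserve the ordering of the samples along the direction in which genes 1 and 2 change together. Real studies would normally use an established numerical library and keep two or three components for plotting.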

Relation between probe and gene

The relation between a probe and the mRNA that it is expected to detect is problematic. On the one hand, some mRNAs may cross-hybridize to probes in the array that are supposed to detect another mRNA. On the other hand, probes that are designed to detect the mRNA of a particular gene may rely on genomic EST information that is incorrectly associated with that gene.

Data Warehousing

Microarray data is more useful when it can be compared with other, similar datasets. The sheer volume of the data (in bytes), the specialized formats and reporting standards (such as MIAME), and the curation efforts associated with the datasets require specialized databases to store the data.

Detailed Microarray Experiment

steps involved in a microarray experiment (some steps omitted)

This is an example of a DNA microarray experiment, detailing a particular case to better explain DNA microarray experiments, while enumerating possible alternatives.


  1. The two samples to be compared (pairwise comparison) are grown/acquired. In this example they are a treated sample (case) and an untreated sample (control).
  2. The nucleic acid of interest is purified: this can be all RNA for expression profiling, DNA for comparative hybridization, or DNA/RNA bound to a particular protein which is immunoprecipitated (ChIP-on-chip) for epigenetic or regulation studies. In this example total RNA is isolated (total as it is nuclear and cytoplasmic) by guanidinium thiocyanate-phenol-chloroform extraction (e.g. Trizol), which isolates most RNA (whereas column methods have a cut-off of 200 nucleotides) and, if done correctly, gives better purity.
  3. The purified RNA is analysed for quality (by capillary electrophoresis) and quantity (by using a NanoDrop spectrophotometer): if enough material (>1 μg) is present, the experiment can continue.
  4. The labelled product is generated via reverse transcription (RT), sometimes with an optional PCR amplification. The RNA is reverse transcribed with either polyT primers, which prime only mRNA, or random primers, which prime all RNA (most of which is rRNA). For miRNA microarrays, an oligonucleotide is ligated to the purified small RNA (isolated with a fractionator), which is then reverse transcribed and amplified. The label is added either in the RT step or in an additional step after amplification, if present. The strand that is labelled depends on the microarray: if the label is added with the RT mix, the cDNA is on the template strand while the probes are on the sense strand (unless they are negative controls). The label is typically fluorescent; only one machine uses radiolabels. The labelling can be direct (not used) or indirect, which requires a coupling stage. The coupling stage can occur before hybridization (two-channel arrays), using aminoallyl-UTP and NHS amino-reactive dyes (like cyanine dyes), or after (single-channel arrays), using biotin and labelled streptavidin. The modified nucleotides (typically a 1 aaUTP: 4 TTP mix) are added enzymatically at a lower rate than normal nucleotides, typically resulting in 1 every 60 bases (measured with a spectrophotometer). The aaDNA is then purified with a column (using a solution containing phosphate buffer, as Tris contains amine groups). The aminoallyl group is an amine group on a long linker attached to the nucleobase, which reacts with a reactive dye. A dye flip is a type of replicate done to remove any dye effects in two-channel arrays: on one slide one sample is labeled with Cy3 and the other with Cy5, and this is reversed on a different slide. In this example, aminoallyl-UTP is added to the RT mix.
  5. The labeled samples are then mixed with a proprietary hybridization solution, which may contain SDS, SSC, dextran sulfate, a blocking agent (such as Cot-1 DNA, salmon sperm DNA, calf thymus DNA, polyA or polyT), Denhardt's solution and formamide.
  6. This mix is denatured and added to a pinhole in a microarray, which can be a gene chip (holes in the back) or a glass microarray bound by a cover called a mixer, which contains two pinholes and is sealed to the slide at the perimeter.
  7. The holes are sealed and the microarray is hybridized, either in a hyb oven, where the microarray is mixed by rotation, or in a mixer, where the microarray is mixed by alternating pressure at the pinholes.
  8. After an overnight hybridization, all non-specific binding is washed off (with SDS and SSC).
  9. The microarray is dried and scanned in a special machine in which a laser excites the dye and a detector measures its emission.
  10. The image is gridded with a template and the intensities of the features (several pixels make a feature) are quantified
  11. The raw data is normalized. The simplest way is to subtract the background intensity and then scale the intensities so that either the total intensity of the features or the intensity of a reference gene is equal on each channel; the t-value for all the intensities is then calculated. More sophisticated methods include z-ratio, loess and lowess regression, and RMA (robust multichip analysis) for Affymetrix chips (single-channel, silicon chip, in situ synthesised short oligonucleotides).
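
The simplest normalization described in the last step can be sketched for a two-channel array as follows. The function name, the clamping floor and the intensity values are assumptions made for the example; the floor only keeps the logarithm defined when the background exceeds the signal, and real pipelines handle that case in different ways.

```python
import math

def normalize_two_channel(cy5, cy3, bg5, bg3, floor=1.0):
    """Return per-feature log2(Cy5/Cy3) after total-intensity normalization."""
    # Subtract the background; clamp at a small floor so the log is defined.
    red = [max(f - b, floor) for f, b in zip(cy5, bg5)]
    green = [max(f - b, floor) for f, b in zip(cy3, bg3)]
    # Scale the Cy5 channel so both channels have the same total intensity.
    scale = sum(green) / sum(red)
    return [math.log2(r * scale / g) for r, g in zip(red, green)]

# Hypothetical raw and background intensities for four features.
cy5 = [1100.0, 2100.0, 4100.0, 1100.0]
bg5 = [100.0, 100.0, 100.0, 100.0]
cy3 = [550.0, 1050.0, 550.0, 2050.0]
bg3 = [50.0, 50.0, 50.0, 50.0]

log_ratios = normalize_two_channel(cy5, cy3, bg5, bg3)
```

After scaling, the first two features have a log ratio of zero (unchanged), the third has a four-fold higher Cy5 signal and the fourth a four-fold lower one. Without the scaling step, every feature would appear shifted towards Cy5 simply because that channel was brighter overall.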

References

Lab protocols found on microarray labs: [1][2] [3][4]

Microarray Databases

The term microarray database is usually used to describe a repository containing microarray gene expression data. The key features of a microarray database are to store the measurement data, manage a searchable index, and make the data available to other applications for analysis and interpretation (either directly, or via user downloads).
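
These three features (storing measurements, keeping a searchable index, and serving data to other applications) can be illustrated with a small in-memory SQLite sketch. The schema, table names and values are hypothetical and do not reflect how any particular repository is implemented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE experiment (id INTEGER PRIMARY KEY, title TEXT, organism TEXT);
CREATE TABLE measurement (
    experiment_id INTEGER REFERENCES experiment(id),
    sample TEXT, probe TEXT, value REAL);
-- The index makes lookups by probe fast as the table grows.
CREATE INDEX idx_measurement_probe ON measurement(probe);
""")

# Store: one experiment with a few measurements (made-up values).
conn.execute(
    "INSERT INTO experiment VALUES (1, 'Heat shock time course', 'S. cerevisiae')")
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?, ?)",
    [(1, "t0", "probe_A", 7.9), (1, "t30", "probe_A", 10.4), (1, "t0", "probe_B", 6.1)],
)

def values_for_probe(connection, probe):
    """Search: return (experiment title, sample, value) rows for one probe."""
    rows = connection.execute(
        "SELECT e.title, m.sample, m.value FROM measurement m "
        "JOIN experiment e ON e.id = m.experiment_id "
        "WHERE m.probe = ? ORDER BY m.sample",
        (probe,),
    )
    return rows.fetchall()

# Serve: an analysis application would consume these rows.
hits = values_for_probe(conn, "probe_A")
```

The query returns the two probe_A measurements together with the title of the experiment they belong to, which is the kind of result a downstream analysis tool or a user download would start from.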

Microarray databases can fall into two distinct classes:

  1. A peer reviewed, public repository that adheres to academic or industry standards and is designed to be used by many analysis applications and groups. A good example of this is the Gene Expression Omnibus from NCBI or ArrayExpress from EBI.
  2. A specialized repository associated primarily with the brand of a particular entity (lab, company, university, consortium, group), an application suite, a topic, or an analysis method, whether it is commercial, non-profit, or academic. These databases may be characterized by:
    • A subscription or license may be needed to gain full access,
    • The content may come primarily from a specific group (e.g. SMD, or UPSC-BASE),
    • There may be limits on who can use the data, and for what purpose,
    • Special permission may be required to submit new data, or there may be no obvious process at all,
    • Only certain applications may be equipped to use the data, often also associated with the same entity (for example, caArray at NCI is specialized for the caBIG),
    • Further processing or reformatting of the data may be required for standard applications or analysis,
    • There may be a claim to address the 'urgent need' for a standard, centralized repository for microarray data (see YMD, last updated in 2003, for example),
    • There may be a claim to an incremental improvement over one of the public repositories,
    • The repository may be a meta-analysis application which claims to be a database, but instead incorporates studies from one or more public databases (e.g. Gemma uses GEO studies).

Some of the best-known public, curated microarray databases are:


Database | Scope | Microarray experiment sets | Sample profiles | As of date
Gene Expression Omnibus - NCBI | any curated MIAME compliant molecular abundance study | 8094 | 205148 | March 11, 2008
Stanford Microarray database | ?? | 12742 | ? | April 1, 2007
ArrayExpress at EBI | Any curated MIAME or MINSEQE compliant transcriptomics data | 4194 | 110731 | May 2008
UPenn RAD database | MIAME compliant public and private studies, associated with ArrayExpress | ~100 | ~2500 | Sept. 1, 2007
UNC Microarray database | ?? | ~31 | 2093 | April 1, 2007
MUSC database | ?? | ~45 | 555 | April 1, 2007
caArray at NCI | Cancer data, prepared for analysis on caBIG | 41 | 1741 | November 15, 2006
UPSC-BASE | data generated by microarray analysis within Umeå Plant Science Centre (UPSC) | ~100 | ? | November 15, 2007

This article is licensed under the GNU Free Documentation License. It uses material from Wikipedia Encyclopedia article "DNA Microarray"

Last updated: July 2008
Copyright © 2003-2008 Julian Rubin