Genomics and Computational Genomics
Genomics is the study of an organism's entire genome. The field includes intensive efforts to determine the complete DNA sequence of organisms and fine-scale genetic mapping efforts. It also includes studies of intragenomic phenomena such as heterosis, epistasis, pleiotropy and other interactions between loci and alleles within the genome. In contrast, the investigation of single genes, their functions and roles, which is very common in today's medical and biological research and a primary focus of molecular biology, does not fall within the definition of genomics unless the aim of this genetic, pathway and functional analysis is to elucidate the gene's effect on, place in, and response to the networks of the entire genome.
History
Genomics was established by Fred Sanger when he first sequenced the complete genomes of a virus and a mitochondrion. His group established techniques of sequencing, genome mapping, data storage and bioinformatic analysis in the 1970s and 1980s. A major branch of genomics is still concerned with sequencing the genomes of various organisms, but knowledge of full genomes has made possible the field of functional genomics, which is mainly concerned with patterns of gene expression under various conditions. The most important tools here are microarrays and bioinformatics. The study of the full set of proteins in a cell type or tissue, and of how that set changes under various conditions, is called proteomics.
The term 'genomics' is thought to have been coined in 1986 by Dr. Tom Roderick, a geneticist at the Jackson Laboratory (Bar Harbor, Maine), over beer at a meeting in Maryland on mapping the human genome.
In 1972, Walter Fiers and his team at the Laboratory of Molecular Biology of the University of Ghent (Ghent, Belgium) were the first to determine the sequence of a gene: the gene for the bacteriophage MS2 coat protein.[1] In 1976, the team determined the complete nucleotide sequence of bacteriophage MS2 RNA.[2] The first DNA-based genome to be sequenced in its entirety was that of bacteriophage ΦX174 (5,386 bp), sequenced by Frederick Sanger in 1977.[3] The first free-living organism to have its genome sequenced was Haemophilus influenzae (1.8 Mb), in 1995, and since then genomes have been sequenced at a rapid pace. A rough draft of the human genome was completed by the Human Genome Project in early 2001, creating much fanfare.
As of September 2007, the complete sequences of about 1,879 viruses,[4] 577 bacterial species and roughly 23 eukaryotic organisms, about half of them fungi, were known.[5] Most of the bacteria whose genomes have been completely sequenced are problematic disease-causing agents, such as Haemophilus influenzae.
Of the other sequenced species, most were chosen because they were well-studied model organisms or promised to become good models. Yeast (Saccharomyces cerevisiae) has long been an important model organism for the eukaryotic cell, while the fruit fly Drosophila melanogaster has been a very important tool (notably in early pre-molecular genetics). The worm Caenorhabditis elegans is often used as a simple model for multicellular organisms. The zebrafish Brachydanio rerio is used for many developmental studies at the molecular level, and the plant Arabidopsis thaliana is a model organism for flowering plants. The Japanese pufferfish (Takifugu rubripes) and the spotted green pufferfish (Tetraodon nigroviridis) are interesting because of their small, compact genomes, which contain very little non-coding DNA compared with most species.[6][7] Among mammals, the dog (Canis familiaris),[8] brown rat (Rattus norvegicus), mouse (Mus musculus) and chimpanzee (Pan troglodytes) are all important model animals in medical research.
Bacteriophage genomics
Bacteriophages have played and continue to play a key role in bacterial genetics and molecular biology. Historically, they were used to define gene structure and gene regulation. The first genome to be sequenced was also that of a bacteriophage.
However, bacteriophage research did not lead the genomics revolution, which is clearly dominated by bacterial genomics. Only very recently has the study of bacteriophage genomes become prominent, enabling researchers to understand the mechanisms underlying phage evolution. Bacteriophage genome sequences can be obtained by directly sequencing isolated bacteriophages, but they can also be derived from microbial genomes. Analysis of bacterial genomes has shown that a substantial amount of microbial DNA consists of prophage sequences and prophage-like elements. Detailed database mining of these sequences offers insights into the role of prophages in shaping the bacterial genome.[9]
Cyanobacteria genomics
At present there are 24 cyanobacteria for which a total genome sequence is available. Fifteen of these come from the marine environment: six Prochlorococcus strains, seven marine Synechococcus strains, Trichodesmium erythraeum IMS101 and Crocosphaera watsonii WH8501. Several studies have demonstrated how these sequences can be used very successfully to infer important ecological and physiological characteristics of marine cyanobacteria. Many more genome projects are currently in progress, among them further Prochlorococcus and marine Synechococcus isolates, Acaryochloris and Prochloron, the N2-fixing filamentous cyanobacteria Nodularia spumigena, Lyngbya aestuarii and Lyngbya majuscula, as well as bacteriophages infecting marine cyanobacteria. The growing body of genome information can thus also be tapped in a more general way to address global problems by applying a comparative approach. Some new and exciting examples of progress in this field are the identification of genes for regulatory RNAs, insights into the evolutionary origin of photosynthesis, and estimates of the contribution of horizontal gene transfer to the genomes that have been analyzed.[10]
References
- Min Jou W, Haegeman G, Ysebaert M, Fiers W. Nucleotide sequence of the gene coding for the bacteriophage MS2 coat protein. Nature. 1972 May 12;237(5350):82-8.
- Fiers W, et al. Complete nucleotide sequence of bacteriophage MS2 RNA: primary and secondary structure of the replicase gene. Nature. 1976;260:500-507.
- Sanger F, Air GM, Barrell BG, Brown NL, Coulson AR, Fiddes CA, Hutchison CA, Slocombe PM, Smith M. Nucleotide sequence of bacteriophage phi X174 DNA. Nature. 1977 Feb 24;265(5596):687-95.
- The Viral Genomes Resource, NCBI, 14 September 2007.
- Genome Project Statistic, NCBI, 14 September 2007.
- BBC News, "Human gene number slashed", 20 October 2004.
- CBSE News, 16 October 2003.
- NHGRI, press release on the publication of the dog genome.
- McGrath S and van Sinderen D (editors). (2007). Bacteriophage: Genetics and Molecular Biology, 1st ed., Caister Academic Press. ISBN 978-1-904455-14-1.
- Herrero A and Flores E (editors). (2008). The Cyanobacteria: Molecular Biology, Genomics and Evolution, 1st ed., Caister Academic Press. ISBN 978-1-904455-15-8.
Computational Genomics
Computational genomics is the study of deciphering biology from genome sequences, both DNA and RNA, using computational analysis.[1] Computational genomics focuses on understanding the human genome and, more generally, the principles of how DNA controls the biology of any species at the molecular level. With the current abundance of massive biological datasets, computational studies have become one of the most important means of biological discovery.[2]
History
Computational genomics began in spirit, if not in name, in the 1960s with the research of Margaret Dayhoff and others at the National Biomedical Research Foundation, who assembled the first database of protein sequences.[3] Working from the underlying amino acid sequences, their research produced phylogenetic trees that identified the evolutionary changes required for one protein to change into another. This led them to create a scoring matrix that assessed the likelihood that one protein is related to another.
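As background on what such a scoring matrix contains: substitution matrices of the kind Dayhoff introduced (the PAM matrices) are generally expressed as log-odds scores. A common general form, given here as a standard textbook formulation rather than a reproduction of Dayhoff's exact procedure, is

```latex
s(a,b) = \frac{1}{\lambda} \log \frac{q_{ab}}{p_a \, p_b}
```

where q_{ab} is the frequency with which residues a and b are found aligned in related proteins, p_a and p_b are the background frequencies of a and b, and \lambda is a scaling constant. A positive score means the pair occurs in related proteins more often than chance would predict; a negative score means it occurs less often.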
Beginning in the 1980s, genome sequences began to be recorded in databases, but this presented new challenges in searching and comparing these databases of gene information. Unlike the text-searching algorithms used on websites such as Google or Wikipedia, searching for regions of genetic similarity requires finding strings that are not simply identical but similar. This made the Needleman-Wunsch algorithm central to the field: a dynamic programming algorithm that compares pairs of amino acid sequences using scoring matrices such as those derived from Dayhoff's earlier research.
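The dynamic programming idea behind the Needleman-Wunsch algorithm can be sketched in Python as follows. This is a minimal illustration under stated assumptions: it aligns a single pair of sequences, uses a linear gap penalty, and substitutes a toy +1/-1 scoring function (the hypothetical `simple_score`) for a real substitution matrix such as a Dayhoff PAM matrix. The cell `F[i][j]` holds the best score for aligning the first `i` residues of one sequence with the first `j` residues of the other, and a traceback from the last cell recovers one optimal alignment.

```python
from typing import Callable, List, Tuple


def needleman_wunsch(
    a: str,
    b: str,
    score: Callable[[str, str], int],
    gap: int = -1,
) -> Tuple[int, str, str]:
    """Global alignment of a and b by dynamic programming (linear gap penalty)."""
    n, m = len(a), len(b)
    # F[i][j] = best score for aligning a[:i] with b[:j]
    F: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        F[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # pair a[i-1] with b[j-1]
                F[i - 1][j] + gap,  # a[i-1] against a gap
                F[i][j - 1] + gap,  # b[j-1] against a gap
            )

    # Trace back from F[n][m] to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def simple_score(x: str, y: str) -> int:
    # Hypothetical toy scheme: +1 for identical residues, -1 otherwise.
    return 1 if x == y else -1
```

With this toy scheme, `needleman_wunsch("ACGT", "AGT", simple_score)` returns `(2, "ACGT", "A-GT")`: three matches (+3) and one gap (-1). Using a real substitution matrix only requires passing a different `score` function, for example one that looks up residue pairs in a table.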
Later, the BLAST algorithm was developed for fast, heuristic searches of gene sequence databases. BLAST and its derivatives are probably the most widely used algorithms for this purpose.[4]
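BLAST gains its speed from a seed-and-extend heuristic: instead of filling a full dynamic programming table, it looks for short word matches between the query and database sequences and extends only those. The Python sketch below shows just that core idea under simplifying assumptions, and all function names in it are hypothetical: seeds are exact word matches of length `w` (BLAST proper also uses neighbouring words scored with a substitution matrix), scoring is a toy +1/-1 scheme, and extension is ungapped with an X-drop cutoff. Gapped extension and statistical significance (E-values) are omitted.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# A hit is (score, query_start, query_end, subject_start); query_end is exclusive.
Hit = Tuple[int, int, int, int]


def score_pair(x: str, y: str) -> int:
    # Toy scoring scheme (an assumption): +1 for a match, -1 for a mismatch.
    return 1 if x == y else -1


def build_word_index(query: str, w: int) -> Dict[str, List[int]]:
    """Map each length-w word of the query to every position where it starts."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(query) - w + 1):
        index[query[i:i + w]].append(i)
    return dict(index)


def extend_seed(query: str, subject: str, qi: int, si: int, w: int, x_drop: int) -> Hit:
    """Ungapped extension of a word hit in both directions with an X-drop cutoff."""
    seed = sum(score_pair(query[qi + k], subject[si + k]) for k in range(w))

    # Extend to the right, remembering the length that gave the best running score.
    best_r, run, len_r, k = 0, 0, 0, 0
    while qi + w + k < len(query) and si + w + k < len(subject):
        run += score_pair(query[qi + w + k], subject[si + w + k])
        k += 1
        if run > best_r:
            best_r, len_r = run, k
        elif best_r - run > x_drop:
            break  # running score fell too far below the best; stop extending

    # Extend to the left in the same way.
    best_l, run, len_l, k = 0, 0, 0, 0
    while qi - k - 1 >= 0 and si - k - 1 >= 0:
        run += score_pair(query[qi - k - 1], subject[si - k - 1])
        k += 1
        if run > best_l:
            best_l, len_l = run, k
        elif best_l - run > x_drop:
            break

    return (seed + best_l + best_r, qi - len_l, qi + w + len_r, si - len_l)


def seed_and_extend(query: str, subject: str, w: int = 3, x_drop: int = 2) -> List[Hit]:
    """Find exact word hits between query and subject, extend each, best hits first."""
    index = build_word_index(query, w)
    hits: Set[Hit] = set()  # a set merges seeds that extend to the same segment
    for si in range(len(subject) - w + 1):
        for qi in index.get(subject[si:si + w], []):
            hits.add(extend_seed(query, subject, qi, si, w, x_drop))
    return sorted(hits, reverse=True)
```

For instance, the top hit of `seed_and_extend("ACGTACGT", "TTACGTACGTTT")` is `(8, 0, 8, 2)`: the whole query, matched starting at position 2 of the subject. Indexing the query once and scanning each database sequence against that index is what lets this style of search skip most of the work a full alignment would do.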
The first Annual Conference on Computational Genomics was held in 1998, providing a forum for this specialty and effectively distinguishing this area of science from the more general fields of genomics and computational biology.[5] According to MEDLINE abstracts, the term first appeared in the scientific literature just one year earlier, in Nucleic Acids Research.[6]
The development of computer-assisted mathematics (using products such as Mathematica or MATLAB) has helped engineers, mathematicians and computer scientists to start working in this domain, and a public collection of case studies and demonstrations, ranging from whole-genome comparisons to gene expression analysis, is growing.[7]
This has brought in ideas from other fields, including systems and control theory, information theory, string analysis and data mining. Computational approaches are expected to become and remain a standard topic for research and teaching, and students fluent in both biology and computation are beginning to be trained in the many courses created in the past few years.
Contributions of computational genomics research to biology
Contributions of computational genomics research to biology include:[2]
- discovering subtle patterns in genomic sequences
- proposing cellular signalling networks
- proposing mechanisms of genome evolution
- predicting the precise locations of all human genes using comparative genomics techniques with several mammalian and vertebrate species
- predicting conserved genomic regions that are related to early embryonic development
- discovering potential links between repeated sequence motifs and tissue-specific gene expression
- measuring regions of genomes that have undergone unusually rapid evolution
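As one simple illustration of how recurring sequence patterns can be looked for computationally, the Python sketch below counts every k-mer (substring of length k) in a sequence and compares each count with what a naive background model would predict. The function name `kmer_enrichment` and the background model, which treats positions as independent draws from the sequence's own base composition, are assumptions made for illustration; practical motif-discovery tools use richer background models and proper statistics.

```python
import math
from collections import Counter
from typing import List, Tuple


def kmer_enrichment(seq: str, k: int, min_count: int = 2) -> List[Tuple[str, int, float]]:
    """Return (k-mer, observed count, observed/expected ratio) for recurring k-mers.

    Expected counts assume (as a simplification) that each position is drawn
    independently from the sequence's own base composition.
    """
    seq = seq.upper()
    n_windows = len(seq) - k + 1
    if k <= 0 or n_windows <= 0:
        return []
    base_freq = {base: count / len(seq) for base, count in Counter(seq).items()}
    observed = Counter(seq[i:i + k] for i in range(n_windows))
    results = []
    for kmer, count in observed.items():
        if count < min_count:
            continue  # ignore k-mers seen too rarely to call "repeated"
        expected = n_windows * math.prod(base_freq[b] for b in kmer)
        results.append((kmer, count, count / expected))
    # Most over-represented first; ties broken alphabetically for stable output.
    results.sort(key=lambda r: (-r[2], r[0]))
    return results
```

For example, in `"ATATATAT"` with `k=2`, `AT` occurs 4 times against an expected 1.75 under this background model, so it is ranked first, ahead of `TA` (3 occurrences).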
References
- Koonin EV (2001). Computational Genomics. National Center for Biotechnology Information, National Library of Medicine, NIH (PubMed ID: 11267880).
- Computational Genomics and Proteomics at MIT.
- David Mount (2000). Bioinformatics: Sequence and Genome Analysis, pp. 2-3. Cold Spring Harbor Laboratory Press. ISBN 0-87969-597-8.
- T.A. Brown (1999). Genomes. John Wiley & Sons. ISBN 0-471-31618-0.
- The 9th Annual Conference on Computational Genomics (2006).
- A. Wagner (1997). A computational genomics approach to the identification of gene networks. Nucleic Acids Res., Sep 15;25(18):3594-604. ISSN 0305-1048.
- Cristianini, N. and Hahn, M. (2006). Introduction to Computational Genomics. Cambridge University Press. ISBN-13: 9780521671910, ISBN-10: 0521671914.
This article is licensed under the GNU Free Documentation License. It uses material from Wikipedia Encyclopedia article "Genomics"