Analysis of gene expression
The expression of many genes can be determined by measuring mRNA levels with a range of techniques, including microarrays, expressed sequence tag (EST) sequencing, serial analysis of gene expression (SAGE) tag sequencing, massively parallel signature sequencing (MPSS), and various applications of multiplexed in situ hybridization. All of these techniques are noise-prone, subject to bias in the biological measurement, or both, and a major research area in computational biology involves developing statistical tools to separate signal from noise in high-throughput gene expression studies. Such studies are often used to identify the genes implicated in a disorder: one might compare microarray data from cancerous epithelial cells with data from non-cancerous cells to determine which transcripts are up-regulated and which are down-regulated in a particular population of cancer cells.
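The tumor-versus-normal comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not an established pipeline: the gene names, intensity values, pseudocount, and both cutoffs are assumptions made up for the example, and a real analysis would add normalization and multiple-testing correction.

```python
import math
from statistics import mean, variance

def log2_fold_change(tumor, normal, pseudocount=1.0):
    """log2 ratio of mean tumor intensity to mean normal intensity."""
    return math.log2((mean(tumor) + pseudocount) / (mean(normal) + pseudocount))

def welch_t(a, b):
    """Welch's t statistic (two samples, unequal variances allowed).

    Assumes at least two replicates per group and non-zero variance.
    """
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

def classify(expression, lfc_cutoff=1.0, t_cutoff=3.0):
    """Label each gene 'up', 'down', or 'unchanged' in tumor vs. normal.

    Both cutoffs are illustrative assumptions, not recommended values.
    """
    calls = {}
    for gene, (tumor, normal) in expression.items():
        lfc = log2_fold_change(tumor, normal)
        t = welch_t(tumor, normal)
        if lfc >= lfc_cutoff and t >= t_cutoff:
            calls[gene] = "up"
        elif lfc <= -lfc_cutoff and t <= -t_cutoff:
            calls[gene] = "down"
        else:
            calls[gene] = "unchanged"
    return calls

# Hypothetical probe intensities: (tumor replicates, normal replicates).
expression = {
    "GENE_A": ([820, 790, 805, 840], [200, 210, 190, 205]),
    "GENE_B": ([100, 95, 110, 105], [410, 400, 395, 420]),
    "GENE_C": ([300, 310, 290, 305], [295, 305, 300, 310]),
}
calls = classify(expression)
print(calls)
```

Requiring both a fold-change cutoff and a test statistic is one common way to avoid calling genes whose apparent change comes from noisy replicates alone.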
Analysis of regulation
Regulation is the complex orchestration of events starting with an extracellular signal and ultimately leading to an increase or decrease in the activity of one or more protein molecules. Bioinformatics techniques have been applied to explore various steps in this process. For example, promoter analysis involves the elucidation and study of sequence motifs in the genomic region surrounding the coding region of a gene. These motifs influence the extent to which that region is transcribed into mRNA. Expression data can also be used to infer gene regulation: one might compare microarray data from a wide variety of states of an organism to form hypotheses about the genes involved in each state. In a single-celled organism, one might compare stages of the cell cycle, along with various stress conditions (heat shock, starvation, etc.). One can then apply clustering algorithms to that expression data to determine which genes are co-expressed. The upstream regions (promoters) of the co-expressed genes can in turn be searched for over-represented regulatory elements.
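The search for over-represented elements in the promoters of co-expressed genes can be illustrated with a simple k-mer comparison. The sketch below assumes the clustering step has already produced a set of co-expressed genes; the promoter sequences, the value of k, and the enrichment threshold are all hypothetical, and a real analysis would replace the plain ratio with a statistical test.

```python
def kmers_present(sequence, k):
    """Set of distinct k-mers occurring in one sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

def promoter_fractions(promoters, k):
    """Fraction of promoters that contain each k-mer at least once."""
    counts = {}
    for promoter in promoters:
        for kmer in kmers_present(promoter.upper(), k):
            counts[kmer] = counts.get(kmer, 0) + 1
    return {kmer: n / len(promoters) for kmer, n in counts.items()}

def enriched_kmers(cluster, background, k=6, min_ratio=2.0):
    """k-mers found more often in cluster promoters than in background ones.

    Returns (kmer, cluster_fraction, ratio) tuples, most enriched first.
    k-mers absent from the background get a floor frequency so the ratio
    stays finite; the floor is an assumption of this sketch.
    """
    foreground = promoter_fractions(cluster, k)
    reference = promoter_fractions(background, k)
    floor = 1 / (len(background) + 1)
    hits = []
    for kmer, fraction in foreground.items():
        ratio = fraction / max(reference.get(kmer, 0.0), floor)
        if ratio >= min_ratio:
            hits.append((kmer, fraction, ratio))
    return sorted(hits, key=lambda hit: (-hit[2], -hit[1], hit[0]))

# Toy promoters of three co-expressed genes sharing one 6-mer.
cluster = ["AAATGACTCAAA", "CCCTGACTCGGG", "GTGTGACTCTGT"]
# Toy promoters of genes outside the cluster.
background = ["AAAAAAAAAAAA", "CCCCCCGGGGGG", "GTGTGTGTGTGT", "ACGTACGTACGT"]

hits = enriched_kmers(cluster, background)
print(hits)
```

Counting each k-mer once per promoter, rather than every occurrence, keeps a single repeat-rich sequence from dominating the result.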
Analysis of protein expression
Protein microarrays and high-throughput (HT) mass spectrometry (MS) can provide a snapshot of the proteins present in a biological sample. Bioinformatics is heavily involved in making sense of protein microarray and HT MS data. The former approach faces problems similar to those of microarrays targeted at mRNA. The latter involves matching large amounts of mass data against predicted masses from protein sequence databases, together with complicated statistical analysis of samples in which multiple peptides from each protein are detected but coverage of the protein is incomplete.
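Matching observed masses against masses predicted from a sequence database can be sketched as follows. The example assumes a tryptic digest (cleavage after K or R, except before P) and standard monoisotopic residue masses; the protein names, sequences, observed peaks, and the 20 ppm tolerance are hypothetical, and the score is a bare match count rather than a proper statistical model.

```python
# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water per peptide (free N- and C-termini)

def tryptic_digest(protein):
    """Split after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        following = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and following != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides

def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER

def score_proteins(observed, database, tolerance_ppm=20.0):
    """Count, per protein, the predicted peptides matched by an observed mass."""
    scores = {}
    for name, sequence in database.items():
        matched = 0
        for peptide in tryptic_digest(sequence):
            predicted = peptide_mass(peptide)
            window = predicted * tolerance_ppm / 1e6
            if any(abs(mass - predicted) <= window for mass in observed):
                matched += 1
        scores[name] = matched
    return scores

# Hypothetical database and peak list; only two of PROT_1's three
# peptides are observed, mimicking incomplete coverage.
database = {"PROT_1": "GAKVLRSTY", "PROT_2": "MKWWRFF"}
observed = [274.165, 369.154, 500.0]
scores = score_proteins(observed, database)
print(scores)
```

Because only some peptides of a protein are usually detected, practical tools weigh such partial matches against the chance of random hits instead of relying on a raw count.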
Analysis of mutations in cancer
Massive sequencing efforts are currently underway to identify point mutations in a variety of genes in cancer. The sheer volume of data produced requires automated systems to read sequence data, and to compare the sequencing results to the known sequence of the human genome, including known germline polymorphisms.
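The comparison step can be illustrated with a toy example that reports single-base differences between a tumor sequence and the reference, then discards those already catalogued as germline polymorphisms. The sequences and the polymorphism list are hypothetical, and the sketch assumes the sample is already aligned to the reference without gaps, which real pipelines cannot take for granted.

```python
def point_mutations(reference, sample, offset=0):
    """List (position, ref_base, alt_base) for each single-base mismatch.

    Positions are 1-based; `offset` shifts them to genome coordinates.
    Assumes an ungapped alignment of equal length. Uncalled bases ('N')
    in the sample are skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (offset + i + 1, ref_base, alt_base)
        for i, (ref_base, alt_base) in enumerate(zip(reference, sample))
        if ref_base != alt_base and alt_base != "N"
    ]

def somatic_candidates(reference, sample, known_germline, offset=0):
    """Mismatches that are not in the set of known germline polymorphisms."""
    return [
        mutation
        for mutation in point_mutations(reference, sample, offset)
        if mutation not in known_germline
    ]

# Hypothetical reference fragment, tumor read, and polymorphism catalogue.
reference = "ACGTACGTAC"
tumor = "ACGTTCGTAA"
known_germline = {(10, "C", "A")}

candidates = somatic_candidates(reference, tumor, known_germline)
print(candidates)
```

Filtering against known germline variation leaves the changes that are more likely to have arisen in the tumor itself, although a matched normal sample from the same patient is the stronger control.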
Oligonucleotide microarrays, including comparative genomic hybridization and single nucleotide polymorphism arrays, can probe up to several hundred thousand sites throughout the genome simultaneously and are being used to identify chromosomal gains and losses in cancer. Hidden Markov model and change-point analysis methods are being developed to infer real copy number changes from data that are often noisy. Further informatics approaches are being developed to understand the implications of lesions found to be recurrent across many tumors.
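A change-point analysis of copy number data can be sketched in its simplest form: find the single split of a probe series that best separates it into two segments with different means. The log2 ratios below are invented for illustration, and this least-squares split stands in for the more elaborate change-point and hidden Markov model methods mentioned above, which handle many breakpoints and model the noise explicitly.

```python
def squared_error(segment):
    """Sum of squared deviations from the segment mean."""
    segment_mean = sum(segment) / len(segment)
    return sum((value - segment_mean) ** 2 for value in segment)

def best_change_point(values):
    """Index i minimizing the squared error of values[:i] plus values[i:].

    Returns (index, cost). Quadratic in the number of probes, which is
    fine for a sketch but too slow for a full array.
    """
    if len(values) < 2:
        raise ValueError("need at least two probes")
    best_index, best_cost = None, None
    for i in range(1, len(values)):
        cost = squared_error(values[:i]) + squared_error(values[i:])
        if best_cost is None or cost < best_cost:
            best_index, best_cost = i, cost
    return best_index, best_cost

# Hypothetical log2(tumor/normal) ratios for consecutive probes on one
# chromosome arm: roughly 0 (two copies), then roughly 0.58 (three copies).
log2_ratios = [0.05, -0.10, 0.02, 0.08, -0.04, 0.01,
               0.55, 0.62, 0.49, 0.66, 0.58, 0.60]

split, cost = best_change_point(log2_ratios)
left_mean = sum(log2_ratios[:split]) / split
right_mean = sum(log2_ratios[split:]) / (len(log2_ratios) - split)
print(split, round(left_mean, 3), round(right_mean, 3))
```

Applying the same split recursively to each resulting segment gives binary segmentation, one common way to extend this idea to several breakpoints.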
Some modern tools (e.g. Quantum 3.1) provide functions for altering a protein sequence at specific sites by changing its amino acids and for predicting how the bioactivity changes after such mutations.