History of Bioinformatics





Jong Bhak


j@bio.cc, KOBIC, KRIBB, Daejeon, Korea, +82 42 879 8500



Cite this as

J. Park, (2003), The history of Bioinformatics, BiO On-line publication. UniqueBioPaperNumber (UBIPAN): BiO20030320.000001 http://bioinformatics.ws/index.php/History_of_Bioinformatics

Publication Date

2003. March. 20th.

Paper Type

non-research paper.

Intellectual Property

(B) Biolicense. Please refer to the above URL for reference.

Related Papers

History of Biology



Modern bioinformatics broadly comprises two main disciplines: biological science and computer science. Understanding the history of an academic discipline gives new learners a wider and more accurate insight into their research. Here, a succinct chronology of historical events in both biology and computer science is presented.



The history of biology in general, from B.C. up to the discovery of genetic inheritance by G. Mendel in 1865, is extremely sketchy and inaccurate. There is also a great bias toward Western civilization. Therefore, this part of the history should be viewed as an extremely rough guide to how much people knew about life before biology existed as a discipline. Advances in computing in the 1960s and 1970s produced the basic methodology of bioinformatics. However, it was in the 1990s, with the arrival of the Internet, that the full-fledged field of bioinformatics was born.



1843: Richard Owen elaborated the distinction of homology and analogy.

1850-1855: Jean-Baptiste Boussingault, who had proved that the carbon in plants comes from atmospheric CO2, proposes that plant nitrogen comes from the soil. He demonstrates that higher plants cannot utilize atmospheric nitrogen, but only nitrates from the soil, and that nitrogen is necessary for plants and animals. The question he addressed was whether the nitrogen that plants need to grow comes from the soil or from the air. Joseph Priestley had argued in the 18th century in favor of the air, and his opinion was seconded in the early 19th century by Liebig, then the world's most famous chemist. Boussingault's experimental results were not conclusive, however, and conflicting data were soon published by another Parisian chemist, Ville, and popularized by Liebig.

1855: Alfred Russel Wallace publishes On the Law Which Has Regulated the Introduction of New Species.

1858: Charles Darwin and Alfred Wallace publish papers on the theory of evolution.

1859: Charles Darwin, Cambridge, UK, publishes The Origin of Species, vastly strengthening the adaptationist hypothesis.

1864: Ernst Haeckel (Häckel) outlines the essential elements of modern zoological classification

1865: Gregor Mendel (1822-1884), Austria, establishes genetic inheritance and the theoretical study of genetics with his Experiments in Plant Hybridisation (Cambridge, MA: Harvard University Press). His work, in German, was first published in 1865 in the Proceedings of the Brünn Society for Natural History, Brünn, Austria (Hewlett, 1998). It was ignored for a generation.

1868: Friedrich Miescher discovers nuclein, a substance found in the cell nucleus that is acidic, rich in phosphate (PO4), and lacks sulfur (S, characteristic of protein). It is now known as nucleic acid.

1902: The chromosome theory of heredity is proposed by Sutton and Boveri, working independently.

1905: The word "genetics" is coined by William Bateson.

1913: First ever linkage map created by Columbia undergraduate Alfred Sturtevant (working with T.H. Morgan).

1918-1926: Hermann J. Muller proposes that the gene constitutes the basis of life and evolution by virtue of its property of reproducing its own internal changes. See Muller, H. J. (1962). Studies in Genetics. [His seminal paper on X-rays, from 1927, may be present in this collection.]

1930: Arne Tiselius, Uppsala University, Sweden, introduces a new technique, electrophoresis, for separating proteins in solution: "The moving-boundary method of studying the electrophoresis of proteins" (published in Nova Acta Regiae Societatis Scientiarum Upsaliensis, Ser. IV, Vol. 7, No. 4).

1930s: The chemical nature of nucleic acid is investigated. It was thought to be a tetranucleotide composed of one unit each of adenylic, guanylic, thymidylic and cytidylic acids.

1933: Electrophoresis was introduced by Tiselius for separating proteins in solution.

1936: Alan Turing, Cambridge University, describes the Turing machine, computability, and the universal machine.

1941: Beadle and Tatum. Genetic Control of Biochemical Reactions in Neurospora: First sound scientific evidence for one-gene-one-enzyme hypothesis

1944: Oswald Avery identifies nucleic acids as the active principle in bacterial transformation. Avery, O. T., C. M. MacLeod, and M. McCarty (1944). Studies on the Chemical Nature of the Substance Inducing Transformation of Pneumococcal Types. Induction of Transformation by a Desoxyribonucleic Acid Fraction Isolated from Pneumococcus Type III. Journal of Experimental Medicine 79: 137-158. Also in Peters (1959). Oswald Avery (1877-1955) was a bacteriologist whose research on pneumococcus bacteria made him one of the founders of immunochemistry and laid the foundation for later discoveries that launched the science of molecular genetics.

1945: John von Neumann, Princeton University, USA, First Draft of a Report on the EDVAC, Contract No. W-670-ORD-492, Moore School of Electrical Engineering, Univ. of Penn., Philadelphia. Reprinted (in part) in Randell, Brian. 1982. Origins of Digital Computers: Selected Papers, Springer-Verlag, Berlin Heidelberg, pp. 383-392.

1946: Genetic material can be transferred laterally between bacterial cells, as shown by Lederberg and Tatum.

1948: Claude Shannon founds information theory.

1950: Erwin Chargaff shows that the four nucleotides are not present in nucleic acids in equal proportions, and that the nucleotide composition differs according to its biological source. Chargaff, Erwin, ed. (1955-60). The Nucleic Acids: Chemistry and Biology. New York, Academic Press.

1951: Pauling and Corey propose the structure for the alpha-helix and beta-sheet (Proc. Natl. Acad. Sci. USA, 37: 205-211, 1951; Proc. Natl. Acad. Sci. USA, 37: 729-740, 1951).

1952: Alfred Day Hershey and Martha Chase proved, on the basis of their bacteriophage research, that DNA alone carries genetic information.

1953: James Dewey Watson and Francis Harry Compton Crick, Cambridge, UK, propose the double helix model for DNA based on X-ray data obtained by Franklin and Wilkins (Nature, 171: 737-738, 1953).

1953: Frederick Sanger, E. O. P. Thompson and Hans Tuppy completed the determination of the amino acid sequence of the A and B chains of insulin. Cambridge, UK.

1954: Max Perutz's group in Cambridge UK develops heavy atom methods to solve the phase problem in protein crystallography.

1956: Christian Boehmer Anfinsen and White concluded that the three-dimensional conformation of proteins is specified by their amino acid sequence.

1957: Seymour Benzer introduced the concept of the cistron: the smallest unit of function of the gene.

1958: The first integrated circuit is constructed by Jack Kilby at Texas Instruments.

1958: The Advanced Research Projects Agency (ARPA) is formed in the US.

1958: Francis Harry Compton Crick, Cambridge, UK, enunciated the central dogma of molecular genetics: information flows from DNA to RNA to protein.

1960: François Jacob and Jacques Lucien Monod proposed the operon hypothesis for the regulation of enzyme synthesis.

1961: Sidney Brenner, François Jacob, and Matthew Meselson identify messenger RNA.

1961-1965: The laboratories of Robert William Holley, Marshall Warren Nirenberg, Har Gobind Khorana and Severo Ochoa identified the genetic code words for the amino acids.

1965: Margaret Dayhoff publishes the first Atlas of Protein Sequence and Structure, which contains sequence information on 65 proteins.

1967: W.M. Fitch and E. Margoliash calculated the phylogenetic relationships of twenty organisms, ranging from fungi to mammals, by comparing their cytochrome C amino acid sequences.

1968: Packet-switching network protocols are presented to ARPA.

1968: Kimura, M. Evolutionary rate at the molecular level. Nature 217 (1968) 624-626.

1969: The ARPANET is created by linking computers at Stanford, UCSB, The University of Utah and UCLA.

1970s: Fred Sanger, Cambridge, UK, develops the dideoxy DNA sequencing method.

1970: Needleman SB, Wunsch CD. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol. 1970 Mar;48(3):443-53.

1970: Fitch, W. M. Distinguishing homologous from analogous proteins. Syst Zool (1970) 19:99-113.

1970: The first restriction enzyme was isolated.

1971: Lynn Margulis proposed an endosymbiont theory for the origins of eucaryotic organelles.

1971: Ray Tomlinson (BBN) invents the email program.

1971: Medline. NIH.

1972: Lee and Fred Richards calculate the accessibility of protein structures.

1972: The first recombinant DNA molecule is created by Paul Berg and his group.

1973: The Brookhaven Protein Data Bank is announced (Acta. Cryst. B, 1973, 29: 1746).

1973: Robert Metcalfe receives his Ph.D. from Harvard University. His thesis describes Ethernet.

1974: Langley, C.H. and Fitch, W.M., An examination of the constancy of the rate of molecular evolution. J. Mol. Evol. 3 (1974) 161-177.

1974: Vint Cerf and Robert Kahn develop the concept of connecting networks of computers into an "internet" and develop the Transmission Control Protocol (TCP).

1974: Charles Goldfarb invents SGML (Standard Generalized Markup Language).

1974: Chothia, C. Hydrophobic bonding and accessible surface area in proteins. Nature 1974 Mar 22;248(446):338-9.

1974: Chou PY, Fasman GD. Conformational parameters for amino acids in helical, beta-sheet, and random coil regions calculated from proteins. Biochemistry. 1974 Jan 15;13(2):211-22.

1975: Microsoft Corporation is founded by Bill Gates and Paul Allen.

1975: Tanaka and Scheraga publish on simulating protein folding.

1975: Monoclonal antibodies are produced by César Milstein's group.

1975: King and Wilson suggest that the difference between chimpanzees and humans is small. King, M.C. and A.C. Wilson (1975). Evolution at two levels in Humans and Chimpanzees. Science 188: 107-116.

For an update on the topic, see Gibbons 1998; for recent work on multiple transcriptional controls, see Tijan and Holmes 2000.

1975: Two-dimensional electrophoresis, where separation of proteins on SDS polyacrylamide gel is combined with separation according to isoelectric points, is announced by P. H. O'Farrell (J. Biol. Chem., 250: 4007-4021, 1975).

1975: E. M. Southern publishes the experimental details of the Southern blot technique for detecting specific sequences of DNA (J. Mol. Biol., 98: 503-517, 1975).

1976: The Unix-to-Unix Copy Protocol (UUCP) is developed at AT&T Bell Labs and distributed with UNIX one year later. Dr. Robert M. Metcalfe develops Ethernet, which allows coaxial cable to move data extremely fast; this was a crucial component in the development of LANs. The packet satellite project goes into practical use: SATNET, the Atlantic packet satellite network, links the United States with Europe. Surprisingly, it uses INTELSAT satellites that are owned by a consortium of countries and not exclusively by the United States government. The Department of Defense begins to experiment with the TCP/IP protocol and soon decides to require it for use on ARPANET.

1977: Staden programs. DNA sequence analysis software. Published in NAR. Roger Staden, MRC, LMB, Cambridge, UK

1977: The full description of the Brookhaven PDB (http://www.pdb.bnl.gov) is published (Bernstein, F.C.; Koetzle, T.F.; Williams, G.J.B.; Meyer, E.F.; Brice, M.D.; Rodgers, J.R.; Kennard, O.; Shimanouchi, T.; Tasumi, M.J.; J. Mol. Biol., 1977, 112: 535).

1977: Procedures were developed for rapidly sequencing long sections of DNA.

1978:  The first Usenet connection is established between Duke and the University of North Carolina at Chapel Hill by Tom Truscott, Jim Ellis and Steve Bellovin.

1979: Goodman, M., Cselusniak, J., Moore, G. W., Romero-Herrera, A. E., and Matsuda, G. Fitting the gene lineage into its species lineage: A parsimony strategy illustrated by cladograms constructed from globin sequences. Syst. Zool. (1979) 28:132-168.

1980: The first complete genome sequence of a virus (phi-X174) is published by the Sanger group, Cambridge, UK. The genome consists of 5,386 base pairs, which code for nine proteins.

1980: Wüthrich et al. publish a paper detailing the use of multi-dimensional NMR for protein structure determination (Kumar, A.; Ernst, R.R.; Wüthrich, K.; Biochem. Biophys. Res. Comm., 1980, 95: 1).

1981:  The Smith-Waterman algorithm for sequence alignment is published. Smith TF, Waterman MS. Identification of common molecular subsequences. J Mol Biol. 1981 Mar 25;147(1):195-7.

1981: Sequence motif, Russell Doolittle.

1981: IBM introduces its Personal Computer to the market.

1981: Felsenstein, J. Evolutionary Trees from DNA-Sequences - a Maximum-Likelihood Approach. J. Mol. Evol. (1981) 17:368-376.

1982: Genetics Computer Group (GCG) is created as a part of the University of Wisconsin Biotechnology Center. The company's primary product is The Wisconsin Suite of molecular biology tools.

1982:  GenBank LANL/EMBL/NCBI

1983: The Compact Disk (CD) is launched.

1983: Name servers are developed at the University of Wisconsin.

1983: Kary B. Mullis invents the polymerase chain reaction (PCR), a method for rapidly and easily amplifying DNA fragments.

1984: Jon Postel's Domain Name System (DNS) is placed on-line.

1984: The Macintosh is announced by Apple Computer.

1984: Kabsch and Chris Sander model building by homology and structure prediction.

1985: The FASTP algorithm by Bill Pearson is published.

1985: The PCR reaction is described by Kary Mullis and co-workers.

1985: Richard Stallman founds the Free Software Foundation.

1986: Cyrus Chothia and Arthur Lesk examine the divergence between sequence and structure.

1986: The SWISS-PROT database is created by the Department of Medical Biochemistry of the University of Geneva and the European Molecular Biology Laboratory (EMBL).

1987: The use of yeast artificial chromosomes (YAC) is described (David T. Burke, et al., Science, 236: 806-812).

1987: McClintock, Barbara (1987). The Discovery and Characterization of Transposable Elements: The Collected Papers of Barbara McClintock. New York: Garland, 1987. In her 1983 Nobel lecture, McClintock said the genome is "a highly sensitive organ of the cell, that in times of stress could initiate its own restructuring and renovation." See the biography at the Cold Spring Harbor site. For a current discussion, see Pennisi 1998.

1987: The physical map of E. coli is published (Y. Kohara, et al., Cell 51: 319-337).

1987: Feng and Doolittle publish the first approach for an efficient multiple sequence alignment procedure, which was later used in Clustal.

1987: Perl (Practical Extraction and Report Language) is released by Larry Wall.

1988: The National Center for Biotechnology Information (NCBI) is established at the National Library of Medicine.

1988: DNA Strider is published by Christian Marck.

1988: The Human Genome Initiative is started (Commission on Life Sciences, National Research Council. Mapping and Sequencing the Human Genome, National Academy Press: Washington, D.C., 1988).

1988: The FASTA algorithm for sequence comparison is published by Pearson and Lipman.

1989:  The Genetics Computer Group (GCG) becomes a private company.

1989: Oxford Molecular Group, Ltd. (OMG) founded in Oxford, UK by Anthony Marchington, David Ricketts, James Hiddleston, Anthony Rees, and W. Graham Richards. Primary products: Anaconda, Asp, Cameleon and others (molecular modeling, drug design, protein design).

1990: The BLAST program (Altschul, et al.) is implemented. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990 Oct 5;215(3):403-10.

1990: The HTTP 1.0 specification is published. Tim Berners-Lee publishes the first HTML document. Merit, IBM and MCI form a not-for-profit corporation called ANS (Advanced Network & Services) to conduct research into high-speed networking. It soon comes up with the concept of the T3, a 45 Mbps line. NSF quickly adopts the new network, and by the end of 1991 all of its sites are connected by this new backbone. While the T3 lines are being constructed, the Department of Defense disbands the ARPANET, which is replaced by the NSFNET backbone. The original 50 kbps lines of ARPANET are taken out of service. Tim Berners-Lee and CERN in Geneva implement a hypertext system to provide efficient information access to the members of the international high-energy physics community.

1991: Linus Torvalds announces a Unix-like operating system which later becomes Linux.

1991: Bowie et al. publish the first implementation of protein structure prediction using threading.

1991: The creation and use of expressed sequence tags (ESTs) is described (J. Craig Venter, et al., Science, 252: 1651-1656).

1992: FSSP the global protein structural family database published by Liisa Holm et al., Protein Sci 1992 Dec;1(12):1691-1698 A database of protein structure families with common folding motifs. Holm L, Ouzounis C, Sander C, Tuparev G, Vriend G

1992: Cyrus Chothia, Cambridge UK, suggests approximate number of protein families to be 1000. Nature, 1992, June, 357, 543-544 Proteins. One thousand families for the molecular biologist.

1992: Jones DT, Taylor WR, Thornton JM. A new approach to protein fold recognition. Nature. 1992 Jul 2;358(6381):86-9, PubMed

1993: Dali is published in JMB by Liisa Holm and Chris Sander. J Mol Biol 1993 Sep 5;233(1):123-138 Protein structure comparison by alignment of distance matrices. Holm L, Sander C.

1993: Rost B, Sander C. Prediction of protein secondary structure at better than 70% accuracy. J Mol Biol. 1993 Jul 20;232(2):584-99, PubMed.

1993: InterNIC is created by NSF to provide specific Internet services: directory and database services (by AT&T), registration services (by Network Solutions Inc.), and information services (by General Atomics/CERFnet). Marc Andreessen and colleagues at NCSA, University of Illinois, develop a graphical user interface to the WWW, called "Mosaic for X".

1993: Hidden Markov Model based algorithm popularized.

1993: Affymetrix begins independent operations in Santa Clara, California

1993: Lawrence, C. E., Altschul, S. F., Boguski, M. S., Liu, J. S., Neuwald, A. F., & Wootton, J. C. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science, 1993, 262(5131), 208-14. PubMed

1994: The first CASP (protein structure prediction meeting) is held at Asilomar, California. Hidden Markov model, iterative search, and threading methods are successful in predicting protein structures.

1994: Leonard Adleman demonstrates a DNA computer.

1995: The genome of the first free-living organism, Haemophilus influenzae (1.8 Mb), is sequenced.

1995: The SCOP (Structural Classification of Proteins) database is published.

1995: The smallest free-living organism Mycoplasma genitalium genome is sequenced.

1995: The first open-community BioPerl project (with other sister projects BioJava, BioLinux, etc) in bioinformatics initiated by Jong Bhak and Steve Brenner, Cambridge, MRC Centre, UK (Bioperl)

1996: The genome for Saccharomyces cerevisiae (baker's yeast, 12.1 Mb) is sequenced.

1996-1997: The first cloning of a mammal (Dolly the sheep) is performed by Ian Wilmut and colleagues, from the Roslin institute in Scotland.

1996: Affymetrix produces the first commercial DNA chips.

1997: The genome for E. coli (4.7 Mbp) is published.

1997: Intermediate Sequence Search method by J. Park, et al., proving the validity of homology transitivity in sequence searches by using structural homology benchmark set that was based on SCOP.

1997: Dahiyat BI, Mayo SL. De novo protein design: fully automated sequence selection. Science. 1997 Oct 3;278(5335):82-7. PubMed

1997: The PSI-BLAST algorithm is published. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997 Sep 1;25(17):3389-402. PubMed

1998: The genomes for Caenorhabditis elegans and baker's yeast are published.

1998: Complete genomes show extensive gene/protein sequence/structure duplication. Teichmann et al., PNAS.

1998: J. Park et al. show that sequence search algorithms based on multiple sequences use much more homology information than pairwise methods.

1998: Inpharmatica, a new Genomics and Bioinformatics company, is established by University College London, the Wolfson Institute for Biomedical Research, five leading scientists from major British academic centers and Unibio Limited.

1999: Protein Structural Interactome Map: PSIMAP, including the first full genome interaction network using PDB and the yeast two-hybrid system, is created by Liisa Holm group members at EBI, Cambridge, UK (J. Park, Liisa Holm, Michael Lappe) and S. Teichmann. It is the first phylogenetic interaction network, the first map using protein domains, and the first global interaction network.

1999: Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg D. Detecting protein function and protein-protein interactions from genome sequences. Science. 1999 Jul 30;285(5428):751-3. PubMed

1999: Bush, R. M., Bender, C. A., Subbarao, K., Cox, N. J., and Fitch, W. M. Predicting the evolution of human influenza A. Science (1999) 286:1921-1925.

1999: Barabasi AL, Albert R. Emergence of scaling in random networks. Science 1999 Oct 15;286(5439):509-12, PubMed

2000: Jeong H, Tombor B, Albert R, Oltvai ZN, Barabasi AL. The large-scale organization of metabolic networks. Nature 2000 Oct 5;407(6804):651-4, PubMed

2000:  The genome for Pseudomonas aeruginosa (6.3 Mbp) is published.

2000: The A. thaliana genome (100 Mb) is sequenced.

2000: The D. melanogaster genome (180 Mb) is sequenced.

2001: The human genome (3 Giga base pairs) is published.

2002: An international sequencing consortium publishes the full genome sequence of the common house mouse (2.5 Gb).



2005: PANASIA SNP initiative working on Asian population diversity.





Online References:

The history of the Internet: http://www.davesite.com/webstation/net-history.shtml

Allen B. Richon, E-mail: arichon@netsci.org http://www.netsci.org/Science/Bioinform/feature06.html

Internet history: http://members.magnet.at/dmayr/history.htm

Biological history to 1953: http://www.mun.ca/biology/scarr/2250_History.htm

Long history of biology: http://www.crevola.com/laurent/sitelolo/histoire/historybc.html

http://cumicro2.cpmc.columbia.edu/icb/Lecture%201.pdf, jovanovic@cancercenter.columbia.edu

Classical papers in bioinformatics: http://www.sbc.su.se/~per/classics-bioinfo/

About Darwinism: http://www.aboutdarwin.com/literature/Pre_Dar.html

Theoretical Biology: http://www.zbi.ee/~uexkull/theor.htm

John Blamire: http://www.brooklyn.cuny.edu/bc/ahp/MBG/MBG3/MBG.C3.Question.html


Off-line References

J. Cairns, G. Stent, & J. Watson (1966). Phage and the Origins of Molecular Biology. Freeman.
        [Biographical essays on the early days by the founders of molecular genetics]

F. H. C. Crick (1988). What Mad Pursuit? Basic Books.
        [Crick's version of the 'double helix' history, and lots more]

L. Gonick & M. Wheelis (1991). The Cartoon Guide to Genetics, 2nd ed. Harper Collins.
        [Great illustrations: a good primer of basic Mendelian and molecular genetics]

H. F. Judson (1979). The Eighth Day of Creation. Simon & Schuster.
        [A general history of molecular biology]

A. Sayre (1975). Rosalind Franklin and DNA. Norton.
        [A re-appraisal of the role of Franklin, with commentary on the role of women in science]

G. Stent  (1971). Molecular Genetics: an introductory narrative.  Freeman.
        [A classic, now factually dated textbook, still highly readable]

J. D. Watson (1968). The Double Helix. Atheneum.
        [An entertaining, irreverent, sexist, account of the discovery of the structure of DNA.
         See the accounts of Crick and Sayre for an antidote]

History of Genetics: From Prehistoric Times to the Rediscovery of Mendel's Laws by Hans Stubbe (MIT press, out of print)

A History of Genetics by Alfred Sturtevant

The Eighth Day of Creation by Horace Judson (focus on molecular biology)

The Century of the Gene by Evelyn Fox Keller

Cracking the Genome : Inside the Race to Unlock Human DNA by Kevin Davies




