DNA Sequences

This algorithm lets you convert DNA letters into numbers. Below is a link to a government Web site which contains lists of DNA sequences you might find interesting:

Instructions:
Users can input any DNA sequence of A's, T's, G's, and C's. Some DNA sequences contain N's, which must be removed. Users can search for DNA sequences by keyword at the National Center for Biotechnology Information (NCBI) site, a public database of molecular biology information. The repository contains nucleotide sequences, protein sequences, protein structures, complete genomes, and taxonomy information.

Step 1
Enter your keyword search in the "Entrez" browser. The Nucleotide GenBank database should provide the most hits.

Step 2
Locate a gene or gene fragment.

Step 3
Copy and paste the sequence into the DNA sequences input box at the present musicalgorithms site. The algorithm reads only the letters (except N's) and ignores any numbers. If a user wants to insert a long DNA sequence over a dial-up modem connection, it is advisable to use only 200-letter segments at a time; otherwise, the program may run too slowly.
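If you would rather tidy a sequence yourself before pasting it in, the filtering described in Step 3 can be sketched in a few lines of Python. This is only an illustration, not the site's own code: the function names clean_sequence and split_segments are made up for this example, and converting lowercase letters to uppercase is an assumption (GenBank listings are usually lowercase). Note that the sketch drops every character other than A, T, G, and C, not just N's and numbers.

```python
import re

def clean_sequence(raw: str) -> str:
    """Keep only the letters A, T, G and C; drop N's, digits, spaces and anything else."""
    # Assumption: lowercase input is accepted and treated the same as uppercase.
    return re.sub(r"[^ATGC]", "", raw.upper())

def split_segments(sequence: str, size: int = 200) -> list[str]:
    """Cut a cleaned sequence into consecutive segments of at most `size` letters."""
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]

# A made-up GenBank-style listing: position numbers followed by blocks of letters.
raw = "        1 gatcnnacgt ttagc\n       61 ccgta"
cleaned = clean_sequence(raw)
print(cleaned)                          # GATCACGTTTAGCCCGTA
print(split_segments(cleaned, size=8))  # ['GATCACGT', 'TTAGCCCG', 'TA']
```

With the default size of 200, each segment returned by split_segments can be pasted into the input box one at a time.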

Description: DNA (Genetics)
Deoxyribonucleic acid (DNA) is the fundamental material of which genes are composed. A strand of DNA is built from four basic molecules called nucleotides: A, T, G, and C. Combined as a sequence, the nucleotides represent a set of genetic instructions for the development of all cellular forms of life. The nucleotides are linked as pairs (base pairs) between two strands entwined in a double helix. Within a gene, nucleotides are read in triplets called codons. Codons, read with the help of other molecules and enzymes, specify amino acids, the basic building blocks of proteins. In other words, a gene is a DNA sequence (a string of nucleotides) in which polypeptides and proteins are encoded ("written"), and which an organism can then "express." Genes are responsible for defining a species and for producing individual traits.

The structure of DNA was discovered in 1953 by Watson and Crick. A working draft of the human genome (the entire DNA sequence) was announced in 2000, and the project was declared complete in 2003, at a cost of roughly 3 billion dollars. Homo sapiens have 23 chromosome pairs. One set of 23 chromosomes contains about 3 x 10 to the power 9 base pairs, so the 46 chromosomes possessed by a human cell hold twice that number. Only a small portion of this DNA is used to code for proteins. Early estimates put the number of human genes at approximately 30,000 to 35,000; later estimates are closer to 20,000 protein-coding genes.

DNA sequences vary enormously because so many combinations are possible. The number of ways to fill three spaces from the four letters, with repeated letters allowed (e.g., A,A,C ... T,C,G), is expressed mathematically as 4 to the power 3, or 4x4x4, which is 64. How many ways are there to fill four spaces from the four letters, again with repeated letters allowed?
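One way to check this kind of count is to list every possibility and count the results. The short Python sketch below does so for three spaces; the name sequences_of_length is invented for this example. Changing the 3 to a 4 lets you check your answer to the question above.

```python
from itertools import product

BASES = "ATGC"

def sequences_of_length(n: int) -> list[str]:
    """List every way to fill n spaces from the four letters, repeats allowed."""
    return ["".join(letters) for letters in product(BASES, repeat=n)]

triplets = sequences_of_length(3)
print(len(triplets), 4 ** 3)  # 64 64
print(triplets[:5])           # ['AAA', 'AAT', 'AAG', 'AAC', 'ATA']
```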

Have you ever wondered how a missing letter can affect a person's genes? Try playing a melody from codons, and then remove one letter from the input DNA sequence. Because codons are read three letters at a time, every codon after the missing letter shifts, so the output contains a new set of codons and a new melody is created.
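The effect of a missing letter on the codons can also be seen without any sound. The Python sketch below uses a made-up 12-letter sequence and two hypothetical helper functions, codons and delete_letter; it only shows how the triplets change and makes no claim about how the site turns codons into notes.

```python
def codons(sequence: str) -> list[str]:
    """Read a sequence in consecutive triplets, dropping any incomplete tail."""
    usable = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable, 3)]

def delete_letter(sequence: str, index: int) -> str:
    """Return the sequence with the letter at `index` (counting from 0) removed."""
    return sequence[:index] + sequence[index + 1:]

original = "ATGGCCTTAGAC"            # made-up example sequence
mutated = delete_letter(original, 4)  # remove the fifth letter, a C

print(codons(original))  # ['ATG', 'GCC', 'TTA', 'GAC']
print(codons(mutated))   # ['ATG', 'GCT', 'TAG']
```

Only the first codon survives unchanged; every codon from the deletion point onward is different, and the two leftover letters at the end no longer form a complete codon.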