CUTG: Codon Usage Tabulated from GenBank
README (Updated September 12 2007)

      Originally developed by:

      IKEMURA, Toshimichi ( t_ikemura At nagahama-i-bio.ac.jp )
      Professor, Nagahama Institute of Bio-Science and Technology

      Currently maintained by:

      NAKAMURA, Yasukazu ( yn At kazusa.or.jp )
      Principal Investigator, Laboratory of Plant Genome Informatics,
      Department of Plant Genome Research,
      Kazusa DNA Research Institute
      2-6-7 Kazusakamatari, Kisarazu, Chiba 292-0818 JAPAN


  Codon usage in individual genes has been calculated using the
nucleotide sequence data obtained from the GenBank Genetic Sequence
Database.  The compilation of codon usage is synchronized with each
major release of GenBank.


* SOURCE AND METHODS

  Compiled from NCBI-GenBank Flat File Release 160.0 [June 15 2007].

  The following sequence files are compiled: pri (primate sequence
entries), rod (rodent sequence entries), mam (other mammalian sequence
entries), vrt (other vertebrate sequence entries), inv (invertebrate
sequence entries), pln (plant sequence entries), bct (bacterial
sequence entries), vrl (viral sequence entries) and phg (phage
sequence entries).

  The other sequence files are not compiled: est (EST: expressed
sequence tag sequence entries), pat (patent sequence entries), rna
(structural RNA sequence entries), sts (STS: sequence tagged site
sequence entries), syn (synthetic and chimeric sequence entries), una
(unannotated sequence entries) and gss (genome survey sequence
entries).

  In selecting protein coding sequences we relied on the FEATURES
tables of GenBank. Only complete genes were used in the analysis.
Codons containing an ambiguous base (such as N) were excluded from
the compilation.  In GenBank, a group of consecutive genes whose
entire region has been sequenced is registered under one LOCUS name.  To
distinguish the different genes belonging to a single LOCUS, the
symbol # followed by a number is added after the LOCUS name; the
numbers represent the order of the CDS registered in the FEATURES
table of GenBank.
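
  The counting and naming rules above can be illustrated with a short
Python sketch.  It is not the program used to build this database;
the function names count_codons and gene_label are hypothetical, and
the sketch assumes that any base other than A, C, G or T counts as
ambiguous and that a trailing incomplete codon is ignored.

```python
from collections import Counter

# Assumption: only these four bases are treated as unambiguous.
UNAMBIGUOUS = set("ACGT")

def count_codons(cds):
    """Count codons in a coding sequence read in frame from its first base.

    Codons containing any base outside A, C, G, T (for example N)
    are skipped, as are trailing bases that do not fill a codon.
    """
    seq = cds.upper()
    counts = Counter()
    usable = len(seq) - len(seq) % 3
    for i in range(0, usable, 3):
        codon = seq[i:i + 3]
        if set(codon) <= UNAMBIGUOUS:
            counts[codon] += 1
    return counts

def gene_label(locus, cds_index):
    """Build a label of the form LOCUS#n, where n is the order of the
    CDS in the FEATURES table (hypothetical helper)."""
    return f"{locus}#{cds_index}"
```

  For example, count_codons("atgaanttttaa") counts ATG, TTT and TAA
once each and skips AAN.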


* FILES

  Files of the present database are available here.  Files named
gb***.codon list the codon use in each gene registered in the selected
GenBank Flat Files.  The LOCUS names given in GenBank were used to
designate individual genes.  Each LOCUS name is followed by fields of
information extracted from the FEATURES of each CDS to define each
open reading frame analyzed here.  The order of the codons in the table is
the same as in the previous compilation (see the CODON_LABEL file).

  To reveal the characteristics of codon use in a wide range of
organisms, as well as in viruses and organelles, the frequency (per
thousand) of codon use in each organism was calculated by summing the
numbers of codons used.  Files named gb***.spsum list the summed
codon counts for each species, as well as for viruses and organelles
(see the SPSUM_LABEL file).
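
  The per-thousand frequency can be sketched in Python as follows.
This is only an illustration of the arithmetic: the function names
sum_counts and per_thousand are hypothetical, and the input is
assumed to be a mapping from codon to count rather than a line from
the distributed files.

```python
from collections import Counter

def sum_counts(per_gene_counts):
    """Sum codon counts over all genes of one species."""
    total = Counter()
    for counts in per_gene_counts:
        total.update(counts)
    return total

def per_thousand(counts):
    """Convert codon counts into frequencies per thousand codons."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: 1000.0 * n / total for codon, n in counts.items()}
```

  With counts of 1 for ATG and 3 for TTT, per_thousand gives 250.0
for ATG and 750.0 for TTT.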

  The files are distributed in two forms: as gzip-compressed files
and as flat files.  The compressed archive contains the two "LABEL"
files and all of the "codon" and "spsum" files.  Use "gunzip" and
"tar" to extract the files from the archive.  If you do not need every
section, you can download selected files from the "compressed"
directory, such as gbbct.spsum.gz and gbbct.codon.gz for bacterial
entries.  If "gunzip" and "tar" are not available on your local
operating system, you can fetch each file as flat text from this
directory.
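
  If "gunzip" is not available but Python is, a downloaded file can
also be read without a separate decompression step.  The sketch
below uses only the standard gzip module; the function name
iter_lines is hypothetical, and the files are assumed to be ASCII
text.

```python
import gzip

def iter_lines(path):
    """Yield text lines from a flat file or from a .gz file.

    Files whose names end in ".gz" are decompressed on the fly.
    Assumption: the content is ASCII; other bytes are replaced.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", encoding="ascii", errors="replace") as handle:
        for line in handle:
            yield line.rstrip("\n")
```

  For example, list(iter_lines("gbbct.spsum.gz")) returns the lines of
a downloaded gbbct.spsum.gz as a Python list.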


* DISTRIBUTION

  The complete database is available from the following three URLs:

  1) DDBJ (DNA Data Bank of Japan,
        National Institute of Genetics, Mishima Japan)

  ftp://ftp.nig.ac.jp/pub/db/codon/current/

  2) Kazusa DNA Research Institute

  ftp://ftp.kazusa.or.jp/pub/codon/current/

  3) EBI (European Bioinformatics Institute, Cambridge, UK)

  ftp://ftp.ebi.ac.uk/pub/databases/cutg/


* RELATED SERVICE ON WWW

  If you do not need all of the data and only want codon usage tables
for a small number of species, use the Codon Usage Database WWW
service.  You can display a codon usage table by searching with the
Latin name of an organism or by following a link in the alphabetical
lists.

  http://www.kazusa.or.jp/codon/


* ACCESS TO THE MAINTAINER

  Any requests or comments?  Send an e-mail to yn@kazusa.or.jp.


* ACKNOWLEDGMENT

  This work was supported by a Grant-in-Aid for Publication of Science
Research Results from the Japan Society for the Promotion of Science
(JSPS).


* PLEASE CITE

Codon usage tabulated from the international DNA sequence databases:
status for the year 2000.  Nakamura, Y., Gojobori, T. and Ikemura, T.
(2000) Nucl. Acids Res. 28, 292.

This article gives references to earlier papers.
