Structures and Sequences

Supplementary materials to the paper: S. Mitrofanov, A. Panchin, S. Spirin, A. Alexeevski, Y. Panchin. Exclusive sequences of different genomes. J. Bioinform. Comput. Biol. 2010. 8(3):519–534

  • Genomes (xls file, 150 KB)
    List of analyzed genomes: 139 full genomes, 35 masked genomes, 33 CDS of genomes
  • Verification (xls file, 70 MB)
    Verification of downloaded genome data: genome size in bp is shown, and the occurrences of each word of 1-7 letters and of its complement are compared for all genomes
  • Contrasts_K (xls file, 55 MB)
    Contrasts of words of 1-7 letters in full genomes according to the method of Karlin et al.
  • Taxonomy specific exclusive words (xls file, 18 MB)
    Words of 1-7 letters in full genomes that are under/over-represented in 50% or more genomes from a wide taxonomic group
  • Values of C, computed according to a Markov model of rank (n-2), in several eukaryotic genomes (xls file, 350 KB)
    Explanations are given on the "Legend" sheet of each file
    These data show that almost all words of length 2, 3, ..., 7 are over- or underrepresented with high statistical significance in eukaryotic genomes
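
For readers who want to reproduce the kind of quantities listed above, here is a minimal Python sketch. It is not the authors' code. It makes two assumptions: that "complement" in the Verification file means the reverse complement, and that a Markov model of rank (n-2) is the usual chain in which the expected count of a word w1...wn is N(w1...w(n-1)) * N(w2...wn) / N(w2...w(n-1)). The exact definition of C is given in the paper and on the "Legend" sheets and is not reproduced here. All function names are hypothetical.

```python
from collections import Counter

# Translation table for the four unambiguous DNA letters.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
ALPHABET = set("ACGT")


def reverse_complement(word):
    """Return the reverse complement of a DNA word (assumed meaning of 'complement')."""
    return word.translate(COMPLEMENT)[::-1]


def count_words(seq, k):
    """Count overlapping words of length k, skipping windows with non-ACGT letters."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= ALPHABET:
            counts[word] += 1
    return counts


def strand_counts(seq, word):
    """Return (count of word, count of its reverse complement) on one strand."""
    counts = count_words(seq, len(word))
    return counts[word], counts[reverse_complement(word)]


def markov_expected(seq, word):
    """Expected count of `word` under a Markov chain of order len(word) - 2.

    Illustrative estimator only; it is not claimed to be the paper's value C.
    """
    n = len(word)
    if n < 2:
        raise ValueError("word must have at least 2 letters")
    shorter = count_words(seq, n - 1)
    if n == 2:
        # Order 0: the 'core' is the empty word, counted once per valid letter.
        core = sum(shorter.values())
    else:
        core = count_words(seq, n - 2)[word[1:-1]]
    if core == 0:
        return 0.0
    return shorter[word[:-1]] * shorter[word[1:]] / core


if __name__ == "__main__":
    toy = "ACGTACGTAC"  # toy sequence, not real genome data
    observed = count_words(toy, 3)["ACG"]
    print(observed, markov_expected(toy, "ACG"), strand_counts(toy, "ACG"))
```

Comparing the observed count of a word with `markov_expected` indicates whether the word is over- or underrepresented relative to its subwords. A significance estimate would additionally need a variance term, which this sketch omits.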

Context-dependent mutation trends in the human genome

Comparison of methods of detection of exceptional sequences in prokaryotic genomes