Word Clouds for Prokaryotic Genomes
An example: word-cloud for NC_013771.ptt
Mouse over a word to see its frequency.
This is my implementation of a word-cloud visualizer for bacterial genomes. It takes an annotation file (NCBI GenBank .gbk or .ptt format), so technically a proteome annotation, computes the frequency of words in the annotation column, and outputs a word cloud for display. The word cloud gives you some clues about which kinds of genes (proteins) are most abundant in that genome (proteome).
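
The word-counting step can be sketched roughly as follows. This is a minimal Python illustration, not the code behind this page. It assumes the usual .ptt layout: a tab-separated table whose header row starts with "Location" and whose last column is "Product". The function name `ptt_word_frequencies`, the `min_length` cutoff, and the sample rows are all made up for the example.

```python
import re
from collections import Counter


def ptt_word_frequencies(lines, min_length=3):
    """Count words in the Product (last) column of a .ptt table.

    Assumes the header row starts with "Location" and that the
    product description is the last tab-separated field.
    """
    counts = Counter()
    in_table = False
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if not in_table:
            # Skip the preamble until the column header row is seen.
            in_table = fields[0] == "Location"
            continue
        if len(fields) < 2:
            continue  # blank or malformed row
        product = fields[-1].lower()
        for word in re.findall(r"[a-z][a-z0-9-]*", product):
            if len(word) >= min_length:
                counts[word] += 1
    return counts


# Hypothetical sample rows, not taken from a real genome.
sample = [
    "Hypothetical organism, complete genome - 1..5000\n",
    "3 proteins\n",
    "Location\tStrand\tLength\tPID\tGene\tSynonym\tCode\tCOG\tProduct\n",
    "1..300\t+\t99\t1\t-\tX_0001\t-\t-\thypothetical protein\n",
    "400..900\t-\t166\t2\tdnaA\tX_0002\t-\t-\tchromosomal replication initiator protein\n",
    "1000..1500\t+\t166\t3\t-\tX_0003\t-\t-\thypothetical protein\n",
]

freqs = ptt_word_frequencies(sample)
print(freqs.most_common(2))  # [('protein', 3), ('hypothetical', 2)]
```

On a real genome, generic words such as "protein" and "hypothetical" will probably dominate the counts, so filtering them out before drawing the cloud may be worth considering.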

For information about the GenBank (.gbk) file format, please refer to this. Protein Table Files (.ptt) can usually be found when you download bacterial genomes from NCBI's ftp site.
An example of a .ptt file can be found here: NC_013771.ptt.

Please note that I used Daniel Barsotti's code for generating word-clouds.

Please upload a *.ptt or *.gbk / *.gb format file:


By Minli Xu (whoji), Last Update: 08/09/2012