A simple way to analyze the data is in terms of an association coefficient, S_AB, defined as twice the number of nucleotides in the words common to both dictionaries A and B divided by the total number of nucleotides in all the words in the two dictionaries. S_AB ranges from 1, when dictionaries A and B are identical, to less than 0.1 when they are unrelated. (The coefficient is usually greater than zero even for unrelated sequences because of chance correspondences.) By compiling the S_AB values for a number of organisms in a matrix one can discern a pattern of relatedness or unrelatedness among organisms. Moreover, it is possible by straightforward statistical methods to construct from a set of S_AB values for a group of organisms a dendrogram, or tree, showing the relations among members of the group.
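The computation behind S_AB, and one way of turning a set of S_AB values into a tree, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the studies described here: each dictionary is treated as a set of distinct words (short nucleotide strings), and the tree is built by UPGMA (average-linkage clustering) on the distance 1 - S_AB, which is one common choice of "straightforward statistical method" rather than necessarily the one the researchers applied. The function names, organism names and word lists are hypothetical.

```python
from itertools import combinations


def nucleotides(words):
    """Total number of nucleotides in a collection of words."""
    return sum(len(w) for w in words)


def association_coefficient(a, b):
    """S_AB = 2 * (nucleotides in words common to A and B)
    / (nucleotides in all words of A plus all words of B).
    Assumption: each dictionary is a set of distinct words."""
    a, b = set(a), set(b)
    total = nucleotides(a) + nucleotides(b)
    if total == 0:
        raise ValueError("both dictionaries are empty")
    return 2 * nucleotides(a & b) / total


def upgma(dictionaries):
    """Average-linkage (UPGMA) clustering on the distance 1 - S_AB.
    Returns the dendrogram as nested tuples of organism names."""
    # Each cluster is keyed by its (sub)tree; the value is its number of leaves.
    sizes = {name: 1 for name in dictionaries}
    dist = {
        frozenset(p): 1 - association_coefficient(dictionaries[p[0]], dictionaries[p[1]])
        for p in combinations(dictionaries, 2)
    }
    while len(sizes) > 1:
        # Join the two closest clusters.
        x, y = min(combinations(sizes, 2), key=lambda p: dist[frozenset(p)])
        nx, ny = sizes.pop(x), sizes.pop(y)
        merged = (x, y)
        # Distance to the new cluster is the size-weighted average of the old distances.
        for z in sizes:
            dist[frozenset((merged, z))] = (
                nx * dist[frozenset((x, z))] + ny * dist[frozenset((y, z))]
            ) / (nx + ny)
        sizes[merged] = nx + ny
    (tree,) = sizes
    return tree


# Hypothetical toy dictionaries (invented words, not real data).
catalogs = {
    "org1": {"AACUAG", "CCUUAG", "AUUCAG"},
    "org2": {"AACUAG", "CCUUAG", "GGAUCG"},
    "org3": {"UUCCAG", "GAUACG", "CCAUUG"},
}

print(round(association_coefficient(catalogs["org1"], catalogs["org2"]), 3))  # 0.667
print(upgma(catalogs))  # ('org3', ('org1', 'org2'))
```

In the toy example org1 and org2 share two six-letter words (12 nucleotides out of 36 in total), so S_AB = 2 × 12 / 36 ≈ 0.67, while org3 shares no words with either; the tree therefore joins org1 and org2 first and attaches org3 afterward.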
To date the ribosomal RNAs of almost 200 species of bacteria and eukaryotes have been characterized. Most of the bacteria form a coherent but very large (which is to say ancient) group. They are the eubacteria, or true bacteria, and as would be expected they are quite distinct from the eukaryotes. The relations among the various genera (represented by the branchings of the tree) determined through ribosomal-RNA analysis are at variance with many established prejudices about bacterial relations. What is important at this point is that the eubacteria are divided into a number of major branches and that several of the branches include photosynthetic bacteria. This finding suggests that all eubacteria stem from a common photosynthetic ancestor.