Biologists have long been studying genes to understand the history of branching on the tree of life, which unites all living creatures on earth—be they marmosets or microbes. One leaf on this sprawling ancestral tree, nestled among the apes, is Homo sapiens. Each individual in our species is an assemblage of cells, which cooperate to generate our body.

Normally the cells obey a covenant, established by trial and error more than 600 million years ago, in the first forms of multicellular life. The covenant decrees that if cells are to live together, they have to follow basic rules: repair their DNA when it is damaged; listen to their neighbors about whether to divide or not; and stay in the tissue where they are supposed to be. Typically mutations that cause cells to violate these restrictions and start to grow and spread incessantly—the hallmarks of malignant cancer—are quashed by controlled death. The mutated cells detect their own problems and commit suicide or are killed by the immune system before they can do any harm.

On occasion, though, mutations accumulate against which the cellular surveillance system does not work, and tumors grow and spread. A malignant evolutionary tree sprouts within.

Researchers know of a few mutations that drive tumorigenesis, the formation of the initial tumor. What makes cancers particularly lethal, however, is metastasis, the escape of diseased cells from the primary tumor and into formerly healthy tissues, where they lodge to generate new tumors. In the belief that further mutations were required to propel metastasis and that these occurred relatively late in the history of the primary tumor, oncologists often sought to identify them and to target them with drugs.

Around 2010, however, technological advances enabled scientists to inexpensively sequence the entire human genome (that is, to deduce the genome's ordering of bases, or constituent units of DNA). Research groups at several institutions began to study the genetic sequences of tumors comprehensively. To their dismay, the investigators found that even within a single patient, the tumors often contained a baffling variety of mutations.

Evolutionary biologists such as me see diversity as a source of valuable information, however. Along with colleagues at Yale University and other institutions, I decided to investigate how the mutations were related to one another. We sequenced the expressed portions of cancer patients' genomes, that is, those sections of DNA known to control the production of proteins and thereby to determine the properties of cells. We then used that information to create evolutionary trees of the mutations associated with the disease. The branches of the trees illustrate how the genes within tumors change as the cancer grows from a few cells to a metastatic monster.

A Tangle of Branches

Our studies revealed that the branches linking the primary tumors to metastases within a patient sprout profusely and seemingly randomly, one from the other, like the branches of a mythical poison tree. Even more surprisingly, the first branches of this evolutionary tree can emerge from deep within the ball of the original tumor. Distinct cells in the primary tumor can be ready to evolve into more aggressive forms—each with its own genetic mechanisms for spreading—many years before the initial tumor is first diagnosed.

These findings are scary but also offer new hope. They imply that instead of concentrating on later mutations, cancer researchers should preferentially study genes that are altered early in the primary tumor, or seed, that gave birth to the cancer tree. Targeting these mutant genes with drugs might give patients a better chance of recovery.

A linear model has guided cancer research for decades. It states that a specific series of mutations leads to tumorigenesis. Only after that do some cells in the primary tumor acquire one or more further mutations that endow the ability to metastasize. If one could construct an evolutionary tree of the mutations, it would resemble a typical grass: tall, straight and possessing a single core from which, near the very top, a few leaves and seeds would emerge.

Metastatic liver cancer cells, as seen in a polarized-light micrograph. In the center is a dividing cell. Credit: Jennifer C. Waters/Science Source

This theory does not square with what evolutionary biologists know about the history of life-forms. Ongoing mutation and selection propel organisms to constantly diverge from one another, generating a diversity of genetic lineages rather than a single, homogeneous population. Indeed, early studies by Marco Gerlinger of the Institute of Cancer Research in London and others hinted that even within primary tumors, different regions of tumor cells had different genetic sequences.

In 2010 members of my laboratory at the Yale School of Public Health and I, along with pathologist David Rimm, geneticist Richard Lifton and pharmacologist Joseph Schlessinger, all at the Yale School of Medicine, set out to answer three questions raised by these observations. First, are one or more specific mutations necessary for metastasis and present in all patients? Second, can metastatic lineages diverge relatively early in the history of the primary tumor, before most mutations have accumulated? Third, if we discovered a diversity of mutations in primary tumors and metastases, could we use evolutionary trees to calculate when they tend to occur? Answering these questions would reveal the genetic trajectories leading to the birth of the primary tumor and its metastases.

Poison Fruit

We had no idea how powerful our evolutionary tools would turn out to be. Rimm obtained autopsy tissues from primary and secondary tumors, as well as from neighboring healthy parts of the affected organs, of 40 patients who had died of 13 different types of cancers. For each sample, our team sequenced all parts of the genome that are known to be expressed in any tissue and at any time. Our studies revealed anywhere between dozens and thousands of mutations that were different between the germ-line, or normal, genetic sequence of the patient (which he or she had inherited from a single fertilized egg) and one or more samples of the cancerous tissues.

To understand how these samples related to one another, Zi-Ming Zhao, then a postdoctoral associate in my lab, constructed molecular evolutionary trees. This type of tree is used to understand our relationship with chimpanzees, gorillas and orangutans; the apes' relationship with other mammals; mammals' relationship with birds and other animals; and animals' relationship with fungi, plants and bacteria. Scientists compute these trees by comparing how organisms' traits (or the sequence of bases in their DNA) diverge from one species to another and by finding the most plausible graph in which each life-form in question has a place on a tree's branches.

Applying these techniques to cancer is tricky, however. Ordinarily, we use only present-day sequences as data and figure out what we can about the ancestors with that information. In cancer trees, however, we know the sequence of the ancestor: it is the germ-line sequence obtained from healthy tissue. Without modification, traditional approaches would assume that the normal sequence was an additional “descendant” lineage, producing trees that did not reflect the history we were interested in. We modified the classical approaches, requiring the genetic sequence of the healthy tissue to be the ancestor of the primary and metastatic lineages, and computed the trees that were most likely to explain the succession of changes.
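The rooting constraint can be illustrated with a toy sketch. It assumes each tumor sample has been reduced to the set of somatic mutations it carries relative to the germ line (so the germ line itself carries none) and that each mutation arises only once. A greedy heuristic then joins the two lineages that share the most mutations and treats their shared mutations as their common ancestor. The sample names and the labels m1 through m4 are hypothetical, and this heuristic is a stand-in for illustration, not the likelihood-based method described above.

```python
from itertools import combinations

def build_rooted_tree(samples):
    """Greedy tree from {sample name: set of somatic mutations}.

    Assumption: mutations are scored relative to the germ line, so the
    germ line (no somatic mutations) is forced to be the root.
    """
    # Each node is (label or subtree, mutations shared by everything below it).
    nodes = [(name, frozenset(muts)) for name, muts in samples.items()]
    while len(nodes) > 1:
        # Join the pair of lineages that share the most mutations.
        i, j = max(
            combinations(range(len(nodes)), 2),
            key=lambda pair: len(nodes[pair[0]][1] & nodes[pair[1]][1]),
        )
        a, b = nodes[i], nodes[j]
        merged = ((a[0], b[0]), a[1] & b[1])
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    subtree, trunk = nodes[0]
    # The trunk holds the mutations separating the germ line from all tumor tissue.
    return ("germline", subtree), trunk

# Hypothetical patient: two regions of the primary tumor and two metastases.
samples = {
    "primary_A": {"KRAS", "p53", "m1"},
    "primary_B": {"KRAS", "p53", "m2"},
    "met_liver": {"KRAS", "p53", "m1", "m3"},
    "met_lung": {"KRAS", "p53", "m2", "m4"},
}
tree, trunk = build_rooted_tree(samples)
print(tree)
print(sorted(trunk))  # ['KRAS', 'p53']
```

In this made-up case each metastasis pairs with a different region of the primary tumor, and only the two early driver mutations sit on the trunk.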

These reoriented evolutionary trees revealed something striking. According to the long-standing linear model, all metastases would descend from a single lineage of cells that broke free from the primary tumor and spread to other sites. If indeed metastasis occurred in this way—deriving from a final mutation in a single-file march of DNA changes—we would expect the genetic sequence derived from each secondary tumor to be more closely related to those of other secondary tumors than to any part of the primary tumor.

That is not what we saw. As we started studying the tumor “trees,” we spotted patients whose primary tumor tissue was closely related to some metastatic tissues but not to others. The finding implied that not one but multiple genetic lineages within the primary tumor had at some point gone metastatic. In fact, this pattern showed up more than a third of the time in our core set of well-resolved trees.
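The prediction being tested here can be stated as a simple check on a rooted tree. The sketch below assumes a tree written as nested tuples with sample names at the tips (a hypothetical representation chosen for brevity). Under the linear model, some branch of the tree should contain all of the metastases and nothing else.

```python
def leaves(tree):
    """Return the set of sample names at the tips of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))

def metastases_form_one_clade(tree, metastases):
    """True if some subtree contains exactly the metastatic samples.

    This is what the linear model predicts: every metastasis descends
    from one lineage that split off from the primary tumor.
    """
    metastases = set(metastases)
    if leaves(tree) == metastases:
        return True
    if isinstance(tree, str):
        return False
    return any(metastases_form_one_clade(child, metastases) for child in tree)

mets = {"met_liver", "met_lung"}

# Hypothetical tree in which the metastases are each other's closest relatives.
linear_tree = ("primary_A", ("primary_B", ("met_liver", "met_lung")))
# Hypothetical tree in which each metastasis groups with a different primary region.
branched_tree = (("primary_A", "met_liver"), ("primary_B", "met_lung"))

print(metastases_form_one_clade(linear_tree, mets))    # True
print(metastases_form_one_clade(branched_tree, mets))  # False
```

A result of False corresponds to the pattern described above: more than one lineage within the primary tumor went metastatic.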

Time Trees

We were stunned to realize that the classic linear model did not fit the actual data. Instead of a single, rare event inducing metastasis, the evidence indicated that the early genetic changes that jump-start tumor proliferation are also responsible for a lineage's ability to metastasize.

Furthermore, in cell lineages that evolved to metastases, we could finger no single gene as the culprit. Apart from the key genes already known to drive tumorigenesis (such as the KRAS gene, which is mutated in the primary tumors of almost every patient with pancreatic cancer), no particular gene in metastatic tissues was mutated in several patients. In fact, the mutations found in branches that led to secondary tumors were indistinguishable from those in lineages that never left the primary tumor. Factors other than mutation, such as epigenetic changes (alterations in how a gene is expressed) in a primary tumor cell—or the details of its microenvironment—were more likely to blame for metastasis.

Credit: Matthew Twombly

Epigenetic modifications in a cluster of primary tumor cells, driven by, say, chance exposure to a carcinogen, might increase the cells' propensity to migrate. Also pertinent is the location of a particular cell with respect to other types of cells. For example, some tumor cells might spread through the body because they happen to be close to a blood or lymph vessel, whereas other cells with identical mutations might not because they are not close enough. These other factors potentially influencing metastasis may have little or nothing to do with the later mutations that show up in our evolutionary trees.

Once it was clear that divergent lineages within the primary tumor sometimes give rise to different metastases, we wondered how early in the patient's lifetime these metastatic lineages diverged. Our molecular evolutionary trees do not answer this question: the lengths of the branches correspond not to real time but to the number of mutations that distinguish different parts of the cancer, such as primary tumors from metastases. They do not tell us how long it took for one tumor to give rise to another.

We wondered if we could employ another technique from evolutionary biology—the construction of time trees—to understand the history of cancer progression within the human body. In contrast to a molecular evolutionary tree, the length of a branch in a time tree measures the amount of time that elapsed before one creature evolved from another. Such graphs, obtained by comparing the traits of interest (such as genetic sequences) and combining these with temporal information (such as mutation rates), enable scientists to measure when key changes occurred. They have been used on fossil data, for instance, to reveal the timing of the Cambrian explosion, when diverse multicellular life appeared nearly 550 million years ago.

Of course, we had no buried fossils to calibrate cancer evolution across someone's lifetime. We could, however, do even better. In many cases, we had primary tissue that had been extracted before autopsy. Furthermore, we had medical records for each case—providing the dates of birth, diagnosis, biopsy, surgical removal of a tumor and autopsy. These dates served as calibration points. The cancer could not have originated before the year of birth, for example, and must have existed when the primary tumor was diagnosed. And tissue from biopsies, as well as from tumors that had been extracted, gave us snapshots of cancer evolution. The corresponding dates allowed us to calculate the rate of mutation. We also accessed published data gathered in the past by radiologists on the rates at which cells in the primary tumor typically divide. (Radiologists have gathered this information to gauge the amount of radiation necessary to destroy a tumor by radiotherapy.)
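The arithmetic behind such a calibration can be sketched in its simplest form. The example assumes a strict clock (mutations accumulate at a constant rate) and two dated snapshots of the same lineage, such as a biopsy and an autopsy sample. All of the numbers are invented for illustration, and a real analysis would be considerably more elaborate.

```python
def estimate_origin_year(early, late, birth_year, diagnosis_year):
    """Estimate a mutation rate and the year the lineage began to diverge.

    early, late: (year, mutation_count) snapshots of the same lineage.
    Assumes a constant mutation rate (a strict clock), which is a
    simplification made for this sketch.
    """
    (year_early, muts_early), (year_late, muts_late) = early, late
    rate = (muts_late - muts_early) / (year_late - year_early)  # mutations per year
    if rate <= 0:
        raise ValueError("later sample must carry more mutations than the earlier one")
    origin = year_early - muts_early / rate
    # Calibration bounds from the medical record: not before birth,
    # and not after the tumor was diagnosed.
    origin = min(max(origin, birth_year), diagnosis_year)
    return rate, origin

# Hypothetical record: biopsy in 2005 with 40 mutations, autopsy in 2009 with 60.
rate, origin = estimate_origin_year(
    early=(2005, 40), late=(2009, 60), birth_year=1950, diagnosis_year=2004
)
print(rate)    # 5.0
print(origin)  # 1997.0
```

With these made-up figures the first distinguishing mutation would date to about seven years before diagnosis.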

Atila Iamarino, then another postdoc in my lab, used all this information to turn the molecular evolutionary trees into time trees. We got a first glimpse of how the evolution of cancer relates to the life span of a patient and to how long he or she had been treated. We could estimate, for example, when the first genetic mutation differentiating the cancer cells from healthy tissue arose. In young patients, this divergence typically occurred just a few years before diagnosis; in older patients, it could have taken place decades earlier.

Deep Roots

The first mutation to genetically distinguish tumor tissue from normal tissue typically arose years—sometimes decades—before the cancer was diagnosed. Just as disturbing, in nine out of 10 of our subjects, at least one metastatic lineage had already diverged by then. In seven cases, this malignant branch had separated from the trunk closer to the time of the primary tumor's origin than to the death of the patient.

These observations struck us as deeply significant. Cells that proceed to metastasis can genetically differentiate from other cells in the primary tumor early in the evolutionary and temporal history of cancer. So early, in fact, that often they have diverged even before the primary tumor is diagnosed.

We had hoped to identify crucial metastasis-inducing mutations that would be suitable targets for pharmacological intervention. Because little was special about the genetics of the metastatic lineages, however, we turned our attention away from the branches and toward the evolution of the original tumor. We wondered whether the trunk of the evolutionary tree plays a special role in the origination of cancer. To answer this question, we examined whether mutations in this trunk were occurring in DNA that alters the cellular function of genes that were already known to play a role in cancer.

They were. For example, the well-known tumor suppressor gene p53, which inhibits the proliferation of cells, was mutated in many patients early in the evolution of diverse tumors. So was the proto-oncogene KRAS. (A proto-oncogene is a gene that, if mutated, becomes an oncogene, which prompts a cell to divide incessantly.) Almost every patient with pancreatic cancer, for example, had an early mutation at the 12th codon of the KRAS gene.

The frequent presence of such key genes in the roots of cancer lineages implies that they play formative roles in the origin of tumors, as well as in their metastases. We speculate that as genetic drivers of tumorigenesis accumulate, the probability of metastasis becomes little more than a numbers game: the larger the number of cancer cells present, the greater the chances that they will find themselves at a location, or adopt an epigenetic state, that facilitates spreading.

Further studies are needed to clarify how these key genes might influence the chances of tumorigenesis and metastasis. Even so, the early drivers deserve redoubled attention. Drugs targeting them may be key to cancer treatment—both early in the development of primary tumors and in late-stage cancers.


Recent clinical trials have demonstrated that it is also possible to unleash the body's own immune system to destroy cancer cells. For both targeted drugs and immunotherapy, however, tumors seem to evolve resistance. Does resistance derive from specific mutations, as the primary tumor does? Or is it a symptom of the microenvironment and other factors, as metastasis appears to be? We do not yet know, but evolutionary trees can shed light on this question.

Our time-tree studies had revealed that some lesser-known genes that are also suspected to drive cancer were mutated, too, but those changes tended to occur later in the history of the disease. That is, they were not in the trunk but in the branches of the cancer tree—so that mutations in these genes were typically present only in some of the patient's tumors but not in others. In consequence, therapies directed toward such mutations, which some oncologists might prefer, could kill the mutated branch, but the remainder of the cancer tree would continue to proliferate and threaten the life of the patient. Doctors using such targeted drugs would do well to supplement them with treatments designed to kill other kinds of cancer cells as well.
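The difference between trunk and branch targets comes down to set arithmetic. The sketch below assumes each tumor sample is summarized as a set of mutated genes; the sample names and the labels m1, m3 and m4 are hypothetical. A mutation on the trunk appears in every sample, whereas a branch mutation appears in only some, so a drug aimed at it reaches only part of the cancer.

```python
def trunk_mutations(samples):
    """Mutations present in every tumor sample, i.e. on the trunk of the tree."""
    return set.intersection(*(set(muts) for muts in samples.values()))

def tumors_reached(samples, target):
    """Names of the tumor samples that carry the targeted mutation."""
    return sorted(name for name, muts in samples.items() if target in muts)

# Hypothetical patient with one primary tumor and two metastases.
samples = {
    "primary": {"KRAS", "p53", "m1"},
    "met_liver": {"KRAS", "p53", "m1", "m3"},
    "met_lung": {"KRAS", "p53", "m4"},
}

print(sorted(trunk_mutations(samples)))  # ['KRAS', 'p53']
print(tumors_reached(samples, "m1"))     # ['met_liver', 'primary']
print(tumors_reached(samples, "KRAS"))   # ['met_liver', 'met_lung', 'primary']
```

In this example a therapy directed at m1 would leave the lung metastasis untouched, while one directed at a trunk mutation would reach all three tumors.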

On the other hand, if a drug targets an early mutation that is present in all of the cancer tissue, resistance might arise from the growth of cells featuring specific new mutations. Pathologist Katerina Politi of the Yale School of Medicine and her colleagues have identified changes to the EGFR gene—another major driver of cancer (in particular, lung cancer) when mutated—as indeed playing a significant role in resistance. To understand why and how resistance evolves as a patient is treated, our research group has begun to use evolutionary techniques. We are computing patients' cancer trees and scanning for mutations on the branches that lead to treatment-resistant tissue, such as a recurrent tumor. Excitingly, our preliminary studies suggest that resistance does seem to be driven by genetic changes that may derive from the kind of treatment the patient is undergoing.

Every year the number of therapeutic drugs developed to target specific mutations increases, as does the potential to prescribe complex combinations of traditional chemotherapy, radiotherapy and immunotherapy. No longer do oncologists regard one type of cancer as a homogeneous disease. Rather, each case is its own entity. Studying the genomics of individual patients will have an enormous impact on cancer care in the future. To use these new tools wisely, oncologists will have to become de facto evolutionary biologists, examining the genetic variation present in each patient's cancer tissues and devising a strategy to destroy the cancer tree, root and branch.