Because I find it hard to relate to something as small as the structure of the human genome, I like to imagine it scaled up a millionfold. At this size, each DNA molecule—a chromosome—is as wide as a ramen noodle. Laid end to end, all 46 of the scaled-up chromosomes that compose a cell’s genome would stretch from New York to Kansas City, although they instead fold up to fit inside a structure the size of a house—the cell nucleus. Collectively, the 46 chromosomes contain two sets of roughly 20,000 genes. Each gene spells out a coded message telling the cell how to make a particular protein; at the millionfold scale, a gene is as long as a car.
Peering into the nucleus, you would see the DNA doing a lot of wiggling. Back when I was a Ph.D. student about a decade ago, I was stirring the ramen noodles in my dinner and wondering how the genome, unlike my noodles, avoided tangling into a mess that would prevent its crucial genetic messages from being sent.
In 2014 my colleagues and I contributed one piece of the answer to this question, adding to a growing realization that the structure of the genome inside the nucleus is far from random. Our team at the Baylor College of Medicine, led by my students Suhas Rao, Miriam Huntley and Adrian Sanborn, found that the human genome folds in a way that forms about 10,000 loops. These loops obey a simple code, hidden in the sequence of the genome itself. They turn out to be ancient structures; many of the same loops occur in mice, a shared legacy from an ancestral species that lived more than 60 million years ago. This persistence through time suggests that the loops are important to survival.
The loops seem to help control gene activity. All cells have the same genes, but if the patterns of activity did not differ, the body could not exist: a heart muscle cell would be no different from a brain cell. Just how these distinctive patterns are orchestrated has been a puzzle. Loops now appear to be one of the pattern controllers, a conductor of the genetic orchestra, influencing when particular genes become active enough to affect cell function.
As we continue to explore the loops, we expect to better understand gene regulation and to find clues about how many diseases arise. More recently, we and others have figured out how these loops form, dancing an elegant tango that keeps the genome tangle-free.
Facebook for the genome
My musings about entangled DNA related to a larger question: How does the 3-D arrangement of DNA in the cell nucleus influence gene activity? Since the late 1970s evidence had been piling up, indicating that small segments of DNA, known as enhancers, were needed to activate genes. Biologists had also learned that enhancers could lie very far from their target genes in a string of DNA. To trigger a gene’s “on” switch—a stretch of DNA adjacent to the gene known as a promoter—the string would presumably have to loop back onto itself, bringing the enhancer close to the promoter. But was the presumption correct? I became captivated by this problem and could think of only one way to settle it: find all the loops.
Conceptually the plan to do so was simple. If two people hang out especially often, it is logical to assume that they are friends. Similarly, we reasoned, if two stretches of DNA (“loci”) that are far apart along the chromosome tend to hang out especially often, the DNA has probably folded into a loop. What we needed was a way to measure how frequently bits of the genome interact with one another: to build something like Facebook but for the human genome.
To turn our idea into reality, we adapted a method described in 1993 by Katherine Cullen, then at Vanderbilt University, and her colleagues. At that time, the genome had confounded all known forms of imaging: like a bad portrait subject, the wiggling chromosomal noodles refused to sit still. But Cullen made the jittery chromosomes work to her advantage. As chromosomes jiggled, she knew, different bits of the genome would bump into one another. Bits that were in pretty close proximity in 3-D would bump into one another a lot; bits that were far apart would touch only rarely. So if you could measure the bump frequencies, you could figure out which parts of the genome were close in 3-D space.
To measure this bump frequency, Cullen and her colleagues developed what they called the nuclear ligation assay (NLA). In essence, you take cells and, without destroying their nuclei, stabilize their genomes. Then you send in an enzyme to cut the DNA into tiny pieces and deploy a protein that fuses the ends of two nearby fragments, forming a single strand. Finally, you examine the sequence of DNA base pairs (the paired letters of the DNA code that form the “rungs” of DNA’s familiar “ladder”) in the collection of fused fragments. If, in cell after cell, you see fusions of a particular pair of DNA bits that did not originally sit next to each other on a chromosome (known as ligation junctions), you can conclude that the two DNA bits often come near to each other in the 3-D space of the cells’ nuclei.
Cullen’s insight, published in the journal Science, allowed her to demonstrate that two bits of DNA bookending a specific long stretch of DNA bumped into each other far more often than chance would predict. In other words, the DNA formed a loop.
Back in 1993, experiments using the nuclear ligation assay were hard to perform. Fortunately, by the time I saw Cullen’s paper as a graduate student in the mid-2000s, there was a serviceable human genome reference, and DNA sequencing was becoming extremely cheap. I and three others at the Broad Institute of the Massachusetts Institute of Technology and Harvard University—Chad Nusbaum, Andreas Gnirke and Eric Lander—sketched out an approach that would analyze the contact frequency not of a single pair of DNA positions but of every pair of positions in the entire genome at the same time. It would also allow us to pinpoint exactly where each half of each ligation junction came from.
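To make the idea of measuring every pair at once concrete, here is a minimal Python sketch of the counting step only. It is an illustration under stated assumptions, not the pipeline used in the work described here: the function name `contact_counts`, the `(pos_a, pos_b)` input format and the bin size are all hypothetical, and every junction is assumed to join two positions on the same chromosome.

```python
from collections import Counter


def contact_counts(junctions, bin_size):
    """Tally ligation junctions into a table of contacts between genome bins.

    junctions: iterable of (pos_a, pos_b) pairs, the base-pair coordinates of
               the two halves of each fused fragment (hypothetical format).
    bin_size:  width of each bin in base pairs (an illustrative choice).
    """
    counts = Counter()
    for pos_a, pos_b in junctions:
        # Convert each coordinate to a bin index; sort so that (a, b) and
        # (b, a) are recorded as the same contact.
        bin_a, bin_b = sorted((pos_a // bin_size, pos_b // bin_size))
        counts[(bin_a, bin_b)] += 1
    return counts


# Two junctions link the start of the region to a spot ~100,000 bases away;
# the other two join positions that were already neighbors.
example = [(1200, 98000), (97500, 1900), (1500, 2500), (50000, 51000)]
print(contact_counts(example, bin_size=10000))
```

In this toy input, the pair of bins (0, 9) is counted twice, which is the kind of repeated long-range contact that would hint at a loop if it kept recurring across many cells.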
We decided to base our new method on a variant of Cullen’s procedure that had been developed by Job Dekker of the University of Massachusetts Medical School. Rather than using intact cell nuclei, as Cullen had, Dekker blew the nucleus apart and performed the crucial ligation steps in an extremely dilute solution. This modification, which Dekker had popularized and dubbed “chromosome conformation capture,” or “3C,” was believed to yield a more reliable estimate of bump frequency.
Next, we added a few steps to 3C. Before gluing fragments together, we would attach easily detectable labels to the ends of the shattered DNA—to mark the spot where two nearby bits became joined. After this step, we would cut the glued fragments into smaller pieces and pull out only the stretches bearing the labels; these bits would contain pure ligation junctions. Working with Dekker, his then postdoctoral fellow Nynke van Berkum and Louise Williams of the Broad Institute, we found that we could identify millions of contacts all at once. I called the method “Hi-C,” a play on “3C” and the name of one of my favorite drinks as a child. We published the method in 2009.
Our very first Hi-C maps of whole genomes showed that chromosomes, despite all their wiggling, were not folding up into a random jumble inside the nucleus. Instead each chromosome was partitioned into domains: stretches of DNA containing segments that made frequent contact with one another. Loci in one domain interacted with loci in other domains less frequently. What is more, our Hi-C data revealed that each of the domains sat within one of two larger spatial neighborhoods in cell nuclei. We called these neighborhoods “compartments” and labeled them A and B.
We found that the A compartment was rich in markers of genetic activity, such as messenger RNAs, which are molecules that genes send off to tell the rest of the cell what to do. The B compartment was more densely packed and was largely inactive. When the domains turned on or off, they moved from one compartment to the other. (Today we know that cell nuclei contain multiple A and B subcompartments.)
The discovery of this dynamic compartmentalization excited us because it confirmed that the genome’s large-scale 3-D structure was not random but instead intimately associated with gene activity. But I was disappointed that one folding feature never seemed to appear in the Hi-C data: loops!
Hi-C data are often represented as a heat map: a plot showing how frequently two loci in a chromosome form contacts with each other. In such plots, every locus has a position along both the x axis and the y axis, and the contact frequency between two loci is indicated by the brightness of the spot where their positions intersect. A loop should manifest as an unusually bright spot corresponding to the loop’s two anchor points. But we did not see any such peaks in brightness. If we could not show that loops were forming, we could not explore whether enhancers activated genes by physically coming into close proximity with promoters.
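The notion of a loop as an unusually bright spot can be sketched with a toy heat map. The Python below is a simplified illustration, not the peak-calling method used in the research: the names `toy_contact_map` and `find_peaks`, the simple distance-decay background, the window size and the threefold enrichment threshold are all assumptions chosen for the example.

```python
import numpy as np


def toy_contact_map(n_bins=20, loop=(4, 15), loop_boost=50.0):
    """Build a symmetric toy heat map: contacts fade with distance along the
    chromosome, plus one extra-bright spot at the loop's two anchor bins."""
    idx = np.arange(n_bins)
    distance = np.abs(idx[:, None] - idx[None, :])
    contacts = 100.0 / (1.0 + distance)  # assumed background decay
    i, j = loop
    contacts[i, j] += loop_boost
    contacts[j, i] += loop_boost
    return contacts


def find_peaks(contacts, window=2, min_enrichment=3.0):
    """Return (i, j) bins, with j > i, whose brightness is at least
    min_enrichment times the mean of the surrounding neighborhood."""
    n = contacts.shape[0]
    peaks = []
    for i in range(n):
        for j in range(i + 1, n):
            patch = contacts[max(0, i - window):min(n, i + window + 1),
                             max(0, j - window):min(n, j + window + 1)]
            # Local background: the neighborhood mean, excluding the spot itself.
            background = (patch.sum() - contacts[i, j]) / (patch.size - 1)
            if background > 0 and contacts[i, j] >= min_enrichment * background:
                peaks.append((i, j))
    return peaks


print(find_peaks(toy_contact_map()))  # the planted loop at bins (4, 15)
```

Comparing each spot with its immediate surroundings, rather than with a single global cutoff, matters because contacts near the diagonal are bright for the mundane reason that neighboring stretches of DNA touch often.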
Making a loop map
This problem stumped us for the next three years. Then, in 2012, Rao and Huntley figured out what had gone wrong. They realized that one aspect of Hi-C—destroying cell nuclei before ligation—disrupted fine structures such as loops. So they set out to develop an updated Hi-C method that kept nuclei intact during ligation.
The new approach, called in situ Hi-C, made a huge difference. In studies of white blood cells, Rao and Huntley found that bright peaks now appeared all over our heat maps, each representing a putative loop. But it had now been six years since I had started working to map the loops; I no longer believed my own eyes. My team and I worried that we might be seeing things in the data that were not really there.
To make sure I was not dealing with confirmation bias, I brought the maps home to my son, Gabriel, who was then three. “Do you see a red dot?” “Yes,” he said. “Can you point at it?” He could.
We had it: a map showing 10,000 loops, spread out across the human genome. We checked to see whether the loops linked gene promoters and enhancers. They often did.
In a further test, we compared our blood cell maps with new ones for a different kind of cell—from the lung. We saw many of the same loops, but we also saw new connections that we presumed involved different enhancers and different target genes. These changes in the looping pattern suggested that loops might be involved in regulating the genes that give a cell its distinctive identity.
We wondered if looping was unique to humans or if the same loops were present in other organisms. So we made a map of the loops in mouse cells and found that half the loops were present at the corresponding position in the human genome. These shared loops had been conserved over at least 60 million years of evolution, from ancestral creatures that roamed the earth long before the Colorado River began to carve out the Grand Canyon.
One interesting implication of our data was that loops are not static: they seemed to constantly arise, come apart and form again. Naturally we wanted to know how this worked.
We suspected that hundreds of proteins were involved. The data, however, told a different story. In loop after loop, two protein factors stood out. One, named CTCF, had been discovered by Victor Lobanenkov and his colleagues in 1990. It contains 11 components called zinc fingers that allow CTCF to bind very tightly to certain spots on DNA. The second factor, cohesin, discovered in 1997 by Kim Nasmyth, now at the University of Oxford, is a ring-shaped complex made up of multiple proteins. It was thought that two cohesin rings might link up and function together, with each ring in the pair encircling DNA and sliding on it freely, like a ring on a necklace.
Seeing these proteins was not a total surprise: many earlier studies had suggested their possible involvement in genome folding, although such a ubiquitous role at loop anchors—especially at loops linking promoters and enhancers—was unexpected.
Then we stumbled onto something truly weird. Rao, Huntley and I had asked Ido Machol, a new computational scientist in the laboratory, to study the distribution of histone proteins (which help to package DNA inside the nucleus) near CTCF molecules. Machol noticed that there were more histone proteins immediately outside of loops than immediately inside, as though the histones somehow knew where a loop was positioned relative to the CTCF molecules. I suspected that the finding just reflected a bug in Machol’s code. But as the weeks passed, Machol did not find any bugs.
We began to look for a biological explanation. In the original paper describing the discovery of CTCF, Lobanenkov had shown that CTCF does not attach at arbitrary positions on DNA. Instead it always binds to a particular DNA word: a specific sequence of roughly 20 bases, called a motif. Because DNA is a double helix, it has two strands. Motifs can appear on either strand, pointing toward either terminus of the vast DNA noodle. The relative orientation of DNA motifs is often random, like a coin flip: there is a 50 percent chance that a typical motif points toward one terminus and a 50 percent chance that it points toward the other. So we expected, at first, to see random orientations of CTCF-binding motifs at loop anchors.
We wondered if the CTCF-binding motifs at loop anchors were giving histones a clue to where they should connect to the DNA near the motifs. We checked, and, to our astonishment, the two tiny CTCF-binding motifs—even if they were separated by millions of DNA letters in unfolded DNA—always pointed toward each other and into the loop, in what we named the convergent orientation. This convergent rule explained how the histones could know where to position themselves—they just had to determine which way the CTCF-binding motif was pointing.
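The coin-flip expectation can be spelled out in a few lines of Python. This is only a sketch of the bookkeeping: the `+` and `-` labels (here taken to mean that a motif points toward higher or lower coordinates along the chromosome) and the terms "divergent" and "tandem" for the non-convergent arrangements are conventions assumed for the example.

```python
from itertools import product


def classify_pair(left_strand, right_strand):
    """Classify the motifs at a loop's left and right anchors.

    '+' is assumed to mean the motif points toward higher coordinates,
    '-' toward lower coordinates.
    """
    if left_strand == "+" and right_strand == "-":
        return "convergent"  # both motifs point into the loop
    if left_strand == "-" and right_strand == "+":
        return "divergent"   # both motifs point away from the loop
    return "tandem"          # both motifs point the same way


def expected_fractions():
    """Fraction of each arrangement if orientation were a fair coin flip."""
    counts = {}
    for left, right in product("+-", repeat=2):
        kind = classify_pair(left, right)
        counts[kind] = counts.get(kind, 0) + 1
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.items()}


print(expected_fractions())
```

If orientation were random, only one of the four equally likely arrangements, or 25 percent of loops, would be convergent, which is why seeing motifs always pointing at each other was so striking.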
But in resolving one puzzle, the convergent rule had created a second, far greater mystery. The nonrandom orientation of the motifs defied expectations. For perspective, let us again scale up the genome by a factor of one million. Now the motifs are each five millimeters long and separated by as much as a kilometer of genomic noodle. And yet, somehow, as if guided by a magical compass, the motifs at opposite ends of a stretch of loop-forming DNA always point at each other. Like any good magic trick, the convergent rule seemed physically impossible. It also contradicted the accepted view of how loops probably formed.
At the time, nearly everyone—ourselves included—believed that genome loops formed by diffusion. In that scheme, a protein needed for forming a loop binds at one end of a stretch of DNA. Next, another loop-enabling protein binds at the other end. Then, as usual, the DNA wiggles. Finally, if the wiggling brings the two proteins together, they form a physical link, thereby creating a loop. The trouble is, the entire DNA chain has so much room to wiggle that, if the diffusion model were correct, the relative orientation of the CTCF-binding motifs could not matter. And yet we were seeing convergence. Within the year two teams, one led by Suzana Hadjur of University College London and one led by Yijun Ruan of the Jackson Laboratory, confirmed the convergent rule in their own data sets. The rule was here to stay, and the loops we saw could not be forming by diffusion.
If loops did not form by diffusion, then how did they arise? And what were the roles of CTCF and cohesin? We did what we always do in case of a genome-folding emergency: we started playing with our headphone cables.
I am pretty sure that most people who work on genome folding keep a long, noodlelike object handy: a piece of yarn, a plastic tube. When you get stuck on a hard problem, you pull this object out and futz around. One day Rao and I were passing the headphones back and forth as we explored possible models of loop formation. Suddenly it occurred to us that the answer was not in our headphones; it was on our backpacks.
Imagine the apparatus that adjusts the length of backpack straps. This object, called a tri-glide, consists, more or less, of two rings that are physically attached to each other. The strap comes in the first ring and goes out the second. If you want to adjust the strap length, you pull some of the strap through one of the rings and start making a loop. And you can keep making the loop bigger until you reach a bit of folded-over material that stops you.
Perhaps pairs of cohesin rings worked like tri-glides? At first, they attach anywhere on the genome, with the DNA going in one ring and out the other. But then, the two rings slide in opposite directions (one to the left along the linear molecule and one to the right), extruding a growing loop as they go. They do not slide forever, though. Eventually one approaches a site where a CTCF molecule is bound. If the underlying CTCF-binding motif is pointing toward the approaching ring, then the sliding ring stops on contact. But if the motif is facing the other way, the cohesin ignores it and keeps going. (In this way, a CTCF-binding motif is like a stop sign for cohesin traffic: if the sign is facing you, you stop; if the sign is facing the other way, you do not.) The second ring keeps going until it, too, arrives at an inward-pointing CTCF-bound motif. The loop is now complete.
If cohesin rings actually worked that way, then loops would form only between pairs of CTCF-binding motifs that obeyed the convergent rule. We quickly realized that this extrusion process would provide a crucial benefit to cells. If loops formed by diffusion, then pairs of loops in a chromosome could easily become entwined, leading chromosomes to form knots and get entangled with one another. This would make it hard for genes to operate properly and could prevent chromosomes from separating when cells need to divide. In contrast, loops produced by extrusion do not form knots or entanglements—which is why your backpack straps do not get knotted no matter how much you adjust their length with a tri-glide.
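The tri-glide picture is simple enough to play with in code. The Python below is a toy, one-dimensional sketch of the stop-sign rule, not a physical simulation: the function name `extrude`, the integer coordinates and the `+`/`-` orientation labels are assumptions made for illustration, and the two rings are treated as independent, so the order in which they are moved does not affect the result.

```python
def extrude(load_site, ctcf_sites, genome_length):
    """Slide two linked rings apart from load_site and return the loop anchors.

    ctcf_sites: dict mapping a position to '+' (motif points toward higher
                coordinates) or '-' (motif points toward lower coordinates).
    A ring halts only at a motif that points toward it; otherwise it keeps
    going until it reaches the end of the toy chromosome.
    """
    left = right = load_site
    # The left ring travels toward lower coordinates, so only a '+' motif
    # faces it like a stop sign.
    while left > 0 and ctcf_sites.get(left) != "+":
        left -= 1
    # The right ring travels toward higher coordinates, so only a '-' motif
    # faces it.
    while right < genome_length - 1 and ctcf_sites.get(right) != "-":
        right += 1
    return left, right


sites = {10: "+", 20: "+", 30: "-", 40: "-"}
print(extrude(25, sites, 50))  # anchors at 20 and 30
print(extrude(15, sites, 50))  # the right ring ignores the '+' motif at 20

# Reverse the motif at 30: the ring that used to stop there now slides past.
flipped = dict(sites)
flipped[30] = "+"
print(extrude(25, flipped, 50))
```

In every case the two anchors end up as a convergent pair, which is the pattern the extrusion model predicts and diffusion does not.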
The model was wild speculation. It made many basic assumptions for which we had no shred of direct evidence, such as the notion that cohesin could slide along DNA. We worried we were crazy. But as we read through the literature on cohesin, we realized Nasmyth himself had proposed back in 2001 that cohesin might extrude DNA. Sanborn ran detailed simulations that closely recapitulated the data in our maps. And when Rao experimented on real DNA, the looping changed exactly in the ways that Sanborn’s model predicted.
Deleting a CTCF-binding motif at a loop anchor eliminated the loop. Flipping a motif’s orientation made the original loop disappear but caused another loop to form on the other side. Adding a CTCF-binding motif, so long as it pointed the right way, also led to the formation of a new loop. We then found that we could add loops to a genome, and remove them from it, at will.
We quickly wrote and submitted a paper on our extrusion model and the loop-engineering experiments that we had performed to test it. The field was heating up, and within a few weeks of one another in late 2015, our lab and two other teams published papers demonstrating that this kind of 3-D genome surgery worked. Similarly, three teams—ours, one at Emory University and one at M.I.T.—reported that the convergent rule favored a model in which loops form by extrusion. At last, the scientific community was starting to untangle the logic of loops.
Progress continued, now at a breakneck pace. At the Gladstone Institutes, Benoit Bruneau and his colleagues showed that interfering with CTCF greatly weakened loops. At the European Molecular Biology Laboratory, Francois Spitz and his co-workers got a similar result by eliminating a protein thought to load cohesin onto DNA. At the Netherlands Cancer Institute, Benjamin Rowland’s team showed that eliminating a factor that removes cohesin from DNA led to bigger loops, presumably because cohesin could now slide for longer. And in our lab, Rao showed that by degrading cohesin itself, we could eliminate all the cohesin loops within minutes.
But we all longed for direct confirmation: seeing extrusion in action. Finally, in April 2018, Cees Dekker of the Delft University of Technology in the Netherlands and his colleagues did just that. By using yeast’s condensin—a complex of proteins that is closely related to cohesin—they made a microscopic movie that many of us in the field of nuclear architecture will never forget. First you see a ribbon of DNA. Then condensin lands, forming a little nodule of DNA. The nodule grows and grows until the viewer realizes what it really is: an extruded loop.
Turning toward health
As the mechanisms and rules for loop formation emerge, the importance of looping for health and disease is becoming clearer. For instance, Frederick Alt of Harvard University and his colleagues have begun to articulate the role that looping plays in antibody production. Your body makes antibodies to pathogens it has never encountered before by cutting and pasting segments of antibody genes. Alt’s team found that this process is accomplished by forming multiple CTCF-anchored loops and then cutting them out.
The lab of Stefan Mundlos of the Max Planck Institute for Molecular Genetics in Berlin has shown that modifying a single CTCF-binding motif in mice causes the animals to develop an abnormal number of digits in their paws. Humans with the corresponding change did not have five fingers. And Rafael Casellas of the National Institutes of Health has shown that disrupting CTCF-binding motifs in a mouse plasmacytoma—a kind of cancer—could slow the tumor’s growth by 40 percent.
Yet as the notion of loop extrusion has gained credence, deeper theories about the role that loops play in gene regulation have been coming apart. For decades scientists thought that loops worked like switches: when the loop between an enhancer and a promoter was present, the corresponding gene turned on. Therefore, we expected that when we removed cohesin from cells, gene expression would go haywire, with thousands of genes changing their activity level. As predicted, many genes did change. But the changes were fairly small. Loops—at least those formed by extrusion—are not binary switches after all. Instead they seem to function more like knobs, turning gene activity up a little or down a little, fine-tuning a cell’s supply of different proteins.
In other words, nature has thrown us for a loop. We thought that we understood the rules of the game, that loops turn genes on. But now that we have seen loops in action, we must concede that our vision was too simplistic. It is even possible that gene regulation may be a side gig for loops; perhaps their main function in cells is something else entirely.
Like any explorers in uncharted territory, we need better maps. My colleague Ruan and I at the NIH’s Encyclopedia of DNA Elements (ENCODE) Project are currently working with our colleagues to create the first atlas of looping in the human genome, mapping the loops in tissues across the human body. Our groups, and many others, have also joined together in the 4D Nucleome consortium that is developing new methods for tackling these problems. And Olga Dudchenko, a postdoc in my lab, has created the DNA Zoo—a consortium of academic labs, zoos and aquariums around the world that is trying to assemble the genomes of hundreds of species, chronicling the evolution of loops across the tree of life.
For researchers the ending of one scientific story is always the beginning of another. Two billion years ago, before the emergence of the cell nucleus, the process of DNA extrusion arose. Why? Once more, into the loop.