Scientific American Magazine, April 2008

At the Edge of Life's Code

Using machine learning, Chris Wiggins hopes to develop models that can predict how all of an organism's genes behave under any circumstance, and thereby explain precisely why some cells become sick or cancerous.















At the Kavli Institute, Wiggins began building a model of a gene regulatory network in yeast—the set of rules by which genes and regulators collectively orchestrate how vigorously DNA is transcribed into mRNA. As he worked with different algorithms, he started to attend discussions on gene regulation led by Christina Leslie, who ran the computational biology group at Columbia at the time. Leslie suggested using a specific machine-learning tool called a classifier. Say the algorithm must discriminate between pictures that have bicycles in them and pictures that do not. A classifier sifts through labeled examples and measures everything it can about them, gradually learning the decision rules that govern the grouping. From these rules, the algorithm generates a model that can determine whether or not new pictures have bikes in them. In gene regulatory networks, the learning task becomes the problem of predicting whether genes increase or decrease their protein-making activity.
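The bicycle example can be made concrete with a very small classifier. What follows is a minimal sketch, not the tool Leslie proposed: it assumes each picture has already been reduced to two made-up measurements, and it learns a single threshold rule (a "decision stump") from labeled examples. The data, the feature meanings, and the names `learn_stump` and `predict` are all hypothetical.

```python
# Hypothetical toy data: each "picture" is summarized by two invented
# measurements (count of circular shapes, fraction of thin straight edges).
# Label +1 means "contains a bicycle"; -1 means "does not".
TRAIN = [
    ((2.0, 0.8), +1),
    ((2.0, 0.6), +1),
    ((3.0, 0.7), +1),
    ((0.0, 0.1), -1),
    ((1.0, 0.2), -1),
    ((4.0, 0.1), -1),
]


def learn_stump(examples):
    """Return (feature_index, threshold, training_errors) for the rule
    'predict +1 if x[feature_index] > threshold, else -1' that makes
    the fewest mistakes on the labeled examples."""
    best = None
    n_features = len(examples[0][0])
    for f in range(n_features):
        values = sorted({x[f] for x, _ in examples})
        # Candidate thresholds: midpoints between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            errors = sum(1 for x, y in examples
                         if (1 if x[f] > t else -1) != y)
            if best is None or errors < best[2]:
                best = (f, t, errors)
    return best


def predict(rule, x):
    """Apply a learned rule to a new, unlabeled example."""
    f, t, _ = rule
    return 1 if x[f] > t else -1


rule = learn_stump(TRAIN)
print("learned rule (feature, threshold, training errors):", rule)
print("new picture (2.0, 0.9) ->", predict(rule, (2.0, 0.9)))
```

On this toy set the second measurement separates the two groups cleanly, so the learned rule ignores the first. A real classifier weighs many such rules at once, but the workflow is the same: learn from labeled examples, then apply the result to new ones.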

The algorithm that Wiggins and Leslie began building in the fall of 2002 was trained on the DNA sequences and mRNA levels of regulators expressed during a range of conditions in yeast: when the yeast was cold, hot, starved, and so on. Specifically, this algorithm, MEDUSA (for motif element detection using sequence agglomeration), scans every possible pairing between a set of DNA promoter sequences, called motifs, and regulators. Then, much like a child might match a list of words with their definitions by drawing a line between the two, MEDUSA finds the pairing that best improves the fit between the model and the data it tries to emulate. (Wiggins refers to these pairings as edges.) Each time MEDUSA finds a pairing, it updates the model by adding a new rule to guide its search for the next pairing. It then determines the strength of each pairing by how well the rule improves the existing model. This hierarchy of strengths enables Wiggins and his colleagues to determine which pairings are more important than others and how they can collectively influence the activity of each of the yeast’s 6,200 genes. By adding one pairing at a time, MEDUSA can predict which genes ratchet up their RNA production or clamp that production down, as well as reveal the collective mechanisms that orchestrate an organism’s transcriptional logic.
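The one-pairing-at-a-time procedure resembles a family of methods known as boosting. The sketch below is a generic AdaBoost-style loop over (motif, regulator) pairings, offered as a stand-in to illustrate the idea rather than as MEDUSA's actual code. The motif names `M1` and `M2`, the regulator names `R1` and `R2`, the six toy examples, the smoothing constant `EPS`, and the function names are all assumptions made for illustration.

```python
import math
from itertools import product

# Hypothetical toy data. Each example is one gene in one condition:
# (motifs in the gene's promoter, regulators active in that condition, label),
# where label +1 means the gene's mRNA level went up and -1 means it went down.
EXAMPLES = [
    ({"M1"}, {"R1"}, +1),
    ({"M1"}, {"R1", "R2"}, +1),
    ({"M2"}, {"R2"}, -1),
    ({"M2"}, {"R1", "R2"}, -1),
    ({"M1", "M2"}, {"R1"}, +1),
    ({"M1", "M2"}, {"R2"}, -1),
]
MOTIFS = ["M1", "M2"]
REGULATORS = ["R1", "R2"]
EPS = 0.01  # smoothing so a perfect rule does not get an infinite strength


def fires(pair, motifs, active):
    """A pairing applies when its motif is present and its regulator is active."""
    motif, regulator = pair
    return motif in motifs and regulator in active


def boost(examples, rounds):
    """Greedily add one (motif, regulator) pairing per round."""
    weights = [1.0 / len(examples)] * len(examples)
    model = []  # list of (pair, strength); the sign of strength is the direction
    for _ in range(rounds):
        best = None
        for pair in product(MOTIFS, REGULATORS):
            up = sum(w for w, (m, a, y) in zip(weights, examples)
                     if fires(pair, m, a) and y > 0)
            down = sum(w for w, (m, a, y) in zip(weights, examples)
                       if fires(pair, m, a) and y < 0)
            abstain = sum(weights) - up - down
            # Boosting loss for a rule that may abstain; smaller is better.
            z = abstain + 2.0 * math.sqrt(up * down)
            if best is None or z < best[0]:
                best = (z, pair, 0.5 * math.log((up + EPS) / (down + EPS)))
        _, pair, strength = best
        model.append((pair, strength))
        # Re-weight: examples the new rule gets wrong count more next round,
        # and examples it gets right count less.
        weights = [w * math.exp(-y * strength) if fires(pair, m, a) else w
                   for w, (m, a, y) in zip(weights, examples)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, motifs, active):
    """Sum the strengths of all pairings that apply: +1 up, -1 down, 0 no call."""
    score = sum(s for pair, s in model if fires(pair, motifs, active))
    return (score > 0) - (score < 0)


model = boost(EXAMPLES, rounds=2)
for pair, strength in model:
    print(pair, round(strength, 2))
```

In this toy setting, two rounds recover one pairing with a positive strength and one with a negative strength. The magnitudes play the role of the ranking described above: a pairing that explains more of the still-unexplained examples receives a larger weight.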

Wiggins and his colleagues can now go much further than yeast. Recently they have shown that MEDUSA can accurately build predictive models of gene regulatory networks in higher organisms such as worms as well as in several cell lines, including those of human lymphocytes. In a cancer cell line, the team can determine which genes increase their activity when they should decrease it, and vice versa. The ultimate goal, however, is to understand their coordinated activity and infer, with statistics, which interactions lead to a diseased cell.

Although MEDUSA makes accurate predictions on test data, there is still no way to know whether it faithfully reproduces real biological networks. To find out, each connection would have to be experimentally tested. It is also unclear how well microarray data measure expression levels, so accurate predictions may not necessarily reflect the truth. Moreover, machine learning forces researchers to formulate ad hoc hypotheses that may be biased toward their results, “so any kind of correlation in the data may be a fluke,” remarks Yoav Freund of the University of California, San Diego, who created MEDUSA’s learning algorithm.
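Freund's warning about flukes is easy to reproduce with pure noise. The following sketch assumes a made-up setting of 20 experiments and 2,000 candidate features, all drawn at random with no real relationship to the target, and reports the strongest correlation found among them. The sizes, the seed, and the helper `pearson` are illustrative assumptions, not figures from the MEDUSA work.

```python
import math
import random

random.seed(0)      # fixed seed so the run is repeatable
N_SAMPLES = 20      # hypothetical number of experiments
N_FEATURES = 2000   # hypothetical number of candidate features


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# A random "expression level" and many random features unrelated to it.
target = [random.gauss(0, 1) for _ in range(N_SAMPLES)]
strongest = max(
    abs(pearson([random.gauss(0, 1) for _ in range(N_SAMPLES)], target))
    for _ in range(N_FEATURES)
)
print("strongest correlation found in pure noise:", round(strongest, 2))
```

With so many candidates and so few samples, at least one random feature will look strongly correlated with the target. This is why a good fit alone cannot confirm a connection, and why held-out test data and experimental checks matter.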

To address these limitations, researchers must not only continue to cross disciplines but also be willing to adopt their tools. “I would say that machine learning hasn’t taken off like wildfire in the physics community,” remarks Alex Hartemink, a machine-learning expert at Duke University. “But Chris seems to be most comfortable reaching out and learning about techniques from other places. And I think we need people that are going to do that—foray out into the forest, find new resources and bring them back to the tribe and say, ‘Hey, guys, check this out—this is great stuff.’”



This article was originally published with the title At the Edge of Life's Code.




ABOUT THE AUTHOR(S)

Thania Benios is based in New York City.

