File-Swapping Veers into the Fast Lane

A new method for comparing files promises speedier downloads of music and movies
















FASTER FILES: Researchers have devised a new file-swapping scheme that could increase the number of data sources and make downloading faster. Image: © ISTOCKPHOTO/ANDREW MANLEY

A new file-swapping method could speed up downloads to rates as much as three times faster than the popular service BitTorrent. The approach, outlined and demonstrated last month by computer scientists at Carnegie Mellon University, Purdue University and Intel Research, would let file-swappers seeking a specific title download bits of it from similar, but not necessarily identical files. It works a little like an enterprising mechanic who uses spare parts from a Toyota to fix an old Ford.

The idea is already drawing interest from commercial content distribution companies, along with discussion in less formal peer-to-peer communities online.

"It makes an awful lot of sense," says Andrew Parker, chief technical officer of CacheLogic, which legally distributes movie and game files online. The company has been independently researching a "very similar" concept, he adds.

With high-definition online video just around the corner, proposals for speeding downloads and easing network traffic are increasingly welcome. File-swapping networks, rife with video, games and music, can provide a real-world laboratory with lessons for the broader Net.

In their quest for speed, most modern peer-to-peer systems break files—say, a copy of The Departed—into thousands of chunks and allow these individual components to be swapped separately. This allows someone with only half a movie downloaded to serve as a secondary source for that part of the content, for example.
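The chunking step can be illustrated with a minimal sketch. It assumes fixed-size chunks and SHA-256 hashes, neither of which the article specifies; real peer-to-peer clients use far larger chunks, and the function name `chunk_hashes` is hypothetical.

```python
import hashlib

def chunk_hashes(data: bytes, chunk_size: int = 4) -> list[str]:
    """Split data into fixed-size chunks and return one SHA-256 hex digest per chunk.

    chunk_size is tiny here only to keep the example readable (an assumption).
    """
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]

# A toy "file" of 14 bytes splits into AAAA, BBBB, CCCC and a short final chunk DD.
movie = b"AAAABBBBCCCCDD"
hashes = chunk_hashes(movie)
print(len(hashes))  # 4
```

Because each chunk carries its own hash, a peer holding only the first two chunks could still serve those two to someone else, who can verify each one independently.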

Many files can still take days to download, however, as original sources go offline, or as a source's upstream bandwidth clogs.

Aiming to fix this problem, Carnegie Mellon's David Andersen and his colleagues reasoned that many files online today are, in fact, near-duplicates with minor differences—identical songs labeled differently, movies in different languages or different versions of the same software programs, for example.

To test this, they downloaded all the versions they could find of 26 songs and 26 movies, resulting in more than 6,000 media files. Different versions of the same song wound up sharing about 99 percent of the same content, they found, while different versions of the same movies offered an average 15 percent overlap.
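One plausible way to put a number on such overlap is to count how many of one file's chunks also appear in another. The sketch below is an illustration only, not the researchers' measurement procedure; it assumes fixed-size chunks and SHA-256 hashes, and the names `chunk_set` and `overlap_percent` are hypothetical.

```python
import hashlib

def chunk_set(data: bytes, chunk_size: int = 4) -> set[str]:
    """Return the set of SHA-256 digests of the file's fixed-size chunks."""
    return {
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    }

def overlap_percent(target: bytes, other: bytes, chunk_size: int = 4) -> float:
    """Percentage of the target's distinct chunks that also occur in the other file."""
    target_chunks = chunk_set(target, chunk_size)
    if not target_chunks:
        return 0.0
    shared = target_chunks & chunk_set(other, chunk_size)
    return 100.0 * len(shared) / len(target_chunks)

version_a = b"AAAABBBBCCCCDDDD"
version_b = b"AAAAXXXXCCCCDDDD"  # differs from version_a in the second chunk only
print(overlap_percent(version_a, version_b))  # 75.0
```

Fixed-size chunks keep the example short, but a single inserted byte would shift every later chunk boundary and hide the overlap, so a practical system would likely need boundaries that depend on the content itself.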

To make this shared content accessible, the team created a "handprinting" system that assigns each file a unique digital identifier based on its exact contents. Unlike more traditional digital "fingerprinting," commonly used to identify or authenticate documents, handprinting also allows fast comparison of a limited number of individual chunks, which can then be swapped if found to be identical.

Each handprint can be thought of as a string of digits, with different parts corresponding to different chunks of data. Thus, if The Departed's handprint were "12 14 16 18 24," and its Spanish-language version, Los Infiltrados, produced "13 15 17 18 24," the second file could be used as a source for the chunks the two have in common (here, those labeled 18 and 24). Scenes without dialogue, for example, might be identical in both language versions.
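The comparison in this example can be sketched in a few lines. The code treats a handprint as a space-separated list of chunk identifiers, exactly as the article's simplified illustration does; the function name `shared_chunk_ids` is hypothetical, and an actual SET handprint would be built from chunk hashes rather than two-digit numbers.

```python
def shared_chunk_ids(handprint_a: str, handprint_b: str) -> list[str]:
    """Return the chunk identifiers from handprint_a that also appear in handprint_b."""
    ids_b = set(handprint_b.split())
    return [chunk_id for chunk_id in handprint_a.split() if chunk_id in ids_b]

departed = "12 14 16 18 24"
los_infiltrados = "13 15 17 18 24"

# Chunks 18 and 24 appear in both handprints, so the Spanish-language file
# could supply them to someone downloading The Departed.
print(shared_chunk_ids(departed, los_infiltrados))  # ['18', '24']
```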

Tests of the team's prototype, dubbed Similarity-Enhanced Transfer (SET), found it to be as much as three times faster than BitTorrent for songs and about 30 percent faster for movie files when drawing content from similar as well as identical files over DSL-speed connections. If many identical copies were already available, however, the advantage disappeared, making it useful for perhaps "half the content out there," Andersen says.

The concept may be difficult to add to existing file-swapping networks, because its file-splitting methods would likely make SET-enabled updates incompatible with earlier versions of today's swapping software. Nevertheless, the idea is being widely discussed on peer-to-peer forums and mailing lists. Parker says SET or something like it is "certain" to end up in CacheLogic's toolbox before long.


