The Inventors of the Internet Are Trying to Build a Truly Permanent Web

What would you do if you wanted to read something stored on a floppy disk? That's a future the web's inventors don't want to see for their own creation.
Vint Cerf. Photograph: April Greer/The Washington Post/Getty Images

If you wanted to write a history of the Internet, one of the first things you would do is dig into the email archives of Vint Cerf. In 1973, he co-created the protocols that Internet servers use to communicate with each other without the need for any kind of centralized authority or control. He has spent the decades since shaping the Internet's development, most recently as Google's "chief Internet evangelist."

Thankfully, Cerf says he has archived about 40 years of old email—a first-hand history of the Internet stretching back almost as far as the Internet itself. But you'd also run into a big problem: you just wouldn't be able to open a whole lot of that email. The programs Cerf used to write those messages, and the formats in which they're stored, simply don't work on any computer you'd likely be using to read them today.

As fragile as paper is, written documents and records have long provided historians with a wealth of insight about the past that often helps shape the present. And they don't need any special technology to read them. Cerf himself points to historian Doris Kearns Goodwin's 2005 bestseller Team of Rivals, which she based on the diary entries and letters of Abraham Lincoln and his cabinet members. The book influenced how President Obama shaped his own cabinet and became the basis for the Steven Spielberg film Lincoln. In short, old records are important. But as Cerf's own email obsolescence shows, digital communications quickly become unreadable.

Don't believe it? What would you do right now if you wanted to read something stored on a floppy disk? On a Zip drive? In the same way, the web browsers of the future might not be able to open today's webpages and images--if future historians are lucky enough to have copies of today's websites at all. Says Cerf, "I'm concerned about a coming digital dark ages."

That's why he and some of his fellow inventors of the Internet are joining with a new generation of hackers, archivists, and activists to radically reinvent core technologies that underpin the web. Yes, they want to make the web more secure. They want to make it less vulnerable to censorship. But they also want to make it more resilient to the sands of time.

The Permanent Web

Today, much of the responsibility for preserving the web's history rests on the Internet Archive. The non-profit's Wayback Machine crawls the web perpetually, taking snapshots that let you, say, go back and see how WIRED looked in 1997. But the Wayback Machine has to know about a site before it can index it, and it only grabs sites periodically. Based on the Internet Archive's own findings, the average webpage lasts only about 100 days. In order to preserve a site, the Wayback Machine has to spot it in that brief window before it disappears.

What's more, the Wayback Machine is a centralized silo of information—an irony that's not lost on the inventors of the Internet. If it runs out of money, it could go dark. And because the archives originate from just one web address, it's relatively easy for censors, such as those in China, to block users from accessing the site entirely. The Archive Team, an unrelated organization, is leading an effort to create a more decentralized backup of the Internet Archive. But if Internet Archive founder Brewster Kahle, Cerf, and their allies who recently came together at what they called the Decentralized Web Summit have their way, the world will one day have a web that archives itself and backs itself up automatically.

Some pieces of this new web already exist. The InterPlanetary File System, or IPFS, is an open-source project that taps into ideas pioneered by the decentralized digital currency Bitcoin and the peer-to-peer file-sharing system BitTorrent. Sites opt in to IPFS, and the protocol distributes files among participating users. If the original web server goes down, the site will live on thanks to the backups running on other people's computers. What's more, these distributed archives will let people browse previous versions of the site, much the way you can browse old edits in Wikipedia or old versions of websites in the Wayback Machine.
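To make that distribution model concrete, here is a minimal sketch of the content-addressing idea IPFS builds on: a file's address is derived from its bytes, so any participating peer that holds the same bytes can answer a request for it. This is an illustration only; real IPFS addresses are CIDs built from multihashes, not the bare SHA-256 digest used here.

```typescript
import { createHash } from "crypto";

// Illustration only: a content-derived address. IPFS wraps this idea in
// CIDs and multihashes; a bare SHA-256 hex digest stands in for that here.
function contentAddress(data: Buffer): string {
  return createHash("sha256").update(data).digest("hex");
}

const page = Buffer.from("<html><body>Hello, permanent web</body></html>");
const address = contentAddress(page);

// Any peer storing these exact bytes computes the same address, so a request
// for the address can be answered by whichever peer happens to be online.
// Changing the page yields a new address, which is what makes old versions
// browsable alongside new ones.
console.log(`ask the network for ${address}`);
```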

"We are giving digital information print-like quality," says IPFS founder Juan Benet. "If I print a piece of paper and physically hand it to you, you have it, you can physically archive it and use it in the future." And you can share that copy with someone else.

Right now IPFS is still a tool for only the most committed: you need to have IPFS's software installed on your computer to take part. But Benet says the team has already built a version of the software in JavaScript that can run in your browser without the need to install any new software at all. If it winds up on everyone's browsers, the idea goes, then everyone can help back up the web.
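As a rough sketch of what taking part looks like in code, assuming the js-ipfs `ipfs-core` package and its `create`, `add`, and `cat` calls (the exact API may differ between versions):

```typescript
import { create } from "ipfs-core";

async function main() {
  // Spin up an in-process IPFS node (in a browser, this would run inside the page).
  const node = await create();

  // Add a page; the returned CID is its content-derived address.
  const { cid } = await node.add("<html><body>Hello, permanent web</body></html>");
  console.log("published as", cid.toString());

  // Any node that knows the CID can fetch the same bytes back, from whichever
  // peers happen to be holding them.
  const chunks: Uint8Array[] = [];
  for await (const chunk of node.cat(cid)) {
    chunks.push(chunk);
  }
  console.log(new TextDecoder().decode(Buffer.concat(chunks)));

  await node.stop();
}

main();
```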

Unlike the early web, the web of today isn't just a collection of static HTML files. It's a rich network of interconnected applications like Facebook and Twitter and Slack that are constantly changing. A truly decentralized web will need ways not just to back up pages but also to preserve applications and data. That's where things get really tricky--just ask the team behind The DAO, the decentralized crowdfunding system that was hacked to the tune of $50 million last week.

The IPFS team is already hard at work on a feature that would allow a web app to keep trucking along even if the original server disappears, and it's already built a chat app to demonstrate the concept. Meanwhile, several other projects, such as Ethereum, ZeroNet, and the SAFE Network, aspire to create ways to build websites and applications that don't depend on a single server or company to keep running. And now, thanks in large part to the Summit, many of them are working to make their systems cross-compatible.

Why Bother?

Even if the web winds up in a new, better kind of digital archive, plenty of problems still remain. Today's web isn't just a collection of static HTML files; it's dynamic apps like Facebook, Twitter, and Slack. The operating systems and hardware of the future might not be able to read or run any of those. The same holds true for videos, photos, maybe even text.

Many efforts are afoot to address those weaknesses. But why bother?

After all, if anyone really cares about a specific file or site, can't they just transfer the files to newer media and convert the most important ones to newer formats? The problem with that line of thinking, Cerf says, is that people don't always know what's important right away. For example, sailors have kept meticulous records of weather and temperatures in locations all over the world for centuries. That information probably seemed useless at the time, the sort of thing geeks of old preserved out of a vague sense of historical purpose. But guess what: climate scientists may find all that weather data very valuable. (The Old Weather project is now hard at work digitizing those old ship logs.)

Still, some websites just shouldn't last forever. Does anyone in the future really need to see old drunken college photos or inadvisable Facebook rants? Meanwhile, activists and law enforcement are trying to stop web publishers from posting nude photos of people without their consent--a practice known as "revenge porn." The same preservation tools that could make it harder for governments to censor the web could also make it harder for people to scrub content that shouldn't be there in the first place. People like Snapchat for a reason.

Cerf suggests possible technical workarounds to this problem. Web publishers, for example, could specify whether other people can automatically archive their sites. Benet says the IPFS team has been considering a feature that would let the original publisher of a page un-publish it by sending a beacon to all other servers hosting the page, asking for its removal. IPFS servers could also maintain blacklists of copyrighted material to be removed. Still, those blacklists would themselves become a reminder of the things we're trying to forget.
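None of this is built yet. As a purely hypothetical sketch of the blacklist idea, a node might consult a shared denylist of content addresses before serving anything; the `DENYLIST` set and `serve` function below are invented for illustration and don't reflect any actual IPFS feature.

```typescript
// Hypothetical sketch: a node checks a shared denylist of content addresses
// before serving them. Nothing here reflects an actual IPFS feature.
const DENYLIST = new Set<string>([
  // addresses the network has agreed to stop serving, e.g. after a takedown
]);

function serve(address: string, store: Map<string, Uint8Array>): Uint8Array | null {
  if (DENYLIST.has(address)) {
    return null; // refuse to serve delisted content
  }
  return store.get(address) ?? null;
}
```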

But the biggest problem facing the decentralized web is probably neither technical nor legal: it's getting people to care in the first place. At a time when people spend most of their time in closed-off platforms like Facebook and Snapchat, so much of what humans digitally produce stays locked up anyway. Bringing people back to the open web will mean creating user experiences fun enough and easy enough to persuade them to venture out of the confines of today's app-centric Internet.

But Tim Berners-Lee, the creator of the original web, isn't worried. After all, the open web already beat out walled gardens with names like America Online, CompuServe, and Prodigy. "You can make the walled garden very very sweet," Berners-Lee said at the Summit. "But the jungle outside is always more appealing in the long term."