UC Berkeley Press Release

Amount of new information doubled in last three years, UC Berkeley study finds

If you feel like you're experiencing information overload, a team of University of California, Berkeley, researchers has a good idea why.

Worldwide information production increased by 30 percent each year between 1999 and 2002, according to the team led by professors Peter Lyman and Hal Varian of the School of Information Management and Systems.

"All of a sudden, almost every aspect of life around the world is being recorded and stored in some information format," said Lyman. "That's a real change in our human ecology."

The researchers' report, which will be presented today (Tuesday, Oct. 28) at an information storage industry conference in Orlando, Fla., is supported by Microsoft Research, Intel, HP and EMC.

According to the researchers, the amount of new information stored on paper, film, optical and magnetic media has doubled in the last three years. The new information produced in those forms during 2002 was equal in size to half a million new libraries, each containing a digitized version of the print collections of the entire Library of Congress, they added.
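The 30 percent annual growth rate and the three-year doubling are two views of the same trend. The short Python sketch below checks that arithmetic; it assumes simple yearly compounding over the three intervals from 1999 to 2002, and the function and variable names are illustrative rather than taken from the report.

```python
# Sanity check: does 30 percent growth a year for three years amount to a doubling?
# Assumption: growth compounds once per year (1999 -> 2000 -> 2001 -> 2002).

ANNUAL_GROWTH = 0.30  # rate quoted in the release
YEARS = 3             # 1999 to 2002


def growth_factor(rate: float, years: int) -> float:
    """Cumulative multiplier after compounding `rate` for `years` years."""
    return (1.0 + rate) ** years


factor = growth_factor(ANNUAL_GROWTH, YEARS)

# Annual rate that would give an exact doubling over the same period.
exact_doubling_rate = 2.0 ** (1.0 / YEARS) - 1.0

print(f"Cumulative growth over {YEARS} years: {factor:.2f}x")
print(f"Annual rate for an exact doubling: {exact_doubling_rate:.1%}")
```

Under that assumption the cumulative factor comes out slightly above two, so "doubled" reads as a rounded description of the 30 percent figure.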

The researchers also report that electronic channels - such as TV, radio, the telephone and the Internet - carried three and a half times more new information in 2002 than was recorded on storage media.

It's no surprise that the development of effective, reliable and cost-efficient strategies to store data is of increasing interest, and not just for commercial companies or for students downloading music.

Such storage is of growing importance to government agencies and institutions ranging from the Library of Congress, the Department of Homeland Security and the National Archives and Records Administration to the National Weather Service or NASA officials planning a mission to Mars.

"This study shows what an enormous challenge we and the rest of the information technology industry face in organizing, summarizing and presenting the vast amount of information mankind is accumulating," said Jim Gray, a Microsoft Bay Area Research Group distinguished engineer.

At Intel's storage components division, general manager Mike Wall agreed: "This calls for technology that can access and manage blocks of data the size of the Library of Congress to and from devices ranging from personal computers to PDAs anytime, anywhere, without losing as much as a bit."

Roy Sanford, EMC's vice president of content-addressed storage, said: "The study highlights the challenge of how to manage all their information according to its value at every stage of its life - from creation and protection to archive and disposal. It calls for application level integration directly into the storage infrastructure to allow for a policy-based, proactive management of the relentless growth of structured and unstructured data."

And Jeff Jenkins, director of marketing for storage, networking, and infrastructure with HP Industry Standard Servers, noted that today's enterprises face increasing complexities in managing and storing explosive data volumes generated from new financial legislation, disaster recovery implementations, real-time business communications and the Internet economy.

Because most new information is being stored in digital form while other, older formats are giving way to digital formats or are being digitally archived, the researchers chose to use the digital measurement of the terabyte as their standard gauge. A terabyte is a unit of computer data storage equal to a million megabytes, or roughly the text content of a million books. With the amounts of information so massive, however, the team also measured information by exabyte, or a million terabytes.
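The units described above can be written down as a small conversion sketch. It assumes the decimal definitions given in the release (a terabyte as a million megabytes, an exabyte as a million terabytes); the helper names are hypothetical.

```python
# Unit conversions using the decimal definitions quoted in the release.
MB_PER_TB = 1_000_000  # a terabyte is a million megabytes
TB_PER_EB = 1_000_000  # an exabyte is a million terabytes


def exabytes_to_terabytes(exabytes: float) -> float:
    """Convert exabytes to terabytes."""
    return exabytes * TB_PER_EB


def exabytes_to_megabytes(exabytes: float) -> float:
    """Convert exabytes to megabytes by way of terabytes."""
    return exabytes_to_terabytes(exabytes) * MB_PER_TB


print(f"1 exabyte = {exabytes_to_terabytes(1):,} terabytes")
print(f"1 exabyte = {exabytes_to_megabytes(1):,} megabytes")
```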

"Remember, it's not knowledge, just data," cautioned Lyman. "It takes thoughtful people using smart technologies to figure out how to make sense of all this information."

Among key findings in the report:

  • The amount of new information stored on paper, film, optical and magnetic media reached about five exabytes - or 5 million terabytes - in 2002, compared to about half that in 1999.
  • Some 92 percent of new information is stored on magnetic media, primarily hard drives.
  • New information flowing electronically on the telephone, radio, television and the Internet in 2002 totaled nearly 18 exabytes.
  • The phone accounts for the largest percentage of information flow, with e-mail placing second.
  • While original information on paper continues to grow, most comes in the form of office documents and mail - not books, newspapers and journals.
  • North Americans each consume about 24 reams, or 11,916 sheets, of paper a year, while residents of the European Union each consume about 15 reams, or 7,280 sheets.
  • Peer-to-peer file sharing has exploded, and MP3 music files and digital video accounted for 70 percent of the files on the hard disks of users who participate in online file exchanges.
  • Globally, the average Internet user spends 11.5 hours online per month, but the average Internet user in the United States spends more than twice that amount.
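Several of the figures in this release can be cross-checked against one another. The Python sketch below does so using only numbers quoted here, with two assumptions flagged in the comments: "nearly 18 exabytes" is treated as 18, and a ream is taken to be 500 sheets, the common convention, which the release does not state.

```python
# Cross-check of figures quoted in the release (rounded inputs, so results are approximate).
STORED_EB = 5            # new stored information in 2002, in exabytes
FLOW_EB = 18             # electronic flows in 2002; "nearly 18" treated as 18 (assumption)
LIBRARIES = 500_000      # "half a million new libraries"
TB_PER_EB = 1_000_000    # an exabyte is a million terabytes
SHEETS_PER_REAM = 500    # assumption: standard ream size, not stated in the release

# Ratio of information flowing through electronic channels to information stored.
flow_to_stored = FLOW_EB / STORED_EB

# Size of one Library of Congress print collection implied by the comparison.
implied_library_tb = STORED_EB * TB_PER_EB / LIBRARIES

# Per-person paper consumption converted from sheets to reams.
reams_north_america = 11_916 / SHEETS_PER_REAM
reams_european_union = 7_280 / SHEETS_PER_REAM

print(f"Flow-to-stored ratio: {flow_to_stored:.1f}")
print(f"Implied library size: {implied_library_tb:.0f} terabytes")
print(f"North America: {reams_north_america:.1f} reams per person")
print(f"European Union: {reams_european_union:.1f} reams per person")
```

On these inputs the ratio lands close to the "three and a half times" cited earlier, and the sheet counts round to the 24 and 15 reams given in the list.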

The decreasing cost of digital storage using magnetic hard drives and optical storage media, such as the CD-ROM and DVD, explains the surge in person-to-person retrieval and storage on the Internet of music and video, Lyman said.

This has made the Web akin to a utility that offers easy, steady access not just to institutions, but to the individual, he said.

"The democratization of publishing is something that we thought would happen, and it has happened," Lyman said.

The report is available online. The research team included UC Berkeley research assistants Kirsten Swearingen, Joyojeet Pal, Laheem Lamar Jordan, Peter Charles and Nathan Good.

The researchers relied on various information sources and reports, and conducted their own sampling of nearly 10,000 websites to help determine the Web's size and sources. They also studied desktop disk drives to learn how people consume Internet information.