Wednesday, November 12, 2008

Hadoop: Powerful Software, Funny Name

Behind Yahoo's (Nasdaq: YHOO) push to open up Web search and advertising is software powerful enough to sort through the entire Library of Congress in less than half a minute.

The software, called "Hadoop," is part of Yahoo's massive computing grid and is transforming the way that Yahoo and corporate giants like IBM (NYSE: IBM) extract meaning from enormous streams of data. Universities are also using the code, an open source version of software Google (Nasdaq: GOOG) relies on for daily operation, to train a new generation of computer scientists and engineers.

"It makes it possible to actually take advantage of all the computers that we have hooked together," said Larry Heck, vice president of search and advertising sciences at Yahoo.

Data Sniffer
Hadoop improves the relevance of ads Yahoo shows on the Internet by analyzing the company's endless flow of data -- now well over 10 terabytes a day -- on the fly. As users click from Yahoo Mail to Yahoo Search to Yahoo Finance and back again, Hadoop helps figure out what ad, if any, is likely to catch someone's attention.

The key lies in mining insights from mind-boggling amounts of data. If a woman repeatedly reads reviews of sport-utility vehicles, then clicks on automotive classifieds and then orders a book about helping a child adjust to kindergarten, she might be in the market for a new family-size car, according to a Yahoo sales presentation.

As part of the push for more openness, Yahoo will use the technology to boost ad sales not only on its own Web sites but also on sites owned by the 796 members of a newspaper consortium that is working with the search giant to sell more advertising at better prices. The San Jose Mercury News and its parent company, MediaNews, are members of the partnership.

"In some ways, perhaps it is even more targeted than search advertising," said Leon Levitt, vice president of digital media for Cox Newspapers, a consortium member.

Search Builder

For Yahoo, the rollout of an innovative approach to Internet advertising is a major accomplishment. When Yahoo launched its Hadoop project in January 2006, it was selling search advertising for half of what Google charged and watching its share of Internet searches dwindle.

Hadoop was first put to work building Yahoo's Web index, the biggest computing problem inside Yahoo. Since then, a team of engineers has tuned the software, and researchers inside and outside of Yahoo have begun using it to experiment on giant data sets.

"All of a sudden, instead of waiting overnight people could get the results of their experiments in a minute," said Doug Cutting, a work-at-home dad who hacked out the first version of Hadoop in his spare bedroom in Sonoma County, Calif., as part of an open source search project.

Code Legacy
Cutting, a 44-year-old programmer who had helped build search engines at Apple (Nasdaq: AAPL) and Excite, had started the search project in 2000 because he wanted his code to live on. He knew that closed-source projects, where software is treated as a corporate secret, had a way of dying. With open source, the code is published and other programmers can contribute suggestions and help fix bugs.

"It was a pretty ambitious goal, destined for failure in the short term but still worth pursuing in the long term," Cutting said. Plugging away with a core group of volunteers and with support from the Apache Software Foundation, Cutting created a library of code he called "Lucene" and a Web crawler he called "Nutch."

Meanwhile, he earned a living as a consultant for organizations like the Internet Archive and companies like Yahoo. Cutting made some progress but was stymied by the sheer size of the Web. He was able to index only several hundred million Web pages, a fraction of a Web that already held billions of pages and was expanding quickly.

Light-Bulb Moment
It was Google that inadvertently supplied the solution. In 2004, Google fellows Jeffrey Dean and Sanjay Ghemawat published a paper about MapReduce, the secret software that Google uses to process raw data using thousands of computers. "It pretty much directly addressed the scaling issue we were having," Cutting said.
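For readers curious what the MapReduce model looks like, here is a minimal single-process sketch in Python, assuming the classic word-count task. It is only an illustration of the programming model described in the Google paper, not Hadoop's actual API (Hadoop itself is written in Java), and the function names `map_phase`, `shuffle`, `reduce_phase` and `map_reduce` are hypothetical. A map step emits (key, value) pairs, a shuffle groups the values by key, and a reduce step combines each group; on a real cluster the map and reduce steps would be spread across many computers.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    # Emit a (word, 1) pair for every word in one input record.
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all emitted values by key, as the framework would between phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Combine every value collected for one key into a single result.
    return key, sum(values)


def map_reduce(documents: Iterable[str]) -> Dict[str, int]:
    # On a cluster, each document could be mapped on a different machine;
    # in this sketch everything runs sequentially in one process.
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(map_reduce(docs))
```

The point of splitting the work this way is that each map call and each reduce call touches only its own slice of data, so a framework can hand them to thousands of machines independently and only needs to coordinate at the shuffle.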

Using the clues provided by the Google paper, Cutting wrote Hadoop, which was named after his son's toy elephant. Yahoo saw the code and offered Cutting a job.

While a team of engineers adapted Hadoop to run reliably on tens of thousands of computers, researchers embraced the software as a new data mining tool. Word about the brawny program spread rapidly. Early this year, developers at Amazon (Nasdaq: AMZN), Facebook and Intel (Nasdaq: INTC) were using Hadoop for everything from log analysis to modeling earthquakes.

"Hadoop gave me, an ordinary developer, the ability to do something extraordinary," said Jinesh Varia, a Web services evangelist at Amazon.

Google quickly got on board, launching an initiative with IBM to provide universities like Stanford, UC-Berkeley, MIT and Carnegie Mellon with clusters of several hundred computers so students could learn new techniques for parallel programming. Since Google's MapReduce was a trade secret, Google and IBM announced that the students would be taught on Hadoop.

"We are leveraging not only the contribution that we are giving to the software, but the contributions from the larger community as well, and everybody wins from it," said Heck of Yahoo.
