Thursday, April 30, 2009

Web tool 'as important as Google' out

A web tool that "could be as important as Google", according to some experts, has been shown off to the public.

Wolfram Alpha is the brainchild of British-born physicist Stephen Wolfram.

The free program aims to answer questions directly, rather than display web pages in response to a query like a search engine.

The "computational knowledge engine", as the technology is known, will be available to the public from the middle of May this year.

"Our goal is to make expert knowledge accessible to anyone, anywhere, anytime," said Dr Wolfram at the demonstration at Harvard University's Berkman Center for Internet and Society.

The tool computes many of the answers "on the fly" by grabbing raw data from public and licensed databases, along with live feeds such as share prices and weather information.

People can use the system to look up simple facts - such as the height of Mount Everest - or crunch several data sets together to produce new results, such as a country's GDP.

Other functions solve complex mathematical equations, plot scientific figures or chart natural events.

"Like interacting with an expert, it will understand what you're talking about, do the computation, and then present you with the results," said Dr Wolfram.

As a result, much of the data is scientific, although there is also limited cultural information about pop stars and films.

Dr Wolfram said the "trillions of pieces of data" were chosen and managed by a team of "experts" at Wolfram Research, who also massage the information to make sure it can be read and displayed by the system.

Nova Spivack, founder of the web tool Twine, has described Alpha as having the potential to be as important to the web as Google.

"Wolfram Alpha is like plugging into a vast electronic brain," he wrote earlier this year. "It computes answers - it doesn't merely look them up in a big database."

Learning language

The new tool uses a technique known as natural language processing to return answers.

This allows users to ask questions of the tool using normal, spoken language rather than specific search terms.

For example, a relatively simple search, such as "who was the president of Brazil in 1923?", will return the answer "Artur da Silva Bernardes".

This technique has long been the holy grail of computer scientists who aim to allow people to interact with computers in an instinctive way.

Dr Wolfram said that Alpha has solved many of the problems of interpreting people's questions.

"We thought there would be a huge amount of ambiguity in search terms, but it turns out not to be the case," he said.

In addition, he said, the system had got "pretty good at removing linguistic fluff", the kinds of words that are not necessary for the system to find and compute the relevant data.
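As a rough illustration of what removing "linguistic fluff" can mean, the sketch below drops common function words from a query and keeps the content-bearing terms. The word list and the strip_fluff function are assumptions made for this example; they are not how Wolfram Alpha works internally, which the article does not describe.

```python
# Hypothetical list of "fluff" words, chosen only for this illustration.
FLUFF = {"who", "was", "the", "of", "in", "what", "is", "a", "an"}


def strip_fluff(query):
    """Lower-case the query, drop the question mark and remove fluff words."""
    words = query.lower().replace("?", "").split()
    return [word for word in words if word not in FLUFF]


print(strip_fluff("Who was the president of Brazil in 1923?"))
# ['president', 'brazil', '1923']
```

The three remaining terms resemble the keyword-style "concepts" that people already type into search engines.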

Simple text

However, he said, most users tend to stop using structured sentences fairly quickly.

"Pretty soon they get lazy, and they say 'I don't need all those extra words'."

Instead they tended to use "concepts" similar to how most people use search engines today.

But Dr Boris Katz of the Massachusetts Institute of Technology, a natural language expert, said he was "disappointed" by Dr Wolfram's "dismissal of English syntax as 'fluff'".

For example, he said, suppose someone asks "When did Barack Obama visit Nicolas Sarkozy?"

"Here, understanding the sentence structure is important if you want to be able to distinguish cases where it was Barack Obama who visited Nicolas Sarkozy from cases where it was Nicolas Sarkozy who visited Barack Obama," he said.

"I believe he is misguided in treating language as a nuisance instead of trying to understand the way it organises concepts into structures that require understanding and harnessing."

Dr Katz is the head of the Start project, a natural language processing tool that claims to be "the world's first web-based question answering system". It has been on the web since December 1993.

Like Alpha, the system searches a series of organised databases to return relevant answers to search queries. However, it only uses public databases and runs on a much smaller scale than Alpha.

Dr Katz said it answers "millions of questions from hundreds of thousands of users from around the world" on topics as diverse as places, movies, people and dictionary definitions.

It is also able to compute answers from several sources in a similar way to Alpha.
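Dr Katz's point about sentence structure can be illustrated with a small sketch. Treated as an unordered bag of words, the two Obama and Sarkozy questions are identical; only a representation that keeps word order can tell who visited whom. Both functions below are hypothetical and deliberately naive (the second assumes the fixed pattern "When did X visit Y?"), and neither reflects how Start or Alpha actually parses a question.

```python
def bag_of_words(question):
    """Unordered set of words: all information about word order is lost."""
    return frozenset(question.lower().rstrip("?").split())


def visit_relation(question):
    """Naive parse of 'When did <subject> visit <object>?' into a triple."""
    words = question.rstrip("?").split()
    verb_index = words.index("visit")
    subject = " ".join(words[2:verb_index])  # words after "When did"
    visited = " ".join(words[verb_index + 1:])
    return (subject, "visit", visited)


q1 = "When did Barack Obama visit Nicolas Sarkozy?"
q2 = "When did Nicolas Sarkozy visit Barack Obama?"

print(bag_of_words(q1) == bag_of_words(q2))  # True: the questions look the same
print(visit_relation(q1))  # ('Barack Obama', 'visit', 'Nicolas Sarkozy')
print(visit_relation(q2))  # ('Nicolas Sarkozy', 'visit', 'Barack Obama')
```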

Web companies have also harnessed natural language processing.

For example, Powerset uses technology developed at the Palo Alto Research Center, the former research laboratories of Xerox.

The company is attempting to build a similar search engine "that reads and understands every sentence on the Web".

In May 2008, the company released a tool that allowed people to search parts of Wikipedia. Two months later, it was acquired by Microsoft.

Dr Wolfram said he has been working on Alpha for several years. However, he imagines that it will continue to evolve.

"In a sense we are at the beginning," he said.
