Subject: A challenger to Google: Clusty -- 林小筑
http://clusty.com/
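Simply put, this search engine's selling point is that it automatically organizes search results into categories (clustering). For example, if you search for "java", it automatically divides the results into the following categories.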
⇨Technology (32)
⇨Open Source (16)
⇨FAQ, Java programming (16)
⇨JavaScript (22)
⇨Tutorials (14)
⇨Java Applets (17)
⇨Games (13)
⇨Download Java (6)
⇨Reviews (9)
⇨Class (8)
Some of these categories can be expanded into smaller subcategories. For example, expanding the Technology category gives the following.
Technology (32)
⇨Developer Forums (2)
⇨Mobile, Information Device Profile (3)
⇨Marketplace For Java Technology (2)
⇨Servlets, XML (3)
⇨Microsoft (2)
⇨Apple, Mac (2)
⇨Certification Java (2)
⇨Java Programming (3)
⇨Other Topics (13)
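The classification is done with artificial intelligence techniques rather than by hand, so the results are naturally not perfect. But it embodies a fresh idea: what should we do when the amount of information online reaches flood levels? We should use computers to help people filter and organize that information.
In fact, Google already has something similar: its news aggregator. http://news.google.com.hk/news?ned=cn&hl=zh-CN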
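To make the idea of automatic grouping concrete, here is a minimal sketch in Python. It is not Clusty's or Vivisimo's actual algorithm, which the thread does not describe; it simply assumes that each result title can be filed under the keyword it shares with the most other titles. The function names, the stop-word list, and the sample titles are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical stop words; a real system would use a far larger list.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}


def tokenize(text):
    """Lowercase a title and keep its alphabetic, non-stop words."""
    words = []
    for raw in text.lower().split():
        word = raw.strip(".,:;!?()\"'")
        if word.isalpha() and word not in STOP_WORDS:
            words.append(word)
    return words


def cluster_by_keyword(titles, query):
    """File each title under the keyword it shares with the most titles."""
    query_words = set(tokenize(query))

    # Count how many titles each candidate keyword appears in.
    doc_freq = Counter()
    for title in titles:
        doc_freq.update(set(tokenize(title)) - query_words)

    clusters = defaultdict(list)
    for title in titles:
        candidates = set(tokenize(title)) - query_words
        # Only keywords shared by at least two titles can name a cluster.
        shared = [w for w in candidates if doc_freq[w] >= 2]
        if shared:
            # Highest document frequency wins; ties break alphabetically.
            label = min(shared, key=lambda w: (-doc_freq[w], w))
        else:
            label = "other topics"
        clusters[label].append(title)
    return dict(clusters)


if __name__ == "__main__":
    sample_titles = [
        "Java tutorials for beginners",
        "Advanced Java tutorials",
        "Download Java runtime",
        "Java games online",
        "Free Java games",
        "Download Java games",
        "Java certification exam",
    ]
    groups = cluster_by_keyword(sample_titles, "java")
    for label, members in sorted(groups.items()):
        print(f"{label} ({len(members)})")
```

The query term itself is excluded from the labels because every result contains it, so it cannot distinguish one group from another. Titles that share no keyword with any other title fall into a catch-all group, much like the "Other Topics" entry in the listing above.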
New Company Starts Up a Challenge to Google
September 30, 2004
By JOHN MARKOFF
SAN FRANCISCO, Sept. 29 - Google executives have long
conceded that one of their great fears is to be overtaken
by a more advanced Internet search technology. Vivisimo, a
company founded by three former Carnegie Mellon University
computer scientists, is hoping to prove that Google's
worries are well founded.
Four-year-old Vivisimo plans to start Clusty, a free,
consumer search service based on results from Yahoo's
Overture engine, Thursday.
Vivisimo already offers a search service for corporate
customers, which clusters results into categories to make
them easier to sort through. Search "swift boat," for
example, and Vivisimo returns 149 results - listing them
one by one, and also as a table of categories, like "Swift
Boat Veterans," "John Kerry" and "Patrol Craft Fast" on the
left-hand side of the Web page.
The new Clusty service for consumers, which will be free
and supported by advertising revenue, uses a similar
organizational structure. But it also presents a series of
tabs enabling the user to see results from sources besides
the general Web, including shopping information, yellow
pages, news, blogs, and images.
Vivisimo, which is privately held and is profitable,
according to its executives, has been selling its
clustering technology to corporations for research by their
employees. Now Vivisimo is making an effort to compete more
broadly by attracting consumers to its Web site,
clusty.com.
The service is meant to address the confusion that can be
created when search engines return huge lists. Clustering
is also intended to help users find related material they
may overlook when they employ services that utilize page
ranking methods. Such methods employ a variety of software
algorithms to rank Web pages by their perceived relevance
to a query.
Many search experts say that clustering offers a better way
of looking at information than Google's page ranking
system.
"As databases get larger, trying to pull the proverbial
needle out of the haystack gets tougher and tougher," said
Gary Price, a librarian who is also the news editor at
SearchEngineWatch, a Web site that covers the industry.
"Here, you're getting a bit of extra help."
Vivisimo's co-founder and chief executive, Raul
Valdes-Perez, was a protégé of Herbert A. Simon, a Nobel
laureate who was a pioneer in artificial intelligence
research. Before co-founding Vivisimo, Mr. Valdes-Perez was
a computer scientist at Carnegie Mellon University. He
professes that the way to deal with information overload is
with information "overlook" - techniques that strip away
extraneous information.
Clusty would generate money for Vivisimo by placing several
search-related advertisements from Overture on the
right-hand side of each page. Revenue from the ads would be
shared by Vivisimo and Overture.
Unlike many start-ups, which are launched with venture
capital financing, Vivisimo was created with help from a $1
million grant from the National Science Foundation Small
Business Innovation Research program, which is intended to
stimulate innovation by new companies.
Vivisimo is not the first to introduce clustering for Web
surfers. Northern Light, a search engine company founded in
1996, had offered a consumer service featuring what it
called "custom search folders." But that company is now
focused on corporate applications.
Google is also using clustering technology, but in a more
limited fashion: its news page provides links to topics
that appear on news sites.
Microsoft and Yahoo have been drawn into the search
business in part because of Google's profitability and
rapidly growing revenue - $962 million for the quarter that
ended in June, up from $389 million in the previous
quarter.
The introduction of Clusty comes two weeks after A9, a
subsidiary of Amazon.com, introduced a service focused on
organizing information retrieved during various Web
searches.
"Search will look more like the magazine business than the
soda market," said Oren Etzioni, a computer scientist at
the University of Washington and an advisory board member of
Vivisimo. He predicts that users might select from a
variety of services, rather than from a few dominant
players.
"The competition has shifted from crawling the Web and
returning an answer quickly," Mr. Etzioni said, "to adding
value to the information that has been retrieved."
A Google spokesman declined to comment on the service.
Vivisimo's executives are betting that there is an audience
for providing a different view of Web search results.
"Google is excellent at crawling as much of the Web as they
can; we don't do that," said Mr. Valdes-Perez. Instead,
Vivisimo tackles the question, "How do you solve the
problem of information overload?"
http://www.nytimes.com/2004/09/30/technology/30search.html?ex=1097903707&ei=1&en=87e20490beecdd4b
It feels better organized than Google. Now we have one more weapon in our arsenal.
But it doesn't support Chinese!
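According to the description, it is probably still based on text/context and keywords. Research in this area has been going on for a long time (much longer than search engines have existed, in fact), but it has never been put into practice across the whole Web the way Google has been.
A recent project of mine does something similar: adding a categorization option to a search engine.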
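As a rough illustration of what "based on text and keywords" can mean, here is a minimal sketch that compares two snippets by the cosine similarity of their word counts. This is a standard textbook measure and only an assumption about the kind of technique involved; nothing in the thread confirms what Clusty uses. The helper names are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Turn a text into a word-count vector (a Counter keyed by word)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors, from 0 to 1."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    first = bag_of_words("java applet tutorial")
    second = bag_of_words("java servlet tutorial")
    print(cosine_similarity(first, second))
```

Snippets whose similarity exceeds some chosen threshold could then be placed in the same category; the threshold itself would need tuning on real data.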
I couldn't find any technical description on Clusty. Did you find one?
Could you point me to some material on the techniques you mentioned?
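I just tried a search in Chinese. Its Chinese coverage still can't compare with Google or Baidu. It also has no cached page snapshots. Some of the links turn out to be long dead when clicked, whereas with Google or Baidu you can at least learn something from the cached snapshot.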
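I think Google is enough for the ordinary dummy user. This new search engine may be more useful to people doing research: for example, the categories make it possible to do a thorough review of the search results.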
Overture's results are of course no match for Google's.
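Clusty only points in a direction: because there is simply too much information, the computer should do some filtering and distilling before returning search results, so that they are more useful to people. This shows that search engines still have plenty of room to grow.
Google has a deep pool of talent. Building something similar to Clusty (which presumably relies on natural language processing and machine learning, both Google strengths) should not be hard for them, and they could probably do it even better.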