Today's web is so huge and diverse that it arguably
reflects the real world. For this reason, searching the web
is a promising approach to finding things in the real world.
This paper presents NEXAS, an
extension to web search engines that attempts to find real-world
entities relevant to a topic. Its basic idea is to extract proper
names from the web pages retrieved for the topic. A main advantage of
this approach is that users can query any topic and learn about
relevant real-world entities without dedicated databases
for the topic. In particular, we focus on an application
for finding authoritative people on the web. This application is
practically important because once personal names are obtained,
they can lead users from the web to managed
information stored in digital libraries.
To explore effective ways of finding people, we first examine the
distribution of Japanese personal names by analyzing about 50 million
Japanese web pages. We observe that personal names appear frequently
on the web, but the distribution is highly influenced by automatically
generated texts. To remedy this bias and
find widely acknowledged people accurately, we utilize the number of
web servers containing a name instead of the number of web pages. We
show the effectiveness of this measure through an experiment covering a wide range of
topics. Finally, we demonstrate several examples and suggest possible
applications.
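
The server-count idea described above can be illustrated with a minimal sketch. It is not the NEXAS implementation: the function name `rank_names_by_server_count`, the input format (pairs of a URL and a list of already-extracted names), and the use of the URL hostname as a stand-in for "web server" are all assumptions made for illustration. The sketch counts, for each name, the number of distinct hosts on which it appears, so that a single site generating many pages contributes only once.

```python
from collections import defaultdict
from urllib.parse import urlparse


def rank_names_by_server_count(pages):
    """Rank names by the number of distinct hosts that mention them.

    `pages` is an iterable of (url, names) pairs, where `names` is a list of
    personal names already extracted from that page (hypothetical input
    format; name extraction itself is outside this sketch).
    Returns a list of (name, server_count, page_count) tuples, sorted by
    server_count in descending order, with ties broken alphabetically.
    """
    hosts_by_name = defaultdict(set)   # name -> set of hostnames
    pages_by_name = defaultdict(int)   # name -> number of pages, for comparison

    for url, names in pages:
        host = urlparse(url).hostname  # assumption: hostname approximates a server
        if host is None:
            continue                   # skip URLs without a recognizable host
        for name in set(names):        # count each name at most once per page
            hosts_by_name[name].add(host)
            pages_by_name[name] += 1

    ranked = sorted(hosts_by_name, key=lambda n: (-len(hosts_by_name[n]), n))
    return [(n, len(hosts_by_name[n]), pages_by_name[n]) for n in ranked]
```

With this counting, a name repeated on five automatically generated pages of one host scores 1, while a name mentioned once on each of three different hosts scores 3, even though a page-based count would rank them the other way around.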