Our research focuses on mining Web data resources, such as anchor texts and query logs, to discover knowledge that is useful for improving Web search engines and information extraction systems.

Projects

LiveTrans

http://wkd.iis.sinica.edu.tw/LiveTrans/
Wen-Hsiang Lu, Ruey-Cheng Chen, Pu-Jen Cheng, Lee-Feng Chien

The Web gives users convenient access to a great deal of information; however, users often do not know the exact translation of a foreign proper noun between their native language and English or another foreign language.

LiveTrans lets users search for information about a foreign proper noun without first knowing its exact translation. For example, tour guide information about "The Great Wall" is usually found on Chinese Web pages; with our LiveTrans system, pictures of "The Great Wall" can be obtained easily without knowing the exact translation.

LiveImage

http://wkd.iis.sinica.edu.tw/LiveImage/
Ruey-Cheng Chen, Shuo-Peng Liao, Pu-Jen Cheng, Lee-Feng Chien

For more information, visit the system at the address above.

LiveConcept

http://wkd.iis.sinica.edu.tw/LiveConcept/
Chen-Ming Hung, Lee-Feng Chien

For more information, visit the system at the address above.

LiveClassifier

Chien-Chung Huang, Guan-Ming Ling, Lee-Feng Chien

Many Web information services use information extraction (IE) techniques to collect important facts from the Web. One way to build more advanced services is to discover thematic information in the collected facts through text classification. However, most conventional text classification techniques rely on manually labeled corpora and are thus ill-suited to open-domain Web information services. In this work, we present a system named LiveClassifier that automatically trains classifiers from Web corpora based on user-defined topic hierarchies. Because of its flexibility and convenience, LiveClassifier can be easily adapted for various purposes. New Web information services can be built to fully exploit it, and human users can use it to create classifiers for their personal applications.
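The idea of training classifiers from a user-defined topic hierarchy without hand-labeled data can be illustrated with a minimal sketch. This is not the LiveClassifier implementation: the topic hierarchy, the toy texts standing in for Web-retrieved corpora, the function names, and the choice of a multinomial naive Bayes classifier are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical user-defined topic hierarchy (not taken from LiveClassifier).
TOPIC_HIERARCHY = {
    "Science": ["Physics", "Biology"],
    "Sports": ["Tennis"],
}

# Stand-in for text retrieved from the Web for each topic name. A real system
# would fetch this; toy strings are used so the sketch runs offline.
TOY_WEB_TEXT = {
    "Physics": ["quantum particle energy field", "energy mass particle physics"],
    "Biology": ["cell gene protein organism", "gene dna cell evolution"],
    "Tennis": ["racket serve court match", "match serve tournament court"],
}


def leaf_labels(hierarchy):
    """Map each leaf topic to a 'Parent/Child' class label."""
    return {
        child: f"{parent}/{child}"
        for parent, children in hierarchy.items()
        for child in children
    }


def train(hierarchy, corpus):
    """Collect word counts per class. No manual labeling is needed: each
    document is labeled by the topic whose name was used to retrieve it."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for topic, label in leaf_labels(hierarchy).items():
        for doc in corpus.get(topic, []):
            word_counts[label].update(doc.lower().split())
            doc_counts[label] += 1
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the most likely class label under multinomial naive Bayes
    with add-one smoothing; words outside the vocabulary are ignored."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word in vocab:
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TOPIC_HIERARCHY, TOY_WEB_TEXT)
print(classify("gene and protein in the cell", model))  # Science/Biology
```

In this sketch the only user input is the hierarchy itself; swapping in a different hierarchy and a different text source would yield a different classifier without any labeling effort.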

LiveFilter

Jeng-Haur Wang, Lee-Feng Chien

As email has become a necessary part of daily work in this century, spam and advertising email have become one of the greatest nuisances in daily life. Filtering out such unwanted email is therefore a popular research topic.

Natural Language Processing

Chen-Ming Hung, Yi-Cheng Pan, Lee-Feng Chien

Chinese natural language processing is expected to be one of the most popular research fields in the near future. Building on the techniques developed in the WKD Lab, we have also begun research on natural language processing; unlike traditional approaches, however, we use the Web as the source of our corpora. With a corpus as large as the Web, we expect more exciting discoveries.

Last Update: Tue Aug 14 16:11:08 CST 2007