Our research focuses on mining Web data resources, such as Web anchor texts and query logs, to discover useful knowledge for improving Web search engines and information extraction systems.

About Us

History

The lab was founded by Dr. Lee-Feng Chien in 1993. It was originally known as the CSMART team and the Intelligent Information Retrieval Lab. Nine colleagues currently work with the lab: two postdoctoral fellows, three Ph.D. candidate researchers, and four master's-level research associates (two of whom are adjunct associates).

Previous Work

Our early research explored fundamental techniques for Chinese natural language processing (NLP) and information retrieval (IR). We studied Chinese parsing [1], language modeling, and Mandarin speech recognition [2~4] for many years and gained solid experience in these areas. Since 1995, we have devoted much attention to Chinese text retrieval [5~7] and developed an effective retrieval system called Csmart, which was successfully transferred to industry in Taiwan during 1995 and 1996. We began studying Mandarin spoken document retrieval in 1997 [8~11] and were among the first teams to address robust retrieval of spoken documents via voice queries. In 1997, the director of the lab received the first K. T. Li Distinguished Young Scholar Award, presented by the ACM Taipei Chapter, for his contributions to Chinese information retrieval.

Terminology processing is a fundamental research area for advanced IR. Recent trends in Web information retrieval, such as domain-specific term extraction from Web pages and automatic thesaurus construction for Web search, have increased the need for automatic terminology processing. Since 1997, we have conducted a series of studies on automatic terminology processing, including term extraction and terminological knowledge extraction [12~27], and this work has gained international recognition. We developed an innovative PAT-tree-based approach to term extraction, which received the 1998 ACM SIGIR Best Poster Presentation Award for late-breaking research at the conference in Australia (http://www.acm.org/sigir/awards/awards.html).

Recent Research

Our recent research has focused on Web mining and information retrieval. We have conducted a series of studies on mining Web data resources, such as Web anchor texts and query logs, to discover terminological knowledge for improving Web search engines. Our team presented a novel and effective log-based approach to relevant term extraction and term suggestion [18, 19]. In addition, we proposed an anchor-text-based technique that exploits Web anchor texts as live bilingual corpora and reduces the difficulties involved in query term translation [20~22]. The technique makes it possible to extract bilingual translations of many users' queries automatically by mining Web anchor texts. At the same time, we have worked to discover users' search vocabularies and concept hierarchies by means of query log mining. We have developed a set of log mining techniques, including query clustering and new query categorization, to create query taxonomies automatically [23~24]. These techniques can organize users' search vocabularies into hierarchical structures, extract relevant terms for queries, and assign subject categories to them. Building on this foundation, we are exploring advanced retrieval techniques, including topic-based IR, cross-language IR, personalized IR, speech IR, image IR, ontology-based IR, and P2P IR.
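
The idea of treating anchor texts as a live bilingual corpus can be illustrated with a minimal sketch. It assumes that anchor texts in different languages pointing to the same URL are candidate translations of each other, and it ranks candidates by a raw co-occurrence count. That scoring is a simplification chosen for illustration and is not necessarily the model used in [20~22]. The function names, the ASCII test used to separate languages, and the example.com and example.org URLs are all hypothetical.

```python
from collections import defaultdict
from itertools import product


def is_ascii(text):
    """Crude language test (an assumption): ASCII-only text is the source language."""
    return all(ord(ch) < 128 for ch in text)


def candidate_translations(anchors):
    """Count how often a source-language anchor text and a target-language
    anchor text point to the same URL.

    anchors: iterable of (url, anchor_text) pairs.
    Returns a dict mapping (source_term, target_term) to the number of URLs
    that both anchor texts link to.
    """
    texts_by_url = defaultdict(set)
    for url, text in anchors:
        text = text.strip()
        if text:
            texts_by_url[url].add(text)

    counts = defaultdict(int)
    for texts in texts_by_url.values():
        source = [t for t in texts if is_ascii(t)]
        target = [t for t in texts if not is_ascii(t)]
        for s, t in product(source, target):
            counts[(s.lower(), t)] += 1
    return counts


def best_translation(counts, query):
    """Return the target term that co-occurs most often with the query, or None."""
    query = query.lower()
    scored = {t: c for (s, t), c in counts.items() if s == query}
    return max(scored, key=scored.get) if scored else None


# Invented sample data: three anchor texts spread over two hypothetical pages.
sample_anchors = [
    ("http://example.com/yahoo", "Yahoo"),
    ("http://example.com/yahoo", "雅虎"),
    ("http://example.org/portal", "Yahoo"),
    ("http://example.org/portal", "雅虎"),
    ("http://example.org/portal", "入口網站"),
]
sample_counts = candidate_translations(sample_anchors)
print(best_translation(sample_counts, "Yahoo"))
```

In this sample, "雅虎" shares two pages with "Yahoo" while "入口網站" shares only one, so the former is ranked first. A practical system would also need to normalize for very frequent anchor texts and handle multi-word query segmentation, which this sketch leaves out.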

Contact Us

128, Academia Road Sec 2, Nankang, Taipei 11529, Taiwan
Tel: +886-2-27883799, Fax: +886-2-27824814
Last Update: Tue Aug 14 16:11:07 CST 2007