LiveTrans is equipped with state-of-the-art Web-mining technologies for automatically translating search terms not covered by any bilingual dictionary.

Introduction

The LiveTrans system is a meta-search engine that offers cross-language search for retrieving both Web pages and images. Most existing cross-language Web search systems rely on bilingual dictionary lookup, by which search terms in one (source) language are translated for searching documents written in another (target) language. However, previous analyses of Web search engine logs reveal that 81% of the search terms in real Web queries could not be found in common translation dictionaries. Such search terms include proper names and new terminology. Moreover, real Web queries are often short: the average query length for a Web search is about 2.3 words in English and 3.18 characters in Chinese. With so few terms per query, a single untranslated term easily leads to low retrieval performance. For example, suppose someone searches for Chinese documents with the English query 'Yo-Yo Ma and concert'. If the translation of the personal name Yo-Yo Ma is not found in the bilingual dictionaries, many of the returned Chinese documents about 'concert' might not be relevant to Yo-Yo Ma.
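The Yo-Yo Ma example can be illustrated with a minimal sketch of dictionary-lookup query translation. This is not LiveTrans code: the one-entry dictionary and the function name translate_query are hypothetical, chosen only to show how an out-of-dictionary term drops out of the translated query.

```python
# Hypothetical toy dictionary; a real system would load a full bilingual lexicon.
TOY_DICTIONARY = {"concert": "音樂會"}


def translate_query(terms, dictionary):
    """Split query terms into dictionary translations and untranslated terms."""
    translated, unknown = [], []
    for term in terms:
        target = dictionary.get(term.lower())
        if target is None:
            # No dictionary entry: the term cannot be carried into the target query.
            unknown.append(term)
        else:
            translated.append(target)
    return translated, unknown


translated, unknown = translate_query(["Yo-Yo Ma", "concert"], TOY_DICTIONARY)
print(translated)  # terms that can be searched in the target language
print(unknown)     # terms lost to pure dictionary lookup
```

In this sketch only 'concert' survives translation, so the target-language search is no longer constrained by the person's name. That is the gap Web-mined translations are meant to fill.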

One of the major bottlenecks in cross-language information retrieval applications is the lack of appropriate and up-to-date bilingual dictionaries that contain translations of popular query terms, such as new terminology and proper names. To discover more effective query translations, the LiveTrans system was developed to demonstrate a state-of-the-art technology that uses an innovative Web mining technique to automatically translate search terms not included in bilingual dictionaries. Several technical papers on this research have been published in the proceedings of premier conferences, such as SIGIR'04, ACL'04 and JCDL'04, and in journals, such as ACM TOIS'04 and ACM TALIP'02, showing the system's potential to benefit cross-language information retrieval research.

Last Update: Tue Aug 14 04:25:46 CST 2007