The Internet Without English

Pointing the Way to a True World-Wide Web
Having gained a strong foothold in the English-language market, the Internet now draws most of its growth from non-English-speaking areas. For instance, Web content in Arabic is doubling each year; the online population of Latin America grew by 577 percent from 2000 to 2007; and the number of registered domain names in Mainland China surged by 137.5 percent a year.

A challenge and an opportunity posed by this phenomenon is the need for better Web searching and browsing in non-English languages. Wingyan Chung, assistant professor of Operations and Management Information Systems at the Leavey School of Business, is gaining recognition as a leading researcher on this issue.

“My research has developed and validated a new framework for supporting Web searching in a multilingual world,” Chung said. His findings are summarized in a paper, “Web Searching in a Multilingual World,” that is the cover story in the May 2008 issue of Communications of the ACM (Association for Computing Machinery), which for five decades has been a leading computer research publication.

In the paper, Chung found that existing search engines in Chinese, Spanish, and Arabic tend to present results as long textual lists that often hinder user understanding and analysis. They also lack comprehensive coverage of domain- and region-specific content, which even Google and other major search engines fall short of providing.

From there, he went on to describe a new framework for building portals that support Web searching in non-English languages, and to test portals he developed from that framework in the three languages. The Chinese and Spanish portals help users search and browse for business information, while the Arabic portal supports searching and analysis of medical information.

Chung’s framework integrates and improves various automatic techniques in text and Web mining to support:
• Statistical language processing (which extracts patterns from a large body of textual documents);
• Meta-searching (which sends queries to multiple search engines and collates the top results from each);
• Web page summarization and categorization (which perform intelligent sentence extraction and text classification, respectively); and
• Web page visualization (which portrays search results in a vivid graphical format for easy understanding).
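
To make the meta-searching step concrete, here is a minimal Python sketch of the general idea: send one query to several engines, keep the top few results from each, and merge them into a single list. It is an illustration only, not Chung's implementation. The names `meta_search`, `engine_a`, and `engine_b` are hypothetical, the engines are stand-ins that return canned results, and the round-robin merge with duplicate removal is an assumed collation strategy.

```python
from typing import Callable, Dict, List

# Hypothetical engine type: takes a query, returns URLs ranked best-first.
SearchEngine = Callable[[str], List[str]]


def meta_search(query: str, engines: Dict[str, SearchEngine], top_k: int = 3) -> List[str]:
    """Query every engine, keep each engine's top_k results, and
    interleave them round-robin while dropping duplicate URLs."""
    ranked_lists = [engine(query)[:top_k] for engine in engines.values()]
    merged: List[str] = []
    seen = set()
    for rank in range(top_k):
        for results in ranked_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
    return merged


# Stand-in engines with canned results (assumption: no real search API is called).
def engine_a(query: str) -> List[str]:
    return ["a.example/1", "shared.example/x", "a.example/2"]


def engine_b(query: str) -> List[str]:
    return ["shared.example/x", "b.example/1"]


results = meta_search("negocios", {"A": engine_a, "B": engine_b}, top_k=2)
print(results)
```

Interleaving by rank keeps any single engine from dominating the merged list, and removing duplicates ensures that a page returned by several engines appears only once.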

Each of the portals was then evaluated by people fluent in the language involved. The work took three years to complete, and Chung found that the portals achieved better search and browsing performance and higher user satisfaction than the benchmark portals.

Some areas still need improvement, however, and these will provide grist for future research. The Chinese portal, for example, could be better on search precision and information quality, while the Spanish portal needs work on its domain-specific collection.

Chung noted that the science of Web searching is less than 20 years old and that, because of the dominance of English in early Internet development, support for English Web searching is generally better. He hopes his research will point the way toward new improvements in Web search support in non-English languages, which are spoken by billions of people in today’s emerging economies.

“This paper provides a timely review of an important and emerging issue,” he said. “It demonstrates that the framework for supporting non-English Web searching has a high usability for search engine developers and business practitioners.”

