Multilingualism on the Web
The International Association for Machine Translation (IAMT) heads a worldwide network with three regional components: the Association for Machine Translation in the Americas (AMTA), the European Association for Machine Translation (EAMT) and the Asia-Pacific Association for Machine Translation (AAMT).
The Association for Machine Translation in the Americas (AMTA) presents itself as an association dedicated to anyone interested in the translation of languages using computers in some way. It has members in Canada, Latin America, and the United States. This includes people with translation needs, commercial system developers, researchers, sponsors, and people studying, evaluating, and understanding the science of machine translation and educating the public on important scientific techniques and principles involved.
The European Association for Machine Translation (EAMT) is based in Geneva, Switzerland. This organization serves the growing community of people interested in MT (machine translation) and translation tools, including users, developers, and researchers of this increasingly viable technology.
The Asia-Pacific Association for Machine Translation (AAMT), formerly called the Japan Association for Machine Translation (created in 1991), is comprised of three entities: researchers, manufacturers, and users of machine translation systems. The association endeavors to develop machine translation technologies to expand the scope of effective global communications and, for this purpose, is engaged in machine translation system development, improvement, education, and publicity.
In Web embraces language translation, a ZDNN (ZD Network News) article of July 21, 1998, Martha L. Stone explains:
"Among the new products in the $10 billion language translation business are instant translators for websites, chat rooms, e-mail and corporate intranets.
The leading translation firms are mobilizing to seize the opportunities. Such as:
SYSTRAN has partnered with AltaVista and reports between 500,000 and 600,000 visitors a day on babelfish.altavista.digital.com, and about 1 million translations per day -- ranging from recipes to complete Web pages.
About 15,000 sites link to babelfish, which can translate to and from French, Italian, German, Spanish and Portuguese. The site plans to add Japanese soon.
'The popularity is simple. With the Internet, now there is a way to use US content. All of these contribute to this increasing demand,' said Dimitros Sabatakakis, group CEO of SYSTRAN, speaking from his Paris home.
Alis technology powers the Los Angeles Times' soon-to-be launched language translation feature on its site. Translations will be available in Spanish and French, and eventually, Japanese. At the click of a mouse, an entire web page can be translated into the desired language.
Globalink offers a variety of software and Web translation possibilities, including a free e-mail service and software to enable text in chat rooms to be translated.
But while these so-called 'machine' translations are gaining worldwide popularity, company execs admit they're not for every situation.
Representatives from Globalink, Alis and SYSTRAN use such phrases as 'not perfect' and 'approximate' when describing the quality of translations, with the caveat that sentences submitted for translation should be simple, grammatically accurate and idiom-free.
'The progress on machine translation is moving at Moore's Law -- every 18 months it's twice as good,' said Vin Crosbie, a Web industry analyst in Greenwich, Conn. 'It's not perfect, but some [non-English speaking] people don't realize I'm using translation software.'
With these translations, syntax and word usage suffer, because dictionary-driven databases can't decipher between homonyms -- for example, 'light' (as in the sun or light bulb) and 'light' (the opposite of heavy).
Still, human translation would cost between $50 and $60 per Web page, or about 20 cents per word, SYSTRAN's Sabatakakis said.
While this may be appropriate for static 'corporate information' pages, the machine translations are free on the Web, and often less than $100 for software, depending on the number of translated languages and special features."
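The homonym problem described in the article can be illustrated with a minimal sketch. It assumes a toy English-to-Spanish glossary that stores a single target word per source word; the glossary entries and the function name are hypothetical and do not come from SYSTRAN, Alis or Globalink products.

```python
# Toy English-to-Spanish glossary: exactly one target word per source word.
# All entries are illustrative assumptions, not data from a real product.
GLOSSARY = {
    "the": "la",
    "light": "luz",       # only the 'illumination' sense is stored
    "is": "es",
    "bright": "brillante",
    "box": "caja",
}


def translate_word_by_word(sentence: str) -> str:
    """Replace each word by its single glossary entry; keep unknown words."""
    words = sentence.lower().split()
    return " ".join(GLOSSARY.get(word, word) for word in words)


# The 'illumination' sense happens to be the stored one, so this works.
print(translate_word_by_word("the light is bright"))  # la luz es brillante

# Here 'light' means 'not heavy', but the glossary cannot tell the senses apart.
print(translate_word_by_word("the box is light"))     # la caja es luz
```

The second output should read "la caja es ligera". Because the lookup never examines the neighboring words, it has no way to choose between the two senses of 'light'.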
4.3. Computer-assisted Translation
Within the World Health Organization (WHO), Geneva, Switzerland, the Computer-assisted Translation and Terminology Unit (CTT) is assessing technical options for using computer-assisted translation (CAT) systems based on "translation memory". With such systems, translators have immediate access to previous translations of portions of the text before them. These reminders of previous translations can be accepted, rejected or modified, and the final choice is added to the memory, thus enriching it for future reference. By archiving daily output, the translator would soon have access to an enormous "memory" of ready-made solutions for a considerable number of translation problems. Several projects are currently under way in such areas as electronic document archiving and retrieval, bilingual/multilingual text alignment, computer-assisted translation, translation memory and terminology database management, and speech recognition.
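The accept, reject or modify cycle of a translation memory can be sketched roughly as follows. This is a minimal illustration, not the WHO system: the class name, the similarity threshold of 0.75 and the sample sentences are assumptions, and the fuzzy matching relies on Python's standard difflib module rather than on any commercial CAT tool.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Stores source/translation pairs and proposes the closest earlier one."""

    def __init__(self, threshold=0.75):
        self.threshold = threshold   # minimum similarity for a suggestion (assumed value)
        self.segments = {}           # source segment -> validated translation

    def suggest(self, source):
        """Return (stored_source, stored_translation, score) or None."""
        best = None
        for stored_source, stored_target in self.segments.items():
            score = SequenceMatcher(
                None, source.lower(), stored_source.lower()
            ).ratio()
            if score >= self.threshold and (best is None or score > best[2]):
                best = (stored_source, stored_target, score)
        return best

    def store(self, source, translation):
        """Record the translator's final choice, enriching the memory."""
        self.segments[source] = translation


tm = TranslationMemory()
tm.store("Drink clean water every day.", "Buvez de l'eau propre chaque jour.")

# A new, similar segment arrives: the memory proposes the earlier translation.
hit = tm.suggest("Drink clean water every morning.")
print(hit[1])  # Buvez de l'eau propre chaque jour.

# The translator modifies the suggestion; the final choice goes back into memory.
tm.store("Drink clean water every morning.", "Buvez de l'eau propre chaque matin.")
```

Storing only validated translations is what lets the memory grow more useful with daily output: each accepted or corrected segment becomes a candidate suggestion for later texts.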
Although the arrival of a universal translation machine was announced as imminent some 50 years ago, machine translation systems do not yet produce good-quality translations. Why not? Pierre Isabelle and Patrick Andries, from the Laboratoire de recherche appliquee en linguistique informatique (RALI) (Laboratory for Applied Research in Computational Linguistics) in Montreal, Quebec, explain this failure in La traduction automatique, 50 ans apres (Machine translation, 50 years later), an article published in the Dossiers of the daily cybermagazine Multimedium:
"The ultimate goal of building a machine capable of competing with a human translator remains elusive due to the slow progress of the research. [...]
Recent research, based on large collections of texts called corpora - using either statistical or analogical methods - promise to reduce the quantity of manual work required to build a MT [machine translation] system, but it is less sure that they can promise a substantial improvement in the quality of machine translation. [...] the use of MT will be more or less restricted to information assimilation tasks or tasks of distribution of texts belonging to restricted sub-languages."
According to Yehochua Bar-Hillel's ideas expressed in The State of Machine Translation, an article published in 1951, Pierre Isabelle and Patrick Andries define three MT implementation strategies: 1) a tool of information assimilation to scan multilingual information and supply rough translation, 2) situations of "restricted language" such as the METEO system which, since 1977, has been translating the weather forecasts of the Canadian Ministry of Environment, 3) the human being/machine coupling before, during and after the MT process, which is not inevitably economical compared to traditional translation.
The authors favour "a workstation for the human translator" more than a "robot translator":
"The recent research on the probabilist methods permitted in fact to demonstrate that it was possible to modelize in a very efficient way some simple aspects of the translation relationship between two texts. For example, methods were set up to calculate the correct alignment between the text sentences and their translation, that is, to identify the sentence(s) of the source text which correspond(s) to each sentence of the translation. Applied on a large scale, these techniques allow the use of archives of a translation service to build a translation memory which will often permit the recycling of previous translation fragments. Such systems are already available on the translation market (IBM Translation Manager II, Trados Translator's Workbench by Trados, RALI TransSearch, etc.)
The most recent research focuses on models able to automatically set up the correspondences at a finer level than the sentence level: syntagms and words.
The results obtained foresee a whole family of new tools for the human translator, including aids for terminological studying, aids for dictation and translation typing, and detectors of translation errors."
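The sentence alignment mentioned in the quotation can be approximated with a simple length-based dynamic program. The sketch below is a deliberate simplification, not the probabilistic method used at RALI: it assumes that a sentence and its translation have similar character lengths, allows only 1-1, 1-2 and 2-1 groupings, and uses an arbitrary merge penalty. The function name and the sample sentences are hypothetical.

```python
MERGE_PENALTY = 5.0  # assumed extra cost for grouping two sentences together


def align(source, target):
    """Align two sentence lists by minimizing character-length differences.

    Returns a list of (source_group, target_group) pairs, where each group
    holds one or two consecutive sentences.
    """
    inf = float("inf")
    n, m = len(source), len(target)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    moves = [(1, 1, 0.0), (1, 2, MERGE_PENALTY), (2, 1, MERGE_PENALTY)]

    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj, penalty in moves:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                src_len = sum(len(s) for s in source[i:ni])
                tgt_len = sum(len(t) for t in target[j:nj])
                candidate = cost[i][j] + abs(src_len - tgt_len) + penalty
                if candidate < cost[ni][nj]:
                    cost[ni][nj] = candidate
                    back[ni][nj] = (i, j)

    if cost[n][m] == inf:
        raise ValueError("no alignment exists with the allowed groupings")

    # Walk the back-pointers from the end to recover the chosen groupings.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((source[pi:i], target[pj:j]))
        i, j = pi, pj
    pairs.reverse()
    return pairs


english = [
    "The weather is cold.",
    "Snow is expected tonight.",
    "Roads may be closed.",
]
french = [
    "Le temps est froid et de la neige est attendue ce soir.",
    "Des routes pourraient fermer.",
]

for src_group, tgt_group in align(english, french):
    print(src_group, "<->", tgt_group)
```

In this example the first two English sentences are grouped against the single French sentence that merges them. Once aligned, such pairs are the raw material for a translation memory built from an archive of past translations.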
5. LANGUAGE-RELATED RESEARCH
[In this chapter:]
[5.1. Machine Translation Research / 5.2. Computational Linguistics / 5.3. Language Engineering / 5.4. Internationalization and Localization]
5.1. Machine Translation Research
The CL/MT Research Group (Computational Linguistics (CL) and Machine Translation (MT) Group) is a research group in the Department of Language and Linguistics at the University of Essex, United Kingdom. It serves as a focus for research in computational, and computationally oriented, linguistics. It has been in existence since the late 1980s, and has played a role in a number of important computational linguistics research projects.
Founded in 1986, the Center for Machine Translation (CMT) is now a research center within the new Language Technologies Institute at the School of Computer Science at Carnegie Mellon University (CMU), Pittsburgh, Pennsylvania. It conducts advanced research and development in a suite of technologies for natural language processing, with a primary focus on high-quality multilingual machine translation.
Within the CLIPS Laboratory (CLIPS: Communication langagiere et interaction personne-systeme = Language Communication and Person-System Communication) of the French IMAG Federation, the Groupe d'etude pour la traduction automatique (GETA) (Study Group for Machine Translation) is a multi-disciplinary team of computer scientists and linguists. Its research topics concern all the theoretical, methodological and practical aspects of computer-assisted translation (CAT), or more generally of multilingual computing. The GETA participates in the UNL (Universal Networking Language) project, initiated by the Institute of Advanced Studies (IAS) of the United Nations University (UNU).
"UNL (Universal Networking Language) is a language that - with its companion "enconverter" and "deconverter" software - enables communication among peoples of differing native languages. It will reside, as a plug-in for popular World Wide Web browsers, on the Internet, and will be compatible with standard network servers. The technology will be shared among the member states of the United Nations. Any person with access to the Internet will be able to "enconvert" text from any native language of a member state into UNL. Just as easily, any UNL text can be "deconverted" from UNL into native languages. United Nations University's UNL Center will work with its partners to create and promote the UNL software, which will be compatible with popular network servers and computing platforms."
The Natural Language Group (NLG) at the Information Sciences Institute (ISI) of the University of Southern California (USC) is currently involved in various aspects of computational/natural language processing. The group's projects are: machine translation; automated text summarization; multilingual verb access and text management; development of large concept taxonomies (ontologies); discourse and text generation; construction of large lexicons for various languages; and multimedia communication.
Eduard Hovy, Head of the Natural Language Group, explained in his e-mail of August 27, 1998:
"Your presentation outline looks very interesting to me. I do wonder, however, where you discuss the language-related applications/functionalities that are not translation, such as information retrieval (IR) and automated text summarization (SUM). You would not be able to find anything on the Web without IR! -- all the search engines (AltaVista, Yahoo!, etc.) are built upon IR technology.
Similarly, though much newer, it is likely that many people will soon be using automated summarizers to condense (or at least, to extract the major contents of) single (long) documents or lots of (any length) ones together. [...]
In this context, multilingualism on the Web is another complexifying factor.
People will write their own language for several reasons -- convenience, secrecy, and local applicability -- but that does not mean that other people are not interested in reading what they have to say! This is especially true for companies involved in technology watch (say, a computer company that wants to know, daily, all the Japanese newspaper and other articles that pertain to what they make) or some Government Intelligence agencies (the people who provide the most up-to-date information for use by your government officials in making policy, etc.). One of the main problems faced by these kinds of people is the flood of information, so they tend to hire 'weak' bilinguals who can rapidly scan incoming text and throw out what is not relevant, giving the relevant stuff to professional translators. Obviously, a combination of SUM and MT (machine translation) will help here; since MT is slow, it helps if you can do SUM in the foreign language, and then just do a quick and dirty MT on the result, allowing either a human or an automated IR-based text classifier to decide whether to keep or reject the article.
For these kinds of reasons, the US Government has over the past five years been funding research in MT, SUM, and IR, and is interested in starting a new program of research in Multilingual IR. This way you will be able to one day open Netscape or Explorer or the like, type in your query in (say) English, and have the engine return texts in *all* the languages of the world. You will have them clustered by subarea, summarized by cluster, and the foreign summaries translated, all the kinds of things that you would like to have.
You can see a demo of our version of this capability, using English as the user language and a collection of approx. 5,000 texts of English, Japanese, Arabic, Spanish, and Indonesian, by visiting MuST Multilingual Information Retrieval, Summarization, and Translation System.
Type your query word (say, 'baby', or whatever you wish) in and press 'Enter/Return'. In the middle window you will see the headlines (or just keywords, translated) of the retrieved documents. On the left you will see what language they are in: 'Sp' for Spanish, 'Id' for Indonesian, etc. Click on the number at left of each line to see the document in the bottom window. Click on 'Summarize' to get a summary. Click on 'Translate' for a translation (but beware: Arabic and Japanese are extremely slow! Try Indonesian for a quick word-by-word 'translation' instead).
This is not a product (yet); we have lots of research to do in order to improve the quality of each step. But it shows you the kind of direction we are heading in."
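The ordering Eduard Hovy describes (summarize in the foreign language first, then run a quick and dirty translation on the summary only, then decide whether to keep the article) can be sketched as a small pipeline. This is an illustrative toy and not the MuST system: the frequency-based summarizer, the tiny Indonesian-to-English glossary and all function names are assumptions.

```python
from collections import Counter

# Toy Indonesian-to-English glossary; entries are illustrative assumptions.
GLOSSARY = {
    "harga": "price", "susu": "milk", "bayi": "baby", "naik": "rise",
    "di": "in", "pasar": "market", "hujan": "rain", "turun": "fall",
    "hari": "day", "ini": "this",
}


def summarize(text, max_sentences=1):
    """Extractive summary: keep the sentences with the highest word-frequency score."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    freq = Counter(word for s in sentences for word in s.lower().split())
    ranked = sorted(
        sentences,
        key=lambda s: sum(freq[word] for word in s.lower().split()),
        reverse=True,
    )
    keep = set(ranked[:max_sentences])
    return [s for s in sentences if s in keep]  # preserve original order


def gloss_translate(sentence):
    """Quick word-by-word translation; unknown words pass through unchanged."""
    return " ".join(GLOSSARY.get(w, w) for w in sentence.lower().split())


def triage(text, query):
    """Summarize first, translate only the summary, then test relevance."""
    summary = summarize(text)
    gloss = " ".join(gloss_translate(s) for s in summary)
    relevant = query.lower() in gloss.split()
    return relevant, gloss


article = "Harga susu bayi naik di pasar. Hujan turun hari ini."
relevant, gloss = triage(article, "baby")
print(relevant, "|", gloss)  # True | price milk baby rise in market
```

Only six of the article's ten words reach the translation step here, which is the point of doing SUM before MT. The rough gloss is enough for a keyword test, or for a human reader, to decide whether the article deserves a professional translation.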
"How do you see the future of Internet-related activities as regards languages?"
"The Internet is, as I see it, a fantastic gift to humanity. It is, as one of my graduate students recently said, the next step in the evolution of information access. A long time ago, information was transmitted orally only; you had to be face-to-face with the speaker. With the invention of writing, the time barrier broke down -- you can still read Seneca and Moses. With the invention of the printing press, the access barrier was overcome -- now *anyone* with money to buy a book can read Seneca and Moses. And today, information access becomes almost instantaneous, globally; you can read Seneca and Moses from your computer, without even knowing who they are or how to find out what they wrote; simply open AltaVista and search for 'Seneca'. This is a phenomenal leap in the development of connections between people and cultures. Look how today's Internet kids are incorporating the Web in their lives.
The next step? -- I imagine it will be a combination of computer and cellular phone, allowing you as an individual to be connected to the Web wherever you are. All your diary, phone lists, grocery lists, homework, current reading, bills, communications, etc., plus AltaVista and the others, all accessible (by voice and small screen) via a small thing carried in your purse or on your belt.