
The Internet and Languages Part 4


Global Reach is a means for you to extend your website to many countries, speak to online visitors in their own language and reach online markets there. (...)

Since 1981, when my professional life started, I've been involved with bringing American companies in Europe. This is very much an issue of language, since the products and their marketing have to be in the languages of Europe in order for them to be visible here. Since the web became popular in 1995 or so, I've turned these activities to their online dimension, and have come to champion European e-commerce among my fellow American compatriots. Most lately at Internet World in New York, I spoke about European e-commerce and how to use a website to address the various markets in Europe."

Bill added in July 1999: "After a website's home page is available in several languages, the next step is the development of content in each language. A webmaster will notice which languages draw more visitors (and sales) than others, and these are the places to start in a multilingual web promotion campaign. At the same time, it is always good to increase the number of languages available on a website: just a home page translated into other languages would do for a start, before it becomes obvious that more should be done to develop a certain language branch on a website."

The World Wide Web Consortium (W3C) was founded in October 1994 to develop interoperable technologies (specifications, guidelines, software, and tools) for the web, for example specifications for markup languages (HTML, XML, and others), and to act as a forum for information, commerce, communication and collective understanding. In 1998, the section Internationalization/Localization gave a definition of protocols used for internationalization/localization: HTML, base character set, new tags and attributes, HTTP, language negotiation, URLs & other identifiers including non-ASCII characters, etc. It also offered some help with creating a multilingual website.

The Localisation Industry Standards Association (LISA) was created in the mid-1990s as a forum for "software publishers, hardware manufacturers, localization service vendors, and an increasing number of companies from related IT sectors." LISA has defined its mission as "promoting the localization and internationalization industry and providing a mechanism and services to enable companies to exchange and share information on the development of processes, tools, technologies and business models connected with localization, internationalization and related topics". Its website was first housed and maintained by the University of Geneva, Switzerland.

Launched in January 1999 by the European Commission, the website HLTCentral (HLT: Human Language Technologies) gave a short definition of language engineering: "Through language engineering we can find ways of living comfortably with technology. Our knowledge of language can be used to develop systems that recognize speech and writing, understand text well enough to select information, translate between different languages, and generate speech as well as the printed word. By applying such technologies we have the ability to extend the current limits of our use of language. Language enabled products will become an essential and integral part of everyday life."

MACHINE TRANSLATION

= [Quote]

Tim McKenna is an author who thinks and writes about the complexity of truth in a world of flux. He wrote in October 2000: "When software gets good enough for people to chat or talk on the web in real time in different languages, then we will see a whole new world appear before us. Scientists, political activists, businesses and many more groups will be able to communicate immediately without having to go through mediators or translators."

= A definition

Machine translation can be defined as the automated process of translating a text from one language to another language. MT analyzes the text in the source language and automatically generates the corresponding text in the target language. With the lack of any human intervention during the translation process, machine translation (MT) differs from computer-assisted translation (CAT), which involves some interaction between the translator and the computer.

As explained on the website of SYSTRAN, a company specializing in translation software, "machine translation software translates one natural language into another natural language. MT takes into account the grammatical structure of each language and uses rules to transfer the grammatical structure of the source language (text to be translated) into the target language (translated text). MT cannot replace a human translator, nor is it intended to."

The website of the European Association for Machine Translation (EAMT) gives the following definition: "Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available which produce output which, if not perfect, is of sufficient quality to be useful for certain specific applications, usually in the domain of technical documentation. In addition, translation software packages which are designed primarily to assist the human translator in the production of translations are enjoying increasing popularity within professional translation organizations."

Machine translation is the earliest type of natural language processing, as stated on the website of Globalink, a company offering language translation software and services: "From the very beginning, machine translation (MT) and natural language processing (NLP) have gone hand-in-hand with the evolution of modern computational technology. The development of the first general-purpose programmable computers during World War II was driven and accelerated by Allied cryptographic efforts to crack the German Enigma machine and other wartime codes. Following the war, the translation and analysis of natural language text provided a testbed for the newly emerging field of Information Theory.

During the 1950s, research on Automatic Translation (known today as Machine Translation, or 'MT') took form in the sense of literal translation, more commonly known as word-for-word translations, without the use of any linguistic rules. The Russian project initiated at Georgetown University in the early 1950s represented the first systematic attempt to create a demonstrable machine translation system.

Throughout the decade and into the 1960s, a number of similar university and government-funded research efforts took place in the United States and Europe. At the same time, rapid developments in the field of Theoretical Linguistics, culminating in the publication of Noam Chomsky's "Aspects of the Theory of Syntax" (1965), revolutionized the framework for the discussion and understanding of the phonology, morphology, syntax and semantics of human language.

In 1966, the U.S. government-issued ALPAC (Automatic Language Processing Advisory Committee) report offered a prematurely negative assessment of the value and prospects of practical machine translation systems, effectively putting an end to funding and experimentation in the field for the next decade. It was not until the late 1970s, with the growth of computing and language technology, that serious efforts began once again. This period of renewed interest also saw the development of the Transfer model of machine translation and the emergence of the first commercial MT systems. While commercial ventures such as SYSTRAN and METAL began to demonstrate the viability, utility and demand for machine translation, these mainframe-bound systems also illustrated many of the problems in bringing MT products and services to market. High development cost, labor-intensive lexicography and linguistic implementation, slow progress in developing new language pairs, inaccessibility to the average user, and inability to scale easily to new platforms are all characteristics of these second-generation systems."

As explained in August 1998 by Eduard Hovy, head of the Natural Language Group at USC/ISI (University of Southern California/Information Sciences Institute), machine translation implies "language-related applications/functionalities that are not translation, such as information retrieval (IR) and automated text summarization (SUM). You would not be able to find anything on the Web without IR! -- all the search engines (AltaVista, Yahoo!, etc.) are built upon IR technology. Similarly, though much newer, it is likely that many people will soon be using automated summarizers to condense (or at least, to extract the major contents of) single (long) documents or lots of (any length) ones together."

= Experiences

In December 1997, AltaVista, a leading search engine, was the first to launch free translation software: Babel Fish, also called AltaVista Translation, which could translate webpages (up to three pages at a time) from English into French, German, Italian, Portuguese or Spanish, and vice versa. The software was developed by SYSTRAN (an acronym for System Translation), a company specializing in machine translation software. SYSTRAN's headquarters are located in Soisy-sous-Montmorency, near Paris, France. Sales, marketing, and research and development are based in its subsidiary in La Jolla, California.

This initiative was followed by other translation software developed by Alis Technologies, Globalink, Lernout & Hauspie, and Softissimo, with free and/or paid versions on the web.

Based in Montreal, Quebec, Alis Technologies has specialized in development and marketing of language handling solutions and services, particularly language implementation in the information technology industry. Alis Translation Solutions (ATS) has offered applications in a number of languages, and tools and services to improve the quality of translations. Language Technology Solutions (LTS) has marketed advanced tools and services for language engineering and information technology (90 languages covered).

Based in Ieper, Belgium, and Burlington, Massachusetts, Lernout & Hauspie (L&H) was a leader in advanced speech technology for commercial applications and products, with four core technologies: automatic speech recognition (ASR), text-to-speech (TTS), text-to-text (TTT), and digital speech compression (DSC). Its ASR, TTS and DSC technologies were licensed to companies in telecommunications, computers and multimedia, consumer electronics and automotive electronics. Its TTT translation services were provided to IT companies, and vertical and automation markets. The Machine Translation Group created by Lernout & Hauspie included L&H Language Technology, AppTek, AILogic, NeocorTech, and Globalink. Lernout & Hauspie was later bought by Nuance Communications.

Globalink, a company created in 1990 in the U.S., focused on language translation software and services, i.e. customized translation solutions built around software products, online options, and professional translation services. The software products were available in Spanish, French, Portuguese, German, Italian and English, for individuals, small businesses, multinational corporations and governments, from a stand-alone product giving a fast draft translation to a full system managing professional translations.

As explained on the company website in 1998, "with Globalink's translation applications, the computer uses three sets of data: the input text, the translation program and permanent knowledge sources (containing a dictionary of words and phrases of the source language), and information about the concepts evoked by the dictionary and rules for sentence development. These rules are in the form of linguistic rules for syntax and grammar, and some are algorithms governing verb conjugation, syntax adjustment, gender and number agreement and word re-ordering. Once the user has selected the text and set the machine translation process in motion, the program begins to match words of the input text with those stored in its dictionary. Once a match is found, the application brings up a complete record that includes information on possible meanings of the word and its contextual relationship to other words that occur in the same sentence. The time required for the translation depends on the length of the text. A three-page, 750-word document takes about three minutes to render a first draft translation."
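The steps in this description (dictionary lookup, then rules for word re-ordering and gender agreement) can be illustrated with a toy English-to-French sketch. It is not Globalink's software: the lexicon, the record layout and the two rules below are invented for illustration, and a real system would hold far richer records and many more rules.

```python
# Toy dictionary-driven translation (English to French).
# LEXICON, its record fields and both rules are hypothetical examples.
LEXICON = {
    "the":   {"pos": "det",  "forms": {"m": "le", "f": "la"}},
    "red":   {"pos": "adj",  "forms": {"m": "rouge", "f": "rouge"}},
    "green": {"pos": "adj",  "forms": {"m": "vert", "f": "verte"}},
    "car":   {"pos": "noun", "gender": "f", "forms": {"f": "voiture"}},
    "book":  {"pos": "noun", "gender": "m", "forms": {"m": "livre"}},
}


def translate(text):
    """Translate a short noun phrase word by word, then apply two rules."""
    # Step 1: match each input word against the dictionary.
    # Words without a record keep a None entry and pass through unchanged.
    records = [(word, LEXICON.get(word)) for word in text.lower().split()]

    # Step 2 (word re-ordering): adjective + noun becomes noun + adjective.
    i = 0
    while i < len(records) - 1:
        first, second = records[i][1], records[i + 1][1]
        if first and second and first["pos"] == "adj" and second["pos"] == "noun":
            records[i], records[i + 1] = records[i + 1], records[i]
            i += 2
        else:
            i += 1

    # Step 3 (gender agreement): determiners and adjectives take the gender
    # of the first noun found; this toy assumes a single noun phrase.
    phrase_gender = next(
        (rec["gender"] for _, rec in records if rec and rec["pos"] == "noun"),
        "m",
    )
    output = []
    for word, rec in records:
        if rec is None:
            output.append(word)
        else:
            gender = rec.get("gender", phrase_gender)
            output.append(rec["forms"][gender])
    return " ".join(output)


print(translate("the green car"))  # prints: la voiture verte
print(translate("the red book"))   # prints: le livre rouge
```

A word missing from the lexicon is copied through untranslated, which is one reason such output is treated as a first draft.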

At the headquarters of the World Health Organization (WHO) in Geneva, Switzerland, the Computer-assisted Translation and Terminology Unit (CTT) has been a pioneer since 1997 in assessing technical options for using computer-assisted translation (CAT) systems based on translation memory (TM). With such systems, translators can access previous translations from portions of the text; accept, reject or modify them; and add the new translation to the memory, thus enriching it for future reference. By archiving the daily output, the translator helps in building an extensive translation memory and in solving a number of translation issues. Several projects have been under way at the CTT for electronic document archiving and retrieval, bilingual/multilingual text alignment, computer-assisted translation, translation memory and terminology database management, and speech recognition.
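The translation memory workflow described here (look up a previous translation, accept or modify it, store the result) can be sketched in a few lines. This is only an illustration, not the WHO system: the class name, the 0.75 similarity threshold and the sample sentences are assumptions, and fuzzy matching is done with `difflib.SequenceMatcher` from the Python standard library rather than with any technique named in the text.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Minimal translation memory: stores segment pairs, retrieves close matches."""

    def __init__(self):
        self.segments = {}  # source segment -> stored translation

    def add(self, source, target):
        """Store (or overwrite) the translation of a source segment."""
        self.segments[source] = target

    def lookup(self, source, threshold=0.75):
        """Return (stored_source, stored_target, score) for the closest
        stored segment, or None if nothing reaches the threshold."""
        best, best_score = None, 0.0
        for stored_source, stored_target in self.segments.items():
            score = SequenceMatcher(
                None, source.lower(), stored_source.lower()
            ).ratio()
            if score > best_score:
                best, best_score = (stored_source, stored_target), score
        if best is not None and best_score >= threshold:
            return best[0], best[1], best_score
        return None


tm = TranslationMemory()
tm.add("Wash your hands before meals.", "Lavez-vous les mains avant les repas.")

# A near match is proposed to the translator, who can accept or edit it.
match = tm.lookup("Wash your hands after meals.")
if match is not None:
    print("Suggested:", match[1])

# The translator's corrected version is archived for future reference.
tm.add("Wash your hands after meals.", "Lavez-vous les mains après les repas.")
```

Each archived segment widens the set of sentences for which the memory can propose a draft, which is the enrichment effect the paragraph describes.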

The Pan American Health Organization (PAHO) in Washington, D.C. has developed its own machine translation software, a joint effort of its computational linguists, translators, and system programmers.

The PAHO Translation Unit has used SPANAM (Spanish to English) from 1980 and ENGSPAN (English to Spanish) from 1985, to process over 25 million words between 1980 and 1998. Staff translators and free-lance translators post-edit the raw output to produce high-quality translations with a 30-50% gain in productivity. The software is available in the LAN (Local Area Network) of PAHO Headquarters, and is regularly used by the staff of technical and administrative units. The software is also available in a number of PAHO field offices, and has been licensed to public and non-profit institutions in the U.S., Latin America, and Spain. The software was later renamed PAHOMTS, and has included new language pairs with Portuguese.

= Comments

# Comments from ZDNN

In "Web Embraces Language Translation", an article published in ZDNN (ZDNetwork News) on 21 July 1998, Martha Stone explained: "Among the new products in the $10 billion language translation business are instant translators for websites, chat rooms, email and corporate intranets. The leading translation firms are mobilizing to seize the opportunities. Such as:

*SYSTRAN has partnered with AltaVista and reports between 500,000 and 600,000 visitors a day on babelfish.altavista.digital.com, and about 1 million translations per day -- ranging from recipes to complete webpages. About 15,000 sites link to babelfish, which can translate to and from French, Italian, German, Spanish and Portuguese. The site plans to add Japanese soon. 'The popularity is simple. With the internet, now there is a way to use U.S. content. All of these contribute to this increasing demand,' said Dimitros Sabatakakis, group CEO of SYSTRAN, speaking from his Paris home.

*Alis technology powers the Los Angeles Times' soon-to-be launched language translation feature on its site. Translations will be available in Spanish and French, and eventually, Japanese. At the click of a mouse, an entire webpage can be translated into the desired language.

*Globalink offers a variety of software and web translation possibilities, including a free email service and software to enable text in chat rooms to be translated.

But while these so-called 'machine' translations are gaining worldwide popularity, company execs admit they're not for every situation.

Representatives from Globalink, Alis and SYSTRAN use such phrases as 'not perfect' and 'approximate' when describing the quality of translations, with the caveat that sentences submitted for translation should be simple, grammatically accurate and idiom-free. 'The progress on machine translation is moving at Moore's Law -- every 18 months it's twice as good,' said Vin Crosbie, a web industry analyst in Greenwich, Conn. 'It's not perfect, but some [non-English speaking] people don't realize I'm using translation software.'

With these translations, syntax and word usage suffer, because dictionary-driven databases can't decipher between homonyms -- for example, 'light' (as in the sun or light bulb) and 'light' (the opposite of heavy). Still, human translation would cost between $50 and $60 per webpage, or about 20 cents per word, SYSTRAN's Sabatakakis said. While this may be appropriate for static 'corporate information' pages, the machine translations are free on the web, and often less than $100 for software, depending on the number of translated languages and special features."

# Comments from RALI

Although the imminent arrival of a universal translation machine was announced at the end of the 1940s, machine translation has yet to produce good translations. Pierre Isabelle and Patrick Andries, two scientists from the RALI Laboratory (Laboratory for Applied Research in Computational Linguistics - Laboratoire de Recherche Appliquee en Linguistique Informatique) in Montreal, Quebec, explain the reasons for this failure in "La Traduction Automatique, 50 Ans Apres" (Machine Translation, 50 Years Later), an article published in 1998 by Multimedium, a French-language online magazine: "The ultimate goal of building a machine capable of competing with a human translator remains elusive due to slow progress in research. (...) Recent research, based on large collections of texts called corpora -- using either statistical or analogical methods -- has promised to reduce the quantity of manual work required to build a machine translation (MT) system, but can't promise for sure a significant improvement in the quality of machine translation. (...) The use of MT will be more or less restricted to tasks of information assimilation or tasks of text distribution in restricted sub-languages."

Drawing on ideas expressed by Yehoshua Bar-Hillel in "The State of Machine Translation", an article published in 1951, Pierre Isabelle and Patrick Andries define three implementation strategies for machine translation: (a) a tool of information assimilation to scan multilingual data and supply rough translation, (b) situations of "restricted language" such as the METEO system which, since 1977, has translated the weather forecasts of the Canadian Ministry of Environment, (c) the human/machine coupling before, during and after the machine translation process, that may not save money if compared to traditional translation.

Pierre Isabelle and Patrick Andries favor "a workstation for the human translator" more than a "robot translator": "Recent research on probabilistic methods showed it was possible to model efficiently some simple aspects of the translation relationship between two texts. For example, methods were set up to calculate the correct alignment between the text sentences and their translation, that is, to identify the sentence(s) of the source text corresponding to each sentence of the translation. Applied on a large scale, these techniques can use the archives of a translation service to build a translation memory for recycling fragments from previous translations. Such systems are already available on the translation market (IBM Translation Manager II, Trados Translator's Workbench by Trados, RALI TransSearch, etc.) The latest research focuses on models that can automatically set up correspondences at a finer level than the sentence level, i.e. syntagms and words. The results hold out hope for a range of new tools for the human translator, including for the study of terminology, for dictation and translation typing, and for detectors of translation errors."
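To make the idea of sentence alignment concrete, here is a deliberately simplified sketch. It is not the RALI method or any commercial product's algorithm: it swaps in a plain length-based heuristic (a sentence and its translation tend to have similar character counts) with dynamic programming over 1-1, 1-2 and 2-1 groupings. The function name, the `merge_penalty` value and the sample sentences are assumptions made for illustration.

```python
def align(src, tgt, merge_penalty=3):
    """Align two lists of sentences using character lengths only.

    Returns a list of (source_sentences, target_sentences) pairs, where
    each pair groups 1 or 2 sentences on either side.
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    # cost[i][j]: cheapest way found to align src[:i] with tgt[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            # Allowed groupings: 1-1, 1-2 (one source, two targets), 2-1.
            for di, dj in ((1, 1), (1, 2), (2, 1)):
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                len_s = sum(len(s) for s in src[i:ni])
                len_t = sum(len(t) for t in tgt[j:nj])
                step = abs(len_s - len_t)
                if (di, dj) != (1, 1):
                    step += merge_penalty  # mildly prefer 1-1 pairs
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (i, j)
    if cost[n][m] == inf:
        raise ValueError("no alignment possible with 1-1, 1-2 and 2-1 groupings")
    # Walk the back-pointers from the end to recover the chosen pairs.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((src[pi:i], tgt[pj:j]))
        i, j = pi, pj
    pairs.reverse()
    return pairs


SRC = [
    "The weather is cold today.",
    "Snow is expected tonight and roads may be slippery.",
    "Drive carefully.",
]
TGT = [
    "Il fait froid aujourd'hui.",
    "De la neige est attendue ce soir.",
    "Les routes risquent de devenir glissantes.",
    "Conduisez prudemment.",
]

for source_group, target_group in align(SRC, TGT):
    print(source_group, "<->", target_group)
```

In this example the second English sentence is paired with the second and third French sentences. Each aligned pair could then be stored as one entry in a translation memory.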

# Comments from Randy Hobler

In September 1998, Randy Hobler was a consultant in internet marketing at Globalink, after working for IBM, Johnson & Johnson, Burroughs Wellcome, Pepsi, and Heublein. He wrote in an email interview: "We are rapidly reaching the point where highly accurate machine translation of text and speech will be so common as to be embedded in computer platforms, and even in chips in various ways. At that point, and as the growth of the web slows, the accuracy of language translation hits 98% plus, and the saturation of language pairs has covered the vast majority of the market, language transparency (any-language-to-any-language communication) will be too limiting a vision for those selling this technology. The next development will be 'transcultural, transnational transparency', in which other aspects of human communication, commerce and transactions beyond language alone will come into play. For example, gesture has meaning, facial movement has meaning and this varies among societies. The thumb-index finger circle means 'OK' in the United States. In Argentina, it is an obscene gesture.

When the inevitable growth of multimedia, multilingual videoconferencing comes about, it will be necessary to 'visually edit' gestures on the fly. The MIT (Massachusetts Institute of Technology) Media Lab, Microsoft and many others are working on computer recognition of facial expressions, biometric access identification via the face, etc. It won't be any good for a U.S. business person to be making a great point in a web-based multilingual video conference to an Argentinian, having his words translated into perfect Argentinian Spanish if he makes the 'O' gesture at the same time. Computers can intercept this kind of thing and edit them on the fly.

There are thousands of ways in which cultures and countries differ, and most of these are computerizable to change as one goes from one culture to the other. They include laws, customs, business practices, ethics, currency conversions, clothing size differences, metric versus English system differences, etc. Enterprising companies will be capturing and programming these differences and selling products and services to help the peoples of the world communicate better. Once this kind of thing is widespread, it will truly contribute to international understanding."

= Machine translation R&D

Here is an overview of the work of four research centers, in Quebec (RALI Laboratory), California (Natural Language Group), Switzerland (ISSCO) and Japan (UNDL Foundation).


The Internet and Languages, by Marie Lebert.