Cross Language Translation in a Web-based Environment - Sample Essay

One of the major problems with cross-language translation between languages that are rarely used together (e.g. Finnish <-> Lithuanian) is that dictionaries are either unavailable or extremely difficult to find. The main reason is that there are not enough users to create a market, so no one invests in creating such dictionaries. English-based dictionaries, however, are abundant. This project tackled the problem by using English as the base dictionary for cross-language translation.

Artificial intelligence through neural networks appeared well suited to problems of this nature. For this reason, it was investigated as a potential tool to improve translation accuracy, but its implementation was left as a possibility for future work. WordNet® was also investigated as a source of definitions for English words and as a possible tool to achieve greater accuracy in cross-language translations. A large number of people across the world converse in English; it thus serves as the primary lingua franca for developments in the research world.

Most publications and journals are published in English. This leaves publications in other languages inaccessible, and information in English is in turn withheld from the millions who do not speak English (Diekema 2003). Recent trends promote the construction of a far-reaching, complex infrastructure for transporting information across boundaries. Language evidently accounts for a vital share of the hindrances presented by national borders.

Whilst English remains the most widely spoken language in the world, and although the spread of ‘World English’ can promote cooperation and equity, longstanding linguistic competition threatens to be even more divisive in a globalizing world (Maurais et al. n.d.). Much is currently being done to overcome these linguistic barriers. The most efficient approach is cross-language translation, and this essay concentrates mostly on the web-based online dictionary aspect of that approach.

English has traditionally been the main focus of information retrieval. Many retrieval algorithms and heuristics stem from English-speaking countries and are thus based on that language. Over the years, these retrieval methods have been adopted by other language communities, creating a wide selection of language-specific monolingual retrieval systems. However, to ensure complete information exchange, information retrieval systems need to be multilingual or cross-lingual (Diekema 2003).

There are many ways to address the hindrances of living in this multilingual world, a world divided into English-speaking and non-English-speaking territories. As presented, the most researched approach is cross-language translation. We can examine this in greater detail in Figure 3. The word Autobusas is translated from Lithuanian to Russian by way of English. Two possible translations (bus, omnibus) occur when translating Lithuanian -> English.
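The translation path through English can be sketched in a few lines of Python. This is only a minimal illustration, not the project's implementation: the dictionaries are tiny hand-made tables, the function name `pivot_translate` is hypothetical, the Russian entries for "bus" are the ones used in this example, and the entry for "omnibus" is an assumption.

```python
# Tiny illustrative dictionaries (assumed entries, not a real lexicon).
LT_TO_EN = {"autobusas": ["bus", "omnibus"]}
EN_TO_RU = {
    "bus": ["автобус", "омнибус", "шина"],  # senses used in the essay's example
    "omnibus": ["омнибус"],                  # assumed entry for illustration
}


def pivot_translate(word, source_to_pivot, pivot_to_target):
    """Map each target-language candidate to the pivot words that produced it."""
    candidates = {}
    for pivot in source_to_pivot.get(word.lower(), []):
        for target in pivot_to_target.get(pivot, []):
            candidates.setdefault(target, []).append(pivot)
    return candidates


result = pivot_translate("Autobusas", LT_TO_EN, EN_TO_RU)
for target, pivots in result.items():
    # Show every Russian candidate together with its English pivot word(s).
    print(target, "<-", ", ".join(pivots))
```

Keeping the pivot words next to each candidate makes the source of ambiguity visible: every sense of an English pivot word is carried over into the target language, whether or not it matches the sense of the original word.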

The first word, “bus”, translated from English to Russian has three meanings: “автобус”, “омнибус” and “шина”. The first two are synonyms, but the third has a totally different sense, that of a “topology bus”. As the reverse translations show, “шина” in turn yields four different meanings when translated into Lithuanian.

Cross-Language Information Retrieval, its promise

Information retrieval entails an individual querying about something of interest to them. Since we are ever-inquisitive life forms, we inevitably perform information retrieval in every aspect of our lives.

This happens in many situations and is perhaps best illustrated in a library, when a student picks a book of his choice. Formally, let us define Information Retrieval (IR) as the process in which users with an information need query a collection of documents to find those documents that satisfy that need (Diekema 2003). In the electronic realm, the user queries by typing in related words; the system then processes these keywords to create a representation that it can understand.

In the course of this procedure, the system usually strips words that carry no content from the query, such as determiners, prepositions, and pronouns. The document collection undergoes the same process, resulting in a list of document representations, or a catalogue. To find documents that are similar to the query, the ‘stripped off’ query representation is then matched against the catalogue. When a certain degree of similarity between the catalogue and the ‘stripped off’ query has been established, the documents with the highest similarity scores (depending on the settings, say the top 10) are shown to the user as results.
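The procedure above can be sketched as follows. The sketch rests on several assumptions: the stop-word list is deliberately tiny, Jaccard set overlap stands in for whatever similarity measure a real system would use, and the names `represent`, `jaccard` and `search` as well as the sample documents are made up for illustration.

```python
import re

# Assumed, deliberately small stop-word list; real systems use far larger ones.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "for",
              "he", "she", "it", "his", "her"}


def represent(text):
    """Lower-case, tokenize, and drop stop words to get a set of content terms."""
    tokens = re.findall(r"\w+", text.lower())
    return {token for token in tokens if token not in STOP_WORDS}


def jaccard(a, b):
    """Set overlap in [0, 1]; an assumed stand-in for the similarity measure."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def search(query, documents, top_k=10):
    """Return up to top_k (score, doc_id) pairs with non-zero similarity."""
    query_terms = represent(query)
    # The catalogue: every document goes through the same stripping step.
    catalogue = {doc_id: represent(text) for doc_id, text in documents.items()}
    scored = [(jaccard(query_terms, terms), doc_id)
              for doc_id, terms in catalogue.items()]
    scored = [pair for pair in scored if pair[0] > 0]
    # Highest score first; ties are broken by document id for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_k]


docs = {
    "d1": "The history of the bus in the city",
    "d2": "A guide to network topology",
    "d3": "Bus timetables for the city",
}
print(search("the bus in the city", docs, top_k=2))
```

In this toy collection the query is reduced to the terms "bus" and "city", so the two documents that mention both are returned and the unrelated one is dropped.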

This typically occurs when browsing the internet, and Google.com™ is a familiar example. A development of IR is Cross-Language Information Retrieval (CLIR), which, as the name implies, is information retrieval in a multilingual environment. CLIR techniques simplify searching by multilingual users and allow monolingual searchers to judge relevance based on machine-translated results and/or to allocate expensive translation resources to the most promising foreign-language documents (Diekema 2003).

A simple IR system consists only of a Query, an Input Cleanser, a Matcher, the Document database and the Output, in logical order. Adding Language Translators turns this system into a Cross-Language Information Retrieval system. Of course, the Document database would now contain multilingual entries as well, and the output is to be presented in the language in which the query was entered. Figure 4 shows the Cross-Language Information Retrieval system schematically.
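As a rough sketch of how a translator stage slots in between the input cleanser and the matcher, consider the following Python fragment. Everything in it is assumed for illustration: the translation table, the sample documents, and the function names `cleanse`, `translate` and `clir_search`. Translating the results back into the language of the query is left out to keep the sketch short.

```python
import re

# Hypothetical query-translation tables: (source, target) -> word map.
TRANSLATIONS = {
    ("en", "lt"): {"bus": ["autobusas"]},
    ("en", "ru"): {"bus": ["автобус"]},
}

# Multilingual document database: doc id -> (language, text). Sample data only.
DOCUMENTS = {
    "d1": ("en", "City bus routes"),
    "d2": ("lt", "Autobusas važiuoja į Vilnių"),
    "d3": ("ru", "Автобус едет в город"),
}


def cleanse(text):
    """Input cleanser: lower-case the text and split it into a set of words."""
    return set(re.findall(r"\w+", text.lower()))


def translate(terms, source, target):
    """Language translator: map query terms into the document's language."""
    if source == target:
        return set(terms)
    table = TRANSLATIONS.get((source, target), {})
    return {word for term in terms for word in table.get(term, [])}


def clir_search(query, query_lang):
    """Matcher: return (doc_id, language, matched terms) for each hit."""
    query_terms = cleanse(query)
    hits = []
    for doc_id, (doc_lang, text) in DOCUMENTS.items():
        translated = translate(query_terms, query_lang, doc_lang)
        overlap = translated & cleanse(text)
        if overlap:
            hits.append((doc_id, doc_lang, sorted(overlap)))
    return hits


print(clir_search("bus", "en"))
```

Here the query is translated into each document's language before matching. Translating the documents into the query language instead is an equally possible design, and this sketch takes no position on which one the described system uses.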