


Machine Translation: progress or window dressing?
Michael Benis reports on the latest MT products
The three and a half-year itch
Bulletin last reported on machine translation over three years ago, so we thought it was time for an update, with a number of questions uppermost in our minds, above all: has the technology moved on much since then and, if so, has it advanced sufficiently to make it worth while for current users to upgrade? At the same time, we thought we'd provide a quick introduction to the technology and its uses so that readers can evaluate whether buying and using an MT product could help them raise their profile or increase their income.
What's in the box?
Let's first just clear up a couple of definitions. The traditional definition of Machine Translation (MT) in contradistinction to Computer Aided Translation (CAT) is that the former processes a translation all on its own, while the latter provides terminological and other support to a human translator who does the actual translation work themselves. Translation Memory systems like Déjà Vu or TRADOS are typical examples of the latter.
Machine translation systems consist of a component called a "Parser", which uses a dictionary and grammar rules to produce a source language structure, and a "Transformer" which then "transforms" this structure using transformation rules that include bilingual dictionary rules and rules to reorder the words in order to create a target language structure.
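As a rough illustration of the parser/transformer split described above, the following Python sketch tags French words using a tiny monolingual dictionary, then applies a single reordering rule and a bilingual dictionary to produce English. The dictionaries, tag names and the one noun-adjective rule are invented for this example and are vastly simpler than anything found in a commercial engine.

```python
# Toy illustration only: the dictionaries and the single reordering rule
# below are invented for this sketch, not taken from any real MT product.

# Monolingual dictionary used by the "parser": word -> part of speech.
SOURCE_DICTIONARY = {
    "le": "DET", "la": "DET",
    "chat": "NOUN", "maison": "NOUN",
    "noir": "ADJ", "blanche": "ADJ",
    "dort": "VERB",
}

# Bilingual dictionary used by the "transformer": French -> English.
BILINGUAL_DICTIONARY = {
    "le": "the", "la": "the",
    "chat": "cat", "maison": "house",
    "noir": "black", "blanche": "white",
    "dort": "sleeps",
}


def parse(sentence):
    """Build a (very flat) source structure: a list of (word, tag) pairs."""
    return [(word, SOURCE_DICTIONARY.get(word, "UNKNOWN"))
            for word in sentence.lower().split()]


def transform(structure):
    """Reorder NOUN ADJ into ADJ NOUN, then substitute target-language words."""
    reordered = list(structure)
    i = 0
    while i < len(reordered) - 1:
        if reordered[i][1] == "NOUN" and reordered[i + 1][1] == "ADJ":
            reordered[i], reordered[i + 1] = reordered[i + 1], reordered[i]
            i += 2  # skip past the pair just swapped
        else:
            i += 1
    # Words missing from the bilingual dictionary are passed through unchanged.
    return " ".join(BILINGUAL_DICTIONARY.get(word, word) for word, _ in reordered)


def translate(sentence):
    return transform(parse(sentence))
```

Calling translate("Le chat noir dort") runs both stages in sequence. Words that are in neither dictionary pass through untranslated, which is exactly the kind of gap that custom dictionaries are meant to fill.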
Every machine translation system currently works along these lines and contains these components, which are referred to as the "machine translation engine". In addition, they will contain a number of other possible modules, especially if they're designed for professional use as opposed to, for example, free use on the Internet. The principal purpose of these additional components is to allow a knowledgeable user to improve the quality of the final translation by helping the system "understand" any words or word groupings/constructions not already in its dictionaries or coped with by its transformation rules. The modules generally assist this by performing some analysis of the frequency with which such words and phrases occur so that the user can decide which are worth creating custom dictionary entries for and which are not, to make the best possible use of their time. This doesn't just involve "telling" the system how to translate phrases or compound nouns that it doesn't know how to handle, but equally compound nouns such as long company names, for example, that it doesn't need to translate. It's worth remembering that these custom dictionaries can be reused and combined flexibly depending on whether they are generic or for specific customers. These additional components therefore make it possible to significantly improve the "out-of-the-box" quality offered by all professional machine translation systems.
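The frequency analysis described above can be sketched in a few lines of Python. This is a simplified illustration, not how any particular product does it: the function names, the known_words set (standing in for the system's existing dictionary) and the min_count threshold are all assumptions made for the example.

```python
import re
from collections import Counter


def unknown_term_frequencies(text, known_words):
    """Count how often each word that is NOT in the dictionary occurs."""
    # [^\W\d_]+ matches runs of letters, including accented ones.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(word for word in words if word not in known_words)


def dictionary_candidates(text, known_words, min_count=2):
    """List unknown words worth a custom dictionary entry, most frequent first."""
    counts = unknown_term_frequencies(text, known_words)
    return [(word, count) for word, count in counts.most_common()
            if count >= min_count]
```

A word that occurs only once may not repay the time needed to create an entry, which is why the sketch filters on a minimum count. Real systems typically also look at multi-word phrases rather than single words.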
Most systems also provide further modules that extend these capabilities, for example by facilitating integration with translation memory systems. Traditionally, this has involved using machine translation systems to process no-match or low-fuzzy-match segments with a view to increasing productivity. This approach has gradually fallen from favour since the translation quality expected of translation memory (which is to say of a human translator) is much higher than that generally expected of machine translation, meaning that substantial editing of such segments is generally required and any initial productivity gains are subsequently lost.
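To make the no-match/low-fuzzy-match idea concrete, here is a minimal Python sketch of how segments might be routed between a translation memory and an MT engine. It uses the standard library's difflib.SequenceMatcher as a stand-in similarity measure; commercial TM tools use their own matching algorithms, and the 0.75 threshold and function names are assumptions chosen purely for illustration.

```python
from difflib import SequenceMatcher


def best_match(segment, memory):
    """Return (score, target) for the closest source segment in the memory.

    `memory` maps source segments to their stored translations; the score
    is a character-based similarity between 0.0 and 1.0.
    """
    best_score, best_target = 0.0, None
    for source, target in memory.items():
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score > best_score:
            best_score, best_target = score, target
    return best_score, best_target


def route(segment, memory, fuzzy_threshold=0.75):
    """Use the TM for good matches; hand everything else to the MT engine."""
    score, target = best_match(segment, memory)
    if score >= fuzzy_threshold:
        return ("TM", target)
    return ("MT", None)  # no usable match: this segment would go to MT
```

Segments routed to "MT" are the ones that, as noted above, tend to need substantial editing afterwards, so the threshold in effect decides how much post-editing the translator takes on.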
Myths, misconceptions and uses
Now that the quality issue has leapt out of the bag, we might as well tackle it head on. Translators who feel threatened that Machine Translation is going to put them out of a job tend to exaggerate, practically claiming that output quality is invariably so incomprehensible as to be useful only for the purpose of generating confusion or inspiring laughter. Likewise, technologists and manufacturers were, at least in the past, often tempted to claim that MT can generate translations that are eminently usable without further intervention by a translation professional.
The truth is, not surprisingly, somewhere between these two extremes. Although current Machine Translation output is never comparable with that of a competent professional, it will in most cases be more understandable and accurate than the efforts of those not suitably qualified, while generating results very much quicker than any other solution (seconds or minutes rather than hours or even days). Similarly, although Machine Translation output is generally understandable, it nevertheless contains enough "howlers" to act as a perfectly good advertisement for the value of qualified human translators, a fact of which professional translators are increasingly aware and could happily exploit.
Finally, it should be remembered that there are a number of ways in which professional translators can increase the quality of machine translation output, in addition to the fine-tuning mentioned earlier. This can be done both before and after the systems are used: by editing the source text to simplify the language at any points that are likely to cause the MT system problems, and/or by editing the results to make them more accurate or comprehensible. The latter is known as post-editing.
As a result, there are now four main uses for Machine Translation systems:
· Gisting - the provision of "rough" translations for information only, when a client simply needs an overview of a document in order to appraise whether a proposal or call for tenders, for example, is of any interest to them and requires further action, including an accurate "human" translation. In these situations, Machine Translation is particularly useful since it provides a solution that is both extremely rapid and extremely cost-effective. Companies are able to obtain low-quality machine translations free of charge on the Internet, or by using more sophisticated programs in-house. Professional translators can add to the overall benefits of Machine Translation for a very small increase in cost by fine-tuning the systems using custom dictionaries etc and post-editing the output as described above to generate higher-quality output.
· Customer focus branding - If you are a translation company or freelance that operates with "direct clients", offering your customers a gisting service can help you to demonstrate a clear service commitment, enabling you to meet your customers' needs for low-cost and/or rapid low-quality information.
· Marketing - Machine Translation can also be an excellent marketing tool for translation companies and freelance translators alike. In theory, all you are doing is providing a gisting service as described above. In practice, however, there are a number of cunning tricks you can adopt to increase the marketing impact and return on investment you achieve from offering such a service. Firstly, Machine Translations can be offered as a free service over the Internet to drive traffic and potential new customers to your website. If your website then points out that the free translations are the "raw" and unedited output of your Machine Translation system, you can then offer fine-tuning and post-editing as chargeable services that add value. Similarly, you can offer your MT customer a discount on any subsequent full "human" translation of the same document to incentivise them to "upgrade" for any documents that are found to be of special interest. In other words, structuring your service in this way not only allows you to keep more areas of customer demand appropriately covered, but also allows you to increase your website traffic (and therefore search engine ranking) as well as your appeal to potential customers. Managed and marketed astutely, an MT service can therefore help grow your business.
· Increasing the efficiency of TM systems - Although most professional translators are unlikely to find that their Translation Memory productivity is increased by using their Machine Translation system to translate no-match or low-fuzzy-match segments unless they have been very cunning with their custom dictionaries, there are other ways in which MT can be used to get the most out of TM. That's especially true in the early stages, when one hasn't yet built up large translation memories and terminology databases in particular. Once one has become a proficient user of most TM and MT systems, it is a relatively easy matter to obtain TM entries for the most common words and phrases that occur in any given translation. This can achieve very helpful productivity increases with most systems, in some cases rising to the spectacular for Déjà Vu, with its sub-segment-level matching technology and Lexicon function. In addition, most MT systems allow you to export your custom dictionaries to formats from which they can then be imported to your TM system's terminology database. You may find this a cost-effective alternative to buying TM companies' expensive terminology extraction programs.
Lastly, it should be noted that there is an increasing trend for MT companies to offer functions that allow their products to consult or import TM memories as well as their own dictionaries to improve quality (assuming the memories have been built by competent human translators). Both the systems reviewed in this article offer this, albeit following different approaches.
So, having established that there are a number of valuable potential uses of MT systems, let's take a look at two of the most popular offerings on the market today, Systran and PROMT.
Evaluating MT
Before proceeding to consider the individual systems, let's briefly pause to consider how they may best be evaluated.
MT output is traditionally evaluated with regard to two interlinked criteria: intelligibility and accuracy. The former largely refers to clarity of syntax, grammatical correctness, mistranslations and untranslated words, while the latter refers to whether the translation means the same as the original. The distinction matters because a translation can be completely intelligible without having the same meaning as the original. Conversely, a translation can have the same meaning as the original but be very difficult to understand, containing structures and grammar that are entirely alien to the target language.
In evaluating an MT system, a number of other aspects also need to be considered, in particular how easy it is to analyse the source text and construct custom dictionaries/new dictionary entries to improve on what would have been the "raw" output. Similarly, it may also be necessary to take other functions such as terminology extraction and integration with TM systems into account. Lastly, usability should also be considered, including integration with common word processors such as Microsoft Word.
Systran
The Big Daddy that offers a big choice
Systran is perhaps the best-known MT system on the market today, partly because it has been around for longer than most and partly because it has the distinction of being the system that is used by the European Commission and US intelligence bodies. One of the advantages of its age is that it offers probably the greatest combination of language directions on the market. It's worth remembering this before you get too carried away by any of the features offered by its competitors, since they'll be no use if your particular language combinations aren't covered.
Systran also offers a wide range of products to suit all needs. Professional translators and translation companies really need only consider two versions: Systran Professional Standard priced at £246.69, and Systran Professional Premium costing £528.69, both including VAT. The big difference between the two is that the latter offers a much wider range of specialist glossaries and allows you to build your own. Fortunately, however, there's no need to agonise over the choice, since you can obtain a 30-day trial version to work out whether it's worth your while paying more than double for Premium. While there's no doubt that translation companies would be better off opting for the pricier choice, many freelances are unlikely to find that they suffer from choosing the cheaper option.
Features and usability
As you would expect from a market leader, Systran installs simply enough, pretty much like any other Windows program. The Professional versions also work in Windows emulation on Macs, and provide you with plug-ins for Internet Explorer, Microsoft Word, Outlook, PowerPoint and Excel, together with a Dictionary Manager. A new development for both these top-of-the-line versions is Systran's Translation Project Manager, which works like a simple Translation Memory editing interface, showing text with its basic formatting and highlighting the source and target versions of the sentence you have selected to work on.
Systran makes life easy for users by providing just three uncluttered interfaces plus a handy little clipboard taskbar that allows you to use it to translate text in practically any application. Most people will find it easiest to use Systran in Word, where the dedicated toolbar offers access to all the various different options you require. The system also has a nice little safety feature that ensures there is very little room for error, automatically opening a new window for your translation and saving it with a different name so that you can't lose your source language document.
If you're intending to carry out a substantial amount of post-editing, the Systran Translation Project Manager (STPM) makes life much easier by facilitating cross-checking with the source text when necessary.
There are, however, a number of flies in the ointment. Firstly, there is no context-sensitive Help. Indeed, there are no Help files whatsoever. Click on Help, and all you get is the usual "About" version information, which is of course of absolutely no use whatsoever if you actually want some guidance about how to use the program. You don't get a printed manual to read in the bath, but do get a PDF manual which you can download from the Systran website. Although this is generally written in clear English, the information isn't always easy to apply, since there's little in the way of step-by-step instructions. That's a Big Black Mark for a leading company that is offering a product which can hardly be considered cheap in even one's most extravagant moments.
Perhaps the biggest current weakness, however, is the Translation Memory feature offered for STPM, which could in theory significantly improve quality for users with massive translation memories. Systran certainly builds up one's expectations in this respect by providing TMX compatibility, which should mean that it can import files in the "Translation Memory eXchange" format designed to offer precisely this sort of leverage across products. The problem is that there are half a dozen different versions of TMX and Systran either won't import complete memories, providing you with truncated mini-memories, or imports the complete memory file but then won't save it properly. What's more, sometimes it will save the memory but have problems with the coding, so that if you select the Systran Translation Memory you have created as one of your dictionaries you will find that it "translates" the source language back into the source language - a somewhat less than impressive result. The system can be made to work, but it takes an awful lot of fiddling with formats and imports - a pleasure that can only be recommended to the computer literate with a piquant tendency towards the masochistic.
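Given how fiddly TMX imports can be, it may be worth sanity-checking a memory file before handing it to any MT system. The Python sketch below is a generic illustration and has nothing to do with Systran's own import routine: it reads the translation units from a TMX document and flags any unit whose target segment is missing or identical to its source. It assumes the usual tu/tuv/seg element layout and accepts the language either as xml:lang or as a plain lang attribute, since TMX versions differ on this point. The function names are invented for the example.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_units(tmx_text):
    """Yield one {language: segment text} dict per <tu> element."""
    root = ET.fromstring(tmx_text)
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.iter("tuv"):
            # Fall back to a plain "lang" attribute if xml:lang is absent.
            lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text inside inline markup elements.
                unit[lang.lower()] = "".join(seg.itertext()).strip()
        yield unit


def suspicious_units(units, source_lang, target_lang):
    """Return units whose target is missing or simply repeats the source."""
    flagged = []
    for unit in units:
        source = unit.get(source_lang)
        target = unit.get(target_lang)
        if not target or source == target:
            flagged.append(unit)
    return flagged
```

Identical source and target segments are not always an error (proper names and product codes are legitimately left untranslated), so the flagged units are candidates for a manual check rather than proof of a faulty file.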
Translation quality
Overall, the raw output of Systran was suitable for gisting, albeit with some surprising oversights, such as the fact that it didn't know that "formation" in French can also mean "training" in English. Both intelligibility and accuracy were on a par with other Machine Translation programs, as can be seen from the comparison figures. Indeed, what was really surprising was to compare the output of today's systems with that obtained from Power Translator Pro and Reverso Pro in 2002: while there are small but noticeable differences, there is to my mind no way anyone could put their hand on their heart and claim that we are witnessing earth-shattering advances.
Although the premium version comes with 20 specialist dictionaries that are grouped into different subject areas to make them easier to use, these dictionaries were often less useful than might be expected, partly because they are still quite generic. Consequently, selecting "Political Science" under "Business" for example, or "Medicine" under "Life Sciences" can still quite happily generate inappropriate terminology - it's just inappropriate political or medical terminology rather than inappropriate generic terminology. As a rule, it was more effective to resolve these problems by creating a custom dictionary or adding dictionary entries.
PROMT, the potentially pricey Russian doll
PROMT adopts a very different approach to Systran, at least as far as its interfaces, dictionaries and manuals are concerned. Installation is a simple enough affair but, unlike Systran, leaves you with a large number of different interfaces, all of which are, however, relatively easy to use and most of which will only need to be used occasionally to customise the system or access more powerful features. These also include a handy feature for backing up your PROMT dictionaries and for installing specialist dictionaries. PROMT offers an immense number of the latter which are, however, all optional extras, with the basic system being supplied with just three dictionaries: General, Internet, and Business and Computer. Consequently, although PROMT initially seems outstanding value compared to Systran this may well not be the case in practice, particularly if you work in a number of highly specialised fields and don't want to build the relevant dictionaries from scratch yourself.
As a Russian company, PROMT offers a greater variety of language combinations for Russian and is generally less Anglo-centric than Systran. On the other hand, specialist dictionaries are not available for all its language combinations.
First published in ITI Bulletin, 2005.
