Can we improve semantic and discourse-level properties of SMT output?
Bonnie Webber
Statistical Machine Translation (SMT) is currently limited by two forms of locality.
The first is the locality of a single sentence, which limits how much is translated at one
time. A standard SMT system processes sentences independently of one another, so
even the order in which they are processed does not matter. The second is the N-gram
locality of the Language Model used in SMT, which limits how much of an output
translation can be simultaneously assessed as a good substring of the target language.
Neither of these localities provides enough of a view of an output translation to ensure that it is syntactically correct, semantically adequate for expressing the source message in the target language, or discourse-appropriate for its position in a multi-sentence text. If an output translation ends up satisfying these criteria, it is more a matter of frequency in the training data and luck than of making linguistically informed choices. In this talk, I will briefly describe some efforts at Edinburgh and elsewhere to improve one aspect of semantics in translation (consistent expression of negation) and two aspects of discourse (appropriate signalling of coreference and appropriate signalling of discourse relations).