ENIA: Talk 3 - Supervised evolving parallel text aligners for improving quality translation
J. Gabriel Pereira Lopes (Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Portugal)
Time: 08:30 - 10:00
Location: Auditório 6
 "In this talk, I will address three complementary problems for building up necessary basis for enabling high quality Example-, Context- and Statistics-Based Machine Translation (EBMT, CBMT and SBMT). Those problems are: 1) term translation extraction from indexed aligned parallel corpora and validation; 2) reuse of acquired and validated term translation lexicon for parallel corpora alignment, and 3) extraction of term translation from non aligned parallel corpora The solution of these problems enables a faster construction of robust evolving parallel text aligners and, as a consequence, leads to improved high quality translations. For extracting single- and multi-word term translations from indexed aligned parallel corpora, we experimented a couple of association measures combined by a voting scheme, for scaling down translation pairs according to the degree of internal cohesiveness, and evaluated results obtained. Precision obtained is clearly much better than results obtained in related work for the very low range of occurrences we have dealt with, and compares with the best results obtained in word translation. These results refer to previously unknown term translations. For aligning parallel corpora, using previously acquired and validated term translation lexicons, we experimented a new global robust alignment method and, at current stage of development of used term translation lexicon, achieved an alignment combined precision and recall F-measure value of 78.9%, being precision and recall values close to that value. As, for the alignment, we clearly separated alignment proper from extraction of term translations, for addressing any new language pair, we need to build an initial bilingual lexicon. This will be the third problem addressed. To finish, I will go through translation results obtained and problems that still require additional research efforts."