From unified phrase representation to bilingual phrase alignment in an unsupervised manner
Published online by Cambridge University Press: 01 August 2022
Abstract
Significant advances have been achieved in bilingual word-level alignment, yet phrase-level alignment remains a challenge. Moreover, the need for parallel data is a critical drawback for the alignment task. This work proposes a system that alleviates both problems: a unified phrase representation model that takes cross-lingual word embeddings as input, and an unsupervised training algorithm inspired by recent work on neural machine translation. The system uses a sequence-to-sequence architecture in which a short-sequence encoder builds cross-lingual representations of phrases of any length, and an LSTM network then decodes them with respect to their contexts. After training on comparable corpora with existing key phrase extraction, the encoder provides cross-lingual phrase representations that can be compared without further transformation. Experiments on five data sets show that the method obtains state-of-the-art results on the bilingual phrase alignment task and improves the alignment of phrases of different lengths by a mean of 8.8 points in MAP.
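As an illustration of what comparing phrase representations "without further transformation" could look like, here is a minimal Python sketch. It assumes an encoder has already mapped source and target phrases into one shared vector space. The toy vectors, the English and Spanish phrase labels, and the function names (`cosine`, `rank_candidates`, `mean_average_precision`) are hypothetical, and cosine similarity is an assumed comparison measure rather than a detail confirmed by the abstract. MAP is computed in its usual form, as the mean over source phrases of the average precision of the ranked candidates.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(source_vec, target_vecs):
    """Return target phrases sorted by decreasing similarity to source_vec."""
    scored = [(cosine(source_vec, vec), phrase) for phrase, vec in target_vecs.items()]
    return [phrase for _, phrase in sorted(scored, key=lambda pair: -pair[0])]


def mean_average_precision(rankings, gold):
    """Mean over source phrases of the average precision of each ranking."""
    total = 0.0
    for src, ranked in rankings.items():
        hits, ap = 0, 0.0
        for rank, candidate in enumerate(ranked, start=1):
            if candidate in gold[src]:
                hits += 1
                ap += hits / rank
        total += ap / len(gold[src]) if gold[src] else 0.0
    return total / len(rankings)


# Hypothetical phrase vectors, assumed to come from a shared cross-lingual encoder.
source_vecs = {
    "wind turbine": [0.9, 0.1, 0.0],  # two-word source phrase
    "blade": [0.1, 0.9, 0.1],
}
target_vecs = {
    "aerogenerador": [0.8, 0.2, 0.1],  # one-word target phrase
    "pala": [0.2, 0.8, 0.0],
    "viento": [0.5, 0.5, 0.5],  # distractor
}
gold = {"wind turbine": {"aerogenerador"}, "blade": {"pala"}}

# Rank every target phrase for every source phrase directly in the shared space.
rankings = {src: rank_candidates(vec, target_vecs) for src, vec in source_vecs.items()}
score = mean_average_precision(rankings, gold)
print(rankings)
print(score)
```

In this toy setup the two-word phrase "wind turbine" is matched to a one-word candidate, the kind of different-length alignment the abstract refers to. Because both sides live in the same space, no mapping matrix or re-projection step appears between encoding and ranking.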
- Type: Article
- Copyright: © The Author(s), 2022. Published by Cambridge University Press