

Translation rule selection is the task of choosing appropriate translation rules for an ambiguous source-language segment. Because translation ambiguities are pervasive in statistical machine translation, we introduce two topic-based models for translation rule selection that incorporate global topic information into translation disambiguation. We associate each synchronous translation rule with source- and target-side topic distributions. Using these distributions, we propose a topic dissimilarity model that favors less dissimilar rules by penalizing rules whose topic distributions differ strongly from those of the document being translated. To encourage the use of generic (non-topic-specific) translation rules, we also present a topic sensitivity model that balances rule selection between generic and topic-specific rules. Furthermore, we project target-side topic distributions onto the source-side topic model space so that we can benefit from topic information in both the source and target languages. We integrate the proposed topic dissimilarity and sensitivity models into hierarchical phrase-based machine translation for synchronous translation rule selection. Experiments show that our topic-based translation rule selection models can substantially improve translation quality.
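
As a rough illustration of the two ideas above, the following sketch scores rules against a document's topic distribution. The abstract does not name the measures used, so both choices here are assumptions: Hellinger distance stands in for the dissimilarity measure, and Shannon entropy stands in for topic sensitivity. All function names and the toy distributions are hypothetical.

```python
import math


def hellinger_distance(p, q):
    """Hellinger distance between two discrete topic distributions.

    Assumed stand-in for the dissimilarity measure; 0 means identical,
    1 means the distributions share no probability mass.
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same number of topics")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


def topic_entropy(p):
    """Shannon entropy in nats, an assumed proxy for topic sensitivity.

    A flat distribution (high entropy) suggests a generic rule; a peaked
    distribution (low entropy) suggests a topic-specific rule.
    """
    return -sum(a * math.log(a) for a in p if a > 0.0)


# Hypothetical three-topic distributions, for illustration only.
doc_topics = [0.7, 0.2, 0.1]        # document being translated
matching_rule = [0.8, 0.1, 0.1]     # rule seen mostly in topic 0
mismatched_rule = [0.05, 0.05, 0.9] # rule seen mostly in topic 2
generic_rule = [1 / 3, 1 / 3, 1 / 3]

for name, rule in [("matching", matching_rule),
                   ("mismatched", mismatched_rule),
                   ("generic", generic_rule)]:
    penalty = hellinger_distance(rule, doc_topics)
    print(f"{name}: dissimilarity={penalty:.3f}, entropy={topic_entropy(rule):.3f}")
```

Under these assumptions, the mismatched rule receives a larger dissimilarity penalty than the matching rule, while the generic rule has the highest entropy. A decoder could use that entropy to avoid over-penalizing generic rules that are reasonable in any document.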


Rule Selection, Sensitivity Models, Topic-based Models

Article Details

How to Cite
Zhang, M., & Xiao, X. (2022). Topic-Based Dissimilarity and Sensitivity Models for Translation Rule Selection. Journal of Engineering Applied Science and Humanities, 7(1), 64–78.

