Write a proposal about “Arabic to English Statistical Machine Translation for Big Data sets using Hadoop and Compression-based Language Models”
The idea of the proposal is to extend PhD students' work on Arabic-English alignment for Statistical Machine Translation (SMT) to big data sets. You can find more info about this data set in this paper.
I will use the MapReduce programming model and compression-based language models on the big data sets I have to create an Arabic-to-English Statistical Machine Translation model, or perhaps a program.
A case study of MapReduce applied to a data set with word alignment is here: Section 6.4, “Case Study: Word Alignment for Statistical Machine Translation”, page 138 of https://lintool.github.io/MapReduceAlgorithms/MapReduce-book-final.pdf
I will use Hadoop, a software framework that implements the MapReduce programming model.
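To make the MapReduce idea concrete, here is a minimal in-memory sketch in plain Python. It does not use Hadoop; it only imitates the map, shuffle, and reduce phases on one machine. The task shown, counting how often a source word and a target word appear in the same sentence pair, is one simple counting step that word-alignment methods can build on. The function names (`map_pair`, `shuffle`, `reduce_counts`, `run_job`) and the two-sentence toy corpus with transliterated words are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import product


def map_pair(sentence_pair):
    """Mapper: emit ((source_word, target_word), 1) for every co-occurring pair."""
    source, target = sentence_pair
    for s, t in product(source.split(), target.split()):
        yield (s, t), 1


def shuffle(mapped):
    """Shuffle: group all emitted values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reducer: sum the counts collected for one word pair."""
    return key, sum(values)


def run_job(corpus):
    """Run map, shuffle, and reduce over a list of (source, target) sentence pairs."""
    mapped = (kv for pair in corpus for kv in map_pair(pair))
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())


# Toy, made-up parallel corpus (transliterated source side), not real data.
corpus = [("kitab jadid", "new book"), ("kitab", "book")]
counts = run_job(corpus)
print(counts[("kitab", "book")])
```

On a real cluster the mapper and reducer would run as separate tasks over file splits, and the framework would handle the shuffle; the logic of each phase would stay roughly the same.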
The first major part of this idea is found in this paper: Experiments with a PPM Compression-based Method for English-Chinese Bilingual Sentence Alignment (Wei Liu, Zhipeng Chang, and William J. Teahan).
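A compression-based language model scores a text by its codelength, that is, the number of bits needed to encode it under the model. The sketch below illustrates only that general idea. It is not PPM and not the method of the paper above: it swaps in a much simpler character bigram model with add-one smoothing, whereas PPM blends several context orders using escape probabilities. The class name `CharBigramModel` and the `alphabet_size` default of 256 are assumptions chosen for illustration.

```python
import math


class CharBigramModel:
    """Simplified stand-in for a compression model: P(char | previous char)."""

    def __init__(self, training_text, alphabet_size=256):
        # alphabet_size is an assumed bound on distinct characters (hypothetical).
        self.alphabet_size = alphabet_size
        self.pair_counts = {}
        self.context_counts = {}
        for prev, cur in zip(training_text, training_text[1:]):
            self.pair_counts[(prev, cur)] = self.pair_counts.get((prev, cur), 0) + 1
            self.context_counts[prev] = self.context_counts.get(prev, 0) + 1

    def codelength(self, text):
        """Return the cost of `text` in bits: the sum of -log2 P(cur | prev)."""
        bits = 0.0
        for prev, cur in zip(text, text[1:]):
            seen = self.pair_counts.get((prev, cur), 0)
            total = self.context_counts.get(prev, 0)
            # Add-one smoothing keeps unseen transitions at non-zero probability.
            probability = (seen + 1) / (total + self.alphabet_size)
            bits -= math.log2(probability)
        return bits


model = CharBigramModel("the cat sat on the mat " * 20)
familiar = model.codelength("the cat")
unfamiliar = model.codelength("xqz jkw")
print(familiar < unfamiliar)
```

Text that resembles the training data receives a shorter codelength than text that does not, which is the property that compression-based methods rely on when comparing or categorizing text.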
Please use the following sources:
A Compression-based Algorithm for Chinese Word Segmentation.pdf
A New Parallel Corpus of Arabic-English.pdf
Experiments with a PPM Compression-based Method for English-Chinese Bilingual Sentence Alignment
Modelling Chinese for Text Compression – conference.pdf
Universal Text Preprocessing for Data Compression.pdf
Using compression based language models for text categorization.pdf
Data-Intensive Text Processing with MapReduce, page 138 (download the book from https://lintool.github.io/MapReduceAlgorithms/MapReduce-book-final.pdf)