Part of Facebook’s mission to bring the world closer together is breaking down language barriers and allowing everyone to engage with content in their preferred language. Translating more content in more languages also helps us better detect policy-violating content and expand access to the products and services offered on our platforms. When we factor in the number of languages in use and the volume of content on those platforms, we are serving nearly 6 billion translations per day to our community. Using traditional methods, it could take years to professionally translate a single day’s content. Providing automatic translation at our scale and volume requires the use of artificial intelligence (AI) and, more specifically, neural machine translation (NMT).

Our translation models have improved in quality since we transitioned to neural networks, but until recently, technical challenges kept us from increasing the number of languages we serve. In 2018, Facebook’s Language and Translation Technologies (LATTE) group set out to change that, and to achieve the goal of “no language left behind.” Our primary challenges included a lack of resources for training (most of these languages do not have a quantity of readily available human translations) and the need to find a way to train systems fast enough to produce usable translations quickly.

Today, we are proud to share the results of those efforts and announce that we have added 24 new languages to our automatic translation services. We are now serving translations for a total of 4,504 language directions (a pair of languages between which we offer translation, e.g., English to Spanish). The newly supported languages in 2018 are:

Translation systems for many of these languages are at an early stage, and the translations they produce are still a long way from professional quality. Nonetheless, the systems produce useful translations that convey the gist of the original meaning, and they give us a way to improve quality iteratively.

Examples of translations at each stage along the quality scale.

Most translation systems require parallel data, or paired translations, to be used as training data. Large collections of translations are available on the web, originating mainly from international organizations such as the United Nations, the European Union, and the Canadian Parliament. Unfortunately, these readily available parallel corpora exist for only a handful of languages. The main challenge we faced in building translation systems for new languages was achieving a level of translation quality that yields usable translations in the absence of large quantities of parallel corpora. The lack of data is also challenging for NMT, which uses models with a large number of parameters and is more sensitive to the quality of the training data.

To better understand what helps under low-resource settings, we conducted several experiments. We used BLEU scores (a metric that measures the degree of overlap between the generated translation and a professional reference) to measure translation quality.
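As a concrete illustration of that metric, the sketch below computes a corpus-level BLEU score with the open source sacrebleu package; the post does not name its exact evaluation tooling, so the library choice and the toy sentences are assumptions.

```python
# Minimal BLEU-scoring sketch using sacrebleu (an assumed tool choice).
import sacrebleu

# Toy system outputs and professional reference translations.
system_outputs = [
    "the cat sat on the mat",
    "he went to the market yesterday",
]
professional_references = [
    "the cat sat on the mat",
    "he went to the market yesterday morning",
]

# corpus_bleu takes the system hypotheses and a list of reference streams.
bleu = sacrebleu.corpus_bleu(system_outputs, [professional_references])
print(f"BLEU = {bleu.score:.1f}")

# Comparing this score for systems trained with and without a given data
# source is how the BLEU deltas reported below are obtained.
```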
We trained NMT models using the open source PyTorch Translate framework, converted them to the ONNX format, and used them in the production Caffe2 environment.
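That export step can be pictured with the short sketch below. The tiny encoder is a stand-in, not the actual PyTorch Translate model; it only illustrates how a trained PyTorch module is serialized to ONNX for serving from a Caffe2/ONNX runtime.

```python
# Sketch of exporting a PyTorch model to ONNX; ToyEncoder is a placeholder,
# not the production translation model.
import torch
import torch.nn as nn

class ToyEncoder(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hidden_dim, batch_first=True)

    def forward(self, tokens):
        embedded = self.embed(tokens)
        outputs, state = self.gru(embedded)
        return outputs, state

model = ToyEncoder()
model.eval()
dummy_tokens = torch.randint(0, 1000, (1, 12))  # one 12-token sentence

# Serialize the graph so it can be loaded by a Caffe2/ONNX runtime.
torch.onnx.export(
    model,
    (dummy_tokens,),
    "toy_encoder.onnx",
    input_names=["tokens"],
    output_names=["outputs", "state"],
    dynamic_axes={"tokens": {1: "seq_len"}},
)
```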
Overall, we experimented with three main strategies:

Facebook posts and messages are very different from other types of text: They are generally shorter, less formal, and contain abbreviations, slang, and typos. To learn how to translate these, we need to provide the algorithms with good examples of social post translations. To this end, we manually label (professionally translate) public posts. To scale up to all the languages we covered, we automated several of our processes. For example, we automated the selection and preparation of these posts: we ranked them to maximize coverage (i.e., to ensure that we are getting translations we don’t already have) and then automatically sent them for professional translation in weekly batches to different translation providers.
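The post does not describe how the coverage-maximizing ranking works, so the greedy heuristic below, which prefers posts containing n-grams not yet present in already-translated data, is only one plausible illustration; the function names and the bigram criterion are assumptions.

```python
# Illustrative greedy ranking to "maximize coverage": prefer candidate posts
# whose n-grams are not yet covered by already-translated data. This is an
# assumed heuristic, not the actual production selection logic.

def ngrams(text, n=2):
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def select_posts_for_translation(candidate_posts, already_translated, n=2, budget=100):
    covered = set()
    for sentence in already_translated:
        covered |= ngrams(sentence, n)

    selected = []
    remaining = list(candidate_posts)
    while remaining and len(selected) < budget:
        # Pick the post that adds the most previously unseen n-grams.
        best = max(remaining, key=lambda post: len(ngrams(post, n) - covered))
        if len(ngrams(best, n) - covered) == 0:
            break  # nothing new left to cover
        selected.append(best)
        covered |= ngrams(best, n)
        remaining.remove(best)
    return selected
```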
In total, we manually labeled millions of words in 25 languages. To measure the effectiveness of this in-domain training data, we trained systems with and without it and compared the resulting BLEU scores. On average, we obtained +7.2 BLEU across the 15 languages in the experiment (the results vary from direction to direction). We also see, on average, a +1.5 BLEU increase for every additional 10K sentence pairs of seed data.

In addition to generating more in-domain data, we explored semisupervised and data augmentation methods to generate additional training data. These methods rely on lower-accuracy models that are used to generate artificial training data. For example, to train an Amharic-English system, we can train a basic translation system for English-Amharic and use it to translate large quantities of monolingual English data into Amharic.
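A minimal sketch of that back-translation recipe, assuming you already have some English-to-Amharic model wrapped in a callable (the callable here is a hypothetical placeholder, not a named Facebook API): the synthetic Amharic output is paired with the original human-written English to grow the Amharic-to-English training set.

```python
# Back-translation sketch: synthetic Amharic sources paired with real English
# targets. `translate_en_to_am` is any reverse-direction model you supply; it
# is a hypothetical placeholder, not an actual Facebook/PyTorch Translate API.
from typing import Callable, Iterable, List, Tuple

def build_backtranslated_pairs(
    monolingual_english: Iterable[str],
    translate_en_to_am: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Turn monolingual English text into synthetic Amharic->English pairs."""
    pairs = []
    for english in monolingual_english:
        synthetic_amharic = translate_en_to_am(english)  # lower-accuracy model output
        # Synthetic source sentence, original human-written target sentence.
        pairs.append((synthetic_amharic, english))
    return pairs

# The synthetic pairs are then mixed with the small real parallel corpus when
# training the forward Amharic->English system.
```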