See also: Requests for Projects, NLP Progress, and NLP Guide
Eastern Armenian typically falls somewhere between the 50th and 100th language supported by the top companies that launch products in many languages. As with many other languages, early technical challenges often centered on encoding non-ASCII characters, but Unicode has mostly solved that.
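To make the encoding point concrete, here is a minimal Python sketch showing that Armenian text round-trips cleanly through UTF-8. The helper name `is_armenian` is made up for this example, and the range check covers only the main Armenian block (U+0530 to U+058F), which includes a few unassigned code points and excludes the Armenian ligatures in the Alphabetic Presentation Forms block.

```python
import unicodedata

# Main Armenian Unicode block: U+0530..U+058F (assumption: ligatures in
# U+FB13..U+FB17 are deliberately ignored in this sketch).
ARMENIAN_BLOCK = range(0x0530, 0x0590)


def is_armenian(ch):
    """Return True if a single character lies in the main Armenian block."""
    return ord(ch) in ARMENIAN_BLOCK


word = "Հայաստան"  # "Hayastan", i.e. Armenia
utf8 = word.encode("utf-8")

# Each Armenian letter takes two bytes in UTF-8: 8 characters, 16 bytes.
print(len(word), len(utf8))

# Show the code point and official Unicode name of each character.
for ch in word:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

Mixed-script text can be filtered with the same check, for example `[c for c in text if is_armenian(c)]`.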
TODO: compare to English, Russian, Hebrew, Georgian
Unlabelled datasets.
Wikipedia
TODO
For optical character recognition
TODO
For speech recognition
TODO
fastText has trained embeddings for the top few hundred languages, including Armenian, on Wikipedia and Common Crawl.
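The text-format fastText vectors can be read without any special library. The sketch below assumes the usual `.vec` layout (a header line with the vocabulary size and dimension, then one word followed by its values per line). The function names and the three toy vectors are invented for illustration and are not real fastText values.

```python
import io
import math


def load_vec(stream, limit=None):
    """Read word vectors from a text stream in the .vec layout."""
    header = stream.readline().split()
    _count, dim = int(header[0]), int(header[1])
    vectors = {}
    for line in stream:
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        if len(values) != dim:
            continue  # skip malformed lines instead of failing
        vectors[word] = [float(v) for v in values]
        if limit is not None and len(vectors) >= limit:
            break
    return vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy data (made up): cat, dog, city.
SAMPLE = "3 2\nկատու 1.0 0.0\nշուն 0.9 0.1\nքաղաք 0.0 1.0\n"
vectors = load_vec(io.StringIO(SAMPLE))

print(cosine(vectors["կատու"], vectors["շուն"]))
print(cosine(vectors["կատու"], vectors["քաղաք"]))
```

For a real download, the stream would be something like `gzip.open("cc.hy.300.vec.gz", "rt", encoding="utf-8")`. That file name is an assumption based on fastText's usual naming with the language code `hy`. Passing `limit` keeps memory use manageable.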
A word analogy task is available, along with GloVe, fastText, SkipGram, and CBOW embeddings pre-trained on Wikipedia, news articles, and blog posts.
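As a rough illustration of how a word analogy question ("a is to b as c is to ?") is usually scored, the sketch below uses the common vector-offset method: pick the word whose vector is closest by cosine to b - a + c, excluding the three query words. The function names and the toy two-dimensional vectors are invented for the example. The file format and evaluation protocol of the Armenian task itself are not shown here.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def solve_analogy(vectors, a, b, c):
    """Return the word closest to b - a + c, excluding a, b and c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Toy vectors (made up): man, woman, king, queen, city.
toy = {
    "տղամարդ": [1.0, 0.0],
    "կին": [1.0, 1.0],
    "թագավոր": [3.0, 0.0],
    "թագուհի": [3.0, 1.0],
    "քաղաք": [0.0, 1.0],
}

# man : woman :: king : ?
answer = solve_analogy(toy, "տղամարդ", "կին", "թագավոր")
print(answer)
```

Accuracy on an analogy set is then the fraction of questions for which the returned word matches the expected one.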
https://github.com/UniversalDependencies/UD_Armenian-ArmTDP
https://github.com/UniversalDependencies/UD_Armenian-ArmTDP
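Universal Dependencies treebanks are distributed in the CoNLL-U format: ten tab-separated columns per token, `#` comment lines, and a blank line between sentences. A minimal reader might look like the sketch below. The function name `read_conllu` is made up, and the sample sentence and its annotation are a hand-written toy, not taken from the treebank.

```python
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


def read_conllu(lines):
    """Yield sentences as lists of token dicts from CoNLL-U lines."""
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:  # a blank line ends the current sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):  # sentence-level comments and metadata
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            continue  # skip malformed rows
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        sentence.append(dict(zip(FIELDS, cols)))
    if sentence:
        yield sentence


# Toy example (hand-written, illustrative annotation only).
ROWS = [
    ("1", "Ես", "ես", "PRON", "_", "_", "2", "nsubj", "_", "_"),
    ("2", "սիրում", "սիրել", "VERB", "_", "_", "0", "root", "_", "_"),
    ("3", "եմ", "եմ", "AUX", "_", "_", "2", "aux", "_", "_"),
    ("4", "Հայաստանը", "Հայաստան", "PROPN", "_", "_", "2", "obj", "_", "_"),
    ("5", "։", "։", "PUNCT", "_", "_", "2", "punct", "_", "_"),
]
SAMPLE = "# text = Ես սիրում եմ Հայաստանը։\n" + "\n".join("\t".join(r) for r in ROWS) + "\n\n"

sentences = list(read_conllu(SAMPLE.splitlines()))
for token in sentences[0]:
    print(token["form"], token["upos"], token["head"], token["deprel"])
```

With a cloned copy of the repository, the same function can be fed an open file handle for one of the `.conllu` files.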
https://github.com/ispras-texterra/pioner
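Named entity datasets are commonly stored as one token per line with BIO tags (`B-PER`, `I-PER`, `O`, and so on). Assuming that convention applies here (check the repository for the exact format and label set), a small helper for turning tags into entity spans could look like this. The function name and the example sentence are invented for illustration.

```python
def bio_to_spans(tags):
    """Convert BIO tags to (label, start, end) spans; end is exclusive."""
    spans = []
    start = label = None
    # A trailing "O" sentinel closes any entity still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        inside = tag.startswith("I-") and label == tag[2:]
        if not inside and label is not None:
            spans.append((label, start, i))
            start = label = None
        # Start a new entity on B-, or leniently on a stray I- tag.
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            start, label = i, tag[2:]
    return spans


# "Hovhannes Tumanyan was born in Dsegh" (hand-tagged toy example).
tokens = ["Հովհաննես", "Թումանյանը", "ծնվել", "է", "Դսեղում"]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC"]

spans = bio_to_spans(tags)
for label, start, end in spans:
    print(label, " ".join(tokens[start:end]))
```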
There is a proprietary dataset at UCom. Google has a working system trained on search queries read aloud.
TODO: open dataset
TODO
TODO
Sentiment analysis: TODO
TODO
TODO
Parallel corpora: TODO
Unsupervised translation: TODO
https://github.com/YerevaNN/translit-rnn
https://github.com/deeplanguageclass/fairseq-transliteration/
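The projects above learn transliteration with neural networks. For comparison, the easier direction (Armenian script to Latin) can be approximated with a plain character table, as in the sketch below. The table is an informal, ad hoc romanization chosen for this example, not an official standard and not what either repository uses. Going from informal Latin spelling back to Armenian script is ambiguous, which is why learned models are used for it.

```python
# Informal Armenian-to-Latin table (an assumption of this sketch).
TABLE = {
    "ա": "a", "բ": "b", "գ": "g", "դ": "d", "ե": "e", "զ": "z",
    "է": "e", "ը": "ə", "թ": "t'", "ժ": "zh", "ի": "i", "լ": "l",
    "խ": "kh", "ծ": "ts", "կ": "k", "հ": "h", "ձ": "dz", "ղ": "gh",
    "ճ": "ch", "մ": "m", "յ": "y", "ն": "n", "շ": "sh", "ո": "o",
    "չ": "ch'", "պ": "p", "ջ": "j", "ռ": "r", "ս": "s", "վ": "v",
    "տ": "t", "ր": "r", "ց": "ts'", "ւ": "w", "փ": "p'", "ք": "k'",
    "օ": "o", "ֆ": "f", "և": "ev",
}


def romanize(text):
    """Romanize Armenian text character by character; other text passes through."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        low = ch.lower()
        # The digraph ու is a single vowel, written "u" here.
        if low == "ո" and i + 1 < len(text) and text[i + 1].lower() == "ւ":
            latin = "u"
            i += 2
        else:
            latin = TABLE.get(low, ch)
            i += 1
        if ch.isupper():
            latin = latin.capitalize()
        out.append(latin)
    return "".join(out)


print(romanize("Հայաստան"))
print(romanize("Երևան"))
```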
TODO