Stable Random Projection is a language-agnostic strategy for vectorizing text into a low dimensional space. (Schmidt, 2017)
SRP representations are quite inefficient from an information-theoretic point of view, but desirable because unlike most vector representations of text they do not bake in any assumptions from a training set about the vocabulary likely to be encountered. That means they can be used on any language or vocabulary, and produce a consistent embedding in a universal space.