Nonconsumptive access to texts

Stable Random Projection (SRP) is a language-agnostic strategy for vectorizing text into a low-dimensional space (Schmidt, 2017).

SRP representations are quite inefficient from an information-theoretic point of view, but they are desirable because, unlike most vector representations of text, they do not bake in any assumptions from a training set about the vocabulary likely to be encountered. They can therefore be applied to any language or vocabulary, and they produce a consistent embedding in a universal space.
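
To illustrate how a projection can avoid any trained vocabulary, here is a minimal sketch in which each token's dimensions are derived from a hash of the token itself. It is not the reference implementation: the function names `token_signs` and `project`, the use of SHA-256 with a counter, whitespace tokenization, and the log weighting of counts are all assumptions made for illustration.

```python
import hashlib
import math
from collections import Counter


def token_signs(token: str, dim: int) -> list[int]:
    """Derive a deterministic +1/-1 vector of length `dim` from a token's hash."""
    signs = []
    counter = 0
    while len(signs) < dim:
        # Hash the token together with a counter to get as many bits as needed.
        data = f"{token}\x00{counter}".encode("utf-8")
        for byte in hashlib.sha256(data).digest():
            for bit in range(8):
                signs.append(1 if (byte >> bit) & 1 else -1)
        counter += 1
    return signs[:dim]


def project(text: str, dim: int = 1280) -> list[float]:
    """Project a text into `dim` dimensions with no trained vocabulary."""
    counts = Counter(text.lower().split())  # assumed tokenization: whitespace
    vector = [0.0] * dim
    for token, count in counts.items():
        weight = math.log(1 + count)  # assumed damping of frequent tokens
        for i, sign in enumerate(token_signs(token, dim)):
            vector[i] += weight * sign
    return vector
```

Because a token's vector depends only on its own characters, the same word lands in the same place regardless of which corpus, language, or machine produced it, which is what makes the resulting space shareable.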

  • dimensionality (default 1280)
  • resolution (usually 4-byte floats, but a distributed version that encodes only the sign of each value as a single bit is 1/32 the size and allows extremely fast searching using bitwise comparisons)
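
The size and speed trade-off of the sign-only encoding can be sketched as follows. The helper names `pack_signs` and `hamming` are hypothetical, and storing the bits in a Python integer is an assumption chosen for brevity rather than a description of any distributed file format.

```python
def pack_signs(vector):
    """Encode the sign of each component as one bit of a Python int."""
    bits = 0
    for i, value in enumerate(vector):
        if value > 0:
            bits |= 1 << i
    return bits


def hamming(a, b):
    """Count differing sign bits with XOR followed by a population count."""
    return bin(a ^ b).count("1")


dim = 1280
float_bytes = dim * 4   # 4-byte floats: 5120 bytes per text
bit_bytes = dim // 8    # one bit per dimension: 160 bytes per text

# Two small example vectors (made up for illustration).
a = pack_signs([0.5, -1.0, 2.0, -0.1])
b = pack_signs([-0.3, 1.2, 1.0, -0.7])
distance = hamming(a, b)  # number of dimensions whose signs disagree
```

A smaller Hamming distance between two packed vectors means more dimensions agree in sign, so it can serve as a cheap stand-in for similarity in the full floating-point space.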