nc:tokenization
Consumptive alert
This is a consumptive sharing strategy. In most cases,
distributing this information about texts may violate the intellectual
property rights of their owners.
A list of tokens representing the full text of a document, in order.
Depending on the strategy, tokenization may remove whitespace and other non-word categories.
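To make the effect of dropping whitespace and non-word material concrete, here is a minimal sketch using only Python's standard `re` module. The helper name `tokenize` and its `keep_non_words` flag are hypothetical and are not part of this specification; the regular expressions are one possible choice, not the rules any particular tokenizer applies.

```python
import re


def tokenize(text: str, keep_non_words: bool = False) -> list[str]:
    """Split text into an ordered list of tokens (illustrative only)."""
    if keep_non_words:
        # Keep runs of word characters, punctuation, and whitespace as
        # separate tokens, so nothing from the input is discarded.
        pattern = r"\w+|[^\w\s]+|\s+"
    else:
        # Keep only runs of word characters; whitespace and punctuation
        # are removed from the token list.
        pattern = r"\w+"
    return re.findall(pattern, text)


words_only = tokenize("Call me Ishmael.")
everything = tokenize("Call me Ishmael.", keep_non_words=True)
```

In this sketch, `words_only` holds three word tokens, while joining `everything` back together reproduces the input string exactly. That is the sense in which an ordered token list is consumptive: it can carry most or all of the original text.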
Columnar metadata.
- nc:tokenization-software. string. The specific program used, with version number. e.g. ‘blingfire-1.6.2’.
- nc:tokenization-strategy. string. The tokenization rules applied. Recommended terminology:
- ‘regex’: A regular expression.
- ‘words’: A linguistically informed effort to create word tokens.
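As a rough illustration of how the two metadata keys might accompany a token column, the sketch below pairs a token list with a plain dictionary of string key-value pairs. The function name `tokenized_column` is hypothetical, the software label `python-re-<version>` is an assumed naming that simply imitates the name-plus-version shape of the ‘blingfire-1.6.2’ example, and storing the metadata in a Python dict is an assumption about the host format, not something this specification prescribes.

```python
import platform
import re


def tokenized_column(text: str, pattern: str = r"\w+") -> tuple[list[str], dict[str, str]]:
    """Return (tokens, metadata) for one document (illustrative only)."""
    tokens = re.findall(pattern, text)
    metadata = {
        # Program plus version number; this label is an assumption that
        # mirrors the 'blingfire-1.6.2' style shown in the list above.
        "nc:tokenization-software": f"python-re-{platform.python_version()}",
        # One of the recommended strategy terms.
        "nc:tokenization-strategy": "regex",
    }
    return tokens, metadata


tokens, metadata = tokenized_column("It was the best of times")
```

Keeping the software version in the metadata lets a consumer judge whether two token columns were produced under the same rules before comparing them.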