The most complete library for creating nonconsumptive files is the Python module nonconsumptive.
Depending on what format your texts are in, it may be very easy to create a representation without writing any Python code.
If you have a few hundred texts located inside a folder called texts, and associated metadata in ‘meta.csv’, you can run the following command to create a set of bookstacks.
NOT IMPLEMENTED
Join us to help develop it.
pip install nonconsumptive
nonconsumptive build --texts texts --metadata meta.csv --metadata-id-field filename --targets unigrams bigrams stacks srp --dir nc
Once you have done so, host the resulting files online and add the package to our registry so that others can work with it.
For more information, see the Python docs.
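The build command above points `--metadata-id-field` at a column named `filename`, which presumably links each metadata row to one text file. If you do not yet have a metadata file, a skeleton can be generated from the folder itself. The following is a minimal sketch using only the Python standard library, not part of the nonconsumptive package. The helper name `write_metadata_skeleton`, the `title` column, and the choice of the file name without its extension as the id are all assumptions made for illustration; check the Python docs for the convention the library actually expects.

```python
import csv
from pathlib import Path


def write_metadata_skeleton(text_dir, out_csv, id_field="filename"):
    """Write one CSV row per .txt file in text_dir and return the row count.

    Hypothetical helper: the id convention (file name without extension)
    and the empty 'title' column are assumptions, not documented behavior.
    """
    paths = sorted(Path(text_dir).glob("*.txt"))
    with open(out_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=[id_field, "title"])
        writer.writeheader()
        for path in paths:
            # Assumed id: 'texts/moby_dick.txt' becomes 'moby_dick'.
            writer.writerow({id_field: path.stem, "title": ""})
    return len(paths)


# Example call (this would overwrite an existing meta.csv):
# write_metadata_skeleton("texts", "meta.csv")
```

The remaining columns can then be filled in by hand or merged from an existing catalog before running the build.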
Nonconsumptive access in R is handled through the Apache Arrow package (arrow); we recommend tidytext for exploring the data that it produces.
JavaScript interaction with nonconsumptive corpora happens through DuckDB. Because Parquet files are structured to allow random access, and duckdb-wasm makes innovative use of HTTP requests to load only what it needs, it is possible to treat a set of bookstacks, hosted statically, as a database to be queried from the browser. This means that we can host bookstacks statically at low cost to libraries, and users can dial up only the subsets they want to look at.
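To illustrate why static hosting is enough, here is a minimal sketch of the general idea behind random access to a Parquet file over HTTP. It is not duckdb-wasm's code, and it is written in Python rather than JavaScript for consistency with the rest of this page. It relies on one fact about the format: a Parquet file ends with a 4-byte little-endian footer length followed by the magic bytes `PAR1`, so a client can read the file's metadata with small range requests instead of downloading the whole file. The function names and any URL passed to them are hypothetical, and the sketch assumes the server honors `Range` headers.

```python
import struct
import urllib.request

PARQUET_MAGIC = b"PAR1"


def suffix_range(n_bytes):
    """Build an HTTP header asking for the last n_bytes of a resource."""
    return {"Range": f"bytes=-{n_bytes}"}


def parse_parquet_tail(tail):
    """Return the footer length stored in the last 8 bytes of a Parquet file."""
    if len(tail) < 8:
        raise ValueError("need at least the last 8 bytes of the file")
    footer_len, magic = struct.unpack("<I4s", tail[-8:])
    if magic != PARQUET_MAGIC:
        raise ValueError("missing PAR1 magic; not a Parquet file")
    return footer_len


def fetch_footer(url):
    """Fetch only the serialized footer metadata using two range requests."""
    # First request: the final 8 bytes (footer length + magic).
    request = urllib.request.Request(url, headers=suffix_range(8))
    with urllib.request.urlopen(request) as response:
        footer_len = parse_parquet_tail(response.read())
    # Second request: the footer itself plus the 8 trailing bytes.
    request = urllib.request.Request(url, headers=suffix_range(footer_len + 8))
    with urllib.request.urlopen(request) as response:
        body = response.read()
    # Slice from the end so the result is also correct if the server
    # ignored the Range header and returned the whole file.
    return body[-(footer_len + 8):-8]
```

The footer records where each row group and column chunk sits in the file, which is what lets a query engine request only the byte ranges a given query touches.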
The underlying data architecture here is designed to work seamlessly with a variety of other files.