
Command-Line Interface

aitextgen has a command-line interface (CLI) for quickly automating common tasks without writing a script, which is especially helpful when working on a remote server.

Encode

Encodes the given text file text.txt into a cached and compressed TokenDataset, which is useful for preparing a dataset for transfer to a remote server.

aitextgen encode text.txt

If you are encoding a CSV, you should pass in the line_by_line parameter as well.

aitextgen encode reddit.csv --line_by_line True
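When encoding many files from a script, the choice between the two commands above can be made automatically. The following is a minimal sketch, not part of aitextgen: the helper name build_encode_command is hypothetical, and deciding by the .csv file extension is an assumption of this example rather than something the CLI does itself. It only builds the argument list, using the encode subcommand and the --line_by_line flag shown above.

```python
import shlex
from pathlib import Path
from typing import List


def build_encode_command(dataset_path: str) -> List[str]:
    """Build the argument list for `aitextgen encode` (hypothetical helper).

    Assumption of this sketch: any file with a .csv extension should be
    encoded line by line, mirroring the reddit.csv example.
    """
    command = ["aitextgen", "encode", dataset_path]
    if Path(dataset_path).suffix.lower() == ".csv":
        command += ["--line_by_line", "True"]
    return command


if __name__ == "__main__":
    # Print the shell commands that would be run for each file.
    for path in ["text.txt", "reddit.csv"]:
        print(shlex.join(build_encode_command(path)))
```

The resulting list can be passed to subprocess.run on a machine where aitextgen is installed.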

Train

To train/finetune the default 124M GPT-2 model on the text file text.txt with all default parameters:

aitextgen train text.txt

If you are using a cached/compressed dataset whose filename ends with tar.gz (e.g., one created by the encode command above), you can pass it to the train command as well.

aitextgen train dataset_cache.tar.gz

Other parameters of the TokenDataset constructor can be passed as well.
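As an illustration of passing extra parameters, here is a small sketch that turns Python keyword arguments into command-line flags for the train command. The helper name build_train_command is hypothetical, and the "--name value" flag form is assumed from the --line_by_line True example in the Encode section; line_by_line is the only TokenDataset parameter named on this page, so it is the only one used here.

```python
from typing import Any, List


def build_train_command(dataset_path: str, **dataset_params: Any) -> List[str]:
    """Build the argument list for `aitextgen train` (hypothetical helper).

    Each keyword argument becomes a `--name value` pair. This flag form is
    an assumption based on the `--line_by_line True` example.
    """
    command = ["aitextgen", "train", dataset_path]
    for name, value in dataset_params.items():
        command += [f"--{name}", str(value)]
    return command


if __name__ == "__main__":
    # A cached dataset needs no extra flags; a raw CSV gets line_by_line.
    print(build_train_command("dataset_cache.tar.gz"))
    print(build_train_command("reddit.csv", line_by_line=True))
```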

Generate

Loads a model and generates text to a file.

By default, it generates 20 texts to the file, one at a time, at a temperature of 0.7.

aitextgen generate

You can print to the console instead by passing --to_file False:

aitextgen generate --prompt "I believe in unicorns because" --to_file False
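Because --to_file False prints to the console, the output can also be captured by another program. The sketch below uses Python's standard subprocess module and assumes the generated text is written to standard output. The function names are hypothetical, and only the --prompt and --to_file flags shown above are used. Nothing is run unless the aitextgen executable is found on PATH.

```python
import shutil
import subprocess
from typing import List


def build_generate_command(prompt: str) -> List[str]:
    """Build the argument list for console generation (hypothetical helper)."""
    return ["aitextgen", "generate", "--prompt", prompt, "--to_file", "False"]


def generate_to_string(prompt: str) -> str:
    """Run the CLI and return whatever it wrote to standard output.

    Assumption of this sketch: the generated text goes to stdout.
    """
    result = subprocess.run(
        build_generate_command(prompt),
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout


if __name__ == "__main__":
    if shutil.which("aitextgen") is None:
        print("aitextgen is not on PATH; install it before running this sketch.")
    else:
        print(generate_to_string("I believe in unicorns because"))
```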