The fastText command line
The following are the commands that you can use with the fastText command line:
$ ./fasttext
usage: fasttext <command> <args>

The commands supported by fasttext are:

  supervised              train a supervised classifier
  quantize                quantize a model to reduce the memory usage
  test                    evaluate a supervised classifier
  predict                 predict most likely labels
  predict-prob            predict most likely labels with probabilities
  skipgram                train a skipgram model
  cbow                    train a cbow model
  print-word-vectors      print word vectors given a trained model
  print-sentence-vectors  print sentence vectors given a trained model
  print-ngrams            print ngrams given a trained model and word
  nn                      query for nearest neighbors
  analogies               query for analogies
  dump                    dump arguments,dictionary,input/output vectors
The supervised, skipgram, and cbow commands are for training a model. predict and predict-prob make predictions with a supervised model. test, print-word-vectors, print-sentence-vectors, print-ngrams, nn, and analogies can be used to evaluate and inspect a trained model. The dump command prints a model's hyperparameters (arguments), dictionary, or input/output vectors, and quantize is used to compress the model.
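To make the command shapes concrete, here is a minimal sketch of the prediction and evaluation side, assuming you already have a trained supervised model; model.bin and test.txt are placeholder file names:

$ # Evaluate precision/recall at k=1 on a labeled test file.
$ ./fasttext test model.bin test.txt
$ # Print the top-3 predicted labels for each line of test.txt.
$ ./fasttext predict model.bin test.txt 3
$ # Same, but also print the probability of each predicted label.
$ ./fasttext predict-prob model.bin test.txt 3
$ # Interactively query the nearest neighbors of a word.
$ ./fasttext nn model.bin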
The hyperparameters that you can use for training are listed in the sections that follow.
The fastText supervised
$ ./fasttext supervised
Empty input or output path.

The following arguments are mandatory:
  -input              training file path
  -output             output file path

The following arguments are optional:
  -verbose            verbosity level [2]

The following arguments for the dictionary are optional:
  -minCount           minimal number of word occurences [1]
  -minCountLabel      minimal number of label occurences [0]
  -wordNgrams         max length of word ngram [1]
  -bucket             number of buckets [2000000]
  -minn               min length of char ngram [0]
  -maxn               max length of char ngram [0]
  -t                  sampling threshold [0.0001]
  -label              labels prefix [__label__]

The following arguments for training are optional:
  -lr                 learning rate [0.1]
  -lrUpdateRate       change the rate of updates for the learning rate [100]
  -dim                size of word vectors [100]
  -ws                 size of the context window [5]
  -epoch              number of epochs [5]
  -neg                number of negatives sampled [5]
  -loss               loss function {ns, hs, softmax} [softmax]
  -thread             number of threads [12]
  -pretrainedVectors  pretrained word vectors for supervised learning []
  -saveOutput         whether output params should be saved [false]

The following arguments for quantization are optional:
  -cutoff             number of words and ngrams to retain [0]
  -retrain            whether embeddings are finetuned if a cutoff is applied [false]
  -qnorm              whether the norm is quantized separately [false]
  -qout               whether the classifier is quantized [false]
  -dsub               size of each sub-vector [2]
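As a sketch of a typical training run, only -input and -output are required; the other flags below just override a few of the defaults listed above. The names train.txt and model_cls are placeholders, and the input file is expected to contain one example per line, prefixed with __label__<class>:

$ # Train a text classifier; train.txt and model_cls are placeholder names.
$ ./fasttext supervised -input train.txt -output model_cls -lr 0.5 -epoch 25 -wordNgrams 2
$ # Training writes model_cls.bin (the full model) and model_cls.vec (the word vectors).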
The fastText skipgram
$ ./fasttext skipgram
Empty input or output path.

The following arguments are mandatory:
  -input              training file path
  -output             output file path

The following arguments are optional:
  -verbose            verbosity level [2]

The following arguments for the dictionary are optional:
  -minCount           minimal number of word occurences [5]
  -minCountLabel      minimal number of label occurences [0]
  -wordNgrams         max length of word ngram [1]
  -bucket             number of buckets [2000000]
  -minn               min length of char ngram [3]
  -maxn               max length of char ngram [6]
  -t                  sampling threshold [0.0001]
  -label              labels prefix [__label__]

The following arguments for training are optional:
  -lr                 learning rate [0.05]
  -lrUpdateRate       change the rate of updates for the learning rate [100]
  -dim                size of word vectors [100]
  -ws                 size of the context window [5]
  -epoch              number of epochs [5]
  -neg                number of negatives sampled [5]
  -loss               loss function {ns, hs, softmax} [ns]
  -thread             number of threads [12]
  -pretrainedVectors  pretrained word vectors for supervised learning []
  -saveOutput         whether output params should be saved [false]

The following arguments for quantization are optional:
  -cutoff             number of words and ngrams to retain [0]
  -retrain            whether embeddings are finetuned if a cutoff is applied [false]
  -qnorm              whether the norm is quantized separately [false]
  -qout               whether the classifier is quantized [false]
  -dsub               size of each sub-vector [2]
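A sketch of a typical skipgram run, with the subword range written out explicitly (the values shown are already the defaults above); data.txt is a placeholder plain-text corpus:

$ # Learn 100-dimensional subword-aware embeddings from raw text.
$ ./fasttext skipgram -input data.txt -output model_skip -dim 100 -minn 3 -maxn 6
$ # Print the vector of a single word from the trained model.
$ echo "example" | ./fasttext print-word-vectors model_skip.bin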
The fastText cbow
$ ./fasttext cbow
Empty input or output path.

The following arguments are mandatory:
  -input              training file path
  -output             output file path

The following arguments are optional:
  -verbose            verbosity level [2]

The following arguments for the dictionary are optional:
  -minCount           minimal number of word occurences [5]
  -minCountLabel      minimal number of label occurences [0]
  -wordNgrams         max length of word ngram [1]
  -bucket             number of buckets [2000000]
  -minn               min length of char ngram [3]
  -maxn               max length of char ngram [6]
  -t                  sampling threshold [0.0001]
  -label              labels prefix [__label__]

The following arguments for training are optional:
  -lr                 learning rate [0.05]
  -lrUpdateRate       change the rate of updates for the learning rate [100]
  -dim                size of word vectors [100]
  -ws                 size of the context window [5]
  -epoch              number of epochs [5]
  -neg                number of negatives sampled [5]
  -loss               loss function {ns, hs, softmax} [ns]
  -thread             number of threads [12]
  -pretrainedVectors  pretrained word vectors for supervised learning []
  -saveOutput         whether output params should be saved [false]

The following arguments for quantization are optional:
  -cutoff             number of words and ngrams to retain [0]
  -retrain            whether embeddings are finetuned if a cutoff is applied [false]
  -qnorm              whether the norm is quantized separately [false]
  -qout               whether the classifier is quantized [false]
  -dsub               size of each sub-vector [2]
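cbow takes the same flags as skipgram; the difference is the training objective, which predicts a word from the average of its context vectors rather than the context from the word. A minimal sketch with placeholder file names:

$ # Train CBOW embeddings; a wider -ws averages over a larger context window.
$ ./fasttext cbow -input data.txt -output model_cbow -dim 100 -ws 5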