The fastText command line
The following are the commands that you can use with the fastText command line:
$ ./fasttext
usage: fasttext <command> <args>

The commands supported by fasttext are:

  supervised              train a supervised classifier
  quantize                quantize a model to reduce the memory usage
  test                    evaluate a supervised classifier
  predict                 predict most likely labels
  predict-prob            predict most likely labels with probabilities
  skipgram                train a skipgram model
  cbow                    train a cbow model
  print-word-vectors      print word vectors given a trained model
  print-sentence-vectors  print sentence vectors given a trained model
  print-ngrams            print ngrams given a trained model and word
  nn                      query for nearest neighbors
  analogies               query for analogies
  dump                    dump arguments,dictionary,input/output vectors
The supervised, skipgram, and cbow commands are for training a model. predict and predict-prob make predictions with a supervised model. test, print-word-vectors, print-sentence-vectors, print-ngrams, nn, and analogies can be used to evaluate and inspect a trained model. The dump command prints a model's hyperparameters (arguments), dictionary, or input/output vectors, and quantize is used to compress the model.
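To make the command shapes concrete, here is a minimal sketch of the prediction and evaluation side, assuming you already have a trained supervised model; model.bin and test.txt are placeholder file names:

$ # Evaluate precision/recall at k=1 on a labeled test file.
$ ./fasttext test model.bin test.txt
$ # Print the top-3 predicted labels for each line of test.txt.
$ ./fasttext predict model.bin test.txt 3
$ # Same, but also print the probability of each predicted label.
$ ./fasttext predict-prob model.bin test.txt 3
$ # Interactively query the nearest neighbors of a word.
$ ./fasttext nn model.bin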
The hyperparameters that you can use for training are listed in the sections that follow.
The fastText supervised
$ ./fasttext supervised
Empty input or output path.

The following arguments are mandatory:
  -input              training file path
  -output             output file path

The following arguments are optional:
  -verbose            verbosity level [2]

The following arguments for the dictionary are optional:
  -minCount           minimal number of word occurences [1]
  -minCountLabel      minimal number of label occurences [0]
  -wordNgrams         max length of word ngram [1]
  -bucket             number of buckets [2000000]
  -minn               min length of char ngram [0]
  -maxn               max length of char ngram [0]
  -t                  sampling threshold [0.0001]
  -label              labels prefix [__label__]

The following arguments for training are optional:
  -lr                 learning rate [0.1]
  -lrUpdateRate       change the rate of updates for the learning rate [100]
  -dim                size of word vectors [100]
  -ws                 size of the context window [5]
  -epoch              number of epochs [5]
  -neg                number of negatives sampled [5]
  -loss               loss function {ns, hs, softmax} [softmax]
  -thread             number of threads [12]
  -pretrainedVectors  pretrained word vectors for supervised learning []
  -saveOutput         whether output params should be saved [false]

The following arguments for quantization are optional:
  -cutoff             number of words and ngrams to retain [0]
  -retrain            whether embeddings are finetuned if a cutoff is applied [false]
  -qnorm              whether the norm is quantized separately [false]
  -qout               whether the classifier is quantized [false]
  -dsub               size of each sub-vector [2]
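As a sketch of a typical training run, only -input and -output are required; the other flags below just override a few of the defaults listed above. The names train.txt and model_cls are placeholders, and the input file is expected to contain one example per line, prefixed with __label__<class>:

$ # Train a text classifier; train.txt and model_cls are placeholder names.
$ ./fasttext supervised -input train.txt -output model_cls -lr 0.5 -epoch 25 -wordNgrams 2
$ # Training writes model_cls.bin (the full model) and model_cls.vec (the word vectors).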
The fastText skipgram
$ ./fasttext skipgram
Empty input or output path.

The following arguments are mandatory:
  -input              training file path
  -output             output file path

The following arguments are optional:
  -verbose            verbosity level [2]

The following arguments for the dictionary are optional:
  -minCount           minimal number of word occurences [5]
  -minCountLabel      minimal number of label occurences [0]
  -wordNgrams         max length of word ngram [1]
  -bucket             number of buckets [2000000]
  -minn               min length of char ngram [3]
  -maxn               max length of char ngram [6]
  -t                  sampling threshold [0.0001]
  -label              labels prefix [__label__]

The following arguments for training are optional:
  -lr                 learning rate [0.05]
  -lrUpdateRate       change the rate of updates for the learning rate [100]
  -dim                size of word vectors [100]
  -ws                 size of the context window [5]
  -epoch              number of epochs [5]
  -neg                number of negatives sampled [5]
  -loss               loss function {ns, hs, softmax} [ns]
  -thread             number of threads [12]
  -pretrainedVectors  pretrained word vectors for supervised learning []
  -saveOutput         whether output params should be saved [false]

The following arguments for quantization are optional:
  -cutoff             number of words and ngrams to retain [0]
  -retrain            whether embeddings are finetuned if a cutoff is applied [false]
  -qnorm              whether the norm is quantized separately [false]
  -qout               whether the classifier is quantized [false]
  -dsub               size of each sub-vector [2]
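A sketch of a typical skipgram run, with the subword range written out explicitly (the values shown are already the defaults above); data.txt is a placeholder plain-text corpus:

$ # Learn 100-dimensional subword-aware embeddings from raw text.
$ ./fasttext skipgram -input data.txt -output model_skip -dim 100 -minn 3 -maxn 6
$ # Print the vector of a single word from the trained model.
$ echo "example" | ./fasttext print-word-vectors model_skip.bin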
The fastText cbow
$ ./fasttext cbow
Empty input or output path.

The following arguments are mandatory:
  -input              training file path
  -output             output file path

The following arguments are optional:
  -verbose            verbosity level [2]

The following arguments for the dictionary are optional:
  -minCount           minimal number of word occurences [5]
  -minCountLabel      minimal number of label occurences [0]
  -wordNgrams         max length of word ngram [1]
  -bucket             number of buckets [2000000]
  -minn               min length of char ngram [3]
  -maxn               max length of char ngram [6]
  -t                  sampling threshold [0.0001]
  -label              labels prefix [__label__]

The following arguments for training are optional:
  -lr                 learning rate [0.05]
  -lrUpdateRate       change the rate of updates for the learning rate [100]
  -dim                size of word vectors [100]
  -ws                 size of the context window [5]
  -epoch              number of epochs [5]
  -neg                number of negatives sampled [5]
  -loss               loss function {ns, hs, softmax} [ns]
  -thread             number of threads [12]
  -pretrainedVectors  pretrained word vectors for supervised learning []
  -saveOutput         whether output params should be saved [false]

The following arguments for quantization are optional:
  -cutoff             number of words and ngrams to retain [0]
  -retrain            whether embeddings are finetuned if a cutoff is applied [false]
  -qnorm              whether the norm is quantized separately [false]
  -qout               whether the classifier is quantized [false]
  -dsub               size of each sub-vector [2]
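cbow takes the same flags as skipgram; the difference is the training objective, which predicts a word from the average of its context vectors rather than the context from the word. A minimal sketch with placeholder file names:

$ # Train CBOW embeddings; a wider -ws averages over a larger context window.
$ ./fasttext cbow -input data.txt -output model_cbow -dim 100 -ws 5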