
fastText Quick Start Guide: Get started with Facebook's library for text representation and classification

Joydeep Bhattacharjee
eBook Jul 2018 194 pages 1st Edition

Chapter 1. Introducing FastText

Welcome to fastText Quick Start Guide. In this first chapter, you will find out how to install fastText and set up a stable environment in which to use it as part of your Natural Language Processing (NLP) applications.

fastText is a library that helps you to generate efficient word representations and gives you support for text classification out of the box. In this book, we will take a look at a specific use case, namely machine translation, and use fastText for that. We have chosen machine translation because fastText is claimed to handle previously unseen words well, and to work for languages for which sufficiently large data sources and corpora may not be available. In different chapters, we will see how fastText fares in such cases. General techniques will also be discussed so that you will be able to extend them to your specific use case. We will cover the following topics in this chapter:

  • Introducing fastText
  • Installing fastText in Windows, Linux, and macOS
  • Using a Docker image for fastText
  • Installing dependencies on Mac systems
  • Installing Python dependencies
  • Installing dependencies on RHEL machines using the yum package manager
  • Installing dependencies on Debian-based machines such as Ubuntu
  • Installing dependencies on Arch Linux using pacman

 

Introducing fastText


In today's interconnected world, a lot of text data is generated. This text includes descriptions of things: for example, people writing about products in Amazon reviews, or sharing their thoughts in Facebook posts. Natural Language Processing (NLP) is the application of machine learning and other computational techniques to understanding and representing spoken and written text. The following are the major challenges that NLP seeks to solve:

  • Topic modeling: In general, texts deal with a topic. Topic modeling is frequently used to determine hidden structures or "abstract topics" that may be present in a collection of documents. An effective application of topic modeling would be summarization. For example, legal documents are quite complex and verbose, and hence systems such as these would help the reader to get the gist of the document and a high-level description of what is happening.
  • Sentence classification: Text classification is an important challenge, where we take in blobs of text and classify them into different labels. For example, a system should be able to correctly classify something like "Shahrukh Khan was on fire at Dubai event" as belonging to the label "Entertainment" and another sentence, "Fire breaks out in store opposite Breach Candy Hospital," as belonging to the label "News."
  • Machine translation: The total number of languages in the world is at least 3,000. About half of these languages have fewer than 10,000 speakers and about 25 percent have fewer than 1,000 speakers. Hence, we can imagine that a lot of languages are dying, and when a language dies, collectively we lose a lot of our cultural heritage. The best translation system right now is made by Google, but it covers only 103 languages at the time of writing, so it is very important that we develop machine translation models that can be trained from limited sources and still have a high degree of predictive power.
  • Question and answer (QA) systems: The focus here is to build a system that automatically answers questions that people ask in natural language. QA systems built around closed domains can be highly accurate, as they can retrieve documents and text that are relevant to the search query.
  • Sentiment analysis: Sentiment analysis is about understanding the needs and intents that the users share when talking about something. People make choices based on emotions. The needs of many people are largely emotional and, generally, people are very forthcoming about how they feel. Creating a system that takes this into account will always add a lot of value to the business.
  • Event extraction: This is useful where a lot of data is stored in the form of text. For example, a legal text may describe a "crime" event, which is followed by an "investigation" event, which is followed by multiple "hearing" events. The events themselves may be nested, such that a "hearing" event may consist of a "presenting arguments" event and a "presenting evidence" event.
  • Named entity detection: The focus of building this system is to extract and classify entities or specific information as per some predefined categories, such as people, organization, geography, and so on. For example, if we take the following text: "We're used to spicy foods down here in South Texas," we can understand that the "buyer" likes "spicy foods" and his "geography" is South Texas. If there is sufficient evidence received from the data that buyers in South Texas like spicy foods, more such foods can be marketed to them.
  • Relation detection: A relation detection system parses text and identifies focal points and agents, then tries to find the relationship between them. For example, the sentence "Mike has the flu" can be converted to Person-[RELATION:HAS]->Disease. These relations can then be explored in a business context to build intelligent apps.
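
To make the relation detection item above concrete, here is a minimal Python sketch of turning "Mike has the flu" into a Person-[RELATION:HAS]->Disease style triple. It is a toy rule-based illustration only: the lexicons, the regular expression, and the function names are assumptions made for this example, and a real system would learn such patterns from data rather than hard-code them.

```python
import re

# Hypothetical toy lexicons, for illustration only.
PEOPLE = {"Mike"}
DISEASES = {"flu", "cold"}


def extract_has_relation(sentence):
    """Return (person, "HAS", disease) for '<person> has [the|a] <disease>', else None."""
    match = re.match(r"^(\w+) has (?:the |a )?(\w+)\.?$", sentence.strip())
    if match is None:
        return None
    person, disease = match.groups()
    # Only accept the match when both ends are known entities.
    if person in PEOPLE and disease in DISEASES:
        return (person, "HAS", disease)
    return None


def format_relation(triple):
    """Render a triple in the arrow notation used in the text."""
    person, relation, disease = triple
    return f"Person({person})-[RELATION:{relation}]->Disease({disease})"


triple = extract_has_relation("Mike has the flu")
if triple is not None:
    print(format_relation(triple))
```

The point of the sketch is the output shape, a typed subject, relation, and object, which is what downstream applications would query.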

The previous list covers many of the problems that NLP practitioners are targeting. Depending on the use case, you can pick any of these challenges and try to solve it in your domain. The challenge with many previous approaches and modeling techniques is that NLP requires a lot of textual data, and that data carries a lot of contextual information. It is quite hard for a computational model to make sense of all the data in an efficient manner. NLP models up to now have mostly targeted English, because that is the language in which textual data is most readily available. But only 20 percent of the population of the world speak English and, even among them, the majority are non-native speakers. The biggest deterrent to building non-English NLP models is the lack of data. Hence, we desperately need libraries that can build models even when the data is limited. fastText has the potential to change all that. The fastText team has published pretrained word vectors for 294 languages. By the time the book is published, more languages will have been added.

In this chapter, we will see how to install fastText so that you can start tinkering with this amazing software. 

 

Note

Some of the descriptions provided may not be applicable to you; for example, instructions for Mac users may not be directly relevant to Linux users and vice versa. Still, I would suggest that you read through the whole description for each of the dependencies for a better understanding.

Installing fastText


Depending on your operating system, you will need to make sure that you have some dependencies installed on your machine. In this section, you will learn how to install fastText on Linux, Windows, or macOS, and which additional dependencies you should install depending on your usage. My recommendation is to install all the software packages, as we will be exploring all the various ways we can use fastText in this book.

Prerequisites

fastText works on Windows, Linux, and macOS. It is written in C++, so you will first need a good C++ compiler.

Windows

Official binaries for Windows are not available, but you can download the latest Windows binaries compiled by Meng Xuan Xia at the following link: https://github.com/xiamx/fastText/releases. To run these binaries, you will need to install the Visual C++ 2017 Redistributable. You can download the 64-bit version from this link: https://support.microsoft.com/en-in/help/2977003/the-latest-supported-visual-c-downloads. Double-click on the downloaded installer file and follow the prompts to install it on your Windows machine.

Linux

The list of prerequisite software that you need to install is as follows:

  • GCC-C++; if you are using Clang, you will need 3.3 or newer
  • CMake
  • Python 3.5 (you can work with Python 2.7, but we are going to focus on Python 3 in this book)
  • NumPy and SciPy
  • pybind11

Optional requirements, depending on your system, are as follows:

  • Zip
  • Docker
  • Git
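
Before installing anything, it can help to see which of these prerequisites are already present. The following is a minimal Python sketch, not part of fastText itself: the executable names (g++, clang++, cmake, zip, docker, git) and the module names (numpy, scipy, pybind11) are assumptions about how the tools are typically named, so adjust them for your system.

```python
import importlib.util
import shutil

# Executable names are assumptions; adjust them to match your distribution.
REQUIRED_TOOLS = {
    "C++ compiler": ["g++", "clang++"],
    "CMake": ["cmake"],
}
OPTIONAL_TOOLS = {
    "Zip": ["zip"],
    "Docker": ["docker"],
    "Git": ["git"],
}
PYTHON_MODULES = ["numpy", "scipy", "pybind11"]


def find_tools(groups, which=shutil.which):
    """Map each label to the first matching executable path on PATH, or None."""
    found = {}
    for label, candidates in groups.items():
        paths = (which(name) for name in candidates)
        found[label] = next((path for path in paths if path), None)
    return found


def missing_modules(names, find_spec=importlib.util.find_spec):
    """Return the module names that cannot be located (nothing is imported)."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    tools = {**find_tools(REQUIRED_TOOLS), **find_tools(OPTIONAL_TOOLS)}
    for label, path in tools.items():
        print(f"{label}: {path or 'not found'}")
    print("Missing Python modules:", missing_modules(PYTHON_MODULES) or "none")
```

Anything reported as not found can then be installed with the package manager instructions for your distribution in the sections that follow.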

Installing dependencies on RHEL machines supporting the yum package manager

On Linux machines, you will need to have g++ installed. On Fedora/CentOS, which support the yum package manager, open the Terminal or connect to the server where you are installing this using your favorite SSH tool, and install g++ using the following command:

$ sudo yum install gcc-c++

CMake may already be installed on your machine. The official docs give build instructions for both make and cmake. I would recommend installing cmake if it is missing and using it to build fastText. You can install cmake with yum, as before:

$ sudo yum install cmake

To get a full list of cmake commands, take a look at the following link: https://cmake.org/cmake/help/v3.2/manual/cmake.1.html.

To install the optional software, run the following command:

$ sudo yum install zip docker git-core

If you are starting on a new server and running yum commands there, then you may encounter the following warning:

Failed to set locale, defaulting to C

In this case, install the glibc language pack:

$ sudo yum install glibc-langpack-en

Now, you can jump to the installation instructions for Anaconda to install the Python dependencies.

Installing dependencies on Debian-based machines such as Ubuntu

In Ubuntu and Debian machines, apt-get or apt is your package manager. apt is basically a wrapper around apt-get and other similar tools, and hence you should be able to use them interchangeably. I will be showing apt commands here, but if you are using older versions of Ubuntu and Debian and see that apt is not working on your machine, then you can replace apt with apt-get and it should work. Also, consider upgrading your machine if possible. Similar to Fedora, to install the C++ compiler, open a Terminal or SSH into the server where you are going to install fastText and run the following commands. This will also install the make command:

$ sudo apt update
$ sudo apt install build-essential

Now install cmake:

$ sudo apt install cmake

To install the optional requirements, run the following command:

$ sudo apt install zip docker.io git

Now, check the Anaconda section to see how to install Anaconda for the Python dependencies.

Note

The apt command only works from Ubuntu-16 onwards. If you are using an older Ubuntu version, you should use the apt-get command.

Installing dependencies on Arch Linux using pacman

The package manager of choice on Arch Linux is pacman and you can run the following command to install the essential build tools:

$ sudo pacman -S cmake make gcc-multilib

This should install the make, cmake, and g++ compiler that you need to build fastText. Although Arch distributions already have Python 3.x installed, I would recommend installing Anaconda as described later in this chapter so that you don't miss out on any of the Python dependencies.

To install the optional requirements, run the following command:

$ sudo pacman -S p7zip git docker

Installing dependencies on Mac systems


On macOS, you should have Clang installed by default, which is designed to be a drop-in replacement for the usual compilers for C, C++, and other similar languages. Check whether the version is 3.3 or later using clang --version in a Terminal. If you do not have Clang, or have an older version, you can install it with the Xcode command-line tools from a Terminal:

$ xcode-select --install

A dialog should appear next that asks if you want to install the developer tools. Click on the Install button.
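
If you would rather script the version check than read the output by eye, the comparison against 3.3 can be sketched in Python as follows. This is an illustrative sketch: the regular expression assumes banners of the form "clang version X.Y" or "Apple LLVM version X.Y", and the function names are made up for this example. Note that Apple's version numbers do not match the upstream Clang numbering, so treat the result as a rough check only.

```python
import re
import subprocess

MINIMUM_CLANG = (3, 3)  # minimum version mentioned in the text


def parse_clang_version(output):
    """Extract (major, minor) from the output of `clang --version`, or None."""
    match = re.search(r"(?:clang|LLVM) version (\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def clang_is_new_enough(output, minimum=MINIMUM_CLANG):
    """True when a version could be parsed and it is at least `minimum`."""
    version = parse_clang_version(output)
    return version is not None and version >= minimum


def installed_clang_banner():
    """Run `clang --version`; return its output, or None if clang is absent."""
    try:
        result = subprocess.run(
            ["clang", "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
    except FileNotFoundError:
        return None
    return result.stdout


if __name__ == "__main__":
    banner = installed_clang_banner()
    if banner is None:
        print("clang not found; try: xcode-select --install")
    else:
        print("new enough:", clang_is_new_enough(banner))
```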

Installing Python dependencies


I recommend that you install Anaconda so that there are no issues with installing Python and using it for fastText. Detailed instructions for installing Anaconda are given on the official documentation page, which can be accessed at https://conda.io/docs/user-guide/install/linux.html. Simply stated, if you are on Windows, then download the Windows installer, double-click on it, and then follow the instructions on the screen. Installing it using a GUI is also possible for macOS.

 

 

In the case of Linux and macOS, download the corresponding bash file and then run the following command in a Terminal:

$ bash downloadedfile.sh

Please take care to download the installer that is tagged for Python 3.x. The Python code snippets in this book are written for Python 3.x.

Installing fastText on Windows

Currently, official binaries are not provided for fastText on Windows, and hence there is no GUI to install fastText on your machine. To use fastText, you will need to perform the following steps:

  1. Download the latest binary named fasttext-win64-latest-Release.zip from the release page provided by Meng Xuan Xia (https://github.com/xiamx/fastText/releases).
  2. This is a ZIP file and hence you will need to extract the contents. You will find the fasttext_pic.lib, fasttext.lib, fasttext.exe, and fasttext.dll files in the extracted folder. This folder will be your working directory for fastText.
  3. Create a folder named data, where you will keep all your data files. Now, open PowerShell and change directory to the working directory.
  4. Type .\fasttext.exe in PowerShell and you should be able to see the output.

If you don't see any output at the end, then you probably don't have the Visual C++ Redistributable on your machine and will need to install it.
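
The folder setup in the steps above can also be scripted. The sketch below is a hedged illustration in Python: prepare_workdir is a hypothetical helper name, and the file list simply mirrors the four files named in step 2. It creates the data folder and reports which of the expected files are missing from the working directory.

```python
import tempfile
from pathlib import Path

# Files the text says you should find after extracting the ZIP file.
EXPECTED_FILES = ["fasttext.exe", "fasttext.dll", "fasttext.lib", "fasttext_pic.lib"]


def prepare_workdir(workdir):
    """Create <workdir>/data and list expected files that are not present."""
    workdir = Path(workdir)
    data_dir = workdir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    missing = [name for name in EXPECTED_FILES if not (workdir / name).is_file()]
    return data_dir, missing


# Demonstration against a throwaway folder instead of a real extraction.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "fasttext.exe").touch()  # stand-in for the real binary
    data_dir, missing = prepare_workdir(tmp)
    print("data folder:", data_dir.name)
    print("missing files:", missing)
```

In real use, you would pass the path of the extracted folder instead of a temporary directory.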

 

 

Installing fastText in Linux and macOS

To install fastText, run the following commands in a Terminal to clone the repository and build it:

 $ git clone https://github.com/facebookresearch/fastText.git
 $ cd fastText
 $ mkdir build && cd build && cmake ..
 $ make && make install

In this book, a lot of focus will be on building systems for Python. So, also run the following command from the root of the cloned fastText repository (one level above the build directory):

$ pip install .

pip is the package manager for Python. fastText assumes UTF-8 encoded text. Python 3.x strings are Unicode and UTF-8 is the default source encoding, so handling such text is straightforward. The Python code examples in this book use Python 3.x. One of the advantages of fastText is that you can build models for multiple languages, and this is much harder to take advantage of if you are not using Python 3.x. If that is not a concern and you want to use fastText with Python 2.7, take a look at the Appendix at the end, which gives guidelines on how to work with UTF-8 in Python 2.7.
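
Since fastText expects UTF-8 input, it is worth being explicit about the encoding whenever you write or read data files from Python, because the default file encoding can depend on the platform. The following minimal sketch uses only the standard library; the sample sentences and helper names are illustrative assumptions, not part of fastText.

```python
import tempfile
from pathlib import Path

# Illustrative multilingual sample lines (assumed for this example).
SAMPLES = ["naïve café", "Москва", "東京"]


def write_corpus(path, lines):
    """Write one line of text per entry, always encoded as UTF-8."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for line in lines:
            handle.write(line + "\n")


def read_corpus(path):
    """Read the file back as UTF-8 and strip the trailing newlines."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


with tempfile.TemporaryDirectory() as tmp:
    corpus = Path(tmp) / "corpus.txt"
    write_corpus(corpus, SAMPLES)
    # The text survives the round trip unchanged.
    print(read_corpus(corpus) == SAMPLES)
    # Characters and bytes differ: each of these two characters takes 3 bytes.
    print(len("東京"), len("東京".encode("utf-8")))
```

Passing encoding="utf-8" explicitly keeps the data file the same on Windows, Linux, and macOS.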

Using a Docker image for fastText


You can also use Docker to run fastText on your machine and not worry about building it. Using a fixed image also pins a specific version, which gives us predictability and consistency. You can get information on how to install Docker from the following link: https://docs.docker.com/install/#cloud.

After installing, start the Docker service before running the following commands:

 # start the docker service.
 $ systemctl start docker

 # run the below commands to start the fasttext container.
 $ docker pull xebxeb/fasttext-docker

You should now be able to run fastText:

$ mkdir -p /tmp/data && mkdir -p /tmp/result
$ docker run --rm -v /tmp/data:/data -v /tmp/result:/result \
         -it xebxeb/fasttext-docker ./classification-example.sh

 

 

You may need to provide permissions and create the specific directories to run the docker run command.
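
If you want to drive the container from Python, the directory creation and the docker run invocation shown above can be sketched as follows. This is an assumption-laden illustration: docker_run_command is a hypothetical helper, and it only builds the argument list, using the same image name, mount points, and script as the shell example, without starting Docker itself.

```python
import tempfile
from pathlib import Path

IMAGE = "xebxeb/fasttext-docker"  # image name taken from the commands above


def docker_run_command(data_dir, result_dir, script="./classification-example.sh"):
    """Create the host directories and return the `docker run` argument list."""
    data_dir = Path(data_dir)
    result_dir = Path(result_dir)
    for directory in (data_dir, result_dir):
        directory.mkdir(parents=True, exist_ok=True)
    return [
        "docker", "run", "--rm",
        "-v", f"{data_dir.resolve()}:/data",
        "-v", f"{result_dir.resolve()}:/result",
        "-it", IMAGE, script,
    ]


# Demonstration with throwaway directories; nothing is executed here.
with tempfile.TemporaryDirectory() as tmp:
    command = docker_run_command(Path(tmp) / "data", Path(tmp) / "result")
    print(" ".join(command))
```

The returned list can be passed to subprocess.run when Docker is installed and the service is running. The -it flags expect an interactive terminal, so you may need to drop them when running from a script.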

Summary


In this chapter, you have taken a look at how to install and start using fastText in the environment of your choice.

In the next chapter, we will be taking a look at how to train fastText models using the command line and how to use them.


Key benefits

  • Introduction to Facebook's fastText library for NLP
  • Perform efficient word representations, sentence classification, vector representation
  • Build better, more scalable solutions for text representation and classification

Description

Facebook's fastText library handles text representation and classification, used for Natural Language Processing (NLP). Most organizations have to deal with enormous amounts of text data on a daily basis, and gaining efficient data insights requires powerful NLP tools such as fastText.  This book is your ideal introduction to fastText. You will learn how to create fastText models from the command line, without the need for complicated code. You will explore the algorithms that fastText is built on and how to use them for word representation and text classification.  Next, you will use fastText in conjunction with other popular libraries and frameworks such as Keras, TensorFlow, and PyTorch.  Finally, you will deploy fastText models to mobile devices. By the end of this book, you will have all the required knowledge to use fastText in your own applications at work or in projects.

Who is this book for?

This book is for data analysts, data scientists, and machine learning developers who want to perform efficient word representation and sentence classification using Facebook's fastText library. Basic knowledge of Python programming is required.

What you will learn

  • Create models using the default command line options in fastText
  • Understand the algorithms used in fastText to create word vectors
  • Combine command line text transformation capabilities and the fastText library to implement a training, validation, and prediction pipeline
  • Explore word representation and sentence classification using fastText
  • Use Gensim and spaCy to load the vectors, transform, lemmatize, and perform other NLP tasks efficiently
  • Develop a fastText NLP classifier using popular frameworks, such as Keras, TensorFlow, and PyTorch

Product Details

Publication date: Jul 26, 2018
Length: 194 pages
Edition: 1st
Language: English
ISBN-13: 9781789136715
Vendor: Facebook

Table of Contents

7 Chapters

  1. Introducing FastText
  2. Creating Models Using FastText Command Line
  3. Word Representations in FastText
  4. Sentence Classification in FastText
  5. FastText in Python
  6. Machine Learning and Deep Learning Models
  7. Deploying Models to Web and Mobile

Customer reviews

Rating distribution: 3.7 out of 5 (3 Ratings)

  • 5 star: 66.7%
  • 4 star: 0%
  • 3 star: 0%
  • 2 star: 0%
  • 1 star: 33.3%

MJ, Oct 23, 2018, 5 out of 5 (Amazon verified review)
I've been using fastText for about 6 months now. I had not been able to find any good resources that simply explained all of the necessary steps to prepare text training data before building a FastText model. It was also very hard to know how to tweak the supervised hyperparameters. This books answered my questions! I'm so thankful for Joydeep Bhattacharjee and his hard work on this fantastic resource.

Laxmi Vanam, Sep 24, 2018, 5 out of 5 (Amazon verified review)
Having followed Joydeep's lectures (youtube channel and seminars) for a while, I got this book hoping to get an indepth understanding of the fasttext library while working on my NLP project. I would say, it absolutely met my expectations in taking me from zero-in-depth understanding of the concepts. I have always believed that books are deeper way to dive into the subject compared to online lectures and this book proved to be right. I can say with just the super basic knowledge of what NLP is you can rely on this book to take you to a next level. All the concepts are explained in a lucid manner and can me sense to an absolute beginner in NLP. I would recommend this book to anyone who wants to get absolute understanding of fastText library for text classification on both supervised and unsupervised representations.

C. Vogel, May 10, 2020, 1 out of 5 (Amazon verified review)
Bash scripts, really? By today standards, one would expect a clean support of notebooks or organized code per chapter. Not a mixed bag of 5 lines Python functions, and command line scripts.