Importing plain text data from a PDF file
Source text data may come in Portable Document Format (.pdf). Scientific research papers, for example, are usually distributed as PDF files. To perform text mining on such documents, you first need to import their text into the R environment. In this recipe, you will import text data from PDF files.
Getting ready
To implement this recipe, you will need to install the pdftools library.
To install the required library, run the following code:
install.packages("pdftools")
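Before moving on, you can run a quick check to confirm that pdftools is installed and can extract text. This check is an illustration and not part of the original recipe. Because the abstract files may not be at hand yet, it writes a throwaway one-page PDF with base R's pdf() graphics device and reads it back with pdftools::pdf_text(), which returns one character string per page. The temporary file and the names tmp_pdf and pages are hypothetical.

```r
# Load the library installed above
library(pdftools)

# Hypothetical stand-in for a real file such as abstract_1.pdf:
# write a one-page PDF containing a single line of text
tmp_pdf <- tempfile(fileext = ".pdf")
pdf(tmp_pdf)
plot.new()
text(0.5, 0.5, "Sample abstract text")
invisible(dev.off())

# pdf_text() returns a character vector with one element per page
pages <- pdf_text(tmp_pdf)
length(pages)  # one page in this example
cat(pages[1])  # the extracted text, including layout whitespace
```

If this prints the sample sentence, the library is ready for the real abstracts.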
The source data for this recipe consists of three PDF files, each containing one abstract. The filenames are as follows:
abstract_1.pdf
abstract_2.pdf
abstract_3.pdf
How to do it…
Let's take a look at the following steps to import plain text data from a PDF file:
- Since you will read multiple PDF files, it is a good idea to create an object containing all the filenames. You can either create this object manually or you can automatically...
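Both ways of building the filename object can be sketched as follows. This is a minimal illustration under stated assumptions, not the book's own code. The manual option simply types out the three names. The automatic option uses base R's list.files() with a regular expression that matches names ending in .pdf. The folder data_dir and the empty placeholder files created in it are hypothetical stand-ins for the directory that holds the real abstracts.

```r
# Option 1: create the object of filenames manually
pdf_files_manual <- c("abstract_1.pdf", "abstract_2.pdf", "abstract_3.pdf")

# Hypothetical data folder with empty placeholder files, so the
# automatic option has something to find (assumption for illustration)
data_dir <- file.path(tempdir(), "abstracts")
dir.create(data_dir, showWarnings = FALSE)
invisible(file.create(file.path(data_dir, pdf_files_manual)))

# Option 2: collect every file whose name ends in ".pdf"
pdf_files_auto <- list.files(path = data_dir, pattern = "\\.pdf$")
print(pdf_files_auto)

# full.names = TRUE returns complete paths, which is what a reader
# such as pdf_text() needs when the files are not in the working directory
pdf_paths <- list.files(path = data_dir, pattern = "\\.pdf$", full.names = TRUE)
```

The automatic option scales better when the folder holds many files. Adding ignore.case = TRUE to list.files() would also pick up files with an uppercase .PDF extension. Once the real abstracts are in place, each path could be passed to pdf_text(), for example with lapply(pdf_paths, pdf_text). Do not do this with the empty placeholders, because they are not valid PDF files.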