HDFS and Hive in Python
This book is about Python for geospatial development, so in this section, you will learn how to use Python for HDFS operations and Hive queries. Several Python libraries wrap Hadoop and its databases, but no single one appears to have become the standard choice, and some, such as Snakebite, do not appear ready to run on Python 3. In this section, you will learn how to use two libraries: PyHive and PyWebHDFS. You will also learn how to use the Python subprocess module to execute HDFS and Hive commands.
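The subprocess approach mentioned above can be previewed with a minimal sketch. It assumes the hdfs and hive command-line tools are installed and on your PATH; the helper names hdfs_command, hive_command, and run are hypothetical, introduced only for illustration, and are not part of any library.

```python
import shutil
import subprocess


def hdfs_command(*args):
    """Build an argument list for the `hdfs dfs` file system shell."""
    return ["hdfs", "dfs", *args]


def hive_command(query):
    """Build an argument list that passes one query to the Hive CLI with -e."""
    return ["hive", "-e", query]


def run(cmd):
    """Run cmd and return its standard output as text.

    Returns None when the executable is not found on PATH, so the sketch
    fails softly on a machine without Hadoop or Hive installed.
    """
    if shutil.which(cmd[0]) is None:
        return None
    # check=True raises CalledProcessError if the command exits with an error.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

With a running cluster, run(hdfs_command("-ls", "/")) would list the HDFS root directory, and run(hive_command("SHOW TABLES;")) would return the tables in the default Hive database. Passing the command as a list rather than a single string avoids shell quoting problems when a query contains spaces or quotes.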
To get PyHive, you can use conda and the following command:
conda install -c blaze pyhive
You may also need to install the sasl library:
conda install -c blaze sasl
The previous libraries will give you the ability to run Hive queries from Python. You will also want to be able to move files to HDFS. To do so, you can install pywebhdfs:
conda install -c conda-forge pywebhdfs
The preceding command will install the library, and as always, you can...