Preface
With the increasing interest in big data analysis, Hive over Hadoop has become a cutting-edge solution for storing, computing, and analyzing big data. Its SQL-like syntax makes Hive easy to learn, and Hive is widely accepted as a standard for interactive SQL queries over big data. The variety of features available in Hive lets us perform complex big data analysis without advanced coding skills. Hive's maturity lets it gradually merge and share its valuable architecture and functionality across computing frameworks beyond Hadoop.
Apache Hive Essentials, Second Edition prepares you for your journey into big data. The first two chapters introduce the background and concepts of the big data domain and walk through setting up and getting familiar with your Hive working environment. The next four chapters guide you through discovering and transforming the value behind big data, using examples and Hive query language skills. The last four chapters highlight selected advanced topics, such as performance, security, and extensions, as exciting adventures on this worthwhile big data journey.
Who this book is for
If you are a data analyst, developer, or user who wants to use Hive for exploring and analyzing data in Hadoop, this is the right book for you. Whether you are new to big data or already an experienced user, you will be able to master both basic and advanced functions of Hive. Since the Hive Query Language (HQL) is quite similar to SQL, some previous experience with SQL and databases will help you understand this book better.
What this book covers
Chapter 1, Overview of Big Data and Hive, begins with the evolution of big data, the Hadoop ecosystem, and Hive. You will also learn about the Hive architecture and the advantages of using Hive in big data analysis.
Chapter 2, Setting Up the Hive Environment, presents the Hive environment setup and configuration. It also covers using Hive through the command line and development tools.
Chapter 3, Data Definition and Description, outlines the basic data types and data definition language for tables, partitions, buckets, and views in Hive.
Chapter 4, Data Correlation and Scope, shows you ways to discover the data by querying, linking, and scoping the data in Hive.
Chapter 5, Data Manipulation, focuses on the process of exchanging, moving, sorting, and transforming the data in Hive.
Chapter 6, Data Aggregation and Sampling, explains how to aggregate and sample data using aggregation functions, analytic functions, windowing, and sample clauses.
Chapter 7, Performance Considerations, introduces performance best practices in terms of design, file format, compression, storage, queries, and jobs.
Chapter 8, Extensibility Considerations, describes how to extend Hive with user-defined functions, streaming, serializers, and deserializers.
Chapter 9, Security Considerations, introduces the area of Hive security in terms of authentication, authorization, and encryption.
Chapter 10, Working with Other Tools, discusses how Hive works with other big data tools.
To get the most out of this book
This book will give you maximum benefit if you have some experience with SQL. If you are a data analyst, developer, or simply someone who wants to quickly get started with Hive to explore and analyze big data in Hadoop, this is the book for you. Additionally, install the following on your system:
- JDK 1.8
- Hadoop 2.x.y
- Ubuntu 16.04/CentOS 7
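As a quick sanity check before starting, the prerequisites above can be probed from a shell. The following is a minimal sketch and not part of the book's code bundle: check_tool is a hypothetical helper name introduced here, and it only reports whether a command is on the PATH, not which version is installed.

```shell
#!/bin/sh
# check_tool is a hypothetical helper (not from the book):
# it reports whether the named command can be found on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

check_tool java     # the book expects JDK 1.8
check_tool hadoop   # the book expects Hadoop 2.x.y
```

If both tools are found, running java -version and hadoop version prints the installed versions so they can be compared against the list above.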
Download the example code files
You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
- Log in or register at www.packtpub.com.
- Select the SUPPORT tab.
- Click on Code Downloads & Errata.
- Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
- WinRAR/7-Zip for Windows
- Zipeg/iZip/UnRarX for Mac
- 7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Apache-Hive-Essentials-Second-Edition. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Download the color images
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/ApacheHiveEssentialsSecondEdition_ColorImages.pdf.
Conventions used
There are a number of text conventions used throughout this book.
CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Add the necessary system path variables in the ~/.profile or ~/.bashrc file."
A block of code is set as follows:
export HADOOP_HOME=/opt/hadoop
export HADOOP_CONF_DIR=/opt/hadoop/conf
export HIVE_HOME=/opt/hive
export HIVE_CONF_DIR=/opt/hive/conf
export PATH=$PATH:$HIVE_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Any command-line or beeline interactive input or output is written as follows:
$ hive
$ beeline -u "jdbc:hive2://localhost:10000"
Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Select Preference from the interface."
Note
Warnings or important notes appear like this.
Tip
Tips and tricks appear like this.
Get in touch
Feedback from our readers is always welcome.
General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report it to us. Please visit www.packtpub.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Reviews
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packtpub.com.