Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
All Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletter Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds
Arrow up icon
GO TO TOP
Apache Hive Essentials

You're reading from   Apache Hive Essentials Essential techniques to help you process, and get unique insights from, big data

Arrow left icon
Product type Paperback
Published in Jun 2018
Publisher Packt
ISBN-13 9781788995092
Length 210 pages
Edition 2nd Edition
Languages
Tools
Concepts
Arrow right icon
Author (1):
Arrow left icon
Dayong Du Dayong Du
Author Profile Icon Dayong Du
Dayong Du
Arrow right icon
View More author details
Toc

Table of Contents (18) Chapters Close

Title Page
Copyright and Credits
Dedication
Packt Upsell
Contributors
Preface
1. Overview of Big Data and Hive FREE CHAPTER 2. Setting Up the Hive Environment 3. Data Definition and Description 4. Data Correlation and Scope 5. Data Manipulation 6. Data Aggregation and Sampling 7. Performance Considerations 8. Extensibility Considerations 9. Security Considerations 10. Working with Other Tools 1. Other Books You May Enjoy Index

Preface

With an increasing interest in big data analysis, Hive over Hadoop becomes a cutting-edge data solution for storing, computing, and analyzing big data. The SQL-like syntax makes Hive easier to learn and is popularly accepted as a standard for interactive SQL queries over big data. The variety of features available within Hive provides us with the capability of doing complex big data analysis without advanced coding skills. The maturity of Hive lets it gradually merge and share its valuable architecture and functionalities across different computing frameworks beyond Hadoop.

Apache Hive Essentials, Second Edition prepares your journey to big data by covering the introduction of backgrounds and concepts in the big data domain, along with the process of setting up and getting familiar with your Hive working environment in the first two chapters. In the next four chapters, the book guides you through discovering and transforming the value behind big data using examples and skills of Hive query languages. In the last four chapters, the book highlights the well-selected and advanced topics, such as performance, security, and extensions, as exciting adventures for this worthwhile big data journey.

Who this book is for

If you are a data analyst, developer, or user who wants to use Hive for exploring and analyzing data in Hadoop, this is the right book for you. Whether you are new to big data or already an experienced user, you will be able to master both basic and advanced functions of Hive. Since HQL is quite similar to SQL, some previous experience with SQL and databases will help with getting a better understanding of this book.

What this book covers

Chapter 1, Overview of Big Data and Hive, begins with the evolution of big data, Hadoop ecosystem, and Hive. You will also learn the Hive architecture and advantages of using Hive in big data analysis.

Chapter 2, Setting Up the Hive Environment, presents the Hive environment setup and configuration. It also covers using Hive through the command line and development tools.

Chapter 3, Data Definition and Description, outlines the basic data types and data definition language for tables, partitions, buckets, and views in Hive.

Chapter 4, Data Correlation and Scope, shows you ways to discover the data by querying, linking, and scoping the data in Hive.

Chapter 5, Data Manipulation, focuses on the process of exchanging, moving, sorting, and transforming the data in Hive.

Chapter 6, Data Aggregation and Sampling, explains the way of doing aggregation and sample using aggregation functions, analytic functions, windowing, and sample clauses.

Chapter 7, Performance Considerations, introduces the best practices of performance considerations in the aspect of design, file format, compression, storage, query, and job.

Chapter 8, Extensibility Considerations, describes the way of extending Hive by creating user-defined functions, streaming, serializers, and deserializers.

Chapter 9, Security Considerations, introduces the area of Hive security in terms of authentication, authorization, and encryption.

Chapter 10, Working with Other Tools, discusses how Hive works with other big data tools.

To get the most out of this book

This book will give you maximum benefit if you have some experience with SQL. If you are a data analyst, developer, or simply someone who wants to quickly get started with Hive to explore and analyze Big Data in Hadoop, this is the book for you. Additionally, install the following in your system.

  • JDK 1.8
  • Hadoop 2.x.y
  • Ubuntu 16.04/CentOS 7

Download the example code files

You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

  1. Log in or register at www.packtpub.com.
  2. Select the SUPPORT tab.
  3. Click on Code Downloads & Errata.
  4. Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

  • WinRAR/7-Zip for Windows
  • Zipeg/iZip/UnRarX for Mac
  • 7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Apache-Hive-Essentials-Second-Edition. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/ApacheHiveEssentialsSecondEdition_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Add the necessary system path variables in the ~/.profile or ~/.bashrc file"

A block of code is set as follows:

export HADOOP_HOME=/opt/hadoop
export HADOOP_CONF_DIR=/opt/hadoop/conf
export HIVE_HOME=/opt/hive
export HIVE_CONF_DIR=/opt/hive/conf
export PATH=$PATH:$HIVE_HOME/bin:$HADOOP_HOME/
bin:$HADOOP_HOME/sbin

Any command-line or beeline interactive input or output is written as follows:

$hive 
$beeline -u "jdbc:hive2://localhost:10000"

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Select Preference from the interface."

Note

Warnings or important notes appear like this.

Note

Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packtpub.com.

 

 

lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime
Visually different images