Java Data Analysis

Chapter 2. Data Preprocessing

Before data can be analyzed, it is usually processed into some standardized form. This chapter describes those processes.

Relational database tables

In a relational database, we think of each dataset as a table, with each data point being a row in the table. The dataset's signature defines the columns of the table.

Here is an example of a relational database table. It has four rows and five columns, representing a dataset of four data points with five fields:

Last name	First name	Sex	Age	ID
Adams	John	M	26	704601929
White	null	F	39	440163867
Jones	Paul	M	49	602588410
Adams	null	F	30	120096334

Note

There are two null fields in this table.

Because a database table is really a set of rows, the order of the rows is irrelevant, just as the order of the data points in any dataset is irrelevant. For the same reason, a database table may not contain duplicate rows and a dataset may not contain duplicate data points.
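The set-of-rows view can be sketched in Java as follows. This is only an illustration, not the book's own code: the class name `TableAsSet` and its helper methods are hypothetical, each row is modeled as a plain `List<String>`, and `Arrays.asList` is chosen because it permits null fields.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration: a table modeled as a set of rows.
class TableAsSet {
    // Builds the example table. Each row is a list of field values;
    // Arrays.asList is used because it allows null elements.
    static Set<List<String>> exampleTable() {
        Set<List<String>> table = new HashSet<>();
        table.add(Arrays.asList("Adams", "John", "M", "26", "704601929"));
        table.add(Arrays.asList("White", null, "F", "39", "440163867"));
        table.add(Arrays.asList("Jones", "Paul", "M", "49", "602588410"));
        table.add(Arrays.asList("Adams", null, "F", "30", "120096334"));
        return table;
    }

    // Counts the null fields across all rows of the table.
    static long countNulls(Set<List<String>> table) {
        return table.stream()
                .flatMap(List::stream)
                .filter(value -> value == null)
                .count();
    }

    public static void main(String[] args) {
        Set<List<String>> table = exampleTable();

        // Adding a copy of an existing row leaves the set unchanged.
        boolean added = table.add(
                Arrays.asList("Jones", "Paul", "M", "49", "602588410"));
        System.out.println("Duplicate added? " + added);          // false
        System.out.println("Rows: " + table.size());              // 4
        System.out.println("Null fields: " + countNulls(table));  // 2
    }
}
```

Because a `HashSet` has no defined iteration order and rejects equal elements, it mirrors both properties described above: row order carries no meaning, and a duplicate row cannot be stored.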

Key fields

A dataset may specify that all values of a designated field be unique. Such a field is called a key field for the dataset. In the preceding example, the ID number field could...
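A uniqueness requirement of this kind can be checked with a short Java sketch. The class `KeyFieldCheck` and its methods are hypothetical names introduced here, and treating a null value as disqualifying a key field is an assumption of this sketch rather than something stated in the text.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration: testing whether a column satisfies uniqueness.
class KeyFieldCheck {
    // Returns true if every row holds a distinct value in the given column.
    // Assumption of this sketch: a null value disqualifies the column.
    static boolean isKeyField(List<List<String>> rows, int column) {
        Set<String> seen = new HashSet<>();
        for (List<String> row : rows) {
            String value = row.get(column);
            // Set.add returns false when the value was already present.
            if (value == null || !seen.add(value)) {
                return false;
            }
        }
        return true;
    }

    // The four rows of the example table, in column order:
    // last name, first name, sex, age, ID.
    static List<List<String>> exampleRows() {
        return Arrays.asList(
                Arrays.asList("Adams", "John", "M", "26", "704601929"),
                Arrays.asList("White", null, "F", "39", "440163867"),
                Arrays.asList("Jones", "Paul", "M", "49", "602588410"),
                Arrays.asList("Adams", null, "F", "30", "120096334"));
    }

    public static void main(String[] args) {
        List<List<String>> rows = exampleRows();
        System.out.println("ID unique? " + isKeyField(rows, 4));        // true
        System.out.println("Last name unique? " + isKeyField(rows, 0)); // false
    }
}
```

Note that the Age column also passes this check on these four rows. The method only verifies that the current data satisfies the constraint; it does not tell you which field the dataset designates as its key.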

Key Benefits

Get your basics right for data analysis with Java and make sense of your data through effective visualizations.

Use various Java APIs and tools such as RapidMiner and WEKA for effective data analysis and machine learning.

This is your companion to understanding and implementing a solid data analysis solution using Java.

What You Will Learn

Develop Java programs that analyze data sets of nearly any size, including text

Implement important machine learning algorithms such as regression, classification, and clustering

Interface with and apply standard open source Java libraries and APIs to analyze and visualize data

Process data from both relational and non-relational databases and from time-series data

Employ Java tools to visualize data in various forms

Understand multimedia data analysis algorithms and implement them in Java

Who Is This Book For?

If you are a student, a Java developer, or a budding data scientist who wants to learn the fundamentals of data analysis and how to perform it with Java, this book is for you. Some familiarity with elementary statistics and relational databases will help you get the most out of this book but is not mandatory. A firm understanding of Java is required.

Book Description

Data analysis is the process of inspecting, cleansing, transforming, and modeling data with the aim of discovering useful information. Java is one of the most popular languages for data analysis tasks, and this book teaches the tools and techniques for carrying them out in Java. After a quick overview of what data science is and the steps involved in the process, you'll learn statistical data analysis techniques and implement them using popular Java APIs and libraries. Through practical examples, you will also learn machine learning concepts such as classification and regression. Along the way, you'll become familiar with tools such as RapidMiner and WEKA and see how these Java-based tools can be used effectively for analysis. You will also learn how to analyze text and other types of multimedia, and how to work with relational, NoSQL, and time-series data. The book also shows how to use different Java-based libraries to create insightful, easy-to-understand plots and graphs. By the end of this book, you will have a solid understanding of the various data analysis techniques and how to implement them using Java.
