Implementation objectives
The goal of this section is to start developing a data pipeline based on the Random Forests algorithm.
List of implementation goals
The following implementation objectives apply to both the Random Forests pipeline and the linear regression pipeline. We will perform preliminary steps such as Exploratory Data Analysis (EDA) only once, and then develop the implementation code that is specific to each pipeline. The implementation objectives are as follows:
- Get the stock price dataset.
- Carry out preliminary EDA in the Sandbox Zeppelin Notebook environment (or Spark shell), and run a statistical analysis.
- Develop the pipeline incrementally in Zeppelin, and port the code into IntelliJ. This means doing the following:
  - Create a new Scala project in IntelliJ, or import an existing empty project into IntelliJ, and create Scala artifacts from code that was incrementally developed in the Notebook.
  - Do not forget to wire up all the necessary dependencies...
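To make the dependency wiring step concrete, here is a minimal sketch of a build definition, assuming the project is built with sbt. The project name and every version number shown are assumptions for illustration only; replace them with the Scala and Spark versions that match your own Sandbox or cluster.

```scala
// build.sbt: a hypothetical, minimal build definition for the pipeline project.
// The project name and all version numbers are assumptions, not values from this chapter.
name := "stock-price-pipeline"

version := "0.1.0"

// Must match the Scala version that your Spark distribution was built against.
scalaVersion := "2.11.12"

// Kept in one place so that all Spark modules stay on the same version.
val sparkVersion = "2.4.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % sparkVersion, // core engine and RDD API
  "org.apache.spark" %% "spark-sql"   % sparkVersion, // DataFrames, used for EDA
  "org.apache.spark" %% "spark-mllib" % sparkVersion  // ML pipelines, including Random Forests
)
```

Declaring the Spark version once and reusing it keeps the three modules consistent, which avoids the binary incompatibilities that tend to appear when their versions drift apart.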