Spark machine learning APIs - ML pipelines and MLlib
Until around 1.6.0, the north-facing data abstraction method was RDD, and the MLlib APIs implemented machine learning on RDDs. MLlib was introduced in Spark 0.8 and, for the most part, were straightforward library calls to ML algorithms; however, this didn't reflect the data pipelines inherent in machine learning. With the advent of DataFrames and Datasets, MLlib transformed as well with more capabilities, and the resulting framework is the ML pipeline.
Tip
MLlib APIs are in maintenance mode from 2.0.0 and will be deprecated in 3.0.0. But be aware that there are still some APIs that are not migrated to the ML world; for example, the random generator still outputs an RDD. So you will have to use MLlib to generate a random normal, poisson, or other distributions, then convert the RDD into a DataFrame, and finally use ML for transformation and machine learning algorithms.