Loading and saving data
Loading data into an RDD and saving an RDD onto an output system can both be done in several ways. We will cover the most common ones in this section.
Loading data
Loading data into an RDD can be done by using SparkContext. Some of the most common methods are:
textFile
wholeTextFiles
load from a JDBC datasource
textFile
textFile() can be used to load text files into an RDD, where each line of a file becomes an element in the RDD.
sc.textFile(name, minPartitions=None, use_unicode=True)
The following is an example of loading a text file into an RDD using textFile():
scala> val rdd_two = sc.textFile("wiki1.txt")
rdd_two: org.apache.spark.rdd.RDD[String] = wiki1.txt MapPartitionsRDD[8] at textFile at <console>:24

scala> rdd_two.count
res6: Long = 9
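If more control over partitioning is needed, a minimum number of partitions can be passed as the second argument. The following is a minimal sketch; the file name wiki1.txt is reused from the example above and the partition count of 4 is purely illustrative:

// Load the same file, requesting at least 4 partitions
val rdd_partitioned = sc.textFile("wiki1.txt", minPartitions = 4)

// Check how many partitions Spark actually created
rdd_partitioned.getNumPartitions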
wholeTextFiles
wholeTextFiles() can be used to load multiple text files into a paired RDD containing pairs of <filename, textOfFile>, representing the filename and the entire content of the file. This is useful when loading multiple small text files and is...
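The following is a minimal sketch of loading a directory of small text files with wholeTextFiles(); the directory name textdir/ is assumed purely for illustration:

// Load every file under textdir/ as a (filename, content) pair
val rdd_files = sc.wholeTextFiles("textdir/")

// rdd_files is an RDD[(String, String)]; collect just the filenames
val filenames = rdd_files.keys.collect()

// Since each value is the whole file content, per-file operations are easy,
// for example counting the lines in each file
val lineCounts = rdd_files.mapValues(content => content.split("\n").length)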