Installation and configuration
There are several ways of installing and configuring PySpark. You can set it up in Python IDEs such as PyCharm, Spyder, and so on. Alternatively, you can use PySpark if you have already installed Spark and configured the SPARK_HOME environment variable. Thirdly, you can use PySpark from the Python shell. Below we will see how to configure PySpark for standalone jobs.
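For example, if SPARK_HOME already points at a valid Spark installation, the interactive PySpark shell can be launched directly from the distribution's bin directory (this assumes the layout of a standard Spark binary distribution):
$SPARK_HOME/bin/pyspark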
By setting SPARK_HOME
At first, download and extract the Spark distribution to your preferred location, say /home/asif/Spark. Now let's set SPARK_HOME as follows:
echo "export SPARK_HOME=/home/asif/Spark" >> ~/.bashrc
Now let's set PYTHONPATH as follows:
echo "export PYTHONPATH=$SPARK_HOME/python/" >> ~/.bashrc echo "export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.1-src.zip" >> ~/.bashrc
Now we need to add the following two paths to the PATH environment variable:
echo "export PATH=$PATH:$SPARK_HOME" >> ~/.bashrc echo "export PATH=$PATH:$PYTHONPATH" >> ~/.bashrc
Finally, let's refresh the current terminal so that the newly modified environment variables take effect:
source ~/.bashrc
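Once the shell has been refreshed, a quick way to verify the setup is a minimal standalone PySpark script. The following is a sketch, not part of the original text: the app name, file name, and the small sum computation are arbitrary illustrative choices, and it assumes a Spark 2.x distribution in which SparkSession is available:
from pyspark.sql import SparkSession

# Create a local SparkSession; "VerifyPySpark" is an arbitrary app name
spark = SparkSession.builder \
    .master("local[*]") \
    .appName("VerifyPySpark") \
    .getOrCreate()

# Sanity check: distribute a small range and sum it (expected output: 45)
rdd = spark.sparkContext.parallelize(range(10))
print(rdd.sum())

spark.stop()
Save it as, say, verify.py and run it with python verify.py; if the environment variables are set correctly, the pyspark import resolves from $SPARK_HOME without any separate installation.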