Install Apache Spark on Kubernetes

To run Apache Spark workloads on a Kubernetes cluster, we first need to build the Apache Spark image and push it to a registry. Start by downloading the Apache Spark package to the local computer with the following command.

wget -c https://dlcdn.apache.org/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.2.tgz

Set up Apache Spark on Mac

Extract the Spark package and move it to /opt/spark using the following commands

tar zxvf spark-3.2.0-bin-hadoop3.2.tgz
sudo mv spark-3.2.0-bin-hadoop3.2 /opt/spark

Add the following lines to the ~/.bash_profile file

export SPARK_HOME=/opt/spark
export PATH=$PATH:/usr/local/sbin:$SPARK_HOME/bin:$SPARK_HOME/sbin
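
After reloading the profile (for example with source ~/.bash_profile), it can help to confirm that the Spark directories really ended up on the PATH. The following is a minimal sketch; path_contains is a hypothetical helper introduced here for illustration and is not part of Spark.

```shell
# Hypothetical helper: returns 0 if directory $1 is an entry in the
# colon-separated list $2, and 1 otherwise.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A typical check would be path_contains "$SPARK_HOME/bin" "$PATH" && echo "spark bin is on PATH". Wrapping both sides in colons avoids false matches on directories that merely share a prefix, such as /opt/spark/bin-old.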

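Build the Spark Kubernetes image and push it to the registry using the following commands. The docker-image-tool.sh script ships in $SPARK_HOME/bin, which is already on the PATH after the previous step.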

docker-image-tool.sh -r registry.logpoint.com.np/spark -t 0.0.1 build
docker-image-tool.sh -r registry.logpoint.com.np/spark -t 0.0.1 push
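
It is worth knowing what the resulting image is called. As far as the tool's documented behaviour goes, docker-image-tool.sh appends the image name spark and the tag to the repository given with -r. The sketch below only mirrors that naming convention; spark_image_ref is a hypothetical helper, not something Spark provides.

```shell
# Hypothetical helper (not part of Spark): mirrors how docker-image-tool.sh
# is understood to name the JVM image, i.e. <repo>/spark:<tag>.
# $1 is the value passed to -r, $2 is the value passed to -t.
spark_image_ref() {
    printf '%s/spark:%s\n' "$1" "$2"
}

spark_image_ref registry.logpoint.com.np/spark 0.0.1
# prints registry.logpoint.com.np/spark/spark:0.0.1
```

The printed reference is the value that spark.kubernetes.container.image needs to match at submit time. Running docker images after the build should confirm the actual name. If the two differ, the driver and executor pods will most likely fail to pull the image.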

RBAC Setup

On an RBAC-enabled cluster, create a service account and assign the relevant cluster role binding to it using the following commands.

kubectl create serviceaccount spark
kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default
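
To check that the binding took effect, one option is to ask the API server whether the service account may create pods, which the Spark driver needs in order to launch executors. This is a sketch under the assumption that the current kubectl user is allowed to impersonate service accounts; sa_subject and check_spark_rbac are hypothetical helper names.

```shell
# Hypothetical helper: builds the user name Kubernetes assigns to a
# service account, in the form system:serviceaccount:<namespace>:<name>.
sa_subject() {
    printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Hypothetical helper: asks the API server whether the spark service
# account in the default namespace may create pods there.
check_spark_rbac() {
    kubectl auth can-i create pods --namespace default --as "$(sa_subject default spark)"
}
```

Calling check_spark_rbac should print yes once the cluster role binding is in place, and no otherwise.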

Submit the Spark job to the cluster using the following command

spark-submit \
    --master k8s://https://192.168.2.55:6443 \
    --deploy-mode cluster \
    --name spark-pi \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=1 \
    --conf spark.kubernetes.container.image=registry.logpoint.com.np/spark/spark:0.0.1 \
    local:///opt/spark/examples/jars/spark-examples_2.12-3.2.0.jar 10000
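
In cluster mode the result is written to the driver pod's log rather than to the local terminal. The SparkPi example prints a line starting with "Pi is roughly", so a small filter can pull the value out of the log. This is only a sketch; extract_pi is a hypothetical helper name.

```shell
# Hypothetical helper: reads driver log text on stdin and prints the
# estimate from the "Pi is roughly <value>" line that SparkPi writes.
extract_pi() {
    sed -n 's/.*Pi is roughly \([0-9.]*\).*/\1/p'
}

# Demonstration on sample text standing in for a real driver log.
printf 'some log line\nPi is roughly 3.1415\n' | extract_pi
# prints 3.1415
```

Against a real run, something like kubectl logs -l spark-role=driver --tail=-1 | extract_pi should work, assuming the driver pod carries the spark-role=driver label that Spark normally applies. If several driver pods from earlier runs still exist, pass the specific pod name to kubectl logs instead.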