PySpark projects in k8s

Using the freshly pushed Docker image that contains the Spark Python runtime as a base, we can build a new Docker image that adds our custom code. Here we build a PySpark app that reads data from MinIO.

PySpark app reading data from MinIO

from pyspark import SparkContext
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def load_config(spark_context: SparkContext):
    # Configure the S3A connector to talk to the in-cluster MinIO service.
    hadoop_conf = spark_context._jsc.hadoopConfiguration()
    hadoop_conf.set("fs.s3a.access.key", "console")
    hadoop_conf.set("fs.s3a.secret.key", "console123")
    hadoop_conf.set("fs.s3a.endpoint", "minio-1641612822.minio.svc.cluster.local:9000")
    # Address buckets by path (endpoint/bucket) instead of by subdomain.
    hadoop_conf.set("fs.s3a.path.style.access", "true")
    # SSL is disabled for the connection to this endpoint.
    hadoop_conf.set("fs.s3a.connection.ssl.enabled", "false")
    hadoop_conf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")

load_config(spark.sparkContext)
df = spark.read.json('s3a://merobucket/orders.json')
average = df.agg({'amount':'avg'})
average.show()
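
Spark copies any property prefixed with spark.hadoop. into the Hadoop configuration, so the same S3A settings can also be supplied when the session is built, without going through the private _jsc handle. The following is only a sketch under assumptions: the environment variable names MINIO_ENDPOINT, MINIO_ACCESS_KEY and MINIO_SECRET_KEY and the helper names s3a_options, build_session and main are hypothetical, and the defaults are simply the values from the script above.

```python
import os


def s3a_options(env=os.environ):
    """Return the S3A settings as spark.hadoop.* properties.

    The MINIO_* variable names are hypothetical; when a variable is
    missing, the value from the original script is used instead.
    """
    return {
        "spark.hadoop.fs.s3a.endpoint": env.get(
            "MINIO_ENDPOINT", "minio-1641612822.minio.svc.cluster.local:9000"
        ),
        "spark.hadoop.fs.s3a.access.key": env.get("MINIO_ACCESS_KEY", "console"),
        "spark.hadoop.fs.s3a.secret.key": env.get("MINIO_SECRET_KEY", "console123"),
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.connection.ssl.enabled": "false",
        "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    }


def build_session(options):
    """Create a SparkSession with every option applied through the builder."""
    # Imported lazily so s3a_options() can be used without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder
    for key, value in options.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def main():
    # Same job as above: average of the "amount" column in orders.json.
    spark = build_session(s3a_options())
    df = spark.read.json("s3a://merobucket/orders.json")
    df.agg({"amount": "avg"}).show()
```

Calling main() at the end of main.py would run the same aggregation. Reading the credentials from the environment keeps them out of the image, assuming the pod is given those variables by some other means.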

Packaging the PySpark app with a Dockerfile

FROM registry.logpoint.com.np/spark-py:latest
COPY main.py .

Building and pushing the PySpark app Docker image

docker build -t registry.logpoint.com.np/sparkapp:0.0.1 .
docker push registry.logpoint.com.np/sparkapp:0.0.1

Running the Spark app on a Kubernetes cluster

spark-submit \
    --master k8s://https://192.168.2.55:6443 \
    --deploy-mode cluster \
    --name spark-pi \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    --conf spark.executor.instances=1 \
    --conf spark.kubernetes.container.image=registry.logpoint.com.np/sparkapp:0.0.1 \
    local:///opt/spark/work-dir/main.py

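/opt/spark/work-dir is the working directory of the PySpark base image, so the COPY instruction in the Dockerfile places main.py in that directory. The local:// scheme tells spark-submit that the file already exists inside the container image, which is why the application can be run from local:///opt/spark/work-dir/main.py.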
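
To make the link between the Dockerfile and the application path explicit, the spark-submit arguments can be assembled from the working directory and the script name. This is a rough sketch rather than part of the original setup: the helper submit_command and the constant WORK_DIR are hypothetical names, and the image passed in is assumed to be the tag pushed earlier.

```python
import posixpath
import shlex

# Working directory of the PySpark base image, as stated in the text.
WORK_DIR = "/opt/spark/work-dir"


def submit_command(image, script="main.py",
                   master="k8s://https://192.168.2.55:6443"):
    """Build the spark-submit argument list for a script copied into WORK_DIR."""
    # local:// marks a file that is already present inside the image.
    app_path = "local://" + posixpath.join(WORK_DIR, script)
    return [
        "spark-submit",
        "--master", master,
        "--deploy-mode", "cluster",
        "--name", "spark-pi",
        "--conf", "spark.kubernetes.authenticate.driver.serviceAccountName=spark",
        "--conf", "spark.executor.instances=1",
        "--conf", f"spark.kubernetes.container.image={image}",
        app_path,
    ]


cmd = submit_command("registry.logpoint.com.np/sparkapp:0.0.1")
# Print the command in a form that can be pasted into a shell.
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would launch the job, provided spark-submit is on the PATH of the machine running the script.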