The spark-lamdo repository builds and configures an Apache Spark cluster with one master and two workers.
Spark version: 3.2.1
The repository contains two directories: /apps and /data.
Job.py in /apps is a PySpark script that performs ETL (extract, transform, and load) into PostgreSQL.
/data holds the input CSV file(s) and the PostgreSQL JDBC driver JAR.
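
The actual transformation logic lives in Job.py; as a rough sketch of the shape such a job takes (the CSV filename, table name, database host, and credentials below are illustrative assumptions, not values from this repository):

```python
# Minimal sketch of a PySpark ETL job in the spirit of Job.py.
# The CSV name, table name, and Postgres connection details are
# placeholders, not the repository's actual values.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("etl-job").getOrCreate()

# Extract: read the CSV shipped in /opt/spark-data (filename assumed).
df = spark.read.csv("/opt/spark-data/input.csv", header=True, inferSchema=True)

# Transform: placeholder for the real cleaning/aggregation logic.
clean = df.dropna()

# Load: write to PostgreSQL over JDBC, using the driver JAR passed via --jars.
(clean.write
    .format("jdbc")
    .option("url", "jdbc:postgresql://postgres:5432/mydb")  # host/db assumed
    .option("dbtable", "public.my_table")                   # table assumed
    .option("user", "postgres")                             # credentials assumed
    .option("password", "postgres")
    .option("driver", "org.postgresql.Driver")
    .mode("overwrite")
    .save())

spark.stop()
```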
# To run:
docker build -t cluster-apache-spark:3.2.1 .
docker compose up -d
# To submit the app, connect to the master or one of the workers (for example with docker exec) and execute:
/opt/spark/bin/spark-submit --master spark://spark-master:7077 \
--jars /opt/spark-data/postgresql-42.2.22.jar \
--driver-memory 1G \
--executor-memory 1G \
/opt/spark-apps/Job.py
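
The same settings can also be expressed inside the script itself; a minimal sketch mirroring the flags above (the app name is an assumption):

```python
# Sketch: the spark-submit flags above expressed as SparkSession config.
# Caveat: if a JVM is already running (e.g. in a notebook), driver memory
# must be set before startup and the builder option may be ignored.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("Job")                       # app name assumed
    .master("spark://spark-master:7077")
    .config("spark.jars", "/opt/spark-data/postgresql-42.2.22.jar")
    .config("spark.driver.memory", "1g")
    .config("spark.executor.memory", "1g")
    .getOrCreate()
)
```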
After the job finishes, check its completion time in the Spark master web UI (served on port 8080 by default).