Check the YouTube video for setting up Spark.
- Go to `rbac/spark-rbac.yaml`.
  RBAC (Role-Based Access Control) defines user access privileges. Kubernetes RBAC is REST-based and maps HTTP verbs onto permissions.
  A RoleBinding grants permissions within a specific namespace, whereas a ClusterRoleBinding grants the same access cluster-wide; see the sketch below.
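For illustration, a namespace-scoped Role plus RoleBinding for a Spark driver might look like the following. The `spark` ServiceAccount, namespace, and resource list here are assumptions for the example, not necessarily what `rbac/spark-rbac.yaml` defines:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: spark-role
  namespace: spark
rules:
  # RBAC verbs correspond to HTTP verbs: get/list -> GET, create -> POST, delete -> DELETE
  - apiGroups: [""]
    resources: ["pods", "services", "configmaps"]
    verbs: ["get", "list", "watch", "create", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-role-binding
  namespace: spark          # a RoleBinding is scoped to this namespace only
subjects:
  - kind: ServiceAccount
    name: spark             # assumed ServiceAccount used by Spark driver pods
    namespace: spark
roleRef:
  kind: Role
  name: spark-role
  apiGroup: rbac.authorization.k8s.io
```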
- Go to `helm_values/sparkoperator_values.yaml`.
- Spark `createRole` and `createClusterRole` are set to `true`.
- For now, monitoring via Grafana or an external service is not enabled, so `metrics` and `podMonitor` are set to `false` (see the snippet below).
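For reference, the corresponding toggles in the values file would look roughly like this; the key names follow the spark-operator Helm chart, but verify them against your chart version:

```yaml
rbac:
  createRole: true          # namespace-scoped Role for the operator
  createClusterRole: true   # ClusterRole for cluster-wide resources
metrics:
  enable: false             # no Prometheus metrics endpoint for now
podMonitor:
  enable: false             # no PodMonitor since Grafana/Prometheus is not set up
```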
- The `resources` values depend entirely on your system/Docker capacity; change them accordingly:
```yaml
resources:
  limits:
    cpu: 2000m
    memory: 8000Mi
  requests:
    cpu: 200m
    memory: 100Mi
```
- Execute the Spark Operator Helm chart:
```sh
$ helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator
$ helm install spark-operator spark-operator/spark-operator -n default -f sparkoperator_values.yaml --create-namespace
```
  Spark will create all application pods inside the `spark` namespace only.
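To verify the operator came up before running any jobs, something like the following should suffice; the release and namespace names match the install command above, and the label selector is the chart's standard label (drop `-l ...` if it doesn't match your version):

```sh
$ helm status spark-operator -n default
$ kubectl get pods -n default -l app.kubernetes.io/name=spark-operator
```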
- Test the application by running `kubectl apply -f examples/spark/pi.yaml -n default`. Check the logs; a value of Pi will be logged if the job passed.
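One way to check those logs is sketched below; the application name `spark-pi` is what the upstream example manifest usually uses, so confirm it in `pi.yaml`:

```sh
$ kubectl get sparkapplications -n default                        # status should reach COMPLETED
$ kubectl logs spark-pi-driver -n default | grep -i "pi is roughly"
```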
- Log in to MinIO and choose the `test-files` bucket. Upload any temporary file to the bucket.
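If you prefer the MinIO CLI over the web console, the same upload can be done with `mc`; the endpoint and credentials below are placeholders:

```sh
$ mc alias set local http://localhost:9000 minioadmin minioadmin   # placeholder endpoint/keys
$ mc cp ./sample.txt local/test-files/
```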
- Go to the directory `../examples/spark`.
- Execute the Spark wordcount job: `kubectl apply -f examples/spark/wordcount.yaml -n default`. Spark should be able to read from MinIO, which works like AWS S3.
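For S3A to talk to MinIO instead of AWS, the SparkApplication typically carries Hadoop settings like the fragment below. The endpoint, credentials, and service name are placeholders, and `wordcount.yaml` may already configure this differently:

```yaml
spec:
  hadoopConf:
    "fs.s3a.endpoint": "http://minio.minio.svc.cluster.local:9000"  # placeholder MinIO service URL
    "fs.s3a.access.key": "minioadmin"                               # placeholder credentials
    "fs.s3a.secret.key": "minioadmin"
    "fs.s3a.path.style.access": "true"                              # MinIO expects path-style access
    "fs.s3a.connection.ssl.enabled": "false"
  # the input file is then read as s3a://test-files/<your-uploaded-file>
```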