Have you been able to launch jobs with Java? #39
I have the same problem.
I suppose it's the same problem.
@JonathanLoscalzo I'm running into the same issue. Were you able to solve it?
@jaskiratr Not yet. Instead, I have installed an instance of Spark in a Google Colab notebook with this code:
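Something along these lines (a minimal sketch; the Spark version, archive URL, and paths are assumptions on my part):

```python
# Colab notebook cell: lines prefixed with "!" run as shell commands (IPython syntax).
# The Spark/Hadoop versions and the mirror URL below are illustrative assumptions.
!apt-get install -y openjdk-8-jdk-headless -qq > /dev/null
!wget -q https://archive.apache.org/dist/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz
!tar xf spark-2.3.0-bin-hadoop2.7.tgz
!pip install -q findspark

import os
import findspark

# Tell findspark where the extracted distribution lives so pyspark becomes importable.
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-2.3.0-bin-hadoop2.7"
findspark.init()

from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[*]").appName("colab").getOrCreate()
```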
I still haven't figured out whether we must install Spark locally in order to connect to a remote instance. It should be easier than that, but it isn't.
Not clear how you got this. The driver URL should be:
Sorry @Cricket007, when you said you had not got this, were you referring to the initial error or to how I use pyspark in Colab? Where do you write "--driver-url" when you run the containers? (It is a docker-compose file.) Could you explain in more detail? I have found this link
@JonathanLoscalzo I was referring to the OP. Your Colab setup is likely very different from running Docker Compose, and I would suggest DataProc in the GCP environment rather than doing anything manual in Colab.
You need the Spark client libraries, yes. Or you can install Apache Livy as a REST interface to submit Spark jobs.
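As a rough sketch of what a Livy submission looks like (the host, port, jar path, and class name here are placeholders, not from this setup), you POST a batch to its `/batches` endpoint:

```python
# Sketch of submitting a Spark job through Apache Livy's batch REST API.
# The Livy host/port and the jar/class names are illustrative placeholders.
import json
import requests

LIVY_URL = "http://livy-host:8998"  # 8998 is Livy's default port

payload = {
    "file": "/path/to/my-spark-job.jar",   # must be readable by the cluster
    "className": "com.example.MySparkJob",
    "args": ["arg1"],
}
resp = requests.post(
    f"{LIVY_URL}/batches",
    data=json.dumps(payload),
    headers={"Content-Type": "application/json"},
)
batch = resp.json()

# Poll the batch state until the job reaches a terminal state.
state = requests.get(f"{LIVY_URL}/batches/{batch['id']}/state").json()
print(state)
```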
Did you find my answer there? See if that network diagram answers any of your networking issues. (Make sure you can telnet/netcat between all the relevant ports.)
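For example, a quick reachability check from the driver machine (a sketch; the worker and driver endpoints are taken from the executor command below, and the master host is a placeholder):

```python
# Sketch of a port-reachability check between driver, master, and workers.
# Substitute the hosts/ports from your own setup.
import socket

endpoints = [
    ("spark-master", 7077),   # standalone master (placeholder host, default port)
    ("172.19.0.3", 8881),     # worker RPC port from the executor command
    ("yeikel-pc", 59906),     # driver port the executors must reach back to
]

for host, port in endpoints:
    try:
        with socket.create_connection((host, port), timeout=3):
            print(f"OK   {host}:{port}")
    except OSError as err:
        print(f"FAIL {host}:{port} -> {err}")
```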
Thanks @Cricket007, I realized that I need Spark installed locally (or something along those lines) on the machine that runs the scripts (in this case, the machine running Jupyter). Now you have confirmed my issue 👍. I suppose it is the same issue as @yeikel's (?)
I didn't use Apache Livy; do you recommend it?
I don't know what "DataProc" in GCP is. Is it like Databricks for Azure? (I will check it tomorrow.) Thanks for your answer!
DataProc is the managed Hadoop/Spark service from Google. Amazon and Azure have similar offerings, if that's what you want. Databricks is purely Spark; if you want more than that, Qubole is another option. If all you really want is to learn Spark locally, either extract it locally or use a VM, simply because the networking is easier and the way you would install an actual cluster would not be in containers (and there are plenty of ways to automate the installation, such as Apache Ambari or Ansible). Otherwise, the Cloudera/Hortonworks sandboxes work fine.
I've used it indirectly via the Hue interface, but it was fairly straightforward to set up. And I personally use Zeppelin over Jupyter because Spark (Scala) is more tightly integrated, though it handles Python fine.
Hi,
I am running Spark with the following configuration:
And I have the following simple Java program:
The problem that I am having is that it generates the following command:
```
Spark Executor Command: "/usr/jdk1.8.0_131/bin/java" "-cp" "/conf:/usr/spark-2.3.0/jars/*:/usr/hadoop-2.8.3/etc/hadoop/:/usr/hadoop-2.8.3/etc/hadoop/*:/usr/hadoop-2.8.3/share/hadoop/common/lib/*:/usr/hadoop-2.8.3/share/hadoop/common/*:/usr/hadoop-2.8.3/share/hadoop/hdfs/*:/usr/hadoop-2.8.3/share/hadoop/hdfs/lib/*:/usr/hadoop-2.8.3/share/hadoop/yarn/lib/*:/usr/hadoop-2.8.3/share/hadoop/yarn/*:/usr/hadoop-2.8.3/share/hadoop/mapreduce/lib/*:/usr/hadoop-2.8.3/share/hadoop/mapreduce/*:/usr/hadoop-2.8.3/share/hadoop/tools/lib/*" "-Xmx1024M" "-Dspark.driver.port=59906" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@yeikel-pc:59906" "--executor-id" "6" "--hostname" "172.19.0.3" "--cores" "2" "--app-id" "app-20180401005243-0000" "--worker-url" "spark://Worker@172.19.0.3:8881"
```
Which results in: