
This is the core of project lost_saturn. The lost_saturn project is a modern approach to data science, focused on enabling data science in containerised environments everywhere. It was built first as a local setup and then transformed into a container solution. Its tools are centralised in Jupyter, with Spark and AutoML H2O.ai. Ideal to run Notebook…


jpacerqueira-zz/Jupyter_Spark_H2O_Kafka_Client_Setup


Jupyter_Spark_H2O_Kafka_Client_Setup

A setup with Jupyter, Spark, AutoML H2O.ai and the client libraries Delta.io, PyArrow and Kafka.

Ideal for using Jupyter and its tools to explore data in Docker (Ubuntu 18.04 LTS) or Windows 10 WSL (Windows Subsystem for Linux) environments.

Setup Options
=========

Option 1

Run using Docker Desktop on your laptop:

./install-container.sh

Option 2
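In this option the container publishes port 9003 (Jupyter) and port 54321 (H2O.ai's default port), and Jupyter is only started by the docker exec step once the container setup finishes. To check from the host that a port actually responds, here is a minimal sketch of a polling helper. The function name wait_for_port and its timeout argument are hypothetical (not part of this repo), and it assumes a bash build with /dev/tcp support.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the repository: poll HOST:PORT until it accepts
# TCP connections, giving up after roughly TIMEOUT seconds (default 60).
# Relies on bash's /dev/tcp redirection; intended for localhost checks.
wait_for_port() {
  local host="$1" port="$2" timeout_s="${3:-60}"
  local waited=0
  # Each attempt runs in a subshell, so a refused connection only fails that subshell.
  until (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; do
    if (( waited >= timeout_s )); then
      echo "${host}:${port} not reachable after ${timeout_s}s" >&2
      return 1
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done
  echo "${host}:${port} is accepting connections"
}
```

For example, in a second terminal on the host (the docker exec command keeps the first one busy with sleep infinity): wait_for_port localhost 9003 120. Port 54321 only responds while an H2O.ai instance is running.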

Run with Docker, using the container registered in the datascience-fullstack repository.

Pull and run the Docker container, including the iptables port-opening process (which needs --cap-add=NET_ADMIN); port 9003 serves Jupyter and 54321 is H2O.ai's default port:

~ mac-u$ docker run -it -p 9003:9003 -p 54321:54321 --cap-add=NET_ADMIN --name lost_saturn jpacerqueira83/jupyter_datascience:stable

Once the setup above finishes, run docker exec to start Jupyter:

~ mac-u$ docker exec -it lost_saturn /bin/bash -c "cd ; source .bashrc ; bash -x start-jupyter.sh ; sleep 4 ; cat notebooks/jupyter.log ; sleep infinity"

Option 3

Run the setup in Windows 10 with the Ubuntu 18.04 LTS WSL app.

Get the package:

~ wsl-u$ cd ; git clone https://github.com/jpacerqueira/Jupyter_Spark_H2O_Kafka_Client_Setup.git

Run the installation:

~ wsl-u$ cd ; cp Jupyter_Spark_H2O_Kafka_Client_Setup/library_tools/*.sh . ; bash -x anaconda_setup.sh

(Image: lost_saturn - container - Jupyter Notebooks DataScience)

Issues and Workarounds
=========

Issue 1
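This issue is about which Java is the default; OpenJDK 8 is the recommended one. A minimal sketch to check it, assuming bash and a java binary on PATH; java_major_version and check_default_java8 are hypothetical helper names, not scripts from this repo. It parses the first line of java -version, which uses the legacy "1.8.0_292" style for Java 8 and "11.0.11" style for later releases.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the repository.

# Print the Java major version found in the first line of `java -version`.
#   legacy scheme:  'openjdk version "1.8.0_292"'            -> 8
#   current scheme: 'openjdk version "11.0.11" 2021-04-20'   -> 11
java_major_version() {
  local ver
  ver=$(sed -E 's/.*version "([^"]+)".*/\1/' <<< "$1")
  if [[ "$ver" == 1.* ]]; then
    ver="${ver#1.}"        # drop the legacy "1." prefix
  fi
  echo "${ver%%[._-]*}"    # keep everything before the first '.', '_' or '-'
}

# Return 1 (with a warning) unless the java found on PATH is Java 8.
check_default_java8() {
  if ! command -v java >/dev/null 2>&1; then
    echo "java not found on PATH" >&2
    return 1
  fi
  local first_line major
  first_line=$(java -version 2>&1 | head -n 1)   # java -version writes to stderr
  major=$(java_major_version "$first_line")
  if [[ "$major" == "8" ]]; then
    echo "OK: default Java is 8 (${first_line})"
  else
    echo "WARNING: default Java is ${major}; OpenJDK 8 is recommended" >&2
    return 1
  fi
}
```

On Ubuntu, OpenJDK 8 can be installed with sudo apt-get install openjdk-8-jdk and selected as the default with sudo update-alternatives --config java; check_default_java8 should then print the OK line.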

OpenJDK 8 is recommended as the default Java, rather than OpenJDK 11 or Oracle JAVA8 (which requires a license). Installation guide:

https://www.linuxuprising.com/2019/02/install-any-oracle-java-jdk-version-in.html

This installation resolved my issue here: jupyter/jupyter#248

Issue 2
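This issue suggests mounting extra storage into the container for large files such as the Oracle JAVA installer. A hedged sketch that adds a Docker bind mount (-v) to the Option 2 docker run command; the wrapper name run_lost_saturn_with_mount, the read-only flag and the container path /home/notebookuser/java-installers are assumptions, not names or paths defined by this repo.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, not part of the repository: start the lost_saturn container
# with an extra read-only bind mount, so large local files (for example an Oracle
# JAVA8 installer fetched with git LFS) are visible inside the container.
# Set DOCKER=echo to print the command instead of running it.
run_lost_saturn_with_mount() {
  local src="$1"
  local dest="${2:-/home/notebookuser/java-installers}"   # assumed target path
  if [[ -z "$src" || ! -d "$src" ]]; then
    echo "usage: run_lost_saturn_with_mount HOST_DIR [CONTAINER_DIR]" >&2
    return 1
  fi
  local host_dir
  # docker -v bind mounts need an absolute host path; CDPATH is cleared so cd prints nothing.
  host_dir=$(CDPATH= cd -- "$src" && pwd)
  "${DOCKER:-docker}" run -it \
    -p 9003:9003 -p 54321:54321 \
    --cap-add=NET_ADMIN \
    -v "${host_dir}:${dest}:ro" \
    --name lost_saturn \
    jpacerqueira83/jupyter_datascience:stable
}
```

For example, DOCKER=echo run_lost_saturn_with_mount ~/java-installers previews the command, and dropping DOCKER=echo runs it. A bind mount keeps the large installer out of the image itself.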

Mount an additional drive in Docker containers for the optional JAVA8 (Oracle): your local container may need to mount large files (git LFS), such as the Oracle JAVA installer.

Issue 3
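This issue re-runs three scripts from the notebookuser home directory in a fixed order: stop-jupyter.sh, install-jupyter-support-packs.sh, start-jupyter.sh. A minimal sketch that wraps that sequence and stops at the first failing script; reinstall_jupyter_tools is a hypothetical name, and it assumes the scripts and notebooks/jupyter.log sit where the session transcript for this issue shows them.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the repository: re-run the Jupyter support
# scripts in the recommended order (stop, reinstall support packs, start), stop at
# the first failure, then show the end of the Jupyter log.
reinstall_jupyter_tools() {
  local home_dir="${1:-$HOME}"   # /home/notebookuser inside the container
  local script
  for script in stop-jupyter.sh install-jupyter-support-packs.sh start-jupyter.sh; do
    if [[ ! -f "${home_dir}/${script}" ]]; then
      echo "missing ${home_dir}/${script}" >&2
      return 1
    fi
    echo ">>> ${script}"
    # Run from home_dir with bash -x tracing, as in the session transcript.
    if ! (cd "$home_dir" && bash -x "./${script}"); then
      echo "${script} failed; not running the remaining steps" >&2
      return 1
    fi
  done
  local log="${home_dir}/notebooks/jupyter.log"
  if [[ -f "$log" ]]; then
    tail -n 25 "$log"
  fi
}
```

Inside the container: reinstall_jupyter_tools /home/notebookuser. Stopping early keeps start-jupyter.sh from running against a half-installed set of support packs.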

If the Jupyter tools (Spark + H2O.ai + delta_lake:0.3) are not responsive on first use, re-install them in the following order: stop-jupyter.sh ; install-jupyter-support-packs.sh ; start-jupyter.sh

(base) notebookuser@1662e83c8269:~$ pwd
/home/notebookuser
(base) notebookuser@1662e83c8269:~$ ls
anaconda3 install-jupyter-support-packs.sh java knode_ds.err knode_ds.out library_tools notebooks python-additional-libraries spark start-jupyter.sh stop-jupyter.sh
(base) notebookuser@1662e83c8269:~$ netstat -anp | grep 9003
(base) notebookuser@1662e83c8269:~$ bash -x stop-jupyter.sh
(base) notebookuser@1662e83c8269:~$ bash -x install-jupyter-support-packs.sh
(base) notebookuser@1662e83c8269:~$ bash -x start-jupyter.sh
(base) notebookuser@1662e83c8269:~$ tail -n 25 notebooks/jupyter.log

Licensing

Jupyter_Spark_H2O_Kafka_Client_Setup, the repository at the core of the "lost_saturn" Docker container with Jupyter, Spark ML and AutoML H2O.ai, is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.

Apache Spark, Apache Arrow (including PyArrow) and H2O.ai (the H2O jar and the h2o Python package on PyPI) are licensed under the Apache 2.0 License. OpenJDK 8 is licensed under the GNU GPL v2 with the Classpath Exception, and Python under the PSF License.

The end product would benefit from a DevOps engineer with experience in docker-compose or Terraform; feel free to contribute.

Useful for experimenting with the latest frameworks:

Delta.io open-source Delta Lake
Delta Lake sink with readStream / writeStream
Spark Structured Streaming programming with Delta Lake

If you are a proficient Data Engineer/Scientist, use it as it is, improve it, fix it, share it back!
