
PySpark Installation Error on Linux Devices Using Devenv with PDM: "Java Gateway Process Exited Before Sending Its Port Number" #8

shahinism opened this issue Sep 5, 2024 · 0 comments

Description:

We encountered an issue while installing pyspark from PyPI with pdm in a devenv-managed environment. Upon running the pyspark command, we consistently hit the following error:

RuntimeError: Java gateway process exited before sending its port number
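
For completeness, the failure is not specific to the pyspark shell; a minimal script hits the same error. The sketch below is our own illustration (the script and app name are arbitrary) and assumes pyspark 3.3.0 installed from PyPI via pdm inside the devenv shell:

```python
# Minimal reproduction sketch: starting a SparkSession inside the devenv
# shell fails the same way as the `pyspark` command itself.
from pyspark.sql import SparkSession

# On the affected Linux machines this raises:
#   RuntimeError: Java gateway process exited before sending its port number
spark = SparkSession.builder.appName("devenv-pyspark-repro").getOrCreate()
print(spark.version)
spark.stop()
```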

Details:

  • The error only occurs on Linux devices, including NixOS and Linux-based CI environments. The same setup works fine on macOS devices.
  • We have used similar setups in other projects without any issues, which makes this occurrence even more peculiar.

What We Tried:

  1. Devenv Spark Installation: Initially, we tried to make the setup work by installing Spark through devenv itself. However, the provided version did not meet our needs (we required Spark 3.3.0 for compatibility with AWS Glue).

  2. Custom Package Attempt: We then attempted to create a custom package for Spark 3.3.0, but had to abandon this approach due to performance issues with the Apache archive server.

  3. Switch to Poetry: We also tried switching to poetry to manage the Python environment under devenv, and even installed pyspark directly into our virtual environment. However, the error persisted across all of these configurations.

  4. Nixpkgs Configuration: Finally, I noticed that the nixpkgs configuration in devenv was pointing to the new devenv-nixpkgs. This was the primary difference from our previous working setup. By switching back to nixpkgs-unstable, the problem was resolved.

Conclusion:

It appears that the issue is related to the patches or configuration applied in the rolling devenv-nixpkgs input. Unfortunately, I am not certain which specific patches or changes are causing it.
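
In case it helps narrow this down, here is a small diagnostic sketch (our own addition, not part of the failing setup). It assumes the gateway error means PySpark could not launch a working JVM, which is only a guess on our part; running it under both the devenv-nixpkgs and nixpkgs-unstable pins should show whether the Java toolchain visible to the shell differs:

```python
# Diagnostic sketch: check which JVM (if any) PySpark would try to launch.
# Run inside the devenv shell under each nixpkgs pin and compare the output.
import os
import shutil
import subprocess

print("JAVA_HOME  =", os.environ.get("JAVA_HOME"))

java = shutil.which("java")
print("java found =", java)

if java is not None:
    # `java -version` prints to stderr; a crash or non-zero exit code here
    # would explain why the gateway never reports its port number.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    print("exit code  =", result.returncode)
    print(result.stderr.strip())
```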

For reference, the devenv configuration where this setup fails can be seen in this revision of our project, and here is the link to the failing CI run.

Request for Assistance:

Could you please investigate whether there are any specific changes in devenv-nixpkgs that could be causing this incompatibility with PySpark in Linux environments? Any insights or suggestions would be greatly appreciated.

Thank you!
