WalterMarch/test-python-pyspark-updates

Updating Python Versions Necessitates PySpark Updates

Overview

Recent Dependabot messages suggested updating various Python modules. I elected to go further and update Python itself.

Python 3.11 with pyspark 3.3.2

Using pyspark 3.3.2 on Python 3.11, the following code:

df = spark.read.json(rdd)

produced a long traceback that ended with:

  File "/usr/local/lib/python3.11/site-packages/pyspark/serializers.py", line 468, in dumps
    raise pickle.PicklingError(msg)
_pickle.PicklingError: Could not serialize object: IndexError: tuple index out of range

The tuple index out of range error was resolved by upgrading pyspark to 3.4.0.
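For context, here is a minimal sketch of a script that exercises the same call. It assumes that rdd holds one JSON string per record, which the original script does not confirm. The helper names to_json_lines and read_records, the local[1] master, and the app name are hypothetical and are not taken from example_code.py.

```python
import json


def to_json_lines(records):
    """Serialize each record (a dict) to one JSON string per element."""
    return [json.dumps(record) for record in records]


def read_records(records):
    """Load the records via spark.read.json(rdd) and return them as dicts."""
    # Imported lazily so to_json_lines stays usable without pyspark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]")  # assumed local single-core run
        .appName("json-rdd-repro")  # hypothetical app name
        .getOrCreate()
    )
    try:
        # Assumption: the RDD holds JSON strings, one per record.
        rdd = spark.sparkContext.parallelize(to_json_lines(records))
        # The call that failed on Python 3.11 with pyspark 3.3.2.
        df = spark.read.json(rdd)
        return [row.asDict() for row in df.collect()]
    finally:
        spark.stop()
```

Calling read_records([{"id": 1}]) needs a working pyspark and Java installation; to_json_lines can be checked on its own.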

Python 3.13 with pyspark 3.4.0

When executing the example_code.py script in Python 3.13 with pyspark 3.4.0, the following error occurs:

Traceback (most recent call last):
  File "/workspaces/test311/example_code.py", line 1, in <module>
    from pyspark import SparkConf
  File "/usr/local/lib/python3.13/site-packages/pyspark/__init__.py", line 65, in <module>
    from pyspark.broadcast import Broadcast
  File "/usr/local/lib/python3.13/site-packages/pyspark/broadcast.py", line 38, in <module>
    from typing.io import BinaryIO  # type: ignore[import]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named 'typing.io'; 'typing' is not a package

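The typing.io and typing.re namespaces, deprecated since Python 3.8, were removed in Python 3.13. BinaryIO itself is still importable directly from typing.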
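The following sketch shows the import form that works regardless of whether typing.io exists, and a small probe for the legacy namespace. The function name has_typing_io is hypothetical. This only illustrates the Python-side change; it is not a description of how pyspark changed its own import.

```python
import importlib
from typing import BinaryIO  # portable: does not depend on typing.io


def has_typing_io() -> bool:
    """Return True if the legacy typing.io namespace can still be imported."""
    try:
        importlib.import_module("typing.io")
    except ImportError:  # ModuleNotFoundError is a subclass of ImportError
        return False
    return True


def describe(stream: BinaryIO) -> str:
    """Tiny use of the annotation so the import is exercised."""
    return type(stream).__name__
```

On Python 3.13 and later has_typing_io() returns False, which is the condition that triggers the ModuleNotFoundError above.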

The No module named 'typing.io' error was resolved by upgrading pyspark to 3.5.0.
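The two findings can be sketched as a small pre-flight check. The table below encodes only what this README observed and is not an official PySpark support matrix. Treating each Python version as a floor (so that 3.12 inherits the 3.11 requirement) is an assumption, and all function names are hypothetical.

```python
import re
import sys
from importlib import metadata

# (minimum Python, minimum pyspark that worked here), newest Python first.
# Observations from this README only; not an official compatibility matrix.
OBSERVED_MINIMUMS = [
    ((3, 13), (3, 5, 0)),
    ((3, 11), (3, 4, 0)),
]


def parse_version(text):
    """Turn '3.4.0' or '3.5.0.dev0' into a 3-tuple of ints, e.g. (3, 4, 0)."""
    numbers = [int(n) for n in re.findall(r"\d+", text)[:3]]
    return tuple(numbers + [0] * (3 - len(numbers)))


def observed_minimum(python_version):
    """Return the lowest pyspark version seen to work, or None if unknown."""
    for python_floor, pyspark_floor in OBSERVED_MINIMUMS:
        if tuple(python_version[:2]) >= python_floor:
            return pyspark_floor
    return None


def is_known_bad(python_version, pyspark_version):
    """True if this pairing is older than a combination that worked here."""
    minimum = observed_minimum(python_version)
    return minimum is not None and parse_version(pyspark_version) < minimum


def installed_pyspark():
    """Return the installed pyspark version string, or None if absent."""
    try:
        return metadata.version("pyspark")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_pyspark()
    if version and is_known_bad(sys.version_info, version):
        print(f"pyspark {version} is older than the version that worked here")
```

Run as a script before starting Spark, it prints a warning for the pairings that failed above and stays silent otherwise.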
