Commit 7fee291

[SPARK-52224][CONNECT][PYTHON] Introduce pyyaml as a dependency for the Python client
### What changes were proposed in this pull request?

Introduces pyyaml as a dependency for the Python client. When `pip install`-ing the pyspark client, it will be installed with it.

### Why are the changes needed?

The pipeline spec file described in the [Declarative Pipelines SPIP](https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4/edit?tab=t.0) expects data in a YAML format. YAML is superior to alternatives, for a few reasons:

- Unlike the flat files that are used for [spark-submit confs](https://spark.apache.org/docs/latest/submitting-applications.html#loading-configuration-from-a-file), it supports the hierarchical data required by the pipeline spec.
- It's much more user-friendly to author than JSON.
- It's consistent with the config files used for similar tools, like dbt.

The Declarative Pipelines CLI will be a Spark Connect Python client, and thus require a Python library for loading YAML. The pyyaml library is an extremely stable dependency. The `safe_load` function that we'll use to load YAML files was introduced more than a decade ago.

### Does this PR introduce _any_ user-facing change?

Yes – users who `pip install` the PySpark client library will see the pyyaml library installed.

### How was this patch tested?

- Made a clean virtualenv
- Ran `pip install python/packaging/client`
- Confirmed that I could `import yaml` in a Python shell

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #50944 from sryza/yaml-dep.

Authored-by: Sandy Ryza <[email protected]>
Signed-off-by: Sandy Ryza <[email protected]>
1 parent a9881a8 commit 7fee291
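
The commit message names `yaml.safe_load` as the function that will load pipeline spec files. As a rough illustration of what that call does with hierarchical data, here is a minimal sketch. The spec layout (`name`, `configuration`, `libraries`, `path`) and the `load_spec` helper are hypothetical and are not taken from the SPIP or from this commit; only `yaml.safe_load` itself is the real pyyaml API.

```python
import yaml

# Hypothetical pipeline spec. The field names are illustrative only and are
# not the schema defined by the Declarative Pipelines SPIP.
SPEC_TEXT = """
name: example_pipeline
configuration:
  spark.sql.shuffle.partitions: "8"
libraries:
  - path: transformations/ingest.py
  - path: transformations/report.sql
"""


def load_spec(text: str) -> dict:
    """Parse YAML text into plain Python objects using the safe loader."""
    # safe_load only builds basic types (dict, list, str, int, ...), so it
    # does not construct arbitrary Python objects from the input.
    spec = yaml.safe_load(text)
    if not isinstance(spec, dict):
        raise ValueError("expected a YAML mapping at the top level")
    return spec


spec = load_spec(SPEC_TEXT)
print(spec["name"])
print(spec["configuration"]["spark.sql.shuffle.partitions"])
print([entry["path"] for entry in spec["libraries"]])
```

The nested `configuration` mapping and the `libraries` list are the kind of hierarchical structure that a flat key-value conf file cannot express directly.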

4 files changed, 7 additions and 0 deletions

dev/requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -12,6 +12,7 @@ mlflow>=2.3.1
 scikit-learn
 matplotlib
 memory-profiler>=0.61.0
+pyyaml>=3.11
 
 # PySpark test dependencies
 unittest-xml-reporting

python/packaging/classic/setup.py

Lines changed: 2 additions & 0 deletions
@@ -155,6 +155,7 @@ def _supports_symlinks():
 _minimum_pyarrow_version = "11.0.0"
 _minimum_grpc_version = "1.67.0"
 _minimum_googleapis_common_protos_version = "1.65.0"
+_minimum_pyyaml_version = "3.11"
 
 
 class InstallCommand(install):
@@ -365,6 +366,7 @@ def run(self):
 "grpcio-status>=%s" % _minimum_grpc_version,
 "googleapis-common-protos>=%s" % _minimum_googleapis_common_protos_version,
 "numpy>=%s" % _minimum_numpy_version,
+"pyyaml>=%s" % _minimum_pyyaml_version,
 ],
 },
 python_requires=">=3.9",

python/packaging/client/setup.py

Lines changed: 2 additions & 0 deletions
@@ -137,6 +137,7 @@
 _minimum_pyarrow_version = "11.0.0"
 _minimum_grpc_version = "1.67.0"
 _minimum_googleapis_common_protos_version = "1.65.0"
+_minimum_pyyaml_version = "3.11"
 
 with open("README.md") as f:
     long_description = f.read()
@@ -209,6 +210,7 @@
 "grpcio-status>=%s" % _minimum_grpc_version,
 "googleapis-common-protos>=%s" % _minimum_googleapis_common_protos_version,
 "numpy>=%s" % _minimum_numpy_version,
+"pyyaml>=%s" % _minimum_pyyaml_version,
 ],
 python_requires=">=3.9",
 classifiers=[

python/packaging/connect/setup.py

Lines changed: 2 additions & 0 deletions
@@ -91,6 +91,7 @@
 _minimum_pyarrow_version = "11.0.0"
 _minimum_grpc_version = "1.67.0"
 _minimum_googleapis_common_protos_version = "1.65.0"
+_minimum_pyyaml_version = "3.11"
 
 with open("README.md") as f:
     long_description = f.read()
@@ -121,6 +122,7 @@
 "grpcio-status>=%s" % _minimum_grpc_version,
 "googleapis-common-protos>=%s" % _minimum_googleapis_common_protos_version,
 "numpy>=%s" % _minimum_numpy_version,
+"pyyaml>=%s" % _minimum_pyyaml_version,
 ],
 python_requires=">=3.9",
 classifiers=[
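
All three `setup.py` files build the requirement the same way, by interpolating `_minimum_pyyaml_version` into a `"pyyaml>=%s"` string. The sketch below shows the resulting requirement string and a simplified check of what `>=3.11` means for a few version numbers. The `version_tuple` and `satisfies_minimum` helpers are hypothetical and only handle plain dotted numeric versions; pip applies the full version specifier rules, which this does not reproduce.

```python
# Value and format string as they appear in the diff.
_minimum_pyyaml_version = "3.11"
requirement = "pyyaml>=%s" % _minimum_pyyaml_version


def version_tuple(version: str) -> tuple:
    """Split a plain dotted version such as '6.0.1' into a tuple of ints."""
    # Hypothetical helper: it does not handle pre-releases or local versions.
    return tuple(int(part) for part in version.split("."))


def satisfies_minimum(installed: str, minimum: str = _minimum_pyyaml_version) -> bool:
    """Return True when `installed` is at least `minimum`, compared numerically."""
    return version_tuple(installed) >= version_tuple(minimum)


print(requirement)
print(satisfies_minimum("6.0.1"))
print(satisfies_minimum("3.9"))
```

Comparing as integer tuples matters here: as plain strings, `"3.9"` sorts after `"3.11"`, even though 3.9 is the older release and falls below the minimum.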
