
Python 3.13t (free threads) Compat #7548


Open
Qubitium opened this issue May 2, 2025 · 6 comments

@Qubitium

Qubitium commented May 2, 2025

Describe the bug

Cannot install datasets under Python 3.13t because it depends on aiohttp, and aiohttp cannot be built for free-threaded Python.

The free-threading support issue in aiohttp has been open since August 2024! Ouch.

aio-libs/aiohttp#8796 (comment)
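To confirm which kind of interpreter pip is installing for, a quick check can help. This is a minimal sketch of my own, not code from datasets or aiohttp: it reads the Py_GIL_DISABLED build variable through sysconfig and calls sys._is_gil_enabled(), which exists from Python 3.13 (the getattr fallback covers older versions). The cp313t tag in the wheel filenames in the log below is the free-threaded ABI; for aiohttp 3.11.18 pip fell back to the source tarball and failed while compiling the Cython-generated mask.c.

```python
import sys
import sysconfig


def free_threading_status() -> dict:
    """Report whether this interpreter is a free-threaded (3.13t-style) build."""
    # Py_GIL_DISABLED is 1 on free-threaded builds, 0 or None otherwise
    built_without_gil = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # The GIL can be re-enabled at runtime even on a free-threaded build
    # (e.g. PYTHON_GIL=1), so check the live state separately; before 3.13
    # sys._is_gil_enabled does not exist and the GIL is always on
    is_gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)
    return {
        "free_threaded_build": built_without_gil,
        "gil_enabled_now": bool(is_gil_enabled()),
        # "t" appears in abiflags on free-threaded POSIX builds
        "abiflags": getattr(sys, "abiflags", ""),
    }


if __name__ == "__main__":
    print(free_threading_status())
```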

pip install datasets

(vm313t) root@gpu-base:~/GPTQModel# pip install datasets
WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='pypi.org', port=443): Read timed out. (read timeout=15)")': /simple/datasets/
Collecting datasets
  Using cached datasets-3.5.1-py3-none-any.whl.metadata (19 kB)
Requirement already satisfied: filelock in /root/vm313t/lib/python3.13t/site-packages (from datasets) (3.18.0)
Requirement already satisfied: numpy>=1.17 in /root/vm313t/lib/python3.13t/site-packages (from datasets) (2.2.5)
Collecting pyarrow>=15.0.0 (from datasets)
  Using cached pyarrow-20.0.0-cp313-cp313t-manylinux_2_28_x86_64.whl.metadata (3.3 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Using cached dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting pandas (from datasets)
  Using cached pandas-2.2.3-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (89 kB)
Requirement already satisfied: requests>=2.32.2 in /root/vm313t/lib/python3.13t/site-packages (from datasets) (2.32.3)
Requirement already satisfied: tqdm>=4.66.3 in /root/vm313t/lib/python3.13t/site-packages (from datasets) (4.67.1)
Collecting xxhash (from datasets)
  Using cached xxhash-3.5.0-cp313-cp313t-linux_x86_64.whl
Collecting multiprocess<0.70.17 (from datasets)
  Using cached multiprocess-0.70.16-py312-none-any.whl.metadata (7.2 kB)
Collecting fsspec<=2025.3.0,>=2023.1.0 (from fsspec[http]<=2025.3.0,>=2023.1.0->datasets)
  Using cached fsspec-2025.3.0-py3-none-any.whl.metadata (11 kB)
Collecting aiohttp (from datasets)
  Using cached aiohttp-3.11.18.tar.gz (7.7 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: huggingface-hub>=0.24.0 in /root/vm313t/lib/python3.13t/site-packages (from datasets) (0.30.2)
Requirement already satisfied: packaging in /root/vm313t/lib/python3.13t/site-packages (from datasets) (25.0)
Requirement already satisfied: pyyaml>=5.1 in /root/vm313t/lib/python3.13t/site-packages (from datasets) (6.0.2)
Collecting aiohappyeyeballs>=2.3.0 (from aiohttp->datasets)
  Using cached aiohappyeyeballs-2.6.1-py3-none-any.whl.metadata (5.9 kB)
Collecting aiosignal>=1.1.2 (from aiohttp->datasets)
  Using cached aiosignal-1.3.2-py2.py3-none-any.whl.metadata (3.8 kB)
Collecting attrs>=17.3.0 (from aiohttp->datasets)
  Using cached attrs-25.3.0-py3-none-any.whl.metadata (10 kB)
Collecting frozenlist>=1.1.1 (from aiohttp->datasets)
  Using cached frozenlist-1.6.0-cp313-cp313t-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (16 kB)
Collecting multidict<7.0,>=4.5 (from aiohttp->datasets)
  Using cached multidict-6.4.3-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.3 kB)
Collecting propcache>=0.2.0 (from aiohttp->datasets)
  Using cached propcache-0.3.1-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (10 kB)
Collecting yarl<2.0,>=1.17.0 (from aiohttp->datasets)
  Using cached yarl-1.20.0-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (72 kB)
Requirement already satisfied: idna>=2.0 in /root/vm313t/lib/python3.13t/site-packages (from yarl<2.0,>=1.17.0->aiohttp->datasets) (3.10)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /root/vm313t/lib/python3.13t/site-packages (from huggingface-hub>=0.24.0->datasets) (4.13.2)
Requirement already satisfied: charset-normalizer<4,>=2 in /root/vm313t/lib/python3.13t/site-packages (from requests>=2.32.2->datasets) (3.4.1)
Requirement already satisfied: urllib3<3,>=1.21.1 in /root/vm313t/lib/python3.13t/site-packages (from requests>=2.32.2->datasets) (2.4.0)
Requirement already satisfied: certifi>=2017.4.17 in /root/vm313t/lib/python3.13t/site-packages (from requests>=2.32.2->datasets) (2025.4.26)
Collecting python-dateutil>=2.8.2 (from pandas->datasets)
  Using cached python_dateutil-2.9.0.post0-py2.py3-none-any.whl.metadata (8.4 kB)
Collecting pytz>=2020.1 (from pandas->datasets)
  Using cached pytz-2025.2-py2.py3-none-any.whl.metadata (22 kB)
Collecting tzdata>=2022.7 (from pandas->datasets)
  Using cached tzdata-2025.2-py2.py3-none-any.whl.metadata (1.4 kB)
Collecting six>=1.5 (from python-dateutil>=2.8.2->pandas->datasets)
  Using cached six-1.17.0-py2.py3-none-any.whl.metadata (1.7 kB)
Using cached datasets-3.5.1-py3-none-any.whl (491 kB)
Using cached dill-0.3.8-py3-none-any.whl (116 kB)
Using cached fsspec-2025.3.0-py3-none-any.whl (193 kB)
Using cached multiprocess-0.70.16-py312-none-any.whl (146 kB)
Using cached multidict-6.4.3-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (220 kB)
Using cached yarl-1.20.0-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (404 kB)
Using cached aiohappyeyeballs-2.6.1-py3-none-any.whl (15 kB)
Using cached aiosignal-1.3.2-py2.py3-none-any.whl (7.6 kB)
Using cached attrs-25.3.0-py3-none-any.whl (63 kB)
Using cached frozenlist-1.6.0-cp313-cp313t-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (385 kB)
Using cached propcache-0.3.1-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (282 kB)
Using cached pyarrow-20.0.0-cp313-cp313t-manylinux_2_28_x86_64.whl (42.2 MB)
Using cached pandas-2.2.3-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.9 MB)
Using cached python_dateutil-2.9.0.post0-py2.py3-none-any.whl (229 kB)
Using cached pytz-2025.2-py2.py3-none-any.whl (509 kB)
Using cached six-1.17.0-py2.py3-none-any.whl (11 kB)
Using cached tzdata-2025.2-py2.py3-none-any.whl (347 kB)
Building wheels for collected packages: aiohttp
  Building wheel for aiohttp (pyproject.toml) ... error
  error: subprocess-exited-with-error
  
  × Building wheel for aiohttp (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [156 lines of output]
      *********************
      * Accelerated build *
      *********************
      /tmp/pip-build-env-wjqi8_7w/overlay/lib/python3.13t/site-packages/setuptools/dist.py:759: SetuptoolsDeprecationWarning: License classifiers are deprecated.
      !!
      
              ********************************************************************************
              Please consider removing the following classifiers in favor of a SPDX license expression:
      
              License :: OSI Approved :: Apache Software License
      
              See https://packaging.python.org/en/latest/guides/writing-pyproject-toml/#license for details.
              ********************************************************************************
      
      !!
        self._finalize_license_expression()
      running bdist_wheel
      running build
      running build_py
      creating build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/typedefs.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/http_parser.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/client_reqrep.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/client_ws.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/web_app.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/http_websocket.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/resolver.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/tracing.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/http_writer.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/http_exceptions.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/log.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/__init__.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/web_runner.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/worker.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/connector.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/client_exceptions.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/web_middlewares.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/web.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/tcp_helpers.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/web_response.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/web_server.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/web_request.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/web_urldispatcher.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/web_exceptions.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/formdata.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/streams.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/multipart.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/web_routedef.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/web_ws.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/payload.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/client_proto.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/web_log.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/base_protocol.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/payload_streamer.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/http.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/web_fileresponse.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/test_utils.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/client.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/cookiejar.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/compression_utils.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/hdrs.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/helpers.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/pytest_plugin.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/web_protocol.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/abc.py -> build/lib.linux-x86_64-cpython-313t/aiohttp
      creating build/lib.linux-x86_64-cpython-313t/aiohttp/_websocket
      copying aiohttp/_websocket/__init__.py -> build/lib.linux-x86_64-cpython-313t/aiohttp/_websocket
      copying aiohttp/_websocket/writer.py -> build/lib.linux-x86_64-cpython-313t/aiohttp/_websocket
      copying aiohttp/_websocket/models.py -> build/lib.linux-x86_64-cpython-313t/aiohttp/_websocket
      copying aiohttp/_websocket/reader.py -> build/lib.linux-x86_64-cpython-313t/aiohttp/_websocket
      copying aiohttp/_websocket/reader_c.py -> build/lib.linux-x86_64-cpython-313t/aiohttp/_websocket
      copying aiohttp/_websocket/helpers.py -> build/lib.linux-x86_64-cpython-313t/aiohttp/_websocket
      copying aiohttp/_websocket/reader_py.py -> build/lib.linux-x86_64-cpython-313t/aiohttp/_websocket
      running egg_info
      writing aiohttp.egg-info/PKG-INFO
      writing dependency_links to aiohttp.egg-info/dependency_links.txt
      writing requirements to aiohttp.egg-info/requires.txt
      writing top-level names to aiohttp.egg-info/top_level.txt
      reading manifest file 'aiohttp.egg-info/SOURCES.txt'
      reading manifest template 'MANIFEST.in'
      warning: no files found matching 'aiohttp' anywhere in distribution
      warning: no files found matching '*.pyi' anywhere in distribution
      warning: no previously-included files matching '*.pyc' found anywhere in distribution
      warning: no previously-included files matching '*.pyd' found anywhere in distribution
      warning: no previously-included files matching '*.so' found anywhere in distribution
      warning: no previously-included files matching '*.lib' found anywhere in distribution
      warning: no previously-included files matching '*.dll' found anywhere in distribution
      warning: no previously-included files matching '*.a' found anywhere in distribution
      warning: no previously-included files matching '*.obj' found anywhere in distribution
      warning: no previously-included files found matching 'aiohttp/*.html'
      no previously-included directories found matching 'docs/_build'
      adding license file 'LICENSE.txt'
      writing manifest file 'aiohttp.egg-info/SOURCES.txt'
      copying aiohttp/_cparser.pxd -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/_find_header.pxd -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/_headers.pxi -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/_http_parser.pyx -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/_http_writer.pyx -> build/lib.linux-x86_64-cpython-313t/aiohttp
      copying aiohttp/py.typed -> build/lib.linux-x86_64-cpython-313t/aiohttp
      creating build/lib.linux-x86_64-cpython-313t/aiohttp/.hash
      copying aiohttp/.hash/_cparser.pxd.hash -> build/lib.linux-x86_64-cpython-313t/aiohttp/.hash
      copying aiohttp/.hash/_find_header.pxd.hash -> build/lib.linux-x86_64-cpython-313t/aiohttp/.hash
      copying aiohttp/.hash/_http_parser.pyx.hash -> build/lib.linux-x86_64-cpython-313t/aiohttp/.hash
      copying aiohttp/.hash/_http_writer.pyx.hash -> build/lib.linux-x86_64-cpython-313t/aiohttp/.hash
      copying aiohttp/.hash/hdrs.py.hash -> build/lib.linux-x86_64-cpython-313t/aiohttp/.hash
      copying aiohttp/_websocket/mask.pxd -> build/lib.linux-x86_64-cpython-313t/aiohttp/_websocket
      copying aiohttp/_websocket/mask.pyx -> build/lib.linux-x86_64-cpython-313t/aiohttp/_websocket
      copying aiohttp/_websocket/reader_c.pxd -> build/lib.linux-x86_64-cpython-313t/aiohttp/_websocket
      creating build/lib.linux-x86_64-cpython-313t/aiohttp/_websocket/.hash
      copying aiohttp/_websocket/.hash/mask.pxd.hash -> build/lib.linux-x86_64-cpython-313t/aiohttp/_websocket/.hash
      copying aiohttp/_websocket/.hash/mask.pyx.hash -> build/lib.linux-x86_64-cpython-313t/aiohttp/_websocket/.hash
      copying aiohttp/_websocket/.hash/reader_c.pxd.hash -> build/lib.linux-x86_64-cpython-313t/aiohttp/_websocket/.hash
      running build_ext
      building 'aiohttp._websocket.mask' extension
      creating build/temp.linux-x86_64-cpython-313t/aiohttp/_websocket
      x86_64-linux-gnu-gcc -fno-strict-overflow -Wsign-compare -DNDEBUG -g -O2 -Wall -g -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -fstack-protector-strong -fstack-clash-protection -Wformat -Werror=format-security -fcf-protection -fPIC -I/root/vm313t/include -I/usr/include/python3.13t -c aiohttp/_websocket/mask.c -o build/temp.linux-x86_64-cpython-313t/aiohttp/_websocket/mask.o
      aiohttp/_websocket/mask.c:1864:80: error: unknown type name ‘__pyx_vectorcallfunc’; did you mean ‘vectorcallfunc’?
       1864 | static CYTHON_INLINE PyObject *__Pyx_PyVectorcall_FastCallDict(PyObject *func, __pyx_vectorcallfunc vc, PyObject *const *args, size_t nargs, PyObject *kw);
            |                                                                                ^~~~~~~~~~~~~~~~~~~~
            |                                                                                vectorcallfunc
      aiohttp/_websocket/mask.c: In function ‘__pyx_f_7aiohttp_10_websocket_4mask__websocket_mask_cython’:
      aiohttp/_websocket/mask.c:2905:3: warning: ‘Py_OptimizeFlag’ is deprecated [-Wdeprecated-declarations]
       2905 |   if (unlikely(__pyx_assertions_enabled())) {
            |   ^~
      In file included from /usr/include/python3.13t/Python.h:76,
                       from aiohttp/_websocket/mask.c:16:
      /usr/include/python3.13t/cpython/pydebug.h:13:37: note: declared here
         13 | Py_DEPRECATED(3.12) PyAPI_DATA(int) Py_OptimizeFlag;
            |                                     ^~~~~~~~~~~~~~~
      aiohttp/_websocket/mask.c: At top level:
      aiohttp/_websocket/mask.c:4846:69: error: unknown type name ‘__pyx_vectorcallfunc’; did you mean ‘vectorcallfunc’?
       4846 | static PyObject *__Pyx_PyVectorcall_FastCallDict_kw(PyObject *func, __pyx_vectorcallfunc vc, PyObject *const *args, size_t nargs, PyObject *kw)
            |                                                                     ^~~~~~~~~~~~~~~~~~~~
            |                                                                     vectorcallfunc
      aiohttp/_websocket/mask.c:4891:80: error: unknown type name ‘__pyx_vectorcallfunc’; did you mean ‘vectorcallfunc’?
       4891 | static CYTHON_INLINE PyObject *__Pyx_PyVectorcall_FastCallDict(PyObject *func, __pyx_vectorcallfunc vc, PyObject *const *args, size_t nargs, PyObject *kw)
            |                                                                                ^~~~~~~~~~~~~~~~~~~~
            |                                                                                vectorcallfunc
      aiohttp/_websocket/mask.c: In function ‘__Pyx_CyFunction_CallAsMethod’:
      aiohttp/_websocket/mask.c:5580:6: error: unknown type name ‘__pyx_vectorcallfunc’; did you mean ‘vectorcallfunc’?
       5580 |      __pyx_vectorcallfunc vc = __Pyx_CyFunction_func_vectorcall(cyfunc);
            |      ^~~~~~~~~~~~~~~~~~~~
            |      vectorcallfunc
      aiohttp/_websocket/mask.c:1954:45: warning: initialization of ‘int’ from ‘vectorcallfunc’ {aka ‘struct _object * (*)(struct _object *, struct _object * const*, long unsigned int,  struct _object *)’} makes integer from pointer without a cast [-Wint-conversion]
       1954 | #define __Pyx_CyFunction_func_vectorcall(f) (((PyCFunctionObject*)f)->vectorcall)
            |                                             ^
      aiohttp/_websocket/mask.c:5580:32: note: in expansion of macro ‘__Pyx_CyFunction_func_vectorcall’
       5580 |      __pyx_vectorcallfunc vc = __Pyx_CyFunction_func_vectorcall(cyfunc);
            |                                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      aiohttp/_websocket/mask.c:5583:16: warning: implicit declaration of function ‘__Pyx_PyVectorcall_FastCallDict’ [-Wimplicit-function-declaration]
       5583 |         return __Pyx_PyVectorcall_FastCallDict(func, vc, &PyTuple_GET_ITEM(args, 0), (size_t)PyTuple_GET_SIZE(args), kw);
            |                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      aiohttp/_websocket/mask.c:5583:16: warning: returning ‘int’ from a function with return type ‘PyObject *’ {aka ‘struct _object *’} makes pointer from integer without a cast [-Wint-conversion]
       5583 |         return __Pyx_PyVectorcall_FastCallDict(func, vc, &PyTuple_GET_ITEM(args, 0), (size_t)PyTuple_GET_SIZE(args), kw);
            |                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      error: command '/usr/bin/x86_64-linux-gnu-gcc' failed with exit code 1
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for aiohttp
Failed to build aiohttp
ERROR: Failed to build installable wheels for some pyproject.toml based projects (aiohttp)

Steps to reproduce the bug

Run pip install datasets in a Python 3.13t virtual environment (see the log above).

Expected behavior

pip install datasets completes successfully.

Environment info

Ubuntu 24.04

@Qubitium
Author

Qubitium commented May 3, 2025

Update: datasets uses aiohttp for data streaming. From what I understand, streaming is useful for large datasets that do not fit in memory and/or multi-modal datasets (image/audio) where you only want the actual binary data fed in as needed.

However, there are also many cases where aiohttp will never be used: text datasets that are not huge relative to the machine's specs, and non-multi-modal datasets in general.

Getting aiohttp fixed for free threading appears to be a large task that is not going to get done quickly. It may be faster to make aiohttp optional instead of a forced build. Otherwise, installing datasets to test Python 3.13t is going to be painful.
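Making aiohttp optional usually means deferring its import to the code paths that need it. A minimal sketch of that pattern, assuming hypothetical helper names (is_available and require_optional are illustrations, not functions from datasets):

```python
import importlib
import importlib.util
from types import ModuleType


def is_available(module_name: str) -> bool:
    """Check whether a top-level module is importable, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def require_optional(module_name: str, feature: str) -> ModuleType:
    """Import an optional dependency only when a feature actually needs it.

    Raises ImportError with an actionable message if the module is missing.
    """
    if not is_available(module_name):
        raise ImportError(
            f"{feature} requires the optional dependency '{module_name}'. "
            f"Install it with: pip install {module_name}"
        )
    return importlib.import_module(module_name)
```

A streaming code path would call require_optional("aiohttp", "HTTP streaming") at its top, so local, in-memory text datasets never import aiohttp and never need it installed.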

I have created a fork/branch that temporarily disables the aiohttp import so non-streaming usage of datasets can be tested under Python 3.13t:

https://github.com/Qubitium/datasets/tree/disable-aiohttp-depend

@lhoestq
Member

lhoestq commented May 5, 2025

We are mostly relying on huggingface_hub, which uses requests to stream files from Hugging Face, so maybe we can move aiohttp to optional dependencies now. Would that solve your issue? Btw, what do you think of datasets in the free-threading setting?

@Qubitium
Author

Qubitium commented May 6, 2025

Replying to @lhoestq: "We are mostly relying on huggingface_hub, which uses requests to stream files from Hugging Face, so maybe we can move aiohttp to optional dependencies now. Would that solve your issue? Btw, what do you think of datasets in the free-threading setting?"

I am testing transformers + datasets (simple text dataset usage) + GPTQModel for quantization and encountered no issues with Python 3.13t, but my test case is the bare minimum: the dataset is not sharded, fully in memory, text-only, small, and not used for training.

On the technical side, a dataset is almost always 100% read-only, so there should be no locking issues. However, I have not checked the datasets internals, so there may be cases (streaming, sharding, or anywhere dataset memory/state is updated) that need a per-dataset threading.Lock.
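Where some code path does mutate shared state, the usual fix without the GIL is an explicit lock around each check-then-act sequence. A minimal sketch, assuming a hypothetical lazily populated shard cache (LockedCache and its loader are illustrative, not part of datasets):

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedCache:
    """Hypothetical per-dataset cache guarded by one threading.Lock.

    Without the lock, two threads could both miss the same key and both
    run the loader (a check-then-insert race).
    """

    def __init__(self, loader):
        self._loader = loader      # e.g. decode one shard; runs once per key
        self._cache = {}
        self._lock = threading.Lock()
        self.loads = 0             # number of times the loader actually ran

    def get(self, key):
        # The whole check-then-insert happens under the lock
        with self._lock:
            if key not in self._cache:
                self._cache[key] = self._loader(key)
                self.loads += 1
            return self._cache[key]


if __name__ == "__main__":
    cache = LockedCache(lambda shard: [f"row-{shard}-{i}" for i in range(3)])
    with ThreadPoolExecutor(max_workers=8) as pool:
        list(pool.map(cache.get, [0, 1, 0, 1] * 50))
    print(cache.loads)  # 2: each shard was loaded exactly once
```

Holding the lock while the loader runs serializes all loads, which keeps the sketch simple; a per-key lock would reduce contention at the cost of more bookkeeping. Purely read-only data needs none of this.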

So yes, making aiohttp optional will definitely solve my issue. There is also a companion issue with tokenizers (datasets and tokenizers usually go hand in hand), but that one is simple enough to fix with a package version update: huggingface/tokenizers#1774

@lhoestq
Copy link
Member

lhoestq commented May 6, 2025

OK, I see! Anyway, feel free to edit setup.py to move aiohttp to the optional (tests) dependencies and open a PR; we can run the CI to see if the change is OK.
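A minimal sketch of what the dependency part of such a setup.py change could look like. The variable names and the extra names "http" and "tests" are assumptions for illustration, not the actual contents of datasets' setup.py, and pytest is an assumed test dependency; the version pins are copied from the resolver output in the install log. One detail from that log: datasets also requests fsspec[http], and as far as I know fsspec's http extra itself depends on aiohttp, so that extra would likely need to move too.

```python
# Dependency lists for setup.py (sketch, hypothetical layout)

REQUIRED_PKGS = [
    "filelock",
    "numpy>=1.17",
    "pyarrow>=15.0.0",
    "dill>=0.3.0,<0.3.9",
    "pandas",
    "requests>=2.32.2",
    "tqdm>=4.66.3",
    "xxhash",
    "multiprocess<0.70.17",
    # plain fsspec only: the [http] extra is what pulls in aiohttp
    "fsspec>=2023.1.0,<=2025.3.0",
    "huggingface-hub>=0.24.0",
    "packaging",
    "pyyaml>=5.1",
]

# aiohttp, both directly and via fsspec[http], becomes opt-in
HTTP_REQUIRE = ["aiohttp", "fsspec[http]>=2023.1.0,<=2025.3.0"]

# tests keep exercising the HTTP/streaming paths
TESTS_REQUIRE = ["pytest", *HTTP_REQUIRE]

EXTRAS_REQUIRE = {
    "http": HTTP_REQUIRE,
    "tests": TESTS_REQUIRE,
}

# These would then be passed along as:
# setup(..., install_requires=REQUIRED_PKGS, extras_require=EXTRAS_REQUIRE)
```

With this layout, users who need HTTP streaming would install the hypothetical extra (pip install "datasets[http]"), while a plain pip install datasets would no longer try to build aiohttp on Python 3.13t.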

@lhoestq
Member

lhoestq commented May 7, 2025

Actually, there is already #7294; let's see if we can merge it.

@Wauplin
Contributor

Wauplin commented May 7, 2025

Wouldn't this be a good reason to switch to httpx? 😄 (It would require slightly more work; in the short term I agree with #7548 (comment).)
