Should support for TensorFlow be dropped given its library anti-patterns? #1595
Replies: 8 comments 15 replies
-
I don't have direct experience with TensorFlow, but if what you say is true, that's a good thing to know and to be wary of in other HEP projects. I remember that zfit once relied heavily on TensorFlow, but I don't know if it still does. @jonas-eschle, what is your take on the usability of TensorFlow as a library, for HEP? Is it getting more application-like than library-like?
-
A current example of the consequences of the restrictive dependency pinning: `pip install black tensorflow tensorflow-probability` will first install the packages, and then `import tensorflow_probability` will fail with
This combination of packages comes up for example with
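The resolution failure above can be sketched with the `packaging` library (the same machinery pip uses). The version numbers and requirements below are hypothetical, purely to illustrate how a patch-level pin from one package can leave no version that satisfies a second package's requirement:

```python
# Hedged illustration of an unsolvable requirement set caused by narrow pinning.
# All specifiers and versions here are made up; they are not real TF/black pins.
from packaging.specifiers import SpecifierSet
from packaging.version import Version

tf_pin = SpecifierSet("~=4.5.0")   # e.g. one package pinning a dep to the 4.5.* patch series
black_req = SpecifierSet(">=4.6")  # e.g. another tool requiring a later release

candidates = [Version(v) for v in ["4.5.0", "4.5.2", "4.6.1", "4.7.0"]]
ok_for_both = [str(v) for v in candidates if v in tf_pin and v in black_req]
print(ok_for_both)  # empty list: no version satisfies both specifiers
```

Because the intersection of the two specifier sets is empty, a resolver can only either refuse the install or silently downgrade/break one of the two packages.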
-
Hey, that is indeed a good discussion and worth a thought. In zfit we do still rely mainly on TensorFlow; however, zfit is also compatible with other libraries (because we use TF and not e.g. JAX). Let me elaborate: it's a somewhat love-hate relationship with TF, which has a few (annoying) drawbacks but is, AFAIU, still the most powerful library for computations in HEP.

First, comparing to alternatives: we can neglect numba (it has a quite different concept, compiling rather than tracing) and look at jit-traceable, autograd-supporting libraries (JAX, torch, ...).

About the performance and graph: I had a brief look at it and it seems (as far as I can see) that pyhf does not really jit TF. Since TF 2 the computational model has changed completely. It is technically not a graph anymore (but that is an implementation detail); it is a jittable library like JAX. Like JAX, it actually supports XLA. I think in pyhf everything is always eager: no jitting, and therefore also no XLA. I did a preliminary test using the benchmarks and TF is at exactly the same speed as JAX and torch.

Now for TensorFlow:

Disadvantages:
Mixed:
Advantages, mainly compared to JAX:
Advantages, mainly compared to PyTorch:
Conclusion

TF for HEP: TF is not the perfect library for HEP, indeed. But it still seems to me the most capable library around, with advantages (such as unknown-shape handling) that are crucial. I hope and assume that JAX will one day be superior. The current disadvantages are mostly of a packaging-technical nature. Given also its recent switch to a completely new model, the whole package seems to be stabilizing. But yes, it's big and feels somewhat intrusive. This is why I think it's still very capable for HEP in general, including zfit.

TF for pyhf: The packaging can really increase the burden for no reason. I think that a swappable backend is nice, but more of an implementation detail (it means a library needs to restrict itself to the subset of features common to all backends, effectively removing features). The more important aspect to me seems to be interoperability with the rest. And as pyhf has the advantage of needing only this subset, having static shapes, I think this is very reasonable. I would suggest going along the lines @matthewfeickert proposed:
So, summarized: I think TF is still the most capable library around for speedy computations, but big and slow-moving. I would keep it in general, but not at the cost of not being able to install other packages.
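The eager-vs-jitted distinction described above can be sketched as follows. This is illustrative only, not pyhf's actual backend code: `tf.function(jit_compile=True)` traces the Python function into a graph and compiles it with XLA, analogous to `jax.jit`, whereas a plain Python function over TF tensors runs op-by-op eagerly:

```python
import tensorflow as tf

# Eager: each op dispatches individually from Python on every call
# (the mode in which, per the discussion above, pyhf seems to run TF).
def nll_eager(x, mu):
    return tf.reduce_sum((x - mu) ** 2)

# Traced and XLA-compiled: TF 2 traces this once into a graph and
# compiles it with XLA, analogous to jax.jit.
@tf.function(jit_compile=True)
def nll_jitted(x, mu):
    return tf.reduce_sum((x - mu) ** 2)

x = tf.constant([1.0, 2.0, 3.0])
print(float(nll_eager(x, 2.0)), float(nll_jitted(x, 2.0)))
```

Both calls return the same value; the difference only shows up in per-call overhead and in whether XLA can fuse the kernels.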
-
Currently working on untangling this mess, but progress is unfortunately slow (the devinfra team is severely reduced, cleaning up the mess is not considered a promotion-worthy project, so there are very few takers; then there are issues with the internal Google monorepo and the single-versioning policy).
-
TF should now have a better setup.py
-
Similar problematic behavior is unfortunately continuing with tensorflow/probability#1723
-
TensorFlow
-
cf. tensorflow/probability#1795 and PR #2452. A reminder that
-
TensorFlow is continually becoming a burden to support and is arguably slowing down project development given library development anti-patterns like tensorflow/tensorflow#40789 and tensorflow/tensorflow#51743, where dependencies are being pinned down to the patch level. At this point, TensorFlow is not a Python library: it is a Python application that you have to build your project around. That's fine, but it means that we can't effectively treat `pyhf` as a Python library if we want to provide support across all backends.

I see two ways forward:

1. Drop support for TensorFlow after the `pyhf` release `v0.7.0`.
2. Remove TensorFlow from the `backends` extra and keep it as a `tensorflow` extra. Test the `tensorflow` extra in a fully different workflow and say that we will support a particular minor release of TensorFlow and TensorFlow Probability for a particular release cycle of `pyhf`.

TensorFlow is already the slowest backend that `pyhf` has, and that is probably because we haven't optimized `pyhf` around its graph options. My guess is that no one using `pyhf` is actually using the TensorFlow backend anyway and is using JAX instead. Given the goals of use that `pyhf` has in terms of particle physics, I'm not sure there is any advantage to continuing support for TensorFlow. My inclination is to drop it, but I'm curious what others think.

cc @lukasheinrich @kratsg @alexander-held @jpivarski @henryiii @eduardo-rodrigues
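The second option above (moving TensorFlow out of the default `backends` extra into its own extra) could look roughly like the following in setuptools `extras_require` terms. This is a hypothetical sketch, not pyhf's actual packaging; the backend package names and the pinned minor versions are illustrative only:

```python
# Hypothetical extras layout (illustrative, not pyhf's actual setup.py).
# "backends" is the default set users get; TensorFlow moves to its own extra
# that is pinned to one minor series per pyhf release cycle and tested in a
# separate CI workflow.
extras_require = {
    "backends": ["jax", "jaxlib", "torch"],
    "tensorflow": [
        "tensorflow~=2.12.0",              # hypothetical supported minor series
        "tensorflow-probability~=0.20.0",  # hypothetical matching TFP series
    ],
}
print(sorted(extras_require))
```

Users who want TensorFlow would then opt in explicitly with something like `pip install "pyhf[tensorflow]"`, keeping its patch-level pins out of everyone else's dependency resolution.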