The current flatdata-py implementation is pure Python. So far we have used it only for processing smaller datasets and for inspection/debugging. We noticed that it is quite slow on large datasets. It would be useful to have an implementation whose performance is not too far from that of the C++ one. To achieve that, we could do the following:
Benchmark the two implementations on the same data to quantify the gap, and monitor the benchmarks in CI (see "Performance benchmarks", #9).
Optimize the pure-Python implementation.
Introduce parallel processing in the pure-Python implementation (or ease integration with a library that would do it for us, like dask).
As an alternative approach, create a flatdata-py-ext implementation which would build and use binary extensions to improve performance.
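To illustrate the kind of optimization the second item refers to, here is a minimal sketch using only the standard library. It does not use flatdata-py's actual API or archive format: the record layout (`RECORD`), `make_buffer`, and both `sum_*` functions are hypothetical names introduced for the example. It contrasts decoding one record per loop iteration with reinterpreting the whole buffer in a single call, which usually reduces interpreter overhead.

```python
import struct

# Hypothetical fixed-size record layout: two little-endian uint32 fields.
# This is an assumption for illustration, not a real flatdata structure.
RECORD = struct.Struct("<II")


def make_buffer(count):
    """Build a synthetic buffer of `count` records for demonstration."""
    buf = bytearray(RECORD.size * count)
    for i in range(count):
        RECORD.pack_into(buf, i * RECORD.size, i, i * 2)
    return bytes(buf)


def sum_per_record(buf):
    """Decode one record at a time, computing each offset in Python."""
    total = 0
    for offset in range(0, len(buf), RECORD.size):
        a, b = RECORD.unpack_from(buf, offset)
        total += a + b
    return total


def sum_bulk(buf):
    """Reinterpret the whole buffer as unsigned 32-bit values in one call.

    memoryview.cast uses native byte order and native int size, so this
    matches the layout above only on little-endian hosts with 4-byte ints.
    """
    values = memoryview(buf).cast("I")
    return sum(values)


if __name__ == "__main__":
    data = make_buffer(1000)
    # Both strategies must agree before timing them is meaningful.
    assert sum_per_record(data) == sum_bulk(data)
```

Both functions can be timed with `timeit` on a larger buffer to see how much of the cost comes from the per-record loop on a given machine.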
As far as I understand, the Python implementation is fully functional.
I think we should make this issue more precise, e.g. by specifying what performance problems you see right now. Some benchmark numbers could also help. This would enable us either to split this issue or to introduce a precise checklist of what needs to be done.
I'm curious: do you already have something that we could commit to produce performance figures, i.e. to compare the C++ implementation with the Python implementation on different Python runtimes (CPython, PyPy, ...)?
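As a starting point for such figures, a small harness along these lines could be run unchanged under each runtime. This is only a sketch: `benchmark` and `workload` are hypothetical names, and the workload is a placeholder rather than a real flatdata archive read. Each result is tagged with the interpreter that produced it, so outputs from CPython and PyPy can be compared side by side.

```python
import json
import platform
import timeit


def benchmark(label, func, repeat=5, number=10):
    """Time `func` and tag the result with the running interpreter."""
    timings = timeit.repeat(func, repeat=repeat, number=number)
    return {
        "label": label,
        "runtime": platform.python_implementation(),
        "version": platform.python_version(),
        # The best run is the least disturbed by other system activity.
        "best_seconds_per_call": min(timings) / number,
    }


def workload():
    # Placeholder workload; a real benchmark would read an archive here.
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    # One JSON object per line is easy to collect and diff in CI.
    print(json.dumps(benchmark("placeholder", workload)))
```

The C++ side would need its own timing of the same data; this sketch covers only the Python runtimes.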
More Efficient Python Implementation