Python: high performance backend #8

imagovrn · 2018-01-26T17:49:20Z

More Efficient Python Implementation

The current flatdata-py implementation is pure Python. So far we have used it only for processing smaller datasets and for inspection and debugging, and we have noticed that it performs quite slowly on large datasets. It would be useful to have an implementation whose performance is not too far from that of the C++ one. To achieve that, we could do the following:

Benchmark the two implementations on the same data to quantify the gap, and monitor the benchmarks in CI (see Performance benchmarks #9).
Optimize the pure-Python implementation.
Introduce parallel processing in the pure-Python implementation (or ease integration with a library that would do it for us, such as dask).
As an alternative approach, create a flatdata-py-ext implementation that builds and uses binary extensions to improve performance.
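
To make the first item more concrete, here is a minimal sketch of what a micro-benchmark could look like. It uses only the standard library (`timeit`, `struct`) and does not touch the flatdata-py API: the record layout and the names `make_buffer`, `sum_per_record`, `sum_bulk`, and `benchmark` are hypothetical and exist only for illustration. It compares decoding one record at a time with decoding the whole buffer in a single call, which is the kind of gap we would want to measure on real data.

```python
import struct
import timeit

# Hypothetical fixed-size record layout (NOT the flatdata format):
# two little-endian unsigned 32-bit integers per record.
RECORD = struct.Struct("<II")


def make_buffer(count):
    """Build a byte buffer holding `count` records (i, 2 * i)."""
    return b"".join(RECORD.pack(i, 2 * i) for i in range(count))


def sum_per_record(buf):
    """Decode one record at a time, as a naive pure-Python reader might."""
    total = 0
    for offset in range(0, len(buf), RECORD.size):
        a, b = RECORD.unpack_from(buf, offset)
        total += a + b
    return total


def sum_bulk(buf):
    """Decode the whole buffer through a single iterator-based call."""
    return sum(a + b for a, b in RECORD.iter_unpack(buf))


def benchmark(func, buf, repeat=3, number=5):
    """Return the best wall-clock time in seconds for one call of `func`."""
    timings = timeit.repeat(lambda: func(buf), repeat=repeat, number=number)
    return min(timings) / number


if __name__ == "__main__":
    data = make_buffer(100_000)
    # Both readers must agree before their timings are worth comparing.
    assert sum_per_record(data) == sum_bulk(data)
    for reader in (sum_per_record, sum_bulk):
        print(f"{reader.__name__}: {benchmark(reader, data):.6f} s")
```

The same harness could be pointed at the C++ implementation through a small wrapper, so that both sides report timings for identical input; how that wrapper would look is left open here.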


boxdot · 2018-01-27T17:35:51Z

As far as I understand, the Python implementation is fully functional.

I think we should make this issue more precise, e.g. by specifying which performance problems you see right now. Some benchmark numbers would also help. This would enable us either to split this issue or to introduce a precise checklist of what needs to be done.

imagovrn · 2018-01-27T20:29:50Z

@boxdot Thanks for the comment. I have updated the issue. I should also stop creating items here from my phone, so as not to confuse anybody.

gferon · 2018-02-06T20:14:47Z

I'm curious, do you already have something that we could commit to produce performance figures, e.g. to compare the C++ implementation with the Python implementation on different Python runtimes (CPython, PyPy, ...)?

imagovrn · 2018-02-09T08:40:28Z

@gferon Not yet. That would be #9.

imagovrn added the python label Jan 27, 2018
