faster and fully conformant
- all conformance streams decode correctly
- faster, parallel SAO
- image memory can be allocated externally
- frame-dropping API
- selection of which temporal sub-stream to decode
- "fake" speed-up options that turn off deblocking and SAO (at the cost of non-conformant output)
- new parallelization architecture
- sherlock265 can show slice boundaries and tiles
- core rewritten in C++ for better maintainability (API still C only)
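
To illustrate what selecting a temporal sub-stream means, here is a minimal sketch, not the decoder's actual implementation or API. It relies only on the HEVC NAL unit header layout, where the low three bits of the second header byte carry nuh_temporal_id_plus1. The names `NalUnit`, `temporal_id`, `nal_unit_type` and `select_substream` are hypothetical helpers introduced for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical type for this sketch: one NAL unit as raw bytes,
// starting at the two-byte NAL unit header (no start code).
using NalUnit = std::vector<std::uint8_t>;

// HEVC NAL unit header layout (16 bits):
//   forbidden_zero_bit(1) | nal_unit_type(6) | nuh_layer_id(6) | nuh_temporal_id_plus1(3)
inline int nal_unit_type(const std::uint8_t* header) {
    return (header[0] >> 1) & 0x3F;
}

// TemporalId is nuh_temporal_id_plus1 - 1; returns -1 for the
// invalid header value nuh_temporal_id_plus1 == 0.
inline int temporal_id(const std::uint8_t* header) {
    return (header[1] & 0x07) - 1;
}

// Keep only NAL units whose TemporalId is in [0, max_tid].
// Units that are too short to hold a header, or that carry an
// invalid temporal ID, are dropped.
inline std::vector<NalUnit> select_substream(const std::vector<NalUnit>& in,
                                             int max_tid) {
    std::vector<NalUnit> out;
    for (const NalUnit& nal : in) {
        if (nal.size() < 2) continue;           // no complete header
        const int tid = temporal_id(nal.data());
        if (tid < 0 || tid > max_tid) continue; // invalid or above the limit
        out.push_back(nal);
    }
    return out;
}
```

Calling `select_substream(units, 0)` would keep only the base temporal layer, while a larger `max_tid` lets progressively more layers through, so playback can trade frame rate for decoding effort.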