VTT Writer performances #438

nywhere · 2025-01-28T07:52:31Z

Hi,

The source is an SCC file with 1033 captions.

Converting it to TTML is fine.
Converting it to VTT is very slow (10 s on a MacBook Pro).
Converting it to TTML and then to VTT is just as slow (10 s on a MacBook Pro).

source.scc.zip

Top 3 Time-Consuming Operations:

_process_element (isd.py:413)

Consumed 35.644s cumulative time
Called 2,165,274 times
Recursive function (note the ncalls format: 2165274/48445)

_compute_styles (isd.py:400)

Consumed 11.755s cumulative time
Called 150,222 times

set_style (model.py:339)

Consumed 5.424s cumulative time
Called 10,642,028 times
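
For anyone who wants to collect figures of this kind, here is a minimal sketch using the standard `cProfile` and `pstats` modules. The `walk` function is a hypothetical stand-in for a recursive tree visitor and is not ttconv code; `top_by_cumulative` is likewise a helper name made up for this example. For recursive functions, `pstats` prints `ncalls` as total calls followed by primitive (non-recursive) calls, which is the `2165274/48445` format noted above.

```python
import cProfile
import io
import pstats


def walk(depth: int) -> int:
    """Toy recursive function (hypothetical, stands in for a tree visitor)."""
    if depth == 0:
        return 1
    return walk(depth - 1) + walk(depth - 1)


def top_by_cumulative(func, *args, limit: int = 3) -> str:
    """Profile func(*args) and return a report of the `limit` entries
    with the highest cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(limit)
    return buffer.getvalue()


report = top_by_cumulative(walk, 10)
print(report)
```

Profiling adds overhead to every call, so cumulative times measured this way are typically higher than the wall-clock time of an unprofiled run.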

Please let me know if you need more details.

palemieux · 2025-01-28T16:29:25Z

The current algorithm is not optimized for input documents that generate both a large number of regions with indefinite temporal intervals and a large number of captions/subtitles: every region must be visited for each caption/subtitle (an N×M problem).
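
To make the N×M cost concrete, here is a toy model, not the actual ISD code. `Region`, `active_regions` and `count_region_visits` are hypothetical names, and the assumption that the document yields one region per caption is made only for illustration. Because every region has an indefinite interval, filtering by time removes nothing, so each caption still touches every region.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Region:
    """Hypothetical stand-in for a region; not the ttconv model class."""
    region_id: str
    begin: float = 0.0
    end: Optional[float] = None  # None models an indefinite temporal interval


def active_regions(regions: List[Region], t: float) -> List[Region]:
    """Regions whose interval contains time t."""
    return [r for r in regions if r.begin <= t and (r.end is None or t < r.end)]


def count_region_visits(regions: List[Region], caption_times: List[float]) -> int:
    """Count the regions that must be processed across all caption times."""
    return sum(len(active_regions(regions, t)) for t in caption_times)


# Assumption for illustration: one indefinite region per caption.
regions = [Region(f"r{i}") for i in range(1033)]
caption_times = [float(i) for i in range(1033)]

visits = count_region_visits(regions, caption_times)
print(visits)  # 1033 * 1033 = 1067089
```

In this model, reducing either the number of regions or the number of regions considered per caption shrinks the product directly.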

A couple of options come to mind:

reduce the number of regions generated when reading an SCC document by coalescing regions with similar dimensions (probably a good idea in any event)
optimize the ISD generation algorithm (probably following a pattern similar to that at sandflow/imscJS@b728b68)
add multi-processing support (not sure it is entirely worth the effort)
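
As a rough illustration of the first option, the sketch below merges regions whose origin and extent agree after quantization. `RegionBox`, `coalesce` and the `tolerance` parameter are hypothetical names chosen for this example and do not reflect the SCC reader's actual data structures. It compares geometry only; a real implementation would also need to confirm that the other region properties match before merging.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RegionBox:
    """Hypothetical region geometry, in percent of the root container."""
    region_id: str
    x: float
    y: float
    width: float
    height: float


def coalesce(
    regions: List[RegionBox], tolerance: float = 0.5
) -> Tuple[List[RegionBox], Dict[str, str]]:
    """Merge regions that fall in the same quantization bucket.

    Returns the kept regions and a map from every original region id
    to the id of the region that replaces it.
    """
    kept: Dict[Tuple[int, ...], RegionBox] = {}
    remap: Dict[str, str] = {}
    for region in regions:
        # Quantize each dimension so near-identical boxes share a key.
        key = tuple(
            round(v / tolerance)
            for v in (region.x, region.y, region.width, region.height)
        )
        canonical = kept.setdefault(key, region)
        remap[region.region_id] = canonical.region_id
    return list(kept.values()), remap


boxes = [
    RegionBox("a", 10.0, 80.0, 80.0, 10.0),
    RegionBox("b", 10.0, 80.0, 80.0, 10.0),
    RegionBox("c", 10.1, 80.0, 80.0, 10.0),
    RegionBox("d", 10.0, 10.0, 80.0, 10.0),
]
kept, remap = coalesce(boxes)
print([r.region_id for r in kept])  # ['a', 'd']
print(remap)
```

Bucketing is a simplification: two boxes that differ by less than `tolerance` can still land in adjacent buckets and stay separate, which is safe but leaves some regions unmerged.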

nywhere · 2025-01-28T17:17:06Z

The first option definitely makes sense.
Option 2 would be nice too.

Caching computed styles might help too, if it is applicable here.
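
A minimal sketch of what such caching could look like, assuming the computed style of an element depends only on its parent's computed style and its own specified style. Whether that holds inside `_compute_styles` has not been checked here. `compute_style`, `freeze` and the `INHERITED` subset are hypothetical names for illustration, not ttconv APIs.

```python
from functools import lru_cache
from typing import Dict, Tuple

# A style is a sorted tuple of (property, value) pairs so it can be hashed.
Style = Tuple[Tuple[str, str], ...]

# Illustrative subset of inherited properties (assumption for this sketch).
INHERITED = frozenset({"color", "fontFamily", "fontSize"})


def freeze(styles: Dict[str, str]) -> Style:
    """Convert a style dict into a hashable, order-independent key."""
    return tuple(sorted(styles.items()))


@lru_cache(maxsize=None)
def compute_style(parent: Style, specified: Style) -> Style:
    """Inherit the inheritable parent properties, then apply specified ones."""
    computed = {name: value for name, value in parent if name in INHERITED}
    computed.update(specified)
    return tuple(sorted(computed.items()))


parent = freeze({"color": "white", "fontSize": "100%", "backgroundColor": "black"})
spans = [freeze({"fontStyle": "italic"}) for _ in range(1000)]

results = [compute_style(parent, s) for s in spans]
info = compute_style.cache_info()
print(info.hits, info.misses)  # 999 1
```

Captions converted from SCC tend to repeat a small set of style combinations, so the hit rate could be high, but the benefit depends on how much of the time in `set_style` comes from repeated identical inputs.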

palemieux added the enhancement, c-scc-reader, and c-isd labels · Jan 28, 2025
