[Proposal] Make DASH parsing first pass closer to original XML #1652
Based on #1482 (removal of our "native" MPD parser).
Initial problem: DASH Patch Document
I was looking into the implementation of the "MPD Patch Document", which in theory allows more efficient MPD updates by only communicating what changed since the last update.
From what I can see, it relies on a special format indicating that some XML attribute or element is either added, removed or updated compared to the last loaded MPD (also an XML).
Any element or attribute seems to be updatable here, which I imagine is a problem for most DASH players: we generally transform a loaded MPD (the original XML document) before using it, often into a new format that loses a lot of information from the original XML, including the original structure (e.g. was a `SegmentTemplate` at the `Period` or at the `Representation` level?).
One of the strategies, seen in the shaka-player, could be to only consider the most sensible subset of properties that are generally updated, but I was a little afraid that this would lead to an insufficient implementation for applications or packagers, which may need this for things we did not prepare for yet.
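To make the structural loss concrete, here is a minimal sketch of the problem. All types and the `flatten` function are hypothetical and heavily simplified (they are not the player's actual code, and real `SegmentTemplate` inheritance merges attributes rather than picking one element): two different MPD layouts end up as the same flattened object, so a patch that targets the `SegmentTemplate` of the `Period` can no longer be resolved against it.

```typescript
// Hypothetical, simplified model of an MPD: a SegmentTemplate may be declared
// on the Period, on the Representation, or on both.
interface SegmentTemplate { media: string }
interface RepresentationXml { id: string; segmentTemplate?: SegmentTemplate }
interface PeriodXml {
  segmentTemplate?: SegmentTemplate;
  representations: RepresentationXml[];
}

// Hypothetical "flattened" format: each Representation carries its resolved
// template; the level at which it was declared is not recorded.
interface FlatRepresentation { id: string; media: string | undefined }

function flatten(period: PeriodXml): FlatRepresentation[] {
  return period.representations.map((rep) => ({
    id: rep.id,
    // Representation-level value wins, otherwise inherit from the Period.
    media: rep.segmentTemplate?.media ?? period.segmentTemplate?.media,
  }));
}

const declaredOnPeriod: PeriodXml = {
  segmentTemplate: { media: "seg-$Number$.mp4" },
  representations: [{ id: "v1" }],
};
const declaredOnRepresentation: PeriodXml = {
  representations: [{ id: "v1", segmentTemplate: { media: "seg-$Number$.mp4" } }],
};

// Both layouts produce the same flattened output.
const sameOutput =
  JSON.stringify(flatten(declaredOnPeriod)) ===
  JSON.stringify(flatten(declaredOnRepresentation));
```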
Interestingly, the DASH 5th edition specification has the following sentence on that subject:
So we need a way to store the previous MPD that is preferably structurally non-destructive and efficient (not re-parsing everything each time, and preferably memory-efficient).
Our MPD "Intermediate Representation"
We already had such a format (parsed XML, but with all the original structural information still there), which we called the "MPD intermediate representation" and which is output by the first pass of our DASH MPD parser.
The MPD Intermediate Representation is basically the XML document transformed into a JS Object, with what were semantically dates and numbers transformed into JavaScript numbers, what were XML elements into objects, and so on.
Here we could e.g. keep this object around when we detect that "MPD patching" is possible.
But there were still small modifications compared to the original structure: minor ones like the `lang` attribute becoming `language`, and more major ones like some elements having a special syntax, some children being added to an array and others not, etc.
We could also argue that the transformation of the original textual format into numbers (e.g. an ISO8601 duration into a number of seconds) could be a loss of information, but I cannot think of any case where the format in which an attribute or an element's inner content was written would be important to keep for Patch Documents.
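As an illustration of that last point, here is a small sketch of a duration-to-seconds conversion. The `parseDuration` function is hypothetical and deliberately simplified (it only handles the `PnDTnHnMnS` subset, without years, months or weeks); it is not the parser's actual code. Two different textual durations map to the same number, so the original spelling is lost, but the value a patch would need to replace is preserved.

```typescript
// Hypothetical, simplified ISO 8601 duration parser.
// Only the PnDTnHnMnS subset is handled (no years, months or weeks).
function parseDuration(text: string): number | null {
  const match =
    /^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?)?$/.exec(text);
  if (match === null) {
    return null; // not in the supported subset
  }
  const [, days, hours, minutes, seconds] = match;
  return (
    Number(days ?? 0) * 86400 +
    Number(hours ?? 0) * 3600 +
    Number(minutes ?? 0) * 60 +
    Number(seconds ?? 0)
  );
}

// Two different spellings of the same duration give the same number.
const fromSeconds = parseDuration("PT90S");
const fromMinutes = parseDuration("PT1M30S");
```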
This proposal
So here I propose just to greatly simplify the output of that first pass:
`children`, `attributes` and `value` (for the stringified inner content)
There are still some exceptions to this: some elements (`SegmentTimeline`, `EventStream`) still have a special syntax for now, because it would not be as straightforward to use this format for them.
This work does not even begin to implement Patch Documents, but I thought that it was a good standalone change anyway, as it makes the first parsing pass much easier to update: just keep the same structure as in the original XML.
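A rough sketch of what such a structure-preserving node could look like follows. Only the `children`, `attributes` and `value` property names come from this proposal; their exact types, the `IrNode` name, the sample data and the `select` helper are assumptions made for illustration. Note that keying `children` by element name, as done here, does not keep the relative order of differently-named siblings.

```typescript
// Hypothetical node shape. Only the `children`, `attributes` and `value`
// property names come from the proposal; their exact types are assumptions.
interface IrNode {
  attributes: Record<string, string | number>;
  children: Record<string, IrNode[]>; // keyed by the original XML element name
  value?: string; // stringified inner content
}

// Made-up sample data mirroring a Period > AdaptationSet > Representation tree.
const period: IrNode = {
  attributes: { id: "p0", start: 0 },
  children: {
    AdaptationSet: [{
      attributes: { lang: "en" }, // original attribute name kept as-is
      children: {
        Representation: [
          { attributes: { id: "v1", bandwidth: 500000 }, children: {} },
        ],
      },
    }],
  },
};

// Hypothetical helper: follow a path of [elementName, index] steps, roughly
// mirroring how a patch selector would walk the original XML.
function select(root: IrNode, path: Array<[string, number]>): IrNode | undefined {
  let current: IrNode | undefined = root;
  for (const [name, index] of path) {
    current = current?.children[name]?.[index];
    if (current === undefined) {
      return undefined; // no such element in the stored MPD
    }
  }
  return current;
}

const rep = select(period, [["AdaptationSet", 0], ["Representation", 0]]);
if (rep !== undefined) {
  rep.attributes.bandwidth = 600000; // e.g. an attribute replacement
}
```

Because the stored object keeps the same element and attribute names as the XML, resolving the target of an update is a plain tree walk, with no mapping back from a transformed format.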