Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Engine support for [orx:]instanceID & [jr:]preload="uid"; vast improvements in support for XML/XPath namespace semantics #310

Merged
merged 9 commits into from
Feb 19, 2025

Conversation

eyelidlessness
Copy link
Member

@eyelidlessness eyelidlessness commented Feb 14, 2025

Closes #278. Closes #270.

I have verified this PR works in these browsers (latest versions):

  • Chrome
  • Firefox
  • Safari (macOS)
  • Safari (iOS)
  • Not applicable

I don't think there's anything browser-specific to check here! I did run a few forms in Chrome to investigate performance regressions.

What else has been done to verify that this works as intended?

I believe the tests added in the first commit cover all of the requirements and assumptions.

Why is this the best possible solution? Were any other approaches considered?

This is a hodgepodge of special case and general solution, so... yes other approaches were considered!

On the one hand, the main special case: this addresses only the uid subset of jr:preload (#122). In hindsight, I don't think the general case would have been much more onerous!

On the other hand, the general solution: this addresses a very large portion of the remaining gaps in support for XML/XPath namespace semantics. I think it may have been possible to address this more narrowly, but I think it would have taken more work and would have resulted in code which is harder to understand, maintain, and ultimately correct for the general case.

Under What's changed, I also go into more detail about alternative approaches to the portion of the change addressing #270.

How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?

There are some assertions around deferred resolution of prefixes in QualifiedName which could occur in maybe surprising ways? But I think the fact that they don't show up in test runs is a pretty good sign.

Do we need any specific form for testing your changes? If so, please attach one.

N/A

What's changed

Support for jr:preload="uid"

  1. The parsing portion of this is a bit more expansive than the narrow special case of uid.
  2. The actual implementation of uid is ... not really ideal! But we plan to address this more generally in Add support for jr:preload #122 so I didn't want to hold this up since the next section was a much larger scope.

Vast improvements in support for XML/XPath namespace semantics

I think this is best summarized by the commit notes and changeset. Happy to elaborate further here if that would help!

Inject meta and instanceID when missing in form definition

I'm mostly calling this out separately because it expands an area of functionality which might be surprising in review, and which feels increasingly questionable in the first place. Specifically, this aspect of the change is handled in XFormDOM, an oddity which has survived the early Web Forms prototyping phase. The gist is this: we initially parse ODK XForms XML into a browser DOM representation, and then we do some targeted preprocessing/normalization before the more comprehensive parsing into a coherent data model (i.e. the various *Definitions).

I did briefly explore doing this normalization in construction of RootDefinition (which is where I think we'd start if we didn't have this oddity). That approach would have produced something like a MetaNodeDefinition (probably subclass of SubtreeDefinition) and InstanceIDNodeDefinition (probably subclass of LeafNodeDefinition). I bailed on this approach when it didn't seem like it would be any simpler. So... well, I apologize for adding more browser DOM code! But it does feel like the least worst option for moving this forward now. And it'll probably be worth thinking about the whole set of normalization concepts together if/when we decide to get rid of XFormDOM.

…ling preload=“uid” test

(Doing a TDD-ish workflow for once!)

**ALL** of these tests will fail as of this commit. I believe this tests all of the requirements and assumptions in these two issues:

- [#278: Add uid preload for instanceID](#278)
- [#270: Inject instanceID node and v4 UUID value in submission if form definition doesn't specify](#270)

A series of following commits will address these test failures.

Note that the existing test which was previously marked failing **is not concerned** with the special `/*/orx:meta/orx:instanceID` element, but it will also pass once `jr:preload=“uid”` is implemented. At least one added test may look kindasorta redundant to that one, but there’s value in the full suite of guarantees about the engine’s meta/instanceID behavior we intend to support.
Re “ONE OFF SPECIAL CASE”: see various `@todo`s and other details in comments.
… to same

This is preparation to support namespace-qualified nodes, in turn to support e.g. `/data/orx:meta/orx:instanceID`.
Copy link

changeset-bot bot commented Feb 14, 2025

🦋 Changeset detected

Latest commit: 4d97e54

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@getodk/xforms-engine Minor

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

…space usage

This is an ENORMOUS yak shave. The user-facing functionality this achieves is addressed in point 0 below.

Addresses:

0. FEATURE: Support and preserve XML namespace of a primary instance’s special meta subtree, as well as that subtree’s instanceID child element.
1. SPEC: All primary instance nodes consistently retain namespace information as expressed in the source form definition (ODK XForms XML)
2. SPEC: Model bindings referencing namespace-prefixed nodes correctly resolve those nodes
3. SPEC: Form-defined XPath expressions referencing non-default-namespace primary instance nodes with prefixes resolve those referenced nodes correctly
4. SPEC: non-default namespaces of nodes defined in **internal** secondary instances _should_ be handled correctly (not tested! Don’t necessarily quote me on this; testing deprioritized as this case has been deemed highly unlikely)
5. SPEC: non-default namespaces of nodes defined in **external XML** secondary instances are now a narrower gap; their namespaces are resolved according to defaults we’ve already used prior to this commit

To support all of this, a **very general** (and even more generalizable, see JSDoc especially on `QualifiedName`) set of namespace and name abstractions have been introduced. These abstractions are used here to ensure consistency of namespace-/name-related logic throughout the engine’s _runtime representation_.

I’ve done my best to ensure most of the important details for _understanding this change_ are expressed in JSDoc or other comments calling out important semantic and logical details.

One minor detritus note worth mentioning because it may stand out in the diff: the intermediate abstract class `StaticNamedNode` has been removed, somewhat ironically because any value it may have had was superceded by the new abstractions introduced in this change.

- - -

Lastly, a note on the scope of this change. As I’ve called out at least a couple times above, this change is targeted at _runtime instance state_. This includes the parsed representation of _primary instance model nodes_, because their definitions are the basis for namespace-/name-related logic associated with their `InstanceNode` implementations. This also includes the representations of secondary instances, since their nodes can be arbitrarily named, and referenced in primary instance **computations**.

To be explicit: there are still some inconsistencies in namespace strictness throughout the engine’s parsing logic. Some of these remaining inconsistencies are intentional (matching behavior in JavaRosa/Collect). The rest are _very probably harmless_, as they will now largely be restricted to parsing a form’s `<h:body>`.
@eyelidlessness eyelidlessness force-pushed the features/engine/meta-instanceid branch from 889e9c1 to ed868de Compare February 16, 2025 15:03
Not really sure if these tests have any value at all anymore. I definitely don’t think there’s any value in testing the normalization behavior here (the complete behavior is more comprehensively tested in scenario, and the complete behavior has more meaning than checking sub-structures in isolation). For now, the least questionable thing to do (i.e. without thinking about whether to remove tests like this entirely) seems to be expanding the test fixture’s structure to bypass said normalization, and expanding the assertion to match.
It’s a little bit surprising that the biggest perf hit on this branch comes from `NamespaceURL`! This commit mitigates the biggest hit I see in profiling. After this I’m going to do a brief timeboxed spike into using branded types rather than runtime to clearly distinguish between the inherent pair of nullable strings (namespace URI, prefix). If another commit doesn’t follow this one on the PR, safe to assume I didn’t ultimately pursue that direction (at least for now).
@eyelidlessness eyelidlessness changed the title (WIP) Engine support for [orx:]instanceID & [jr:]preload="uid"; vast improvements in support for XML/XPath namespace semantics Engine support for [orx:]instanceID & [jr:]preload="uid"; vast improvements in support for XML/XPath namespace semantics Feb 17, 2025
@eyelidlessness eyelidlessness marked this pull request as ready for review February 17, 2025 18:28
Copy link
Collaborator

@latin-panda latin-panda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! The changes are clear, and the terms are well-defined ( I got a better understanding of namespaces and qualified names handling)

let instanceID = getMetaChildElement(meta, 'instanceID');

if (meta == null) {
meta = createNamespacedChildElement(primaryInstanceRoot, OPENROSA_XFORMS_NAMESPACE_URI, 'meta');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Historical) question:

Was meta introduced by OpenRosa? Hence, the use of the OpenRosa namespace here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This goes back well before my time, so I can only ask the same and/or do some archaeology. I found this1, which fills in some gaps for me!

Footnotes

  1. Found from this, a couple git blame steps back.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. That was an interesting thread to read!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A blast from the past! In practice I don't think namespaces have turned out to be very valuable, unfortunately.

Copy link
Member

@lognaturel lognaturel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How I wish we didn't have to support namespaces! I checked that the tests match my expectations and have good coverage. I didn't spend a ton of time on the implementation but did go through and nothing caught my eye other than one related thought inline.

* like a form definition's XML. In all other cases, we'd want to resolve a
* prefix here from the _primary instance_ context. However, we don't (yet) have
* access to the primary instance context from a static node. So we currently
* fall back to the previous (incomplete/potentially wrong) default mapping.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding this correctly, there's another case where we'll need primary instance context from a static node -- it's possible to reference primary instance values in secondary instance labels: https://docs.google.com/spreadsheets/d/1K9I-nbdn1g_YYyiG_IuQW-YhtcODG9IPmQA2qRKsds8/edit?gid=0#gid=0

This is not my favorite feature but it is used. I imagine this will be closely related to references in translated text? Should it be its own issue?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm almost 100% certain the pattern in your linked form already works! Labels from secondary instances are parsed and computed just like inline labels in <item>s.1

I wasn't able to test it directly to confirm this with the linked form, because it gives me ODK Validate errors:

Error: ODK Validate Errors:
Selection choice label text and image uri are both missing for: ${person} choice: 1.

Selection value is missing for: ${person} choice: 1. The XML is invalid.

Selection choice label text and image uri are both missing for: ${person} choice: 2.

Selection value is missing for: ${person} choice: 2. The XML is invalid.

Xform is invalid!

The following files failed validation:
${Dynamic} choice labels.xml

Result: Invalid

(I tried in both the Web Forms Preview and XLSForm Online, just to be sure there isn't some config or other env difference causing this.)

I'm also pretty sure I could derive valid XML from what I see in the XLSForm to confirm, but it may take a bit for me to get to that. Feel free to confirm in the meantime! But I don't think we should file an issue for this without confirming it doesn't work.


Clarification of "primary instance context from a static node"

This comment is about having access to the primary instance context during parsing. Addressing this would pretty much involve:

  1. Introduce a namespace resolving concept slightly earlier in the parse flow
  2. Change the StaticNode &co constructor signatures to accept that concept
  3. Call into that at name resolution time
  4. Possibly cache aggressively, because the affected nodes are known to be static

I held off on it in this PR mainly because I wanted to avoid exploding the scope of this change (even further), especially for a case which is pretty unlikely in real world forms (IIRC you said this was the case, and that definitely seems right for XLSForms!). Secondarily I held off because I anticipate another change to the StaticNode (&co) constructor signatures in the very near term, and it may be trivial to incorporate addressing this last sizable gap into that.

Footnotes

  1. Which is very likely going to be a good conceptual starting point for how we'll end up addressing Allow output in translation text #22. The only reason they aren't solved the same way already is because I took the spec signature (i.e. that itext returns a string) too literally. I think what we'll actually want is for jr:itext to return a node-set, then we can treat translated labels as if the resolved translation node is substituted in place of the itext ref.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops. The problem is with the select from repeat, not with the dynamic labels. I'll look into why Validate doesn't like it, I think I have an idea. Here's the XLSForm without that question: Dynamic choice labels(1).xlsx Looks like it doesn't work yet.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh geez 🤦 it doesn't work because it is #22. 🤣

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I guess there's still no need to file an(other) issue.

I did just add a slew of notes there, expanding on the footnote from the comment above. I thought it would be good to capture those ideas while it was fresh, and now I know they're directly responsive to this as well!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh no, sorry! I'll give it a try with inline labels too and add the form here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've verified that with inline itemsets (not secondary instance), references to form elements work. I tried to add outputs to labels in secondary instances but that didn't work in any of our clients. pyxform always generates the reference in itext anyway so I think #22 is all we need to track!

@eyelidlessness eyelidlessness merged commit d7ecc33 into main Feb 19, 2025
74 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants