What are the semantics of exclude-inline-prefixes? #66

Open
ndw opened this issue Jan 19, 2025 · 11 comments

Comments

@ndw

ndw commented Jan 19, 2025

Consider this pipeline:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:ex="http://example.com/"
                xmlns:fn="http://www.w3.org/2005/xpath-functions"
                version="3.0">
  <p:output port="result" serialization="map{'omit-xml-declaration':true()}"/>

  <p:identity>
    <p:with-input>
      <p:inline exclude-inline-prefixes="ex fn">
        <doc xmlns:test="http://example.com/"
             xmlns:x="http://example.com/"
             xmlns:alt="http://www.w3.org/2005/xpath-functions">
          <x:p/>
          <ex:q/>
          <r xmlns="http://example.com/"/>
        </doc>
      </p:inline>
    </p:with-input>
  </p:identity>

</p:declare-step>

What output does it produce? I think the spec is clear, but I also think that what is specified is sufficiently counter-intuitive that it's perhaps not clear enough. The spec says:

A namespace URI designated by using an exclude-inline-prefixes attribute ... is excluded.

(Where the ellipsis just elides the small amount of faff about how exclude-inline-prefixes attributes are combined if they appear on ancestor elements.)

I think it follows that this is one possible output:

<doc>
  <x:p xmlns:x="http://example.com/"/>
  <x:q xmlns:x="http://example.com/"/>
  <x:r xmlns:x="http://example.com/"/>
</doc>

Critically, the bindings for test, x, and alt have been removed because each is bound to the same namespace URI as an excluded prefix. In order to keep the document namespace well-formed, a binding has been re-inserted where it's required. This example uses x each time; it could equally have used any other prefix not otherwise in scope (including ex), and it could have used different prefixes for p and q.

A more intuitive semantic would be that the binding is excluded. So you'd get something like this:

<doc xmlns:test="http://example.com/"
     xmlns:x="http://example.com/"
     xmlns:alt="http://www.w3.org/2005/xpath-functions">
  <x:p/>
  <_1:q xmlns:_1="http://example.com/"/>
  <r xmlns="http://example.com/"/>
</doc>

(I've used the prefix _1 where a prefix had to be inserted, but equally, I could have used x or test or even the default namespace. Things are a little more complicated for attributes, but I don't think that's relevant here.)

I've implemented what I believe the spec says, so I haven't thought through all of the consequences of the more intuitive semantic.

One consequence, though, would be that exclude-inline-prefixes and p:namespace-delete would have very different semantics. In p:namespace-delete (I believe that) it's really important that what is removed is the namespace URI, regardless of the binding. In a random input document, the pipeline author cannot reasonably be expected to know what (all of) the prefixes are for a particular namespace URI. Nor can the author refer to those prefixes in a straightforward way.

That's not really as important in a pipeline where the pipeline author has control over all the bindings used.

For more perspectives on this issue, see xmlcalabash/xmlcalabash3#201

@xatapult
Contributor

For what it's worth: I think that what the pipeline should do is somehow keep the namespace of the resulting elements intact. How doesn't matter. Maybe with an invented prefix, maybe with a default namespace declaration.

If you do an exclude-inline-prefixes for prefixes that are in a result, you hand over control to the namespace fixup mechanisms of the processor, implementation defined.

And of course, semantically it doesn't matter how the namespace is declared. It means the same whatever mechanism.

@ndw
Author

ndw commented Jan 19, 2025

I'm not sure I understand your answer, @xatapult . In my original question, do you expect the first result, or the second?

I think it's uncontroversial that the resulting XML has to be namespace well-formed and all of the elements and attributes have to be in the correct namespaces.

@gimsieke

gimsieke commented Jan 19, 2025

The spec seems to be clear: Output 1 is a possible outcome, output 2 isn’t since xmlns:whatever="http://example.com/" may only be attached to an element during inevitable namespace fixup.

The 1.0 spec already says:

A namespace URI designated by using an exclude-inline-prefixes attribute on the enclosing p:inline is excluded.

This seems to have been copied to the current spec verbatim.

But also in the current spec, in p:input, p:output, p:variable, and p:with-option, we say rather informally (omitting the word “attribute” by mistake…):

The exclude-inline-prefixes allows the pipeline author to exclude some namespace declarations in inline content, see p:inline.

As an aside: On p:declare-step, the sentence has undergone even less proofreading:

The a description of exclude-inline-prefixes, see p:inline.

So maybe our intent, as expressed in the less formal sentence above, was really to specify that only the namespace declaration(s), not the affected namespace URI(s) altogether, should be omitted in the output. In my view, this also aligns better with what I intuitively expect when seeing an attribute that is called exclude-inline-prefixes rather than, for instance, exclude-inline-namespace-uris.

So output 1 is correct, but output 2 should be correct. Therefore I think we should consider altering the spec, even if the semantics differ from 1.0 and even if existing pipelines are affected.

@xml-project
Member

I am sorry, I am totally confused. Can someone help me out?

If the namespace associated with an excluded prefix is used in the expanded-QName of a descendant element or attribute, the processor may include that prefix anyway, or it may generate a new prefix.

This makes me think that both strategies are possible, since both generate a new prefix.
Can someone please help me out by explaining the point in question?

@ndw
Author

ndw commented Jan 19, 2025

Consider <y:x xmlns:y="test" y:a="5"/>. If that's an inline and you exclude y, no matter what strategy is applied, the output must have some binding for "test" and it must associate that binding with <x> and @a. Does that help?
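As a sketch, the example above could be wrapped in a complete pipeline like this. The surrounding p:declare-step and p:identity are assumptions, modeled on the pipeline in the opening comment; only the inline element itself comes from the comment above.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:y="test"
                version="3.0">
  <p:output port="result"/>

  <p:identity>
    <p:with-input>
      <!-- Exclude the prefix y even though the inline content
           needs its namespace for both the element and the attribute. -->
      <p:inline exclude-inline-prefixes="y">
        <y:x xmlns:y="test" y:a="5"/>
      </p:inline>
    </p:with-input>
  </p:identity>
</p:declare-step>
```

Whichever reading applies, the result has to be an element with local name x in the namespace "test", carrying an attribute a in that same namespace. Because a namespaced attribute always needs a prefix, a default namespace declaration alone cannot satisfy this; some prefix (y or a generated one) has to be bound to "test" in the output.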

@xml-project
Member

@ndw Thanks, but both of your original examples satisfy this criterion too, don't they? As would simply keeping "ex". So I am still confused about the discussion.

@ndw
Author

ndw commented Jan 19, 2025

Sorry. Not sure where I made things confusing. I don't think the question of keeping the final document namespace well-formed is really at issue. That's not negotiable. The possible distinction is, what's excluded, the specific binding of a namespace URI or all bindings to that URI.

@xml-project
Member

@ndw wrote:

The possible distinction is, what's excluded, the specific binding of a namespace URI or all bindings to that URI.

Ah, ok. Now I got it. Thanks!

@xatapult
Contributor

I'm not sure I understand your answer, @xatapult . In my original question, do you expect the first result, or the second?

I think it's uncontroversial that the resulting XML has to be namespace well-formed and all of the elements and attributes have to be in the correct namespaces.

Well, what I mean is: If you're so stupid as to exclude namespace prefixes that you'll actually need, you're at the mercy of the implementation to fix things. So any outcome you proposed would be ok for me, as long as the XML semantics don't change. However, that's not the direction this discussion is going. Bottom line for me: I don't mind.

@ndw
Author

ndw commented Jan 31, 2025

Gerrit proposes that we change the semantics to remove the prefix (and any bindings for the prefix).

If there are multiple bindings for the same prefix, they all go.
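To illustrate the second sentence, here is a minimal sketch of an inline in which the excluded prefix is bound more than once. The namespace URI http://example.org/other and the element names doc and part are hypothetical, chosen only to show the shape of the case.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:ex="http://example.com/"
                version="3.0">
  <p:output port="result"/>

  <p:identity>
    <p:with-input>
      <!-- doc inherits the binding of ex from p:declare-step;
           part rebinds ex to a different (hypothetical) namespace URI.
           Neither element uses the prefix in its name. -->
      <p:inline exclude-inline-prefixes="ex">
        <doc>
          <part xmlns:ex="http://example.org/other"/>
        </doc>
      </p:inline>
    </p:with-input>
  </p:identity>
</p:declare-step>
```

Under the proposal as stated, both bindings of ex would be dropped. Under the current URI-based wording, only http://example.com/ is designated as excluded, so the rebinding on part would presumably be left alone.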

@ndw
Author

ndw commented Feb 27, 2025

In email, Achim made the important point that the current semantics are the same as the semantics for exclude-result-prefixes in XSLT, because that was the model we used when we added it to XProc 1.0.

I have very cold feet about making a backwards incompatible change here.

Achim also observes that we could keep this behavior and add a new feature to provide "remove this binding".

I propose we make no change in 3.1 and move this to the Vnext repo.
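For readers less familiar with the XSLT parallel mentioned above, a minimal sketch of the corresponding stylesheet follows. It is an illustration only: the element names are borrowed from the opening comment, and the use of the xsl:initial-template entry point is an assumption about how the stylesheet would be invoked.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:ex="http://example.com/"
                version="3.0"
                exclude-result-prefixes="ex">

  <!-- exclude-result-prefixes="ex" designates the namespace URI
       bound to ex as excluded, not the prefix itself. -->
  <xsl:template name="xsl:initial-template">
    <doc xmlns:x="http://example.com/">
      <x:p/>
    </doc>
  </xsl:template>

</xsl:stylesheet>
```

Under the XSLT rules for literal result elements, the namespace node for x on doc is not copied to the result, because its URI matches the excluded one even though the prefix differs. The binding that x:p needs is then supplied by namespace fixup, which is the same behavior as the first output in the opening comment.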

ndw transferred this issue from xproc/3.0-specification on Mar 2, 2025