Allow custom contracts to customize the label #2176

Open: wants to merge 11 commits into master

jneem (Member) commented Feb 21, 2025

This adds an optional label field to the error data returned by the functions that define validators and custom contracts. When applying a contract, if that contract returns a label along with the error data, the returned label is blamed instead of the label that was passed to the contract.

For example, here is a custom array contract that blames the element location if it's a boolean but blames the array as a whole if it finds a string:

```nickel
let Contract =
  std.contract.custom (fun lbl value =>
    if !std.is_array value then
      'Error { message = "expected an array" }
    else
      value
      |> std.array.fold_left
        (fun acc x =>
          if std.is_string x then
            'Error { message = "nah", label = lbl }
          else
            std.contract.check Number lbl x |> match { 'Ok _v => acc, 'Error e => 'Error e }
        )
        'Ok
      |> match {
        'Ok => 'Ok value,
        'Error e => 'Error e,
      }
  )
in
[1, 2, 3] | Contract
```

This gives

```
[1, 2, true] | Contract
```

=>

```
error: contract broken by a value
...
21 │ [1, 2, true ] | Contract
   │        ----     ------ expected type
   │        │
   │        evaluated to this expression
```

but

```
[1, 2, "hi"] | Contract
```

=>

```
error: contract broken by a value
       nah
...
21 │ [1, 2, "hi"] | Contract
   │ ^^^^^^^^^^^^   ------ expected type
   │ │
   │ applied to this expression
```
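The blame-selection rule from the description can be modeled in a few lines of Rust. This is a toy sketch, not Nickel's evaluator: `Label`, `ErrorData`, `ContractResult`, `Blame` and `apply_result` are hypothetical names, and a label is reduced to a string standing in for the source span it points to.

```rust
// Toy model (hypothetical, not Nickel's implementation) of the rule:
// if a failing custom contract returns a label in its error data, that
// label is blamed; otherwise the label passed to the contract is blamed.

#[derive(Clone, Debug, PartialEq)]
pub struct Label {
    // Stand-in for the source span the label points to.
    pub span: &'static str,
}

#[derive(Clone, Debug, PartialEq)]
pub struct ErrorData {
    pub message: Option<String>,
    // Optional label returned alongside the error data.
    pub label: Option<Label>,
}

#[derive(Clone, Debug, PartialEq)]
pub enum ContractResult<V> {
    Ok(V),
    Error(ErrorData),
}

/// Blame information produced when a contract fails.
#[derive(Debug, PartialEq)]
pub struct Blame {
    pub blamed: Label,
    pub message: Option<String>,
}

/// Turns a custom contract's result into either the value or a blame,
/// preferring the returned label over the one the contract was called with.
pub fn apply_result<V>(passed: &Label, result: ContractResult<V>) -> Result<V, Blame> {
    match result {
        ContractResult::Ok(v) => Ok(v),
        ContractResult::Error(data) => Err(Blame {
            blamed: data.label.unwrap_or_else(|| passed.clone()),
            message: data.message,
        }),
    }
}
```

An error without a label blames the array as a whole; an error carrying a label blames whatever that label points to.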

github-actions bot (Contributor) commented Feb 21, 2025

yannham (Member) left a comment

I have the following reservations with the current approach:

  • In the example, why is the label explicitly included for the std.is_string check but not for the std.is_array check? It seems like both should blame the whole contract. If I understand correctly, with the proposed solution, omitting the label always blames the parent contract, which is a strange default from a child contract's point of view.
  • I think having to change all leaf contracts (here Number, String, etc.) is costly: every leaf contract now needs to explicitly propagate its label. That is easy for a custom contract implementer to forget, especially since the label is optional, so a missing label won't be caught by a runtime error.

From what I recall, contract.apply makes the following decision: if you call a child contract, the blame points to the child contract's argument, basically. Couldn't we make check do something similar? My reasoning:

  • Blaming the child contract is the default behavior you want (the parent contract shouldn't be making that choice on behalf of the child). The fact that every standard contract was changed, without exception, to include its label seems to corroborate that this should be the default. So maybe we should make check embed the label in the returned error automatically, unless one is already present.
  • If we go with this, a parent contract can't decide how a child blames. But I think that makes sense: the child should know better. However, the parent can still short-circuit the child contract if it wants to blame something else, which is the exceptional case (you apply a child contract, but you still blame the whole contract).

I think with your example, that would give the same result without having to do anything special: you don't need to include the label when blaming yourself as a whole (that's the default behavior), and check would automatically propagate the child's label. We wouldn't have to change a single custom contract in the stdlib, I think. This would change the behavior of existing code, but having thought more about the discussion at the weekly: (1) it's not a true backward incompatibility, only different error messages for contract failures, and (2) it seems to be the right default.
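A minimal Rust sketch of the proposed default, assuming toy `Label` and `ErrorData` types (hypothetical, not Nickel's internals): on failure, `check` fills in the child's label unless the error data already carries one, so leaf contracts don't have to propagate anything themselves and a label chosen by the child is left untouched.

```rust
// Hypothetical model: `check` attaches the child's label by default.

#[derive(Clone, Debug, PartialEq)]
pub struct Label {
    // Stand-in for the child's source location.
    pub span: &'static str,
}

#[derive(Clone, Debug, PartialEq)]
pub struct ErrorData {
    pub message: Option<String>,
    pub label: Option<Label>,
}

/// On error, embed `child_label` unless the child already set a label.
pub fn check<V>(child_label: &Label, result: Result<V, ErrorData>) -> Result<V, ErrorData> {
    result.map_err(|mut err| {
        if err.label.is_none() {
            err.label = Some(child_label.clone());
        }
        err
    })
}
```

A leaf contract like Number would return a bare error, and the child location would still be blamed.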

A last topic is documentation. I fear things are starting to become complicated: we have immediate contracts (predicates and validators), and general custom contracts, which can have an immediate part and a delayed part. At least, up to now, there is a simple conceptual distinction: we use ADTs for direct error reporting, which makes sense (à la Rust, OCaml, Haskell, etc.), and a label object for delayed throwing, which is a different mechanism. That said, it's already a bit confusing that a label and the error data of a validator are quite similar; in fact the latter can be seen as a Nickel projection of a subset of the label. Still, we need to use getters/setters on labels, which are opaque, while error data are vanilla records. Well, such is life.

Now, we can embed a label within an ADT for immediate error reporting. This label itself can have notes and error messages inside, which is a bit puzzling. What happens if I do 'Error { message = "hello", label = std.contract.label.with_message "world" label}? I'm sure we can find a reasonable technical answer to this, but is it something that makes sense and that we should allow? If yes, how are we going to explain that in the docs?

The solution could be as simple as not calling that a label (even if it is under the hood - that's an implementation detail), but something else, like blame_location (because I believe in most of those cases we don't care about the error message and notes, which should be conveyed by the error data ADT, but rather about the location information in the label). Maybe there are other ideas to better articulate label and error messages that would reduce the friction?
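To illustrate the `blame_location` idea, here is a hypothetical Rust model in which the error data carries only the position taken from a label, while the message and notes always come from the error data itself. `Label`, `BlameLocation`, `ErrorData` and `render` are illustrative names, not Nickel's actual types, and spans are simplified to `(start, end)` offsets.

```rust
// Hypothetical model: error data keeps only the *location* of a label.

/// Opaque label: a location plus its own diagnostic text.
#[derive(Clone, Debug, PartialEq)]
pub struct Label {
    pub span: (usize, usize),
    pub message: Option<String>,
    pub notes: Vec<String>,
}

/// Only the position information, nothing that competes with `message`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct BlameLocation {
    pub span: (usize, usize),
}

impl From<&Label> for BlameLocation {
    // Projection: drop the label's message and notes, keep the span.
    fn from(label: &Label) -> Self {
        BlameLocation { span: label.span }
    }
}

pub struct ErrorData {
    pub message: Option<String>,
    pub notes: Vec<String>,
    pub blame_location: Option<BlameLocation>,
}

/// Picks the blamed span (falling back to the contract's own span) and
/// takes the text from the error data only.
pub fn render(err: &ErrorData, default_span: (usize, usize)) -> ((usize, usize), Option<String>) {
    let span = err.blame_location.map(|b| b.span).unwrap_or(default_span);
    (span, err.message.clone())
}
```

Under this split, a label built with `with_message "world"` contributes only its location, so "hello" is what gets reported.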

jneem (Member, Author) commented Feb 24, 2025

Good points.

> In the example, why is the label explicitly included for the std.is_string check

An oversight -- you can leave it out and the behavior is the same.

> if you call a child contract, then the blame will point to the child's contract argument, basically. Couldn't we make check do something similar?

I think that would make sense, but I had some trouble with the implementation. The problem is that in std.contract.check the label parameter actually points to the parent location. It's only in the implementation of ContractCheck in eval/operation.rs that the label is modified to point to the child. What I've done for now is to modify std.contract.custom so that it puts the child's location as the default. I think this should give the right default behavior for user-defined contracts, but it requires us to add locations to all the internal leaf contracts. Do you know of a better way?

yannham (Member) commented Feb 24, 2025

Indeed, %contract/check% is the one augmenting the label with the right position information for the argument. One possibility would be to try to decompose/factor out the marking operation and the actual application, but while I think this is possible, it will prove non-trivial: we'll probably need to add another ad hoc application operator in the evaluator or something.

Another solution is to make %contract/check% handle that directly on the Rust side. It can seem hard at first but you don't need to actually write AST-encoded Nickel code in Rust: you can factor out stuff in the internals module and then just set up the stack to call the corresponding helper. In fact, %contract/apply% does precisely that when its argument is a custom contract (as opposed to a naked function):

```rust
// If we're evaluating a plain contract application but we are applying
// something with the signature of a custom contract, we need to setup some
// post-processing.
//
// We prepare the stack so that `contract/postprocess_result` will be
// applied afterward. This primop converts the result of a custom contract
// `'Ok value` or `'Error err_data` to either `value` or a proper contract
// error with `err_data` included in the label.
//
// That is, prepare the stack to represent the evaluation context
// `%contract/postprocess_result% [.] label`
if matches!(
    (&*t1, &b_op),
    (
        Term::CustomContract(_) | Term::Record(_),
        BinaryOp::ContractApply
    )
) {
    self.stack
        .push_arg(Closure::atomic_closure(new_label.clone()), pos_op_inh);
    self.stack.push_op_cont(
        OperationCont::Op1(
            UnaryOp::ContractPostprocessResult,
            pos1.into_inherited(),
        ),
        self.call_stack.len(),
        pos_op_inh,
    );
}
```

We could do the same thing, giving the label and the result to the post-processor (which would basically be what you added to std.contract.custom).

Another possible benefit of doing this in %contract/check% instead of std.contract.custom is that only calls to check need to pay the price. In the current approach, each and every custom contract embeds its label even if there is no parent contract (i.e. if it's applied directly), and user-defined contracts may pay for an extra post-processing match operation as well. I'll benchmark both solutions, but I think it might have a performance impact: on the largest benchmark, I remember that seemingly small details in the hot path of contract application (hot meaning the standard path taken for each and every contract application) could have a significant cumulative impact.
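The stack-based approach, and why only `check` pays for it, can be sketched with a toy machine in Rust. Everything here is hypothetical and far simpler than Nickel's evaluator (`Machine`, `Cont`, `Outcome` are illustrative names): `check` records the stack depth, pushes a post-processing continuation carrying the child label, runs the contract, then pops back to that depth, filling in the label on an unlabeled error. Direct application (`apply`) pushes nothing.

```rust
// Toy stack machine loosely mirroring "push an extra op after check".

#[derive(Clone, Debug, PartialEq)]
pub struct Label {
    pub span: &'static str,
}

#[derive(Clone, Debug, PartialEq)]
pub enum Outcome {
    Ok(i64),
    // Error with an optional blame label.
    Error(Option<Label>),
}

// Continuations waiting for a contract's result.
enum Cont {
    // Pushed by `check`: attach the child label to an unlabeled error.
    PostprocessCheck(Label),
}

pub struct Machine {
    stack: Vec<Cont>,
}

impl Machine {
    pub fn new() -> Self {
        Machine { stack: Vec::new() }
    }

    /// Direct application: no continuation, so no extra cost.
    pub fn apply(&mut self, contract: fn(i64) -> Outcome, value: i64) -> Outcome {
        contract(value)
    }

    /// Checked application: push the post-processing step, run the contract,
    /// then unwind only the continuations pushed by this call.
    pub fn check(&mut self, label: Label, contract: fn(i64) -> Outcome, value: i64) -> Outcome {
        let depth = self.stack.len();
        self.stack.push(Cont::PostprocessCheck(label));
        let result = contract(value);
        self.unwind_to(depth, result)
    }

    fn unwind_to(&mut self, depth: usize, mut result: Outcome) -> Outcome {
        while self.stack.len() > depth {
            match self.stack.pop() {
                Some(Cont::PostprocessCheck(label)) => {
                    if result == Outcome::Error(None) {
                        result = Outcome::Error(Some(label));
                    }
                }
                None => break,
            }
        }
        result
    }
}
```

The design point is that the extra work is tied to the `check` call site, so contracts applied directly never pay for the post-processing.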

```
@@ -102,8 +102,8 @@ pub mod internals {

generate_accessor!(stdlib_contract_equal);

generate_accessor!(prepare_custom_contract);
```

jneem (Member, Author):

It seems like this doesn't exist anymore?

yannham (Member):

Ah, it's possible. Probably a leftover from the first version, which was quite inefficient.

jneem (Member, Author) commented Feb 25, 2025

Thanks for the hint! I went with pushing an extra op after %contract/check%, and I think it's better now.

yannham (Member) left a comment

LGTM, but it's missing a documentation update. I'm setting "request changes" mostly as a self-reminder to run the benchmarks and see whether this has an effect on real use cases.

```diff
@@ -1708,7 +1712,8 @@
 'Ok Dyn,
 'Error {
 message | String | optional,
-notes | Array String | optional
+notes | Array String | optional,
+blame_location | Dyn | optional,
```

yannham (Member):

One detail: we don't have to make this optional, because check in fact ensures that blame_location is always set. On the other hand, that would make the contract on check's return value mismatch the return value of custom, which might be annoying for typed code.

jneem (Member, Author):

Right, the annoyance with typed code was why I left it optional, especially since I'd expect it to be common in custom contracts to just return the result of a std.contract.check. I think at some point we discussed subtyping for records with optional fields? If we had that, this would be nicer...
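The typing trade-off can be illustrated in Rust, which has no subtyping between struct types, so it shows the situation described above. In this hypothetical sketch (`ErrorData`, `CheckedErrorData`, `check_strict` and `custom` are illustrative names), a stricter error type for `check`, with `blame_location` always set, forces every custom contract that forwards check's result to convert it explicitly.

```rust
// Hypothetical sketch: required vs optional `blame_location`.

#[derive(Clone, Debug, PartialEq)]
pub struct Label {
    pub span: &'static str,
}

/// Error data returned by a custom contract: location is optional.
#[derive(Clone, Debug, PartialEq)]
pub struct ErrorData {
    pub message: Option<String>,
    pub blame_location: Option<Label>,
}

/// Stricter error data for `check`, where the location is always set.
#[derive(Clone, Debug, PartialEq)]
pub struct CheckedErrorData {
    pub message: Option<String>,
    pub blame_location: Label,
}

// Without subtyping, forwarding requires a conversion step.
impl From<CheckedErrorData> for ErrorData {
    fn from(e: CheckedErrorData) -> Self {
        ErrorData {
            message: e.message,
            blame_location: Some(e.blame_location),
        }
    }
}

pub fn check_strict(label: &Label, ok: bool) -> Result<(), CheckedErrorData> {
    if ok {
        Ok(())
    } else {
        Err(CheckedErrorData { message: None, blame_location: label.clone() })
    }
}

/// A custom contract forwarding check's result: note the `map_err`.
pub fn custom(label: &Label, ok: bool) -> Result<(), ErrorData> {
    check_strict(label, ok).map_err(ErrorData::from)
}
```

Keeping the field optional in both types removes the need for this conversion, at the cost of a less precise type for check.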

yannham (Member):

In any case, let's leave it optional 👍. Changing an optional field to required in the return type isn't a breaking change, so we can wait until the situation around those kinds of things improves.

