Allow custom contracts to customize the label #2176

jneem · 2025-02-21T08:50:38Z

This adds an additional (optional) label field to the functions that define validators and custom contracts. When applying a contract, if that contract returns a label along with the error data, the returned label will be blamed instead of the label that was passed to the contract.

For example, here is a custom array contract that blames the element location if it's a boolean but blames the array as a whole if it finds a string:

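let Contract =
  std.contract.custom (fun lbl value =>
    if !std.is_array value then
      # No label in the error data: the label passed to the contract is blamed.
      'Error { message = "expected an array" }
    else
      value
      |> std.array.fold_left
        (fun acc x =>
          if std.is_string x then
            # Return the parent label explicitly to blame the array as a whole.
            'Error { message = "nah", label = lbl }
          else
            # Delegate to Number; a failure here blames the offending element.
            std.contract.check Number lbl x |> match { 'Ok _v => acc, 'Error e => 'Error e }
        )
        'Ok
      |> match {
        'Ok => 'Ok value,
        'Error e => 'Error e,
      }
  )
in
[1, 2, 3] | Contract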

This gives

[1, 2, true] | Contract

=>

error: contract broken by a value
...
21 │ [1, 2, true ] | Contract
   │        ----     ------ expected type
   │        │
   │        evaluated to this expression

but

[1, 2, "hi"] | Contract

=>

error: contract broken by a value
       nah
...
21 │ [1, 2, "hi"] | Contract
   │ ^^^^^^^^^^^^   ------ expected type
   │ │
   │ applied to this expression

github-actions · 2025-02-21T09:07:35Z

Bencher Report

Branch	thread-label
Testbed	ubuntu-latest

Benchmark	Latency (µs)
fibonacci 10	386.46
foldl arrays 50	1,575.60
foldl arrays 500	5,778.00
foldr strings 50	6,635.90
foldr strings 500	57,457.00
generate normal 250	45,176.00
generate normal 50	1,808.50
generate normal unchecked 1000	2,726.40
generate normal unchecked 200	641.23
pidigits 100	2,736.10
pipe normal 20	1,295.60
pipe normal 200	8,298.30
product 30	744.27
scalar 10	1,295.80
sum 30	734.96

yannham

I have the following reservations with the current approach:

In the example, why is the label explicitly included for the std.is_string check but not for the std.is_array check? Seems like both should blame the whole contract. If I understand correctly, with the proposed solution, not including a label will always automatically blame the parent contract, which is a strange choice to make, as a child.
I think having to change all leaf contracts (here Number, String, etc.) is costly. Now all leaf contracts need to explicitly propagate their label. That's probably very easy to forget for a custom contract implementer, especially given that label is optional, so omitting it won't be caught by a runtime error.

From what I recall, contract.apply makes the following decision: if you call a child contract, then the blame will point to the child's contract argument, basically. Couldn't we make check do something similar? With the following reasoning:

Blaming the child contract is the default behavior you want (and you don't want to choose your behavior as a child in the place of your parent contract). The fact that all standard contracts are changed without distinction to include their label seems to corroborate that this should be the default behavior. So maybe we should make check just embed the label in the returned error automatically, unless there is one already.
If we go with this, a parent contract can't decide how a child may blame. But I think it makes sense: the child should know better. However, the parent can decide to short-circuit the child contract if it wants to blame something else, which is an exceptional case (you apply a child contract, but you still blame the whole contract).

I think with your example, that would give the same result without having to do anything special: you don't need to include the label when blaming yourself as a whole (it's the default behavior), and check would automatically propagate the child label. We wouldn't have to change a single custom contract of the stdlib, I think. This would be a change in behavior for existing code, but having thought more about the weekly's discussion: (1) it's not a true backward incompatibility, just different error messages for contract failures, and (2) it seems to be the right default choice.
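The default proposed above can be modelled with a small Rust sketch. This is only an illustration under stated assumptions, not Nickel's implementation: `Value`, `Label`, `ErrorData`, `number_contract`, `check` and `array_contract` are hypothetical stand-ins, and a label is reduced to a string describing what it points at. The point it shows is that `check` fills in the child's location only when the child's error does not already carry one, while a parent that returns its own error without a location blames itself as a whole.

```rust
/// Hypothetical values that an array contract might inspect.
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Num(i64),
    Bool(bool),
    Str(String),
}

/// Hypothetical stand-in for a label: only records what it points at.
#[derive(Clone, Debug, PartialEq)]
struct Label {
    span: String,
}

/// Hypothetical error data returned by an immediate check.
#[derive(Clone, Debug, PartialEq)]
struct ErrorData {
    message: Option<String>,
    /// `None` means "blame whatever label the caller was given".
    blame_location: Option<Label>,
}

/// A leaf contract that does not bother to set a blame location.
fn number_contract(_label: &Label, value: &Value) -> Result<(), ErrorData> {
    match value {
        Value::Num(_) => Ok(()),
        _ => Err(ErrorData { message: None, blame_location: None }),
    }
}

/// Model of the proposed `check`: run the child with a label pointing at its
/// argument and, on failure, attach that label unless one is already set.
fn check(
    child: impl Fn(&Label, &Value) -> Result<(), ErrorData>,
    child_label: Label,
    value: &Value,
) -> Result<(), ErrorData> {
    child(&child_label, value).map_err(|mut err| {
        if err.blame_location.is_none() {
            err.blame_location = Some(child_label);
        }
        err
    })
}

/// Parent contract mirroring the array example from the PR description.
fn array_contract(values: &[Value]) -> Result<(), ErrorData> {
    for (i, v) in values.iter().enumerate() {
        if let Value::Str(_) = v {
            // No blame location: the array as a whole is blamed by default.
            return Err(ErrorData {
                message: Some("nah".to_string()),
                blame_location: None,
            });
        }
        check(number_contract, Label { span: format!("element {i}") }, v)?;
    }
    Ok(())
}
```

In this model, an array ending in `Value::Bool(true)` fails with the element's label attached by `check`, whereas an array containing a string fails with no blame location, which the caller would resolve to its own label.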

A last topic is documentation. I fear things start to become complicated: we have immediate contracts (predicate and validator), and general custom contracts, that can have an immediate part and a delayed part. At least, up to now, there is a simple conceptual distinction: we use ADTs for direct error reporting, which makes sense (à la Rust, OCaml, Haskell, etc.), and a label object for delayed throwing, which is a different mechanism. It's already a bit confusing, though, that a label and the error data of a validator are quite similar; in fact the latter can be seen as a Nickel projection of a subset of the label. Still, we need to use getters/setters on labels, which are opaque, while error data are vanilla records. Well, such is life.

Now, we can embed a label within an ADT for immediate error reporting. This label itself can have notes and error messages inside, which is a bit puzzling. What happens if I do 'Error { message = "hello", label = std.contract.label.with_message "world" label}? I'm sure we can find a reasonable technical answer to this, but is it something that makes sense and that we should allow? If yes, how are we going to explain that in the docs?

The solution could be as simple as not calling that a label (even if it is under the hood - that's an implementation detail), but something else, like blame_location (because I believe in most of those cases we don't care about the error message and notes, which should be conveyed by the error data ADT, but rather about the location information in the label). Maybe there are other ideas to better articulate label and error messages that would reduce the friction?

jneem · 2025-02-24T04:38:57Z

Good points.

> In the example, why is the label explicitly included for the std.is_string check

An oversight -- you can leave it out and the behavior is the same.

> if you call a child contract, then the blame will point to the child's contract argument, basically. Couldn't we make check do something similar?

I think that would make sense, but I had some trouble with the implementation. The problem is that in std.contract.check the label parameter actually points to the parent location. It's only in the implementation of ContractCheck in eval/operation.rs that the label is modified to point to the child. What I've done for now is to modify std.contract.custom so that it puts the child's location as the default. I think this should give the right default behavior for user-defined contracts, but it requires us to add locations to all the internal leaf contracts. Do you know of a better way?

yannham · 2025-02-24T10:13:21Z

Indeed %contract/check% is the one augmenting the label with the right position information for the argument. One possibility would be to try to decompose/factor out the marking operation and the actual application, but while this is possible, I think it will prove non-trivial: we'll probably need to add another ad-hoc application operator in the evaluator or something.

Another solution is to make %contract/check% handle that directly on the Rust side. It can seem hard at first but you don't need to actually write AST-encoded Nickel code in Rust: you can factor out stuff in the internals module and then just set up the stack to call the corresponding helper. In fact, %contract/apply% does precisely that when its argument is a custom contract (as opposed to a naked function):

nickel/core/src/eval/operation.rs

Lines 1899 to 1928 in 0733850

    
// If we're evaluating a plain contract application but we are applying
// something with the signature of a custom contract, we need to setup some
// post-processing.
//
// We prepare the stack so that `contract/postprocess_result` will be
// applied afteward. This primop converts the result of a custom contract
// `'Ok value` or `'Error err_data` to either `value` or a proper contract
// error with `err_data` included in the label.
//
// That is, prepare the stack to represent the evaluation context
// `%contract/postprocess_result% [.] label`
if matches!(
    (&*t1, &b_op),
    (
        Term::CustomContract(_) | Term::Record(_),
        BinaryOp::ContractApply
    )
) {
    self.stack
        .push_arg(Closure::atomic_closure(new_label.clone()), pos_op_inh);
    self.stack.push_op_cont(
        OperationCont::Op1(
            UnaryOp::ContractPostprocessResult,
            pos1.into_inherited(),
        ),
        self.call_stack.len(),
        pos_op_inh,
    );
}

We could do the same thing, giving the label and the result to the post-processor (which would basically be what you added to std.contract.custom).

Another possible benefit of doing that in %contract/check% instead of std.contract.custom is that only calls to check need to pay the price. In the current approach, each and every custom contract embeds its label even if there is no parent contract (i.e. if it's applied directly), and there's a possible post-processing match operation as well for user-defined contracts. I'll benchmark both solutions but I think it might have a performance impact: on the largest bench, I remember that seemingly small details in the hot path of contract application (hot meaning the standard path that is taken for each and every contract application) could have a significant cumulative impact.
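To make the cost argument concrete, here is a toy Rust model of the "push a post-processing continuation only in check" idea. It is a sketch under assumptions and is unrelated to the real evaluator types: `Machine`, `Cont`, `Label`, `ErrorData`, `Outcome` and `positive` are hypothetical names, and contracts are plain functions. Direct application runs with an empty continuation stack, so only the `check` path pays for the label fix-up.

```rust
/// Hypothetical stand-in for a label: only records what it points at.
#[derive(Clone, Debug, PartialEq)]
struct Label {
    span: String,
}

/// Hypothetical error data returned by an immediate check.
#[derive(Clone, Debug, PartialEq)]
struct ErrorData {
    message: Option<String>,
    blame_location: Option<Label>,
}

type Outcome = Result<i64, ErrorData>;

/// Work scheduled to run on the result of the contract being evaluated.
enum Cont {
    /// On `Err`, fill in this label if the error carries no blame location.
    AttachDefaultLabel(Label),
}

/// Toy evaluator holding a stack of pending continuations.
struct Machine {
    conts: Vec<Cont>,
}

impl Machine {
    fn new() -> Self {
        Machine { conts: Vec::new() }
    }

    /// Direct application: nothing is pushed, so no post-processing runs.
    fn apply(&mut self, contract: fn(i64) -> Outcome, value: i64) -> Outcome {
        let result = contract(value);
        self.run_conts(result)
    }

    /// Check: schedule the label fix-up, then evaluate as usual.
    fn check(&mut self, contract: fn(i64) -> Outcome, child_label: Label, value: i64) -> Outcome {
        self.conts.push(Cont::AttachDefaultLabel(child_label));
        self.apply(contract, value)
    }

    /// Drain the pending continuations, feeding each one the current result.
    fn run_conts(&mut self, mut result: Outcome) -> Outcome {
        while let Some(cont) = self.conts.pop() {
            match cont {
                Cont::AttachDefaultLabel(label) => {
                    if let Err(err) = &mut result {
                        if err.blame_location.is_none() {
                            err.blame_location = Some(label);
                        }
                    }
                }
            }
        }
        result
    }
}

/// Example contract that never sets a blame location itself.
fn positive(value: i64) -> Outcome {
    if value > 0 {
        Ok(value)
    } else {
        Err(ErrorData {
            message: Some("expected a positive number".to_string()),
            blame_location: None,
        })
    }
}
```

With this model, `Machine::apply(positive, -1)` returns an error without a blame location, while `Machine::check(positive, label, -1)` returns the same error with `label` attached.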

jneem · 2025-02-25T07:20:38Z

core/src/stdlib.rs

@@ -102,8 +102,8 @@ pub mod internals {

    generate_accessor!(stdlib_contract_equal);

-    generate_accessor!(prepare_custom_contract);


It seems like this doesn't exist anymore?

Ah, it's possible. Probably the first version that was quite inefficient.

jneem · 2025-02-25T07:24:26Z

Thanks for the hint, I went for pushing an extra op after %contract/check% and I think it's better now.

yannham

LGTM, but it's missing a documentation update. I'm setting "request changes" mostly as a reminder to myself to run the benchmarks and see whether this has an effect on real use cases.

yannham · 2025-02-25T09:43:19Z

core/stdlib/std.ncl

@@ -1708,7 +1712,8 @@
        'Ok Dyn,
        'Error {
          message | String | optional,
-          notes | Array String | optional
+          notes | Array String | optional,
+          blame_location | Dyn | optional,


One detail: we don't have to make this optional, because in fact check ensures that blame_location is always set. On the other hand, this would make the contract from check and the return value of custom mismatch, which might be annoying for typed code.

Right, the annoyance with typed code was why I left it optional, especially since I'd expect it to be common in custom contracts to just return the result of a std.contract.check. I think at some point we discussed subtyping for records with optional fields? If we had that, this would be nicer...

In any case, let's leave it optional 👍. It's not breaking to change an optional field to required in the return type, so we can wait for a better situation around those kinds of things.

yannham · 2025-02-25T09:44:27Z

core/src/stdlib.rs

Ah, it's possible. Probably the first version that was quite inefficient.

jneem added 3 commits February 21, 2025 14:45

Thread the label through contract checks.

18c68e9

Add labels to built-in contracts

6ce95e3

Add a test

5d814b0

yannham reviewed Feb 21, 2025

checkpoint

b6435cd

jneem added 4 commits February 25, 2025 14:09

Move the default label attachment to internals

c72dbc9

Revert some formatting changes

579e683

Undo some stdlib changes

2a4ab72

Commas

b80e32b

jneem commented Feb 25, 2025

yannham requested changes Feb 25, 2025

jneem and others added 2 commits February 25, 2025 17:13

Update core/src/eval/operation.rs

e8b5b4f

Co-authored-by: Yann Hamdaoui

Add section on blame_location to the manual

c2c1745

yannham reviewed Feb 25, 2025

Apply suggestions from code review

a4baaa2

Co-authored-by: Yann Hamdaoui
