Split function into constructor/relation/(custom)function; Remove default; Disallow function lookup in the RHS of a rule #461

Open: wants to merge 152 commits into main

FTRobbin
Collaborator

FTRobbin commented Nov 6, 2024

This PR fixes Issue #420. Lookup actions in rules now cause the type error LookupInRuleDisallowed.

More specifically, this PR:

  1. Removes the -naive flag and the related desugaring code, which this change replaces.

  2. Fixes 'fail' failing because it was not identified as a global in the remove_global rewrite pass.

  3. Adds new positive and negative tests for this type error.

  4. Rewrites the existing tests for compatibility with the new type error.

FTRobbin requested a review from a team as a code owner November 6, 2024 23:35
FTRobbin requested review from mwillsey and removed request for a team November 6, 2024 23:35

codspeed-hq bot commented Nov 6, 2024

CodSpeed Performance Report

Merging #461 will not alter performance

Comparing haobinni-0904 (9163ac3) with haobinni-0904 (8a75e7e)

Summary

✅ 10 untouched benchmarks
🆕 2 new benchmarks

Benchmarks breakdown

Benchmark | haobinni-0904 (8a75e7e) | haobinni-0904 (9163ac3) | Change
--- | --- | --- | ---
🆕 merge_read | N/A | 286.8 µs | N/A
🆕 set_sort_function | N/A | 369.6 µs | N/A

src/gj.rs (outdated)
// for the later atoms, we consider everything
let mut timestamp_ranges =
vec![0..u32::MAX; cq.query.funcs().collect::<Vec<_>>().len()];
if do_seminaive {
Collaborator

I believe we still want to keep the -naive flag as well as the code here, so the user can still do naive evaluation (useful for debugging; it also has different semantics from semi-naive for "unsafe" egglog).

Member

Yeah, I agree that we should probably keep naive evaluation

Collaborator Author

I don't think we should keep it. It is unhelpful and adds complexity to the later passes as they need to support the naive semantics correctly. I am also against keeping it as a use-at-your-own-risk feature.

For Egglog users, if you don't use delete, semi-naive and naive are indistinguishable, so naive is unhelpful for debugging. If you use delete, then you care a lot about performance, and there's no point in using naive. Even when you debug with unsafe features, you should probably debug the semi-naive case instead, because that's what you want in the end.

For Egglog developers, I see some value in naive as a sanity check that semi-naive is implemented correctly. But we are not using it that way now, and the same check can be done through stronger end-to-end test cases.
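A toy illustration of the first point, assuming plain monotone Datalog with no delete: computing transitive closure naively and semi-naively reaches the same fixpoint, and comparing the two is the kind of end-to-end check mentioned above. This is a standalone Rust sketch over HashSet relations with made-up names (naive_tc, seminaive_tc), not egglog's evaluator.

```rust
use std::collections::HashSet;

/// A binary relation, e.g. edge(a, b) or path(a, b).
type Rel = HashSet<(u32, u32)>;

/// Naive evaluation of path(a, d) :- path(a, b), edge(b, d):
/// every round re-joins the whole `path` relation with `edge`.
fn naive_tc(edges: &Rel) -> Rel {
    let mut path = edges.clone();
    loop {
        let mut next = path.clone();
        for &(a, b) in &path {
            for &(c, d) in edges {
                if b == c {
                    next.insert((a, d));
                }
            }
        }
        // `next` is a superset of `path`, so equal sizes mean a fixpoint.
        if next.len() == path.len() {
            return path;
        }
        path = next;
    }
}

/// Semi-naive evaluation: only the tuples derived in the previous round
/// (`delta`) are joined with `edge`.
fn seminaive_tc(edges: &Rel) -> Rel {
    let mut path = edges.clone();
    let mut delta = edges.clone();
    while !delta.is_empty() {
        let mut new_delta = Rel::new();
        for &(a, b) in &delta {
            for &(c, d) in edges {
                if b == c && !path.contains(&(a, d)) {
                    new_delta.insert((a, d));
                }
            }
        }
        path.extend(new_delta.iter().copied());
        delta = new_delta;
    }
    path
}

fn main() {
    let edges: Rel = vec![(1, 2), (2, 3), (3, 4), (4, 2)].into_iter().collect();
    let naive = naive_tc(&edges);
    let semi = seminaive_tc(&edges);
    // Without delete, both strategies derive exactly the same tuples.
    assert_eq!(naive, semi);
    println!("{} path tuples", naive.len());
}
```

Semi-naive only joins the delta from the previous round, which is where its speedup comes from; in this monotone setting that restriction does not change the final relation.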

Member

That was convincing to me. I've never personally used it before
@yihozhang what do you think?

Collaborator

What complexity does this add to later passes? I thought the only difference between seminaive and naive is that in the seminaive case, we split the original query into many small queries depending on the timestamps (i.e., what this code snippet does).
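A minimal sketch of the split described here, loosely modeled on the timestamp_ranges loop quoted from src/gj.rs above but not copied from it; the function names and the single last_run timestamp are assumptions. Each atom of the query gets a half-open timestamp range, and semi-naive evaluation runs one sub-query per atom.

```rust
use std::ops::Range;

/// Hypothetical: build the per-atom timestamp ranges for each semi-naive
/// sub-query. In sub-query i, atom i only matches rows written at or after
/// `last_run`, atoms before i only match older rows, and atoms after i
/// match everything.
fn seminaive_ranges(n_atoms: usize, last_run: u32) -> Vec<Vec<Range<u32>>> {
    let mut ranges = vec![0..u32::MAX; n_atoms];
    let mut subqueries = Vec::with_capacity(n_atoms);
    for i in 0..n_atoms {
        ranges[i] = last_run..u32::MAX; // atom i: new rows only
        subqueries.push(ranges.clone());
        ranges[i] = 0..last_run; // later sub-queries: atom i sees old rows only
    }
    subqueries
}

/// Hypothetical: naive evaluation is a single query where every atom
/// sees every row.
fn naive_ranges(n_atoms: usize) -> Vec<Vec<Range<u32>>> {
    vec![vec![0..u32::MAX; n_atoms]]
}

fn main() {
    println!("naive: {:?}", naive_ranges(3));
    for (i, q) in seminaive_ranges(3, 10).iter().enumerate() {
        println!("semi-naive sub-query {i}: {q:?}");
    }
}
```

Every match that touches at least one new row is found by exactly one sub-query (the one for the first atom whose row is new), and matches over old rows only are skipped.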

I strongly recommend that we keep the naive evaluation. We can view semi-naive as an optimization of naive evaluation, and this optimization is not always semantics-preserving when given bizarre programs that violate certain assumptions. Examples include:

  • rules that use extract / user-defined primitives
  • rules where the merge function is not associative or idempotent
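To make the second bullet concrete, here is a toy model (hypothetical, not egglog code) of a rule that fires once per fact of a relation R and does (set (total) 1), where total uses the non-idempotent merge (+ old new). Naive evaluation re-fires on old matches in every iteration, so the two strategies disagree.

```rust
/// Toy model: `facts[k]` is the iteration in which the k-th fact of R was
/// inserted. Each match of the rule merges 1 into `total` with addition.
fn run_rule(facts: &[u32], iterations: u32, seminaive: bool) -> i64 {
    let mut total: i64 = 0;
    for iter in 0..iterations {
        for &inserted_at in facts {
            if inserted_at > iter {
                continue; // the fact does not exist yet
            }
            // Semi-naive only matches facts inserted since the previous iteration.
            let is_new = inserted_at == iter;
            if !seminaive || is_new {
                total += 1; // merge: old + new, with new = 1
            }
        }
    }
    total
}

fn main() {
    let facts: [u32; 3] = [0, 0, 0]; // three facts, all present from the start
    println!("naive: {}", run_rule(&facts, 3, false)); // 9
    println!("semi-naive: {}", run_rule(&facts, 3, true)); // 3
}
```

With an idempotent merge such as min or union, re-firing on old matches would not change the result, which is why the difference only shows up for merges like addition.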

I'm also not confident that our semi-naive is implemented correctly: do we really update the timestamp every time we update the table? I just looked at table.rs, and it seems we don't update the timestamp in at least get_mut. The naive evaluation serves as a ground truth for this purpose. Personally, when I am debugging a primitive I wrote, the first thing I do is disable semi-naive evaluation.
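The get_mut concern can be pictured with a toy table, a hypothetical stand-in rather than egglog's table.rs: if one write path mutates a row without refreshing its timestamp, a semi-naive pass that only scans rows stamped since the last iteration never sees that change, whereas naive evaluation rescans every row.

```rust
use std::collections::HashMap;

/// Hypothetical table: each row stores (output value, timestamp of last write).
struct Table {
    rows: HashMap<u32, (i64, u32)>,
}

impl Table {
    /// Writes a value and stamps the row with `now`.
    fn insert(&mut self, key: u32, value: i64, now: u32) {
        self.rows.insert(key, (value, now));
    }

    /// Mutable access that does NOT refresh the timestamp.
    fn get_mut(&mut self, key: u32) -> Option<&mut i64> {
        self.rows.get_mut(&key).map(|row| &mut row.0)
    }

    /// Keys a semi-naive pass would consider: rows written at or after `since`.
    fn changed_since(&self, since: u32) -> Vec<u32> {
        let mut keys: Vec<u32> = self
            .rows
            .iter()
            .filter(|(_, row)| row.1 >= since)
            .map(|(key, _)| *key)
            .collect();
        keys.sort();
        keys
    }
}

fn main() {
    let mut table = Table { rows: HashMap::new() };
    table.insert(1, 10, 0);
    table.insert(2, 20, 0);
    let last_run = 1;
    table.insert(1, 11, 1); // stamped with the new time
    if let Some(v) = table.get_mut(2) {
        *v = 21; // changed, but the timestamp still says 0
    }
    println!("seen by semi-naive: {:?}", table.changed_since(last_run)); // [1]
}
```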

Collaborator Author

FTRobbin, Nov 26, 2024

If we keep the naive flag, either we need to split the later passes into two, which is unlikely, or every piece of downstream code must support both naive and semi-naive. I am skeptical of the claim that semi-naive code would just work for naive. For one thing, I don't see how semi-naive can be implemented as purely syntactic rewrites; as you pointed out, something more needs to happen to the timestamps in the semi-naive case. And yet, the naive flag is not used anywhere else in the codebase.

There are cases where the two give different semantics. However, the naive semantics is not more helpful to the users in those cases because they still need semi-naive to work in the end.

For your last point: first, you still need to debug your new primitive under semi-naive. Second, I would only trust naive evaluation as a ground truth if it were well supported, with a clear separation between the two semantics. Relying on the program under test to produce the expected test output seems like a terrible idea to me.

However, I do think this discussion raised a significant concern about the correctness of Egglog. We should investigate the issue.

Collaborator Author

Conclusion: Keep

  • Not compromising the convenience of -naive for a smaller core
  • Actually implementing -naive is too much effort, so we settle for the timestamp hack
  • Reconsider when merging the new backend

@saulshanabrook
Member

I'm a little worried about the time improvements, especially for lambda... That one is so dramatic I worry that maybe the semantics of the example changed?

Seeing all the changes, I also worry about UX degradation: the language seems more unwieldy with this change.

I know you said that automated desugaring had some issues, but I am wondering if it could at least be used to address most of these cases? Were there particular issues with it for some cases, or just in general?

//Disallowing Let/Set actions to look up non-constructor functions in rules
for action in head.iter() {
match action {
GenericAction::Let(_, _, Expr::Call(_, symbol, _)) => {
Member

Don't you need to check if this is a function vs a constructor call here?
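One way to do the check being asked about, as a sketch over a simplified AST (FunctionSubtype, Expr, Action, and check_rule_head are hypothetical, not egglog's types): look up the callee's declared subtype and reject only custom functions, so constructor calls such as IntI pass. The sketch also recurses into call arguments to catch nested lookups; whether the PR does so is not visible in the quoted snippet.

```rust
use std::collections::HashMap;

/// Hypothetical subtypes mirroring the split of `function` in this PR.
#[derive(Clone, Copy, Debug, PartialEq)]
enum FunctionSubtype {
    Constructor,
    Relation,
    Custom,
}

/// Simplified expressions and actions (not egglog's real AST).
enum Expr {
    Var(String),
    Call(String, Vec<Expr>),
}

enum Action {
    Let(String, Expr),
    Set(String, Vec<Expr>, Expr),
}

#[derive(Debug, PartialEq)]
enum TypeError {
    LookupInRuleDisallowed(String),
}

type Subtypes = HashMap<String, FunctionSubtype>;

/// Rejects calls to custom functions anywhere in `e`, including nested
/// arguments. Names absent from `subtypes` are treated as primitives.
fn check_expr(e: &Expr, subtypes: &Subtypes) -> Result<(), TypeError> {
    match e {
        Expr::Var(_) => Ok(()),
        Expr::Call(name, args) => {
            if subtypes.get(name) == Some(&FunctionSubtype::Custom) {
                return Err(TypeError::LookupInRuleDisallowed(name.clone()));
            }
            args.iter().try_for_each(|a| check_expr(a, subtypes))
        }
    }
}

/// Checks a rule's RHS. The table targeted by `set` is written, not looked
/// up, so only its arguments and the new value are checked.
fn check_rule_head(head: &[Action], subtypes: &Subtypes) -> Result<(), TypeError> {
    for action in head {
        match action {
            Action::Let(_, e) => check_expr(e, subtypes)?,
            Action::Set(_, args, value) => {
                for a in args {
                    check_expr(a, subtypes)?;
                }
                check_expr(value, subtypes)?;
            }
        }
    }
    Ok(())
}

fn main() {
    let mut subtypes = Subtypes::new();
    subtypes.insert("IntI".to_string(), FunctionSubtype::Constructor);
    subtypes.insert("ival".to_string(), FunctionSubtype::Custom);
    subtypes.insert("edge".to_string(), FunctionSubtype::Relation);
    let var = |n: &str| Expr::Var(n.to_string());

    // (set (ival lhs) (IntI n n)): IntI is a constructor, so this is accepted.
    let ok = [Action::Set("ival".into(), vec![var("lhs")], Expr::Call("IntI".into(), vec![var("n"), var("n")]))];
    println!("{:?}", check_rule_head(&ok, &subtypes));

    // (let v (ival lhs)): a lookup of a custom function, so this is rejected.
    let bad = [Action::Let("v".into(), Expr::Call("ival".into(), vec![var("lhs")]))];
    println!("{:?}", check_rule_head(&bad, &subtypes));
}
```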

((set (ival lhs) (IntI n n))))
(rule ((= lhs (Node (PureOp (Const (IntT) (const) (Num n)))))
(= nval (IntI n n)))
((set (ival lhs) nval)))
Member

IntI is a constructor, not a function, so this isn't a lookup and doesn't need to be changed.

FTRobbin changed the title from "Delete -naive flag and disallow lookup actions in rules" to "Split function into constructor/relation/(custom)function; Remove default; Disallow function lookup in the RHS of a rule" Dec 3, 2024
@FTRobbin
Collaborator Author

FTRobbin commented Dec 3, 2024

Bumping up this PR again for review:

  • Removed the :default keyword, resolving "Removing :default keyword" #421.
  • Reverted the -naive flag change, as discussed.
  • Implemented splitting function into three subtypes, constructor/relation/(custom)function, resolving "Disallowing looking up non-constructor functions" #420 and "Renaming function whose output is an E-class to constructor" #422.
    • function is not allowed in the RHS of a rule (merge functions are unchecked).
      • function can have an EqSort as its output.
      • It does not have union as its default merge function.
    • constructor and relation are allowed in the RHS of a rule.
      • A constructor expression inserts a new e-node.
      • A constructor has union as its default merge function.
      • A relation expression inserts a new edge.
    • A constructor's output type must be a sort.
  • Reverted the previous changes to tests and then fixed all the tests again
  • Added new negative and positive tests
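The declaration rules in this summary can be restated as a small sketch; the Kind, Output, and Merge types are hypothetical, and how a custom function with no :merge handles conflicting writes is deliberately left out.

```rust
/// Hypothetical declaration kinds from this PR's summary (not egglog's AST).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind {
    Constructor,
    Relation,
    CustomFunction,
}

/// Simplified output types: `EqSort` is a user-declared sort such as `Math`;
/// `Primitive` stands for built-ins such as `i64`.
#[derive(Clone, Debug, PartialEq)]
enum Output {
    EqSort(String),
    Primitive(String),
}

#[derive(Clone, Debug, PartialEq)]
enum Merge {
    Union,         // default for constructors
    Given(String), // an explicit :merge expression
    Unspecified,   // no :merge given; union is not implied
    NotApplicable, // relations have no output value in this sketch
}

/// Validates a declaration and returns the merge behavior it ends up with.
/// The sketch ignores `merge` for constructors and relations.
fn effective_merge(kind: Kind, output: Option<&Output>, merge: Option<&str>) -> Result<Merge, String> {
    match kind {
        Kind::Relation => Ok(Merge::NotApplicable),
        Kind::Constructor => match output {
            Some(Output::EqSort(_)) => Ok(Merge::Union),
            _ => Err("a constructor's output type must be a sort".to_string()),
        },
        // A custom function may return an EqSort, but union is never the default.
        Kind::CustomFunction => Ok(merge.map_or(Merge::Unspecified, |m| Merge::Given(m.to_string()))),
    }
}

fn main() {
    let math = Output::EqSort("Math".to_string());
    let int = Output::Primitive("i64".to_string());
    println!("{:?}", effective_merge(Kind::Constructor, Some(&math), None));
    println!("{:?}", effective_merge(Kind::Constructor, Some(&int), None));
    println!("{:?}", effective_merge(Kind::CustomFunction, Some(&math), None));
    println!("{:?}", effective_merge(Kind::CustomFunction, Some(&int), Some("(min old new)")));
    println!("{:?}", effective_merge(Kind::Relation, None, None));
}
```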

Collaborator

yihozhang left a comment

Nice job! Code is clean, with detailed documentation and good tests.

Let us make a release after this PR is merged.

function.insert(values, value, ts);
value
} else {
return Err(Error::NotFoundError(NotFoundError(format!(
Collaborator

Nit: this should probably use a different error message given this PR, since the only way to reach this case now is a bug in our checker.

Collaborator Author

I think it can still be triggered by merge functions reading a table, e.g.:

(function foo () i64)

(function bar () i64 :merge (foo))

(set (bar) 0)

(fail (set (bar) 1))
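A hypothetical Rust model of this example (not egglog's implementation): bar's merge expression reads foo, which has no entry, and the merge only runs when bar already has a value. The first set therefore succeeds and the second fails at run time, even though no rule RHS contains a lookup for the checker to reject. The sketch runs the merge on every write to an existing row, which is an assumption.

```rust
/// Hypothetical error type standing in for a "not found" failure.
#[derive(Debug, PartialEq)]
enum Error {
    NotFound(String),
}

struct Db {
    foo: Option<i64>, // (function foo () i64), never set
    bar: Option<i64>, // (function bar () i64 :merge (foo))
}

impl Db {
    fn lookup_foo(&self) -> Result<i64, Error> {
        self.foo.ok_or_else(|| Error::NotFound("foo".to_string()))
    }

    /// (set (bar) v): store directly if empty, otherwise run the merge `(foo)`.
    fn set_bar(&mut self, v: i64) -> Result<(), Error> {
        let merged = match self.bar {
            None => v,
            // The merge expression ignores old/new and reads foo instead.
            Some(_old) => self.lookup_foo()?,
        };
        self.bar = Some(merged);
        Ok(())
    }
}

fn main() {
    let mut db = Db { foo: None, bar: None };
    println!("{:?}", db.set_bar(0)); // Ok(())
    println!("{:?}", db.set_bar(1)); // Err(NotFound("foo"))
}
```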

/// Now `MathVec` can be used as an input or output sort.
Sort(Span, Symbol, Option<(Symbol, Vec<Expr>)>),

/// Egglog supports three types of functions
Collaborator

nice documentation!

src/ast/mod.rs (outdated)
/// A relation models a datalog-style mathematical relation
/// It can only be defined through the `relation` command
///
/// A custom function is a map
Collaborator

The map part of the definition is a bit weird to me, but it's fine for now since we will need a big documentation refactor anyway.

src/ast/mod.rs (outdated)
/// ```text
/// (sort MathVec (Vec Math))
/// (Constructor Add (i64 i64) Math)
Collaborator

nit: Constructor -> constructor

src/ast/mod.rs (outdated)
/// ```
///
/// However, this function is not:
/// Specifically, a custom function can also have an EqSort output type:
Collaborator

nit: this is not an example where a custom function has an EqSort output

@@ -13,7 +13,7 @@
(rule ((End a s)
(= s (getString pos)))
((P 1 pos a)
(union (B 1 pos a) (T a s))))
Collaborator

Should B be a constructor so that union would still work?

Collaborator Author

FTRobbin, Dec 3, 2024

union still works for functions whose output is an EqSort. I have added this case to the documentation.

@@ -27,8 +27,8 @@
(let t2p (f (f b2)))
(union t2 t2p)

(union (intersect a1 a2) a3)
(union (intersect b1 b2) b3)
(set (intersect a1 a2) a3)
Collaborator

this could just be union?

@@ -62,20 +62,20 @@
(function evals-to (Term) Value)

(rule ((= e (Val val)))
((union (evals-to e) val)))
Collaborator

Same here. It's an interesting choice to make evals-to a custom function instead of a constructor. This is a new pattern to me, but it seems to work.

Collaborator Author

This is indeed a new pattern. I'll explain more during the meeting.

4 participants