-
Notifications
You must be signed in to change notification settings - Fork 2
feat: add proto converters and visitor, both incomplete atm #4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
namespace Substrait.Expression | ||
{ | ||
/// <summary> | ||
/// The FIELD_REFERENCE relational expression, <see cref="Protobuf.Expression.Types.FieldReference"/> | ||
/// </summary> | ||
public class FieldReference : RelExpression | ||
{ | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
using Substrait.Relation; | ||
|
||
namespace Substrait.Expression | ||
{ | ||
/// <summary> | ||
/// Base type for all relational expressions, <see cref="Protobuf.Expression"/> | ||
/// </summary> | ||
abstract public class RelExpression | ||
{ | ||
public virtual void Accept(SubstraitRelVisitor visitor) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should this be abstract? Also, is There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. +1 on separation of visitor for relations and expressions (Java implementation can serve as a model). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good point, separation would be more meaningful. If it is ok with you both, I'd like to fix this in a subsequent PR. |
||
{ | ||
} | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,154 @@ | ||
using Substrait.Expression; | ||
using Substrait.Protobuf; | ||
using Substrait.Relation; | ||
using System.Collections.Immutable; | ||
using ProtoRel = Substrait.Protobuf.Rel; | ||
using ProtoRex = Substrait.Protobuf.Expression; | ||
using Rel = Substrait.Relation.Rel; | ||
|
||
namespace Substrait.Core | ||
{ | ||
/// <summary> | ||
/// Utility to convert Substrait protobuf messages to/from charp domain-objects. | ||
/// </summary> | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you please more details on the motivation to add a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. One very useful feature is that
This can be used, for example, to do a depth-first traversal of the relations tree. However, in the protobuf objects, there is no such property. For example, a join relation has There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok. |
||
public class ProtoConverter | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. One possible recommendation for the Said that, it might still be fine to push this PR since we are in very early stages of the library development (I'd like to know what @westonpace thinks too) but maybe add some comments in the files with a TODO/warning note so there is a clear understanding that translation has some significant gaps (as it is already pointed out in the PR title)? Thoughts? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think a comment, or even throwing a I've found that is quite quick to get to 80% coverage and then the last 20% ends up being quite tricky. For example, user defined types and extension relations will probably require more thought. Going operator by operator can be challenging because, for example, a project relation needs an expression which could be anything (and, as mentioned, user defined type expressions might come quite a bit later). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was not implying that conversion should be complete (maybe I did not express that clearly enough); as you mention, that final 20% is usually hard and would slow down the whole development. I was thinking more about making translation less arbitrary and support boundaries a bit more clear, which you also referred to. But it's not a very strong opinion, overall I'm happy that we continue making progress one way or another. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jcamachor, I see value in your suggestion, particularly from tracking and reviews. Anyone who checks out this commit would know that what features will work. I started it like that with just @westonpace, as you said my intention is to gather early feedback and let others jump in if feasible. Thanks both ! |
||
{ | ||
/// <summary> | ||
/// Create domain object from a protobuf encoded message | ||
/// </summary> | ||
/// <param name="protoRel">Relation encoded as protobuf message</param> | ||
/// <returns><see cref="Rel"/> corresponding to message's relation</returns> | ||
public Rel ToRel(ProtoRel protoRel) | ||
{ | ||
var relType = protoRel.RelTypeCase; | ||
|
||
return relType switch | ||
{ | ||
ProtoRel.RelTypeOneofCase.Aggregate => CreateAgg(protoRel.Aggregate), | ||
ProtoRel.RelTypeOneofCase.Cross => CreateCross(protoRel.Cross), | ||
ProtoRel.RelTypeOneofCase.Fetch => CreateFetch(protoRel.Fetch), | ||
ProtoRel.RelTypeOneofCase.Filter => CreateFilter(protoRel.Filter), | ||
ProtoRel.RelTypeOneofCase.Join => CreateJoin(protoRel.Join), | ||
ProtoRel.RelTypeOneofCase.Project => CreateProject(protoRel.Project), | ||
ProtoRel.RelTypeOneofCase.Read => CreateRead(protoRel.Read), | ||
ProtoRel.RelTypeOneofCase.Sort => CreateSort(protoRel.Sort), | ||
_ => throw new NotImplementedException(relType.ToString()), | ||
}; | ||
} | ||
|
||
/// <summary> | ||
/// Create domain object specific to relational expression type from protobuf message | ||
/// </summary> | ||
/// <param name="protoRex">Relational expression encoded as protobuf message</param> | ||
/// <returns><see cref="Rel"/> corresponding to message's expression type</returns> | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This returns a |
||
private RelExpression ToExpression(ProtoRex protoRex) | ||
{ | ||
var rexType = protoRex.RexTypeCase; | ||
|
||
return rexType switch | ||
{ | ||
ProtoRex.RexTypeOneofCase.Selection => CreateFieldReference(protoRex.Selection), | ||
ProtoRex.RexTypeOneofCase.None => throw new NotImplementedException(), | ||
ProtoRex.RexTypeOneofCase.Literal => throw new NotImplementedException(), | ||
ProtoRex.RexTypeOneofCase.ScalarFunction => throw new NotImplementedException(), | ||
ProtoRex.RexTypeOneofCase.WindowFunction => throw new NotImplementedException(), | ||
ProtoRex.RexTypeOneofCase.IfThen => throw new NotImplementedException(), | ||
ProtoRex.RexTypeOneofCase.SwitchExpression => throw new NotImplementedException(), | ||
ProtoRex.RexTypeOneofCase.SingularOrList => throw new NotImplementedException(), | ||
ProtoRex.RexTypeOneofCase.MultiOrList => throw new NotImplementedException(), | ||
ProtoRex.RexTypeOneofCase.Cast => throw new NotImplementedException(), | ||
ProtoRex.RexTypeOneofCase.Subquery => throw new NotImplementedException(), | ||
ProtoRex.RexTypeOneofCase.Nested => throw new NotImplementedException(), | ||
ProtoRex.RexTypeOneofCase.Enum => throw new NotImplementedException(), | ||
_ => throw new NotImplementedException(rexType.ToString()), | ||
}; | ||
} | ||
|
||
private Rel CreateAgg(AggregateRel agg) | ||
{ | ||
var input = ToRel(agg.Input); | ||
var inputs = ImmutableList.Create(input); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It seems there are important parts missing in the translation, e.g., grouping columns or aggregates. Same comment for other operators below (e.g., expressions for join, filter, or project, different type of read rels, etc.) |
||
return new Aggregate() | ||
{ | ||
Inputs = inputs, | ||
}; | ||
} | ||
|
||
private Rel CreateCross(CrossRel cross) | ||
{ | ||
var left = ToRel(cross.Left); | ||
var right = ToRel(cross.Right); | ||
var inputs = ImmutableList.Create(left, right); | ||
return new Cross() | ||
{ | ||
Inputs = inputs, | ||
}; | ||
} | ||
|
||
private Rel CreateFetch(FetchRel fetch) | ||
{ | ||
var input = ToRel(fetch.Input); | ||
var inputs = ImmutableList.Create(input); | ||
return new Fetch() | ||
{ | ||
Count = fetch.Count, | ||
Inputs = inputs, | ||
}; | ||
} | ||
|
||
private Rel CreateFilter(FilterRel filter) | ||
{ | ||
var input = ToRel(filter.Input); | ||
var inputs = ImmutableList.Create(input); | ||
return new Filter() | ||
{ | ||
Inputs = inputs, | ||
}; | ||
} | ||
|
||
private Rel CreateJoin(JoinRel join) | ||
{ | ||
var left = ToRel(join.Left); | ||
var right = ToRel(join.Right); | ||
var _ = ToExpression(join.Expression); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This seems a touch arbitrary (e.g. why grab join expression and not the join filter? why not grab a filter's filter expression?) but there's probably no reason to get hung up on it. |
||
var inputs = ImmutableList.Create(left, right); | ||
return new Join() | ||
{ | ||
Inputs = inputs, | ||
}; | ||
} | ||
|
||
private Rel CreateProject(ProjectRel project) | ||
{ | ||
var input = ToRel(project.Input); | ||
var inputs = ImmutableList.Create(input); | ||
return new Project() | ||
{ | ||
Inputs = inputs, | ||
}; | ||
} | ||
|
||
private Rel CreateRead(ReadRel read) | ||
{ | ||
return new Read() | ||
{ | ||
Inputs = ImmutableList<Rel>.Empty, | ||
}; | ||
} | ||
|
||
private Rel CreateSort(SortRel sort) | ||
{ | ||
var input = ToRel(sort.Input); | ||
var inputs = ImmutableList.Create(input); | ||
return new Sort() | ||
{ | ||
Inputs = inputs, | ||
}; | ||
} | ||
|
||
private RelExpression CreateFieldReference(ProtoRex.Types.FieldReference field) | ||
{ | ||
return new FieldReference(); | ||
} | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
namespace Substrait.Relation | ||
{ | ||
/// <summary> | ||
/// The CROSS relational operator representing cartesian product, <see cref="Protobuf.CrossRel"/> | ||
/// </summary> | ||
public class Cross : Rel | ||
{ | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,5 +5,6 @@ | |
/// </summary> | ||
public class Fetch : Rel | ||
{ | ||
public long Count { get; init; } | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,9 +1,65 @@ | ||
namespace Substrait.Relation | ||
using Substrait.Expression; | ||
using Substrait.Protobuf; | ||
|
||
namespace Substrait.Relation | ||
{ | ||
/// <summary> | ||
/// The binary JOIN relational operator, <see cref="Protobuf.JoinRel"/> | ||
/// </summary> | ||
public class Join : Rel | ||
{ | ||
public RelExpression? Condition { get; init; } | ||
public RelExpression? PostJoinFilter { get; init; } | ||
public JoinType JoinType { get; init; } = JoinType.UNKNOWN; | ||
|
||
public override void Accept(SubstraitRelVisitor visitor) | ||
{ | ||
base.Accept(visitor); | ||
Condition?.Accept(visitor); | ||
PostJoinFilter?.Accept(visitor); | ||
} | ||
} | ||
|
||
public enum JoinType | ||
{ | ||
/// <summary> | ||
/// <see cref="JoinRel.Types.JoinType.Anti"/> | ||
/// </summary> | ||
ANTI, | ||
|
||
/// <summary> | ||
/// <see cref="JoinRel.Types.JoinType.Inner"/> | ||
/// </summary> | ||
INNER, | ||
|
||
/// <summary> | ||
/// <see cref="JoinRel.Types.JoinType.Left"/> | ||
/// </summary> | ||
LEFT, | ||
|
||
/// <summary> | ||
/// <see cref="JoinRel.Types.JoinType.Outer"/> | ||
/// </summary> | ||
OUTER, | ||
|
||
/// <summary> | ||
/// <see cref="JoinRel.Types.JoinType.Right"/> | ||
/// </summary> | ||
RIGHT, | ||
|
||
/// <summary> | ||
/// <see cref="JoinRel.Types.JoinType.Semi"/> | ||
/// </summary> | ||
SEMI, | ||
|
||
/// <summary> | ||
/// <see cref="JoinRel.Types.JoinType.Single"/> | ||
/// </summary> | ||
SINGLE, | ||
|
||
/// <summary> | ||
/// <see cref="JoinRel.Types.JoinType.Unspecified"/> | ||
/// </summary> | ||
UNKNOWN | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,4 @@ | ||
using Substrait.Relation; | ||
|
||
namespace Substrait.Core | ||
namespace Substrait.Relation | ||
{ | ||
/// <summary> | ||
/// Visitor to transform, compile, and/or process SQL logical operators represented using Substrait. The visitor has | ||
|
@@ -99,7 +97,7 @@ public void Visit(Sort sort) | |
Fallback(sort); | ||
} | ||
|
||
public void Fallback(Rel _) | ||
protected void Fallback(object _) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Shouldn't There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I made it type independent since |
||
{ | ||
throw new InvalidOperationException(); | ||
} | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
using Substrait.Core; | ||
using Substrait.Relation; | ||
using ProtoRel = Substrait.Protobuf.Rel; | ||
|
||
namespace Substrait.Lineage | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm...I'm maybe a little confused here. What is the intent of this project? Isn't lineage an implicit property of It seems a bit odd that "grabbing lineage for one relation" is a |
||
{ | ||
/// <summary> | ||
/// A concrete implementation of <see cref="SubstraitRelVisitor"/> for processing Substrait relations and extracting | ||
/// lineage information. | ||
/// </summary> | ||
public class LineageExtractorRelVisitor : SubstraitRelVisitor | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this class really needed? I would think the normal behavior for a system would be to do something like:
In other words, I would expect to convert at the boundaries, then visiting the converted plan as needed. Instead this appears to convert each time I need to visit something. |
||
{ | ||
public void Execute(ProtoRel protoRel) | ||
{ | ||
ProtoConverter converter = new(); | ||
var rel = converter.ToRel(protoRel); | ||
rel.Accept(this); | ||
} | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
<Project Sdk="Microsoft.NET.Sdk"> | ||
|
||
<PropertyGroup> | ||
<TargetFramework>net6.0</TargetFramework> | ||
<ImplicitUsings>enable</ImplicitUsings> | ||
<Nullable>enable</Nullable> | ||
</PropertyGroup> | ||
|
||
<ItemGroup> | ||
<ProjectReference Include="..\Substrait.Core\Substrait.Core.csproj" /> | ||
</ItemGroup> | ||
|
||
</Project> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
using Substrait.Core; | ||
using Substrait.Protobuf; | ||
|
||
namespace Substrait.Relation.Tests | ||
{ | ||
[TestClass()] | ||
public class ProtoConverterTests | ||
{ | ||
[TestMethod()] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nit: |
||
public void ToRel_FetchRel_Test() | ||
{ | ||
FetchRel fetchRel = new FetchRel(); | ||
fetchRel.Count = 1; // limit 1 | ||
var rel = new Protobuf.Rel() | ||
{ | ||
Fetch = fetchRel | ||
}; | ||
|
||
fetchRel.Input = rel; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This test fails with stack overflow due to this. |
||
|
||
ProtoConverter converter = new(); | ||
var result = converter.ToRel(rel) as Fetch; | ||
Assert.IsNotNull(result); | ||
Assert.AreEqual(1, result.Count); | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why
RelExpression
and not justExpression
? An expression is not an instance or a specialization of aRel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry for the confusion. I called it so to mean
relational expression
. I didn't useExpression
orRex
because it was hurting code readability. Like my earlier comment, I can fix it later next week or in a subsequent PR if this one is merged. Thanks !