IKKBZ Join ordering #1330
base: master
/cc @joka921
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1330      +/-   ##
==========================================
+ Coverage   89.07%   89.08%   +0.01%
==========================================
  Files         369      380       +11
  Lines       34251    34739      +488
  Branches     3870     3979      +109
==========================================
+ Hits        30509    30948      +439
- Misses       2480     2494       +14
- Partials     1262     1297       +35
Okay, to improve this design I would suggest the following first step:
- Get the combining of nodes right using the following design:
class Relation {
  // ID, selectivity, cardinality
};
// Created by the normalize step.
class CombinedRelations {
  std::vector<Relation> relations_;  // The contained relations.
  double rank() const;               // Compute the rank.
};
// Called for two consecutive elements in a chain whose ranks are out of order.
CombinedRelations combine(const CombinedRelations& first, const CombinedRelations& second);
// Run IKKBZ-Normalize on a single chain until it is fully normalized.
void IkkbzNormalize(std::vector<CombinedRelations>& chain);
// Given the above structure, this is now simple: just merge by rank.
std::vector<CombinedRelations> IkkbzMerge(std::vector<std::vector<CombinedRelations>> chains);
Maybe start this design in a separate PR, and start with unit tests for those rather small functions.
Then you can incorporate those structures into a much simpler IKKBZ graph, but maybe let me know first,
so that we can talk about the next steps.
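A compilable sketch of the interface above, assuming the C_out-based ASI quantities from the lecture: for a single relation with cardinality n and selectivity s of the edge to its parent in the rooted query tree, T = C = s·n, and for a sequence S1 S2, T = T(S1)·T(S2) and C = C(S1) + T(S1)·C(S2). The function names follow the comment; the members relations_, t_, c_ and the accessors are hypothetical, not taken from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical relation record: an ID, the cardinality n, and the selectivity s
// of the edge that connects the relation to its parent in the rooted tree.
struct Relation {
  std::size_t id;
  double cardinality;
  double selectivity;
};

// A sequence of relations that IKKBZ has glued into a single chain node,
// together with the C_out quantities T and C of the whole sequence.
class CombinedRelations {
 public:
  explicit CombinedRelations(const Relation& r)
      : relations_{r},
        t_(r.selectivity * r.cardinality),
        c_(r.selectivity * r.cardinality) {}

  const std::vector<Relation>& relations() const { return relations_; }
  double t() const { return t_; }
  double c() const { return c_; }

  // rank(S) = (T(S) - 1) / C(S); assumes C(S) > 0.
  double rank() const {
    assert(c_ > 0.0);
    return (t_ - 1.0) / c_;
  }

  friend CombinedRelations combine(const CombinedRelations& first,
                                   const CombinedRelations& second);

 private:
  CombinedRelations() = default;  // Only used by combine().
  std::vector<Relation> relations_;
  double t_ = 1.0;
  double c_ = 0.0;
};

// Concatenates two consecutive chain nodes:
// T(S1 S2) = T(S1) * T(S2) and C(S1 S2) = C(S1) + T(S1) * C(S2).
CombinedRelations combine(const CombinedRelations& first,
                          const CombinedRelations& second) {
  CombinedRelations result;
  result.relations_ = first.relations_;
  result.relations_.insert(result.relations_.end(), second.relations_.begin(),
                           second.relations_.end());
  result.t_ = first.t_ * second.t_;
  result.c_ = first.c_ + first.t_ * second.c_;
  return result;
}

// Combines neighbours whose ranks are out of order until the ranks along the
// chain are non-decreasing.
void IkkbzNormalize(std::vector<CombinedRelations>& chain) {
  std::vector<CombinedRelations> out;
  for (CombinedRelations& node : chain) {
    out.push_back(std::move(node));
    // Merge backwards while the last two nodes are out of rank order.
    while (out.size() >= 2 && out[out.size() - 2].rank() > out.back().rank()) {
      CombinedRelations merged = combine(out[out.size() - 2], out.back());
      out.pop_back();
      out.back() = std::move(merged);
    }
  }
  chain = std::move(out);
}

// Merges chains that are each sorted by rank into one sequence sorted by rank.
std::vector<CombinedRelations> IkkbzMerge(
    std::vector<std::vector<CombinedRelations>> chains) {
  std::vector<CombinedRelations> result;
  std::vector<std::size_t> pos(chains.size(), 0);
  while (true) {
    std::size_t best = chains.size();  // "No candidate found yet."
    for (std::size_t i = 0; i < chains.size(); ++i) {
      if (pos[i] < chains[i].size() &&
          (best == chains.size() ||
           chains[i][pos[i]].rank() < chains[best][pos[best]].rank())) {
        best = i;
      }
    }
    if (best == chains.size()) break;  // Every chain is exhausted.
    result.push_back(std::move(chains[best][pos[best]]));
    ++pos[best];
  }
  return result;
}
```

Normalization here is a single left-to-right pass with a stack that merges backwards while the last two ranks decrease, so IkkbzMerge can assume that every input chain is already sorted by rank. Each of the three functions is small enough to get its own unit test with numbers from the exercise slides.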
src/engine/joinOrdering/QueryGraph.h
std::map<N, std::map<N, RJoin>> r;
std::map<N, std::vector<N>> hist;
std::map<N, int> cardinality;
std::map<N, float> selectivity;
I think this is much cleaner if
- the RJoin class is called EdgeInfo,
- the r member is called edges_ and stores std::pair<N, EdgeInfo>,
- you use std::unordered_map, or better ad_utility::HashMap, instead of std::map.
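A minimal sketch of the suggested layout, with std::unordered_map standing in for ad_utility::HashMap so that it compiles on its own. It reads "stores std::pair<N, EdgeInfo>" as an adjacency list per node; that reading, the selectivity field in EdgeInfo, and the addEdge/neighbours helpers are assumptions.

```cpp
#include <cassert>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical replacement for RJoin: what is known about one join edge.
struct EdgeInfo {
  double selectivity = 1.0;
};

// Hypothetical sketch of the suggested QueryGraph layout; N must be usable
// as a key of std::unordered_map (hashable and equality-comparable).
template <typename N>
class QueryGraph {
 public:
  // Adds the undirected join edge a -- b.
  void addEdge(const N& a, const N& b, EdgeInfo info) {
    assert(a != b);
    edges_[a].emplace_back(b, info);
    edges_[b].emplace_back(a, info);
  }

  // All neighbours of `node` together with the information on the edge.
  const std::vector<std::pair<N, EdgeInfo>>& neighbours(const N& node) const {
    static const std::vector<std::pair<N, EdgeInfo>> empty;
    auto it = edges_.find(node);
    return it == edges_.end() ? empty : it->second;
  }

 private:
  // Node -> list of (neighbour, edge info).
  std::unordered_map<N, std::vector<std::pair<N, EdgeInfo>>> edges_;
};
```

Storing a vector of pairs instead of a nested map keeps the neighbours of a node contiguous, which suits the traversals IKKBZ does when it roots the tree at each relation in turn.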
You currently have many copies of the N object. Maybe just store all the relations in one place, and then only pass around references or pointers (but that can be left open for a later optimization).
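A minimal sketch of storing each relation exactly once, assuming a hypothetical RelationStore that hands out integer IDs; the graph and the IKKBZ bookkeeping would then key on those IDs instead of copying N.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical relation record; stored exactly once in RelationStore.
struct Relation {
  std::string name;
  double cardinality;
};

// Owns every relation; everything else refers to them by a small index.
class RelationStore {
 public:
  using Id = std::size_t;

  Id add(Relation r) {
    relations_.push_back(std::move(r));
    return relations_.size() - 1;
  }

  // Read-only access; callers pass the result around as const&.
  const Relation& get(Id id) const { return relations_.at(id); }
  std::size_t size() const { return relations_.size(); }

 private:
  std::vector<Relation> relations_;
};
```

Indices rather than raw pointers are used here because pointers into a std::vector are invalidated when it reallocates on push_back.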
What is the full type signature of edges_?
// Mahmoud Khalaf (2024-, [email protected])

#pragma once
I have moved the "merge" out of the query graph. Yet I am a little puzzled by this CombinedRelations.
Okay, a combined relation is a list of relations that have been combined by IKKBZ because their ranks were out of order (they are combined into a single node when forming the chains that are ordered by rank).
Does that answer your question?
So it is, for example, the two relations "first R3 and then R5" combined into a single graph node, which has one rank that can be computed from the cardinalities and selectivities of R3 and R5.
There are examples for this operation in the lecture and exercise slides, so you can use those as unit tests.
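Written out for the R3/R5 example, assuming the C_out-based ASI formulation from the lecture (n_i is the cardinality of R_i and s_i the selectivity of the edge from R_i to its parent in the rooted query tree; this notation is an assumption, not something defined in this PR):

$$
T(R_i) = C(R_i) = s_i n_i, \qquad \operatorname{rank}(S) = \frac{T(S) - 1}{C(S)}
$$

$$
T(S_1 S_2) = T(S_1)\,T(S_2), \qquad C(S_1 S_2) = C(S_1) + T(S_1)\,C(S_2)
$$

$$
\operatorname{rank}(R_3 R_5) = \frac{s_3 n_3 \, s_5 n_5 - 1}{s_3 n_3 + s_3 n_3 \, s_5 n_5}
= \frac{C(R_3)\operatorname{rank}(R_3) + T(R_3)\,C(R_5)\operatorname{rank}(R_5)}{C(R_3) + T(R_3)\,C(R_5)}
$$

The second form shows that, for positive T and C, the combined rank is a weighted average of rank(R3) and rank(R5), so the combined node always lands between the two ranks it replaces.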
I think this is the thread that is worth pursuing for now so we can build up a correct and maintainable implementation of IKKBZ.
My argument for keeping the relation as a pure data class is that it is much more convenient to pass it around as const& all over the place, without mutating it, while doing all the bookkeeping necessary to calculate the rank on the QueryGraph.
rm CostASI decouple cost function connection weight GOO draft memorize rank
60a2716 to 6215742
pre pair hist subchain root exclude hist of pairs mem C, T and rank unpack tests
Quality Gate passed
Conformance check passed ✅ No test result changes.
Finds the optimal left-deep tree for an acyclic query graph in polynomial time; this is a necessary prerequisite for search-space linearization.
The cost function has a slight implementation mistake due to a misunderstanding on my part, which in turn slightly affects the relation rank. I will fix that ASAP.
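A small exhaustive baseline could help check both the cost function and the optimality claim in unit tests. This sketch assumes C_out (the sum of all intermediate result sizes of a left-deep order) as the cost function and only allows orders without cross products; Edge, costOut and bruteForceBestCost are hypothetical names. It enumerates all n! orders, so it is only meant for tiny test queries whose optimum can then be compared with the IKKBZ result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

// Hypothetical input: a join edge between relations a and b.
struct Edge {
  std::size_t a, b;
  double selectivity;
};

// C_out of a left-deep order: sum of |R_o0 join ... join R_ok| for k >= 1.
// Returns +infinity if a relation is not connected to the relations joined
// before it (that step would need a cross product).
double costOut(const std::vector<std::size_t>& order,
               const std::vector<double>& card, const std::vector<Edge>& edges) {
  assert(!order.empty());
  std::vector<bool> joined(card.size(), false);
  double size = card[order[0]];
  joined[order[0]] = true;
  double cost = 0.0;
  for (std::size_t k = 1; k < order.size(); ++k) {
    const std::size_t r = order[k];
    double factor = card[r];
    bool connected = false;
    // Apply the selectivity of every edge from r into the current prefix.
    for (const Edge& e : edges) {
      if ((e.a == r && joined[e.b]) || (e.b == r && joined[e.a])) {
        factor *= e.selectivity;
        connected = true;
      }
    }
    if (!connected) return std::numeric_limits<double>::infinity();
    size *= factor;
    joined[r] = true;
    cost += size;
  }
  return cost;
}

// Exhaustive search over all orders; only usable for tiny test queries.
double bruteForceBestCost(const std::vector<double>& card,
                          const std::vector<Edge>& edges) {
  std::vector<std::size_t> order(card.size());
  std::iota(order.begin(), order.end(), 0);
  double best = std::numeric_limits<double>::infinity();
  do {
    best = std::min(best, costOut(order, card, edges));
  } while (std::next_permutation(order.begin(), order.end()));
  return best;
}
```

For the chain R0 - R1 - R2 with cardinalities 10, 20, 30 and selectivities 0.1 and 0.01, the orders starting with R1 R2 cost 6 + 6 = 12, while R0 R1 R2 costs 20 + 6 = 26, which gives concrete numbers to compare a fixed cost function against.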