Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IKKBZ Join ordering #1330

Open
wants to merge 51 commits into
base: master
Choose a base branch
from
Open

Conversation

Goblin80
Copy link
Collaborator

@Goblin80 Goblin80 commented Apr 21, 2024

Finds optimal left-deep tree for an acyclic graph in polynomial time and a necessary prerequisite for search-space linearization.

the cost function has a slight implementation mistake due to a misunderstanding on my behalf, which subsequently has a slight effect on the relation rank. i will fix that ASAP.

@Goblin80
Copy link
Collaborator Author

/cc @joka921

Copy link

codecov bot commented Apr 21, 2024

Codecov Report

Attention: Patch coverage is 89.34426% with 52 lines in your changes missing coverage. Please review.

Project coverage is 89.08%. Comparing base (d60b610) to head (27f3269).
Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
src/engine/joinOrdering/QueryGraph.cpp 85.84% 7 Missing and 8 partials ⚠️
src/engine/joinOrdering/JoinTree.cpp 80.55% 4 Missing and 10 partials ⚠️
src/engine/joinOrdering/CostCout.cpp 83.33% 2 Missing and 9 partials ⚠️
src/engine/joinOrdering/IKKBZ.cpp 95.45% 0 Missing and 6 partials ⚠️
src/engine/joinOrdering/CostIKKBZ.cpp 85.71% 0 Missing and 4 partials ⚠️
src/engine/joinOrdering/JoinNode.cpp 94.73% 0 Missing and 1 partial ⚠️
src/engine/joinOrdering/LinearizedDP.cpp 96.96% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1330      +/-   ##
==========================================
+ Coverage   89.07%   89.08%   +0.01%     
==========================================
  Files         369      380      +11     
  Lines       34251    34739     +488     
  Branches     3870     3979     +109     
==========================================
+ Hits        30509    30948     +439     
- Misses       2480     2494      +14     
- Partials     1262     1297      +35     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

src/engine/JoinTree.hpp Outdated Show resolved Hide resolved
@Goblin80 Goblin80 requested a review from joka921 April 27, 2024 21:54
Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, to improve this design I would suggest the following first step:

  1. Get the combining of nodes right using the follwing design:
class Relation {
 // ID, selectivity, cardinality
};

// created by the NormalizeStep.
class CombinedRelations {
  std::vector<Relation>; // contained relations.
  double rank() ; // compute the rank.
}

// Called for two consecutive elements in a chain where the rank is out of order
CombinedRelations combine(const CombinedRelations& first, const CombinedRelations& second);


// Run IKKBZ-Normalize on a single chain until it is fully normalized.
void IkkbzNormalize(std::vector<CombinedRelations>& chain);

// Givien the above structure, this is now simple, just merge by rank.
std::vector<CombinedRelations> IkkbzMerge(std::vector<std::vector<CombinedRelations>>);

Maybe start this design in a separate PR, and then star with unit tests for those rather small functions.

Then in a much simpler IKKBZ graph, you can incorporate those structures, but maybe let me know first,
so that we can talk about the next steps.

Comment on lines 63 to 66
std::map<N, std::map<N, RJoin>> r;
std::map<N, std::vector<N>> hist;
std::map<N, int> cardinality;
std::map<N, float> selectivity;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is much cleaner if

  1. The RJoin class is called EdgeInfo.
  2. the r is called edges_ and stores std::pair<N, EdgeInfo>`.
  3. Use std:unordered_map or better ad_utility::HashMap instead of std::map.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You currently have many copies of the N object. Maybe just store all the relations in one place, and then
only pass around references or pointers (but that can be left open for a later optimization).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what the full type signature of edges_?

src/engine/joinOrdering/QueryGraph.h Outdated Show resolved Hide resolved
src/engine/joinOrdering/QueryGraph.h Outdated Show resolved Hide resolved
src/engine/joinOrdering/QueryGraph.cpp Outdated Show resolved Hide resolved
src/engine/joinOrdering/QueryGraph.cpp Show resolved Hide resolved
src/engine/joinOrdering/QueryGraph.cpp Outdated Show resolved Hide resolved
src/engine/joinOrdering/QueryGraph.cpp Outdated Show resolved Hide resolved
src/engine/joinOrdering/IKKBZ.cpp Show resolved Hide resolved
src/engine/joinOrdering/QueryGraph.cpp Outdated Show resolved Hide resolved
// Mahmoud Khalaf (2024-, [email protected])

#pragma once

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, to improve this design I would suggest the following first step:

  1. Get the combining of nodes right using the follwing design:
class Relation {
 // ID, selectivity, cardinality
};

// created by the NormalizeStep.
class CombinedRelations {
  std::vector<Relation>; // contained relations.
  double rank() ; // compute the rank.
}

// Called for two consecutive elements in a chain where the rank is out of order
CombinedRelations combine(const CombinedRelations& first, const CombinedRelations& second);


// Run IKKBZ-Normalize on a single chain until it is fully normalized.
void IkkbzNormalize(std::vector<CombinedRelations>& chain);

// Givien the above structure, this is now simple, just merge by rank.
std::vector<CombinedRelations> IkkbzMerge(std::vector<std::vector<CombinedRelations>>);

Maybe start this design in a separate PR, and then star with unit tests for those rather small functions.

Then in a much simpler IKKBZ graph, you can incorporate those structures, but maybe let me know first,
so that we can talk about the next steps.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have moved the "merge" out of the query graph. yet I am a little puzzled by this CombinedRelations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, a combined relation
is a list of relations which have been combined by IKKBZ, because their ranks were out of ORDER
(they are combined into a single node when forming the chains that are ordered by rank).
Does that answer my question?.

So it is for example the two relations "first R3 and then R5" combined into a single graph node which has one rank which can be computed from the cardinalities and selectivities of R3 and R5.
There's examples for this operation in the lecture and exercise slides, so you can use those as unit tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the thread that is worth pursuing for now so we can build up a correct and maintainable implementation of IKKBZ.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my argument for keeping the relation as a pure data class is that is way more convenient to pass it around as const& all over that place without mutating it while doing the all the book-keeping necessary to calculate the rank on the QueryGraph.

@Goblin80 Goblin80 requested a review from joka921 April 30, 2024 01:29
@Goblin80 Goblin80 marked this pull request as draft April 30, 2024 20:03
@Goblin80 Goblin80 marked this pull request as ready for review May 1, 2024 17:45
rm CostASI

decouple cost function

connection weight

GOO draft

memorize rank
@Goblin80 Goblin80 force-pushed the joinorder-ikkbz branch 3 times, most recently from 60a2716 to 6215742 Compare May 6, 2024 05:49
@Goblin80 Goblin80 force-pushed the joinorder-ikkbz branch from 6215742 to 6f4ff38 Compare May 6, 2024 23:24
pre pair hist

subchain root exclude

hist of pairs

mem C, T and rank

unpack tests
Copy link

sonarqubecloud bot commented May 8, 2024

Quality Gate Passed Quality Gate passed

Issues
63 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@sparql-conformance
Copy link

Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants