Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(tdigest): add tdigest algorithm and storage encoding implementations #2741

Merged
merged 52 commits into from
Feb 12, 2025

Conversation

LindaSummer
Copy link
Contributor

Issue

Proposal

Proposed Changes

  • add tdigest core algorithm support.
  • add encoding and decoding method as designed.
  • add core implementations for create, add and quantile.
  • add basic unit tests for create, add and quantile.

TODO List

  • add more unit tests for corner cases and benchmark.
  • find a more stable way to evaluate the error of quantile.
  • refactor the DummyCentroids to an iteration way for efficiency.
  • integrate the core implementations to commands and add integration tests.

@LindaSummer
Copy link
Contributor Author

LindaSummer commented Jan 25, 2025

Hi Team,

Sorry for delaying this feature's implementation.

I kept the commit log since we will squash merging.
Maybe it can help new-commers if they meet similar problems as me for a new feature. 😄

Please give me some suggestions and help take a review.

There are still many parts need to be improved, and I will be very happy to try my best for any improvement. 😊

Best Regards,
Edward

src/storage/storage.h Outdated Show resolved Hide resolved
src/types/tdigest.h Outdated Show resolved Hide resolved
src/types/tdigest.cc Outdated Show resolved Hide resolved
src/types/tdigest.cc Outdated Show resolved Hide resolved
src/types/redis_tdigest.cc Outdated Show resolved Hide resolved
src/types/redis_tdigest.cc Outdated Show resolved Hide resolved
@PragmaTwice PragmaTwice changed the title feat(tdigest): add tdigest core implementations. feat(tdigest): add tdigest core implementations Jan 26, 2025
@LindaSummer LindaSummer force-pushed the feature/tdigest-for-first-pr branch from 26d8348 to f0f2f3d Compare February 3, 2025 16:07
Copy link
Member

@mapleFU mapleFU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General LGTM

Comment on lines 352 to 353
uint64_t total_observations = 0;
uint64_t merge_times = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add a comment for that one

return iter_ != centroids_.cend();
}
bool Prev() {
if (Valid()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assume it's at end, what's it expected behavior here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mapleFU ,

At the beginning of tdigest centroids iteration, we have confirmed the value must be inside the range of (min_, max_).
So, if iteration reaches the cend(), it means that the precheck logic doesn't cover all cases or the centroids may be corrupted.

Best Regards,
Edward

src/types/redis_tdigest.cc Show resolved Hide resolved
Comment on lines 125 to 131
if (status.ok()) {
return {};
}

if (!status.IsNotFound()) {
return status;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we merge these two lines together?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mapleFU ,

Thanks for your suggestion! 😊

We need to return both status and the existence of same tdigest before creation.
I have refactored the std::optional with an explicit flag exists with docstring.

Best Regards,
Edward

src/types/redis_tdigest.cc Show resolved Hide resolved
src/types/redis_tdigest.cc Show resolved Hide resolved
@PragmaTwice PragmaTwice requested a review from mapleFU February 9, 2025 05:12
@LindaSummer
Copy link
Contributor Author

Hi @mapleFU ,

Sorry to interrupt you again. 😊
Could you help take a review for latest update?

Have a nice day!

Best Regards,
Edward

mapleFU
mapleFU previously approved these changes Feb 11, 2025
Copy link
Member

@mapleFU mapleFU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General LGTM. We can first move on and add a go unittests

}
return iter_ != centroids_.cend();
}
bool Prev() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment to explain the Prev() requires to be valid? This function call is ok to be but a bit weird. Since if the iter is the cend(), for std::vector, it's not "Valid" but it prev is valid?

I've read the caller in tdigest.cc, this interface is ok since it prevent to call Prev() if !Valid(), but this requires some comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mapleFU ,

Thanks for your suggestion! 😊
I have added comment for this function.

Best Regards,
Edward


namespace redis {

// It should be replaced by a iteration of the rocksdb iterator
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a todo and create an issue for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mapleFU ,

Got it!
I have added a TODO label for this comment and attach it to the tracking issue #2558 . 😊

Best Regards,
Edward

InternalKey ikey(key, storage_->IsSlotIdEncoded());
auto subkey = ikey.GetSubKey();
auto type_flg = static_cast<uint8_t>(SegmentType::kGuardFlag);
GetFixed8(&subkey, &type_flg);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we checked the return value of Get{} api here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mapleFU ,

Got it! 😊
I have added checking logic for the decode logic.

Best Regards,
Edward

@LindaSummer
Copy link
Contributor Author

General LGTM. We can first move on and add a go unittests

Hi @mapleFU ,

Thanks very much for your warm help and patience! 😊
Do I need to add command with go integration tests in this PR or create a new PR?

Best Regards,
Edward

@mapleFU
Copy link
Member

mapleFU commented Feb 12, 2025

Let's merge first and then add lots of go tests

Copy link

Quality Gate Failed Quality Gate failed

Failed conditions
2 Security Hotspots

See analysis details on SonarQube Cloud

@PragmaTwice PragmaTwice merged commit 54c5862 into apache:unstable Feb 12, 2025
35 of 36 checks passed
@PragmaTwice
Copy link
Member

Thank you for your contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants