feat: a new strategy for auto compaction (ospp 2024) #2816
Conversation
Actionable comments posted: 1
Outside diff range, codebase verification and nitpick comments (1)
src/storage/src/redis.h (1)
109-111: Issues found in the usage of LongestNotCompactiontSstCompact.
The function LongestNotCompactiontSstCompact is defined with three parameters in src/storage/src/redis.h. However, its usage in the following files does not match the definition, as they are missing the third parameter:
- src/pika_db.cc
- src/pika_server.cc
Please update the function calls to match the definition.
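Since the review notes that the definition takes a DataType, a ColumnFamilyType, and a vector of Status results, one low-friction fix is to default the result parameter so the existing two-argument call sites keep compiling. A minimal sketch, assuming those parameter types; the PR's actual signature may differ:

// Hypothetical fix sketch; parameter names are assumed, not taken from the PR.
virtual rocksdb::Status LongestNotCompactiontSstCompact(
    const storage::DataType& type,
    const ColumnFamilyType& cf_type,
    std::vector<rocksdb::Status>* compact_result_vec = nullptr);  // defaulted third parameter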
Analysis chain
LGTM! But verify the function usage in the codebase.
The LongestNotCompactiontSstCompact method is designed to accept a DataType option and a vector of Status objects to store results. The code changes are approved. However, ensure that all function calls to LongestNotCompactiontSstCompact are correctly implemented.
Verify the function usage of LongestNotCompactiontSstCompact to ensure consistency with its definition.
The function LongestNotCompactiontSstCompact is used in multiple files, but there seems to be a discrepancy in the number of parameters passed during its usage. Verify that all calls to this function match its definition.
Scripts executed
The following scripts were executed for the analysis:
Script:

#!/bin/bash
# Description: Verify all function calls to `LongestNotCompactiontSstCompact` are correctly implemented.
# Test: Search for the function usage. Expect: Correct implementation.
rg --type cpp -A 5 $'LongestNotCompactiontSstCompact'

Length of output: 5488
Script:

#!/bin/bash
# Description: Verify the function definition and its usage to ensure consistency in parameters.

# Search for the function definition in header files
rg --type cpp --files-with-matches --glob '*redis.h' 'LongestNotCompactiontSstCompact' | xargs rg --type cpp -A 5 'LongestNotCompactiontSstCompact'

# Search for the function usage in source files
rg --type cpp --files-with-matches --glob '*pika_db.cc' 'LongestNotCompactiontSstCompact' | xargs rg --type cpp -A 5 'LongestNotCompactiontSstCompact'
rg --type cpp --files-with-matches --glob '*pika_server.cc' 'LongestNotCompactiontSstCompact' | xargs rg --type cpp -A 5 'LongestNotCompactiontSstCompact'
rg --type cpp --files-with-matches --glob '*storage.cc' 'LongestNotCompactiontSstCompact' | xargs rg --type cpp -A 5 'LongestNotCompactiontSstCompact'

Length of output: 2090
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (12)
- conf/pika.conf (1 hunks)
- conf/s_pika.conf (1 hunks)
- include/pika_conf.h (3 hunks)
- include/pika_db.h (1 hunks)
- include/pika_server.h (1 hunks)
- src/pika_conf.cc (3 hunks)
- src/pika_db.cc (1 hunks)
- src/pika_server.cc (4 hunks)
- src/storage/include/storage/storage.h (4 hunks)
- src/storage/src/redis.cc (2 hunks)
- src/storage/src/redis.h (2 hunks)
- src/storage/src/storage.cc (5 hunks)
Additional comments not posted (99)
conf/s_pika.conf (54)
3-3: Specify the port number.
Ensure that the port number 9221 does not conflict with other services running on the same machine.
4-4: Optimize thread usage.
The thread-num is set to 8. Ensure this value is optimal for your server's hardware capabilities.
5-5: Ensure log path exists.
Verify that the log-path directory ./log/ exists and is writable by the Pika process.
7-7: Ensure database path exists.
Verify that the db-path directory ./db/ exists and is writable by the Pika process.
8-8: Optimize write buffer size.
The write-buffer-size is set to 256M. Ensure this value is optimal for your workload and server memory.
9-9: Review timeout setting.
The timeout is set to 30 seconds. Ensure this value is appropriate for your use case.
15-15: Review dump expiration setting.
The dump-expire is set to 1 day. Ensure this value is appropriate for your data retention policy.
18-18: Ensure dump path exists.
Verify that the dump-path directory ./dump/ exists and is writable by the Pika process.
20-20: Review target file size base.
The target-file-size-base is set to 20971520 bytes (20 MB). Ensure this value is optimal for your workload.
21-21: Review log expiration setting.
The expire-logs-days is set to 7 days. Ensure this value is appropriate for your log retention policy.
22-22: Review log expiration number setting.
The expire-logs-nums is set to 300. Ensure this value is appropriate for your log retention policy.
23-23: Review root connection number.
The root-connection-num is set to 10. Ensure this value is appropriate for your use case.
24-24: Review slow log threshold.
The slowlog-log-slower-than is set to 100000 microseconds (100 ms). Ensure this value is appropriate for your performance monitoring.
25-25: Review binlog file size.
The binlog-file-size is set to 104857600 bytes (100 MB). Ensure this value is optimal for your workload.
26-26: Review compression setting.
The compression is set to snappy. Ensure this compression algorithm is appropriate for your use case.
28-28: Review DB sync speed.
The db-sync-speed is set to 60 MB/s. Ensure this value is optimal for your network and workload.
30-30: Review small compaction threshold.
The small-compaction-threshold is set to 5000. Ensure this value is optimal for your workload.
31-31: Review max write buffer size.
The max-write-buffer-size is set to 20737418240 bytes (20 GB). Ensure this value is optimal for your server memory.
32-32: Review max cache files.
The max-cache-files is set to 8000. Ensure this value is optimal for your workload.
33-33: Review replication number.
The replication-num is set to 0. Ensure this value is appropriate for your replication strategy.
34-34: Review consensus level.
The consensus-level is set to 0. Ensure this value is appropriate for your consensus strategy.
35-35: Review max cache statistic keys.
The max-cache-statistic-keys is set to 0. Ensure this value is appropriate for your cache statistics.
36-36: Review thread pool size.
The thread-pool-size is set to 30. Ensure this value is optimal for your server's hardware capabilities.
38-38: Review default slot number.
The default-slot-num is set to 1024. Ensure this value is appropriate for your slot management.
39-39: Review instance mode.
The instance-mode is set to classic. Ensure this value is appropriate for your deployment strategy.
40-40: Review number of databases.
The databases is set to 1. Ensure this value is appropriate for your use case.
41-41: Review sync thread number.
The sync-thread-num is set to 1. Ensure this value is optimal for your workload.
42-42: Review arena block size.
The arena-block-size is set to 33554432 bytes (32 MB). Ensure this value is optimal for your workload.
43-43: Review max background jobs.
The max-background-jobs is set to 12. Ensure this value is optimal for your server's hardware capabilities.
44-44: Review max background flushes.
The max-background-flushes is set to 3. Ensure this value is optimal for your workload.
45-45: Review max background compactions.
The max-background-compactions is set to 9. Ensure this value is optimal for your workload.
46-46: Review rate limiter bandwidth.
The rate-limiter-bandwidth is set to 1099511627776 bytes (1 TB). Ensure this value is optimal for your network and workload.
47-47: Review number of DB instances.
The db-instance-num is set to 1. Ensure this value is appropriate for your use case.
48-48: Review block size.
The block-size is set to 4096 bytes (4 KB). Ensure this value is optimal for your workload.
50-50: Review block cache size.
The block-cache is set to 4294967296 bytes (4 GB). Ensure this value is optimal for your server memory.
51-51: Review max subcompactions.
The max-subcompactions is set to 8. Ensure this value is optimal for your workload.
54-54: Review cache LFU decay time.
The cache-lfu-decay-time is set to 1. Ensure this value is appropriate for your cache eviction policy.
55-55: Review cache max memory samples.
The cache-maxmemory-samples is set to 5. Ensure this value is optimal for your cache management.
56-56: Review cache max memory policy.
The cache-maxmemory-policy is set to 1. Ensure this value is appropriate for your cache eviction policy.
57-57: Review number of caches.
The cache-num is set to 8. Ensure this value is optimal for your workload.
58-58: Review cache model.
The cache-model is set to 0. Ensure this value is appropriate for your cache strategy.
59-59: Review ZSET cache field number per key.
The zset-cache-field-num-per-key is set to 512. Ensure this value is optimal for your workload.
60-60: Review ZSET cache start direction.
The zset-cache-start-direction is set to 0. Ensure this value is appropriate for your ZSET cache strategy.
63-63: Review share block cache setting.
The share-block-cache is set to yes. Ensure this value is appropriate for your cache strategy.
64-64: Review throttle bytes per second.
The throttle-bytes-per-second is set to 102400000 bytes (100 MB). Ensure this value is optimal for your network and workload.
65-65: Review max rsync parallel number.
The max-rsync-parallel-num is set to 4. Ensure this value is optimal for your workload.
66-66: Review write binlog setting.
The write-binlog is set to no. Ensure this value is appropriate for your replication strategy.
67-67: Review slot migration setting.
The slotmigrate is set to no. Ensure this value is appropriate for your slot management strategy.
73-73: Review compaction strategy.
The compaction-strategy is set to obd-compact. Ensure this value is optimal for your workload and compaction needs.
78-78: Review compact every number of files setting.
The compact-every-num-of-files is set to 10. Ensure this value is optimal for your workload.
86-86: Review force compact file age seconds.
The force-compact-file-age-seconds is set to 300 seconds (5 minutes). Ensure this value is optimal for your workload.
91-91: Review force compact minimum delete ratio.
The force-compact-min-delete-ratio is set to 10. Ensure this value is optimal for your workload.
96-96: Review don't compact SST created in seconds.
The dont-compact-sst-created-in-seconds is set to 20. Ensure this value is optimal for your workload.
101-101: Review best delete minimum ratio.
The best-delete-min-ratio is set to 20. Ensure this value is optimal for your workload.
include/pika_server.h (1)
62-62: Enumeration value addition looks good.
The new enumeration value kCompactOldestOrBestDeleteRatioSst is consistent with the existing naming conventions.
src/pika_db.cc (2)
169-175: Verify the impact of commenting out DB::FullCompact.
The FullCompact method has been commented out. Ensure that this change does not affect other parts of the codebase that might rely on this method.
Verification successful
Verification successful: Commenting out DB::FullCompact does not impact other parts of the codebase.
The DB::FullCompact method is not used elsewhere in the codebase, confirming that commenting it out does not affect other parts of the project.
Scripts executed
The following scripts were executed for the analysis:
Script:

#!/bin/bash
# Description: Verify if `DB::FullCompact` is used elsewhere in the codebase.
# Test: Search for the usage of `DB::FullCompact`. Expect: No usage.
rg --type cpp -A 5 $'DB::FullCompact'

Length of output: 262
177-183: LGTM! But verify the function usage in the codebase.
The LongestNotCompactiontSstCompact method ensures thread safety and checks that the database is opened before performing a specific type of data compaction. The code changes are approved. However, ensure that all function calls to LongestNotCompactiontSstCompact are correctly implemented. A sketch of the described pattern follows.
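As a rough illustration of the "lock, check opened, delegate" shape this comment describes; the member names (dbs_rw_, opened_) are assumptions, not code quoted from the PR:

// Hedged sketch of DB::LongestNotCompactiontSstCompact as described above.
void DB::LongestNotCompactiontSstCompact(const storage::DataType& type) {
  std::shared_lock lock(dbs_rw_);  // thread safety: hold a shared lock while compacting
  if (opened_) {                   // only compact a database that is open
    storage_->LongestNotCompactiontSstCompact(type, true /*sync*/);
  }
}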
src/storage/src/redis.cc (2)
204-260: LGTM! But verify the function usage in the codebase.
The SelectColumnFamilyHandles method determines which column family handles should be selected based on the provided DataType and ColumnFamilyType. The code changes are approved. However, ensure that all function calls to SelectColumnFamilyHandles are correctly implemented.
Verification successful
Verification successful!
The SelectColumnFamilyHandles function is used correctly in the codebase. The only occurrence is in src/storage/src/redis.cc, confirming that it is implemented as intended.
- src/storage/src/redis.cc: Verified usage.
Scripts executed
The following scripts were executed for the analysis:
Script:

#!/bin/bash
# Description: Verify all function calls to `SelectColumnFamilyHandles` are correctly implemented.
# Test: Search for the function usage. Expect: Correct implementation.
rg --type cpp -A 5 $'SelectColumnFamilyHandles'

Length of output: 791
Script:

#!/bin/bash
# Description: Search for all occurrences of `SelectColumnFamilyHandles` in the codebase to confirm there are no other usages that need to be verified.
# Test: Search for the function usage. Expect: Correct implementation.
rg 'SelectColumnFamilyHandles' --type cpp

Length of output: 243
262-409: LGTM! But verify the function usage in the codebase.
The LongestNotCompactiontSstCompact method implements a complex logic for managing compaction within the Redis storage system. The code changes are approved. However, ensure that all function calls to LongestNotCompactiontSstCompact are correctly implemented.
Verification successful
Function usage verified successfully.
The function LongestNotCompactiontSstCompact is correctly implemented and used across the codebase.
- src/pika_db.cc
- src/pika_server.cc
- src/storage/src/storage.cc
- src/storage/src/redis.h
- src/storage/include/storage/storage.h
- include/pika_db.h
Scripts executed
The following scripts were executed for the analysis:
Script:

#!/bin/bash
# Description: Verify all function calls to `LongestNotCompactiontSstCompact` are correctly implemented.
# Test: Search for the function usage. Expect: Correct implementation.
rg --type cpp -A 5 $'LongestNotCompactiontSstCompact'

Length of output: 5488
Script:

#!/bin/bash
# Description: Inspect specific lines in the files where `LongestNotCompactiontSstCompact` is used.

# Check the usage in src/pika_db.cc
rg --type cpp -A 5 -B 5 'LongestNotCompactiontSstCompact' src/pika_db.cc

# Check the usage in src/pika_server.cc
rg --type cpp -A 5 -B 5 'LongestNotCompactiontSstCompact' src/pika_server.cc

# Check the usage in src/storage/src/storage.cc
rg --type cpp -A 5 -B 5 'LongestNotCompactiontSstCompact' src/storage/src/storage.cc

# Check the usage in src/storage/src/redis.h
rg --type cpp -A 5 -B 5 'LongestNotCompactiontSstCompact' src/storage/src/redis.h

# Check the usage in src/storage/include/storage/storage.h
rg --type cpp -A 5 -B 5 'LongestNotCompactiontSstCompact' src/storage/include/storage/storage.h

# Check the usage in include/pika_db.h
rg --type cpp -A 5 -B 5 'LongestNotCompactiontSstCompact' include/pika_db.h

Length of output: 4372
src/storage/src/redis.h (1)
483-483: LGTM! But verify the variable usage in the codebase.
The in_compact_flag_ member variable is likely used to manage the state of compaction processes and ensure thread safety. The code changes are approved. However, ensure that all usages of in_compact_flag_ are correctly implemented.
Verification successful
Verify the correct implementation of in_compact_flag_ usage in the compaction process.
The variable in_compact_flag_ is used in redis.cc with atomic operations and a DEFER macro. This appears to manage the state of a compaction process. Ensure the surrounding code correctly handles this process.
The usage of in_compact_flag_ is correctly implemented.
The in_compact_flag_ variable is used to manage the state of the compaction process with appropriate atomic operations and a DEFER macro to reset the flag. The surrounding code correctly handles the compaction process.
- src/storage/src/redis.cc: Verified the correct usage of in_compact_flag_.
Scripts executed
The following scripts were executed for the analysis:
Script:

#!/bin/bash
# Description: Verify all usages of `in_compact_flag_` are correctly implemented.
# Test: Search for the variable usage. Expect: Correct implementation.
rg --type cpp -A 5 $'in_compact_flag_'

Length of output: 1114
Script:

#!/bin/bash
# Description: Verify the context of `in_compact_flag_` usage in `redis.cc`.
# Search for the function or method where `in_compact_flag_` is used to understand the complete context.
rg --type cpp -A 20 'in_compact_flag_' src/storage/src/redis.cc

Length of output: 1025
conf/pika.conf (6)
641-645: LGTM! New compaction strategy parameter.
The compaction-strategy parameter allows users to select between full-compact and obd-compact. The default value is obd-compact.
647-650: LGTM! New parameter for SST file compaction trigger.
The compact-every-num-of-files parameter specifies the number of SST files that will trigger a compaction event. The default value is set to 10.
652-658: LGTM! New parameter for file age compaction trigger.
The force-compact-file-age-seconds parameter sets a threshold for file age before compaction is forced. The default value is set to 300 seconds.
660-663: LGTM! New parameter for minimum delete ratio compaction trigger.
The force-compact-min-delete-ratio parameter specifies the minimum delete ratio required to force a compaction event. The default value is set to 10.
665-668: LGTM! New parameter for avoiding compaction of newly created SST files.
The dont-compact-sst-created-in-seconds parameter sets a threshold for the minimum age of SST files to avoid compaction. The default value is set to 600 seconds.
670-673: LGTM! New parameter for best delete ratio compaction trigger.
The best-delete-min-ratio parameter specifies the best delete ratio required to trigger a compaction event. The default value is set to 10.
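Put together, the defaults quoted above would look roughly like this in pika.conf; the key names and values are the ones stated in this review, while the colon-separated layout simply follows the rest of the file:

compaction-strategy : obd-compact
compact-every-num-of-files : 10
force-compact-file-age-seconds : 300
force-compact-min-delete-ratio : 10
dont-compact-sst-created-in-seconds : 600
best-delete-min-ratio : 10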
include/pika_conf.h (7)
31-35: LGTM! New enumeration for compaction strategies.
The CompactionStrategy enumeration defines two strategies: FullCompact and OldestOrBestDeleteRatioSstCompact.
123-126: LGTM! New accessor method for compact_every_num_of_files_.
The compact_every_num_of_files() method provides thread-safe access to the compact_every_num_of_files_ member variable.
127-130: LGTM! New accessor method for force_compact_file_age_seconds_.
The force_compact_file_age_seconds() method provides thread-safe access to the force_compact_file_age_seconds_ member variable.
131-134: LGTM! New accessor method for force_compact_min_delete_ratio_.
The force_compact_min_delete_ratio() method provides thread-safe access to the force_compact_min_delete_ratio_ member variable.
135-138: LGTM! New accessor method for dont_compact_sst_created_in_seconds_.
The dont_compact_sst_created_in_seconds() method provides thread-safe access to the dont_compact_sst_created_in_seconds_ member variable.
139-142: LGTM! New accessor method for best_delete_min_ratio_.
The best_delete_min_ratio() method provides thread-safe access to the best_delete_min_ratio_ member variable.
143-146: LGTM! New accessor method for compaction_strategy_.
The compaction_strategy() method provides thread-safe access to the compaction_strategy_ member variable.
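All seven accessors follow the same shared-lock getter shape; one is enough to illustrate it. The lock member name below is an assumption:

// Representative accessor sketch; rwlock_ is an assumed member name.
int compact_every_num_of_files() {
  std::shared_lock l(rwlock_);  // readers proceed concurrently, writers exclude
  return compact_every_num_of_files_;
}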
src/pika_conf.cc (12)
304-307: Ensure minimum value for compact-every-num-of-files.
The code enforces a minimum value of 10 for compact-every-num-of-files. This is a good practice to prevent misconfigurations.
309-312: Ensure minimum value for force-compact-file-age-seconds.
The code enforces a minimum value of 300 seconds for force-compact-file-age-seconds. This is a good practice to prevent misconfigurations.
314-317: Ensure minimum value for force-compact-min-delete-ratio.
The code enforces a minimum value of 10 for force-compact-min-delete-ratio. This is a good practice to prevent misconfigurations.
319-322: Ensure minimum value for dont-compact-sst-created-in-seconds.
The code enforces a minimum value of 600 seconds for dont-compact-sst-created-in-seconds. This is a good practice to prevent misconfigurations.
324-327: Ensure minimum value for best-delete-min-ratio.
The code enforces a minimum value of 10 for best-delete-min-ratio. This is a good practice to prevent misconfigurations.
329-337: Validate compaction strategy.
The code correctly sets the compaction strategy based on the compaction-strategy parameter, defaulting to FullCompact if the value is not recognized.
820-823: Ensure minimum value for compact-every-num-of-files.
The code enforces a minimum value of 1 for compact-every-num-of-files. This is a good practice to prevent misconfigurations.
824-827: Ensure minimum value for force-compact-file-age-seconds.
The code enforces a minimum value of 300 seconds for force-compact-file-age-seconds. This is a good practice to prevent misconfigurations.
828-831: Ensure minimum value for force-compact-min-delete-ratio.
The code enforces a minimum value of 5 for force-compact-min-delete-ratio. This is a good practice to prevent misconfigurations.
832-835: Ensure minimum value for dont-compact-sst-created-in-seconds.
The code enforces a minimum value of 300 seconds for dont-compact-sst-created-in-seconds. This is a good practice to prevent misconfigurations.
836-839: Ensure minimum value for best-delete-min-ratio.
The code enforces a minimum value of 10 for best-delete-min-ratio. This is a good practice to prevent misconfigurations.
841-849: Validate compaction strategy.
The code correctly sets the compaction strategy based on the compaction-strategy parameter, defaulting to FullCompact if the value is not recognized.
src/storage/include/storage/storage.h (5)
74-81: Addition of CompactParam struct looks good.
The CompactParam struct is well-defined and its fields are appropriately typed.
82-82: Addition of compact_param_ field looks good.
The compact_param_ field is logically added and aligns with the new CompactParam struct.
165-166: Addition of kCompactOldestOrBestDeleteRatioSst to Operation enum looks good.
The new enum value is appropriately added to the Operation enum.
1082-1088: Addition of LongestNotCompactiontSstCompact method looks good.
The method is well-documented and aligns with the class's responsibilities.
1113-1113: Addition of GetStorageOptions method looks good.
The method provides necessary access to the current storage options.
src/storage/src/storage.cc (5)
96-96
: LGTM!The addition of storing
storage_options
is correct and necessary for subsequent operations.
1705-1706
: LGTM!The new condition for
kCompactOldestOrBestDeleteRatioSst
correctly integrates theLongestNotCompactiontSstCompact
method, enhancing background task management.
1742-1743
: LGTM!The asynchronous compaction implementation correctly schedules a background task for compaction.
1957-1959
: LGTM!The getter method
GetStorageOptions
correctly provides external access to thestorage_options_
member variable.
1728-1745
: Ensure proper error handling in synchronous compaction.The synchronous compaction implementation iterates through instances and logs errors. Ensure that all potential errors are handled appropriately.
Verification successful
Ensure proper error handling in synchronous compaction.
The previous script did not produce any output. I will refine the search pattern to ensure it captures the relevant context and checks for error handling more effectively.
Ensure proper error handling in synchronous compaction.
The previous scripts did not produce any output. I will search for the implementation of
LongestNotCompactiontSstCompact
and inspect the relevant lines directly to verify the error handling.
Proper error handling in synchronous compaction verified.
The synchronous compaction implementation in
Storage::LongestNotCompactiontSstCompact
correctly logs errors usingLOG(ERROR) << compact_result.ToString();
.
- Location:
src/storage/src/storage.cc
, lines 1728-1745Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify proper error handling in synchronous compaction. # Test: Search for error handling in synchronous compaction. Expect: Proper logging and handling of all errors. rg --type cpp -A 5 'for (const auto& inst : insts_)' | rg 'compact_result.ok()'Length of output: 81
Script:
#!/bin/bash # Description: Verify proper error handling in synchronous compaction. # Test: Search for error handling in synchronous compaction. Expect: Proper logging and handling of all errors. rg --type cpp -A 10 'for (const auto& inst : insts_)' | rg 'if (!compact_result.ok())'Length of output: 88
Script:
#!/bin/bash # Description: Find and inspect the implementation of `LongestNotCompactiontSstCompact`. # Search for the implementation of `LongestNotCompactiontSstCompact` in C++ files. rg --type cpp 'LongestNotCompactiontSstCompact' -A 20Length of output: 14815
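From the fragments the verification surfaced (insts_, compact_result, and the LOG(ERROR) line), the synchronous path presumably resembles this loop; the per-instance call's extra arguments are assumptions carried over from the redis.h discussion:

// Sketch of the synchronous fan-out over storage instances.
for (const auto& inst : insts_) {
  std::vector<rocksdb::Status> results;
  rocksdb::Status compact_result =
      inst->LongestNotCompactiontSstCompact(type, kMetaAndData, &results);  // args assumed
  if (!compact_result.ok()) {
    LOG(ERROR) << compact_result.ToString();  // log the failure, continue with the rest
  }
}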
src/pika_server.cc (4)
427-429: New compaction task type added.
The new case TaskType::kCompactOldestOrBestDeleteRatioSst correctly integrates with the existing switch-case structure. Ensure that LongestNotCompactiontSstCompact is properly implemented and tested.
1144-1149: Conditional logic for compaction strategy added.
The new conditional logic correctly selects between different compaction strategies based on the configuration. Ensure that the configuration values for compaction_strategy are properly set and validated; a sketch of the dispatch follows below.
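Combining the outdated diff fragment shown further down (where Status s; replaces the unconditional kCompactAll dispatch) with the compaction_strategy() accessor reviewed above, the selection plausibly looks like this sketch rather than the PR's literal code; the enum scoping is assumed:

// Hedged sketch of the strategy dispatch in PikaServer::AutoCompactRange.
Status s;
if (g_pika_conf->compaction_strategy() == PikaConf::FullCompact) {
  s = DoSameThingSpecificDB(dbs, {TaskType::kCompactAll});
} else {
  s = DoSameThingSpecificDB(dbs, {TaskType::kCompactOldestOrBestDeleteRatioSst});
}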
1344-1349: New storage options for compaction added.
The new storage options for compaction are correctly initialized using the configuration values. Ensure that the configuration values for these options are properly set and validated.
1403-1409: Additional compaction parameters initialized.
The additional compaction parameters are correctly initialized using the configuration values. Ensure that these parameters are properly documented and tested.
include/pika_db.h
Outdated

// void FullCompact();
void LongestNotCompactiontSstCompact(const storage::DataType& type);
Fix typo in method name.
The method name LongestNotCompactiontSstCompact contains a typo. Consider renaming it to LongestNotCompactedSstCompact.

- void LongestNotCompactiontSstCompact(const storage::DataType& type);
+ void LongestNotCompactedSstCompact(const storage::DataType& type);
src/pika_server.cc
Outdated

@@ -1138,7 +1141,12 @@ void PikaServer::AutoCompactRange() {
   gettimeofday(&last_check_compact_time_, nullptr);
   if ((static_cast<double>(free_size) / static_cast<double>(total_size)) * 100 >= usage) {
     std::set<std::string> dbs = g_pika_server->GetAllDBName();
-    Status s = DoSameThingSpecificDB(dbs, {TaskType::kCompactAll});
+    Status s;
Can only one strategy be selected from the configuration file here? If the longest-not-compacted or highest-tombstone-ratio compaction strategy is chosen, does the periodic full compaction go away?
Could the two coexist, for example full compact at a long interval and OldestOrBestDeleteRatioSstCompact at a short interval?
In theory yes, but we have to wait until Hou pushes his latest code, and then I'll look at how to handle this part.
Actionable comments posted: 3
Outside diff range and nitpick comments (8)
tests/integration/start_codis.sh (1)
48-49: Useful troubleshooting comment for startup failures.
The new comment provides valuable insights into potential causes of startup failures, specifically mentioning unused structural or class definitions and the introduction of additional packages as possible culprits. This information can guide users in identifying and resolving issues when encountering startup problems. To further enhance the clarity of the comment, consider rephrasing it slightly:

# If startup fails, it could be due to one of the following reasons:
# 1. Structural or class definitions that are defined but not used.
# 2. Introduction of additional packages that cause conflicts or compatibility issues.

This reformulation improves readability and clearly outlines the two potential causes of startup failures.
src/net/include/random.h (2)
16-84: Consider adding documentation for the class and its methods.
To improve the usability and maintainability of the code, consider adding documentation for the Random class and its public methods. Explain the purpose of the class, the algorithm used, and provide examples of how to use the different random number generation methods.
16-84: Add unit tests to verify the correctness of the random number generation.
To ensure the correctness of the random number generation, consider adding unit tests for the Random class. Test the different methods with various input values and verify that the generated numbers follow the expected distributions. Do you want me to generate the unit testing code or open a GitHub issue to track this task?
src/pika_server.cc (5)
1234-1240: Consider the impact of compaction on system performance.
The function directly triggers compaction on all databases using DoSameThingEveryDB. Compaction can be a resource-intensive operation, so it's important to consider its impact on the overall system performance. Consider the following:
- Monitor the system resource utilization (CPU, I/O, memory) during compaction to ensure it doesn't adversely affect other critical operations.
- If possible, schedule compaction during off-peak hours or periods of low traffic to minimize the impact on the system.
- Provide configuration options to control the compaction behavior, such as throttling the compaction speed or limiting the number of concurrent compactions.
1510-1516: Consider the impact of storage options on performance and resource utilization.
The storage options configured in this function have a significant impact on the performance and behavior of RocksDB. It's crucial to carefully tune these options based on the specific workload and hardware characteristics to achieve optimal performance. Consider the following:
- Monitor the system resource utilization (CPU, I/O, memory) and RocksDB metrics to understand the impact of the storage options on performance.
- Conduct performance testing and benchmarking with different storage option configurations to identify the optimal settings for your specific workload.
- Regularly review and adjust the storage options as the workload and hardware characteristics evolve over time.
- Provide documentation or guidelines to help users understand the implications of each storage option and how to tune them effectively.
Line range hint 1610-1610: Consider the impact of background slot reloading on system performance.
The DoBgslotsreload function performs a potentially long-running operation by scanning the entire database. It's important to consider the impact of this operation on the system performance and ensure that it doesn't interfere with other critical operations. Consider the following:
- Monitor the system resource utilization (CPU, I/O, memory) during background slot reloading to assess its impact on performance.
- Implement throttling mechanisms to control the pace of slot reloading and prevent it from overloading the system.
- Schedule background slot reloading during off-peak hours or periods of low traffic to minimize the impact on the system.
- Provide configuration options to control the behavior of background slot reloading, such as the batch size and frequency.
Line range hint 1704-1704: Consider the impact of background slot cleanup on system performance.
The DoBgslotscleanup function performs a potentially long-running operation by scanning the entire database. It's important to consider the impact of this operation on the system performance and ensure that it doesn't interfere with other critical operations. Consider the following:
- Monitor the system resource utilization (CPU, I/O, memory) during background slot cleanup to assess its impact on performance.
- Implement throttling mechanisms to control the pace of slot cleanup and prevent it from overloading the system.
- Schedule background slot cleanup during off-peak hours or periods of low traffic to minimize the impact on the system.
- Provide configuration options to control the behavior of background slot cleanup, such as the batch size and frequency.
Verify the error handling for the Scan method in DoBgslotsreload.
The current implementation lacks proper error handling for the Scan method in the DoBgslotsreload function. Here are the issues and suggested improvements:
- The Scan method's return value is stored in cursor_ret, but there's no check for error conditions.
- There's no logging of potential errors during the scan process.
- The loop continues without any special handling if an error occurs.
To improve error handling, consider the following:
- Check the return value of the Scan method for error conditions.
- Add appropriate error logging using LOG(ERROR) if the scan fails.
- Consider breaking the loop or taking appropriate action if an error is encountered.
Example of improved error handling:

while (cursor_ret != 0 && p->GetSlotsreloading()) {
  cursor_ret = reload.db->storage()->Scan(storage::DataType::kAll, reload.cursor, reload.pattern, reload.count, &keys);
  if (cursor_ret < 0) {
    LOG(ERROR) << "Scan failed during slot reloading. Error: " << cursor_ret;
    // Consider breaking the loop or taking appropriate action
    break;
  }
  // ... rest of the loop body
}

This change would improve the reliability and debuggability of the slot reloading process.
Analysis chain
Line range hint 1613-1628: Verify the error handling for the Scan method.
The function relies on the Scan method of the storage engine to retrieve keys in batches. It's crucial to handle any errors or exceptions that may occur during the scan process to ensure the reliability and stability of the system. Ensure that the Scan method is properly handled and that the function behaves correctly if an error occurs. You can use the following script to verify the error handling:
To gather the necessary information for verifying the error handling of the Scan method, we need to run additional scripts. Let's search for the implementation and usage of the Scan method, particularly in the context of bgslots_reload.
We need to gather more specific information to verify the error handling for the Scan method in the context of bgslots_reload. Let's run additional scripts to find the relevant code sections.
To verify the error handling for the Scan method in the context of slot reloading, we need to examine the DoBgslotsreload function implementation. Let's run a script to find this information.
Scripts executed
The following scripts were executed for the analysis:
Script:

#!/bin/bash
# Description: Verify the error handling for the `Scan` method.
# Test: Search for the `Scan` method call and check the error handling.
# Expect: Proper error handling and logging if the `Scan` method fails.
rg --type c -A 10 $'Scan\('

Length of output: 8898
Script:

#!/bin/bash
# Search for the implementation of the Scan method in storage-related files
echo "Searching for Scan method implementation:"
rg --type cpp -A 10 'int64_t Scan\(' src/storage

# Look for usage of the Scan method in the context of bgslots_reload
echo "Searching for Scan usage in bgslots_reload context:"
rg --type cpp -A 10 'Scan\(' src | rg -C 5 'bgslots_reload'

# Examine the surrounding code for error handling patterns
echo "Examining error handling patterns around Scan usage:"
rg --type cpp -A 15 'Scan\(' src | rg -C 5 'if \(.*status.*\)|LOG\(ERROR\)'

Length of output: 1663
Script:

#!/bin/bash
# Search for the Scan method implementation in .cc files
echo "Searching for Scan method implementation in .cc files:"
rg --type cpp -A 15 'int64_t Scan\(' src

# Look for bgslots_reload implementation and any Scan-related calls
echo "Searching for bgslots_reload implementation and Scan usage:"
rg --type cpp -A 20 'bgslots_reload' src | rg -C 5 'Scan'

# Expand the search for error handling patterns around Scan or bgslots_reload
echo "Examining error handling patterns around Scan or bgslots_reload:"
rg --type cpp -A 15 -e 'Scan' -e 'bgslots_reload' src | rg -C 5 'if \(.*\)|LOG\(|throw|return.*error'

Length of output: 105137
Script:

#!/bin/bash
# Search for the DoBgslotsreload function implementation
echo "Searching for DoBgslotsreload function implementation:"
rg -A 30 'void DoBgslotsreload\(' src

# Examine error handling around the Scan method call
echo "Examining error handling around the Scan method call:"
rg -A 10 -B 5 'Scan\(' src | rg -C 5 'if \(.*status.*\)|LOG\(ERROR\)|throw|return.*error'

Length of output: 2624
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (17)
- conf/pika.conf (3 hunks)
- include/pika_conf.h (3 hunks)
- include/pika_db.h (1 hunks)
- include/pika_server.h (1 hunks)
- src/net/include/likely.h (1 hunks)
- src/net/include/net_define.h (0 hunks)
- src/net/include/random.h (1 hunks)
- src/net/include/thread_pool.h (4 hunks)
- src/net/src/thread_pool.cc (5 hunks)
- src/pika_conf.cc (3 hunks)
- src/pika_db.cc (1 hunks)
- src/pika_server.cc (7 hunks)
- src/storage/include/storage/storage.h (5 hunks)
- src/storage/src/redis.cc (10 hunks)
- src/storage/src/redis.h (2 hunks)
- src/storage/src/storage.cc (5 hunks)
- tests/integration/start_codis.sh (2 hunks)
Files not reviewed due to no reviewable changes (1)
- src/net/include/net_define.h
Additional comments not posted (64)
src/net/include/likely.h (1)
1-17: LGTM!
The new header file likely.h is well-structured and follows good practices. The macros LIKELY and UNLIKELY are correctly defined based on the GCC version, utilizing the __builtin_expect function for branch prediction optimization when available. The fallback case for other compilers or older GCC versions is handled appropriately. The code is properly formatted, and the copyright notice and license information are present. The header guard is correctly defined to prevent multiple inclusions. Overall, the code changes in this file are of high quality and can be approved.
tests/integration/start_codis.sh (1)
3-13: Helpful comments for clearing data before starting Codis.
The added comments provide useful suggestions for clearing various data directories and files before starting Codis. Following these cleanup steps ensures a clean startup environment and can help prevent potential issues caused by stale data. It's important to note that while these comments are currently uncommented, they should be treated as recommendations rather than mandatory steps. Users should assess their specific use case and determine whether clearing the mentioned directories aligns with their requirements.
src/net/include/random.h (1)
16-84: LGTM! The Random class implementation looks good.
The class correctly implements a pseudo-random number generator using the linear congruential generator (LCG) algorithm. The private seed_ member variable and the GoodSeed function ensure that the seed value is within the valid range. The public methods provide various ways to generate random numbers with specific distributions.
12-12
: LGTM!Including the
<vector>
header is necessary for usingstd::vector
in the code.
34-38
: LGTM!The
Arg
struct is a good addition to theWorker
class. It provides a clean way to pass arguments to worker threads, with the flexibility of avoid*
for arbitrary data and anint
for worker identification.
Line range hint
40-52
: LGTM!The modifications to the
Worker
class constructor and the addition of theidx_
andarg_
member variables are valuable enhancements. They allow each worker thread to be uniquely identified and provide a convenient way to pass theThreadPool
pointer and worker index to the worker thread's main function. This improves the management and flexibility of worker threads within the thread pool.
72-129
: Excellent enhancements to the ThreadPool class!The introduced changes significantly improve the functionality and performance of the thread pool:
- The
runInThread
function allows running tasks in specific worker threads, providing finer control over task execution.- The
AdaptationContext
struct enables managing adaptation states using an atomic integer, enhancing the thread pool's adaptability to varying workloads.- The
Node
struct represents task nodes with links to older and newer nodes, enabling efficient task management and execution order control.- The
ReDelaySchedule
function supports re-pushing timer tasks, improving task scheduling flexibility.- The
AsmVolatilePause
function provides platform-specific pause instructions for better thread synchronization and performance.- The
CreateMissingNewerLinks
andLinkOne
functions enhance task node link management, ensuring proper connectivity and order.- The new member variables support the implementation of the above features and improve overall task management, scheduling, and adaptation capabilities.
These enhancements make the thread pool more versatile, efficient, and adaptable to different scenarios and requirements.
137-138
: LGTM!The addition of the
mu_
andrsignal_
member variables is a good choice for enhancing the thread pool's synchronization capabilities:
- Multiple mutexes (
mu_
) can help reduce contention and improve concurrency by allowing different worker threads to lock different mutexes simultaneously, increasing parallelism.- Multiple condition variables (
rsignal_
) enable signaling specific worker threads or groups of threads, providing more fine-grained control over thread synchronization and coordination.The use of
pstd::Mutex
andpstd::CondVar
suggests that the thread pool is leveraging a custom or platform-specific threading library, likely for better performance or compatibility.These additions contribute to the overall efficiency and flexibility of the thread pool implementation.
Line range hint
1-140
: Excellent work on enhancing the thread pool implementation!The changes made to the thread pool are substantial and well-thought-out, addressing various aspects of thread management, task scheduling, synchronization, and adaptation. The enhancements make the thread pool more efficient, flexible, and adaptable to different scenarios and requirements.
Key highlights:
- Worker thread indexing allows for better identification and management of individual threads.
- The
AdaptationContext
struct provides a mechanism for managing adaptation states, enabling effective adaptation to varying workloads.- The
Node
struct and associated functions improve task management and execution order control, allowing for more efficient scheduling and re-scheduling of tasks.- The
AsmVolatilePause
function adds platform-specific optimizations for thread synchronization and performance.- The introduction of multiple mutexes and condition variables significantly improves concurrency and synchronization, reducing contention and providing fine-grained control over thread coordination.
The changes demonstrate a deep understanding of thread pool mechanics, performance considerations, and the need for flexibility in real-world scenarios. The code is well-structured, maintainable, and follows good practices.
Great job on this implementation! The thread pool is now more robust, efficient, and adaptable, thanks to your efforts.
15-15
: Verify the necessity of including the "net/include/random.h" header.Please ensure that the functionality provided by the
"net/include/random.h"
header is actually used in this file. If not, consider removing the include statement to improve compilation time and reduce unnecessary dependencies.Run the following script to check if the
"net/include/random.h"
header is used:include/pika_db.h (1)
136-136
: Fix typo in method name.The method name
LongestNotCompactiontSstCompact
contains a typo. Consider renaming it toLongestNotCompactedSstCompact
.- void LongestNotCompactiontSstCompact(const storage::DataType& type); + void LongestNotCompactedSstCompact(const storage::DataType& type);src/net/src/thread_pool.cc (9)
19-23
: LGTM!The changes to the
WorkerMain
function allow each worker to be aware of its index, which can facilitate better task management. Casting the argument to the appropriate types and passing the worker index torunInThread
enables each worker to operate with its specific context.
50-73
: LGTM!The changes to the
ThreadPool
constructor introduce a more granular task management approach by dividing the thread pool into multiple links. Each link has its own set of task nodes, queue size limits, and synchronization primitives. This design can potentially improve the scalability and performance of the thread pool.
81-81
: LGTM!Passing the worker index to the
Worker
constructor is consistent with the changes made to theWorkerMain
function. This allows each worker to be aware of its index from the start, enabling better task management.
96-98
: LGTM!Using a range-based loop to notify all elements in the
rsignal_
array is consistent with the introduction of thersignal_
array in the constructor. This ensures that all links in the thread pool are properly notified when stopping the thread pool.
118-127
: LGTM!The changes to the
Schedule
method introduce checks on the task count before adding a new task, ensuring that the queue does not exceed its maximum size. The use of atomic operations for managing the task count ensures thread safety. Yielding the current thread when the task count exceeds the limits allows other threads to process tasks and reduces contention, improving overall performance.
145-150
: LGTM!The changes to the
DelaySchedule
method improve task management and distribution across the links. The method generates a unique index for each task, links the new task node to thenewest_node_
array, and notifies the appropriate link about the new delayed task. The use of atomic operations for managing the delayed task count ensures thread safety and provides useful information for monitoring and management purposes.
158-160
: LGTM!Using atomic load operations with
memory_order_relaxed
ordering in thecur_queue_size
andcur_time_queue_size
methods ensures thread safety when retrieving the task counts. The relaxed ordering allows for efficient access to the atomic variables without imposing unnecessary synchronization constraints, improving performance.
164-202
: LGTM!The refactored
runInThread
method improves task execution efficiency by accepting the worker index as an argument and operating on the corresponding elements of the arrays. The use of a unique lock and a condition variable ensures thread safety and efficient synchronization. Creating missing newer links for the task node ensures that all related tasks are processed in the correct order. Executing the task function, deleting the task node, and decrementing thenode_cnt_
atomic variable maintains accurate bookkeeping and resource management. Retrying the loop allows the worker to process any remaining tasks efficiently, improving overall performance.
205-216
: LGTM!The introduction of the
ReDelaySchedule
method provides a way to reschedule delayed tasks, allowing for flexible task management. Generating a unique index for each task ensures proper distribution across the links. Resetting thelink_newer
pointer prevents any stale references, while linking the task node to thenewest_node_
array ensures proper scheduling. Incrementing thetime_node_cnt_
atomic variable and notifying the correspondingrsignal_
element maintains accurate bookkeeping and synchronization. These changes enhance the functionality and reliability of the thread pool.include/pika_server.h (1)
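The Node/newest_node_/link_older machinery described above matches the well-known lock-free intrusive list from RocksDB's write path, which is presumably where it was adapted from. A sketch of LinkOne and CreateMissingNewerLinks under that assumption (the PR's code may differ in detail):

#include <atomic>

struct Node {
  Node* link_older = nullptr;  // set by the producer before publishing
  Node* link_newer = nullptr;  // filled in lazily by the consumer
};

// Push node onto the list headed by *newest_node.
// Returns true if the list was empty, i.e. a sleeping worker should be woken.
bool LinkOne(Node* node, std::atomic<Node*>* newest_node) {
  Node* nodes = newest_node->load(std::memory_order_relaxed);
  while (true) {
    node->link_older = nodes;
    if (newest_node->compare_exchange_weak(nodes, node)) {
      return nodes == nullptr;
    }
  }
}

// Walk backwards along link_older and fill in link_newer pointers so the
// consumer can traverse tasks oldest-to-newest.
void CreateMissingNewerLinks(Node* head) {
  while (head->link_older != nullptr && head->link_older->link_newer == nullptr) {
    head->link_older->link_newer = head;
    head = head->link_older;
  }
}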
include/pika_server.h (1)
62-62: LGTM! The new task type enhances data compaction strategies.
The addition of the kCompactOldestOrBestDeleteRatioSst enum value to the TaskType enum expands the set of available task types. This new task type potentially allows for implementing advanced data compaction strategies based on factors such as the age of SST files or the delete ratio. By considering these factors during compaction, the system can optimize storage efficiency and performance, especially in scenarios with frequent data deletions or updates. This enhancement aligns with the goal of improving the server's data management capabilities.
src/pika_db.cc (2)
src/pika_db.cc (2)
196-202
: ****The
FullCompact
function is currently commented out and has no impact on the codebase. If it's uncommented in the future, ensure that thestorage_
object'sFullCompact
method is implemented correctly.
204-210
: LGTM!The
LongestNotCompactiontSstCompact
function looks good. It provides a new way to perform compaction based on the longest not compacted SST files for a specific data type.Verify that the
storage_
object'sLongestNotCompactiontSstCompact
method is implemented correctly to ensure the compaction works as expected. You can use the following script to search for the method implementation:Verification successful
Implementation verified and additional insights provided
The
LongestNotCompactiontSstCompact
method is correctly implemented in theStorage
class (src/storage/src/storage.cc). Here are some additional insights:
- The method is also implemented in the
Redis
class, which likely inherits from or is related to theStorage
class.- The method signature in the
Storage
class includes an additionalsync
parameter:
Status LongestNotCompactiontSstCompact(const DataType &type, bool sync = false);
- The implementation in
pika_db.cc
correctly calls the method on thestorage_
object.These findings confirm that the method is properly integrated into the codebase and used in multiple places, suggesting it's a key part of the compaction strategy.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the implementation of the `LongestNotCompactiontSstCompact` method. # Test: Search for the method implementation. Expect: The method implementation with the correct logic. ast-grep --lang cpp --pattern $'void Storage::LongestNotCompactiontSstCompact(const storage::DataType& type) { $$$ }'Length of output: 120
Script:
#!/bin/bash # Description: Perform a comprehensive search for the LongestNotCompactiontSstCompact method implementation # Search for the method name without specifying the class echo "Searching for LongestNotCompactiontSstCompact method:" rg "LongestNotCompactiontSstCompact" -t cpp # Look for the Storage class definition echo -e "\nSearching for Storage class definition:" rg "class Storage" -t cpp # Search for similar method names (in case of typos) echo -e "\nSearching for similar method names:" rg "NotCompaction.*Compact" -t cppLength of output: 3847
src/storage/src/redis.h (2)
109-110: Approve adding the LongestNotCompactiontSstCompact virtual function.
The addition of this virtual function provides a clear extension point for implementing custom compaction logic in derived classes. The function signature allows for specifying the data type and target column families, and provides a way to return detailed status results, enabling flexibility in usage.
486-486: Approve adding the in_compact_flag_ atomic boolean member variable.
The addition of this atomic boolean member variable is a good practice for managing the state of compaction operations in a thread-safe manner. By using this flag, concurrent or overlapping compaction operations can be prevented, ensuring data consistency and avoiding unexpected behavior.
conf/pika.conf (5)
657-661: LGTM!
The introduction of the compaction-strategy configuration parameter and the usage of the obd-compact strategy as a complement to RocksDB compaction looks good. It provides flexibility in configuring the compaction behavior of Pika.
663-666: LGTM!
The compact-every-num-of-files configuration parameter provides control over the frequency of compaction in the OBD_Compact strategy based on the number of SST files. This is a useful addition to fine-tune the compaction behavior.
668-674: LGTM!
The force-compact-file-age-seconds configuration parameter provides an additional criterion for triggering compaction based on the file creation time. This is a useful feature to ensure that older files are compacted in a timely manner.
155-155: LGTM!
Setting daemonize to yes is a good practice for running Pika as a background daemon process. This change aligns with the expected behavior of a database server.
10-10: Verify the impact of setting db-instance-num to 0.
Changing the db-instance-num parameter from 3 to 0 may have significant implications on the application's behavior and performance. Please ensure that this change aligns with the intended database instance management strategy and does not introduce any data inconsistencies or availability issues. Consider running the following script to analyze the impact of this change:
include/pika_conf.h (5)
31-35: LGTM!
The new CompactionStrategy enum provides clear options for configuring the compaction strategy. The enum values are well-named and self-explanatory.
127-150: LGTM!
The new getter methods for compaction-related configuration parameters are well-implemented. They follow a consistent naming convention, use shared locks for thread safety, and provide a clean interface to retrieve the settings.
964-972: LGTM!
The new member variables for storing compaction-related configuration values are appropriately named and typed. They align well with the corresponding getter methods added earlier.
971-972: LGTM!
Using the CompactionStrategy enum as the type for the compaction_strategy_ member variable is appropriate. It ensures that the compaction strategy is limited to the defined enum values.
964-972: LGTM!
The member variable declarations for compaction-related configuration values are well-organized and follow a consistent naming convention. The types and default values are appropriate for the respective configuration options.
src/pika_conf.cc (11)
308-311
: LGTM!The code segment correctly reads the
compact_every_num_of_files
configuration and enforces a minimum value of 10.
313-316
: LGTM!The code segment correctly reads the
force_compact_file_age_seconds
configuration and enforces a minimum value of 300 seconds.
318-321
: LGTM!The code segment correctly reads the
force_compact_min_delete_ratio
configuration and enforces a minimum value of 10.
323-326
: LGTM!The code segment correctly reads the
dont_compact_sst_created_in_seconds
configuration and enforces a minimum value of 600 seconds.
328-331
: LGTM!The code segment correctly reads the
best_delete_min_ratio
configuration and enforces a minimum value of 10.
333-341
: LGTM!The code segment correctly maps the
compaction-strategy
configuration to the correspondingCompactionStrategy
enum value. The default fallback toFullCompact
ensures a valid compaction strategy is always set.
839-845
: LGTM!The code segment correctly sets the
compact_every_num_of_files
configuration and enforces a minimum value of 1.
846-849: LGTM!
The code segment correctly sets the `force_compact_file_age_seconds` configuration and enforces a minimum value of 300 seconds.
850-853: LGTM!
The code segment correctly sets the `force_compact_min_delete_ratio` configuration and enforces a minimum value of 5.
854-857: LGTM!
The code segment correctly sets the `dont_compact_sst_created_in_seconds` configuration and enforces a minimum value of 300 seconds.
858-871: LGTM!
The code segment correctly sets the following configurations:
- `best_delete_min_ratio` with a minimum value of 10.
- `compaction-strategy` mapped to the corresponding `CompactionStrategy` enum value, with a default fallback to `FullCompact`.

src/storage/src/redis.cc (7)
63-124: LGTM!
The `MyTablePropertiesCollector` class correctly implements the `rocksdb::TablePropertiesCollector` interface to collect table properties, including total keys, deleted keys, start key, and stop key. The logic for updating the statistics based on the entry type and for storing the collected properties is accurate.
126-135: LGTM!
The `MyTablePropertiesCollectorFactory` class correctly implements the `rocksdb::TablePropertiesCollectorFactory` interface to create instances of `MyTablePropertiesCollector`.
Line range hint 137-240: LGTM!
The changes in the `Redis::Open` function correctly integrate the new `MyTablePropertiesCollectorFactory` into the RocksDB options for the various column families. The table properties collector will now be used to collect statistics for each column family.
287-343: LGTM!
The `SelectColumnFamilyHandles` function correctly selects the appropriate column family handles based on the provided data type and column family type. The logic for determining the handles using a switch statement is accurate and covers all the necessary cases.
345-489: LGTM!
The `Redis::LongestNotCompactiontSstCompact` function implements the logic for compacting SST files based on specific criteria. The function correctly selects the appropriate column family handles, retrieves the table properties, and determines which files to compact based on the age of the files and the ratio of deleted keys to total keys.
The logic for selecting files for compaction based on the force-compact policy and the best delete ratio is accurate. The function also correctly returns the compaction status for each column family in the `compact_result_vec` vector.
Line range hint 490-740: LGTM!
The `Redis::GetRocksDBInfo` function correctly retrieves various RocksDB statistics and appends them to the provided `info` string. The function uses lambda functions to write aggregated integer properties, regular properties, and ticker counts to the `string_stream` in a formatted manner.
The retrieved statistics cover a wide range of RocksDB metrics, including memtables, compaction, background errors, keys, SST files, block cache, blob files, and more. The function also retrieves column family stats using the `GetMapProperty` method and appends them to the `info` string.
The logic for retrieving and formatting the RocksDB statistics is accurate and comprehensive.
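A rough sketch of the lambda-based property-writing pattern described above; the property names shown are real RocksDB properties, while the stream handling and label formatting are simplified assumptions:

```cpp
#include <cstdint>
#include <sstream>
#include "rocksdb/db.h"

// Hypothetical sketch: append one RocksDB integer property per line to an info stream.
void AppendRocksDBProps(rocksdb::DB* db, std::ostringstream& string_stream) {
  auto write_property = [&](const rocksdb::Slice& property, const char* label) {
    uint64_t value = 0;
    if (db->GetIntProperty(property, &value)) {
      string_stream << label << ":" << value << "\r\n";
    }
  };
  write_property(rocksdb::DB::Properties::kNumImmutableMemTable, "num_immutable_mem_table");
  write_property(rocksdb::DB::Properties::kEstimateNumKeys, "estimate_num_keys");
}
```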
Line range hint 741-764: LGTM!
The `Redis::ScanKeyNum` function correctly scans the number of keys for each data type and stores the results in the provided `key_infos` vector. The function resizes the `key_infos` vector to the number of data types and calls separate functions to scan the number of keys for each data type.
The result of each scan is stored at the corresponding index of the `key_infos` vector. The function returns an `OK` status if all scans are successful; otherwise it returns the first non-OK status encountered.
The logic for scanning the number of keys for each data type and storing the results in the `key_infos` vector is accurate and follows the expected behavior.

src/storage/include/storage/storage.h (4)
77-84: LGTM!
The `CompactParam` struct encapsulates compaction-related parameters, providing more control over the compaction process. The parameters allow tuning compaction behavior based on file count, age, and delete ratios, which is a reasonable approach.
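A sketch of what this struct plausibly contains, inferred from the field names that appear later in this review (e.g. `force_compact_min_delete_ratio_`, `dont_compact_sst_created_in_seconds_`) and from the conf defaults; the exact types and default values are assumptions:

```cpp
// Hypothetical sketch inferred from field names used elsewhere in this review.
struct CompactParam {
  int compact_every_num_of_files_ = 10;            // trigger a check every N SST files
  int force_compact_file_age_seconds_ = 300;       // force-compact files older than this...
  int force_compact_min_delete_ratio_ = 10;        // ...if the delete ratio (%) exceeds this
  int dont_compact_sst_created_in_seconds_ = 600;  // skip freshly created SSTs
  int best_delete_min_ratio_ = 10;                 // floor for the "best candidate" pick
};
```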
168-169: LGTM!
The new `kCompactOldestOrBestDeleteRatioSst` enum value aligns with the parameters introduced in the `CompactParam` struct and suggests a new compaction operation that targets the oldest SST file or the one with the best delete ratio.
1092-1097: LGTM!
The `LongestNotCompactiontSstCompact` method provides a convenient way to trigger compaction for a specific data type, targeting the longest not-compacted SST file. The `sync` parameter offers flexibility in controlling whether the operation blocks or runs asynchronously.
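As a usage sketch under stated assumptions (the wrapper function is hypothetical; the method name, `sync` parameter, and `DataType::kAll` come from this review), a caller might invoke it either way:

```cpp
#include "storage/storage.h"

// Hypothetical usage sketch of the sync/async modes described above.
void TriggerObdCompact(storage::Storage& storage, bool block) {
  if (block) {
    // Synchronous: compaction runs immediately; errors are logged by the callee.
    storage.LongestNotCompactiontSstCompact(storage::DataType::kAll, /*sync=*/true);
  } else {
    // Asynchronous: a kCompactOldestOrBestDeleteRatioSst background task is queued,
    // and the call returns without waiting for the compaction to finish.
    storage.LongestNotCompactiontSstCompact(storage::DataType::kAll, /*sync=*/false);
  }
}
```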
1123-1123: LGTM!
The `GetStorageOptions` method provides read-only access to the `StorageOptions` object, promoting encapsulation and information hiding. It allows other parts of the code to retrieve the current storage options without modifying them, which is good practice.

src/storage/src/storage.cc (4)
104-104: LGTM!
Storing the provided `storage_options` in a member variable is a good approach to enhance the configurability of the storage instances. This allows the storage options to be accessed later when needed.
1753-1770: Great addition of the `LongestNotCompactiontSstCompact` function!
The function provides flexibility by supporting both synchronous and asynchronous modes of operation for performing compaction on the storage instances. This allows for better control over the compaction process based on the specific requirements.
In synchronous mode, it executes the compaction immediately and logs any errors encountered, which is useful for debugging and monitoring.
In asynchronous mode, it adds a background task to the queue, enabling non-blocking operation and allowing other tasks to proceed while the compaction is performed.
Overall, this function enhances the compaction capabilities of the storage system.
1790-1792: Error logging for compaction failures is good practice.
If the compaction process fails, logging an error message via the `LOG(ERROR)` macro helps in identifying and debugging issues related to compaction. This enables developers to quickly diagnose and fix any problems that arise during compaction.
It's important to have proper error handling and logging mechanisms in place to ensure the stability and reliability of the storage system.
1982-1984: Adding a getter function for `storage_options_` is a good idea.
The `GetStorageOptions` function provides a way for other parts of the codebase to access the storage options configuration. This promotes encapsulation and allows controlled access to the `storage_options_` member variable.
By exposing the storage options through a getter function, the internal representation of the options stays hidden and can be changed without affecting the external interface.
This function enhances the modularity and maintainability of the codebase.
src/pika_server.cc (3)
1159-1159: Verify the error handling for the `statfs` call.
The `statfs` call is used to retrieve disk information. It's important to handle any potential errors returned by `statfs` and take appropriate action.
Ensure that the `statfs` call is properly handled and that the function behaves correctly if `statfs` fails. You can use the following script to verify the error handling:
Verification successful
To gather the necessary information about the `statfs` usage and error handling in the `AutoCompactRange` function, we need to examine its implementation. Let's use a more targeted search:
Error handling for the `statfs` call is properly implemented
The `AutoCompactRange` function in `src/pika_server.cc` correctly handles potential errors from the `statfs` call. If the call fails:
- A warning message is logged with the specific error description.
- The function returns early, preventing any further actions that depend on the disk information.
This implementation ensures that the function behaves safely in case of `statfs` failures, addressing the initial concern.
Scripts executed
The following scripts were executed for the analysis:
Script:
```shell
#!/bin/bash
# Description: Verify the error handling for `statfs` call.
# Test: Search for the `statfs` call and check the error handling.
# Expect: Proper error handling and logging if `statfs` fails.
rg --type c -A 5 $'statfs'
```
Length of output: 225
Script:
```shell
#!/bin/bash
# Description: Locate and examine the AutoCompactRange function in src/pika_server.cc

# Search for the AutoCompactRange function and display its content
rg -U --type cpp "void.*AutoCompactRange" src/pika_server.cc -A 100
```
Length of output: 4597
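For illustration, the check-and-return pattern the verification above confirms can be sketched as follows; the wrapper function, log wording, and free-space computation are assumptions, not lines from the PR:

```cpp
#include <sys/vfs.h>
#include <cerrno>
#include <cstring>
#include <glog/logging.h>

// Hypothetical sketch: bail out early if disk information cannot be read.
bool GetDiskFreeBytes(const char* path, uint64_t* free_bytes) {
  struct statfs disk_info;
  if (statfs(path, &disk_info) == -1) {
    LOG(WARNING) << "statfs error: " << strerror(errno);
    return false;  // caller skips compaction decisions that need disk info
  }
  *free_bytes = static_cast<uint64_t>(disk_info.f_bsize) * disk_info.f_bavail;
  return true;
}
```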
Line range hint 1707-1722: Verify the error handling for the `Scan` method and `DeleteKey` operation.
The function relies on the `Scan` method of the storage engine to retrieve keys in batches and deletes keys using the `DeleteKey` operation. It's crucial to handle any errors or exceptions that may occur during these operations to ensure the reliability and stability of the system.
Ensure that the `Scan` method and `DeleteKey` operation are properly handled and that the function behaves correctly if an error occurs. You can use the following script to verify the error handling:
1451-1455: Verify that the configuration values are properly validated.
The function sets various RocksDB options based on configuration values from `g_pika_conf`. It's important to ensure that these configuration values are properly validated to prevent invalid or suboptimal storage options.
Ensure that the configuration values are validated and have reasonable defaults. You can use the following script to verify the validation:
Verification successful
Configuration values are properly validated with reasonable defaults.
After a thorough investigation of the codebase, I can confirm that the configuration values used in `InitStorageOptions` are properly validated and have reasonable defaults. Here's a summary of the findings:
- The configuration values are defined in `include/pika_conf.h` with default values.
- Validation and initialization occur in `src/pika_conf.cc`, where minimum values are enforced for each configuration option.
- For example:
  - `level0_stop_writes_trigger_` has a minimum value of 36
  - `level0_slowdown_writes_trigger_` has a minimum value of 20
  - `level0_file_num_compaction_trigger_` has a minimum value of 4
  - `min_write_buffer_number_to_merge_` has a minimum value of 1
- The project uses a custom `PikaConf` class to manage configuration, which likely inherits from `BaseConf`, providing a robust configuration management system.
Given these findings, the original concern about proper validation of configuration values has been addressed. The codebase demonstrates good practice in handling and validating these critical RocksDB options.
Scripts executed
The following scripts were executed for the analysis:
Script:
```shell
#!/bin/bash
# Description: Verify the validation of configuration values.
# Test: Search for the configuration values used in `InitStorageOptions`.
# Expect: Proper validation and reasonable defaults for the configuration values.
rg --type c -A 5 $'g_pika_conf->level0_file_num_compaction_trigger\(\)|g_pika_conf->level0_stop_writes_trigger\(\)|g_pika_conf->level0_slowdown_writes_trigger\(\)|g_pika_conf->min_write_buffer_number_to_merge\(\)|g_pika_conf->max_bytes_for_level_base\(\)|g_pika_conf->max_subcompactions\(\)'
```
Length of output: 292
Script:
```shell
#!/bin/bash
# Description: Perform a broader search for configuration values and their validation.

# Search for configuration keys without specific function calls
echo "Searching for configuration keys:"
rg --type cpp "level0_file_num_compaction_trigger|level0_stop_writes_trigger|level0_slowdown_writes_trigger|min_write_buffer_number_to_merge|max_bytes_for_level_base|max_subcompactions"

# Look for the definition of g_pika_conf
echo -e "\nSearching for g_pika_conf definition:"
rg --type cpp "g_pika_conf\s*=|g_pika_conf\s*\*?="

# Search for configuration-related files
echo -e "\nSearching for configuration-related files:"
fd -e h -e cc -e cpp "conf|config|option"

# Check for custom configuration handling mechanisms
echo -e "\nSearching for custom configuration handling:"
rg --type cpp "class.*Config|struct.*Config|enum.*Config"
```
Length of output: 7225
conf/pika.conf
Outdated
# For OBD_Compact
# According to the number of sst files in rocksdb,
# compact every `compact-every-num-of-files` file.
dont-compact-sst-created-in-seconds : 600
Update the comment for the `dont-compact-sst-created-in-seconds` parameter.
The current comment for the `dont-compact-sst-created-in-seconds` parameter is incorrect: it mentions triggering compaction based on the number of SST files, which is unrelated to SST creation time. Please update the comment to accurately describe the purpose of preventing compaction of recently created SST files.
Consider updating the comment as follows:
-# According to the number of sst files in rocksdb,
-# compact every `compact-every-num-of-files` file.
+# Prevent compaction of SST files that are created
+# within the specified number of seconds.
# For OBD_Compact
# According to the number of sst files in rocksdb,
# compact every `compact-every-num-of-files` file.
force-compact-min-delete-ratio : 10
Update the comment for the `force-compact-min-delete-ratio` parameter.
The current comment for the `force-compact-min-delete-ratio` parameter is incorrect: it mentions triggering compaction based on the number of SST files, which is unrelated to the delete ratio. Please update the comment to accurately describe the delete-ratio threshold for forcing compaction.
Consider updating the comment as follows:
-# According to the number of sst files in rocksdb,
-# compact every `compact-every-num-of-files` file.
+# If the delete ratio of a file exceeds this threshold,
+# force compaction of the file.
# For OBD_Compact
# According to the number of sst files in rocksdb,
# compact every `compact-every-num-of-files` file.
best-delete-min-ratio : 10
Update the comment for the `best-delete-min-ratio` parameter.
The current comment for the `best-delete-min-ratio` parameter is incorrect: it mentions triggering compaction based on the number of SST files, which is unrelated to the delete ratio. Please update the comment to accurately describe the delete-ratio threshold for considering a file for compaction.
Consider updating the comment as follows:
-# According to the number of sst files in rocksdb,
-# compact every `compact-every-num-of-files` file.
+# If the delete ratio of a file exceeds this threshold,
+# consider the file as a candidate for compaction.
Force-pushed from 347e8c0 to 5570f0a (Compare)
Actionable comments posted: 0
Outside diff range and nitpick comments (3)
conf/pika.conf (3)
663-666: Approve the addition of the `compact-every-num-of-files` parameter.
The `compact-every-num-of-files` parameter is a useful addition for controlling the frequency of compaction in the `OBD_Compact` strategy. Setting it to `10` means that compaction will be triggered for every 10 SST files in RocksDB.
Consider adding a comment to clarify the purpose and impact of this parameter. For example:
```
# Triggers compaction for every 10 SST files in RocksDB when using the OBD_Compact strategy.
# A higher value means less frequent compactions, while a lower value results in more frequent compactions.
# Adjust this value based on your workload and performance requirements.
```
668-674: Approve the addition of the `force-compact-file-age-seconds` parameter.
The `force-compact-file-age-seconds` parameter is a useful addition to the `OBD_Compact` strategy. It ensures that older SST files are compacted more aggressively by forcing compaction on the upper and lower boundaries of files that exceed the specified age.
Consider updating the comment to clarify the behavior when the condition is met. For example:
```
# For OBD_Compact strategy:
# If the creation time of an SST file exceeds `force-compact-file-age-seconds`,
# compaction will be forced on the upper and lower boundaries of that file,
# in addition to the regular compaction triggered by `compact-every-num-of-files`.
```
690-692: Provide more details about the `wash-data` parameter.
The `wash-data` parameter seems to be related to a data migration or cleanup process during the version upgrade from 4.0.0 to 4.0.1. However, the comments lack sufficient detail about the purpose and impact of this parameter.
Please provide more information in the comments about what the data-washing process entails, any potential risks or considerations, and whether this parameter is intended as a temporary or permanent configuration. Additionally, consider elaborating on the referenced GitHub issue to give the reader more context.
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (11)
- conf/pika.conf (1 hunks)
- include/pika_conf.h (3 hunks)
- include/pika_db.h (1 hunks)
- include/pika_server.h (1 hunks)
- src/pika_conf.cc (3 hunks)
- src/pika_db.cc (1 hunks)
- src/pika_server.cc (6 hunks)
- src/storage/include/storage/storage.h (5 hunks)
- src/storage/src/redis.cc (10 hunks)
- src/storage/src/redis.h (2 hunks)
- src/storage/src/storage.cc (5 hunks)
Files skipped from review as they are similar to previous changes (8)
- include/pika_conf.h
- include/pika_db.h
- include/pika_server.h
- src/pika_conf.cc
- src/pika_db.cc
- src/storage/include/storage/storage.h
- src/storage/src/redis.cc
- src/storage/src/redis.h
Additional comments not posted (13)
conf/pika.conf (1)
657-661: Approve the addition of the `compaction-strategy` parameter.
The introduction of the `compaction-strategy` parameter aligns with the PR objective of implementing a new auto-compaction strategy. This is a good addition to complement RocksDB's compaction functionality.
Please provide more details about the `obd-compact` strategy and its benefits over other compaction approaches. Also, verify whether the `compact-interval` parameter, which is mentioned in the comments for periodic triggering, is set appropriately elsewhere in the configuration.

src/storage/src/storage.cc (5)
104-104: LGTM!
Storing the provided `storage_options` in a member variable for future access is a good practice.
1753-1770: LGTM!
The new `LongestNotCompactiontSstCompact` function looks good:
- It supports both synchronous and asynchronous execution modes.
- In synchronous mode, it performs the compaction immediately and logs any errors.
- In asynchronous mode, it schedules a background task for compaction.
1790-1792: LGTM!
Adding error logging for compaction failures is a good practice that improves the robustness of the compaction process.
1730-1731: LGTM!
The `RunBGTask` method is updated to handle a new operation type, `kCompactOldestOrBestDeleteRatioSst`, which invokes the newly introduced compaction method. This enhances the background task management system.
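A rough sketch of the dispatch described here; the enum and task struct below are local stand-ins for the real storage types, and only the operation name and the invoked method come from this review:

```cpp
#include "storage/storage.h"

// Hypothetical stand-ins for the real background-task types.
enum class Operation { kNone, kCompactRange, kCompactOldestOrBestDeleteRatioSst };

struct BGTask {
  Operation operation = Operation::kNone;
  storage::DataType type = storage::DataType::kAll;
};

void RunBGTask(storage::Storage& storage, const BGTask& task) {
  switch (task.operation) {
    case Operation::kCompactOldestOrBestDeleteRatioSst:
      // New branch: run the OBD compaction for the requested data type.
      storage.LongestNotCompactiontSstCompact(task.type, /*sync=*/true);
      break;
    default:
      break;  // other operations elided
  }
}
```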
1982-1984: LGTM!
Adding a getter method for `storage_options_` is good practice, providing external read-only access to the configuration.

src/pika_server.cc (7)
1163-1168: LGTM!
The changes to select the compaction strategy based on configuration look good. The code correctly handles the `FullCompact` and `OldestOrBestDeleteRatioSstCompact` strategies.
1235-1239: LGTM!
The code to run the selected compaction strategy on every DB is implemented correctly.
1450-1454: LGTM!
The new storage options related to compaction are initialized correctly based on the corresponding configuration values.
1510-1514: LGTM!
The compaction parameters are set correctly based on the corresponding configuration values.
1234-1239: LGTM!
The background slot-reloading logic looks correct. It scans the keys, skips slot keys, and adds the remaining keys to the slot.
1234-1239: LGTM!
The background slot-cleanup logic looks correct. It scans the keys, identifies and deletes the keys belonging to the cleanup slots, and writes the deleted slot keys to the binlog.
1234-1239: LGTM!
The cache-related background task handling looks correct. It properly sets the cache status, performs the required actions (flush or reset), and resets the display cache info for each task type.
Force-pushed from 5570f0a to af1994c (Compare)
Actionable comments posted: 0
Outside diff range and nitpick comments (1)
src/storage/src/redis.cc (1)
347-491: LGTM with a minor suggestion!
The `Redis::LongestNotCompactiontSstCompact` function correctly implements the logic for selecting and compacting SST files based on specific criteria. It handles the selection of column family handles, retrieves the table properties, and applies the compaction criteria appropriately.
One minor suggestion: consider extracting the logic for selecting files for compaction into a separate function. This would improve readability and maintainability by separating the file-selection logic from the compaction-execution logic.
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (2)
- src/pika_conf.cc (2 hunks)
- src/storage/src/redis.cc (10 hunks)
Files skipped from review as they are similar to previous changes (1)
- src/pika_conf.cc
Additional comments not posted (5)
src/storage/src/redis.cc (5)
63-122: LGTM!
The `MyTablePropertiesCollector` class correctly collects user keys and tracks the necessary statistics. The overridden methods handle key addition, finishing the collection process, and returning the readable properties as expected.
124-133: LGTM!
The `MyTablePropertiesCollectorFactory` class correctly creates instances of `MyTablePropertiesCollector` through the overridden `CreateTablePropertiesCollector` method.
142-151: LGTM!
The `Redis::Open` function correctly sets up the options for each column family, including the new table properties collector factory. The factory is added to the options of all relevant column families, ensuring that the custom properties collection is enabled across the database.
290-345: LGTM!
The `SelectColumnFamilyHandles` function correctly selects the appropriate column family handles based on the data type and column family type. It covers all the necessary data types and handles the selection logic accurately.
Line range hint 493-1000: LGTM!
The `Redis::GetRocksDBInfo` function correctly retrieves various RocksDB statistics and information and appends them to the provided `info` string. The use of lambda functions improves code readability and organization by encapsulating the logic for writing different types of properties and statistics.
The function covers a wide range of RocksDB metrics, including memtables, compaction, background errors, keys, SST files, pending compaction bytes, block cache, blob files, and RocksDB tickers. The retrieved information is formatted and appended to the `info` string in a structured manner.
Data statistics are collected only for put and delete operations, and are then used to select the target SST for compaction.
nolock-link + new-compact
pika unstable branch
pika unstable branch + new-compact
Benchmark command: redis-benchmark -h localhost -p 9221 -t set,get -n 10000000 -r 300000 -d 512 -c 64 --threads 32
As the results show, on my machine with the above configuration (the thread count can be adjusted moderately if performance allows), auto-compact alone is a regression across the board, while #2648 alone improves QPS but regresses p99 (the test results are in the corresponding PR); only the combination of the two improves everything.
nolock-link + new-compact
pika unstable branch
pika unstable branch + new-compact
Benchmark command: redis-benchmark -h localhost -p 9221 -t set,get -n 10000000 -r 300000 -d 512 -c 64 --threads 32
src/storage/src/redis.cc
Outdated
    break;
  }
  keys.emplace(k_str);
  if (start_key > k_str || start_key.empty()) {
Some column families implement a custom comparator, and the data written here is already in the comparator's ascending order. Is comparing keys lexicographically here intentional?
Sorry, I overlooked that; I have now given each option its own collector.
What I mean is: since you collect start_key and end_key here to determine the data range of the SST file, shouldn't the key comparison also use the comparator that was used when generating that SST file, rather than a raw > or <?
Also, the SST file's own properties should already contain smallest_key and largest_key, right? Can those be used directly?
done
src/storage/src/redis.cc
Outdated
  bool to_comapct = true;
  if (!in_compact_flag_.compare_exchange_weak(no_compact, to_comapct, std::memory_order_relaxed,
                                              std::memory_order_relaxed)) {
    return Status::Corruption("compact running");
Wouldn't it be better to use kIncomplete or kBusy instead?
done
This one is still Corruption?
done
src/storage/src/redis.cc
Outdated
  uint64_t total_keys = 0, deleted_keys = 0;
  rocksdb::Slice start_key, stop_key, best_start_key, best_stop_key;
  Status compact_result;
  for (const auto& iter : props) {
This corresponds to all SST files of the specified column family, and compactions based on each file's key range may overlap. So I think you should first check whether the corresponding SST file is still a live SST; if it is not, it has already been compacted and can simply be skipped.
My understanding is that GetPropertiesOfAllTables already guarantees that it returns live SSTs.
My understanding is that GetPropertiesOfAllTables returns all the live SST files of the specified column family at a given moment. Say there are 1.sst, 2.sst, 5.sst, and 6.sst; suppose you trigger a compaction based on the start_key and end_key of 1.sst, and this round of manual compaction ends up compacting 2.sst and 6.sst as well — then on the next pick you should really be choosing 6.sst.
Yes, I handle that by listening to the SST file deletion callback.
Actionable comments posted: 6
🧹 Outside diff range and nitpick comments (1)
src/storage/src/redis.h (1)
Line range hint 1-486: Consider overall code structure improvements and documentation.
While the specific changes look good after addressing the previous comments, there are some general improvements that could enhance overall code quality:
1. Class size: The `Redis` class is quite large, with numerous methods. Consider splitting it into smaller, more focused classes (e.g., one class per Redis data type) to improve maintainability and readability.
2. TODO comments: There are several TODO comments throughout the file. It would be beneficial to address these or convert them into actionable tasks (e.g., create GitHub issues).
3. Documentation: Many methods lack documentation explaining their purpose, parameters, and return values. Adding comprehensive documentation would greatly improve readability and maintainability.
To implement these improvements:
- Analyze the `Redis` class and identify logical groupings of methods (e.g., by data type or functionality).
- Create new classes for each group and move the relevant methods and properties into them.
- Review all TODO comments and either implement the suggested changes or create corresponding GitHub issues.
- Add documentation for all public methods, including descriptions of parameters, return values, and any important notes about usage or behavior.
These changes will likely require significant refactoring, so they should be planned and executed carefully, possibly as a separate task or series of tasks.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- src/storage/src/redis.cc (11 hunks)
- src/storage/src/redis.h (2 hunks)
🧰 Additional context used
🔇 Additional comments (1)
src/storage/src/redis.cc (1)
28-30: Ensure consistent use of `const` in comparator methods
You've updated the method `ZSetsScoreKeyComparator` to return a `const rocksdb::Comparator*`. Please verify that all usages of this method are updated accordingly throughout the codebase to maintain const correctness.
You can run the following script to find all usages of `ZSetsScoreKeyComparator`:
✅ Verification successful
Consistent use of `const` in comparator methods verified
All usages of `ZSetsScoreKeyComparator` have been updated to match the new signature, ensuring const correctness throughout the codebase.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```shell
#!/bin/bash
# Description: Find all usages of ZSetsScoreKeyComparator and check for const correctness.
# Expected: All usages should match the updated signature.
rg --type cpp 'ZSetsScoreKeyComparator' -A 3 -B 3
```
Length of output: 3209
src/storage/src/redis.h
Outdated
  virtual Status LongestNotCompactiontSstCompact(const DataType& option_type, std::vector<Status>* compact_result_vec,
                                                 const ColumnFamilyType& type = kMetaAndData);
Fix typo in method name and consider renaming for clarity.
The method name `LongestNotCompactiontSstCompact` contains a typo ("Compactiont" instead of "Compaction"). Additionally, the name itself is not very clear about its purpose.
Consider applying the following changes:
1. Fix the typo:
-  virtual Status LongestNotCompactiontSstCompact(const DataType& option_type, std::vector<Status>* compact_result_vec,
+  virtual Status LongestNotCompactionSstCompact(const DataType& option_type, std::vector<Status>* compact_result_vec,
                                                  const ColumnFamilyType& type = kMetaAndData);
2. Rename the method to better reflect its purpose, for example:
-  virtual Status LongestNotCompactionSstCompact(const DataType& option_type, std::vector<Status>* compact_result_vec,
+  virtual Status CompactLongestUncompactedSst(const DataType& option_type, std::vector<Status>* compact_result_vec,
                                                const ColumnFamilyType& type = kMetaAndData);
Please update the method name in the implementation file as well.
src/storage/src/redis.h
Outdated
@@ -480,6 +483,7 @@
 rocksdb::WriteOptions default_write_options_;
 rocksdb::ReadOptions default_read_options_;
 rocksdb::CompactRangeOptions default_compact_range_options_;
+std::atomic<bool> kIncomplete_;
Revise the naming convention for the mutable atomic member variable.
The naming convention for `kIncomplete_` suggests a constant (the 'k' prefix), but it is declared as a mutable `std::atomic<bool>`. This can lead to confusion about its intended use.
Consider renaming the variable to follow a more appropriate convention for mutable member variables:
-  std::atomic<bool> kIncomplete_;
+  std::atomic<bool> is_incomplete_;
Also, please add a comment explaining the purpose of this variable and how it is used in a multi-threaded context.
src/storage/src/redis.cc
Outdated
    return properties;
  }

  bool NeedCompact() const override { return deleted_keys_ * 100 / total_keys_ > force_compact_min_delete_ratio_; }
Prevent division by zero in the `NeedCompact` method
In the `NeedCompact` method, there's a potential division by zero when `total_keys_` is zero. This would cause a runtime error.
Consider adding a check to prevent division by zero:
 bool NeedCompact() const override {
+  if (total_keys_ == 0) return false;
   return deleted_keys_ * 100 / total_keys_ > force_compact_min_delete_ratio_;
 }
Status Redis::LongestNotCompactiontSstCompact(const DataType& option_type, std::vector<Status>* compact_result_vec,
                                              const ColumnFamilyType& type) {
  bool no_compact = false;
  bool to_comapct = true;
Typographical error: correct the variable name `to_comapct`
There's a typo in the variable name `to_comapct`; it should be `to_compact`.
Apply this diff to fix the typo:
 bool no_compact = false;
-bool to_comapct = true;
+bool to_compact = true;
 if (!kIncomplete_.compare_exchange_weak(no_compact, to_compact, std::memory_order_relaxed,
                                         std::memory_order_relaxed)) {
Committable suggestion was skipped due to low confidence.
src/storage/src/redis.cc
Outdated
  MyTablePropertiesCollector(const rocksdb::Comparator* comparator, const int force_compact_min_delete_ratio)
      : comparator_(comparator), force_compact_min_delete_ratio_(force_compact_min_delete_ratio) {}
Initialize member variables to prevent undefined behavior
The member variables `total_keys_` and `deleted_keys_` in `MyTablePropertiesCollector` are not initialized, which may lead to undefined behavior due to indeterminate values.
Please initialize these variables in the constructor:
class MyTablePropertiesCollector : public rocksdb::TablePropertiesCollector {
public:
MyTablePropertiesCollector(const rocksdb::Comparator* comparator, const int force_compact_min_delete_ratio)
: comparator_(comparator), force_compact_min_delete_ratio_(force_compact_min_delete_ratio) {
+ total_keys_ = 0;
+ deleted_keys_ = 0;
}
Committable suggestion was skipped due to low confidence.
Actionable comments posted: 4
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- src/storage/src/redis.cc (11 hunks)
🧰 Additional context used
🔇 Additional comments (2)
src/storage/src/redis.cc (2)
133-147: LGTM: MyTablePropertiesCollectorFactory implementation
The `MyTablePropertiesCollectorFactory` class is well implemented. It correctly creates instances of `MyTablePropertiesCollector` and passes the necessary parameters.
Line range hint 516-1000: LGTM: Enhanced RocksDB information collection
The changes to the `GetRocksDBInfo` method significantly increase the amount of detailed statistics collected about the database state. This enhancement will be valuable for monitoring, debugging, and performance tuning.
The additional properties, especially those related to blob files and cache statistics, provide a more comprehensive view of the database's state and performance. This will be particularly useful for identifying potential bottlenecks and optimizing the database configuration.
src/storage/src/redis.cc
Outdated
  int64_t now =
      std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now().time_since_epoch())
          .count();

  auto force_compact_min_ratio =
      static_cast<double>(storageOptions.compact_param_.force_compact_min_delete_ratio_) / 100.0;
  auto best_delete_min_ratio = static_cast<double>(storageOptions.compact_param_.best_delete_min_ratio_) / 100.0;

  std::string best_filename;
  double best_delete_ratio = 0;
  uint64_t total_keys = 0, deleted_keys = 0;
  rocksdb::Slice start_key, stop_key, best_start_key, best_stop_key;
  Status compact_result;
  for (const auto& iter : props) {
    uint64_t file_creation_time = iter.second->file_creation_time;
    if (file_creation_time == 0) {
      // Fallback to the file Modification time to prevent repeatedly compacting the same file,
      // file_creation_time is 0 which means the unknown condition in rocksdb
      auto s = rocksdb::Env::Default()->GetFileModificationTime(iter.first, &file_creation_time);
      if (!s.ok()) {
        LOG(WARNING) << handles_[idx]->GetName() << " Failed to get the file creation time: " << iter.first << " in "
                     << handles_[idx]->GetName() << ", err: " << s.ToString();
        continue;
      }
    }

    auto tmp_end = iter.second->user_collected_properties.end();
    auto tmp_iter = iter.second->user_collected_properties.find("start_key");
    if (tmp_iter != tmp_end) {
      start_key = tmp_iter->second;
    }
    tmp_iter = iter.second->user_collected_properties.find("stop_key");
    if (tmp_iter != tmp_end) {
      stop_key = tmp_iter->second;
    }

    if (start_key.empty() || stop_key.empty()) {
      continue;
    }

    tmp_iter = iter.second->user_collected_properties.find("total_keys");
    if (tmp_iter != tmp_end) {
      encoded = tmp_iter->second;
      pstd::GetFixed64(&encoded, &total_keys);
    }
    tmp_iter = iter.second->user_collected_properties.find("deleted_keys");
    if (tmp_iter != tmp_end) {
      encoded = tmp_iter->second;
      pstd::GetFixed64(&encoded, &deleted_keys);
    }

    double delete_ratio = static_cast<double>(deleted_keys) / static_cast<double>(total_keys);

    // pick the file according to force compact policy
    if (file_creation_time <
            static_cast<uint64_t>(now / 1000 - storageOptions.compact_param_.force_compact_file_age_seconds_) &&
        delete_ratio >= force_compact_min_ratio) {
      compact_result = db_->CompactRange(default_compact_range_options_, &start_key, &stop_key);
      if (--max_files_to_compact == 0) {
        break;
      }
      continue;
    }

    // don't compact the SST created in x `dont_compact_sst_created_in_seconds_`.
    if (file_creation_time >
        static_cast<uint64_t>(now / 1000 - storageOptions.compact_param_.dont_compact_sst_created_in_seconds_)) {
      continue;
    }

    // pick the file which has highest delete ratio
    if (total_keys != 0 && delete_ratio > best_delete_ratio) {
      best_delete_ratio = delete_ratio;
      best_filename = iter.first;
      best_start_key = start_key;
      start_key.clear();
      best_stop_key = stop_key;
      stop_key.clear();
    }
  }

  if (best_delete_ratio > best_delete_min_ratio && !best_start_key.empty() && !best_stop_key.empty()) {
    compact_result =
        db_->CompactRange(default_compact_range_options_, handles_[idx], &best_start_key, &best_stop_key);
  }

  if (!compact_result.ok()) {
    if (compact_result_vec) {
      compact_result_vec->push_back(
          Status::Corruption(handles_[idx]->GetName() + " Failed to do compaction " + compact_result.ToString()));
    }
    continue;
  }

  if (compact_result_vec) {
    compact_result_vec->push_back(Status::OK());
  }
}
return Status::OK();
}
🛠️ Refactor suggestion
Fix typo in variable name
There's a typo in the variable name "to_comapct" on line 369.
Correct the variable name:
-bool to_comapct = true;
+bool to_compact = true;
Consider refactoring for improved readability
The `LongestNotCompactiontSstCompact` method is quite long and complex. Consider breaking it down into smaller, more focused functions to improve readability and maintainability.
For example, you could extract the logic for processing each table property into a separate method:
Status ProcessTableProperty(const rocksdb::TableProperties& prop, double force_compact_min_ratio, int64_t now, uint64_t& max_files_to_compact) {
// ... extract the logic here ...
}
Then call this method in the main loop:
for (const auto& iter : props) {
auto status = ProcessTableProperty(*(iter.second), force_compact_min_ratio, now, max_files_to_compact);
if (!status.ok()) {
// Handle error
}
if (max_files_to_compact == 0) {
break;
}
}
Improve error handling for GetPropertiesOfAllTables
The current error handling for `GetPropertiesOfAllTables` might lead to an inconsistent state if it fails.
Consider returning early if `GetPropertiesOfAllTables` fails, instead of continuing with potentially incomplete data:
Status s = db_->GetPropertiesOfAllTables(handles_[idx], &props);
if (!s.ok()) {
if (compact_result_vec) {
compact_result_vec->push_back(
Status::Corruption(handles_[idx]->GetName() +
" LongestNotCompactiontSstCompact GetPropertiesOfAllTables error: " + s.ToString()));
}
- continue;
+ return s;
}
This ensures that the method doesn't proceed with incomplete or potentially corrupted data.
src/storage/src/redis.cc
Outdated
  auto force_compact_min_delete_ratio = storage_->GetStorageOptions().compact_param_.force_compact_min_delete_ratio_;
  auto default_factory = std::make_shared<MyTablePropertiesCollectorFactory>(rocksdb::BytewiseComparator(),
                                                                             force_compact_min_delete_ratio);
  auto list_factory =
      std::make_shared<MyTablePropertiesCollectorFactory>(ListsDataKeyComparator(), force_compact_min_delete_ratio);
  auto zset_score_factory =
      std::make_shared<MyTablePropertiesCollectorFactory>(ZSetsScoreKeyComparator(), force_compact_min_delete_ratio);
🛠️ Refactor suggestion
Consider using unique_ptr for better memory management
The `MyTablePropertiesCollectorFactory` instances are created using `std::make_shared`, which is good. However, the `CreateTablePropertiesCollector` method returns a raw pointer, which could lead to memory leaks if not properly managed by RocksDB.
Consider using `std::unique_ptr` in the `CreateTablePropertiesCollector` method to ensure proper memory management:
-rocksdb::TablePropertiesCollector* CreateTablePropertiesCollector(
+std::unique_ptr<rocksdb::TablePropertiesCollector> CreateTablePropertiesCollector(
rocksdb::TablePropertiesCollectorFactory::Context context) override {
- return new MyTablePropertiesCollector(comparator_, force_compact_min_delete_ratio_);
+ return std::make_unique<MyTablePropertiesCollector>(comparator_, force_compact_min_delete_ratio_);
}
Note that this change would require updating the RocksDB interface if it doesn't already support `unique_ptr`. If that's not possible, ensure that RocksDB properly manages the memory of the returned collector.
Committable suggestion was skipped due to low confidence.
void SelectColumnFamilyHandles(const DataType& option_type, const ColumnFamilyType& type,
                               std::vector<int>& handleIdxVec) {
  switch (option_type) {
    case DataType::kStrings:
      handleIdxVec.push_back(kMetaCF);
      break;
    case DataType::kHashes:
      if (type == kMeta || type == kMetaAndData) {
        handleIdxVec.push_back(kMetaCF);
      }
      if (type == kData || type == kMetaAndData) {
        handleIdxVec.push_back(kHashesDataCF);
      }
      break;
    case DataType::kSets:
      if (type == kMeta || type == kMetaAndData) {
        handleIdxVec.push_back(kMetaCF);
      }
      if (type == kData || type == kMetaAndData) {
        handleIdxVec.push_back(kSetsDataCF);
      }
      break;
    case DataType::kLists:
      if (type == kMeta || type == kMetaAndData) {
        handleIdxVec.push_back(kMetaCF);
      }
      if (type == kData || type == kMetaAndData) {
        handleIdxVec.push_back(kListsDataCF);
      }
      break;
    case DataType::kZSets:
      if (type == kMeta || type == kMetaAndData) {
        handleIdxVec.push_back(kMetaCF);
      }
      if (type == kData || type == kMetaAndData) {
        handleIdxVec.push_back(kZsetsDataCF);
        handleIdxVec.push_back(kZsetsScoreCF);
      }
      break;
    case DataType::kStreams:
      if (type == kMeta || type == kMetaAndData) {
        handleIdxVec.push_back(kMetaCF);
      }
      if (type == kData || type == kMetaAndData) {
        handleIdxVec.push_back(kStreamsDataCF);
      }
      break;
    case DataType::kAll:
      for (auto s = kMetaCF; s <= kStreamsDataCF; s = static_cast<ColumnFamilyIndex>(s + 1)) {
        handleIdxVec.push_back(s);
      }
      break;
    default:
      break;
  }
}
Handle unexpected DataType values in SelectColumnFamilyHandles
The `SelectColumnFamilyHandles` method doesn't handle unexpected `DataType` values in the switch statement. This could lead to undefined behavior if an unexpected value is passed.
Consider adding error handling for unexpected `DataType` values:
switch (option_type) {
// ... existing cases ...
default:
+ LOG(WARNING) << "Unexpected DataType value: " << static_cast<int>(option_type);
+ // Optionally, you could throw an exception or return an error code
break;
}
This change will help catch and log unexpected `DataType` values, making debugging easier and preventing potential issues.
src/storage/src/redis.cc
Outdated
class MyTablePropertiesCollector : public rocksdb::TablePropertiesCollector {
 public:
  MyTablePropertiesCollector(const rocksdb::Comparator* comparator, const int force_compact_min_delete_ratio)
      : comparator_(comparator), force_compact_min_delete_ratio_(force_compact_min_delete_ratio) {}

  rocksdb::Status AddUserKey(const rocksdb::Slice& key, const rocksdb::Slice& value, rocksdb::EntryType type,
                             rocksdb::SequenceNumber seq, uint64_t file_size) override {
    total_keys_++;
    switch (type) {
      case rocksdb::EntryType::kEntryPut: {
        if (start_key_.empty() || comparator_->Compare(start_key_, key) > 0) {
          start_key_ = key.ToString();
        }
        if (stop_key_.empty() || comparator_->Compare(stop_key_, key) < 0) {
          stop_key_ = key.ToString();
        }
        break;
      }
      case rocksdb::EntryType::kEntryDelete: {
        deleted_keys_++;
        break;
      }
      default:
        break;
    }

    return rocksdb::Status::OK();
  }

  rocksdb::Status Finish(rocksdb::UserCollectedProperties* properties) override {
    std::string encoded;
    pstd::PutFixed64(&encoded, total_keys_);
    properties->emplace("total_keys", encoded);
    pstd::PutFixed64(&encoded, deleted_keys_);
    properties->emplace("deleted_keys", encoded);
    properties->emplace("start_key", start_key_);
    properties->emplace("stop_key", stop_key_);
    return rocksdb::Status::OK();
  }

  rocksdb::UserCollectedProperties GetReadableProperties() const override {
    rocksdb::UserCollectedProperties properties;
    std::string encoded;
    pstd::PutFixed64(&encoded, total_keys_);
    properties.emplace("total_keys", encoded);
    pstd::PutFixed64(&encoded, deleted_keys_);
    properties.emplace("deleted_keys", encoded);
    properties.emplace("start_key", start_key_);
    properties.emplace("stop_key", stop_key_);

    return properties;
  }

  bool NeedCompact() const override { return deleted_keys_ * 100 / total_keys_ > force_compact_min_delete_ratio_; }

  const char* Name() const override { return "MyTablePropertiesCollector"; }

 private:
  uint64_t total_keys_;
  uint64_t deleted_keys_;
  std::string start_key_;
  std::string stop_key_;
  const rocksdb::Comparator* comparator_;
  const int force_compact_min_delete_ratio_;
};
Initialize member variables in the constructor
The `total_keys_` and `deleted_keys_` member variables are not initialized in the constructor. This could lead to undefined behavior.
Consider initializing these variables in the constructor:
MyTablePropertiesCollector(const rocksdb::Comparator* comparator, const int force_compact_min_delete_ratio)
- : comparator_(comparator), force_compact_min_delete_ratio_(force_compact_min_delete_ratio) {}
+ : comparator_(comparator), force_compact_min_delete_ratio_(force_compact_min_delete_ratio),
+ total_keys_(0), deleted_keys_(0) {}
Prevent division by zero in NeedCompact method
The `NeedCompact` method could cause a division by zero if `total_keys_` is 0.
Add a check to prevent division by zero:
-bool NeedCompact() const override { return deleted_keys_ * 100 / total_keys_ > force_compact_min_delete_ratio_; }
+bool NeedCompact() const override {
+ return total_keys_ > 0 && deleted_keys_ * 100 / total_keys_ > force_compact_min_delete_ratio_;
+}
Handle case when no keys are added
The class doesn't handle the case where no keys are added, which could lead to empty `start_key_` and `stop_key_`.
Consider adding a check in the `Finish` method to handle this case:
rocksdb::Status Finish(rocksdb::UserCollectedProperties* properties) override {
std::string encoded;
pstd::PutFixed64(&encoded, total_keys_);
properties->emplace("total_keys", encoded);
pstd::PutFixed64(&encoded, deleted_keys_);
properties->emplace("deleted_keys", encoded);
- properties->emplace("start_key", start_key_);
- properties->emplace("stop_key", stop_key_);
+ properties->emplace("start_key", start_key_.empty() ? "empty" : start_key_);
+ properties->emplace("stop_key", stop_key_.empty() ? "empty" : stop_key_);
return rocksdb::Status::OK();
}
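For context, a collector like this only takes effect once it is registered with the column family options through a rocksdb::TablePropertiesCollectorFactory. A minimal sketch of that wiring, with the factory name hypothetical rather than taken from this PR:

#include <memory>
#include "rocksdb/options.h"
#include "rocksdb/table_properties.h"

// Hypothetical factory: hands RocksDB a fresh collector per SST file build.
class MyCollectorFactory : public rocksdb::TablePropertiesCollectorFactory {
 public:
  MyCollectorFactory(const rocksdb::Comparator* comparator, int ratio)
      : comparator_(comparator), ratio_(ratio) {}
  rocksdb::TablePropertiesCollector* CreateTablePropertiesCollector(
      rocksdb::TablePropertiesCollectorFactory::Context /*context*/) override {
    return new MyTablePropertiesCollector(comparator_, ratio_);
  }
  const char* Name() const override { return "MyCollectorFactory"; }

 private:
  const rocksdb::Comparator* comparator_;
  const int ratio_;
};

// Registration sketch: RocksDB then consults NeedCompact() when deciding
// whether a file should be marked for compaction.
// rocksdb::ColumnFamilyOptions cf_opts;
// cf_opts.table_properties_collector_factories.emplace_back(
//     std::make_shared<MyCollectorFactory>(comparator, force_compact_min_delete_ratio));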
src/storage/src/redis.cc
Outdated
  break;
}
keys.emplace(k_str);
if (start_key > k_str || start_key.empty()) {
What I mean is: you're collecting start_key and end_key here to determine the data range of this sst file, so shouldn't the key comparison also use the comparator that was used when this sst file was generated, instead of comparing with > or < directly?
Also, the sst file's own properties should already contain smallest_key and largest_key, right? Can't those be used directly?
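For reference, rocksdb::LiveFileMetaData does expose those bounds, so a sketch of reading them directly instead of tracking keys in the collector (function and variable names illustrative) could look like:

#include <vector>
#include "rocksdb/db.h"
#include "rocksdb/metadata.h"

void ScanSstRanges(rocksdb::DB* db) {
  std::vector<rocksdb::LiveFileMetaData> metadata;
  db->GetLiveFilesMetaData(&metadata);
  for (const auto& meta : metadata) {
    // smallestkey / largestkey are the bounds RocksDB already records per SST,
    // and num_entries / num_deletions feed the delete-ratio computation.
    rocksdb::Slice start(meta.smallestkey);
    rocksdb::Slice stop(meta.largestkey);
    double delete_ratio = meta.num_entries == 0
        ? 0.0
        : static_cast<double>(meta.num_deletions) / meta.num_entries;
    (void)start; (void)stop; (void)delete_ratio;  // feed into selection logic
  }
}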
src/storage/src/redis.cc
Outdated
bool to_comapct = true;
if (!in_compact_flag_.compare_exchange_weak(no_compact, to_comapct, std::memory_order_relaxed,
                                            std::memory_order_relaxed)) {
  return Status::Corruption("compact running");
Is this still Corruption?
src/storage/src/redis.cc
Outdated
uint64_t total_keys = 0, deleted_keys = 0;
rocksdb::Slice start_key, stop_key, best_start_key, best_stop_key;
Status compact_result;
for (const auto& iter : props) {
My understanding is that GetPropertiesOfAllTables returns what amounts to all live sst files of the given column family at a point in time, say 1.sst, 2.sst, 5.sst, 6.sst. Suppose you trigger a compaction based on the start_key and end_key of 1.sst, and that round of manual compaction ends up compacting 2.sst and 6.sst as well; then the next time you pick a file, you should really be picking 6.sst.
Actionable comments posted: 12
🧹 Outside diff range and nitpick comments (8)
src/pika_db.cc (1)
196-203: LGTM! Consider adding a brief comment explaining the purpose of this method.
The implementation looks good. It correctly uses a lock guard for thread safety and checks if the database is opened before performing the operation. The delegation to the storage_ object is consistent with the existing design pattern.
Consider adding a brief comment above the method to explain its purpose and potential impact on performance. For example:

// Compacts the SST file that hasn't been compacted for the longest time.
// This helps maintain a balance in compaction across all SST files, potentially improving overall performance.
void DB::LongestNotCompactionSstCompact(const storage::DataType& type) {
  // ... (existing implementation)
}

src/storage/src/redis.h (5)

10-10: Include only necessary headers
Ensure that <set> is required. If it was added for the use of std::set in your code, this inclusion is appropriate. Otherwise, consider removing it to reduce compilation overhead.

110-112: Add documentation for LongestNotCompactionSstCompact method
The method LongestNotCompactionSstCompact lacks documentation. Adding a comment explaining its purpose, parameters, and return value would improve code readability and maintainability.

110-112: Consider passing compact_result_vec by reference instead of pointer
If compact_result_vec is a required output parameter and should not be null, consider passing it by reference (std::vector<Status>&) instead of using a pointer. This enhances safety by preventing null pointer dereferences and clarifies that the argument is mandatory.

471-493: Rename MyEventListener to a more descriptive name
The class name MyEventListener is generic and does not reflect its specific functionality. Renaming it to something more descriptive, such as CompactionEventListener or TableFileDeletionListener, would enhance code clarity.

510-510: Consider renaming in_compact_flag_ for clarity
The member variable in_compact_flag_ may not clearly convey its purpose. Renaming it to is_compacting_ or compaction_in_progress_ would improve code readability and adhere to common naming conventions.

src/storage/include/storage/storage.h (2)

77-85: Add documentation comments for CompactParam member variables
Providing comments for each member variable in CompactParam improves code readability and maintainability. It helps other developers understand the purpose of each parameter.
Apply this diff to add documentation comments:

 struct CompactParam {
   // for LongestNotCompactionSstCompact function
+  // Number of files after which compaction should occur
   int compact_every_num_of_files_ = 0;
+  // Age in seconds after which compaction is forced
   int force_compact_file_age_seconds_ = 0;
+  // Minimum delete ratio to trigger forced compaction
   int force_compact_min_delete_ratio_ = 0;
+  // Do not compact SST files created within these seconds
   int dont_compact_sst_created_in_seconds_ = 0;
+  // Best delete ratio to select SST for compaction
   int best_delete_min_ratio_ = 0;
 };

1093-1099: Adjust Doxygen comments for parameter descriptions
The parameter descriptions in the Doxygen comments use periods instead of colons, which is unconventional. Replace periods with colons for consistency and clarity.
Apply this diff to adjust the comments:

 /**
  * LongestNotCompactionSstCompact will execute the compact command for any cf in the given type
- * @param type. data type like `kStrings`
- * @param sync. if true, block function
+ * @param type: data type like `kStrings`
+ * @param sync: if true, block function
  * @return Status
  */
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (7)
- include/pika_db.h (1 hunks)
- src/pika_db.cc (1 hunks)
- src/pika_server.cc (4 hunks)
- src/storage/include/storage/storage.h (5 hunks)
- src/storage/src/redis.cc (4 hunks)
- src/storage/src/redis.h (4 hunks)
- src/storage/src/storage.cc (5 hunks)
🧰 Additional context used
🔇 Additional comments (5)
src/pika_db.cc (1)
196-203: Verify usage and consider documenting performance implications.
The new LongestNotCompactionSstCompact method enhances the available compaction strategies. To ensure its effectiveness:

- Verify that this method is being called appropriately elsewhere in the codebase.
- Consider adding documentation about when to use this strategy over others like Compact or CompactRange.
- If possible, provide performance metrics or guidelines on when this strategy is most beneficial.

To check the usage of this new method across the codebase, you can run the script below; this will help ensure the method is being utilized as intended.
✅ Verification successful
Usage Verified
The LongestNotCompactionSstCompact method is properly utilized across the codebase, ensuring its integration aligns with existing compaction strategies.

- Consider adding documentation about when to use this strategy over others like Compact or CompactRange.
- Providing performance metrics or guidelines on the benefits of this strategy could further aid developers.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for calls to LongestNotCompactionSstCompact
rg "LongestNotCompactionSstCompact" --type cpp

Length of output: 1581
src/storage/include/storage/storage.h (1)
1137-1137: Verify initialization of storage_options_ in all constructors
The member variable storage_options_ should be properly initialized in all constructors of the Storage class to avoid undefined behavior.
Run a script to verify that storage_options_ is initialized in all constructors.

src/pika_server.cc (3)
519-521: Addition of new compaction task is appropriate.
The new case kCompactOldestOrBestDeleteRatioSst in the switch statement is properly implemented.

1227-1231: Compaction strategy handling is correctly implemented.
The code correctly triggers compaction tasks based on the configured strategy.

1501-1507: Initialization of compaction parameters is appropriate.
The compaction parameters are correctly set from configuration values.
@@ -132,6 +132,7 @@ class DB : public std::enable_shared_from_this<DB>, public pstd::noncopyable {
// Compact use;
void Compact(const storage::DataType& type);
void CompactRange(const storage::DataType& type, const std::string& start, const std::string& end);
void LongestNotCompactionSstCompact(const storage::DataType& type);
💡 Codebase verification
Method naming inconsistency confirmed.
The method LongestNotCompactionSstCompact is used across multiple files, confirming the naming inconsistency within the codebase. Please ensure to rename the method consistently across all implementations and usages.
🔗 Analysis chain
Improve method naming consistency and clarity.
The new method LongestNotCompactionSstCompact has a naming convention that's inconsistent with other methods in the class. Additionally, there's a minor grammatical issue in the name. Consider renaming it to improve consistency and clarity:
- void LongestNotCompactionSstCompact(const storage::DataType& type);
+ void CompactLongestNotCompactedSst(const storage::DataType& type);
This change:
- Uses PascalCase for the method name, consistent with other methods like Compact and CompactRange.
- Improves readability by placing the verb at the beginning.
- Corrects the grammar from "Compaction" to "Compacted".
This method appears to be part of the new compaction strategy mentioned in the PR objectives. To verify its implementation and usage:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for the implementation and usage of the new method
rg --type cpp "LongestNotCompactionSstCompact" -A 10
Length of output: 8750
src/storage/src/redis.h
Outdated
static std::mutex mu_;
// deleted file(.sst) name in OBD compacting
static std::set<std::string> deletedFileNameInOBDCompact_;
};
Make static members mu_ and deletedFileNameInOBDCompact_ private

Exposing static members mu_ and deletedFileNameInOBDCompact_ as public can lead to unintended access and modifications from outside the class. Making them private improves encapsulation and maintains class invariants.

Apply this diff to make the members private:
+ private:
static std::mutex mu_;
static std::set<std::string> deletedFileNameInOBDCompact_;
src/storage/src/redis.cc
Outdated
while (file_path.substr(file_path.find_last_of('/')) != metadata_iter->name) {
  ++metadata_iter;
}
Ensure iterator bounds are checked to prevent potential infinite loop
In the loop starting on line 371, there's a risk that metadata_iter may reach metadata.end() without satisfying the condition. Dereferencing metadata_iter->name after metadata_iter passes metadata.end() can cause undefined behavior or a crash.

Consider adding a check to ensure that metadata_iter does not exceed metadata.end() before dereferencing.

Apply this diff to fix the issue:
- while (file_path.substr(file_path.find_last_of('/')) != metadata_iter->name) {
+ while (metadata_iter != metadata.end() && file_path.substr(file_path.find_last_of('/')) != metadata_iter->name) {
++metadata_iter;
+ }
+ if (metadata_iter == metadata.end()) {
+ // Handle the case where the file is not found in metadata
+ continue;
+ }
deleted_keys = metadata_iter->num_deletions;
++metadata_iter;

double delete_ratio = static_cast<double>(deleted_keys) / static_cast<double>(total_keys);
|
Prevent division by zero when calculating delete_ratio
On line 381, delete_ratio is calculated using total_keys as the denominator. If total_keys is zero, this will result in a division by zero error.

Consider adding a check to ensure total_keys is not zero before performing the division.

Apply this diff to prevent division by zero:

-double delete_ratio = static_cast<double>(deleted_keys) / static_cast<double>(total_keys);
+double delete_ratio =
+    total_keys == 0 ? 0.0 : static_cast<double>(deleted_keys) / static_cast<double>(total_keys);
/**
 * LongestNotCompactionSstCompact will execute the compact command for any cf in the given type
 * @param type. data type like `kStrings`
 * @param sync. if true, block function
 * @return Status
 */
Status LongestNotCompactionSstCompact(const DataType &type, bool sync = false);
🛠️ Refactor suggestion
Consider renaming LongestNotCompactionSstCompact for clarity

The method name LongestNotCompactionSstCompact is lengthy and may reduce readability. Renaming it to a more concise and descriptive name enhances understanding.
Possible alternative names:
CompactOldestSstFile
CompactLongestUncompactedSst
CompactStaleSstFiles
Ensure the new name aligns with the method's functionality and follows the project's naming conventions.
struct CompactParam {
  // for LongestNotCompactionSstCompact function
  int compact_every_num_of_files_;
  int force_compact_file_age_seconds_;
  int force_compact_min_delete_ratio_;
  int dont_compact_sst_created_in_seconds_;
  int best_delete_min_ratio_;
};
CompactParam compact_param_;
Initialize member variables in CompactParam to default values

The CompactParam struct declares several integer member variables without initializing them. Uninitialized variables can lead to undefined behavior if they are used before being set. Consider initializing them with default values to ensure predictable behavior.
Apply this diff to initialize the member variables:
struct CompactParam {
// for LongestNotCompactionSstCompact function
- int compact_every_num_of_files_;
- int force_compact_file_age_seconds_;
- int force_compact_min_delete_ratio_;
- int dont_compact_sst_created_in_seconds_;
- int best_delete_min_ratio_;
+ int compact_every_num_of_files_ = 0;
+ int force_compact_file_age_seconds_ = 0;
+ int force_compact_min_delete_ratio_ = 0;
+ int dont_compact_sst_created_in_seconds_ = 0;
+ int best_delete_min_ratio_ = 0;
};
kCompactRange,
kCompactOldestOrBestDeleteRatioSst,
🛠️ Refactor suggestion
Assign explicit integer values to Operation enum entries

The Operation enum does not assign explicit integer values to its members. This can lead to unintended behavior if the order of enum entries changes in the future. Assign explicit values to ensure stability.
Apply this diff to assign explicit integer values:
enum Operation {
- kNone = 0,
- kCleanAll,
- kCompactRange,
- kCompactOldestOrBestDeleteRatioSst,
+ kNone = 0,
+ kCleanAll = 1,
+ kCompactRange = 2,
+ kCompactOldestOrBestDeleteRatioSst = 3,
};
const StorageOptions& Storage::GetStorageOptions() {
  return storage_options_;
}
🛠️ Refactor suggestion
Declare GetStorageOptions as a const method

The method GetStorageOptions does not modify any member variables and should be marked as const to allow it to be called on const instances of Storage.

Apply the following diff to make the method const:
-const StorageOptions& Storage::GetStorageOptions() {
+const StorageOptions& Storage::GetStorageOptions() const {
return storage_options_;
}
Status Storage::LongestNotCompactionSstCompact(const DataType &type, bool sync) {
  if (sync) {
    Status s;
    for (const auto& inst : insts_) {
      std::vector<rocksdb::Status> compact_result_vec;
      s = inst->LongestNotCompactionSstCompact(type, &compact_result_vec);
      for (auto compact_result : compact_result_vec) {
        if (!compact_result.ok()) {
          LOG(ERROR) << compact_result.ToString();
        }
      }
    }
    return s;
  } else {
    AddBGTask({type, kCompactOldestOrBestDeleteRatioSst});
  }
  return Status::OK();
}
Ensure 's' is properly initialized before use
In LongestNotCompactionSstCompact, the variable s is declared but may not be assigned a value in all code paths when sync is true. If insts_ is empty, s will remain uninitialized when returned.

Apply the following modification to initialize s with Status::OK():
Status Storage::LongestNotCompactionSstCompact(const DataType &type, bool sync) {
if (sync) {
- Status s;
+ Status s = Status::OK();
for (const auto& inst : insts_) {
std::vector<rocksdb::Status> compact_result_vec;
s = inst->LongestNotCompactionSstCompact(type, &compact_result_vec);
src/pika_server.cc
Outdated
storage_options_.options.level0_file_num_compaction_trigger = g_pika_conf->level0_file_num_compaction_trigger();
storage_options_.options.level0_stop_writes_trigger = g_pika_conf->level0_stop_writes_trigger();
storage_options_.options.level0_slowdown_writes_trigger = g_pika_conf->level0_slowdown_writes_trigger();
storage_options_.options.min_write_buffer_number_to_merge = g_pika_conf->min_write_buffer_number_to_merge();
storage_options_.options.max_bytes_for_level_base = g_pika_conf->level0_file_num_compaction_trigger() * g_pika_conf->write_buffer_size();
storage_options_.options.max_subcompactions = g_pika_conf->max_subcompactions();
Remove duplicate configuration assignments to avoid redundancy.
The assignments to the storage options from lines 1442 to 1447 are duplicates of previous assignments in the same method. This can lead to confusion and maintenance issues.
Apply this diff to remove the duplicated lines:
- storage_options_.options.level0_file_num_compaction_trigger = g_pika_conf->level0_file_num_compaction_trigger();
- storage_options_.options.level0_stop_writes_trigger = g_pika_conf->level0_stop_writes_trigger();
- storage_options_.options.level0_slowdown_writes_trigger = g_pika_conf->level0_slowdown_writes_trigger();
- storage_options_.options.min_write_buffer_number_to_merge = g_pika_conf->min_write_buffer_number_to_merge();
- storage_options_.options.max_bytes_for_level_base = g_pika_conf->level0_file_num_compaction_trigger() * g_pika_conf->write_buffer_size();
- storage_options_.options.max_subcompactions = g_pika_conf->max_subcompactions();
If you only use a benchmark of puts to test this, you theoretically won't measure any effect, because puts alone don't produce any delete ratio; you have to put on one side and delete on the other at the same time.
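A minimal sketch of such a mixed workload (key space and counts arbitrary; db is an already-opened rocksdb::DB*):

#include <string>
#include "rocksdb/db.h"

// Writes one million keys and deletes every other one, so roughly half the
// entries in the resulting SSTs are tombstones and the delete ratio is ~50%.
void MixedPutDeleteWorkload(rocksdb::DB* db) {
  for (int i = 0; i < 1000000; ++i) {
    std::string key = "key_" + std::to_string(i);
    db->Put(rocksdb::WriteOptions(), key, "value");
    if (i % 2 == 0) {
      db->Delete(rocksdb::WriteOptions(), key);
    }
  }
}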
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (2)
src/storage/src/redis.cc (2)
406-406: Remove unused variable best_filename
The variable best_filename is assigned but never used in the code. Consider removing it to clean up the code and avoid potential confusion.
Apply this diff to remove the unused variable:

-std::string best_filename;
 double best_delete_ratio = 0;

388-389: Consider optimizing time calculations for clarity
In comparing file_creation_time with the current time, the code performs division by 1000 to convert milliseconds to seconds. To enhance readability and maintainability, consider using consistent time units or utilizing standard library functions for time calculations.
For example, you could represent now in seconds directly:

int64_t now_seconds = std::chrono::duration_cast<std::chrono::seconds>(
    std::chrono::system_clock::now().time_since_epoch()).count();
// ...
if (file_creation_time <
    static_cast<uint64_t>(now_seconds - storageOptions.compact_param_.force_compact_file_age_seconds_)) {
  // ...
}

This eliminates the need for manual division and reduces potential errors related to unit conversions.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- src/storage/src/redis.cc (4 hunks)
🧰 Additional context used
🔇 Additional comments (2)
src/storage/src/redis.cc (2)
309-310: Ensure thread-safe clearing of deleted SST file records
The method MyEventListener::Clear() is called to clear deleted SST file records. Ensure that this method properly locks the mutex mu_ to maintain thread safety when accessing deletedFileNameInOBDCompact_.
Please verify that MyEventListener::Clear() implements the necessary locking mechanisms to prevent data races. If not, consider adding appropriate synchronization within the method.

215-270: ⚠️ Potential issue
Handle unexpected DataType values in SelectColumnFamilyHandles
The SelectColumnFamilyHandles method doesn't handle unexpected DataType values explicitly. In the default case of the switch statement, no action is taken, which could lead to silent failures if an invalid or unsupported DataType is passed.
Consider adding error handling or logging in the default case to alert developers of unexpected DataType values:

 default:
+  LOG(WARNING) << "Unexpected DataType value: " << static_cast<int>(option_type);
+  // Optionally, you could throw an exception or return an error code
   break;

This change enhances the robustness of the method by ensuring that unexpected input is detected and handled appropriately.

Likely invalid or redundant comment.
src/storage/src/redis.cc
Outdated
for (const auto& iter : props_vec) {
  auto file_path = iter.first;

  // maybe some sst files which occur in props_vec has been compacted in CompactRange,
  // so these files should not be checked.
  if (MyEventListener::Contains(file_path)) {
    continue;
  }

  uint64_t file_creation_time = iter.second->file_creation_time;
  if (file_creation_time == 0) {
    // Fallback to the file Modification time to prevent repeatedly compacting the same file,
    // file_creation_time is 0 which means the unknown condition in rocksdb
    auto s = rocksdb::Env::Default()->GetFileModificationTime(file_path, &file_creation_time);
    if (!s.ok()) {
      LOG(WARNING) << handles_[idx]->GetName() << " Failed to get the file creation time: " << file_path << " in "
                   << handles_[idx]->GetName() << ", err: " << s.ToString();
      continue;
    }
  }

  while (metadata_iter != metadata.end() && file_path.substr(file_path.find_last_of('/')) != metadata_iter->name) {
    ++metadata_iter;
  }
  if (metadata_iter == metadata.end()) {
    // we reach here only in this case: some sst files has been created
    // before calling GetPropertiesOfAllTables and after calling GetLiveFilesMetaData.
    break;
Ensure proper synchronization between metadata and props_vec

The current logic assumes that metadata and props_vec are aligned and that their iterators move in sync. However, there's a risk of mismatch if the files in metadata and props_vec are not sorted identically or if new SST files are created between calls to GetLiveFilesMetaData and GetPropertiesOfAllTables.
This could lead to incorrect associations between file properties and metadata, potentially causing errors or unintended behavior.
Consider mapping the file names to their metadata to ensure accurate associations. For example:
// Build a map for quick lookup
std::unordered_map<std::string, rocksdb::LiveFileMetaData> metadata_map;
for (const auto& meta : metadata) {
metadata_map[meta.name] = meta;
}
// In the loop
for (const auto& iter : props_vec) {
auto file_name = iter.first.substr(iter.first.find_last_of('/') + 1);
auto metadata_it = metadata_map.find(file_name);
if (metadata_it == metadata_map.end()) {
// Handle the case where the file is not found
continue;
}
const auto& meta = metadata_it->second;
start_key = meta.smallestkey;
stop_key = meta.largestkey;
total_keys = meta.num_entries;
deleted_keys = meta.num_deletions;
// ... rest of the logic ...
}
This approach ensures that each file's properties are correctly matched with its metadata, enhancing the robustness of the compaction logic.
if (!in_compact_flag_.compare_exchange_weak(no_compact, to_comapct, std::memory_order_relaxed,
                                            std::memory_order_relaxed)) {
🛠️ Refactor suggestion
Use appropriate memory order for atomic operations
The use of std::memory_order_relaxed in atomic operations may not ensure the necessary synchronization. When performing compound operations like compare_exchange_weak, it's generally safer to use stronger memory ordering to prevent subtle concurrency issues.

Consider using std::memory_order_acquire and std::memory_order_release for proper synchronization:
if (!in_compact_flag_.compare_exchange_weak(no_compact, to_comapct,
- std::memory_order_relaxed,
- std::memory_order_relaxed)) {
+ std::memory_order_acquire,
+ std::memory_order_release)) {
return Status::Busy("compact running");
}
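As an illustration of the acquire/release pairing, a small RAII guard around such a flag (a sketch under assumed names, not code from this PR) could look like:

#include <atomic>

// Acquires the in-compact flag on construction and releases it on destruction;
// acquire on the winning CAS and release on the store make the guarded
// compaction work visible to the next thread that wins the flag.
class CompactGuard {
 public:
  explicit CompactGuard(std::atomic<bool>& flag) : flag_(flag) {
    bool expected = false;
    acquired_ = flag_.compare_exchange_strong(expected, true, std::memory_order_acquire);
  }
  ~CompactGuard() {
    if (acquired_) {
      flag_.store(false, std::memory_order_release);
    }
  }
  bool acquired() const { return acquired_; }

 private:
  std::atomic<bool>& flag_;
  bool acquired_ = false;
};

// usage: CompactGuard guard(in_compact_flag_);
//        if (!guard.acquired()) return Status::Busy("compact running");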
if (!s.ok()) {
  if (compact_result_vec) {
    compact_result_vec->push_back(
        Status::Corruption(handles_[idx]->GetName() +
                           " LongestNotCompactionSstCompact GetPropertiesOfAllTables error: " + s.ToString()));
  }
  continue;
}
Review error handling after failing to get table properties
The current error handling continues to the next handle when GetPropertiesOfAllTables fails. This may lead to incomplete operations or inconsistent states if critical data is missing.
Consider returning the error immediately or accumulating errors to inform the caller of all issues encountered. For example:
if (!s.ok()) {
if (compact_result_vec) {
compact_result_vec->push_back(
Status::Corruption(handles_[idx]->GetName() +
" LongestNotCompactionSstCompact GetPropertiesOfAllTables error: " + s.ToString()));
}
- continue;
+ return s;
}
This ensures that the method does not proceed with invalid data and allows the caller to handle the error appropriately.
start_key.clear();
best_stop_key = stop_key;
stop_key.clear();
Correct misuse of clear() method on rocksdb::Slice

The rocksdb::Slice class does not have a clear() method. Calling clear() on a Slice object will result in a compile-time error.

To reset a Slice object, you can assign it an empty Slice or set its data and size appropriately.

Apply this diff to fix the issue:
-start_key.clear();
-stop_key.clear();
+start_key = rocksdb::Slice();
+stop_key = rocksdb::Slice();
src/pika_server.cc
Outdated
@@ -1430,6 +1439,12 @@ void PikaServer::InitStorageOptions() {
storage_options_.options.max_bytes_for_level_base = g_pika_conf->level0_file_num_compaction_trigger() * g_pika_conf->write_buffer_size();
storage_options_.options.max_subcompactions = g_pika_conf->max_subcompactions();
storage_options_.options.target_file_size_base = g_pika_conf->target_file_size_base();
storage_options_.options.level0_file_num_compaction_trigger = g_pika_conf->level0_file_num_compaction_trigger();
These parameter settings already exist above, at line 1435.
}

static std::mutex mu_;
// deleted file(.sst) name in OBD compacting
Why make this static? There's nothing wrong with different redis instances creating their own Listener objects, right?
I had the relationship between redis and rocksdb mixed up here; fixed.
src/storage/src/redis.cc
Outdated
}

// clear deleted sst file records because we use these only in OBD-compact
MyEventListener::Clear();
If manual compaction is never executed, will MyEventListener keep accumulating deleted sst files in its in-memory map indefinitely?
I now check this before rocksdb::DB::Open: if the strategy is obd-compact, I register the listener for sst files; otherwise I don't. That guarantees nothing accumulates.
src/storage/src/redis.cc
Outdated
}

// clear deleted sst file records because we use these only in OBD-compact
MyEventListener::Clear();
How about a different name? MyEventListener doesn't convey what this listener does.
Renamed to OBDSstListener.
}

// don't compact the SST created in x `dont_compact_sst_created_in_seconds_`.
if (file_creation_time >
Your props_vec is already sorted by file number in ascending order, so once one sst file's creation time fails the condition, none of the later files will satisfy it either.
Nice catch, I really hadn't thought of that; fixed.
}

// pick the file which has highest delete ratio
if (total_keys != 0 && delete_ratio > best_delete_ratio) {
Shouldn't this logic be unreachable? First, the file_create_time constraint is satisfied, otherwise line 399 would already have returned; then delete_ratio has to exceed the delete_ratio of every sst file compacted so far, so it must exceed force_compact_min_ratio, in which case line 388 would already have executed, right?
There are two parameters here. One is force_compact_min_ratio: when the delete ratio exceeds it, compaction is forced. The other is best_delete_min_ratio: when the delete ratio exceeds it, the sst becomes a candidate for compaction, and whether it is actually compacted depends on picking the candidate with the highest ratio. If some sst's delete ratio exceeds force_compact_min_ratio, the subsequent ssts indeed never reach this branch; but if none of the traversed ssts exceed force_compact_min_ratio and some only exceed best_delete_min_ratio, this branch is still reached.
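Condensed, the two-threshold selection described above looks roughly like this (struct and names hypothetical, for illustration only):

#include <cstdint>
#include <string>
#include <vector>

struct SstInfo {
  std::string start_key, stop_key;
  uint64_t total_keys = 0, deleted_keys = 0;
};

// Returns the index of the best candidate to compact, or -1 if no file clears
// best_delete_min_ratio. Files whose ratio clears force_compact_min_ratio are
// assumed to be compacted immediately during the scan, so they never become
// "best" candidates here.
int PickBestCandidate(const std::vector<SstInfo>& ssts, double force_compact_min_ratio,
                      double best_delete_min_ratio) {
  double best_ratio = 0.0;
  int best_idx = -1;
  for (size_t i = 0; i < ssts.size(); ++i) {
    const auto& sst = ssts[i];
    double ratio = sst.total_keys == 0
        ? 0.0
        : static_cast<double>(sst.deleted_keys) / static_cast<double>(sst.total_keys);
    if (ratio >= force_compact_min_ratio) {
      continue;  // forced-compaction path is handled during the scan
    }
    if (ratio > best_ratio) {
      best_ratio = ratio;
      best_idx = static_cast<int>(i);
    }
  }
  return best_ratio > best_delete_min_ratio ? best_idx : -1;
}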
Actionable comments posted: 5
🧹 Outside diff range and nitpick comments (4)
src/storage/src/redis.h (2)
510-510: Add comment explaining in_compact_flag_
Add a comment describing the purpose and usage of in_compact_flag_ to enhance code readability and maintainability.

511-511: Improve the comment for listener_
Clarify and correct the comment for listener_:
Original:

// listening created sst file while compacting in OBD-compact

Suggested:

// Listener for capturing deleted SST files during OBD compaction

Apply this diff:

-OBDSstListener listener_;  // listening created sst file while compacting in OBD-compact
+OBDSstListener listener_;  // Listener for capturing deleted SST files during OBD compaction

src/storage/src/redis.cc (1)

168-169: Correct grammatical errors in comments.
The comments contain grammatical errors that can be confusing.
Original comments:

// if using obd-compact, we should listening created sst file
// while compacting in OBD-compact

Consider revising them for clarity:

-// if using obd-compact, we should listening created sst file
+// If using OBD-compact, we should listen to created SST files
-// while compacting in OBD-compact
+// while compacting in OBD-compact.

src/pika_server.cc (1)

1500-1506: Update the comment to accurately describe the compaction parameters
The comment // For Storage small compaction does not match the parameters below, which are related to the OldestOrBestDeleteRatioSstCompact strategy. Update the comment to reflect that these settings configure the compaction parameters for the new strategy.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- src/pika_server.cc (4 hunks)
- src/storage/src/redis.cc (5 hunks)
- src/storage/src/redis.h (4 hunks)
🧰 Additional context used
🔇 Additional comments (19)
src/storage/src/redis.h (3)
10-10: Inclusion of <set> header is appropriate
The addition of #include <set> is necessary for the use of std::set in the OBDSstListener class.

110-111: Consider renaming LongestNotCompactionSstCompact for clarity
The method name LongestNotCompactionSstCompact may not clearly convey its purpose. Consider renaming it to better reflect its functionality, such as CompactLongestUncompactedSst.

489-491: Make mu_ and deletedFileNameInOBDCompact_ private
To improve encapsulation and prevent unintended access, consider making the member variables mu_ and deletedFileNameInOBDCompact_ private.

src/storage/src/redis.cc (16)

8-9: Includes appear necessary and appropriate.
The addition of #include "include/pika_conf.h" and #include "pstd_coding.h" provides required declarations used elsewhere in the code. These includes are necessary for accessing configuration settings and encoding utilities.

11-11: Include rocksdb/metadata.h for metadata operations.
Including rocksdb/metadata.h enables the use of RocksDB metadata functionalities, which are utilized in the compaction logic.

17-18: Ensure utility headers are necessary for functionality.
The inclusion of "pstd/include/pstd_string.h" and "pstd/include/pstd_defer.h" suggests the use of string utilities and defer mechanisms. Confirm that these headers are required for the implementations in this file.

20-20: Declaration of external configuration variable.
The external declaration extern std::unique_ptr<PikaConf> g_pika_conf; allows access to global configuration settings. Ensure that g_pika_conf is properly initialized before use to avoid undefined behavior.

31-33: Return a const comparator for safety.
Changing the return type of ZSetsScoreKeyComparator() to const rocksdb::Comparator* enhances const-correctness, preventing unintended modifications to the comparator.

74-74: Retrieve force_compact_min_delete_ratio correctly.
The variable force_compact_min_delete_ratio is retrieved from the storage options' compaction parameters. Ensure that this value is correctly initialized and used in subsequent compaction logic.

75-75: Initialize RocksDB options appropriately.
Using rocksdb::Options ops(storage_options.options); ensures that the database options are initialized based on the provided storage options.

80-80: Assign database statistics to options when enabled.
Assigning ops.statistics = db_statistics_; incorporates the statistics object into RocksDB options, enabling performance monitoring when database statistics are enabled.

170-172: Ensure the listener is added correctly based on compaction strategy.
The condition checks if the compaction strategy is OldestOrBestDeleteRatioSstCompact before adding the OBDSstListener. This ensures that the listener is only added when necessary.

220-275: SelectColumnFamilyHandles method correctly maps data types to handles.
The method SelectColumnFamilyHandles appropriately maps DataType values to their corresponding column family handles based on the specified ColumnFamilyType.

389-390: Prevent division by zero when calculating delete_ratio.
There is a risk of division by zero if total_keys is zero in the calculation of delete_ratio:

double delete_ratio = static_cast<double>(deleted_keys) / static_cast<double>(total_keys);

This issue may have been previously addressed, but ensure that a check is in place to handle cases where total_keys is zero.

421-423: Ensure compaction condition accurately reflects intent.
The condition for compacting based on the best delete ratio is:

if (best_delete_ratio > best_delete_min_ratio && !best_start_key.empty() && !best_stop_key.empty() &&
    max_files_to_compact != 0) {
  // ...
}

Confirm that best_delete_ratio and best_delete_min_ratio are correctly calculated and that this condition aligns with the desired compaction strategy.

395-395: Handle compaction result properly.
After calling db_->CompactRange, check the returned Status and handle any errors immediately to prevent proceeding with invalid state.
Ensure that any errors from CompactRange are appropriately logged and that the loop exits or continues based on the severity of the error.

Line range hint 74-345: Consistent use of compaction parameters from configuration.
Ensure that all compaction parameters, such as force_compact_min_delete_ratio_ and best_delete_min_ratio_, are consistently retrieved from the configuration and used throughout the compaction logic.

170-172: Confirm the existence and correct implementation of OBDSstListener.
Make sure that the OBDSstListener class is defined and implemented correctly to handle SST file events during compaction as intended.
Consider adding unit tests or integration tests to verify the behavior of the listener in various scenarios.

352-352: ⚠️ Potential issue
Initialize compact_result to Status::OK().
To prevent potential undefined behavior due to uninitialized compact_result, initialize it to Status::OK() when declared.
Apply this diff:

-Status compact_result;
+Status compact_result = Status::OK();

Likely invalid or redundant comment.
src/storage/src/redis.h
Outdated
mu_.lock();
deletedFileNameInOBDCompact_.emplace(info.file_path);
mu_.unlock();
Use std::lock_guard for exception safety

In OnTableFileDeleted, manual locking and unlocking of mu_ is used. To ensure exception safety and consistency with other methods, replace manual locks with std::lock_guard<std::mutex>.

Apply this diff:
- mu_.lock();
deletedFileNameInOBDCompact_.emplace(info.file_path);
- mu_.unlock();
+ std::lock_guard<std::mutex> lk(mu_);
+ deletedFileNameInOBDCompact_.emplace(info.file_path);
src/storage/src/redis.cc
Outdated
auto force_compact_min_ratio =
    static_cast<double>(storageOptions.compact_param_.force_compact_min_delete_ratio_) / 100.0;
auto best_delete_min_ratio = static_cast<double>(storageOptions.compact_param_.best_delete_min_ratio_) / 100.0;

std::string best_filename;
double best_delete_ratio = 0;
uint64_t total_keys = 0, deleted_keys = 0;
rocksdb::Slice start_key, stop_key, best_start_key, best_stop_key;
Status compact_result;
auto metadata_iter = metadata.begin();
for (const auto& iter : props_vec) {
  auto file_path = iter.first;

  // maybe some sst files which occur in props_vec has been compacted in CompactRange,
  // so these files should not be checked.
  if (listener_.Contains(file_path)) {
    continue;
  }

  uint64_t file_creation_time = iter.second->file_creation_time;
  if (file_creation_time == 0) {
    // Fallback to the file Modification time to prevent repeatedly compacting the same file,
    // file_creation_time is 0 which means the unknown condition in rocksdb
    auto s = rocksdb::Env::Default()->GetFileModificationTime(file_path, &file_creation_time);
    if (!s.ok()) {
      LOG(WARNING) << handles_[idx]->GetName() << " Failed to get the file creation time: " << file_path << " in "
                   << handles_[idx]->GetName() << ", err: " << s.ToString();
      continue;
    }
  }

  while (metadata_iter != metadata.end() && file_path.substr(file_path.find_last_of('/')) != metadata_iter->name) {
    ++metadata_iter;
  }
  if (metadata_iter == metadata.end()) {
    // we reach here only in this case: some sst files has been created
    // before calling GetPropertiesOfAllTables and after calling GetLiveFilesMetaData.
    break;
  }

  start_key = metadata_iter->smallestkey;
  stop_key = metadata_iter->largestkey;
  total_keys = metadata_iter->num_entries;
  deleted_keys = metadata_iter->num_deletions;
  ++metadata_iter;

  double delete_ratio = static_cast<double>(deleted_keys) / static_cast<double>(total_keys);

  // pick the file according to force compact policy
  if (file_creation_time <
          static_cast<uint64_t>(now / 1000 - storageOptions.compact_param_.force_compact_file_age_seconds_) &&
      delete_ratio >= force_compact_min_ratio) {
    compact_result = db_->CompactRange(default_compact_range_options_, &start_key, &stop_key);
    if (--max_files_to_compact == 0) {
      break;
    }
    continue;
  }

  // don't compact the SST created in x `dont_compact_sst_created_in_seconds_`.
  // the elems in props_vec has been sorted by filename, meaning that the file
  // creation time of the subsequent sst file must be not less than this time.
  if (file_creation_time >
      static_cast<uint64_t>(now / 1000 - storageOptions.compact_param_.dont_compact_sst_created_in_seconds_)) {
    break;
  }

  // pick the file which has highest delete ratio
  if (total_keys != 0 && delete_ratio > best_delete_ratio) {
    best_delete_ratio = delete_ratio;
    best_filename = iter.first;
    best_start_key = start_key;
    start_key.clear();
    best_stop_key = stop_key;
    stop_key.clear();
  }
}

// if max_files_to_compact is zero, we should not compact this sst file.
if (best_delete_ratio > best_delete_min_ratio && !best_start_key.empty() && !best_stop_key.empty() &&
    max_files_to_compact != 0) {
  compact_result =
      db_->CompactRange(default_compact_range_options_, handles_[idx], &best_start_key, &best_stop_key);
}

if (!compact_result.ok()) {
  if (compact_result_vec) {
    compact_result_vec->push_back(
        Status::Corruption(handles_[idx]->GetName() + " Failed to do compaction " + compact_result.ToString()));
  }
  continue;
}

if (compact_result_vec) {
  compact_result_vec->push_back(Status::OK());
}
}
return Status::OK();
}
Potential use of uninitialized compact_result.

In the LongestNotCompactionSstCompact method, the compact_result variable may not be assigned before being checked at lines 428-434.
Status compact_result;
// ...
if (!compact_result.ok()) {
if (compact_result_vec) {
compact_result_vec->push_back(
Status::Corruption(handles_[idx]->GetName() + " Failed to do compaction " + compact_result.ToString()));
}
continue;
}
If none of the compaction conditions are met within the loop, compact_result remains uninitialized, leading to undefined behavior when calling compact_result.ok().

Apply this diff to ensure compact_result is initialized properly:
Status compact_result;
+compact_result = Status::OK();
Alternatively, modify the logic to only check compact_result if it has been assigned after a compaction attempt.
start_key.clear();
best_stop_key = stop_key;
Avoid clearing rocksdb::Slice using clear() method.

rocksdb::Slice does not have a clear() method. Attempting to call clear() will result in a compilation error.

Apply this diff to reset the Slice objects:
-start_key.clear();
+start_key = rocksdb::Slice();
Similarly, ensure that stop_key is reset appropriately.
case TaskType::kCompactOldestOrBestDeleteRatioSst:
  db_item.second->LongestNotCompactionSstCompact(storage::DataType::kAll);
  break;
🛠️ Refactor suggestion

Consider renaming `LongestNotCompactionSstCompact` for clarity.

The method name `LongestNotCompactionSstCompact` may be grammatically incorrect and could cause confusion. Consider renaming it to `LongestNotCompactedSstCompact` for better readability and clarity.
if (g_pika_conf->compaction_strategy() == PikaConf::FullCompact) {
  DoSameThingEveryDB(TaskType::kCompactAll);
} else if (g_pika_conf->compaction_strategy() == PikaConf::OldestOrBestDeleteRatioSstCompact) {
  DoSameThingEveryDB(TaskType::kCompactOldestOrBestDeleteRatioSst);
}
Add a default case to handle unexpected compaction strategies.

Currently, `AutoCompactRange()` handles the `FullCompact` and `OldestOrBestDeleteRatioSstCompact` strategies. If `compaction_strategy()` returns an unexpected value, no compaction will occur, which might lead to unintended behavior. Consider adding an `else` clause to handle any unsupported or future compaction strategies.
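A minimal sketch of such a fallback; the branch structure copies the excerpt above, and the LOG(WARNING) line assumes Pika's usual glog-style logging:

if (g_pika_conf->compaction_strategy() == PikaConf::FullCompact) {
  DoSameThingEveryDB(TaskType::kCompactAll);
} else if (g_pika_conf->compaction_strategy() == PikaConf::OldestOrBestDeleteRatioSstCompact) {
  DoSameThingEveryDB(TaskType::kCompactOldestOrBestDeleteRatioSst);
} else {
  // Unknown or future strategy: make the skipped compaction visible
  // instead of silently doing nothing.
  LOG(WARNING) << "Unsupported compaction strategy, skipping auto compaction";
}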
src/storage/src/redis.cc
Outdated
@@ -64,12 +71,13 @@ Status Redis::Open(const StorageOptions& storage_options, const std::string& db_
 rocksdb::BlockBasedTableOptions table_ops(storage_options.table_options);
 table_ops.filter_policy.reset(rocksdb::NewBloomFilterPolicy(10, true));

 rocksdb::DBOptions db_ops(storage_options.options);
 db_ops.create_missing_column_families = true;
 auto force_compact_min_delete_ratio = storage_->GetStorageOptions().compact_param_.force_compact_min_delete_ratio_;
This temporary variable is never used.
done
@@ -464,6 +468,29 @@ class Redis {
 inline Status SetFirstOrLastID(const rocksdb::Slice& key, StreamMetaValue& stream_meta, bool is_set_first,
                                rocksdb::ReadOptions& read_options);

 class OBDSstListener : public rocksdb::EventListener {
As I understand it, this callback only needs to do work while a manual compact is in progress, regardless of whether the current compact_strategy is full_compact or obd_compact. So you could add a flag to OBDSstListener that defaults to not recording deleted SST files: turn the flag on where your logic runs to start recording, and turn it off again once the logic has finished. (A sketch of this idea follows below.)
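A rough sketch of that toggle (the class name comes from the diff above; the member names, the choice of OnTableFileDeleted, and the bookkeeping are illustrative assumptions, not the PR's actual implementation):

#include <atomic>
#include <mutex>
#include <set>
#include <string>
#include "rocksdb/listener.h"

class OBDSstListener : public rocksdb::EventListener {
 public:
  void StartRecording() { recording_.store(true, std::memory_order_release); }
  void StopRecording() { recording_.store(false, std::memory_order_release); }

  void OnTableFileDeleted(const rocksdb::TableFileDeletionInfo& info) override {
    // Skip all bookkeeping unless the manual compaction logic is running.
    if (!recording_.load(std::memory_order_acquire)) {
      return;
    }
    std::lock_guard<std::mutex> guard(mu_);
    deleted_files_.insert(info.file_path);
  }

 private:
  std::atomic<bool> recording_{false};  // off by default, as suggested
  std::mutex mu_;
  std::set<std::string> deleted_files_;
};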
done
src/pika_server.cc
Outdated
@@ -1430,6 +1439,11 @@ void PikaServer::InitStorageOptions() {
 storage_options_.options.max_bytes_for_level_base = g_pika_conf->level0_file_num_compaction_trigger() * g_pika_conf->write_buffer_size();
 storage_options_.options.max_subcompactions = g_pika_conf->max_subcompactions();
 storage_options_.options.target_file_size_base = g_pika_conf->target_file_size_base();
 storage_options_.options.level0_stop_writes_trigger = g_pika_conf->level0_stop_writes_trigger();
 storage_options_.options.level0_slowdown_writes_trigger = g_pika_conf->level0_slowdown_writes_trigger();
 storage_options_.options.min_write_buffer_number_to_merge = g_pika_conf->min_write_buffer_number_to_merge();
These parameters are already set at line 1435; they can be removed.
done
Actionable comments posted: 3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- src/pika_server.cc (3 hunks)
- src/storage/src/redis.cc (5 hunks)
- src/storage/src/redis.h (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/storage/src/redis.h
🧰 Additional context used
🔇 Additional comments (8)
src/storage/src/redis.cc (6)

279-279: Fix typo in variable name `to_comapct`

There's a typo in the variable name `to_comapct`; it should be `to_compact`.

280-281: Use appropriate memory order in atomic operation

Using `std::memory_order_relaxed` in `compare_exchange_weak` may not ensure the necessary synchronization. It's safer to use `std::memory_order_acquire` and `std::memory_order_release` to prevent concurrency issues.

395-396: Prevent division by zero when calculating `delete_ratio`

`total_keys` could be zero, causing a division by zero error in the calculation of `delete_ratio`.

421-423: Correct misuse of `clear()` method on `rocksdb::Slice`

`rocksdb::Slice` does not have a `clear()` method. Calling `clear()` will result in a compilation error.

434-440: Initialize `compact_result` before usage

`compact_result` might remain unassigned if no compaction occurs, so checking `compact_result.ok()` would not reflect an actual compaction attempt.

219-274: Handle unexpected `DataType` values in `SelectColumnFamilyHandles`

The `default` case in the switch statement does not handle unexpected `DataType` values, which could lead to undefined behavior.

src/pika_server.cc (2)

519-521: Duplicate Comment: Consider renaming `LongestNotCompactionSstCompact` for clarity

1227-1231: Duplicate Comment: Add a default case to handle unexpected compaction strategies
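For the 280-281 memory-order item above, a small illustration of the suggested acquire/release pairing; the struct and flag names are hypothetical:

#include <atomic>

struct CompactGate {
  std::atomic<bool> in_compact{false};

  // Returns true if this thread won the right to run the compaction pass.
  bool TryStart() {
    bool expected = false;
    // success: acquire, so we see writes published by the previous holder;
    // failure: relaxed, since no ownership was taken and nothing synchronizes.
    return in_compact.compare_exchange_weak(expected, true,
                                            std::memory_order_acquire,
                                            std::memory_order_relaxed);
  }

  // Publish this pass's writes before handing the gate to the next thread.
  void Finish() { in_compact.store(false, std::memory_order_release); }
};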
if (props.size() / storageOptions.compact_param_.compact_every_num_of_files_ > max_files_to_compact) {
  max_files_to_compact = props.size() / storageOptions.compact_param_.compact_every_num_of_files_;
}
Prevent potential division by zero in compaction calculation.

At line 336, there's a risk of division by zero if `storageOptions.compact_param_.compact_every_num_of_files_` is zero. This could lead to a runtime error.

Consider adding a check to ensure `compact_every_num_of_files_` is not zero before performing the division:
if (storageOptions.compact_param_.compact_every_num_of_files_ != 0 &&
props.size() / storageOptions.compact_param_.compact_every_num_of_files_ > max_files_to_compact) {
max_files_to_compact = props.size() / storageOptions.compact_param_.compact_every_num_of_files_;
}
while (metadata_iter != metadata.end() && file_path.substr(file_path.find_last_of('/')) != metadata_iter->name) {
  ++metadata_iter;
}
Avoid potential errors when extracting the file name from `file_path`.

The expression `file_path.substr(file_path.find_last_of('/'))` may cause issues if `'/'` is not found in `file_path`, as `find_last_of('/')` would return `std::string::npos`. This could lead to unexpected behavior.

Consider handling the case where `'/'` is not found:
size_t pos = file_path.find_last_of('/');
std::string file_name = (pos != std::string::npos) ? file_path.substr(pos + 1) : file_path;
while (metadata_iter != metadata.end() && file_name != metadata_iter->name) {
++metadata_iter;
}
// For Storage compaction
storage_options_.compact_param_.best_delete_min_ratio_ = g_pika_conf->best_delete_min_ratio();
storage_options_.compact_param_.dont_compact_sst_created_in_seconds_ = g_pika_conf->dont_compact_sst_created_in_seconds();
storage_options_.compact_param_.force_compact_file_age_seconds_ = g_pika_conf->force_compact_file_age_seconds();
storage_options_.compact_param_.force_compact_min_delete_ratio_ = g_pika_conf->force_compact_min_delete_ratio();
storage_options_.compact_param_.compact_every_num_of_files_ = g_pika_conf->compact_every_num_of_files();
🛠️ Refactor suggestion

Consider refactoring `InitStorageOptions()` for better maintainability.

The `InitStorageOptions()` function contains numerous direct assignments from configuration parameters, making it lengthy and potentially harder to maintain. Consider refactoring the initialization of compaction parameters into a separate helper method or struct to improve readability and modularity.
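A sketch of what that helper might look like, reusing the getters from the excerpt above (the method name InitCompactionParams is hypothetical):

// Hypothetical helper split out of InitStorageOptions(); field and getter
// names are taken from the excerpt above.
void PikaServer::InitCompactionParams() {
  auto& p = storage_options_.compact_param_;
  p.best_delete_min_ratio_ = g_pika_conf->best_delete_min_ratio();
  p.dont_compact_sst_created_in_seconds_ = g_pika_conf->dont_compact_sst_created_in_seconds();
  p.force_compact_file_age_seconds_ = g_pika_conf->force_compact_file_age_seconds();
  p.force_compact_min_delete_ratio_ = g_pika_conf->force_compact_min_delete_ratio();
  p.compact_every_num_of_files_ = g_pika_conf->compact_every_num_of_files();
}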
A follow-up refinement of MalikHou#2.
Summary by CodeRabbit
New Features
- `PikaConf` class with new methods for accessing compaction-related settings.

Bug Fixes

Documentation