DOC-11497 Docs for obs: Enabling troubleshooting hot spots externally (e.g., logs or metrics) #19577

Open

florence-crl wants to merge 29 commits into main from DOC-11497

Contributor

florence-crl commented May 1, 2025

Fixes DOC-11497

Added detect-hotspots.md and associated images.

Rendered previews:

Detect Hotspots


          initial draft.

0173eb8

github-actions bot commented May 1, 2025

Files changed:

src/current/_includes/v25.2/sidebar-data/troubleshooting.json
src/current/_includes/v25.3/sidebar-data/troubleshooting.json
src/current/images/v25.2/detect-hotspots-cpu-percent.png
src/current/images/v25.2/detect-hotspots-latch-conflict-wait-durations.png
src/current/images/v25.2/detect-hotspots-workflow.monopic (Warning: include not used in any file or include)
src/current/images/v25.2/detect-hotspots-workflow.svg
src/current/images/v25.3/detect-hotspots-cpu-percent.png
src/current/images/v25.3/detect-hotspots-latch-conflict-wait-durations.png
src/current/images/v25.3/detect-hotspots-workflow.monopic (Warning: include not used in any file or include)
src/current/images/v25.3/detect-hotspots-workflow.svg
src/current/v25.2/detect-hotspots.md
src/current/v25.2/understand-hotspots.md
src/current/v25.3/detect-hotspots.md
src/current/v25.3/understand-hotspots.md


          Merge remote-tracking branch 'origin/main' into DOC-11497

6dbaa47

netlify bot commented May 1, 2025

✅ Deploy Preview for cockroachdb-interactivetutorials-docs canceled.

Name	Link
🔨 Latest commit	`f0370b3`
🔍 Latest deploy log	https://app.netlify.com/projects/cockroachdb-interactivetutorials-docs/deploys/6851b312362ada00083641dd

netlify bot commented May 1, 2025

✅ Deploy Preview for cockroachdb-api-docs canceled.

Name	Link
🔨 Latest commit	`f0370b3`
🔍 Latest deploy log	https://app.netlify.com/projects/cockroachdb-api-docs/deploys/6851b312a578fc00083c1794

netlify bot commented May 1, 2025

❌ Deploy Preview for cockroachdb-docs failed.

Name	Link
🔨 Latest commit	`0173eb8`
🔍 Latest deploy log	https://app.netlify.com/sites/cockroachdb-docs/deploys/6813b55b6c4a2d00084eadec

netlify bot commented May 1, 2025

✅ Netlify Preview

Name	Link
🔨 Latest commit	`f0370b3`
🔍 Latest deploy log	https://app.netlify.com/projects/cockroachdb-docs/deploys/6851b3120add9a000825327e
😎 Deploy Preview	https://deploy-preview-19577--cockroachdb-docs.netlify.app

florence-crl added 4 commits

May 2, 2025 19:54


          first revision

5a2c18f


          Merge remote-tracking branch 'origin/main' into DOC-11497

c31804e


          fixed link

9bd27e8


          fixed summary

db76289

florence-crl requested a review from kevin-v-ngo

May 13, 2025 19:17

florence-crl added 11 commits

May 20, 2025 13:49


          Merge remote-tracking branch 'origin/main' into DOC-11497

b12f81c


          Merge remote-tracking branch 'origin/main' into DOC-11497

4b1cf7a


          draft 2

e78be2d


          Merge remote-tracking branch 'origin/main' into DOC-11497

74965b4


          draft 3

57fa244


          Merge remote-tracking branch 'origin/main' into DOC-11497

8fc6e2c


          Merge remote-tracking branch 'origin/main' into DOC-11497

80e592f


          Merge remote-tracking branch 'origin/main' into DOC-11497

0aab4d9


          Merge remote-tracking branch 'origin/main' into DOC-11497

e8411a4


          full draft

30fd1c0


          fixed link

6a5609b

florence-crl requested a review from angles-n-daemons

June 3, 2025 14:11

florence-crl added 3 commits

June 3, 2025 13:23


          fix file names

c7c0a9e


          Merge remote-tracking branch 'origin/main' into DOC-11497

4c3b13f


          restart deploy-preview

3744bc2

angles-n-daemons reviewed

src/current/v25.2/detect-hotspots.md (outdated, resolved)

src/current/v25.2/detect-hotspots.md (outdated, resolved)

src/current/v25.2/detect-hotspots.md (outdated, resolved)

src/current/v25.2/detect-hotspots.md (outdated, resolved)


          incorporated Brian’s feedback

98122e5

florence-crl commented

Contributor Author

florence-crl left a comment

Thanks for the first look, @angles-n-daemons. Please review again.

src/current/v25.2/detect-hotspots.md (outdated, resolved)

src/current/v25.2/detect-hotspots.md (outdated, resolved)

src/current/v25.2/detect-hotspots.md (outdated, resolved)

src/current/v25.2/detect-hotspots.md (outdated, resolved)

angles-n-daemons reviewed

angles-n-daemons left a comment

Awesome, a couple more quick comments here.

src/current/v25.2/detect-hotspots.md (outdated, resolved)

src/current/v25.2/detect-hotspots.md (resolved)

src/current/v25.2/detect-hotspots.md (resolved)

src/current/v25.2/detect-hotspots.md (outdated, resolved)


          Incorporated Brian’s feedback 2. Deleted unused images.

f724d59

florence-crl commented

Contributor Author

florence-crl left a comment

TFTR

src/current/v25.2/detect-hotspots.md (outdated, resolved)

src/current/v25.2/detect-hotspots.md (resolved)

src/current/v25.2/detect-hotspots.md (outdated, resolved)

src/current/v25.2/detect-hotspots.md (resolved)

florence-crl added 2 commits

June 4, 2025 17:25


          Added detect-hotspots-workflow.svg.

afbd9ff


          Merge remote-tracking branch 'origin/main' into DOC-11497

florence-crl requested a review from angles-n-daemons

June 4, 2025 22:11

angles-n-daemons approved these changes

src/current/v25.2/detect-hotspots.md (outdated, resolved)

src/current/v25.2/detect-hotspots.md (outdated, resolved)

florence-crl added 2 commits

June 6, 2025 16:31


          Merge remote-tracking branch 'origin/main' into DOC-11497

8fd4609


          Incorporated Brian’s feedback 3.

7e7f289

kevin-v-ngo requested changes

kevin-v-ngo left a comment

Awesome doc! A few questions and suggestions.

src/current/v25.2/detect-hotspots.md (outdated, resolved)

src/current/images/v25.2/detect-hotspots-workflow.svg (outdated)

kevin-v-ngo Jun 16, 2025

A few questions and suggestions:

- Can we simplify this and remove the second box ("Is there a node outlier in the metrics?")?
- Are we guaranteed to have a 'hot ranges log' when there is a popular key log for the latch contention workflow? CC @angles-n-daemons

Contributor Author

florence-crl Jun 17, 2025

modified diagram

angles-n-daemons Jun 18, 2025

We aren't. I'll explain why in detail.

When enabled, the hot ranges log shows up under two conditions:

- The logging interval duration has elapsed (e.g., once every four hours).
- A single replica has exceeded the CPU threshold we configured for logging.

Now when there's a popular key, or rather a row hotspot, a single range may be receiving most of the traffic, but many of the incoming queries are waiting for a latch to be released rather than doing any work. Waiting for a latch has no effect on CPU utilization, so if there are lots of waiting queries, there's not as much CPU activity.

You can see this difference in the Anatomy of a Hotspot document. If you look at "Appendix B: Anatomy of a Row Hotspot", you'll see that, while elevated, the CPU utilization for the leaseholder doesn't exceed 25%.

It's certainly possible that this is enough to go over the configured threshold, but it's not guaranteed.
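
To make the two conditions above concrete, here is a minimal Python sketch of the decision as described in this comment. It is not CockroachDB's implementation: the names `HotRangesLogPolicy` and `should_emit_hot_ranges_log` are hypothetical, the threshold value and its unit (a 0..1 utilization fraction) are placeholders, and treating the two conditions as alternatives (logical OR) is an assumption based on the explanation above.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class HotRangesLogPolicy:
    # Illustrative placeholders only, not CockroachDB defaults or setting names.
    interval_seconds: float = 4 * 60 * 60  # "once every four hours" from the comment
    cpu_threshold: float = 0.5             # hypothetical per-replica CPU utilization (0..1)


def should_emit_hot_ranges_log(
    policy: HotRangesLogPolicy,
    seconds_since_last_log: float,
    replica_cpu: Sequence[float],
) -> bool:
    """Return True if either condition described in the comment holds."""
    interval_elapsed = seconds_since_last_log >= policy.interval_seconds
    replica_over_threshold = any(cpu > policy.cpu_threshold for cpu in replica_cpu)
    return interval_elapsed or replica_over_threshold


policy = HotRangesLogPolicy()

# Row hotspot: the leaseholder's CPU is elevated (0.25) but below the assumed
# threshold, so nothing is logged until the interval elapses.
row_hotspot = should_emit_hot_ranges_log(policy, 600, [0.05, 0.25, 0.04])

# CPU-bound hotspot: one replica exceeds the assumed threshold, so a log is
# emitted before the interval elapses.
cpu_hotspot = should_emit_hot_ranges_log(policy, 600, [0.05, 0.90, 0.04])
```

Under these assumptions, a latch-bound row hotspot can produce a popular key log without a matching hot ranges log, which is why the workflow should not depend on both being present.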

src/current/v25.2/detect-hotspots.md (outdated, resolved)

src/current/v25.2/detect-hotspots.md (outdated)

+              - Once you identify a relevant log, note the range ID in the tag section of the log.
+              {{site.data.alerts.callout_info}}
+              There may be false positives of the `popular key detected` log.

kevin-v-ngo Jun 16, 2025

How? If we determined that there is a metric anomaly in latch or CPU, don't we remove the false positives?

Contributor Author

florence-crl Jun 17, 2025

@angles-n-daemons Would you be able to answer the above questions?

angles-n-daemons Jun 18, 2025

I think metric anomalies don't guarantee that there's a hotspot in the keyspace. There could, for example, be a hotspot in data domiciling, in a changefeed job, or in another similar task. Separately, because we only collect 20 samples, it's possible that the samples collected to determine a popular key are randomly skewed.

That said, I'm not sure the false positives are as big a concern as I thought before. I recommended adding this warning, but I think we can remove it and see if it proves to be an issue at all.
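
As a rough illustration of how 20 samples could be randomly skewed, the Python sketch below bounds the chance that perfectly uniform traffic makes some key look popular. Only the sample count of 20 comes from the comment above; the uniform-traffic model, the `min_hits` cutoff, and the function names are assumptions made for illustration, not the detection logic CockroachDB uses.

```python
from math import comb

SAMPLES = 20  # sample count mentioned in the comment above


def binomial_tail(n: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def false_positive_bound(num_keys: int, min_hits: int) -> float:
    """Union bound on the chance that any key gets at least min_hits of the
    samples when traffic is spread uniformly over num_keys keys.

    min_hits is a hypothetical popularity cutoff, not a CockroachDB setting.
    """
    per_key = binomial_tail(SAMPLES, 1 / num_keys, min_hits)
    return min(1.0, num_keys * per_key)


# Few keys and a low cutoff: a spurious "popular" key is plausible.
few_keys_bound = false_positive_bound(num_keys=10, min_hits=5)

# Many keys and the same cutoff: a spurious result becomes very unlikely.
many_keys_bound = false_positive_bound(num_keys=1000, min_hits=5)
```

In this simplified model, the bound shrinks quickly as the number of distinct keys grows or the cutoff rises, which is consistent with the suggestion to drop the warning and revisit it only if false positives show up in practice.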

src/current/v25.2/detect-hotspots.md (outdated, resolved)

src/current/v25.2/detect-hotspots.md (outdated, resolved)

florence-crl added 3 commits

June 17, 2025 11:05


          Merge remote-tracking branch 'origin/main' into DOC-11497

7d32151


          Incorporated Kevin’s feedback.

502cc35


          Copied files to v25.3.

f0370b3

florence-crl commented

Contributor Author

florence-crl left a comment

@kevin-v-ngo Thanks for your first review. Please take a second look.

src/current/v25.2/detect-hotspots.md (outdated, resolved)

src/current/v25.2/detect-hotspots.md (outdated, resolved)

src/current/v25.2/detect-hotspots.md (outdated, resolved)

src/current/v25.2/detect-hotspots.md (outdated, resolved)

florence-crl requested a review from kevin-v-ngo

June 17, 2025 18:37
