DOC-11961 Provide comprehensive documentation for hotspots #19282
base: main
This is great, left a couple comments - but appreciate all the extra work put into this (and the links are amazing).
tftr! @kevin-v-ngo and @angles-n-daemons
Florence, one more thing to note: the hot key visual may be misleading. While write hotspots can go up to 20k writes per second for an index hotspot, hot keys are generally limited to < 1k writes per second. The image I created seems to indicate that they can go much higher, which I doubt is the case in most deployments. Not sure if this is important for documentation, but it might be worth updating the image. I've updated it in the "Talking about hotspots" document.
I updated the 2 images.
Thanks for putting this together! Few comments.
@kevin-v-ngo updated according to your comments. tftr
@angles-n-daemons, please provide clarification to the unresolved conversations.
@angles-n-daemons tftr!
Nice! Approved pending small change. Thanks Florence!
Wow! I learned a lot from this. Thanks @florence-crl. I left a bunch of non-blocking comments; these are all suggestions or questions I have for clarification. Let me know if you want me to take another look.
Because [`CURRENT_TIMESTAMP()`]({% link {{ page.version.version }}/functions-and-operators.md %}#date-and-time-functions) is a steadily increasing value, this [`UPDATE`]({% link {{ page.version.version }}/update.md %}) will similarly burden the range at the tail of the index. While hotspots on an index tail tend to be the most common, bottlenecks on the head are not unheard of. For example, indexing on `DESC` with the same insertion strategy will cause a hotspot.

In this page, the phrase _index hotspot_ will be reserved for a hot by write hotspot on an index, even though indexes can become hot due to read. This is because a hot by write index hotspot is the most common hotspot pattern that occurs now and in the future as workloads continue to be migrated from legacy single-node installations. Hot by read index hotspots are defined later on this page as [_lookback hotspots_](#lookback-hotspots).
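
The tail-of-index behavior quoted above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the split points, key values, and helper names (`range_for_key`, `simulate_inserts`) are hypothetical, and the model ignores range splitting and rebalancing, so it is not a description of how CockroachDB routes writes internally.

```python
import bisect
from collections import Counter


def range_for_key(split_points, key):
    """Return the index of the range that owns `key`.

    Range i covers keys in [split_points[i-1], split_points[i]);
    the last range is unbounded above.
    """
    return bisect.bisect_right(split_points, key)


def simulate_inserts(split_points, keys):
    """Count how many writes land in each range."""
    return Counter(range_for_key(split_points, k) for k in keys)


# Hypothetical index: existing keys 0..999, split into 4 ranges.
split_points = [250, 500, 750]

# Steadily increasing keys (for example, timestamps): every new key is
# larger than all existing keys, so every write lands in the last range.
increasing_keys = range(1000, 1100)
tail_counts = simulate_inserts(split_points, increasing_keys)

# Keys scattered over the keyspace: writes are shared by all ranges.
spread_keys = [(k * 37) % 1000 for k in range(100)]
spread_counts = simulate_inserts(split_points, spread_keys)

print("increasing keys:", dict(tail_counts))
print("scattered keys: ", dict(sorted(spread_counts.items())))
```

With increasing keys, all 100 writes are counted against range 3 (the tail), while the scattered keys touch all four ranges.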
What do you think about making this paragraph a note? I would suggest "On this page"
**Synonyms:** Outbox hotspot

A _queuing hotspot_ is a type of index hotspot that occurs when an workload treats CockroachDB like a distributed queue. This can happen if you implement the [Outbox microservice pattern]({% link {{ page.version.version }}/cdc-queries.md %}#queries-and-the-outbox-pattern).
A _queuing hotspot_ is a type of index hotspot that occurs when an workload treats CockroachDB like a distributed queue. This can happen if you implement the [Outbox microservice pattern]({% link {{ page.version.version }}/cdc-queries.md %}#queries-and-the-outbox-pattern).
A _queuing hotspot_ is a type of index hotspot that occurs when a workload treats CockroachDB like a distributed queue. This can happen if you implement the [Outbox microservice pattern]({% link {{ page.version.version }}/cdc-queries.md %}#queries-and-the-outbox-pattern).
The following image visualizes a keyspace with multiple hot rows. In a large enough cluster, each of these rows can burden the range they live in, leading to multiple burdened nodes.

<img src="{{ 'images/v25.1/hotspots-figure-7.png' | relative_url }}" alt="Multiple row hotspots example" style="border:1px solid #eee;max-width:100%" />
I'm not sure about including celebrity names in a diagram. My feeling is that if we just use a user + number or even celebrity_a, this will make the diagram more timeless and something we have control over.
The word _hotspot_ describes various skewed data access patterns in a [cluster]({% link {{ page.version.version }}/architecture/overview.md %}#cluster), often manifesting as higher [CPU]({% link {{ page.version.version }}/common-issues-to-monitor.md %}#cpu) utilization on one or more [nodes]({% link {{ page.version.version }}/architecture/overview.md %}#node). Hotspots can also be based on [disk I/O]({% link {{ page.version.version }}/common-issues-to-monitor.md %}#storage-and-disk-i-o), [memory]({% link {{ page.version.version }}/common-issues-to-monitor.md %}#memory) usage, or other finite resources. Hotspots are troublesome because they are often limited to a fixed-size subset of the cluster’s resources, which puts them in a class of performance issues that cannot be solved by [scaling the cluster size]({% link {{ page.version.version }}/frequently-asked-questions.md %}#how-does-cockroachdb-scale).

### Hot Node |
For all the subtitles, do we want to change to sentence case as per the style guide? Also, for the "Synonyms", there was a difference in capitalization, probably keep them all lower case?
<img src="{{ 'images/v25.1/hotspots-figure-6.png' | relative_url }}" alt="Single row hotspot example" style="border:1px solid #eee;max-width:100%" />

Without changing the default behavior of the system, the load will not be distributed because it needs to be served by a single range. This behavior is not just temporary; certain users, such as celebrities and influencers, may consistently experience a high volume of activity compared to the average user. This can result in a system with multiple hotspots, each of which can potentially overload the system at any moment.
Without changing the default behavior of the system, the load will not be distributed because it needs to be served by a single range. This behavior is not just temporary; certain users, such as celebrities and influencers, may consistently experience a high volume of activity compared to the average user. This can result in a system with multiple hotspots, each of which can potentially overload the system at any moment.
Without changing the default behavior of the system, the load will not be distributed because it needs to be served by a single range. This behavior is not just temporary; certain users may consistently experience a high volume of activity compared to the average user. This can result in a system with multiple hotspots, each of which can potentially overload the system at any moment.
I'm not sure that defining the users is necessary.
Because sequences avoid user expressions, optimizations can be made to improve their performance, but unfortunately the write volume on the sequence is still that of the sum total of all its accesses.

[Sequence caching]({% link {{ page.version.version }}/create-sequence.md %}#cache-sequence-values-in-memory), which allows clients to cache sequence values as to reduce the burden on the target range, serves as a good mitigation for hot sequences. Alternatively, the `unique_rowid()` function generates sequential values which have strong guarantees against collision, with the drawback that its values are not a series.
[Sequence caching]({% link {{ page.version.version }}/create-sequence.md %}#cache-sequence-values-in-memory), which allows clients to cache sequence values as to reduce the burden on the target range, serves as a good mitigation for hot sequences. Alternatively, the `unique_rowid()` function generates sequential values which have strong guarantees against collision, with the drawback that its values are not a series.
[Sequence caching]({% link {{ page.version.version }}/create-sequence.md %}#cache-sequence-values-in-memory), which allows clients to cache sequence values to reduce the burden on the target range, serves as a good mitigation for hot sequences. Alternatively, the `unique_rowid()` function generates sequential values which have strong guarantees against collision, with the drawback that its values are not a series.
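
To make the effect of sequence caching concrete, here is a minimal sketch that counts how many writes reach the range holding the sequence, with and without a client-side cache. The classes `SequenceRange` and `CachingClient` and the helper `writes_for` are hypothetical names invented for this illustration; they model the idea only and are not CockroachDB's implementation.

```python
class SequenceRange:
    """Stands in for the single range that stores the sequence's value."""

    def __init__(self):
        self.value = 0   # last value handed out
        self.writes = 0  # number of writes this range has served

    def reserve(self, count):
        """Reserve `count` consecutive values with one write.

        Returns the first reserved value.
        """
        first = self.value + 1
        self.value += count
        self.writes += 1
        return first


class CachingClient:
    """A client that reserves `cache_size` values per trip to the range."""

    def __init__(self, seq, cache_size):
        self.seq = seq
        self.cache_size = cache_size
        self.next_value = 0
        self.remaining = 0

    def nextval(self):
        # Only go back to the sequence range when the local cache is empty.
        if self.remaining == 0:
            self.next_value = self.seq.reserve(self.cache_size)
            self.remaining = self.cache_size
        value = self.next_value
        self.next_value += 1
        self.remaining -= 1
        return value


def writes_for(cache_size, calls):
    """Return (writes on the sequence range, values handed out)."""
    seq = SequenceRange()
    client = CachingClient(seq, cache_size)
    values = [client.nextval() for _ in range(calls)]
    return seq.writes, values


uncached_writes, _ = writes_for(cache_size=1, calls=1000)
cached_writes, cached_values = writes_for(cache_size=100, calls=1000)
print(uncached_writes, cached_writes)
```

In this model, 1000 calls with a cache size of 1 produce 1000 writes on the sequence range, while a cache size of 100 produces 10. The total number of values requested is unchanged; only the number of trips to the hot range drops.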
<img src="{{ 'images/v25.1/hotspots-figure-9.png' | relative_url }}" alt="Table hotspot example" style="border:1px solid #eee;max-width:100%" />

Reads in the `posts` table may be evenly distributed, but joining the `countries` table becomes a bottleneck, since it exists in so few ranges. Splitting the `countries` table ranges can relieve pressure, but only to a theoretical limit as the indivisible points, the rows themselves, experience high throughput. [Global tables]({% link {{ page.version.version }}/global-tables.md %}) and [follower reads]({% link {{ page.version.version }}/follower-reads.md %}) can help scaling in this case, especially when write throughput is low.
Reads in the `posts` table may be evenly distributed, but joining the `countries` table becomes a bottleneck, since it exists in so few ranges. Splitting the `countries` table ranges can relieve pressure, but only to a theoretical limit as the indivisible points, the rows themselves, experience high throughput. [Global tables]({% link {{ page.version.version }}/global-tables.md %}) and [follower reads]({% link {{ page.version.version }}/follower-reads.md %}) can help scaling in this case, especially when write throughput is low.
Reads in the `posts` table may be evenly distributed, but joining the `countries` table becomes a bottleneck, since it exists in so few ranges. Splitting the `countries` table ranges can relieve pressure, but only to a limit as the indivisible rows experience high throughput. [Global tables]({% link {{ page.version.version }}/global-tables.md %}) and [follower reads]({% link {{ page.version.version }}/follower-reads.md %}) can help scaling in this case, especially when write throughput is low.
) STORED;
~~~

In the `ALTER` statement, the first condition `province <> 'alabama'` checks whether the province is not `alabama`. It matches every single row in the table that is not `alabama`, and will ironically place them in the `us-east-1` region.
My sense is that the "ironically" may be lost on some users?
**Synonyms:** time-based hotspot

_Temporal hotspots_ refer to increased database usage during particular windows of time. These take a variety of shapes, from event and holiday usage (such as Black Friday or the Superbowl), to synchronized job runs.
I was a bit split on the mentions of "Black Friday and the Superbowl". I think it's good to provide practical examples that users can understand, but my one worry was that they're both US-centric, and I wonder whether they're timeless. Do you think just "from event and holiday usage" is enough without the parenthetical?
Fixes DOC-11961
Added understand-hotspots.md and corresponding images.
In sidebar-data/troubleshooting.json, added link to understand-hotspots.md.
Rendered preview: