Feature Request: Allow configurable shard range format thresholds for more flexible scaling #17688

Open
bluecrabs007 opened this issue Feb 3, 2025 · 1 comment
Labels
Component: Cluster management Type: Enhancement Logical improvement (somewhere between a bug and feature)


bluecrabs007 commented Feb 3, 2025

Feature Description

key.GenerateShardRanges uses 2 hex digits in shard ranges when there are 256 or fewer shards.

vitess/go/vt/key/key.go

Lines 387 to 394 in 30c09f5

	case shards <= 0:
		return nil, errors.New("shards must be greater than zero")
	case shards <= 256:
		format = "%02x"
		maxShards = 256
	case shards <= 65536:
		format = "%04x"
		maxShards = 65536

Make the threshold at which shard ranges switch from 2-digit to 4-digit hex formatting configurable, and make the corresponding changes in vitess-operator.
This would allow a keyspace at 128 shards to expand incrementally to a non-power-of-two shard count (e.g., 160) that fits operational needs, without the overhead of jumping all the way to 256 shards (the next power of two after 128).
It would also avoid the uneven distribution that results from keeping two-digit hex ranges at a non-power-of-two shard count.

Use Case(s)

We currently operate multiple sharded keyspaces where each keyspace uses a power-of-two number of shards. This helps avoid uneven distribution (or "banding") in the shard ranges. However, once a keyspace reaches 128 shards, the next power-of-two step is 256 shards, which can be excessive and wasteful if the keyspace only needs something like 160 shards.

We have observed that using two-digit hexadecimal ranges (%02x) with a non-power-of-two shard count can cause distribution banding.
In smaller keyspaces, such as one with 10 shards, the banding isn't that noticeable, but it gets worse as the number of shards in the keyspace increases. Switching to four-digit hexadecimal ranges (%04x) eliminates this banding, but currently that only happens when the shard count exceeds 256. If we lower the threshold to 128, we could comfortably scale from 128 to, say, 160 shards using four-digit ranges without incurring uneven data distribution.
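To make the banding concrete, here is a rough sketch of the arithmetic. It is not the Vitess implementation: `shardWidths` is a hypothetical helper, and it assumes each shard boundary is the truncated value of i * keyspaceSize / shards, computed with integer division.

```go
package main

import "fmt"

// shardWidths is a hypothetical helper (not part of Vitess). It splits a
// keyspace of keyspaceSize slots into the given number of shards, placing
// boundary i at the truncated value of i*keyspaceSize/shards, and returns
// the narrowest and widest resulting shard.
func shardWidths(shards, keyspaceSize int) (narrowest, widest int) {
	narrowest = keyspaceSize
	start := 0
	for i := 1; i <= shards; i++ {
		end := i * keyspaceSize / shards // integer division truncates
		width := end - start
		if width < narrowest {
			narrowest = width
		}
		if width > widest {
			widest = width
		}
		start = end
	}
	return narrowest, widest
}

func main() {
	cases := []struct{ shards, digits int }{
		{10, 2},  // small keyspace, 2 hex digits
		{160, 2}, // the problematic case
		{160, 4}, // same shard count, 4 hex digits
	}
	for _, c := range cases {
		size := 1 << (4 * c.digits) // 256 for 2 digits, 65536 for 4
		lo, hi := shardWidths(c.shards, size)
		fmt.Printf("%d shards, %d hex digits: narrowest=%d widest=%d\n",
			c.shards, c.digits, lo, hi)
	}
}
```

Under these assumptions, 10 shards with 2 digits get widths of 25 or 26, and 160 shards with 4 digits get widths of 409 or 410, so the imbalance is small in both cases. With 160 shards and 2 digits, the widths are 1 or 2, meaning some shards cover twice the key range of others.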

We tested this internally and observed the benefits of using 4 digit ranges vs 2 digit ranges for 160 shards.

[Image: 160 shards created with 2-digit shard ranges, showing banding]
[Image: 160 shards created with 4-digit shard ranges, showing no banding]

This Feature Request along with #15744 would provide the utmost flexibility.

@bluecrabs007 bluecrabs007 added the Needs Triage This issue needs to be correctly labelled and triaged label Feb 3, 2025

bluecrabs007 commented Feb 4, 2025

The difference between this feature request and #15744 is as follows.
Being able to customize the threshold at which the 2->4 digit switch happens creates a pathway for existing clusters to shard out. For example, an existing cluster at 128 shards could shard out to 160, with the target created using 4-digit ranges.

#15744 can bring in a feature that lets one customize the small and big range widths.
For example, the default small width would be 2 digits ("%02x"), the default big width would be 4 digits ("%04x"), and the default threshold (via this feature request) would be 256, making the two features fully backwards compatible.

One could then leave the small width at 2, set the big width to 6 digits, and set the threshold to 128. That gives a 128-shard keyspace created with 2-digit ranges a pathway to reshard into a 160-shard keyspace with 6-digit ranges, as an example.
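A minimal sketch of how the two settings might combine is below. Everything in it is an assumption made for illustration: `RangeFormatOptions`, its fields, and `generateShardRanges` are hypothetical names, not existing Vitess or vitess-operator APIs, and the boundaries use simple integer truncation.

```go
package main

import (
	"errors"
	"fmt"
)

// RangeFormatOptions is a hypothetical settings struct; none of these
// names exist in Vitess today.
type RangeFormatOptions struct {
	SmallDigits int // hex digits used when shards <= Threshold
	BigDigits   int // hex digits used when shards > Threshold
	Threshold   int // largest shard count that still uses SmallDigits
}

// defaultOptions mirrors the behavior quoted in this issue:
// 2 digits up to 256 shards, 4 digits above that.
var defaultOptions = RangeFormatOptions{SmallDigits: 2, BigDigits: 4, Threshold: 256}

// generateShardRanges is an illustrative stand-in, not key.GenerateShardRanges.
func generateShardRanges(shards int, opts RangeFormatOptions) ([]string, error) {
	if shards <= 0 {
		return nil, errors.New("shards must be greater than zero")
	}
	digits := opts.SmallDigits
	if shards > opts.Threshold {
		digits = opts.BigDigits
	}
	if digits < 1 || digits > 7 {
		return nil, errors.New("digits must be between 1 and 7")
	}
	maxShards := uint64(1) << (4 * digits) // 16^digits slots
	if uint64(shards) > maxShards {
		return nil, fmt.Errorf("%d hex digits cannot address %d shards", digits, shards)
	}

	ranges := make([]string, 0, shards)
	start := uint64(0)
	for i := 1; i <= shards; i++ {
		end := uint64(i) * maxShards / uint64(shards) // truncating division
		var lo, hi string
		if start != 0 {
			lo = fmt.Sprintf("%0*x", digits, start)
		}
		if end != maxShards {
			hi = fmt.Sprintf("%0*x", digits, end)
		}
		ranges = append(ranges, lo+"-"+hi)
		start = end
	}
	return ranges, nil
}

func main() {
	// Threshold lowered to 128, so 160 shards use 4-digit ranges.
	opts := RangeFormatOptions{SmallDigits: 2, BigDigits: 4, Threshold: 128}
	ranges, err := generateShardRanges(160, opts)
	if err != nil {
		panic(err)
	}
	fmt.Println(ranges[0], ranges[1], ranges[len(ranges)-1])
}
```

With the hypothetical defaults the output format is unchanged from today, which is the backwards-compatibility point above. Setting `BigDigits` to 6 and `Threshold` to 128 would give the 6-digit example instead.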

Hope this makes sense.

@mattlord mattlord added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: Cluster management and removed Needs Triage This issue needs to be correctly labelled and triaged labels Feb 11, 2025