Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kaf group commit unexpectedly joins consumer group #381

Open
DanInProgress opened this issue Feb 18, 2025 · 1 comment
Open

kaf group commit unexpectedly joins consumer group #381

DanInProgress opened this issue Feb 18, 2025 · 1 comment

Comments

@DanInProgress
Copy link

Noticed a surprising difference in behavior between kaf group commit and kafka-consumer-groups.sh while investigating the expiration/deletion of a consumer group in one of our kafka clusters.

When using kaf group commit on a group with protocolType of (Empty), the group's protocolType changes to consumer. Groups of type consumer expire their offsets after offset.retention.ms if no members have joined the group and subscribed. This is important to note because existing groups of type (Empty) are unlikely to ever have a member that will join or subscribe, as this is typically used for consumer groups that manually assign partitions.

I can't completely rule out that this is intended behavior, but it's definitely surprising for someone using kaf as a drop-in replacement for the kafka cli utils. In addition, I think the method used contains a race condition that can cause the kaf to try to commit offsets to a group with active consumers. I'm not sure this will actually function, but it definitely looks like a risk.

Quick steps to reproduce

  1. Create a consumer group using kafka-consumer-groups.sh --reset-offsets ...
  2. Use kaf group commit to modify that offset
  3. Observe that the group's protocolType changes to consumer

Code path

In the resetHandler, kaf commits offsets using a s.GenerationID() from the session, which implies that it has joined the group at that point.

// from `cmd/kaf/group.go`
func (r *resetHandler) Setup(s sarama.ConsumerGroupSession) error {
req := &sarama.OffsetCommitRequest{
    Version:                 1,
    ConsumerGroup:           r.group,
    ConsumerGroupGeneration: s.GenerationID(),
    ConsumerID:              s.MemberID(),
}
...

When looking at how this is done in the admin client, we see it use a generationID of -1, which is used as a flag that this is not a real consumer.

Race condition?

  1. First kaf uses the admin client is used to check if the group is empty.
// from `cmd/kaf/group.go`
// Verify the Consumer Group is Empty
admin := getClusterAdmin()
groupDescs, err := admin.DescribeConsumerGroups([]string{args[0]})
  1. Then an unrelated consumer joins the group
  2. Then kaf joins the group and commits offsets?
// from `cmd/kaf/group.go`
g, err := sarama.NewConsumerGroupFromClient(group, client)
...
err = g.Consume(context.Background(), []string{topic}, &resetHandler{
...

Possible Fixes

I think the right fix is to use a, currently nonexistent in sarama, admin client api to commit the offsets. I need to look into this a bit more, but wanted to check first. Alternatively, we can directly use a manually constructed coordinator to make a commit and will be rejected if we use generation id -1 and there are active consumers. (removing the need for the admin api check for empty)

Background

We discovered this issue while investigating the expiration of a consumer group that was manually handling the assignment of partitions (there is 1 partition and 1 member in the group). The group was frequently committing offsets, but had not "joined" or "subscribed" because that wasn't needed. The offsets eventually expired and the group was deleted. We were having trouble explaining why this happened at the specific time that it did, and we eventually tracked it down to an operations ticket that used the kaf tool to move its offsets approximately offset.retention.ms before the group was deleted.

@DanInProgress
Copy link
Author

DanInProgress commented Feb 19, 2025

FYI, IBM/sarama#1669 so it looks acknowledged this would be much easier to implement properly with an admin API in sarama.

This commit/mr seems to sort of lay out the codepaths that would be involved in implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant