encoding/json: improve decoder alloc count #71475

Open: wants to merge 2 commits into base: master


@oiweiwei oiweiwei commented Jan 29, 2025

json.Decoder makes about twice as many allocations per op as
json.Unmarshal into interface{}. The excess comes from a SyntaxError that
is generated during read-ahead and subsequently discarded, which still
leaves a memory footprint.


google-cla bot commented Jan 29, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@gopherbot
Contributor

This PR (HEAD: 19c9367) has been imported to Gerrit for code review.

Please visit Gerrit at https://go-review.googlesource.com/c/go/+/645275.

Important tips:

  • Don't comment on this PR. All discussion takes place in Gerrit.
  • You need a Gmail or other Google account to log in to Gerrit.
  • To change your code in response to feedback:
    • Push a new commit to the branch used by your GitHub PR.
    • A new "patch set" will then appear in Gerrit.
    • Respond to each comment by marking as Done in Gerrit if implemented as suggested. You can alternatively write a reply.
    • Critical: you must click the blue Reply button near the top to publish your Gerrit responses.
    • Multiple commits in the PR will be squashed by GerritBot.
  • The title and description of the GitHub PR are used to construct the final commit message.
    • Edit these as needed via the GitHub web interface (not via Gerrit or git).
    • You should word wrap the PR description at ~76 characters unless you need longer lines (e.g., for tables or URLs).
  • See the Sending a change via GitHub and Reviews sections of the Contribution Guide as well as the FAQ for details.

@gopherbot
Contributor

Message from Gopher Robot:

Patch Set 1:

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/645275.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Contributor

Message from Gopher Robot:

Patch Set 1:

Congratulations on opening your first change. Thank you for your contribution!

Next steps:
A maintainer will review your change and provide feedback. See
https://go.dev/doc/contribute#review for more info and tips to get your
patch through code review.

Most changes in the Go project go through a few rounds of revision. This can be
surprising to people new to the project. The careful, iterative review process
is our way of helping mentor contributors and ensuring that their contributions
have a lasting impact.

During May-July and Nov-Jan the Go project is in a code freeze, during which
little code gets reviewed or merged. If a reviewer responds with a comment like
R=go1.11 or adds a tag like "wait-release", it means that this CL will be
reviewed as part of the next development cycle. See https://go.dev/s/release
for more details.



@oiweiwei
Author

Summary

While working with json.Decoder, I noticed that it compares unfavorably with json.Unmarshal into any. In the benchmark below, the Decoder-based implementation performs twice as many allocations per op. The excess comes from a SyntaxError that is generated during read-ahead and then discarded, which still leaves a memory footprint.

Solution

Delay the memory-affecting operations until they are really needed.
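
As a rough illustration of this idea (not the actual change in this PR), the sketch below records only the offending byte and a context string when scanning fails, and formats the message when the error is requested. The `scanner` type, its fields, and the `fail` and `Err` methods are hypothetical names chosen for this example.

```go
package main

import "fmt"

// scanner is a hypothetical stand-in for a JSON scanner's error state.
type scanner struct {
	failed  bool   // set once a failure has been recorded
	char    byte   // offending input byte
	context string // where the byte was encountered
}

// fail records why scanning stopped. It stores plain values only,
// so it does no string concatenation and builds no error value.
func (s *scanner) fail(c byte, context string) {
	s.failed, s.char, s.context = true, c, context
}

// Err builds the error on demand. Callers that never ask for the
// error never pay for formatting the message.
func (s *scanner) Err() error {
	if !s.failed {
		return nil
	}
	return fmt.Errorf("invalid character %q %s", s.char, s.context)
}

func main() {
	s := &scanner{}
	s.fail(',', "after top-level value")
	fmt.Println(s.Err()) // invalid character ',' after top-level value
}
```

The trade-off is that the scanner carries a few extra fields, while the common path, where the recorded failure is discarded, does no formatting work.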

Details

Code to reproduce

I wrote a simple (key, value) enumerator as follows:

package jsonkv

import (
	"fmt"
	"io"

	// "github.com/golang/go/src/encoding/json"
	"encoding/json"
)

type scope []bool

func (s *scope) array() bool {
	return len(*s) > 0 && (*s)[len(*s)-1]
}

func (s *scope) consume(tok json.Token) bool {
	switch tok := tok.(type) {
	case json.Delim:
		switch tok {
		case '{':
			(*s) = append(*s, false)
		case '[':
			(*s) = append(*s, true)
		case '}', ']':
			(*s) = (*s)[:len(*s)-1]
		}
	default:
		return false
	}
	return true
}

type KeyValuer struct {
	scope scope
	dec   *json.Decoder
	key   string
}

func (kv *KeyValuer) flush() string {
	key := kv.key
	kv.key = ""
	return key
}

func NewKeyValuer(r io.Reader) *KeyValuer {
	return &KeyValuer{dec: json.NewDecoder(r), scope: make(scope, 0)}
}

func (kv *KeyValuer) Reset(r io.Reader) {
	kv.dec = json.NewDecoder(r)
	if kv.scope != nil {
		kv.scope = kv.scope[:0]
	}
	kv.key = ""
}

// Range reads key and value pairs.
func (kv *KeyValuer) Range(f func(string, any) bool) error {
	for {
		key, val, err := kv.next()
		if err == io.EOF {
			break
		}
		if err != nil {
			return err
		}
		if !f(key, val) {
			break
		}
	}
	return nil
}

// First reads the first occurrence of the key.
func (kv *KeyValuer) First(key string) (any, error) {
	var val any
	err := kv.Range(func(k string, v any) bool {
		if k == key {
			val = v
			return false
		}
		return true
	})
	return val, err
}

// Next reads next key and value.
func (kv *KeyValuer) Next() (string, any, error) {
	return kv.next()
}

// next reads next key and value.
func (kv *KeyValuer) next() (string, any, error) {

	for {

		// read next token.
		tok, err := kv.dec.Token()
		if err != nil {
			return "", nil, err
		}

		// if delimiter is consumed or current scope is array, skip.
		if kv.scope.consume(tok) || kv.scope.array() {
			kv.flush() // clear key.
			continue
		}

		// if key is empty, the token must be a string key.
		if kv.key == "" {
			key, ok := tok.(string)
			if !ok {
				// report the original token, not the zero-value string.
				return "", nil, fmt.Errorf("unexpected token: %v", tok)
			}
			kv.key = key
			continue
		}

		// unwrap json.Number to string.
		if num, ok := tok.(json.Number); ok {
			return kv.flush(), string(num), nil
		}

		// return key and value.
		return kv.flush(), tok, nil
	}
}
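
For readers unfamiliar with the token stream that the enumerator above consumes, here is a minimal sketch of what `Decoder.Token` returns for a small input. The `tokens` helper and the sample input are made up for this illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// tokens returns a printable "type(value)" description of every
// token that json.Decoder yields for the given input.
func tokens(input string) ([]string, error) {
	dec := json.NewDecoder(strings.NewReader(input))
	var out []string
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		out = append(out, fmt.Sprintf("%T(%v)", tok, tok))
	}
}

func main() {
	toks, err := tokens(`{"a": [1, "b"]}`)
	if err != nil {
		panic(err)
	}
	for _, t := range toks {
		fmt.Println(t)
	}
	// Output:
	// json.Delim({)
	// string(a)
	// json.Delim([)
	// float64(1)
	// string(b)
	// json.Delim(])
	// json.Delim(})
}
```

Only the brackets and braces surface as `json.Delim` tokens; commas and colons are consumed by the decoder and never returned to the caller.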

Benchmark

package jsonkv

import (
	"encoding/json"
	"fmt"
	"strings"
	"testing"
)

var j = `[{"obj": {"name": "test1", "age": 20, "arr": ["a", "b", {"a": "c"}]}}, {"obj": {"name": "test2", "age": 30}}]`

func TestKeyValuerFirst(t *testing.T) {

	kv := NewKeyValuer(strings.NewReader(j))
	val, err := kv.First("a")
	if err != nil {
		t.Fatalf("error: %v", err)
	}
	if val != "c" {
		t.Fatalf("expected c, got %v", val)
	}
}

func BenchmarkKeyValuerFirst(b *testing.B) {
	r := strings.NewReader("")
	kv := NewKeyValuer(r)
	for i := 0; i < b.N; i++ {
		r.Reset(j)
		kv.Reset(r)
		if _, err := kv.First("a"); err != nil {
			b.Fatalf("error: %v", err)
		}
	}
}

func JSONUnmarshal(key string, b []byte) (any, error) {

	var v any

	err := json.Unmarshal(b, &v)
	if err != nil {
		return nil, err
	}

	varr, _ := v.([]any)
	for _, v := range varr {
		obj, ok := v.(map[string]any)
		if !ok {
			continue
		}
		objMap, ok := obj["obj"].(map[string]any)
		if !ok {
			continue
		}
		objArr, ok := objMap["arr"].([]any)
		if !ok {
			continue
		}
		for _, v := range objArr {
			obj, ok := v.(map[string]any)
			if !ok {
				continue
			}
			val, ok := obj[key]
			if !ok {
				continue
			}
			return val, nil
		}
	}

	return nil, fmt.Errorf("not found")
}

func BenchmarkUnmarshalJSON(b *testing.B) {

	bj := []byte(j)
	for i := 0; i < b.N; i++ {
		_, err := JSONUnmarshal("a", bj)
		if err != nil {
			b.Fatalf("error: %v", err)
		}
	}
}
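
Allocation counts can also be checked outside the benchmark harness. The sketch below is an assumption-laden simplification rather than part of the benchmark above: it uses `testing.AllocsPerRun`, walks every token in the document instead of stopping at the first match, and the helper names `walkTokens`, `unmarshalAny`, and `measure` are invented here. The exact counts depend on the Go version, so none are shown.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"testing"
)

const doc = `[{"obj": {"name": "test1", "age": 20, "arr": ["a", "b", {"a": "c"}]}}, {"obj": {"name": "test2", "age": 30}}]`

// walkTokens drains every token from a fresh Decoder over data.
func walkTokens(data []byte) error {
	dec := json.NewDecoder(bytes.NewReader(data))
	for {
		_, err := dec.Token()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}

// unmarshalAny decodes data into an untyped value.
func unmarshalAny(data []byte) error {
	var v any
	return json.Unmarshal(data, &v)
}

// measure reports the average number of allocations per call of f.
func measure(f func([]byte) error, data []byte) float64 {
	return testing.AllocsPerRun(100, func() {
		if err := f(data); err != nil {
			panic(err)
		}
	})
}

func main() {
	data := []byte(doc)
	fmt.Printf("Decoder.Token:  %.0f allocs/run\n", measure(walkTokens, data))
	fmt.Printf("json.Unmarshal: %.0f allocs/run\n", measure(unmarshalAny, data))
}
```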

Benchmark Results

The benchmark results were quite surprising to me:

BenchmarkKeyValuerFirst-4   	  142459	      7541 ns/op	    2176 B/op	      82 allocs/op
BenchmarkUnmarshalJSON-4    	  236854	      5666 ns/op	    2320 B/op	      41 allocs/op

The profile revealed:

Showing top 10 nodes out of 114
      flat  flat%   sum%        cum   cum%
     440ms 15.33% 15.33%      960ms 33.45%  runtime.mallocgc
     240ms  8.36% 23.69%      560ms 19.51%  runtime.concatstrings

Looking further into the graph, the code responsible for the frequent concatstrings calls appears to be the error handling in the decoder:

[image: profile graph]

It turned out that an error is constructed for every delimiter, such as ":" and ",". The construction of the error therefore needs to be delayed until Error is actually called.
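
The cost of building a message that is then thrown away can be observed in isolation. The following sketch compares formatting a message eagerly with merely storing its inputs; `recordEager`, `recordDeferred`, `allocs`, and the package-level sinks are hypothetical names for this illustration and do not mirror the decoder's internals.

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

// Package-level sinks keep the compiler from optimizing the work away.
var (
	sinkMsg     string
	sinkChar    byte
	sinkContext string
)

// recordEager formats the message immediately, even if nobody reads it.
func recordEager(c byte, context string) {
	sinkMsg = "invalid character " + strconv.QuoteRune(rune(c)) + " " + context
}

// recordDeferred only stores the inputs; formatting is left for later.
func recordDeferred(c byte, context string) {
	sinkChar, sinkContext = c, context
}

// allocs reports the average allocations for one pass over a few
// typical delimiters using the given record function.
func allocs(record func(byte, string)) float64 {
	delims := []byte{':', ',', '}', ']'}
	return testing.AllocsPerRun(1000, func() {
		for _, c := range delims {
			record(c, "after top-level value")
		}
	})
}

func main() {
	fmt.Printf("eager:    %.0f allocs per pass\n", allocs(recordEager))
	fmt.Printf("deferred: %.0f allocs per pass\n", allocs(recordDeferred))
}
```

The eager variant allocates on every delimiter because the concatenated string escapes to the heap, while the deferred variant only copies a byte and a string header.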

Benchmark Results After Fix

Applying these changes roughly halves the decoder's allocations (from 82 to 42 allocs/op):

BenchmarkKeyValuerFirst-4   	  196369	      5413 ns/op	    1776 B/op	      42 allocs/op
BenchmarkUnmarshalJSON-4    	  215534	      5745 ns/op	    2320 B/op	      41 allocs/op

The profile shows the same:

Showing top 10 nodes out of 133
      flat  flat%   sum%        cum   cum%
     600ms 18.07% 18.07%     1300ms 39.16%  runtime.mallocgc
     190ms  5.72% 23.80%      810ms 24.40%  github.com/golang/go/src/encoding/json.(*Decoder).readValue

@gopherbot
Contributor

Message from Jorropo:

Patch Set 2: Commit-Queue+1



@gopherbot
Contributor

Message from Go LUCI:

Patch Set 2:

Dry run: CV is trying the patch.

Bot data: {"action":"start","triggered_at":"2025-01-30T18:09:44Z","revision":"d964a0febbc9557a8b9eeb51351398483f3fa448"}



@gopherbot
Contributor

Message from Jorropo:

Patch Set 2: -Commit-Queue



@gopherbot
Contributor

Message from Go LUCI:

Patch Set 2:

This CL has failed the run. Reason:

Tryjob golang/try/gotip-linux-amd64-boringcrypto has failed with summary (view all results):


Build or test failure, click here for results.

To reproduce, try gomote repro 8724295380854907297.


@gopherbot
Contributor

Message from Go LUCI:

Patch Set 2: LUCI-TryBot-Result-1



@gopherbot
Contributor

This PR (HEAD: b0c5899) has been imported to Gerrit for code review.

Please visit Gerrit at https://go-review.googlesource.com/c/go/+/645275.



2 participants