encoding/json: improve decoder alloc count #71475
base: master
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
This PR (HEAD: 19c9367) has been imported to Gerrit for code review. Please visit Gerrit at https://go-review.googlesource.com/c/go/+/645275. Important tips:
Message from Gopher Robot: Patch Set 1: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/645275. |
Message from Gopher Robot: Patch Set 1: Congratulations on opening your first change. Thank you for your contribution! Next steps: Most changes in the Go project go through a few rounds of revision. This can be During May-July and Nov-Jan the Go project is in a code freeze, during which Please don’t reply on this GitHub thread. Visit golang.org/cl/645275. |
Summary
When working with the json.Decoder implementation, I've noticed that the difference between using decoder and json.Unmarshal to
Solution
Delay the memory-affecting operations until they are really needed.
Details
Code to reproduce
I've written a simple (key, value) enumerator as follows:
package jsonkv
import (
	"fmt"
	"io"
	// "github.com/golang/go/src/encoding/json"
	"encoding/json"
)

type scope []bool

func (s *scope) array() bool {
	return len(*s) > 0 && (*s)[len(*s)-1]
}

func (s *scope) consume(tok json.Token) bool {
	switch tok := tok.(type) {
	case json.Delim:
		switch tok {
		case '{':
			(*s) = append(*s, false)
		case '[':
			(*s) = append(*s, true)
		case '}', ']':
			(*s) = (*s)[:len(*s)-1]
		}
	default:
		return false
	}
	return true
}

type KeyValuer struct {
	scope scope
	dec   *json.Decoder
	key   string
}

func (kv *KeyValuer) flush() string {
	key := kv.key
	kv.key = ""
	return key
}

func NewKeyValuer(r io.Reader) *KeyValuer {
	return &KeyValuer{dec: json.NewDecoder(r), scope: make(scope, 0)}
}

func (kv *KeyValuer) Reset(r io.Reader) {
	kv.dec = json.NewDecoder(r)
	if kv.scope != nil {
		kv.scope = kv.scope[:0]
	}
	kv.key = ""
}

// Range reads key and value pairs.
func (kv *KeyValuer) Range(f func(string, any) bool) error {
	for {
		key, val, err := kv.next()
		if err == io.EOF {
			break
		}
		if err != nil {
			return err
		}
		if !f(key, val) {
			break
		}
	}
	return nil
}

// First reads the first occurrence of the key.
func (kv *KeyValuer) First(key string) (any, error) {
	var val any
	err := kv.Range(func(k string, v any) bool {
		if k == key {
			val = v
			return false
		}
		return true
	})
	return val, err
}

// Next reads next key and value.
func (kv *KeyValuer) Next() (string, any, error) {
	return kv.next()
}

// next reads next key and value.
func (kv *KeyValuer) next() (string, any, error) {
	for {
		// read next token.
		tok, err := kv.dec.Token()
		if err != nil {
			return "", nil, err
		}
		// if delimiter is consumed or current scope is array, skip.
		if kv.scope.consume(tok) || kv.scope.array() {
			kv.flush() // clear key.
			continue
		}
		// if key is empty, set key.
		if kv.key == "" {
			// use a separate name so the error below reports the original token.
			key, ok := tok.(string)
			if !ok {
				return "", nil, fmt.Errorf("unexpected token: %v", tok)
			}
			kv.key = key
			continue
		}
		// unwrap json.Number to string (json.Number is only produced
		// if UseNumber was called on the decoder).
		if num, ok := tok.(json.Number); ok {
			return kv.flush(), string(num), nil
		}
		// return key and value.
		return kv.flush(), tok, nil
	}
}

Benchmark
package jsonkv
import (
	"encoding/json"
	"fmt"
	"strings"
	"testing"
)

var j = `[{"obj": {"name": "test1", "age": 20, "arr": ["a", "b", {"a": "c"}]}}, {"obj": {"name": "test2", "age": 30}}]`

func TestKeyValuerFirst(t *testing.T) {
	kv := NewKeyValuer(strings.NewReader(j))
	val, err := kv.First("a")
	if err != nil {
		t.Fatalf("error: %v", err)
	}
	if val != "c" {
		t.Fatalf("expected c, got %v", val)
	}
}

func BenchmarkKeyValuerFirst(b *testing.B) {
	r := strings.NewReader("")
	kv := NewKeyValuer(r)
	for i := 0; i < b.N; i++ {
		r.Reset(j)
		kv.Reset(r)
		if _, err := kv.First("a"); err != nil {
			b.Fatalf("error: %v", err)
		}
	}
}

func JSONUnmarshal(key string, b []byte) (any, error) {
	var v any
	err := json.Unmarshal(b, &v)
	if err != nil {
		return nil, err
	}
	varr, _ := v.([]any)
	for _, v := range varr {
		obj, ok := v.(map[string]any)
		if !ok {
			continue
		}
		objMap, ok := obj["obj"].(map[string]any)
		if !ok {
			continue
		}
		objArr, ok := objMap["arr"].([]any)
		if !ok {
			continue
		}
		for _, v := range objArr {
			obj, ok := v.(map[string]any)
			if !ok {
				continue
			}
			val, ok := obj[key]
			if !ok {
				continue
			}
			return val, nil
		}
	}
	return nil, fmt.Errorf("not found")
}

func BenchmarkUnmarshalJSON(b *testing.B) {
	bj := []byte(j)
	for i := 0; i < b.N; i++ {
		_, err := JSONUnmarshal("a", bj)
		if err != nil {
			b.Fatalf("error: %v", err)
		}
	}
}

Benchmark Results
Benchmark results were quite surprising for me:
The profile revealed:
Looking further into the graph, the code responsible for the frequent calls to concatstrings appears to be the error handling in the decoder: it turned out that an error is generated for every delimiter such as ":" and ",". So we need to delay building the error until Error is actually called.
Benchmark(-etts) After Fix
Applying these changes gives a significant benefit (x2) in allocations for the decoder:
The profiler shows the same:
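The idea of deferring error construction can be illustrated outside of encoding/json. The following is only a minimal sketch, not the code from the CL: `lazySyntaxError`, `eagerErr`, `lazyErr`, `describe`, and `measure` are hypothetical names introduced for this example, and the message format is an assumption. It compares an error whose message is concatenated up front with one that stores the raw facts and builds the message only inside Error.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"testing"
)

// describe builds the human-readable message by string concatenation.
func describe(c byte, off int64) string {
	return "invalid character " + strconv.QuoteRune(rune(c)) +
		" at offset " + strconv.FormatInt(off, 10)
}

// eagerErr formats the message immediately, even if the caller drops the error.
func eagerErr(c byte, off int64) error {
	return errors.New(describe(c, off))
}

// lazySyntaxError (hypothetical) keeps only the raw facts about the failure.
type lazySyntaxError struct {
	char   byte
	offset int64
}

// Error builds the message on demand.
func (e *lazySyntaxError) Error() string { return describe(e.char, e.offset) }

// lazyErr records what went wrong without formatting anything.
func lazyErr(c byte, off int64) error {
	return &lazySyntaxError{char: c, offset: off}
}

// sink keeps results alive so the compiler cannot optimize the calls away.
var sink error

// measure reports average allocations per call for both constructors.
func measure() (eager, lazy float64) {
	eager = testing.AllocsPerRun(1000, func() { sink = eagerErr('x', 7) })
	lazy = testing.AllocsPerRun(1000, func() { sink = lazyErr('x', 7) })
	return eager, lazy
}

func main() {
	eager, lazy := measure()
	fmt.Printf("eager: %.0f allocs/op, lazy: %.0f allocs/op\n", eager, lazy)
	fmt.Println(lazyErr('x', 7)) // the message is only built here
}
```

If most of these errors are discarded without being inspected, as the profile suggests is the case for the decoder's read-ahead, the lazy variant avoids the string concatenation on the hot path. The exact counts depend on the Go version and compiler, so none are quoted here.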
Message from Jorropo: Patch Set 2: Commit-Queue+1 Please don’t reply on this GitHub thread. Visit golang.org/cl/645275. |
Message from Go LUCI: Patch Set 2: Dry run: CV is trying the patch. Bot data: {"action":"start","triggered_at":"2025-01-30T18:09:44Z","revision":"d964a0febbc9557a8b9eeb51351398483f3fa448"} Please don’t reply on this GitHub thread. Visit golang.org/cl/645275. |
Message from Jorropo: Patch Set 2: -Commit-Queue Please don’t reply on this GitHub thread. Visit golang.org/cl/645275. |
Message from Go LUCI: Patch Set 2: This CL has failed the run. Reason: Tryjob golang/try/gotip-linux-amd64-boringcrypto has failed with summary (view all results):
Build or test failure, click here for results. To reproduce, try Additional links for debugging: Please don’t reply on this GitHub thread. Visit golang.org/cl/645275. |
Message from Go LUCI: Patch Set 2: LUCI-TryBot-Result-1 Please don’t reply on this GitHub thread. Visit golang.org/cl/645275. |
This PR (HEAD: b0c5899) has been imported to Gerrit for code review. Please visit Gerrit at https://go-review.googlesource.com/c/go/+/645275. Important tips:
Allocations per op are twice as high for json.Decoder as for json.Unmarshal into interface{}.
The reason for the excessive allocation is SyntaxError read-ahead generation: the error is
built in advance and subsequently discarded, leaving a memory footprint.
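
The allocation gap described above can be checked without a full benchmark run. The following is a rough sketch under stated assumptions: `drainTokens`, `unmarshalAny`, and `allocsPerOp` are hypothetical helper names, the input is the sample document from the benchmark, and the numbers it prints will vary with the Go version in use. It relies only on `testing.AllocsPerRun`, `json.Decoder.Token`, and `json.Unmarshal` from the standard library.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"testing"
)

const doc = `[{"obj": {"name": "test1", "age": 20, "arr": ["a", "b", {"a": "c"}]}}, {"obj": {"name": "test2", "age": 30}}]`

// drainTokens reads every token from data with json.Decoder.Token
// and returns how many tokens were seen.
func drainTokens(data []byte) (int, error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	n := 0
	for {
		_, err := dec.Token()
		if err == io.EOF {
			return n, nil
		}
		if err != nil {
			return n, err
		}
		n++
	}
}

// unmarshalAny decodes the whole input into an interface value.
func unmarshalAny(data []byte) (any, error) {
	var v any
	err := json.Unmarshal(data, &v)
	return v, err
}

// allocsPerOp reports average allocations per call for both approaches.
func allocsPerOp(data []byte) (tokens, unmarshal float64) {
	tokens = testing.AllocsPerRun(100, func() { drainTokens(data) })
	unmarshal = testing.AllocsPerRun(100, func() { unmarshalAny(data) })
	return tokens, unmarshal
}

func main() {
	tokens, unmarshal := allocsPerOp([]byte(doc))
	fmt.Printf("Decoder.Token: %.0f allocs/op\n", tokens)
	fmt.Printf("json.Unmarshal: %.0f allocs/op\n", unmarshal)
}
```

Running this against a toolchain with and without the change gives a quick before/after comparison; `go test -bench . -benchmem` on the benchmark file remains the more reliable measurement.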