Introduce jsontext.RawToken API for better number parsing #158

huww98 · 2025-02-12T05:52:13Z

Decoder now returns RawToken instead of Token. RawToken exposes a new set of methods:

ParseFloat(bits int) (float64, error)
ParseInt(bits int) (int64, error)
ParseUint(bits int) (uint64, error)

They reuses the same logic of arshalers, intended to offers Decoder user a more efficient and convenient way to parse numbers.

Compared with the original Token.Float, these methods properly return errors for overflow, etc.
Compared with strconv, these methods may take advantage of the fact that json number has simpler syntax than Go.
Compared with json.UnmarshalDecode, these methods avoids the cost of arshaler lookup and reflection.

New jsontext.Raw and Token.Raw func are added to convert between RawToken and Token.

The original Float, Int, Uint methods now only support the case where the Token is created by the correspond constructor. e.g., Int(1).Uint() will panic. This should ensure we will never get inaccurate values.

Other behavior changes:

Do not set value on float overflow #156
v1.Decoder.Token now returns error for parsing float, this is actually consistent with current v1 (maybe the error message should be polished more).
jsontext: remove support for write Float(+Inf) as "Infinity". Implemented in the last commit. This feature is not standard and should not be enabled by default. Commit message contains more explanation. nonfinite is still supported in json package.

This is inspired by the idea in golang/go#63397 (reply in thread) . I expand the idea of @neild , and add a new RawToken type. Now

RawToken is used in Decoding, it only have ParseFloat intended for decode
Token is still used in Encoding, Token.Float now become a simple accessor to retrieve the value set by jsontext.Float.

With this design, we give up support for any inaccurate cases. Then we can answer the edge cases that @dsnet raised in the discussion naturally:

What should rawToken("123.456").ParseInt(32) report?
- 0, strconv.ErrSyntax, just like strconv.ParseInt
What should jsontext.Float(123.456).Int() report?
- panics
What should rawToken("1e2").ParseInt(32) report?
- 0, strconv.ErrSyntax, just like strconv.ParseInt
What should jsontext.Float(1e2).Int() report?
- panics

This is consistent with int overflow and json v1

These can be reused in jsontext. Note that jsonwire.ParseFloat now returns inf for overflow. But this should not impact user-visible behavior, because we don't set value when overflow.

Decoder now returns RawToken instead of Token. RawToken exposes a new set of methods: ParseFloat(bits int) (float64, error) ParseInt(bits int) (int64, error) ParseUint(bits int) (uint64, error) They reuses the same logic of arshalers, intended to offers Decoder user a more efficient and convenient way to parse numbers. Compared with the original Token.Float, these methods properly return errors for overflow, etc. Compared with strconv, these methods may take advantage of the fact that json number has simpler syntax than Go. Compared with json.UnmarshalDecode, these methods avoids the cost of arshaler lookup and reflection. New jsontext.Raw and Token.Raw func are added to convert between RawToken and Token. The original Float, Int, Uint methods now only support the case where the Token is created by the correspond constructor. e.g., Int(1).Uint() will panic. This should ensure we will never get inaccurate values.

Similarly, Parsing "Infinity" as +Inf is also removed This feature is not standard and should not be enabled by default. If this is really desired, user can still write a wrapper that invokes Float or String conditionally. Instead of embedding the maybe undesired branches in jsontext. It is harder to get consistent API design with the new RawToken refactor. Now Token.Float() should only process tokens returned by jsontext.Float, we need to differentate infinity float and infinity string. So what kind should the infinity float be? - if '0': Good for that it cannot be used as object key, but can make Token.String() confusing; also confusing for that a '0' token actually encodes to string. - if '"': Need many extra branching in Token.Kind and Token.appendString. And the kind returned by jsontext.Float() is not consistent. So, just remove these.

dsnet · 2025-02-12T07:18:04Z

Hi, thanks for the contribution.

I don't see a compelling reason to increase the API surface by an extra type, several methods, and break symmetry between Encoder and Decoder for this functionality. I still stand by my comment, that this can done by the user with:

tok, ... = dec.ReadToken()
f, err := strconv.ParseFloat(tok.String(), 64)

or

val, ... = dec.ReadValue()
f, err := strconv.ParseFloat(string(val), 64)

The Int, Uint, and Float methods are intended to be accessors to mirror the Int, Uint, and Float constructors and there is still a usability benefit to these error-less methods.

In the future, if we want to add a checked form of parsing functionality, we can add ParseInt, ParseUint, and ParseFloat methods to Token. This is API that can be added on top of the currently proposed API.

huww98 · 2025-02-12T07:39:21Z

break symmetry between Encoder and Decoder for this functionality.

If I draw it out, it is still pretty symmetric

tok, ... = dec.ReadToken()
f, err := strconv.ParseFloat(tok.String(), 64)

Reasonable, I'm actually using this in my project. But as I say in PR description: "Compared with strconv, these methods may take advantage of the fact that json number has simpler syntax than Go"

The Int, Uint, and Float methods are intended to be accessors to mirror the Int, Uint, and Float constructors and there is still a usability benefit to these error-less methods.

Totally agreed. So these methods are preserved in Token type. I just want to prevent the usages like:

tok, ... = dec.ReadToken()
f := tok.Float()

This is almost certainly a misuse of API because it ignores error. I want to make this a compilation error. While jsontext.Float(1.2).Float() is still valid.

In the future, if we want to add a checked form of parsing functionality, we can add ParseInt, ParseUint, and ParseFloat methods to Token. This is API that can be added on top of the currently proposed API.

I add a new type so that we can make this a compilation error: jsontext.Float(1.2).ParseInt(64). I want ParseInt to only support Decoder use case, but not any token constructed by user.

Plus, the size of RawToken is only 16B, which is only half of Token. This should benefits decoder only workload.

huww98 added 4 commits February 10, 2025 01:51

Do not set value on float overflow

3339bc0

This is consistent with int overflow and json v1

Refactor number parsing complexity to jsonwire package

f357195

These can be reused in jsontext. Note that jsonwire.ParseFloat now returns inf for overflow. But this should not impact user-visible behavior, because we don't set value when overflow.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce jsontext.RawToken API for better number parsing #158

Introduce jsontext.RawToken API for better number parsing #158

huww98 commented Feb 12, 2025

dsnet commented Feb 12, 2025

huww98 commented Feb 12, 2025

Introduce jsontext.RawToken API for better number parsing #158

Are you sure you want to change the base?

Introduce jsontext.RawToken API for better number parsing #158

Conversation

huww98 commented Feb 12, 2025

dsnet commented Feb 12, 2025

huww98 commented Feb 12, 2025