Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce jsontext.RawToken API for better number parsing #158

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

huww98
Copy link
Contributor

@huww98 huww98 commented Feb 12, 2025

Decoder now returns RawToken instead of Token. RawToken exposes a new set of methods:

ParseFloat(bits int) (float64, error)
ParseInt(bits int) (int64, error)
ParseUint(bits int) (uint64, error)

They reuses the same logic of arshalers, intended to offers Decoder user a more efficient and convenient way to parse numbers.

  • Compared with the original Token.Float, these methods properly return errors for overflow, etc.
  • Compared with strconv, these methods may take advantage of the fact that json number has simpler syntax than Go.
  • Compared with json.UnmarshalDecode, these methods avoids the cost of arshaler lookup and reflection.

New jsontext.Raw and Token.Raw func are added to convert between RawToken and Token.

The original Float, Int, Uint methods now only support the case where the Token is created by the correspond constructor. e.g., Int(1).Uint() will panic. This should ensure we will never get inaccurate values.

Other behavior changes:

  • Do not set value on float overflow #156
  • v1.Decoder.Token now returns error for parsing float, this is actually consistent with current v1 (maybe the error message should be polished more).
  • jsontext: remove support for write Float(+Inf) as "Infinity". Implemented in the last commit. This feature is not standard and should not be enabled by default. Commit message contains more explanation. nonfinite is still supported in json package.

This is inspired by the idea in golang/go#63397 (reply in thread) . I expand the idea of @neild , and add a new RawToken type. Now

  • RawToken is used in Decoding, it only have ParseFloat intended for decode
  • Token is still used in Encoding, Token.Float now become a simple accessor to retrieve the value set by jsontext.Float.

With this design, we give up support for any inaccurate cases. Then we can answer the edge cases that @dsnet raised in the discussion naturally:

  • What should rawToken("123.456").ParseInt(32) report?
    • 0, strconv.ErrSyntax, just like strconv.ParseInt
  • What should jsontext.Float(123.456).Int() report?
    • panics
  • What should rawToken("1e2").ParseInt(32) report?
    • 0, strconv.ErrSyntax, just like strconv.ParseInt
  • What should jsontext.Float(1e2).Int() report?
    • panics

This is consistent with int overflow and json v1
These can be reused in jsontext.

Note that jsonwire.ParseFloat now returns inf for overflow. But this should not
impact user-visible behavior, because we don't set value when overflow.
Decoder now returns RawToken instead of Token.  RawToken exposes a new set of
methods:

  ParseFloat(bits int) (float64, error)
  ParseInt(bits int) (int64, error)
  ParseUint(bits int) (uint64, error)

They reuses the same logic of arshalers, intended to offers Decoder user a
more efficient and convenient way to parse numbers. Compared with the original
Token.Float, these methods properly return errors for overflow, etc. Compared
with strconv, these methods may take advantage of the fact that json number has
simpler syntax than Go. Compared with json.UnmarshalDecode, these methods
avoids the cost of arshaler lookup and reflection.

New jsontext.Raw and Token.Raw func are added to convert between RawToken and
Token.

The original Float, Int, Uint methods now only support the case where the Token
is created by the correspond constructor. e.g., Int(1).Uint() will panic.  This
should ensure we will never get inaccurate values.
Similarly, Parsing "Infinity" as +Inf is also removed

This feature is not standard and should not be enabled by default.  If this is
really desired, user can still write a wrapper that invokes Float or String
conditionally. Instead of embedding the maybe undesired branches in jsontext.

It is harder to get consistent API design with the new RawToken refactor. Now
Token.Float() should only process tokens returned by jsontext.Float, we need to
differentate infinity float and infinity string. So what kind should the
infinity float be?
- if '0': Good for that it cannot be used as object key, but can make
  Token.String() confusing; also confusing for that a '0' token actually
  encodes to string.
- if '"': Need many extra branching in Token.Kind and Token.appendString. And
  the kind returned by jsontext.Float() is not consistent.

So, just remove these.
@dsnet
Copy link
Collaborator

dsnet commented Feb 12, 2025

Hi, thanks for the contribution.

I don't see a compelling reason to increase the API surface by an extra type, several methods, and break symmetry between Encoder and Decoder for this functionality. I still stand by my comment, that this can done by the user with:

tok, ... = dec.ReadToken()
f, err := strconv.ParseFloat(tok.String(), 64)

or

val, ... = dec.ReadValue()
f, err := strconv.ParseFloat(string(val), 64)

The Int, Uint, and Float methods are intended to be accessors to mirror the Int, Uint, and Float constructors and there is still a usability benefit to these error-less methods.

In the future, if we want to add a checked form of parsing functionality, we can add ParseInt, ParseUint, and ParseFloat methods to Token. This is API that can be added on top of the currently proposed API.

@huww98
Copy link
Contributor Author

huww98 commented Feb 12, 2025

break symmetry between Encoder and Decoder for this functionality.

If I draw it out, it is still pretty symmetric
image

tok, ... = dec.ReadToken()
f, err := strconv.ParseFloat(tok.String(), 64)

Reasonable, I'm actually using this in my project. But as I say in PR description: "Compared with strconv, these methods may take advantage of the fact that json number has simpler syntax than Go"

The Int, Uint, and Float methods are intended to be accessors to mirror the Int, Uint, and Float constructors and there is still a usability benefit to these error-less methods.

Totally agreed. So these methods are preserved in Token type. I just want to prevent the usages like:

tok, ... = dec.ReadToken()
f := tok.Float()

This is almost certainly a misuse of API because it ignores error. I want to make this a compilation error. While jsontext.Float(1.2).Float() is still valid.

In the future, if we want to add a checked form of parsing functionality, we can add ParseInt, ParseUint, and ParseFloat methods to Token. This is API that can be added on top of the currently proposed API.

I add a new type so that we can make this a compilation error: jsontext.Float(1.2).ParseInt(64). I want ParseInt to only support Decoder use case, but not any token constructed by user.

Plus, the size of RawToken is only 16B, which is only half of Token. This should benefits decoder only workload.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants