
RFC: feat: Input type must be Default+Serde but not POD #82

Merged
merged 1 commit into main from no-bytemuck-bounds-on-input on Dec 14, 2024

Conversation

@caspark (Contributor) commented Nov 13, 2024

Change Description

(from commit msg)

Remove the bytemuck bounds on Config::Input, and replace them with Default and Serialize + DeserializeOwned.

  1. Default is a less strict bound than Zeroable, and it's all we need - so it's more ergonomic.
  2. NoUninit + CheckedBitPattern are also fairly stringent requirements, but it turns out that the only place that we actually rely on them is when we serialize or deserialize the input type. So we can just rely on the relevant Serde trait bounds instead, which again is more ergonomic.

Both changes are backwards incompatible, though they should be quite easy to resolve for existing users of GGRS (unless they have a philosophical objection to using Serde directly, I suppose).
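
For illustration, here is the kind of input type the relaxed bounds would permit. The specific type, its fields, and the extra derives (Clone, Copy, PartialEq) are assumptions for this sketch, not something the PR prescribes:

```rust
use serde::{Deserialize, Serialize};

// Hypothetical input type: enums with fields and plain bools become usable
// once the bytemuck bounds are replaced with Default + Serde.
#[derive(Clone, Copy, Default, PartialEq, Serialize, Deserialize)]
enum Action {
    #[default]
    Idle,
    Move { dx: i8, dy: i8 }, // an enum with fields, as in #40
    Jump,
}

#[derive(Clone, Copy, Default, PartialEq, Serialize, Deserialize)]
struct PlayerInput {
    action: Action,
    fire: bool, // no NoUninit / CheckedBitPattern needed for bools
}
```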

However, number 2 does have some significant tradeoffs associated with it; in order of least to most bad:

  • A) Serde becomes part of our public API.
  • B) Bincode's chosen serialization scheme may interfere with the "XOR with previous input then run length encode the result" trick we use to shrink the size of inputs when we send them to connected endpoints (see the sketch after this list). If this is a problem for a user, they can always manually serialize to a byte array in such a way that the run length encoding works better for their data.
  • C) Users may bloat their input struct to the point where it (or N instances of it, where N == number of local players) is larger than the maximum size we limit ourselves to for a single UDP packet. This is particularly problematic since right now we just panic (due to a failed assertion) when this happens.
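
To make tradeoff B more concrete, here is a minimal sketch of the XOR-then-run-length-encode idea (an illustration of the principle only, not GGRS's actual encoder). A serialization layout that keeps rarely-changing bytes together leaves long zero runs in the delta against the previous input, which compress well; a layout that scatters them does not.

```rust
// XOR each byte of this frame's input against the previous frame's bytes.
// Unchanged (or zero-padded) regions become runs of zeros.
fn xor_delta(prev: &[u8], curr: &[u8]) -> Vec<u8> {
    curr.iter().zip(prev).map(|(c, p)| c ^ p).collect()
}

// Run-length encode as (value, run length) pairs; runs over 255 are split.
fn run_length_encode(bytes: &[u8]) -> Vec<(u8, u8)> {
    let mut out: Vec<(u8, u8)> = Vec::new();
    for &b in bytes {
        match out.last_mut() {
            Some((value, count)) if *value == b && *count < u8::MAX => *count += 1,
            _ => out.push((b, 1)),
        }
    }
    out
}
```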

Discussion

Firstly, I am aware that you can already do "custom serialization of an arbitrary input type" (hereafter referred to as W) by declaring Config::Input as [u8; N], with some carefully-chosen-to-be-big-enough N; the XOR + RLE that's done on input bytes should make the cost of any extra padding bytes (over and above what's needed to store the arbitrary input type) close to zero, as any long run of trailing zeros would be compressed to just a couple of bytes.
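
A rough sketch of that workaround (W); the buffer size, the "real" input type, and the use of bincode here are assumptions for illustration:

```rust
use serde::{Deserialize, Serialize};

const INPUT_BYTES: usize = 32; // carefully chosen to be "big enough"

#[derive(Serialize, Deserialize)]
struct RealInput {
    buttons: u16,
    stick_x: i8,
    stick_y: i8,
}

// Pack the real input into the fixed-size array declared as Config::Input.
// Unused trailing bytes stay zero, so XOR + RLE compresses them to almost nothing.
fn pack(input: &RealInput) -> [u8; INPUT_BYTES] {
    let mut buf = [0u8; INPUT_BYTES];
    let encoded = bincode::serialize(input).expect("input serializes");
    buf[..encoded.len()].copy_from_slice(&encoded);
    buf
}

fn unpack(buf: &[u8; INPUT_BYTES]) -> RealInput {
    bincode::deserialize(buf).expect("input deserializes")
}
```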

Now having said that, requiring various Bytemuck bounds on Input is quite a strong stipulation, and it sure would be nice if you could e.g. use enums with fields ( #40 ), or even just plain old bools in your input type.

So that's what this PR gets you: any type you can serialize, you can stick into your Config::Input struct - which is nice and convenient.

But: I am not exactly sure this is a wise PR to merge, mainly because of tradeoff C listed above (B seems manageable, A seems negligible). In fact, I don't think it would make sense to merge this if we don't come up with a good way to mitigate the impact of C, because clearly "crash when enough people press enough buttons" is pretty terrible behavior.

One approach (let's call it X) for mitigating C would be to treat the current assert as an error and propagate it all the way up to user code for handling. But I suspect that would be quite messy to do, and what do you do if you get that error at that point? I think there isn't really much in the way of meaningful error handling.

Another approach I can think of (Y) is that when you set a local player's input, GGRS encodes it together with the other inputs gathered so far this frame to check the total size, and if "size of these extra bytes plus the size of other local inputs for this frame so far" exceeds the maximum packet size, then it returns an error. The user could then choose to drop the input on the floor (or defer it to a later frame) by replacing it with a default input, which would (hopefully) fit... but there's still a possibility that it wouldn't fit, and then the only sensible thing to do is to crash anyway.

(And besides, the "shape" of solution Y looks very similar to what's possible today via W, but at least in that approach the user is explicitly forced to think about the size of their byte array, rather than attempting to deal with their data not fitting later. Y does open up the possibility of larger input structs since the check could be done after the XOR + RLE, but that would also make it less predictable when errors occur.)
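
For concreteness, Y might be shaped something like the following (all names are hypothetical, not GGRS API, and the size check here runs before any XOR + RLE compression):

```rust
struct InputTooLarge;

// Hypothetical check performed when a local input is added for this frame.
fn try_add_local_input<T: serde::Serialize>(
    pending_frame_bytes: &mut Vec<u8>,
    input: &T,
    max_packet_payload: usize,
) -> Result<(), InputTooLarge> {
    let encoded = bincode::serialize(input).map_err(|_| InputTooLarge)?;
    if pending_frame_bytes.len() + encoded.len() > max_packet_payload {
        // Caller can substitute a default input, defer to a later frame, or give up.
        return Err(InputTooLarge);
    }
    pending_frame_bytes.extend_from_slice(&encoded);
    Ok(())
}
```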

So... that's where I'm at. I started down this rabbit hole thinking this approach probably wouldn't work, then it turned out that it works quite nicely, but it potentially encourages running into a currently-quite-niche edge case - and I haven't got a nice design solution for that.

Still, I figured I'd write up and raise the PR to give others a chance to weigh in (and there's always the chance I've misunderstood something; I'm still new to this crate).

@caspark force-pushed the no-bytemuck-bounds-on-input branch 2 times, most recently from f90605e to 6dcf1c1 on November 13, 2024 02:38
@gschup (Owner) commented Nov 20, 2024

Hi, thanks for the PR! I understand the sentiment behind trying to get away from bytemuck. From an ease-of-use perspective, I prefer the serde bounds.

My only concern is performance, which was the motivation for picking bytemuck in the first place. It might be over-optimization and probably doesn't matter, but I would like to see some numbers in order to understand whether this approach is similar in performance.

@caspark (Contributor, Author) commented Nov 20, 2024

Regarding performance, the bytemuck code path is (afaik) approximately C's calloc + Rust's transmute - it is basically impossible to beat that. But, well...

  • we are talking about initializing/serializing/deserializing structs that are strictly less than 467 bytes, typically closer to 10-100 bytes (before XOR+RLE), and doing that perhaps 1-4 times in the context of doing network IO in the usual case or 1-4x multiplied by up to 8 advance-frames of the main game simulation in the worst case.
  • using the log benchmark from the rust serialization benchmark as a (probably pessimistic?) proxy for serializing and deserializing some game input, I figure it's approximately 300 nanoseconds to serialize and deserialize a single log entry using bincode1 (set bandwidth to many many tb, CPU to 1, dataset to log, mode to roundtrip, untick zlib & zstd, find bincode1 entry, divide 1 second by 3421986 logs/second -> approx 292 nanoseconds per log).
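
(For anyone who does want a local number rather than extrapolating from the published benchmark, a throwaway roundtrip timing loop would be enough; the input struct, iteration count, and use of bincode 1.x below are assumptions.)

```rust
use std::hint::black_box;
use std::time::Instant;
use serde::{Deserialize, Serialize};

#[derive(Default, Serialize, Deserialize)]
struct Input {
    buttons: u32,
    stick: (i8, i8),
}

fn main() {
    let input = Input::default();
    let iterations: u32 = 1_000_000;
    let start = Instant::now();
    for _ in 0..iterations {
        let bytes = bincode::serialize(black_box(&input)).unwrap();
        let decoded: Input = bincode::deserialize(&bytes).unwrap();
        black_box(decoded);
    }
    println!("~{:?} per serialize+deserialize roundtrip", start.elapsed() / iterations);
}
```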

I personally don't think it's worth doing any benchmarking or number crunching beyond those back of the envelope calculations?

I think point C I discussed above is much more of an issue.

FWIW I think even if we don't¹ merge this particular PR due to that concern, replacing the Zeroable bound with a Default bound would still improve ergonomics.

Footnotes

  1. I am actually fine with not merging this PR - I have applied the "input is a fixed size byte array" workaround myself and it was easy enough and works fine. So I don't really need to have the Bytemuck bounds relaxed myself.

@gschup (Owner) commented Dec 12, 2024

I thought a little bit about this and I think it might be best to just simplify these bounds as you did in the PR. If users start to use unreasonably large input structs, then we can help them optimize afterwards.

If you clean up the merge conflicts, I will merge this and make a new GGRS release. We've got enough changes by now :)

@caspark (Contributor, Author) commented Dec 13, 2024

Will do - don't have time today but hope to find some time over the weekend.

Re new release, #79 introduced 2 bugs that I still need to PR:

  • rolling back to the furthest frame in the prediction window currently panics - easy to trigger on bad connections. I have a fix for this already.
  • disconnects aren't handled properly. I fixed this for non-lockstep code but lockstep mode still crashes so I need to find another hour or two to investigate more.

I haven't raised issues for these yet¹, but still, I would recommend holding off on releasing for a bit :)

Footnotes

  1. The new issue template for this repo is extensive enough that by the time I've gathered enough info to fill out the template then I'll have figured out the bug well enough to just raise the PR. Sometimes that's a good thing though.

@gschup (Owner) commented Dec 13, 2024

  1. The new issue template for this repo is extensive enough that by the time I've gathered enough info to fill out the template then I'll have figured out the bug well enough to just raise the PR. Sometimes that's a good thing though.

This is simply the default template proposed by github. Please don't feel restricted by it. We could also change it!

@caspark force-pushed the no-bytemuck-bounds-on-input branch from 6dcf1c1 to a06b63b on December 14, 2024 06:37
@caspark (Contributor, Author) commented Dec 14, 2024

I've rebased this change onto main now and fixed the merge conflicts. With #87 "fixing" problem C, I think it should be ready for merge.

Re the issue template being the default GitHub one - ha, TIL! I've not turned it on for any projects so I had no idea :)
