There was a discussion about this but the author appears to have deleted their GitHub account, so it only exists on the Wayback Machine. I want to bring this back since people are interested in this question and it's hard to find otherwise.
I won't copy and paste the deleted comment since I didn't ask for their permission to do so, but the question was essentially the title: why does the `pipeToNodeWritable` API take a writable stream instead of returning a readable stream?
The answer from @sebmarkbage looked like this:
Note that we used to have `pipeToNodeReadable`, and it's actually still in the build and even has Suspense support. However, we'll add a warning that it's deprecated. So it's not that we can't do it; it's very intentional, because we think it would be a bad idea for almost everyone.
We need to be able to flush all the way to the underlying target once we have enough data to show progressive content. Meaning once we flush something, it should go out directly to the user. For example, if we've completed the shell but are missing data for one section, we can send the shell.
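For concreteness, here's roughly what that looks like on the Node side, shown with `renderToPipeableStream` (the name this API later shipped under); `<App />`, the port, and the timeout are placeholders:

```js
import http from 'node:http';
import { renderToPipeableStream } from 'react-dom/server';

http.createServer((req, res) => {
  const { pipe, abort } = renderToPipeableStream(<App />, {
    onShellReady() {
      // The shell is complete even though a Suspense boundary may still be
      // loading. Flush it straight to the response so the user sees it now;
      // the missing section streams in later as script tags.
      res.statusCode = 200;
      res.setHeader('Content-Type', 'text/html');
      pipe(res);
    },
    onError(error) {
      console.error(error);
    },
  });
  // Placeholder: stop streaming late content after 10 seconds.
  setTimeout(abort, 10_000);
}).listen(3000);
```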
Transforms often buffer content up to some threshold. This means that if you have a transform after React, it might hold onto that shell for a bit. Even if it sends a few of the bytes, if it doesn't send the script tag that displays it, it won't show up. Then, once we get more data to complete the render, we emit a few more bytes on the stream, but by that point the whole content is done.
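To make that failure mode concrete, here's a hypothetical buffering transform (not React code, just an illustration of the kind of intermediary that holds onto the shell):

```js
import { Transform } from 'node:stream';

// A naive transform that holds bytes until it has 16 kB. If React's completed
// shell is smaller than that, it sits here and the user sees a blank page
// until enough later output arrives to push it over the threshold.
class NaiveBufferingTransform extends Transform {
  constructor() {
    super();
    this.pending = [];
    this.size = 0;
  }
  _transform(chunk, encoding, callback) {
    this.pending.push(chunk);
    this.size += chunk.length;
    if (this.size >= 16 * 1024) {
      this.push(Buffer.concat(this.pending));
      this.pending = [];
      this.size = 0;
    }
    callback();
  }
  _flush(callback) {
    // Whatever is left only goes out when the whole stream ends.
    this.push(Buffer.concat(this.pending));
    callback();
  }
}
```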
The effect is that the user sees nothing and then, at the end, sees everything. So a bad intermediate transform can completely destroy the whole point of streaming. It gets worse, though, because the format we use to send things progressively is larger and more CPU intensive (e.g. injecting script tags instead of just inline HTML). That's only a benefit if it's actually streaming. So we rely on backpressure to tell us when it's better to wait and buffer up more content so we can generate inline HTML instead.
Readable streams have backpressure too, but that's a one-way communication. There's no way for the stream to tell the next stream that now is a good time to flush, even if its window isn't full.
The most common example of this is GZIP in Node. GZIP compresses in windows of some maximum byte range for best effectiveness, so it waits to fill a buffer before compressing that buffer. If we fill it only halfway, it'll block the bytes from getting to the user. Luckily this is a well-known issue, and there exist APIs to solve it: we call `flush()` on the writable stream, if it exists, which ensures that it flushes early.
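A sketch of what that looks like with zlib in Node (the request handler and `<App />` are placeholders): React writes into the gzip transform, and because that transform has a `flush()` method, React can ask it to emit the shell right away instead of waiting for a full compression window.

```js
import zlib from 'node:zlib';
import { renderToPipeableStream } from 'react-dom/server';

function handleRequest(req, res) {
  // The gzip transform is the destination React writes into.
  const gzip = zlib.createGzip();
  gzip.pipe(res);

  const { pipe } = renderToPipeableStream(<App />, {
    onShellReady() {
      res.setHeader('Content-Type', 'text/html');
      res.setHeader('Content-Encoding', 'gzip');
      // React calls gzip.flush() at its flush points, so the compressed shell
      // goes out to the user immediately.
      pipe(gzip);
    },
  });
}
```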
It's not just GZIP either, because many transforms and naive code don't consider this use case, so they likely won't work out of the box anyway.
This feature only exists on writable streams, and there's nothing in the readable protocol that forwards it. Similarly, if you use async generators to process this data, you'd naively also lose this signal. Basically, those APIs just don't work.
Web Streams don't actually have any way to do this, so they're kind of fundamentally flawed for this use case at the moment (e.g. for use in Cloudflare Workers and Deno), at least if you need to apply compression. That's why we accept the status quo and provide a Readable for the Web version of this API. But once there is an idiomatic solution for this, we'd switch to that, even if it means using a Writable instead.
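For reference, a minimal sketch of the Web Streams variant as it later shipped (`renderToReadableStream`; the Worker-style handler and `<App />` are assumptions):

```js
import { renderToReadableStream } from 'react-dom/server';

export default {
  async fetch(request) {
    // The API hands back a ReadableStream because the Web Streams protocol has
    // no flush signal for React to forward; any compression layered on top has
    // to avoid over-buffering on its own.
    const stream = await renderToReadableStream(<App />, {
      bootstrapScripts: ['/main.js'],
    });
    return new Response(stream, {
      headers: { 'Content-Type': 'text/html' },
    });
  },
};
```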
None of this applies to traditional SSG, though, since you probably don't want to stream as you load data but rather wait for completion before starting the stream. I'm hesitant to provide an API specifically for this, because it's so rare that you'd only do SSG and never SSR; many systems are moving to a hybrid model anyway. So it would be misleading to rely on it and build infrastructure on top of it that doesn't port to SSR, or worse, to have someone just use it without reading details like this post.
It's a small nit, but even with SSG it would be more optimal when applying the `progressiveChunkSize` option (which encodes large HTML pages for streaming, using script tags that insert the content). You should ideally encode the GZIP stream taking the `flush()` calls into account and SSG the gzipped version directly in the JS build (i.e. don't let other cloud infra gzip it for you). That ensures the resulting GZIP stream is split into chunks at the appropriate boundaries, so that when they arrive piecemeal at the client they don't end up having to be buffered there, causing the same problem. You don't want a GZIP chunk spanning a script tag, because then you'd have to wait for part of the next content to load before you can show the previous content. A generic GZIP compressor running on top of the raw HTML, without knowledge of these boundaries, wouldn't know about that. That's what the `flush()` calls provide.
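A hedged sketch of that build step, assuming `renderToPipeableStream` with the `progressiveChunkSize` option (the output path and chunk size are made up, and exactly where flush boundaries fall is up to React's internals):

```js
import fs from 'node:fs';
import zlib from 'node:zlib';
import { renderToPipeableStream } from 'react-dom/server';

// Render straight into a gzip stream at build time, so React's flush() calls
// land on the compressor and the stored .html.gz is chunked near the
// script-tag boundaries, rather than letting a CDN gzip the finished HTML
// with no knowledge of those boundaries.
const gzip = zlib.createGzip();
gzip.pipe(fs.createWriteStream('dist/page.html.gz'));

const { pipe } = renderToPipeableStream(<Page />, {
  progressiveChunkSize: 16 * 1024, // made-up value; tune per page
  onAllReady() {
    // For SSG we wait for all the data before writing anything out.
    pipe(gzip);
  },
});
```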