feat: download browsers as TAR #34033
base: main
Conversation
packages/playwright-core/src/server/registry/oopDownloadBrowserMain.ts
@@ -10,6 +10,7 @@
  },
  "dependencies": {
    "extract-zip": "2.0.1",
+   "tar-fs": "^3.0.6",
I understand the library is popular, but its deps list seems a little excessive for what it does. Did we consider alternatives?
Oh wow, "tar" is even more...
Yes, we considered `tar-fs`, `tar`, and writing our own. Writing our own turned out more complex than imagined, because webkit has very long path names and the format becomes tricky when that's involved. Of the three, `tar-fs` seemed the most focused.
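For background on the long-path wrinkle mentioned here: a ustar header stores the path in a 100-byte `name` field, and anything longer spills into a 155-byte `prefix` field at offset 345 (or into a PAX extended header). A minimal sketch of reassembling the two fields, independent of any of the libraries discussed in this PR (the sample path is invented):

```javascript
// Read a NUL-terminated ASCII string out of a fixed-width ustar header field.
function readField(buf, offset, length) {
  const nul = buf.indexOf(0, offset);
  const end = (nul === -1 || nul > offset + length) ? offset + length : nul;
  return buf.toString('ascii', offset, end);
}

// Reconstruct the full path from a 512-byte ustar header block.
// Paths longer than 100 bytes are split: the tail lives in `name`
// (offset 0, 100 bytes), the head in `prefix` (offset 345, 155 bytes).
function fullPath(header) {
  const name = readField(header, 0, 100);
  const prefix = readField(header, 345, 155);
  return prefix ? prefix + '/' + name : name;
}

// Demo with an invented WebKit-style long path.
const header = Buffer.alloc(512);
header.write('MiniBrowser.app/Contents/MacOS/MiniBrowser', 0, 'ascii');
header.write('Playwright.app/Contents/Frameworks/WebKit.framework', 345, 'ascii');
console.log(fullPath(header));
```

GNU tar handles the same situation differently, emitting a special `L` entry that carries the full name, which is one reason the format "becomes tricky" once both dialects have to be handled.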
@@ -48,8 +48,8 @@
  "revision": "1011",
  "installByDefault": true,
  "revisionOverrides": {
-   "mac12": "1010",
-   "mac12-arm64": "1010"
+   "mac12": "1011",
What's the motivation for changing this?
`1010` doesn't have `.tar.br`, and `1010` is identical to `1011` in functionality.
@@ -1229,6 +2813,6 @@ END OF [email protected] AND INFORMATION
  SUMMARY BEGIN HERE
  =========================================
- Total Packages: 48
+ Total Packages: 60
We are paying a 25% bump in the number of deps for a feature without a linked user report. Usually not a very good sign.
True. It also increases the zip bundle size from 112kb to 202kb. We had an earlier attempt at writing our own tar parser; maybe we should give it another try.
Just as an idea, can we utilize extract-zip's non-compression mode for tar? That way we use zip for tar and don't need all this extra code? i.e. the files will be .zip.br, not .tar.br.
That'd save some dependencies, but would result in slightly larger bundles and it'd prevent streaming extraction. I'd prefer to stick with TAR; gonna take a stab at reducing the bundle size for that.
if (!downloadPathTemplate)
  return [];
// old webkit versions don't have brotli
curious why only old webkit revisions don't have brotli
webkit is the only browser we have `revisionOverrides` for that point to old versions, so the CI script that created them didn't yet create brotli
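A hypothetical illustration of the consequence (the cutoff value and naming scheme below are invented for illustration, not Playwright's real download-path logic): revisions from before the CI change only get a `.zip` candidate, while newer ones can try `.tar.br` first.

```javascript
// Invented cutoff: pretend .tar.br exists for webkit builds >= 1011.
const BROTLI_SINCE = { webkit: 1011 };

// Build the ordered list of archive names to try for a browser revision.
function archiveCandidates(browser, revision) {
  const candidates = [];
  const since = BROTLI_SINCE[browser];
  if (since !== undefined && revision >= since)
    candidates.push(`${browser}-${revision}.tar.br`); // preferred: smaller
  candidates.push(`${browser}-${revision}.zip`);      // always-available fallback
  return candidates;
}

console.log(archiveCandidates('webkit', 1011)); // [ 'webkit-1011.tar.br', 'webkit-1011.zip' ]
console.log(archiveCandidates('webkit', 1010)); // [ 'webkit-1010.zip' ]
```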
  }
  log(`SUCCESS downloading and extracting ${options.title}`);
} else {
  await downloadFile(options);
I'm seeing different error handling code in this branch, including explicit checks for ECONNRESET. Is walking away from them intended? Should we really do both changes at once? I'd be more comfortable leaving the download code as is and swapping piping into a file for piping into brotli.
The new branch is intended to be as similar as possible, while also making the code a little more linear. The `ECONNRESET` check only changed the error message, so I didn't include that. Let me see if I can refactor it to make the change less spooky.
I've refactored it so we can reuse the existing download function. Good pointer, thanks!
@@ -0,0 +1 @@
This directory contains a modified copy of the `tar-stream` library that's used exclusively to extract TAR files.
Make sure all the third-party files are under the third_party folder and the corresponding license files are provided beside them. Make sure they end up in the third-party list for the distributed bundle.
Done! See `diff.patch` for all my changes.
}

shiftFirst (size) {
  return this._buffered === 0 ? null : this._next(size)
Is this a bug on the 21st line of this library? (I don't see `this._buffered` defined.)
looks like it! It's also in the original: https://github.com/mafintosh/tar-stream/blob/126968fd3c4a39eba5f8318c255e04cedbbad176/extract.js#L23C17-L23C26
@@ -0,0 +1,311 @@
const { Writable, Readable, getStreamError } = require('stream')
`getStreamError` is not a thing. How is it supposed to work?
Good find! Removed the usage of it by moving `_predestroy` into `_destroy`. Once I add the diff this will make more sense.
const len = parseInt(buf.toString('ascii', 0, i), 10)
if (!len) return result

const b = buf.subarray('ascii', i + 1, len - 1)
ChatGPT thinks it is a bug, since the value called `len` is used in the `subarray(start, end)` signature. Given that the start is `i + 1`, which points right after the parsed len, `len - 1` can't be a valid end. Did they want to say `i + len` here?
this is what the original library does 🤷 https://github.com/mafintosh/tar-stream/blob/126968fd3c4a39eba5f8318c255e04cedbbad176/headers.js#L40
This seems to be some sort of special-case checking. Maybe in some implementations of TAR/PAX, `len` doesn't contain a length, but an index?
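For context on what this code is parsing: PAX extended-header records have the shape `<len> <key>=<value>\n`, where `<len>` is the byte length of the entire record, length digits and trailing newline included. A standalone sketch with explicit bounds checking (this is not the library's implementation):

```javascript
// Parse one PAX extended-header record: "<len> <key>=<value>\n".
// <len> counts the whole record, including the length digits themselves.
function parsePaxRecord(buf, offset = 0) {
  const space = buf.indexOf(0x20, offset); // a space ends the length digits
  if (space === -1) throw new Error('malformed PAX record');
  const len = parseInt(buf.toString('ascii', offset, space), 10);
  if (!len || offset + len > buf.length) throw new Error('malformed PAX record');
  // The body runs from after the space to just before the trailing '\n'.
  const body = buf.toString('utf8', space + 1, offset + len - 1);
  const eq = body.indexOf('=');
  if (eq === -1) throw new Error('malformed PAX record');
  return { key: body.slice(0, eq), value: body.slice(eq + 1), nextOffset: offset + len };
}

// "30 " (3) + "mtime=" (6) + 20 value bytes + "\n" (1) = 30 bytes total.
console.log(parsePaxRecord(Buffer.from('30 mtime=1350244992.023960108\n')));
```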
} else {
  const file = fs.createWriteStream(options.zipPath);
  await downloadFile(options, file);
  log(`SUCCESS downloading ${options.title}`);
nit: should we place it before `downloadFile` and log `downloading ${options.title}` instead?
Happy to do that in a separate PR, but that would be a behaviour change compared to before. What'd be the benefit?
return decodeStr(buf, 0, buf.length, encoding)
}

exports.encodePax = function encodePax (opts) { // TODO: encode more stuff in pax
Let's drop this too, we only need to decode?
Good catch! Will do.
What is the base for this patch? I see that headers.js upstream starts with https://github.com/mafintosh/tar-stream/blob/126968fd3c4a39eba5f8318c255e04cedbbad176/headers.js#L1-L4, but I don't see those lines removed in the diff. Was there some other modification before? Maybe we can split the patch into removing unnecessary stuff and then applying some changes to avoid third-party deps?
It looks like headers.js somehow wasn't included in the diff 🤔 Let me fix that.
const len = parseInt(buf.toString('ascii', 0, i), 10)
if (!len) return result

const b = buf.subarray('ascii', i + 1, len - 1)
Not sure if it's important, but `len - 1` may be out of the buffer bounds; we may want to throw in that case.
I'd prefer to not touch the implementation. The upstream package is widely used, and I think there's more risk from changing this than from leaving in a potential defect.
const b = buf.subarray('ascii', i + 1, len - 1)
const keyIndex = b.indexOf('=')
if (keyIndex === -1) return result
result[b.slice(0, keyIndex)] = b.slice(keyIndex + 1)
This uses deprecated `slice` while the rest of the code uses `subarray`; perhaps it should be an explicit `toString()`?
see above, let's not touch the implementation needlessly.
// ustar (posix) format.
// prepend prefix, if present.
if (buf[345]) name = decodeStr(buf, 345, 155, filenameEncoding) + '/' + name
} else if (isGNU(buf)) {
Do we need both POSIX UStar and GNU?
see above, let's not touch the implementation needlessly.
}

// to support old tar versions that use trailing / to indicate dirs
if (typeflag === 0 && name && name[name.length - 1] === '/') typeflag = 5
Probably doesn't hurt, but I don't think we need to support the old versions.
see above, let's not touch the implementation needlessly.
function onfile () {
  const ws = xfs.createWriteStream(name)
  const rs = mapStream(stream, header)
Looks like this is always just stream.
see above, let's not touch the implementation needlessly.
if (this._parent._stream === this) {
  this._parent._update()
}
- cb(null)
Why did this go away?
Before, this class extended `streamx.Readable`, but it now extends `node:stream.Readable`. The `_read` implementation is slightly different, see https://github.com/mafintosh/streamx/blob/be4bbc8ba0b4c862a3d996e484675d4b6297136d/README.md?plain=1#L136-L137. Luckily it was synchronous before, so we can remove this callback easily.
Thanks for the review. I'll act on removing the dead code and fixing the patch.
I'd prefer to not touch the implementation more than absolutely necessary. The library is widely used enough that any change introduces more risk than it averts. If we want to write our own, I'm open to that in a follow-up.
I've removed the dead code and fixed that patch. Not sure how `headers.js` slipped. The way I generated it is by pasting the files from the linked commits into a folder called `original`, pasting the edited files into a `patched` folder, and then running `git diff --no-index -- original patched`.
Test results for "tests 1": 12 flaky, 37725 passed, 654 skipped.
Some of our browsers are already available as `.tar.br`. Compared to the current `.zip` archives, the brotli tarballs are ~10-30% smaller. This PR makes us download brotli files for chromium and webkit.