
feat: download browsers as TAR #34033

Open
wants to merge 36 commits into main

Conversation

@Skn0tt Skn0tt commented Dec 16, 2024

Some of our browsers are already available as .tar.br. Compared to the current .zip archives, the brotli tarballs are ~10-30% smaller. This PR makes us download brotli files for chromium and webkit.

@Skn0tt Skn0tt self-assigned this Dec 16, 2024


@Skn0tt Skn0tt marked this pull request as ready for review December 18, 2024 14:28


@@ -10,6 +10,7 @@
},
"dependencies": {
"extract-zip": "2.0.1",
"tar-fs": "^3.0.6",
Member

I understand the library is popular, but its deps list seems a little excessive for what it does. Did we consider alternatives?

Member

Oh wow, "tar" is even more...

Member Author

@Skn0tt Skn0tt Dec 19, 2024

Yes, we considered tar-fs, tar and writing our own. Writing our own turned out more complex than imagined, because webkit has very long path names and the format becomes tricky when that's involved. Of the three, tar-fs seemed the most focused.
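For context on why long paths make the format tricky (a sketch with an illustrative WebKit-style path, not one taken from the PR): the classic ustar tar header reserves a fixed 100-byte field for the entry name, so longer paths need the 155-byte `prefix` field, a GNU longname entry, or a PAX extended header.

```javascript
// Background sketch, not PR code: names longer than the ustar name field
// force a tar parser to also handle prefix/GNU-longname/PAX mechanisms.
const USTAR_NAME_FIELD = 100; // bytes, per the ustar header layout

// Hypothetical WebKit-style path, chosen only for illustration:
const longPath =
  'Playwright.app/Contents/Frameworks/WebKit.framework/Versions/A/' +
  'Frameworks/WebCore.framework/Versions/A/WebCore';

const fitsInUstarName = Buffer.byteLength(longPath) <= USTAR_NAME_FIELD;
// fitsInUstarName is false: a plain ustar name field cannot hold this path.
```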

@@ -48,8 +48,8 @@
"revision": "1011",
"installByDefault": true,
"revisionOverrides": {
"mac12": "1010",
"mac12-arm64": "1010"
"mac12": "1011",
Member

What's the motivation for changing this?

Member Author

1010 doesn't have .tar.br, and 1010 is identical to 1011 in functionality


@@ -1229,6 +2813,6 @@ END OF [email protected] AND INFORMATION

SUMMARY BEGIN HERE
=========================================
Total Packages: 48
Total Packages: 60
Member

We are paying a 25% bump in # of deps for a feature that does not link to a user report. Usually not a very good sign.

Member Author

True. It also increases the zip bundle size from 112kb to 202kb. We had this attempt of writing our own tar parser, maybe we should give it another try.

@pavelfeldman
Member

Just as an idea, can we utilize extract-zip's non-compression mode for tar? That way we use zip for tar and don't need all this extra code? i.e. the files will be .zip.br, not .tar.br.

@Skn0tt
Member Author

Skn0tt commented Jan 6, 2025

Just as an idea, can we utilize extract-zip's non-compression mode for tar?

That'd save some dependencies, but would result in slightly larger bundles[1] and it'd prevent streaming extraction. I'd prefer to stick with TAR, gonna take a stab at reducing the bundle size for that.

Footnotes

  1. Tested on firefox-mac: .zip is 92 MB, .zip.br is 67.2 MB and .tar.br is 65.36 MB

if (!downloadPathTemplate)
return [];
// old webkit versions don't have brotli
Member

curious why only old webkit revisions don't have brotli

Member Author

@Skn0tt Skn0tt Jan 13, 2025

webkit is the only browser we have revisionOverrides overrides for that point to old versions, so the CI script that created them didn't yet create brotli

}
log(`SUCCESS downloading and extracting ${options.title}`);
} else {
await downloadFile(options);
Member

I'm seeing different error handling code in this branch, including explicit checks for ECONNRESET. Is walking away from them intended? Should we do both changes at the same time? I'd be more comfortable with leaving the download code as is and swapping piping into a file for piping into brotli.

Member Author

The new branch is intended to be as similar as possible, while also making the code a little more linear. The ECONNRESET check only changed the error message, so I didn't include that.

Let me see if I can refactor it to make the change less spooky.

Member Author

@Skn0tt Skn0tt Jan 13, 2025

I've refactored it so we can reuse the existing download function. Good pointer, thanks!

@@ -0,0 +1 @@
This directory contains a modified copy of the `tar-stream` library that's used exclusively to extract TAR files.
Member

Make sure all the third-party files are under the third_party folder and the corresponding license files are provided beside the files. Make sure they end up in the third-party list or in the distributed bundle.

Member Author

Done! See diff.patch for all my changes.

}

shiftFirst (size) {
return this._buffered === 0 ? null : this._next(size)
Member

Is this a bug on the 21st line of this library? (I don't see this._buffered defined.)


@@ -0,0 +1,311 @@
const { Writable, Readable, getStreamError } = require('stream')
Member

getStreamError is not a thing. How is it supposed to work?

Member Author

Good find! Removed the usage of it by moving _predestroy into _destroy. Once I add the diff this will make more sense.

const len = parseInt(buf.toString('ascii', 0, i), 10)
if (!len) return result

const b = buf.subarray('ascii', i + 1, len - 1)
Member

ChatGPT thinks it is a bug since the value called len is used in the subarray(start, end) signature. Given that the start is i + 1, which points to right after the parsed len, len - 1 can't be a valid end, did they want to say i + len here?


Member Author

This seems to be some sort of special case checking. Maybe in some implementations of TAR/PAX, len doesn't contain a length, but an index?
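For background on the reviewer's question (this describes the POSIX pax record format, not code from the PR): each extended-header record is `<len> <key>=<value>\n`, where `<len>` counts the entire record, including its own digits, the space, and the trailing newline. That makes `len - 1` a plausible end index, since it simply drops the newline. Separately, the stray `'ascii'` first argument to `subarray` in the quoted line does look wrong, because `Buffer.subarray` takes numeric start and end offsets. A minimal single-record decoder sketch:

```javascript
// Sketch of decoding one pax record, e.g. "30 mtime=1350244992.023046456\n".
// The leading "30" is the byte length of the WHOLE record, newline included.
function decodePaxRecord (buf) {
  const i = buf.indexOf(32); // first space separates <len> from key=value
  const len = parseInt(buf.toString('ascii', 0, i), 10);
  const body = buf.toString('utf8', i + 1, len - 1); // drop trailing '\n'
  const eq = body.indexOf('=');
  return { key: body.slice(0, eq), value: body.slice(eq + 1) };
}

const record = Buffer.from('30 mtime=1350244992.023046456\n');
// decodePaxRecord(record) → { key: 'mtime', value: '1350244992.023046456' }
```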


} else {
const file = fs.createWriteStream(options.zipPath);
await downloadFile(options, file);
log(`SUCCESS downloading ${options.title}`);
Member

nit: should we place it before downloadFile and print log(`downloading ${options.title}`); ?

Member Author

Happy to do that in a separate PR, but that would be a behaviour change compared to before. What'd be the benefit?

return decodeStr(buf, 0, buf.length, encoding)
}

exports.encodePax = function encodePax (opts) { // TODO: encode more stuff in pax
Member

Let's drop this too, we only need to decode?

Member Author

Good catch! Will do.

Member

What is the base for this patch? I see that upstream headers.js starts with https://github.com/mafintosh/tar-stream/blob/126968fd3c4a39eba5f8318c255e04cedbbad176/headers.js#L1-L4 but I don't see those lines removed in the diff; was there some other modification before? Maybe we can split the patch into removing unnecessary stuff and then applying some changes to avoid third-party deps?

Member Author

It looks like headers.js somehow wasn't included in the diff 🤔 Let me fix that.

const len = parseInt(buf.toString('ascii', 0, i), 10)
if (!len) return result

const b = buf.subarray('ascii', i + 1, len - 1)
Member

Not sure if it's important but len-1 maybe out of the buffer bounds, we may want to throw in that case.

Member Author

I'd prefer to not touch the implementation. The upstream package is widely used and I think there's more risk from changing this than to leave in a potential defect.

const b = buf.subarray('ascii', i + 1, len - 1)
const keyIndex = b.indexOf('=')
if (keyIndex === -1) return result
result[b.slice(0, keyIndex)] = b.slice(keyIndex + 1)
Member

This uses deprecated slice while the rest of the code uses subarray, perhaps it should be explicit toString()?

Member Author

see above, let's not touch the implementation needlessly.

// ustar (posix) format.
// prepend prefix, if present.
if (buf[345]) name = decodeStr(buf, 345, 155, filenameEncoding) + '/' + name
} else if (isGNU(buf)) {
Member

Do we need both POSIX UStar and GNU?

Member Author

see above, let's not touch the implementation needlessly.

}

// to support old tar versions that use trailing / to indicate dirs
if (typeflag === 0 && name && name[name.length - 1] === '/') typeflag = 5
Member

Probably doesn't hurt, but I don't think we need to support the old versions.

Member Author

see above, let's not touch the implementation needlessly.


function onfile () {
const ws = xfs.createWriteStream(name)
const rs = mapStream(stream, header)
Member

Looks like this is always just stream.

Member Author

see above, let's not touch the implementation needlessly.

if (this._parent._stream === this) {
this._parent._update()
}
- cb(null)
Member

Why did this go away?

Member Author

Before, this class extended streamx.Readable, but it now extends node:stream.Readable. The _read implementation is slightly different, see https://github.com/mafintosh/streamx/blob/be4bbc8ba0b4c862a3d996e484675d4b6297136d/README.md?plain=1#L136-L137. Luckily it was synchronous before, so we can remove this callback easily.

Member Author

@Skn0tt Skn0tt left a comment

Thanks for the review. I'll act on removing the dead code and fixing the patch.

I'd prefer to not touch the implementation more than absolutely necessary. The library is widely used enough that any change introduces more risk than it averts. If we want to write our own, I'm open to that in a follow-up.


Member Author

@Skn0tt Skn0tt left a comment

I've removed the dead code and fixed that patch. Not sure how headers.js slipped. The way I generated it is by pasting the files from the linked commits into a folder called original, pasting the edited files into a patched folder, and then running git diff --no-index -- original patched.
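The described workflow can be recreated with throwaway files (the real directories held the tar-stream sources; these placeholder contents are illustrative only):

```shell
# Recreate the patch-generation workflow with placeholder files.
mkdir -p original patched
printf 'upstream\n' > original/headers.js
printf 'modified\n' > patched/headers.js
# --no-index lets git diff compare plain directories outside any repository.
# git diff exits non-zero when the trees differ, so neutralize that in scripts.
git diff --no-index -- original patched > diff.patch || true
```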

Contributor

Test results for "tests 1"

12 flaky
⚠️ [chromium-page] › tests/page/page-event-request.spec.ts:151:3 › should report navigation requests and responses handled by service worker with routing @chromium-ubuntu-22.04-node18
⚠️ [firefox-page] › tests/page/page-evaluate.spec.ts:403:3 › should throw for too deep reference chain @firefox-ubuntu-22.04-node18
⚠️ [playwright-test] › tests/ui-mode-trace.spec.ts:341:5 › should work behind reverse proxy @ubuntu-latest-node22-1
⚠️ [webkit-library] › tests/library/defaultbrowsercontext-2.spec.ts:28:3 › should work in persistent context @webkit-ubuntu-22.04-node18
⚠️ [webkit-library] › tests/library/proxy.spec.ts:44:3 › should use proxy for second page @webkit-ubuntu-22.04-node18
⚠️ [webkit-library] › tests/library/screenshot.spec.ts:278:14 › element screenshot › should restore viewport after page screenshot and exception @webkit-ubuntu-22.04-node18
⚠️ [webkit-library] › tests/library/selector-generator.spec.ts:68:5 › selector generator › should generate text for @webkit-ubuntu-22.04-node18
⚠️ [webkit-library] › tests/library/video.spec.ts:475:5 › screencast › should scale frames down to the requested size @webkit-ubuntu-22.04-node18
⚠️ [webkit-page] › tests/page/page-leaks.spec.ts:82:5 › click should not leak @webkit-ubuntu-22.04-node18
⚠️ [webkit-page] › tests/page/page-leaks.spec.ts:107:5 › fill should not leak @webkit-ubuntu-22.04-node18
⚠️ [webkit-page] › tests/page/page-set-input-files.spec.ts:147:3 › should upload large file @webkit-ubuntu-22.04-node18
⚠️ [playwright-test] › tests/ui-mode-test-watch.spec.ts:145:5 › should watch all @windows-latest-node18-1

37725 passed, 654 skipped
✔️✔️✔️
