Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI] brew audit failure at check_https_availability for large (2GB+) files #107214

Open
3 of 6 tasks
cho-m opened this issue Jun 13, 2021 · 5 comments
Open
3 of 6 tasks

[CI] brew audit failure at check_https_availability for large (2GB+) files #107214

cho-m opened this issue Jun 13, 2021 · 5 comments

Comments

@cho-m
Copy link
Member

cho-m commented Jun 13, 2021

Description of issue

In some PRs, Homebrew's CI hits an audit failure due to trying to read large (2GB+) files into memory.

Specifically, the failure happens in curl_http_content_headers_and_checksum at
https://github.com/Homebrew/brew/blob/d3887df2a39d270c7988985c5bf9099c8229aeca/Library/Homebrew/utils/curl.rb#L286-L289

      if status.success?
        file_contents = File.read(file.path)
        file_hash = Digest::SHA2.hexdigest(file_contents) if hash_needed
      end

Seen in #107181

exception while auditing unity: Invalid argument @ io_fread - /private/tmp/20210612-3010-1sx7ka6<br class="Apple-interchange-newline">

and #106950

exception while auditing final-fantasy-xiv: Invalid argument @ io_fread - /private/tmp/20210609-2949-1spku3h<br class="Apple-interchange-newline">

along with some other PRs.

Command that failed

brew audit --cask --appcast --online './Casks/unity.rb'

Output of command with --verbose --debug

Click to expand
==> Verifying checksum for cask 'unity'
/usr/bin/curl --disable --globoff --show-error --user-agent Homebrew/3.1.12-15-gd3887df\ \(Macintosh\;\ Intel\ Mac\ OS\ X\ 10.15.7\)\ curl/7.64.1 --header Accept-Language:\ en --retry 3 --dump-header - --output /private/tmp/20210613-3195-xxh8yw --location --connect-timeout 15 --max-time 25 --retry-max-time 25 https://download.unity3d.com/download_unity/4d8c25f7477e/MacEditorInstaller/Unity-2021.1.11f1.pkg
==> Invalid argument @ io_fread - /private/tmp/20210613-3195-xxh8yw
/usr/local/Homebrew/Library/Homebrew/utils/curl.rb:287:in `read'
/usr/local/Homebrew/Library/Homebrew/utils/curl.rb:287:in `curl_http_content_headers_and_checksum'
/usr/local/Homebrew/Library/Homebrew/utils/curl.rb:202:in `block in curl_check_http_content'
/usr/local/Homebrew/Library/Homebrew/utils/curl.rb:200:in `each'
/usr/local/Homebrew/Library/Homebrew/utils/curl.rb:200:in `curl_check_http_content'
/usr/local/Homebrew/Library/Homebrew/cask/audit.rb:753:in `check_url_for_https_availability'
/usr/local/Homebrew/Library/Homebrew/cask/audit.rb:737:in `check_https_availability'
/usr/local/Homebrew/Library/Homebrew/cask/audit.rb:66:in `run!'
/usr/local/Homebrew/Library/Homebrew/cask/auditor.rb:141:in `audit_cask_instance'
/usr/local/Homebrew/Library/Homebrew/cask/auditor.rb:87:in `audit'
/usr/local/Homebrew/Library/Homebrew/cask/auditor.rb:38:in `audit'
/usr/local/Homebrew/Library/Homebrew/cask/cmd/audit.rb:104:in `block in audit_casks'
/usr/local/Homebrew/Library/Homebrew/cask/cmd/audit.rb:102:in `map'
/usr/local/Homebrew/Library/Homebrew/cask/cmd/audit.rb:102:in `audit_casks'
/usr/local/Homebrew/Library/Homebrew/dev-cmd/audit.rb:210:in `audit'
/usr/local/Homebrew/Library/Homebrew/brew.rb:122:in `<main>'

Output of brew doctor --verbose

N/A. Has been occurring on CI nodes.
Haven't been able to reproduce locally.
e.g. https://github.com/Homebrew/homebrew-cask/runs/2810177842?check_suite_focus=true

Output of brew tap

N/A. Has been occurring on CI nodes.
Haven't been able to reproduce locally.
e.g. https://github.com/Homebrew/homebrew-cask/runs/2810177842?check_suite_focus=true

@cho-m
Copy link
Member Author

cho-m commented Jun 13, 2021

Some possibilities:

  1. An option to avoid reading the file into a variable for this audit check and just run Digest::SHA2 directly on file.
  2. Some additional logic for URLs that are already HTTPS (after resolving redirects).
  3. Add workaround for CI to selectively disable specific audit subtests.
  4. See if there is way to let CI read large file without failing. Not sure what is exact resource limits, but there is no issue on local machine.

@cho-m cho-m changed the title CI brew audit failure at check_https_availability for large (2GB+) files [CI] brew audit failure at check_https_availability for large (2GB+) files Jun 19, 2021
@github-actions
Copy link

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@github-actions github-actions bot added the stale Issue which has not received any feedback for some time. label Jul 11, 2021
@cho-m cho-m added help wanted and removed stale Issue which has not received any feedback for some time. labels Jul 11, 2021
@ed-flanagan
Copy link
Contributor

I was taking a look. Thought it'd be possible to change https://github.com/Homebrew/brew/blob/e62094f52aa7203106f5c9b9447679488f6f2dc7/Library/Homebrew/utils/curl.rb#L340-L341 into a file stream, e.g.

file_hash = Digest::SHA2.new
chunk_size = 1024 # bytes
File.open(file.path) do |f|
  while chunk = f.read(chunk_size)
     file_hash << chunk
  end
end
file_hash = file_hash.hexdigest

Chunking seems to work fine; however, it looks like the entire file_contents value is still required, i.e. other things expect the entire file to be loaded into memory.

Returned here https://github.com/Homebrew/brew/blob/e62094f52aa7203106f5c9b9447679488f6f2dc7/Library/Homebrew/utils/curl.rb#L354
and consumed here https://github.com/Homebrew/brew/blob/e62094f52aa7203106f5c9b9447679488f6f2dc7/Library/Homebrew/cask/audit.rb#L597-L611.
Looks like check_appcast_contains_version searches the entire file for a pattern match.

I'm not aware of a way to grep a binary without slurping the entire file. If there's a way to discover that appcast version without reading the entire file into memory (I'm not familiar with appcast, if there's a file prefix, could take advantage of parsing that instead), then could remove reading the entire file and returning it (which I think would be a good idea anyways).

A workaround could be to make this function optional here maybe? https://github.com/Homebrew/brew/blob/e62094f52aa7203106f5c9b9447679488f6f2dc7/Library/Homebrew/cask/audit.rb#L76

@carlocab
Copy link
Member

carlocab commented Mar 9, 2022

I was taking a look.

Thanks for doing this!

Thought it'd be possible to change https://github.com/Homebrew/brew/blob/e62094f52aa7203106f5c9b9447679488f6f2dc7/Library/Homebrew/utils/curl.rb#L340-L341 into a file stream

This makes sense to me.

I'm not aware of a way to grep a binary without slurping the entire file.

Could we somehow break the file up into overlapping chunks so that we can grep each chunk without loss of information?

@ed-flanagan
Copy link
Contributor

ed-flanagan commented Mar 10, 2022

Original comment about file chunking

Could we somehow break the file up into overlapping chunks so that we can grep each chunk without loss of information?

Depends on what's what's being matched. If it's a known file position (i.e. a prefix or something parse-able), or has a known maximum size, yeah I think you can.

So for https://github.com/Homebrew/brew/blob/e62094f52aa7203106f5c9b9447679488f6f2dc7/Library/Homebrew/cask/audit.rb#L605-L606

We could create a sliding window of size adjusted_version_stanza, sliding byte-by-byte until the end of the file.
I don't really know Ruby, so I'm not familiar with file streams/built-ins. Here's my best attempt at an example, but sorry if this is awful Ruby:

file_name = 'example.txt'

target = '1.2.3'
chunk_size = target.length

contains = false
File.open(file_name) do |f|
  contains = for pos in 0.step()
    begin
      chunk = f.pread(chunk_size, pos)
      break true if chunk == target
    rescue EOFError
      break false
    end
  end
end

p contains

Essentially, just read a fixed chunk size at every position until either (1) the exact match is found or (2) we reach EOF. However, iterative preads like this has some very negative time performance impact (i.e. don't do this). For large files, seeking every position for a chunk that's not there may take inordinate amount of times, even if using little memory. Running this on the Unity pkg from above (~2.6G) has already been running for ~10 minutes, and will likely take longer.
This is a really naive approach. You could make larger overlapping chunks and run include? on the strings, which may result in practical performance benefits.


Yeah, chunking, by some arbitrary fixed value (128MB) and using include? is much faster.

file_name = 'example.txt'

target = 'example'
target_size = target.length
chunk_size = 128 * 1000000 # 128MB

contains = false
File.open(file_name) do |f|
  contains = for i in 0.step()
    pos = i * (chunk_size - target_size)
    begin
      chunk = f.pread(chunk_size, pos)
      break true if chunk.include? target
    rescue EOFError
      break false
    end
  end
end

p contains

Anyway, I couldn't reproduce locally.


I re-read the issue. I have a sense now it might not be a file size issue, but a naming one? The file name /private/tmp/20210612-3010-1sx7ka6<br class="Apple-interchange-newline"> mentioned in the original issue might be erroring because of characters in the filename or something. The errors I see when the file is too large to read include Errno::EINVAL.

I looked back at the Unity cask history. Looks like appcast was updated 6730dae. Then removed here 39ec123.
Further, reading https://github.com/Homebrew/brew/blob/e62094f52aa7203106f5c9b9447679488f6f2dc7/Library/Homebrew/cask/audit.rb#L595 closer, looks like it's comparing the file from the appcast field, not the installer binary (which is much smaller).

Sorry I went on a goose chase here with file sizes. The file chunking in the collapsed section might be useful for something else, but have a feeling it doesn't help with this issue (which might be resolved with the removal of the appcast fields)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants
@ed-flanagan @cho-m @carlocab and others