Random upload failures #724

Open
YoavGro opened this issue Jul 21, 2024 · 6 comments

Comments

@YoavGro

YoavGro commented Jul 21, 2024

Describe the bug
We are using the SDK to upload files to our S3 bucket.
After the client has been open for a few hours, we get random upload failures on some files, while other files continue to upload normally.
The error is one of two:

  1. SotoS3.S3.MultipartUploadError error 1
  2. AsyncHTTPClient.HTTPClientError error 1

Every failed part is retried up to 3 times (all of the retry requests fail as well).

To Reproduce
Steps to reproduce the behavior:

  1. Set up S3 client
  2. Keep client open for several hours
  3. Try to upload multiple files
  4. Some files fail

Expected behavior
Files should upload without failing.

Setup (please complete the following information):

  • OS: macOS 14.3.1
  • Version of soto: 6.8.0
  • Version of soto-core: 6.5.2
  • Authentication mechanism: IAM Instance Profile on EC2

Additional context
Not sure if this is related, but maybe the client's S3 credentials time out and that affects the upload.
(This would not explain why other files continue to upload normally.)

@adam-fowler
Member

Can you print the HTTPClientError in the form "\(error)"? It tends to be more informative.

Have you any idea how many calls to AWS you are making before it fails? I don't think this is a credential issue as you wouldn't be getting an HTTPClient error.

@YoavGro
Author

YoavGro commented Jul 21, 2024

@adam-fowler

HTTPClientError

wdym in the form of an error? When printing the thrown error in multipartUpload( I get

SotoS3.S3.MultipartUploadError error 1
AsyncHTTPClient.HTTPClientError error 1

@adam-fowler
Member

Sorry I don't think I was clear enough (typing code on the iPhone is hard). If you run the following code

print("\(error)")

instead of

print(error)

You can get different results; the first tends to be more informative. Although, after testing it, you get the same output for both here with HTTPClientError. It does still give a more descriptive error, e.g. something like HTTPClientError.alreadyShutdown. Basically I'm trying to find out what the HTTPClientError is. This will most likely indicate the cause of the issue.
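A minimal sketch of why the printed form matters. The types below are hypothetical stand-ins, not the real Soto or AsyncHTTPClient errors. The assumption is that a message of the form "Module.Type error 1" comes from `localizedDescription`, which for a plain Swift error usually only reports the type name and a numeric code, while string interpolation reports the actual case.

```swift
import Foundation

// Hypothetical stand-in for a transport-level error (not the real HTTPClientError).
enum TransportError: Error {
    case connectTimeout
    case alreadyShutdown
}

// Hypothetical stand-in for an upload error that wraps the underlying cause.
struct UploadFailure: Error {
    let underlying: Error
    let completedParts: [Int]
}

// Descriptive form: string interpolation prints the case name or the struct's fields.
func describe(_ error: Error) -> String {
    return "\(error)"
}

// Generic form: typically a sentence that only carries the type name and a code.
func genericMessage(_ error: Error) -> String {
    return error.localizedDescription
}

let wrapped = UploadFailure(underlying: TransportError.alreadyShutdown, completedParts: [])
print(describe(TransportError.alreadyShutdown))
print(describe(wrapped))
print(genericMessage(wrapped))
```

Logging `describe(error)` alongside whatever is already logged should be enough to see which case is behind the generic message.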

@adam-fowler
Member

Also, have you thought of upgrading to 7.0? The multipart upload has had a major rewrite and should upload faster with v7, as it supports concurrent part uploads now. Although this will probably not fix this issue.

@YoavGro
Author

YoavGro commented Jul 25, 2024

So these are the errors I am getting after a failed upload

MultipartUploadError(error: SignatureDoesNotMatch: The request signature we calculated does not match the signature you provided. Check your key and signing method., completedParts: [])

Is it possible that I need to change the key when creating the upload request?

let request = S3.CreateMultipartUploadRequest(bucket: bucket, key: fileName) // <-- change this on every failed upload

Also, it looks like requests that receive this error

error: POSIXErrorCode(rawValue: 60): Operation timed out

for some reason do not get a response from the upload on the second try. I see the timeout error, the retry of the upload, and the upload progress of the retry, but when the retry reaches progress 1.0 the upload call never returns.

@adam-fowler
Member

Are these two different situations where uploads fail? You can't throw two errors at the same time.
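One way to tell the two situations apart is to record the error from every attempt separately instead of only the last one. The following is a rough sketch under stated assumptions: `withRetries`, `AttemptFailure` and `RetriesExhausted` are hypothetical names, not part of Soto, and the helper is synchronous for brevity, whereas the real upload call is asynchronous.

```swift
// One recorded failure per attempt, using the descriptive "\(error)" form.
struct AttemptFailure {
    let attempt: Int
    let description: String
}

// Thrown when every attempt failed; carries the full per-attempt history.
struct RetriesExhausted: Error {
    let failures: [AttemptFailure]
}

// Runs `operation` up to `maxAttempts` times and returns the first success.
// The closure receives the 1-based attempt number.
func withRetries<T>(maxAttempts: Int, _ operation: (Int) throws -> T) throws -> T {
    precondition(maxAttempts > 0, "maxAttempts must be positive")
    var failures: [AttemptFailure] = []
    for attempt in 1...maxAttempts {
        do {
            return try operation(attempt)
        } catch {
            // Keep each error so different failure modes stay distinguishable.
            failures.append(AttemptFailure(attempt: attempt, description: "\(error)"))
        }
    }
    throw RetriesExhausted(failures: failures)
}
```

With this shape, a part that first times out and then fails with a signature error shows up as two distinct entries in `failures`, rather than as a single opaque failure.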

Regarding the first, I'm quite surprised you are seeing a signature does not match error. I haven't seen one of those reported in years. I will need to know the exact request that caused this error to fix it. Unless this is a request that was created but then stalled, so that it was sent some time later, after the signature had expired.

The timeout error can be resolved by increasing the request timeout on the service object.

let s3WithTimeout = s3.with(timeout: .minutes(2))

Finally, have you tried resumeMultipartUpload()? This will resume a failed upload where some parts have already been uploaded. See the section on resuming a failed upload here.
