large file upload times out #49

Open
tcnichol opened this issue Aug 5, 2022 · 12 comments

Comments

@tcnichol
Contributor

tcnichol commented Aug 5, 2022

The pyclowder file upload will time out for large files.

To get around this, I used the api endpoint with requests instead. This might point towards a solution.

@robkooper
Member

can you show the code you use instead?

@tcnichol
Contributor Author

tcnichol commented Aug 5, 2022

Sure.


import os
from datetime import datetime

import requests
from requests_toolbelt import MultipartEncoder


def upload_a_file_to_dataset(filepath, dataset_id, clowder_url, user_api):
    """Upload a file to a Clowder dataset; return the new file id, or None on failure."""
    url = '%s/api/uploadToDataset/%s?key=%s' % (clowder_url, dataset_id, user_api)
    if not os.path.exists(filepath):
        print("unable to upload file %s (not found)" % filepath)
        return None
    filename = os.path.basename(filepath)
    # Keep the file open only for the duration of the request.
    with open(filepath, 'rb') as f:
        # MultipartEncoder streams the file instead of reading it all into memory.
        m = MultipartEncoder(fields={'file': (filename, f)})
        try:
            # verify=False skips SSL certificate checks.
            result = requests.post(url, data=m, headers={'Content-Type': m.content_type},
                                   verify=False)
            return result.json()['id']
        except Exception as e:
            print('failed to upload file, error')
            print(e)
            print(str(datetime.now()))
    return None

@bingzhang
Contributor

I can also reproduce this issue when uploading large files.

@tcnichol
Contributor Author

I'll take a look at this soon. Since I have code that works I believe I should be able to fix it.

@tcnichol
Contributor Author

I notice one difference: my code uses verify=False, while pyclowder uses verify=True. The value is set in the connector (to True), and the default is also True.

Should we switch this to False, or just do False if the files are above some size?

@robkooper
Member

verify=False means requests does not check SSL certificates, which is bad. It should only be used as a last resort, for example when you use self-signed certs.
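
For the self-signed case, one option that keeps certificate checking on is to pass a CA bundle path as `verify` instead of `False`, which requests supports. The sketch below is only an illustration: the helper names `resolve_verify` and `post_verified` are made up here and are not part of pyclowder.

```python
import os


def resolve_verify(ca_bundle=None):
    """Return a value suitable for the `verify` argument of requests.

    True     -> verify against the default trusted CAs
    "<path>" -> verify against the given CA bundle (e.g. a self-signed cert)
    """
    if ca_bundle is None:
        return True
    if not os.path.exists(ca_bundle):
        raise FileNotFoundError("CA bundle not found: %s" % ca_bundle)
    return ca_bundle


def post_verified(url, ca_bundle=None, **kwargs):
    """POST with certificate checking always enabled (hypothetical helper)."""
    # Imported here so resolve_verify can be used without requests installed.
    import requests
    return requests.post(url, verify=resolve_verify(ca_bundle), **kwargs)
```

With this shape, a deployment using a self-signed certificate would call `post_verified(url, ca_bundle="/path/to/cert.pem", data=...)`, where the path is a placeholder, and everything else keeps the default verification.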

@tcnichol
Contributor Author

OK. Will look into a better fix.

@tcnichol
Contributor Author

@bingzhang how large of a file did you test and have it fail? Could you share it?

@bingzhang
Contributor

> @bingzhang how large of a file did you test and have it fail? Could you share it?

Something like 1 GB.

@tcnichol
Contributor Author

tcnichol commented Aug 23, 2022

OK.

If I use my code above and put verify=True, a file that's 1.8 GB will upload.

If I use
file_id = client.post_file("/uploadToDataset/%s" % dataset_id, path_to_large_file)

then I get a timeout.

There is

self.timeout = kwargs.get('timeout', 5)
in the client code.

I'm checking whether the upload succeeds if I pass a larger timeout value.
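
As a rough illustration of how that keyword default behaves, here is a hypothetical stand-in class (`SketchClient` is not pyclowder's client) that copies only the quoted timeout line. It assumes the stored value is later handed to requests, which accepts either a single number or a `(connect, read)` tuple; the numbers 5 and 600 are arbitrary.

```python
class SketchClient:
    """Hypothetical stand-in that copies only the timeout handling quoted above."""

    def __init__(self, **kwargs):
        # Same pattern as the quoted line: fall back to 5 seconds if not given.
        self.timeout = kwargs.get('timeout', 5)


default_client = SketchClient()
# (connect, read) in seconds: a short connect limit and a generous read limit.
patient_client = SketchClient(timeout=(5, 600))

print(default_client.timeout)
print(patient_client.timeout)
```

If the real client forwards its timeout unchanged, passing a larger value at construction would be the least invasive experiment; whether it does is an assumption to check against the client code.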

@robkooper
Member

from https://requests.readthedocs.io/en/latest/user/quickstart/#timeouts:

timeout is not a time limit on the entire response download; rather, an exception is raised if the server has not issued a response for timeout seconds (more precisely, if no bytes have been received on the underlying socket for timeout seconds). If no timeout is specified explicitly, requests do not time out.

So that means that we have not received a response from clowder for 5 seconds. Does this happen at the end of the upload or in the middle?
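
The "no bytes received for timeout seconds" behaviour can be reproduced locally. The sketch below uses only the standard library (`urllib` and `http.server`), not requests or pyclowder, and the slow handler, the 0.5 second delay, and the timeout values are invented for the demonstration. The server reads the whole upload and then stalls before replying, so a client with a short timeout gives up while a patient one succeeds.

```python
import socket
import threading
import time
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

SERVER_DELAY = 0.5  # seconds the made-up server stalls before replying


class SlowHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the whole upload, then stall before answering.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        time.sleep(SERVER_DELAY)
        body = b'{"id": "demo"}'
        try:
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        except OSError:
            pass  # the client may already have given up

    def log_message(self, *args):
        pass  # keep the demo quiet


def post_with_timeout(url, payload, timeout):
    """Return 'ok' if a response arrives in time, 'timeout' otherwise."""
    req = urllib.request.Request(url, data=payload, method="POST")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            resp.read()
        return "ok"
    except (socket.timeout, urllib.error.URLError):
        return "timeout"


server = HTTPServer(("127.0.0.1", 0), SlowHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/upload" % server.server_port

short_result = post_with_timeout(url, b"x" * 1024, timeout=0.1)
long_result = post_with_timeout(url, b"x" * 1024, timeout=5)

server.shutdown()
server.server_close()
print(short_result, long_result)
```

The first call should report a timeout because the server stays silent for longer than 0.1 seconds, even though the upload itself was sent without trouble; the second call waits long enough to receive the reply.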

@tcnichol
Contributor Author

I'm saying the middle based on the time. Uploading a 1.8 GB file (which I tried) took about 30 seconds.
