
Hangs after downloading a few pages #1

Open
youssefavx opened this issue Nov 15, 2020 · 4 comments · May be fixed by #6

Comments


youssefavx commented Nov 15, 2020

Hey! Thanks for making this! Was looking for something like this and this one actually worked!

For some reason, it downloads a few pages and then hangs. If I close it, restart it, and run it again with the same settings, it continues, then hangs again. It gets through around 10 pages or fewer each time.

youssefavx (Author)

Sorry, I realized a traceback would be useful:

8% (57/703) done
8% (58/703) done
8% (59/703) done
8% (60/703) done
8% (61/703) done
8% (62/703) done
8% (63/703) done
9% (64/703) done
9% (65/703) done
9% (66/703) done
9% (67/703) done
9% (68/703) done
9% (69/703) done
9% (70/703) done
10% (71/703) done
10% (72/703) done
10% (73/703) done
10% (74/703) done
^CTraceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/urllib3/connectionpool.py", line 677, in urlopen
    chunked=chunked,
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/urllib3/connectionpool.py", line 426, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/urllib3/connectionpool.py", line 421, in _make_request
    httplib_response = conn.getresponse()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1321, in getresponse
    response.begin()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 296, in begin
    version, status, reason = self._read_status()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 265, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/urllib3/connectionpool.py", line 727, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/urllib3/util/retry.py", line 403, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/urllib3/packages/six.py", line 734, in reraise
    raise value.with_traceback(tb)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/urllib3/connectionpool.py", line 677, in urlopen
    chunked=chunked,
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/urllib3/connectionpool.py", line 426, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/urllib3/connectionpool.py", line 421, in _make_request
    httplib_response = conn.getresponse()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1321, in getresponse
    response.begin()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 296, in begin
    version, status, reason = self._read_status()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 265, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "ripper.py", line 60, in <module>
    main()
  File "ripper.py", line 52, in main
    contents = client.download_page(i)
  File "/Volumes/Transcend2/archivedownloader/archiveripper/api.py", line 143, in download_page
    'referer': self.URL_FORMAT % ('details/' + self.book_id)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests/sessions.py", line 543, in get
    return self.request('GET', url, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
Exception ignored in: <module 'threading' from '/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py'>
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py", line 1273, in _shutdown
    t.join()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py", line 1032, in join
    self._wait_for_tstate_lock()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py", line 1048, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
KeyboardInterrupt

notevenaperson (Contributor)

I can't replicate the issue without knowing which book you tried to borrow. Does this happen with every book you've tried, or only a specific one? Did you try running it again against that specific book? From the traceback, it just looks like your internet connection dropped.

notevenaperson (Contributor)

Hi again. Lately I've been having this issue myself. It seems archive.org drops the connection if it suspects automated requests. I fixed this in my fork by adding a random delay of a second or so between requests. It's only implemented in the dev branch of my fork for now, since I'm waiting for my pull request that adds command-line flags to be accepted.
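For reference, the approach described above (a random pause after every request) can be sketched roughly like this. The `polite_get` helper and the delay bounds are illustrative guesses, not the fork's actual code:

```python
import random
import time


def polite_get(session, url, min_delay=1.0, max_delay=2.0, **kwargs):
    """Fetch a URL, then sleep a random interval so traffic looks
    less like an automated scraper.

    `session` is expected to behave like a requests.Session; the
    delay bounds here are assumptions, not measured values.
    """
    response = session.get(url, **kwargs)
    time.sleep(random.uniform(min_delay, max_delay))
    return response
```

The jitter matters: a fixed delay produces a perfectly regular request pattern, which is itself easy for a server to flag.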

notevenaperson added a commit to notevenaperson/archiveripper that referenced this issue Sep 13, 2021
…een requests to prevent being blocked

1. adds the -R flag
2. Should fix scoliono#1 and
   adds the -nt flag
@notevenaperson notevenaperson linked a pull request Sep 14, 2021 that will close this issue
Aphexus commented Apr 28, 2024

Similar issue.

Don't just add a few seconds of delay (especially after downloading several pages); also check for empty files and then delay longer before retrying.

My quick hax:

import logging
import random
import time
from os.path import exists, getsize

downloaded_pages = 0
total = end - start
for i in range(start, end):
    fn = '%s/%d.jpg' % (dir, i + 1)

    # Skip pages that were already downloaded successfully (non-empty files)
    if exists(fn) and getsize(fn) != 0:
        continue

    sleeptime = random.uniform(2, 5)
    # Take an extra break every 25 pages (and a second one every 50)
    if i % 25 == 0:
        time.sleep(10)
    if i % 50 == 0:
        time.sleep(10)

    logging.debug('downloading page %d (index %d)' % (i + 1, i))
    contents = client.download_page(i, args.scale)
    # A very short response likely means the data is corrupt or we were
    # blocked, so wait and try once more.
    if len(contents) < 100:
        time.sleep(50)
        contents = client.download_page(i, args.scale)
        # Still too short: give up on this page
        if len(contents) < 100:
            print("Error downloading page " + str(i) + ". Ignoring...")
            continue

    with open(fn, 'wb') as file:
        file.write(contents)
    downloaded_pages += 1
    done_count = i + 1 - start
    print('%d%% (%d/%d) done. Total downloaded (%d/%d)' % (done_count / total * 100, done_count, total, downloaded_pages, total))

    time.sleep(sleeptime)
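A cleaner alternative to the single fixed 50-second retry might be exponential backoff with a retry cap. This is a hedged sketch, not part of the project; `download` stands in for any callable like `client.download_page`, and the thresholds are assumptions:

```python
import random
import time


def download_with_backoff(download, page, max_tries=4, base_delay=2.0, min_size=100):
    """Retry download(page) until it returns at least min_size bytes,
    or give up after max_tries attempts and return None.

    `download` is any callable returning bytes; min_size and the
    delays are illustrative guesses, not values from the project.
    """
    for attempt in range(max_tries):
        contents = download(page)
        if len(contents) >= min_size:
            return contents
        # Short/empty response: likely blocked, so back off
        # exponentially (2 s, 4 s, 8 s, ...) plus a little jitter.
        time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    return None  # the caller can skip the page, as the snippet above does
```

Backoff spaces retries out progressively, which tends to recover from temporary blocks faster than one long fixed sleep while still easing the load on the server.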

