circumvent fingerprinting #8

Open
MattMoony opened this issue Mar 25, 2023 · 5 comments
Labels
documentation (Improvements or additions to documentation), enhancement (New feature or request)

Comments

@MattMoony
Owner

description

Try to prevent platforms from rate-limiting bots (especially anonymous ones) by all available means. Switching up HTTP headers on every other request is probably a good idea, but we should do more than that. Client fingerprinting shouldn't be the biggest issue, however: as far as I know, it mostly relies on JavaScript, which doesn't really apply to how d4v1d bots normally gather data.
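A minimal sketch of the "switch up HTTP headers on every other request" idea, assuming a hand-maintained pool of header profiles. `HeaderRotator` and `HEADER_PROFILES` are hypothetical names, not part of d4v1d, and the header values are illustrative only.

```python
import itertools
from typing import Dict, List

# Hypothetical header profiles; the values are illustrative, not taken from d4v1d.
HEADER_PROFILES: List[Dict[str, str]] = [
    {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:111.0) "
                      "Gecko/20100101 Firefox/111.0",
        "Accept-Language": "en-US,en;q=0.5",
    },
    {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_3) AppleWebKit/605.1.15 "
                      "(KHTML, like Gecko) Version/16.4 Safari/605.1.15",
        "Accept-Language": "en-GB,en;q=0.9",
    },
    {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36",
        "Accept-Language": "de-DE,de;q=0.9,en;q=0.8",
    },
]


class HeaderRotator:
    """Returns header dicts, moving to the next profile every `every` requests."""

    def __init__(self, profiles: List[Dict[str, str]], every: int = 2) -> None:
        if not profiles:
            raise ValueError("need at least one header profile")
        if every < 1:
            raise ValueError("every must be >= 1")
        self._profiles = itertools.cycle(profiles)
        self._every = every
        self._count = 0
        self._current = next(self._profiles)

    def next_headers(self) -> Dict[str, str]:
        # Switch profiles after every `every` requests (every=2 -> every other request).
        if self._count and self._count % self._every == 0:
            self._current = next(self._profiles)
        self._count += 1
        return dict(self._current)  # copy so callers can't mutate the pool
```

Each request would then pass `rotator.next_headers()` as its headers. Headers alone don't change the TLS or HTTP/2 fingerprint, though, so a User-Agent claiming to be Safari on top of a TLS handshake that looks like Python's can itself stand out.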

references

MattMoony added the enhancement (New feature or request) label Mar 25, 2023
@MattMoony
Owner Author

To get better control over lower-level connection parameters (TLS and HTTP/2), it's probably a good idea to look at something like PyCurl, especially in combination with curl-impersonate.
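Part of why these lower-level TLS parameters matter: a TLS fingerprint such as JA3 is derived entirely from the ClientHello a client sends, with no JavaScript involved. Below is a minimal sketch of the JA3 recipe as published by Salesforce: the decimal values of TLS version, cipher suites, extensions, elliptic curves and EC point formats are dash-joined within a field and comma-joined across fields, GREASE values (RFC 8701) are dropped, and the resulting string is MD5-hashed. It assumes the ClientHello has already been parsed; the `ja3` function name is hypothetical.

```python
import hashlib
from typing import Iterable, Tuple

# GREASE values (RFC 8701): 0x0a0a, 0x1a1a, ..., 0xfafa. JA3 ignores them.
GREASE = frozenset((b << 8) | b for b in range(0x0A, 0x100, 0x10))


def _join(values: Iterable[int]) -> str:
    """Dash-join decimal values, skipping GREASE placeholders."""
    return "-".join(str(v) for v in values if v not in GREASE)


def ja3(
    tls_version: int,
    ciphers: Iterable[int],
    extensions: Iterable[int],
    curves: Iterable[int],
    point_formats: Iterable[int],
) -> Tuple[str, str]:
    """Return (JA3 string, MD5 hex digest) for already-parsed ClientHello fields."""
    ja3_string = ",".join(
        [str(tls_version), _join(ciphers), _join(extensions), _join(curves), _join(point_formats)]
    )
    return ja3_string, hashlib.md5(ja3_string.encode("ascii")).hexdigest()
```

Because an OpenSSL-based Python client and a real browser offer different cipher and extension lists, they produce different JA3 hashes no matter which User-Agent header is set; that handshake-level difference is what curl-impersonate targets.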

@8twinni8
Collaborator

Rotating-proxy functionality would also be great.
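A minimal sketch of rotating-proxy support, assuming the user supplies a list of proxy URLs and the caller reports when a proxy gets rate-limited. `ProxyPool` and its methods are hypothetical, and round-robin with a cooldown is just one simple policy.

```python
import itertools
import time
from typing import Callable, Dict, List, Optional


class ProxyPool:
    """Round-robin proxy rotation that benches proxies after they get rate-limited."""

    def __init__(
        self,
        proxies: List[str],
        cooldown: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if not proxies:
            raise ValueError("need at least one proxy")
        self._proxies = list(proxies)
        self._cycle = itertools.cycle(self._proxies)
        self._cooldown = cooldown
        self._clock = clock
        self._benched_until: Dict[str, float] = {}

    def next_proxy(self) -> Optional[str]:
        """Return the next proxy that is not cooling down, or None if all are."""
        now = self._clock()
        for _ in range(len(self._proxies)):
            proxy = next(self._cycle)
            if self._benched_until.get(proxy, 0.0) <= now:
                return proxy
        return None

    def mark_rate_limited(self, proxy: str) -> None:
        """Skip `proxy` until `cooldown` seconds from now."""
        self._benched_until[proxy] = self._clock() + self._cooldown
```

The returned URL could be handed to whatever HTTP client is in use, e.g. as `proxies={"http": url, "https": url}` with `requests`. Injecting the clock keeps the cooldown logic testable without sleeping.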

@MattMoony
Owner Author

MattMoony commented Mar 28, 2023

Found curl_cffi in a discussion about PyCurl integration for curl-impersonate; it looks like a rather promising project. I'm going to try to base a sort of "anonymous session" class on it.

Edit: Found a blog post (curl_cffi: A Python library that supports natively simulated browser TLS/JA3 fingerprinting) by the author of curl_cffi.
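A minimal sketch of an "anonymous session" helper built on curl_cffi's `requests`-style interface, which takes an `impersonate` argument naming a browser target. The target names below are assumptions based on curl_cffi releases from around the time of this thread; check which targets your installed version supports. `pick_target` and `anon_get` are hypothetical names, not d4v1d's actual `AnonSession`.

```python
import random
from typing import Sequence

# Assumed curl_cffi impersonation targets; availability depends on the installed version.
IMPERSONATE_TARGETS: Sequence[str] = ("chrome110", "chrome107", "chrome104", "edge101", "safari15_5")


def pick_target(rng: random.Random, targets: Sequence[str] = IMPERSONATE_TARGETS) -> str:
    """Choose one browser profile so TLS and HTTP/2 parameters come from a single source."""
    return rng.choice(targets)


def anon_get(url: str, target: str, timeout: float = 30.0):
    """Fetch `url` while impersonating `target` (requires `pip install curl_cffi`)."""
    # Imported lazily so the rest of this module works without curl_cffi installed.
    from curl_cffi import requests as curl_requests

    return curl_requests.get(url, impersonate=target, timeout=timeout)
```

Usage would look like `anon_get("https://example.com", pick_target(random.Random()))`. Picking one target per session rather than per request keeps the TLS and HTTP/2 fingerprint internally consistent; as the following comments note, impersonation alone still did not avoid rate limiting.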

@MattMoony
Owner Author

It's still not enough. I'm still getting rate-limited with the code base at commit ac0303e (using AnonSession, etc.), so I need to do more research on how the "anonymous" session can still be identified.

@MattMoony
Owner Author

MattMoony commented Apr 1, 2023

Recommendation at the moment: use a virtual machine / enforce IPv4. Platforms like Instagram may well be more likely to block IPv6 addresses, since these should be assigned to exactly one device, whereas IPv4 addresses are commonly NATed and may therefore have several clients behind them. Platforms are probably a little more reluctant to block those.

Edit: Never mind. Even after being rate-limited on the host, I can still fetch the site from a virtual machine using the exact same IPv6 address as my host machine...
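For anyone who wants to repeat the IPv4-vs-IPv6 comparison, a minimal standard-library sketch that resolves a host to IPv4 (A record) addresses only and connects over IPv4. The helper names are hypothetical; with the curl command-line tool, `--ipv4` (`-4`) does the same. Given the observation above, the rate limit evidently isn't tied to the IPv6 address alone, so this is a diagnostic aid rather than a fix.

```python
import socket
from typing import List, Optional


def ipv4_addresses(host: str, port: int = 443) -> List[str]:
    """Resolve `host` to IPv4 addresses only, ignoring any IPv6 results."""
    infos = socket.getaddrinfo(host, port, family=socket.AF_INET, type=socket.SOCK_STREAM)
    addresses: List[str] = []
    for _family, _type, _proto, _canon, sockaddr in infos:
        if sockaddr[0] not in addresses:  # de-duplicate while keeping resolver order
            addresses.append(sockaddr[0])
    return addresses


def connect_ipv4(host: str, port: int = 443, timeout: float = 10.0) -> socket.socket:
    """Open a TCP connection to `host`, trying each IPv4 address in turn."""
    last_error: Optional[OSError] = None
    for addr in ipv4_addresses(host, port):
        try:
            return socket.create_connection((addr, port), timeout=timeout)
        except OSError as exc:
            last_error = exc
    raise last_error or OSError(f"no IPv4 address for {host}")
```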

MattMoony added the documentation (Improvements or additions to documentation) label Apr 1, 2023