The Chinese version of this README can be found here.
Pixiv Utils is a set of Pixiv utilities implemented in Python, including Pixiv Crawler and Mosaic Puzzles. It supports downloading rankings, personal bookmarks, artist works and keyword search results with personalized filtering, and provides high-performance multi-threaded parallel downloads. 🤗
This GIF depicts a sample run at normal speed.
- Pixiv Crawler
  - Ranking lists for daily/weekly/monthly...
  - Personal bookmarks
  - Specific artist's artworks
  - Specific keyword's artworks (supports advanced keyword search, e.g., (Lucy OR 边缘行者) AND (5000users OR 10000users))
  - Parallel download with multi-threading
- Mosaic Puzzles
Install from PyPI:
pip install pixiv-utils
Or install from source:
git clone git@github.com:CWHer/PixivCrawler.git
cd PixivCrawler
pip install -v .
Please refer to the tutorial for comprehensive instructions.
Note: This section only contains the usage of Pixiv Crawler. For the usage of Mosaic Puzzles, please refer to Mosaic Puzzles Doc.
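import datetime

from pixiv_utils.pixiv_crawler import (
    RankingCrawler,
    checkDir,
    displayAllConfig,
    download_config,
    network_config,
    ranking_config,
    user_config,
)

if __name__ == "__main__":
    # HTTPS proxy (Clash's default port); set to "" to disable the proxy
    network_config.proxy["https"] = "127.0.0.1:7890"
    # Account credentials, required for bookmarks and R18 content
    user_config.user_id = ""
    user_config.cookie = ""
    # Do not save image tags to tags.json
    download_config.with_tag = False
    # Weekly illustration rankings for every date in
    # [start_date, start_date + range - 1], top 50 artworks from each
    ranking_config.start_date = datetime.date(2024, 5, 1)
    ranking_config.range = 2
    ranking_config.mode = "weekly"
    ranking_config.content_mode = "illust"
    ranking_config.num_artwork = 50

    # Print all configurations so they can be checked before downloading
    displayAllConfig()
    # Prepare the output directory
    checkDir(download_config.store_path)

    app = RankingCrawler(capacity=200)
    app.run()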
The configurations are located in config.py, which contains several items that you may need to modify; these are marked with ⚠️. Use displayAllConfig() to check whether they are correct.
RankingConfig
from pixiv_utils.pixiv_crawler import ranking_config
NOTE: This config only takes effect when downloading ranking lists.
- ranking_config.start_date: datetime.date: The start date of the ranking list. ⚠️
- ranking_config.range: int: The date range of the ranking list, covering [start, start + range - 1]. ⚠️
- ranking_config.mode: str: The type of ranking list ⚠️, which can be chosen from:
  ranking_modes: Tuple = (
      "daily", "weekly", "monthly", "male", "female", "daily_ai",
      "daily_r18", "weekly_r18", "male_r18", "female_r18", "daily_r18_ai",
  )
- ranking_config.content_mode: str: The type of content in the ranking list ⚠️, which can be chosen from:
  content_modes: Tuple = ("all", "illust", "manga", "ugoira")
- ranking_config.num_artwork: int: The number of artworks to download from each ranking list. ⚠️
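The following is a minimal sketch of how these ranking options fit together, assuming range is counted in days: it expands [start, start + range - 1] into individual dates and checks mode and content_mode against the tuples above. The helpers ranking_dates and check_modes are hypothetical and not part of pixiv_utils.

```python
import datetime
from typing import List, Tuple

# Values copied from the ranking_modes / content_modes tuples documented above
RANKING_MODES: Tuple[str, ...] = (
    "daily", "weekly", "monthly", "male", "female", "daily_ai",
    "daily_r18", "weekly_r18", "male_r18", "female_r18", "daily_r18_ai",
)
CONTENT_MODES: Tuple[str, ...] = ("all", "illust", "manga", "ugoira")


def ranking_dates(start: datetime.date, date_range: int) -> List[datetime.date]:
    """Hypothetical helper: every date in [start, start + date_range - 1].

    Assumes the range is measured in days.
    """
    return [start + datetime.timedelta(days=i) for i in range(date_range)]


def check_modes(mode: str, content_mode: str) -> None:
    """Hypothetical helper: raise ValueError on an unsupported mode."""
    if mode not in RANKING_MODES:
        raise ValueError(f"unknown ranking mode: {mode!r}")
    if content_mode not in CONTENT_MODES:
        raise ValueError(f"unknown content mode: {content_mode!r}")


if __name__ == "__main__":
    check_modes("weekly", "illust")
    # Same values as the usage example: start_date = 2024-05-01, range = 2
    print(ranking_dates(datetime.date(2024, 5, 1), 2))
```

With the values from the usage example, the sketch yields the two dates 2024-05-01 and 2024-05-02.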
NetworkConfig
from pixiv_utils.pixiv_crawler import network_config
- network_config.proxy: Dict: The proxy configuration. ⚠️
    # For example, to turn off the proxy
    network_config.proxy["https"] = ""
  The default proxy["https"] value is 127.0.0.1:7890, which is the default proxy port of Clash. Change it to match your actual proxy settings. If you do not need a proxy, set the https entry to "".
- network_config.headers: Dict: The headers used in requests.
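The proxy dict is keyed by URL scheme, like the proxies mapping accepted by the requests library. Assuming it is used in that form (an assumption; this README does not say how pixiv_utils consumes it), the sketch below shows how an empty string disables the proxy and how a bare host:port is usually given an explicit scheme. to_requests_proxies is a hypothetical helper.

```python
from typing import Dict


def to_requests_proxies(proxy: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical helper: turn a {"https": "host:port"} config into a
    requests-style proxies mapping.

    - An empty string means "no proxy for this scheme" and is dropped.
    - A bare "host:port" is prefixed with "http://", i.e. it is treated
      as an HTTP proxy (an assumption that fits Clash's default port).
    """
    proxies: Dict[str, str] = {}
    for scheme, address in proxy.items():
        if not address:  # "" disables the proxy for this scheme
            continue
        if "://" not in address:
            address = "http://" + address
        proxies[scheme] = address
    return proxies


if __name__ == "__main__":
    print(to_requests_proxies({"https": "127.0.0.1:7890"}))
    print(to_requests_proxies({"https": ""}))
```

The result could then be passed as requests.get(url, proxies=..., timeout=...); whether pixiv_utils does exactly this is not stated here.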
UserConfig
from pixiv_utils.pixiv_crawler import user_config
NOTE: User-specific configurations are required when downloading personal bookmarks or R18 content.
- user_config.user_id: str: The user ID of your Pixiv account. ⚠️ You can find it in the URL of your profile page, https://www.pixiv.net/users/{UID}.
- user_config.cookie: str: The cookie of your Pixiv account. ⚠️
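Since user_id is the number at the end of the profile URL, a minimal sketch for pulling it out of a pasted URL might look like this. extract_user_id is a hypothetical helper, not part of pixiv_utils; the pattern assumes URLs of the form https://www.pixiv.net/users/{UID}, optionally with a two-letter language prefix such as /en/.

```python
import re
from typing import Optional

# Matches https://www.pixiv.net/users/{UID} and https://www.pixiv.net/en/users/{UID},
# ignoring anything after the numeric UID (e.g. /artworks or a query string)
_PROFILE_URL = re.compile(r"^https?://(?:www\.)?pixiv\.net/(?:[a-z]{2}/)?users/(\d+)")


def extract_user_id(url: str) -> Optional[str]:
    """Hypothetical helper: return the UID as a string (user_id is a str), or None."""
    match = _PROFILE_URL.match(url.strip())
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract_user_id("https://www.pixiv.net/users/32548944"))
```

The returned string could then be assigned to user_config.user_id.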
DownloadConfig
from pixiv_utils.pixiv_crawler import download_config
- download_config.timeout: float: The timeout of each request.
- download_config.retry_times: int: The number of retries after a request fails.
- download_config.fail_delay: float: The delay after a request fails.
- download_config.store_path: str: The path where downloaded images are stored. ⚠️
- download_config.with_tag: bool: Whether to save image tags to tags.json. ⚠️
- download_config.url_only: bool: Whether to collect image URLs only, without downloading the images. The URLs are returned by app.run(). ⚠️
    ...
    download_config.url_only = True
    ...
    urls = app.run()  # a set of image URLs
- download_config.num_threads: int: The number of threads for parallel download. ⚠️
- download_config.thread_delay: float: The delay before each thread starts.
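The timeout, retry_times, fail_delay, num_threads and thread_delay options describe a common download pattern. The sketch below is one way to implement that pattern with the standard library; it is not pixiv_utils' actual code. fetch is a hypothetical stand-in for a real HTTP call such as requests.get(url, timeout=timeout).content, and staggering only the first num_threads submissions is an assumed reading of thread_delay.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional

Fetch = Callable[[str, float], bytes]  # (url, timeout) -> content


def fetch_with_retry(fetch: Fetch, url: str, timeout: float,
                     retry_times: int, fail_delay: float) -> Optional[bytes]:
    """One attempt plus up to retry_times retries, sleeping fail_delay after each failure."""
    for attempt in range(retry_times + 1):
        try:
            return fetch(url, timeout)
        except Exception:  # a real client would catch narrower network errors
            if attempt < retry_times:
                time.sleep(fail_delay)
    return None  # every attempt failed


def download_all(fetch: Fetch, urls: Iterable[str], *, timeout: float,
                 retry_times: int, fail_delay: float, num_threads: int,
                 thread_delay: float) -> Dict[str, Optional[bytes]]:
    """Download URLs in parallel; failed downloads map to None."""
    futures = {}
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        for i, url in enumerate(urls):
            # Stagger the start of the first num_threads workers
            if 0 < i < num_threads:
                time.sleep(thread_delay)
            futures[url] = pool.submit(
                fetch_with_retry, fetch, url, timeout, retry_times, fail_delay)
    # Leaving the with-block waits for all tasks to finish
    return {url: future.result() for url, future in futures.items()}


if __name__ == "__main__":
    calls: Dict[str, int] = {}

    def flaky_fetch(url: str, timeout: float) -> bytes:
        # Simulated download: fails on the first call for each URL
        calls[url] = calls.get(url, 0) + 1
        if calls[url] == 1:
            raise ConnectionError("simulated failure")
        return url.encode()

    print(download_all(flaky_fetch, ["a", "b"], timeout=1.0, retry_times=2,
                       fail_delay=0.0, num_threads=2, thread_delay=0.0))
```

A thread pool is used here because downloads are I/O-bound, so threads spend most of their time waiting on the network rather than competing for the GIL.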
DebugConfig
from pixiv_utils.pixiv_crawler import debug_config
- debug_config.verbose: bool: Whether to print debug information.
- debug_config.show_error: bool: Whether to print detailed error information.
RankingCrawler
"""
Download artworks from rankings

NOTE: Require cookie for R18 images!

Args:
    capacity (int): flow capacity, default is 1024MB
"""
app = RankingCrawler(capacity=200)
app.run()
BookmarkCrawler
"""
Download artworks from public bookmarks

NOTE: Require cookie!

Args:
    n_images (int): max download number, default is 200
    capacity (int): flow capacity, default is 1024MB
"""
app = BookmarkCrawler(n_images=20, capacity=200)
app.run()
UserCrawler
"""
Download artworks from a single artist

NOTE: Require cookie for R18 images!

Args:
    artist_id (str): artist id
    capacity (int): flow capacity, default is 1024MB
"""
app = UserCrawler(artist_id="32548944", capacity=200)
app.run()
KeywordCrawler
NOTE: Popularity sorting requires a premium account.
"""
Download search results of a keyword (sorted by popularity if order=True)
Support advanced search, e.g. "(Lucy OR 边缘行者) AND (5000users OR 10000users)", refer to https://www.pixiv.help/hc/en-us/articles/235646387-I-would-like-to-know-how-to-search-for-content-on-pixiv

NOTE: Require cookie for R18 images!
NOTE: Require premium account for popularity sorting!

Args:
    keyword (str): search keyword
    order (bool): order by popularity or not, default is False
    mode (str): content mode, default is "safe", support ["safe", "r18", "all"]
    n_images (int): max download number, default is 200
    capacity (int): flow capacity, default is 1024MB
"""
app = KeywordCrawler(
    keyword="(Lucy OR 边缘行者) AND (5000users OR 10000users)",
    order=False,
    mode="all",  # one of "safe", "r18", "all"
    n_images=20,
    capacity=200,
)
app.run()
Just run your script. 😆
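The advanced keyword used with KeywordCrawler joins groups of alternatives with OR and combines the groups with AND. If you generate such keywords programmatically, a minimal sketch of a string builder follows. build_query is hypothetical and only reproduces the pattern of the example above; see Pixiv's search help page for the full syntax.

```python
from typing import List, Sequence


def build_query(*groups: Sequence[str]) -> str:
    """Hypothetical helper: AND together groups of OR-ed search terms.

    Multi-term groups are parenthesized; single-term groups are not;
    empty strings and empty groups are skipped.
    """
    parts: List[str] = []
    for group in groups:
        terms = [t for t in group if t]
        if not terms:
            continue
        if len(terms) == 1:
            parts.append(terms[0])
        else:
            parts.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(parts)


if __name__ == "__main__":
    keyword = build_query(["Lucy", "边缘行者"], ["5000users", "10000users"])
    print(keyword)  # (Lucy OR 边缘行者) AND (5000users OR 10000users)
```

The resulting string can be passed as the keyword argument of KeywordCrawler.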
- The cookie has a relatively long expiration time and can be reused for a few days.
- Use displayAllConfig() to display all configurations and check whether they are correct.
- Tutorial: Quick start tutorial of Pixiv Crawler
- Configuration: Configuration of Pixiv Crawler
- Pixiv Crawler: Detailed instructions for Pixiv Crawler
- Mosaic Puzzles: Detailed instructions for Mosaic Puzzles