First, make sure the wikidl package is installed. You can use any Python package manager you prefer (for example, pip or conda).
Bash
pip install wikidl
Python
from wikidl import WikiDL

downloader = WikiDL(
    # Do not use more than 3 processes. See warning below.
    num_proc=3,
    snapshot_date='20240801',
)
downloaded_files = downloader.start(output_dir='./output')
print(downloaded_files)
The code above downloads the latest article dump (LAD) for the August 1, 2024 snapshot into the ./output directory. It uses 3 CPU processes on your machine to download files in parallel.
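If the snapshot date is computed rather than typed by hand, a small helper can build and check the string. The sketch below assumes that snapshot_date always follows the YYYYMMDD pattern seen in '20240801'. The names to_snapshot_date and parse_snapshot_date are hypothetical and are not part of wikidl.

```python
from datetime import date, datetime


def to_snapshot_date(day):
    """Format a date as a YYYYMMDD string (assumed snapshot_date format)."""
    return day.strftime('%Y%m%d')


def parse_snapshot_date(text):
    """Parse a YYYYMMDD string into a date; raise ValueError if malformed."""
    # strptime alone accepts some shorter inputs, so check the shape first.
    if len(text) != 8 or not text.isdigit():
        raise ValueError(f'expected YYYYMMDD, got {text!r}')
    return datetime.strptime(text, '%Y%m%d').date()


if __name__ == '__main__':
    print(to_snapshot_date(date(2024, 8, 1)))
```

The resulting string could then be passed as snapshot_date to the WikiDL constructor. Whether a dump actually exists for a given date depends on the dump provider.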
Warning
You can set num_proc to any value. However, the dump provider (Wikimedia or a third-party mirror) may limit the number of parallel connections to keep the service fair for everyone. If the number of processes you configure exceeds the server-side limit, you may see a 503 error.
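One way to stay on the safe side is to clamp the process count before constructing the downloader. This is a minimal sketch, not part of wikidl: the helper pick_num_proc is hypothetical, and the ceiling of 3 is an assumption taken from the comment in the example above. The real limit may differ between providers.

```python
import os

# Assumption: 3 is a conservative ceiling borrowed from the example's comment;
# the actual server-side limit is set by the dump provider and may differ.
MAX_SAFE_PROC = 3


def pick_num_proc(requested=None, ceiling=MAX_SAFE_PROC):
    """Return a process count between 1 and ceiling (inclusive)."""
    if requested is None:
        # Fall back to the local CPU count, or 1 if it cannot be determined.
        requested = os.cpu_count() or 1
    return max(1, min(requested, ceiling))


if __name__ == '__main__':
    print(pick_num_proc(8))
```

The returned value can be passed as num_proc to the WikiDL constructor, so a machine with many cores still opens no more connections than the assumed ceiling.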
To see the download progress by filename, add the following argument to the WikiDL constructor:
Python
import logging

downloader = WikiDL(
    ...,
    log_level=logging.DEBUG,
)
© 2025 Lingxi Li.
San Francisco