Quick Start

Preparation

First, make sure the wikidl package is installed. You can use any Python package manager you prefer (for example, pip or conda).

Bash

pip install wikidl

The Simplest Example

Python

from wikidl import WikiDL

downloader = WikiDL(
    # Do not use more than 3 processes. See warning below.
    num_proc=3,
    snapshot_date='20240801',
)
downloaded_files = downloader.start(output_dir='./output')
print(downloaded_files)

The code above downloads the latest article dump (LAD) for the August 1, 2024 snapshot into the ./output directory. It uses 3 CPU processes on your machine to download files in parallel.
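
The snapshot_date value in the example appears to follow the YYYYMMDD format (20240801 for August 1, 2024). The following is a minimal sketch, assuming that format, of building and validating such a string with the standard library. The helper names to_snapshot_date and parse_snapshot_date are hypothetical and are not part of wikidl.

```python
from datetime import date, datetime


def to_snapshot_date(day: date) -> str:
    """Format a date as a YYYYMMDD string (the assumed snapshot_date format)."""
    return day.strftime("%Y%m%d")


def parse_snapshot_date(text: str) -> date:
    """Parse a YYYYMMDD string; raise ValueError if it is not a valid date."""
    # Check the shape first, because strptime alone accepts some shorter inputs.
    if len(text) != 8 or not text.isdigit():
        raise ValueError(f"expected 8 digits (YYYYMMDD), got {text!r}")
    return datetime.strptime(text, "%Y%m%d").date()
```

Checking the string before constructing the downloader can catch a malformed date early. Whether a dump actually exists for a given date is a separate question that this check does not answer.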

Warning

You can configure any number of processes. However, the dump provider (Wikimedia or a third-party mirror) may limit the number of parallel connections per client to keep the service fair for everyone. If your configured number of processes exceeds the server-side limit, you may see a 503 error.
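
One way to react to the limit described in the warning above is to retry with fewer processes. The following is a minimal sketch under stated assumptions: run_with_fewer_processes is a hypothetical helper, not part of wikidl, and it assumes that a rejected download surfaces as a Python exception. This guide does not say how wikidl reports a 503 error, so the sketch catches any exception.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_fewer_processes(run: Callable[[int], T], start_proc: int = 3) -> T:
    """Call run(num_proc), lowering num_proc by one after each failure."""
    if start_proc < 1:
        raise ValueError("start_proc must be at least 1")
    last_error: Optional[Exception] = None
    for num_proc in range(start_proc, 0, -1):
        try:
            return run(num_proc)
        except Exception as error:  # assumption: failures surface as exceptions
            last_error = error
    # Every attempt failed, including the single-process one.
    raise RuntimeError("download failed even with one process") from last_error
```

With wikidl, run could be a small function that takes num_proc, builds WikiDL(num_proc=num_proc, snapshot_date='20240801'), and returns the result of start(output_dir='./output').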

To see the download progress by filename, add the following argument to the WikiDL constructor:

Python

import logging

downloader = WikiDL(
    ...,
    log_level=logging.DEBUG,
)
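
If you only want the per-file output occasionally, the level can be chosen from a flag. The following is a minimal sketch; pick_log_level is a hypothetical helper, and using logging.INFO as the quieter level is an assumption, because this guide does not state the default log level of wikidl.

```python
import logging


def pick_log_level(verbose: bool) -> int:
    """Return logging.DEBUG when verbose output is wanted, else logging.INFO."""
    # Assumption: INFO is an acceptable quieter level for the downloader.
    return logging.DEBUG if verbose else logging.INFO
```

The returned value can then be passed as log_level when constructing WikiDL.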

© 2025 Lingxi Li.
