API Reference

WikiDL.__init__

Initializes a WikiDL instance for downloading a specific snapshot from the Wikimedia data dumps.

Parameters

snapshot_date

str

The date of the snapshot to download. This field is required.

master_url

str

The master URL pointing to the data dump. Defaults to https://dumps.wikimedia.org/enwiki.

select_pattern

str

File selector pattern. Defaults to lad (latest article dump). Use ehd for the edit history dump, or supply your own pattern through custom_select_pattern.

custom_select_pattern

str | None

Custom pattern for selecting target file names. Defaults to None, in which case select_pattern is used.

num_proc

int

Number of processes to use.

log_level

int

Logging level to use. Defaults to logging.INFO.
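As a rough illustration of how the parameters above fit together, the sketch below collects them into keyword arguments and passes them to the constructor. The import path wikidl, the date format "20240801", and the helper name make_downloader are assumptions made for this example; the reference above only states the parameter names, types, and defaults.

```python
import logging

# Keyword arguments mirroring the documented parameters.
# ASSUMPTION: the "YYYYMMDD" date string; the reference only says
# snapshot_date is a required str.
init_kwargs = {
    "snapshot_date": "20240801",
    "master_url": "https://dumps.wikimedia.org/enwiki",  # documented default
    "select_pattern": "ehd",          # edit history dump instead of the default "lad"
    "custom_select_pattern": None,    # None means select_pattern is used
    "num_proc": 3,                    # illustrative value
    "log_level": logging.INFO,        # documented default
}


def make_downloader(kwargs):
    """Hypothetical helper: build a WikiDL instance from keyword arguments.

    ASSUMPTION: the class is importable as `from wikidl import WikiDL`.
    The import is done lazily so the dictionary above can be inspected
    even when the package is not installed.
    """
    from wikidl import WikiDL
    return WikiDL(**kwargs)
```

Calling make_downloader(init_kwargs) would then return the instance whose start method is described next.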

WikiDL.start

Starts the downloading task specified in the WikiDL instance.

Parameters

output_dir

str

Output directory for downloaded files. This field is required.

limit

int | None

Maximum number of files to download, which is useful for debugging. If omitted, all matching files are downloaded.

Returns

Returns nothing. Downloaded files are saved to the specified output_dir.
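Because start does not return a value, one way to see what was fetched is to list the output directory afterwards. The following is a minimal sketch, not part of WikiDL itself: download_sample is a hypothetical helper, and it assumes the files land directly in output_dir rather than in subdirectories.

```python
from pathlib import Path


def download_sample(downloader, output_dir="./output", limit=5):
    """Hypothetical helper: run a small download and list the resulting files.

    `downloader` is expected to be a WikiDL instance; only the documented
    start(output_dir=..., limit=...) call is used.
    """
    out = Path(output_dir)
    # Creating the directory up front is harmless if start() also creates it.
    out.mkdir(parents=True, exist_ok=True)
    downloader.start(output_dir=str(out), limit=limit)
    # start() is documented as returning nothing, so inspect the directory.
    # ASSUMPTION: downloaded files sit at the top level of output_dir.
    return sorted(p.name for p in out.iterdir() if p.is_file())
```

Keeping limit small, as here, matches the debugging use mentioned above; leaving it out of the start call downloads every matching file.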

© 2025 Lingxi Li.
