WikiDL

API Reference

WikiDL.__init__

Creates a WikiDL instance configured to download a specific snapshot from the Wikimedia data dumps.

Parameters

| Name | Type | Description |
| --- | --- | --- |
| snapshot_date | str | The date of the snapshot to download. Required. |
| master_url | str | The master URL pointing to the data dump. Defaults to https://dumps.wikimedia.org/enwiki. |
| select_pattern | str | File selector pattern. Defaults to lad (latest articles dump). Use ehd for the edit history dump, or customize with custom_select_pattern. |
| custom_select_pattern | str \| None | Custom select pattern for target file names. Defaults to None, in which case the specified select_pattern is used. |
| num_proc | int | Number of processes to use. |
| log_level | int | Logging level to use. Defaults to logging.INFO. |
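The constructor parameters above can be assembled as in the following minimal sketch. The helpers build_wikidl_kwargs and make_downloader are hypothetical, the import path wikidl is an assumption, and so is the YYYYMMDD form of snapshot_date (it mirrors how Wikimedia names its dump directories; this reference does not state the expected format).

```python
import logging
from datetime import date


def build_wikidl_kwargs(snapshot: date, edit_history: bool = False, num_proc: int = 2) -> dict:
    """Assemble keyword arguments for WikiDL.__init__ (hypothetical helper)."""
    return {
        # Assumption: snapshot_date uses the YYYYMMDD naming of Wikimedia dump directories.
        "snapshot_date": snapshot.strftime("%Y%m%d"),
        # Documented default, spelled out here for clarity.
        "master_url": "https://dumps.wikimedia.org/enwiki",
        # "lad" = latest articles dump, "ehd" = edit history dump.
        "select_pattern": "ehd" if edit_history else "lad",
        "num_proc": num_proc,
        "log_level": logging.INFO,
    }


def make_downloader(kwargs: dict):
    """Construct the downloader; the import path below is an assumption."""
    from wikidl import WikiDL  # assumed package/module name

    return WikiDL(**kwargs)


kwargs = build_wikidl_kwargs(date(2024, 8, 1), edit_history=True)
print(kwargs["snapshot_date"], kwargs["select_pattern"])
```

The import is kept inside make_downloader so that the argument-building step can be checked without the package installed and without touching the network.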

WikiDL.start

Starts the download task configured on the WikiDL instance.

Parameters

| Name | Type | Description |
| --- | --- | --- |
| output_dir | str | Output directory for downloaded files. Required. |
| limit | int \| None | Maximum number of files to download. Useful for debugging. Defaults to downloading all matching files. |

Returns

This method does not return a value. The downloaded files are saved in the specified output_dir.
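A call to start might be wrapped as in the sketch below. The wrapper run_download is hypothetical; it accepts any object exposing start(output_dir=..., limit=...) as documented above. Creating the directory beforehand is a precaution, since this reference does not say whether WikiDL creates it.

```python
from pathlib import Path
from typing import Optional


def run_download(downloader, output_dir: str, limit: Optional[int] = None) -> Path:
    """Prepare the output directory, then run the download (hypothetical wrapper).

    `downloader` is expected to be a WikiDL instance, or any object with a
    compatible start(output_dir=..., limit=...) method.
    """
    out = Path(output_dir)
    # Assumption: it is not documented whether WikiDL creates the directory itself.
    out.mkdir(parents=True, exist_ok=True)
    # start() is documented to return nothing; the files land in output_dir.
    downloader.start(output_dir=str(out), limit=limit)
    return out
```

Passing a small limit, for example run_download(downloader, "./output", limit=1), is a cheap way to confirm the configuration before committing to a full download.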