WikiDL.__init__
Initializes a WikiDL instance for downloading a specific snapshot from the Wikimedia data dumps.
Parameters
snapshot_date
str
The date of the snapshot to download. Required.
master_url
str
The master URL pointing to the data dump. Defaults to https://dumps.wikimedia.org/enwiki.
select_pattern
str
File selector pattern. Defaults to lad (latest article dump); use ehd for the edit history dump, or set custom_select_pattern to supply your own pattern.
custom_select_pattern
str | None
Custom pattern for selecting target file names. Defaults to None, in which case select_pattern is used.
num_proc
int
Number of processes to use for the download task.
log_level
int
Logging level to use. Defaults to logging.INFO.
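The interplay between select_pattern and custom_select_pattern can be illustrated with a small, self-contained sketch. It is not WikiDL's implementation: the helper names (PRESET_PATTERNS, resolve_pattern, select_files) and the regular expressions behind lad and ehd are hypothetical, modeled on the public Wikimedia dump naming scheme (for example enwiki-20240101-pages-articles1.xml-p1p41242.bz2). The documented behavior it mirrors is that custom_select_pattern, when not None, is used instead of the preset.

```python
import re
from typing import List, Optional

# HYPOTHETICAL preset regexes, loosely based on public Wikimedia dump file names.
# WikiDL's real internal patterns may differ.
PRESET_PATTERNS = {
    # lad: current-revision article dumps split by page range (non-multistream)
    "lad": r"pages-articles\d*\.xml-p\d+p\d+\.bz2$",
    # ehd: full edit history dumps split by page range
    "ehd": r"pages-meta-history\d*\.xml-p\d+p\d+\.(?:7z|bz2)$",
}


def resolve_pattern(select_pattern: str = "lad",
                    custom_select_pattern: Optional[str] = None) -> "re.Pattern":
    """Return the regex to apply; a custom pattern, if given, takes precedence."""
    if custom_select_pattern is not None:
        return re.compile(custom_select_pattern)
    return re.compile(PRESET_PATTERNS[select_pattern])


def select_files(file_names: List[str],
                 select_pattern: str = "lad",
                 custom_select_pattern: Optional[str] = None) -> List[str]:
    """Keep only the file names matched by the resolved pattern, in input order."""
    pattern = resolve_pattern(select_pattern, custom_select_pattern)
    return [name for name in file_names if pattern.search(name)]
```

With a listing that mixes article, multistream, history, and checksum files, the default keeps only the plain article parts, ehd keeps only the history parts, and a custom pattern such as r"md5sums" bypasses the presets entirely.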
WikiDL.start
Starts the download task configured by the WikiDL instance.
Parameters
output_dir
str
Output directory for the downloaded files. Required.
limit
int | None
Maximum number of files to download. Useful for debugging. Defaults to downloading all matching files.
Returns
None. The downloaded files are saved to the specified output_dir.
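How limit, output_dir, and num_proc could combine during a download run is sketched below. This is an illustrative assumption, not how start() is implemented: plan_downloads and download_all are hypothetical names, the URL layout <master_url>/<snapshot_date>/<file_name> follows the public dumps.wikimedia.org directory structure, and the assumption that num_proc sizes a process pool for parallel downloads is not stated by the reference above.

```python
import os
from concurrent.futures import ProcessPoolExecutor
from typing import List, Optional, Tuple
from urllib.request import urlretrieve


def plan_downloads(file_names: List[str],
                   snapshot_date: str,
                   output_dir: str,
                   master_url: str = "https://dumps.wikimedia.org/enwiki",
                   limit: Optional[int] = None) -> List[Tuple[str, str]]:
    """HYPOTHETICAL helper: pair each selected file's URL with its local path.

    limit=None keeps every file, mirroring the documented default of start().
    """
    base = f"{master_url.rstrip('/')}/{snapshot_date}"
    if limit is not None:
        file_names = file_names[:limit]  # truncate, e.g. for a quick debug run
    return [(f"{base}/{name}", os.path.join(output_dir, name)) for name in file_names]


def _fetch(job: Tuple[str, str]) -> None:
    """Download one (url, path) pair; defined at module level so it can be pickled."""
    url, path = job
    urlretrieve(url, path)


def download_all(plan: List[Tuple[str, str]], output_dir: str, num_proc: int = 4) -> None:
    """ASSUMPTION: fetch the planned files in parallel with num_proc processes.

    Like start(), this returns None; results exist only as files in output_dir.
    """
    os.makedirs(output_dir, exist_ok=True)
    with ProcessPoolExecutor(max_workers=num_proc) as pool:
        list(pool.map(_fetch, plan))  # consume the iterator so worker errors surface
```

Separating planning from downloading keeps the limit and path logic testable without network access; only download_all touches the network.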
© 2025 Lingxi Li.
San Francisco