WikiDL
WikiDL.__init__
Initializes the WikiDL class for downloading specific snapshots from Wikimedia data dumps.
Name | Type | Description |
---|---|---|
snapshot_date | str | The date of the snapshot to download. This field is required. |
master_url | str | The master URL pointing to the data dump. Defaults to https://dumps.wikimedia.org/enwiki. |
select_pattern | str | File selector pattern. Defaults to |
custom_select_pattern | str \| None | Custom select pattern for target file names. Defaults to |
num_proc | int | Number of processes to use. |
log_level | int | Logging level to use. Defaults to |
WikiDL.start
Starts the download task specified by the WikiDL instance.
Name | Type | Description |
---|---|---|
output_dir | str | Output directory for downloaded files. This field is required. |
limit | int \| None | Maximum number of files to download. Useful for debugging. Defaults to downloading all matching files. |
This function does not return a value. The downloaded files are saved in the specified output_dir.
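The two calls documented above can be combined into a short sketch. It uses only the parameter names listed in the tables (snapshot_date, num_proc, output_dir, limit). The helper `download_snapshot`, the stand-in class `RecordingDL`, and the `YYYYMMDD` date string are assumptions made for illustration and are not part of the library. The downloader class is passed in as an argument so the sketch runs without the package installed.

```python
def download_snapshot(wikidl_cls, snapshot_date, output_dir, limit=None, num_proc=1):
    """Hypothetical helper: build a downloader, then start the download."""
    # Constructor arguments follow the WikiDL.__init__ table.
    downloader = wikidl_cls(snapshot_date=snapshot_date, num_proc=num_proc)
    # start() arguments follow the WikiDL.start table; it returns nothing.
    downloader.start(output_dir=output_dir, limit=limit)
    return downloader


class RecordingDL:
    """Stand-in for WikiDL (illustration only): records the arguments it receives."""

    def __init__(self, **init_kwargs):
        self.init_kwargs = init_kwargs
        self.start_kwargs = None

    def start(self, **start_kwargs):
        self.start_kwargs = start_kwargs


# Dry run with the stand-in; the date format is an assumed YYYYMMDD string.
dry_run = download_snapshot(RecordingDL, "20240101", "./dumps", limit=2)
print(dry_run.init_kwargs)
print(dry_run.start_kwargs)
```

With the real class, the same helper would be called as `download_snapshot(WikiDL, ...)` after importing `WikiDL` from the package. The import path is not stated in this reference. Setting `limit` to a small number keeps a first test run short, as the table suggests for debugging.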