API Reference
snapshot_date
str
The date of the snapshot that you are trying to download. This field is required.
master_url
str
The master URL pointing to the data dump. By default, it is https://dumps.wikimedia.org/enwiki
select_pattern
str
The file selector pattern. lad selects the latest articles dump; ehd selects the edit history dump. To use a different pattern, pass the custom_select_pattern argument. By default, it is lad.
custom_select_pattern
str | None
A custom select pattern for target file names. By default, it is None, in which case the pattern specified by select_pattern is used.
num_proc
int
Number of processes you want to use.
log_level
int
The logging level to use. By default, it is logging.INFO.
output_dir
str
The output directory that you want all downloaded files to go into. It is required.
limit
int | None
Maximum number of files to be downloaded. It is useful for debugging (you can download a few to see if it is what you want before downloading a huge batch).
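The parameters above can be pictured together as one options object. The following is only a minimal sketch, not the library's own code: the names DownloadOptions and select_files are hypothetical, the defaults for num_proc and limit are assumed (the reference does not state them), and treating custom_select_pattern as a regular expression is also an assumption. The sample snapshot date and file names are illustrative and follow the usual Wikimedia dump naming.

```python
import logging
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DownloadOptions:
    """Hypothetical container mirroring the documented parameters."""

    snapshot_date: str  # required
    output_dir: str  # required
    master_url: str = "https://dumps.wikimedia.org/enwiki"
    select_pattern: str = "lad"  # "lad" or "ehd"
    custom_select_pattern: Optional[str] = None
    num_proc: int = 1  # assumed default, not stated in the reference
    log_level: int = logging.INFO
    limit: Optional[int] = None  # assumed default, not stated in the reference

    def __post_init__(self) -> None:
        if self.select_pattern not in ("lad", "ehd"):
            raise ValueError("select_pattern must be 'lad' or 'ehd'")
        if self.num_proc < 1:
            raise ValueError("num_proc must be at least 1")
        if self.limit is not None and self.limit < 0:
            raise ValueError("limit must be non-negative or None")


def select_files(names: List[str], options: DownloadOptions) -> List[str]:
    """Apply the custom pattern (assumed to be a regex), then cap at limit.

    The built-in "lad" and "ehd" patterns are not reproduced here because
    the reference does not spell them out; without a custom pattern this
    sketch leaves the list unfiltered.
    """
    selected = list(names)
    if options.custom_select_pattern is not None:
        regex = re.compile(options.custom_select_pattern)
        selected = [name for name in selected if regex.search(name)]
    if options.limit is not None:
        selected = selected[: options.limit]
    return selected


if __name__ == "__main__":
    # Illustrative values only.
    options = DownloadOptions(
        snapshot_date="20240801",
        output_dir="./dumps",
        custom_select_pattern=r"pages-articles\d+\.xml",
        limit=2,
    )
    names = [
        "enwiki-20240801-pages-articles1.xml-p1p41242.bz2",
        "enwiki-20240801-pages-meta-history1.xml-p1p844.7z",
        "enwiki-20240801-pages-articles2.xml-p41243p151573.bz2",
        "enwiki-20240801-pages-articles3.xml-p151574p311329.bz2",
    ]
    print(select_files(names, options))
```

Setting a small limit, as in this sketch, is the debugging use the limit parameter describes: it lets you check that the pattern picks the files you expect before starting a large download.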