Scrape pages, posts, images and other data from a WordPress instance using the WordPress REST API. Use simple command line arguments to clean up the scraped data.
Requires Node.js v19 or newer (for native fetch support).
The following commands use the latest version of wpdl published on npm. To run the script locally, clone this repo and replace "npx wpdl" with "npx ." in each command.
Scrape pages and posts
npx wpdl --url https://your-wp-instance.com --pages --posts
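For orientation, here is a rough sketch of the kind of requests a scraper like this would make against the WordPress REST API. It is not taken from wpdl's source: the function names buildCollectionUrl and fetchAll are hypothetical, and the sketch assumes WordPress is installed at the site root. It relies only on documented REST API behaviour (the /wp-json/wp/v2/ collection routes, the per_page and page parameters, and the X-WP-TotalPages response header).

```javascript
// Hypothetical helper (not part of wpdl): builds the URL for one page of a
// collection such as "pages" or "posts". Assumes WordPress lives at the site root.
function buildCollectionUrl(baseUrl, type, page, perPage = 100) {
  const url = new URL(`/wp-json/wp/v2/${type}`, baseUrl);
  url.searchParams.set("per_page", String(perPage)); // the REST API caps this at 100
  url.searchParams.set("page", String(page));
  return url.toString();
}

// Hypothetical helper (not part of wpdl): fetches every page of a collection.
// fetchImpl defaults to the native fetch available in recent Node.js versions.
async function fetchAll(baseUrl, type, fetchImpl = fetch) {
  const items = [];
  let page = 1;
  let totalPages = 1;
  do {
    const res = await fetchImpl(buildCollectionUrl(baseUrl, type, page));
    if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
    // WordPress reports the number of result pages in this header.
    totalPages = Number(res.headers.get("X-WP-TotalPages")) || 1;
    items.push(...(await res.json()));
    page += 1;
  } while (page <= totalPages);
  return items;
}
```

Calling fetchAll("https://your-wp-instance.com", "pages") would then resolve to an array of page objects as returned by the API.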
Scrape pages and clean up the HTML by filtering out all img elements and elements with the class foo. Also remove all elements without text content. From the JSON files, remove all the Jetpack and Yoast SEO data.
npx wpdl --url https://your-wp-instance.com --pages --elementFilter img --classFilter foo --jsonFilter "jetpack_*" --jsonFilter "yoast_*" --removeEmptyElements
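To illustrate what a wildcard filter such as "jetpack_*" could mean for the JSON output, here is a minimal sketch. It is an assumption about the behaviour, not wpdl's actual implementation: globToRegExp and removeMatchingKeys are hypothetical names, and the sketch assumes the pattern is matched against property names at every nesting level.

```javascript
// Hypothetical helper: turn a simple glob such as "jetpack_*" into an anchored RegExp.
function globToRegExp(glob) {
  // Escape regex metacharacters except "*", then let "*" match any run of characters.
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`^${escaped.replace(/\*/g, ".*")}$`);
}

// Hypothetical helper: return a copy of `value` without any property whose
// name matches one of the globs. Arrays and nested objects are walked recursively.
function removeMatchingKeys(value, globs) {
  const patterns = globs.map(globToRegExp);
  const walk = (node) => {
    if (Array.isArray(node)) return node.map(walk);
    if (node !== null && typeof node === "object") {
      return Object.fromEntries(
        Object.entries(node)
          .filter(([key]) => !patterns.some((p) => p.test(key)))
          .map(([key, child]) => [key, walk(child)])
      );
    }
    return node; // primitives are returned unchanged
  };
  return walk(value);
}

const page = {
  id: 1,
  jetpack_featured_media_url: "https://example.com/a.png",
  yoast_head: "<title>Example</title>",
  meta: { jetpack_x: 1, keep: 2 },
};
console.log(JSON.stringify(removeMatchingKeys(page, ["jetpack_*", "yoast_*"])));
// prints {"id":1,"meta":{"keep":2}}
```

Returning a filtered copy instead of mutating the input keeps the original API response available for comparison.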
To see full usage, run
npx wpdl -h