- A collection of pyspark scripts exploring Wikipedia data dumps and how they can be used to generate questions.
- A small flask-based web app that generates questions by using llama3 deployed via ollama.
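As a rough illustration of the second item, the sketch below shows one way a Python process could ask a locally running ollama server for a question. It is not taken from this project's code: the prompt wording and the helper names `build_prompt`, `build_request`, and `generate_question` are hypothetical, and it assumes ollama is listening on its default local port with the llama3 model already pulled.

```python
import json
import urllib.request

# Assumption: ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_prompt(passage: str) -> str:
    """Wrap a text passage in a (hypothetical) question-generation instruction."""
    return (
        "Write one quiz question that can be answered from the passage below.\n\n"
        f"Passage: {passage}"
    )


def build_request(passage: str, model: str = "llama3") -> urllib.request.Request:
    """Build a non-streaming POST request for ollama's generate endpoint."""
    payload = {"model": model, "prompt": build_prompt(passage), "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_question(passage: str) -> str:
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(passage)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate_question("Paris is the capital of France.")` only works while the ollama server is up; `build_request` can be inspected without any network access.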
More information about this project can be found in its accompanying dev journal here:
This project uses Poetry for Python dependency management and for running the scripts, Docker for containerization, and GNU Make as a build tool. With these three tools installed, you can run
make WIKIDATA_DUMP=<path_to_xml.bz2_file>
and all the necessary steps should run out of the box.
To clean everything up, run
make clean
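
To get a feel for the kind of file passed as WIKIDATA_DUMP, the sketch below streams page titles and text out of a bz2-compressed XML dump using only the Python standard library. It is an illustration rather than the project's pyspark code, and it assumes a MediaWiki-style layout in which each `page` element contains a `title` and a `text` element; the helper names `local_name` and `iter_pages` are hypothetical.

```python
import bz2
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple


def local_name(tag: str) -> str:
    """Strip an XML namespace prefix, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(dump_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (title, text) pairs from a bz2-compressed XML dump, one page at a time."""
    with bz2.open(dump_path, "rb") as stream:
        # iterparse reads incrementally, so the whole dump is never held in memory.
        for _event, elem in ET.iterparse(stream, events=("end",)):
            if local_name(elem.tag) != "page":
                continue
            title, text = "", ""
            for child in elem.iter():
                name = local_name(child.tag)
                if name == "title":
                    title = child.text or ""
                elif name == "text":
                    text = child.text or ""
            yield title, text
            elem.clear()  # drop the children of pages already processed
```

A loop such as `for title, text in iter_pages(path): ...` can then be used to peek at the first few pages before running the full pipeline.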