A Java-based web scraper designed to retrieve and send the daily menu from the Restaurante Universitário (RU) of the Federal University of Paraná (UFPR) to anyone interested.
Every day, thousands of people visit the university's website just to check what's being served at the RU (Restaurante Universitário) from Monday to Friday. Many even take screenshots to share the menu via WhatsApp.
This project automates that process by sending the menu directly to WhatsApp every workday at 5 AM. Thousands of people get the menu as soon as they wake up and can easily share it in their conversations.
- Java 17: The core programming language used for the scraper.
- Spring Cloud Function: A framework for writing business logic as plain functions that can be deployed to serverless platforms such as AWS Lambda.
- JSoup: A Java library for parsing HTML, used here for web scraping.
- Node.js: Used to send messages via WhatsApp.
- Baileys: A third-party library that enables sending WhatsApp messages to channels.
Every workday at 5 AM, EventBridge triggers the ru-scraper function, which fetches the menu page, parses its HTML, and sends the result to ru-whatsapp for formatting and delivery to a WhatsApp channel.
The entire project was designed to be as flexible as possible. Since the university has multiple restaurants, the code and infrastructure were made to be easily adaptable and deployable.
An example can be seen in the diagram, where two restaurants (Politécnico and Botânico) have their own data pipelines.
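One way to picture this per-restaurant flexibility is a small configuration table from which each pipeline is derived. This is only a sketch: the `Restaurant` record, the `POL` code, the placeholder URLs, and the suffixed function names are assumptions for illustration, not the project's actual code. Only `BOT` and the `ru-scraper` name appear in this README.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical per-restaurant configuration; field names are illustrative.
record Restaurant(String code, String url) {}

class RestaurantRegistry {
    // "BOT" appears in the sample payload; "POL" and both URLs are placeholders.
    static final List<Restaurant> RESTAURANTS = List.of(
            new Restaurant("BOT", "https://example.org/ru/botanico"),
            new Restaurant("POL", "https://example.org/ru/politecnico"));

    // Assumed naming scheme: one scraper function per restaurant, suffixed with its code.
    static String scraperFunctionName(Restaurant r) {
        return "ru-scraper-" + r.code().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        for (Restaurant r : RESTAURANTS) {
            System.out.println(r.code() + " -> " + scraperFunctionName(r));
        }
    }
}
```

Adding a restaurant would then mean adding one entry to the list and deploying one more pipeline from the same code.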
The project runs on AWS, with each managed service covering one part of the pipeline, which makes it easier to maintain and to develop new features.
- EventBridge: Uses cron expressions to trigger the entire application.
- CodeBuild: Runs the CI/CD pipeline, building and deploying the code stored on GitHub.
- Lambda: A low-cost and efficient way to run applications.
- CloudWatch: Monitors logs and responses.
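EventBridge rule cron expressions have six fields (minutes, hours, day-of-month, month, day-of-week, year) and are evaluated in UTC. The sketch below shows how a "weekdays at 5 AM local time" schedule could be turned into such an expression. The `America/Sao_Paulo` time zone is an assumption based on the `-03:00` offset in the sample payload, and the helper name is hypothetical; the project's actual rule may be written differently.

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

class ScheduleSketch {
    // Converts a local wall-clock time to UTC and builds a weekday cron expression.
    // Assumes the conversion does not cross midnight, otherwise MON-FRI would need shifting.
    static String weekdayCronUtc(LocalTime localTime, ZoneId zone, LocalDate referenceDate) {
        ZonedDateTime utc = ZonedDateTime.of(referenceDate, localTime, zone)
                .withZoneSameInstant(ZoneOffset.UTC);
        return String.format("cron(%d %d ? * MON-FRI *)", utc.getMinute(), utc.getHour());
    }

    public static void main(String[] args) {
        String expression = weekdayCronUtc(
                LocalTime.of(5, 0),
                ZoneId.of("America/Sao_Paulo"), // assumed time zone
                LocalDate.of(2024, 8, 26));
        System.out.println(expression);
    }
}
```

With a UTC-3 offset, 5 AM local time corresponds to hour 8 in the expression.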
Each function has a buildspec file that provides CodeBuild commands for the correct build and deployment. Functions are deployed to Lambda as zip files.
Node.js 17 and Java 17 are used.
The happy path of the data pipeline is website -> scraper -> sender -> WhatsApp.
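The happy path can be pictured as two functions composed in sequence. Spring Cloud Function accepts plain `java.util.function.Function` beans, so each stage can be expressed this way. In this sketch both stages are stubs and run in one process, whereas the real stages are separate Lambda functions; the `SimpleMenu` record and the stub bodies are assumptions for illustration only.

```java
import java.util.List;
import java.util.function.Function;

// Simplified stand-in for the scraper's output.
record SimpleMenu(String ruCode, List<String> dishes) {}

class PipelineSketch {
    // Stub for the scraper stage: the real function fetches and parses the RU page.
    static Function<String, SimpleMenu> scraper() {
        return ruCode -> new SimpleMenu(ruCode, List.of("Caldo verde", "Laranja"));
    }

    // Stub for the sender stage: the real function formats the menu and posts it to WhatsApp.
    static Function<SimpleMenu, String> sender() {
        return menu -> menu.ruCode() + ": " + String.join(", ", menu.dishes());
    }

    public static void main(String[] args) {
        // website -> scraper -> sender -> WhatsApp, reduced to function composition.
        Function<String, String> pipeline = scraper().andThen(sender());
        System.out.println(pipeline.apply("BOT"));
    }
}
```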
The image below is available at this link, where I saved an example.
After the scraper function processes the page, its response is sent to the function responsible for sending the WhatsApp message.
The JSON format and its keys were designed with a general approach in mind. The scraper function generates this JSON:
{
  "date": "2024-08-25T21:15:44.593395018-03:00",
  "imgMenu": null,
  "ruName": "JARDIM BOTÂNICO",
  "ruUrl": "https://gxlpes.github.io/ru-menu/website-example/ru-website-pol-0908-0808.html",
  "ruCode": "BOT",
  "served": [
    "breakfast",
    "lunch",
    "dinner"
  ],
  "meals": {
    "lunch": [
      {
        "name": "Peixe à milanesa com limão",
        "icons": [
          "Origem-animal-site",
          "Gluten-site",
          "Leite-e-derivados-site",
          "Ovo-site",
          "Alergenicos-site"
        ]
      },
      {
        "name": "Vegano: croquete de soja",
        "icons": [
          "Simbolo-vegano-300x300",
          "Gluten-site",
          "Alergenicos-site"
        ]
      },
      {
        "name": "Cenoura com acelga à moda chinesa",
        "icons": [
          "Simbolo-vegano-300x300",
          "Alergenicos-site"
        ]
      },
      {
        "name": "Saladas de folhosa e tomate",
        "icons": [
          "Simbolo-vegano-300x300",
          "Simbolo-vegano-300x300"
        ]
      },
      {
        "name": "Gelatina de cereja com creme",
        "icons": [
          "Origem-animal-site",
          "Leite-e-derivados-site"
        ]
      }
    ],
    "breakfast": [
      {
        "name": "Pão hot-dog com doce de banana",
        "icons": [
          "Gluten-site",
          "Simbolo-vegano-300x300"
        ]
      },
      {
        "name": "Laranja",
        "icons": [
          "Simbolo-vegano-300x300"
        ]
      }
    ],
    "dinner": [
      {
        "name": "Bife à fantasia",
        "icons": [
          "Origem-animal-site",
          "Alergenicos-site"
        ]
      },
      {
        "name": "Vegano: trouxinha com pasta de feijão branco e milho",
        "icons": [
          "Simbolo-vegano-300x300"
        ]
      },
      {
        "name": "Caldo verde",
        "icons": [
          "Origem-animal-site"
        ]
      },
      {
        "name": "Saladas de folhosas e pepino",
        "icons": [
          "Simbolo-vegano-300x300",
          "Simbolo-vegano-300x300"
        ]
      }
    ]
  }
}
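For readers who want to work with this payload in Java, the keys map naturally onto records. The sketch below is an assumption about how the structure could be modeled, not the project's actual classes: the record names `Dish` and `MenuResponse` and the `dishNames` helper are hypothetical, and the date is kept as a plain string to avoid assuming a particular JSON library.

```java
import java.util.List;
import java.util.Map;

// One dish and the icon names scraped next to it.
record Dish(String name, List<String> icons) {}

// Mirrors the keys of the sample payload; imgMenu may be null.
record MenuResponse(String date, String imgMenu, String ruName, String ruUrl,
                    String ruCode, List<String> served, Map<String, List<Dish>> meals) {}

class PayloadSketch {
    // Builds a shortened version of the sample payload by hand.
    static MenuResponse sample() {
        return new MenuResponse(
                "2024-08-25T21:15:44.593395018-03:00", null, "JARDIM BOTANICO",
                "https://gxlpes.github.io/ru-menu/website-example/ru-website-pol-0908-0808.html",
                "BOT",
                List.of("breakfast", "dinner"),
                Map.of(
                        "breakfast", List.of(new Dish("Laranja", List.of("Simbolo-vegano-300x300"))),
                        "dinner", List.of(new Dish("Caldo verde", List.of("Origem-animal-site")))));
    }

    // Returns the dish names for one meal, or an empty list if that meal is absent.
    static List<String> dishNames(MenuResponse response, String meal) {
        return response.meals().getOrDefault(meal, List.of()).stream()
                .map(Dish::name)
                .toList();
    }

    public static void main(String[] args) {
        MenuResponse response = sample();
        for (String meal : response.served()) {
            System.out.println(meal + ": " + dishNames(response, meal));
        }
    }
}
```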
Once it receives this JSON, the sender function turns it into a WhatsApp message with proper formatting and emojis.
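The formatting step might look roughly like the sketch below. The real sender is written in Node.js; Java is used here only to stay consistent with the scraper. The icon-to-emoji mapping, the bullet layout, and the method names are assumptions. The one fixed point is that WhatsApp renders text wrapped in asterisks as bold. Duplicate icons, which appear in the sample payload, are collapsed so each emoji shows once.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class MessageSketch {
    // Hypothetical icon-to-emoji mapping; keys are icon names from the sample payload.
    static final Map<String, String> EMOJI = Map.of(
            "Simbolo-vegano-300x300", "\uD83C\uDF31",  // seedling
            "Gluten-site", "\uD83C\uDF3E",             // sheaf of rice
            "Leite-e-derivados-site", "\uD83E\uDD5B"); // glass of milk

    // Formats one dish as a bullet line, appending one emoji per known, distinct icon.
    static String formatDish(String name, List<String> icons) {
        String emojis = icons.stream()
                .distinct()
                .filter(EMOJI::containsKey)
                .map(EMOJI::get)
                .collect(Collectors.joining(" "));
        return emojis.isEmpty() ? "- " + name : "- " + name + " " + emojis;
    }

    // Formats one meal: a bold title followed by its dish lines.
    static String formatMeal(String title, List<String> dishLines) {
        return "*" + title + "*\n" + String.join("\n", dishLines);
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                formatDish("Laranja", List.of("Simbolo-vegano-300x300")),
                formatDish("Caldo verde", List.of("Origem-animal-site")));
        System.out.println(formatMeal("Breakfast", lines));
    }
}
```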
- Write better test cases
- Improve the overall documentation
- Create other services (saver, collector)