- Scrape CVEs for a number of applications from https://cve.mitre.org/cgi-bin/cvekey.cgi; see scripts/scrape_cves.py
- Download the CVE details, save them in cvelistV5-extracted, and extract the related records into search_results
- Use LLMs to identify the CVEs related to extensions; see scripts/check_extension.py and scripts/prompts/check_extension.template.md
- The result is saved in each application's directory, e.g. search_results/minecraft/extension_analysis.json, plus a summary such as extension_analysis.json
- Manually check the extension analysis results to remove false positives for the analysis targets (checked: minecraft, mysql, nginx, apache httpd, postgres, redis, kubernetes, docker, vcenter); see search_results/0001-update-revised-human-check.patch for the CVEs modified by the human check
- Found 1217 CVEs related to extensions out of 17279 total CVEs (about 7%)
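As a rough illustration of the download-and-extract step above, the sketch below pulls the English description out of a CVE JSON 5 record and checks it against an application keyword. It assumes the records follow the public CVE JSON 5 layout (`cveMetadata.cveId`, `containers.cna.descriptions`); the helper names and the sample record are made up for this example and are not taken from scripts/scrape_cves.py.

```python
import json
from typing import Optional


def english_description(record: dict) -> Optional[str]:
    """Return the first non-empty English description of a CVE record, or None."""
    cna = record.get("containers", {}).get("cna", {})
    for entry in cna.get("descriptions", []):
        text = (entry.get("value") or "").strip()
        if entry.get("lang", "").lower().startswith("en") and text:
            return text
    return None


def mentions_keyword(record: dict, keyword: str) -> bool:
    """Case-insensitive keyword match against the English description."""
    text = english_description(record)
    return text is not None and keyword.lower() in text.lower()


# A trimmed, made-up record used only to exercise the helpers.
sample = json.loads("""
{
  "cveMetadata": {"cveId": "CVE-0000-0000"},
  "containers": {"cna": {"descriptions": [
    {"lang": "en", "value": "A Minecraft plugin allows remote code execution."}
  ]}}
}
""")
print(sample["cveMetadata"]["cveId"], mentions_keyword(sample, "minecraft"))
```

A plain keyword match like this only narrows the candidate set; deciding whether a CVE is actually about an extension is left to the LLM pass and the manual check.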
Some issues:
- How to define an extension?
- What kind of extension do we care about? Is a Kubernetes controller or some service a kind of extension?
Use scripts to generate reports; see report/analysis.py
There are 2 types of reports:
- Uncleaned report: just the raw CVE details and checked extensions; see report/basic_report/README.md
- Many CVEs have no problem description and insufficient metadata
- Cleaned report: the CVE details and checked extensions with cleaned symptoms and causes; see report/cleaned_report/README.md
- See report/clean_csv/README.md for how we clean the raw reports: we keep only the Apache HTTP Server (httpd) out of all Apache software and remove the CVEs that have no problem description.
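The two cleaning rules can be sketched as a simple row filter. This is a minimal illustration, not the code in report/clean_csv: the row fields (`cve_id`, `application`, `description`) and the httpd regular expression are assumptions made for the example.

```python
import re

# Hypothetical row layout; the real columns in report/clean_csv may differ.
rows = [
    {"cve_id": "CVE-A", "application": "apache", "description": "Apache HTTP Server mod_proxy crash."},
    {"cve_id": "CVE-B", "application": "apache", "description": "Apache Tomcat request smuggling."},
    {"cve_id": "CVE-C", "application": "redis", "description": ""},
    {"cve_id": "CVE-D", "application": "redis", "description": "Lua script sandbox escape."},
]

# Assumed heuristic for recognizing httpd among all Apache software.
HTTPD_PATTERN = re.compile(r"apache http server|\bhttpd\b", re.IGNORECASE)


def keep(row: dict) -> bool:
    """Apply both cleaning rules to a single row."""
    description = (row.get("description") or "").strip()
    if not description:
        return False  # rule 1: drop CVEs without a problem description
    if row["application"] == "apache":
        # rule 2: among Apache software, keep only the HTTP server
        return bool(HTTPD_PATTERN.search(description))
    return True


cleaned = [row for row in rows if keep(row)]
print([row["cve_id"] for row in cleaned])  # ['CVE-A', 'CVE-D']
```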
Generate AI summaries for the reports, see report/generate_ai_summary.py
- See report/cleaned_report/ai_report/README.md for the AI summary of the cleaned report
Goal: Look at all security-related CVEs from a number of applications and study their symptoms and causes.

Applications:
- Postgres
- MySQL
- Redis
- Nginx
- Apache
- Chrome
- Firefox
- Kubernetes
- Docker
- vCenter
- Minecraft
Symptoms:
- Denial of Service: the program crashes or executes indefinitely
- Isolation violation: attacker can read data that they should not be able to
- Remote Code Execution: attacker can execute arbitrary code
Causes:
- Resource management issues: memory leaks, concurrency bugs, infinite loops
- Insufficient input validation: the system does not correctly validate an end-user’s input
- Logic and design oversights: semantic bugs where the program is written incorrectly
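One way to make the symptom and cause categories machine-readable is a pair of enums plus a small tally, sketched below. The class names and the labeled CVE IDs are placeholders invented for this example; they are not results from the reports.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Symptom(Enum):
    DENIAL_OF_SERVICE = "program crashes or executes indefinitely"
    ISOLATION_VIOLATION = "attacker reads data they should not be able to"
    REMOTE_CODE_EXECUTION = "attacker executes arbitrary code"


class Cause(Enum):
    RESOURCE_MANAGEMENT = "memory leaks, concurrency bugs, infinite loops"
    INPUT_VALIDATION = "end-user input is not validated correctly"
    LOGIC_DESIGN = "semantic bugs, program written incorrectly"


@dataclass(frozen=True)
class LabeledCve:
    cve_id: str
    symptom: Symptom
    cause: Cause


# Placeholder labels, only to show the tally; not real findings.
labeled = [
    LabeledCve("CVE-X-1", Symptom.DENIAL_OF_SERVICE, Cause.RESOURCE_MANAGEMENT),
    LabeledCve("CVE-X-2", Symptom.REMOTE_CODE_EXECUTION, Cause.INPUT_VALIDATION),
    LabeledCve("CVE-X-3", Symptom.REMOTE_CODE_EXECUTION, Cause.INPUT_VALIDATION),
]

# Count how often each (symptom, cause) pair occurs.
counts = Counter((item.symptom, item.cause) for item in labeled)
for (symptom, cause), n in counts.most_common():
    print(f"{symptom.name} caused by {cause.name}: {n}")
```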
Insight: preventing these causes, especially insufficient input validation and logic/design oversights, is difficult, if not impossible, for a platform to achieve. So we aim instead to provide an isolated sandbox in which we specify and enforce policies on each extension to minimize the impact of exploits.