Documentation Ingestion lacks accuracy #28
Comments
We did put some work in to improve it. GLU-104. Still not amazing but better. Need to merge this at some point.
Here is my solution:
That's a solid approach for improving HTML parsing and memory efficiency! Using Cheerio for targeted extraction would definitely help get better quality content from HTML docs.
Right now we fetch the documentation URL with a single axios request.
If the documentation is a GraphQL or OpenAPI string, we do an OK job of parsing it and acting accordingly.
If the documentation is an HTML page, we convert it to markdown and use only the first 20k characters or so (see documentation.ts for the exact logic) in the context window for API configuration generation. This is a very limited approach, particularly for longer documentation, where anything past the cutoff never reaches the model. We can do better.
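For readers who have not opened documentation.ts, here is a rough sketch of the behaviour described above. It is not the project's actual code: the names `detectDocKind`, `truncateForContext`, and `MAX_CONTEXT_CHARS` are hypothetical, the detection heuristics are assumptions, and the 20,000 limit only approximates the "20k characters or so" mentioned in the issue. The fetch and the HTML-to-markdown conversion are left out so the snippet stays dependency-free.

```typescript
// Hypothetical sketch only; see documentation.ts for the real logic.
type DocKind = "openapi" | "graphql" | "html" | "text";

// Assumed cutoff, approximating "the first 20k characters or so".
const MAX_CONTEXT_CHARS = 20_000;

// Guess what kind of documentation a fetched response body contains.
function detectDocKind(raw: string): DocKind {
  const trimmed = raw.trim();
  // An HTML page starts with a doctype or an <html> tag.
  if (/^<(!doctype|html)/i.test(trimmed)) return "html";
  // OpenAPI as JSON ("openapi": ...) or as YAML (openapi: ...).
  if (
    /^\{[\s\S]*"(openapi|swagger)"\s*:/.test(trimmed) ||
    /^(openapi|swagger)\s*:/m.test(trimmed)
  ) {
    return "openapi";
  }
  // GraphQL introspection result or schema definition language.
  if (
    trimmed.includes('"__schema"') ||
    /^\s*(type|schema|input|enum|interface)\s+\w*\s*\{/m.test(trimmed)
  ) {
    return "graphql";
  }
  return "text";
}

// Keep only the leading part of the text; everything after the limit is dropped.
function truncateForContext(text: string, limit: number = MAX_CONTEXT_CHARS): string {
  return text.length <= limit ? text : text.slice(0, limit);
}
```

The sketch makes the limitation concrete: for a 25,000-character page, `truncateForContext` silently discards the last 5,000 characters, regardless of whether the relevant endpoints are documented there.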