Eyecite Changes and Impact on Report #192

Open
flooie opened this issue Jan 13, 2025 · 5 comments
@flooie
Contributor

flooie commented Jan 13, 2025

The recent changes to Eyecite (currently in a PR) may require corresponding updates to the Eyecite report.

The new branch in the PR resulted in around 500 fewer citations. I want to ensure this reduction is accurate.

My goal is to adjust the Eyecite report to account for the newly created reference citations while also understanding the reasons behind the dropped citations.

@flooie flooie moved this to General Backlog in Case Law Sprint Jan 13, 2025
@mlissner
Member

I thought that just reflected it being slower?

@flooie
Contributor Author

flooie commented Jan 13, 2025

If you look at the output CSV you can see what got added or dropped.

https://raw.githubusercontent.com/freelawproject/eyecite/artifacts/191/results/output.csv

It shows a bunch of "at 123" stuff dropped: things that don't look like full citations, which I would be surprised about.
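One way to check which citations a branch gained or dropped is a multiset difference between the two runs. The following is a minimal sketch, not the report's actual code: `diff_citations`, the `SHORT_FORM` pattern, and the sample strings are all hypothetical, and it assumes each run yields a plain list of citation strings for the same text.

```python
import re
from collections import Counter

# Hypothetical pattern for short-form fragments such as "at 123".
SHORT_FORM = re.compile(r"^at \d+$")


def diff_citations(main_cites, branch_cites):
    """Return (gained, lost) multisets between two runs over the same text."""
    main, branch = Counter(main_cites), Counter(branch_cites)
    gained = branch - main  # found by the branch but not by main
    lost = main - branch    # found by main but not by the branch
    return gained, lost


# Made-up citation strings, for illustration only.
main_run = ["410 U.S. 113", "at 123", "at 123", "5 F.3d 100"]
branch_run = ["410 U.S. 113", "5 F.3d 100", "6 F.3d 200"]

gained, lost = diff_citations(main_run, branch_run)

# Dropped entries that look like short-form fragments rather than full citations.
suspect = {cite: n for cite, n in lost.items() if SHORT_FORM.match(cite)}
print(gained, lost, suspect)
```

Keeping the subtraction direction explicit (`branch - main` for gains, `main - branch` for losses) makes it harder to swap the two columns by accident.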

@flooie flooie moved this from General Backlog to Buffer Zone in Case Law Sprint Jan 13, 2025
@flooie flooie moved this from Buffer Zone to January 27 to Feb 7 in Case Law Sprint Jan 13, 2025
@flooie flooie moved this from January 27 to Feb 7 to Backlog Jan 13 to Jan 24 in Case Law Sprint Jan 13, 2025
@flooie flooie self-assigned this Jan 14, 2025
@flooie
Contributor Author

flooie commented Jan 15, 2025

The Good
• The report worked and identified a bug before we could run it.

The Bad
• The “1 percent random sample” is actually 0.0078%.
• The “10 percent random sample” is actually 0.076%.
• The report inverts the gains and losses when run on the GitHub Action, but it behaves correctly when run locally.
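A sample file's real size can be checked against its nominal label with simple arithmetic. This is a sketch under stated assumptions: the function names are hypothetical and the row counts are illustrative, not the actual counts behind the figures above.

```python
def sample_percent(sample_rows, total_rows):
    """Return the sample size as a percentage of the full dataset."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    return 100.0 * sample_rows / total_rows


def matches_nominal(sample_rows, total_rows, nominal_percent, rel_tol=0.1):
    """True if the actual percentage is within rel_tol of the nominal one."""
    actual = sample_percent(sample_rows, total_rows)
    return abs(actual - nominal_percent) <= rel_tol * nominal_percent


# Illustrative counts only: 78 rows out of 1,000,000 is 0.0078%, not 1%.
total = 1_000_000
print(sample_percent(78, total))
print(matches_nominal(78, total, 1.0))
print(matches_nominal(10_000, total, 1.0))
```

A check like this could run when the sample file is generated, so a mislabeled file is caught before the report depends on it.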

@flooie
Contributor Author

flooie commented Jan 23, 2025

We need to decide whether to use a 1% file. It finds more issues, but it significantly slows down execution: locally, it takes about 45 minutes to run.

Additionally, I encountered edge cases that caused crashes, particularly around empty opinions and XML parsing. We've already seen these issues in CL, so they're not surprising.
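A guard along these lines would let the report skip such rows instead of crashing. It is only a sketch: `extract_text` is a hypothetical helper, and it assumes the opinion markup is parsed with the standard library's `xml.etree.ElementTree`, which may not be what the report uses.

```python
import xml.etree.ElementTree as ET


def extract_text(markup):
    """Return the text of an XML/XHTML opinion, or None if it cannot be used."""
    # Empty or whitespace-only opinions carry nothing to extract.
    if not markup or not markup.strip():
        return None
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        # Malformed markup: skip the row rather than abort the whole run.
        return None
    text = "".join(root.itertext()).strip()
    return text or None


print(extract_text("<p>See 410 U.S. 113</p>"))
print(extract_text(""))
print(extract_text("<p>broken"))
```

Returning `None` for both cases keeps the calling loop simple: it can count skipped rows and move on.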

Another concern is the file size, which is quite large. To mitigate this locally, I extracted only the relevant HTML for testing and removed unnecessary data to reduce the file size.
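Trimming the file down to the needed columns could look like the sketch below. The column names (`id`, `html`, `xml`, `plain_text`) and the `trim_columns` helper are assumptions for illustration; the real file's layout may differ.

```python
import csv
import io


def trim_columns(src, dst, keep):
    """Copy CSV rows from src to dst, keeping only the columns named in keep."""
    reader = csv.DictReader(src)
    # extrasaction="ignore" silently drops every column not listed in keep.
    writer = csv.DictWriter(dst, fieldnames=keep, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)


# Hypothetical layout, for illustration only.
source = io.StringIO("id,html,xml,plain_text\n1,<p>x</p>,<o/>,x\n")
trimmed = io.StringIO()
trim_columns(source, trimmed, keep=["id", "html"])
print(trimmed.getvalue())
```

With real opinion data, individual fields can exceed the `csv` module's default field size limit, so `csv.field_size_limit` may need to be raised before reading.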

None of these are insurmountable - I just wonder if 1% is too large considering how slowly it runs. @mlissner

@flooie flooie moved this from Backlog Jan 13 to Jan 24 to Blocked in Case Law Sprint Jan 23, 2025
@mlissner
Member

That'd be a real pain if we had to wait for that all the time. It's nice to be able to find more problems, but I think 0.1% is probably fine. Ideally having the option would be nice, like if you could tag the PR with a label or if they both ran all the time and you could choose whether to merge before the 1% file finished.

But I think my take is it's almost certainly not worth the delay. If we can do it sometimes, that's nice, but probably not worth spending more than a few minutes on?

@flooie flooie moved this from Blocked to Future... in Case Law Sprint Jan 27, 2025