Eyecite Changes and Impact on Report #192

flooie · 2025-01-13T15:29:10Z

The recent changes to Eyecite in the open PR may require corresponding updates to the Eyecite report.

The PR's branch found around 500 fewer citations than the base. I want to make sure this reduction is correct.

My goal is to adjust the Eyecite report to account for the newly created reference citations while also understanding the reasons behind the dropped citations.

mlissner · 2025-01-13T15:34:30Z

I thought that just reflected it being slower?

flooie · 2025-01-13T15:47:38Z

If you look at the output CSV you can see what got added or dropped.

https://raw.githubusercontent.com/freelawproject/eyecite/artifacts/191/results/output.csv

It shows a bunch of "at 123" matches being dropped. These don't look like full citations, and I was surprised to see them disappear.
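To make the added/dropped comparison concrete, here is a minimal sketch of diffing two runs. It assumes each run has already been reduced to a list of matched citation strings; `diff_citations` and its arguments are hypothetical names, not part of eyecite or the report's code. Using `Counter` rather than sets keeps duplicates, so an "at 123" string matched twice on base and once on the branch counts as one drop.

```python
from collections import Counter


def diff_citations(base: list[str], branch: list[str]) -> tuple[Counter, Counter]:
    """Return (added, dropped) counts comparing a base run to a branch run.

    Hypothetical helper: `base` and `branch` are the citation strings each
    version of eyecite matched over the same sample of opinions.
    """
    base_counts = Counter(base)
    branch_counts = Counter(branch)
    # Counter subtraction keeps only strictly positive counts.
    added = branch_counts - base_counts    # found on the branch, not on base
    dropped = base_counts - branch_counts  # found on base, not on the branch
    return added, dropped


if __name__ == "__main__":
    base_run = ["1 U.S. 1", "at 123", "at 123", "2 F.3d 4"]
    branch_run = ["1 U.S. 1", "at 123", "2 F.3d 4", "Id."]
    added, dropped = diff_citations(base_run, branch_run)
    print("added:", dict(added))      # added: {'Id.': 1}
    print("dropped:", dict(dropped))  # dropped: {'at 123': 1}
```

Passing `base` and `branch` in the wrong order swaps the two results, so argument order is one thing worth ruling out whenever gains and losses look inverted.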

flooie · 2025-01-15T15:50:14Z

The Good
• The report worked and identified a bug before we could run it.

The Bad
• The “1 percent random sample” is actually 0.0078%.
• The “10 percent random sample” is actually 0.076%.
• The report inverts the gains and losses when run on the GitHub Action, but it behaves correctly when run locally.
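A cheap guard against mislabelled samples is to compute the real fraction from row counts and compare it with the label. This is a sketch under assumptions: `check_sample_label`, its relative tolerance, and the example counts are hypothetical and not taken from the report or the real corpus.

```python
def actual_percent(sample_rows: int, total_rows: int) -> float:
    """Percentage of the full corpus that a sample actually covers."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    return 100.0 * sample_rows / total_rows


def check_sample_label(sample_rows: int, total_rows: int,
                       labelled_percent: float, rel_tolerance: float = 0.1) -> bool:
    """True if the real percentage is within rel_tolerance of the label.

    Hypothetical check: a file labelled "1 percent" that really covers
    0.0078% of the corpus fails it.
    """
    real = actual_percent(sample_rows, total_rows)
    return abs(real - labelled_percent) <= rel_tolerance * labelled_percent


if __name__ == "__main__":
    # Illustrative counts only, not the real corpus size.
    print(check_sample_label(10_000, 1_000_000, 1.0))  # True  (exactly 1%)
    print(check_sample_label(78, 1_000_000, 1.0))      # False (0.0078%)
```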

flooie · 2025-01-23T16:16:26Z

We need to decide whether to use a 1% file. It finds more issues, but it slows execution down significantly: locally, it takes about 45 minutes to run.

I also hit edge cases that caused crashes, particularly empty opinions and XML parsing. We've already seen these issues in CL, so they're not surprising.
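One way to keep a long run from dying on these edge cases is to skip empty opinions and catch parse errors per document, instead of letting one bad record abort everything. This is a minimal stdlib sketch; `opinion_text` is a hypothetical helper, and CL's real fields and parsing may differ.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def opinion_text(markup: Optional[str]) -> Optional[str]:
    """Extract plain text from an opinion's XML markup.

    Returns None for empty opinions or markup that fails to parse, so the
    caller can log and skip the record rather than crash the whole report.
    """
    if markup is None or not markup.strip():
        return None  # empty opinion: nothing to check for citations
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        return None  # malformed XML: skip instead of aborting the run
    text = "".join(root.itertext()).strip()
    return text or None


if __name__ == "__main__":
    for doc in ["", "<p>See 1 U.S. 1.</p>", "<p>unclosed"]:
        print(repr(opinion_text(doc)))
```

ElementTree is a strict XML parser, so real-world HTML (for example, `&nbsp;` entities) would need a more lenient parser such as lxml's HTML parser.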

Another concern is the file size, which is quite large. To mitigate this locally, I extracted only the relevant HTML for testing and removed unnecessary data to reduce the file size.
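Trimming the test file can be as simple as keeping only the columns the report reads. The sketch assumes a CSV export with hypothetical column names `id` and `html`; the actual CL bulk-data fields may differ.

```python
import csv
import io
from typing import Iterable, TextIO


def slim_csv(src: TextIO, dst: TextIO, keep: Iterable[str]) -> int:
    """Copy only the `keep` columns from src to dst; return rows written."""
    keep = list(keep)
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=keep)
    writer.writeheader()
    written = 0
    for row in reader:
        # Write just the requested columns; a missing column becomes "".
        writer.writerow({col: row.get(col, "") for col in keep})
        written += 1
    return written


if __name__ == "__main__":
    src = io.StringIO("id,html,plain_text\n1,<p>a</p>,a\n2,<p>b</p>,b\n")
    dst = io.StringIO()
    print(slim_csv(src, dst, ["id", "html"]))  # 2
    print(dst.getvalue())
```

Very large opinion fields can exceed the `csv` module's default field size limit; `csv.field_size_limit` raises it.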

None of these is insurmountable; I just wonder whether 1% is too large given how slowly it runs. @mlissner
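If runtime scales roughly linearly with sample size (an assumption; fixed setup costs such as loading the file would break it), the ~45 minutes measured locally for the 1% file gives a quick estimate for a 0.1% sample. `estimate_minutes` is a hypothetical helper for this back-of-the-envelope check.

```python
def estimate_minutes(measured_minutes: float, measured_percent: float,
                     target_percent: float) -> float:
    """Linear extrapolation of run time from one measured sample size."""
    if measured_percent <= 0:
        raise ValueError("measured_percent must be positive")
    return measured_minutes * target_percent / measured_percent


if __name__ == "__main__":
    # 45 minutes is the local figure reported for the 1% file.
    print(f"{estimate_minutes(45, 1.0, 0.1):.1f} minutes")  # 4.5 minutes
```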

mlissner · 2025-01-23T19:00:06Z

That'd be a real pain if we had to wait for that all the time. It's nice to be able to find more problems, but I think 0.1% is probably fine. Ideally, we'd have the option: for example, tagging the PR with a label to run the 1% file, or running both all the time so you could choose to merge before the 1% run finished.

But I think my take is it's almost certainly not worth the delay. If we can do it sometimes, that's nice, but probably not worth spending more than a few minutes on?

flooie added this to Case Law Sprint Jan 13, 2025

flooie moved this to General Backlog in Case Law Sprint Jan 13, 2025

flooie moved this from General Backlog to Buffer Zone in Case Law Sprint Jan 13, 2025

flooie moved this from Buffer Zone to January 27 to Feb 7 in Case Law Sprint Jan 13, 2025

flooie moved this from January 27 to Feb 7 to Backlog Jan 13 to Jan 24 in Case Law Sprint Jan 13, 2025

flooie self-assigned this Jan 14, 2025

flooie moved this from Backlog Jan 13 to Jan 24 to Blocked in Case Law Sprint Jan 23, 2025

flooie moved this from Blocked to Future... in Case Law Sprint Jan 27, 2025