llama3.2-vision:11b returns summary instead of OCR'd text #14

Open
darkobodnaruk opened this issue Jan 9, 2025 · 2 comments

@darkobodnaruk

When I use the example from the README to process a JPG with llama3.2-vision:11b, I get a summary of the image instead of the exact text. I've verified that the prompt is correct ("Please look at this image and extract all the text content..."), so it's odd that the returned result looks like this:

The image shows a computer screen displaying multiple windows with text in a foreign language, likely Slovenian. The purpose of the image is to provide information about medical records or patient data.

Here are the details of the image:

... and then it goes on to describe the image instead of extracting the text.

I've also tried a screenshot with English text and the results are the same: it tries to summarize the text on the screen, not extract or quote it.

This is likely an issue with the model. I'm just wondering whether anyone else has run into this?
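
For anyone who wants to catch this automatically, here is a minimal sketch of a check that flags replies which read like a description rather than a transcription. The function name `looks_like_summary` and the list of opener patterns are my own assumptions, based only on the sample reply quoted above. It is a rough heuristic and is not part of this project.

```python
import re

# Hypothetical patterns: phrases that tend to open a description of an image
# rather than a verbatim transcription. Extend the list for your own cases.
SUMMARY_OPENERS = (
    r"^(the|this) image (shows|depicts|displays|contains|appears)",
    r"^here (is|are) the (details|summary)",
    r"^the purpose of the image",
)


def looks_like_summary(reply: str) -> bool:
    """Return True if the first non-empty line starts like a description."""
    for line in reply.splitlines():
        line = line.strip().lower()
        if line:
            # Only the first non-empty line is inspected.
            return any(re.search(pattern, line) for pattern in SUMMARY_OPENERS)
    # An empty reply is not treated as a summary.
    return False
```

A caller could use this to decide whether to retry the request or warn the user. It will miss summaries that open differently, and it could misfire on a document whose own text happens to start with one of these phrases.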

@darkobodnaruk darkobodnaruk changed the title image with non-English text llama3.2-vision:11b respondign with summary instead of OCR'd text Jan 9, 2025
@darkobodnaruk darkobodnaruk changed the title llama3.2-vision:11b respondign with summary instead of OCR'd text llama3.2-vision:11b returns with summary instead of OCR'd text Jan 9, 2025
@darkobodnaruk darkobodnaruk changed the title llama3.2-vision:11b returns with summary instead of OCR'd text llama3.2-vision:11b returns summary instead of OCR'd text Jan 9, 2025
@darkobodnaruk

I haven't played with (smaller) open models much, but maybe this is what people mean by "the model is good, but it's bad at following instructions"?
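
If instruction following is the problem, one thing to try is a more restrictive prompt with the temperature set to 0. Below is a sketch that calls Ollama's `/api/generate` endpoint directly. The prompt wording, the `build_payload` and `transcribe` helper names, and the default local URL are my assumptions, not this project's code, and I can't say whether this actually changes the model's behavior.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"

# Hypothetical stricter prompt; the wording is a guess, not the README's prompt.
STRICT_PROMPT = (
    "Transcribe all text visible in this image exactly as written. "
    "Do not describe, summarize, translate, or comment on the image. "
    "Output only the transcribed text."
)


def build_payload(image_bytes: bytes, model: str = "llama3.2-vision:11b") -> dict:
    """Build a non-streaming request body with the image as base64."""
    return {
        "model": model,
        "prompt": STRICT_PROMPT,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
        # Temperature 0 makes the output as deterministic as the model allows.
        "options": {"temperature": 0},
    }


def transcribe(image_path: str) -> str:
    """Send one image to the local Ollama server and return the reply text."""
    with open(image_path, "rb") as f:
        payload = build_payload(f.read())
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling `transcribe("screenshot.jpg")` (a placeholder file name) requires a running Ollama server with the model already pulled.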

@bdqnaccphantianyang

I have the same problem. Have you solved it?


2 participants