When I use the example from the README to process a JPG with llama3.2-vision:11b, I get a summary of the image instead of the exact text. I've verified that the prompt is correct ("Please look at this image and extract all the text content..."), yet the returned result reads like this:
The image shows a computer screen displaying multiple windows with text in a foreign language, likely Slovenian. The purpose of the image is to provide information about medical records or patient data.
Here are the details of the image:
... and then it goes on to describe the image instead of extracting the text.
I've also tried a screenshot with English text and the result is the same: the model summarizes the text on the screen instead of extracting/quoting it.
This is likely an issue with the model itself, but has anyone else run into this?
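For reference, here's a minimal sketch of what I'm doing, written against the ollama Python client; the README example may use a different wrapper, so the exact call could differ:

```python
import ollama

# Minimal repro sketch. Assumption: the call goes through the ollama
# Python client; the actual README example may wrap this differently.
response = ollama.chat(
    model="llama3.2-vision:11b",
    messages=[
        {
            "role": "user",
            # Prompt as quoted above (truncated with "..." in the original)
            "content": "Please look at this image and extract all the text content...",
            "images": ["screenshot.jpg"],  # path to the JPG/screenshot being OCR'd
        }
    ],
)

# Prints a description of the image rather than the extracted text.
print(response["message"]["content"])
```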
I haven't played with (smaller) open models much, but maybe this is what people mean by "the model is good, but it's bad at following instructions"?
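If it really is an instruction-following problem, a stricter system prompt might help. This is purely a speculative sketch against the ollama Python client, untested:

```python
import ollama

# Speculative tweak: pin the OCR behavior with a system message.
# Untested; the model may still ignore the instruction.
response = ollama.chat(
    model="llama3.2-vision:11b",
    messages=[
        {
            "role": "system",
            "content": "You are an OCR engine. Output only the text visible "
                       "in the image, verbatim. Do not describe the image.",
        },
        {
            "role": "user",
            "content": "Transcribe all text in this image exactly as it appears.",
            "images": ["screenshot.jpg"],  # hypothetical input path
        },
    ],
)
print(response["message"]["content"])
```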