When I use the example from the README to process a JPG with llama3.2-vision:11b, I get a summary of the image instead of the exact text. I've verified that the prompt is correct ("Please look at this image and extract all the text content..."), yet the returned result reads like this:
The image shows a computer screen displaying multiple windows with text in a foreign language, likely Slovenian. The purpose of the image is to provide information about medical records or patient data.
Here are the details of the image:
... and then it goes on to describe the image instead of extracting the text.
I've also tried a screenshot with English text and the result is the same: the model summarizes the text on the screen instead of extracting/quoting it.
This is likely an issue with the model itself, but has anyone else run into this?
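For reference, here's a minimal sketch of what I'm doing, written against the ollama Python client; the README example may use a different wrapper, so the exact call could differ:

```python
import ollama

# Minimal repro sketch. Assumption: the call goes through the ollama
# Python client; the actual README example may wrap this differently.
response = ollama.chat(
    model="llama3.2-vision:11b",
    messages=[
        {
            "role": "user",
            # Prompt as quoted above (truncated with "..." in the original)
            "content": "Please look at this image and extract all the text content...",
            "images": ["screenshot.jpg"],  # path to the JPG/screenshot being OCR'd
        }
    ],
)

# Prints a description of the image rather than the extracted text.
print(response["message"]["content"])
```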
I haven't played with (smaller) open models much, but maybe this is what people mean by "the model is good, but it's bad at following instructions"?
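If it really is an instruction-following problem, a stricter system prompt might help. This is purely a speculative sketch against the ollama Python client, untested:

```python
import ollama

# Speculative tweak: pin the OCR behavior with a system message.
# Untested; the model may still ignore the instruction.
response = ollama.chat(
    model="llama3.2-vision:11b",
    messages=[
        {
            "role": "system",
            "content": "You are an OCR engine. Output only the text visible "
                       "in the image, verbatim. Do not describe the image.",
        },
        {
            "role": "user",
            "content": "Transcribe all text in this image exactly as it appears.",
            "images": ["screenshot.jpg"],  # hypothetical input path
        },
    ],
)
print(response["message"]["content"])
```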