-
Notifications
You must be signed in to change notification settings - Fork 202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
How to run it from Python #4
Comments
this is a npm package for javascript/typescript. if you want to same functionality in python search for something similar on https://pypi.org/ or rewrite this script yourself using python. This package is quite simple to implement actually, so if you have any experience in python it should not be a problem to do |
@russssl i would suggest to write the package himself, but the model seem to be wrapped in the library together, unless it exists in pypi or open sourced then he can’t write the package in python. Please let me know if i’m wrong at something .. ? |
I made a python version (not tested a lot but it works) import os
import base64
from typing import Literal, Optional
from together import AsyncTogether
import requests
def encode_image(image_path: str) -> str:
"""Encode an image file to base64 string."""
with open(image_path, 'rb') as image_file:
return base64.b64encode(image_file.read()).decode('utf-8')
def is_remote_file(file_path: str) -> bool:
"""Check if the file path is a remote URL."""
return file_path.startswith(('http://', 'https://'))
async def get_markdown(together_client: AsyncTogether, vision_llm: str, file_path: str) -> str:
"""Generate markdown from image using Together AI."""
system_prompt = """Convert the provided image into Markdown format. Ensure that all content from the page is included, such as headers, footers, subtexts, images (with alt text if possible), tables, and any other elements.
Requirements:
- Output Only Markdown: Return solely the Markdown content without any additional explanations or comments.
- No Delimiters: Do not use code fences or delimiters like ```markdown.
- Complete Content: Do not omit any part of the page, including headers, footers, and subtext.
"""
final_image_url = file_path if is_remote_file(
file_path) else f"data:image/jpeg;base64,{encode_image(file_path)}"
output = await together_client.chat.completions.create(
model=vision_llm,
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": system_prompt},
{
"type": "image_url",
"image_url": {
"url": final_image_url
}
}
]
}
]
)
return output.choices[0].message.content
async def ocr(
file_path: str,
api_key: Optional[str] = None,
model: Literal["Llama-3.2-90B-Vision",
"Llama-3.2-11B-Vision", "free"] = "Llama-3.2-90B-Vision"
) -> str:
"""
Perform OCR on an image using Together AI.
Args:
file_path: Path to the image file or URL
api_key: Together AI API key (defaults to TOGETHER_API_KEY environment variable)
model: Model to use for vision processing
Returns:
Markdown string of the image content
"""
if api_key is None:
api_key = os.getenv("TOGETHER_API_KEY")
if not api_key:
raise ValueError(
"API key must be provided either directly or through TOGETHER_API_KEY environment variable")
vision_llm = f"meta-llama/{
model}-Instruct-Turbo" if model != "free" else "meta-llama/Llama-Vision-Free"
together_client = AsyncTogether(api_key=api_key)
final_markdown = await get_markdown(together_client, vision_llm, file_path)
return final_markdown
print(await ocr("image.png")) |
Hi,
Could you please provide guidance on how to run Llama OCR in Python?
Thanks
The text was updated successfully, but these errors were encountered: