How to run it from Python #4

Open
scotwilli opened this issue Nov 13, 2024 · 3 comments

Comments

@scotwilli

Hi,

Could you please provide guidance on how to run Llama OCR in Python?

Thanks

@russssl

russssl commented Nov 14, 2024

This is an npm package for JavaScript/TypeScript. If you want the same functionality in Python, search for something similar on https://pypi.org/ or rewrite this script yourself in Python. The package is actually quite simple to implement, so if you have any experience with Python it should not be a problem.

@lamoboos223

@russssl I would also suggest that he write the package himself, but the model seems to be bundled into the library. Unless it is available on PyPI or open-sourced, he can't write the package in Python. Please let me know if I'm wrong about anything.

@Pant

Pant commented Nov 16, 2024

I made a Python version (not tested much, but it works):

import base64
import os
from typing import Literal, Optional

from together import AsyncTogether


def encode_image(image_path: str) -> str:
    """Encode an image file to base64 string."""
    with open(image_path, 'rb') as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')


def is_remote_file(file_path: str) -> bool:
    """Check if the file path is a remote URL."""
    return file_path.startswith(('http://', 'https://'))


async def get_markdown(together_client: AsyncTogether, vision_llm: str, file_path: str) -> str:
    """Generate markdown from image using Together AI."""
    system_prompt = """Convert the provided image into Markdown format. Ensure that all content from the page is included, such as headers, footers, subtexts, images (with alt text if possible), tables, and any other elements.

    Requirements:

    - Output Only Markdown: Return solely the Markdown content without any additional explanations or comments.
    - No Delimiters: Do not use code fences or delimiters like ```markdown.
    - Complete Content: Do not omit any part of the page, including headers, footers, and subtext.
    """

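    # Remote URLs are passed through unchanged; local files are inlined as a
    # base64 data URL. Note that local files are always labelled image/jpeg,
    # whatever their real format.
    final_image_url = (
        file_path
        if is_remote_file(file_path)
        else f"data:image/jpeg;base64,{encode_image(file_path)}"
    )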

    output = await together_client.chat.completions.create(
        model=vision_llm,
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": system_prompt},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": final_image_url
                        }
                    }
                ]
            }
        ]
    )

    return output.choices[0].message.content


async def ocr(
    file_path: str,
    api_key: Optional[str] = None,
    model: Literal["Llama-3.2-90B-Vision",
                   "Llama-3.2-11B-Vision", "free"] = "Llama-3.2-90B-Vision"
) -> str:
    """
    Perform OCR on an image using Together AI.

    Args:
        file_path: Path to the image file or URL
        api_key: Together AI API key (defaults to TOGETHER_API_KEY environment variable)
        model: Model to use for vision processing

    Returns:
        Markdown string of the image content
    """
    if api_key is None:
        api_key = os.getenv("TOGETHER_API_KEY")
        if not api_key:
            raise ValueError(
                "API key must be provided either directly or through TOGETHER_API_KEY environment variable")

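    # Map the short model name to the full Together AI model identifier.
    # Kept on one line per f-string: a line break inside the {...} field is a
    # syntax error before Python 3.12.
    vision_llm = (
        "meta-llama/Llama-Vision-Free"
        if model == "free"
        else f"meta-llama/{model}-Instruct-Turbo"
    )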

    together_client = AsyncTogether(api_key=api_key)

    final_markdown = await get_markdown(together_client, vision_llm, file_path)

    return final_markdown


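# A bare top-level `await` only works in an async REPL or notebook;
# in a regular script, drive the coroutine with asyncio.run().
if __name__ == "__main__":
    import asyncio

    print(asyncio.run(ocr("image.png")))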
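
One caveat with the script above: it labels every local file as `image/jpeg` in the data URL, even though the example passes `image.png`. A minimal sketch of one way to derive the MIME type from the file extension, using only the standard library, might look like this. The helper name `to_data_url` and the `default_mime` parameter are hypothetical and not part of the original script, and whether the API actually rejects a mismatched MIME type has not been checked here.

```python
import base64
import mimetypes


def to_data_url(image_path: str, default_mime: str = "image/jpeg") -> str:
    """Build a base64 data URL whose MIME type is guessed from the file extension.

    Hypothetical helper (not in the original script). Falls back to
    `default_mime` when the extension is unknown or is not an image type.
    """
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        mime_type = default_mime

    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")

    return f"data:{mime_type};base64,{encoded}"
```

If this helper were adopted, the local-file branch in `get_markdown` could call `to_data_url(file_path)` in place of the hard-coded `data:image/jpeg;base64,...` string.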
