How to run it from Python #4

scotwilli · 2024-11-13T12:16:32Z

Hi,

Could you please provide guidance on how to run Llama OCR in Python?

Thanks

russssl · 2024-11-14T10:02:36Z

This is an npm package for JavaScript/TypeScript. If you want the same functionality in Python, search for something similar on https://pypi.org/ or rewrite this script yourself in Python. The package is actually quite simple to implement, so if you have any Python experience it shouldn't be a problem.

lamoboos223 · 2024-11-14T21:30:13Z

@russssl I would suggest that he write the package himself, but the model seems to be bundled together with the library. Unless it exists on PyPI or is open sourced, he can’t write the package in Python. Please let me know if I’m wrong about something.

Pant · 2024-11-16T17:29:44Z

I made a Python version (not tested much, but it works):

import base64
import os
from typing import Literal, Optional

from together import AsyncTogether


def encode_image(image_path: str) -> str:
    """Encode an image file to base64 string."""
    with open(image_path, 'rb') as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')


def is_remote_file(file_path: str) -> bool:
    """Check if the file path is a remote URL."""
    return file_path.startswith(('http://', 'https://'))


async def get_markdown(together_client: AsyncTogether, vision_llm: str, file_path: str) -> str:
    """Generate markdown from image using Together AI."""
    system_prompt = """Convert the provided image into Markdown format. Ensure that all content from the page is included, such as headers, footers, subtexts, images (with alt text if possible), tables, and any other elements.

    Requirements:

    - Output Only Markdown: Return solely the Markdown content without any additional explanations or comments.
    - No Delimiters: Do not use code fences or delimiters like ```markdown.
    - Complete Content: Do not omit any part of the page, including headers, footers, and subtext.
    """

    # Remote URLs are passed through as-is; local files are inlined as a base64 data URL.
    final_image_url = (
        file_path
        if is_remote_file(file_path)
        else f"data:image/jpeg;base64,{encode_image(file_path)}"
    )

    output = await together_client.chat.completions.create(
        model=vision_llm,
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": system_prompt},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": final_image_url
                        }
                    }
                ]
            }
        ]
    )

    return output.choices[0].message.content


async def ocr(
    file_path: str,
    api_key: Optional[str] = None,
    model: Literal[
        "Llama-3.2-90B-Vision", "Llama-3.2-11B-Vision", "free"
    ] = "Llama-3.2-90B-Vision",
) -> str:
    """
    Perform OCR on an image using Together AI.

    Args:
        file_path: Path to the image file or URL
        api_key: Together AI API key (defaults to TOGETHER_API_KEY environment variable)
        model: Model to use for vision processing

    Returns:
        Markdown string of the image content
    """
    if api_key is None:
        api_key = os.getenv("TOGETHER_API_KEY")
        if not api_key:
            raise ValueError(
                "API key must be provided either directly or through TOGETHER_API_KEY environment variable")

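    # "free" maps to the free-tier model; the other names get the Instruct-Turbo suffix.
    # (An f-string split across lines inside the braces is a syntax error before Python 3.12.)
    vision_llm = (
        "meta-llama/Llama-Vision-Free"
        if model == "free"
        else f"meta-llama/{model}-Instruct-Turbo"
    )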

    together_client = AsyncTogether(api_key=api_key)

    final_markdown = await get_markdown(together_client, vision_llm, file_path)

    return final_markdown


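# Top-level `await` only works in a notebook or an asyncio REPL;
# in a regular script, drive the coroutine with asyncio.run().
if __name__ == "__main__":
    import asyncio

    print(asyncio.run(ocr("image.png")))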
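
One detail in the snippet above: the data URL prefix is hardcoded to `image/jpeg`, so a local PNG such as `image.png` is labeled as JPEG. Whether the API cares about that label is not something this thread confirms, so treat the following as a minimal sketch rather than a required fix. It guesses the MIME type from the file extension with the standard-library `mimetypes` module and falls back to JPEG. `to_data_url` and `default_mime` are hypothetical names, not part of the original snippet.

```python
import base64
import mimetypes


def to_data_url(image_path: str, default_mime: str = "image/jpeg") -> str:
    """Build a base64 data URL, guessing the MIME type from the file extension.

    Hypothetical helper: falls back to `default_mime` when the extension is
    unknown or does not map to an image type.
    """
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        mime_type = default_mime
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

In `get_markdown`, the local-file branch could then call `to_data_url(file_path)` in place of the hardcoded f-string.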
