
Add support for PDFs in Claude and Gemini #265

Open: wants to merge 2 commits into main


walkerke commented Jan 23, 2025

Hi {ellmer} devs - thanks so much for developing this package. It's become a near-daily part of my work.

Both Claude and Gemini natively accept PDF files as input. This PR adds PDF content support for these two providers and handles both local and remote PDFs. In my document-processing work, I've found that sending the PDF directly to the model gives better results than a local OCR --> text --> LLM pipeline, so this feature in my fork of {ellmer} has been quite useful.

Here's how it works:

```r
library(ellmer)

# Reference a remote PDF by URL (local PDFs are also supported by this PR)
pdf <- content_pdf_url("https://cran.r-project.org/web/packages/ellmer/ellmer.pdf")

chat <- chat_claude()

# Send the PDF content alongside the text prompt
chat$chat("I've uploaded documentation for an R package.  Tell me the name and authors of the R package, and give me a concise summary on what you think the most significant use-cases are for the package.", pdf)
```
Based on the documentation, the 'ellmer' package (version 0.1.0) is authored by 
Hadley Wickham, Joe Cheng, and Posit Software, PBC. Here are what I see as the 
most significant use-cases for this package:

1. Unified Interface for Multiple LLM Providers:
- The package provides a consistent interface for interacting with various large 
language model providers including Claude, OpenAI, Azure, Bedrock, Databricks, 
Gemini, and others
- This allows R users to easily switch between different LLM providers while 
using the same code structure

2. Interactive Chat Capabilities:
- Supports both console and browser-based interactive chat interfaces
- Enables streaming responses in real-time
- Allows asynchronous calls for non-blocking operations

3. Structured Data Extraction:
- Provides tools for extracting structured data from LLM responses
- Includes type specification systems to ensure proper data formatting
- Supports conversion between JSON and R data structures

4. Tool Integration:
- Allows registration and calling of R functions as tools that the LLM can use
- Enables the LLM to programmatically interact with R functions
- Supports automated handling of the tool calling loop

5. Image Handling:
- Supports sending images to LLMs that have image understanding capabilities
- Can handle both remote image URLs and local image files
- Includes utilities for image resizing and format conversion

The package appears designed to make it easier for R users to incorporate large 
language models into their workflows while maintaining a consistent and familiar 
programming interface.

The only additional dependency is {qpdf}, added to Suggests, which lets users compress a PDF before uploading it when the file exceeds a provider's size limit.
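Because {qpdf} is only a suggested dependency, a helper along these lines could compress an oversized PDF before it is attached to a chat. This is a minimal sketch, assuming {qpdf} is installed. `compress_pdf_if_large()` and its 5 MB default threshold are hypothetical and are not part of {ellmer} or this PR. Actual provider size limits differ.

```r
# Minimal sketch; assumes {qpdf} is installed.
# compress_pdf_if_large() is a hypothetical helper, not an {ellmer} function.
compress_pdf_if_large <- function(path, max_bytes = 5 * 1024^2) {
  stopifnot(file.exists(path))

  # Already under the (assumed) limit: use the original file unchanged
  if (file.size(path) <= max_bytes) {
    return(path)
  }

  if (!requireNamespace("qpdf", quietly = TRUE)) {
    stop("Install {qpdf} to compress PDFs before uploading.", call. = FALSE)
  }

  # qpdf::pdf_compress() writes a copy with compressed streams to `output`
  out <- tempfile(fileext = ".pdf")
  qpdf::pdf_compress(path, output = out)

  if (file.size(out) > max_bytes) {
    warning("Compressed PDF is still larger than `max_bytes`.", call. = FALSE)
  }
  out
}

# Usage: write a small uncompressed PDF with base R, then force compression
src <- tempfile(fileext = ".pdf")
grDevices::pdf(src, compress = FALSE)
plot(1:100)
invisible(grDevices::dev.off())

compressed <- compress_pdf_if_large(src, max_bytes = file.size(src) - 1)
c(original = file.size(src), compressed = file.size(compressed))
```

The returned path could then be passed to the PR's local-PDF content function. Returning the original path when no compression is needed avoids rewriting files that already fit.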

Is this something that would interest you? If so, I'll fix the failing checks, add tests, and possibly add error handling for providers that don't yet support PDFs before marking the PR ready to merge.

hadley (Member) commented Jan 23, 2025

Yes, for sure! Let me know if there's anything I can do to help.

walkerke (Author) commented
Great, thanks @hadley! I'll let you know when the PR is ready for review.
