Add support for PDFs in Claude and Gemini #265

walkerke · 2025-01-23T14:55:09Z

Hi {ellmer} devs - thanks so much for developing this package. It's become a near-daily part of my work.

Both Claude and Gemini have native support for PDF files as content; this PR adds support for PDF files for these providers, and handles both local and remote PDFs. In my work with document processing, I've found that uploading the PDF directly gets better results than a local OCR --> text --> LLM workflow, so I'm finding this feature in my fork of {ellmer} quite useful.
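For context on what "native support" means for a local file, here is a minimal sketch of how a PDF can be packaged as a base64 document block in the shape Anthropic's Messages API documents. It is an illustration only, not necessarily how this PR builds its requests: the `block` variable and the throwaway PDF are assumptions made for the example, and it assumes {jsonlite} is installed.

```r
# Illustrative only: encode a local PDF as a base64 "document" content block.
# The variable names and the throwaway PDF are assumptions for this sketch.
library(jsonlite)

# Create a small PDF so the example is self-contained.
path <- tempfile(fileext = ".pdf")
grDevices::pdf(path)
plot(1:10)
invisible(grDevices::dev.off())

# Read the file as raw bytes and base64-encode it.
raw_bytes <- readBin(path, what = "raw", n = file.size(path))
b64 <- jsonlite::base64_enc(raw_bytes)

# Content block in the shape documented for Claude's PDF support.
block <- list(
  type = "document",
  source = list(
    type = "base64",
    media_type = "application/pdf",
    data = b64
  )
)

# Inspect the structure without printing the full payload.
str(block, max.level = 2, nchar.max = 40)
```

A remote PDF skips the encoding step on the client side, which is why a separate URL-based constructor such as `content_pdf_url()` is convenient.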

Here's how it works:

library(ellmer)

pdf <- content_pdf_url("https://cran.r-project.org/web/packages/ellmer/ellmer.pdf")

chat <- chat_claude()

chat$chat("I've uploaded documentation for an R package.  Tell me the name and authors of the R package, and give me a concise summary on what you think the most significant use-cases are for the package.", pdf)

Based on the documentation, the 'ellmer' package (version 0.1.0) is authored by 
Hadley Wickham, Joe Cheng, and Posit Software, PBC. Here are what I see as the 
most significant use-cases for this package:

1. Unified Interface for Multiple LLM Providers:
- The package provides a consistent interface for interacting with various large 
language model providers including Claude, OpenAI, Azure, Bedrock, Databricks, 
Gemini, and others
- This allows R users to easily switch between different LLM providers while 
using the same code structure

2. Interactive Chat Capabilities:
- Supports both console and browser-based interactive chat interfaces
- Enables streaming responses in real-time
- Allows asynchronous calls for non-blocking operations

3. Structured Data Extraction:
- Provides tools for extracting structured data from LLM responses
- Includes type specification systems to ensure proper data formatting
- Supports conversion between JSON and R data structures

4. Tool Integration:
- Allows registration and calling of R functions as tools that the LLM can use
- Enables the LLM to programmatically interact with R functions
- Supports automated handling of the tool calling loop

5. Image Handling:
- Supports sending images to LLMs that have image understanding capabilities
- Can handle both remote image URLs and local image files
- Includes utilities for image resizing and format conversion

The package appears designed to make it easier for R users to incorporate large 
language models into their workflows while maintaining a consistent and familiar 
programming interface.

The only additional dependency is {qpdf}, added to Suggests, which allows users to compress PDFs before uploading in the event of size limitations.
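As a rough sketch of that compression step, the snippet below runs `qpdf::pdf_compress()` on a throwaway PDF and compares file sizes. It assumes {qpdf} is installed; the file paths are made up for the example, and whether the PR calls this function internally or leaves it to the user is not shown here.

```r
# Sketch: compress a local PDF with {qpdf} before uploading it.
# Paths are illustrative; a real workflow would point at an existing file.
library(qpdf)

# Create a small PDF so the example is self-contained.
input <- tempfile(fileext = ".pdf")
grDevices::pdf(input)
plot(1:10)
invisible(grDevices::dev.off())

# pdf_compress() writes a compressed copy to `output` and returns its path.
output <- tempfile(fileext = ".pdf")
compressed <- qpdf::pdf_compress(input, output = output)

# Compare sizes in bytes before deciding which file to send.
sizes <- c(original = file.size(input), compressed = file.size(compressed))
print(sizes)
```

The savings depend on how the PDF was produced; a file that is already compressed may barely shrink, so checking the sizes first is worthwhile.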

Is this something that would interest you? If so, I'll clean up the failing checks, add some tests, and potentially add error handling for providers that don't yet support PDFs before flagging it as ready to merge.

hadley · 2025-01-23T19:47:45Z

Yes, for sure! Let me know if there's anything I can do to help.

walkerke · 2025-01-24T17:17:12Z

Great, thanks @hadley! I'll let you know when the PR is ready for review.

walkerke added 2 commits January 22, 2025 09:04

support for PDFs in Claude and Gemini

1502688

declare the url param

a26f8c1
