Commit

Merge branch 'main' into fix_convert_messages

jxnl authored Nov 23, 2024
2 parents 23f938c + ff4c1f1 commit feb7b50
Showing 126 changed files with 6,563 additions and 4,388 deletions.
6 changes: 5 additions & 1 deletion docs/blog/.authors.yml
@@ -27,4 +27,8 @@ authors:
name: Thierry Jean
description: Contributor
avatar: https://avatars.githubusercontent.com/u/68975210?v=4
url: https://www.linkedin.com/in/thierry-jean/
yanomaly:
name: Yan
description: Contributor
avatar: https://avatars.githubusercontent.com/u/87994542?v=4
13 changes: 6 additions & 7 deletions docs/blog/index.md
@@ -47,14 +47,13 @@ If you want to get updates on new features and tips on how to use Instructor, you

## Integrations and Tools

-- [Ollama Integration](../hub/ollama.md)
-- [llama-cpp-python Integration](../hub/llama-cpp-python.md)
-- [Anyscale Integration](../hub/anyscale.md)
-- [Together Compute Integration](../hub/together.md)
-- [Extracting Data into Pandas DataFrame using GPT-3.5 Turbo](../hub/pandas_df.md)
-- [Implementing Streaming Partial Responses with Field-Level Streaming](../hub/partial_streaming.md)
+- [Ollama Integration](../integrations/ollama.md)
+- [llama-cpp-python Integration](../integrations/llama-cpp-python.md)
+- [Together Compute Integration](../integrations/together.md)
+- [Pandas DataFrame Examples](../examples/bulk_classification.md#working-with-dataframes)
+- [Streaming Response Examples](../examples/bulk_classification.md#streaming-responses)

## Media and Resources

- [Course: Structured Outputs with Instructor](https://www.wandb.courses/courses/steering-language-models?x=1)
- [Keynote: Pydantic is All You Need](posts/aisummit-2023.md)
12 changes: 6 additions & 6 deletions docs/blog/posts/best_framework.md
@@ -32,7 +32,7 @@ from pydantic import BaseModel
import instructor

class User(BaseModel):
name: str
age: int

client = instructor.from_openai(openai.OpenAI())
@@ -42,7 +42,7 @@ user = client.chat.completions.create(
response_model=User, # (1)!
messages=[
{
"role": "user",
"role": "user",
"content": "Extract the user's name and age from this: John is 25 years old"
}
]
@@ -63,14 +63,14 @@ Other features of Instructor, both in and beyond the core library, are:
2. Ability to use [Pydantic's validation context](../../concepts/reask_validation.md)
3. [Parallel Tool Calling](../../concepts/parallel.md) with correct types
4. Streaming [Partial](../../concepts/partial.md) and [Iterable](../../concepts/iterable.md) data.
5. Returning [Primitive](../../concepts/types.md) Types and [Unions](../../concepts/unions.md) as well!
-6. Lots, and Lots of [Cookbooks](../../examples/index.md), [Tutorials](../../tutorials/1-introduction.ipynb), Documentation and even [instructor hub](../../hub/index.md)
+6. Lots of [Cookbooks](../../examples/index.md), [Tutorials](../../tutorials/1-introduction.ipynb), and comprehensive Documentation in our [Integration Guides](../../integrations/index.md)

## Instructor's Broad Applicability

One of the key strengths of Instructor is that it's designed as a lightweight patch over the official OpenAI Python SDK. This means it can be easily integrated not just with OpenAI's hosted API service, but with any provider or platform that exposes an interface compatible with the OpenAI SDK.

-For example, providers like [Anyscale](../../hub/anyscale.md), [Together](../../hub/together.md), [Ollama](../../hub/ollama.md), [Groq](../../hub/groq.md), and [llama-cpp-python](../../hub/llama-cpp-python.md) all either use or mimic the OpenAI Python SDK under the hood. With Instructor's zero-overhead patching approach, teams can immediately start deriving structured data outputs from any of these providers. There's no need for custom integration work.
+For example, providers like [Together](../../integrations/together.md), [Ollama](../../integrations/ollama.md), [Groq](../../integrations/groq.md), and [llama-cpp-python](../../integrations/llama-cpp-python.md) all either use or mimic the OpenAI Python SDK under the hood. With Instructor's zero-overhead patching approach, teams can immediately start deriving structured data outputs from any of these providers. There's no need for custom integration work.

## Direct access to the messages array

@@ -84,4 +84,4 @@ This incremental, zero-overhead adoption path makes Instructor perfect for sprin

And if you decide Instructor isn't a good fit after all, removing it is as simple as not applying the patch! The familiarity and flexibility of working directly with the OpenAI SDK is a core strength.

Instructor solves the "string hell" of unstructured LLM outputs. It allows teams to easily realize the full potential of tools like GPTs by mapping their text to type-safe, validated data structures. If you're looking to get more structured value out of LLMs, give Instructor a try!
108 changes: 108 additions & 0 deletions docs/blog/posts/chat-with-your-pdf-with-gemini.md
@@ -0,0 +1,108 @@
---
authors:
- ivanleomk
categories:
- Gemini
- Document Processing
comments: true
date: 2024-11-11
description: Learn how to use Google's Gemini model with Instructor to process PDFs and extract structured information
draft: false
tags:
- Gemini
- Document Processing
- PDF Analysis
- Pydantic
- Python
---

# PDF Processing with Structured Outputs with Gemini

In this post, we'll explore how to use Google's Gemini model with Instructor to analyse the [Gemini 1.5 Pro Paper](https://github.com/google-gemini/generative-ai-python/blob/0e5c5f25fe4ce266791fa2afb20d17dee780ca9e/third_party/test.pdf) and extract a structured summary.

## The Problem

Processing PDFs programmatically has always been painful. The typical approaches all have significant drawbacks:

- **PDF parsing libraries** require complex rules and break easily
- **OCR solutions** are slow and error-prone
- **Specialized PDF APIs** are expensive and require additional integration
- **LLM solutions** often need complex document chunking and embedding pipelines

What if we could just hand a PDF to an LLM and get structured data back? With Gemini's multimodal capabilities and Instructor's structured output handling, we can do exactly that.

## Quick Setup

First, install the required packages:

```bash
pip install "instructor[google-generativeai]"
```

Then, here's all the code you need:

```python
import instructor
import google.generativeai as genai
from google.ai.generativelanguage_v1beta.types.file import File
from pydantic import BaseModel
import time

# Initialize the client
client = instructor.from_gemini(
client=genai.GenerativeModel(
model_name="models/gemini-1.5-flash-latest",
)
)

# Define your output structure
class Summary(BaseModel):
summary: str

# Upload the PDF
file = genai.upload_file("path/to/your.pdf")

# Wait for file to finish processing
while file.state != File.State.ACTIVE:
time.sleep(1)
file = genai.get_file(file.name)
print(f"File is still uploading, state: {file.state}")

print(f"File is now active, state: {file.state}")
print(file)

resp = client.chat.completions.create(
messages=[
{"role": "user", "content": ["Summarize the following file", file]},
],
response_model=Summary,
)

print(resp.summary)
```

??? note "Expand to see Raw Results"

    ```text
    summary="Gemini 1.5 Pro is a highly compute-efficient multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. It achieves near-perfect recall on long-context retrieval tasks across modalities, improves the state-of-the-art in long-document QA, long-video QA and long-context ASR, and matches or surpasses Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Gemini 1.5 Pro is built to handle extremely long contexts; it has the ability to recall and reason over fine-grained information from up to at least 10M tokens. This scale is unprecedented among contemporary large language models (LLMs), and enables the processing of long-form mixed-modality inputs including entire collections of documents, multiple hours of video, and almost five days long of audio. Gemini 1.5 Pro surpasses Gemini 1.0 Pro and performs at a similar level to 1.0 Ultra on a wide array of benchmarks while requiring significantly less compute to train. It can recall information amidst distractor context, and it can learn to translate a new language from a single set of linguistic documentation. With only instructional materials (a 500-page reference grammar, a dictionary, and ≈ 400 extra parallel sentences) all provided in context, Gemini 1.5 Pro is capable of learning to translate from English to Kalamang, a Papuan language with fewer than 200 speakers, and therefore almost no online presence."
    ```

## Benefits

The combination of Gemini and Instructor offers several key advantages over traditional PDF processing approaches:

**Simple Integration** - Unlike traditional approaches that require complex document processing pipelines, chunking strategies, and embedding databases, you can directly process PDFs with just a few lines of code. This dramatically reduces development time and maintenance overhead.

**Structured Output** - Instructor's Pydantic integration ensures you get exactly the data structure you need. The model's outputs are automatically validated and typed, making it easier to build reliable applications. If the extraction fails, Instructor automatically handles the retries for you with support for [custom retry logic using tenacity](../../concepts/retrying.md).
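
That retry loop is driven by ordinary Pydantic validation. Here's a minimal sketch (assuming only `pydantic` is installed; `ValidatedSummary` is a variant of the `Summary` model above that we add for illustration, not part of the original example):

```python
from pydantic import BaseModel, field_validator


class ValidatedSummary(BaseModel):
    summary: str

    @field_validator("summary")
    @classmethod
    def summary_not_empty(cls, v: str) -> str:
        # When this raises, instructor appends the validation error to the
        # conversation and asks the model to try again.
        if not v.strip():
            raise ValueError("summary must not be empty")
        return v
```

Passing `response_model=ValidatedSummary` (optionally with `max_retries=3`) to the same `create` call is all it takes; nothing else about the request changes.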

**Multimodal Support** - Gemini's multimodal capabilities mean this same approach works for various file types. You can process images, videos, and audio files all in the same API request. Check out our [multimodal processing guide](./multimodal-gemini.md) to see how we extract structured data from travel videos.

## Conclusion

Working with PDFs doesn't have to be complicated.

By combining Gemini's multimodal capabilities with Instructor's structured output handling, we can transform complex document processing into simple, Pythonic code.

No more wrestling with parsing rules, managing embeddings, or building complex pipelines – just define your data model and let the LLM do the heavy lifting.

If you liked this, give `instructor` a try today and see how much easier working with LLMs becomes with structured outputs. [Get started with Instructor today!](../../index.md)
170 changes: 170 additions & 0 deletions docs/blog/posts/generating-pdf-citations.md
@@ -0,0 +1,170 @@
---
authors:
- ivanleomk
categories:
- Gemini
- Document Processing
comments: true
date: 2024-11-15
description: Generate accurate citations and eliminate hallucinations with structured outputs using Gemini.
draft: false
tags:
- Gemini
- Document Processing
- PDF Analysis
- Pydantic
- Python
---

# Eliminating Hallucinations with Structured Outputs using Gemini

In this post, we'll explore how to use Google's Gemini model with Instructor to generate accurate citations from PDFs. This approach ensures that answers are grounded in the actual content of the PDF, reducing the risk of hallucinations.

We'll be using the Nvidia 10k report for this example which you can download at this [link](https://d18rn0p25nwr6d.cloudfront.net/CIK-0001045810/78501ce3-7816-4c4d-8688-53dd140df456.pdf).

<!-- more -->

## Introduction

When processing PDFs, it's crucial to ensure that any answers or insights derived are directly linked to the source material. This is especially important in applications where users need to verify the origin of information, such as legal or academic contexts.

We're using PyMuPDF here to handle PDF parsing, but you can use any other library you want. As your citations get more complex, you'll want to invest more time in validating them against the source document.
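
One cheap sanity check, sketched below using only the standard library (the function name and threshold are our own choices, not part of Instructor), is to fuzzy-match each cited snippet against the text of the page it claims to come from:

```python
from difflib import SequenceMatcher


def citation_matches_page(snippet: str, page_text: str, threshold: float = 0.9) -> bool:
    """Return True if some window of page_text closely matches the snippet."""
    # Normalize whitespace and case so line breaks in the PDF don't matter.
    snippet = " ".join(snippet.split()).lower()
    page_text = " ".join(page_text.split()).lower()
    if snippet in page_text:
        return True
    # Slide a snippet-sized window across the page and keep the best ratio.
    window = len(snippet)
    best = 0.0
    for start in range(0, max(1, len(page_text) - window + 1), max(1, window // 4)):
        ratio = SequenceMatcher(None, snippet, page_text[start : start + window]).ratio()
        best = max(best, ratio)
    return best >= threshold
```

With PyMuPDF you would call this with `page.get_text()` for the cited page and flag any citation that fails the check before trusting it.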

## Setting Up the Environment

First, let's set up our environment with the necessary libraries:

```bash
pip install "instructor[google-generativeai]" pymupdf
```

Then let's import the necessary libraries:

```python
import instructor
import google.generativeai as genai
from google.ai.generativelanguage_v1beta.types.file import File
from pydantic import BaseModel
import pymupdf
import time
```

## Defining Our Data Models

We'll use Pydantic to define our data models for citations and answers:

```python
class Citation(BaseModel):
reason_for_relevance: str
text: list[str]
page_number: int

class Answer(BaseModel):
chain_of_thought: str
citations: list[Citation]
answer: str
```

## Initializing the Gemini Client

Next, we'll set up our Gemini client using Instructor:

```python
client = instructor.from_gemini(
client=genai.GenerativeModel(
model_name="models/gemini-1.5-pro-latest",
)
)
```

## Processing the PDF

To analyze a PDF and generate citations, follow these steps:

```python
pdf_path = "./10k.pdf"
doc = pymupdf.open(pdf_path)

# Upload the PDF
file = genai.upload_file(pdf_path)

# Wait for file to finish processing
while file.state != File.State.ACTIVE:
time.sleep(1)
file = genai.get_file(file.name)
print(f"File is still uploading, state: {file.state}")

resp: Answer = client.chat.completions.create(
messages=[
{
"role": "system",
"content": "You are a helpful assistant that can answer questions about the provided pdf file. You will be given a question and a pdf file. Your job is to answer the question using the information in the pdf file. Provide all citations that are relevant to the question and make sure that the coordinates are accurate.",
},
{
"role": "user",
"content": [
"What were all of the export restrictions announced by the USG in 2023? What chips did they affect?",
file,
],
},
],
response_model=Answer,
)

print(resp)
# Answer(
# chain_of_thought="The question asks about export restrictions in 2023. Page 25 mentions the USG announcing licensing requirements for A100 and H100 chips in August 2022, and additional licensing requirements for a subset of these products in July 2023.",
# citations=[
# Citation(
# reason_for_relevance="Describes the export licensing requirements and which chips they affect.",
# text=[
# "In August 2022, the U.S. government, or the USG, announced licensing requirements that, with certain exceptions, impact exports to China (including Hong",
# "Kong and Macau) and Russia of our A100 and H100 integrated circuits, DGX or any other systems or boards which incorporate A100 or H100 integrated circuits.",
# "In July 2023, the USG informed us of an additional licensing requirement for a subset of A100 and H100 products destined to certain customers and other",
# "regions, including some countries in the Middle East.",
# ],
# page_number=25,
# )
# ],
# answer="In 2023, the U.S. government (USG) announced new licensing requirements for the export of certain chips to China, Russia, and other countries. These chips included the A100 and H100 integrated circuits, the DGX system, and any other systems or boards incorporating the A100 or H100 chips.",
# )

```

## Highlighting Citations in the PDF

Once you have the citations, you can highlight them in the PDF:

```python
for citation in resp.citations:
page = doc.load_page(citation.page_number - 1)
for text in citation.text:
text_instances = page.search_for(text)
for instance in text_instances:
page.add_highlight_annot(instance)

doc.save("./highlighted.pdf")
doc.close()
```

In our case, we can see that the citations are accurate and the answer is correct.

![Gemini Citations](./img/gemini_citations.png)

## Why Structured Outputs?

One of the significant advantages of using structured outputs is the ability to handle complex data extraction tasks with ease and reliability. When dealing with raw completion strings or JSON data, developers often face challenges related to parsing complexity and code maintainability.

Over time, this approach becomes error-prone, difficult to iterate on, and impossible to maintain. By leveraging Pydantic instead, you get access to one of the best tools available for validating and parsing data.

1. Ease of Definition: Pydantic allows you to define data models with specific fields effortlessly. This makes it easy to understand and maintain the structure of your data.
2. Robust Validation: With Pydantic, you can build validators to test against various edge cases, ensuring that your data is accurate and reliable. This is particularly useful when working with PDFs and citations, as you can validate the extracted data without worrying about the underlying language model.
3. Separation of Concerns: By using structured outputs, the language model's role is reduced to a single function call. This separation allows you to focus on building reliable and efficient data processing pipelines without being bogged down by the intricacies of the language model.
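
Point 2 can be made concrete. The sketch below (our illustration, assuming `pydantic` v2; the page-count bound is an assumption we add, not part of the original example) rejects citations that point past the end of the document:

```python
from pydantic import BaseModel, ValidationInfo, field_validator


class ValidatedCitation(BaseModel):
    reason_for_relevance: str
    text: list[str]
    page_number: int

    @field_validator("page_number")
    @classmethod
    def page_in_range(cls, v: int, info: ValidationInfo) -> int:
        # total_pages is supplied at validation time, e.g.
        # context={"total_pages": doc.page_count} from PyMuPDF.
        total = (info.context or {}).get("total_pages")
        if v < 1 or (total is not None and v > total):
            raise ValueError(f"page {v} is outside the document")
        return v
```

Calling `ValidatedCitation.model_validate(data, context={"total_pages": doc.page_count})` lets you check extracted citations against the real document without ever touching the language model.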

In summary, structured outputs with Pydantic provide a powerful and ergonomic way to manage complex data extraction tasks. They enhance reliability, simplify code maintenance, and enable developers to build better applications with less effort.

## Conclusion

By using Gemini and Instructor, you can generate accurate citations from PDFs, ensuring that your answers are grounded in the source material. This approach is invaluable for applications requiring high levels of accuracy and traceability.

Give Instructor a try today and see how you can build reliable applications. Just run `pip install instructor` or check out our [Getting Started Guide](../../index.md).
2 changes: 2 additions & 0 deletions docs/blog/posts/google-openai-client.md
@@ -22,6 +22,8 @@ If you're unfamiliar with instructor, we provide a simple interface to get struc

This makes it easy to switch between providers, get reliable outputs from language models and ultimately build production grade LLM applications.

<!-- more -->

## The current state

The new integration works directly with the OpenAI client, which means function calling with Gemini models has become much easier. We no longer need a Gemini-specific library like `vertexai` or `google.generativeai` to define response models.
Binary file added docs/blog/posts/img/gemini_citations.png
Binary file added docs/blog/posts/img/untidy_table.png
