updated readme to use extract_from_file example
emcf committed Oct 31, 2024
1 parent 59e50f4 commit e922fb8
Showing 1 changed file with 11 additions and 15 deletions.
26 changes: 11 additions & 15 deletions README.md
@@ -87,11 +87,8 @@ The output from thepi.pe is a list of chunks containing all content within the s
 The extract function allows you to extract structured data from documents. You can use it as follows:

 ```python
-from thepipe.extract import extract_from_chunk
-from thepipe.scraper import scrape_file
-
-# First, scrape the document
-chunks = scrape_file(filepath="document.pdf", ai_extraction=True)
+from thepipe.extract import extract_from_file
+import json

 # Define your schema
 schema = {
@@ -100,16 +97,15 @@ schema = {
     "is_student": "bool"
 }

-# Extract data from each chunk
-for chunk in chunks:
-    result, tokens_used = extract_from_chunk(
-        chunk=chunk,
-        schema=json.dumps(schema),
-        ai_model="gpt-4o-mini",
-        multiple_extractions=True
-    )
-    print(result)
-    print(f"Tokens used: {tokens_used}")
+# Extract data from the file
+result = extract_from_file(
+    file_path="document.pdf",
+    schema=schema,
+    ai_model="gpt-4o-mini",
+    multiple_extractions=True
+)
+
+print(result)
 ```

 ### Local Installation (Python)
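The README example passes a schema that maps field names to type names such as `"bool"`. As a rough sketch of how a caller might sanity-check one extracted record against a schema of that shape, the snippet below uses only the standard library and does not call thepipe. Everything in it is an assumption for illustration: the helper `find_schema_mismatches`, the type names `"string"` and `"int"` (only `"bool"` appears in the diff), the example field names, and the idea that an extracted record is a plain dictionary keyed by the schema's field names, which the diff does not confirm.

```python
# Assumed mapping from schema type names to Python types. Only "bool" appears
# in the README diff; "string" and "int" are illustrative guesses.
TYPE_NAMES = {"string": str, "int": int, "bool": bool}


def find_schema_mismatches(record, schema):
    """Return a list of problems; an empty list means the record fits the schema."""
    problems = []
    for field, type_name in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
            continue
        expected = TYPE_NAMES.get(type_name)
        if expected is None:
            problems.append(f"unknown type name for {field}: {type_name}")
            continue
        value = record[field]
        # bool is a subclass of int in Python, so True/False must not pass as "int".
        is_bool_for_non_bool = isinstance(value, bool) and expected is not bool
        if is_bool_for_non_bool or not isinstance(value, expected):
            problems.append(
                f"wrong type for {field}: expected {expected.__name__}, "
                f"got {type(value).__name__}"
            )
    return problems


# Hypothetical schema and records, not taken from the README.
example_schema = {"title": "string", "page_count": "int", "is_student": "bool"}
good = {"title": "Report", "page_count": 3, "is_student": False}
bad = {"title": "Report", "page_count": True}

print(find_schema_mismatches(good, example_schema))
print(find_schema_mismatches(bad, example_schema))
```

Returning a list of problems rather than raising on the first one lets a caller log every mismatch in a record at once, which may be more useful when checking many extractions.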