clarified token limit behaviour in readme
emcf committed Apr 21, 2024
1 parent b9a57bc commit d9aea24
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions README.md
@@ -90,6 +90,9 @@ The input source is either a file path, a URL, or a directory. The pipe will ext
},
]
```
If you want to feed these messages directly into a model, be mindful of the model's token limit.
OpenAI also caps the number of images allowed in a single prompt (see the discussion [here](https://community.openai.com/t/gpt-4-vision-maximum-amount-of-images/573110/6)), so long files should be extracted with `text_only=True` to avoid hitting these limits.
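
For example, here is a minimal sketch of text-only extraction. The import path and `extract` entry point are assumptions based on this README's examples and may differ in the version you have installed; only the `text_only` flag is taken directly from the text above.

```python
# Minimal sketch: extract only text from a long document so the resulting
# messages stay within the model's token and per-prompt image limits.
# The import path and `extract` entry point are assumptions; check the
# installed package for the exact API.
from thepipe_api import thepipe

messages = thepipe.extract("long_report.pdf", text_only=True)
```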

The text and images from these messages may also be prepared for a vector database with `thepipe.core.create_chunks_from_messages` or for downstream use with RAG frameworks. [LiteLLM](https://github.com/BerriAI/litellm) can be used to easily integrate The Pipe with any LLM provider.
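
As an illustration, the extracted messages can be sent to a vision-capable model through LiteLLM. This is a minimal sketch, not the project's prescribed workflow: the model name is illustrative, and it assumes `messages` comes from the extraction step above and that an `OPENAI_API_KEY` is set in the environment.

```python
# Minimal sketch: pass the extracted messages to a vision-capable model via LiteLLM.
# Assumes `messages` was produced by the extraction step above, the model name is
# illustrative, and an OPENAI_API_KEY environment variable is set.
from litellm import completion

response = completion(
    model="gpt-4-vision-preview",
    messages=messages,
)
print(response.choices[0].message.content)
```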

It uses a variety of heuristics for optimal performance with vision-language models, including AI-powered filetype detection with [Magika](https://opensource.googleblog.com/2024/02/magika-ai-powered-fast-and-efficient-file-type-identification.html), opt-in AI [table, equation, and figure extraction](https://thepi.pe/pricing), efficient [token compression](https://arxiv.org/abs/2403.12968), automatic [image encoding](https://en.wikipedia.org/wiki/Base64), [reranking](https://arxiv.org/abs/2310.06839) to mitigate [lost-in-the-middle](https://arxiv.org/abs/2307.03172) effects, and more, all pre-built to work out-of-the-box.
