clarified token limit behaviour in readme
emcf committed Apr 21, 2024
1 parent b9a57bc commit d9aea24
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions README.md
@@ -90,6 +90,9 @@ The input source is either a file path, a URL, or a directory. The pipe will ext
},
]
```
If you want to feed these messages directly into a model, be mindful of the model's token limit.
OpenAI also caps the number of images allowed in a single prompt (see the discussion [here](https://community.openai.com/t/gpt-4-vision-maximum-amount-of-images/573110/6)), so long files should be extracted with `text_only=True` to avoid hitting these limits.
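
For example, here is a minimal sketch of text-only extraction. The import path and `extract` entry point are assumptions based on this README's examples and may differ in the version you have installed; only the `text_only` flag is taken directly from the text above.

```python
# Minimal sketch: extract only text from a long document so the resulting
# messages stay within the model's token and per-prompt image limits.
# The import path and `extract` entry point are assumptions; check the
# installed package for the exact API.
from thepipe_api import thepipe

messages = thepipe.extract("long_report.pdf", text_only=True)
```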

The text and images from these messages may also be prepared for a vector database with `thepipe.core.create_chunks_from_messages` or for downstream use with RAG frameworks. [LiteLLM](https://github.com/BerriAI/litellm) can be used to easily integrate The Pipe with any LLM provider.
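
As an illustration, the extracted messages can be sent to a vision-capable model through LiteLLM. This is a minimal sketch, not the project's prescribed workflow: the model name is illustrative, and it assumes `messages` comes from the extraction step above and that an `OPENAI_API_KEY` is set in the environment.

```python
# Minimal sketch: pass the extracted messages to a vision-capable model via LiteLLM.
# Assumes `messages` was produced by the extraction step above, the model name is
# illustrative, and an OPENAI_API_KEY environment variable is set.
from litellm import completion

response = completion(
    model="gpt-4-vision-preview",
    messages=messages,
)
print(response.choices[0].message.content)
```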

It uses a variety of heuristics for optimal performance with vision-language models, including AI-powered filetype detection with [Magika](https://opensource.googleblog.com/2024/02/magika-ai-powered-fast-and-efficient-file-type-identification.html), opt-in AI [table, equation, and figure extraction](https://thepi.pe/pricing), efficient [token compression](https://arxiv.org/abs/2403.12968), automatic [image encoding](https://en.wikipedia.org/wiki/Base64), [reranking](https://arxiv.org/abs/2310.06839) to mitigate [lost-in-the-middle](https://arxiv.org/abs/2307.03172) effects, and more, all pre-built to work out-of-the-box.
