From 0c08dd57f0b66219da80f475043bfd76a0bca65b Mon Sep 17 00:00:00 2001
From: Paul Cornell
Date: Mon, 23 Jun 2025 14:47:47 -0700
Subject: [PATCH] Open source: additional limits

---
 open-source/introduction/overview.mdx | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/open-source/introduction/overview.mdx b/open-source/introduction/overview.mdx
index e5fe065e..04c2aab3 100644
--- a/open-source/introduction/overview.mdx
+++ b/open-source/introduction/overview.mdx
@@ -46,17 +46,20 @@ The Unstructured open source library has the following limits as compared to the
 
 * Not designed for production scenarios.
 * Significantly decreased performance on document and table extraction.
-* Access only to older and less sophisticated vision transformer models.
+* No access to Unstructured's latest vision language model (VLM) offerings.
 * No access to Unstructured's fine-tuned OCR models.
 * No access to Unstructured's by-page and by-similarity chunking strategies.
-* Lack of security and SOC2 and HIPAA compliance.
-* No authentication or identity management.
+* No support for generating embeddings in the core open source offering. (However, there is limited availability for this in the open source's
+  [Unstructured Ingest CLI](/open-source/ingestion/ingest-cli) and [Unstructured Ingest Python library](/open-source/ingestion/python-ingest) offerings).
+* No support for Unstructured's enrichment types such as image descriptions, table descriptions, and named entity recognition (NER).
+* Lack of support for SOC2 Type 2, HIPAA, and GDPR compliance.
+* No authentication or identity management in the core open source offering for local document processing.
 * No incremental data loading.
 * No ETL job scheduling or monitoring.
 * No image extraction from documents.
 * Less sophisticated document hierarchy detection.
 * You must manage many of your own code dependencies, for instance for libraries such as Poppler and Tesseract.
-* You must manage your own infrastructure, including parallelization and other performance optimizations.
+* For local document processing, you must manage your own infrastructure, including parallelization and other performance optimizations.
 
 ## Pricing
 