Open source: additional limits #669

Open: wants to merge 1 commit into base: main
11 changes: 7 additions & 4 deletions open-source/introduction/overview.mdx
@@ -46,17 +46,20 @@ The Unstructured open source library has the following limits as compared to the

* Not designed for production scenarios.
* Significantly decreased performance on document and table extraction.
* Access only to older and less sophisticated vision transformer models.
* No access to Unstructured's latest vision language model (VLM) offerings.
* No access to Unstructured's fine-tuned OCR models.
* No access to Unstructured's by-page and by-similarity chunking strategies.
* Lack of security and SOC2 and HIPAA compliance.
* No authentication or identity management.
* No support for generating embeddings in the core open source offering. (However, there is limited availability for this in the open source's
Collaborator commented:

  • It could be helpful to mention the Unstructured GitHub repo when referring to the core open source offering.
  • We could also define what "limited availability" means for the embedding service within Unstructured Ingest, such as the available models or BYOM (Bring Your Own Model).

[Unstructured Ingest CLI](/open-source/ingestion/ingest-cli) and [Unstructured Ingest Python library](/open-source/ingestion/python-ingest) offerings).
* No support for Unstructured's enrichment types such as image descriptions, table descriptions, and named entity recognition (NER).
* Lack of support for SOC2 Type 2, HIPAA, and GDPR compliance.
* No authentication or identity management in the core open source offering for local document processing.
* No incremental data loading.
* No ETL job scheduling or monitoring.
* No image extraction from documents.
* Less sophisticated document hierarchy detection.
* You must manage many of your own code dependencies, for instance for libraries such as Poppler and Tesseract.
* You must manage your own infrastructure, including parallelization and other performance optimizations.
* For local document processing, you must manage your own infrastructure, including parallelization and other performance optimizations.
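
The last few bullets (managing dependencies such as Poppler and Tesseract, and managing your own parallelization) can be illustrated with a minimal sketch. It is not part of this PR or of the Unstructured library. The executable names `pdftoppm` and `tesseract` are assumptions about what typical Poppler and Tesseract installs place on `PATH`, and `missing_tools`, `process_in_parallel`, and `process_one` are hypothetical names used only for illustration.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Sequence

# Assumption: a Poppler install provides `pdftoppm` and a Tesseract install
# provides `tesseract` on PATH. Adjust the names for your platform.
REQUIRED_TOOLS = ("pdftoppm", "tesseract")


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the executables from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def process_in_parallel(
    paths: Sequence[str],
    process_one: Callable[[str], object],
    max_workers: int = 4,
) -> list:
    """Apply `process_one` to every path using a thread pool.

    Results come back in the same order as `paths`. `process_one` is a
    hypothetical stand-in for whatever per-document processing you run locally.
    For CPU-bound work, a process pool may be a better fit than threads.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_one, paths))
```

A caller might check `missing_tools()` once at startup and fail fast if the returned list is not empty, then pass its own per-document function to `process_in_parallel`.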

## Pricing
