Unstructured.IO: ETL for LLMs

Welcome to Unstructured.IO! We're here on a mission to make all of your documents available for LLM applications, from PDFs and Word Docs to emails and markdown. To get started, check out our open source offerings.

Tried the open source library and ready for more power? Check out our products page to learn more about our paid API and the Unstructured Platform, an ETL tool built around our core file transformation capabilities.

Learn more

Section | Description
Company Website | Unstructured.io product and company info
Documentation | Full unstructured documentation

Popular repositories

  1. unstructured Public

    Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

    HTML 11.1k 920

  2. unstructured-api Public

    Python 720 161

  3. unstructured-inference Public

    Python 179 60

  4. pipeline-sec-filings Public archive

    Preprocessing pipeline notebooks and API supporting text extraction from SEC documents

    Jupyter Notebook 144 32

  5. unstructured-python-client Public

    A Python client for the Unstructured Platform API

    Python 99 17

  6. unstructured-ingest Public

    HTML 85 39

Repositories

Showing 10 of 37 repositories
  • unstructured-ingest Public
    HTML 85 Apache-2.0 39 53 25 Updated May 5, 2025
  • unstructured-js-client Public

    A JavaScript/Typescript client for the Unstructured Platform API

    TypeScript 51 MIT 16 7 3 Updated May 5, 2025
  • unstructured-python-client Public

    A Python client for the Unstructured Platform API

    Python 99 MIT 17 12 5 Updated May 5, 2025
  • unstructured-api Public
    Python 720 Apache-2.0 161 34 8 Updated May 1, 2025
  • docs Public

    Documentation for all Unstructured products and libraries

    MDX 6 22 0 5 Updated Apr 30, 2025
  • unstructured Public

    Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

    HTML 11,061 Apache-2.0 920 158 (3 issues need help) 50 Updated Apr 29, 2025
  • UNS-MCP Public
    Jupyter Notebook 25 9 0 2 Updated Apr 29, 2025
  • unstructured-platform-plugins Public
    Python 5 Apache-2.0 1 0 2 Updated Apr 18, 2025
  • unstructured-inference Public
    Python 179 Apache-2.0 60 21 12 Updated Apr 15, 2025
  • base-images Public

    Store Dockerfiles and Packer configs for images to use as a base to build upon

    Shell 4 Apache-2.0 2 1 2 Updated Mar 25, 2025

People

This organization has no public members. You must be a member to see who’s a part of this organization.
