diff --git a/api-reference/partition/examples.mdx b/api-reference/partition/examples.mdx
index 11fddfbd..ade4e3c1 100644
--- a/api-reference/partition/examples.mdx
+++ b/api-reference/partition/examples.mdx
@@ -20,7 +20,7 @@ Here's how you can modify partition strategy for a PDF file, and select an alter
```bash POST
- curl -X 'POST' $UNSTRUCTURED_API_URL \
+ curl -X 'POST' "$UNSTRUCTURED_API_URL/v1/partition_async" \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-H "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
@@ -29,6 +29,9 @@ Here's how you can modify partition strategy for a PDF file, and select an alter
-F 'vlm_model_provider=openai' \
-F 'vlm_model=gpt-4o'
```
+
+ To get the results of this request, you must make a follow-up request with the job ID that is returned in the response. See the `/v1/partition_async/` example
+ in [Process an individual file by making a direct POST request](/api-reference/partition/post-requests).
@@ -191,7 +194,7 @@ For better OCR results, you can specify what languages your document is in using
```bash POST
- curl -X 'POST' $UNSTRUCTURED_API_URL \
+ curl -X 'POST' "$UNSTRUCTURED_API_URL/v1/partition_async" \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-H "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
@@ -200,6 +203,9 @@ For better OCR results, you can specify what languages your document is in using
-F 'vlm_model_provider=openai' \
-F 'vlm_model=gpt-4o' \
-F 'languages=kor'
```
+
+ To get the results of this request, you must make a follow-up request with the job ID that is returned in the response. See the `/v1/partition_async/` example
+ in [Process an individual file by making a direct POST request](/api-reference/partition/post-requests).
@@ -359,7 +365,7 @@ Set the `coordinates` parameter to `true` to add this field to the elements in t
```bash POST
- curl -X 'POST' $UNSTRUCTURED_API_URL \
+ curl -X 'POST' "$UNSTRUCTURED_API_URL/v1/partition_async" \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-H "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
@@ -367,6 +373,9 @@ Set the `coordinates` parameter to `true` to add this field to the elements in t
-F 'coordinates=true' \
-F 'strategy=hi_res'
```
+
+ To get the results of this request, you must make a follow-up request with the job ID that is returned in the response. See the `/v1/partition_async/` example
+ in [Process an individual file by making a direct POST request](/api-reference/partition/post-requests).
@@ -530,7 +539,7 @@ This can be helpful if you'd like to use the IDs as a primary key in a database,
```bash POST
- curl -X 'POST' $UNSTRUCTURED_API_URL \
+ curl -X 'POST' "$UNSTRUCTURED_API_URL/v1/partition_async" \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-H "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
@@ -540,6 +549,9 @@ This can be helpful if you'd like to use the IDs as a primary key in a database,
-F 'vlm_model_provider=openai' \
-F 'vlm_model=gpt-4o'
```
+
+ To get the results of this request, you must make a follow-up request with the job ID that is returned in the response. See the `/v1/partition_async/` example
+ in [Process an individual file by making a direct POST request](/api-reference/partition/post-requests).
@@ -703,7 +715,7 @@ By default, the `chunking_strategy` is set to `None`, and no chunking is perform
```bash POST
- curl -X 'POST' $UNSTRUCTURED_API_URL \
+ curl -X 'POST' "$UNSTRUCTURED_API_URL/v1/partition_async" \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-H "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
@@ -714,6 +726,9 @@ By default, the `chunking_strategy` is set to `None`, and no chunking is perform
-F 'vlm_model_provider=openai' \
-F 'vlm_model=gpt-4o'
```
+
+ To get the results of this request, you must make a follow-up request with the job ID that is returned in the response. See the `/v1/partition_async/` example
+ in [Process an individual file by making a direct POST request](/api-reference/partition/post-requests).
diff --git a/api-reference/partition/overview.mdx b/api-reference/partition/overview.mdx
index 78d09a9d..e7984cf4 100644
--- a/api-reference/partition/overview.mdx
+++ b/api-reference/partition/overview.mdx
@@ -5,7 +5,7 @@ title: Overview
The Unstructured Partition Endpoint, part of the [Unstructured API](/api-reference/overview), is intended for rapid prototyping of Unstructured's
various partitioning strategies, with limited support for chunking. It is designed to work only with processing of local files, one file
at a time. Use the [Unstructured Workflow Endpoint](/api-reference/workflow/overview) for production-level scenarios, file processing in
-batches, files and data in remote locations, generating embeddings, applying post-transform enrichments, using the latest and
+large batches, files and data in remote locations, generating embeddings, applying post-transform enrichments, using the latest and
highest-performing models, and for the highest quality results at the lowest cost.
## Get started
@@ -52,16 +52,16 @@ import SharedPagesBilling from '/snippets/general-shared-text/pages-billing.mdx'
## Quickstart
-This example uses the [curl](https://curl.se/) utility on your local machine to call the Unstructured Partition Endpoint. It sends a source (input) file from your local machine to the Unstructured Partition Endpoint which then delivers the processed data to a destination (output) location, also on your local machine. Data is processed on Unstructured-hosted compute resources.
+This example uses the [curl](https://curl.se/) utility on your local machine to call the Unstructured Partition Endpoint. It sends one or more source (input) files from your local machine to the Unstructured Partition Endpoint, which then delivers the processed data to a destination (output) location, also on your local machine. Data is processed on Unstructured-hosted compute resources.
-If you do not have a source file readily available, you could use for example a sample PDF file containing the text of the United States Constitution,
+If you do not have source files readily available, you could use, for example, a sample PDF file containing the text of the United States Constitution,
available for download from [https://constitutioncenter.org/media/files/constitution.pdf](https://constitutioncenter.org/media/files/constitution.pdf).
From your terminal or Command Prompt, set the following two environment variables.
- - Replace `` with the Unstructured Partition Endpoint URL, which is `https://api.unstructuredapp.io/general/v0/general`
+ - Replace `` with the Unstructured Partition Endpoint base URL, which is `https://api.unstructuredapp.io`
- Replace `` with your Unstructured API key, which you generated earlier on this page.
```bash
@@ -69,28 +69,90 @@ available for download from [https://constitutioncenter.org/media/files/constitu
export UNSTRUCTURED_API_KEY=""
```
-
- Run the following `curl` command, replacing `` with the path to the source file on your local machine.
+
+ Run the following `curl` command, replacing `` with the path to the source file on your local machine. To specify
+ multiple files, repeat the `--form 'files=@;type=application/pdf'` option in this command for each additional file.
- If the source file is not a PDF file, then remove `;type=application/pdf` from the final `--form` option in this command.
+ If the source file is not a PDF file, then remove `;type=application/pdf` from the related `--form` option in this command.
```bash
curl --request 'POST' \
- "$UNSTRUCTURED_API_URL" \
+ "$UNSTRUCTURED_API_URL/v1/partition_async" \
--header 'accept: application/json' \
--header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
--header 'content-Type: multipart/form-data' \
- --form 'content_type=string' \
--form 'strategy=vlm' \
--form 'vlm_model_provider=openai' \
--form 'vlm_model=gpt-4o' \
--form 'output_format=application/json' \
- --form 'files=@;type=application/pdf'
+ --form 'files=@;type=application/pdf' \
+ --form 'files=@;type=application/pdf'
```
+
+ The results are printed to your terminal or Command Prompt with a format similar to the following:
+
+ ```json
+ {
+ "partition_id": "",
+ "partition_status": "scheduled",
+ "partition_status_message": "Partition job created"
+ }
+ ```
+
+ Make a note of the `` value, as you will need it in the next step.
+
+
+ Run the following `curl` command, replacing `` with the `` value from the previous step.
+
+ ```bash
+ curl --request 'GET' \
+ "$UNSTRUCTURED_API_URL/v1/partition_async/" \
+ --header 'accept: application/json' \
+ --header "unstructured-api-key: $UNSTRUCTURED_API_KEY"
+ ```
+
+ The results are printed to your terminal or Command Prompt with a format similar to the following:
+
+ ```json
+ {
+ "partition_id": "",
+ "partition_status": "in_progress",
+ "partition_status_message": "Started processing partition request",
+ "elements": null
+ }
+ ```
+
+ If the job is still in progress, repeat the `curl` command until the job is complete.
- After you run the `curl` command, the results are printed to your terminal or Command Prompt. The command might take several
- minutes to complete.
+ If the job has completed successfully, the results printed to your terminal or Command Prompt contain the processed data within the
+ `elements` array, for example:
+
+ ```json
+ {
+ "partition_id": "",
+ "partition_status": "completed",
+ "partition_status_message": "Partition job completed",
+ "elements": [
+ {
+ "type": "...",
+ "element_id": "...",
+ "text": "...",
+ "metadata": {
+ "...": "..."
+ }
+ },
+ {
+ "type": "...",
+ "element_id": "...",
+ "text": "...",
+ "metadata": {
+ "...": "..."
+ }
+ }
+ ]
+ }
+ ```
By default, the JSON is printed without indenting or other whitespace. You can pretty-print the JSON output by using utilities such as [jq](https://jqlang.org/tutorial/) in future command runs.
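The repeat-until-complete step above can also be scripted. The following is a minimal Python sketch, not an official client. It assumes that the job ID is appended to `/v1/partition_async/` where the placeholder appears in the preceding `GET` command, that `scheduled` and `in_progress` (the only pending statuses shown in the sample responses) mean the job is still running, and that any other status is final. The function names, the 5-second interval, and the 60-attempt limit are illustrative choices.

```python
import json
import time
import urllib.request

# Statuses from the sample responses that mean the job is still running.
PENDING_STATUSES = {"scheduled", "in_progress"}


def fetch_job(base_url, api_key, partition_id):
    """Send the status GET request and return the parsed JSON body."""
    request = urllib.request.Request(
        # Assumption: the job ID goes directly after the trailing slash.
        f"{base_url.rstrip('/')}/v1/partition_async/{partition_id}",
        headers={"accept": "application/json", "unstructured-api-key": api_key},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_job(fetch, interval_seconds=5.0, max_attempts=60, sleep=time.sleep):
    """Call fetch() until the job leaves the pending statuses, then return the body."""
    for attempt in range(max_attempts):
        job = fetch()
        if job.get("partition_status") not in PENDING_STATUSES:
            return job
        if attempt < max_attempts - 1:
            sleep(interval_seconds)  # wait before asking again
    raise TimeoutError(f"job still pending after {max_attempts} attempts")
```

For example, `wait_for_job(lambda: fetch_job(base_url, api_key, partition_id))` returns the final response body, whose `elements` field holds the processed data if the job succeeded. Passing the fetch step in as a function keeps the loop testable without network access.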
diff --git a/api-reference/partition/post-requests.mdx b/api-reference/partition/post-requests.mdx
index 385ce2fb..c5b025db 100644
--- a/api-reference/partition/post-requests.mdx
+++ b/api-reference/partition/post-requests.mdx
@@ -3,21 +3,7 @@ title: Process an individual file by making a direct POST request
sidebarTitle: POST request
---
-Watch the following 4-minute video to learn how to make POST requests to the Unstructured Partition Endpoint to process individual files:
-
-
-
-Open the related [notebook](https://colab.research.google.com/drive/1rJOZYZfsTQ_JV2hXaY4kgYvbA7xEWBZn?usp=sharing) that is shown in the preceding video.
-
-To make POST requests to the Unstructured Partition Endpoint, you will need:
+To make POST requests to the [Unstructured Partition Endpoint](/api-reference/partition/overview), you will need:
import SharedAPIKeyURL from '/snippets/general-shared-text/api-key-url.mdx';
@@ -29,7 +15,7 @@ import GetStartedSimpleAPIOnly from '/snippets/general-shared-text/get-started-s
-The API URL is `https://api.unstructuredapp.io/general/v0/general`
+The API base URL is `https://api.unstructuredapp.io`
Let's start with a simple example in which you use [curl](https://curl.se/) to send a local PDF file (`*.pdf`) to partition via the Unstructured Partition Endpoint.
@@ -37,11 +23,10 @@ In this command, be sure to replace `` with the path to your local
```bash
curl --request 'POST' \
-"$UNSTRUCTURED_API_URL" \
+"$UNSTRUCTURED_API_URL/v1/partition_async" \
--header 'accept: application/json' \
--header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
--header 'content-Type: multipart/form-data' \
---form 'content_type=string' \
--form 'strategy=vlm' \
--form 'vlm_model_provider=openai' \
--form 'vlm_model=gpt-4o' \
@@ -49,9 +34,105 @@ curl --request 'POST' \
--form 'files=@;type=application/pdf'
```
-In the example above we're representing the API endpoint with the environment variable `UNSTRUCTURED_API_URL`. Note, however, that you also need to authenticate yourself with
+In the example above we're representing the base API endpoint with the environment variable `UNSTRUCTURED_API_URL`. Note, however, that you also need to authenticate yourself with
your individual API Key, represented by the environment variable `UNSTRUCTURED_API_KEY`. Learn how to obtain an API URL and API key in the [Unstructured Partition Endpoint guide](/api-reference/partition/overview).
+The results are printed to your terminal or Command Prompt with a format similar to the following:
+
+```json
+{
+ "partition_id": "",
+ "partition_status": "scheduled",
+ "partition_status_message": "Partition job created"
+}
+```
+
+Make a note of the `` value.
+
+Next, run the following `curl` command, replacing `` with the `` value:
+
+```bash
+curl --request 'GET' \
+"$UNSTRUCTURED_API_URL/v1/partition_async/" \
+--header 'accept: application/json' \
+--header "unstructured-api-key: $UNSTRUCTURED_API_KEY"
+```
+
+The results are printed to your terminal or Command Prompt with a format similar to the following:
+
+```json
+{
+ "partition_id": "",
+ "partition_status": "in_progress",
+ "partition_status_message": "Started processing partition request",
+ "elements": null
+}
+```
+
+If the job is still in progress, repeat the `curl` command until the job is complete. However, if you run the `curl` command
+and the job has successfully completed, the results that are printed to your terminal or Command Prompt will contain the processed data within the
+`elements` array, for example:
+
+```json
+{
+ "partition_id": "",
+ "partition_status": "completed",
+ "partition_status_message": "Partition job completed",
+ "elements": [
+ {
+ "type": "...",
+ "element_id": "...",
+ "text": "...",
+ "metadata": {
+ "...": "..."
+ }
+ },
+ {
+ "type": "...",
+ "element_id": "...",
+ "text": "...",
+ "metadata": {
+ "...": "..."
+ }
+ }
+ ]
+}
+```
+
+By default, the JSON is printed without indenting or other whitespace. You can pretty-print the JSON output by using utilities such as [jq](https://jqlang.org/tutorial/) in future command runs.
+
+You can also save the JSON output to a local file by using the `curl` option [-o, --output](https://curl.se/docs/manpage.html#-o) in future command runs.
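As an alternative to `jq` and `--output`, the response can be post-processed in a few lines of Python. This is a hedged sketch that assumes the response body has the shape shown in the preceding examples, with an `elements` field that is `null` until the job completes. The function name `save_elements` and the choice to write only the `elements` array are illustrative, not part of the Unstructured API.

```python
import json


def save_elements(response_text, output_path):
    """Parse a job-status response and write its elements as indented JSON.

    Returns the number of elements written.
    """
    body = json.loads(response_text)
    # "elements" is null while the job is still running, so fall back to [].
    elements = body.get("elements") or []
    with open(output_path, "w", encoding="utf-8") as output_file:
        json.dump(elements, output_file, indent=2, ensure_ascii=False)
    return len(elements)
```

A return value of `0` is a quick signal that the job has not produced any data yet and that the status request should be repeated.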
+
+## Synchronous requests
+
+The preceding section describes how to make _asynchronous_ requests to the Unstructured Partition Endpoint.
+The Unstructured Partition Endpoint also supports _synchronous_ requests.
+
+To make a synchronous request, replace `/v1/partition_async` with `/general/v0/general` as follows:
+
+```bash
+curl --request 'POST' \
+"$UNSTRUCTURED_API_URL/general/v0/general" \
+--header 'accept: application/json' \
+--header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
+--header 'content-Type: multipart/form-data' \
+--form 'strategy=fast' \
+--form 'output_format=application/json' \
+--form 'files=@;type=text/plain'
+```
+
+If the request is successfully completed, the processed data is returned directly to your terminal or Command Prompt, without needing to make any
+additional requests. However, you must wait for the request to complete and return the results.
+
+While supported, synchronous requests are not recommended for the following reasons:
+
+- A synchronous request must wait until the operation is complete, which might result in a timeout error.
+- Waiting on a synchronous request makes for a poor user experience, because it can give the user the mistaken impression that nothing is happening.
+- While waiting, you cannot poll to check the current status of a synchronous request.
+- If the related network connection is lost during a synchronous request, the request might fail.
+
+For these reasons, if you must make synchronous requests, limit them to relatively small, text-based files, and partition those files with only the `fast` partitioning strategy.
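Because `UNSTRUCTURED_API_URL` now holds only the base URL, each kind of request appends its own path. The following Python sketch summarizes the three URLs used on this page. The helper name `partition_urls` and the dictionary keys are hypothetical, and the status URL assumes that the job ID is appended to `/v1/partition_async/` where the placeholder appears in the `GET` command.

```python
def partition_urls(base_url, partition_id=None):
    """Build the request URLs described on this page from the API base URL."""
    base = base_url.rstrip("/")  # tolerate a trailing slash on the base URL
    urls = {
        "async_submit": f"{base}/v1/partition_async",  # POST: create a job
        "sync": f"{base}/general/v0/general",          # POST: wait for results
    }
    if partition_id is not None:
        # Assumption: the job ID is the final path segment of the status URL.
        urls["async_status"] = f"{base}/v1/partition_async/{partition_id}"
    return urls
```

Keeping the paths in one place makes it harder to send an asynchronous-style follow-up request to the synchronous path by mistake.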
+
## Parameters & examples
The API parameters are the same across all methods of accessing the Unstructured Partition Endpoint.
@@ -74,16 +155,40 @@ Unstructured offers a [Postman collection](https://learning.postman.com/docs/col
4. In the **Paste cURL, Raw text or URL** box, enter the following URL, and then press `Enter`:
```
- https://raw.githubusercontent.com/Unstructured-IO/docs/main/examplecode/codesamples/api/Unstructured-POST.postman_collection.json
+ https://raw.githubusercontent.com/Unstructured-IO/docs/main/examplecode/codesamples/api/Unstructured-REST-API-Partition-Endpoint.postman_collection.json
```
5. On the sidebar, click **Collections**.
-6. Expand **Unstructured POST**.
-7. Click **(Partition Endpoint) Basic Request**.
-8. On the **Headers** tab, next to `unstructured-api-key`, enter your Unstructured API key in the **Value** column.
-9. On the **Body** tab, next to `files`, click the **Select files** box in the **Value** column.
-10. Click **New file from local machine**.
-11. Browse to and select the file that you want Unstructured to process.
-12. Click **Send**. Processing could take several minutes.
-
-To download the processed data to your local machine, in the response area, click the ellipses, and then click **Save response to file**.
\ No newline at end of file
+6. Expand **Unstructured REST API - Partition Endpoint**.
+
+To create an asynchronous request, do the following:
+
+1. Click **Asynchronous Job Request**.
+2. On the **Headers** tab, next to `unstructured-api-key`, enter your Unstructured API key in the **Value** column.
+3. On the **Body** tab, next to `files`, click the **Select files** box in the **Value** column.
+4. Click **New file from local machine**.
+5. Browse to and select the file that you want Unstructured to process.
+6. Repeat steps 3-5 for any additional files that you want to process.
+7. On the **Body** tab, add, remove, or modify any allowed [parameters](/api-reference/partition/api-parameters) that you want to use.
+8. Click **Send**. Note the value of the `partition_id` field in the response, as you will need it for the next procedure.
+
+To check the status of an asynchronous request, do the following:
+
+1. Click **Asynchronous Job Status**.
+2. In the URL, replace `` with the `partition_id` value that you noted in the previous procedure.
+3. On the **Headers** tab, next to `unstructured-api-key`, enter your Unstructured API key in the **Value** column.
+4. Click **Send**.
+5. Keep clicking **Send** periodically until the job is complete.
+6. To download the processed data to your local machine, in the response area, click the ellipses, and then click **Save response to file**.
+
+To create a synchronous request, do the following:
+
+1. Click **Synchronous Request**.
+2. On the **Headers** tab, next to `unstructured-api-key`, enter your Unstructured API key in the **Value** column.
+3. On the **Body** tab, next to `files`, click the **Select files** box in the **Value** column.
+4. Click **New file from local machine**.
+5. Browse to and select the file that you want Unstructured to process.
+6. On the **Body** tab, add, remove, or modify any allowed [parameters](/api-reference/partition/api-parameters) that you want to use.
+7. Click **Send**. Processing could take several minutes.
+8. To download the processed data to your local machine, in the response area, click the ellipses, and then click **Save response to file**.
\ No newline at end of file
diff --git a/examplecode/codesamples/api/Unstructured-POST.postman_collection.json b/examplecode/codesamples/api/Unstructured-POST.postman_collection.json
deleted file mode 100644
index 06cdd379..00000000
--- a/examplecode/codesamples/api/Unstructured-POST.postman_collection.json
+++ /dev/null
@@ -1,78 +0,0 @@
-{
- "info": {
- "_postman_id": "9cfd731e-2818-465b-b0ec-875586dbf4f0",
- "name": "Unstructured POST",
- "schema": "https://schema.getpostman.com/json/collection/v2.1.0/collection.json",
- "_exporter_id": "38317384"
- },
- "item": [
- {
- "name": "(Partition Endpoint) Basic Request",
- "request": {
- "method": "POST",
- "header": [
- {
- "key": "Accept",
- "value": "application/json",
- "type": "text"
- },
- {
- "key": "Content",
- "value": "multipart/form-data",
- "type": "text"
- },
- {
- "key": "unstructured-api-key",
- "value": "",
- "type": "text"
- }
- ],
- "body": {
- "mode": "formdata",
- "formdata": [
- {
- "key": "content_type",
- "value": "string",
- "type": "text"
- },
- {
- "key": "strategy",
- "value": "vlm",
- "type": "text"
- },
- {
- "key": "vlm_model_provider",
- "value": "openai",
- "type": "text"
- },
- {
- "key": "vlm_model",
- "value": "gpt-4o",
- "type": "text"
- },
- {
- "key": "files",
- "type": "file",
- "src": []
- }
- ]
- },
- "url": {
- "raw": "https://api.unstructuredapp.io/general/v0/general",
- "protocol": "https",
- "host": [
- "api",
- "unstructuredapp",
- "io"
- ],
- "path": [
- "general",
- "v0",
- "general"
- ]
- }
- },
- "response": []
- }
- ]
-}
\ No newline at end of file
diff --git a/examplecode/codesamples/api/Unstructured-REST-API-Partition-Endpoint.postman_collection.json b/examplecode/codesamples/api/Unstructured-REST-API-Partition-Endpoint.postman_collection.json
new file mode 100644
index 00000000..6f3da8c0
--- /dev/null
+++ b/examplecode/codesamples/api/Unstructured-REST-API-Partition-Endpoint.postman_collection.json
@@ -0,0 +1,170 @@
+{
+ "info": {
+ "_postman_id": "0631ab13-c148-4427-a09c-ce581477aaca",
+ "name": "Unstructured REST API - Partition Endpoint",
+ "schema": "https://schema.getpostman.com/json/collection/v2.1.0/collection.json",
+ "_exporter_id": "38317384"
+ },
+ "item": [
+ {
+ "name": "Asynchronous Job Request",
+ "request": {
+ "method": "POST",
+ "header": [
+ {
+ "key": "Accept",
+ "value": "application/json",
+ "type": "text"
+ },
+ {
+ "key": "Content-Type",
+ "value": "multipart/form-data",
+ "description": "\n",
+ "type": "text"
+ },
+ {
+ "key": "unstructured-api-key",
+ "value": "",
+ "type": "text"
+ }
+ ],
+ "body": {
+ "mode": "formdata",
+ "formdata": [
+ {
+ "key": "strategy",
+ "value": "vlm",
+ "type": "text"
+ },
+ {
+ "key": "vlm_model_provider",
+ "value": "openai",
+ "type": "text"
+ },
+ {
+ "key": "vlm_model",
+ "value": "gpt-4o",
+ "type": "text"
+ },
+ {
+ "key": "files",
+ "type": "file",
+ "src": []
+ }
+ ]
+ },
+ "url": {
+ "raw": "https://api.unstructuredapp.io/v1/partition_async/",
+ "protocol": "https",
+ "host": [
+ "api",
+ "unstructuredapp",
+ "io"
+ ],
+ "path": [
+ "v1",
+ "partition_async",
+ ""
+ ]
+ }
+ },
+ "response": []
+ },
+ {
+ "name": "Asynchronous Job Status",
+ "request": {
+ "method": "GET",
+ "header": [
+ {
+ "key": "Accept",
+ "value": "application/json",
+ "type": "text"
+ },
+ {
+ "key": "unstructured-api-key",
+ "value": "",
+ "type": "text"
+ }
+ ],
+ "url": {
+ "raw": "https://api.unstructuredapp.io/v1/partition_async/",
+ "protocol": "https",
+ "host": [
+ "api",
+ "unstructuredapp",
+ "io"
+ ],
+ "path": [
+ "v1",
+ "partition_async",
+ ""
+ ]
+ }
+ },
+ "response": []
+ },
+ {
+ "name": "Synchronous Request",
+ "request": {
+ "method": "POST",
+ "header": [
+ {
+ "key": "Accept",
+ "value": "application/json",
+ "type": "text"
+ },
+ {
+ "key": "Content-Type",
+ "value": "multipart/form-data",
+ "type": "text"
+ },
+ {
+ "key": "unstructured-api-key",
+ "value": "",
+ "type": "text"
+ }
+ ],
+ "body": {
+ "mode": "formdata",
+ "formdata": [
+ {
+ "key": "strategy",
+ "value": "vlm",
+ "type": "text"
+ },
+ {
+ "key": "vlm_model_provider",
+ "value": "openai",
+ "type": "text"
+ },
+ {
+ "key": "vlm_model",
+ "value": "gpt-4o",
+ "type": "text"
+ },
+ {
+ "key": "files",
+ "type": "file",
+ "src": []
+ }
+ ]
+ },
+ "url": {
+ "raw": "https://api.unstructuredapp.io/general/v0/general",
+ "protocol": "https",
+ "host": [
+ "api",
+ "unstructuredapp",
+ "io"
+ ],
+ "path": [
+ "general",
+ "v0",
+ "general"
+ ]
+ }
+ },
+ "response": []
+ }
+ ]
+}
\ No newline at end of file