Skip to content

Structured outputs #463

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: next
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
188 changes: 185 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -286,7 +286,7 @@ OpenAIClient client = OpenAIOkHttpClient.builder()

The SDK provides conveniences for streamed chat completions. A
[`ChatCompletionAccumulator`](openai-java-core/src/main/kotlin/com/openai/helpers/ChatCompletionAccumulator.kt)
can record the stream of chat completion chunks in the response as they are processed and accumulate
can record the stream of chat completion chunks in the response as they are processed and accumulate
a [`ChatCompletion`](openai-java-core/src/main/kotlin/com/openai/models/chat/completions/ChatCompletion.kt)
object similar to that which would have been returned by the non-streaming API.

Expand Down Expand Up @@ -334,6 +334,188 @@ client.chat()
ChatCompletion chatCompletion = chatCompletionAccumulator.chatCompletion();
```

## Structured outputs with JSON schemas

Open AI [Structured Outputs](https://platform.openai.com/docs/guides/structured-outputs?api-mode=chat)
is a feature that ensures that the model will always generate responses that adhere to a supplied
[JSON schema](https://json-schema.org/overview/what-is-jsonschema).

A JSON schema can be defined by creating a
[`ResponseFormatJsonSchema`](openai-java-core/src/main/kotlin/com/openai/models/ResponseFormatJsonSchema.kt)
and setting it on the input parameters. However, for greater convenience, a JSON schema can instead
be derived automatically from the structure of an arbitrary Java class. The JSON content from the
response will then be converted automatically to an instance of that Java class. A full, working
example of the use of Structured Outputs with arbitrary Java classes can be seen in
[`StructuredOutputsClassExample`](openai-java-example/src/main/java/com/openai/example/StructuredOutputsClassExample.java).

Java classes can contain fields declared to be instances of other classes and can use collections:

```java
class Person {
public String name;
public int birthYear;
}

class Book {
public String title;
public Person author;
public int publicationYear;
}

class BookList {
public List<Book> books;
}
```

Pass the top-level class—`BookList` in this example—to `responseFormat(Class<T>)` when building the
parameters and then access an instance of `BookList` from the generated message content in the
response:

```java
import com.openai.models.ChatModel;
import com.openai.models.chat.completions.ChatCompletionCreateParams;
import com.openai.models.chat.completions.StructuredChatCompletionCreateParams;

StructuredChatCompletionCreateParams<BookList> params = ChatCompletionCreateParams.builder()
.addUserMessage("List some famous late twentieth century novels.")
.model(ChatModel.GPT_4_1)
.responseFormat(BookList.class)
.build();

client.chat().completions().create(params).choices().stream()
.flatMap(choice -> choice.message().content().stream())
.flatMap(bookList -> bookList.books.stream())
.forEach(book -> System.out.println(book.title + " by " + book.author.name));
```

You can start building the parameters with an instance of
[`ChatCompletionCreateParams.Builder`](openai-java-core/src/main/kotlin/com/openai/models/chat/completions/ChatCompletionCreateParams.kt)
or
[`StructuredChatCompletionCreateParams.Builder`](openai-java-core/src/main/kotlin/com/openai/models/chat/completions/StructuredChatCompletionCreateParams.kt).
If you start with the former (which allows for more compact code) the builder type will change to
the latter when `ChatCompletionCreateParams.Builder.responseFormat(Class<T>)` is called.

If a field in a class is optional and does not require a defined value, you can represent this using
the [`java.util.Optional`](https://docs.oracle.com/javase/8/docs/api/java/util/Optional.html) class.
It is up to the AI model to decide whether to provide a value for that field or leave it empty.

```java
import java.util.Optional;

class Book {
public String title;
public Person author;
public int publicationYear;
public Optional<String> isbn;
}
```

Generic type information for fields is retained in the class's metadata, but _generic type erasure_
applies in other scopes. While, for example, a JSON schema defining an array of strings can be
derived from the `BoolList.books` field with type `List<String>`, a valid JSON schema cannot be
derived from a local variable of that same type, so the following will _not_ work:

```java
List<String> books = new ArrayList<>();

StructuredChatCompletionCreateParams<BookList> params = ChatCompletionCreateParams.builder()
.responseFormat(books.class)
// ...
.build();
```

If an error occurs while converting a JSON response to an instance of a Java class, the error
message will include the JSON response to assist in diagnosis. For instance, if the response is
truncated, the JSON data will be incomplete and cannot be converted to a class instance. If your
JSON response may contain sensitive information, avoid logging it directly, or ensure that you
redact any sensitive details from the error message.

### Local JSON schema validation

Structured Outputs supports a
[subset](https://platform.openai.com/docs/guides/structured-outputs#supported-schemas) of the JSON
Schema language. Schemas are generated automatically from classes to align with this subset.
However, due to the inherent structure of the classes, the generated schema may still violate
certain OpenAI schema restrictions, such as exceeding the maximum nesting depth or utilizing
unsupported data types.

To facilitate compliance, the method `responseFormat(Class<T>)` performs a validation check on the
schema derived from the specified class. This validation ensures that all restrictions are adhered
to. If any issues are detected, an exception will be thrown, providing a detailed message outlining
the reasons for the validation failure.

- **Local Validation**: The validation process occurs locally, meaning no requests are sent to the
remote AI model. If the schema passes local validation, it is likely to pass remote validation as
well.
- **Remote Validation**: The remote AI model will conduct its own validation upon receiving the JSON
schema in the request.
- **Version Compatibility**: There may be instances where local validation fails while remote
validation succeeds. This can occur if the SDK version is outdated compared to the restrictions
enforced by the remote AI model.
- **Disabling Local Validation**: If you encounter compatibility issues and wish to bypass local
validation, you can disable it by passing
[`JsonSchemaLocalValidation.NO`](openai-java-core/src/main/kotlin/com/openai/core/JsonSchemaLocalValidation.kt)
to the `responseFormat(Class<T>, JsonSchemaLocalValidation)` method when building the parameters.
(The default value for this parameter is `JsonSchemaLocalValidation.YES`.)

```java
import com.openai.core.JsonSchemaLocalValidation;
import com.openai.models.ChatModel;
import com.openai.models.chat.completions.ChatCompletionCreateParams;
import com.openai.models.chat.completions.StructuredChatCompletionCreateParams;

StructuredChatCompletionCreateParams<BookList> params = ChatCompletionCreateParams.builder()
.addUserMessage("List some famous late twentieth century novels.")
.model(ChatModel.GPT_4_1)
.responseFormat(BookList.class, JsonSchemaLocalValidation.NO)
.build();
```

By following these guidelines, you can ensure that your structured outputs conform to the necessary
schema requirements and minimize the risk of remote validation errors.

### Annotating classes and JSON schemas

You can use annotations to add further information to the JSON schema derived from your Java
classes, or to exclude individual fields from the schema. Details from annotations captured in the
JSON schema may be used by the AI model to improve its response. The SDK supports the use of
[Jackson Databind](https://github.com/FasterXML/jackson-databind) annotations.

```java
import com.fasterxml.jackson.annotation.JsonClassDescription;
import com.fasterxml.jackson.annotation.JsonIgnore;
import com.fasterxml.jackson.annotation.JsonPropertyDescription;

class Person {
@JsonPropertyDescription("The first name and surname of the person")
public String name;
public int birthYear;
@JsonPropertyDescription("The year the person died, or 'present' if the person is living.")
public String deathYear;
}

@JsonClassDescription("The details of one published book")
class Book {
public String title;
public Person author;
@JsonPropertyDescription("The year in which the book was first published.")
public int publicationYear;
@JsonIgnore public String genre;
}

class BookList {
public List<Book> books;
}
```

- Use `@JsonClassDescription` to add a detailed description to a class.
- Use `@JsonPropertyDescription` to add a detailed description to a field of a class.
- Use `@JsonIgnore` to omit a field of a class from the generated JSON schema.

If you use `@JsonProperty(required = false)`, the `false` value will be ignored. OpenAI JSON schemas
must mark all properties as _required_, so the schema generated from your Java classes will respect
that restriction and ignore any annotation that would violate it.

## File uploads

The SDK defines methods that accept files.
Expand Down Expand Up @@ -607,7 +789,7 @@ If the SDK threw an exception, but you're _certain_ the version is compatible, t

## Microsoft Azure

To use this library with [Azure OpenAI](https://learn.microsoft.com/azure/ai-services/openai/overview), use the same
To use this library with [Azure OpenAI](https://learn.microsoft.com/azure/ai-services/openai/overview), use the same
OpenAI client builder but with the Azure-specific configuration.

```java
Expand All @@ -620,7 +802,7 @@ OpenAIClient client = OpenAIOkHttpClient.builder()
.build();
```

See the complete Azure OpenAI example in the [`openai-java-example`](openai-java-example/src/main/java/com/openai/example/AzureEntraIdExample.java) directory. The other examples in the directory also work with Azure as long as the client is configured to use it.
See the complete Azure OpenAI example in the [`openai-java-example`](openai-java-example/src/main/java/com/openai/example/AzureEntraIdExample.java) directory. The other examples in the directory also work with Azure as long as the client is configured to use it.

## Network options

Expand Down
2 changes: 2 additions & 0 deletions openai-java-core/build.gradle.kts
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,8 @@ dependencies {
implementation("com.fasterxml.jackson.module:jackson-module-kotlin:2.18.2")
implementation("org.apache.httpcomponents.core5:httpcore5:5.2.4")
implementation("org.apache.httpcomponents.client5:httpclient5:5.3.1")
implementation("com.github.victools:jsonschema-generator:4.38.0")
implementation("com.github.victools:jsonschema-module-jackson:4.38.0")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious what version of jackson this depends on? Just wanna make sure it's compatible with Jackson 2.13.4

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT, it depends on Jackson 2.17.2 (see here).


testImplementation(kotlin("test"))
testImplementation(project(":openai-java-client-okhttp"))
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
package com.openai.core

/**
* Options for local validation of JSON schemas derived from arbitrary classes before a request is
* executed.
*/
enum class JsonSchemaLocalValidation {
/**
* Validate the JSON schema locally before the request is executed. The remote AI model will
* also validate the JSON schema.
*/
YES,

/**
* Do not validate the JSON schema locally before the request is executed. The remote AI model
* will always validate the JSON schema.
*/
NO,
}
Loading