The Chat service enables multi-turn conversations with AI models using different message roles. It supports both synchronous and streaming responses, allowing interactive dialogues with the model.
// ChatService handles chat completion requests for multi-turn conversations
// with support for different message roles (system, user, assistant).
type ChatService interface {
// Create generates a chat completion for the provided request.
// This method returns a complete, non-streaming response.
Create(ctx context.Context, req *ChatCompletionRequest) (*ChatCompletionResponse, error)
// CreateStream generates a streaming chat completion where tokens are returned
// incrementally as they're generated.
// The returned ChatCompletionStream must be closed when no longer needed to release resources.
CreateStream(ctx context.Context, req *ChatCompletionRequest) (ChatCompletionStream, error)
}
The Chat service uses different message roles to structure conversations:
type ChatMessageRole string
const (
// ChatMessageRoleUser represents a message from the user
ChatMessageRoleUser ChatMessageRole = "user"
// ChatMessageRoleAssistant represents a message from the assistant
ChatMessageRoleAssistant ChatMessageRole = "assistant"
// ChatMessageRoleSystem represents a system message, typically used for instructions
ChatMessageRoleSystem ChatMessageRole = "system"
// ChatMessageRoleTool represents a tool message, typically used for function calls
ChatMessageRoleTool ChatMessageRole = "tool"
)
Each role serves a specific purpose:
- System: Sets behavior instructions, context, or constraints for the assistant
- User: Represents user queries or inputs
- Assistant: Contains responses from the assistant
- Tool: Contains outputs from tool or function calls
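As a sketch, a conversation history combining these roles might be assembled as follows; the scenario and the shape of the tool payload are illustrative only, since the content a server expects for tool messages varies:

history := []tabby.ChatMessage{
	{Role: tabby.ChatMessageRoleSystem, Content: "You are a concise weather assistant."},
	{Role: tabby.ChatMessageRoleUser, Content: "What's the weather in Oslo right now?"},
	{Role: tabby.ChatMessageRoleAssistant, Content: "Checking the weather service."},
	// Hypothetical tool output fed back into the conversation.
	{Role: tabby.ChatMessageRoleTool, Content: `{"temperature_c": 4, "conditions": "overcast"}`},
}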
The ChatCompletionRequest struct defines the parameters for a chat request:
type ChatCompletionRequest struct {
Messages []ChatMessage `json:"messages"` // Conversation history
MaxTokens int `json:"max_tokens,omitempty"` // Maximum tokens to generate
Temperature float64 `json:"temperature,omitempty"` // Controls randomness
TopP float64 `json:"top_p,omitempty"` // Alternative to temperature
TopK int `json:"top_k,omitempty"` // Limit to K most likely tokens
Stream bool `json:"stream,omitempty"` // Enable streaming responses
Stop []string `json:"stop,omitempty"` // Sequences where generation should stop
Model string `json:"model,omitempty"` // Model to use (if not default)
JSONSchema interface{} `json:"json_schema,omitempty"` // JSON schema for structured output
}
type ChatMessage struct {
Role ChatMessageRole `json:"role"` // Role of the message sender
Content interface{} `json:"content"` // String or array of ChatMessageContent
}
| Parameter | Type | Description | Default |
|---|---|---|---|
| Messages | []ChatMessage | Array of messages representing the conversation | (required) |
| MaxTokens | int | Maximum number of tokens to generate | (model dependent) |
| Temperature | float64 | Controls randomness (higher = more random) | 1.0 |
| TopP | float64 | Nucleus sampling parameter | 1.0 |
| TopK | int | Only sample from top K most likely tokens | 0 (disabled) |
| Stream | bool | Enable streaming response (token-by-token) | false |
| Stop | []string | Stop sequences to end generation when encountered | [] |
| Model | string | Model ID to use (if multiple available) | (currently loaded model) |
| JSONSchema | interface{} | Schema for structured JSON output | nil |
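For illustration, a request that combines several of these parameters might look like the following sketch (the values are arbitrary and assume the tabby package used in the examples below):

req := &tabby.ChatCompletionRequest{
	Messages: []tabby.ChatMessage{
		{Role: tabby.ChatMessageRoleUser, Content: "List three common uses of goroutines."},
	},
	MaxTokens:   200,              // Cap the length of the reply
	Temperature: 0.3,              // Favor consistent, factual output
	TopP:        0.9,              // Nucleus sampling
	Stop:        []string{"\n\n"}, // Stop at the first blank line
}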
The ChatMessage content can be either a string or an array of content parts with different types:
// ChatMessageContent represents a part of a message content
type ChatMessageContent struct {
Type string `json:"type"` // Content type ("text" or "image_url")
Text string `json:"text,omitempty"` // Text content
ImageURL *ChatImageURL `json:"image_url,omitempty"` // Image URL content
}
// ChatImageURL represents an image URL in a chat message
type ChatImageURL struct {
URL string `json:"url"` // The URL of the image
}
This allows multimodal messages that combine text and images; a complete multimodal example appears later in this section.
The response to a chat completion request contains the model's reply:
type ChatCompletionResponse struct {
ID string `json:"id"` // Unique identifier for the completion
Object string `json:"object"` // Type of object (always "chat.completion")
Created int64 `json:"created"` // Unix timestamp of creation
Model string `json:"model"` // Model used for completion
Choices []ChatCompletionRespChoice `json:"choices"` // Generated chat choices
Usage *UsageStats `json:"usage,omitempty"` // Token usage statistics
}
type ChatCompletionRespChoice struct {
Index int `json:"index"` // Choice index
Message ChatMessage `json:"message"` // The assistant's response message
FinishReason string `json:"finish_reason,omitempty"` // Why generation ended
}
type UsageStats struct {
PromptTokens int `json:"prompt_tokens"` // Tokens in the prompt
CompletionTokens int `json:"completion_tokens"` // Tokens in the completion
TotalTokens int `json:"total_tokens"` // Total tokens used
}
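When reading a response from Create, the choice's FinishReason indicates why generation stopped. The exact values are server-dependent; the "length" check in this sketch follows the common OpenAI-style convention and is an assumption:

if len(resp.Choices) > 0 {
	choice := resp.Choices[0]
	fmt.Println(choice.Message.Content)
	// Assumption: "length" signals the reply was cut off by MaxTokens.
	if choice.FinishReason == "length" {
		fmt.Println("(response truncated by max_tokens)")
	}
}
if resp.Usage != nil {
	fmt.Printf("used %d tokens\n", resp.Usage.TotalTokens)
}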
For streaming chat completions, use CreateStream, which returns delta updates incrementally:
type ChatCompletionStream = Stream[*ChatCompletionStreamResponse]
type ChatCompletionStreamResponse struct {
ID string `json:"id"`
Object string `json:"object"`
Created int64 `json:"created"`
Model string `json:"model"`
Choices []ChatCompletionStreamChoice `json:"choices"`
}
type ChatCompletionStreamChoice struct {
Index int `json:"index"` // Choice index
Delta *Delta `json:"delta"` // Partial content update
FinishReason string `json:"finish_reason,omitempty"` // Present only in the final chunk
}
type Delta struct {
Role ChatMessageRole `json:"role,omitempty"` // Present only in the first chunk
Content string `json:"content,omitempty"` // Partial content
}
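The examples below show the service in use. The first sends a basic, non-streaming chat completion and prints the assistant's reply along with token usage: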
package main
import (
"context"
"fmt"
"log"
"time"
"github.com/pixelsquared/go-tabbyapi/tabby"
)
func main() {
client := tabby.NewClient(
tabby.WithBaseURL("http://localhost:8080"),
tabby.WithAPIKey("your-api-key"),
)
defer client.Close()
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
req := &tabby.ChatCompletionRequest{
Messages: []tabby.ChatMessage{
{
Role: tabby.ChatMessageRoleSystem,
Content: "You are a helpful assistant specialized in programming and technology.",
},
{
Role: tabby.ChatMessageRoleUser,
Content: "What are the benefits of using dependency injection in software design?",
},
},
MaxTokens: 150,
Temperature: 0.7,
}
resp, err := client.Chat().Create(ctx, req)
if err != nil {
log.Fatalf("Error: %v", err)
}
if len(resp.Choices) > 0 {
fmt.Println("Assistant's Response:")
fmt.Println(resp.Choices[0].Message.Content)
}
if resp.Usage != nil {
fmt.Printf("\nToken Usage:\n")
fmt.Printf(" Prompt tokens: %d\n", resp.Usage.PromptTokens)
fmt.Printf(" Completion tokens: %d\n", resp.Usage.CompletionTokens)
fmt.Printf(" Total tokens: %d\n", resp.Usage.TotalTokens)
}
}
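The next example maintains a multi-turn conversation by appending the assistant's reply and a follow-up question to the message history before sending a second request: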
package main
import (
"context"
"fmt"
"log"
"time"
"github.com/pixelsquared/go-tabbyapi/tabby"
)
func main() {
client := tabby.NewClient(
tabby.WithBaseURL("http://localhost:8080"),
tabby.WithAPIKey("your-api-key"),
)
defer client.Close()
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
// Initial conversation
messages := []tabby.ChatMessage{
{
Role: tabby.ChatMessageRoleSystem,
Content: "You are a helpful assistant specialized in programming and technology.",
},
{
Role: tabby.ChatMessageRoleUser,
Content: "What is the difference between REST and GraphQL?",
},
}
req := &tabby.ChatCompletionRequest{
Messages: messages,
MaxTokens: 150,
Temperature: 0.7,
}
// First response
resp, err := client.Chat().Create(ctx, req)
if err != nil {
log.Fatalf("Error: %v", err)
}
if len(resp.Choices) > 0 {
fmt.Println("Initial Response:")
fmt.Println(resp.Choices[0].Message.Content)
// Add the assistant's response to the conversation
messages = append(messages, tabby.ChatMessage{
Role: tabby.ChatMessageRoleAssistant,
Content: resp.Choices[0].Message.Content,
})
// Add a follow-up question
messages = append(messages, tabby.ChatMessage{
Role: tabby.ChatMessageRoleUser,
Content: "Which one is better for mobile applications and why?",
})
// Create a new request with the updated conversation
req.Messages = messages
// Get the follow-up response
resp, err = client.Chat().Create(ctx, req)
if err != nil {
log.Fatalf("Error: %v", err)
}
if len(resp.Choices) > 0 {
fmt.Println("\nFollow-up Response:")
fmt.Println(resp.Choices[0].Message.Content)
}
}
}
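To stream a response token by token, set Stream to true and call CreateStream, reading chunks until io.EOF and assembling the deltas as they arrive: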
package main
import (
"context"
"fmt"
"io"
"log"
"strings"
"time"
"github.com/pixelsquared/go-tabbyapi/tabby"
)
func main() {
client := tabby.NewClient(
tabby.WithBaseURL("http://localhost:8080"),
tabby.WithAPIKey("your-api-key"),
)
defer client.Close()
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
req := &tabby.ChatCompletionRequest{
Messages: []tabby.ChatMessage{
{
Role: tabby.ChatMessageRoleSystem,
Content: "You are a helpful assistant specialized in programming and technology.",
},
{
Role: tabby.ChatMessageRoleUser,
Content: "Explain the concept of containers in Docker and how they differ from virtual machines.",
},
},
MaxTokens: 300,
Temperature: 0.7,
Stream: true,
}
stream, err := client.Chat().CreateStream(ctx, req)
if err != nil {
log.Fatalf("Error creating stream: %v", err)
}
defer stream.Close()
fmt.Println("Streaming response:")
var fullResponse strings.Builder
for {
resp, err := stream.Recv()
if err == io.EOF {
break // Stream completed
}
if err != nil {
log.Fatalf("Error receiving from stream: %v", err)
}
// Process each chunk
if len(resp.Choices) > 0 {
chunk := resp.Choices[0].Delta.Content
fmt.Print(chunk)
fullResponse.WriteString(chunk)
}
}
fmt.Println("\n\nFull response received and assembled.")
}
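The final example builds a multimodal message whose content combines a text part and an image URL, for use with vision-capable models: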
package main
import (
"context"
"fmt"
"log"
"time"
"github.com/pixelsquared/go-tabbyapi/tabby"
)
func main() {
client := tabby.NewClient(
tabby.WithBaseURL("http://localhost:8080"),
tabby.WithAPIKey("your-api-key"),
)
defer client.Close()
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
// Create a multimodal message with text and image
content := []tabby.ChatMessageContent{
{
Type: "text",
Text: "What's in this image? Describe it in detail.",
},
{
Type: "image_url",
ImageURL: &tabby.ChatImageURL{
URL: "https://example.com/image.jpg",
},
},
}
req := &tabby.ChatCompletionRequest{
Messages: []tabby.ChatMessage{
{
Role: tabby.ChatMessageRoleSystem,
Content: "You are a helpful assistant that can analyze images.",
},
{
Role: tabby.ChatMessageRoleUser,
Content: content,
},
},
MaxTokens: 200,
Temperature: 0.7,
}
resp, err := client.Chat().Create(ctx, req)
if err != nil {
log.Fatalf("Error: %v", err)
}
if len(resp.Choices) > 0 {
fmt.Println("Assistant's Response:")
fmt.Println(resp.Choices[0].Message.Content)
}
}
The Chat service can return the same types of errors as other services:
resp, err := client.Chat().Create(ctx, req)
if err != nil {
var apiErr *tabby.APIError
if errors.As(err, &apiErr) {
fmt.Printf("API Error (code %s): %s\n", apiErr.Code(), apiErr.Error())
switch apiErr.Code() {
case "invalid_request":
fmt.Println("Check your request parameters")
case "authentication_error":
fmt.Println("Check your API key")
case "permission_error":
fmt.Println("Your API key doesn't have sufficient permissions")
}
}
log.Fatalf("Error: %v", err)
}
Keep these practices in mind when working with the Chat service:
- Conversation Context: Keep the conversation history for multi-turn chats, but be mindful of token limits.
- System Instructions: Use system messages to set behavior, constraints, or provide context.
- Streaming for Long Responses: Use streaming for longer responses to improve user experience.
- Resource Management: Always close streams when done:

  stream, err := client.Chat().CreateStream(ctx, req)
  if err != nil {
      return err
  }
  defer stream.Close()
- Temperature Settings:
  - Lower temperature (0.1-0.5) for factual, consistent responses
  - Higher temperature (0.7-1.0) for more creative, varied responses
- Message Role Usage:
  - System: Set overall behavior and constraints (typically one message at the start)
  - User: Provide inputs, questions, or requests
  - Assistant: Include previous responses in the conversation history
  - Tool: Include outputs from tool or function calls when relevant
- Token Management:
  - Be aware of model context window limits
  - Consider truncating or summarizing very long conversation histories; a minimal truncation sketch follows this list
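A minimal sketch of one truncation strategy, assuming the tabby package from the examples above is imported; truncateHistory and its keep parameter are illustrative names, not part of the library:

// truncateHistory keeps a leading system message (if any) plus the most
// recent keep messages, dropping older turns to stay within the context window.
func truncateHistory(messages []tabby.ChatMessage, keep int) []tabby.ChatMessage {
	if len(messages) <= keep {
		return messages
	}
	var out []tabby.ChatMessage
	rest := messages
	// Preserve a leading system message, if present.
	if messages[0].Role == tabby.ChatMessageRoleSystem {
		out = append(out, messages[0])
		rest = messages[1:]
	}
	// Keep only the most recent turns.
	if len(rest) > keep {
		rest = rest[len(rest)-keep:]
	}
	return append(out, rest...)
}

It could be applied before building each request, for example req.Messages = truncateHistory(messages, 8), where 8 is an arbitrary cutoff.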