Pydantic v2.10 - processing the output of an LLM stream response

Dec 11, 2024

Introduction

Pydantic v2.10.0 introduces experimental support for "partial validation". Partial validation is especially useful when working with the output of large language model (LLM) streams, where responses are generated in chunks.

This approach allows you to validate incomplete JSON strings or Python objects as they are received. By starting the validation process early, you can process and display partial data to users while the full response is still being generated.

This is a key feature for handling real-time data in applications, enabling smoother user experiences even with incomplete or ongoing inputs.

Partial validation matters for LLM output in particular because a stream is, at any given moment, a truncated document: the JSON received so far may be cut off mid-array, mid-object, or mid-string, and an ordinary validator would reject it.

Here's the idea: rather than rejecting a truncated response outright, partial validation parses as much of the input as has arrived and ignores validation errors in the last, possibly unfinished, element. This tolerance covers truncation only; JSON that is syntactically broken before the end of the input (for example, a missing comma) still raises an error. This is particularly useful when:

- Handling Incomplete Data: If the model is still generating a response, partial validation can confirm whether the data structure is valid up to the point it has been received. This is valuable for applications where partial data can still be meaningful (e.g., displaying a summary or showing what the model has produced so far).
- Detecting Errors Early: Errors in elements that are already complete, such as an invalid type in the first item of a list, are raised immediately rather than at the end of the stream. This allows you to adjust your application or user interface based on the data that's available, or to handle errors gracefully without waiting for the entire stream. Errors in the last element are ignored, because that element may simply be unfinished.
- Spotting Malformed Data: Partial validation does not repair malformed JSON (e.g., missing commas, incorrect nesting). It does surface such problems as a validation error while the response is still in progress, allowing for quicker recovery or a retry without waiting for the complete data.

Cohere Chat Stream and Partial Validation with Pydantic

This code demonstrates how to use Cohere’s API to stream responses from a language model (LLM) and process them using Pydantic for data validation. The stream is processed in real-time, and partial validation is used to validate incomplete or progressively received data while it's still being generated.

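import os
from typing import Optional

import cohere
from pydantic import BaseModel, TypeAdapter
from rich import print

co = cohere.Client(
    api_key=os.environ['COHERE_API_KEY'],
)

# Define a simple car model; every field has a default so that a
# half-received object still validates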
class Car(BaseModel):
    make: Optional[str] = None
    model: Optional[str] = None
    year: Optional[int] = None

# Create a TypeAdapter for the car model
ta = TypeAdapter(list[Car])

# Set up the chat stream with a simple prompt for generating car objects
stream = co.chat_stream( 
    model='command-r-08-2024',
    message='Generate a list of 10 car objects with the following attributes: make, model, and year. Start with [{',
    temperature=0.8,
    chat_history=[
        {"role": "User", "message": "Create a simple list of car objects"},
        {"role": "Chatbot", "message": "[{\"make\":\"Toyota\",\"model\":\"Camry\",\"year\":2022}"}],
    prompt_truncation='AUTO',
    connectors=[],
    preamble="Respond with a raw JSON object containing a list of car objects with the following attributes: make, model, and year.",
) 

# Collect and print the streamed output
text = ""

for event in stream:
    if event.event_type == "text-generation":
        text += event.text
        print(ta.validate_json(text, experimental_allow_partial="trailing-strings"))

Defining a Pydantic Model (Car)
1. Pydantic Model: The Car class defines a simple data model using Pydantic’s BaseModel. It describes the structure of the data we expect to receive from the LLM.
2. The Car model has three attributes: make, model, and year. Each is declared as Optional (str for make and model, int for year) with a default of None, so a Car whose fields have not all arrived yet still validates.
3. Pydantic’s BaseModel validates field values when a Car is created and raises a ValidationError with a descriptive message if the data does not match.
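
To make the role of the defaults concrete, here is a small sketch using the same Car model as the example above. The sample values are invented for illustration.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class Car(BaseModel):
    make: Optional[str] = None
    model: Optional[str] = None
    year: Optional[int] = None


# An object whose fields have not arrived yet is still a valid Car.
empty = Car()
half = Car.model_validate({"make": "Toyota"})

# A value of the wrong type is rejected with a descriptive error.
try:
    Car.model_validate({"year": "twenty twenty-two"})
    error_count = 0
except ValidationError as exc:
    error_count = exc.error_count()

print(empty)
print(half)
print(error_count)
```

If the fields were required instead, an object that has only partly arrived would fail validation until its last field was present.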

Creating a TypeAdapter for Car List
1. TypeAdapter: Pydantic’s TypeAdapter wraps an arbitrary type, here list[Car], so that it can be validated without defining a wrapper model. In v2.10, partial validation is exposed only through TypeAdapter's validation methods.
2. The adapter checks that each item in the list matches the Car model as the data is received.
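
For reference, this is roughly how the adapter behaves on a complete response, before partial validation enters the picture. The JSON string is made-up sample data.

```python
import json
from typing import Optional

from pydantic import BaseModel, TypeAdapter


class Car(BaseModel):
    make: Optional[str] = None
    model: Optional[str] = None
    year: Optional[int] = None


ta = TypeAdapter(list[Car])

complete = (
    '[{"make": "Toyota", "model": "Camry", "year": 2022}, '
    '{"make": "Honda", "model": "Civic", "year": 2021}]'
)

# validate_json parses and validates a JSON string in one step.
cars = ta.validate_json(complete)

# validate_python does the same for data that is already parsed.
same = ta.validate_python(json.loads(complete))

print(cars)
print(cars == same)
```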

Processing the Streamed Output
1. Streaming Loop: This loop iterates over the events streamed back by the model.
  - event.event_type == "text-generation": Checks if the event contains text generated by the model.
  - text += event.text: Appends the streamed text to the text variable, gradually building up the response as the model generates more data.
2. Partial Validation:
  - validate_json: Parses the accumulated JSON text and validates it against list[Car]. The whole text received so far is re-validated on every iteration.
  - experimental_allow_partial="trailing-strings": Enables partial validation, so JSON that is cut off at the end is accepted. The "trailing-strings" mode additionally keeps a string that is cut off at the end of the input (e.g., "Cam" while "Camry" is still arriving); with experimental_allow_partial=True such a string is dropped until it is complete.
3. Rich Print for Debugging:
  - rich: print is imported from the rich module, which pretty-prints and color-codes output in the terminal. This is useful for debugging and for displaying streamed data.

By re-validating the accumulated text each time a chunk arrives, this approach lets incomplete data be processed in real time, allowing for a smoother user experience, especially in applications where immediate feedback or interaction is crucial.

Example output in console

Conclusion

We explored the concept of partial validation introduced in Pydantic v2.10.0 and its application in real-time data processing, particularly when working with large language model (LLM) streams. The ability to validate incomplete or progressively received data as it arrives is a powerful feature, especially in contexts where users need to interact with the model’s output immediately, before the entire response has been generated.

By using Cohere's API in conjunction with Pydantic's TypeAdapter, we demonstrated how to stream data from a language model and perform validation on the fly. This ensures that the data structure remains consistent and meaningful, even if the model produces output in chunks. Partial validation also surfaces errors while the stream is still running, making it possible to provide users with valuable insights, summaries, or responses even with incomplete data.

Partial validation makes it possible to work effectively with streaming data from LLMs, enabling smoother user experiences and more reliable real-time applications. Whether it's handling incomplete inputs or detecting errors and malformed data early, this technique improves the responsiveness of applications that consume streamed output.

Jakub’s Substack
