Getting LLMs to return reliable, parseable data is critical for production applications. Learn how structured outputs and JSON mode work.
LLMs naturally produce free-form text. For applications, you need structured data you can parse reliably.
# ❌ Unreliable — model might format differently each time
response = "The user's name is John, age 30, from Mumbai"
# ✅ Reliable — always parseable
response = {"name": "John", "age": 30, "city": "Mumbai"}

Without structured output, you're writing fragile string parsing code that breaks when the model changes its phrasing.
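To make the fragility concrete, here is a minimal sketch that parses the free-text reply with a regular expression and then feeds it a reworded reply. The function names and the rephrased sentence are invented for this illustration; only the original sentence and the JSON object come from the example above.

```python
import json
import re

def parse_free_text(text: str) -> dict:
    """Fragile: assumes the exact phrasing 'name is X, age N, from CITY'."""
    match = re.search(r"name is (\w+), age (\d+), from (\w+)", text)
    if match is None:
        raise ValueError(f"unrecognized phrasing: {text!r}")
    name, age, city = match.groups()
    return {"name": name, "age": int(age), "city": city}

def parse_json_text(text: str) -> dict:
    """Independent of phrasing: only requires that the reply is valid JSON."""
    return json.loads(text)

original = "The user's name is John, age 30, from Mumbai"
rephrased = "John, who is 30, lives in Mumbai"  # same facts, different wording

print(parse_free_text(original))
try:
    parse_free_text(rephrased)
except ValueError as err:
    print("free-text parser failed:", err)

print(parse_json_text('{"name": "John", "age": 30, "city": "Mumbai"}'))
```

The regex only survives as long as the model keeps its wording, while the JSON path depends on nothing but the reply being valid JSON.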
JSON Mode instructs the model to always return valid JSON. It guarantees the output is parseable — but not that it matches a specific schema.
from openai import OpenAI
import json
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},  # enable JSON mode
    messages=[
        {"role": "system", "content": "Return responses as JSON"},
        {"role": "user", "content": "Extract: John is 30 years old from Mumbai"},
    ],
)
data = json.loads(response.choices[0].message.content)
# e.g. {"name": "John", "age": 30, "city": "Mumbai"}

Limitation: JSON mode guarantees valid JSON but not a specific structure. The model decides the keys.
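Because the keys are up to the model, code that consumes JSON-mode output usually has to check the shape itself. The following is one possible sketch using only the standard library. `EXPECTED_FIELDS`, `check_shape`, and the "drifted" reply are hypothetical and exist only for this example.

```python
import json

# Hypothetical expectation for this example: field name -> required Python type.
EXPECTED_FIELDS = {"name": str, "age": int, "city": str}

def check_shape(raw: str, expected: dict[str, type]) -> dict:
    """Parse a JSON-mode reply and verify the keys and value types we rely on."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    for key, expected_type in expected.items():
        if key not in data:
            raise ValueError(f"missing key: {key}")
        if not isinstance(data[key], expected_type):
            raise ValueError(f"wrong type for {key}: {type(data[key]).__name__}")
    return data

good = '{"name": "John", "age": 30, "city": "Mumbai"}'
drifted = '{"full_name": "John", "years": 30, "location": "Mumbai"}'  # valid JSON, wrong keys

print(check_shape(good, EXPECTED_FIELDS))
try:
    check_shape(drifted, EXPECTED_FIELDS)
except ValueError as err:
    print("rejected:", err)
```

Both replies are valid JSON, so JSON mode alone would accept either. Only the explicit check catches the second one.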
Structured outputs let you define an exact schema using JSON Schema or Pydantic. The model is constrained to match it exactly.
from pydantic import BaseModel
from openai import OpenAI
client = OpenAI()
class UserInfo(BaseModel):
    name: str
    age: int
    city: str
    is_premium: bool

response = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Extract: John is 30, from Mumbai, premium user"}
    ],
    response_format=UserInfo,
)
user = response.choices[0].message.parsed
print(user.name)  # "John"
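The same schema can also be written out by hand as JSON Schema instead of a Pydantic model. The sketch below assumes the `json_schema` form of `response_format` in the OpenAI Chat Completions API with `strict` enabled. The schema name `user_info` and both function names are made up for illustration, and the API call is wrapped in a function that is not invoked here, so the block runs without a key.

```python
import json

def build_user_info_format() -> dict:
    """Build a response_format payload describing the UserInfo fields by hand."""
    return {
        "type": "json_schema",
        "json_schema": {
            "name": "user_info",  # illustrative name, not from the original
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                    "city": {"type": "string"},
                    "is_premium": {"type": "boolean"},
                },
                # Every property is listed as required and no extra keys are allowed.
                "required": ["name", "age", "city", "is_premium"],
                "additionalProperties": False,
            },
        },
    }

def extract_user(client, text: str) -> dict:
    """Assumed usage: `client` is an OpenAI() instance; not called in this sketch."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Extract: {text}"}],
        response_format=build_user_info_format(),
    )
    return json.loads(response.choices[0].message.content)

print(json.dumps(build_user_info_format(), indent=2))
```

The Pydantic route is usually less error-prone because the schema and the parsed object come from one class, but the raw form is useful when the schema is generated or shared with non-Python services.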
class InvoiceData(BaseModel):
    vendor: str
    amount: float
    currency: str
    date: str
    line_items: list[str]
# Extract structured data from unstructured invoice text

from typing import Literal

class SentimentResult(BaseModel):
    sentiment: Literal["positive", "negative", "neutral"]
    confidence: float
    reasoning: str

class SearchQuery(BaseModel):
    query: str
    filters: list[str]
    max_results: int
# LLM converts natural language to structured search params
# Then pass to your search API

Even with structured outputs, always validate:
from pydantic import ValidationError
# handle_refusal, log_error, and use_fallback stand in for your own handlers
try:
    result = response.choices[0].message.parsed
    if result is None:
        # Model refused or couldn't parse
        handle_refusal()
except ValidationError as e:
    # Schema validation failed
    log_error(e)
    use_fallback()

| | JSON Mode | Structured Outputs |
|---|---|---|
| Valid JSON | ✅ Always | ✅ Always |
| Exact schema | ❌ No | ✅ Yes |
| Type safety | ❌ No | ✅ Yes |
| Model support | Broad | GPT-4o, some others |
| Use when | Simple JSON needed | Exact schema required |
None responses: the model may refuse to fill a schema, in which case parsed is None, so check for it before using the result.
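One way to guard against a missing result is a small helper that returns a fallback instead of letting `None` propagate. This is a sketch under assumptions: it assumes the message object exposes `parsed` and `refusal` attributes as in the OpenAI Python SDK, the helper name `unwrap_parsed` is hypothetical, and `SimpleNamespace` objects stand in for real API responses so the block runs offline.

```python
from types import SimpleNamespace

def unwrap_parsed(message, fallback=None):
    """Return message.parsed, or `fallback` when the model refused or nothing was parsed."""
    parsed = getattr(message, "parsed", None)
    if parsed is not None:
        return parsed
    refusal = getattr(message, "refusal", None)
    if refusal:
        # Surface the refusal text so it can be logged or shown to the user.
        print("model refused:", refusal)
    return fallback

# Stand-ins for response.choices[0].message, so the sketch runs without an API call.
ok = SimpleNamespace(parsed={"name": "John"}, refusal=None)
refused = SimpleNamespace(parsed=None, refusal="I can't help with that.")

print(unwrap_parsed(ok))
print(unwrap_parsed(refused, fallback={}))
```

Returning an explicit fallback keeps downstream code from failing on an attribute access against `None`.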