JSON structured output
  • 28 Feb 2024

Structured Data Generation with JSON Schema

Enhance your applications with high-quality, structured data using our API's JSON Schema support. This feature is key for creating or parsing structured data, ensuring it meets specific format and validation rules.

Benefits of JSON Schema

Data Integrity and Validation

JSON Schema maintains high data quality by enforcing a predefined structure and validation rules, minimizing the need for additional checks.

How JSON Schema Works

Define your data model with JSON Schema, specifying properties, types, and validation rules. This ensures generated data strictly follows your specified structure. You can learn more about creating schemas in this step-by-step guide.
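As a minimal sketch of this step, a schema can be written as a Python dictionary and serialized to a JSON string before it is sent. The "Product" model and its fields are hypothetical, chosen only for illustration.

```python
import json

# Hypothetical data model, for illustration only.
product_schema = {
    "title": "Product",
    "type": "object",
    "properties": {
        "name": {"type": "string", "maxLength": 80},
        "price": {"type": "number", "minimum": 0},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "price"],
}

# The examples in this article pass the schema as a JSON string.
schema_json = json.dumps(product_schema, indent=2)
print(schema_json)
```

Building the schema as a dictionary and serializing it with json.dumps avoids hand-written JSON syntax errors such as trailing commas.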

The Role of Stop Sequences

Stop sequences are critical in structured data tasks, marking the end of model responses to keep outputs within your schema's structure.

Mandatory Stop Sequence

Always pass the stop sequence appropriate to the model you are using (for Mistral models, the default is stop=["</s>"]) so that the output terminates cleanly at the end of the JSON object defined by your schema.

Using the extra_body Parameter

extra_body allows for advanced configurations, including JSON Schema, in your API requests to tailor output to specific data structures.

Leveraging extra_body

Use extra_body to include schema definitions and other parameters, enabling the API to produce outputs that match your exact data requirements.
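The request settings described above can be gathered in one place. The helper below is a sketch, not part of the API: build_request_kwargs is a hypothetical name, and the json_schema key and the </s> stop sequence are taken from the examples later in this article.

```python
import json


def build_request_kwargs(prompt, schema, model, stop=("</s>",), max_tokens=400):
    """Assemble keyword arguments for client.chat.completions.create (hypothetical helper)."""
    # Accept the schema as a dict or as a JSON string; fail early if it is not valid JSON.
    schema_str = json.dumps(schema) if isinstance(schema, dict) else schema
    json.loads(schema_str)
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "stop": list(stop),
        "extra_body": {"json_schema": schema_str},
    }


kwargs = build_request_kwargs(
    prompt="Give me a detailed client profile for a B2B solution.",
    schema={"type": "object", "properties": {"name": {"type": "string"}}},
    model="alfred-40b-1123",
)
print(kwargs["stop"], list(kwargs["extra_body"]))
```

Once a client is initialized, the result can be passed as client.chat.completions.create(**kwargs).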

Use Cases

The following use cases demonstrate practical applications of the Chat API in various scenarios, for creative generation or structured data extraction. These examples showcase how to tailor API requests to meet specific needs and automate processes efficiently.

Customer Data Generation

For applications that require generating customer profiles, define a schema to ensure each generated profile meets specific requirements.

Configuration Example

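from openai import OpenAI
import json
import os

# Initialize the client. The environment variable names below are
# placeholders (assumptions); supply your own API key and endpoint URL.
client = OpenAI(
    api_key=os.environ["PARADIGM_API_KEY"],
    base_url=os.environ["PARADIGM_BASE_URL"],
)

schema = """{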
  "title": "Client Profile",
  "type": "object",
  "properties": {
    "name": {
      "title": "Name",
      "type": "string",
      "maxLength": 100  
    },
    "yearsOfActivity": {
      "title": "Years of Activity",
      "description": "Age of the company in years",
      "type": "integer",
      "minimum": 0  
    },
    "previousPurchases": {
      "title": "Previous Purchases",
      "type": "array",
      "items": {
        "$ref": "#/definitions/purchase"
      }
    },
    "areasOfInterest": {
      "title": "Areas of Interest",
      "type": "array",
      "items": {
        "$ref": "#/definitions/interest"
      }
    },
    "lastPurchase": {
      "title": "Last Purchase",
      "description": "Reference of last product sold to the client",
      "type": "string",
      "maxLength": 60
    }
  },
  "required": ["name", "yearsOfActivity", "previousPurchases", "areasOfInterest", "lastPurchase"],
  "definitions": {
    "purchase": {
      "title": "Purchase",
      "type": "object",
      "properties": {
        "type": {
          "title": "Purchase Type",
          "description": "Indicates if the purchase was a single item or multiple items",
          "enum": ["Single", "Multiple"],
          "type": "string"
        },
        "details": {
          "title": "Purchase Details",
          "description": "Details about the purchase",
          "type": "string"
        }
      },
      "required": ["type", "details"]
    },
    "interest": {
      "title": "Interest",
      "description": "Enumeration of the areas of interest",
      "type": "string",
      "enum": ["AI", "HR", "Finance", "Clothing", "Engineering", "CSR"]
    }
  }
}
"""

chat_completion = client.chat.completions.create(
    model="alfred-40b-1123",
    messages=[
        {"role": "user", "content": "Give me a detailed client profile for a B2B solution."},
    ],
    temperature=1,
    max_tokens=400,
    stream=False,
    stop=["</s>"],
    extra_body={"json_schema": schema},
)
# Print the client profile
print(json.loads(chat_completion.choices[0].message.content))

Schema Accuracy

Ensure your JSON Schema correctly reflects the required data structure to avoid output errors or generic responses.

Here is an example output from the previous example:

{'name': 'Tech-Enabled Manufacturing Co.',
'yearsOfActivity': 20, 
'previousPurchases': [
    {'type': 'Single', 'details': 'Bought 20 enterprise-grade 3D printers last year'},
    {'type': 'Multiple', 'details': 'Purchased maintenance and repair services for heavy machinery multiple times over the past 5 years'}],
'areasOfInterest': ['AI', 'Clothing'],
'lastPurchase': 'Maintenance and repair services for heavy machinery, 6 month'}
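Generated output can also be checked locally before it is used. The sketch below is a deliberately small stand-in for a full JSON Schema validator: check_profile is a hypothetical helper that only verifies the required keys, two of the types, and the enum of areasOfInterest from the schema above.

```python
# Subset of the Client Profile schema's rules, restated for a local check.
REQUIRED = ["name", "yearsOfActivity", "previousPurchases", "areasOfInterest", "lastPurchase"]
INTERESTS = ("AI", "HR", "Finance", "Clothing", "Engineering", "CSR")


def check_profile(profile):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = [f"missing key: {key}" for key in REQUIRED if key not in profile]
    if not isinstance(profile.get("name"), str):
        problems.append("name must be a string")
    years = profile.get("yearsOfActivity")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(years, int) or isinstance(years, bool) or years < 0:
        problems.append("yearsOfActivity must be a non-negative integer")
    for interest in profile.get("areasOfInterest", []):
        if interest not in INTERESTS:
            problems.append(f"unknown area of interest: {interest}")
    return problems


profile = {
    "name": "Tech-Enabled Manufacturing Co.",
    "yearsOfActivity": 20,
    "previousPurchases": [],
    "areasOfInterest": ["AI", "Clothing"],
    "lastPurchase": "Maintenance and repair services",
}
print(check_profile(profile))
print(check_profile({"name": "Acme", "areasOfInterest": ["Sports"]}))
```

A complete validator would also cover maxLength, nested definitions, and the remaining types; this sketch only shows the shape of such a check.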

Extracting Information from Business Inquiries

Automate the extraction of structured data from unstructured texts like emails or web forms, streamlining data processing.

Configuration Example for Data Extraction

import json

# Make sure you have initialized the client before running this code

# Define the JSON Schema for extracting inquiry details
schema = """{
    "title": "Inquiry Details",
    "type": "object",
    "properties": {
        "name": {
            "type": "string"
        },
        "email": {
            "type": "string"
        },
        "inquiry_type": {
            "type": "string"
        },
        "summary": {
            "type": "string",
            "pattern": "The summary of the conversation is: (.+)"
        }
    },
    "required": ["name", "email", "inquiry_type", "summary"]
}"""

# Example text from which to extract data
text_to_extract_from = """
Here is an email :

Subject: Inquiry about Custom Solutions for E-Commerce Platforms

Content of the e-mail:

``
Dear Paradigm Support Team,

I hope this message finds you well. My name is John Doe, and I am the CTO of XYZ Retail, an online retail company specializing in bespoke home decor items. We have been exploring advanced AI solutions to enhance our customer experience and streamline our operations, and we came across your innovative product offerings.

We are particularly interested in understanding how your AI solutions can be integrated with our existing e-commerce platform to provide personalized product recommendations and optimize our supply chain management.

Could you provide us with more detailed information about the following:

1. The integration process with existing e-commerce platforms, specifically Shopify and Magento.
2. Case studies or examples of similar implementations in the retail sector.
3. The scalability of your solutions to accommodate our rapid growth.
4. The level of customer support and technical assistance provided during and post-integration.

Additionally, we would like to inquire about the possibility of a custom solution tailored to our specific needs, including predictive analytics for inventory management and AI-driven insights for market trend analysis.

I look forward to your detailed response and hope to explore a potential collaboration.

Best regards,

John Doe
CTO, XYZ Retail
Email: johndoe@example.com
Phone: +1234567890
``

Review the following message and extract the key details including the sender's name, email address, type of inquiry, and the main message content. Format the extracted information according to the predefined JSON schema, ensuring that each piece of information is accurately captured and categorized.

Now give me the structured json:
"""

# Extract data using the defined schema
response = client.chat.completions.create(
    model="alfred-40b-1123",
    messages=[
        {"role": "user", "content": text_to_extract_from},
    ],
    temperature=0.7,
    max_tokens=1000,
    stream=False,
    stop=["</s>"],
    extra_body={"json_schema": schema},
)

# Print the extracted structured data
print(json.loads(response.choices[0].message.content))

Structured Data Extraction

This approach uses the model's natural language understanding to automate structured data extraction, enhancing efficiency and accuracy.

Here is an example of the results you can get from the previous code snippet:

{
   "name":"John Doe",
   "email":"johndoe@example.com",
   "inquiry_type":"Request for information on custom AI solutions for e-commerce platforms",
   "summary":"The summary of the conversation is: John Doe, CTO of XYZ Retail, is interested in understanding how Paradigm's AI solutions can be integrated with their existing e-commerce platform (Shopify and Magento) to provide personalized product recommendations and optimize supply chain management. John has also inquired about the scalability of the solutions, level of customer support, and the possibility of a custom solution tailored to their specific needs, including predictive analytics for inventory management and AI-driven insights for market trend analysis."
}

Harnessing Pattern Matching

Use the pattern attribute in JSON Schema to define a regular expression (regex) that a string value must match. In the example above, the pattern fixes the prefix "The summary of the conversation is: " and the (.+) group matches one or more characters after it, so the summary always begins with a predictable lead-in followed by free text. This is useful for capturing specific data points from unstructured or semi-structured text.
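To make the behavior concrete, the sketch below applies the same pattern with Python's re module. It assumes JSON Schema's usual semantics, in which pattern is not implicitly anchored, so re.search is used rather than re.fullmatch. The helper name extract_summary is hypothetical.

```python
import re

# Same pattern as in the Inquiry Details schema.
SUMMARY_PATTERN = re.compile(r"The summary of the conversation is: (.+)")


def extract_summary(value):
    """Return the text captured by (.+), or None if the value does not match."""
    match = SUMMARY_PATTERN.search(value)
    return match.group(1) if match else None


print(extract_summary("The summary of the conversation is: John Doe asks about integration."))
print(extract_summary("John Doe asks about integration."))
```

Note that in Python's re module the dot does not match a newline by default, so for a multi-line value this sketch captures only the text up to the first line break.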

Utilizing Extracted Data

Leverage the structured data from your extractions to enhance databases, CRM systems, or automate workflows, boosting operational efficiency.
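As one possible illustration, extracted records could be stored in a database. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the inquiries table and its columns are hypothetical and simply mirror the fields of the schema above.

```python
import sqlite3

# Example record shaped like the extraction output above (summary shortened).
record = {
    "name": "John Doe",
    "email": "johndoe@example.com",
    "inquiry_type": "Request for information on custom AI solutions",
    "summary": "The summary of the conversation is: integration with Shopify and Magento.",
}

conn = sqlite3.connect(":memory:")  # in-memory database, for demonstration only
conn.execute(
    "CREATE TABLE inquiries (name TEXT, email TEXT, inquiry_type TEXT, summary TEXT)"
)
# Named placeholders keep the insert safe from SQL injection.
conn.execute(
    "INSERT INTO inquiries VALUES (:name, :email, :inquiry_type, :summary)", record
)
conn.commit()

rows = conn.execute("SELECT name, email FROM inquiries").fetchall()
print(rows)
```

Because the model's output already follows the schema, the dictionary keys map directly onto the named placeholders in the INSERT statement.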

Accurate Pattern Matching

Verify that your JSON Schema's regex patterns align with your text's expected formats. Mismatches can lead to data extraction errors or inaccuracies.

By employing JSON Schema, you can efficiently transform unstructured text into structured, actionable data, offering a scalable solution for data processing needs.

Conclusion

JSON Schema facilitates structured data generation, enriching user experiences with detailed content and ensuring application-wide data consistency. It's crucial for applications reliant on structured data integrity.

Common Pitfalls
  • Excluding necessary schema properties might result in partial data outputs.
  • Imposing overly strict constraints can restrict the model's ability to produce relevant content.
  • Omitting the stop sequence from an API call means the response may not terminate cleanly, which can cause parsing errors downstream.
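Some of these pitfalls can be detected in code. The sketch below assumes that a response which is cut off (for example, by max_tokens) yields a string that is not valid JSON. parse_model_output is a hypothetical helper, not part of the API.

```python
import json


def parse_model_output(content, required=()):
    """Parse model output as JSON and report missing required keys.

    Returns (data, problems); data is None when the text is not valid JSON.
    """
    try:
        data = json.loads(content)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return data, ["expected a JSON object"]
    missing = [key for key in required if key not in data]
    return data, [f"missing key: {key}" for key in missing]


print(parse_model_output('{"name": "John Doe"}', required=("name", "email")))
print(parse_model_output('{"name": "John'))  # truncated output
```

Wrapping json.loads this way lets the calling code retry or log the failure instead of raising an unhandled exception.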
