{
"cells": [
{
"cell_type": "markdown",
"id": "43a342b3",
"metadata": {
"vscode": {
"languageId": "raw"
}
},
"source": [
"# Structured Outputs with Hugging Face Inference Providers\n",
"\n",
"This notebook demonstrates how to use structured outputs with both OpenAI-compatible and Hugging Face native clients using Hugging Face Inference Providers.\n",
"\n",
"## Overview\n",
"- **OpenAI-Compatible**: Use familiar OpenAI structured outputs with HF Inference Providers\n",
"- **Hugging Face Native**: Use HF's native InferenceClient with JSON schema validation\n",
"- **Shared Models**: Reusable Pydantic models and schemas across both approaches\n",
"- **Guaranteed Structure**: Ensure responses match your defined schemas\n",
"\n",
"## Installation\n",
"\n",
"First, install the required dependencies:\n"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "7071d771",
"metadata": {},
"outputs": [],
"source": [
"# %pip install openai huggingface-hub pydantic python-dotenv"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7323b5fb",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import json\n",
"from typing import Dict, Any, List, Optional\n",
"from openai import OpenAI\n",
"from huggingface_hub import InferenceClient\n",
"from pydantic import BaseModel, Field\n",
"from dotenv import load_dotenv\n",
"\n",
"# Load environment variables\n",
"load_dotenv()\n",
"\n",
"# Create a shared configuration\n",
"HF_TOKEN = os.getenv(\"HF_TOKEN\")"
]
},
{
"cell_type": "markdown",
"id": "abbe98f5",
"metadata": {},
"source": [
"# Structured Outputs Task\n",
"\n",
"Let's setup a structured output task like analysing a research paper and returning a structured output."
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "2c1799a9",
"metadata": {},
"outputs": [],
"source": [
"# Shared Pydantic Models and Sample Data\n",
"\n",
"# Define structured output models\n",
"class PaperAnalysis(BaseModel):\n",
" \"\"\"Analysis of a research paper.\"\"\"\n",
"\n",
" title: str = Field(description=\"The title of the paper\")\n",
" abstract_summary: str = Field(description=\"A concise summary of the abstract\")\n",
" main_contributions: List[str] = Field(description=\"Key contributions of the paper\")\n",
" methodology: str = Field(description=\"Brief description of the methodology used\")\n",
"\n",
"\n",
"# Sample data for testing\n",
"SAMPLE_PAPER = \"\"\"Title: Attention Is All You Need\n",
"\n",
"Abstract: The dominant sequence transduction models are based on complex recurrent \n",
"or convolutional neural networks that include an encoder and a decoder. The best \n",
"performing models also connect the encoder and decoder through an attention mechanism. \n",
"We propose a new simple network architecture, the Transformer, based solely on \n",
"attention mechanisms, dispensing with recurrence and convolutions entirely. \n",
"Experiments on two machine translation tasks show these models to be superior \n",
"in quality while being more parallelizable and requiring significantly less time to train.\n",
"\n",
"Introduction: Recurrent neural networks, long short-term memory and gated recurrent \n",
"neural networks in particular, have been firmly established as state of the art approaches \n",
"in sequence modeling and transduction problems such as language modeling and machine translation.\n",
"The Transformer architecture introduces multi-head attention mechanisms that allow the model\n",
"to jointly attend to information from different representation subspaces.\"\"\"\n"
]
},
{
"cell_type": "markdown",
"id": "d4cd793c",
"metadata": {},
"source": [
"# Demo!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b82ca76b",
"metadata": {},
"outputs": [],
"source": [
"# Unified Structured Output Handler\n",
"system_prompt = \"Analyze the research paper and extract structured information about its title, abstract, contributions, and methodology.\"\n",
"\n",
"client = OpenAI(\n",
" api_key=HF_TOKEN,\n",
" base_url=\"https://router.huggingface.co/novita/v3/openai\",\n",
")\n",
"\n",
"\n",
"def get_structured_output(content: str) -> Any:\n",
" \"\"\"Get structured output using OpenAI-compatible client.\"\"\"\n",
"\n",
" messages = [\n",
" {\"role\": \"system\", \"content\": system_prompt},\n",
" {\"role\": \"user\", \"content\": content},\n",
" ]\n",
"\n",
" # Use OpenAI's structured output parsing\n",
" completion = client.beta.chat.completions.parse(\n",
" model=\"moonshotai/kimi-k2-instruct\",\n",
" messages=messages,\n",
" response_format=PaperAnalysis,\n",
" )\n",
"\n",
" return completion.choices[0].message.parsed\n"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "8519e939",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"π Title: Attention Is All You Need\n",
"π Summary: Proposes the Transformer architecture, a sequence-to-sequence model that replaces all recurrence and convolution with attention mechanisms. Demonstrates state-of-the-art results on machine-translation benchmarks while being more parallelizable and faster to train.\n",
"π― Contributions: ['Introduces the Transformer architecture, the first transduction model built entirely on attention, eliminating recurrence and convolution.', 'Presents multi-head self-attention to jointly attend to information from different representation subspaces.', 'Shows that attention-only models outperform RNN/CNN baselines in translation quality while offering better parallelization and shorter training times.']\n",
"π¬ Methodology: Designs an encoder-decoder architecture composed solely of stacked self-attention and feed-forward layers. Uses multi-head scaled dot-product attention, positional encodings, and residual connections. Evaluates on WMT 2014 English-to-German and English-to-French translation tasks, comparing against previous RNN/CNN-based systems.\n"
]
}
],
"source": [
"paper_analysis = get_structured_output(\n",
" content=SAMPLE_PAPER,\n",
")\n",
"\n",
"print(f\"π Title: {paper_analysis.title}\")\n",
"print(f\"π Summary: {paper_analysis.abstract_summary}\")\n",
"print(f\"π― Contributions: {paper_analysis.main_contributions}\")\n",
"print(f\"π¬ Methodology: {paper_analysis.methodology}\")\n"
]
}
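,
{
"cell_type": "markdown",
"id": "hf-native-note",
"metadata": {},
"source": [
"## Hugging Face Native Client\n",
"\n",
"The overview also promises an example with HF's native `InferenceClient`, which was imported above but not yet used. Below is a minimal sketch of the same task using `chat_completion` with a JSON-schema `response_format`. This assumes a recent `huggingface_hub` release that supports the `provider` argument and the `json_schema` response format; the provider and model names simply mirror the OpenAI-compatible example above and are not the only valid choices.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "hf-native-demo",
"metadata": {},
"outputs": [],
"source": [
"# Hugging Face native structured output (sketch; reuses system_prompt,\n",
"# HF_TOKEN, and the shared PaperAnalysis model defined in earlier cells)\n",
"hf_client = InferenceClient(provider=\"novita\", api_key=HF_TOKEN)\n",
"\n",
"\n",
"def get_structured_output_native(content: str) -> PaperAnalysis:\n",
"    \"\"\"Get structured output using the HF native InferenceClient.\"\"\"\n",
"    response = hf_client.chat_completion(\n",
"        model=\"moonshotai/kimi-k2-instruct\",\n",
"        messages=[\n",
"            {\"role\": \"system\", \"content\": system_prompt},\n",
"            {\"role\": \"user\", \"content\": content},\n",
"        ],\n",
"        # Constrain generation to the Pydantic model's JSON schema\n",
"        response_format={\n",
"            \"type\": \"json_schema\",\n",
"            \"json_schema\": {\n",
"                \"name\": \"PaperAnalysis\",\n",
"                \"schema\": PaperAnalysis.model_json_schema(),\n",
"            },\n",
"        },\n",
"    )\n",
"    # Validate the raw JSON string against the shared Pydantic model\n",
"    return PaperAnalysis.model_validate_json(response.choices[0].message.content)\n"
]
}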
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}