{
"cells": [
{
"cell_type": "markdown",
"id": "43a342b3",
"metadata": {
"vscode": {
"languageId": "raw"
}
},
"source": [
"# Structured Outputs with Hugging Face Inference Providers\n",
"\n",
"This notebook demonstrates how to use structured outputs with both OpenAI-compatible and Hugging Face native clients using Hugging Face Inference Providers.\n",
"\n",
"## Overview\n",
"- **OpenAI-Compatible**: Use familiar OpenAI structured outputs with HF Inference Providers\n",
"- **Hugging Face Native**: Use HF's native InferenceClient with JSON schema validation\n",
"- **Shared Models**: Reusable Pydantic models and schemas across both approaches\n",
"- **Guaranteed Structure**: Ensure responses match your defined schemas\n",
"\n",
"## Installation\n",
"\n",
"First, install the required dependencies:\n"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "7071d771",
"metadata": {},
"outputs": [],
"source": [
"# %pip install openai huggingface-hub pydantic python-dotenv"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7323b5fb",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import json\n",
"from typing import Dict, Any, List, Optional\n",
"from openai import OpenAI\n",
"from huggingface_hub import InferenceClient\n",
"from pydantic import BaseModel, Field\n",
"from dotenv import load_dotenv\n",
"\n",
"# Load environment variables\n",
"load_dotenv()\n",
"\n",
"# Create a shared configuration\n",
"HF_TOKEN = os.getenv(\"HF_TOKEN\")"
]
},
{
"cell_type": "markdown",
"id": "abbe98f5",
"metadata": {},
"source": [
"# Structured Outputs Task\n",
"\n",
"Let's setup a structured output task like analysing a research paper and returning a structured output."
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "2c1799a9",
"metadata": {},
"outputs": [],
"source": [
"# Shared Pydantic Models and Sample Data\n",
"\n",
"# Define structured output models\n",
"class PaperAnalysis(BaseModel):\n",
" \"\"\"Analysis of a research paper.\"\"\"\n",
"\n",
" title: str = Field(description=\"The title of the paper\")\n",
" abstract_summary: str = Field(description=\"A concise summary of the abstract\")\n",
" main_contributions: List[str] = Field(description=\"Key contributions of the paper\")\n",
" methodology: str = Field(description=\"Brief description of the methodology used\")\n",
"\n",
"\n",
"# Sample data for testing\n",
"SAMPLE_PAPER = \"\"\"Title: Attention Is All You Need\n",
"\n",
"Abstract: The dominant sequence transduction models are based on complex recurrent \n",
"or convolutional neural networks that include an encoder and a decoder. The best \n",
"performing models also connect the encoder and decoder through an attention mechanism. \n",
"We propose a new simple network architecture, the Transformer, based solely on \n",
"attention mechanisms, dispensing with recurrence and convolutions entirely. \n",
"Experiments on two machine translation tasks show these models to be superior \n",
"in quality while being more parallelizable and requiring significantly less time to train.\n",
"\n",
"Introduction: Recurrent neural networks, long short-term memory and gated recurrent \n",
"neural networks in particular, have been firmly established as state of the art approaches \n",
"in sequence modeling and transduction problems such as language modeling and machine translation.\n",
"The Transformer architecture introduces multi-head attention mechanisms that allow the model\n",
"to jointly attend to information from different representation subspaces.\"\"\"\n"
]
},
{
"cell_type": "markdown",
"id": "d4cd793c",
"metadata": {},
"source": [
"# Demo!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b82ca76b",
"metadata": {},
"outputs": [],
"source": [
"# Unified Structured Output Handler\n",
"system_prompt = \"Analyze the research paper and extract structured information about its title, abstract, contributions, and methodology.\"\n",
"\n",
"client = OpenAI(\n",
" api_key=HF_TOKEN,\n",
" base_url=\"https://router.huggingface.co/novita/v3/openai\",\n",
")\n",
"\n",
"\n",
"def get_structured_output(content: str) -> Any:\n",
" \"\"\"Get structured output using OpenAI-compatible client.\"\"\"\n",
"\n",
" messages = [\n",
" {\"role\": \"system\", \"content\": system_prompt},\n",
" {\"role\": \"user\", \"content\": content},\n",
" ]\n",
"\n",
" # Use OpenAI's structured output parsing\n",
" completion = client.beta.chat.completions.parse(\n",
" model=\"moonshotai/kimi-k2-instruct\",\n",
" messages=messages,\n",
" response_format=PaperAnalysis,\n",
" )\n",
"\n",
" return completion.choices[0].message.parsed\n"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "8519e939",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"π Title: Attention Is All You Need\n",
"π Summary: Proposes the Transformer architecture, a sequence-to-sequence model that replaces all recurrence and convolution with attention mechanisms. Demonstrates state-of-the-art results on machine-translation benchmarks while being more parallelizable and faster to train.\n",
"π― Contributions: ['Introduces the Transformer architecture, the first transduction model built entirely on attention, eliminating recurrence and convolution.', 'Presents multi-head self-attention to jointly attend to information from different representation subspaces.', 'Shows that attention-only models outperform RNN/CNN baselines in translation quality while offering better parallelization and shorter training times.']\n",
"π¬ Methodology: Designs an encoder-decoder architecture composed solely of stacked self-attention and feed-forward layers. Uses multi-head scaled dot-product attention, positional encodings, and residual connections. Evaluates on WMT 2014 English-to-German and English-to-French translation tasks, comparing against previous RNN/CNN-based systems.\n"
]
}
],
"source": [
"paper_analysis = get_structured_output(\n",
" content=SAMPLE_PAPER,\n",
")\n",
"\n",
"print(f\"π Title: {paper_analysis.title}\")\n",
"print(f\"π Summary: {paper_analysis.abstract_summary}\")\n",
"print(f\"π― Contributions: {paper_analysis.main_contributions}\")\n",
"print(f\"π¬ Methodology: {paper_analysis.methodology}\")\n"
]
}
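,
{
"cell_type": "markdown",
"id": "hf-native-note",
"metadata": {},
"source": [
"## Hugging Face Native Client\n",
"\n",
"The overview also promises an example with HF's native `InferenceClient`, which was imported above but not yet used. Below is a minimal sketch of the same task using `chat_completion` with a JSON-schema `response_format`. This assumes a recent `huggingface_hub` release that supports the `provider` argument and the `json_schema` response format; the provider and model names simply mirror the OpenAI-compatible example above and are not the only valid choices.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "hf-native-demo",
"metadata": {},
"outputs": [],
"source": [
"# Hugging Face native structured output (sketch; reuses system_prompt,\n",
"# HF_TOKEN, and the shared PaperAnalysis model defined in earlier cells)\n",
"hf_client = InferenceClient(provider=\"novita\", api_key=HF_TOKEN)\n",
"\n",
"\n",
"def get_structured_output_native(content: str) -> PaperAnalysis:\n",
"    \"\"\"Get structured output using the HF native InferenceClient.\"\"\"\n",
"    response = hf_client.chat_completion(\n",
"        model=\"moonshotai/kimi-k2-instruct\",\n",
"        messages=[\n",
"            {\"role\": \"system\", \"content\": system_prompt},\n",
"            {\"role\": \"user\", \"content\": content},\n",
"        ],\n",
"        # Constrain generation to the Pydantic model's JSON schema\n",
"        response_format={\n",
"            \"type\": \"json_schema\",\n",
"            \"json_schema\": {\n",
"                \"name\": \"PaperAnalysis\",\n",
"                \"schema\": PaperAnalysis.model_json_schema(),\n",
"            },\n",
"        },\n",
"    )\n",
"    # Validate the raw JSON string against the shared Pydantic model\n",
"    return PaperAnalysis.model_validate_json(response.choices[0].message.content)\n"
]
}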
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}