Wauplin posted an update Jul 11
🚀 I'm excited to announce that huggingface_hub's InferenceClient now supports OpenAI's Python client syntax! For developers integrating AI into their codebases, this means you can switch to open-source models with just three lines of code. Here's a quick example of how easy it is.
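Something like this, as a minimal sketch (the model ID and token are placeholders):

```python
from huggingface_hub import InferenceClient

# Same call syntax as the openai client: only the import and the client
# constructor change. The model ID and token below are placeholders.
client = InferenceClient(
    base_url="https://api-inference.huggingface.co/v1/",
    api_key="hf_xxx",  # your Hugging Face token
)

output = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Count to 10."},
    ],
    stream=True,
    max_tokens=1024,
)

# Streamed chunks follow the OpenAI response shape.
for chunk in output:
    print(chunk.choices[0].delta.content, end="")
```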

Why use the InferenceClient?
🔄 Seamless transition: keep your existing code structure while leveraging LLMs hosted on the Hugging Face Hub.
🤗 Direct integration: easily launch a model to run inference using our Inference Endpoints service (see the sketch below).
🚀 Stay updated: always be in sync with the latest Text-Generation-Inference (TGI) updates.
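
For instance, pointing the same client at a dedicated Inference Endpoint might look like this (the endpoint URL below is a placeholder for your own deployment):

```python
from huggingface_hub import InferenceClient

# Placeholder URL: replace with the URL of your deployed Inference Endpoint.
client = InferenceClient(
    base_url="https://my-endpoint.endpoints.huggingface.cloud/v1/",
    api_key="hf_xxx",
)

# The endpoint serves the model it was deployed with,
# so no model ID needs to be passed here.
response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```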

More details at https://huggingface.co/docs/huggingface_hub/main/en/guides/inference#openai-compatibility

Such good news, thanks! With this we can now build AI pipelines much more simply, treating models as interchangeable service parts. For cutting-edge techniques like MoE gating networks, self-reward and comparison across models, memory across AI pipelines, etc., I think this becomes the differentiator that makes it all much easier. I hope that operating key models like GPT-4o, Claude 3.5 Sonnet, Gemma, Llama, and other front runners in this open pattern unlocks better, more powerful AI coding patterns.

What's the benefit? @Wauplin

Mostly that it's better integrated with HF services. If you pass a model_id, you can use the serverless Inference API without setting a base_url. There's also no need to pass an api_key if you are already logged in (via the $HF_TOKEN environment variable or huggingface-cli login). If you are an Inference Endpoints user (i.e. deploying a model using https://ui.endpoints.huggingface.co/), you get seamless integration for making requests to it, with the URL already configured. Finally, you are assured that the client will stay up to date with the latest updates in TGI / Inference API / Inference Endpoints.
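
For example, a minimal sketch of that serverless path (the model ID is a placeholder):

```python
from huggingface_hub import InferenceClient

# No base_url and no api_key: the serverless Inference API is used and the
# token is picked up from HF_TOKEN / `huggingface-cli login` if available.
client = InferenceClient(model="meta-llama/Meta-Llama-3-8B-Instruct")

response = client.chat.completions.create(
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```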