🚀 Just released version 0.24.0 of the huggingface_hub Python library!
Exciting updates include:
⚡ InferenceClient is now a drop-in replacement for OpenAI's chat completion client!
✨ Support for response_format, adapter_id, truncate, and more in InferenceClient.
💾 New serialization module with a save_torch_model helper that handles shared layers, sharding, naming conventions, and safe serialization. Basically a condensed version of logic scattered across safetensors, transformers, and accelerate.
🚀 Optimized HfFileSystem to avoid getting rate-limited when browsing HuggingFaceFW/fineweb.
🚨 HfApi & CLI improvements: prevent empty commits, create repos inside resource groups, webhooks API, more options in the Search API, and more.
🚀 I'm excited to announce that huggingface_hub's InferenceClient now supports OpenAI's Python client syntax! For developers integrating AI into their codebases, this means you can switch to open-source models with just three lines of code. Here's a quick example of how easy it is.
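A minimal sketch of the OpenAI-style syntax, assuming huggingface_hub >= 0.24; the model ID and token placeholder below are illustrative, and the API call is guarded so the snippet runs without a real token.

```python
import os

from huggingface_hub import InferenceClient

# Same constructor arguments as OpenAI's client; base_url and api_key
# are accepted as of huggingface_hub 0.24.
client = InferenceClient(
    api_key=os.environ.get("HF_TOKEN", "hf_***"),  # placeholder token
)

# Only hit the API when a real token is available in the environment.
if os.environ.get("HF_TOKEN"):
    response = client.chat.completions.create(
        model="meta-llama/Meta-Llama-3-8B-Instruct",  # illustrative model ID
        messages=[{"role": "user", "content": "Hello!"}],
        max_tokens=100,
    )
    print(response.choices[0].message.content)
```

In practice, switching an existing OpenAI-based integration means changing the import, the client constructor, and the model name, while the `chat.completions.create(...)` call stays as it is.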
Why use the InferenceClient?
🔄 Seamless transition: keep your existing code structure while leveraging LLMs hosted on the Hugging Face Hub.
🤗 Direct integration: easily launch a model and run inference using our Inference Endpoints service.
🚀 Stay updated: always be in sync with the latest Text-Generation-Inference (TGI) updates.