📢 [v0.19.0] Inference Endpoints and robustness!

by @Wauplin (HF staff)

EDIT: Release v0.19.0 is now available on PyPI!

🚀 Inference Endpoints API

Inference Endpoints provides a secure solution to easily deploy models hosted on the Hub in a production-ready infrastructure managed by Hugging Face. With huggingface_hub>=0.19.0, you can now manage your Inference Endpoints programmatically. Combined with the InferenceClient, this becomes the go-to solution to deploy models and run jobs in production, either sequentially or in batch!
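For instance, here is a sketch of creating and listing endpoints programmatically (the endpoint name and the instance parameters below are illustrative placeholders):

>>> from huggingface_hub import create_inference_endpoint, list_inference_endpoints

# Deploy a model on a new endpoint (all parameters are illustrative)
>>> endpoint = create_inference_endpoint(
...     "my-endpoint-name",
...     repository="gpt2",
...     framework="pytorch",
...     task="text-generation",
...     accelerator="cpu",
...     vendor="aws",
...     region="us-east-1",
...     type="protected",
...     instance_size="medium",
...     instance_type="c6i",
... )

# List all your endpoints
>>> for ep in list_inference_endpoints():
...     print(ep.name, ep.status)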

Here is an example of how to get an Inference Endpoint, wake it up, wait for initialization, run jobs in batch, and pause the endpoint afterwards. All of this in a few lines of code! For more details, please check out our dedicated guide.

>>> import asyncio
>>> from huggingface_hub import get_inference_endpoint

# Get endpoint + wait until initialized
>>> endpoint = get_inference_endpoint("batch-endpoint").resume().wait()

# Run inference
>>> async_client = endpoint.async_client
>>> results = await asyncio.gather(*[async_client.text_generation(...) for job in jobs])

# Pause endpoint
>>> endpoint.pause()

โฌ Improved download experience

huggingface_hub is a library primarily used to transfer (huge!) files with the Hugging Face Hub. Our goal is to keep improving the experience for this core part of the library. In this release, we introduce a more robust download mechanism for slow/limited connections, while improving the UX for users with high bandwidth available!

More robust downloads

Getting a connection error in the middle of a download is frustrating. That's why we've implemented a retry mechanism that automatically reconnects if a connection gets closed or a ReadTimeout error is raised. The download restarts exactly where it stopped, without having to re-download any bytes. A simplified sketch of the idea follows the list below.

  • Retry on ConnectionError/ReadTimeout when streaming file from server by @Wauplin in #1766
  • Reset nb_retries if data has been received from the server by @Wauplin in #1784
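To make the idea concrete, here is a simplified, standalone sketch of resume-on-error using an HTTP Range header. This illustrates the concept only, not the library's actual implementation, and it assumes the server supports range requests:

import requests

def download_with_resume(url: str, path: str, max_retries: int = 5) -> None:
    received = 0
    with open(path, "wb") as f:
        for _ in range(max_retries):
            # Ask the server to resume from the last byte already received
            headers = {"Range": f"bytes={received}-"} if received else {}
            try:
                with requests.get(url, headers=headers, stream=True, timeout=10.0) as r:
                    r.raise_for_status()
                    for chunk in r.iter_content(chunk_size=1024 * 1024):
                        f.write(chunk)
                        received += len(chunk)
                return  # completed without a dropped connection
            except (requests.ConnectionError, requests.ReadTimeout):
                continue  # retry, resuming from `received`
    raise RuntimeError(f"Download failed after {max_retries} retries")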

In addition to this, it is possible to configure huggingface_hub with higher timeouts, thanks to @Shahafgo. This should help get around some issues on slower connections (see the sketch after the list below).

  • Adding the ability to configure the timeout of get request by @Shahafgo in #1720
  • Fix a bug to respect the HF_HUB_ETAG_TIMEOUT. by @Shahafgo in #1728
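For example, here is a minimal sketch of raising those timeouts via environment variables. HF_HUB_ETAG_TIMEOUT comes from the PR above; HF_HUB_DOWNLOAD_TIMEOUT is assumed to be its download-side counterpart:

>>> import os

# Set before importing huggingface_hub: the values are read at import time
>>> os.environ["HF_HUB_ETAG_TIMEOUT"] = "30"      # seconds for metadata requests
>>> os.environ["HF_HUB_DOWNLOAD_TIMEOUT"] = "30"  # seconds for file downloads (assumed name)

>>> from huggingface_hub import hf_hub_download
>>> hf_hub_download(repo_id="gpt2", filename="config.json")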

Progress bars while using hf_transfer

hf_transfer is a Rust-based library focused on improving upload and download speed on machines with high bandwidth available. Once installed (pip install -U hf_transfer), it can be used transparently with huggingface_hub simply by setting HF_HUB_ENABLE_HF_TRANSFER=1 as an environment variable. The counterpart of higher performance is the lack of some user-friendly features such as better error handling and a retry mechanism, meaning it is recommended for power users only. In this release, we nonetheless ship a new feature to improve the UX: progress bars. No need to update any existing code; a simple library upgrade is enough.
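As a quick sketch, enabling it from Python might look like this (the repo and file names are illustrative):

>>> import os

# The flag is read when huggingface_hub is imported, so set it first
>>> os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

>>> from huggingface_hub import hf_hub_download
# Downloads now go through hf_transfer, with progress bars as of this release
>>> hf_hub_download(repo_id="gpt2", filename="pytorch_model.bin")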

📚 Documentation

huggingface-cli guide

huggingface-cli is the CLI tool shipped with huggingface_hub. It recently got some nice improvements, especially with commands to download and upload files directly from the terminal. All of this needed a guide, so here it is!

Environment variables

Environment variables are useful for configuring how huggingface_hub should work. Historically, we had some inconsistencies in how those variables were named. This is now improved, with a backward-compatible approach (as illustrated after the list below). Please check the package reference for more details. The goal is to propagate those changes to the whole HF ecosystem, making configuration easier for everyone.

  • Harmonize environment variables by @Wauplin in #1786
  • Ensure backward compatibility for HUGGING_FACE_HUB_TOKEN env variable by @Wauplin in #1795
  • Do not promote HF_ENDPOINT environment variable by @Wauplin in #1799
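As an illustration of the backward-compatible behavior, both the new and the legacy token variables are honored. A hedged sketch, assuming HF_TOKEN is the harmonized name and HUGGING_FACE_HUB_TOKEN the legacy one from the PR above:

>>> import os

# New harmonized name...
>>> os.environ["HF_TOKEN"] = "hf_***"
# ...while the legacy name keeps working for backward compatibility:
# os.environ["HUGGING_FACE_HUB_TOKEN"] = "hf_***"

>>> from huggingface_hub import whoami
>>> whoami()["name"]  # the token is picked up from the environment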

Hindi translation

Hindi documentation landed on the Hub thanks to @aneeshd27! Check out the Hindi version of the quickstart guide here.

  • Added translation of 3 files as mentioned in issue by @aneeshd27 in #1772

Minor docs fixes

💔 Breaking changes

Legacy ModelSearchArguments and DatasetSearchArguments have been completely removed from huggingface_hub. This shouldn't cause problems, as they were already unused (and unusable in practice).

  • Removed GeneralTags, ModelTags and DatasetTags by @VictorHugoPilled in #1761

Classes containing details about a repo (ModelInfo, DatasetInfo and SpaceInfo) have been refactored by @mariosasko to be more Pythonic and aligned with the other classes in huggingface_hub. In particular, those objects are now based on the dataclass module instead of a custom ReprMixin class. Every change is meant to be backward compatible, meaning no breaking changes are expected. However, if you detect any inconsistency, please let us know and we will fix it as soon as possible. A sketch of what this enables is shown below.
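For example, since these info classes are now dataclasses, the standard dataclasses helpers should apply to them (a sketch; attribute access itself is unchanged):

>>> import dataclasses
>>> from huggingface_hub import HfApi

>>> info = HfApi().model_info("gpt2")
>>> print(info)  # clean, dataclass-style repr
# and dataclass utilities now work out of the box
>>> dataclasses.asdict(info)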

The legacy Repository and InferenceAPI classes are now deprecated but will not be removed before the next major release (v1.0).
Instead of the git-based Repository, we advise using the HTTP-based HfApi. Check out this guide explaining the reasons behind it. For InferenceAPI, we recommend switching to InferenceClient, which is much more feature-complete and will keep getting improved.
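A migration sketch under those recommendations (the repo id and file names below are illustrative placeholders):

>>> from huggingface_hub import HfApi, InferenceClient

# HTTP-based upload with HfApi, instead of a git clone/commit/push via Repository
>>> api = HfApi()
>>> api.upload_file(
...     path_or_fileobj="path/to/model.safetensors",
...     path_in_repo="model.safetensors",
...     repo_id="username/my-model",
... )

# InferenceClient instead of the deprecated InferenceAPI
>>> client = InferenceClient(model="gpt2")
>>> client.text_generation("The huggingface_hub library is ", max_new_tokens=12)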

โš™๏ธ Miscellaneous improvements, fixes and maintenance

InferenceClient

HfFileSystem

  • [hffs] Raise NotImplementedError on transaction commits by @Wauplin in #1736
  • Fix huggingface filesystem repo_type not forwarded by @Wauplin in #1791
  • Fix HfFileSystemFile when init fails + improve error message by @Wauplin in #1805

FIPS compliance

  • Set usedforsecurity=False in hashlib methods (FIPS compliance) by @Wauplin in #1782
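Concretely, the pattern here is Python's usedforsecurity flag (available since Python 3.9), which marks hashes used for integrity checks rather than for security:

>>> import hashlib

# File hashes are for integrity/caching, not cryptographic security,
# so they can be flagged as such on FIPS-enabled systems (Python 3.9+)
>>> hashlib.sha256(b"example payload", usedforsecurity=False).hexdigest()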

Misc fixes

Internal

  • Bump version to 0.19 by @Wauplin in #1723
  • Make @retry_endpoint a default for all test by @Wauplin in #1725
  • Retry test on 502 Bad Gateway by @Wauplin in #1737
  • Consolidated mypy type ignores in InferenceClient.post by @jamesbraza in #1742
  • fix: remove useless token by @rtrompier in #1765
  • Fix CI (typing-extensions minimal requirement) by @Wauplin in #1781
  • remove black formatter to use only ruff by @Wauplin in #1783
  • Separate test and prod cache (+ ruff formatter) by @Wauplin in #1789
  • fix 3.8 tensorflow in ci by @Wauplin (direct commit on main)

🤗 Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @VictorHugoPilled
    • Removed GeneralTags, ModelTags and DatasetTags (#1761)
  • @aneeshd27
    • Added translation of 3 files as mentioned in issue (#1772)

Looks good to me! Nice work :)

Super cool! The inference endpoints API examples are so nice (literally was about to start handcrafting something for this and now I don't need to!)
