Filesystem API

The HfFileSystem class provides a Pythonic file interface to the Hugging Face Hub based on fsspec.

HfFileSystem

HfFileSystem is based on fsspec, so it is compatible with most of the APIs that it offers. For more details, check out our guide and fsspec’s API Reference.

class huggingface_hub.HfFileSystem

(*args, **kwargs)

Parameters

token (str or bool, optional) — A valid user access token (string). Defaults to the locally saved token, which is the recommended method for authentication (see https://huggingface.co/docs/huggingface_hub/quick-start#authentication). To disable authentication, pass False.

Access a remote Hugging Face Hub repository as if it were a local file system.

Usage:

>>> from huggingface_hub import HfFileSystem

>>> fs = HfFileSystem()

>>> # List files
>>> fs.glob("my-username/my-model/*.bin")
['my-username/my-model/pytorch_model.bin']
>>> fs.ls("datasets/my-username/my-dataset", detail=False)
['datasets/my-username/my-dataset/.gitattributes', 'datasets/my-username/my-dataset/README.md', 'datasets/my-username/my-dataset/data.json']

>>> # Read/write files
>>> with fs.open("my-username/my-model/pytorch_model.bin") as f:
...     data = f.read()
>>> with fs.open("my-username/my-model/pytorch_model.bin", "wb") as f:
...     f.write(data)

__init__

(*args, endpoint: Optional = None, token: Union = None, **storage_options)

Parameters

use_listings_cache, listings_expiry_time, max_paths — Passed to DirCache, if the implementation supports directory listing caching. Pass use_listings_cache=False to disable such caching.
skip_instance_cache (bool) — If this is a cachable implementation, pass True here to force creating a new instance even if a matching instance exists, and prevent storing this instance.
asynchronous (bool)
loop — asyncio-compatible IOLoop or None.

Docstring taken from fsspec documentation.

Create and configure a file-system instance.

Instances may be cachable, so if similar enough arguments are seen a new instance is not required. The token attribute exists to allow implementations to cache instances if they wish.

A reasonable default should be provided if there are no arguments.

Subclasses should call this method.
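To make the instance-caching idea concrete, here is a minimal sketch in plain Python. The CachedInstances metaclass and the ToyFileSystem class are hypothetical illustrations, not fsspec or huggingface_hub code. fsspec's real mechanism hashes the arguments into a token and handles more cases. The sketch reuses one instance for each distinct set of constructor arguments. It also honors a skip_instance_cache flag like the one described above.

```python
class CachedInstances(type):
    """Hypothetical metaclass: reuse one instance per distinct set of constructor arguments."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        cls._cache = {}  # maps argument key -> existing instance

    def __call__(cls, *args, skip_instance_cache=False, **kwargs):
        # Build a hashable key from the arguments exactly as they were passed
        key = (args, tuple(sorted(kwargs.items())))
        if not skip_instance_cache and key in cls._cache:
            return cls._cache[key]
        instance = super().__call__(*args, **kwargs)
        # A skipped lookup also means the new instance is not stored
        if not skip_instance_cache:
            cls._cache[key] = instance
        return instance


class ToyFileSystem(metaclass=CachedInstances):
    """Hypothetical filesystem whose instances are cached by constructor arguments."""

    def __init__(self, endpoint="https://huggingface.co", token=None):
        self.endpoint = endpoint
        self.token = token
```

With the real class, HfFileSystem(skip_instance_cache=True) should likewise force a fresh instance, per the parameter description above.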

exists

(path, **kwargs)

Docstring taken from fsspec documentation.

Is there a file at the given path?

find

(path: str, maxdepth: Optional = None, withdirs: bool = False, detail: bool = False, refresh: bool = False, revision: Optional = None, **kwargs)

Parameters

path (str) — Root path below which files are listed.
maxdepth (int or None) — If not None, the maximum number of levels to descend.
withdirs (bool) — Whether to include directory paths in the output. This is True when used by glob, but users usually only want files.
kwargs — Passed to ls.

Docstring taken from fsspec documentation.

List all files below path.

Like the POSIX find command without conditions.
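The following stdlib-only sketch shows how maxdepth and withdirs shape the output of a recursive listing. The nested-dict tree and the find function are hypothetical stand-ins for a repository listing. They are not the HfFileSystem implementation, and the example runs without network access.

```python
def find(tree, root, maxdepth=None, withdirs=False):
    """Sketch of fsspec-style find(): return every path below `root`.

    `tree` is a hypothetical in-memory listing of `root`: dict values are
    subdirectories, int values are file sizes in bytes.
    """
    results = []

    def descend(node, prefix, depth):
        for name, child in node.items():
            path = f"{prefix}/{name}"
            if isinstance(child, dict):
                if withdirs:
                    results.append(path)  # directories only appear on request
                # Only go deeper while we are within `maxdepth` levels of root
                if maxdepth is None or depth < maxdepth:
                    descend(child, path, depth + 1)
            else:
                results.append(path)

    descend(tree, root.rstrip("/"), 1)
    return sorted(results)


# Hypothetical contents of datasets/my-username/my-dataset
tree = {"README.md": 10, "data": {"train.json": 100, "extra": {"notes.txt": 5}}}
```

Against the Hub, the comparable call would be along the lines of fs.find("datasets/my-username/my-dataset", maxdepth=1).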

get_file

(rpath, lpath, callback=<fsspec.callbacks.NoOpCallback object>, outfile=None, **kwargs)

Docstring taken from fsspec documentation.

Copy a single remote file to a local path.

glob

(path, **kwargs)

Docstring taken from fsspec documentation.

Find files by glob-matching.

If the path ends with ’/’, only folders are returned.

We support "**", "?" and "[..]". We do not support ^ for pattern negation.

The maxdepth option is applied on the first "**" found in the path.

kwargs are passed to ls.
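As a rough illustration of the wildcard rules listed above, this sketch translates a glob pattern into a regular expression. It is a simplified, hypothetical helper, not fsspec's translator, and it rests on these assumptions:

- "**" matches any run of characters, including "/".
- "*" and "?" stay within one path segment.
- "[..]" is a character class whose "^" is matched literally, mirroring the lack of negation.

Trailing-slash handling and maxdepth are omitted.

```python
import re


def glob_to_regex(pattern):
    """Simplified, hypothetical glob-to-regex translation (not fsspec's own)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")  # crosses "/" boundaries
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")  # stays inside one path segment
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")  # exactly one non-"/" character
            i += 1
        elif pattern[i] == "[" and pattern.find("]", i + 1) > i + 1:
            end = pattern.find("]", i + 1)
            body = pattern[i + 1:end]
            # Keep ranges such as 0-9, but make "\", "[" and "^" literal (no negation)
            body = body.replace("\\", "\\\\").replace("[", "\\[").replace("^", "\\^")
            parts.append("[" + body + "]")
            i = end + 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def glob_match(pattern, path):
    """Return True if the whole path matches the pattern."""
    return glob_to_regex(pattern).match(path) is not None
```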

info

(path: str, refresh: bool = False, revision: Optional = None, **kwargs) → dict

Returns

dict with keys

name (full path in the FS), size (in bytes), type (file, directory, or something else) and other FS-specific keys.

Docstring taken from fsspec documentation.

Give details of entry at path

Returns a single dictionary, with exactly the same information as ls would with detail=True.

The default implementation calls ls and could be overridden by a shortcut. kwargs are passed on to ls().

Some file systems might not be able to measure the file’s size, in which case the returned dict will include 'size': None.
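Here is a minimal sketch of the default strategy described above: look up a path's entry in the detailed listing of its parent directory. The info_via_ls and fake_ls functions and the LISTINGS data are hypothetical. A real implementation, including HfFileSystem's own info, may take a shortcut and handle more cases (for example, the repository root itself).

```python
import posixpath

# Hypothetical backend listings: each entry has at least name, size and type
LISTINGS = {
    "my-username/my-model": [
        {"name": "my-username/my-model/config.json", "size": 512, "type": "file"},
        {"name": "my-username/my-model/onnx", "size": 0, "type": "directory"},
    ],
}


def fake_ls(path, detail=True):
    """Hypothetical ls(): detailed dicts, or just names when detail=False."""
    entries = LISTINGS.get(path, [])
    return entries if detail else [e["name"] for e in entries]


def info_via_ls(ls, path):
    """Sketch of a default info(): find path's entry in its parent's detailed listing."""
    path = path.rstrip("/")
    parent = posixpath.dirname(path)
    for entry in ls(parent, detail=True):
        if entry["name"] == path:
            return entry
    raise FileNotFoundError(path)
```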

invalidate_cache

(path: Optional = None)

Docstring taken from fsspec documentation.

Discard any cached directory information.

isdir

(path)

Docstring taken from fsspec documentation.

Is this entry directory-like?

isfile

(path)

Docstring taken from fsspec documentation.

Is this entry file-like?

ls

(path: str, detail: bool = True, refresh: bool = False, revision: Optional = None, **kwargs)

Docstring taken from fsspec documentation.

List objects at path.

This should include subdirectories and files at that location. The difference between a file and a directory must be clear when details are requested.

The specific keys (or perhaps a FileInfo class, or similar) are TBD, but must be consistent across implementations. Each entry must include:

- the full path to the entry (without protocol)
- the size of the entry, in bytes (None if the value cannot be determined)
- the type of entry: “file”, “directory” or other

Additional information may be present, appropriate to the file-system, e.g., generation, checksum, etc.

May use refresh=True|False to allow use of self._ls_from_cache to check for a saved listing and avoid calling the backend. This would be common where listing may be expensive.
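The small, hypothetical class below illustrates the refresh flag and the listing cache. CachingLister is not fsspec's DirCache. It assumes only that the backend listing call is expensive. It shows when the cache is used, when refresh=True bypasses it, and when a method in the style of invalidate_cache drops it.

```python
class CachingLister:
    """Hypothetical sketch of a directory-listing cache with refresh and invalidation."""

    def __init__(self, backend_ls):
        self._backend_ls = backend_ls  # expensive call, e.g. an HTTP request
        self.dircache = {}  # path -> list of detailed entries

    def ls(self, path, detail=True, refresh=False):
        # Hit the backend only on a cache miss or when explicitly refreshing
        if refresh or path not in self.dircache:
            self.dircache[path] = self._backend_ls(path)
        entries = self.dircache[path]
        return entries if detail else [e["name"] for e in entries]

    def invalidate_cache(self, path=None):
        # No path: forget everything; otherwise forget just that listing
        if path is None:
            self.dircache.clear()
        else:
            self.dircache.pop(path, None)


calls = []


def backend(path):
    """Hypothetical backend that records how often it is called."""
    calls.append(path)
    return [{"name": f"{path}/README.md", "size": 1, "type": "file"}]
```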

modified

(path: str, **kwargs)

Docstring taken from fsspec documentation.

Return the modified timestamp of a file as a datetime.datetime.

rm

(path: str, recursive: bool = False, maxdepth: Optional = None, revision: Optional = None, **kwargs)

Docstring taken from fsspec documentation.

Delete files.

start_transaction

()

Docstring taken from fsspec documentation.

Begin a write transaction for deferring files (non-context-manager version).

url

(path: str)

Get the HTTP URL of the given path.
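As far as we know, the URL returned for a file is a "resolve" URL built the same way as huggingface_hub.hf_hub_url. The helper below is a hypothetical stdlib sketch of that layout, which joins these parts in order:

- the endpoint
- an optional datasets/ or spaces/ prefix
- the repo ID, followed by "resolve"
- the URL-quoted revision
- the file path

In real code, call fs.url(...) or hf_hub_url rather than building URLs by hand.

```python
from urllib.parse import quote


def resolve_url(repo_id, path, repo_type="model", revision="main",
                endpoint="https://huggingface.co"):
    """Hypothetical helper: build a Hub 'resolve' URL for a file in a repo."""
    # Models have no prefix; datasets and spaces use "datasets/" and "spaces/"
    prefix = "" if repo_type == "model" else f"{repo_type}s/"
    # Slashes inside the revision (e.g. refs/pr/1) must be escaped; the file path keeps them
    return f"{endpoint}/{prefix}{repo_id}/resolve/{quote(revision, safe='')}/{quote(path)}"
```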

walk

(path, *args, **kwargs)

Docstring taken from fsspec documentation.

Return all files below path.

List all files, recursing into subdirectories; output is iterator-style, like os.walk(). For a simple list of files, find() is available.

When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, impose a specific order of visiting, or even to inform walk() about directories the caller creates or renames before it resumes walk() again. Modifying dirnames when topdown is False has no effect. (see os.walk)

Note that the “files” outputted will include anything that is not a directory, such as links.
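Because walk follows the os.walk contract described above, the pruning pattern can be tried offline with the standard library. This sketch walks a throwaway local directory with os.walk and removes .git from dirnames in place, so that directory is never entered. The walk_pruned and demo functions are hypothetical helpers. Applying the same loop to HfFileSystem.walk is an assumption based on the docstring above, not something this example exercises.

```python
import os
import tempfile


def walk_pruned(root, skip):
    """Return '/'-separated file paths under root, never entering dirs named in `skip`."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root, topdown=True):
        # Slice assignment mutates the list os.walk holds, so pruned dirs are not visited
        dirnames[:] = [d for d in dirnames if d not in skip]
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            found.append(rel.replace(os.sep, "/"))
    return sorted(found)


def demo():
    """Build a throwaway local tree and walk it while skipping '.git'."""
    with tempfile.TemporaryDirectory() as root:
        for rel in ("README.md", "data/train.json", ".git/config"):
            full = os.path.join(root, *rel.split("/"))
            os.makedirs(os.path.dirname(full), exist_ok=True)
            with open(full, "w"):
                pass  # create an empty file
        return walk_pruned(root, skip={".git"})
```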
