davanstrien HF Staff
probe: docker sdk bucket mount test (UID 1000 per HF docs)
e1eb5a6 verified
metadata
title: bucket-sqlite-probe-docker
emoji: 🐳
colorFrom: red
colorTo: gray
sdk: docker
app_port: 7860
pinned: false
tags:
  - bucket
  - sqlite
  - probe
  - reference

Bucket mount × SQLite probe: Docker SDK Space (UID 1000)

This is the failing half of a matched pair. The Gradio SDK half is at davanstrien/bucket-sqlite-probe-gradio and all its probes pass. This Space runs the same probe code against the same bucket, but inside a Docker SDK container that follows the official spaces-sdks-docker#permissions guidance: it creates a user account with uid=1000 and switches to it via USER user in the Dockerfile. Its write probes fail with Permission denied / unable to open database file.

The Dockerfile is intentionally as close to the official permissions example as possible. The only deviations are the CMD (runs app.py) and an explicit chown/chmod 777 /data step to prove that build-time permissions are overridden by the runtime mount.

What it demonstrates

With bucket davanstrien/search-v2-chroma attached R/W at /data:

  • The container runs as uid=1000(user), exactly as spaces-sdks-docker#permissions recommends.
  • /data is mounted by hf-mount with idmapped,user_id=0,group_id=0,default_permissions. That id-mapping pins the mount's writable UID to 0 (root), not the container's UID 1000.
  • ls -lan /data shows drwxr-xr-x 3 65534 65534 (nobody:nogroup, mode 755). The chmod 777 in the Dockerfile is silently overridden, because the runtime mount replaces the build-time directory entirely.
  • Write probes all fail:
    FAIL touch /data/docker_probe_touch: PermissionError: [Errno 13] Permission denied: '/data/docker_probe_touch'
    FAIL sqlite3 connect + CREATE + INSERT: OperationalError: unable to open database file
    FAIL sqlite3 journal_mode=DELETE: OperationalError: unable to open database file
    FAIL fcntl.flock LOCK_EX|LOCK_NB: PermissionError: [Errno 13] Permission denied: '/data/docker_probe.lock'
    
  • The control probe (touch $HOME/app/control_probe on the container's build-time writable dir) succeeds, so the container is healthy and UID 1000 can write somewhere, just not to the bucket mount.
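The four write probes listed above can be approximated with a short script. This is a minimal sketch rather than the Space's actual app.py: the function name run_probes and the probe file names are hypothetical, and only the Python standard library is used.

```python
import fcntl
import os
import sqlite3


def run_probes(target_dir):
    """Run four write probes against target_dir.

    Returns a list of (name, ok, detail) tuples. The names and file names
    here are illustrative, not the ones used by the Space.
    """
    results = []

    def probe(name, fn):
        # Record every failure instead of stopping at the first one.
        try:
            fn()
            results.append((name, True, ""))
        except Exception as exc:
            results.append((name, False, f"{type(exc).__name__}: {exc}"))

    def touch():
        with open(os.path.join(target_dir, "probe_touch"), "a"):
            pass

    def sqlite_write():
        conn = sqlite3.connect(os.path.join(target_dir, "probe.db"))
        try:
            conn.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
            conn.execute("INSERT INTO t VALUES (1)")
            conn.commit()
        finally:
            conn.close()

    def sqlite_journal_delete():
        conn = sqlite3.connect(os.path.join(target_dir, "probe.db"))
        try:
            # Rollback-journal mode needs to create a sidecar file next to the db.
            conn.execute("PRAGMA journal_mode=DELETE")
            conn.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
            conn.execute("INSERT INTO t VALUES (2)")
            conn.commit()
        finally:
            conn.close()

    def flock():
        with open(os.path.join(target_dir, "probe.lock"), "w") as fh:
            fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
            fcntl.flock(fh, fcntl.LOCK_UN)

    probe("touch", touch)
    probe("sqlite3 connect + CREATE + INSERT", sqlite_write)
    probe("sqlite3 journal_mode=DELETE", sqlite_journal_delete)
    probe("fcntl.flock LOCK_EX|LOCK_NB", flock)
    return results
```

Calling run_probes("/data") and then run_probes on a directory under $HOME gives the bucket-mount result and the control result side by side.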

Why this matters

  1. The Docker Spaces docs explicitly tell you to run as UID 1000. There's no note that this is incompatible with Storage Bucket mounts.
  2. The Storage Buckets blog post (buckets as working layer) implies buckets are a drop-in R/W volume for Spaces.
  3. These two bits of guidance are silently incompatible today because the FUSE mount is provisioned with user_id=0,group_id=0. A root container sees the mount as writable; a UID 1000 container does not.
  4. Any tool that keeps an embedded on-disk database (SQLite-backed tools such as ChromaDB, or DuckDB in persistent mode, LMDB, RocksDB) built on a Docker SDK Space following the official permissions guidance will fail to open its database on a bucket mount, with nothing in the docs warning about it. huggingface-datasets-search-v2 hit this; trackio doesn't because Gradio SDK Spaces run as root.
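The root versus UID 1000 asymmetry in point 3 follows from ordinary mode-bit checks, which the kernel applies when a FUSE mount carries default_permissions. The sketch below is a simplified model of that check, not the kernel's implementation: may_write is a hypothetical helper, and it ignores ACLs, read-only mounts, and any extra effects of the id-mapping.

```python
import stat


def may_write(mode, owner_uid, owner_gid, uid, gids):
    """Approximate the classic POSIX write check for a file or directory.

    Simplified model: ignores ACLs, read-only mounts, and capabilities
    other than root's usual bypass of mode bits.
    """
    if uid == 0:
        return True  # root normally bypasses mode bits (CAP_DAC_OVERRIDE)
    if uid == owner_uid:
        return bool(mode & stat.S_IWUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)


# Values taken from the ls -lan /data output reported in this README.
MOUNT_MODE = 0o755
MOUNT_OWNER = 65534  # nobody:nogroup

uid_1000_can_write = may_write(MOUNT_MODE, MOUNT_OWNER, MOUNT_OWNER, 1000, {1000})
root_can_write = may_write(MOUNT_MODE, MOUNT_OWNER, MOUNT_OWNER, 0, {0})
```

Under this model UID 1000 falls into the "other" class, which has no write bit in mode 755, while root passes regardless. The same model shows why a 0777 mount root would work for any UID.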

The fix (almost certainly)

The mount provisioning layer should either:

  1. Mount with user_id=1000,group_id=1000 for Docker SDK Spaces (the conventional UID), or
  2. Mount with the Space's runtime UID dynamically (cleanest), or
  3. Chmod the mount root to 0777 at provisioning time (hacky but works for any UID).

All three are infra-side changes. The container user can't fix this: any chown / chmod in the Dockerfile is overridden when the mount is attached at runtime.

Reproducing

  1. Fork or duplicate this Space.
  2. Create or choose a Storage Bucket you own.
  3. Attach it R/W at /data via Space settings → Volumes (UI) or via:
    from huggingface_hub import HfApi, Volume  # requires huggingface_hub >= 1.9.1
    HfApi().set_space_volumes(
        "your-namespace/bucket-sqlite-probe-docker",
        volumes=[Volume(type="bucket", source="your-namespace/your-bucket", mount_path="/data")],
    )
    
  4. Restart. Probe output appears in startup logs and at the Space URL.
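To confirm in a fork that the mount carries the same options reported in this README, the option list can be read from /proc/self/mounts. This is a rough sketch assuming a Linux container; the helper names mount_options, fuse_ids, and current_mount_options are hypothetical and are not part of the probe code.

```python
def mount_options(mount_point, mounts_text):
    """Return the option list for mount_point from /proc/mounts-style text, or None."""
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            found = fields[3].split(",")  # a later entry shadows an earlier one
    return found


def fuse_ids(options):
    """Extract user_id / group_id values (as ints) from a FUSE option list."""
    ids = {}
    for opt in options:
        key, sep, value = opt.partition("=")
        if sep and key in ("user_id", "group_id") and value.isdigit():
            ids[key] = int(value)
    return ids


def current_mount_options(mount_point="/data"):
    """Read the live mount table (Linux only) and return options for mount_point."""
    with open("/proc/self/mounts") as fh:
        return mount_options(mount_point, fh.read())
```

If current_mount_options("/data") returns a list, passing it to fuse_ids shows which user_id and group_id the mount was provisioned with, and the presence of "idmapped" or "default_permissions" can be checked with a plain membership test.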

Related


Throwaway investigative Space. Kept public as a reference example. Do not rely on it for production.