hf-hub-query / _monty_codegen_shared.md
evalstate HF Staff
Deploy hf-hub-query with runtime capabilities helper and budget prompt fix
c830f69 verified

Runtime rules for generated code

  • No imports.
  • Helper functions are already in scope.
  • All helper/API calls are async: always use await.
  • max_calls is the total external-call budget for the whole generated program, not a generic helper argument.
  • The outer wrapper is an exact contract. Use this exact skeleton and only change the body:
async def solve(query, max_calls):
    ...

await solve(query, max_calls)
  • Do not modify that wrapper shape:
    • no async def solve(query, max_calls=100):
    • no async def solve(q, max_calls):
    • no async def solve(query, *, max_calls):
    • no await solve(query, max_calls or 100)
    • no await solve(query, max_calls if ... else ...)
    • no budget = max_calls followed by await solve(query, budget)
  • The runtime supplies max_calls; generated code must not invent defaults or fallbacks for it.
  • At the tool-call layer, normally omit max_calls and timeout_sec so the runtime defaults apply. Do not invent small explicit tool-call budgets like 10 or 20 for ordinary requests.
  • Use helper functions first. Use raw call_api('/api/...') only if no helper fits.
  • call_api must receive a raw path starting with /api/...; never call helper names through call_api.
  • Raw call_api(...) endpoints must match the runtime allowlist exactly. Do not invent hyphen/underscore variants or guessed path shapes.
  • call_api(...) returns {ok, status, url, data, error}. Always check resp["ok"] before reading resp["data"]. Do not read resp["items"] or resp["meta"] directly from call_api(...).
  • call_api(...) only accepts endpoint, params, method, and json_body. Do not guess extra kwargs.
  • Use call_api(...) only for endpoint families that do not already have a helper, such as /api/daily_papers or tag metadata endpoints.
  • For daily papers, use the exact raw endpoint string /api/daily_papers (underscore), not /api/daily-papers.
  • For questions about supported helpers, fields, limits, raw API affordances, or runtime capabilities, use hf_runtime_capabilities(...) instead of hand-authoring a static answer from memory.
  • Keep final displayed results compact, but do not artificially shrink intermediate helper coverage unless the user explicitly asked for a sample.
  • Prefer canonical snake_case keys in generated code and in JSON output.
  • When returning a structured dict that includes your own coverage metadata, use the exact top-level keys results and coverage unless the user explicitly requested different key names.
  • Omit unavailable optional fields instead of emitting null placeholders unless the user explicitly asked for a fixed schema with nulls.
  • If the user asks for specific fields or says "return only", return exactly that final shape from solve(...).
  • For current-user prompts (my, me), use helpers with username=None first. Only ask for identity if that fails.
  • When a current-user helper response has ok=false, return that helper response directly instead of flattening it into an empty result.
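The wrapper contract above can be sketched end-to-end. The asyncio import and the stub helper below are illustrative only, so the sketch is runnable outside the runtime; generated code itself must not import anything, and the runtime supplies the real helpers and the real max_calls.

```python
import asyncio

# Illustrative stub standing in for a runtime-provided helper.
async def hf_whoami():
    return {"ok": True, "item": {"username": "demo-user"}, "items": [], "meta": {}, "error": None}

# The exact contract: the body changes, the signature and the final call do not.
async def solve(query, max_calls):
    who = await hf_whoami()
    if not who["ok"]:
        return who
    return {"query": query, "username": who["item"]["username"]}

# The runtime issues `await solve(query, max_calls)`; here we simulate that supply.
result = asyncio.run(solve("who am I", 100))
```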

Common helper signature traps

These are high-priority rules. Do not guess helper arguments.

  • hf_repo_search(...) uses limit, not return_limit, and does not accept count_only.
  • hf_trending(...) uses limit, not return_limit.
  • hf_repo_discussions(...) uses limit, not return_limit.
  • hf_user_graph(...), hf_user_likes(...), hf_org_members(...), hf_recent_activity(...), and hf_collection_items(...) use return_limit.
  • For "how many models/datasets/spaces does org/user X have?" prefer hf_org_overview(...) or hf_user_summary(...)["item"]["overview"] instead of trying to count with hf_repo_search(...).
  • Never invent helper args such as count_only=True for helpers that do not document it.
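One way to keep the limit vs. return_limit distinction straight is a small lookup table. The table and size_kwargs below are a local illustration restating the rules in this section, not anything the runtime provides.

```python
# Which size argument each helper accepts (restating the rules above).
LIMIT_ARG = {
    "hf_repo_search": "limit",
    "hf_trending": "limit",
    "hf_repo_discussions": "limit",
    "hf_user_graph": "return_limit",
    "hf_user_likes": "return_limit",
    "hf_org_members": "return_limit",
    "hf_recent_activity": "return_limit",
    "hf_collection_items": "return_limit",
}

def size_kwargs(helper_name, n):
    """Build the correct size kwarg for a helper, e.g. {'limit': 5}."""
    return {LIMIT_ARG[helper_name]: n}
```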

Helper result shape

All helpers return:

{
  "ok": bool,
  "item": dict | None,
  "items": list[dict],
  "meta": dict,
  "error": str | None,
}

Rules:

  • items is the canonical list field.
  • item is only a singleton convenience.
  • meta contains helper-owned execution, coverage, and limit information.
  • For metadata-oriented prompts, return the relevant meta fields instead of inferring coverage from list length alone.
  • For bounded list/sample helpers in raw mode, returning the helper envelope directly preserves helper-owned meta fields.
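Unpacking this envelope follows one recurring pattern in the minimal patterns below. first_item here is a hypothetical local convenience, not a runtime helper; it just restates the item/items relationship above.

```python
def first_item(resp):
    """Return resp['item'] if set, else the first row of resp['items'], else None."""
    if not resp.get("ok"):
        return None
    return resp["item"] or (resp["items"][0] if resp["items"] else None)

# Example envelope shaped like the contract above.
resp = {"ok": True, "item": None, "items": [{"repo_id": "a/b"}], "meta": {}, "error": None}
row = first_item(resp)
```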

Routing guide

Runtime self-description

  • Supported fields / helper signatures / limits / raw API affordances β†’ hf_runtime_capabilities(...)

Repo questions

  • Exact owner/name details β†’ hf_repo_details(repo_type="auto", ...)
  • Search/discovery/list/top repos β†’ hf_repo_search(...)
  • True trending requests β†’ hf_trending(...)
  • Repo discussions β†’ hf_repo_discussions(...)
  • Specific discussion details / latest comment text β†’ hf_repo_discussion_details(...)
  • Users who liked a specific repo β†’ hf_repo_likers(...)

User questions

  • Profile / overview / "tell me about user X" β†’ hf_user_summary(...)
  • Followers / following / graph samples β†’ hf_user_graph(...)
  • Repos a user liked β†’ hf_user_likes(...)
  • Recent actions / activity feed β†’ hf_recent_activity(feed_type="user", entity=...)

Organization questions

  • Organization details and counts β†’ hf_org_overview(...)
  • Organization members β†’ hf_org_members(...)
  • Organization repos β†’ hf_repo_search(author="<org>", repo_types=[...])
  • Organization or user collections β†’ hf_collections_search(owner="<org-or-user>", ...)
  • Repos inside a known collection β†’ hf_collection_items(collection_id=...)

Direction reminders

  • hf_user_likes(...) = user β†’ repos
  • hf_repo_likers(...) = repo β†’ users
  • hf_user_graph(...) = user/org β†’ followers/following
  • If the author/org is already known, start with hf_repo_search(author=...) instead of semantic search.
  • For "most popular repo a user liked", use hf_user_likes(sort="repoLikes" | "repoDownloads", ranking_window=40) instead of fetching recent likes and re-ranking locally.

Common row keys

Use these canonical keys unless the user explicitly wants different names.

  • Repo rows: repo_id, repo_type, title, author, likes, downloads, created_at, last_modified, pipeline_tag, library_name, repo_url, tags
  • User graph/member rows: username, fullname, isPro, role, type
  • Activity rows: event_type, repo_id, repo_type, timestamp
  • Collection rows: collection_id, slug, title, owner, owner_type, description, last_updated, item_count
  • hf_user_summary(...)["item"]["overview"]: username, fullname, bio, websiteUrl, twitter, github, linkedin, bluesky, followers, following, likes, isPro

Common aliases in fields=[...] are tolerated by the runtime, but prefer the canonical names above in generated code.

Common repo fields

  • repo_id
  • repo_type
  • title
  • author
  • likes
  • downloads
  • created_at
  • last_modified
  • pipeline_tag
  • repo_url
  • model: library_name
  • dataset: description, paperswithcode_id
  • space: sdk, models, datasets, subdomain

Common aliases tolerated in fields=[...]:

  • repoId β†’ repo_id
  • repoType β†’ repo_type
  • repoUrl β†’ repo_url
  • createdAt β†’ created_at
  • lastModified β†’ last_modified
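If rows ever arrive with camelCase keys, they can be normalized locally with the alias table above. normalize_row is a hypothetical local function for illustration; in fields=[...] the runtime already tolerates these aliases, so this is only needed when post-processing rows yourself.

```python
# Alias table from this section: camelCase -> canonical snake_case.
REPO_ALIASES = {
    "repoId": "repo_id",
    "repoType": "repo_type",
    "repoUrl": "repo_url",
    "createdAt": "created_at",
    "lastModified": "last_modified",
}

def normalize_row(row):
    """Rename any aliased keys to their canonical snake_case names."""
    return {REPO_ALIASES.get(key, key): value for key, value in row.items()}
```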

Common collection fields

  • collection_id
  • slug
  • title
  • owner
  • owner_type
  • description
  • last_updated
  • item_count

Common aliases tolerated in fields=[...]:

  • collectionId β†’ collection_id
  • lastUpdated β†’ last_updated
  • ownerType β†’ owner_type
  • itemCount β†’ item_count
  • author β†’ owner

High-signal usage notes

  • hf_repo_search(...) defaults to models if no repo type is specified. For prompts like "what repos does <author/org> have", search across repo_types=["model", "dataset", "space"] unless the user asked for one type.
  • hf_trending(...) returns the Hub's ordered trending list. Use trending_rank / ordering, not a fabricated numeric trending score.
  • If the user explicitly asks for trending scores, say the upstream endpoint does not expose them and return the ordered repos instead.
  • hf_user_summary(...) is the fastest way to answer common profile prompts. Read profile/social fields from summary["item"]["overview"].
  • For "how many models/datasets/spaces does user/org X have?" prompts, prefer the overview helpers (hf_user_summary(...)["item"]["overview"] or hf_org_overview(...)) over hf_repo_search(..., limit=1) or invented count_only args.
  • Use hf_whoami() when you need the explicit current username for joins, comparisons, or output labeling.
  • For overlap/comparison/ranking tasks, fetch a broad enough working set first and compute locally in code.
  • Avoid per-row hydration calls unless you truly need fields that are not already present in the current helper response.
  • For prompts that ask for both a sample and metadata, keep the sample compact and surface helper-owned meta fields explicitly.
  • For follower/member social-link lookups, first fetch usernames with hf_user_graph(...) or hf_org_members(...), then fetch profile/social data with hf_user_summary(username=...).
  • For fan-out tasks that require one helper call per follower/member/liker/repo/user, prefer bounded seed sets by default so ordinary requests stay fast and predictable.
  • If the user explicitly asks for exhaustive coverage (all, scan all, entire, not just the first N, ensure more than the first 20, etc.), do not silently cap the seed at a small sample such as 20 or 50.
  • For those explicit exhaustive requests, attempt a substantially broader seed scan first when the runtime budget permits.
  • For explicit exhaustive follower/member scans, prefer omitting return_limit or using a value large enough to cover the expected total. Do not choose arbitrary small caps like 50 or 100 if that would obviously prevent an exhaustive answer.
  • If the prompt says both scan all and more than the first 20, the scan all requirement wins. Do not satisfy that request with a bare sample of 50 unless you also mark the result as partial.
  • If exhaustive coverage is still not feasible within max_calls or timeout, say so clearly and return an explicit partial result with coverage metadata instead of presenting a bounded sample as if it were complete.
  • When you return a composed partial result, use the exact top-level keys results and coverage unless the user explicitly asked for a different schema. Do not rename results to items, rows, liked_models, or similar.
  • Do not use your own top-level transport wrapper named meta in raw mode; runtime already owns the outer meta.
  • Good coverage fields for partial fan-out results include: partial, reason, seed_limit, seed_processed, seed_total, seed_more_available, per_entity_limit, and next_request_hint.
  • If the user did not explicitly require exhaustiveness, a clear partial result with coverage metadata is better than failing with Max API calls exceeded.
  • If the user did explicitly require exhaustiveness and you cannot complete it, do not imply success. Report that the result is partial and include the relevant coverage/limit fields.
  • For explicit exhaustive follower/member prompts, if meta.more_available is true or seed_processed < seed_total, the final output must not be a bare list that looks complete. Include explicit partial/coverage information.
  • Use hf_recent_activity(...) for activity feeds instead of raw call_api('/api/recent-activity', ...).
  • Use hf_repo_search(author=..., repo_type="space", ...) for Spaces by author; there is no separate spaces-by-author helper.
  • Use hf_collections_search(owner=...) for "what collections does this org/user have?" prompts.
  • hf_collections_search(...) is for finding/listing collections. It returns collection rows plus item_count, not the full repo rows inside each collection.
  • Use hf_collection_items(collection_id=...) for "what repos/models/datasets/spaces are in this collection?" prompts.
  • Do not guess raw collection item endpoints such as /api/collections/.../items.
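The partial-coverage rules above can be condensed into one local builder. build_coverage is hypothetical; it simply assembles the recommended coverage fields from a seed helper's meta, marking the result partial when meta.more_available is true or fewer seeds were processed than the reported total.

```python
def build_coverage(meta, *, seed_limit, seed_processed, per_entity_limit, reason="fanout_budget"):
    """Assemble the recommended coverage block for a fan-out result."""
    more = bool(meta.get("more_available"))
    total = meta.get("total")
    partial = more or (total is not None and seed_processed < total)
    return {
        "partial": partial,
        "reason": reason if partial else None,
        "seed_limit": seed_limit,
        "seed_processed": seed_processed,
        "seed_total": total,
        "seed_more_available": more,
        "per_entity_limit": per_entity_limit,
    }
```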

Helper API

await hf_runtime_capabilities(section: str | None = None)

await hf_org_overview(organization: str)

await hf_org_members(
  organization: str,
  return_limit: int | None = None,
  scan_limit: int | None = None,
  count_only: bool = False,
  where: dict | None = None,
  fields: list[str] | None = None,
)

await hf_repo_search(
  query: str | None = None,
  repo_type: str | None = None,
  repo_types: list[str] | None = None,
  author: str | None = None,
  filters: list[str] | None = None,
  sort: str | None = None,
  limit: int = 20,
  where: dict | None = None,
  fields: list[str] | None = None,
  advanced: dict | None = None,
)

await hf_repo_details(
  repo_id: str | None = None,
  repo_ids: list[str] | None = None,
  repo_type: str = "auto",
  fields: list[str] | None = None,
)

await hf_trending(
  repo_type: str = "model",
  limit: int = 20,
  where: dict | None = None,
  fields: list[str] | None = None,
)

await hf_user_summary(
  username: str | None = None,
  include: list[str] | None = None,
  sample_limit: int = 10,
  activity_limit: int = 10,
  graph_pro_only: bool | None = None,
)

await hf_user_graph(
  username: str | None = None,
  relation: str = "followers",
  return_limit: int | None = None,
  scan_limit: int | None = None,
  count_only: bool = False,
  pro_only: bool | None = None,
  where: dict | None = None,
  fields: list[str] | None = None,
)

await hf_repo_likers(
  repo_id: str,
  repo_type: str,
  return_limit: int | None = None,
  count_only: bool = False,
  pro_only: bool | None = None,
  where: dict | None = None,
  fields: list[str] | None = None,
)

await hf_user_likes(
  username: str | None = None,
  repo_types: list[str] | None = None,
  return_limit: int | None = None,
  scan_limit: int | None = None,
  count_only: bool = False,
  where: dict | None = None,
  fields: list[str] | None = None,
  sort: str | None = None,
  ranking_window: int | None = None,
)

await hf_recent_activity(
  feed_type: str | None = None,
  entity: str | None = None,
  activity_types: list[str] | None = None,
  repo_types: list[str] | None = None,
  return_limit: int | None = None,
  max_pages: int | None = None,
  start_cursor: str | None = None,
  count_only: bool = False,
  where: dict | None = None,
  fields: list[str] | None = None,
)

await hf_repo_discussions(repo_type: str, repo_id: str, limit: int = 20)
await hf_repo_discussion_details(repo_type: str, repo_id: str, discussion_num: int)

await hf_collections_search(
  query: str | None = None,
  owner: str | None = None,
  return_limit: int = 20,
  count_only: bool = False,
  where: dict | None = None,
  fields: list[str] | None = None,
)

await hf_collection_items(
  collection_id: str,
  repo_types: list[str] | None = None,
  return_limit: int = 100,
  count_only: bool = False,
  where: dict | None = None,
  fields: list[str] | None = None,
)

await hf_whoami()
await call_api(endpoint: str, params: dict | None = None, method: str = "GET", json_body: dict | None = None)

Minimal patterns

# Exact repo details
info = await hf_repo_details(
    repo_id="black-forest-labs/FLUX.1-dev",
    repo_type="auto",
    fields=["repo_id", "repo_type", "author", "pipeline_tag", "library_name", "likes", "downloads", "repo_url"],
)
item = info["item"] or (info["items"][0] if info["items"] else None)
if item is None:
    return info
return {
    "repo_id": item["repo_id"],
    "repo_type": item["repo_type"],
    "author": item["author"],
    "pipeline_tag": item.get("pipeline_tag"),
    "library_name": item.get("library_name"),
    "likes": item.get("likes"),
    "downloads": item.get("downloads"),
    "repo_url": item.get("repo_url"),
}

# Runtime capability / supported-field introspection
caps = await hf_runtime_capabilities(section="fields")
if not caps["ok"]:
    return caps
item = caps["item"] or (caps["items"][0] if caps["items"] else None)
if item is None:
    return caps
return item["content"]

# Compact user summary
summary = await hf_user_summary(
    username="mishig",
    include=["likes", "activity"],
    sample_limit=10,
    activity_limit=10,
)
item = summary["item"] or (summary["items"][0] if summary["items"] else None)
if item is None:
    return summary
return {
    "total_followers": item["overview"]["followers"],
    "total_following": item["overview"]["following"],
    "latest_activity": item["activity"]["sample"],
    "latest_likes": item["likes"]["sample"],
}

# Current user's pro followers and their recent liked repos
followers = await hf_user_graph(
    relation="followers",
    pro_only=True,
    fields=["username"],
)
if not followers["ok"]:
    return followers
result = {}
for row in followers["items"]:
    uname = row.get("username")
    if not uname:
        continue
    likes = await hf_user_likes(
        username=uname,
        return_limit=3,
        fields=["repo_id", "repo_type", "liked_at", "repo_url"],
    )
    repos = []
    for item in likes["items"]:
        repo = {}
        for key in ["repo_id", "repo_type", "liked_at", "repo_url"]:
            if item.get(key) is not None:
                repo[key] = item[key]
        if repo:
            repos.append(repo)
    if repos:
        result[uname] = repos
return result

# Fan-out query with bounded partial coverage metadata
followers = await hf_user_graph(
    relation="followers",
    return_limit=20,
    fields=["username"],
)
if not followers["ok"]:
    return followers
result = {}
processed = 0
for row in followers["items"]:
    uname = row.get("username")
    if not uname:
        continue
    likes = await hf_user_likes(
        username=uname,
        repo_types=["model"],
        return_limit=3,
        fields=["repo_id", "repo_author", "liked_at"],
    )
    processed += 1
    items = []
    for item in likes["items"]:
        liked = {}
        for key in ["repo_id", "repo_author", "liked_at"]:
            if item.get(key) is not None:
                liked[key] = item[key]
        if liked:
            items.append(liked)
    if items:
        result[uname] = items
return {
    "results": result,
    "coverage": {
        "partial": bool(followers["meta"].get("more_available")),
        "reason": "fanout_budget",
        "seed_relation": "followers",
        "seed_limit": 20,
        "seed_processed": processed,
        "seed_total": followers["meta"].get("total"),
        "seed_more_available": followers["meta"].get("more_available"),
        "per_entity_limit": 3,
        "next_request_hint": "Ask for a smaller subset or a follow-up batch if you want more coverage.",
    },
}

# Popularity-ranked likes with metadata
likes = await hf_user_likes(
    username="julien-c",
    return_limit=1,
    sort="repoLikes",
    ranking_window=40,
    fields=["repo_id", "repo_type", "repo_author", "likes", "repo_url", "liked_at"],
)
item = likes["item"] or (likes["items"][0] if likes["items"] else None)
if item is None:
    return {"error": "No liked repositories found"}
repo = {}
for key in ["repo_id", "repo_type", "repo_author", "likes", "repo_url", "liked_at"]:
    if item.get(key) is not None:
        repo[key] = item[key]
return {
    "repo": repo,
    "metadata": {
        "sort_applied": likes["meta"].get("sort_applied"),
        "ranking_window": likes["meta"].get("ranking_window"),
        "ranking_complete": likes["meta"].get("ranking_complete"),
    },
}

# Recent activity with compact snake_case rows
activity = await hf_recent_activity(
    feed_type="user",
    entity="mishig",
    return_limit=15,
    fields=["event_type", "repo_id", "repo_type", "timestamp"],
)
result = []
for row in activity["items"]:
    item = {}
    for key in ["event_type", "repo_id", "repo_type", "timestamp"]:
        if row.get(key) is not None:
            item[key] = row[key]
    if item:
        result.append(item)
return result

# Repo discussions
rows = await hf_repo_discussions(
    repo_type="model",
    repo_id="Qwen/Qwen3.5-35B-A3B",
    limit=10,
)
return [
    {
        "num": row["num"],
        "title": row["title"],
        "author": row["author"],
        "status": row["status"],
    }
    for row in rows["items"]
]

# Collections owned by an org or user
collections = await hf_collections_search(
    owner="Qwen",
    return_limit=20,
    fields=["collection_id", "title", "owner", "description", "last_updated", "item_count"],
)
return collections["items"]

# Daily papers via the exact allowed raw endpoint
resp = await call_api("/api/daily_papers")
if not resp["ok"]:
    return resp
rows = []
for item in resp.get("data") or []:
    row = {}
    if item.get("title") is not None:
        row["title"] = item["title"]
    if item.get("repo_id") is not None:
        row["repo_id"] = item["repo_id"]
    if row:
        rows.append(row)
return rows

# Organization repo counts
org = await hf_org_overview("unsloth")
item = org["item"] or (org["items"][0] if org["items"] else None)
if item is None:
    return org
return {
    "organization": item["organization"],
    "models": item.get("models"),
    "datasets": item.get("datasets"),
    "spaces": item.get("spaces"),
}

# Do any authors of the top trending spaces follow me?
who = await hf_whoami()
if not who["ok"]:
    return who
me_row = who["item"] or (who["items"][0] if who["items"] else None)
if me_row is None:
    return who
me = me_row.get("username")
spaces = await hf_trending(
    repo_type="space",
    limit=20,
    fields=["repo_id", "author", "repo_url"],
)
authors = []
seen = set()
for row in spaces["items"]:
    author = row.get("author")
    if isinstance(author, str) and author and author not in seen:
        seen.add(author)
        authors.append(author)

results = []
processed = 0
for author in authors[:20]:
    graph = await hf_user_graph(
        username=author,
        relation="following",
        return_limit=200,
        fields=["username"],
    )
    processed += 1
    if not graph["ok"]:
        continue
    if any(item.get("username") == me for item in graph["items"]):
        results.append(author)

return {
    "results": results,
    "coverage": {
        "partial": False,
        "reason": None,
        "seed_relation": "trending_space_authors",
        "seed_limit": 20,
        "seed_processed": processed,
        "seed_total": len(authors),
        "seed_more_available": False,
        "per_entity_limit": 200,
    },
}

# Models inside an org's collections
collections = await hf_collections_search(
    owner="openai",
    return_limit=20,
    fields=["collection_id", "title"],
)
result = {}
for coll in collections["items"]:
    collection_id = coll.get("collection_id")
    title = coll.get("title") or collection_id
    if not collection_id:
        continue
    items = await hf_collection_items(
        collection_id=collection_id,
        repo_types=["model"],
        fields=["repo_id", "repo_type", "repo_url"],
    )
    if items["items"]:
        result[title] = items["items"]
return result