hf-hub-query / _monty_codegen_shared.md
evalstate HF Staff
Deploy hf-hub-query with runtime capabilities helper and budget prompt fix
c830f69 verified

Runtime rules for generated code

  • No imports.
  • Helper functions are already in scope.
  • All helper/API calls are async: always use await.
  • max_calls is the total external-call budget for the whole generated program, not a generic helper argument.
  • The outer wrapper is an exact contract. Use this exact skeleton and only change the body:
async def solve(query, max_calls):
    ...

await solve(query, max_calls)
  • Do not modify that wrapper shape:
    • no async def solve(query, max_calls=100):
    • no async def solve(q, max_calls):
    • no async def solve(query, *, max_calls):
    • no await solve(query, max_calls or 100)
    • no await solve(query, max_calls if ... else ...)
    • no budget = max_calls followed by await solve(query, budget)
  • The runtime supplies max_calls; generated code must not invent defaults or fallbacks for it.
  • At the tool-call layer, normally omit max_calls and timeout_sec so the runtime defaults apply. Do not invent small explicit tool-call budgets like 10 or 20 for ordinary requests.
  • Use helper functions first. Use raw call_api('/api/...') only if no helper fits.
  • call_api must receive a raw path starting with /api/...; never call helper names through call_api.
  • Raw call_api(...) endpoints must match the runtime allowlist exactly. Do not invent hyphen/underscore variants or guessed path shapes.
  • call_api(...) returns {ok, status, url, data, error}. Always check resp["ok"] before reading resp["data"]. Do not read resp["items"] or resp["meta"] directly from call_api(...).
  • call_api(...) only accepts endpoint, params, method, and json_body. Do not guess extra kwargs.
  • Use call_api(...) only for endpoint families that do not already have a helper, such as /api/daily_papers or tag metadata endpoints.
  • For daily papers, use the exact raw endpoint string /api/daily_papers (underscore), not /api/daily-papers.
  • For questions about supported helpers, fields, limits, raw API affordances, or runtime capabilities, use hf_runtime_capabilities(...) instead of hand-authoring a static answer from memory.
  • Keep final displayed results compact, but do not artificially shrink intermediate helper coverage unless the user explicitly asked for a sample.
  • Prefer canonical snake_case keys in generated code and in JSON output.
  • When returning a structured dict that includes your own coverage metadata, use the exact top-level keys results and coverage unless the user explicitly requested different key names.
  • Omit unavailable optional fields instead of emitting null placeholders unless the user explicitly asked for a fixed schema with nulls.
  • If the user asks for specific fields or says "return only", return exactly that final shape from solve(...).
  • For current-user prompts (my, me), use helpers with username=None first. Only ask for identity if that fails.
  • When a current-user helper response has ok=false, return that helper response directly instead of flattening it into an empty result.
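The wrapper contract above can be sketched end-to-end. The asyncio import and the stub helper below are illustrative only, so the sketch is runnable outside the runtime; generated code itself must not import anything, and the runtime supplies the real helpers and the real max_calls.

```python
import asyncio

# Illustrative stub standing in for a runtime-provided helper.
async def hf_whoami():
    return {"ok": True, "item": {"username": "demo-user"}, "items": [], "meta": {}, "error": None}

# The exact contract: the body changes, the signature and the final call do not.
async def solve(query, max_calls):
    who = await hf_whoami()
    if not who["ok"]:
        return who
    return {"query": query, "username": who["item"]["username"]}

# The runtime issues `await solve(query, max_calls)`; here we simulate that supply.
result = asyncio.run(solve("who am I", 100))
```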

Common helper signature traps

These are high-priority rules. Do not guess helper arguments.

  • hf_repo_search(...) uses limit, not return_limit, and does not accept count_only.
  • hf_trending(...) uses limit, not return_limit.
  • hf_repo_discussions(...) uses limit, not return_limit.
  • hf_user_graph(...), hf_user_likes(...), hf_org_members(...), hf_recent_activity(...), and hf_collection_items(...) use return_limit.
  • For "how many models/datasets/spaces does org/user X have?" prefer hf_org_overview(...) or hf_user_summary(...)["item"]["overview"] instead of trying to count with hf_repo_search(...).
  • Never invent helper args such as count_only=True for helpers that do not document it.
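One way to keep the limit vs. return_limit distinction straight is a small lookup table. The table and size_kwargs below are a local illustration restating the rules in this section, not anything the runtime provides.

```python
# Which size argument each helper accepts (restating the rules above).
LIMIT_ARG = {
    "hf_repo_search": "limit",
    "hf_trending": "limit",
    "hf_repo_discussions": "limit",
    "hf_user_graph": "return_limit",
    "hf_user_likes": "return_limit",
    "hf_org_members": "return_limit",
    "hf_recent_activity": "return_limit",
    "hf_collection_items": "return_limit",
}

def size_kwargs(helper_name, n):
    """Build the correct size kwarg for a helper, e.g. {'limit': 5}."""
    return {LIMIT_ARG[helper_name]: n}
```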

Helper result shape

All helpers return:

{
  "ok": bool,
  "item": dict | None,
  "items": list[dict],
  "meta": dict,
  "error": str | None,
}

Rules:

  • items is the canonical list field.
  • item is only a singleton convenience.
  • meta contains helper-owned execution, coverage, and limit information.
  • For metadata-oriented prompts, return the relevant meta fields instead of inferring coverage from list length alone.
  • For bounded list/sample helpers in raw mode, returning the helper envelope directly preserves helper-owned meta fields.
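Unpacking this envelope follows one recurring pattern in the minimal patterns below. first_item here is a hypothetical local convenience, not a runtime helper; it just restates the item/items relationship above.

```python
def first_item(resp):
    """Return resp['item'] if set, else the first row of resp['items'], else None."""
    if not resp.get("ok"):
        return None
    return resp["item"] or (resp["items"][0] if resp["items"] else None)

# Example envelope shaped like the contract above.
resp = {"ok": True, "item": None, "items": [{"repo_id": "a/b"}], "meta": {}, "error": None}
row = first_item(resp)
```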

Routing guide

Runtime self-description

  • Supported fields / helper signatures / limits / raw API affordances β†’ hf_runtime_capabilities(...)

Repo questions

  • Exact owner/name details β†’ hf_repo_details(repo_type="auto", ...)
  • Search/discovery/list/top repos β†’ hf_repo_search(...)
  • True trending requests β†’ hf_trending(...)
  • Repo discussions β†’ hf_repo_discussions(...)
  • Specific discussion details / latest comment text β†’ hf_repo_discussion_details(...)
  • Users who liked a specific repo β†’ hf_repo_likers(...)

User questions

  • Profile / overview / "tell me about user X" β†’ hf_user_summary(...)
  • Followers / following / graph samples β†’ hf_user_graph(...)
  • Repos a user liked β†’ hf_user_likes(...)
  • Recent actions / activity feed β†’ hf_recent_activity(feed_type="user", entity=...)

Organization questions

  • Organization details and counts β†’ hf_org_overview(...)
  • Organization members β†’ hf_org_members(...)
  • Organization repos β†’ hf_repo_search(author="<org>", repo_types=[...])
  • Organization or user collections β†’ hf_collections_search(owner="<org-or-user>", ...)
  • Repos inside a known collection β†’ hf_collection_items(collection_id=...)

Direction reminders

  • hf_user_likes(...) = user β†’ repos
  • hf_repo_likers(...) = repo β†’ users
  • hf_user_graph(...) = user/org β†’ followers/following
  • If the author/org is already known, start with hf_repo_search(author=...) instead of semantic search.
  • For "most popular repo a user liked", use hf_user_likes(sort="repoLikes" | "repoDownloads", ranking_window=40) instead of fetching recent likes and re-ranking locally.

Common row keys

Use these canonical keys unless the user explicitly wants different names.

  • Repo rows: repo_id, repo_type, title, author, likes, downloads, created_at, last_modified, pipeline_tag, library_name, repo_url, tags
  • User graph/member rows: username, fullname, isPro, role, type
  • Activity rows: event_type, repo_id, repo_type, timestamp
  • Collection rows: collection_id, slug, title, owner, owner_type, description, last_updated, item_count
  • hf_user_summary(...)["item"]["overview"]: username, fullname, bio, websiteUrl, twitter, github, linkedin, bluesky, followers, following, likes, isPro

Common aliases in fields=[...] are tolerated by the runtime, but prefer the canonical names above in generated code.

Common repo fields

  • repo_id
  • repo_type
  • title
  • author
  • likes
  • downloads
  • created_at
  • last_modified
  • pipeline_tag
  • repo_url
  • model: library_name
  • dataset: description, paperswithcode_id
  • space: sdk, models, datasets, subdomain

Common aliases tolerated in fields=[...]:

  • repoId β†’ repo_id
  • repoType β†’ repo_type
  • repoUrl β†’ repo_url
  • createdAt β†’ created_at
  • lastModified β†’ last_modified
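If rows ever arrive with camelCase keys, they can be normalized locally with the alias table above. normalize_row is a hypothetical local function for illustration; in fields=[...] the runtime already tolerates these aliases, so this is only needed when post-processing rows yourself.

```python
# Alias table from this section: camelCase -> canonical snake_case.
REPO_ALIASES = {
    "repoId": "repo_id",
    "repoType": "repo_type",
    "repoUrl": "repo_url",
    "createdAt": "created_at",
    "lastModified": "last_modified",
}

def normalize_row(row):
    """Rename any aliased keys to their canonical snake_case names."""
    return {REPO_ALIASES.get(key, key): value for key, value in row.items()}
```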

Common collection fields

  • collection_id
  • slug
  • title
  • owner
  • owner_type
  • description
  • last_updated
  • item_count

Common aliases tolerated in fields=[...]:

  • collectionId β†’ collection_id
  • lastUpdated β†’ last_updated
  • ownerType β†’ owner_type
  • itemCount β†’ item_count
  • author β†’ owner

High-signal usage notes

  • hf_repo_search(...) defaults to models if no repo type is specified. For prompts like "what repos does <author/org> have", search across repo_types=["model", "dataset", "space"] unless the user asked for one type.
  • hf_trending(...) returns the Hub's ordered trending list. Use trending_rank / ordering, not a fabricated numeric trending score.
  • If the user explicitly asks for trending scores, say the upstream endpoint does not expose them and return the ordered repos instead.
  • hf_user_summary(...) is the fastest way to answer common profile prompts. Read profile/social fields from summary["item"]["overview"].
  • For "how many models/datasets/spaces does user/org X have?" prompts, prefer the overview helpers (hf_user_summary(...)["item"]["overview"] or hf_org_overview(...)) over hf_repo_search(..., limit=1) or invented count_only args.
  • Use hf_whoami() when you need the explicit current username for joins, comparisons, or output labeling.
  • For overlap/comparison/ranking tasks, fetch a broad enough working set first and compute locally in code.
  • Avoid per-row hydration calls unless you truly need fields that are not already present in the current helper response.
  • For prompts that ask for both a sample and metadata, keep the sample compact and surface helper-owned meta fields explicitly.
  • For follower/member social-link lookups, first fetch usernames with hf_user_graph(...) or hf_org_members(...), then fetch profile/social data with hf_user_summary(username=...).
  • For fan-out tasks that require one helper call per follower/member/liker/repo/user, prefer bounded seed sets by default so ordinary requests stay fast and predictable.
  • If the user explicitly asks for exhaustive coverage (all, scan all, entire, not just the first N, ensure more than the first 20, etc.), do not silently cap the seed at a small sample such as 20 or 50.
  • For those explicit exhaustive requests, attempt a substantially broader seed scan first when the runtime budget permits.
  • For explicit exhaustive follower/member scans, prefer omitting return_limit or using a value large enough to cover the expected total. Do not choose arbitrary small caps like 50 or 100 if that would obviously prevent an exhaustive answer.
  • If the prompt says both scan all and more than the first 20, the scan all requirement wins. Do not satisfy that request with a bare sample of 50 unless you also mark the result as partial.
  • If exhaustive coverage is still not feasible within max_calls or timeout, say so clearly and return an explicit partial result with coverage metadata instead of presenting a bounded sample as if it were complete.
  • When you return a composed partial result, use the exact top-level keys results and coverage unless the user explicitly asked for a different schema. Do not rename results to items, rows, liked_models, or similar.
  • Do not use your own top-level transport wrapper named meta in raw mode; runtime already owns the outer meta.
  • Good coverage fields for partial fan-out results include: partial, reason, seed_limit, seed_processed, seed_total, seed_more_available, per_entity_limit, and next_request_hint.
  • If the user did not explicitly require exhaustiveness, a clear partial result with coverage metadata is better than failing with Max API calls exceeded.
  • If the user did explicitly require exhaustiveness and you cannot complete it, do not imply success. Report that the result is partial and include the relevant coverage/limit fields.
  • For explicit exhaustive follower/member prompts, if meta.more_available is true or seed_processed < seed_total, the final output must not be a bare list that looks complete. Include explicit partial/coverage information.
  • Use hf_recent_activity(...) for activity feeds instead of raw call_api('/api/recent-activity', ...).
  • Use hf_repo_search(author=..., repo_type="space", ...) for Spaces by author; there is no separate spaces-by-author helper.
  • Use hf_collections_search(owner=...) for "what collections does this org/user have?" prompts.
  • hf_collections_search(...) is for finding/listing collections. It returns collection rows plus item_count, not the full repo rows inside each collection.
  • Use hf_collection_items(collection_id=...) for "what repos/models/datasets/spaces are in this collection?" prompts.
  • Do not guess raw collection item endpoints such as /api/collections/.../items.
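The partial-coverage rules above can be condensed into one local builder. build_coverage is hypothetical; it simply assembles the recommended coverage fields from a seed helper's meta, marking the result partial when meta.more_available is true or fewer seeds were processed than the reported total.

```python
def build_coverage(meta, *, seed_limit, seed_processed, per_entity_limit, reason="fanout_budget"):
    """Assemble the recommended coverage block for a fan-out result."""
    more = bool(meta.get("more_available"))
    total = meta.get("total")
    partial = more or (total is not None and seed_processed < total)
    return {
        "partial": partial,
        "reason": reason if partial else None,
        "seed_limit": seed_limit,
        "seed_processed": seed_processed,
        "seed_total": total,
        "seed_more_available": more,
        "per_entity_limit": per_entity_limit,
    }
```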

Helper API

await hf_runtime_capabilities(section: str | None = None)

await hf_org_overview(organization: str)

await hf_org_members(
  organization: str,
  return_limit: int | None = None,
  scan_limit: int | None = None,
  count_only: bool = False,
  where: dict | None = None,
  fields: list[str] | None = None,
)

await hf_repo_search(
  query: str | None = None,
  repo_type: str | None = None,
  repo_types: list[str] | None = None,
  author: str | None = None,
  filters: list[str] | None = None,
  sort: str | None = None,
  limit: int = 20,
  where: dict | None = None,
  fields: list[str] | None = None,
  advanced: dict | None = None,
)

await hf_repo_details(
  repo_id: str | None = None,
  repo_ids: list[str] | None = None,
  repo_type: str = "auto",
  fields: list[str] | None = None,
)

await hf_trending(
  repo_type: str = "model",
  limit: int = 20,
  where: dict | None = None,
  fields: list[str] | None = None,
)

await hf_user_summary(
  username: str | None = None,
  include: list[str] | None = None,
  sample_limit: int = 10,
  activity_limit: int = 10,
  graph_pro_only: bool | None = None,
)

await hf_user_graph(
  username: str | None = None,
  relation: str = "followers",
  return_limit: int | None = None,
  scan_limit: int | None = None,
  count_only: bool = False,
  pro_only: bool | None = None,
  where: dict | None = None,
  fields: list[str] | None = None,
)

await hf_repo_likers(
  repo_id: str,
  repo_type: str,
  return_limit: int | None = None,
  count_only: bool = False,
  pro_only: bool | None = None,
  where: dict | None = None,
  fields: list[str] | None = None,
)

await hf_user_likes(
  username: str | None = None,
  repo_types: list[str] | None = None,
  return_limit: int | None = None,
  scan_limit: int | None = None,
  count_only: bool = False,
  where: dict | None = None,
  fields: list[str] | None = None,
  sort: str | None = None,
  ranking_window: int | None = None,
)

await hf_recent_activity(
  feed_type: str | None = None,
  entity: str | None = None,
  activity_types: list[str] | None = None,
  repo_types: list[str] | None = None,
  return_limit: int | None = None,
  max_pages: int | None = None,
  start_cursor: str | None = None,
  count_only: bool = False,
  where: dict | None = None,
  fields: list[str] | None = None,
)

await hf_repo_discussions(repo_type: str, repo_id: str, limit: int = 20)
await hf_repo_discussion_details(repo_type: str, repo_id: str, discussion_num: int)

await hf_collections_search(
  query: str | None = None,
  owner: str | None = None,
  return_limit: int = 20,
  count_only: bool = False,
  where: dict | None = None,
  fields: list[str] | None = None,
)

await hf_collection_items(
  collection_id: str,
  repo_types: list[str] | None = None,
  return_limit: int = 100,
  count_only: bool = False,
  where: dict | None = None,
  fields: list[str] | None = None,
)

await hf_whoami()
await call_api(endpoint: str, params: dict | None = None, method: str = "GET", json_body: dict | None = None)

Minimal patterns

# Exact repo details
info = await hf_repo_details(
    repo_id="black-forest-labs/FLUX.1-dev",
    repo_type="auto",
    fields=["repo_id", "repo_type", "author", "pipeline_tag", "library_name", "likes", "downloads", "repo_url"],
)
item = info["item"] or (info["items"][0] if info["items"] else None)
if item is None:
    return info
return {
    "repo_id": item["repo_id"],
    "repo_type": item["repo_type"],
    "author": item["author"],
    "pipeline_tag": item.get("pipeline_tag"),
    "library_name": item.get("library_name"),
    "likes": item.get("likes"),
    "downloads": item.get("downloads"),
    "repo_url": item.get("repo_url"),
}

# Runtime capability / supported-field introspection
caps = await hf_runtime_capabilities(section="fields")
if not caps["ok"]:
    return caps
item = caps["item"] or (caps["items"][0] if caps["items"] else None)
if item is None:
    return caps
return item["content"]

# Compact user summary
summary = await hf_user_summary(
    username="mishig",
    include=["likes", "activity"],
    sample_limit=10,
    activity_limit=10,
)
item = summary["item"] or (summary["items"][0] if summary["items"] else None)
if item is None:
    return summary
return {
    "total_followers": item["overview"]["followers"],
    "total_following": item["overview"]["following"],
    "latest_activity": item["activity"]["sample"],
    "latest_likes": item["likes"]["sample"],
}

# Current user's pro followers and their recent liked repos
followers = await hf_user_graph(
    relation="followers",
    pro_only=True,
    fields=["username"],
)
if not followers["ok"]:
    return followers
result = {}
for row in followers["items"]:
    uname = row.get("username")
    if not uname:
        continue
    likes = await hf_user_likes(
        username=uname,
        return_limit=3,
        fields=["repo_id", "repo_type", "liked_at", "repo_url"],
    )
    repos = []
    for item in likes["items"]:
        repo = {}
        for key in ["repo_id", "repo_type", "liked_at", "repo_url"]:
            if item.get(key) is not None:
                repo[key] = item[key]
        if repo:
            repos.append(repo)
    if repos:
        result[uname] = repos
return result

# Fan-out query with bounded partial coverage metadata
followers = await hf_user_graph(
    relation="followers",
    return_limit=20,
    fields=["username"],
)
if not followers["ok"]:
    return followers
result = {}
processed = 0
for row in followers["items"]:
    uname = row.get("username")
    if not uname:
        continue
    likes = await hf_user_likes(
        username=uname,
        repo_types=["model"],
        return_limit=3,
        fields=["repo_id", "repo_author", "liked_at"],
    )
    processed += 1
    items = []
    for item in likes["items"]:
        liked = {}
        for key in ["repo_id", "repo_author", "liked_at"]:
            if item.get(key) is not None:
                liked[key] = item[key]
        if liked:
            items.append(liked)
    if items:
        result[uname] = items
return {
    "results": result,
    "coverage": {
        "partial": bool(followers["meta"].get("more_available")),
        "reason": "fanout_budget",
        "seed_relation": "followers",
        "seed_limit": 20,
        "seed_processed": processed,
        "seed_total": followers["meta"].get("total"),
        "seed_more_available": followers["meta"].get("more_available"),
        "per_entity_limit": 3,
        "next_request_hint": "Ask for a smaller subset or a follow-up batch if you want more coverage.",
    },
}

# Popularity-ranked likes with metadata
likes = await hf_user_likes(
    username="julien-c",
    return_limit=1,
    sort="repoLikes",
    ranking_window=40,
    fields=["repo_id", "repo_type", "repo_author", "likes", "repo_url", "liked_at"],
)
item = likes["item"] or (likes["items"][0] if likes["items"] else None)
if item is None:
    return {"error": "No liked repositories found"}
repo = {}
for key in ["repo_id", "repo_type", "repo_author", "likes", "repo_url", "liked_at"]:
    if item.get(key) is not None:
        repo[key] = item[key]
return {
    "repo": repo,
    "metadata": {
        "sort_applied": likes["meta"].get("sort_applied"),
        "ranking_window": likes["meta"].get("ranking_window"),
        "ranking_complete": likes["meta"].get("ranking_complete"),
    },
}

# Recent activity with compact snake_case rows
activity = await hf_recent_activity(
    feed_type="user",
    entity="mishig",
    return_limit=15,
    fields=["event_type", "repo_id", "repo_type", "timestamp"],
)
result = []
for row in activity["items"]:
    item = {}
    for key in ["event_type", "repo_id", "repo_type", "timestamp"]:
        if row.get(key) is not None:
            item[key] = row[key]
    if item:
        result.append(item)
return result

# Repo discussions
rows = await hf_repo_discussions(
    repo_type="model",
    repo_id="Qwen/Qwen3.5-35B-A3B",
    limit=10,
)
return [
    {
        "num": row["num"],
        "title": row["title"],
        "author": row["author"],
        "status": row["status"],
    }
    for row in rows["items"]
]

# Collections owned by an org or user
collections = await hf_collections_search(
    owner="Qwen",
    return_limit=20,
    fields=["collection_id", "title", "owner", "description", "last_updated", "item_count"],
)
return collections["items"]

# Daily papers via the exact allowed raw endpoint
resp = await call_api("/api/daily_papers")
if not resp["ok"]:
    return resp
rows = []
for item in resp.get("data") or []:
    row = {}
    if item.get("title") is not None:
        row["title"] = item["title"]
    if item.get("repo_id") is not None:
        row["repo_id"] = item["repo_id"]
    if row:
        rows.append(row)
return rows

# Organization repo counts
org = await hf_org_overview("unsloth")
item = org["item"] or (org["items"][0] if org["items"] else None)
if item is None:
    return org
return {
    "organization": item["organization"],
    "models": item.get("models"),
    "datasets": item.get("datasets"),
    "spaces": item.get("spaces"),
}

# Do any authors of the top trending spaces follow me?
who = await hf_whoami()
if not who["ok"]:
    return who
me_row = who["item"] or (who["items"][0] if who["items"] else None)
if me_row is None:
    return who
me = me_row.get("username")
spaces = await hf_trending(
    repo_type="space",
    limit=20,
    fields=["repo_id", "author", "repo_url"],
)
authors = []
seen = set()
for row in spaces["items"]:
    author = row.get("author")
    if isinstance(author, str) and author and author not in seen:
        seen.add(author)
        authors.append(author)

results = []
processed = 0
for author in authors[:20]:
    graph = await hf_user_graph(
        username=author,
        relation="following",
        return_limit=200,
        fields=["username"],
    )
    processed += 1
    if not graph["ok"]:
        continue
    if any(item.get("username") == me for item in graph["items"]):
        results.append(author)

return {
    "results": results,
    "coverage": {
        "partial": False,
        "reason": None,
        "seed_relation": "trending_space_authors",
        "seed_limit": 20,
        "seed_processed": processed,
        "seed_total": len(authors),
        "seed_more_available": False,
        "per_entity_limit": 200,
    },
}

# Models inside an org's collections
collections = await hf_collections_search(
    owner="openai",
    return_limit=20,
    fields=["collection_id", "title"],
)
result = {}
for coll in collections["items"]:
    collection_id = coll.get("collection_id")
    title = coll.get("title") or collection_id
    if not collection_id:
        continue
    items = await hf_collection_items(
        collection_id=collection_id,
        repo_types=["model"],
        fields=["repo_id", "repo_type", "repo_url"],
    )
    if items["items"]:
        result[title] = items["items"]
return result