Missing sync of HF main with current GitHub HEAD — PR #8 patch not yet on HF

Sync HF main with current GitHub HEAD?

Thanks for Falcon Perception — open-vocab + Apache 2.0 + small footprint is a great combo. Flagging in case it's unintentional: HF main appears to be one commit behind GitHub.

The gap

HF (tiiuae/falcon-perception, revision 5b0808e4af1e58ad6ff0b72be54d3a38b2e50c5b), modeling_falcon_perception.py:140:

```python
output, aux_output = flex_fn(xq, xk, xv,
    block_mask=attention_masks,
    return_aux=AuxRequest(lse=True))
```
GitHub (falcon_perception/model.py:245, commit 1e21fba, PR #8 merged 2026-04-09):

```python
output, aux_output = flex_fn(xq, xk, xv,
    block_mask=attention_masks,
    return_aux=AuxRequest(lse=True),
    kernel_options=flex_attn_kernel_options)
```
On HF, `flex_attn_kernel_options` is already plumbed through `TransformerBlock.forward` and `Attention.forward`; it just never reaches the kernel, because the call at line 140 drops it.
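
For context, `kernel_options` is the escape hatch `torch.nn.attention.flex_attention` exposes for overriding Triton's autotuned launch configuration. I'm not reproducing the exact dict PR #8 ships; this is just a sketch of its shape, using the documented tile-size keys, with values that are my assumption:

```python
# Sketch only: illustrative kernel_options for torch.nn.attention.flex_attention.
# BLOCK_M / BLOCK_N cap the query and key/value tile sizes the Triton template
# may pick, which bounds per-SM shared memory. The values below are assumptions,
# not the ones PR #8 actually ships.
flex_attn_kernel_options = {
    "BLOCK_M": 64,  # rows of queries per tile (autotune may pick 128 on H100)
    "BLOCK_N": 64,  # columns of keys/values per tile
}
```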

Impact

For `trust_remote_code` users, Triton's default autotune picks tile sizes whose shared-memory footprint exceeds the per-SM limit on non-H100 GPUs. Reproduced on A10G (Required: 149248 bytes, Hardware limit: 101376 bytes) and T4 (Required: 81920 bytes, Hardware limit: 65536 bytes). Same pattern as HF discussion #1 (RTX 3090) and GitHub issue #7 (A40). PR #8's `batch_inference.py` plumbing has no effect via this path, since the kwarg is dropped at line 140.
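
For reference, the failure needs nothing exotic to trigger. A minimal repro sketch, assuming `AutoModel`/`AutoProcessor` resolve the remote classes and that the processor takes `images` and `text` (none of that is verified against the repo):

```python
# Repro sketch for the shared-memory overflow on a non-H100 GPU (e.g. A10G/T4).
# The model/processor classes and the input format are assumptions about the
# interface, not taken from the repo.
import numpy as np
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

REPO = "tiiuae/falcon-perception"
REV = "5b0808e4af1e58ad6ff0b72be54d3a38b2e50c5b"  # current HF main

model = AutoModel.from_pretrained(
    REPO, revision=REV, trust_remote_code=True, torch_dtype=torch.bfloat16
).to("cuda")
processor = AutoProcessor.from_pretrained(REPO, revision=REV, trust_remote_code=True)

# Synthetic image; any input that reaches the flex_attention path should do.
image = Image.fromarray(np.zeros((1024, 1024, 3), dtype=np.uint8))
inputs = processor(images=image, text=["a cat"], return_tensors="pt").to("cuda")

with torch.no_grad():
    model(**inputs)
# On A10G the forward dies inside the Triton flex_attention kernel with:
#   OutOfResources: out of resource: shared memory,
#   Required: 149248, Hardware limit: 101376
```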

Ask

Either is great:

1. Re-push HF main from current GitHub HEAD, or
2. Note on the model card that production use should `pip install git+https://github.com/tiiuae/Falcon-Perception` rather than load via HF `trust_remote_code`.
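
In the meantime, `trust_remote_code` users can force conservative tiles themselves. A stopgap sketch, assuming the remote module resolves `flex_attention` from `torch.nn.attention.flex_attention` at call time, or is imported after the patch is applied (the tile values match the illustration above and are likewise assumptions):

```python
# Interim workaround sketch: wrap flex_attention so every call gets conservative
# kernel_options, since the HF code path drops the kwarg at line 140.
# Apply BEFORE from_pretrained so the remote module binds the wrapper; this only
# helps if the remote code looks the symbol up on this module (an assumption).
import functools
import torch.nn.attention.flex_attention as flex_mod

_orig_flex_attention = flex_mod.flex_attention

@functools.wraps(_orig_flex_attention)
def _flex_attention_capped(*args, **kwargs):
    # Leave caller-provided options alone; only fill in a default.
    kwargs.setdefault("kernel_options", {"BLOCK_M": 64, "BLOCK_N": 64})
    return _orig_flex_attention(*args, **kwargs)

flex_mod.flex_attention = _flex_attention_capped
```
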
Happy to share a repro notebook. Thanks!
