badrex Ethio-ASR Inference Endpoint

Custom HuggingFace Inference Endpoint handler for badrex Ethio-ASR (wav2vec2-bert CTC) models — the endpoint counterpart of src/transcribers/badrex.py. Native Tigrinya ASR that beat MMS in the 2026-06-14 eval.

The model served is chosen by the BADREX_MODEL environment variable (default badrex/Ethio-ASR-multilingual-1B). The handler holds no weights — they're pulled from the Hub on cold start, same as the MMS endpoint.
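A minimal sketch of how a handler along these lines could resolve the checkpoint and load it on cold start. The `EndpointHandler` class with `__init__(path)` and `__call__(data)` is the HF custom-handler convention; `resolve_model_id`, the fallback on an empty variable, `device=0`, and the base64 decoding are assumptions for illustration, not a copy of the real handler.py.

```python
import base64
import os

DEFAULT_MODEL = "badrex/Ethio-ASR-multilingual-1B"


def resolve_model_id(env=os.environ):
    """Return the checkpoint named by BADREX_MODEL, or the default if unset or empty."""
    return env.get("BADREX_MODEL") or DEFAULT_MODEL


class EndpointHandler:
    """Hypothetical handler.py layout; the real file may differ."""

    def __init__(self, path=""):
        # Imported lazily so this module can be imported without transformers installed.
        from transformers import pipeline

        # Weights are downloaded from the Hub here, i.e. on cold start.
        self.pipe = pipeline(
            "automatic-speech-recognition",
            model=resolve_model_id(),
            chunk_length_s=30,
            stride_length_s=5,
            device=0,  # assumption: a single GPU such as the T4
        )

    def __call__(self, data):
        # Assumes the request body is {"inputs": "<base64 audio>"}.
        audio_bytes = base64.b64decode(data["inputs"])
        return {"text": self.pipe(audio_bytes)["text"]}
```

Because the model ID is read once in `__init__`, a change to BADREX_MODEL only takes effect when the endpoint restarts.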

Long audio is chunked inside the HF ASR pipeline (chunk_length_s=30, stride_length_s=5), so a full broadcast goes through in one request — no client-side splitting, and no OOM on hour-long audio.
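To see what those two settings mean in practice, here is a simplified illustration of how overlapping windows fall across a recording. It works in seconds rather than samples and is not the library's implementation; `chunk_windows` is a hypothetical helper, and it assumes the same stride on both sides of each window, which is what a single `stride_length_s` value means.

```python
def chunk_windows(duration_s, chunk_s=30.0, stride_s=5.0):
    """Yield (start, end, keep_start, keep_end) in seconds for each window.

    [start, end] is what the model sees; [keep_start, keep_end] is the part
    whose output is kept once the overlapping strides are discarded.
    """
    step = chunk_s - 2 * stride_s  # windows advance by the chunk minus both overlaps
    if step <= 0:
        raise ValueError("chunk_s must exceed twice stride_s")
    start = 0.0
    while True:
        end = min(start + chunk_s, duration_s)
        is_last = start + chunk_s >= duration_s
        # The first window has no left overlap, the last has no right overlap.
        keep_start = start if start == 0 else start + stride_s
        keep_end = end if is_last else end - stride_s
        yield (start, end, keep_start, keep_end)
        if is_last:
            break
        start += step


windows = list(chunk_windows(60.0))
# [(0.0, 30.0, 0.0, 25.0), (20.0, 50.0, 25.0, 45.0), (40.0, 60.0, 45.0, 60.0)]
```

The kept regions tile the recording without gaps, and each forward pass sees at most 30 s of audio, which is why memory use does not grow with broadcast length.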

Deploy

1. Create a HuggingFace repo

huggingface.co → New model → e.g. badrex-endpoint.

2. Push this directory

cd endpoint-badrex/
git init
git remote add origin https://huggingface.co/YOUR_USERNAME/badrex-endpoint
git add handler.py requirements.txt config.json
git commit -m "add badrex custom handler"
git push origin main

3. Create the Inference Endpoint

ui.endpoints.huggingface.co → New endpoint:

Setting	Value
Model repository	`YOUR_USERNAME/badrex-endpoint`
Task	`Custom`
Hardware	`GPU · T4 · 1x`
Min replicas	`0` (scale to zero)
Max replicas	`1`

Pick the model under the endpoint's Environment variables:

Variable	Value
`BADREX_MODEL`	`badrex/Ethio-ASR-multilingual-1B` (default; auto-detects am/ti; robust to Amharic leakage on bilingual ti channels)
`BADREX_MODEL`	`badrex/Ethio-ASR-tigrinya` (alternative; lighter, monolingual, slightly cleaner on pure Tigrinya)

4. Point newsgrab at it

In channels.yaml:

settings:
  asr_routing: {ti: badrex}   # send Tigrinya to badrex; everything else stays on MMS

badrex:
  device: api
  api_url: https://YOUR-ENDPOINT-ID.endpoints.huggingface.cloud
  api_token: null             # set HF_TOKEN environment variable instead

In api mode the endpoint is the model, so badrex.models / badrex.default_model are ignored — the served checkpoint is whatever BADREX_MODEL selects.
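As a rough sketch of how a client could read this config: the functions below pick a backend per language and fall back to the HF_TOKEN environment variable when `api_token` is null. `pick_backend` and `resolve_api_token` are hypothetical names, and the dict simply mirrors the YAML above; newsgrab's actual code may be organized differently.

```python
import os


def pick_backend(lang, config, default="mms"):
    """Return the ASR backend for a language code, defaulting to MMS."""
    routing = config.get("settings", {}).get("asr_routing", {})
    return routing.get(lang, default)


def resolve_api_token(config, env=os.environ):
    """Prefer an explicit api_token; otherwise use the HF_TOKEN environment variable."""
    token = config.get("badrex", {}).get("api_token") or env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("set badrex.api_token or the HF_TOKEN environment variable")
    return token


# The channels.yaml fragment above, as it would look once parsed.
config = {
    "settings": {"asr_routing": {"ti": "badrex"}},
    "badrex": {
        "device": "api",
        "api_url": "https://YOUR-ENDPOINT-ID.endpoints.huggingface.cloud",
        "api_token": None,
    },
}

print(pick_backend("ti", config))  # badrex
print(pick_backend("am", config))  # mms
```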

Updating / rolling back

Same as the MMS endpoint: push the new handler.py, then Settings → Revision (pin a commit SHA, or track main) → Update Endpoint. The URL is unchanged, so channels.yaml needs no edit. Switching the served model is just an env-var change (BADREX_MODEL) + endpoint restart — no code push.

Request format

import base64
import requests

# Read the audio file and base64-encode it for the JSON body.
with open("audio.webm", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

r = requests.post(
    "https://YOUR-ENDPOINT-ID.endpoints.huggingface.cloud",
    headers={"Authorization": "Bearer hf_...", "Content-Type": "application/json"},
    json={"inputs": b64},
)
r.raise_for_status()      # surface HTTP errors instead of failing on a missing "text" key
print(r.json()["text"])   # raw output; the multilingual model's leading [TIR]/[AMH] tag is still present here

No language parameter — the deployed model is the language selector. The multilingual model emits a leading [TIR]/[AMH] tag; the newsgrab client (src/transcribers/badrex.py) strips it. If you call the endpoint directly, strip ^\s*\[[A-Za-z]{2,4}\]\s* yourself.
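For direct callers, the stripping step can be sketched with the pattern above. `strip_lang_tag` is a hypothetical helper name; the newsgrab client may implement it differently.

```python
import re

# Leading language tag such as "[TIR]" or "[AMH]", with optional surrounding whitespace.
LANG_TAG = re.compile(r"^\s*\[[A-Za-z]{2,4}\]\s*")


def strip_lang_tag(text):
    """Remove one leading [XXX] language tag; leave everything else untouched."""
    return LANG_TAG.sub("", text, count=1)


print(strip_lang_tag("[TIR] ሰላም ዓለም"))  # ሰላም ዓለም
```

The pattern is anchored at the start of the string, so bracketed text later in the transcript is left alone.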

Hardware

Use a T4 (16 GB). The 1B checkpoint fits comfortably; the 0.6B models more so. Scale-to-zero (min replicas: 0) means no idle cost, at the price of a ~60–90 s cold start (model load). HF Endpoints accept up to ~100 MB per request. Base64 adds about a third to the payload, so that leaves roughly 75 MB of raw audio, which is fine for full broadcasts.
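Base64 encodes every 3 bytes as 4 characters, so the request body is about a third larger than the audio file. A small pre-flight check along these lines can catch oversized files before upload; the exact limit and the `json_overhead` allowance are assumptions taken from the approximate figure above, and `fits_in_request` is a hypothetical helper.

```python
MAX_REQUEST_BYTES = 100 * 1000 * 1000  # assumption: the approximate ~100 MB limit


def base64_len(raw_bytes):
    """Length of the padded base64 encoding of raw_bytes bytes of input."""
    return 4 * ((raw_bytes + 2) // 3)


def fits_in_request(raw_bytes, limit=MAX_REQUEST_BYTES, json_overhead=64):
    """True if the encoded audio plus a small JSON wrapper stays under the limit."""
    return base64_len(raw_bytes) + json_overhead <= limit


print(fits_in_request(70_000_000))  # True
print(fits_in_request(80_000_000))  # False
```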

Notes

torch/torchaudio are pre-installed in the HF endpoint base image; only transformers>=4.44.0 is declared (wav2vec2-bert + pipeline support).
On the newsgrab side, transcripts are tagged Source: badrex-api, which is distinct from mms/gcp, so recheck-captions and prefer_mms are unaffected.
