HandleAtlas-166m

A fine-tuned GLiNER small v2.1 (~166M params) for extracting social-media handles from short bios. Trained on Twitter/X bios; the handle patterns carry over to other platforms, with some loss in accuracy (see Limitations).

Labels

instagram_username
snapchat_username
youtube_username
twitch_username
tiktok_username
discord_username
x_username
cashapp_username
onlyfans_username
tumblr_username
github_username
kofi_username
patreon_username
roblox_username
generic_username

generic_username is a fallback for handle-shaped strings without a clear platform indicator.

Usage

from gliner import GLiNER

model = GLiNER.from_pretrained("LumeData/HandleAtlas-166m")

labels = [
    "instagram_username", "snapchat_username", "youtube_username",
    "twitch_username", "tiktok_username", "discord_username",
    "x_username", "cashapp_username", "onlyfans_username",
    "tumblr_username", "github_username", "kofi_username",
    "patreon_username", "roblox_username", "generic_username",
]

text = "Insta: foodgrammer | Snap: chefchef | DC: gamer420 | $cashtag"
for ent in model.predict_entities(text, labels, threshold=0.5):
    print(f"{ent['text']!r} -> {ent['label']} ({ent['score']:.2f})")
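
Downstream code often wants handles grouped by platform rather than a flat list. The following is a minimal sketch, assuming each prediction is a dict with the "text", "label" and "score" keys used in the loop above. group_handles is a hypothetical helper, not part of the gliner package, and the example predictions are hand-written rather than real model output.

```python
from collections import defaultdict


def group_handles(entities):
    """Group predicted handles by label, highest score first.

    `entities` is assumed to be a list of dicts with "text", "label"
    and "score" keys, as printed in the usage loop.
    """
    grouped = defaultdict(list)
    for ent in sorted(entities, key=lambda e: e["score"], reverse=True):
        # Skip exact duplicate handles under the same label.
        if ent["text"] not in grouped[ent["label"]]:
            grouped[ent["label"]].append(ent["text"])
    return dict(grouped)


# Hand-written predictions for illustration only (not real model output).
example = [
    {"text": "foodgrammer", "label": "instagram_username", "score": 0.97},
    {"text": "chefchef", "label": "snapchat_username", "score": 0.91},
    {"text": "gamer420", "label": "discord_username", "score": 0.88},
    {"text": "gamer420", "label": "discord_username", "score": 0.61},
]
print(group_handles(example))
```

In practice the list passed in would be the return value of model.predict_entities(text, labels, threshold=0.5).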

Training

Base: urchade/gliner_small-v2.1
Real data: ~1,000 hand-labeled Twitter bios
Synthetic data: ~2,200 generated bios (template-based + IG→Discord text rewriting for the discord_username class)
Case augmentation: each training record is emitted in original + fully-lowercased form so the model is robust to casing of platform prefixes (Dc:/dc:/DC: etc.)
5 epochs, batch size 4 × gradient accumulation 2 (effective batch size 8), learning rate 5e-6 (encoder) / 1e-5 (heads), cosine schedule
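
The case augmentation step can be sketched as below. This is an illustration, not the actual training script: it assumes GLiNER-style training records with a "tokenized_text" list and a "ner" list of [start, end, label] token-index spans, and augment_case is a hypothetical name. Because lowercasing never changes token boundaries, the span indices are copied unchanged.

```python
def augment_case(records):
    """Yield each record unchanged, followed by a fully lowercased copy.

    Assumed record format (not confirmed by the model card):
    {"tokenized_text": [...], "ner": [[start, end, label], ...]}
    where start/end are token indices, so they stay valid after lowercasing.
    """
    for rec in records:
        yield rec
        yield {
            "tokenized_text": [tok.lower() for tok in rec["tokenized_text"]],
            "ner": [list(span) for span in rec["ner"]],
        }


# Illustrative record: the handle is the token at index 2.
record = {
    "tokenized_text": ["DC", ":", "Gamer420"],
    "ner": [[2, 2, "discord_username"]],
}
for out in augment_case([record]):
    print(out["tokenized_text"], out["ner"])
```

Note that the handle itself is lowercased too, which matches the "fully-lowercased form" described above.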

Eval

On a 100-record held-out slice of real Twitter bios:

metric	value
precision	0.849
recall	0.929
F1	0.887

Per-label F1 is strong on instagram (0.95), youtube (1.00), tiktok (1.00), twitch (1.00), onlyfans (1.00), generic (0.88), cashapp (0.86), and snapchat (0.80).
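
As a quick consistency check, F1 is the harmonic mean of precision and recall, and the overall figures in the table agree with each other. The helper name f1_score below is illustrative only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall precision and recall from the table above.
print(round(f1_score(0.849, 0.929), 3))  # 0.887
```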

Recommended thresholds

Default: threshold=0.5
For generic_username, bump to 0.65 to reduce false positives; it's the catch-all label and over-fires at the default threshold.
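
One way to apply a stricter cutoff to a single label is to predict at the default threshold and then filter. The sketch below assumes predictions are dicts with "label" and "score" keys, as in the Usage example; apply_label_thresholds and the two constants are hypothetical names, not part of the gliner package.

```python
# Per-label overrides; any label not listed uses DEFAULT_THRESHOLD.
LABEL_THRESHOLDS = {"generic_username": 0.65}
DEFAULT_THRESHOLD = 0.5


def apply_label_thresholds(entities, label_thresholds=None, default=DEFAULT_THRESHOLD):
    """Keep only entities whose score meets the threshold for their label."""
    if label_thresholds is None:
        label_thresholds = LABEL_THRESHOLDS
    return [
        ent for ent in entities
        if ent["score"] >= label_thresholds.get(ent["label"], default)
    ]


# Hand-written predictions for illustration only (not real model output).
predictions = [
    {"text": "foodgrammer", "label": "instagram_username", "score": 0.55},
    {"text": "maybe_handle", "label": "generic_username", "score": 0.60},
    {"text": "real_handle", "label": "generic_username", "score": 0.70},
]
for ent in apply_label_thresholds(predictions):
    print(ent["text"], ent["label"])
```

The model call itself stays at threshold=0.5, the lowest cutoff in use, so that no label is filtered too early.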

Limitations

Trained on patterns common in Twitter/X bios; performance on other domains (LinkedIn-style, Reddit, forum sigs) will be lower.
There is no discord_invite label: invite codes are either classified as discord_username or skipped.
Multi-line bios with many handles can occasionally confuse adjacent URL labels (e.g., patreon.com/x | github.com/x chains).
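
If invite codes mislabeled as discord_username matter for your use case, a post-processing filter is one possible workaround. The sketch below is an assumption-laden illustration, not part of the model: it assumes predictions carry "start", "text" and "label" keys, and it only recognizes the discord.gg/ and discord.com/invite/ (or discordapp.com/invite/) URL forms. drop_discord_invites is a hypothetical helper name.

```python
import re

# A discord_username that directly follows an invite URL prefix is
# treated as an invite code rather than a handle.
INVITE_PREFIX = re.compile(
    r"(?:discord\.gg|discord(?:app)?\.com/invite)/$", re.IGNORECASE
)
# Covers the case where the predicted span includes the URL itself.
INVITE_URL = re.compile(
    r"(?:discord\.gg|discord(?:app)?\.com/invite)/[A-Za-z0-9-]+", re.IGNORECASE
)


def drop_discord_invites(text, entities):
    """Remove discord_username predictions that look like invite codes."""
    kept = []
    for ent in entities:
        if ent["label"] == "discord_username":
            before = text[: ent["start"]]
            if INVITE_URL.search(ent["text"]) or INVITE_PREFIX.search(before):
                continue
        kept.append(ent)
    return kept


# Hand-written predictions for illustration only (not real model output).
bio = "join discord.gg/abc123 | dc: gamer420"
preds = [
    {"start": bio.index("abc123"), "text": "abc123", "label": "discord_username"},
    {"start": bio.index("gamer420"), "text": "gamer420", "label": "discord_username"},
]
print([ent["text"] for ent in drop_discord_invites(bio, preds)])
```

Invite codes written without a URL prefix are not caught by this filter.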
