BEAT: Behavioral Encoder for Action Trajectories


A foundation transformer model that encodes sequences of human behavioral events into dense, reusable embeddings.

What is BEAT?

Every company predicting churn, recommending products, or segmenting users starts by manually engineering features from behavioral data (RFM scores, click counts, session metrics). This feature engineering is where most prediction quality is lost.

BEAT eliminates that step. Feed it raw event sequences (page views, purchases, searches, support tickets) and get a rich 768-dimensional embedding that captures the user's behavioral state.

Key Innovation

Unlike text transformers (BERT, GPT) that encode language, BEAT is designed specifically for action sequences with temporal dynamics:

  • Temporal encoding: Learns from time gaps between events (a purchase 1 day after browsing means something different from a purchase 30 days after)
  • Action vocabulary: Encodes event types, not words
  • Behavioral context: Understands that the same action means different things in different sequences
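
To make the temporal encoding idea concrete, here is a minimal sketch of a sinusoidal encoding of time gaps. It is not BEAT's actual implementation: the helper name `encode_time_gap`, the toy dimension of 8, and the choice of wavelengths spaced geometrically between 1 day and 90 days are all assumptions made for illustration.

```python
import math


def encode_time_gap(gap_days: float, dim: int = 8,
                    max_period_days: float = 90.0) -> list[float]:
    """Map a time gap in days to sin/cos features (hypothetical helper).

    Wavelengths grow geometrically from 1 day up to max_period_days, so
    short and long gaps land on clearly different feature vectors.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    half = dim // 2
    features = []
    for i in range(half):
        # Exponent runs from 0 to 1, giving wavelengths 1 .. max_period_days.
        wavelength = max_period_days ** (i / max(half - 1, 1))
        angle = 2.0 * math.pi * gap_days / wavelength
        features.extend([math.sin(angle), math.cos(angle)])
    return features


one_day = encode_time_gap(1.0)
thirty_days = encode_time_gap(30.0)
print(len(one_day), one_day != thirty_days)
```

In a model, a vector like this would typically be added to or concatenated with the event embedding, so that attention can distinguish a purchase made a day after browsing from one made a month later.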

Usage

from transformers import AutoModel
import torch

# Load model
model = AutoModel.from_pretrained("your-org/beat-encoder")

# Encode a behavioral sequence
action_ids = torch.tensor([[1, 2, 3, 5, 1, 6, 2, 5]])  # page_view, product_view, cart, purchase...
property_ids = torch.tensor([[12, 45, 45, 45, 8, 3, 22, 22]])  # category/property context
time_gaps = torch.tensor([[0.0, 0.1, 0.5, 1.2, 3.0, 3.1, 7.0, 7.5]])  # days between events

outputs = model(action_ids, property_ids, time_gaps)
embedding = outputs["embedding"]  # [1, 768]: user behavioral state
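
The three input tensors have to be built from a raw event log first. The sketch below shows one way to do that under stated assumptions: `ACTION_VOCAB`, `PROPERTY_VOCAB`, and `build_inputs` are hypothetical names, and the ID mappings are invented for illustration (the real ones would ship with the model). It is also not stated here whether the model expects days since the first event or gaps between consecutive events, so the helper supports both through a `cumulative` flag.

```python
from datetime import datetime

# Hypothetical vocabularies; the real ID mappings are defined by the model.
ACTION_VOCAB = {"page_view": 1, "product_view": 2, "cart": 3, "purchase": 5}
PROPERTY_VOCAB = {"electronics": 12, "phones": 45}


def build_inputs(events, cumulative=True):
    """Turn (timestamp, action, property) tuples into three parallel lists.

    cumulative=True  -> days elapsed since the first event
    cumulative=False -> days elapsed since the previous event
    """
    if not events:
        return [], [], []
    events = sorted(events, key=lambda e: e[0])  # chronological order
    action_ids = [ACTION_VOCAB[action] for _, action, _ in events]
    property_ids = [PROPERTY_VOCAB[prop] for _, _, prop in events]
    first = previous = events[0][0]
    time_values = []
    for ts, _, _ in events:
        reference = first if cumulative else previous
        time_values.append((ts - reference).total_seconds() / 86400.0)
        previous = ts
    return action_ids, property_ids, time_values


events = [
    (datetime(2024, 1, 1, 0, 0), "page_view", "electronics"),
    (datetime(2024, 1, 1, 12, 0), "product_view", "phones"),
    (datetime(2024, 1, 3, 0, 0), "purchase", "phones"),
]
action_ids, property_ids, time_values = build_inputs(events)
print(action_ids, property_ids, time_values)
```

Wrapping each list as `torch.tensor([values])` adds the batch dimension used in the call above.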

Pre-training Objectives

  1. Masked Event Prediction: Randomly mask 15% of events, predict the action type (like MLM in BERT)
  2. Next Event Prediction: Given a sequence, predict what action comes next
  3. Contrastive Learning: Different time windows of the same user should produce similar embeddings
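
The first two objectives can be sketched as simple data-preparation steps. This is an illustration rather than BEAT's training code: `MASK_ID`, `mask_events`, and `next_event_pairs` are hypothetical names, and the masking here always substitutes the mask ID, which is simpler than BERT's 80/10/10 replacement scheme.

```python
import random

MASK_ID = 0          # hypothetical ID reserved for a masked event
IGNORE_LABEL = -100  # default ignore_index of PyTorch's CrossEntropyLoss


def mask_events(action_ids, mask_prob=0.15, seed=None):
    """Return (masked_inputs, labels) for masked event prediction.

    Labels hold the original action at masked positions and IGNORE_LABEL
    elsewhere, so the loss is computed only where an event was hidden.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for action in action_ids:
        if rng.random() < mask_prob:
            masked.append(MASK_ID)
            labels.append(action)
        else:
            masked.append(action)
            labels.append(IGNORE_LABEL)
    return masked, labels


def next_event_pairs(action_ids):
    """Return (inputs, targets) where each target is the following action."""
    return action_ids[:-1], action_ids[1:]


sequence = [1, 2, 3, 5, 1, 6, 2, 5]
masked, labels = mask_events(sequence, seed=0)
inputs, targets = next_event_pairs(sequence)
print(masked, labels)
print(inputs, targets)
```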

Downstream Tasks

BEAT embeddings can be used for:

| Task | Method | Expected Improvement |
|---|---|---|
| Churn prediction | Linear probe on embedding | +8-15% AUC vs. manual features |
| User segmentation | Cluster embeddings | More stable, interpretable clusters |
| Next-best-action | Fine-tune prediction head | Captures temporal patterns manual features miss |
| Personalization | Nearest-neighbor in embedding space | Real behavioral similarity, not just demographics |
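
As an illustration of the nearest-neighbor use case, the sketch below ranks users by cosine similarity. The function names and user IDs are hypothetical, and toy 3-dimensional vectors stand in for the 768-dimensional embeddings; a real deployment would more likely use a vectorized or approximate nearest-neighbor library.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_users(query_id, embeddings, k=2):
    """Return the k user IDs most similar to query_id, best match first."""
    query = embeddings[query_id]
    scored = [
        (cosine_similarity(query, vec), uid)
        for uid, vec in embeddings.items()
        if uid != query_id
    ]
    scored.sort(reverse=True)
    return [uid for _, uid in scored[:k]]


# Toy vectors standing in for real embeddings.
embeddings = {
    "user_a": [1.0, 0.0, 0.0],
    "user_b": [0.9, 0.1, 0.0],
    "user_c": [0.0, 1.0, 0.0],
    "user_d": [-1.0, 0.0, 0.0],
}
neighbors = nearest_users("user_a", embeddings, k=2)
print(neighbors)
```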

Training Data

Pre-trained on the REES46 e-commerce behavioral dataset (20M+ events from a multi-category online store):

  • 50,000 users, 18,401 behavioral sequences
  • 10,350 training steps across 10 epochs
  • Training loss converged from 0.83 → 0.42
  • Hardware: 2× NVIDIA T4 GPUs (~27 minutes)

The model can be adapted to other behavioral domains through fine-tuning.

Architecture

| Parameter | Value |
|---|---|
| Hidden size | 768 |
| Layers | 12 |
| Attention heads | 12 |
| Parameters | 86.4M |
| Embedding output | 768-dim |
| Max sequence length | 256 events |
| Temporal encoding | Learned + sinusoidal (90-day window) |
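
As a rough sanity check on the parameter count, the sketch below tallies the weights of a standard BERT-style encoder stack with these dimensions. It assumes a feed-forward size of 4 × hidden (3072) and biases on every projection, neither of which is listed in the table, so treat the result as an estimate rather than a description of BEAT's actual layers.

```python
def encoder_layer_params(hidden: int, ffn: int) -> int:
    """Parameter count of one standard transformer encoder layer."""
    attention = 4 * (hidden * hidden + hidden)  # Q, K, V, output (with biases)
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden)
    layer_norms = 2 * (2 * hidden)              # two LayerNorms: scale + shift
    return attention + feed_forward + layer_norms


HIDDEN, LAYERS = 768, 12
FFN = 4 * HIDDEN  # assumption: feed-forward size is not given in the table

per_layer = encoder_layer_params(HIDDEN, FFN)
encoder_total = LAYERS * per_layer
remainder = 86_400_000 - encoder_total  # 86.4M is the rounded reported total

print(per_layer)      # 7087872
print(encoder_total)  # 85054464
print(remainder)      # 1345536
```

Under these assumptions the encoder stack accounts for about 85.1M of the reported 86.4M parameters, leaving roughly 1.3M for the action, property, and temporal embeddings plus any output heads.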

Paper

BEAT: A Foundation Model for Human Behavioral Sequences. Published on Zenodo, DOI: 10.5281/zenodo.20774886

Citation

@article{dhanani2026beat,
  title     = {BEAT: A Foundation Model for Human Behavioral Sequences},
  author    = {Dhanani, Brijesh},
  year      = {2026},
  doi       = {10.5281/zenodo.20774886},
  url       = {https://doi.org/10.5281/zenodo.20774886},
  publisher = {Zenodo}
}

License

Apache 2.0
