SMaLL-100: INT8 ONNX (general, 100 languages)
Self-exported INT8 ONNX build of alirezamsh/small100 (distilled M2M-100, Apache-2.0) for on-device CPU inference with onnxruntime. The encoder and decoder are exported as separate, non-merged graphs. This is the general model, NOT a ja/vi fine-tune.
encoder.onnx: input_ids, attention_mask -> last_hidden_state
decoder.onnx: input_ids, encoder_attention_mask, encoder_hidden_states -> logits
tokenizer.json: m2m100_418M tokenizer (same vocab as small100)
Scheme: encoder input = [tgt_lang_id, ...sp_pieces, eos=2]; decode greedily from eos=2 (no forced bos).
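
The scheme above can be sketched roughly as follows. This is a minimal illustration, not the exporter's reference code. The helper names (build_encoder_input, greedy_decode, make_onnx_step) are hypothetical. It also assumes int64 inputs, logits shaped (batch, seq, vocab), and that the caller already has the target-language token ID and SentencePiece IDs from tokenizer.json. Since the decoder takes no past key/values, the full prefix is fed again at every step.

```python
from typing import Callable, List, Sequence

EOS_ID = 2  # eos id from the scheme; also used as the decoder start token


def build_encoder_input(tgt_lang_id: int, sp_piece_ids: Sequence[int]) -> List[int]:
    """Encoder input = [tgt_lang_id, ...sp_pieces, eos]."""
    return [tgt_lang_id, *sp_piece_ids, EOS_ID]


def greedy_decode(step: Callable[[List[int]], Sequence[float]],
                  max_new_tokens: int = 64) -> List[int]:
    """Greedy loop starting from eos, with no forced bos.

    `step(decoder_ids)` must return the logits for the last position.
    Returns the generated ids without the start token or the final eos.
    """
    decoder_ids = [EOS_ID]
    generated: List[int] = []
    for _ in range(max_new_tokens):
        logits = step(decoder_ids)
        next_id = max(range(len(logits)), key=lambda i: logits[i])  # argmax
        if next_id == EOS_ID:
            break
        generated.append(next_id)
        decoder_ids.append(next_id)
    return generated


def make_onnx_step(encoder_path: str, decoder_path: str,
                   encoder_ids: Sequence[int]) -> Callable[[List[int]], Sequence[float]]:
    """Wire the two graphs into a `step` function (assumes int64 inputs)."""
    import numpy as np
    import onnxruntime as ort

    providers = ["CPUExecutionProvider"]
    enc = ort.InferenceSession(encoder_path, providers=providers)
    dec = ort.InferenceSession(decoder_path, providers=providers)

    ids = np.array([list(encoder_ids)], dtype=np.int64)
    mask = np.ones_like(ids)
    # Run the encoder once; its output is reused for every decoder step.
    hidden = enc.run(None, {"input_ids": ids, "attention_mask": mask})[0]

    def step(decoder_ids: List[int]) -> Sequence[float]:
        logits = dec.run(None, {
            "input_ids": np.array([decoder_ids], dtype=np.int64),
            "encoder_attention_mask": mask,
            "encoder_hidden_states": hidden,
        })[0]
        return logits[0, -1].tolist()  # logits at the last position

    return step
```

A call might look like greedy_decode(make_onnx_step("encoder.onnx", "decoder.onnx", build_encoder_input(tgt_lang_id, sp_piece_ids))), with the resulting IDs decoded back to text through the tokenizer. Keeping the loop separate from the onnxruntime wiring makes it easy to test the decoding logic without the model files.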