---
inference: false
tags:
- SeamlessM4T
license: cc-by-nc-4.0
library_name: fairseq2
---

# SeamlessM4T - On-Device
SeamlessM4T is designed to provide high-quality translation, allowing people from different linguistic communities to communicate effortlessly through speech and text.

SeamlessM4T covers:
- 📥 101 languages for speech input
- ⌨️ 96 languages for text input/output
- 🗣️ 35 languages for speech output

In addition to the [SeamlessM4T-LARGE (2.3B)](https://huggingface.co/facebook/seamless-m4t-large) and [SeamlessM4T-MEDIUM (1.2B)](https://huggingface.co/facebook/seamless-m4t-medium) models, we are also developing a small model (281M) targeting on-device inference.
[This folder](https://huggingface.co/facebook/seamless-m4t-unity-small) contains an example of running an exported small model that covers most tasks: automatic speech recognition (ASR), speech-to-text translation (S2TT), and speech-to-speech translation (S2ST). The model can run on popular mobile devices with [PyTorch Mobile](https://pytorch.org/mobile/home/).

## Overview
| Model | Checkpoint | Num Params | Disk Size | Supported Tasks | Supported Languages |
|---------|------------|----------|-------------|------------|-------------------------|
| UnitY-Small | [🤗 Model card](https://huggingface.co/facebook/seamless-m4t-unity-small) - [checkpoint](https://huggingface.co/facebook/seamless-m4t-unity-small/resolve/main/unity_on_device.ptl) | 281M | 862MB | S2ST, S2TT, ASR | eng, fra, hin, por, spa |
| UnitY-Small-S2T | [🤗 Model card](https://huggingface.co/facebook/seamless-m4t-unity-small-s2t) - [checkpoint](https://huggingface.co/facebook/seamless-m4t-unity-small-s2t/resolve/main/unity_on_device_s2t.ptl) | 235M | 637MB | S2TT, ASR | eng, fra, hin, por, spa |

UnitY-Small-S2T is a pruned version of UnitY-Small without the second-pass unit decoding, so it produces text only (S2TT and ASR) and no speech output.

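Choosing between the two checkpoints comes down to whether speech output is needed. The following is a minimal sketch of that choice, with the file names, tasks, and disk sizes copied from the Overview table; the `CHECKPOINTS` dictionary and the `pick_checkpoint` helper are hypothetical names introduced here for illustration and are not part of any library.

```python
# Data copied from the Overview table; the helper itself is hypothetical.
CHECKPOINTS = {
    "unity_on_device.ptl": {"tasks": {"S2ST", "S2TT", "ASR"}, "disk_mb": 862},
    "unity_on_device_s2t.ptl": {"tasks": {"S2TT", "ASR"}, "disk_mb": 637},
}


def pick_checkpoint(task: str) -> str:
    """Return the smallest checkpoint file that supports the given task."""
    candidates = [
        (info["disk_mb"], name)
        for name, info in CHECKPOINTS.items()
        if task in info["tasks"]
    ]
    if not candidates:
        raise ValueError(f"No exported checkpoint supports task {task!r}")
    # min() compares disk size first, so the smaller file wins.
    return min(candidates)[1]
```

For example, `pick_checkpoint("ASR")` selects the pruned S2T file, while `pick_checkpoint("S2ST")` can only return the full UnitY-Small file.
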
Note: When using the PyTorch runtime in Python, **UnitY-Small (281M)** is supported only with **pytorch<=1.11.0**. We tested UnitY-Small-S2T (235M), and it works with later versions.

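The version constraint for UnitY-Small can be checked before loading the model. The sketch below is one way to do it, assuming version strings of the form `1.11.0` or `1.11.0+cu113` such as `torch.__version__` reports; `parse_version` and `supports_unity_small` are hypothetical helper names, not part of PyTorch.

```python
import re

# Upper bound for UnitY-Small (281M), taken from the pytorch<=1.11.0 constraint.
MAX_SUPPORTED = (1, 11, 0)


def parse_version(version: str) -> tuple:
    """Extract (major, minor, patch) from a string such as '1.11.0+cu113'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def supports_unity_small(version: str) -> bool:
    """Return True if this PyTorch version is within the supported range."""
    return parse_version(version) <= MAX_SUPPORTED
```

In an application this would typically be called as `supports_unity_small(torch.__version__)` right after `import torch`, falling back to UnitY-Small-S2T when it returns `False`.
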
## Inference
To use the exported model, you do not need the seamless_communication or fairseq2 dependencies.

```python
import torchaudio
import torch

# Placeholder values; replace them with your own paths and target language.
TEST_AUDIO_PATH = "input.wav"
TGT_LANG = "eng"
OUTPUT_FOLDER = "."

audio_input, _ = torchaudio.load(TEST_AUDIO_PATH)  # Load waveform using torchaudio

s2t_model = torch.jit.load("unity_on_device_s2t.ptl")  # Load exported S2T model
text = s2t_model(audio_input, tgt_lang=TGT_LANG)  # Forward call with tgt_lang specified for ASR or S2TT
print(f"{TGT_LANG}:{text}")

s2st_model = torch.jit.load("unity_on_device.ptl")  # Load exported S2ST model
text, units, waveform = s2st_model(audio_input, tgt_lang=TGT_LANG)  # S2ST model also returns units and waveform
print(f"{TGT_LANG}:{text}")
torchaudio.save(f"{OUTPUT_FOLDER}/{TGT_LANG}.wav", waveform.unsqueeze(0), sample_rate=16000)  # Save output waveform to local file
```

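The forward call takes one target language at a time, so covering several languages means calling the model once per language. The following is a minimal sketch of such a loop. It assumes the exported model accepts the three-letter codes from the Overview table as `tgt_lang` values; the table lists these languages, but the exact strings are an assumption. The `run_all_languages` helper is a hypothetical name introduced here.

```python
# Language codes as listed in the Overview table (assumed to be valid tgt_lang values).
SUPPORTED_LANGS = ["eng", "fra", "hin", "por", "spa"]


def run_all_languages(model, audio_input, langs=SUPPORTED_LANGS):
    """Call the model once per target language and collect the outputs.

    `model` is any callable with the signature model(audio_input, tgt_lang=...),
    for example a module returned by torch.jit.load. The return value of each
    call is stored unchanged: text for the S2T model, or a
    (text, units, waveform) tuple for the S2ST model.
    """
    results = {}
    for lang in langs:
        results[lang] = model(audio_input, tgt_lang=lang)
    return results


# Example usage (not executed here; requires torch, torchaudio, and the checkpoint):
#   s2t_model = torch.jit.load("unity_on_device_s2t.ptl")
#   audio_input, _ = torchaudio.load("input.wav")
#   for lang, text in run_all_languages(s2t_model, audio_input).items():
#       print(f"{lang}:{text}")
```

Passing the model in as a plain callable keeps the loop independent of PyTorch, so it can be exercised with a stub before the checkpoint is downloaded.
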
Running the exported model also does not require a Python runtime. For example, you can load the model in C++ by following [this tutorial](https://pytorch.org/tutorials/advanced/cpp_export.html), or build your own on-device application similar to [this example](https://github.com/pytorch/ios-demo-app/tree/master/SpeechRecognition).

# Citation
If you use SeamlessM4T in your work, or any models, datasets, or artifacts published with SeamlessM4T, please cite:

```bibtex
@article{seamlessm4t2023,
  title={SeamlessM4T---Massively Multilingual \& Multimodal Machine Translation},
  author={{Seamless Communication} and Lo\"{i}c Barrault and Yu-An Chung and Mariano Cora Meglioli and David Dale and Ning Dong and Paul-Ambroise Duquenne and Hady Elsahar and Hongyu Gong and Kevin Heffernan and John Hoffman and Christopher Klaiber and Pengwei Li and Daniel Licht and Jean Maillard and Alice Rakotoarison and Kaushik Ram Sadagopan and Guillaume Wenzek and Ethan Ye and Bapi Akula and Peng-Jen Chen and Naji El Hachem and Brian Ellis and Gabriel Mejia Gonzalez and Justin Haaheim and Prangthip Hansanti and Russ Howes and Bernie Huang and Min-Jae Hwang and Hirofumi Inaguma and Somya Jain and Elahe Kalbassi and Amanda Kallet and Ilia Kulikov and Janice Lam and Daniel Li and Xutai Ma and Ruslan Mavlyutov and Benjamin Peloquin and Mohamed Ramadan and Abinesh Ramakrishnan and Anna Sun and Kevin Tran and Tuan Tran and Igor Tufanov and Vish Vogeti and Carleigh Wood and Yilin Yang and Bokai Yu and Pierre Andrews and Can Balioglu and Marta R. Costa-juss\`{a} and Onur \c{C}elebi and Maha Elbayad and Cynthia Gao and Francisco Guzm\'{a}n and Justine Kao and Ann Lee and Alexandre Mourachko and Juan Pino and Sravya Popuri and Christophe Ropers and Safiyyah Saleem and Holger Schwenk and Paden Tomasello and Changhan Wang and Jeff Wang and Skyler Wang},
  journal={ArXiv},
  year={2023}
}
```

# License

seamless_communication is licensed under CC-BY-NC 4.0.