EmojiLM

This is a BART model pre-trained on the Text2Emoji dataset to translate emoji sequences into text.

For instance, "🍕😍" will be translated into "I love pizza".

An example implementation for translation:

from transformers import BartTokenizer, BartForConditionalGeneration

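def translate(sentence, **kwargs):
    # Uses the module-level tokenizer and generator loaded below.
    inputs = tokenizer(sentence, return_tensors="pt")
    # Extra keyword arguments (e.g. num_beams, max_length) are passed straight to generate().
    generated_ids = generator.generate(inputs["input_ids"], **kwargs)
    # Decode the first generated sequence, dropping special tokens.
    decoded = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
    return decoded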

path = "KomeijiForce/bart-base-emojilm-e2t"
tokenizer = BartTokenizer.from_pretrained(path)
generator = BartForConditionalGeneration.from_pretrained(path)

sentence = "🍣🍱😋"
decoded = translate(sentence, num_beams=4, do_sample=True, max_length=100)
print(decoded)

Because the example samples during generation (do_sample=True), the output varies between runs. You will probably get some output like "Sushi is my go-to comfort food."
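If repeatable output or several alternative translations are preferred, one option is to turn sampling off and ask beam search for more than one sequence. The following is a minimal sketch, not part of the original model card: `translate_candidates` and its parameter `n` are hypothetical names, and the tokenizer and generator are assumed to be the objects loaded in the example above.

```python
def translate_candidates(sentence, tokenizer, generator, n=3, **kwargs):
    """Return the n best beam-search translations (hypothetical helper)."""
    inputs = tokenizer(sentence, return_tensors="pt")

    # Start from a default beam width and let caller arguments override it.
    params = {"num_beams": 4, **kwargs}
    # Beam search cannot return more sequences than it has beams.
    params["num_beams"] = max(params["num_beams"], n)
    params["num_return_sequences"] = n
    # Disable sampling so repeated calls give the same candidates.
    params["do_sample"] = False

    generated_ids = generator.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        **params,
    )
    # batch_decode turns every returned sequence into a string.
    return tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
```

With the objects from the example above, `translate_candidates(sentence, tokenizer, generator, n=3, max_length=100)` should return a list of three strings, ordered from the highest-scoring beam downward.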

If you find this model and dataset useful, please consider citing our paper:

@article{DBLP:journals/corr/abs-2311-01751,
  author       = {Letian Peng and
                  Zilong Wang and
                  Hang Liu and
                  Zihan Wang and
                  Jingbo Shang},
  title        = {EmojiLM: Modeling the New Emoji Language},
  journal      = {CoRR},
  volume       = {abs/2311.01751},
  year         = {2023},
  url          = {https://doi.org/10.48550/arXiv.2311.01751},
  doi          = {10.48550/ARXIV.2311.01751},
  eprinttype    = {arXiv},
  eprint       = {2311.01751},
  timestamp    = {Tue, 07 Nov 2023 18:17:14 +0100},
  biburl       = {https://dblp.org/rec/journals/corr/abs-2311-01751.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}