--- language: - et - en pipeline_tag: conversational --- # LLammas-translate 🐑 Llama-2-7B finetuned in three stages: 1. 5B tokens of CulturaX (75% Estonain, 25% English) 2. 1M English->Estonian sentence-pairs from CCMatrix (500000), WikiMatrix (400000), Europarl (50000), and OpenSubtitles (50000) as Alpaca-style translation instructions, 25% of the examples are given in opposite direction (Estonian->Englih) 3. Alpaca-cleaned, Alpaca-est, OASST1 top-1 English conversations, CoT and FLAN-V2 following open-instruct (both 10,000), WMT18 English-Estonian translation development data (as documents), general MTee validation English-Estonian held-out data Alpaca-est is an instruction dataset generated for Estonian with *gpt-3.5-turbo-0613*, following Alpaca. Using the model in a conversational pipeline: ``` from transformers import pipeline, Conversation import torch pipe = pipeline("conversational", model="tartuNLP/Llammas", torch_dtype=torch.bfloat16, device_map="auto") messages = [ {"role": "user", "content": "Tere!"}, {"role": "assistant", "content": "Tere! Kas saaksin teid kuidagi aidata?"}, {"role": "user", "content": "Kuidas alustada kirja kirjutamist?"} ] conversation = Conversation(messages) conversation = pipe(conversation) ``` Conversational format: ``` <|user|> Tere! <|assistant|> Tere! Kas saaksin teid kuidagi aidata? <|user|> Kuidas alustada kirja kirjutamist? <|assistant|> Kirja kirjutamiseks alustage tervitusega, nĂ€iteks "Tere!" vĂ”i "Tere hommikust!". SeejĂ€rel tutvustage ennast ja mainige, kellega kirjutate. Kirjeldage oma mĂ”tteid vĂ”i kĂŒsimusi, mida soovite arutada. LĂ”petage kiri viisakalt, nĂ€iteks "TĂ€nan teid tĂ€helepanu eest!" vĂ”i "Parimate soovidega!"