Update README.md
README.md
CHANGED
@@ -15,7 +15,7 @@ _This is a Welsh version of the [ALMA](https://github.com/fe1ixxu/ALMA) LLM-base
The LLM is based on Llama-2-13B, with continued training on the Welsh data in [OSCAR-2301](https://huggingface.co/datasets/oscar-corpus/OSCAR-2301) for 3 epochs,
followed by further fine-tuning on the Cofnod y Cynulliad data provided by [TechIaith](https://huggingface.co/datasets/techiaith/cofnodycynulliad_en-cy).

- A version compressed to 4.0bpw, so that it loads in 10GB of GPU memory, is available [here](https://huggingface.co/BangorAI/ALMA-Cymraeg-13B-0.1-4.0bpw-exl2).

### Chat Format

@@ -34,6 +34,32 @@ Cyfieithwch y testun Saesneg canlynol i'r Gymraeg.
#### Example

```python
```

## Copyright

The LLM is based on Llama-2-13B, with continued training on the Welsh data in [OSCAR-2301](https://huggingface.co/datasets/oscar-corpus/OSCAR-2301) for 3 epochs,
followed by further fine-tuning on the Cofnod y Cynulliad data provided by [TechIaith](https://huggingface.co/datasets/techiaith/cofnodycynulliad_en-cy).

+ A faster version compressed to 4.0bpw, so that it loads in 10GB of GPU memory, is available [here](https://huggingface.co/BangorAI/ALMA-Cymraeg-13B-0.1-4.0bpw-exl2).

### Chat Format

#### Example

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"

# Load the model and tokenizer from a local directory.
# load_in_8bit requires the bitsandbytes package.
model = AutoModelForCausalLM.from_pretrained("./trosi13b", torch_dtype=torch.float16, load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained("./trosi13b")

# The prompt stays in Welsh, as the model expects it. The first line means
# "Translate the following English text into Welsh."; the section markers
# mean "### English:" and "### Welsh:".
prompt = """Cyfieithwch y testun Saesneg canlynol i'r Gymraeg.
### Saesneg:
For the first time, GPs no longer have to physically print, sign and hand a green paper prescription form to the patient or wait for it to be taken to the pharmacy. Instead, the prescription is sent electronically from the surgery via the IT system to the patient’s chosen pharmacy - even without the patient needing to visit the surgery to pick up a repeat prescription form.

### Cymraeg:
"""

model_inputs = tokenizer([prompt], return_tensors="pt").to(device)

# Sample at a low temperature, with a repetition penalty.
generated_ids = model.generate(
    **model_inputs,
    eos_token_id=tokenizer.eos_token_id,
    top_k=90,
    top_p=1.0,
    temperature=0.3,
    repetition_penalty=1.2,
    max_new_tokens=500,
    do_sample=True,
)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])
```
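
The prompt layout used in the example can be wrapped in a small helper so that the same format is applied to any input text. This is only a sketch: `build_prompt` and `extract_translation` are hypothetical names that are not part of the model repository, and the extraction step assumes the decoded output repeats the prompt before the generated Welsh text, as decoder-only models normally do with `generate`.

```python
MARKER = "### Cymraeg:"  # "### Welsh:" section marker from the example prompt


def build_prompt(english_text: str) -> str:
    """Wrap English text in the Welsh translation prompt shown in the example."""
    return (
        "Cyfieithwch y testun Saesneg canlynol i'r Gymraeg.\n"
        "### Saesneg:\n"
        f"{english_text.strip()}\n"
        "\n"
        f"{MARKER}\n"
    )


def extract_translation(decoded: str) -> str:
    """Return the text after the first Welsh marker, or all of it if absent."""
    _, sep, tail = decoded.partition(MARKER)
    return tail.strip() if sep else decoded.strip()
```

With these, `prompt = build_prompt(text)` replaces the hand-written string, and `extract_translation(...)` can be applied to the decoded output to drop the echoed prompt.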

## Copyright