---
tags:
- papyrology
- epigraphy
- philology
---

# Papy_1_Llama-3.1-8B-Instruct_date

This is a fine-tuned version of the Llama-3.1-8B-Instruct model, specialized in assigning a date to Greek documentary papyri. On a test set of 1,856 unseen papyri, its predictions were, on average, 21.7 years away from the actual date spans.
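The evaluation script behind that figure is not included here, but a distance-to-span error can be sketched as follows (a minimal illustration only, assuming the error is zero when the predicted year falls inside the attested span and otherwise the gap to the nearer endpoint; `years_off` is a hypothetical helper, not part of the released code):

```python
def years_off(predicted_year, span_start, span_end):
    """Years between a predicted year and an attested date span.

    Zero when the prediction falls inside the span; otherwise the
    distance to the nearer endpoint.
    """
    if predicted_year < span_start:
        return span_start - predicted_year
    if predicted_year > span_end:
        return predicted_year - span_end
    return 0

# Example: predicting 71 for a papyrus dated AD 71-72 is 0 years off;
# predicting 60 for the same span would be 11 years off.
print(years_off(71, 71, 72))  # 0
print(years_off(60, 71, 72))  # 11
```

Averaging this quantity over a test set yields a mean distance of the kind reported above.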

## Usage

To run the model, use the following code:

```python
from transformers import pipeline, AutoTokenizer, LlamaForCausalLM

# A normalized edition of the papyrus text; runs of dashes mark lacunae.
papyrus_edition = """
ετουσ τεταρτου αυτοκρατοροσ καισαροσ ουεσπασιανου σεβαστου ------------------
ομολογει παυσιριων απολλωνιου του παυσιριωνοσ μητροσ ---------------τωι γεγονοτι αυτωι
εκ τησ γενομενησ και μετηλλαχυιασ αυτου γυναικοσ -------------------------
απο τησ αυτησ πολεωσ εν αγυιαι συγχωρειν ειναι ----------------------------------
--------------------σ αυτωι εξ ησ συνεστιν ------------------------------------
----τησ αυτησ γενεασ την υπαρχουσαν αυτωι οικιαν ------------
------------------ ---------καὶ αιθριον και αυλη απερ ο υιοσ διοκοροσ --------------------------
--------εγραψεν του δ αυτου διοσκορου ειναι ------------------------------------
---------- και προ κατενγεγυηται τα δικαια --------------------------------------
νησ κατα τουσ τησ χωρασ νομουσ· εαν δε μη ---------------------------------------
υπ αυτου τηι του διοσκορου σημαινομενηι -----------------------------------ενοικισμωι του
ημισουσ μερουσ τησ προκειμενησ οικιασ --------------------------------- διοσκοροσ την τουτων αποχην
---------------------------------------------μηδ υπεναντιον τουτοισ επιτελειν μηδε
------------------------------------------------ ανασκευηι κατ αυτησ τιθεσθαι ομολογιαν μηδε
----------------------------------- επιτελεσαι η χωρισ του κυρια ειναι τα διομολογημενα
παραβαινειν, εκτεινειν δε τον παραβησομενον τωι υιωι διοσκορωι η τοισ παρ αυτου καθ εκαστην
εφοδον το τε βλαβοσ και επιτιμον αργυριου δραχμασ 0 και εισ το δημοσιον τασ ισασ και μηθεν
ησσον· δ -----ιων ομολογιαν συνεχωρησεν·
"""

model_id = "Ericu950/Papy_1_Llama-3.1-8B-Instruct_date"

model = LlamaForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
)

tokenizer = AutoTokenizer.from_pretrained(model_id)

generation_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device_map="auto",
)

system_prompt = "Date this papyrus fragment to an exact year!"

input_messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": papyrus_edition},
]

# Stop generation at either the model's EOS token or Llama 3's turn delimiter.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = generation_pipeline(
    input_messages,
    max_new_tokens=4,
    num_beams=20,
    num_return_sequences=1,
    early_stopping=True,
    eos_token_id=terminators,
)

# Collect the assistant's reply from each returned sequence.
beam_contents = []
for output in outputs:
    generated_text = output.get("generated_text", [])
    for item in generated_text:
        if item.get("role") == "assistant":
            beam_contents.append(item.get("content"))

real_response = "71 or 72 AD"

print(f"Year: {real_response}")
for i, content in enumerate(beam_contents, start=1):
    print(f"Suggestion {i}: {content}")
```

You should get this output:
```
Year: 71 or 72 AD
Suggestion 1: 71
```
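Since the model emits the year as free text, downstream code may want a plain integer. One way to post-process the suggestions (the `parse_year` helper is a hypothetical sketch, not part of the released code):

```python
import re
from typing import Optional

def parse_year(text: str) -> Optional[int]:
    """Extract the first run of digits from a generated suggestion.

    Returns None when the text contains no number at all.
    """
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None

print(parse_year("71"))       # 71
print(parse_year(" 71 AD"))   # 71
print(parse_year("unknown"))  # None
```

To obtain several ranked suggestions instead of one, raise `num_return_sequences` (up to `num_beams`) in the pipeline call above; the loop over `beam_contents` already handles multiple returned sequences.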