- **License:** apache-2.0
- **Finetuned from model:** unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit
## Overview
The Llama 3.1 8B model, finetuned with the Unsloth framework, converts simple English input, including sentences with spelling mistakes, into factual statements written in the formal and precise language used by the European Court of Human Rights (ECHR). A multi-step finetuning process, described below, is used to keep the output accurate and contextually appropriate in the legal domain.
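
For a quick usage illustration, the snippet below shows one way to load the checkpoint with Hugging Face Transformers and ask it to rewrite a plain-English sentence in ECHR style. The repository id, the system prompt, and the generation settings are placeholders for this sketch, not values taken from the training setup.

```python
# Minimal inference sketch (assumes a GPU plus the `torch` and `transformers` packages).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/llama-3.1-8b-echr-facts"  # placeholder repo id, not the actual one

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {
        "role": "system",
        "content": "Rewrite the user's statement as a factual statement in the formal "
        "language of the European Court of Human Rights.",
    },
    {"role": "user", "content": "the polise kept him locked up for 3 days and never told him why"},
]

# Llama 3.1 Instruct checkpoints ship a chat template, so apply_chat_template builds the prompt.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
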
## Finetuning Steps
### 1. Summarizing Facts While Maintaining Legal Language
- **Objective:** To create a summary of the facts from ECHR judgments while preserving the formal and precise language used by the court.
- **Process:** The model was initially trained on a dataset of ECHR judgments, focusing on extracting and summarizing the key facts. This step taught the model to capture the legal nuances and formal tone of the ECHR (see the sketch below).
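
To make this step concrete, the sketch below shows how such a summarization prompt might be built. The instruction wording and the `build_summarization_messages` helper are illustrative assumptions, not the exact prompts used during finetuning.

```python
# Illustrative prompt construction for the fact-summarization step.
# The instruction text is an assumption for this sketch, not quoted from the training pipeline.

def build_summarization_messages(judgment_facts: str) -> list[dict]:
    """Build a chat prompt asking for a summary that keeps the ECHR's formal register."""
    system = (
        "You are a legal assistant. Summarize the facts of the following ECHR judgment. "
        "Preserve the court's formal and precise legal language; do not simplify the wording."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": judgment_facts},
    ]

# Example call with a stand-in excerpt; a real pipeline would pass the full FACTS section.
messages = build_summarization_messages(
    "6. The applicant was arrested and held in police custody for three days ..."
)
```
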
### 2. Creating Simple Language Conversions
- **Objective:** To generate five distinct simple English versions of each summarized fact, including sentences with common spelling mistakes.
- **Process:** For each factual summary, the model generated five different simple English versions designed to be accessible to a general audience while still capturing the essential information. Spelling mistakes were deliberately included in some versions to simulate real-world user input (see the sketch below).
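
A sketch of how the five plain-English variants might be requested from a generator model is shown below. The prompt wording, the numbered-list output format, and the parsing helper are assumptions made for illustration.

```python
import re

# Illustrative prompt for producing five simple-English variants of one legal summary,
# some with deliberate spelling mistakes. The wording is an assumption for this sketch.
def build_simplification_messages(legal_summary: str, n_variants: int = 5) -> list[dict]:
    instruction = (
        f"Rewrite the following ECHR fact summary in {n_variants} different ways using simple, "
        "everyday English. Include common spelling mistakes in at least two of the versions. "
        "Return the versions as a numbered list."
    )
    return [
        {"role": "system", "content": instruction},
        {"role": "user", "content": legal_summary},
    ]

def parse_numbered_list(text: str) -> list[str]:
    """Split a '1. ... 2. ...' style response into individual variants."""
    items = re.split(r"\n\s*\d+\.\s*", "\n" + text.strip())
    return [item.strip() for item in items if item.strip()]
```
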
### 3. Synthetic Dataset Creation
- **Objective:** To create a comprehensive dataset for training the model to convert simple language into legal facts.
- **Process:** Each simple English version (including those with spelling mistakes) was paired with its corresponding legal summary, producing a synthetic dataset in which every factual summary is associated with multiple simple-language versions. This dataset was then used to further finetune the model (see the sketch below).
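
The pairing itself can be sketched as follows: every simple variant becomes one training record whose target is the original legal summary. The JSONL layout and the `instruction`/`input`/`output` field names are a common instruction-tuning convention assumed here, not necessarily the exact schema used.

```python
import json

# Sketch of the synthetic-dataset step: each (simple variant, legal summary) pair becomes
# one instruction-tuning record. Field names and file name are assumptions for illustration.
def build_records(legal_summary: str, simple_variants: list[str]) -> list[dict]:
    return [
        {
            "instruction": "Convert the following statement into the formal language of the ECHR.",
            "input": variant,
            "output": legal_summary,
        }
        for variant in simple_variants
    ]

# Example with made-up text: one summary paired with a few of its simple variants.
records = build_records(
    "The applicant was detained by the police for three days without being informed of the reasons.",
    [
        "the police kept him for 3 days and didnt say why",
        "polise held him three days withowt an explanation",
        "he was locked up for 3 days and nobody told him the reason",
    ],
)

with open("echr_synthetic.jsonl", "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```
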
## Model Capabilities
- **Input:** Simple English sentences, including those with spelling mistakes.
- **Output:** Factual statements in the formal and precise language used by the ECHR.
- **Accuracy:** Faithful conversion of simple language into legal facts, maintaining the formal tone and legal context.
- **Flexibility:** Ability to handle a variety of input styles, including informal language and common spelling errors (an illustrative example follows this list).
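
As an illustration of the intended behavior (constructed for this card, not sampled from the model), an informal input with spelling errors maps to a formally phrased factual statement:

```text
Input:  they kept him at the polise station for 3 days and never sed why
Output: The applicant was detained at the police station for three days without being
        informed of the reasons for his detention.
```
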
## Use Cases
- **Legal Documentation:** Assisting legal professionals in converting informal client statements into formal legal documents.
- **Educational Tools:** Helping students and researchers understand complex legal concepts by providing clear, formal explanations.
- **Automated Fact Checking:** Supporting fact-checking processes by converting user inputs into verifiable legal facts.
## Conclusion
Finetuned with Unsloth, the Llama 3.1 8B model converts simple English, including sentences with spelling mistakes, into the formal and precise language used by the European Court of Human Rights. The multi-step finetuning process described above supports accuracy and contextual relevance, making the model a useful resource in legal and educational settings.