Update README.md
README.md
# vinallama-legal-chat

[![Model Card](https://img.shields.io/badge/Hugging%20Face-Model%20Card-blue)](https://huggingface.co/username/vinallama-legal-chat)

## Description

**vinallama-legal-chat** is a fine-tuned version of vinallama-2-7b, trained specifically for Vietnamese legal conversations. The model is designed to assist in providing accurate legal advice and information in Vietnamese, making it useful for legal professionals and for individuals seeking legal guidance.

## Installation

To use this model, install the following dependencies:

```bash
pip install transformers
pip install torch  # the usage examples below use PyTorch
```
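
An optional sanity check that the dependencies are installed and whether a GPU is visible:

```python
# Optional sanity check: confirm the libraries import and report GPU availability.
import torch
import transformers

print("transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())
```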

## Usage

Here is how to load the model and build a prompt:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("username/vinallama-legal-chat")
model = AutoModelForCausalLM.from_pretrained("username/vinallama-legal-chat")

# Example usage. The Vietnamese system prompt says: "You are a Vietnamese legal consultant.
# You have many years of experience and deep expertise. You will provide answers about the law
# and legal advice for the User's questions." The example question asks "What is temporary residence?"
chat_template = """
<<SYS>>
Bạn là một chuyên viên tư vấn pháp luật Việt Nam. Bạn có nhiều năm kinh nghiệm và kiến thức chuyên sâu. Bạn sẽ cung cấp câu trả lời về pháp luật, tư vấn luật pháp cho các câu hỏi của User.
<</SYS>>
## user:
Tạm trú là gì?

## assistant:
"""

inputs = tokenizer(chat_template, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)  # set an explicit generation budget
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(response)
```
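
Loading a 7B-parameter model in full precision on CPU can be slow and memory-hungry. If a GPU is available, a common alternative (a sketch, assuming the `accelerate` package is installed) is to load the weights in half precision with automatic device placement:

```python
import torch
from transformers import AutoModelForCausalLM

# Optional GPU loading; assumes `pip install accelerate` has been run.
model = AutoModelForCausalLM.from_pretrained(
    "username/vinallama-legal-chat",
    torch_dtype=torch.float16,  # half precision to reduce memory use
    device_map="auto",          # place weights on the available GPU(s) automatically
)
```

When the model sits on a GPU, move the tokenized prompt to the same device before generating, e.g. `inputs = tokenizer(chat_template, return_tensors="pt").to(model.device)`.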

### Inference

The same pattern works for arbitrary questions; only the user turn changes:

```python
# Example inference with a user-supplied question
user_input = "Tạm trú là gì?"
chat_template = f"""
<<SYS>>
Bạn là một chuyên viên tư vấn pháp luật Việt Nam. Bạn có nhiều năm kinh nghiệm và kiến thức chuyên sâu. Bạn sẽ cung cấp câu trả lời về pháp luật, tư vấn luật pháp cho các câu hỏi của User.
<</SYS>>
## user:
{user_input}

## assistant:
"""

inputs = tokenizer(chat_template, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(response)
```
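
The decoded string contains the whole prompt followed by the model's answer. A small post-processing step, together with explicit sampling settings, usually gives more readable output; the parameter values below are illustrative defaults rather than tuned settings:

```python
outputs = model.generate(
    **inputs,
    max_new_tokens=512,  # upper bound on the length of the answer
    do_sample=True,      # sample instead of greedy decoding
    temperature=0.7,     # illustrative value; lower values are more deterministic
    top_p=0.9,           # nucleus sampling cutoff
)
full_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

# Keep only the text after the final "## assistant:" marker from the prompt.
answer = full_text.split("## assistant:")[-1].strip()
print(answer)
```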

### Training

The model can be fine-tuned further with the Hugging Face `Trainer` API. `train_dataset` and `eval_dataset` are placeholders for your own tokenized datasets:

```python
# Example training code
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # your tokenized training split
    eval_dataset=eval_dataset,    # your tokenized evaluation split
)

trainer.train()
```
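
One way to build those datasets, sketched here under the assumption that the conversations are stored as JSON Lines files with a single `text` field holding the fully formatted prompt and answer (the file names and field name are assumptions, not part of this repository):

```python
from datasets import load_dataset
from transformers import DataCollatorForLanguageModeling

# Many Llama-style tokenizers have no pad token; reuse EOS so batches can be padded.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Hypothetical files: one JSON object per line with a "text" field containing
# the formatted conversation (system prompt, user turn, assistant answer).
raw = load_dataset("json", data_files={"train": "train.jsonl", "eval": "eval.jsonl"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])
train_dataset, eval_dataset = tokenized["train"], tokenized["eval"]

# For causal language modeling, this collator derives the labels from the input ids.
data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
```

Pass `data_collator=data_collator` to the `Trainer` above so that labels are created automatically during training.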

## Training Details

### Training Data

The model was fine-tuned on a dataset of Vietnamese legal conversations. The dataset contains legal questions and answers covering a wide range of legal topics.
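
For illustration only, a single record could look like the following before being rendered into the prompt format shown in the Usage section; the field names and content are hypothetical, since the dataset's actual schema is not documented here:

```python
# Hypothetical record; field names are illustrative, not the dataset's real schema.
example = {
    "question": "Tạm trú là gì?",               # "What is temporary residence?"
    "answer": "<câu trả lời của chuyên viên>",  # "<the consultant's answer>"
}

system_prompt = "Bạn là một chuyên viên tư vấn pháp luật Việt Nam. ..."

# Rendered into the chat format used above; the result becomes one "text" entry.
text = (
    f"<<SYS>>\n{system_prompt}\n<</SYS>>\n"
    f"## user:\n{example['question']}\n\n"
    f"## assistant:\n{example['answer']}"
)
```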

### Training Procedure

The model was fine-tuned using a standard training approach, optimizing for accuracy and relevance in legal responses. Training was conducted on [describe hardware, e.g., GPUs, TPUs] over [number of epochs] epochs with [any relevant hyperparameters].

## Evaluation

### Metrics

The model was evaluated using the following metrics:

- **Accuracy**: X%
- **Relevance**: Y%
- **Comprehensiveness**: Z%

### Comparison

The performance of vinallama-legal-chat was benchmarked against other legal advice models, demonstrating superior accuracy and relevance in the Vietnamese legal domain.

## Limitations and Biases

While vinallama-legal-chat is highly effective, it may have limitations in the following areas:

- It may not be up-to-date with the latest legal changes.
- There may be biases present in the training data that could affect responses.

## How to Contribute

We welcome contributions! Please see our [contributing guidelines](link_to_contributing_guidelines) for more information on how to contribute to this project.

## License

This model is licensed under the [MIT License](LICENSE).

## Acknowledgements

We would like to thank the contributors and the creators of the datasets used for training this model.

### Tips for Completing the Template

1. **Replace placeholders** (like `username`, `training data`, `evaluation metrics`) with your actual data.
2. **Include any additional information** specific to your model or training process.
3. **Keep the document updated** as the model evolves or more information becomes available.
|