Pawitsapak committed
Commit cccb59a
Parent: be940ba

Update README.md

Files changed (1)
  1. README.md +29 -92
README.md CHANGED
@@ -1,79 +1,44 @@
  ---
- datasets: []
- language: []
- library_name: sentence-transformers
  pipeline_tag: sentence-similarity
  tags:
- - sentence-transformers
- - sentence-similarity
- - feature-extraction
  widget: []
  ---

- # SentenceTransformer

- This is a [sentence-transformers](https://www.SBERT.net) model. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

- ## Model Details

- ### Model Description
- - **Model Type:** Sentence Transformer
- <!-- - **Base model:** [Unknown](https://huggingface.co/unknown) -->
- - **Maximum Sequence Length:** 8192 tokens
- - **Output Dimensionality:** 1024 dimensions
- - **Similarity Function:** Cosine Similarity
- <!-- - **Training Dataset:** Unknown -->
- <!-- - **Language:** Unknown -->
- <!-- - **License:** Unknown -->

- ### Model Sources

- - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

- ### Full Model Architecture

- ```
- SentenceTransformer(
-   (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
-   (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
-   (2): Normalize()
- )
- ```
-
- ## Usage
-
- ### Direct Usage (Sentence Transformers)
-
- First install the Sentence Transformers library:
-
- ```bash
- pip install -U sentence-transformers
- ```
-
- Then you can load this model and run inference.
- ```python
- from sentence_transformers import SentenceTransformer
-
- # Download from the 🤗 Hub
- model = SentenceTransformer("sentence_transformers_model_id")
- # Run inference
- sentences = [
-     'The weather is lovely today.',
-     "It's so sunny outside!",
-     'He drove to the stadium.',
- ]
- embeddings = model.encode(sentences)
- print(embeddings.shape)
- # [3, 1024]
-
- # Get the similarity scores for the embeddings
- similarities = model.similarity(embeddings, embeddings)
- print(similarities.shape)
- # [3, 3]
- ```

  <!--
  ### Direct Usage (Transformers)

@@ -109,36 +74,8 @@ You can finetune this model on your own dataset.

  *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
  -->
-
- ## Training Details
-
- ### Framework Versions
- - Python: 3.10.14
- - Sentence Transformers: 3.0.1
- - Transformers: 4.34.0
- - PyTorch: 2.1.0+cu121
- - Accelerate: 0.21.0
- - Datasets: 2.21.0
- - Tokenizers: 0.14.1
-
  ## Citation

  ### BibTeX
-
- <!--
- ## Glossary
-
- *Clearly define terms in order to be accessible across audiences.*
- -->
-
- <!--
- ## Model Card Authors
-
- *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
- -->
-
- <!--
- ## Model Card Contact
-
- *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
- -->
 
  ---
+ datasets:
+ - airesearch/WangchanX-Legal-ThaiCCL-RAG
+ language:
+ - th
  pipeline_tag: sentence-similarity
  tags:
+ - legal
+ - RAG
  widget: []
+ license: mit
+ base_model:
+ - BAAI/bge-m3
  ---

+ ## WangchanX-Legal-ThaiCCL-Retriever: A Thai Legal Text Retriever

+ This model card describes WangchanX-Legal-ThaiCCL-Retriever, a retriever model fine-tuned from bge-m3 on the WangchanX-Legal-ThaiCCL-RAG dataset. It retrieves the legal text sections most relevant to legal questions posed in Thai, with a specific focus on Corporate and Commercial Law (CCL).

+ **Model Details:**

+ * **Base Model:** [bge-m3](https://huggingface.co/BAAI/bge-m3)
+ * **Fine-tuned Dataset:** [WangchanX-Legal-ThaiCCL-RAG](https://huggingface.co/datasets/airesearch/WangchanX-Legal-ThaiCCL-RAG)
+ * **Language:** Thai
+ * **Maximum Sequence Length:** 8192 tokens
+ * **Output Dimensionality:** 1024 dimensions
+ * **License:** MIT
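
Since the card's own usage section is commented out further down, here is a minimal inference sketch with the sentence-transformers library, matching the details above. The Hub repo id is an assumption inferred from this card's naming; substitute the actual id.

```python
from sentence_transformers import SentenceTransformer

# Repo id is a guess based on this card's naming -- replace with the actual Hub id.
model = SentenceTransformer("airesearch/WangchanX-Legal-ThaiCCL-Retriever")

# A Thai legal question and two candidate law sections (paraphrased, illustrative text only).
question = "การจัดตั้งบริษัทจำกัดต้องมีผู้เริ่มก่อการกี่คน"  # "How many promoters are needed to form a limited company?"
sections = [
    "มาตรา 1097 บุคคลใด ๆ ตั้งแต่สามคนขึ้นไปจะเริ่มก่อการและตั้งเป็นบริษัทจำกัดก็ได้",
    "มาตรา 1012 อันว่าสัญญาจัดตั้งห้างหุ้นส่วนหรือบริษัทนั้น คือสัญญาซึ่งบุคคลตั้งแต่สองคนขึ้นไปตกลงเข้ากันเพื่อกระทำกิจการร่วมกัน",
]

# Embeddings are 1024-dimensional and L2-normalized, so dot product equals cosine similarity.
q_emb = model.encode([question])
s_embs = model.encode(sections)
print(model.similarity(q_emb, s_embs))  # shape [1, 2]; higher score = more relevant
```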

+ **WangchanX-Legal-ThaiCCL-RAG**

+ This dataset supports Thai legal question-answering systems built on Retrieval-Augmented Generation (RAG), with a focus on Corporate and Commercial Law.
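
The dataset can be pulled straight from the Hub with the datasets library. A short inspection sketch, assuming nothing about split or column names beyond the repo id given in this card:

```python
from datasets import load_dataset

# Repo id comes from this card's metadata; inspect the printed object for the actual schema.
ds = load_dataset("airesearch/WangchanX-Legal-ThaiCCL-RAG")
print(ds)                      # available splits and columns
first_split = next(iter(ds.values()))
print(first_split[0])          # one question/context record
```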

+ **Intended Use Cases:**
+ This model is designed for use as a retriever within a larger RAG pipeline; a retrieval sketch follows the list below.
+ * **Legal Question Answering:** Serving as the retrieval component of a question-answering system that answers user queries about Thai law.
+ * **Legal Information Retrieval:** Enabling efficient retrieval of relevant passages from Thai legal texts.
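
A sketch of that retrieval step over an in-memory list of law sections (the repo id is an assumption, as above; a production pipeline would typically use a vector index rather than re-encoding the corpus per query):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("airesearch/WangchanX-Legal-ThaiCCL-Retriever")  # hypothetical id

def retrieve(question: str, corpus: list[str], top_k: int = 3) -> list[tuple[str, float]]:
    """Embed the question and corpus, then return the top_k sections by cosine similarity."""
    q_emb = model.encode([question])
    c_embs = model.encode(corpus)
    scores = model.similarity(q_emb, c_embs)[0].tolist()
    ranked = sorted(zip(corpus, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]

# The retrieved sections are then placed in the prompt of the RAG pipeline's generator LLM.
```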

+ <!-- ## Usage

+ This model is designed for use as a retriever within a larger RAG pipeline. Given a legal question in Thai, it retrieves the most relevant sections from the Thai CCL corpus. You can integrate it into your application using the Hugging Face Transformers library.
+ -->
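
As a reference for the Transformers-based integration mentioned in the commented-out text, here is a rough equivalent of the model's pipeline: CLS pooling followed by L2 normalization, mirroring the architecture block in the removed card above. The repo id remains an assumption.

```python
import torch
from transformers import AutoModel, AutoTokenizer

name = "airesearch/WangchanX-Legal-ThaiCCL-Retriever"  # hypothetical Hub id
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

texts = ["บริษัทจำกัดคืออะไร"]  # "What is a limited company?"
batch = tokenizer(texts, padding=True, truncation=True, max_length=8192, return_tensors="pt")
with torch.no_grad():
    out = model(**batch)

# CLS pooling + L2 normalization, mirroring Transformer -> Pooling(cls) -> Normalize().
cls = out.last_hidden_state[:, 0]                     # [batch, 1024]
emb = torch.nn.functional.normalize(cls, p=2, dim=1)  # unit vectors; dot product = cosine
```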
  <!--
  ### Direct Usage (Transformers)

  ...

  *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
  -->
+ <!--
  ## Citation

  ### BibTeX
+ -->