kanwal-mehreen18 committed on
Commit
d56315c
1 Parent(s): f791334

Update README.md

  1. README.md +208 -167
README.md CHANGED
@@ -25,185 +25,226 @@ language:
25
  - vi
26
  - zh
27
  ---
28
- # Model Card for Model ID
29
-
30
- <!-- Provide a quick summary of what the model is/does. -->
31
-
32
- This modelcard aims to be a base template for new models. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/modelcard_template.md?plain=1).
33
 
34
  ## Model Details
35
 
36
  ### Model Description
 
 
 
 
 
37
 
38
- <!-- Provide a longer summary of what this model is. -->
39
-
40
-
41
-
42
- - **Developed by:** [More Information Needed]
43
- - **Funded by [optional]:** [More Information Needed]
44
- - **Shared by [optional]:** [More Information Needed]
45
- - **Model type:** [More Information Needed]
46
- - **Language(s) (NLP):** [More Information Needed]
47
- - **License:** [More Information Needed]
48
- - **Finetuned from model [optional]:** [More Information Needed]
49
-
50
- ### Model Sources [optional]
51
-
52
- <!-- Provide the basic links for the model. -->
53
-
54
- - **Repository:** [More Information Needed]
55
- - **Paper [optional]:** [More Information Needed]
56
- - **Demo [optional]:** [More Information Needed]
57
 
58
  ## Uses
59
-
60
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
61
-
62
- ### Direct Use
63
-
64
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
65
-
66
- [More Information Needed]
67
-
68
- ### Downstream Use [optional]
69
-
70
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
71
-
72
- [More Information Needed]
73
-
74
- ### Out-of-Scope Use
75
-
76
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
77
-
78
- [More Information Needed]
79
-
80
- ## Bias, Risks, and Limitations
81
-
82
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
83
-
84
- [More Information Needed]
85
-
86
- ### Recommendations
87
-
88
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
89
-
90
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
91
-
92
- ## How to Get Started with the Model
93
-
94
- Use the code below to get started with the model.
95
-
96
- [More Information Needed]
97
 
98
  ## Training Details
99
-
100
- ### Training Data
101
-
102
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
103
-
104
- [More Information Needed]
105
-
106
- ### Training Procedure
107
-
108
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
109
-
110
- #### Preprocessing [optional]
111
-
112
- [More Information Needed]
113
-
114
-
115
- #### Training Hyperparameters
116
-
117
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
118
-
119
- #### Speeds, Sizes, Times [optional]
120
-
121
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
122
-
123
- [More Information Needed]
124
 
125
  ## Evaluation
126
 
127
- <!-- This section describes the evaluation protocols and provides the results. -->
128
-
129
  ### Testing Data, Factors & Metrics
130
-
131
- #### Testing Data
132
-
133
- <!-- This should link to a Dataset Card if possible. -->
134
-
135
- [More Information Needed]
136
-
137
- #### Factors
138
-
139
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
140
-
141
- [More Information Needed]
142
-
143
- #### Metrics
144
-
145
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
146
-
147
- [More Information Needed]
148
 
149
  ### Results
150
-
151
- [More Information Needed]
152
-
153
- #### Summary
154
-
155
-
156
-
157
- ## Model Examination [optional]
158
-
159
- <!-- Relevant interpretability work for the model goes here -->
160
-
161
- [More Information Needed]
162
-
163
- ## Technical Specifications [optional]
164
-
165
- ### Model Architecture and Objective
166
-
167
- [More Information Needed]
168
-
169
- ### Compute Infrastructure
170
-
171
- [More Information Needed]
172
-
173
- #### Hardware
174
-
175
- [More Information Needed]
176
-
177
- #### Software
178
-
179
- [More Information Needed]
180
-
181
- ## Citation [optional]
182
-
183
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
184
-
185
- **BibTeX:**
186
-
187
- [More Information Needed]
188
-
189
- **APA:**
190
-
191
- [More Information Needed]
192
-
193
- ## Glossary [optional]
194
-
195
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
196
-
197
- [More Information Needed]
198
-
199
- ## More Information [optional]
200
-
201
- [More Information Needed]
202
-
203
- ## Model Card Authors [optional]
204
-
205
- [More Information Needed]
206
-
207
- ## Model Card Contact
208
-
209
- [More Information Needed]
25
  - vi
26
  - zh
27
  ---
28
+ # Model Checkpoints for Multilingual Machine-Generated Text Portion Detection
 
 
 
 
29
 
30
  ## Model Details
31
 
32
  ### Model Description
33
+ - Developed by: 1-800-SHARED-TASKS
34
+ - Funded by: Cohere's Research Compute Grant (July 2024)
35
+ - Model type: Transformer-based model for multilingual machine-generated text portion detection
36
+ - Languages (NLP): 23 languages (expanding to 102)
37
+ - License: Non-commercial; derivatives must remain non-commercial with proper attribution
38
 
39
+ ### Model Sources
40
+ - **Code Repository:** [GitHub Placeholder]
41
+ - **Paper:** [ACL Anthology Placeholder]
42
+ - **Presentation:** [Multi-lingual Machine-Generated Text Portion(s) Detection](https://static1.squarespace.com/static/659ac5de66fdf20e1d607f2e/t/66d977a49597da76b6c260a1/1725527974250/MMGTD-Cohere.pdf)
43
 
44
  ## Uses
45
+ The dataset is suitable for machine-generated text portion detection, token classification, and other linguistic tasks. The methods applied here aim to detect more accurately which portions of a text are machine-generated, particularly in multilingual settings. The dataset may benefit research and development in areas such as AI-generated text moderation, natural language processing, and the study of how AI is integrated into content generation.
46
 
47
  ## Training Details
48
+ The model was trained on approximately 330k text samples generated with two LLMs: Command-R-Plus (100k) and Aya-23-35B (230k). The dataset contains 10k samples per language for each LLM, split into 10% fully human-written texts, 10% entirely machine-generated texts, and 80% mixed cases.
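The composition described above can be checked with a little arithmetic. The following is a minimal sketch, not the authors' data pipeline: the constant and function names are hypothetical, and the language count for Command-R-Plus is only what the quoted totals imply (100k samples at 10k per language), not a figure stated in this card.

```python
# Figures quoted in the Training Details text; all names here are illustrative.
SAMPLES_PER_LANGUAGE = 10_000
TOTAL_SAMPLES = {"Command-R-Plus": 100_000, "Aya-23-35B": 230_000}
SPLIT_PERCENT = {"human_written": 10, "machine_generated": 10, "mixed": 80}


def languages_implied(total_samples: int) -> int:
    """Number of languages implied by a per-LLM total at 10k samples per language."""
    return total_samples // SAMPLES_PER_LANGUAGE


def per_language_counts() -> dict[str, int]:
    """Samples of each kind within one language's 10k samples for one LLM."""
    return {
        kind: SAMPLES_PER_LANGUAGE * percent // 100
        for kind, percent in SPLIT_PERCENT.items()
    }


if __name__ == "__main__":
    for llm, total in TOTAL_SAMPLES.items():
        print(llm, "languages implied:", languages_implied(total))
    print("per-language split:", per_language_counts())
    print("overall total:", sum(TOTAL_SAMPLES.values()))
```

Under these assumptions, each language contributes 1,000 fully human-written, 1,000 entirely machine-generated, and 8,000 mixed samples per LLM, and the two per-LLM totals add up to the stated 330k.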
49
 
50
  ## Evaluation
51
 
 
 
52
  ### Testing Data, Factors & Metrics
53
+ The model was evaluated on a multilingual dataset covering 23 languages. The metrics are accuracy, precision, recall, and F1 score, computed at the word level (at the character level for Japanese and Chinese).
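As an illustration of how such unit-level metrics can be computed, here is a minimal sketch. It is not the authors' evaluation script: the function names are hypothetical, whitespace splitting stands in for whatever word segmentation was actually used, and treating "machine-generated" as the positive class (label 1) is an assumption.

```python
# Languages scored per character rather than per word, as stated in the card.
CHARACTER_LEVEL_LANGS = {"JPN", "ZHO"}


def split_units(text: str, lang: str) -> list[str]:
    """Split text into evaluation units: characters for JPN/ZHO, words otherwise."""
    if lang in CHARACTER_LEVEL_LANGS:
        return [ch for ch in text if not ch.isspace()]
    return text.split()  # assumption: simple whitespace tokenization


def binary_metrics(gold: list[int], pred: list[int]) -> dict[str, float]:
    """Accuracy, precision, recall, and F1 with label 1 as the positive class."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    tn = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(gold) if gold else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    units = split_units("human words then generated words here", "ENG")
    gold = [0, 0, 0, 1, 1, 1]  # one label per unit: 1 = machine-generated
    pred = [0, 0, 1, 1, 1, 1]
    assert len(units) == len(gold)
    print(binary_metrics(gold, pred))
```

Scoring per unit rather than per document is what lets the metrics reflect how much of a mixed text was labeled correctly, which matters given that most samples are mixed cases.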
54
 
55
  ### Results
56
+ The table below reports word-level metrics for each language. Rows marked with ** (Japanese, JPN, and Chinese, ZHO) report character-level metrics instead.
57
+
58
+ <table>
59
+ <tr>
60
+ <th>Language</th>
61
+ <th>Accuracy</th>
62
+ <th>Precision</th>
63
+ <th>Recall</th>
64
+ <th>F1 Score</th>
65
+ </tr>
66
+ <tr>
67
+ <td>ARA</td>
68
+ <td>0.923</td>
69
+ <td>0.832</td>
70
+ <td style="background-color: #e0e0e0;">0.992</td>
71
+ <td>0.905</td>
72
+ </tr>
73
+ <tr>
74
+ <td>CES</td>
75
+ <td>0.884</td>
76
+ <td>0.869</td>
77
+ <td style="background-color: #e0e0e0;">0.975</td>
78
+ <td>0.919</td>
79
+ </tr>
80
+ <tr>
81
+ <td>DEU</td>
82
+ <td>0.917</td>
83
+ <td>0.895</td>
84
+ <td style="background-color: #e0e0e0;">0.983</td>
85
+ <td>0.937</td>
86
+ </tr>
87
+ <tr>
88
+ <td>ELL</td>
89
+ <td>0.929</td>
90
+ <td>0.905</td>
91
+ <td style="background-color: #e0e0e0;">0.984</td>
92
+ <td>0.943</td>
93
+ </tr>
94
+ <tr>
95
+ <td>ENG</td>
96
+ <td>0.917</td>
97
+ <td>0.818</td>
98
+ <td style="background-color: #e0e0e0;">0.986</td>
99
+ <td>0.894</td>
100
+ </tr>
101
+ <tr>
102
+ <td>FRA</td>
103
+ <td>0.927</td>
104
+ <td>0.929</td>
105
+ <td style="background-color: #e0e0e0;">0.966</td>
106
+ <td>0.947</td>
107
+ </tr>
108
+ <tr>
109
+ <td>HEB</td>
110
+ <td>0.963</td>
111
+ <td>0.961</td>
112
+ <td style="background-color: #e0e0e0;">0.988</td>
113
+ <td>0.974</td>
114
+ </tr>
115
+ <tr>
116
+ <td>HIN</td>
117
+ <td>0.890</td>
118
+ <td>0.736</td>
119
+ <td style="background-color: #e0e0e0;">0.975</td>
120
+ <td>0.839</td>
121
+ </tr>
122
+ <tr>
123
+ <td>IND</td>
124
+ <td>0.861</td>
125
+ <td>0.794</td>
126
+ <td style="background-color: #e0e0e0;">0.988</td>
127
+ <td>0.881</td>
128
+ </tr>
129
+ <tr>
130
+ <td>ITA</td>
131
+ <td>0.941</td>
132
+ <td>0.906</td>
133
+ <td style="background-color: #e0e0e0;">0.989</td>
134
+ <td>0.946</td>
135
+ </tr>
136
+ <tr>
137
+ <td>JPN**</td>
138
+ <td>0.832</td>
139
+ <td>0.747</td>
140
+ <td style="background-color: #e0e0e0;">0.965</td>
141
+ <td>0.842</td>
142
+ </tr>
143
+ <tr>
144
+ <td>KOR</td>
145
+ <td>0.937</td>
146
+ <td>0.918</td>
147
+ <td style="background-color: #e0e0e0;">0.992</td>
148
+ <td>0.954</td>
149
+ </tr>
150
+ <tr>
151
+ <td>NLD</td>
152
+ <td>0.916</td>
153
+ <td>0.872</td>
154
+ <td style="background-color: #e0e0e0;">0.985</td>
155
+ <td>0.925</td>
156
+ </tr>
157
+ <tr>
158
+ <td>PES</td>
159
+ <td>0.822</td>
160
+ <td>0.668</td>
161
+ <td style="background-color: #e0e0e0;">0.972</td>
162
+ <td>0.792</td>
163
+ </tr>
164
+ <tr>
165
+ <td>POL</td>
166
+ <td>0.903</td>
167
+ <td>0.884</td>
168
+ <td style="background-color: #e0e0e0;">0.986</td>
169
+ <td>0.932</td>
170
+ </tr>
171
+ <tr>
172
+ <td>POR</td>
173
+ <td>0.805</td>
174
+ <td>0.679</td>
175
+ <td style="background-color: #e0e0e0;">0.987</td>
176
+ <td>0.804</td>
177
+ </tr>
178
+ <tr>
179
+ <td>RON</td>
180
+ <td>0.931</td>
181
+ <td>0.924</td>
182
+ <td style="background-color: #e0e0e0;">0.985</td>
183
+ <td>0.953</td>
184
+ </tr>
185
+ <tr>
186
+ <td>RUS</td>
187
+ <td>0.885</td>
188
+ <td>0.818</td>
189
+ <td style="background-color: #e0e0e0;">0.971</td>
190
+ <td>0.888</td>
191
+ </tr>
192
+ <tr>
193
+ <td>SPA</td>
194
+ <td>0.888</td>
195
+ <td>0.809</td>
196
+ <td style="background-color: #e0e0e0;">0.990</td>
197
+ <td>0.890</td>
198
+ </tr>
199
+ <tr>
200
+ <td>TUR</td>
201
+ <td>0.849</td>
202
+ <td>0.735</td>
203
+ <td style="background-color: #e0e0e0;">0.981</td>
204
+ <td>0.840</td>
205
+ </tr>
206
+ <tr>
207
+ <td>UKR</td>
208
+ <td>0.768</td>
209
+ <td>0.637</td>
210
+ <td style="background-color: #e0e0e0;">0.987</td>
211
+ <td>0.774</td>
212
+ </tr>
213
+ <tr>
214
+ <td>VIE</td>
215
+ <td>0.866</td>
216
+ <td>0.757</td>
217
+ <td style="background-color: #e0e0e0;">0.975</td>
218
+ <td>0.853</td>
219
+ </tr>
220
+ <tr>
221
+ <td>ZHO**</td>
222
+ <td>0.803</td>
223
+ <td>0.698</td>
224
+ <td style="background-color: #e0e0e0;">0.976</td>
225
+ <td>0.814</td>
226
+ </tr>
227
+ </table>
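The F1 column can be related to the precision and recall columns through the harmonic mean, F1 = 2PR / (P + R). The sketch below recomputes it for three rows copied from the table. The helper name is hypothetical, and because the inputs are already rounded to three decimals, small differences from the reported values are expected.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
ROWS = {
    "ARA": (0.832, 0.992, 0.905),
    "HEB": (0.961, 0.988, 0.974),
    "UKR": (0.637, 0.987, 0.774),
}

if __name__ == "__main__":
    for lang, (precision, recall, reported) in ROWS.items():
        recomputed = f1_from(precision, recall)
        print(f"{lang}: recomputed {recomputed:.3f}, reported {reported:.3f}")
```

Across the table, recall is consistently higher than precision, so F1 sits closer to the precision column; the harmonic mean weights the lower of the two values more heavily than an arithmetic mean would.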
228
+
229
+ ## **Authors**
230
+
231
+ **Core Contributors**
232
+
233
+ - Ram Kadiyala [[contact@rkadiyala.com](mailto:contact@rkadiyala.com)]
234
+ - Siddartha Pullakhandam [[pullakh2@uwm.edu](mailto:pullakh2@uwm.edu)]
235
+ - Kanwal Mehreen [[kanwal@traversaal.ai](mailto:kanwal@traversaal.ai)]
236
+ - Ashay Srivastava [[ashays06@umd.edu](mailto:ashays06@umd.edu)]
237
+ - Subhasya TippaReddy [[subhasyat@usf.edu](mailto:subhasyat@usf.edu)]
238
+
239
+
240
+ **Extended Crew**
241
+ - Arvind Reddy Bobbili [[abobbili@cougarnet.uh.edu](mailto:abobbili@cougarnet.uh.edu)]
242
+ - Drishti Sharma [ ]
243
+ - Suraj Chandrashekhar [[stelugar@umd.edu](mailto:stelugar@umd.edu)]
244
+ - Modabbir Adeeb [[madeeb@umd.edu](mailto:madeeb@umd.edu)]
245
+ - Srinadh Vura [ ]
246
+
247
+
248
+ ## **Contact**
249
+
250
+ [![Gmail](https://img.shields.io/badge/Gmail-D14836?style=for-the-badge&logo=gmail&logoColor=white)](mailto:contact@rkadiyala.com)