qwenzoo committed
Commit
c6ca08f
1 Parent(s): 1af0b4d

Update README.md

Files changed (1)
  1. README.md +10 -15
README.md CHANGED
````diff
@@ -40,7 +40,7 @@ widget:
 
   example_title: "Example real"
 ---
-# Model Card for Model ID
+# Model Card for a fine-tuned Galactica model for detecting scientific papers
 
 A fine-tuned Galactica model to detect machine-generated scientific papers based on their abstract, introduction, and conclusion.
 
@@ -58,13 +58,12 @@ A fine-tuned Galactica model to detect machine-generated scientific papers based
 - **License:** [More Information Needed]
 - **Finetuned from model [optional]:** Galactica
 
-### Model Sources [optional]
+### Model Sources
 
 <!-- Provide the basic links for the model. -->
 
 - **Repository:** https://github.com/qwenzo/-IDMGSP
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
+- **Paper:** [More Information Needed]
 
 ## Uses
 
@@ -72,15 +71,12 @@ A fine-tuned Galactica model to detect machine-generated scientific papers based
 
 ### Direct Use
 
-```{python}
+```python
 from transformers import AutoTokenizer, OPTForSequenceClassification, pipeline
 
 model = OPTForSequenceClassification.from_pretrained("tum-nlp/IDMGSP-Galactica-TRAIN")
-
 tokenizer = AutoTokenizer.from_pretrained("tum-nlp/IDMGSP-Galactica-TRAIN")
-
 reader = pipeline("text-classification", model=model, tokenizer = tokenizer)
-
 reader(
 '''
 Abstract:
@@ -116,10 +112,6 @@ Conclusion:
 
 ### Recommendations
 
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
 ## How to Get Started with the Model
 
 Use the code below to get started with the model.
@@ -130,9 +122,12 @@ Use the code below to get started with the model.
 
 ### Training Data
 
-<!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
-[More Information Needed]
+The table below shows the number of samples from each source used to construct the training dataset.
+The dataset can be found at https://huggingface.co/datasets/tum-nlp/IDMGSP.
 
+| Dataset                | arXiv (real) | ChatGPT (fake) | GPT-2 (fake) | SCIgen (fake) | Galactica (fake) | GPT-3 (fake) |
+|------------------------|--------------|----------------|--------------|---------------|------------------|--------------|
+| Standard train (TRAIN) | 8k           | 2k             | 2k           | 2k            | 2k               | -            |
 
 ### Training Procedure
 
@@ -145,7 +140,7 @@ Use the code below to get started with the model.
 
 #### Training Hyperparameters
 
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+[More Information Needed]
 
 #### Speeds, Sizes, Times [optional]
 
````
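The Direct Use snippet in the diff above passes the paper's sections to the classifier as one plain-text block with labeled sections ("Abstract:", "Conclusion:", and so on). A small helper along these lines can assemble that block from separate fields; the function name and the exact "Introduction:" label are illustrative assumptions, not part of the repository.

```python
def build_classifier_input(abstract: str, introduction: str, conclusion: str) -> str:
    """Assemble the plain-text block expected by the text-classification
    pipeline: each section prefixed by a label line, matching the format
    shown in the model card's Direct Use example."""
    return "\n".join([
        "Abstract:",
        abstract.strip(),
        "Introduction:",
        introduction.strip(),
        "Conclusion:",
        conclusion.strip(),
    ])
```

The resulting string would then be passed to `reader(...)` exactly as the raw triple-quoted string is in the diff's snippet.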