qwenzoo committed
Commit
c6ca08f
1 Parent(s): 1af0b4d

Update README.md

Files changed (1)
  1. README.md +10 -15
README.md CHANGED
````diff
@@ -40,7 +40,7 @@ widget:
 
   example_title: "Example real"
 ---
-# Model Card for Model ID
+# Model Card for a fine-tuned Galactica model for detecting scientific papers
 
 A fine-tuned Galactica model to detect machine-generated scientific papers based on their abstract, introduction, and conclusion.
 
@@ -58,13 +58,12 @@ A fine-tuned Galactica model to detect machine-generated scientific papers based
 - **License:** [More Information Needed]
 - **Finetuned from model [optional]:** Galactica
 
-### Model Sources [optional]
+### Model Sources
 
 <!-- Provide the basic links for the model. -->
 
 - **Repository:** https://github.com/qwenzo/-IDMGSP
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
+- **Paper:** [More Information Needed]
 
 ## Uses
 
@@ -72,15 +71,12 @@ A fine-tuned Galactica model to detect machine-generated scientific papers based
 
 ### Direct Use
 
-```{python}
+```python
 from transformers import AutoTokenizer, OPTForSequenceClassification, pipeline
 
 model = OPTForSequenceClassification.from_pretrained("tum-nlp/IDMGSP-Galactica-TRAIN")
-
 tokenizer = AutoTokenizer.from_pretrained("tum-nlp/IDMGSP-Galactica-TRAIN")
-
 reader = pipeline("text-classification", model=model, tokenizer = tokenizer)
-
 reader(
 '''
 Abstract:
@@ -116,10 +112,6 @@ Conclusion:
 
 ### Recommendations
 
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
 ## How to Get Started with the Model
 
 Use the code below to get started with the model.
@@ -130,9 +122,12 @@ Use the code below to get started with the model.
 
 ### Training Data
 
-<!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
-[More Information Needed]
+The table below shows the number of samples from each source used to construct the training dataset.
+The dataset can be found at https://huggingface.co/datasets/tum-nlp/IDMGSP.
 
+| Dataset                | arXiv (real) | ChatGPT (fake) | GPT-2 (fake) | SCIgen (fake) | Galactica (fake) | GPT-3 (fake) |
+|------------------------|--------------|----------------|--------------|---------------|------------------|--------------|
+| Standard train (TRAIN) | 8k           | 2k             | 2k           | 2k            | 2k               | -            |
 
 ### Training Procedure
 
@@ -145,7 +140,7 @@ Use the code below to get started with the model.
 
 #### Training Hyperparameters
 
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+[More Information Needed]
 
 #### Speeds, Sizes, Times [optional]
 
````
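The Direct Use snippet in the diff above passes the paper's sections to the classifier as one plain-text block with labeled sections ("Abstract:", "Conclusion:", and so on). A small helper along these lines can assemble that block from separate fields; the function name and the exact "Introduction:" label are illustrative assumptions, not part of the repository.

```python
def build_classifier_input(abstract: str, introduction: str, conclusion: str) -> str:
    """Assemble the plain-text block expected by the text-classification
    pipeline: each section prefixed by a label line, matching the format
    shown in the model card's Direct Use example."""
    return "\n".join([
        "Abstract:",
        abstract.strip(),
        "Introduction:",
        introduction.strip(),
        "Conclusion:",
        conclusion.strip(),
    ])
```

The resulting string would then be passed to `reader(...)` exactly as the raw triple-quoted string is in the diff's snippet.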