pffaundez/trueparagraph.ai-ELECTRA

@@ -6,88 +6,48 @@ tags:
 model-index:
 - name: trueparagraph.ai-ELECTRA
   results: []
-language:
-- en
-metrics:
-- accuracy
-pipeline_tag: text-classification
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/659ee7cec0c53b7cb5c0afea/1LoHRRtIawlqdVameWeLu.png)
 # trueparagraph.ai-ELECTRA
-This model is a fine-tuned version of [google/electra-base-discriminator](https://huggingface.co/google/electra-base-discriminator) on the "16K-trueparagraph-STEM" dataset.
 ## Model description
-ELECTRA is a transformer-based model pre-trained using a novel approach called "Replaced Token Detection". The model is pre-trained to distinguish "real" input tokens from "fake" input tokens generated by another neural network. This fine-tuned version of ELECTRA is specifically trained on paragraphs from the STEM domain to detect AI-generated text.
-Key characteristics:
-- **Architecture**: Transformer-based model
-- **Pre-training objective**: Replaced Token Detection
-- **Fine-tuning objective**: Binary classification (Human-written vs AI-generated)
 ## Intended uses & limitations
-### Intended uses
-- **AI Text Detection**: Identifying paragraphs in the STEM domain that are generated by AI versus those written by humans.
-- **Educational Tools**: Assisting educators in detecting AI-generated content in academic submissions.
-- **Research**: Analyzing the effectiveness of AI-generated content detection in STEM-related texts.
-### Limitations
-- **Domain Specificity**: The model is fine-tuned specifically on STEM paragraphs and may not perform as well on texts from other domains.
-- **Generalization**: While the model is effective at detecting AI-generated text in STEM, it may not generalize well to other types of AI-generated content outside of its training data.
-- **Biases**: The model may inherit biases present in the training data, which could affect its performance and fairness.
 ## Training and evaluation data
-The model was fine-tuned on the "16K-trueparagraph-STEM" dataset, which consists of 16,000 paragraphs from various STEM domains. The dataset includes both human-written and AI-generated paragraphs to provide a balanced training set for the model.
-### Dataset Details
-- **Size**: 16,000 paragraphs
-- **Sources**: Academic papers, research articles, and other STEM-related documents.
-- **Balance**: Approximately 50% human-written paragraphs and 50% AI-generated paragraphs.
 ## Training procedure
-### Preprocessing
-- **Tokenization**: Texts were tokenized using the ELECTRA tokenizer.
-- **Truncation/Padding**: All inputs were truncated or padded to a maximum length of 512 tokens.
-### Hyperparameters
-- **Optimizer**: AdamW
-- **Learning Rate**: 5e-5
-- **Batch Size**: 16
-- **Number of Epochs**: 3
-### Training
-- **Loss Function**: Binary Cross-Entropy Loss
-- **Evaluation Metrics**: Accuracy, Precision, Recall, F1-Score, ROC-AUC
-### Hardware
-- **Environment**: Training was conducted on a single NVIDIA Tesla V100 GPU.
-- **Training Time**: Approximately 4 hours.
-### Evaluation
-- The model was evaluated on a hold-out validation set consisting of 10% of the total dataset.
-- **Validation Results**:
-  - **Accuracy**: 0.93
-  - **Precision**: 0.90
-  - **Recall**: 0.98
-  - **F1-Score**: 0.94
-  - **ROC-AUC**: 0.93
-### Post-processing
-- The final model weights were saved and uploaded to Hugging Face Model Hub.
-- A model card was created to document the training and evaluation processes, intended uses, and limitations of the model.
 ### Framework versions
 - Transformers 4.42.4
 - Pytorch 2.3.1+cu121
 - Datasets 2.20.0
-- Tokenizers 0.19.1

 model-index:
 - name: trueparagraph.ai-ELECTRA
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 # trueparagraph.ai-ELECTRA
+This model is a fine-tuned version of [google/electra-base-discriminator](https://huggingface.co/google/electra-base-discriminator) on the None dataset.
 ## Model description
+More information needed
 ## Intended uses & limitations
+More information needed
 ## Training and evaluation data
+More information needed
 ## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 5e-05
+- train_batch_size: 16
+- eval_batch_size: 16
+- seed: 42
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: linear
+- lr_scheduler_warmup_steps: 500
+- num_epochs: 5
+### Training results
 ### Framework versions
 - Transformers 4.42.4
 - Pytorch 2.3.1+cu121
 - Datasets 2.20.0
+- Tokenizers 0.19.1
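
The added hyperparameter list can be read as a set of `transformers.TrainingArguments` keyword arguments plus a linear learning-rate schedule with warmup. The sketch below is illustrative only and is not the author's training script: the key names are the standard `TrainingArguments` parameters, the mapping of `train_batch_size` to `per_device_train_batch_size` assumes a single device, and `TOTAL_STEPS` is a hypothetical value because the diff does not state the dataset size used for this run. The `linear_lr` helper is a plain-Python restatement of a linear warmup followed by linear decay to zero.

```python
# Hyperparameters copied from the model card diff. Key names follow
# transformers.TrainingArguments; the single-device assumption is ours.
TRAINING_KWARGS = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 16,  # card lists train_batch_size: 16
    "per_device_eval_batch_size": 16,   # card lists eval_batch_size: 16
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_steps": 500,
    "num_train_epochs": 5,
}


def linear_lr(step, total_steps, peak_lr=5e-5, warmup_steps=500):
    """Learning rate at `step`: linear warmup to peak_lr, then linear decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical total number of optimizer steps; not stated in the diff.
TOTAL_STEPS = 4500

for step in (0, 250, 500, 2500, 4500):
    print(step, linear_lr(step, TOTAL_STEPS))
```

With these values the rate rises from 0 to 5e-05 over the first 500 steps and then falls linearly, reaching 0 at the final step. If the real run used a different step count, only the decay slope changes.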

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e3efa0dabcded9064205ee39190f3b629f9e01e8ebd599549ba138d17de32fbc
 size 437959248

 version https://git-lfs.github.com/spec/v1
+oid sha256:7dd9339ff1a4653b727e224507a8e7f867ce6478ef2c5757c01b800dcc5cbd12
 size 437959248

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:cd6e8b19d712eb01a8d67dc134b5e8c8ce33fff792732007c903cd3a85a6cef4
 size 5112

 version https://git-lfs.github.com/spec/v1
+oid sha256:d6704151b875d24de420af95d5bf962930aea9d7eb4133d1a6e754d31c7a7d92
 size 5112
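
Both binary files are stored as Git LFS pointers: three `key value` lines giving the spec version, a `sha256` object ID, and the byte size. In each diff only the `oid` changes while the size stays the same, which means the file contents changed but the length did not. As a minimal sketch, assuming the pointer text is available as a string, the pointer can be parsed and a downloaded file checked against it as follows. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, not part of any Git LFS tooling.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Return True if the bytes have the size and SHA-256 digest recorded in the pointer."""
    if pointer["algo"] != "sha256":
        return False
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


# New pointer for training_args.bin, copied from the diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:d6704151b875d24de420af95d5bf962930aea9d7eb4133d1a6e754d31c7a7d92
size 5112
"""

pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

To check a real download, read the file in binary mode and pass its bytes to `matches_pointer` together with the parsed pointer.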