Update README.md
README.md
tags:
- lora
- generated_from_trainer
model-index:
- name: PPRM-gemma-2-2b-it
  results: []
---

This model is a fine-tuned version of [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it) on the prm_dpo dataset.

## Citation

```
@article{zhang2024llama,
  title={LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning},
  author={Zhang, Di and Wu, Jianbo and Lei, Jingdi and Che, Tong and Li, Jiatong and Xie, Tong and Huang, Xiaoshui and Zhang, Shufei and Pavone, Marco and Li, Yuqiang and others},
  journal={arXiv preprint arXiv:2410.02884},
  year={2024}
}

@article{zhang2024accessing,
  title={Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B},
  author={Zhang, Di and Li, Jiatong and Huang, Xiaoshui and Zhou, Dongzhan and Li, Yuqiang and Ouyang, Wanli},
  journal={arXiv preprint arXiv:2406.07394},
  year={2024}
}
```

## Model usage

The adapter acts as a pairwise preference judge: given a problem and two candidate answers, it is prompted with "Is First Answer better than Second Answer?", and the logits of the first generated token at the `yes` and `no` positions give the judgement.

`server.py`
```
import json

import torch
from fastapi import FastAPI, HTTPException
from peft import PeftModel
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Initialize FastAPI
app = FastAPI()

# Device configuration
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the base model and attach the LoRA adapter
model_name = "google/gemma-2-2b-it"
lora_checkpoint_path = "qq8933/PPRM-gemma-2-2b-it"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, device_map='cuda')
model = PeftModel.from_pretrained(base_model, lora_checkpoint_path, device_map='cuda')

# Token ids whose logits encode the pairwise judgement
yes_token_id = tokenizer.convert_tokens_to_ids("yes")
no_token_id = tokenizer.convert_tokens_to_ids("no")

# Request model
class InputRequest(BaseModel):
    text: str

# Predict function: compare two candidate answers to the same problem
def predict(question, answer_1, answer_2):
    prompt_template = """Problem:\n\n{}\n\nFirst Answer:\n\n{}\n\nSecond Answer:\n\n{}\n\nIs First Answer better than Second Answer?\n\n"""
    input_text = prompt_template.format(question, answer_1, answer_2)
    input_text = tokenizer.apply_chat_template(
        [{'role': 'user', 'content': input_text}], tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(input_text, return_tensors="pt").to(device)
    with torch.no_grad():
        generated_outputs = model.generate(
            **inputs, max_new_tokens=2, output_scores=True, return_dict_in_generate=True
        )
    # The first generated token carries the yes/no decision
    first_token_logits = generated_outputs.scores[0]
    yes_logit = first_token_logits[0, yes_token_id].item()
    no_logit = first_token_logits[0, no_token_id].item()

    return {
        "yes_logit": yes_logit,
        "no_logit": no_logit,
        "logit_difference": yes_logit - no_logit
    }

# Define API endpoint
@app.post("/predict")
async def get_prediction(input_request: InputRequest):
    payload = json.loads(input_request.text)
    question, answer_1, answer_2 = payload['question'], payload['answer_1'], payload['answer_2']
    try:
        result = predict(question, answer_1, answer_2)
        return result
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
```
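The returned `logit_difference` can be turned into a preference probability with a sigmoid, since a softmax over the two judgement logits reduces to exactly that (a small helper sketch, not part of the original server):

```
import math

def preference_probability(yes_logit: float, no_logit: float) -> float:
    # Softmax over {yes, no}:
    #   P(first answer is better) = exp(yes_logit) / (exp(yes_logit) + exp(no_logit))
    return 1.0 / (1.0 + math.exp(no_logit - yes_logit))
```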

Start the server with uvicorn, with `$MASTER_PORT` set to the port you want to expose:

```
uvicorn server:app --host 0.0.0.0 --port $MASTER_PORT --workers 1
```
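
A minimal client sketch (assuming the server listens on localhost with `MASTER_PORT=8000`; note that the three inputs are JSON-encoded into the request's single `text` field, matching `InputRequest` above):

```
import json
import requests

payload = {
    "question": "What is 2 + 2?",  # example values, not from the card
    "answer_1": "2 + 2 = 4.",
    "answer_2": "2 + 2 = 5.",
}
# The endpoint expects a body of the form {"text": "<JSON-encoded payload>"}
response = requests.post("http://localhost:8000/predict", json={"text": json.dumps(payload)})
print(response.json())  # {"yes_logit": ..., "no_logit": ..., "logit_difference": ...}
```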

## Training procedure

The following hyperparameters were used during training:
- lr_scheduler_type: linear
- num_epochs: 1.0
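
A minimal sketch of how these two settings map onto `transformers.TrainingArguments`; every other field here is a placeholder assumption, since the training script is not included in this card:

```
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="pprm-gemma-2-2b-it",  # hypothetical output path
    lr_scheduler_type="linear",       # from the card
    num_train_epochs=1.0,             # "num_epochs: 1.0" in the card
)
```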

### Framework versions