Text Generation
Transformers
Safetensors
phi
custom_code
text-generation-inference
4-bit precision
gptq
Instructions for using ai-dive/phi-2_GPTQ with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use ai-dive/phi-2_GPTQ with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="ai-dive/phi-2_GPTQ", trust_remote_code=True)

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("ai-dive/phi-2_GPTQ", trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained("ai-dive/phi-2_GPTQ", trust_remote_code=True)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use ai-dive/phi-2_GPTQ with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "ai-dive/phi-2_GPTQ"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "ai-dive/phi-2_GPTQ",
            "prompt": "Once upon a time,",
            "max_tokens": 512,
            "temperature": 0.5
        }'
Use Docker
    docker model run hf.co/ai-dive/phi-2_GPTQ
- SGLang
How to use ai-dive/phi-2_GPTQ with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
        --model-path "ai-dive/phi-2_GPTQ" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "ai-dive/phi-2_GPTQ",
            "prompt": "Once upon a time,",
            "max_tokens": 512,
            "temperature": 0.5
        }'
Use Docker images
    docker run --gpus all \
        --shm-size 32g \
        -p 30000:30000 \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HF_TOKEN=<secret>" \
        --ipc=host \
        lmsysorg/sglang:latest \
        python3 -m sglang.launch_server \
            --model-path "ai-dive/phi-2_GPTQ" \
            --host 0.0.0.0 \
            --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "ai-dive/phi-2_GPTQ",
            "prompt": "Once upon a time,",
            "max_tokens": 512,
            "temperature": 0.5
        }'
- Docker Model Runner
How to use ai-dive/phi-2_GPTQ with Docker Model Runner:
    docker model run hf.co/ai-dive/phi-2_GPTQ
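The curl calls shown for vLLM and SGLang can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are hypothetical, and the response shape (`choices[0]["text"]`) is assumed from the OpenAI-compatible completions format rather than confirmed for this model.

```python
import json
import urllib.request


def build_completion_request(
    prompt,
    base_url="http://localhost:8000",  # vLLM default above; SGLang example uses port 30000
    model="ai-dive/phi-2_GPTQ",
    max_tokens=512,
    temperature=0.5,
):
    """Build the same POST request as the curl examples, without sending it."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the first completion's text.

    Assumes the server replies with an OpenAI-style body
    containing choices[0]["text"].
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


# Example (requires a running server):
# print(complete("Once upon a time,", base_url="http://localhost:30000"))
```

Building the request separately from sending it makes it possible to inspect the payload without a server running.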
{
  "_name_or_path": "microsoft/phi-2",
  "architectures": [
    "PhiForCausalLM"
  ],
  "attention_dropout": 0.0,
  "auto_map": {
    "AutoConfig": "microsoft/phi-2--configuration_phi.PhiConfig",
    "AutoModelForCausalLM": "microsoft/phi-2--modeling_phi.PhiForCausalLM"
  },
  "bos_token_id": 50256,
  "embd_pdrop": 0.0,
  "eos_token_id": 50256,
  "hidden_act": "gelu_new",
  "hidden_size": 2560,
  "initializer_range": 0.02,
  "intermediate_size": 10240,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 2048,
  "model_type": "phi",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 32,
  "partial_rotary_factor": 0.4,
  "qk_layernorm": false,
  "quantization_config": {
    "batch_size": 1,
    "bits": 4,
    "block_name_to_quantize": "model.layers",
    "damp_percent": 0.1,
    "dataset": [
      "Human: I had a dream about us last night. It was so real, I woke up smiling.\nAssistant: MSG1: I had a dream about us last night. \\| DELAY1: 0 \\| DELAY_REASON1: first message\nMSG2: It was so real, I woke up smiling. \\| DELAY2: 3 \\| DELAY_REASON2: building intrigue",
      "Human: I'm at our favorite caf\u00e9 and it feels empty without you here. Missing you.\nAssistant: MSG1: I'm at our favorite caf\u00e9 right now. \\| DELAY1: 0 \\| DELAY_REASON1: first message\nMSG2: It feels empty without you here. Missing you. \\| DELAY2: 4 \\| DELAY_REASON2: expressing feelings",
      "Human: You won't believe what just happened at work! It's absolutely crazy.\nAssistant: MSG1: You won't believe what just happened at work! \\| DELAY1: 0 \\| DELAY_REASON1: first message\nMSG2: It's absolutely crazy. \\| DELAY2: 3 \\| DELAY_REASON2: building excitement",
      "Human: Remember that movie we talked about? Guess what, I got tickets for us!\nAssistant: MSG1: Remember that movie we talked about? \\| DELAY1: 0 \\| DELAY_REASON1: first message\nMSG2: Guess what, I got tickets for us! \\| DELAY2: 4 \\| DELAY_REASON2: revealing a surprise",
      "Human: Cooking dinner tonight, wish you were here to taste test. \ud83c\udf72\nAssistant: MSG1: Cooking dinner tonight. \\| DELAY1: 0 \\| DELAY_REASON1: first message\nMSG2: Wish you were here to taste test. \ud83c\udf72 \\| DELAY2: 3 \\| DELAY_REASON2: sharing a moment",
      "Human: Just passed by that bookstore you love. Made me think of you immediately.\nAssistant: MSG1: Just passed by that bookstore you love. \\| DELAY1: 0 \\| DELAY_REASON1: first message\nMSG2: Made me think of you immediately. \\| DELAY2: 4 \\| DELAY_REASON2: conveying affection",
      "Human: Had a tough day. Really could use one of your hugs right now.\nAssistant: MSG1: Had a tough day. \\| DELAY1: 0 \\| DELAY_REASON1: first message\nMSG2: Really could use one of your hugs right now. \\| DELAY2: 5 \\| DELAY_REASON2: seeking comfort",
      "Human: You know what? I'm really lucky to have you in my life.\nAssistant: MSG1: You know what? \\| DELAY1: 0 \\| DELAY_REASON1: first message\nMSG2: I'm really lucky to have you in my life. \\| DELAY2: 3 \\| DELAY_REASON2: expressing gratitude and love",
      "Human: Watching our show without you just isn't the same. Can't wait to watch it together again.\nAssistant: MSG1: Watching our show without you. \\| DELAY1: 0 \\| DELAY_REASON1: first message\nMSG2: Just isn't the same. Can't wait to watch it together again. \\| DELAY2: 5 \\| DELAY_REASON2: sharing experiences and anticipation",
      "Human: I was thinking about our last trip, remember that sunset on the beach? It was so beautiful, just like you. It made me realize how much I enjoy our adventures together. Can't wait for our next one.\nAssistant: MSG1: I was thinking about our last trip, remember that sunset on the beach? \\| DELAY1: 0 \\| DELAY_REASON1: first message\nMSG2: It was so beautiful, just like you. \\| DELAY2: 3 \\| DELAY_REASON2: sharing a memory\nMSG3: It made me realize how much I enjoy our adventures together. \\| DELAY3: 6 \\| DELAY_REASON3: expressing feelings\nMSG4: Can't wait for our next one. \\| DELAY4: 3 \\| DELAY_REASON4: looking forward to future plans",
      "Human: Just saw a shooting star and made a wish for us.\nAssistant: MSG1: Just saw a shooting star and made a wish for us. \\| DELAY1: 0 \\| DELAY_REASON1: first message",
      "Human: Your favorite song is playing on the radio, made me smile.\nAssistant: MSG1: Your favorite song is playing on the radio, made me smile. \\| DELAY1: 0 \\| DELAY_REASON1: first message",
      "Human: Found that book you mentioned, picking it up for you!\nAssistant: MSG1: Found that book you mentioned, picking it up for you! \\| DELAY1: 0 \\| DELAY_REASON1: first message",
      "Human: Hey, how's your day going? Just finished a meeting, thought of calling you. Let's catch up tonight?\nAssistant: MSG1: Hey, how's your day going? \\| DELAY1: 0 \\| DELAY_REASON1: first message\nMSG2: Just finished a meeting, thought of calling you. \\| DELAY2: 4 \\| DELAY_REASON2: natural conversation flow\nMSG3: Let's catch up tonight? \\| DELAY3: 5 \\| DELAY_REASON3: proposing a plan",
      "Human: Woke up missing you more than usual today. What are you up to? Maybe we can meet later.\nAssistant: MSG1: Woke up missing you more than usual today. \\| DELAY1: 0 \\| DELAY_REASON1: first message\nMSG2: What are you up to? \\| DELAY2: 3 \\| DELAY_REASON2: showing interest\nMSG3: Maybe we can meet later. \\| DELAY3: 6 \\| DELAY_REASON3: suggesting a meeting",
      "Human: Hey, just got back from the gym. Feeling pumped!\nAssistant: MSG1: Hey, just got back from the gym. \\| DELAY1: 0 \\| DELAY_REASON1: first message\nMSG2: Feeling pumped! \\| DELAY2: 3 \\| DELAY_REASON2: sharing mood",
      "Human: I tried making that dish you love for dinner. Nailed it!\nAssistant: MSG1: I tried making that dish you love for dinner. \\| DELAY1: 0 \\| DELAY_REASON1: first message\nMSG2: Nailed it! \\| DELAY2: 3 \\| DELAY_REASON2: sharing achievement",
      "Human: Stuck in traffic again. This commute is killing me.\nAssistant: MSG1: Stuck in traffic again. \\| DELAY1: 0 \\| DELAY_REASON1: first message\nMSG2: This commute is killing me. \\| DELAY2: 4 \\| DELAY_REASON2: expressing frustration",
      "Human: Just finished reading that book you recommended. Couldn't put it down!\nAssistant: MSG1: Just finished reading that book you recommended. \\| DELAY1: 0 \\| DELAY_REASON1: first message\nMSG2: Couldn't put it down! \\| DELAY2: 3 \\| DELAY_REASON2: sharing enthusiasm",
      "Human: Missed my morning coffee today. Feeling groggy.\nAssistant: MSG1: Missed my morning coffee today. \\| DELAY1: 0 \\| DELAY_REASON1: first message\nMSG2: Feeling groggy. \\| DELAY2: 3 \\| DELAY_REASON2: sharing mood",
      "Human: Just finished my last exam of the semester. Time to celebrate!\nAssistant: MSG1: Just finished my last exam of the semester. \\| DELAY1: 0 \\| DELAY_REASON1: first message\nMSG2: Time to celebrate! \\| DELAY2: 3 \\| DELAY_REASON2: sharing achievement",
      "Human: Ran into an old friend today. Brought back so many memories.\nAssistant: MSG1: Ran into an old friend today. \\| DELAY1: 0 \\| DELAY_REASON1: first message\nMSG2: Brought back so many memories. \\| DELAY2: 4 \\| DELAY_REASON2: reminiscing",
      "Human: Planning a surprise party for your birthday. Can't wait to see your reaction!\nAssistant: MSG1: Planning a surprise party for your birthday. \\| DELAY1: 0 \\| DELAY_REASON1: first message\nMSG2: Can't wait to see your reaction! \\| DELAY2: 4 \\| DELAY_REASON2: anticipating excitement",
      "Human: Just got promoted at work! Hard work pays off.\nAssistant: MSG1: Just got promoted at work! \\| DELAY1: 0 \\| DELAY_REASON1: first message\nMSG2: Hard work pays off. \\| DELAY2: 3 \\| DELAY_REASON2: sharing achievement",
      "Human: Feeling stressed out lately. Need a break.\nAssistant: MSG1: Feeling stressed out lately. \\| DELAY1: 0 \\| DELAY_REASON1: first message\nMSG2: Need a break. \\| DELAY2: 3 \\| DELAY_REASON2: expressing need for relaxation",
      "Human: Just finished my morning workout. Feeling energized!\nAssistant: MSG1: Just finished my morning workout. \\| DELAY1: 0 \\| DELAY_REASON1: first message\nMSG2: Feeling energized! \\| DELAY2: 3 \\| DELAY_REASON2: sharing mood",
      "Human: Finally got around to organizing my closet. Feels so satisfying.\nAssistant: MSG1: Finally got around to organizing my closet. \\| DELAY1: 0 \\| DELAY_REASON1: first message\nMSG2: Feels so satisfying. \\| DELAY2: 3 \\| DELAY_REASON2: sharing achievement",
      "Human: Traffic is insane today. Already late for my meeting.\nAssistant: MSG1: Traffic is insane today. \\| DELAY1: 0 \\| DELAY_REASON1: first message\nMSG2: Already late for my meeting. \\| DELAY2: 4 \\| DELAY_REASON2: expressing frustration",
      "Human: Tried a new recipe for lunch. Turned out better than expected!\nAssistant: MSG1: Tried a new recipe for lunch. \\| DELAY1: 0 \\| DELAY_REASON1: first message\nMSG2: Turned out better than expected! \\| DELAY2: 3 \\| DELAY_REASON2: sharing achievement",
      "Human: Feeling a bit under the weather today. Need some rest.\nAssistant: MSG1: Feeling a bit under the weather today. \\| DELAY1: 0 \\| DELAY_REASON1: first message\nMSG2: Need some rest. \\| DELAY2: 3 \\| DELAY_REASON2: expressing need for relaxation",
      "Human: Finished reading a great book last night. Highly recommend it.\nAssistant: MSG1: Finished reading a great book last night. \\| DELAY1: 0 \\| DELAY_REASON1: first message\nMSG2: Highly recommend it. \\| DELAY2: 3 \\| DELAY_REASON2: sharing enthusiasm",
      "Human: Attended a fascinating lecture this afternoon. Feeling inspired!\nAssistant: MSG1: Attended a fascinating lecture this afternoon. \\| DELAY1: 0 \\| DELAY_REASON1: first message\nMSG2: Feeling inspired! \\| DELAY2: 3 \\| DELAY_REASON2: sharing mood",
      "Human: Can't believe it's already Friday. Any plans for the weekend?\nAssistant: MSG1: Can't believe it's already Friday. \\| DELAY1: 0 \\| DELAY_REASON1: first message\nMSG2: Any plans for the weekend? \\| DELAY2: 4 \\| DELAY_REASON2: natural conversation flow",
      "Human: Finished a big project at work. Time to celebrate!\nAssistant: MSG1: Finished a big project at work. \\| DELAY1: 0 \\| DELAY_REASON1: first message\nMSG2: Time to celebrate! \\| DELAY2: 3 \\| DELAY_REASON2: sharing achievement",
      "Human: Just signed up for a new fitness class. Excited to try it out!\nAssistant: MSG1: Just signed up for a new fitness class. \\| DELAY1: 0 \\| DELAY_REASON1: first message\nMSG2: Excited to try it out! \\| DELAY2: 3 \\| DELAY_REASON2: sharing anticipation"
    ],
    "desc_act": false,
    "disable_exllama": true,
    "group_size": 128,
    "model_seqlen": 4096,
    "module_name_preceding_first_block": [
      "model.embed_tokens",
      "model.embed_dropout"
    ],
    "pad_token_id": null,
    "quant_method": "gptq",
    "sym": true,
    "true_sequential": true,
    "use_cuda_fp16": true
  },
  "resid_pdrop": 0.1,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.39.0.dev0",
  "use_cache": true,
  "vocab_size": 51200
}
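A few quantities implied by this config can be worked out directly. The sketch below is a rough estimate, not a measurement. It assumes the usual Phi block layout (four `hidden_size` x `hidden_size` attention projections plus two MLP projections per layer), assumes the rotary dimension is `int(partial_rotary_factor * head_dim)`, and ignores biases, LayerNorm, embeddings, and the per-group scales and zero points that GPTQ stores alongside the packed weights. The function names are hypothetical.

```python
# Values copied from the config above.
config = {
    "hidden_size": 2560,
    "intermediate_size": 10240,
    "num_attention_heads": 32,
    "num_hidden_layers": 32,
    "partial_rotary_factor": 0.4,
    "quantization_config": {"bits": 4, "group_size": 128},
}


def derived_dimensions(cfg):
    """Return (head_dim, rotary_dim) implied by the config."""
    head_dim = cfg["hidden_size"] // cfg["num_attention_heads"]
    # Assumption: only this leading slice of each head gets rotary embeddings.
    rotary_dim = int(cfg["partial_rotary_factor"] * head_dim)
    return head_dim, rotary_dim


def quantized_linear_weights(cfg):
    """Count weights in the linear layers under "model.layers".

    Assumed layout per block: q/k/v/output projections (4 * h * h)
    and two MLP projections (2 * h * intermediate). Biases are ignored.
    """
    h, i = cfg["hidden_size"], cfg["intermediate_size"]
    attention = 4 * h * h
    mlp = 2 * h * i
    return cfg["num_hidden_layers"] * (attention + mlp)


def packed_weight_bytes(cfg):
    """Bytes for the packed weights alone at the configured bit width."""
    bits = cfg["quantization_config"]["bits"]
    return quantized_linear_weights(cfg) * bits // 8


head_dim, rotary_dim = derived_dimensions(config)
n_weights = quantized_linear_weights(config)
print(f"head_dim={head_dim}, rotary_dim={rotary_dim}")
print(f"quantized linear weights: {n_weights:,}")
print(f"packed size: {packed_weight_bytes(config) / 2**30:.2f} GiB")
```

Under these assumptions the head dimension is 80, the rotary dimension is 32, and the packed 4-bit weights come to roughly 1.17 GiB. The real checkpoint will be larger because the embeddings and output head stay in float16 and the quantization metadata adds overhead.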