TinyLlama-1.1B EC2 Instance Q&A (LoRA Adapter)
A LoRA adapter that fine-tunes TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
to answer factual questions about AWS EC2 instance specifications: API names,
compute family, memory, vCPU counts, and on-demand hourly pricing.
Model Details
Model Description
This is a parameter-efficient (PEFT/LoRA) adapter, not a standalone model. It must be loaded on top of the TinyLlama-1.1B base model.
- Developed by: Independent fine-tune
- Model type: Causal language model adapter (LoRA)
- Language(s): English
- License: Apache 2.0 (inherits from TinyLlama)
- Finetuned from model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
Uses
Direct Use
Answering structured factual questions about AWS EC2 instances, e.g.:
- "What is the API name for A1 Large?"
- "How much memory does a1.2xlarge have?"
- "What is the on-demand hourly price for a1.metal?"
- "Which compute family is a1.medium part of?"
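The quick-start code in this card sends questions in a `Question: ...` / `Answer:` prompt format. As a minimal sketch, assuming that same format applies to every question type, the examples above could be turned into prompts like this. The helper name `build_prompt` is hypothetical and not part of the adapter.

```python
def build_prompt(question: str) -> str:
    """Wrap a plain question in the Question/Answer format used by the quick-start code."""
    return f"Question: {question.strip()}\nAnswer:"


# The example questions listed above.
questions = [
    "What is the API name for A1 Large?",
    "How much memory does a1.2xlarge have?",
    "What is the on-demand hourly price for a1.metal?",
    "Which compute family is a1.medium part of?",
]

prompts = [build_prompt(q) for q in questions]
print(prompts[0])
```

Each resulting string can be passed to the tokenizer in place of the hard-coded `prompt` in the quick-start code.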
Downstream Use
Suitable as a starting point for:
- Domain-specific AWS documentation assistants
- Demos of LoRA fine-tuning on tabular-derived Q&A data
- Further fine-tuning with reasoning or comparison-style questions
Out-of-Scope Use
- General-purpose conversational AI
- Reasoning, comparison, or arithmetic over EC2 instances ("which is cheaper?", "sort by memory"); the training data does not cover these patterns
- Real-time AWS pricing (training data is a static snapshot; AWS prices change)
- Any safety-critical or compliance decision based on the model's output
Bias, Risks, and Limitations
- Static data. Pricing and specs were captured from a single snapshot. Real AWS pricing varies by region, billing model, and time, so the model's figures may be outdated.
- Hallucination on unseen instances. The model only saw the EC2 rows present in training; queries about unfamiliar instance types may return plausible-looking but incorrect answers.
- Template-bound generalization. Training used 21 fixed question templates per EC2 row. Answer quality may degrade on paraphrased or out-of-template questions.
- Inherited limitations. The base TinyLlama-1.1B has limited reasoning and factual breadth compared to larger models. This adapter does not change that.
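To illustrate why generalization is template-bound, here is a rough sketch of how fixed templates expand one table row into question/answer pairs. The card does not list the 21 actual templates or the column names, so the three templates, the field names, and the row values below are all hypothetical placeholders, not the real training data.

```python
# Hypothetical templates; the card does not publish the 21 actual ones.
TEMPLATES = [
    ("What is the API name for {name}?", "{api_name}"),
    ("How much memory does {api_name} have?", "{memory}"),
    ("How many vCPUs does {api_name} have?", "{vcpus}"),
]


def row_to_examples(row: dict) -> list[dict]:
    """Expand one table row into prompt/answer pairs, one pair per template."""
    examples = []
    for question_tpl, answer_tpl in TEMPLATES:
        examples.append({
            "prompt": f"Question: {question_tpl.format(**row)}\nAnswer:",
            "answer": answer_tpl.format(**row),
        })
    return examples


# Placeholder row for illustration only; not real AWS data.
row = {"name": "Example Large", "api_name": "example.large", "memory": "8 GiB", "vcpus": "2"}

for example in row_to_examples(row):
    print(example["prompt"], example["answer"])
```

Because every training question comes from the same small set of phrasings, a model trained this way sees little variation in wording, which is consistent with the paraphrase sensitivity noted above.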
Recommendations
Verify any price- or capacity-sensitive output against AWS's official pricing page before acting on it. Treat outputs as suggestions, not authoritative answers.
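One way to apply this check in code is to parse the price out of the model's answer and compare it with a value you looked up yourself. The sketch below assumes the answer contains a dollar amount such as `$0.1234`; the card does not specify the output format, so the regular expression may need adjusting. The function names and the numbers are hypothetical.

```python
import re
from typing import Optional


def extract_price(answer: str) -> Optional[float]:
    """Return the first dollar amount found in the text, or None if there is none."""
    match = re.search(r"\$\s*(\d+(?:\.\d+)?)", answer)
    return float(match.group(1)) if match else None


def price_matches(answer: str, reference_price: float, tolerance: float = 1e-4) -> bool:
    """True only if the answer contains a price within `tolerance` of the reference."""
    price = extract_price(answer)
    return price is not None and abs(price - reference_price) <= tolerance


# Placeholder numbers for illustration only; not real AWS prices.
model_answer = "The on-demand price is $0.1234 hourly"
print(price_matches(model_answer, reference_price=0.1234))
print(price_matches(model_answer, reference_price=0.2000))
print(price_matches("I am not sure", reference_price=0.1234))
```

The reference price should come from AWS's official pricing page for the region and billing model you care about, since the adapter's snapshot covers neither dimension reliably.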
How to Get Started with the Model
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
ADAPTER = "./tinyllama-ec2-lora"  # path to the adapter; adjust to where it is stored

# Load the tokenizer saved with the adapter and the base model in half precision.
tokenizer = AutoTokenizer.from_pretrained(ADAPTER)
base = AutoModelForCausalLM.from_pretrained(
    BASE_ID,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Attach the LoRA weights to the base model and switch to inference mode.
model = PeftModel.from_pretrained(base, ADAPTER)
model.eval()

prompt = "Question: What is the API name for A1 Large?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding (do_sample=False) keeps factual answers deterministic.
out = model.generate(
    **inputs,
    max_new_tokens=120,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
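The decoded string printed above includes the prompt itself, and with `max_new_tokens=120` the model may keep generating past the first answer. As a hedged sketch, assuming any continuation starts with a new `Question:` line, the answer could be isolated as follows. The helper `extract_answer` and the sample decoded text are hypothetical; real output may differ.

```python
def extract_answer(decoded: str, prompt: str) -> str:
    """Return only the first answer from the decoded generation output."""
    # Drop the echoed prompt if the decoded text starts with it.
    text = decoded[len(prompt):] if decoded.startswith(prompt) else decoded
    # Cut at the start of any follow-on question the model may have generated.
    text = text.split("\nQuestion:", 1)[0]
    return text.strip()


prompt = "Question: What is the API name for A1 Large?\nAnswer:"
# Hypothetical decoded text, for illustration only.
decoded = prompt + " a1.large\nQuestion: How much memory does a1.large have?\nAnswer: ..."

print(extract_answer(decoded, prompt))
```

In the quick-start code, `decoded` would be the result of `tokenizer.decode(out[0], skip_special_tokens=True)`.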