Libraries

How to use smaram68/aws-ec2-pricing with PEFT:

from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T")
model = PeftModel.from_pretrained(base_model, "smaram68/aws-ec2-pricing")

Transformers

How to use smaram68/aws-ec2-pricing with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="smaram68/aws-ec2-pricing")

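# Load model directly (needs the peft package installed, because this repo holds only a LoRA adapter)
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("smaram68/aws-ec2-pricing", dtype="auto")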

Local Apps

vLLM

How to use smaram68/aws-ec2-pricing with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "smaram68/aws-ec2-pricing"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "smaram68/aws-ec2-pricing",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
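
The same request can be sent from Python. The sketch below uses only the standard library and assumes a server is already listening on an OpenAI-compatible `/v1/completions` endpoint. The helper names `build_completion_request` and `ask` are hypothetical, and the defaults (`max_tokens=120`, `temperature=0.0`) are assumptions chosen to mirror the greedy decoding in the getting-started example rather than settings recommended by the server.

```python
import json
import urllib.request


def build_completion_request(question, model="smaram68/aws-ec2-pricing",
                             max_tokens=120, temperature=0.0):
    """Build the JSON body for an OpenAI-compatible /v1/completions call."""
    return {
        "model": model,
        # Prompt format taken from the model card's getting-started example.
        "prompt": f"Question: {question}\nAnswer:",
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def ask(question, base_url="http://localhost:8000"):
    """POST the request and return the text of the first completion."""
    body = json.dumps(build_completion_request(question)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["choices"][0]["text"]
```

With a server running, `ask("How much memory does a1.2xlarge have?")` returns the generated text. Pass `base_url="http://localhost:30000"` to target the SGLang port used further down.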

Use Docker

docker model run hf.co/smaram68/aws-ec2-pricing

SGLang

How to use smaram68/aws-ec2-pricing with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "smaram68/aws-ec2-pricing" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "smaram68/aws-ec2-pricing",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "smaram68/aws-ec2-pricing" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "smaram68/aws-ec2-pricing",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use smaram68/aws-ec2-pricing with Docker Model Runner:
```
docker model run hf.co/smaram68/aws-ec2-pricing
```

TinyLlama-1.1B EC2 Instance Q&A (LoRA Adapter)

A LoRA adapter that fine-tunes TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T to answer factual questions about AWS EC2 instance specifications: API names, compute family, memory, vCPU counts, and on-demand hourly pricing.

Model Details

Model Description

This is a parameter-efficient (PEFT/LoRA) adapter, not a standalone model. It must be loaded on top of the TinyLlama-1.1B base model.
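
Because the adapter depends on a specific base model, it can help to confirm the pairing before loading any weights. The sketch below inspects a parsed `adapter_config.json`, the file PEFT saves alongside adapter weights, and reports mismatches. The function name `check_adapter_config` is hypothetical, and the sample dictionary is illustrative. It is not a copy of this repository's actual config.

```python
def check_adapter_config(config, expected_base):
    """Return a list of problems found in a parsed adapter_config.json."""
    problems = []
    peft_type = config.get("peft_type")
    if peft_type != "LORA":
        problems.append(f"peft_type is {peft_type!r}, expected 'LORA'")
    base = config.get("base_model_name_or_path")
    if base != expected_base:
        problems.append(f"base model is {base!r}, expected {expected_base!r}")
    return problems


# Illustrative config only; real files contain more fields (rank, alpha, ...).
sample_config = {
    "peft_type": "LORA",
    "base_model_name_or_path": "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T",
}
issues = check_adapter_config(
    sample_config, "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
)
print(issues or "adapter and base model match")
```

For a real check, load the file with `json.load` from the adapter directory and pass the resulting dictionary in place of `sample_config`.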

Developed by: Independent fine-tune
Model type: Causal language model adapter (LoRA)
Language(s): English
License: Apache 2.0 (inherits from TinyLlama)
Finetuned from model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T

Uses

Direct Use

Answering structured factual questions about AWS EC2 instances, e.g.:

"What is the API name for A1 Large?"
"How much memory does a1.2xlarge have?"
"What is the on-demand hourly price for a1.metal?"
"Which compute family is a1.medium part of?"

Downstream Use

Suitable as a starting point for:

Domain-specific AWS documentation assistants
Demos of LoRA fine-tuning on tabular-derived Q&A data
Further fine-tuning with reasoning or comparison-style questions

Out-of-Scope Use

General-purpose conversational AI
Reasoning, comparison, or arithmetic over EC2 instances ("which is cheaper?", "sort by memory"); the training data does not cover these patterns
Real-time AWS pricing (training data is a static snapshot; AWS prices change)
Any safety-critical or compliance decision based on the model's output

Bias, Risks, and Limitations

Static data. Pricing and specs were captured from a single snapshot. Real AWS pricing varies by region, billing model, and time, so the figures the model learned may be outdated.
Hallucination on unseen instances. The model only saw the EC2 rows present in training; queries about instance types absent from that data may return plausible-looking but incorrect answers.
Template-bound generalization. Training used 21 fixed question templates per EC2 row. Paraphrased or out-of-template questions may produce lower-quality answers.
Inherited limitations. The base TinyLlama-1.1B has limited reasoning and factual breadth compared to larger models. This adapter does not change that.
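
To make the template-bound limitation concrete, the sketch below shows how fixed templates can turn one table row into Q&A training strings. It is an illustration under assumptions: the card does not list the 21 templates, so the four here are hypothetical ones modeled on the example questions under Direct Use, and the row values are made-up placeholders rather than real AWS data.

```python
# Hypothetical templates: the card states 21 were used but does not list them.
TEMPLATES = [
    ("What is the API name for {name}?", "{api_name}"),
    ("How much memory does {api_name} have?", "{memory}"),
    ("What is the on-demand hourly price for {api_name}?", "{price}"),
    ("Which compute family is {api_name} part of?", "{family}"),
]


def row_to_examples(row):
    """Expand one table row into 'Question: ...\\nAnswer: ...' strings."""
    return [
        f"Question: {question.format(**row)}\nAnswer: {answer.format(**row)}"
        for question, answer in TEMPLATES
    ]


# Placeholder row with invented values, not a real EC2 instance.
row = {
    "name": "Demo Large",
    "api_name": "demo.large",
    "memory": "4 GiB",
    "price": "$0.0500 hourly",
    "family": "Demo family",
}

for example in row_to_examples(row):
    print(example, end="\n\n")
```

Every training string produced this way shares the wording of its template, which is why a question phrased differently from all templates falls outside what the model has seen.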

Recommendations

Verify any price- or capacity-sensitive output against AWS's official pricing page before acting on it. Treat outputs as suggestions, not authoritative answers.
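
One way to apply this recommendation in code is to parse the price from a generated answer and compare it with a figure looked up separately from AWS. The sketch below assumes the answer contains a dollar amount such as `$0.1234`; the function names and the tolerance are hypothetical, and the numbers in the usage lines are invented for illustration.

```python
import re


def extract_hourly_price(answer):
    """Return the first dollar amount in the answer as a float, or None."""
    match = re.search(r"\$\s*(\d+(?:\.\d+)?)", answer)
    return float(match.group(1)) if match else None


def price_matches(answer, reference_price, tolerance=1e-4):
    """True if the answer's price is within tolerance of the reference."""
    price = extract_hourly_price(answer)
    return price is not None and abs(price - reference_price) <= tolerance


# Invented values: the reference would come from AWS's official pricing page.
print(price_matches("$0.1234 hourly", reference_price=0.1234))
print(price_matches("$0.1234 hourly", reference_price=0.2000))
```

An answer with no parseable price is treated as a mismatch, so it is flagged for manual review instead of passing silently.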

How to Get Started with the Model

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

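BASE_ID = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
# Path to the adapter. This example assumes a local directory that also
# contains the tokenizer files loaded below.
ADAPTER = "./tinyllama-ec2-lora"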

tokenizer = AutoTokenizer.from_pretrained(ADAPTER)
base = AutoModelForCausalLM.from_pretrained(
    BASE_ID,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, ADAPTER)
model.eval()

prompt = "Question: What is the API name for A1 Large?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(
    **inputs,
    max_new_tokens=120,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
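
The decoded string above includes the prompt, and with `max_new_tokens=120` the model may keep generating past the first answer. The sketch below adds two small helpers, assuming the `Question: ...\nAnswer:` format from the example; `build_prompt` and `extract_answer` are hypothetical names, and cutting at the next `Question:` marker is an assumption about what a continuation might look like.

```python
def build_prompt(question):
    """Format a question in the prompt style used in the example above."""
    return f"Question: {question.strip()}\nAnswer:"


def extract_answer(decoded):
    """Return the text after the first 'Answer:' and before any later 'Question:'."""
    _, _, tail = decoded.partition("Answer:")
    answer, _, _ = tail.partition("Question:")
    return answer.strip()


# Stand-in for tokenizer.decode(...) output; the answer text is a placeholder.
decoded = build_prompt("What is the API name for A1 Large?") + " <model answer>\nQuestion: ..."
print(extract_answer(decoded))
```

In the example above, `extract_answer(tokenizer.decode(out[0], skip_special_tokens=True))` would replace the final `print` argument.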
