Tara 1.2 Quest
Tara 1.2 Quest is a tiny experimental question-generation model. It is the successor experiment to aungkomyint/tara1.1, with the task narrowed to producing JSON question lists for a user topic, keyword, sentence, or short request.
The model is designed to answer in this shape:
{"questions":["...","...","..."]}
It is not a general chat assistant. It is a small research model for structured question generation.
Model Details
- Model name: tara1.2-quest
- Internal checkpoint: tara-1.2-quest-assistant-7c-json-base4v2
- Architecture: LlamaForCausalLM
- Approximate size: 5M parameters
- Context length: 512 tokens
- Vocabulary size: 4,108
- Weights format: safetensors
- License: Apache-2.0
Intended Use
Use this model to generate exploratory questions from a topic or short prompt.
Examples:
- cooking
- how to smile
- starting a small business
- database migration risk
- how should I think about learning biology?
The expected output is JSON with a questions array.
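A minimal sketch of checking that a reply matches the expected shape, using only the standard library. The function name `is_valid_questions_payload` is hypothetical, and treating an empty `questions` array as invalid is an assumption rather than something the model card specifies.

```python
import json


def is_valid_questions_payload(text):
    """Return True if text is JSON shaped like {"questions": ["...", ...]}."""
    try:
        data = json.loads(text)
    except (json.JSONDecodeError, TypeError):
        return False
    if not isinstance(data, dict):
        return False
    questions = data.get("questions")
    # Assumption: an empty or missing list counts as a failed generation.
    if not isinstance(questions, list) or not questions:
        return False
    # Every entry must be a non-blank string.
    return all(isinstance(q, str) and q.strip() for q in questions)
```

A caller could run this on each reply and discard or regenerate anything that returns False.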
Quick Start
import json

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "aungkomyint/tara1.2-quest"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)
model.eval()


def generate_questions(user_text):
    # The model expects this exact single-turn prompt format.
    prompt = f"User: {user_text.strip()}\nAssistant:\n"
    inputs = tokenizer(prompt, return_tensors="pt")
    # Drop token_type_ids if the tokenizer emits them; generate() does not use them here.
    inputs.pop("token_type_ids", None)
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=160,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
            repetition_penalty=1.18,
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id,
        )
    text = tokenizer.decode(output[0], skip_special_tokens=True)
    # Keep only the text after the first "Assistant:" marker.
    reply = text.split("Assistant:", 1)[-1].strip()
    return reply


reply = generate_questions("cooking")
print(reply)

try:
    data = json.loads(reply)
    print("question count:", len(data.get("questions", [])))
except json.JSONDecodeError:
    print("Model did not return valid JSON for this sample.")
Example Output
Prompt:
User: cooking
Assistant:
Sample output:
{
  "questions": [
    "What constraints around time, money, energy, or rules shape cooking?",
    "Which part of cooking is most uncertain right now?",
    "How would a beginner and an expert frame cooking differently?",
    "Why does cooking matter for the larger goal?",
    "What evidence would make your thinking about cooking more reliable?"
  ]
}
Outputs are stochastic when sampling is enabled. Validate JSON in your application.
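Because sampling can produce malformed replies, one option is to extract the first JSON object from the reply and retry a few times on failure. The sketch below assumes a caller-supplied `generate` callable (for example, the `generate_questions` function from the Quick Start); the helper names `extract_questions` and `questions_with_retry` and the default of three attempts are hypothetical choices, not part of the model release.

```python
import json


def extract_questions(reply):
    """Return the questions list from the first JSON object in reply, or None."""
    start = reply.find("{")
    if start == -1:
        return None
    try:
        # raw_decode parses one JSON value and ignores any trailing text.
        data, _ = json.JSONDecoder().raw_decode(reply[start:])
    except json.JSONDecodeError:
        return None
    questions = data.get("questions") if isinstance(data, dict) else None
    if isinstance(questions, list) and all(isinstance(q, str) for q in questions):
        return questions
    return None


def questions_with_retry(generate, user_text, max_attempts=3):
    """Call generate(user_text) until a valid questions list is parsed."""
    for _ in range(max_attempts):
        questions = extract_questions(generate(user_text))
        if questions is not None:
            return questions
    # Assumption: callers prefer an empty list over an exception.
    return []
```

Retrying only helps when sampling is enabled; with deterministic decoding every attempt returns the same text.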
Training Summary
This checkpoint comes from a small supervised fine-tuning (SFT) experiment built on the Tara 1.2 base model work. The goal was to test whether a very small causal language model could learn a narrow JSON question-generation interface.
The training data focused on:
- JSON-formatted question lists
- topic-to-question mapping
- short user prompts
- exploratory and planning-style questions
Evaluation Summary
Local benchmark results showed:
- Strong JSON-format tendency under the expected prompt format.
- Useful simple topic-to-question behavior on common topics.
- Weak semantic grounding on harder prompts.
- Repetition and template overfitting.
- Poor handling of negation and negative constraints.
This release is therefore best treated as an educational/research checkpoint, not a production assistant.
Limitations
- The model is very small, about 5M parameters.
- It can drift off topic.
- It can repeat question templates.
- It may output invalid JSON for some prompts or decoding settings.
- It does not reliably understand negation such as "do not", "not", or "avoid".
- It should not be used for legal, medical, financial, safety, or high-stakes advice.
- It is not designed for multi-turn chat.
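Repeated question templates can be partly mitigated after generation. The following is a rough sketch, not a fix for the underlying behavior: the hypothetical helper `dedupe_questions` drops questions that are identical once case, punctuation, and extra whitespace are ignored. It does not detect paraphrases or off-topic questions.

```python
import re


def dedupe_questions(questions):
    """Drop blank entries and near-verbatim repeats, preserving order."""
    seen = set()
    kept = []
    for q in questions:
        # Normalize: lowercase, strip punctuation, collapse whitespace.
        key = re.sub(r"[^a-z0-9 ]+", "", q.lower())
        key = " ".join(key.split())
        if key and key not in seen:
            seen.add(key)
            kept.append(q.strip())
    return kept
```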
Recommended Decoding
Good starting settings:
temperature=0.7
top_p=0.9
repetition_penalty=1.18
max_new_tokens=160
For stricter output, use greedy decoding, but expect more repetition.
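The two decoding modes can be captured as keyword arguments for `model.generate`. This is a small sketch with a hypothetical helper name, `decoding_kwargs`; keeping `repetition_penalty` in greedy mode is an assumption, not a documented recommendation.

```python
def decoding_kwargs(greedy=False):
    """Build generate() keyword arguments for sampling or greedy decoding."""
    common = {"max_new_tokens": 160, "repetition_penalty": 1.18}
    if greedy:
        # do_sample=False makes generation deterministic.
        return {**common, "do_sample": False}
    return {**common, "do_sample": True, "temperature": 0.7, "top_p": 0.9}
```

For example, `model.generate(**inputs, **decoding_kwargs(greedy=True))` would run greedy decoding with the same length limit as the sampling settings above.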
Relationship To Tara 1.1
tara1.2-quest is a successor experiment to aungkomyint/tara1.1. Tara 1.1 was a broader tiny assistant experiment. Tara 1.2 Quest narrows the behavior to JSON question generation.
Citation
If you use this model, cite it as:
Aung Ko Myint. Tara 1.2 Quest. 2026. Hugging Face model checkpoint.