Instructions for using aungkomyint/tara1.2-quest with libraries and local apps.

Libraries

How to use aungkomyint/tara1.2-quest with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="aungkomyint/tara1.2-quest")

# Load model directly (the checkpoint's architecture is LlamaForCausalLM)
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("aungkomyint/tara1.2-quest")
model = AutoModelForCausalLM.from_pretrained("aungkomyint/tara1.2-quest")

Local Apps

vLLM

How to use aungkomyint/tara1.2-quest with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "aungkomyint/tara1.2-quest"
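# Call the server using curl (OpenAI-compatible API).
# The prompt uses the "User: ...\nAssistant:\n" format from Quick Start, and
# max_tokens stays well below the model's 512-token context length:
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "aungkomyint/tara1.2-quest",
		"prompt": "User: cooking\nAssistant:\n",
		"max_tokens": 160,
		"temperature": 0.7
	}'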
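
The same request can be sketched in Python using only the standard library. This is a minimal sketch, not part of the model card: the helper names `build_completion_request` and `complete` are hypothetical, and it assumes a server is already listening on the URL from the curl example. The prompt format and decoding values are borrowed from the Quick Start and Recommended Decoding sections.

```python
import json
import urllib.request

MODEL_ID = "aungkomyint/tara1.2-quest"


def build_completion_request(user_text, base_url="http://localhost:8000",
                             max_tokens=160, temperature=0.7):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        # Single-turn prompt format taken from the Quick Start section.
        "prompt": f"User: {user_text.strip()}\nAssistant:\n",
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(user_text, **kwargs):
    """Send the request and return the text of the first completion."""
    request = build_completion_request(user_text, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With a server running, `complete("cooking")` would return the raw model text. For the SGLang server described in this document, pass `base_url="http://localhost:30000"`.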

SGLang

How to use aungkomyint/tara1.2-quest with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "aungkomyint/tara1.2-quest" \
    --host 0.0.0.0 \
    --port 30000
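# Call the server using curl (OpenAI-compatible API).
# The prompt uses the "User: ...\nAssistant:\n" format from Quick Start, and
# max_tokens stays well below the model's 512-token context length:
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "aungkomyint/tara1.2-quest",
		"prompt": "User: cooking\nAssistant:\n",
		"max_tokens": 160,
		"temperature": 0.7
	}'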

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "aungkomyint/tara1.2-quest" \
        --host 0.0.0.0 \
        --port 30000
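# Call the server using curl (OpenAI-compatible API).
# The prompt uses the "User: ...\nAssistant:\n" format from Quick Start, and
# max_tokens stays well below the model's 512-token context length:
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "aungkomyint/tara1.2-quest",
		"prompt": "User: cooking\nAssistant:\n",
		"max_tokens": 160,
		"temperature": 0.7
	}'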

Docker Model Runner
How to use aungkomyint/tara1.2-quest with Docker Model Runner:
```
docker model run hf.co/aungkomyint/tara1.2-quest
```

Tara 1.2 Quest

Tara 1.2 Quest is a tiny experimental question-generation model. It is the successor experiment to aungkomyint/tara1.1, with the task narrowed to producing JSON question lists for a user topic, keyword, sentence, or short request.

The model is designed to answer in this shape:

{"questions":["...","...","..."]}

It is not a general chat assistant. It is a small research model for structured question generation.
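
A strict check for this shape could look like the following. This is a minimal sketch under assumptions: `parse_questions` is a hypothetical helper, not something shipped with the model, and it treats anything other than a JSON object with a `questions` list of strings as an error.

```python
import json


def parse_questions(reply):
    """Return the list of question strings, or raise ValueError on a bad shape."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc

    # Assumption: the only accepted top-level value is an object with "questions".
    if not isinstance(data, dict) or "questions" not in data:
        raise ValueError('expected a JSON object with a "questions" key')

    questions = data["questions"]
    if not isinstance(questions, list):
        raise ValueError('"questions" must be a list')
    if not all(isinstance(q, str) for q in questions):
        raise ValueError('every item in "questions" must be a string')
    return questions
```

Raising `ValueError` rather than returning an empty list keeps a malformed reply distinguishable from a valid reply that contains no questions.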

Model Details

Model name: tara1.2-quest
Internal checkpoint: tara-1.2-quest-assistant-7c-json-base4v2
Architecture: LlamaForCausalLM
Approximate size: 5M parameters
Context length: 512 tokens
Vocabulary size: 4,108
Weights format: safetensors
License: Apache-2.0
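
The 512-token context length limits how long a prompt can be once room is reserved for generation. The sketch below works through that arithmetic. It assumes the 512 tokens are shared between prompt and generated tokens (typical for causal language models, but not stated in this card), and the helper names are hypothetical.

```python
CONTEXT_LENGTH = 512   # from Model Details
MAX_NEW_TOKENS = 160   # from Recommended Decoding


def prompt_token_budget(context_length=CONTEXT_LENGTH, max_new_tokens=MAX_NEW_TOKENS):
    """Number of prompt tokens that still leaves room for generation."""
    if max_new_tokens >= context_length:
        raise ValueError("max_new_tokens must be smaller than the context length")
    return context_length - max_new_tokens


def truncate_prompt_ids(token_ids, budget):
    """Keep only the last `budget` token ids.

    Assumption: truncating from the left is preferable, because the trailing
    "Assistant:" cue at the end of the prompt must survive.
    """
    if budget <= 0:
        return []
    return list(token_ids[-budget:])


print(prompt_token_budget())  # 512 - 160 = 352
```

With the recommended settings, prompts longer than 352 tokens would need to be shortened before generation.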

Intended Use

Use this model to generate exploratory questions from a topic or short prompt.

Examples:

cooking
how to smile
starting a small business
database migration risk
how should I think about learning biology?

The expected output is JSON with a questions array.

Quick Start

import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "aungkomyint/tara1.2-quest"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)
model.eval()

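def generate_questions(user_text):
    """Return the model's raw reply (ideally a JSON string) for a topic or short prompt."""
    # Single-turn prompt format used throughout this model card.
    prompt = f"User: {user_text.strip()}\nAssistant:\n"
    inputs = tokenizer(prompt, return_tensors="pt")
    # Drop token_type_ids if the tokenizer emits them; generate() does not need them here.
    inputs.pop("token_type_ids", None)

    # Fall back to the EOS token for padding if the tokenizer defines no pad token.
    pad_token_id = tokenizer.pad_token_id
    if pad_token_id is None:
        pad_token_id = tokenizer.eos_token_id

    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=160,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
            repetition_penalty=1.18,
            pad_token_id=pad_token_id,
            eos_token_id=tokenizer.eos_token_id,
        )

    # Decode only the newly generated tokens, so an "Assistant:" inside the
    # user text cannot be mistaken for the start of the reply.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()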

reply = generate_questions("cooking")
print(reply)

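try:
    data = json.loads(reply)
except json.JSONDecodeError:
    print("Model did not return valid JSON for this sample.")
else:
    # Guard against valid JSON that is not an object, such as a bare list.
    questions = data.get("questions", []) if isinstance(data, dict) else []
    print("question count:", len(questions))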

Example Output

Prompt:

User: cooking
Assistant:

Sample output:

{
  "questions": [
    "What constraints around time, money, energy, or rules shape cooking?",
    "Which part of cooking is most uncertain right now?",
    "How would a beginner and an expert frame cooking differently?",
    "Why does cooking matter for the larger goal?",
    "What evidence would make your thinking about cooking more reliable?"
  ]
}

Outputs are stochastic when sampling is enabled. Validate JSON in your application.
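
Sampled replies can carry stray text before or after the JSON object. One way to recover the object is sketched below with the standard library's `json.JSONDecoder.raw_decode`. The helper name `extract_first_json_object` is hypothetical, and scanning for the first decodable `{...}` is an assumption about how malformed replies tend to look, not a documented behavior of the model.

```python
import json


def extract_first_json_object(text):
    """Return the first decodable JSON object found in `text`, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever follows it.
            value, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            value = None
        if isinstance(value, dict):
            return value
        # Try the next opening brace, if any.
        start = text.find("{", start + 1)
    return None


noisy = 'Sure. {"questions": ["What tools do you need?"]} User: next'
print(extract_first_json_object(noisy))
```

A result of `None` means no complete object was found, for example when generation stopped at `max_new_tokens` before the closing brace.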

Training Summary

This checkpoint was trained as a small supervised fine-tuning (SFT) experiment building on the Tara 1.2 base-model work. The goal was to test whether a very small causal language model could learn a narrow JSON question-generation interface.

The training data focused on:

JSON-formatted question lists
topic-to-question mapping
short user prompts
exploratory and planning-style questions

Evaluation Summary

Local benchmark results showed:

Strong JSON-format tendency under the expected prompt format.
Useful simple topic-to-question behavior on common topics.
Weak semantic grounding on harder prompts.
Repetition and template overfitting.
Poor handling of negation and negative constraints.

This release is therefore best treated as an educational/research checkpoint, not a production assistant.
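
A claim such as "strong JSON-format tendency" can be checked on your own prompts by measuring how often replies match the expected shape. The sketch below is not the author's benchmark. The function names and the pass criterion (a JSON object with a non-empty `questions` list of strings) are assumptions chosen for illustration.

```python
import json


def is_valid_question_json(reply):
    """True if `reply` parses to {"questions": [...]} with at least one string."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    questions = data.get("questions")
    return (
        isinstance(questions, list)
        and len(questions) > 0
        and all(isinstance(q, str) for q in questions)
    )


def json_validity_rate(replies):
    """Fraction of replies that pass is_valid_question_json (0.0 if there are none)."""
    replies = list(replies)
    if not replies:
        return 0.0
    return sum(is_valid_question_json(r) for r in replies) / len(replies)
```

Feeding it the replies collected for a fixed set of prompts gives a single number that can be compared across decoding settings.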

Limitations

The model is very small, about 5M parameters.
It can drift off topic.
It can repeat question templates.
It may output invalid JSON for some prompts or decoding settings.
It does not reliably understand negation such as "do not", "not", or "avoid".
It should not be used for legal, medical, financial, safety, or high-stakes advice.
It is not designed for multi-turn chat.
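
Two of these limitations, repeated questions and occasional invalid JSON, can be softened in application code. The following is a minimal sketch under assumptions: `generate` stands for any callable that maps user text to a raw reply string (for example a sampling-based wrapper around the model), and both helper names are hypothetical.

```python
import json


def dedupe_questions(questions):
    """Drop repeats, comparing case-insensitively and ignoring extra whitespace."""
    seen = set()
    unique = []
    for question in questions:
        key = " ".join(question.lower().split())
        if key and key not in seen:
            seen.add(key)
            unique.append(question.strip())
    return unique


def generate_with_retry(generate, user_text, max_attempts=3):
    """Call `generate(user_text)` until a reply yields at least one question.

    Retrying only helps when `generate` samples; greedy decoding would
    return the same reply on every attempt.
    """
    for _ in range(max_attempts):
        reply = generate(user_text)
        try:
            data = json.loads(reply)
        except json.JSONDecodeError:
            continue  # invalid JSON: try again
        if not isinstance(data, dict):
            continue
        questions = data.get("questions")
        if isinstance(questions, list):
            cleaned = dedupe_questions([q for q in questions if isinstance(q, str)])
            if cleaned:
                return cleaned
    return []
```

An empty list signals that every attempt failed, which the caller can surface as an error or a fallback message.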

Recommended Decoding

Good starting settings:

temperature=0.7
top_p=0.9
repetition_penalty=1.18
max_new_tokens=160

For more deterministic output, use greedy decoding (do_sample=False), but expect more repetition.
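
The two modes can be kept side by side as keyword arguments for `model.generate`. This is a sketch: `decoding_kwargs` is a hypothetical helper, and keeping `repetition_penalty` in the greedy case is an assumption made here to counter the extra repetition, not a setting taken from the card.

```python
def decoding_kwargs(strict=False):
    """Return generate() keyword arguments for sampling or greedy decoding."""
    common = {
        "max_new_tokens": 160,
        # Assumption: the penalty is kept for greedy decoding as well.
        "repetition_penalty": 1.18,
    }
    if strict:
        # Greedy decoding: always pick the most likely next token.
        return {**common, "do_sample": False}
    # Sampling settings listed as good starting values above.
    return {**common, "do_sample": True, "temperature": 0.7, "top_p": 0.9}
```

Usage would look like `model.generate(**inputs, **decoding_kwargs(strict=True))`, with `model` and `inputs` prepared as in Quick Start.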

Relationship To Tara 1.1

tara1.2-quest is a successor experiment to aungkomyint/tara1.1. Tara 1.1 was a broader tiny assistant experiment. Tara 1.2 Quest narrows the behavior to JSON question generation.

Citation

If you use this model, cite it as:

Aung Ko Myint. Tara 1.2 Quest. 2026. Hugging Face model checkpoint.

Safetensors

Model size: 4.99M params
Tensor type: F32

Model tree for aungkomyint/tara1.2-quest

Base model: aungkomyint/tara10m-sft-v1-2k
Finetuned: aungkomyint/tara1.1
Finetuned: this model