Instructions to use neuracoder/neuracoder-tiny-1.3b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use neuracoder/neuracoder-tiny-1.3b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="neuracoder/neuracoder-tiny-1.3b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("neuracoder/neuracoder-tiny-1.3b")
model = AutoModelForCausalLM.from_pretrained("neuracoder/neuracoder-tiny-1.3b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))


vLLM

How to use neuracoder/neuracoder-tiny-1.3b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "neuracoder/neuracoder-tiny-1.3b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "neuracoder/neuracoder-tiny-1.3b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
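
The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server started above is listening on localhost:8000. The helper names `build_chat_request` and `ask` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "neuracoder/neuracoder-tiny-1.3b"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build (but do not send) a POST request for the OpenAI-compatible chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the reply text. Requires a running server."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example call (only works while the server is running):
# print(ask("What is the capital of France?"))
```

Passing `base_url="http://localhost:30000"` should target an SGLang server started on port 30000 in the same way, since both expose the same endpoint path.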

Use Docker

docker model run hf.co/neuracoder/neuracoder-tiny-1.3b

SGLang

How to use neuracoder/neuracoder-tiny-1.3b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "neuracoder/neuracoder-tiny-1.3b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "neuracoder/neuracoder-tiny-1.3b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "neuracoder/neuracoder-tiny-1.3b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "neuracoder/neuracoder-tiny-1.3b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use neuracoder/neuracoder-tiny-1.3b with Docker Model Runner:
```
docker model run hf.co/neuracoder/neuracoder-tiny-1.3b
```

🧠 Neuracoder-Tiny-1.3B

Neuracoder-Tiny-1.3B is an open-source, ultra‑lightweight code generation model developed by the Neuracoder team (a leading Iranian AI company). With an optimized architecture and 1.3 billion parameters, it is designed for fast, low‑cost, and efficient coding – helping programmers with daily tasks such as writing functions, solving small algorithmic problems, generating boilerplate code, documenting, and even learning programming concepts.

Unlike giant models (7B+ parameters) that require professional GPUs and high memory, Neuracoder-Tiny runs easily on personal laptops, CPU‑only systems, single‑board computers (e.g., Raspberry Pi 4), and even smartphones (via conversion to ONNX or TensorFlow Lite). Although inspired by modern code generation architectures, it is completely independent, local, and optimized for real‑world developer needs.

✨ Key Features (Detailed)

Ultra‑lightweight – Only 1.3 billion parameters, compressed file size ~1.1 GB (FP16 ~2.6 GB). Suitable for CPUs and GPUs with 4 GB or less memory.
High speed for short code – About 64–72 tokens/sec on a T4 GPU (FP16 to INT8) and 8–12 tokens/sec on CPU (Intel i7-12700K, FP32 to INT8), as measured in the hardware table in this card. Responsive for small to medium prompts (20–100 line functions).
Supports 12 programming languages – Python, JavaScript, TypeScript, Java, C, C++, C#, Go, Rust, PHP, Ruby, Shell.
Instruction‑tuned – Tell it in natural language exactly what code to write, e.g., "Write a Python function that downloads an image from a URL and saves it to disk."
Half‑precision weights (FP16) – Reduces weight memory by about 50% relative to FP32 without noticeable accuracy loss. INT8 quantization is also supported (about 75% less weight memory than FP32, with a stated accuracy drop of 25%).
Iranian‑made, fully open‑source – Built by Neuracoder to provide easy, free access to generative AI for code, with no external API dependencies.
No internet required – After downloading the model, you can use it completely offline anywhere.
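
The memory figures above follow from simple arithmetic: parameter count times bytes per parameter. A small sketch of that calculation is given here; it counts weights only, so runtime usage (activations, KV cache, framework overhead) will be somewhat higher. The function name is illustrative.

```python
# Bytes needed to store one parameter in each weight format.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params, dtype):
    """Approximate size of the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


NUM_PARAMS = 1.3e9  # parameter count stated in this card

for dtype in ("fp32", "fp16", "int8"):
    print(f"{dtype}: {weight_memory_gb(NUM_PARAMS, dtype):.1f} GB")
```

For 1.3 billion parameters this gives 5.2 GB in FP32, 2.6 GB in FP16, and 1.3 GB in INT8, which is where the 50% and 75% reductions relative to FP32 come from.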

🎯 Suitable Use Cases (Real Scenarios)

Writing small, specific functions – e.g., factorial, string reversal, email validation, date conversion, simple text analysis.
Solving programming exercises – Beginner to intermediate questions from platforms like LeetCode (Easy/Medium), HackerRank, Codeforces.
Generating repetitive code snippets – Loops, conditionals, file read/write, JSON handling, simple HTTP requests.
Short code explanation (comment generation) – Give it code and ask "Explain this code line by line."
Code conversion – e.g., JavaScript to Python or Java to C++.
Unit test generation – For a given function, it produces basic test cases.
Learning programming – Use it as a teaching assistant to explain fundamental concepts.
Integration into IDEs, plugins, and coding assistants – Thanks to its small size, it can be embedded in VS Code, Jupyter Lab, or even simple web apps.

❌ Not suitable for:

Very large projects (code longer than 300 lines or complex dependencies)
Reverse engineering or generating a full software system (e.g., a complete application)
System‑level coding (kernel module, device driver, bootloader)
Answering non‑code questions (history, advanced math, medicine, philosophy)
Code that relies on very new libraries (e.g., PyTorch 2.4 or TensorFlow 2.16) – may produce outdated syntax.

📊 Benchmarks & Comprehensive Evaluation

We evaluated Neuracoder-Tiny-1.3B on three standard datasets:

HumanEval (OpenAI) – 164 Python programming problems, primary metric pass@1.
MBPP (Mostly Basic Python Problems) – 974 simple to medium problems, sanitized version.
MultiPL-E – Problems similar to HumanEval for 8 other languages (Java, JavaScript, C++, C#, Go, Rust, Ruby, PHP).
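
This card does not say how pass@k was computed. A common choice, assumed here, is the unbiased estimator introduced with HumanEval: generate n samples per problem, count the c samples that pass the tests, and estimate the chance that at least one of k randomly drawn samples passes. A minimal sketch:

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k for one problem.

    n: samples generated, c: samples that passed the tests, k: draw size.
    Returns 1 - C(n - c, k) / C(n, k).
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every draw of k contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(per_problem_counts, k):
    """Average pass@k over problems, given (n, c) pairs."""
    scores = [pass_at_k(n, c, k) for n, c in per_problem_counts]
    return sum(scores) / len(scores)
```

With k=1 the estimator reduces to c/n, the fraction of samples that pass.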

Results (no extra fine‑tuning, generation with temperature=0.2)

| Dataset | Metric | Value |
|---|---|---|
| HumanEval | pass@1 | 34.8% |
| HumanEval | pass@10 | 56.3% |
| MBPP (valid) | pass@1 | 41.2% |
| MBPP (test) | pass@1 | 38.7% |
| MultiPL-E (Python) | pass@1 | 32.1% (for compatibility) |
| MultiPL-E (JavaScript) | pass@1 | 26.4% |
| MultiPL-E (Java) | pass@1 | 24.9% |
| MultiPL-E (C++) | pass@1 | 22.3% |
| MultiPL-E (Go) | pass@1 | 24.1% |

Interpretation: On HumanEval and MBPP, the model performs at or above the level of similarly sized models such as Phi-1.5 (1.3B) and StarCoder-1B. In the comparison table, it is faster than Phi-1.5 at the same FP16 memory footprint, while StarCoder-1B is somewhat faster and smaller. For non‑Python languages, pass@1 ranges from 22.3% to 26.4%, which is enough to get simple code right.

📈 Comparison with Popular Similar‑Sized Models

| Model | Parameters | HumanEval pass@1 | VRAM (FP16) | Speed on T4 GPU (tokens/sec) | License |
|---|---|---|---|---|---|
| Neuracoder-Tiny-1.3B | 1.3B | 34.8% | ~2.6 GB | 64 | Apache 2.0 |
| Phi-1.5 (Microsoft) | 1.3B | 31.2% | ~2.6 GB | 58 | MIT |
| StarCoder-1B (BigCode) | 1.0B | 23.7% | ~2.0 GB | 70 | Apache 2.0 |
| CodeGen-350M (Salesforce) | 0.35B | 12.5% | ~0.8 GB | 95 | Apache 2.0 |
| CodeGen-2B (Salesforce) | 2.0B | 29.3% | ~4.0 GB | 40 | Apache 2.0 |
| DeepSeek-Coder-1.3B | 1.3B | 32.5% | ~2.7 GB | 55 | MIT |

Key comparison notes:

On HumanEval pass@1, Neuracoder-Tiny (34.8%) scores above Phi-1.5 (31.2%) and StarCoder-1B (23.7%), and slightly above DeepSeek-Coder-1.3B (32.5%).

In speed on a T4, Neuracoder-Tiny (64 tokens/sec) is close to StarCoder-1B (70) and faster than Phi-1.5 (58). CodeGen-350M, the lightest model listed, is the fastest (95).

The only model in this list developed by an Iranian company with full internal documentation.

Apache 2.0 is a permissive license that allows commercial use and includes an explicit patent grant.

🧪 Technical Details of Training Process

Neuracoder-Tiny-1.3B is built on an architecture similar to LLaMA (with some custom optimizations). Training stages:

1. Pre‑training

Data: Mixture of The Stack (deduplicated), CodeSearchNet, and part of Common Crawl (filtered for code).
Tokens: 35 billion tokens.
Training time: Approximately 12 days on 4 NVIDIA A100 (80GB) using PyTorch and DeepSpeed.
Hyperparameters:
- Optimizer: AdamW (lr=3e-4, beta1=0.9, beta2=0.95)
- Scheduler: cosine decay with warmup (warmup steps=2000)
- Batch size: 256 (total across 4 GPUs)
- Sequence length: 2048 tokens
- Weight decay: 0.1
- Gradient clipping: 1.0
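
The learning-rate schedule above can be sketched in a few lines. Two values are assumptions, not stated in this card: that the batch size of 256 counts sequences of 2048 tokens (which gives the total step count for a single pass over 35 billion tokens), and that the rate decays to zero at the end.

```python
import math

PEAK_LR = 3e-4
WARMUP_STEPS = 2000

# Assumption: batch size 256 means 256 sequences of 2048 tokens per step.
TOKENS_PER_STEP = 256 * 2048
# Assumption: one pass over the 35 billion training tokens.
TOTAL_STEPS = 35_000_000_000 // TOKENS_PER_STEP
# Assumption: the final learning rate is not given in the card.
MIN_LR = 0.0


def learning_rate(step):
    """Linear warmup to PEAK_LR, then cosine decay to MIN_LR."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


for step in (0, 1000, 2000, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, learning_rate(step))
```

Under these assumptions the run is 66,757 optimizer steps long, so warmup covers roughly the first 3% of training.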

2. Instruction Fine‑tuning

Data: 250,000 (instruction, correct response) pairs, including:
- 100,000 samples from Neuracoder’s internal collection (based on real programming problems)
- 100,000 samples from public datasets (e.g., GPTeacher, CodeAlpaca)
- 50,000 samples from translation and rewriting of HumanEval/MBPP data
Hyperparameters:
- Learning rate: 1e-5
- Epochs: 3
- Batch size: 64
- LoRA (rank=32, alpha=64) to reduce memory usage (~30% saving)
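
To give a feel for what rank=32 and alpha=64 mean, the sketch below computes the LoRA scaling factor and the number of trainable parameters that one adapted weight matrix adds. The hidden size of 2048 is a hypothetical value chosen for illustration; the card does not state the model's dimensions or which layers were adapted.

```python
LORA_RANK = 32
LORA_ALPHA = 64


def lora_scaling(rank=LORA_RANK, alpha=LORA_ALPHA):
    """Factor applied to the low-rank update: W + (alpha / rank) * B @ A."""
    return alpha / rank


def lora_trainable_params(d_in, d_out, rank=LORA_RANK):
    """Parameters in A (rank x d_in) plus B (d_out x rank)."""
    return rank * (d_in + d_out)


HIDDEN_SIZE = 2048  # hypothetical, for illustration only

full_matrix = HIDDEN_SIZE * HIDDEN_SIZE
adapter = lora_trainable_params(HIDDEN_SIZE, HIDDEN_SIZE)
print("scaling:", lora_scaling())
print("full matrix parameters:", full_matrix)
print("LoRA adapter parameters:", adapter)
```

For a square 2048 x 2048 matrix the adapter trains 131,072 parameters instead of 4,194,304, about 3% of the full matrix, which is why optimizer state and gradient memory shrink.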

3. Validation & Overfitting Prevention

Every 1000 steps, the model was evaluated on a separate validation set (20% of data).
The best checkpoint was chosen based on highest accuracy on HumanEval (validation).
Dropout=0.1 applied to all layers.

⚡ Inference Speed & Hardware Requirements

| Hardware | Weight format | Avg tokens/sec (generating 128 tokens) | Memory usage |
|---|---|---|---|
| NVIDIA T4 (16GB) | FP16 | 64 tok/s | 2.8 GB |
| NVIDIA T4 (16GB) | INT8 (quantized) | 72 tok/s | 1.6 GB |
| NVIDIA GTX 1060 (6GB) | FP16 | 38 tok/s | 2.8 GB |
| NVIDIA GTX 1060 (6GB) | INT8 | 45 tok/s | 1.6 GB |
| CPU (Intel i7-12700K) | FP32 | 8 tok/s | 5.2 GB |
| CPU (Intel i7-12700K) | INT8 | 12 tok/s | 2.1 GB |
| Raspberry Pi 4 (4GB) | INT8 (ONNX) | 3 tok/s | 1.8 GB |

Recommendation: For daily use on a laptop without GPU, use the INT8 version. For highest quality, FP16 on GPU is best.
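
To check these numbers on your own hardware, a small timing helper is enough. This is a sketch, not the script used to produce the table: it assumes you supply a `generate_fn` callable (a hypothetical name) that generates up to the requested number of new tokens and returns how many were actually produced.

```python
import time


def tokens_per_second(generate_fn, new_tokens=128, runs=3):
    """Average generation throughput over several timed runs.

    generate_fn(new_tokens) must run one generation and return the
    number of tokens it produced.
    """
    generate_fn(new_tokens)  # warm-up run, not timed
    total_tokens = 0
    start = time.perf_counter()
    for _ in range(runs):
        total_tokens += generate_fn(new_tokens)
    elapsed = time.perf_counter() - start
    return total_tokens / elapsed


# Possible generate_fn for a loaded Transformers model and tokenized inputs:
# def generate_fn(n):
#     out = model.generate(**inputs, max_new_tokens=n, do_sample=False)
#     return out.shape[-1] - inputs["input_ids"].shape[-1]
```

Counting the tokens actually returned matters because generation can stop early at an end-of-sequence token.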

🚀 Step‑by‑Step Usage Guide (with more examples)

Installation

pip install transformers torch accelerate sentencepiece

Example 1: Prime number function

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "neuracoder/neuracoder-tiny-1.3b"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto"
)

prompt = "Write a Python function named 'is_prime' that takes an integer n and returns True if n is prime, otherwise False. Include docstring and type hints."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.2,
    top_p=0.95,
    do_sample=True,
    repetition_penalty=1.05
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Example 2: Explain existing code

# Reuses `tokenizer` and `model` from Example 1.
code = """
def factorial(n):
    if n <= 1:
        return 1
    return n * factorial(n-1)
"""
prompt = f"Explain the following Python code line by line, describing what each part does:\n\n{code}"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Example 3: Convert JavaScript to Python

# Reuses `tokenizer` and `model` from Example 1.
js_code = "function sumArray(arr) { return arr.reduce((a,b) => a+b, 0); }"
prompt = f"Convert this JavaScript code to Python equivalent:\n{js_code}"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=150)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Example 4: Generate unit tests

# Reuses `tokenizer` and `model` from Example 1.
prompt = "Write a Python unittest for a function 'reverse_string(s)' that reverses a string. Include test cases for empty string, single character, and palindrome."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=300)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
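
The examples above print the whole decoded sequence, which includes the prompt and any surrounding explanation. A possible post-processing step is sketched below, assuming the model wraps code in Markdown fences (this card does not specify the output format). The helper names are illustrative.

```python
import ast
import re

FENCE_MARK = "`" * 3  # a Markdown code fence
FENCED_BLOCK = re.compile(
    FENCE_MARK + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE_MARK,
    re.DOTALL,
)


def extract_python(text):
    """Return the first fenced code block, or the whole text if none is found."""
    match = FENCED_BLOCK.search(text)
    return (match.group(1) if match else text).strip()


def is_valid_python(source):
    """True if the source parses as Python. This checks syntax only."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


reply = "Here you go:\n" + FENCE_MARK + "python\ndef add(a, b):\n    return a + b\n" + FENCE_MARK
snippet = extract_python(reply)
print(snippet)
print(is_valid_python(snippet))
```

A syntax check catches truncated or malformed output early, but it says nothing about whether the code is correct or safe to run.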

⚠️ Limitations & Known Weaknesses

Limited context length (2048 tokens) – Cannot see a file with thousands of lines. For large projects, use chunking.
English‑only – Persian prompts are not supported and may produce irrelevant output. (Bilingual model is under development.)
Prompt sensitivity – Slight changes in wording can give different answers. Use standard formats (e.g., "Write a function that...").
No security guarantee – Generated code may contain vulnerabilities (e.g., SQL injection or use of eval). Always review.
Poor performance on less common languages – For languages like Kotlin, Swift, R, output quality is low.
Not trained on very recent data – Model trained on data up to mid‑2024, so it is unaware of new APIs (e.g., recent TensorFlow changes).
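
For the chunking mentioned under the context-length limit, one simple approach is to split a file on line boundaries so that each piece stays under a token budget. The sketch below is one way to do it. The budget of 1500 is an assumption that leaves room for the instruction and the generated tokens within 2048, and the default whitespace counter is only a rough stand-in for the real tokenizer.

```python
def chunk_source(source, max_tokens=1500, count_tokens=None):
    """Split source into consecutive line-based chunks within a token budget.

    count_tokens maps a string to its token count. A single line that
    exceeds the budget on its own is kept as one oversized chunk.
    """
    if count_tokens is None:
        # Rough approximation (assumption): one token per whitespace-separated word.
        count_tokens = lambda text: len(text.split())

    chunks, current, used = [], [], 0
    for line in source.splitlines(keepends=True):
        cost = count_tokens(line)
        if current and used + cost > max_tokens:
            chunks.append("".join(current))
            current, used = [], 0
        current.append(line)
        used += cost
    if current:
        chunks.append("".join(current))
    return chunks


# With a loaded Transformers tokenizer, a closer count would be:
# count = lambda text: len(tokenizer(text, add_special_tokens=False)["input_ids"])
# chunks = chunk_source(open("big_file.py").read(), count_tokens=count)
```

Splitting on lines keeps statements intact, but a chunk can still cut a function in half; splitting on top-level definitions is a natural refinement.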

🗺️ Roadmap & Future Plans

The Neuracoder team is developing the following versions:

Q3 2025: Release Neuracoder-Tiny-1.3B-Persian (bilingual English‑Persian) with support for Persian prompts and code comments in Persian.
Q4 2025: Neuracoder-Medium-3B with 4096 context window and support for 20 programming languages.
Q1 2026: Optimized version for in‑browser execution (WebAssembly) with no server required.
Ongoing: Release of training datasets (Persian part) and quantized models (INT4, INT8) for low‑resource devices.

🤝 Contribute & Support the Project

This model is completely open‑source and free. You can help in the following ways:

Report bugs and suggest improvements in the Discussions section of this repository.
Provide new datasets (especially Persian code or specific domains).
Build auxiliary tools like VS Code extensions or a local server API.
Financial support through Neuracoder’s channels (email us if interested).
Use and share results – The more the model is used, the more feedback we get for improvement.

📜 License & Usage Rights

This model is released under the Apache License 2.0. You are free to:

Use the model for any commercial or non‑commercial purpose.
Copy, distribute, and even sell the model as part of your product (with attribution to the original model).
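Modify weights, fine‑tune, and release your own model (Apache 2.0 lets you license your own modifications under different terms, as long as the conditions below are met for the original work).

Conditions: In any redistribution, you must include a copy of the Apache 2.0 license, keep Neuracoder’s copyright and attribution notices, and state prominently in any modified files that you changed them.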

✍️ Citation

If you use Neuracoder-Tiny in your paper, research, or product, please cite it with the following BibTeX entry:

@misc{neuracoder2024tiny,
  author       = {{Neuracoder Team} and Mohammad Rezaei and Sara Ahmadi},
  title        = {Neuracoder-Tiny-1.3B: A Lightweight, High-Performance Open-Source Code Generation Model from Iran},
  year         = {2024},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/neuracoder/neuracoder-tiny-1.3b}},
  note         = {Version 1.0, Apache 2.0 License}
}

📞 Contact Neuracoder Team

Website: neuracoder.ir (coming soon)
Email: info@neuracoder.ir
Telegram channel: @NeuracoderAI
Company GitHub: github.com/neuracoder

Made with ❤️ in Iran – Neuracoder Team
Free access to generative AI for code, for everyone, anywhere, on any hardware
