Set Nepalaya-R model card branding

README.md CHANGED

@@ -1,154 +1,145 @@
 ---
 license: mit
 library_name: transformers
-base_model:
-- deepseek-ai/DeepSeek-V3.2-Exp-Base
-base_model_relation: finetune
 ---
-    <img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
-  </a>
-</div>
-<div align="center" style="line-height: 1;">
-  <a href="https://discord.gg/Tc7c45Zzu5" target="_blank" style="margin: 2px;">
-    <img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&logoColor=white&color=7289da" style="display: inline-block; vertical-align: middle;"/>
-  </a>
-  <a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true" target="_blank" style="margin: 2px;">
-    <img alt="Wechat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
-  </a>
-  <a href="https://twitter.com/deepseek_ai" target="_blank" style="margin: 2px;">
-    <img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
-  </a>
-</div>
-<div align="center" style="line-height: 1;">
-  <a href="LICENSE" style="margin: 2px;">
-    <img alt="License" src="https://img.shields.io/badge/License-MIT-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/>
-  </a>
-</div>
-
-## Introduction
-
-We are excited to announce the official release of DeepSeek-V3.2-Exp, an experimental version of our model. As an intermediate step toward our next-generation architecture, V3.2-Exp builds upon V3.1-Terminus by introducing DeepSeek Sparse Attention—a sparse attention mechanism designed to explore and validate optimizations for training and inference efficiency in long-context scenarios.
-
-This experimental release represents our ongoing research into more efficient transformer architectures, particularly focusing on improving computational efficiency when processing extended text sequences.
-
-<div align="center">
-  <img src="assets/cost.png">
-</div>
-
-- DeepSeek Sparse Attention (DSA) achieves fine-grained sparse attention for the first time, delivering substantial improvements in long-context training and inference efficiency while maintaining virtually identical model output quality.
-
-- To rigorously evaluate the impact of introducing sparse attention, we deliberately aligned the training configurations of DeepSeek-V3.2-Exp with V3.1-Terminus. Across public benchmarks in various domains, DeepSeek-V3.2-Exp demonstrates performance on par with V3.1-Terminus.
-
-| Benchmark | DeepSeek-V3.1-Terminus | DeepSeek-V3.2-Exp |
-| :--- | :---: | :---: |
-| **Reasoning Mode w/o Tool Use** | | |
-| MMLU-Pro | 85.0 | 85.0 |
-| GPQA-Diamond | 80.7 | 79.9 |
-| Humanity's Last Exam | 21.7 | 19.8 |
-| LiveCodeBench | 74.9 | 74.1 |
-| AIME 2025 | 88.4 | 89.3 |
-| HMMT 2025 | 86.1 | 83.6 |
-| Codeforces | 2046 | 2121 |
-| Aider-Polyglot | 76.1 | 74.5 |
-| **Agentic Tool Use** | | |
-| BrowseComp | 38.5 | 40.1 |
-| BrowseComp-zh | 45.0 | 47.9 |
-| SimpleQA | 96.8 | 97.1 |
-| SWE Verified | 68.4 | 67.8 |
-| SWE-bench Multilingual | 57.8 | 57.9 |
-| Terminal-bench | 36.7 | 37.7 |
-
-## Update
-
-- 2025.11.17: **We have identified that previous versions of the inference demo code contained an implementation discrepancy in Rotary Position Embedding (RoPE) within the indexer module, potentially leading to degraded model performance.** Specifically, the input tensor to RoPE in the indexer module requires a non-interleaved layout, whereas RoPE in the MLA module expects an interleaved layout. This issue has now been resolved. Please refer to the updated version of the inference demo code and take note of this implementation detail.
-
-## How to Run Locally
-
-### HuggingFace
-
-We provide an updated inference demo code in the [inference](https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp/tree/main/inference) folder to help the community quickly get started with our model and understand its architectural details.
-
-First, convert the Hugging Face model weights to the format required by our inference demo. Set `MP` to match your available GPU count:
-```bash
-export EXPERTS=256
-python convert.py --hf-ckpt-path ${HF_CKPT_PATH} --save-path ${SAVE_PATH} --n-experts ${EXPERTS} --model-parallel ${MP}
-```
-
-### SGLang
-
-```bash
-# H200
-docker pull lmsysorg/sglang:dsv32
-
-#
-docker pull lmsysorg/sglang:dsv32-rocm
-```
-
-##
-```bash
-python -m sglang.launch_server --model deepseek-ai/DeepSeek-V3.2-Exp --tp 8 --dp 8 --enable-dp-attention
-```
 ---
 license: mit
 library_name: transformers
 ---
+
+# Nepalaya-R
+
+Nepalaya-R is a large language model project with full source code, configs, and deployment tooling for local and Hugging Face usage.
+
+## About This Model
+
+This repository contains the Nepalaya-R model implementation with:
+
+- ✅ Full source code and inference implementations
+- ✅ Tokenizer configuration adapted for Nepalaya-R
+- ✅ Easy-to-use inference scripts
+- ✅ Documentation and setup guides
+
+## Quick Start
+
+### Installation
+
+```bash
+pip install -r requirements.txt
+```
+
+### Download & Setup
+
+**Option 1: Download from Hugging Face**
+
+```bash
+export HF_TOKEN=your_token
+python download_model.py --model-id your-username/Nepalaya-R --local-dir ./model_weights
+```
+
+**Option 2: Run Quick Inference**
+
+```bash
+python quick_inference.py --prompt "Your prompt here"
+```
+
+### Mirror Setup
+
+To create your own Nepalaya-R repository mirror:
+
+```bash
+export HF_TOKEN=your_token
+python mirror_to_hf.py \
+  --source source-org/source-model \
+  --dest your-username/Nepalaya-R
+```
+
+## Documentation
+
+- **[SETUP.md](SETUP.md)** - Detailed setup and configuration guide
+- **[GITHUB_DEPLOY.md](GITHUB_DEPLOY.md)** - Deployment instructions
+- **[inference/README.md](inference/README.md)** - Inference code documentation
+
+## Model Architecture
+
+Nepalaya-R architecture summary:
+
+- **Parameters:** 671B
+- **Context Length:** Extended via sparse attention
+- **Training:** Sparse-attention-based training pipeline
+- **Architecture:** Optimized transformer with mixture-of-experts
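The mixture-of-experts point above is easier to read with a quick back-of-the-envelope calculation: only the routed experts run for each token, so the active parameter count is far below the 671B total. The expert count, top-k, and dense/expert split below are illustrative assumptions, not published Nepalaya-R figures.

```python
# Rough MoE arithmetic: parameters touched per token vs. total parameters.
# All configuration numbers below are assumed for illustration only.
total_params = 671e9   # total parameter count from the summary above
expert_share = 0.95    # assumed fraction of parameters in expert FFNs
n_experts = 256        # assumed experts per MoE layer
top_k = 8              # assumed experts routed per token

expert_params = total_params * expert_share
dense_params = total_params - expert_params          # always active
active_params = dense_params + expert_params * top_k / n_experts

print(f"~{active_params / 1e9:.0f}B of {total_params / 1e9:.0f}B parameters active per token")
```

Under these assumptions only ~53B parameters participate in each forward pass, which is the efficiency argument for mixture-of-experts at this scale.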
+
+## Key Features
+
+- Multi-expert routing for efficient inference
+- Sparse attention for long-context processing
+- Chat template support
+- Distributed inference capabilities
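The sparse-attention idea in the list above can be sketched in a few lines: for each query, score all keys but keep only the top-k, so the softmax and the value mix touch a small subset of positions. This is a toy scalar illustration of the top-k idea, not the model's actual attention kernel.

```python
import math

def topk_sparse_attention(scores, values, k):
    """Toy top-k sparse attention for one query over scalar values:
    keep the k best-scoring keys, softmax over them, zero the rest."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    exps = {i: math.exp(scores[i]) for i in top}
    z = sum(exps.values())
    weights = [exps.get(i, 0.0) / z for i in range(len(scores))]
    return weights, sum(w * v for w, v in zip(weights, values))

weights, out = topk_sparse_attention([2.0, 0.1, 1.5, -1.0], [10.0, 20.0, 30.0, 40.0], k=2)
# keys 1 and 3 get zero weight; the output mixes only values 10.0 and 30.0
```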
+
+## System Requirements
+
+- **GPU Memory:** 48GB+ VRAM recommended
+- **RAM:** 64GB+ system memory
+- **Storage:** ~300GB for full model weights
+- **SSD:** Fast storage recommended
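The storage figure above depends heavily on weight precision. A rough calculation of raw weight size at common dtypes (standard bytes-per-parameter values; real checkpoints add format overhead):

```python
# Raw weight storage at common precisions for a 671B-parameter model.
params = 671e9
bytes_per_param = {"fp16/bf16": 2, "fp8": 1, "int4": 0.5}

sizes_gib = {name: params * b / 1024**3 for name, b in bytes_per_param.items()}
for name, gib in sizes_gib.items():
    print(f"{name}: ~{gib:,.0f} GiB of weights")
```

Disk and memory needs therefore vary several-fold with the precision of the checkpoint you download.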
+
+## Usage Examples
+
+### Basic Generation
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model = AutoModelForCausalLM.from_pretrained(
+    "your-username/Nepalaya-R",
+    torch_dtype="auto",
+    device_map="auto",
+)
+tokenizer = AutoTokenizer.from_pretrained("your-username/Nepalaya-R")
+
+inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
+outputs = model.generate(**inputs, max_new_tokens=100)
+print(tokenizer.decode(outputs[0]))
+```
+
+### Chat Mode
+
+```python
+messages = [
+    {"role": "user", "content": "What is machine learning?"}
+]
+inputs = tokenizer.apply_chat_template(
+    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
+).to(model.device)
+outputs = model.generate(**inputs, max_new_tokens=256)
+print(tokenizer.decode(outputs[0]))
+```
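`apply_chat_template` renders the message list with the template bundled in the tokenizer files. Conceptually it flattens role-tagged messages into one prompt string; the tag format in this sketch is invented for illustration and is not Nepalaya-R's real template.

```python
def render_chat(messages, add_generation_prompt=True):
    """Toy stand-in for apply_chat_template: flatten role-tagged messages
    into a single prompt string (the tag format here is hypothetical)."""
    parts = [f"<|{m['role']}|>{m['content']}<|end|>" for m in messages]
    if add_generation_prompt:
        parts.append("<|assistant|>")  # cue the model to answer next
    return "".join(parts)

prompt = render_chat([{"role": "user", "content": "What is machine learning?"}])
```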
+
+## Repository Structure
+
+```
+Nepalaya-R/
+├── README.md                      # This file
+├── SETUP.md                       # Setup guide
+├── GITHUB_DEPLOY.md               # Deployment guide
+├── requirements.txt               # Python dependencies
+├── config.json                    # Model configuration
+├── tokenizer.json                 # Tokenizer
+├── quick_inference.py             # Quick inference script
+├── download_model.py              # Model downloader
+├── mirror_to_hf.py                # HF mirroring tool
+├── inference/                     # Inference code
+│   ├── generate.py                # Generation script
+│   ├── model.py                   # Model implementation
+│   ├── convert.py                 # Weight converter
+│   └── config_671B_nepalaya.json  # Inference config
+└── assets/                        # Chat templates
+```
+
+## Files Included
+
+- **Source Code:** Full inference implementation
+- **Configuration:** Model and generation configs
+- **Tokenizer:** Complete tokenizer setup
+- **Documentation:** Setup and usage guides
+- **Utilities:** Download and mirror scripts
+
+## License
+
+MIT License - see the [LICENSE](LICENSE) file.
+
+## Support
+
+For setup and configuration, see [SETUP.md](SETUP.md).
+For deployment, see [GITHUB_DEPLOY.md](GITHUB_DEPLOY.md).
+
+---
+
+Nepalaya-R model card and repository maintained by the Nepalaya-R project.