Instructions to use Daleth-hb/qwen-cuda2hip-lora with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use Daleth-hb/qwen-cuda2hip-lora with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct") model = PeftModel.from_pretrained(base_model, "Daleth-hb/qwen-cuda2hip-lora") - Notebooks
- Google Colab
- Kaggle
Model Card: Qwen-CUDA2HIP-LoRA
This model is a specialized LoRA adapter for Qwen2.5-Coder-7B-Instruct designed to automate and explain the migration of NVIDIA CUDA source code to AMD ROCm (HIP). It was developed to bridge the portability gap for HPC developers targeting the AMD Instinct MI300X architecture.
Model Details
Model Description
Qwen-CUDA2HIP-LoRA is fine-tuned using a dual-strategy approach: standard Supervised Fine-Tuning (SFT) for syntax mapping and Reasoning-Aware Fine-Tuning (RAFT) for architectural justification. It doesn't just swap cudaMalloc for hipMalloc; it provides technical context on why certain changes are made, optimized for the CDNA 3 architecture.
- Developed by: Alan Daleth Hernández Barreto
- Model type: LoRA Adapter (PEFT)
- Language(s): English (Technical), C++, CUDA, HIP
- License: Apache 2.0
- Finetuned from model: Qwen/Qwen2.5-Coder-7B-Instruct
Model Sources
- Repository: https://huggingface.co/Daleth-hb/qwen-cuda2hip-lora
- Dataset (RAFT): Daleth-hb/cuda-hip-gpu-RAFT-dataset
- Dataset (CoT): Daleth-hb/cuda-hip-gpu-dataset
Uses
Direct Use
- Automated porting of CUDA kernels to HIP.
- Educational assistance for developers learning the ROCm ecosystem.
- Generating hardware-specific optimizations (vectorization with
float4) for AMD GPUs.
Out-of-Scope Use
- General-purpose chat (the model is heavily biased toward GPU programming).
- Production code generation for mission-critical systems without human expert review.
- Non-CUDA/HIP languages.
Bias, Risks, and Limitations
- Library Parity: While the model excels at core Runtime and Kernel APIs, it may hallucinate mappings for niche or proprietary NVIDIA libraries (e.g., very specific versions of cuDNN or OptiX).
- MI300X Focus: Optimizations are tailored for CDNA 3; they might be sub-optimal for older GCN or RDNA architectures.
How to Get Started with the Model
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
base_model = "Qwen/Qwen2.5-Coder-7B-Instruct"
adapter = "Daleth-hb/qwen-cuda2hip-lora"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)
prompt = "Convert this CUDA kernel to HIP: __global__ void add(int *a, int *b, int *c) { ... }"
# Proceed with standard generation...
- Downloads last month
- 54
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support