Model Card: Qwen-CUDA2HIP-LoRA

This model is a specialized LoRA adapter for Qwen2.5-Coder-7B-Instruct designed to automate and explain the migration of NVIDIA CUDA source code to AMD ROCm (HIP). It was developed to bridge the portability gap for HPC developers targeting the AMD Instinct MI300X architecture.

Model Details

Model Description

Qwen-CUDA2HIP-LoRA is fine-tuned using a dual-strategy approach: standard Supervised Fine-Tuning (SFT) for syntax mapping and Reasoning-Aware Fine-Tuning (RAFT) for architectural justification. It doesn't just swap cudaMalloc for hipMalloc; it provides technical context on why certain changes are made, optimized for the CDNA 3 architecture.

Developed by: Alan Daleth Hernández Barreto
Model type: LoRA Adapter (PEFT)
Language(s): English (Technical), C++, CUDA, HIP
License: Apache 2.0
Finetuned from model: Qwen/Qwen2.5-Coder-7B-Instruct

Model Sources

Repository: https://huggingface.co/Daleth-hb/qwen-cuda2hip-lora
Dataset (RAFT): Daleth-hb/cuda-hip-gpu-RAFT-dataset
Dataset (CoT): Daleth-hb/cuda-hip-gpu-dataset

Uses

Direct Use

Automated porting of CUDA kernels to HIP.
Educational assistance for developers learning the ROCm ecosystem.
Generating hardware-specific optimizations (vectorization with float4) for AMD GPUs.

Out-of-Scope Use

General-purpose chat (the model is heavily biased toward GPU programming).
Production code generation for mission-critical systems without human expert review.
Non-CUDA/HIP languages.

Bias, Risks, and Limitations

Library Parity: While the model excels at core Runtime and Kernel APIs, it may hallucinate mappings for niche or proprietary NVIDIA libraries (e.g., very specific versions of cuDNN or OptiX).
MI300X Focus: Optimizations are tailored for CDNA 3; they might be sub-optimal for older GCN or RDNA architectures.

How to Get Started with the Model

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_model = "Qwen/Qwen2.5-Coder-7B-Instruct"
adapter = "Daleth-hb/qwen-cuda2hip-lora"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)

prompt = "Convert this CUDA kernel to HIP: __global__ void add(int *a, int *b, int *c) { ... }"
# Proceed with standard generation...

Downloads last month: 54

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Daleth-hb/qwen-cuda2hip-lora

Base model

Qwen/Qwen2.5-7B

Finetuned

Qwen/Qwen2.5-Coder-7B

Finetuned

Qwen/Qwen2.5-Coder-7B-Instruct

Adapter

(661)

this model