Help with merging LoRA layers back onto Phi3

#55
by SHIMURA0321

I have used QLoRA to fine-tune Phi3 on some domain-specific knowledge, and I am wondering how to merge the LoRA layers back onto Phi3-4k-instruct. I have tried the following approaches:

  1. I want to run inference on CPU on my MacBook, so I used llama.cpp to convert the LoRA adapter to a GGML file so that I can merge it with Phi3 using Ollama, but I hit the following error:
    INFO:lora-to-gguf:model.layers.0.mlp.down_proj => blk.0.ffn_down.weight.loraA (8192, 32) float32 1.00MB
    INFO:lora-to-gguf:model.layers.0.mlp.down_proj => blk.0.ffn_down.weight.loraB (3072, 32) float32 0.38MB
    INFO:lora-to-gguf:model.layers.0.mlp.gate_up_proj => blk.0.ffn_up.weight.loraA (3072, 32) float32 0.38MB
    INFO:lora-to-gguf:model.layers.0.mlp.gate_up_proj => blk.0.ffn_up.weight.loraB (16384, 32) float32 2.00MB
    ERROR:lora-to-gguf:Error: could not map tensor name base_model.model.model.layers.0.self_attn.qkv_proj.lora_A.weight
    ERROR:lora-to-gguf: Note: the arch parameter must be specified if the model is not llama

(By the way, I applied the LoRA to the "qkv_proj", "gate_up_proj", and "down_proj" layers of the Phi3 model; my rough guess at the GGUF/Ollama path is spelled out after this list.)
I would be grateful if someone could help me with this issue!
2. I used the merge_and_unload() and save_pretrained() methods from Hugging Face PEFT/Transformers, which give me back a .safetensors file and a .json file, but I do not know how to run this "new" fine-tuned model on CPU (the merge step is sketched below).
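
For reference, the merge in step 2 looks roughly like the sketch below. The adapter path ./phi3-qlora-adapter and the base model ID microsoft/Phi-3-mini-4k-instruct are placeholders for my actual paths, so adjust them as needed:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    # Load the base model in fp16 (not 4-bit) so the LoRA deltas can be folded in
    base = AutoModelForCausalLM.from_pretrained(
        "microsoft/Phi-3-mini-4k-instruct",
        torch_dtype=torch.float16,
    )
    model = PeftModel.from_pretrained(base, "./phi3-qlora-adapter")

    # Fold the LoRA weights into the base weights and drop the adapter wrappers
    merged = model.merge_and_unload()

    # Save a plain Transformers checkpoint (.json config + .safetensors weights + tokenizer)
    merged.save_pretrained("./phi3-merged")
    AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct").save_pretrained("./phi3-merged")

This is where the .safetensors and .json files I mentioned come from; my problem is what to do with that ./phi3-merged folder afterwards.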
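
My current guess for running on CPU is that this merged folder could then be converted to a single GGUF file with llama.cpp's convert-hf-to-gguf.py script (convert_hf_to_gguf.py in newer versions), passing the merged directory plus an --outfile, and that the resulting .gguf could be loaded into Ollama through a Modelfile whose FROM line points at it. I have not been able to confirm that this conversion path actually supports the Phi3 architecture, which is essentially the same uncertainty as in step 1 above.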

Thanks in advance!
