---
title: Kimi 48B Fine-tuned - Inference
emoji: 🚀
colorFrom: purple
colorTo: blue
sdk: docker
pinned: false
license: apache-2.0
app_port: 7860
suggested_hardware: l40sx4
---
# 🚀 Kimi Linear 48B A3B Instruct - Fine-tuned
Professional inference Space for the fine-tuned Kimi-Linear-48B-A3B-Instruct model.
## Model Information
- Model: optiviseapp/kimi-linear-48b-a3b-instruct-fine-tune
- Base Model: moonshotai/Kimi-Linear-48B-A3B-Instruct
- Parameters: 48 Billion
- Fine-tuning Method: QLoRA (Quantized Low-Rank Adaptation)
- Architecture: Mixture of Experts (MoE) Transformer
## Features

### ✨ Professional Chat Interface
- Clean, modern UI for seamless conversations
- Chat history with copy functionality
- System prompt customization
### ⚙️ Advanced Generation Settings
- Temperature control for creativity
- Top-P and Top-K sampling
- Repetition penalty adjustment
- Configurable response length
### 🎮 Optimized Performance
- Multi-GPU support (4xL40S recommended)
- Automatic device mapping
- bfloat16 precision for efficiency
- ~96GB VRAM requirement
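The loading path above (automatic device mapping, bfloat16) can be sketched with the standard Transformers API. This is a minimal sketch, not the Space's actual code; `trust_remote_code=True` is an assumption based on the custom Kimi architecture, and the model ID comes from the Model Information section:

```python
# Sketch: load the model sharded across all visible GPUs in bfloat16.
# Assumes `transformers` and `accelerate` are installed and ~96 GB of
# VRAM is available. trust_remote_code=True is assumed because the
# Kimi architecture ships custom modeling code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "optiviseapp/kimi-linear-48b-a3b-instruct-fine-tune"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # halves memory vs. float32
    device_map="auto",           # shard layers across available GPUs
    trust_remote_code=True,
)
```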
## Usage

1. **Click "Load Model"** - Initialize the model (takes 2-5 minutes)
2. **Set System Prompt** (optional) - Define the assistant's behavior
3. **Start Chatting** - Type your message and hit send
4. **Adjust Settings** - Fine-tune generation parameters as needed
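Under the hood, a chat flow like the one above boils down to assembling a role-tagged message list in the format consumed by a tokenizer's chat template. A minimal sketch (`build_messages` is an illustrative helper, not part of this Space's code):

```python
# Assemble a conversation in the role/content format expected by
# tokenizer.apply_chat_template(). `build_messages` is hypothetical.
def build_messages(system_prompt, history, user_message):
    messages = []
    if system_prompt:  # step 2: optional system prompt
        messages.append({"role": "system", "content": system_prompt})
    for user_turn, assistant_turn in history:  # earlier chat turns
        messages.append({"role": "user", "content": user_turn})
        messages.append({"role": "assistant", "content": assistant_turn})
    messages.append({"role": "user", "content": user_message})  # step 3
    return messages

msgs = build_messages("You are a helpful assistant.",
                      [("Hi", "Hello!")],
                      "How are you?")
```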
## Generation Parameters

### Temperature (0.0 - 2.0)
- Low (0.1-0.5): Focused, deterministic responses
- Medium (0.6-0.9): Balanced creativity
- High (1.0-2.0): More creative and diverse outputs
### Top P (0.0 - 1.0)
- 0.9 (recommended): Good balance
- Lower values: More focused
- Higher values: More diverse
### Max New Tokens
- Maximum length of generated response
- 1024 (default): Good for most use cases
- Increase for longer responses
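How these knobs interact can be shown with a toy, pure-Python sampler. This is a simplified sketch of temperature scaling, top-k, and nucleus (top-p) filtering over a small logit vector, not the Space's actual decoding code:

```python
import math
import random

def sample_next_token(logits, temperature=0.7, top_p=0.9, top_k=50):
    """Toy sampler: temperature -> softmax -> top-k -> top-p -> sample."""
    # Temperature scales logits; lower values sharpen the distribution.
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Top-K: keep only the K most probable token indices.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Top-P (nucleus): smallest prefix whose cumulative mass reaches top_p.
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Renormalise over the kept set and draw a sample.
    norm = sum(probs[i] for i in kept)
    r = random.random() * norm
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]
```

With a low temperature and a small nucleus, the sampler collapses onto the most likely token, which is why low-temperature output feels deterministic.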
## Hardware Requirements
- Recommended: 4x NVIDIA L40S GPUs (192GB total VRAM)
- Minimum: 4x NVIDIA L4 GPUs (96GB total VRAM)
- Memory: ~96GB VRAM in bfloat16 precision
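The ~96GB figure follows directly from parameter count and precision: the weights alone account for it, with KV cache and activations on top.

```python
# Back-of-the-envelope check of the ~96 GB figure (weights only).
num_params = 48e9        # 48 billion parameters
bytes_per_param = 2      # bfloat16 = 16 bits per parameter
weights_gb = num_params * bytes_per_param / 1e9
# KV cache and activations come on top of this, which is why
# 4x L40S (192 GB total) is recommended over the bare 96 GB minimum.
```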
## Fine-tuning Details
This model was fine-tuned using QLoRA with the following configuration:
- LoRA Rank (r): 16
- LoRA Alpha: 32
- Target Modules: q_proj, k_proj, v_proj, o_proj (attention layers only)
- Dropout: 0.05
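The configuration above maps onto the standard `peft`/`bitsandbytes` objects roughly as follows. This is a sketch assuming the usual 4-bit NF4 QLoRA recipe; only the rank, alpha, dropout, and target modules come from this document, and the exact training script is not part of this Space:

```python
# Sketch of the QLoRA setup described above, using peft + bitsandbytes.
# The 4-bit NF4 quantization settings are typical QLoRA defaults and an
# assumption here, not taken from this Space's training code.
from transformers import BitsAndBytesConfig
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                 # the "Q" in QLoRA (assumed NF4 recipe)
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="bfloat16",
)

lora_config = LoraConfig(
    r=16,                              # LoRA rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention only
    task_type="CAUSAL_LM",
)
```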
## Support
For issues or questions:
Built with ❤️ using Transformers and Gradio