Instructions to use Slipstream-Max/MindChat-Qwen2-7B-GGUF-fp16 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use Slipstream-Max/MindChat-Qwen2-7B-GGUF-fp16 with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="Slipstream-Max/MindChat-Qwen2-7B-GGUF-fp16", filename="mindchat.gguf", )
output = llm( "Once upon a time,", max_tokens=512, echo=True ) print(output)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use Slipstream-Max/MindChat-Qwen2-7B-GGUF-fp16 with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf Slipstream-Max/MindChat-Qwen2-7B-GGUF-fp16 # Run inference directly in the terminal: llama-cli -hf Slipstream-Max/MindChat-Qwen2-7B-GGUF-fp16
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf Slipstream-Max/MindChat-Qwen2-7B-GGUF-fp16 # Run inference directly in the terminal: llama-cli -hf Slipstream-Max/MindChat-Qwen2-7B-GGUF-fp16
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf Slipstream-Max/MindChat-Qwen2-7B-GGUF-fp16 # Run inference directly in the terminal: ./llama-cli -hf Slipstream-Max/MindChat-Qwen2-7B-GGUF-fp16
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf Slipstream-Max/MindChat-Qwen2-7B-GGUF-fp16 # Run inference directly in the terminal: ./build/bin/llama-cli -hf Slipstream-Max/MindChat-Qwen2-7B-GGUF-fp16
Use Docker
docker model run hf.co/Slipstream-Max/MindChat-Qwen2-7B-GGUF-fp16
- LM Studio
- Jan
- Ollama
How to use Slipstream-Max/MindChat-Qwen2-7B-GGUF-fp16 with Ollama:
ollama run hf.co/Slipstream-Max/MindChat-Qwen2-7B-GGUF-fp16
- Unsloth Studio new
How to use Slipstream-Max/MindChat-Qwen2-7B-GGUF-fp16 with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for Slipstream-Max/MindChat-Qwen2-7B-GGUF-fp16 to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for Slipstream-Max/MindChat-Qwen2-7B-GGUF-fp16 to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for Slipstream-Max/MindChat-Qwen2-7B-GGUF-fp16 to start chatting
- Docker Model Runner
How to use Slipstream-Max/MindChat-Qwen2-7B-GGUF-fp16 with Docker Model Runner:
docker model run hf.co/Slipstream-Max/MindChat-Qwen2-7B-GGUF-fp16
- Lemonade
How to use Slipstream-Max/MindChat-Qwen2-7B-GGUF-fp16 with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull Slipstream-Max/MindChat-Qwen2-7B-GGUF-fp16
Run and chat with the model
lemonade run user.MindChat-Qwen2-7B-GGUF-fp16-{{QUANT_TAG}}List all available models
lemonade list
Model Details
Model Description
- Developed by: AITA
- Model type: Full-Precision Text Generation LLM (FP16 GGUF format)
- Original Model: https://huggingface.co/X-D-Lab/MindChat-Qwen-7B-v2
- Precision: FP16 (non-quantized full-precision version)
Repository
- GGUF Converter: llama.cpp
- Huggingface Hub: https://huggingface.co/Slipstream-Max/MindChat-Qwen2-7B-GGUF-fp16/
Usage
Method 1: llama.cpp Backend Server + Chatbox
Step 1: Start .llama.cpp Server
./llama-server \
-m /path/to/model.gguf \
-c 2048 \ # Context length
--host 0.0.0.0 \ # Allow remote connections
--port 8080 \ # Server port
--n-gpu-layers 35 # GPU acceleration (if available)
Step 2: Connect via Chatbox
- Download Chatbox
- Configure API endpoint:
API URL: http://localhost:8080 Model: (leave empty) API Type: llama.cpp - Set generation parameters:
{ "temperature": 0.7, "max_tokens": 512, "top_p": 0.9 }
Method 2: LM Studio
- Download LM Studio
- Load GGUF file:
- Launch LM Studio
- Search Slipstream-Max/Emollm-InternLM2.5-7B-chat-GGUF-fp16
- Configure settings:
Context Length: 2048 GPU Offload: Recommended (enable if available) Batch Size: 512 - Start chatting through the built-in UI
Precision Details
| Filename | Precision | Size | Characteristics |
|---|---|---|---|
| mindchat.gguf | FP16 | [15.5GB] | Full original model precision |
Hardware Requirements
Minimum:
- 24GB RAM (for 7B model)
- CPU with AVX/AVX2 instruction set support
Recommended:
- 32GB RAM
- CUDA-capable GPU (for acceleration)
- Fast SSD storage (due to large model size)
Key Notes
- Requires latest llama.cpp (v3+ recommended)
- Use
--n-gpu-layers 35for GPU acceleration (requires CUDA-enabled build) - Initial loading takes longer (2-5 minutes)
- Requires more memory/storage than quantized versions
- Use
--mlockto prevent swapping
Advantages
- Preserves original model precision
- Ideal for precision-sensitive applications
- No quantization loss
- Suitable for continued fine-tuning
Ethical Considerations
本仓库所有开源代码及模型均遵循GPL-3.0许可认证. 目前开源的MindChat模型可能存在部分局限, 因此我们对此做出如下声明:
MindChat目前仅能提供类似的心理聊天服务, 仍无法提供专业的心理咨询和心理治疗服务, 无法替代专业的心理医生和心理咨询师, 并可能存在固有的局限性, 可能产生错误的、有害的、冒犯性的或其他不良的输出. 用户在关键或高风险场景中应谨慎行事, 不要使用模型作为最终决策参考, 以免导致人身伤害、财产损失或重大损失.
MindChat在任何情况下, 作者、贡献者或版权所有者均不对因软件或使用或其他软件交易而产生的任何索赔、损害赔偿或其他责任(无论是合同、侵权还是其他原因)承担责任.
使用MindChat即表示您同意这些条款和条件, 并承认您了解其使用可能带来的潜在风险. 您还同意赔偿并使作者、贡献者和版权所有者免受因您使用MindChat而产生的任何索赔、损害赔偿或责任的影响.
Citation
@misc{MindChat,
author={Xin Yan, Dong Xue*},
title = {MindChat: Psychological Large Language Model},
year = {2023},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/X-D-Lab/MindChat}},
}
- Downloads last month
- 4
We're not able to determine the quantization variants.