---
license: apache-2.0
---
<div align="center">

<img src="https://raw.githubusercontent.com/AXYZdong/AMchat/main/assets/logo.png" width="200" alt="AMchat Logo"/>

<div align="center">

<b><font size="5">AMchat</font></b>

</div>

<div align="center">

<a href="https://github.com/AXYZdong/AMchat">💻Github Repo</a>

</div>

</div>

## AMchat GGUF Model

AM (Advanced Mathematics) Chat is a large language model that integrates mathematical knowledge with advanced mathematics problems and their solutions. It was fine-tuned from the InternLM2-Math-7B base model with xtuner, on a dataset that pairs math and advanced mathematics problems with their analyses, and it is designed specifically for solving advanced mathematics problems. This repository provides GGUF builds of the model for use with llama.cpp and Ollama.
## Latest Release

2024-08-16

- **Q6_K**
- **Q5_K_M**
- **Q5_0**
- **Q4_0**
- **Q3_K_M**
- **Q2_K**

2024-08-09

- **F16**: Keeps the full 16-bit weights, so it is the largest file and the accuracy reference. Ideal for applications that require maximum precision and have the memory to match.
- **Q8_0**: Offers a substantial reduction in model size while maintaining accuracy very close to F16, making it suitable for environments with stringent memory constraints.
- **Q4_K_M**: Provides the most compact model size of the three with minimal impact on output quality, well suited to deployment in resource-constrained settings.
## Getting Started - Ollama

To get started with AMchat in [Ollama](https://github.com/ollama/ollama), follow these steps:

1. **Clone the Repository**

```bash
git lfs install
git clone https://huggingface.co/axyzdong/AMchat-GGUF
```

2. **Create the Model**

> Make sure you have installed [Ollama](https://ollama.com/) in advance. A minimal example `Modelfile` is sketched after these steps in case you need to write your own.

```bash
ollama create AMchat -f Modelfile
```

3. **Run**

```bash
ollama run AMchat
```
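The `Modelfile` referenced in step 2 is expected to ship with the cloned repository. If you need to author one yourself, a minimal sketch might look like the following; the GGUF filename and the ChatML-style template are assumptions inferred from the prompt format used in the llama-cli example below, so adjust them to the file you actually downloaded:

```
# Hypothetical minimal Modelfile for AMchat; point FROM at your GGUF file.
FROM ./AMchat-q8_0.gguf

# ChatML-style template, mirroring the <|im_start|>/<|im_end|> markers used below.
TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

SYSTEM """You are an expert in advanced math and you can answer all kinds of advanced math problems."""

PARAMETER stop "<|im_end|>"
PARAMETER temperature 0.8
```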
## Getting Started - llama-cli

You can use `llama-cli` to run inference. For a detailed explanation of `llama-cli`, please refer to [this guide](https://github.com/ggerganov/llama.cpp/blob/master/examples/main/README.md).

### Installation

We recommend building `llama.cpp` from source. The following snippets give an example for the Linux CUDA platform; for instructions on other platforms, please refer to the [official guide](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#build).
- Step 1: create a conda environment and install cmake

```shell
conda create --name AMchat python=3.10 -y
conda activate AMchat
pip install cmake
```

- Step 2: clone the source code and build the project

```shell
git clone --depth=1 https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
```
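Before moving on, you can confirm the build produced a working binary by printing the version banner:

```shell
# Sanity check: prints the llama.cpp build number and compiler info
build/bin/llama-cli --version
```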

All the built targets can be found in the subdirectory `build/bin`.

In the following sections, we assume that the working directory is the root of `llama.cpp`.
### Download models

You can download the appropriate model based on your requirements.
For instance, `AMchat-q8_0.gguf` can be downloaded as below:
```shell
pip install huggingface-hub
huggingface-cli download axyzdong/AMchat-GGUF AMchat-q8_0.gguf --local-dir . --local-dir-use-symlinks False
```
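If you want every quantization rather than a single file, `huggingface-cli download` also accepts glob filters; a sketch, assuming all of the repository's model files share the `.gguf` extension:

```shell
# Fetch all GGUF files from the repo in one go
huggingface-cli download axyzdong/AMchat-GGUF --include "*.gguf" --local-dir .
```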
### Chat example

The command below uses the `AMchat-q8_0.gguf` file downloaded above; substitute whichever quantization you chose.
```shell
build/bin/llama-cli \
    --model AMchat-q8_0.gguf \
    --predict 512 \
    --ctx-size 4096 \
    --gpu-layers 24 \
    --temp 0.8 \
    --top-p 0.8 \
    --top-k 50 \
    --seed 1024 \
    --color \
    --prompt "<|im_start|>system\nYou are an expert in advanced math and you can answer all kinds of advanced math problems.<|im_end|>\n" \
    --interactive \
    --multiline-input \
    --conversation \
    --verbose \
    --logdir workdir/logdir \
    --in-prefix "<|im_start|>user\n" \
    --in-suffix "<|im_end|>\n<|im_start|>assistant\n"
```
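The same build also produces `llama-server`, which exposes an OpenAI-compatible HTTP API. A minimal sketch for serving AMchat, assuming the same GGUF file and the server's default endpoint paths:

```shell
# Serve the model over HTTP (OpenAI-compatible API)
build/bin/llama-server \
    --model AMchat-q8_0.gguf \
    --ctx-size 4096 \
    --gpu-layers 24 \
    --port 8080

# Query it from another shell
curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "messages": [
        {"role": "system", "content": "You are an expert in advanced math and you can answer all kinds of advanced math problems."},
        {"role": "user", "content": "Evaluate the integral of x*e^x dx."}
      ]
    }'
```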

## Star Us

If you find AMchat useful, please ⭐ Star this repository and help others discover it!