Text Generation
Transformers
Safetensors
GGUF
Korean
English
llama
3b
korean
from-scratch
orpo
instruction-tuned
preference-aligned
fp8
b200
Eval Results (legacy)
text-generation-inference
Instructions to use pathcosmos/frankenstallm with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use pathcosmos/frankenstallm with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="pathcosmos/frankenstallm")
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("pathcosmos/frankenstallm")
model = AutoModelForCausalLM.from_pretrained("pathcosmos/frankenstallm")
- llama-cpp-python
How to use pathcosmos/frankenstallm with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama
llm = Llama.from_pretrained(
    repo_id="pathcosmos/frankenstallm",
    filename="gguf/frankenstallm-3b-Q4_K_M.gguf",
)
output = llm(
    "Once upon a time,",
    max_tokens=512,
    echo=True,
)
print(output)
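The call above returns the whole completion at once, as an OpenAI-style dict. For interactive use, `llama_cpp.Llama` also accepts `stream=True`. In that mode it yields chunks whose text sits at `chunk["choices"][0]["text"]`. The helper below is a minimal sketch of that pattern. `stream_text` and `fake_llm` are hypothetical names. The helper takes any callable with the same calling convention, so the sketch can be tried offline without downloading the model.

```python
from typing import Callable, Iterator


def stream_text(llm: Callable, prompt: str, max_tokens: int = 256) -> Iterator[str]:
    """Yield completion text piece by piece from a llama_cpp.Llama-style callable.

    Hypothetical helper: it relies only on the OpenAI-style chunk layout
    ({"choices": [{"text": ...}]}) that llama-cpp-python yields when stream=True.
    """
    for chunk in llm(prompt, max_tokens=max_tokens, stream=True):
        yield chunk["choices"][0]["text"]


if __name__ == "__main__":
    # Offline stand-in for the `llm` object returned by Llama.from_pretrained(...).
    def fake_llm(prompt, max_tokens, stream):
        for piece in ["Once", " upon", " a", " time"]:
            yield {"choices": [{"text": piece}]}

    for piece in stream_text(fake_llm, "Once upon a time,"):
        print(piece, end="", flush=True)
    print()
```

With the real model, pass the `llm` object created above instead of `fake_llm`.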
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use pathcosmos/frankenstallm with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf pathcosmos/frankenstallm:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf pathcosmos/frankenstallm:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf pathcosmos/frankenstallm:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf pathcosmos/frankenstallm:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf pathcosmos/frankenstallm:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf pathcosmos/frankenstallm:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf pathcosmos/frankenstallm:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf pathcosmos/frankenstallm:Q4_K_M
Use Docker
docker model run hf.co/pathcosmos/frankenstallm:Q4_K_M
- LM Studio
- Jan
- vLLM
How to use pathcosmos/frankenstallm with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "pathcosmos/frankenstallm"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "pathcosmos/frankenstallm",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
Use Docker
docker model run hf.co/pathcosmos/frankenstallm:Q4_K_M
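The `curl` call above targets the OpenAI-compatible `/v1/completions` endpoint. The same request can be made from Python using only the standard library. This is a minimal sketch that assumes a server is already running; `build_completion_request` and `complete` are hypothetical helper names. The request body mirrors the `curl` payload, and the reply is assumed to follow the OpenAI completions layout, with text at `choices[0]["text"]`.

```python
import json
import urllib.request


def build_completion_request(
    base_url: str,
    model: str,
    prompt: str,
    max_tokens: int = 512,
    temperature: float = 0.5,
) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url: str, model: str, prompt: str, **kwargs) -> str:
    """Send the request and return the text of the first choice."""
    req = build_completion_request(base_url, model, prompt, **kwargs)
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

For the vLLM server above, call `complete("http://localhost:8000", "pathcosmos/frankenstallm", "Once upon a time,")`. The same helper should also work with the SGLang server by changing the base URL to `http://localhost:30000`. Building the request separately from sending it keeps the payload easy to inspect without a running server.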
- SGLang
How to use pathcosmos/frankenstallm with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "pathcosmos/frankenstallm" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "pathcosmos/frankenstallm",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "pathcosmos/frankenstallm" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "pathcosmos/frankenstallm",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
- Ollama
How to use pathcosmos/frankenstallm with Ollama:
ollama run hf.co/pathcosmos/frankenstallm:Q4_K_M
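Besides the interactive `ollama run` session, a running Ollama instance also serves a REST API. This is a minimal sketch under a few assumptions:

- Ollama listens on its default address, `http://localhost:11434`.
- The model was pulled under the same `hf.co/...:Q4_K_M` name used above.

The sketch sends a non-streaming request to `/api/generate`, whose JSON reply carries the generated text in its `response` field. `build_generate_request` and `ollama_generate` are hypothetical helper names.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed: Ollama's default listen address
MODEL = "hf.co/pathcosmos/frankenstallm:Q4_K_M"  # same tag as the `ollama run` command


def build_generate_request(
    prompt: str, model: str = MODEL, base_url: str = OLLAMA_URL
) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ollama_generate(prompt: str, **kwargs) -> str:
    """Send the request and return the generated text from the `response` field."""
    req = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)["response"]
```

Setting `"stream": False` makes Ollama return one JSON object instead of a stream of newline-delimited chunks. That keeps the parsing to a single `json.load`.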
- Unsloth Studio
How to use pathcosmos/frankenstallm with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for pathcosmos/frankenstallm to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for pathcosmos/frankenstallm to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for pathcosmos/frankenstallm to start chatting
- Docker Model Runner
How to use pathcosmos/frankenstallm with Docker Model Runner:
docker model run hf.co/pathcosmos/frankenstallm:Q4_K_M
- Lemonade
How to use pathcosmos/frankenstallm with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull pathcosmos/frankenstallm:Q4_K_M
Run and chat with the model
lemonade run user.frankenstallm-Q4_K_M
List all available models
lemonade list
# =============================================================================
# convert_3b_gguf.sh: convert the 3B model from HuggingFace format to GGUF and quantize it to several formats
#
# Usage:
#   bash scripts/convert_3b_gguf.sh [options]
#
# Options:
#   --input_dir DIR    HF-format model directory (default: outputs/hf_korean_3b_orpo)
#   --out_dir DIR      GGUF output directory (default: outputs/gguf)
#   --checkpoint DIR   Custom checkpoint directory (if given, the HF conversion runs first)
#   --skip_hf_conv     Skip the HF conversion step (when an HF-format model already exists)
#   --skip_quant       Skip the quantization step (produce only the F16 GGUF)
#
# Pipeline:
#   1. [optional] Custom checkpoint → HF transformers format (convert_to_hf.py)
#   2. HF → F16 GGUF (llama.cpp/convert_hf_to_gguf.py)
#   3. F16 GGUF → Q4_K_M, Q5_K_M, Q8_0 quantization (llama-quantize)
#
# Outputs:
#   outputs/gguf/frankenstallm-3b-f16.gguf
#   outputs/gguf/frankenstallm-3b-Q4_K_M.gguf  ← recommended (for Ollama)
#   outputs/gguf/frankenstallm-3b-Q5_K_M.gguf
#   outputs/gguf/frankenstallm-3b-Q8_0.gguf
#
# Prerequisites:
#   - HF conversion already done with python scripts/convert_to_hf.py (or use the --checkpoint option)
#   - git, cmake, and make installed
#   - pip install safetensors
# =============================================================================
set -euo pipefail

# ---------------------------------------------------------------------------
# Argument parsing
# ---------------------------------------------------------------------------
INPUT_DIR="outputs/hf_korean_3b_orpo"
OUT_DIR="outputs/gguf"
CHECKPOINT_DIR=""
SKIP_HF_CONV=false
SKIP_QUANT=false

while [[ $# -gt 0 ]]; do
    case "$1" in
        --input_dir)    INPUT_DIR="$2"; shift 2 ;;
        --out_dir)      OUT_DIR="$2"; shift 2 ;;
        --checkpoint)   CHECKPOINT_DIR="$2"; shift 2 ;;
        --skip_hf_conv) SKIP_HF_CONV=true; shift ;;
        --skip_quant)   SKIP_QUANT=true; shift ;;
        -h|--help)
            grep '^#' "$0" | head -40 | sed 's/^# \{0,1\}//'
            exit 0 ;;
        *)
            echo "ERROR: unknown option: $1"
            echo "Usage: bash scripts/convert_3b_gguf.sh [--input_dir DIR] [--out_dir DIR] [--checkpoint DIR] [--skip_hf_conv] [--skip_quant]"
            exit 1 ;;
    esac
done

PROJECT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
LLAMA_CPP_DIR="${LLAMA_CPP_DIR:-$PROJECT_DIR/outputs/llama.cpp}"
MODEL_NAME="frankenstallm-3b"

cd "$PROJECT_DIR"

echo "=================================================================="
echo "  3B model GGUF conversion pipeline"
echo "  Input HF directory   : $INPUT_DIR"
echo "  GGUF output directory: $OUT_DIR"
echo "  llama.cpp path       : $LLAMA_CPP_DIR"
echo "=================================================================="
echo ""
# ---------------------------------------------------------------------------
# Step 0: Check that llama.cpp exists (clone it if not)
# ---------------------------------------------------------------------------
if [[ ! -d "$LLAMA_CPP_DIR" ]]; then
    echo "[SETUP] llama.cpp directory not found."
    echo "        Install it with:"
    echo ""
    echo "        git clone --depth 1 https://github.com/ggerganov/llama.cpp $LLAMA_CPP_DIR"
    echo ""
    echo "        Or point the LLAMA_CPP_DIR environment variable at an existing checkout:"
    echo "        LLAMA_CPP_DIR=/path/to/llama.cpp bash scripts/convert_3b_gguf.sh"
    echo ""
    # `|| _yn=N` keeps `set -e` from silently exiting when stdin is closed (non-interactive runs)
    read -r -p "Clone it automatically now? [y/N] " _yn || _yn=N
    if [[ "${_yn:-N}" =~ ^[Yy]$ ]]; then
        echo "Cloning llama.cpp ..."
        git clone --depth 1 https://github.com/ggerganov/llama.cpp "$LLAMA_CPP_DIR"
    else
        echo "Aborting. Install llama.cpp and run this script again."
        exit 1
    fi
fi

# Python dependencies for llama.cpp's conversion scripts
echo "[SETUP] Installing llama.cpp Python dependencies ..."
pip install -r "$LLAMA_CPP_DIR/requirements.txt" --break-system-packages -q
# ---------------------------------------------------------------------------
# Step 1: Custom checkpoint → HF format conversion (optional)
# ---------------------------------------------------------------------------
if [[ -n "$CHECKPOINT_DIR" && "$SKIP_HF_CONV" == "false" ]]; then
    echo ""
    echo "[STEP 1] Converting custom checkpoint → HF format"
    echo "  Checkpoint: $CHECKPOINT_DIR"
    echo "  Output    : $INPUT_DIR"
    echo ""
    if [[ ! -d "$CHECKPOINT_DIR" ]]; then
        echo "ERROR: checkpoint directory not found: $CHECKPOINT_DIR"
        exit 1
    fi
    python "$PROJECT_DIR/scripts/convert_to_hf.py" \
        --checkpoint "$CHECKPOINT_DIR" \
        --output "$INPUT_DIR" \
        --tokenizer "tokenizer/korean_sp/tokenizer.json"
    echo "  [OK] HF conversion done → $INPUT_DIR"
elif [[ "$SKIP_HF_CONV" == "true" ]]; then
    echo "[STEP 1] Skipping HF conversion (--skip_hf_conv)"
else
    echo "[STEP 1] No checkpoint given; using the HF directory as-is."
fi

# Final check of the HF directory
if [[ ! -d "$INPUT_DIR" ]]; then
    echo "ERROR: HF model directory not found: $INPUT_DIR"
    echo "  Specify a checkpoint with the --checkpoint option, or"
    echo "  run python scripts/convert_to_hf.py first."
    exit 1
fi
if [[ ! -f "$INPUT_DIR/config.json" ]]; then
    echo "ERROR: config.json not found: $INPUT_DIR/config.json"
    exit 1
fi

mkdir -p "$OUT_DIR"
# ---------------------------------------------------------------------------
# Step 2: Build llama.cpp (llama-quantize binary)
# ---------------------------------------------------------------------------
QUANTIZE_BIN="$LLAMA_CPP_DIR/build/bin/llama-quantize"
if [[ ! -f "$QUANTIZE_BIN" ]]; then
    echo ""
    echo "[STEP 2] Building llama.cpp (llama-quantize) ..."
    cmake -S "$LLAMA_CPP_DIR" -B "$LLAMA_CPP_DIR/build" \
        -DCMAKE_BUILD_TYPE=Release \
        -DGGML_CUDA=ON \
        2>&1 | tail -10
    cmake --build "$LLAMA_CPP_DIR/build" --target llama-quantize -j "$(nproc)" \
        2>&1 | tail -10
    echo "  [OK] Build finished: $QUANTIZE_BIN"
else
    echo "[STEP 2] llama-quantize binary already exists; skipping build"
fi
# ---------------------------------------------------------------------------
# Step 3: HF → F16 GGUF conversion
# ---------------------------------------------------------------------------
F16_GGUF="$OUT_DIR/${MODEL_NAME}-f16.gguf"
echo ""
echo "[STEP 3] HF → F16 GGUF conversion"
echo "  Input : $INPUT_DIR"
echo "  Output: $F16_GGUF"
echo ""
python "$LLAMA_CPP_DIR/convert_hf_to_gguf.py" "$INPUT_DIR" \
    --outfile "$F16_GGUF" \
    --outtype f16
echo "  [OK] F16 GGUF size: $(du -sh "$F16_GGUF" | cut -f1) ($F16_GGUF)"
# ---------------------------------------------------------------------------
# Step 4: Multiple quantizations (Q4_K_M, Q5_K_M, Q8_0)
# ---------------------------------------------------------------------------
if [[ "$SKIP_QUANT" == "true" ]]; then
    echo ""
    echo "[STEP 4] Skipping quantization (--skip_quant)"
else
    echo ""
    echo "[STEP 4] Starting quantization ..."
    if [[ ! -f "$QUANTIZE_BIN" ]]; then
        echo "[WARN] llama-quantize binary not found: $QUANTIZE_BIN"
        echo "       Skipping quantization; only the F16 GGUF was produced."
        echo "       To build it manually: cmake --build $LLAMA_CPP_DIR/build --target llama-quantize"
    else
        # Q4_K_M: smallest file, balanced quality and speed (default recommendation for Ollama)
        Q4KM_GGUF="$OUT_DIR/${MODEL_NAME}-Q4_K_M.gguf"
        echo "  → Q4_K_M quantization: $Q4KM_GGUF ..."
        "$QUANTIZE_BIN" "$F16_GGUF" "$Q4KM_GGUF" Q4_K_M
        echo "    Size: $(du -sh "$Q4KM_GGUF" | cut -f1)"

        # Q5_K_M: medium size, higher quality
        Q5KM_GGUF="$OUT_DIR/${MODEL_NAME}-Q5_K_M.gguf"
        echo "  → Q5_K_M quantization: $Q5KM_GGUF ..."
        "$QUANTIZE_BIN" "$F16_GGUF" "$Q5KM_GGUF" Q5_K_M
        echo "    Size: $(du -sh "$Q5KM_GGUF" | cut -f1)"

        # Q8_0: highest quality of the three (close to F16)
        Q8_GGUF="$OUT_DIR/${MODEL_NAME}-Q8_0.gguf"
        echo "  → Q8_0 quantization: $Q8_GGUF ..."
        "$QUANTIZE_BIN" "$F16_GGUF" "$Q8_GGUF" Q8_0
        echo "    Size: $(du -sh "$Q8_GGUF" | cut -f1)"

        echo ""
        echo "  [OK] All quantizations finished"
    fi
fi
# ---------------------------------------------------------------------------
# Summary
# ---------------------------------------------------------------------------
echo ""
echo "=================================================================="
echo "  3B GGUF conversion complete"
echo ""
echo "  Output files:"
ls -lh "$OUT_DIR/${MODEL_NAME}"*.gguf 2>/dev/null | awk '{print "    " $5 "  " $9}' || \
    echo "    (list the files with: ls -lh $OUT_DIR/)"
echo ""
echo "  Next steps:"
echo "    bash scripts/deploy_3b_ollama.sh"
echo "    bash scripts/quality_gate.sh deploy"
echo "=================================================================="
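After the script finishes, you can check that each output is a well-formed GGUF file by reading its fixed-size header. For GGUF v2 and later, the header is:

- the 4-byte magic `GGUF`;
- a little-endian `uint32` format version;
- two `uint64` counts, one for tensors and one for metadata key/value pairs.

The sketch below reads that header. It also compares each file's size with a rough estimate from nominal bits per weight:

- 16 for F16;
- 8.5 for Q8_0 (32 int8 values plus one fp16 scale per block);
- the base k-quant rates of 5.5 and 4.5 for Q5_K and Q4_K.

The `_M` variants keep some tensors at higher precision, so real Q5_K_M and Q4_K_M files should come out somewhat larger than this estimate. The 3.0e9 parameter count and the helper names are assumptions for illustration. The default paths follow the script's `Outputs` list.

```python
import struct
import sys
from pathlib import Path

# Nominal bits per weight (assumed); the _M mixes end up somewhat larger in practice.
NOMINAL_BPW = {"f16": 16.0, "Q8_0": 8.5, "Q5_K_M": 5.5, "Q4_K_M": 4.5}


def read_gguf_header(path: Path) -> tuple[int, int, int]:
    """Return (version, tensor_count, metadata_kv_count) from a GGUF v2+ header."""
    with open(path, "rb") as f:
        header = f.read(24)  # 4-byte magic + uint32 version + two uint64 counts
    if len(header) < 24 or header[:4] != b"GGUF":
        raise ValueError(f"{path} is not a GGUF file")
    (version,) = struct.unpack("<I", header[4:8])
    tensor_count, kv_count = struct.unpack("<QQ", header[8:24])
    return version, tensor_count, kv_count


def estimate_bytes(n_params: float, bpw: float) -> float:
    """Rough file size: parameters x bits per weight / 8 (ignores metadata)."""
    return n_params * bpw / 8


def check_outputs(out_dir: str = "outputs/gguf", model_name: str = "frankenstallm-3b",
                  n_params: float = 3.0e9) -> None:
    """Print header info and actual vs. nominal size for each expected output file."""
    for quant, bpw in NOMINAL_BPW.items():
        path = Path(out_dir) / f"{model_name}-{quant}.gguf"
        if not path.exists():
            print(f"{quant:7s} missing: {path}")
            continue
        version, n_tensors, n_kv = read_gguf_header(path)
        actual_gb = path.stat().st_size / 1e9
        est_gb = estimate_bytes(n_params, bpw) / 1e9
        print(f"{quant:7s} v{version} tensors={n_tensors} kv={n_kv} "
              f"size={actual_gb:.2f} GB (~{est_gb:.2f} GB nominal)")


if __name__ == "__main__":
    # Optional first argument overrides the output directory.
    check_outputs(*sys.argv[1:2])
```

A file that fails the magic check or is far smaller than its nominal estimate usually points to an interrupted conversion or quantization run.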