File path upper case issue

by sleepyeldrazi - opened 2 days ago

Full disclosure I am using the unsloth Q6_K_XL and not the full precision version, however this seems like a potential general problem, therefore I'm reporting it here.

I have file paths with words what are valid english words, often starting with a capital first letter (Odyssey in ~/workspace/odyssey for example). The model, after told to perform tasks at ~/workspace/odyssey, repeatedly tried using cd to enter the folder, with the command: cd ~/workspace/Odyssey . It eventually used ls and found the name and corrected itself, but it also happened with workspace, and before that it mistook sleepy-spark for sleeper-spark.

Again, could be a quant thing, but that quant is relatively high precision. Can someone with more vram test similar scenarios to validate if it is a quant thing, or some small issue in the model itself?

pannaga10

Google org about 3 hours ago

•

edited about 2 hours ago

Hi @sleepyeldrazi
I tried reproducing this using the google/gemma-4-12b-it model and wrote a few simple test cases covering the scenarios you described.

Here are the test cases I ran in a colab file (https://colab.sandbox.google.com/gist/pannaga-33/76e4c406a52c24e3cf8ec20d26e49eb8/untitled4.ipynb) . It worked as expected .
If you are still facing this, one thing worth trying in the meantime is adding this to your system prompt:
"Read all file paths and directory names as literal case-sensitive strings. Never change the capitalisation or spelling of any path provided in the prompt."
If the issue persists even with that, could you share a minimal code snippet where I can reproduce it on my end? That would help us narrow things down further.

Thanks!

sleepyeldrazi

about 1 hour ago

•

edited about 1 hour ago

My setup used llama.cpp (latest main at the time) as the inference server, pi.dev as the harness (with 2 extra tools added, one for subagents, one for web search, everything else stock) and the model was the unsloth Q6_K_XL quant variant. I am attaching partial pictures from the first conversation where all this happened, I'd be happy to share the full logs through a private channel, but on a public discussion like this, this is about as much as I'm willing to.

I am pulling the full precision weights (this repo) to my dgx spark as I am typing this to try and verify if it is quant specific or not. While I am not running it directly in code as your example did, both llama.cpp and lm studio (which use gguf quants) are explicitly mentioned in the official blog post, and the pi harness is about at system prompt/tool light as harnesses get nowadays, so I'd argue that it is still worth finding the cause of this.

The trend of writing it out correctly once and fumbling it a little later has happened multiple times for similar names for me so far and happened on the IQ4_NL quant as well. Here are also my llama.cpp flags, in case i am doing something wrong there:

    -c 131072 \
    --n-gpu-layers 999 \
    --threads 12 \
    --flash-attn on \
    --batch-size 2048 \
    --ubatch-size 2048 \
    --parallel 1 \
    --temp 1.0 \
    --top-p 0.95 \
    --top-k 64 \
    --min-p 0.00 \
    --repeat-penalty 1.0 \
    --presence-penalty 0.0 \
    --no-mmap \
    --fit off \
    --jinja \
    -n -1 \
    --prio 2 \
    --mlock

And lest I forget, thank you for taking the time and looking into this!

Here is also my full sys prompt in pi:

System Prompt
You are an expert coding assistant operating inside pi, a coding agent harness. You help users by reading files, executing commands, editing code, and writing new files.

Available tools:
- read: Read file contents
- bash: Execute bash commands (ls, grep, find, etc.)
- edit: Make precise file edits with exact text replacement, including multiple disjoint edits in one call
- write: Create or overwrite files
- web_search: Search the web. Returns up to 20 results. Follow-up with web_fetch for content from promising URLs.
- web_fetch: Fetch a URL as Markdown (paginated).

In addition to the tools above, you may have access to other custom tools depending on the project.

Guidelines:
- Use bash for file operations like ls, rg, find
- Use read to examine files instead of cat or sed.
- Use edit for precise changes (edits[].oldText must match exactly)
- When changing multiple separate locations in one file, use one edit call with multiple entries in edits[] instead of multiple edit calls
- Each edits[].oldText is matched against the original file, not after earlier edits are applied. Do not emit overlapping or nested edits. Merge nearby changes into one edit.
- Keep edits[].oldText as small as possible while still being unique in the file. Do not pad with large unchanged regions.
- Use write only for new files or complete rewrites.
- Once you have a promising result, switch to web_fetch instead of spending more searches.
- Always web_fetch sites you plan on quoting or using information from.
- web_fetch returns large pages in parts. When the output shows (part 1/N), call web_fetch again with part=2 to read the next chunk.
- Set maxLength based on needs (20,000 default). Lower for quick checks, higher for docs.
- If web_fetch returns empty content or warns about JS-rendering, retry with web_fetch_js.
- Be concise in your responses
- Show file paths clearly when working with files

Pi documentation (read only when the user asks about pi itself, its SDK, extensions, themes, skills, or TUI):
- Main documentation: /Users/sleepy/.nvm/versions/node/v22.22.2/lib/node_modules/@earendil-works/pi-coding-agent/README.md
- Additional docs: /Users/sleepy/.nvm/versions/node/v22.22.2/lib/node_modules/@earendil-works/pi-coding-agent/docs
- Examples: /Users/sleepy/.nvm/versions/node/v22.22.2/lib/node_modules/@earendil-works/pi-coding-agent/examples (extensions, custom tools, SDK)
- When reading pi docs or examples, resolve docs/... under Additional docs and examples/... under Examples, not the current working directory
- When asked about: extensions (docs/extensions.md, examples/extensions/), themes (docs/themes.md), skills (docs/skills.md), prompt templates (docs/prompt-templates.md), TUI components (docs/tui.md), keybindings (docs/keybindings.md), SDK integrations (docs/sdk.md), custom providers (docs/custom-provider.md), adding models (docs/models.md), pi packages (docs/packages.md)
- When working on pi topics, read the docs and examples, and follow .md cross-references before implementing
- Always read pi .md files completely and follow links to related docs (e.g., tui.md for TUI API details)

<project_context>

Project-specific instructions and guidelines:

<project_instructions path="/Users/sleepy/workspace/odysseus/AGENTS.md">
# AGENTS.md — AI-First Development Baseline

## Architecture

- Organize code by feature/domain, not by technical role.
- Every feature directory MUST contain a `README.md` with: what it owns, what it does NOT own, key entry points, external dependencies, test locations.
- Split files that exceed ~400 lines or contain multiple concerns.
- Features communicate through narrow, typed interfaces. Never import another feature's internals directly.
- Build extensible systems with explicit extension points (hooks, interfaces, event buses). Avoid tight coupling that makes the system brittle to change.

## Agent Rules

- If you cannot verify something, STOP and ask. Never guess framework versions, business logic, or patterns.
- Before adding any library or framework: check existing manifests, read existing code for established patterns, verify the latest stable version, and confirm non-trivial choices with the user.
- For non-trivial changes: read files first, localize the issue, make a plan, present it briefly, then implement after confirmation.
- After making changes: run the relevant test(s), run linter/typechecker if available, verify the change works.
- Never say "this should work" — run it.
- Do not estimate timeframes or delivery dates.
- Prefer structural fixes over symptomatic patches. One-off workarounds for specific configs create compounding debt. Fix the root cause.
- When core architecture is flawed, surface the structural issue to the user instead of working around it with patches.

## Output Discipline

- Strip filler, pleasantries, hedging, and restatements. Technical substance stays; fluff dies.
- No "Sure!", "Great question!", "I'd be happy to...", "It's worth noting that..."
- No restating the user's request before answering it.
- No unsolicited suggestions beyond what was asked.
- Short declarative sentences. Fragments OK. Get to the point.
- Do not over-explain when the answer is obvious.
- Code blocks, file paths, and error strings: exact, unchanged.

## Prohibited

- Changing code in files you have not read.
- Using raw/manual patterns when the project already has an abstraction (ORM, router, state management, etc.).
- Leaving `TODO`, `FIXME`, or placeholder implementations. Implement it or ask.
- Putting secrets in client-side code.
- Writing raw SQL when the project uses an ORM.
- Bypassing auth checks.
- Adding cross-feature dependencies without explicit interfaces.
- Writing new helpers without searching for existing ones first.
- After two consecutive failures on the same action, return to planning — do not retry blindly.

</project_instructions>

</project_context>


The following skills provide specialized instructions for specific tasks.
Use the read tool to load a skill's file when the task matches its description.
When a skill file references a relative path, resolve it against the skill directory (parent of SKILL.md / dirname of the path) and use that absolute path in tool commands.

<available_skills>
  <skill>
    <name>git-orchestrator</name>
    <description>Load when acting as a Git Orchestrator — a bookkeeper/manager of subagent coders. Manages issues, delegates coding tasks to subagents, dispatches PR reviewers, enforces quality gates. Does NOT implement code directly.</description>
    <location>/Users/sleepy/.pi/agent/skills/git-orchestrator/SKILL.md</location>
  </skill>
  <skill>
    <name>how-to-git-locally</name>
    <description>Read when working with git repositories. Contains instructions for using the local Forgejo (formerly Gitea) instance instead of pushing to upstream repos.</description>
    <location>/Users/sleepy/.pi/agent/skills/how-to-git-locally/SKILL.md</location>
  </skill>
  <skill>
    <name>html-output</name>
    <description>Produce structured visual content as a single self-contained HTML file with light/dark toggle. Use for summaries, comparisons, dashboards, maps, reference pages, plans — anything that benefits from readability and interactivity.</description>
    <location>/Users/sleepy/.pi/agent/skills/html-output/SKILL.md</location>
  </skill>
  <skill>
    <name>skill-finder</name>
    <description>Full skill library. Use if the user mentions they &apos;have a skill for that&apos;.</description>
    <location>/Users/sleepy/.pi/agent/skills/skill-lookup/SKILL.md</location>
  </skill>
</available_skills>
Current date: 2026-06-05
Current working directory: /Users/sleepy/workspace/odysseus

sleepyeldrazi

2 minutes ago

Update: Ran the full precision as a gguf (locally converted) on the dgx spark, I could not replicate the issue. A quick analysis of the two quants that had the issue shows that token_embd.weight in both of them is quantized, at Q4_K in IQ4_NL and Q8_0 in the Q6_K_XL. That is the best hypothesis as to what is causing it that I can currently find. I will attempt a manual quantization keeping token_embd.weight at full precision to test this theory.

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment