Unable to load in ollama built from PR branch

#10
by gigq - opened

I built llama.cpp inside ollama from the branch in the PR, yet whenever I try to load the IQ4_KM GGUF file I get an "invalid file magic" error. Does what I have here look right for ollama on Windows 11?

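(One quick sanity check on the download itself: a GGUF file should start with the four ASCII bytes "GGUF", so printing the file's first four bytes rules out a truncated or mis-named file. This is just a sketch; the path below is illustrative.)

```powershell
# Sketch: print the first four bytes of the downloaded GGUF (path is illustrative).
# A valid GGUF file begins with the ASCII magic "GGUF".
$path = "C:\Users\Justin\Downloads\model.IQ4_KM.gguf"
$fs   = [System.IO.File]::OpenRead($path)
$buf  = New-Object byte[] 4
$null = $fs.Read($buf, 0, 4)
$fs.Close()
[System.Text.Encoding]::ASCII.GetString($buf)   # expect: GGUF
```
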
PS C:\Users\Justin\Workspace\ollama> cd .\llm\llama.cpp\
PS C:\Users\Justin\Workspace\ollama\llm\llama.cpp> git status
HEAD detached at ea1aeba4
nothing to commit, working tree clean
PS C:\Users\Justin\Workspace\ollama\llm\llama.cpp> cd ..
PS C:\Users\Justin\Workspace\ollama\llm> cd ..
PS C:\Users\Justin\Workspace\ollama> $env:CGO_ENABLED="1"
PS C:\Users\Justin\Workspace\ollama> go generate ./...
Submodule path '../llama.cpp': checked out 'ea1aeba48b42f5f6ec41412ef271f5d708b0fa2f'
Updated 0 paths from the index
Checking for MinGW...

CommandType     Name                                               Version    Source
-----------     ----                                               -------    ------
Application     gcc.exe                                            0.0.0.0    C:\ProgramData\mingw64\mingw64\bin\gcc.exe
Application     mingw32-make.exe                                   0.0.0.0    C:\ProgramData\mingw64\mingw64\bin\mingw32-make.exe
Building static library
...
Generating build details from Git
  -- Found Git: C:/Program Files/Git/cmd/git.exe (found version "2.44.0.windows.1")
  build_info.vcxproj -> C:\Users\Justin\Workspace\ollama\llm\build\windows\amd64\cuda_v12.3\common\build_info.dir\Release\build_info.lib
  ggml.vcxproj -> C:\Users\Justin\Workspace\ollama\llm\build\windows\amd64\cuda_v12.3\ggml.dir\Release\ggml.lib
  llama.cpp
C:\Users\Justin\Workspace\ollama\llm\llama.cpp\llama.cpp(14181,9): warning C4297: 'llama_load_model_from_file': function assumed not to throw an exception but does [C:\Users\Justin\Workspace\ollama\llm\build\windows\amd64\cuda_v12.3\llama.vcxproj]
  C:\Users\Justin\Workspace\ollama\llm\llama.cpp\llama.cpp(14181,9):
  __declspec(nothrow), throw(), noexcept(true), or noexcept was specified on the function

  Auto build dll exports
     Creating library C:/Users/Justin/Workspace/ollama/llm/build/windows/amd64/cuda_v12.3/Release/llama.lib and object C:/Users/Justin/Workspace/ollama/llm/build/windows/amd64/cuda_v12.3/Release/llama.exp
  llama.vcxproj -> C:\Users\Justin\Workspace\ollama\llm\build\windows\amd64\cuda_v12.3\bin\Release\llama.dll
  llava.vcxproj -> C:\Users\Justin\Workspace\ollama\llm\build\windows\amd64\cuda_v12.3\examples\llava\llava.dir\Release\llava.lib
  common.vcxproj -> C:\Users\Justin\Workspace\ollama\llm\build\windows\amd64\cuda_v12.3\common\Release\common.lib
     Creating library C:/Users/Justin/Workspace/ollama/llm/build/windows/amd64/cuda_v12.3/ext_server/Release/ollama_llama_server.lib and object C:/Users/Justin/Workspace/ollama/llm/build/windows/amd64/cuda_v12.3/ext_server/Release/ollama_llama_server.exp
  ollama_llama_server.vcxproj -> C:\Users\Justin\Workspace\ollama\llm\build\windows\amd64\cuda_v12.3\bin\Release\ollama_llama_server.exe
gzip not installed, not compressing files
Updated 1 path from the index
Updated 1 path from the index

go generate completed.  LLM runners: cpu cpu_avx cpu_avx2 cuda_v12.3 cuda_v12.4
PS C:\Users\Justin\Workspace\ollama> go build .
PS C:\Users\Justin\Workspace\ollama> .\ollama.exe serve
time=2024-04-08T19:24:05.373-05:00 level=INFO source=images.go:793 msg="total blobs: 53"
time=2024-04-08T19:24:06.287-05:00 level=INFO source=images.go:800 msg="total unused blobs removed: 2"
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
 - using env:   export GIN_MODE=release
 - using code:  gin.SetMode(gin.ReleaseMode)

[GIN-debug] POST   /api/pull                 --> github.com/ollama/ollama/server.PullModelHandler (5 handlers)
[GIN-debug] POST   /api/generate             --> github.com/ollama/ollama/server.GenerateHandler (5 handlers)
[GIN-debug] POST   /api/chat                 --> github.com/ollama/ollama/server.ChatHandler (5 handlers)
[GIN-debug] POST   /api/embeddings           --> github.com/ollama/ollama/server.EmbeddingsHandler (5 handlers)
[GIN-debug] POST   /api/create               --> github.com/ollama/ollama/server.CreateModelHandler (5 handlers)
[GIN-debug] POST   /api/push                 --> github.com/ollama/ollama/server.PushModelHandler (5 handlers)
[GIN-debug] POST   /api/copy                 --> github.com/ollama/ollama/server.CopyModelHandler (5 handlers)
[GIN-debug] DELETE /api/delete               --> github.com/ollama/ollama/server.DeleteModelHandler (5 handlers)
[GIN-debug] POST   /api/show                 --> github.com/ollama/ollama/server.ShowModelHandler (5 handlers)
[GIN-debug] POST   /api/blobs/:digest        --> github.com/ollama/ollama/server.CreateBlobHandler (5 handlers)
[GIN-debug] HEAD   /api/blobs/:digest        --> github.com/ollama/ollama/server.HeadBlobHandler (5 handlers)
[GIN-debug] POST   /v1/chat/completions      --> github.com/ollama/ollama/server.ChatHandler (6 handlers)
[GIN-debug] GET    /                         --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] GET    /api/tags                 --> github.com/ollama/ollama/server.ListModelsHandler (5 handlers)
[GIN-debug] GET    /api/version              --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] HEAD   /                         --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] HEAD   /api/tags                 --> github.com/ollama/ollama/server.ListModelsHandler (5 handlers)
[GIN-debug] HEAD   /api/version              --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
time=2024-04-08T19:24:06.288-05:00 level=INFO source=routes.go:1121 msg="Listening on [::]:11434 (version 0.0.0)"
time=2024-04-08T19:24:06.298-05:00 level=INFO source=payload.go:28 msg="extracting embedded files" dir=C:\Users\Justin\AppData\Local\Temp\ollama4257825441\runners
time=2024-04-08T19:24:06.329-05:00 level=INFO source=payload.go:41 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v12.3]"
PS C:\Users\Justin\Workspace\ollama> .\ollama.exe create commandrplus -f C:\Users\Justin\Downloads\ModelFile
transferring model data
creating model layer
Error: invalid file magic
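
(The Modelfile itself is minimal; roughly the following, with FROM pointing at the downloaded GGUF. The filename shown here is illustrative, not the exact one used.)

```
# Hypothetical Modelfile contents; the GGUF filename is illustrative
FROM C:\Users\Justin\Downloads\c4ai-command-r-plus.IQ4_KM.gguf
```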

@gigq Not sure where the problem is, but you should take a look at this -> https://www.reddit.com/r/LocalLLaMA/comments/1bymeyw/command_r_plus_104b_working_with_ollama_using/

Thanks. Oddly, with that build of ollama and llama.cpp I'm able to load his model from https://ollama.com/sammcj/cohereforai_c4ai-command-r-plus but not yours. I was actually trying to load yours because, while the sammcj one loads, its output is a little off: small prompts seem fine, but as soon as I give it any real context I start getting repetitive output from the LLM, or even just garbage word-list output. I just wanted to try another model to see whether the issue was the model or my build.
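
(A sketch of how the symptom could be reproduced against the local server via /api/generate; the model name and prompt below are placeholders.)

```powershell
# Sketch: non-streaming generate request to the local ollama server.
# Model name and prompt are placeholders; the repetition shows up with longer prompts.
$body = @{
    model  = "sammcj/cohereforai_c4ai-command-r-plus"
    prompt = "<paste a long prompt with real context here>"
    stream = $false
} | ConvertTo-Json
Invoke-RestMethod -Uri "http://localhost:11434/api/generate" -Method Post -Body $body -ContentType "application/json"
```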

@gigq You might want to report this to sammcj. I see in the link you provided that he used my imatrix on his own FP16 weights, so maybe the issue is there? I always apply an imatrix to the same weights it was computed from, but I'm not sure whether that's a hard requirement.
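
(For context, this is roughly the workflow I mean, sketched with the llama.cpp tools; the filenames and quant type are illustrative. The imatrix is computed on the same FP16 GGUF that is then quantized with it.)

```powershell
# Sketch of the llama.cpp imatrix workflow (filenames and quant type illustrative).
# 1) Compute an importance matrix on the FP16 weights that will be quantized:
.\imatrix.exe -m .\command-r-plus-f16.gguf -f .\calibration.txt -o .\imatrix.dat
# 2) Quantize those same FP16 weights, passing the imatrix in:
.\quantize.exe --imatrix .\imatrix.dat .\command-r-plus-f16.gguf .\command-r-plus-IQ4_XS.gguf IQ4_XS
```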
