Can't find zephyr-7b-beta cache using optimum cli list command.

#21
by Anurag2132 - opened

I am a beginner and am having trouble finding and loading the cache files I need for zephyr-7b-beta. I am using the commands given in the guides, but I get errors such as "repo not found". Can someone please help by giving the exact commands to find and load the model I mentioned?

AWS Inferentia and Trainium org

Hi, please make sure you have the latest version of optimum-neuron installed:

$ pip install -U optimum-neuron

Then type:

$ optimum-cli neuron cache lookup HuggingFaceH4/zephyr-7b-beta

*** 0 entrie(s) found in cache for HuggingFaceH4/zephyr-7b-beta for training.*** 

*** 12 entrie(s) found in cache for HuggingFaceH4/zephyr-7b-beta for inference.*** 
...

Hey, thank you for the response. I get this when I try that:

$ optimum-cli neuron cache lookup HuggingFaceH4/zephyr-7b-beta
usage: optimum-cli neuron cache [-h] {create,set,add,list,synchronize} ...
optimum-cli neuron cache: error: argument {create,set,add,list,synchronize}: invalid choice: 'lookup' (choose from 'create', 'set', 'add', 'list', 'synchronize')

Does lookup not work on Inf2?

AWS Inferentia and Trainium org

You don't seem to have the latest version of optimum-neuron, which is 0.0.20. With that version installed, lookup appears in the list of cache subcommands:

$ pip show optimum-neuron
Name: optimum-neuron
Version: 0.0.20
...
$ optimum-cli neuron cache -h
usage: optimum-cli neuron cache [-h] {create,set,add,synchronize,lookup} ...

positional arguments:
  {create,set,add,synchronize,lookup}
    create              Create a model repo on the Hugging Face Hub to store Neuron X compilation files.
    set                 Set the name of the Neuron cache repo to use locally (trainium only).
    add                 Add a model to the cache of your choice (trainium only).
    synchronize         Synchronize the neuronx compiler cache with a hub cache repo.
    lookup              Lookup the neuronx compiler hub cache for the specified model id.

options:
  -h, --help            show this help message and exit
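If you would rather check the installed version from Python, here is a minimal sketch. It assumes only that 0.0.20 is a version known from this thread to include `lookup`; earlier releases may or may not have it. The helper names (`version_tuple`, `at_least_known_good`, `installed_version`) are made up for this example.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Version shown in this thread to provide `optimum-cli neuron cache lookup`.
# Assumption: this is a known-good version, not necessarily the first one.
KNOWN_GOOD = (0, 0, 20)


def version_tuple(text):
    """Turn a version string such as '0.0.20' into a tuple of its first three numbers."""
    return tuple(int(n) for n in re.findall(r"\d+", text)[:3])


def at_least_known_good(installed):
    """Return True if the given version string is 0.0.20 or newer."""
    return version_tuple(installed) >= KNOWN_GOOD


def installed_version(package="optimum-neuron"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None
```

Calling `installed_version()` and passing a non-None result to `at_least_known_good` tells you whether `pip install -U optimum-neuron` is still needed.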

Thank you, updating optimum worked.
Is there also a way to download or load the NEFF files into my local environment so that I don't have to export a model? Sorry if this is a stupid question; this is not really my domain.

AWS Inferentia and Trainium org

If you export the model for one of the cached configurations (batch_size, sequence_length, auto_cast_type, num_cores), then the cached NEFFs will be fetched automatically (you'll see messages on the console).
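As a rough sketch of what such an export could look like in Python: the four values in `CACHED_CONFIG` are placeholders, not values taken from the cache, so replace them with one of the entries printed by `optimum-cli neuron cache lookup HuggingFaceH4/zephyr-7b-beta`. The function name `export_for_cached_config` is made up for this example, and the code needs to run on a Neuron instance with optimum-neuron installed.

```python
# Placeholder values (assumption): copy one real entry from the output of
# `optimum-cli neuron cache lookup HuggingFaceH4/zephyr-7b-beta` instead.
CACHED_CONFIG = {
    "batch_size": 1,
    "sequence_length": 2048,
    "num_cores": 2,
    "auto_cast_type": "bf16",
}


def export_for_cached_config(model_id="HuggingFaceH4/zephyr-7b-beta", config=None):
    """Export the model with a given configuration.

    If the configuration matches a cached entry, the export should fetch the
    cached NEFFs rather than compile the model from scratch.
    """
    # Imported here so the module can be loaded on a machine without Neuron support.
    from optimum.neuron import NeuronModelForCausalLM

    return NeuronModelForCausalLM.from_pretrained(
        model_id, export=True, **(config or CACHED_CONFIG)
    )


# Usage on an Inf2 instance (not executed here):
#   model = export_for_cached_config()
#   model.save_pretrained("zephyr-7b-beta-neuron")  # hypothetical output directory
```

If the configuration does not match any cached entry, the export falls back to a full compilation, which takes much longer.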

dacorvo changed discussion status to closed
