Fix NaNs in model output

#14

Running the example code from the model card can produce NaNs in the embedding output. The cause is the non-persistent freqs_cis buffer: because non-persistent buffers are not stored in the checkpoint, it is not initialized properly when the pre-trained model is loaded. This PR ensures that the frequencies are computed and moved onto the model's device after loading.
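A minimal PyTorch sketch of the failure mode and the fix. It assumes a LLaMA-style complex freqs_cis buffer for rotary position embeddings. The names ToyRotaryModel, precompute_freqs_cis and init_freqs_cis are hypothetical and are not the model's actual code. Filling the buffer with NaN stands in for the uninitialized memory that a loading path can leave behind. The sketch shows two things: load_state_dict never touches a non-persistent buffer, and recomputing the buffer on the weights' device after loading fixes the output.

```python
import torch
import torch.nn as nn


def precompute_freqs_cis(dim: int, max_len: int, theta: float = 10000.0,
                         device=None) -> torch.Tensor:
    # Rotary frequencies in the common LLaMA-style formulation (an assumption;
    # the real model may compute them differently).
    freqs = 1.0 / (theta ** (torch.arange(0, dim, 2, device=device).float() / dim))
    t = torch.arange(max_len, device=device).float()
    angles = torch.outer(t, freqs)                        # (max_len, dim // 2)
    return torch.polar(torch.ones_like(angles), angles)  # complex64 unit phasors


class ToyRotaryModel(nn.Module):
    """Hypothetical stand-in for the real model: one projection plus a RoPE buffer."""

    def __init__(self, dim: int = 8, max_len: int = 16):
        super().__init__()
        self.dim, self.max_len = dim, max_len
        self.proj = nn.Linear(dim, dim)
        # persistent=False: the buffer is excluded from the state dict, so it is
        # neither saved in nor restored from a checkpoint.
        self.register_buffer("freqs_cis", precompute_freqs_cis(dim, max_len),
                             persistent=False)

    def init_freqs_cis(self) -> None:
        # Hypothetical fix: recompute the buffer on the device the weights live on,
        # replacing whatever values it currently holds.
        device = self.proj.weight.device
        self.freqs_cis = precompute_freqs_cis(self.dim, self.max_len, device=device)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (seq_len, dim). Rotate consecutive feature pairs by the RoPE phasors.
        x = self.proj(x)
        xc = torch.view_as_complex(x.reshape(x.shape[0], -1, 2))
        return torch.view_as_real(xc * self.freqs_cis[: x.shape[0]]).flatten(1)


torch.manual_seed(0)
checkpoint = ToyRotaryModel().state_dict()  # stands in for pre-trained weights
model = ToyRotaryModel()
model.freqs_cis.fill_(float("nan"))         # stand-in for uninitialized memory
model.load_state_dict(checkpoint)           # restores proj.* only, not freqs_cis
x = torch.randn(4, 8)
before = model(x)                           # NaNs propagate into the output
model.init_freqs_cis()
after = model(x)                            # finite once the buffer is recomputed
```

Recomputing the buffer, rather than adding it to the checkpoint, keeps existing checkpoints loadable unchanged. The values are a pure function of the configuration, so they can always be regenerated.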
