New base model to fine-tune Hathor on: Replete-AI/Replete-Coder-Instruct-8b-Merged

#2
by Joseph717171 - opened

@Nitral-AI Try Replete-AI/Replete-Coder-Instruct-8b-Merged; it's Replete-AI/Llama3-8B-Instruct-Replete-Adapted TIES-merged with meta-llama/Meta-Llama-3-8B-Instruct. It's very coherent, retains Meta-Llama-3-8B-Instruct's behaviours, and is more willing to (E)RP with prompt injection. I feel it would make a great additional test bed for Hathor. Give it a go, if you feel so inclined. As always... Cheers! 😁

Here are GGUF Quantized versions of it to try:

I'll take a look at this one over the weekend when I have some more time.

I'm planning on restructuring the datasets before I do another FT, but a merge model that takes inspiration from this may go up as a test model. (The plan is to include Replete-Coder-Llama3-8B, Llama-3-Instruct-8B-SPPO-Iter3, and Hathor_Stable-v0.2.)

https://discord.com/channels/1212518493514367076/1212538781488652328/1260714601071509684

@Nitral-AI @rombodawg said that you can fix the adapter onto Llama-3-Instruct-8B-SPPO-Iter3 using Unsloth; then we wouldn't need to merge Llama-3-Instruct-8B-SPPO-Iter3 along with RepleteCoderLlama3-8B. 🤔

Just run the code to fine-tune Llama-3 with Unsloth, using Llama-3-Instruct-8B-SPPO-Iter3 as the base model and Replete-AI/code_bagel_hermes-2.5-1000 as the dataset. Then, instead of running

trainer_stats = trainer.train()

Run this

trainer.train(resume_from_checkpoint = "checkpoint-1000")

And make sure that in your Google Drive you make a folder named "checkpoint-1000" where you put the adapter files. If you get any errors, just delete the "optimizer.pt" file and run it again.
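For anyone who prefers to script the folder step, here is a minimal sketch using only the Python standard library. The function name `prepare_checkpoint_dir`, the `drop_optimizer` flag, and the example paths are hypothetical and not part of the Colab notebook; the only details taken from the instructions above are the folder name "checkpoint-1000" and the "optimizer.pt" file.

```python
from pathlib import Path
import shutil


def prepare_checkpoint_dir(adapter_dir, output_dir, name="checkpoint-1000",
                           drop_optimizer=False):
    """Copy adapter files into output_dir/name and return that folder.

    Hypothetical helper: it only arranges files on disk and does not
    touch the trainer itself.
    """
    src = Path(adapter_dir)
    dst = Path(output_dir) / name
    dst.mkdir(parents=True, exist_ok=True)

    # Copy every regular file from the adapter folder into the checkpoint folder.
    for f in src.iterdir():
        if f.is_file():
            shutil.copy2(f, dst / f.name)

    # The advice above: if resuming errors out, delete optimizer.pt and retry.
    optimizer_state = dst / "optimizer.pt"
    if drop_optimizer and optimizer_state.exists():
        optimizer_state.unlink()

    return dst


# Example (paths are placeholders, adjust to your own Drive layout):
# ckpt = prepare_checkpoint_dir("my_adapter_files", "outputs")
# trainer.train(resume_from_checkpoint=str(ckpt))
```

Passing `drop_optimizer=True` on a second attempt mirrors the "delete optimizer.pt and run it again" step without having to remove the file by hand.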

Oh, and I would reduce per_device_train_batch_size to 1, gradient_accumulation_steps to 1, num_train_epochs to 1, and warmup_steps to 5.
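As a rough illustration, the four settings above can be collected in one place like this. The dict name `training_kwargs` and the helper `effective_batch_size` are made up for this sketch; the keys are the parameter names quoted above, which are also keyword arguments of `transformers.TrainingArguments`. Everything else in the notebook's trainer setup is assumed to stay as it is.

```python
# The four values suggested above, gathered as keyword arguments.
# In the notebook they would be passed along the lines of
# TrainingArguments(**training_kwargs, ...) next to the existing settings.
training_kwargs = {
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 1,
    "num_train_epochs": 1,
    "warmup_steps": 5,
}


def effective_batch_size(kwargs, num_devices=1):
    """Samples consumed per optimizer step across all devices."""
    return (kwargs["per_device_train_batch_size"]
            * kwargs["gradient_accumulation_steps"]
            * num_devices)


# On a single GPU (such as a Colab instance) this prints 1.
print(effective_batch_size(training_kwargs))
```

With both the batch size and the accumulation steps at 1, each optimizer step sees a single sample per device.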

You can just use this Colab doc as a reference:
https://colab.research.google.com/drive/1VAaxMQJN9-78WLsPU0GWg5tEkasXoTP9?usp=sharing

I don't see the reason to expend additional resources (that I don't have atm) on fine-tuning when even Replete themselves went the route of merging, especially for a model that's supposed to tell me whether using their dataset is worthwhile or not. The version of Hathor I'm putting inside is an unreleased DPO of 0.2 stable (very experimental), I still don't know whether SPPO is worth it in end use, and I have my own programming sets that I'm much more familiar with than this coding model and that I know already work well with Hathor's data.

If I were going to do this myself, I would SFT over L3-Instruct as the base with all the datasets in question reformatted to L3-Instruct, then either DPO or SPPO from there. (Albeit I'm kinda hoping for a longer-context L3 before I do another fine-tune anyway.)

You are better off adapting it. I only merged the instruct model after adding the adapter to the instruct model and merging onto the original.

You do realize you can add the adapter onto your model completely for free with Google Colab, right? You don't need to "expend additional resources". Google Colab has a free version that you can use for this with any Google account.

Sorry to get back to this so late. Yes, I was aware that Colab has a free tier. However, I was not aware that 4-bit Unsloth QLoRA and GaLore would make it feasible to do this on Colab's free tier. Thank you for pointing that out. I will try to do so at some point, but I haven't had the time to fight with Python dependency hell to get it working properly.

Create a new conda environment just for this. Install all the necessary packages - dependency hell mitigated. 😋
