Shisa.AI

company

AI & ML interests

None defined yet.

Recent Activity

leonardlin  updated a model about 2 months ago
kinokokoro/mistral-nemo-webnovels
leonardlin  updated a model 7 months ago
shisa-ai/shisa-v1-qwen2-7b
leonardlin  updated a model 7 months ago
shisa-ai/shisa-v1-llama3-70b-gguf

shisa-ai's activity

leonardlin 
posted an update 6 months ago
My weekend project ended up being some testing between torchtune, axolotl, and unsloth. I *think* it's a 1:1 comparison of what LoRA fine-tuning performance looks like across the different hardware I have in my dev boxes (4090, 3090, 7900 XTX, W7900), with a few other interesting tidbits.

Tonight I wrote up a WandB report (the panel editor is super broken in Firefox 😔) that sums up some of the more interesting bits from the results: https://wandb.ai/augmxnt/train-bench/reports/torchtune-vs-axolotl-vs-unsloth-Trainer-Comparison--Vmlldzo4MzU3NTAx
leonardlin 
posted an update 7 months ago
Interesting, I've just seen my first HF spam on one of my new model uploads: shisa-ai/shisa-v1-llama3-70b - someone has an SEO spam page as a HF space attached to the model!?! Wild. Who do I report this to?
leonardlin 
posted an update 7 months ago
For those with an interest in JA language models, this Llama 3 70B test ablation looks like it is the current strongest publicly released, commercially usable, open model available. A lot of caveats, I know, but it also matches gpt-3.5-turbo-0125's JA performance, which is worth noting, and is tuned *exclusively* with the old shisa-v1 dataset (so its chart position will be very short-lived).

shisa-ai/shisa-v1-llama3-70b

augmxnt/ultra-orca-boros-en-ja-v1
leonardlin 
posted an update 7 months ago