Shisa.AI

company

https://shisa.ai/

shisa-ai

Activity Feed

Recent Activity

leonardlin updated a model about 2 months ago

kinokokoro/mistral-nemo-webnovels

leonardlin updated a model 7 months ago

shisa-ai/shisa-v1-qwen2-7b

leonardlin updated a model 7 months ago

shisa-ai/shisa-v1-llama3-70b-gguf

shisa-ai's activity

leonardlin

updated a model about 2 months ago

kinokokoro/mistral-nemo-webnovels

Updated Oct 28 • 2

leonardlin

posted an update 6 months ago

Post

1897

My weekend project ended up being some testing between torchtune, axolotl, and unsloth. I *think* it's a 1:1 comparison of what LoRA fine-tuning performance looks like across the different hardware I have in my dev boxes (4090, 3090, 7900 XTX, W7900), with a few other interesting tidbits.

Tonight I wrote up a WandB report (the panel editor is super broken in Firefox 😔) that sums up some of the more interesting bits from the results: https://wandb.ai/augmxnt/train-bench/reports/torchtune-vs-axolotl-vs-unsloth-Trainer-Comparison--Vmlldzo4MzU3NTAx

leonardlin

posted an update 7 months ago

Post

2459

Maybe of interest, I just finished a long writeup of my weekend project exploring Qwen 2 7B Instruct's Chinese censorship: https://huggingface.co/blog/leonardlin/chinese-llm-censorship-analysis

I also have an accompanying model and dataset (and codebase) for those curious to poke around:

* augmxnt/Qwen2-7B-Instruct-deccp

* augmxnt/deccp

leonardlin

updated 10 models 7 months ago

leonardlin

posted an update 7 months ago

Post

1935

Interesting, I've just seen my first HF spam on one of my new model uploads: shisa-ai/shisa-v1-llama3-70b - someone has an SEO spam page as a HF space attached to the model!?! Wild. Who do I report this to?

4 replies

leonardlin

posted an update 7 months ago

Post

1606

For those with an interest in JA language models, this Llama 3 70B test ablation looks like it is currently the strongest publicly released, commercially usable, open model available. A lot of caveats, I know, but it also matches gpt-3.5-turbo-0125's JA performance, which is worth noting, and it is tuned *exclusively* with the old shisa-v1 dataset (so its chart position will be very short-lived).

shisa-ai/shisa-v1-llama3-70b

augmxnt/ultra-orca-boros-en-ja-v1

2 replies

leonardlin

updated 4 models 7 months ago

shisa-ai/shisa-v1-llama3-8b.2e5

Text Generation • Updated May 19 • 28

shisa-ai/shisa-v1-yi1.5-9b

Text Generation • Updated May 19 • 21 • 1

shisa-ai/shisa-v1-gemma-8b

Text Generation • Updated May 19 • 15

shisa-ai/shisa-v1-swallowmx-13a47b

Text Generation • Updated May 19 • 20

leonardlin

posted an update 7 months ago

Post

1942

With slurm figured out and ablations humming along, I thought I'd update and post my understanding of the legal status of training data in Japan. It is, in general, much clearer in the US: https://huggingface.co/blog/leonardlin/ai-training-data-in-japan

Team members 1
