Steel PRO

Steelskull

AI & ML interests

A dude making models as a hobby. Looking for an old model? --> https://huggingface.co/SteelStorage

Posts 1

@elinas and I (@Steelskull) have been working on a new rendition of the Aethora-15B model. It's built on the Llama 3 architecture, and we've optimized it especially for creative writing tasks (both kinds ;D) while maintaining strong general intelligence capabilities.

Model: L3-Aethora-15B-V2
ZeusLabs/L3-Aethora-15B-V2

Dataset: Aether-Lite-v1.8.1
TheSkullery/Aether-Lite-v1.8.1

What we've built:
A modified DUS (Depth Up-Scaling) model (originally created by Elinas): a passthrough merge brings it to 15B parameters, with specific adjustments (zeroing) to 'o_proj' and 'down_proj' that improve its efficiency and reduce perplexity

Trained for 17.5 hours on 4 x A100 GPUs (huge thanks to g4rg for sponsoring the compute!)

Uses our Aether-Lite-V1.8.1 dataset, a large set of 125k high-quality samples
Focuses on creative writing and storytelling, with robust general intelligence
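
For anyone wondering why zeroing 'o_proj' and 'down_proj' matters in a depth up-scaled stack, here is a minimal toy sketch in plain Python. It is not our merge recipe: the block layout, the sizes, the layer range, and the helper names (`make_block`, `depth_upscale`, `run`) are all assumptions made up for illustration. It only shows the general idea that a duplicated block whose two output projections are zero adds nothing to the residual stream, so the taller stack starts out behaving exactly like the original one before training.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def rms_norm(x, eps=1e-6):
    """RMS normalization without a learned scale (toy version)."""
    scale = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / scale for v in x]

def silu(v):
    return v / (1.0 + math.exp(-v))

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def zeros_like(W):
    return [[0.0] * len(row) for row in W]

def make_block(d_model, d_ff, rng):
    """Hypothetical Llama-style block, reduced to the weights this demo needs."""
    return {
        "v_proj": rand_matrix(d_model, d_model, rng),
        "o_proj": rand_matrix(d_model, d_model, rng),
        "gate_proj": rand_matrix(d_ff, d_model, rng),
        "up_proj": rand_matrix(d_ff, d_model, rng),
        "down_proj": rand_matrix(d_model, d_ff, rng),
    }

def block_forward(block, x):
    """Pre-norm residual block for a single token.

    With one token, softmax attention puts all weight on that token,
    so the attention output is just o_proj(v_proj(norm(x))).
    """
    attn = matvec(block["o_proj"], matvec(block["v_proj"], rms_norm(x)))
    h = [a + b for a, b in zip(x, attn)]
    n = rms_norm(h)
    gate = matvec(block["gate_proj"], n)
    up = matvec(block["up_proj"], n)
    mlp = matvec(block["down_proj"], [silu(g) * u for g, u in zip(gate, up)])
    return [a + b for a, b in zip(h, mlp)]

def depth_upscale(blocks, start, end):
    """Repeat blocks[start:end] once (passthrough-style) with the copies'
    o_proj and down_proj zeroed. The range choice here is an assumption."""
    copies = []
    for block in blocks[start:end]:
        copy = dict(block)
        copy["o_proj"] = zeros_like(block["o_proj"])
        copy["down_proj"] = zeros_like(block["down_proj"])
        copies.append(copy)
    return blocks[:end] + copies + blocks[end:]

def run(blocks, x):
    for block in blocks:
        x = block_forward(block, x)
    return x

rng = random.Random(0)
base = [make_block(4, 8, rng) for _ in range(3)]   # toy 3-block "model"
upscaled = depth_upscale(base, 1, 3)               # 5 blocks after stacking
x = [rng.uniform(-1.0, 1.0) for _ in range(4)]

same = all(math.isclose(a, b) for a, b in zip(run(base, x), run(upscaled, x)))
print(len(base), len(upscaled), same)
```

The zeroed copies act as identity blocks at first, and fine-tuning is what gives them something useful to do.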

What makes L3-Aethora-15B-V2 unique:
Creative Writing: We've really pushed its capabilities in generating engaging narratives and poetry, and in adapting to various writing styles, genres, and RP.

Versatile Intelligence: While we focused on creative tasks, it still handles scientific discussions, problem-solving, and educational content creation like a champ.

Long Context Understanding: Trained on the full sequence length of 8192 tokens, it maintains coherent conversations over extended interactions.

Carefully Curated Dataset: A lot of work was put into Aether-Lite-V1.8.1, our training dataset. It combines creative writing, instructional content, and specialized knowledge from various high-quality sources, all brought together by a custom data pipeline. (More information on the process is available on the dataset page.)

Open Source: We've made both the model and the full dataset available to the community.
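
If you want to try both, a rough loading sketch with the Hugging Face `transformers` and `datasets` libraries might look like this. The two repo IDs are the ones listed above; everything else (the helper names, the `device_map="auto"` choice, which needs `accelerate` installed, the token budget, and the plain-text prompt with no chat template) is an assumption for illustration rather than an official recipe.

```python
MODEL_ID = "ZeusLabs/L3-Aethora-15B-V2"
DATASET_ID = "TheSkullery/Aether-Lite-v1.8.1"
MAX_CONTEXT = 8192  # sequence length the model was trained on

def load_model(model_id=MODEL_ID):
    """Download the tokenizer and weights (large download, needs a big GPU)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",   # use the dtype stored in the checkpoint
        device_map="auto",    # assumption: accelerate is installed
    )
    return tokenizer, model

def load_training_data(dataset_id=DATASET_ID):
    """Fetch the dataset; split and column names are not assumed here."""
    from datasets import load_dataset

    return load_dataset(dataset_id)

def generate(tokenizer, model, prompt, max_new_tokens=200):
    """Generate a continuation, keeping prompt plus output within MAX_CONTEXT."""
    inputs = tokenizer(
        prompt,
        return_tensors="pt",
        truncation=True,
        max_length=MAX_CONTEXT - max_new_tokens,
    ).to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)

if __name__ == "__main__":
    tok, mdl = load_model()
    print(generate(tok, mdl, "Write the opening line of a sea shanty."))
```

Check the model card for the recommended prompt format before judging output quality, since this sketch sends the prompt as raw text.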

We'd love your ideas and recommendations for further improvements!
