AetherArchitectural

community

Verified

https://arch.datasets.fyi

AI & ML interests

Aetherarchio, Lewdiculous and FantasiaFoundry's general AI, ML and LLM-related community projects. [aetherarchitectural@datasets.fyi]

Recent Activity

Lewdiculous updated a Space 2 days ago

AetherArchitectural/README

Lewdiculous updated a model 3 days ago

AetherArchitectural/EXAONE-3.5-7.8B-Instruct-abliterated-GGUF-ARM-Imatrix-Community

Lewdiculous updated a collection 3 days ago

Quantizations

AetherArchitectural's activity

Lewdiculous

updated a Space 2 days ago

Running

🔹

AboutAetherArchitectural

Welcome to the AetherArchitectural Community!

Lewdiculous

updated a model 3 days ago

AetherArchitectural/EXAONE-3.5-7.8B-Instruct-abliterated-GGUF-ARM-Imatrix-Community

Updated 3 days ago • 119 • 4

Lewdiculous

updated a collection 3 days ago

Quantizations

Collection

Newest items at the bottom of the list. • 2 items • Updated 2 days ago • 1

Lewdiculous

posted an update 3 days ago

Post

1385

Hello fellow LLMers, just a quick notice that some of my activity will be moved into the AetherArchitectural Community and split with @Aetherarchio.

[here] https://huggingface.co/AetherArchitectural

All activity should be visible on the left side of my profile.

1 reply

Aetherarchio

updated a model 5 days ago

AetherArchitectural/EXAONE-3.5-2.4B-Instruct-abliterated-GGUF-IQ-ARM-Imatrix-Community

Updated 5 days ago • 256 • 3

Aetherarchio

updated a collection 5 days ago

Quantizations

Collection

Newest items at the bottom of the list. • 2 items • Updated 2 days ago • 1

Aetherarchio

updated a model 5 days ago

AetherArchitectural/GGUF-Quantization-Script

Text Generation • Updated 5 days ago • 63

Aetherarchio

in AetherArchitectural/README 5 days ago

Welcome to AetherArchitectural!

#1 opened 5 days ago by

Aetherarchio

updated a Space 5 days ago

Running

🔹

AboutAetherArchitectural

Welcome to the AetherArchitectural Community!

Lewdiculous

in AetherArchitectural/Community-Discussions about 1 month ago

Emphasis DFSM - Nexesenex/kobold.cpp

#18 opened 5 months ago by

Lewdiculous

updated a collection about 1 month ago

Social

Collection

1 item • Updated 3 days ago • 2

Lewdiculous

in AetherArchitectural/Community-Discussions about 1 month ago

Sampling Resources and Conjecture

#2 opened 8 months ago by

Clevyby

Llama-3 SillyTavern Presets Sharing

#5 opened 8 months ago by

Lewdiculous

updated a model about 1 month ago

AetherArchitectural/Community-Discussions

Updated Nov 19 • 21

Lewdiculous

updated a collection about 1 month ago

Tools

Collection

1 item • Updated 3 days ago • 3

FantasiaFoundry

updated a model about 1 month ago

AetherArchitectural/GGUF-Quantization-Script

Text Generation • Updated 5 days ago • 63

Lewdiculous

posted an update 7 months ago

Post

42441

More context for your Pascal GPU or older!

Update: Now available in the official releases of KoboldCpp!
[releases] https://github.com/LostRuins/koboldcpp/releases/latest

This is great news for all users with GTX 10XX, P40...

A Flash Attention implementation for older NVIDIA GPUs that does not require Tensor Cores has come to llama.cpp in the last few days and should be merged in the next version of KoboldCpp. You can already try it with another fork or by building it yourself.

[Mentioned KCPP fork] https://github.com/Nexesenex/kobold.cpp/releases/latest

[PR] https://github.com/ggerganov/llama.cpp/pull/7188

You should expect less VRAM usage for the same context size, allowing you to run larger contexts with your current GPU.
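For a rough feel of where the savings come from, here is a minimal back-of-the-envelope sketch. It only computes the size of the full attention score matrix that a naive implementation would hold in memory at once, which Flash Attention avoids by working in tiles. The numbers used (32 heads, a batch of 512 tokens, 4-byte floats) are illustrative assumptions, not measurements from llama.cpp or KoboldCpp, and real buffer sizes will differ.

```python
def naive_score_matrix_bytes(n_heads: int, q_len: int, kv_len: int,
                             bytes_per_elem: int = 4) -> int:
    """Bytes needed to hold a full (heads x queries x keys) score matrix.

    This is the intermediate that naive attention materializes in one piece.
    All parameters are illustrative assumptions, not llama.cpp internals.
    """
    return n_heads * q_len * kv_len * bytes_per_elem


def to_mib(n_bytes: int) -> float:
    """Convert a byte count to mebibytes."""
    return n_bytes / (1024 ** 2)


if __name__ == "__main__":
    # Assumed values: 32 attention heads, 512 tokens processed per batch.
    for ctx in (4096, 8192, 16384):
        size = naive_score_matrix_bytes(n_heads=32, q_len=512, kv_len=ctx)
        print(f"context {ctx:>6}: {to_mib(size):8.1f} MiB")
```

The point is only that this intermediate grows linearly with context length for a fixed batch, so not having to store it whole leaves more VRAM for the context itself.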

There have also been reports of tokens/second speed improvements for inference, so that's also grand!

If you have tried it, I'd like to hear your experiences with --flashattention so far, especially for this implementation and for the large number of Pascal (GTX 10XX, P40...) cards.

Discussion linked below, with more links to relevant information:

https://huggingface.co/LWDCLS/LLM-Discussions/discussions/11

Cheers!

24 replies

Lewdiculous

posted an update 7 months ago

Post

43903

Updated: Lumimaid and TheSpice-v0.8.3

I have uploaded version 2 (v2) files for the Llama-3-Lumimaid-8B-v0.1-OAS GGUF Imatrix quants.

[model] Lewdiculous/Llama-3-Lumimaid-8B-v0.1-OAS-GGUF-IQ-Imatrix

You can recognize the new files by their v2 prefix.

Imatrix data was generated from the FP16, and the conversions were made directly from the BF16.
This should hopefully avoid any losses in the model conversion, which has been a much-discussed topic for Llama-3 and GGUF lately.

This is more disk and compute intensive, so let's hope we get GPU inference support for BF16 models in llama.cpp.

If you are able to test them and notice any issues compared to the original quants, let me know in the corresponding discussions.

---

Additionally, L3-TheSpice-8b-v0.8.3 GGUF Imatrix quants were also updated.

[model] Lewdiculous/L3-TheSpice-8b-v0.8.3-GGUF-IQ-Imatrix

7 replies

Team members 3
