4 7

Elize Veldhuizen

Elizezen

AI & ML interests

None yet

Recent Activity

published a model about 2 months ago

Elizezen/SlaughterHouse-exp-nsfw-7B

published a model about 2 months ago

Elizezen/SlaughterHouse-exp-7B

updated a model about 2 months ago

Elizezen/SlaughterHouse-exp-nsfw-7B

View all activity

Organizations

Elizezen's activity

published 2 models about 2 months ago

Elizezen/SlaughterHouse-exp-nsfw-7B

Text Generation • Updated Jan 27 • 339 • 3

Elizezen/SlaughterHouse-exp-7B

Text Generation • Updated Jan 27 • 27 • 1

updated 2 models about 2 months ago

Elizezen/SlaughterHouse-exp-nsfw-7B

Text Generation • Updated Jan 27 • 339 • 3

Elizezen/SlaughterHouse-exp-7B

Text Generation • Updated Jan 27 • 27 • 1

updated a model 3 months ago

Elizezen/Kudryavka-8B-alpha

Updated Dec 25, 2024 • 18 • 2

liked a dataset 5 months ago

botp/RyokoAI_Syosetu711K

Preview • Updated Aug 18, 2023 • 220 • 23

liked 2 models 6 months ago

Aratako/calm3-22b-RP-v2-GGUF

Updated Sep 16, 2024 • 503 • 4

Aratako/calm3-22b-RP-v2

Text Generation • Updated Sep 16, 2024 • 180 • 11

updated 2 models 7 months ago

Elizezen/Himeyuri-v0.1-12B-GGUF

Updated Aug 26, 2024 • 36 • 3

Elizezen/Himeyuri-v0.1-12B

Updated Aug 26, 2024 • 117 • 14

reacted to louisbrulenaudet's post with 🔥 9 months ago

Post

2968

Announcing the creation of the "HF for Legal" organization, an open-source community dedicated to demystifying language models for legal professionals 🤗

Whether you're a practicing attorney, a legal scholar, or a technologist interested in legal applications of AI, HF for Legal may be your hub for exploration, learning, and free innovation ⚗️

On the occasion of this launch, you'll be able to find several notebooks I've been developing over the last few months for TSDAE pre-training of embedding models, the generation of indexes for semantic search, based on the formidable work of @tomaarsen and @nreimers , adapted to the field of French law, or the addition of information retrieval tasks to the MTEB.

Join us in our mission to make AI more accessible and understandable for the legal world, ensuring that the power of language models can be harnessed effectively and ethically.

Link to the org: https://huggingface.co/HFforLegal

Special thanks to @clem for encouraging me to start this organization. Let's hope we can bring together all the enthusiasts who work in this field.

Let's code and share together! 🚀🔗

updated 2 models 9 months ago

Elizezen/Berghof-ERP-7B

Text Generation • Updated Jun 21, 2024 • 117 • 10

Elizezen/Berghof-NSFW-7B

Text Generation • Updated Jun 21, 2024 • 1.6k • 16

reacted to lunarflu's post with ❤️ 10 months ago

Post

1984

cooking up something....anyone interested in a daily activity tracker for HF?

12 replies

liked a model 10 months ago

dddump/Japanese-Chat-Evolve-TEST-7B-NSFW-gguf

Text Generation • Updated May 18, 2024 • 536 • 18

updated 2 models 10 months ago

Elizezen/Berghof-vanilla-7B

Text Generation • Updated May 23, 2024 • 12 • 3

Elizezen/Wolfsschanze-2x7B

Text Generation • Updated May 20, 2024 • 7

liked a dataset 10 months ago

AkitoP/Hscene-Speech

Updated May 25, 2024 • 137 • 53

posted an update 10 months ago

Post

3469

It turned out that the following simple method seems to be actually effective when you want to increase the appearance probability of only one or a very limited number of tokens.

import os

one_token = "♡" # Token to increase the appearance probability
value = 1000000

token = one_token * value

with open("one-token.txt", "w", encoding="utf-8") as f:
    f.write(token)

By training LoRA with unsloth based on the .txt file generated by the code above, you can increase the appearance probability of specific tokens while maintaining the model's performance to great extent. However, it's better to stop the training before train loss becomes 0.0, as it will start spamming the token once it appears even once. In general, you can stop training at a very early stage and it will still work.

It is also possible to reduce the appearance probability of specific tokens by creating an over-learned LoRA with the specific tokens you want to reduce, combining it with the model, and then creating a model that extracts only the difference using the chat vector method and subtracting it from an arbitrary model.

In this case, it is better to set the ratio of chat vector to about five times. It has very little effect on the overall performance, apart from the specific tokens.

new_v = v - (5.0 * chat_vector[i].to(v.device))

New activity in Local-Novel-LLM-project/Vecteus-v1 10 months ago

Weird tendency for the token "紫"

#1 opened 10 months ago by

Elizezen