Glossolalia Dial LoRA

The trained "dial" behind Glossolalia Dial: a rank-16 LoRA on F5-TTS attention plus a co-trained scalar-to-vector network (LevelEmbed) that adds a dial value into the model's timestep embedding. One control grades a typed sentence from clean speech (dial 0) to wordless glossolalia (dial 4), in the same cloned voice.

Files

  • adapter_model.safetensors, adapter_config.json: the rank-16 LoRA (PEFT).
  • level_to_time.safetensors, level_to_time.json: the LevelEmbed MLP that maps the dial value into the time embedding.
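To make the LevelEmbed idea concrete, here is a minimal, dependency-free sketch of a scalar-to-vector MLP whose output is added element-wise to a timestep embedding. It is not the trained module: the class name LevelEmbedSketch, the hidden width, the SiLU activation, the random initialization, and the 4-dimensional embedding are all assumptions made for illustration, and the real network presumably lives in the repo's PyTorch code.

```python
import math
import random
from typing import List


def silu(x: float) -> float:
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


class LevelEmbedSketch:
    """Toy scalar-to-vector MLP: dial value -> vector of size `dim`.

    Hidden width, activation, and random init are illustrative
    assumptions, not the trained LevelEmbed.
    """

    def __init__(self, dim: int, hidden: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Layer 1 maps the scalar dial to `hidden` units.
        self.w1 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        # Layer 2 maps the hidden units to the embedding dimension.
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)]
                   for _ in range(dim)]
        self.b2 = [0.0] * dim

    def __call__(self, dial: float) -> List[float]:
        h = [silu(w * dial + b) for w, b in zip(self.w1, self.b1)]
        return [sum(w * x for w, x in zip(row, h)) + b
                for row, b in zip(self.w2, self.b2)]


def add_dial(time_emb: List[float], level_vec: List[float]) -> List[float]:
    """Add the dial vector element-wise into the timestep embedding."""
    if len(time_emb) != len(level_vec):
        raise ValueError("dimension mismatch")
    return [t + v for t, v in zip(time_emb, level_vec)]


embed = LevelEmbedSketch(dim=4)
time_emb = [0.1, -0.2, 0.3, 0.0]  # stand-in for the model's timestep embedding
conditioned = add_dial(time_emb, embed(4.0))
```

Because the toy uses zero biases, dial 0 leaves the embedding unchanged here; that is a property of this sketch, not a claim about the trained weights.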

Use

Load the adapter through the app's patched F5-TTS, which wires in the LoRA and LevelEmbed and exposes set_dial(0..4). Clone the repo first so that its patches/ module is importable, then run Python from inside the clone:

git clone https://github.com/akshan-main/glossolalia && cd glossolalia

Then, in Python:

from f5_tts.api import F5TTS
import patches  # local folder from the cloned repo; adds load_lora + set_dial to F5TTS

tts = F5TTS(model="F5TTS_v1_Base")  # base model the adapter was trained on
tts.load_lora("akshan-main/glossolalia-dial-lora")  # this adapter repo
tts.set_dial(4)  # 0 clean, 4 full glossolalia
tts.infer(ref_file="voice.wav", ref_text="...", gen_text="she sells seashells by the seashore", file_wave="out.wav")
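To hear the whole slide rather than a single setting, one option is to render the same sentence once per dial level. The helper below is a sketch, not part of the repo: render_dial_sweep, the sweep/ output folder, and the dial_N.wav naming are hypothetical, and it relies only on the set_dial and infer calls shown above.

```python
from pathlib import Path


def render_dial_sweep(tts, ref_file, ref_text, gen_text,
                      out_dir="sweep", levels=range(5)):
    """Render gen_text once per dial level and return the written paths.

    `tts` is any object exposing set_dial(level) and infer(...) as in the
    snippet above; the output naming scheme is an illustrative assumption.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for level in levels:
        tts.set_dial(level)  # 0 clean ... 4 full glossolalia
        wav_path = out / f"dial_{level}.wav"
        tts.infer(ref_file=ref_file, ref_text=ref_text,
                  gen_text=gen_text, file_wave=str(wav_path))
        paths.append(wav_path)
    return paths
```

With the tts object from above, render_dial_sweep(tts, "voice.wav", "...", "she sells seashells by the seashore") would write five files, one per dial level, in the same cloned voice.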

Training

The training set is 30k clips: the phonemes of 3,000 public-domain sentences were corrupted at five rising rates, and base F5-TTS read each version in two reference voices (3,000 × 5 × 2 = 30,000). The model sees only the clean sentence plus the dial value, so it must learn the slide toward glossolalia from the dial alone. Pipeline: github.com/akshan-main/glossolalia. Inputs: akshan-main/glossolalia-inputs.
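The data recipe can be sketched roughly as follows. This is an illustration under stated assumptions rather than the actual pipeline: the rate values, the toy phoneme inventory, and corruption by random substitution are all hypothetical (the description only says "five rising rates"), so see the pipeline repo for the real procedure.

```python
import random

# Hypothetical values: the description only says "five rising rates".
RATES = [0.0, 0.25, 0.5, 0.75, 1.0]
# Hypothetical toy phoneme inventory (ARPAbet-style symbols).
INVENTORY = ["AA", "IY", "UW", "S", "SH", "L", "B", "Z", "K", "M"]


def corrupt(phonemes, rate, rng):
    """Swap each phoneme for a random inventory symbol with probability `rate`."""
    return [rng.choice(INVENTORY) if rng.random() < rate else p
            for p in phonemes]


def build_targets(phonemes, seed=0):
    """Return one corrupted phoneme sequence per dial level 0..4."""
    rng = random.Random(seed)
    return {dial: corrupt(phonemes, rate, rng)
            for dial, rate in enumerate(RATES)}


sentence = ["SH", "IY", "S", "EH", "L", "Z"]  # "she sells"
targets = build_targets(sentence)

# Dataset size implied by the description: sentences x rates x voices.
n_clips = 3000 * len(RATES) * 2
```

In this sketch dial 0 keeps the phonemes intact and dial 4 replaces every one, which mirrors the clean-to-wordless range of the dial; each corrupted sequence would then be read aloud by the base model to produce the audio target.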
