# TODO - FastAPI - Generate: - other params? (temperature, rvc f0_method, min_eos_p) - Bark TTS - Set speech sentiment in request body, then prefix each sentence e.g. `[Happy]` - Should I aim to combine sentences which will fit in the largest clip length? (14s?) - More consistent tone etc in bark output? - Should map the rvc model to a chosen bark voice (incl. default)? - Or set via request body? - Currently hardcoded for bark voice `v2/en_speaker_9` (works well with all tested RVC models, regardless of gender etc) - RVC - Should I re-use the Config details? (GPU info etc) - Set `CUDA_VISIBLE_DEVICES` for bark - Should I load hubert model for each request? Precious VRAM - Smaller container image - Currently used to confirm app dependencies only, don't care that it's 6 GiB # Issues - Splits on sentences. So a single sentence which takes longer than ~14 seconds will be a mess - Significantly slower bark generation when ran via API vs directly in python script - Generation time roughly equals audio length (tested on 3090)