Coild / prepare_dataset.py

Commit History

Add domain and subdomain handling in dataset preparation and admin routes; enhance data submission with new fields
92a1315

loko-dev committed on

Refactor audio processing to implement fade-in, end trim, and fade-out effects for improved sound quality
f7fea0f

loko-dev committed on

Update server-side audio trimming logic to adjust duration from 150ms to 200ms for improved accuracy
820b726

loko-dev committed on

Update audio trimming logic to adjust duration and sample calculations for improved accuracy
7372bcc

loko-dev committed on

Enhance PCM data processing by implementing trimming at both start and end, improving audio quality and consistency
a8e78ba

loko-dev committed on

Enhance audio processing by implementing server-side trimming and updating metadata handling for PCM recordings
7dbf872

loko-dev committed on

Refactor audio saving process to use PCM data with wave module, enhancing compatibility and flexibility in audio file handling
0f1d16d

loko-dev committed on

Remove unused function generate_verified_parquets from prepare_dataset.py to streamline code and improve maintainability
a93c6eb

loko-dev committed on

Adjust sync interval to 3 minutes; refactor SiriWave initialization and resize handling for improved performance
d1ae720

loko-dev committed on

Enhance login functionality and UI; add email visibility option for OAuth2, adjust admin CSS for responsive design, and streamline audio metadata handling in storage
adcee47

loko-dev committed on

Refactor metadata handling in audio recording; streamline local file operations and remove unnecessary metadata storage calls
5ebeb52

loko-dev committed on

Enhance metadata handling and validation in recording storage; add transcription ID management and adjust scheduler timing
1fde105

loko-dev committed on

Update transcriptions table to include user_id and language fields; modify uploaded_at to use TIMESTAMP WITHOUT TIME ZONE. Adjust generate_verified_parquets function to use SYNC_INTERVAL for scheduling.
956c93b

loko-dev committed on

Refactor audio processing to remove Hugging Face upload functionality and add verified metadata handling for local storage; implement daily parquet generation for verified recordings.
a97a572

loko-dev committed on

Implement PostgreSQL metadata storage and create database tables for transcriptions and metadata
26337ba

loko-dev committed on

Refactor user session handling: switch to attribute access for Record objects, ensure session user is a dict, and improve stats handling in dataset preparation
707d5e0

loko-dev committed on

Implement dataset synchronization with Hugging Face: add scheduler for automated uploads, update audio path structure, and enhance metadata handling for Supabase.
cfad1b3

loko-dev committed on

Add age group and accent fields to session form and update related logic
221c6f9

loko-dev committed on

Refactor Supabase client initialization and Hugging Face upload logic to ensure proper authentication checks
b7bf589

loko-dev committed on

Add saving state management to the recorder interface, disabling controls during save operations
6bdf6a3

loko-dev committed on

Refactor dataset preparation to support language-specific audio storage and metadata management; streamline directory creation and improve error handling.
d7c78ed

loko-dev committed on

Refactor dataset handling by updating .gitignore, implementing stats and recordings management in parquet format, and modifying filename generation to use a shortened user ID prefix.
e0735ac

loko-dev committed on

Add database migrations for recordings and profiles, implement user profile creation, and enhance metadata handling
3f937d8

loko-dev committed on

Add initial project setup with Docker, environment configuration, and authentication middleware
5a8e751

loko-dev committed on