Upload 6 files

- README.md +148 -14
- app.py +133 -0
- basic_pitch_handler.py +72 -0
- demucs_handler.py +80 -0
- requirements.txt +6 -0
- validators.py +37 -0
README.md
CHANGED
@@ -1,14 +1,148 @@
# Audio Processing Pipeline: Stem Separation and MIDI Conversion

## Project Overview
A production-ready web application that separates audio stems and converts them to MIDI using state-of-the-art deep learning models. Built with Gradio and deployed on Lightning.ai, this pipeline provides an intuitive interface for audio processing tasks.

## Technical Requirements

### Dependencies
```bash
# Quote version specifiers so the shell does not treat ">=" as a redirect
pip install "gradio>=4.0.0"
pip install "demucs>=4.0.0"
pip install "basic-pitch>=0.4.0"
pip install "torch>=2.0.0" "torchaudio>=2.0.0"
pip install "soundfile>=0.12.1"
pip install "numpy>=1.26.4"
pip install "pretty_midi>=0.2.10"
```

### File Structure
```
project/
├── app.py                  # Main Gradio interface and processing logic
├── demucs_handler.py       # Audio stem separation handler
├── basic_pitch_handler.py  # MIDI conversion handler
├── validators.py           # Audio file validation utilities
└── requirements.txt
```

## Implementation Details

### demucs_handler.py
Handles audio stem separation using the Demucs model (usage sketch below):
- Supports mono and stereo input
- Automatic stereo conversion for mono inputs
- Efficient tensor processing with PyTorch
- Proper error handling and logging
- Progress tracking during processing
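
A minimal usage sketch of the handler, assuming an input file `song.wav` and an existing `output_dir` directory:

```python
from demucs_handler import DemucsProcessor

# Separate a track into stems; the model reports its stem order
# via processor.model.sources (htdemucs: drums, bass, other, vocals).
processor = DemucsProcessor()
sources, sample_rate = processor.separate_stems("song.wav")

# sources has shape (batch, stems, channels, samples); pick one stem
# and write it out through the handler's save helper.
vocals = sources[0, processor.model.sources.index("vocals")]
processor.save_stem(vocals, "vocals", "output_dir", sample_rate)
```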

### basic_pitch_handler.py
Manages MIDI conversion using Spotify's Basic Pitch (conversion sketch below):
- Optimized parameters for music transcription
- Support for polyphonic audio
- Pitch bend detection
- Configurable note duration and frequency ranges
- Robust MIDI file generation
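
A short sketch of converting one separated stem to MIDI, assuming `vocals.wav` exists:

```python
from basic_pitch_handler import BasicPitchConverter

converter = BasicPitchConverter()
# Writes a .mid file to the given path and returns that path.
midi_path = converter.convert_to_midi("vocals.wav", "vocals.mid")
print(f"MIDI written to {midi_path}")
```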

### validators.py
Provides comprehensive audio file validation (example below):
- Format verification (WAV, MP3, FLAC)
- File size limit (30MB default)
- Sample rate validation (8kHz-48kHz)
- Audio integrity checking
- Detailed error reporting
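
Validation is exposed as a static method, so no instance is required:

```python
from validators import AudioValidator

is_valid, message = AudioValidator.validate_audio_file("song.wav")
if not is_valid:
    raise ValueError(f"Rejected upload: {message}")
```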

### app.py
Main application interface (programmatic launch example below), featuring:
- Clean, intuitive Gradio UI
- Multi-file upload support
- Stem type selection (vocals, drums, bass, other)
- Optional MIDI conversion
- Persistent file handling
- Progress tracking
- Comprehensive error handling
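
For development, the interface can also be created and launched from Python directly:

```python
from app import create_interface

# Builds the gr.Interface and serves it on port 7860, as in app.py.
demo = create_interface()
demo.launch(server_name="0.0.0.0", server_port=7860)
```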

## Key Features

### Audio Processing
- High-quality stem separation using Demucs
- Support for multiple audio formats
- Automatic audio format conversion (mono-to-stereo sketch below)
- Efficient memory management
- Progress tracking during processing
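
For example, the pipeline duplicates mono input to stereo before separation, since Demucs expects two channels; a condensed version of the handler's logic, assuming an input file `mono_clip.wav`:

```python
import torchaudio

waveform, _sr = torchaudio.load("mono_clip.wav")  # shape: (channels, samples)
if waveform.shape[0] == 1:
    # Duplicate the single channel so Demucs receives stereo input.
    waveform = waveform.repeat(2, 1)
```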

### MIDI Conversion
- Accurate note detection
- Polyphonic transcription
- Configurable parameters (tuning example below):
  - Note duration threshold
  - Frequency range
  - Onset detection sensitivity
  - Frame-level pitch activation
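
These map to the converter's `process_options`, which can be tuned via `set_process_options`:

```python
from basic_pitch_handler import BasicPitchConverter

converter = BasicPitchConverter()
# Tighten onset detection and restrict the range to roughly C2-C6.
converter.set_process_options(
    onset_threshold=0.6,
    minimum_frequency=65.4,
    maximum_frequency=1046.5,
)
```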

### User Interface
- Simple, intuitive design
- Real-time processing feedback
- Preview capabilities
- File download options

## Deployment

### Local Development
```bash
# Clone repository
git clone https://github.com/eyov7/Aud2Stm2Mdi.git
cd Aud2Stm2Mdi

# Install dependencies
pip install -r requirements.txt

# Run application
python app.py
```

### Lightning.ai Deployment
1. Create a new Lightning App
2. Upload the project files
3. Configure a compute instance (CPU or GPU)
4. Deploy

## Error Handling
Comprehensive error handling is implemented for:
- Invalid file formats
- File size limits
- Processing failures
- Memory constraints
- File system operations
- Model inference errors

## Production Features
- Robust file validation
- Persistent storage management
- Proper error logging
- Progress tracking
- Clean user interface
- Download capabilities
- Multi-format support

## Limitations
- Maximum file size: 30MB
- Supported formats: WAV, MP3, FLAC
- Single-file processing (no batching)
- CPU processing by default (a GPU is used automatically when available)

## Notes
- Ensure proper audio codec support
- Monitor system resources
- Clean up temporary files regularly
- Consider implementing rate limiting (minimal sketch below)
- Consider adding user session management
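
A minimal sketch of what per-client rate limiting could look like, assuming the caller passes a client identifier (for example derived from `gr.Request`); the helper name and interval are illustrative, not part of the current app:

```python
import time

# Hypothetical per-client cooldown; neither this hook nor the
# interval exists in the current app.
_last_request: dict[str, float] = {}
MIN_INTERVAL_SECONDS = 30

def check_rate_limit(client_id: str) -> None:
    """Raise if a client submits jobs faster than the allowed interval."""
    now = time.monotonic()
    last = _last_request.get(client_id)
    if last is not None and now - last < MIN_INTERVAL_SECONDS:
        raise RuntimeError("Too many requests; please wait before retrying.")
    _last_request[client_id] = now
```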

## Closing Note
This implementation currently runs on Lightning.ai, providing reliable audio stem separation and MIDI conversion through an intuitive web interface.
app.py
ADDED
@@ -0,0 +1,133 @@
import gradio as gr
import os
from pathlib import Path
from typing import List, Tuple, Optional
import logging
import soundfile as sf
import numpy as np
from validators import AudioValidator
from demucs_handler import DemucsProcessor
from basic_pitch_handler import BasicPitchConverter

# Suppress TF logging (TensorFlow is pulled in by basic-pitch)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
logging.getLogger('tensorflow').setLevel(logging.ERROR)

logger = logging.getLogger(__name__)

# Create a persistent directory for outputs
OUTPUT_DIR = Path("/tmp/audio_processor")
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

def process_single_audio(audio_path: str, stem_type: str, convert_midi: bool) -> Tuple[Tuple[int, np.ndarray], Optional[str]]:
    try:
        # Create a unique subdirectory for this processing run
        process_dir = OUTPUT_DIR / str(abs(hash(audio_path)))
        process_dir.mkdir(parents=True, exist_ok=True)

        processor = DemucsProcessor()
        converter = BasicPitchConverter()

        print(f"Starting processing of file: {audio_path}")

        # Separate stems; sources has shape (batch, stems, channels, samples)
        sources, sample_rate = processor.separate_stems(audio_path)
        print(f"Sources shape: {sources.shape}")
        print(f"Stem type requested: {stem_type}")

        # Select the requested stem (htdemucs source order: drums, bass, other, vocals)
        stem_index = ["drums", "bass", "other", "vocals"].index(stem_type)
        selected_stem = sources[0, stem_index]

        # Save stem
        stem_path = process_dir / f"{stem_type}.wav"
        processor.save_stem(selected_stem, stem_type, str(process_dir), sample_rate)
        print(f"Saved stem to: {stem_path}")

        # Load the saved audio file for Gradio
        audio_data, sr = sf.read(str(stem_path))
        if audio_data.ndim > 1:
            audio_data = audio_data.mean(axis=1)  # Convert to mono if stereo

        # Convert to int16 for Gradio's numpy audio output, clipping to avoid overflow
        audio_data = (np.clip(audio_data, -1.0, 1.0) * 32767).astype(np.int16)

        # Convert to MIDI if requested
        midi_path = None
        if convert_midi:
            midi_path = process_dir / f"{stem_type}.mid"
            converter.convert_to_midi(str(stem_path), str(midi_path))
            print(f"Saved MIDI to: {midi_path}")

        return (sr, audio_data), str(midi_path) if midi_path else None
    except Exception as e:
        print(f"Error in process_single_audio: {str(e)}")
        raise

def create_interface():
    def process_audio(
        audio_files: List[str],
        stem_type: str,
        convert_midi: bool = True,
        progress=gr.Progress()
    ) -> Tuple[Tuple[int, np.ndarray], Optional[str]]:
        try:
            if not audio_files:
                raise ValueError("No audio files provided")

            print(f"Starting processing of {len(audio_files)} files")
            print(f"Selected stem type: {stem_type}")

            # Process a single file for now (no batch support yet)
            audio_path = audio_files[0]
            print(f"Processing file: {audio_path}")

            # Reject invalid uploads before running the models
            is_valid, message = AudioValidator.validate_audio_file(audio_path)
            if not is_valid:
                raise ValueError(message)

            return process_single_audio(audio_path, stem_type, convert_midi)

        except Exception as e:
            print(f"Error in audio processing: {str(e)}")
            raise gr.Error(str(e))

    interface = gr.Interface(
        fn=process_audio,
        inputs=[
            gr.File(
                file_count="multiple",
                file_types=AudioValidator.SUPPORTED_FORMATS,
                label="Upload Audio Files"
            ),
            gr.Dropdown(
                choices=["vocals", "drums", "bass", "other"],
                label="Select Stem",
                value="vocals"
            ),
            gr.Checkbox(label="Convert to MIDI", value=True)
        ],
        outputs=[
            gr.Audio(label="Separated Stems", type="numpy"),
            gr.File(label="MIDI Files")
        ],
        title="Audio Stem Separator & MIDI Converter",
        description="Upload audio files to separate stems and convert to MIDI",
        allow_flagging="never"
    )

    return interface

if __name__ == "__main__":
    interface = create_interface()
    interface.launch(
        share=False,
        server_name="0.0.0.0",
        server_port=7860
    )
basic_pitch_handler.py
ADDED
@@ -0,0 +1,72 @@
import logging
from typing import Callable, Optional

import pretty_midi
from basic_pitch.inference import predict

logger = logging.getLogger(__name__)

class BasicPitchConverter:
    def __init__(self):
        # Default transcription parameters, tuned for general music material
        self.process_options = {
            'onset_threshold': 0.5,          # note-onset sensitivity
            'frame_threshold': 0.3,          # frame-level pitch activation
            'minimum_note_length': 127.70,   # in milliseconds
            'minimum_frequency': 32.7,       # C1
            'maximum_frequency': 2093,       # C7
            'multiple_pitch_bends': True,
            'melodia_trick': True,
            'midi_tempo': 120.0
        }
        print("Basic Pitch converter initialized")  # print() is used throughout for consistency

    def convert_to_midi(self, audio_path: str, output_path: str, progress: Optional[Callable] = None) -> str:
        """
        Convert audio to MIDI using Basic Pitch.

        Args:
            audio_path: Path to the input audio file
            output_path: Path to save the MIDI file
            progress: Optional callback for progress updates

        Returns:
            str: Path to the saved MIDI file
        """
        try:
            print(f"Converting to MIDI: {audio_path}")
            if progress:
                progress(0.1, "Loading audio for MIDI conversion...")

            # Run Basic Pitch inference with the configured parameters
            model_output, midi_data, note_events = predict(
                audio_path=audio_path,
                onset_threshold=self.process_options['onset_threshold'],
                frame_threshold=self.process_options['frame_threshold'],
                minimum_note_length=self.process_options['minimum_note_length'],
                minimum_frequency=self.process_options['minimum_frequency'],
                maximum_frequency=self.process_options['maximum_frequency'],
                multiple_pitch_bends=self.process_options['multiple_pitch_bends'],
                melodia_trick=self.process_options['melodia_trick'],
                midi_tempo=self.process_options['midi_tempo']
            )

            if progress:
                progress(0.7, "Saving MIDI file...")

            print(f"Saving MIDI to: {output_path}")

            # Save the MIDI file, validating the prediction output first
            if isinstance(midi_data, pretty_midi.PrettyMIDI):
                midi_data.write(output_path)
                print(f"Successfully saved MIDI to {output_path}")
                return output_path
            else:
                raise ValueError("MIDI conversion failed: Invalid MIDI data")

        except Exception as e:
            print(f"Error in MIDI conversion: {str(e)}")
            raise

    def set_process_options(self, **kwargs):
        """Update processing options"""
        self.process_options.update(kwargs)
demucs_handler.py
ADDED
@@ -0,0 +1,80 @@
import torch
import torchaudio
import logging
import os
from demucs.pretrained import get_model
from demucs.apply import apply_model
from typing import Tuple

logger = logging.getLogger(__name__)

class DemucsProcessor:
    def __init__(self, model_name="htdemucs"):
        try:
            self.device = "cuda" if torch.cuda.is_available() else "cpu"
            print(f"Using device: {self.device}")

            self.model = get_model(model_name)
            print(f"Model name: {model_name}")
            print(f"Model sources: {self.model.sources}")  # Available stems
            print(f"Model sample rate: {self.model.samplerate}")

            self.model.to(self.device)
            print(f"Model loaded successfully on {self.device}")

        except Exception as e:
            print(f"Error initializing model: {str(e)}")
            raise

    def separate_stems(self, audio_path: str, progress=None) -> Tuple[torch.Tensor, int]:
        try:
            if progress:
                progress(0.1, "Loading audio file...")

            # Load audio; torchaudio returns (channels, samples)
            waveform, sample_rate = torchaudio.load(audio_path)
            print(f"Audio loaded - Shape: {waveform.shape}")

            if progress:
                progress(0.3, "Processing stems...")

            # Input validation: check waveform dimensions
            if waveform.dim() not in (1, 2):
                raise ValueError(f"Invalid waveform dimensions: expected 1D or 2D, got {waveform.dim()}")

            # Normalize to (channels, samples), then duplicate mono to stereo
            if waveform.dim() == 1:
                waveform = waveform.unsqueeze(0)
            if waveform.shape[0] == 1:
                waveform = waveform.repeat(2, 1)
                print("Converted mono to stereo by duplication")

            # Resample to the model's expected rate if necessary
            if sample_rate != self.model.samplerate:
                waveform = torchaudio.functional.resample(waveform, sample_rate, self.model.samplerate)
                sample_rate = self.model.samplerate
                print(f"Resampled audio to {sample_rate} Hz")

            # apply_model expects a 3D tensor: (batch, channels, samples)
            waveform = waveform.unsqueeze(0)
            print(f"Waveform shape before apply_model: {waveform.shape}")

            # Separate stems without tracking gradients
            with torch.no_grad():
                sources = apply_model(self.model, waveform.to(self.device))
            print(f"Sources shape after processing: {sources.shape}")
            print(f"Available stems: {self.model.sources}")

            if progress:
                progress(0.8, "Finalizing separation...")

            return sources, sample_rate

        except Exception as e:
            print(f"Error in stem separation: {str(e)}")
            raise

    def save_stem(self, stem: torch.Tensor, stem_name: str, output_path: str, sample_rate: int):
        try:
            torchaudio.save(
                os.path.join(output_path, f"{stem_name}.wav"),
                stem.cpu(),
                sample_rate
            )
        except Exception as e:
            print(f"Error saving stem: {str(e)}")
            raise
requirements.txt
ADDED
@@ -0,0 +1,6 @@
gradio>=4.0.0
demucs>=4.0.0
basic-pitch>=0.2.6
torch>=2.0.0
torchaudio>=2.0.0
transformers>=4.30.0
validators.py
ADDED
@@ -0,0 +1,37 @@
import os
import logging
import torchaudio
from typing import Tuple

logger = logging.getLogger(__name__)

class AudioValidator:
    SUPPORTED_FORMATS = ['.mp3', '.wav', '.flac']
    MAX_FILE_SIZE = 30 * 1024 * 1024  # 30MB

    @staticmethod
    def validate_audio_file(file_path: str) -> Tuple[bool, str]:
        try:
            if not os.path.exists(file_path):
                return False, "File does not exist"

            file_size = os.path.getsize(file_path)
            if file_size > AudioValidator.MAX_FILE_SIZE:
                return False, f"File too large. Maximum size: {AudioValidator.MAX_FILE_SIZE // 1024 // 1024}MB"

            file_ext = os.path.splitext(file_path)[1].lower()
            if file_ext not in AudioValidator.SUPPORTED_FORMATS:
                return False, f"Unsupported format. Supported formats: {', '.join(AudioValidator.SUPPORTED_FORMATS)}"

            # Validate audio file integrity by attempting to decode it
            try:
                waveform, sample_rate = torchaudio.load(file_path)
                if sample_rate < 8000 or sample_rate > 48000:
                    return False, f"Invalid sample rate: {sample_rate} Hz (expected 8kHz-48kHz)"
            except Exception as e:
                return False, f"Invalid audio file: {str(e)}"

            return True, "Valid audio file"
        except Exception as e:
            logger.error(f"Error validating audio file: {str(e)}")
            return False, str(e)