Spaces: jacob-c/syllables_matching_experiment



root committed 6 days ago

Commit 5b33796 (1 parent: c95399f)

ss


Files changed (4)

README.md +27 -23
app.py +0 -0
emotionanalysis.py +558 -36
requirements.txt +0 -1

README.md CHANGED

@@ -11,37 +11,41 @@ license: mit
 short_description: AI music genre detection and lyrics generation
 ---
-# Music Genre Classifier & Lyrics Generator
-This Hugging Face Space application provides two AI-powered features:
-1. **Music Genre Classification**: Upload a music file and get an analysis of its genre using the [dima806/music_genres_classification](https://huggingface.co/dima806/music_genres_classification) model.
-2. **Lyrics Generation**: Based on the detected genre, the app generates original lyrics using [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) that match both the style of the genre and approximate length of the song.
 ## Features
-- Upload any music file for instant genre classification
-- Receive genre predictions with confidence scores
-- Get AI-generated lyrics tailored to the detected music genre
-- Lyrics length is automatically adjusted based on the song duration
-- Simple and intuitive user interface
-## Usage
-1. Visit the live application on Hugging Face Spaces
-2. Upload your music file using the provided interface
-3. Click "Analyze & Generate" to process the audio
-4. View the detected genre and generated lyrics in the output panels
 ## Technical Details
-- Uses MFCC features extraction from audio for genre classification
-- Leverages 4-bit quantization for efficient LLM inference on T4 GPU
-- Implements a specialized prompt engineering approach to generate genre-specific lyrics
-- Automatically scales lyrics length based on audio duration
-## Links
-- [Music Genre Classification Model](https://huggingface.co/dima806/music_genres_classification)
-- [Llama 3.1 8B Instruct Model](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)

 short_description: AI music genre detection and lyrics generation
 ---
+# Music Analysis & Lyrics Generator
+This Hugging Face Space application analyzes music files and generates lyrics that match the musical characteristics.
 ## Features
+- **Music Analysis**: Detects tempo, time signature, key, emotion, and theme
+- **Genre Classification**: Identifies the music genre using a pre-trained classifier
+- **Lyrics Generation**: Creates lyrics that match the style, emotion, and length of your music using Qwen3-32B
+## How to Use
+1. Upload a music file or record audio directly in the app
+2. Click "Analyze and Generate Lyrics"
+3. View the analysis results showing tempo, key, emotion, theme, and genre
+4. Check the generated lyrics tailored to match your music
 ## Technical Details
+This application uses:
+- **MusicAnalyzer**: Custom analysis tool for detecting musical features
+- **Hugging Face Transformers**: Pre-trained models for genre classification and lyrics generation
+- **Gradio**: For the user interface
+- **Librosa**: For audio processing
+## Requirements
+See requirements.txt for detailed dependencies.
+## Limitations
+- Large audio files may take longer to process
+- The quality of lyrics generation depends on the clarity of the audio and the detected musical features
+## Credits
+- Genre classification model: dima806/music_genres_classification
+- LLM for lyrics generation: Qwen/Qwen3-32B

app.py CHANGED

The diff for this file is too large to render.

emotionanalysis.py CHANGED

@@ -1,5 +1,7 @@
 import librosa
 import numpy as np
 try:
     import matplotlib.pyplot as plt
 except ImportError:
@@ -7,6 +9,7 @@ except ImportError:
 from scipy.stats import mode
 import warnings
 warnings.filterwarnings('ignore')  # Suppress librosa warnings
 class MusicAnalyzer:
     def __init__(self):
         # Emotion feature mappings - these define characteristics of different emotions
@@ -31,6 +34,40 @@ class MusicAnalyzer:
         # Musical key mapping
         self.key_names = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
     def load_audio(self, file_path, sr=22050, duration=None):
         """Load audio file and return time series and sample rate"""
@@ -56,8 +93,12 @@ class MusicAnalyzer:
         ac = librosa.autocorrelate(onset_env, max_size=sr // 2)
         ac = librosa.util.normalize(ac, norm=np.inf)
-        # Time signature estimation - a challenging task with many limitations
-        estimated_signature = self._estimate_time_signature(y, sr, beat_times, onset_env)
         # Compute onset strength to get a measure of rhythm intensity
         rhythm_intensity = np.mean(onset_env) / np.max(onset_env) if np.max(onset_env) > 0 else 0
@@ -65,48 +106,509 @@ class MusicAnalyzer:
         # Rhythm complexity based on variation in onset strength
         rhythm_complexity = np.std(onset_env) / np.mean(onset_env) if np.mean(onset_env) > 0 else 0
         return {
             "tempo": float(tempo),
-            "beat_times": beat_times.tolist(),
-            "beat_intervals": beat_intervals.tolist(),
             "beat_regularity": float(beat_regularity),
             "rhythm_intensity": float(rhythm_intensity),
             "rhythm_complexity": float(rhythm_complexity),
-            "estimated_time_signature": estimated_signature
         }
-    def _estimate_time_signature(self, y, sr, beat_times, onset_env):
-        """Estimate the time signature based on beat patterns"""
-        # This is a simplified approach - accurate time signature detection is complex
-        if len(beat_times) < 4:
-            return "Unknown"
-        # Analyze beat emphasis patterns to detect meter
-        beat_intervals = np.diff(beat_times)
-        # Look for periodicity in the onset envelope
-        ac = librosa.autocorrelate(onset_env, max_size=sr)
-        # Find peaks in autocorrelation after the first one (which is at lag 0)
-        peaks = librosa.util.peak_pick(ac, pre_max=20, post_max=20, pre_avg=20, post_avg=20, delta=0.1, wait=1)
-        peaks = peaks[peaks > 0]  # Remove the first peak which is at lag 0
         if len(peaks) == 0:
-            return "4/4"  # Default to most common
-        # Convert first significant peak to beats
-        first_peak_time = peaks[0] / sr
-        beats_per_bar = round(first_peak_time / np.median(beat_intervals))
-        # Map to common time signatures
-        if beats_per_bar == 4 or beats_per_bar == 8:
-            return "4/4"
-        elif beats_per_bar == 3 or beats_per_bar == 6:
-            return "3/4"
-        elif beats_per_bar == 2:
-            return "2/4"
         else:
-            return f"{beats_per_bar}/4"  # Default assumption
     def analyze_tonality(self, y, sr):
         """Analyze tonal features: key, mode, harmonic features"""
@@ -355,6 +857,26 @@ class MusicAnalyzer:
         emotion_data = self.analyze_emotion(rhythm_data, tonal_data, energy_data)
         theme_data = self.analyze_theme(rhythm_data, tonal_data, emotion_data)
         # Combine all results
         return {
             "file": file_path,
@@ -364,7 +886,7 @@ class MusicAnalyzer:
             "emotion_analysis": emotion_data,
             "theme_analysis": theme_data,
             "summary": {
-                "tempo": rhythm_data["tempo"],
                 "time_signature": rhythm_data["estimated_time_signature"],
                 "key": tonal_data["key"],
                 "mode": tonal_data["mode"],

 import librosa
 import numpy as np
+from scipy import signal
+from collections import Counter
 try:
     import matplotlib.pyplot as plt
 except ImportError:
 from scipy.stats import mode
 import warnings
 warnings.filterwarnings('ignore')  # Suppress librosa warnings
 class MusicAnalyzer:
     def __init__(self):
         # Emotion feature mappings - these define characteristics of different emotions
         # Musical key mapping
         self.key_names = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
+        # Common time signatures and their beat patterns with weights for prior probability
+        self.common_time_signatures = {
+            "4/4": {"beats_per_bar": 4, "beat_pattern": [1.0, 0.2, 0.5, 0.2], "weight": 0.35},
+            "3/4": {"beats_per_bar": 3, "beat_pattern": [1.0, 0.2, 0.3], "weight": 0.25},
+            "2/4": {"beats_per_bar": 2, "beat_pattern": [1.0, 0.3], "weight": 0.15},
+            "6/8": {"beats_per_bar": 6, "beat_pattern": [1.0, 0.2, 0.3, 0.8, 0.2, 0.3], "weight": 0.25},
+            "5/4": {"beats_per_bar": 5, "beat_pattern": [1.0, 0.2, 0.4, 0.7, 0.2], "weight": 0.10},
+            "7/8": {"beats_per_bar": 7, "beat_pattern": [1.0, 0.2, 0.3, 0.8, 0.2, 0.2, 0.3], "weight": 0.10},
+            "9/8": {"beats_per_bar": 9, "beat_pattern": [1.0, 0.2, 0.3, 0.8, 0.2, 0.3, 0.7, 0.2, 0.3], "weight": 0.10},
+            "12/8": {"beats_per_bar": 12, "beat_pattern": [1.0, 0.2, 0.3, 0.6, 0.2, 0.3, 0.8, 0.2, 0.3, 0.6, 0.2, 0.3], "weight": 0.15}
+        }
+        # Add common accent patterns for different time signatures
+        self.accent_patterns = {
+            "4/4": [[1, 0, 0, 0], [1, 0, 2, 0], [1, 0, 2, 0, 3, 0, 2, 0]],
+            "3/4": [[1, 0, 0], [1, 0, 2]],
+            "2/4": [[1, 0], [1, 2]],
+            "6/8": [[1, 0, 0, 2, 0, 0], [1, 0, 0, 2, 0, 3]],
+            "5/4": [[1, 0, 0, 2, 0], [1, 0, 2, 0, 0]],
+            "7/8": [[1, 0, 0, 2, 0, 0, 0], [1, 0, 0, 2, 0, 3, 0]],
+            "9/8": [[1, 0, 0, 2, 0, 0, 3, 0, 0]],
+            "12/8": [[1, 0, 0, 2, 0, 0, 3, 0, 0, 4, 0, 0]]
+        }
+        # Expected rhythm density (relative note density per beat) for different time signatures
+        self.rhythm_density = {
+            "4/4": [1.0, 0.7, 0.8, 0.6],
+            "3/4": [1.0, 0.6, 0.7],
+            "6/8": [1.0, 0.5, 0.4, 0.8, 0.5, 0.4],
+            "2/4": [1.0, 0.6],
+            "5/4": [1.0, 0.6, 0.8, 0.7, 0.6],
+            "7/8": [1.0, 0.5, 0.4, 0.8, 0.5, 0.4, 0.5]
+        }
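The meter tables added to `__init__` in this commit are keyed by time signature, but they do not all cover the same keys. `rhythm_density` has no entries for 9/8 and 12/8, so `_detect_by_note_density` falls back to a flat `[1.0] * beats_per_bar` profile for them. The `weight` values, described as prior probabilities, also sum to 1.45 rather than 1. A small consistency check along these lines could surface such gaps. The table contents are copied from the diff; `check_meter_tables` is a hypothetical helper, not part of the commit.

```python
# Tables copied from this commit's MusicAnalyzer.__init__; check_meter_tables is hypothetical.
COMMON_TIME_SIGNATURES = {
    "4/4": {"beats_per_bar": 4, "beat_pattern": [1.0, 0.2, 0.5, 0.2], "weight": 0.35},
    "3/4": {"beats_per_bar": 3, "beat_pattern": [1.0, 0.2, 0.3], "weight": 0.25},
    "2/4": {"beats_per_bar": 2, "beat_pattern": [1.0, 0.3], "weight": 0.15},
    "6/8": {"beats_per_bar": 6, "beat_pattern": [1.0, 0.2, 0.3, 0.8, 0.2, 0.3], "weight": 0.25},
    "5/4": {"beats_per_bar": 5, "beat_pattern": [1.0, 0.2, 0.4, 0.7, 0.2], "weight": 0.10},
    "7/8": {"beats_per_bar": 7, "beat_pattern": [1.0, 0.2, 0.3, 0.8, 0.2, 0.2, 0.3], "weight": 0.10},
    "9/8": {"beats_per_bar": 9, "beat_pattern": [1.0, 0.2, 0.3, 0.8, 0.2, 0.3, 0.7, 0.2, 0.3], "weight": 0.10},
    "12/8": {"beats_per_bar": 12, "beat_pattern": [1.0, 0.2, 0.3, 0.6, 0.2, 0.3, 0.8, 0.2, 0.3, 0.6, 0.2, 0.3], "weight": 0.15},
}
RHYTHM_DENSITY = {
    "4/4": [1.0, 0.7, 0.8, 0.6],
    "3/4": [1.0, 0.6, 0.7],
    "6/8": [1.0, 0.5, 0.4, 0.8, 0.5, 0.4],
    "2/4": [1.0, 0.6],
    "5/4": [1.0, 0.6, 0.8, 0.7, 0.6],
    "7/8": [1.0, 0.5, 0.4, 0.8, 0.5, 0.4, 0.5],
}

def check_meter_tables(signatures, densities):
    """Return human-readable problems found in the meter tables."""
    problems = []
    for ts, info in signatures.items():
        n = info["beats_per_bar"]
        if len(info["beat_pattern"]) != n:
            problems.append(f"{ts}: beat_pattern has {len(info['beat_pattern'])} entries, expected {n}")
        if ts not in densities:
            problems.append(f"{ts}: no rhythm_density entry (flat fallback will be used)")
        elif len(densities[ts]) != n:
            problems.append(f"{ts}: rhythm_density has {len(densities[ts])} entries, expected {n}")
    # Priors used as probabilities should sum to 1.
    total = sum(info["weight"] for info in signatures.values())
    if abs(total - 1.0) > 1e-6:
        problems.append(f"prior weights sum to {total:.2f}, not 1.0")
    return problems

for problem in check_meter_tables(COMMON_TIME_SIGNATURES, RHYTHM_DENSITY):
    print(problem)
```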
     def load_audio(self, file_path, sr=22050, duration=None):
         """Load audio file and return time series and sample rate"""
         ac = librosa.autocorrelate(onset_env, max_size=sr // 2)
         ac = librosa.util.normalize(ac, norm=np.inf)
+        # Advanced time signature detection
+        time_sig_result = self._detect_time_signature(y, sr)
+        # Extract results from the time signature detection
+        estimated_signature = time_sig_result["time_signature"]
+        time_sig_confidence = time_sig_result["confidence"]
         # Compute onset strength to get a measure of rhythm intensity
         rhythm_intensity = np.mean(onset_env) / np.max(onset_env) if np.max(onset_env) > 0 else 0
         # Rhythm complexity based on variation in onset strength
         rhythm_complexity = np.std(onset_env) / np.mean(onset_env) if np.mean(onset_env) > 0 else 0
+        # Convert numpy arrays to regular Python types for JSON serialization
+        beat_times_list = [float(t) for t in beat_times.tolist()]
+        beat_intervals_list = [float(i) for i in beat_intervals.tolist()]
         return {
             "tempo": float(tempo),
+            "beat_times": beat_times_list,
+            "beat_intervals": beat_intervals_list,
             "beat_regularity": float(beat_regularity),
             "rhythm_intensity": float(rhythm_intensity),
             "rhythm_complexity": float(rhythm_complexity),
+            "estimated_time_signature": estimated_signature,
+            "time_signature_confidence": float(time_sig_confidence),
+            "time_signature_candidates": time_sig_result.get("all_candidates", {})
         }
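`ndarray.tolist()` already returns built-in Python scalars, so wrapping each element in `float(...)` when building `beat_times_list` is harmless but redundant. For `json.dumps`, what matters is which NumPy scalar types slip through. `np.float64` subclasses Python `float` and serializes, while `np.float32` and `np.int64` do not. A quick check, using made-up beat times in place of the beat tracker output:

```python
import json
import numpy as np

# Made-up beat times in seconds, standing in for librosa's beat tracker output.
beat_times = np.array([0.50, 1.00, 1.52, 2.01])
beat_intervals = np.diff(beat_times)

# tolist() converts element-wise to built-in Python floats, so no extra float() is needed.
payload = {"beat_times": beat_times.tolist(), "beat_intervals": beat_intervals.tolist()}
print(type(payload["beat_times"][0]) is float)  # True
encoded = json.dumps(payload)

def is_json_serializable(value):
    """Return True if json.dumps accepts the value as-is."""
    try:
        json.dumps(value)
        return True
    except TypeError:
        return False

print(is_json_serializable(np.float64(1.5)))  # True: np.float64 subclasses float
print(is_json_serializable(np.float32(1.5)))  # False
print(is_json_serializable(np.int64(3)))      # False
```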
+    def _detect_time_signature(self, y, sr):
+        """
+        Multi-method approach to time signature detection
+        Args:
+            y: Audio signal
+            sr: Sample rate
+        Returns:
+            dict with detected time signature and confidence
+        """
+        # 1. Compute onset envelope and beat positions
+        onset_env = librosa.onset.onset_strength(y=y, sr=sr, hop_length=512)
+        # Get tempo and beat frames
+        tempo, beat_frames = librosa.beat.beat_track(onset_envelope=onset_env, sr=sr)
+        beat_times = librosa.frames_to_time(beat_frames, sr=sr)
+        # Return default if not enough beats detected
+        if len(beat_times) < 8:
+            return {"time_signature": "4/4", "confidence": 0.5}
+        # 2. Extract beat strengths and normalize
+        beat_strengths = self._get_beat_strengths(y, sr, beat_times, onset_env)
+        # 3. Compute various time signature features using different methods
+        results = {}
+        # Method 1: Beat pattern autocorrelation
+        autocorr_result = self._detect_by_autocorrelation(onset_env, sr)
+        results["autocorrelation"] = autocorr_result
+        # Method 2: Beat strength pattern matching
+        pattern_result = self._detect_by_pattern_matching(beat_strengths)
+        results["pattern_matching"] = pattern_result
+        # Method 3: Spectral rhythmic analysis
+        spectral_result = self._detect_by_spectral_analysis(onset_env, sr)
+        results["spectral"] = spectral_result
+        # Method 4: Note density analysis
+        density_result = self._detect_by_note_density(y, sr, beat_times)
+        results["note_density"] = density_result
+        # Method 5: Tempo-based estimation
+        tempo_result = self._estimate_from_tempo(tempo)
+        results["tempo_based"] = tempo_result
+        # 4. Combine results with weighted voting
+        final_result = self._combine_detection_results(results, tempo)
+        return final_result
+    def _get_beat_strengths(self, y, sr, beat_times, onset_env):
+        """Extract normalized strengths at beat positions"""
+        # Convert beat times to frames
+        beat_frames = librosa.time_to_frames(beat_times, sr=sr, hop_length=512)
+        beat_frames = [min(f, len(onset_env)-1) for f in beat_frames]
+        # Get beat strengths from onset envelope
+        beat_strengths = np.array([onset_env[f] for f in beat_frames])
+        # Also look at energy and spectral flux at beat positions
+        hop_length = 512
+        frame_length = 2048
+        # Get energy at each beat
+        energy = librosa.feature.rms(y=y, frame_length=frame_length, hop_length=hop_length)[0]
+        beat_energy = np.array([energy[min(f, len(energy)-1)] for f in beat_frames])
+        # Combine onset strength with energy (weighted average)
+        beat_strengths = 0.7 * beat_strengths + 0.3 * beat_energy
+        # Normalize
+        if np.max(beat_strengths) > 0:
+            beat_strengths = beat_strengths / np.max(beat_strengths)
+        return beat_strengths
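`_get_beat_strengths` mixes raw onset strength with raw RMS energy (`0.7 * onset + 0.3 * energy`). The two are on different scales. The RMS of a waveform in [-1, 1] cannot exceed 1, but librosa's onset envelope is not normalized to that range, so the energy term may barely register. One hedged alternative is to rescale each feature to [0, 1] before weighting. The helper names and input values below are illustrative assumptions.

```python
import numpy as np

def minmax(values):
    """Rescale to [0, 1]; a constant input maps to all zeros."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    return (values - values.min()) / span if span > 0 else np.zeros_like(values)

def combine_beat_features(onset_at_beats, energy_at_beats, onset_weight=0.7):
    """Weighted average of min-max-normalized onset strength and RMS energy."""
    return onset_weight * minmax(onset_at_beats) + (1.0 - onset_weight) * minmax(energy_at_beats)

# Made-up values: onset strength in arbitrary units, RMS energy well below 1.
onset_at_beats = [10.0, 2.0, 6.0, 2.0]
energy_at_beats = [0.20, 0.05, 0.10, 0.05]
print(np.round(combine_beat_features(onset_at_beats, energy_at_beats), 3).tolist())  # [1.0, 0.0, 0.45, 0.0]
```

With these inputs, the energy term shifts the third beat from 0.35 to 0.45. Mixing the raw values would leave the energy contribution under 1% of the onset term.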
+    def _detect_by_autocorrelation(self, onset_env, sr):
+        """Detect meter using autocorrelation of onset strength"""
+        # Calculate autocorrelation of onset envelope
+        hop_length = 512
+        ac = librosa.autocorrelate(onset_env, max_size=4 * sr // hop_length)
+        ac = librosa.util.normalize(ac)
+        # Find significant peaks in autocorrelation
+        peaks = signal.find_peaks(ac, height=0.2, distance=sr//(8*hop_length))[0]
+        if len(peaks) < 2:
+            return {"time_signature": "4/4", "confidence": 0.4}
+        # Analyze peak intervals in terms of beats
+        peak_intervals = np.diff(peaks)
+        # Convert peaks to time
+        peak_times = peaks * hop_length / sr
+        # Analyze for common time signature patterns
+        time_sig_votes = {}
+        # Check if peaks match expected bar lengths
+        for ts, info in self.common_time_signatures.items():
+            beats_per_bar = info["beats_per_bar"]
+            # Check how well peaks match this meter
+            score = 0
+            for interval in peak_intervals:
+                # Check if this interval corresponds to this time signature
+                # Allow some tolerance around the expected value
+                expected = beats_per_bar * (hop_length / sr)  # in seconds
+                tolerance = 0.25 * expected
+                if abs(interval * hop_length / sr - expected) < tolerance:
+                    score += 1
+            if len(peak_intervals) > 0:
+                time_sig_votes[ts] = score / len(peak_intervals)
+        # Return most likely time signature
+        if time_sig_votes:
+            best_ts = max(time_sig_votes.items(), key=lambda x: x[1])
+            return {"time_signature": best_ts[0], "confidence": best_ts[1]}
+        return {"time_signature": "4/4", "confidence": 0.4}
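In `_detect_by_autocorrelation`, `expected = beats_per_bar * (hop_length / sr)` converts `beats_per_bar` *frames* to seconds (about 0.09 s for 4/4 at 22050 Hz with hop 512). That is not the length of a bar. A bar depends on tempo and lasts `beats_per_bar * 60 / tempo` seconds, so the comparison is effectively against a few frames of lag. The sketch below is a tempo-aware variant that reads the autocorrelation directly at each candidate one-bar lag. The function names and the synthetic 3/4 envelope are illustrative assumptions. `np.correlate` stands in for `librosa.autocorrelate` to keep the example dependency-free.

```python
import numpy as np

def autocorrelation(x):
    """Non-negative-lag autocorrelation, normalized so lag 0 equals 1."""
    x = np.asarray(x, dtype=float)
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    return ac / ac[0] if ac[0] > 0 else ac

def meter_scores_at_bar_lag(onset_env, tempo_bpm, meters, sr=22050, hop_length=512):
    """Score each meter by the autocorrelation value at a lag of one bar."""
    ac = autocorrelation(onset_env)
    frames_per_beat = (60.0 / tempo_bpm) * sr / hop_length
    scores = {}
    for name, beats_per_bar in meters.items():
        lag = int(round(beats_per_bar * frames_per_beat))
        scores[name] = float(ac[lag]) if lag < len(ac) else 0.0
    return scores

# Synthetic 3/4 envelope: a pulse every beat, with a stronger downbeat every third beat.
sr, hop_length, tempo = 1000, 10, 120.0   # chosen so one beat is exactly 50 frames
env = np.zeros(1200)
for beat, frame in enumerate(range(0, len(env), 50)):
    env[frame] = 3.0 if beat % 3 == 0 else 1.0

meters = {"2/4": 2, "3/4": 3, "4/4": 4, "5/4": 5, "6/8": 6}   # beats_per_bar as in the commit
scores = meter_scores_at_bar_lag(env, tempo, meters, sr=sr, hop_length=hop_length)
print(max(scores, key=scores.get))  # 3/4
```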
+    def _detect_by_pattern_matching(self, beat_strengths):
+        """Match beat strength patterns against known time signature patterns"""
+        if len(beat_strengths) < 6:
+            return {"time_signature": "4/4", "confidence": 0.4}
+        results = {}
+        # Try each possible time signature
+        for ts, info in self.common_time_signatures.items():
+            beats_per_bar = info["beats_per_bar"]
+            expected_pattern = info["beat_pattern"]
+            # Calculate correlation scores for overlapping segments
+            scores = []
+            # We need at least one complete pattern
+            if len(beat_strengths) >= beats_per_bar:
+                # Try different offsets to find best alignment
+                for offset in range(min(beats_per_bar, len(beat_strengths) - beats_per_bar + 1)):
+                    # Calculate scores for each complete pattern
+                    pattern_scores = []
+                    for i in range(offset, len(beat_strengths) - beats_per_bar + 1, beats_per_bar):
+                        segment = beat_strengths[i:i+beats_per_bar]
+                        # If expected pattern is longer than segment, truncate it
+                        pattern = expected_pattern[:len(segment)]
+                        # Normalize segment and pattern
+                        if np.std(segment) > 0 and np.std(pattern) > 0:
+                            # Calculate correlation
+                            corr = np.corrcoef(segment, pattern)[0, 1]
+                            if not np.isnan(corr):
+                                pattern_scores.append(corr)
+                    if pattern_scores:
+                        scores.append(np.mean(pattern_scores))
+            # Use the best score among different offsets
+            if scores:
+                confidence = max(scores)
+                results[ts] = confidence
+        # Find best match
+        if results:
+            best_ts = max(results.items(), key=lambda x: x[1])
+            return {"time_signature": best_ts[0], "confidence": best_ts[1]}
+        # Default
+        return {"time_signature": "4/4", "confidence": 0.5}
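`_detect_by_pattern_matching` correlates each bar of beat strengths with a meter's `beat_pattern` separately, then averages the per-bar correlations. A related approach folds the beat strengths into one average bar at each phase offset, as `_extract_average_pattern` does, and correlates that single bar with the template. This tends to be less sensitive to noise in individual bars. The sketch assumes averaging first is acceptable, and the helper name and inputs are hypothetical. The two templates are the commit's 3/4 and 4/4 `beat_pattern` values.

```python
import numpy as np

def best_phase_correlation(beat_strengths, template):
    """Fold beat strengths into an average bar at each offset and return the best
    Pearson correlation with the accent template (-1.0 if none can be computed)."""
    strengths = np.asarray(beat_strengths, dtype=float)
    template = np.asarray(template, dtype=float)
    n = len(template)
    best = -1.0
    for offset in range(n):
        bars = (len(strengths) - offset) // n
        if bars == 0:
            continue
        # Average all complete bars that start at this offset.
        average_bar = strengths[offset:offset + bars * n].reshape(bars, n).mean(axis=0)
        if np.std(average_bar) == 0 or np.std(template) == 0:
            continue
        best = max(best, float(np.corrcoef(average_bar, template)[0, 1]))
    return best

# Two pickup beats followed by six bars of a 3/4 accent pattern.
strengths = [0.2, 0.3] + [1.0, 0.2, 0.3] * 6
print(round(best_phase_correlation(strengths, [1.0, 0.2, 0.3]), 3))       # 1.0
print(round(best_phase_correlation(strengths, [1.0, 0.2, 0.5, 0.2]), 3))  # 0.411
```

Searching over offsets lets the fold skip the pickup beats, which is why the 3/4 template reaches a perfect correlation here.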
+    def _detect_by_spectral_analysis(self, onset_env, sr):
+        """Analyze rhythm in frequency domain"""
+        # Get rhythm periodicity through Fourier Transform
+        # Focus on periods corresponding to typical bar lengths (1-8 seconds)
+        hop_length = 512
+        # Calculate rhythm periodicity
+        fft_size = 2**13  # Large enough to give good frequency resolution
+        S = np.abs(np.fft.rfft(onset_env, n=fft_size))
+        # Convert frequency to tempo in BPM
+        freqs = np.fft.rfftfreq(fft_size, d=hop_length/sr)
+        tempos = 60 * freqs
+        # Focus on reasonable tempo range (40-240 BPM)
+        tempo_mask = (tempos >= 40) & (tempos <= 240)
+        S_tempo = S[tempo_mask]
+        tempos = tempos[tempo_mask]
+        # Find peaks in spectrum
+        peaks = signal.find_peaks(S_tempo, height=np.max(S_tempo)*0.1, distance=5)[0]
         if len(peaks) == 0:
+            return {"time_signature": "4/4", "confidence": 0.4}
+        # Get peak tempos and strengths
+        peak_tempos = tempos[peaks]
+        peak_strengths = S_tempo[peaks]
+        # Sort by strength
+        peak_indices = np.argsort(peak_strengths)[::-1]
+        peak_tempos = peak_tempos[peak_indices]
+        peak_strengths = peak_strengths[peak_indices]
+        # Analyze relationships between peaks
+        # For example, 3/4 typically has peaks at multiples of 3 beats
+        # 4/4 has peaks at multiples of 4 beats
+        time_sig_scores = {}
+        # Check relationships between top peaks
+        if len(peak_tempos) >= 2:
+            tempo_ratios = []
+            for i in range(len(peak_tempos)):
+                for j in range(i+1, len(peak_tempos)):
+                    if peak_tempos[j] > 0:
+                        ratio = peak_tempos[i] / peak_tempos[j]
+                        tempo_ratios.append(ratio)
+            # Check for patterns indicative of different time signatures
+            for ts in self.common_time_signatures:
+                score = 0
+                if ts == "4/4" or ts == "2/4":
+                    # Look for ratios close to 2 or 4
+                    for ratio in tempo_ratios:
+                        if abs(ratio - 2) < 0.2 or abs(ratio - 4) < 0.2:
+                            score += 1
+                elif ts == "3/4" or ts == "6/8":
+                    # Look for ratios close to 3 or 6
+                    for ratio in tempo_ratios:
+                        if abs(ratio - 3) < 0.2 or abs(ratio - 6) < 0.3:
+                            score += 1
+                # Normalize score
+                if tempo_ratios:
+                    time_sig_scores[ts] = min(1.0, score / len(tempo_ratios) + 0.4)
+        # If we have meaningful scores, return best match
+        if time_sig_scores:
+            best_ts = max(time_sig_scores.items(), key=lambda x: x[1])
+            return {"time_signature": best_ts[0], "confidence": best_ts[1]}
+        # Default fallback
+        return {"time_signature": "4/4", "confidence": 0.4}
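`_detect_by_spectral_analysis` turns FFT bins of the onset envelope into tempi with `60 * np.fft.rfftfreq(fft_size, d=hop_length/sr)`. The envelope is sampled at `sr / hop_length` frames per second, so bin k sits at `60 * k * sr / (hop_length * fft_size)` BPM. With 22050 Hz, hop 512 and `fft_size = 2**13`, that is about 0.315 BPM per bin. The sketch below reuses the same mapping on a synthetic envelope that oscillates at 2 Hz (120 BPM). Subtracting the mean first is an addition of this example, not something the commit does.

```python
import numpy as np

def onset_tempo_spectrum(onset_env, sr=22050, hop_length=512, n_fft=2**13, bpm_range=(40.0, 240.0)):
    """Magnitude spectrum of an onset envelope, indexed by tempo in BPM."""
    env = np.asarray(onset_env, dtype=float)
    spectrum = np.abs(np.fft.rfft(env - env.mean(), n=n_fft))  # zero-padded to n_fft
    bpm = 60.0 * np.fft.rfftfreq(n_fft, d=hop_length / sr)      # Hz per bin -> BPM
    mask = (bpm >= bpm_range[0]) & (bpm <= bpm_range[1])
    return bpm[mask], spectrum[mask]

sr, hop_length = 22050, 512
frame_rate = sr / hop_length                        # about 43.07 envelope frames per second
t = np.arange(int(20 * frame_rate)) / frame_rate    # 20 seconds of envelope frames
env = 1.0 + np.cos(2 * np.pi * 2.0 * t)             # 2 Hz oscillation = 120 BPM

bpm, spectrum = onset_tempo_spectrum(env, sr=sr, hop_length=hop_length)
print(f"bin spacing: {bpm[1] - bpm[0]:.3f} BPM")    # bin spacing: 0.315 BPM
print(f"strongest tempo: {bpm[np.argmax(spectrum)]:.1f} BPM")
```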
+    def _detect_by_note_density(self, y, sr, beat_times):
+        """Analyze note density patterns between beats"""
+        if len(beat_times) < 6:
+            return {"time_signature": "4/4", "confidence": 0.4}
+        # Extract note onsets (not just beats)
+        onset_times = librosa.onset.onset_detect(y=y, sr=sr, units='time')
+        if len(onset_times) < len(beat_times):
+            return {"time_signature": "4/4", "confidence": 0.4}
+        # Count onsets between consecutive beats
+        note_counts = []
+        for i in range(len(beat_times) - 1):
+            start = beat_times[i]
+            end = beat_times[i+1]
+            # Count onsets in this beat
+            count = sum(1 for t in onset_times if start <= t < end)
+            note_counts.append(count)
+        # Look for repeating patterns in the note counts
+        time_sig_scores = {}
+        for ts, info in self.common_time_signatures.items():
+            beats_per_bar = info["beats_per_bar"]
+            # Skip if we don't have enough data
+            if len(note_counts) < beats_per_bar:
+                continue
+            # Calculate pattern similarity for this time signature
+            scores = []
+            for offset in range(min(beats_per_bar, len(note_counts) - beats_per_bar + 1)):
+                similarities = []
+                for i in range(offset, len(note_counts) - beats_per_bar + 1, beats_per_bar):
+                    # Get current bar pattern
+                    pattern = note_counts[i:i+beats_per_bar]
+                    # Compare with expected density pattern
+                    expected = self.rhythm_density.get(ts, [1.0] * beats_per_bar)
+                    expected = expected[:len(pattern)]  # Truncate if needed
+                    # Normalize both patterns
+                    if sum(pattern) > 0 and sum(expected) > 0:
+                        pattern_norm = [p/max(1, sum(pattern)) for p in pattern]
+                        expected_norm = [e/sum(expected) for e in expected]
+                        # Calculate similarity (1 - distance)
+                        distance = sum(abs(p - e) for p, e in zip(pattern_norm, expected_norm)) / len(pattern)
+                        similarity = 1 - min(1.0, distance)
+                        similarities.append(similarity)
+                if similarities:
+                    scores.append(np.mean(similarities))
+            # Use the best score
+            if scores:
+                time_sig_scores[ts] = max(scores)
+        # Return best match
+        if time_sig_scores:
+            best_ts = max(time_sig_scores.items(), key=lambda x: x[1])
+            return {"time_signature": best_ts[0], "confidence": best_ts[1]}
+        # Default
+        return {"time_signature": "4/4", "confidence": 0.4}
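The note-count loop in `_detect_by_note_density` scans every onset for every beat interval. `librosa.onset.onset_detect` returns onsets in ascending order, so the same half-open counts (`start <= t < end`) can be computed in one vectorized step with `np.searchsorted`. The helper name and the times below are made up for illustration.

```python
import numpy as np

def onsets_per_beat(onset_times, beat_times):
    """Count onsets in each half-open interval [beat_times[i], beat_times[i+1])."""
    onset_times = np.sort(np.asarray(onset_times, dtype=float))
    # Index of the first onset at or after each beat; differences give per-interval counts.
    edges = np.searchsorted(onset_times, np.asarray(beat_times, dtype=float), side="left")
    return np.diff(edges)

onsets = [0.0, 0.1, 0.5, 0.6, 0.7, 1.0, 1.5]
beats = [0.0, 0.5, 1.0, 1.5]
print(onsets_per_beat(onsets, beats).tolist())  # [2, 3, 1]
```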
+    def _estimate_from_tempo(self, tempo):
+        """Use tempo to help estimate likely time signature"""
+        # Statistical tendencies: slower tempos often in compound meters (6/8, 12/8)
+        # Very fast tempos often counted in cut time (2/2 instead of 4/4)
+        scores = {}
+        if tempo < 70:
+            # Slow tempos favor compound meters
+            scores = {
+                "4/4": 0.4,
+                "3/4": 0.5,
+                "6/8": 0.7,
+                "12/8": 0.6
+            }
+        elif 70 <= tempo <= 120:
+            # Medium tempos favor 4/4, 3/4
+            scores = {
+                "4/4": 0.7,
+                "3/4": 0.6,
+                "2/4": 0.4,
+                "6/8": 0.5
+            }
         else:
+            # Fast tempos favor simpler meters
+            scores = {
+                "4/4": 0.6,
+                "2/4": 0.7,
+                "2/2": 0.6,
+                "3/4": 0.4
+            }
+        # Find best match
+        best_ts = max(scores.items(), key=lambda x: x[1])
+        return {"time_signature": best_ts[0], "confidence": best_ts[1]}
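`_estimate_from_tempo` compares `tempo` against 70 and 120, and the rhythm summary wraps it in `float(tempo)`. Depending on the librosa version, `librosa.beat.beat_track` may return the tempo as a one-element array rather than a scalar. The comparisons still work on a one-element array, but recent NumPy releases warn when `float()` is applied to an array with `ndim > 0`. A defensive coercion is sketched below; `as_scalar_tempo` is a hypothetical helper name.

```python
import numpy as np

def as_scalar_tempo(tempo):
    """Return a tempo estimate as a Python float, accepting a scalar or a one-element array."""
    values = np.atleast_1d(np.asarray(tempo, dtype=float))
    if values.size != 1:
        raise ValueError(f"expected one tempo value, got shape {values.shape}")
    return float(values[0])

print(as_scalar_tempo(117.45))              # 117.45
print(as_scalar_tempo(np.array([117.45])))  # 117.45
```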
+    def _combine_detection_results(self, results, tempo):
+        """Combine results from different detection methods"""
+        # Define weights for different methods
+        method_weights = {
+            "autocorrelation": 0.25,
+            "pattern_matching": 0.30,
+            "spectral": 0.20,
+            "note_density": 0.20,
+            "tempo_based": 0.05
+        }
+        # Prior probability (based on frequency in music)
+        prior_weights = {ts: info["weight"] for ts, info in self.common_time_signatures.items()}
+        # Combine votes
+        total_votes = {ts: prior_weights.get(ts, 0.1) for ts in self.common_time_signatures}
+        for method, result in results.items():
+            ts = result["time_signature"]
+            confidence = result["confidence"]
+            weight = method_weights.get(method, 0.1)
+            # Add weighted vote
+            if ts in total_votes:
+                total_votes[ts] += confidence * weight
+            else:
+                total_votes[ts] = confidence * weight
+        # Special case: disambiguate between 3/4 and 6/8
+        if "3/4" in total_votes and "6/8" in total_votes:
+            # If the two are close, use tempo to break tie
+            if abs(total_votes["3/4"] - total_votes["6/8"]) < 0.1:
+                if tempo < 100:  # Slower tempo favors 6/8
+                    total_votes["6/8"] += 0.1
+                else:  # Faster tempo favors 3/4
+                    total_votes["3/4"] += 0.1
+        # Get highest scoring time signature
+        best_ts = max(total_votes.items(), key=lambda x: x[1])
+        # Calculate confidence score (normalize to 0-1)
+        confidence = best_ts[1] / (sum(total_votes.values()) + 0.001)
+        confidence = min(0.95, max(0.4, confidence))  # Bound confidence
+        return {
+            "time_signature": best_ts[0],
+            "confidence": confidence,
+            "all_candidates": {ts: float(score) for ts, score in total_votes.items()}
+        }
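A worked example shows how the final normalization in `_combine_detection_results` behaves. The per-method outputs below are invented. The method weights, prior weights, voting rule and confidence bounds are copied from the commit, and the 3/4 vs 6/8 tie-break is omitted because it does not fire for these inputs. Every candidate starts from its prior, and the priors alone sum to 1.45, so the winner's share of the total tends to be small. Here the raw share is about 0.30 and is reported as the 0.4 lower bound. For method confidences in [0, 1], the share cannot exceed roughly 0.55, so the 0.95 upper bound is never reached.

```python
# Method weights, prior weights and confidence bounds copied from _combine_detection_results.
METHOD_WEIGHTS = {"autocorrelation": 0.25, "pattern_matching": 0.30,
                  "spectral": 0.20, "note_density": 0.20, "tempo_based": 0.05}
PRIOR_WEIGHTS = {"4/4": 0.35, "3/4": 0.25, "2/4": 0.15, "6/8": 0.25,
                 "5/4": 0.10, "7/8": 0.10, "9/8": 0.10, "12/8": 0.15}

def combine(results):
    """Weighted vote as in the commit, minus the 3/4 vs 6/8 tie-break.
    Returns (best signature, raw share of the total, reported confidence)."""
    votes = dict(PRIOR_WEIGHTS)
    for method, (ts, confidence) in results.items():
        votes[ts] = votes.get(ts, 0.0) + confidence * METHOD_WEIGHTS.get(method, 0.1)
    best_ts, best_score = max(votes.items(), key=lambda item: item[1])
    raw = best_score / (sum(votes.values()) + 0.001)
    return best_ts, raw, min(0.95, max(0.4, raw))

# Invented method outputs: (time signature, confidence).
example = {"autocorrelation": ("4/4", 0.6), "pattern_matching": ("3/4", 0.8),
           "spectral": ("4/4", 0.5), "note_density": ("3/4", 0.7),
           "tempo_based": ("4/4", 0.7)}
best_ts, raw, reported = combine(example)
print(best_ts, round(raw, 3), reported)   # 4/4 0.3 0.4

# Upper end: every method votes 4/4 with confidence 1.0.
unanimous = {method: ("4/4", 1.0) for method in METHOD_WEIGHTS}
print(round(combine(unanimous)[1], 3))    # 0.551
```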
+    def _evaluate_beat_pattern(self, beat_strengths, pattern_length):
+        """
+        Evaluate how consistently a specific pattern length fits the beat strengths
+        Args:
+            beat_strengths: Array of normalized beat strengths
+            pattern_length: Length of pattern to evaluate
+        Returns:
+            score: How well this pattern length explains the data (0-1)
+        """
+        if len(beat_strengths) < pattern_length * 2:
+            return 0.0
+        # Calculate correlation between consecutive patterns
+        correlations = []
+        num_full_patterns = len(beat_strengths) // pattern_length
+        for i in range(num_full_patterns - 1):
+            pattern1 = beat_strengths[i*pattern_length:(i+1)*pattern_length]
+            pattern2 = beat_strengths[(i+1)*pattern_length:(i+2)*pattern_length]
+            # Calculate similarity between consecutive patterns
+            if len(pattern1) == len(pattern2) and len(pattern1) > 0:
+                corr = np.corrcoef(pattern1, pattern2)[0, 1]
+                if not np.isnan(corr):
+                    correlations.append(corr)
+        # Calculate variance of beat strengths within each position
+        variance_score = 0
+        if num_full_patterns >= 2:
+            position_values = [[] for _ in range(pattern_length)]
+            for i in range(num_full_patterns):
+                for pos in range(pattern_length):
+                    idx = i * pattern_length + pos
+                    if idx < len(beat_strengths):
+                        position_values[pos].append(beat_strengths[idx])
+            # Calculate variance ratio (higher means consistent accent patterns)
+            between_pos_var = np.var([np.mean(vals) for vals in position_values if vals])
+            within_pos_var = np.mean([np.var(vals) for vals in position_values if len(vals) > 1])
+            if within_pos_var > 0:
+                variance_score = between_pos_var / within_pos_var
+                variance_score = min(1.0, variance_score / 2.0)  # Normalize
+        # Combine correlation and variance scores
+        if correlations:
+            correlation_score = np.mean(correlations)
+            return 0.7 * correlation_score + 0.3 * variance_score
+        return 0.5 * variance_score  # Lower confidence if we couldn't calculate correlations
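`_evaluate_beat_pattern` is not called in the code shown in this diff. It scores a bar length by the ratio of between-position variance to within-position variance, in the spirit of an F-ratio. There is one edge case. When every bar is identical, `within_pos_var` is 0, so `variance_score` stays at 0 and the most consistent accent pattern gets the lowest variance score. The sketch below, with a hypothetical name, computes the same ratio with a reshape. It treats numerically zero within-position variance as a perfect fit.

```python
import numpy as np

def accent_variance_ratio(beat_strengths, pattern_length):
    """Between-position variance divided by mean within-position variance.
    Returns inf when bars repeat exactly and positions differ, 0.0 when nothing varies."""
    n_bars = len(beat_strengths) // pattern_length
    if n_bars < 2:
        return 0.0
    bars = np.asarray(beat_strengths[:n_bars * pattern_length], dtype=float).reshape(n_bars, pattern_length)
    between = np.var(bars.mean(axis=0))   # how different the bar positions are on average
    within = np.mean(bars.var(axis=0))    # how much each position varies across bars
    if within < 1e-12:                    # treat numerically zero variance as zero
        return float("inf") if between > 0 else 0.0
    return float(between / within)

strengths = [1.0, 0.2, 0.3, 0.9, 0.3, 0.2] * 2   # slightly varying 3/4 accents
print(accent_variance_ratio(strengths, 3) > accent_variance_ratio(strengths, 4))  # True
print(accent_variance_ratio([1.0, 0.25, 0.5] * 4, 3))                            # inf
```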
+    def _extract_average_pattern(self, beat_strengths, pattern_length):
+        """
+        Extract the average beat pattern of specified length
+        Args:
+            beat_strengths: Array of beat strengths
+            pattern_length: Length of pattern to extract
+        Returns:
+            Average pattern of the specified length
+        """
+        if len(beat_strengths) < pattern_length:
+            return np.array([])
+        # Number of complete patterns
+        num_patterns = len(beat_strengths) // pattern_length
+        if num_patterns == 0:
+            return np.array([])
+        # Reshape to stack patterns and calculate average
+        patterns = beat_strengths[:num_patterns * pattern_length].reshape((num_patterns, pattern_length))
+        return np.mean(patterns, axis=0)
     def analyze_tonality(self, y, sr):
         """Analyze tonal features: key, mode, harmonic features"""
         emotion_data = self.analyze_emotion(rhythm_data, tonal_data, energy_data)
         theme_data = self.analyze_theme(rhythm_data, tonal_data, emotion_data)
+        # Convert any remaining numpy values to native Python types
+        def convert_numpy_to_python(obj):
+            if isinstance(obj, dict):
+                return {k: convert_numpy_to_python(v) for k, v in obj.items()}
+            elif isinstance(obj, list):
+                return [convert_numpy_to_python(item) for item in obj]
+            elif isinstance(obj, np.ndarray):
+                return obj.tolist()
+            elif isinstance(obj, np.number):
+                return float(obj)
+            else:
+                return obj
+        # Ensure all numpy values are converted
+        rhythm_data = convert_numpy_to_python(rhythm_data)
+        tonal_data = convert_numpy_to_python(tonal_data)
+        energy_data = convert_numpy_to_python(energy_data)
+        emotion_data = convert_numpy_to_python(emotion_data)
+        theme_data = convert_numpy_to_python(theme_data)
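`convert_numpy_to_python` maps every `np.number` to `float`, so integer counts come back as floats (3 becomes 3.0). It also passes NumPy booleans and tuples through unchanged. `np.bool_` is not an `np.number`, and `json.dumps` rejects it. A variant that calls `np.generic.item()` keeps each scalar's natural Python type. The name `to_builtin` and the sample summary are hypothetical.

```python
import json
import numpy as np

def to_builtin(obj):
    """Recursively convert NumPy arrays and scalars into built-in Python types."""
    if isinstance(obj, dict):
        return {key: to_builtin(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_builtin(item) for item in obj]
    if isinstance(obj, np.ndarray):
        return obj.tolist()          # tolist() already yields built-in scalars
    if isinstance(obj, np.generic):
        return obj.item()            # np.int64 -> int, np.float32 -> float, np.bool_ -> bool
    return obj

summary = {"tempo": np.float32(120.0), "beats": np.int64(96), "is_major": np.bool_(True),
           "chroma": np.array([0.5, 0.25]), "range": (np.float64(0.1), np.float64(0.9))}
clean = to_builtin(summary)
print(json.dumps(clean))
```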
         # Combine all results
         return {
             "file": file_path,
             "emotion_analysis": emotion_data,
             "theme_analysis": theme_data,
             "summary": {
+                "tempo": float(rhythm_data["tempo"]),
                 "time_signature": rhythm_data["estimated_time_signature"],
                 "key": tonal_data["key"],
                 "mode": tonal_data["mode"],

requirements.txt CHANGED

@@ -13,4 +13,3 @@ scipy>=1.12.0
 soundfile>=0.12.1
 matplotlib>=3.7.0
 pronouncing>=0.2.0
-pyannote.audio>=2.1.1

 soundfile>=0.12.1
 matplotlib>=3.7.0
 pronouncing>=0.2.0