---
language:
- en
- sw
- ig
- so
- es
- ca
- xh
- zu
- ha
- tw
- af
- hi
- bm
- su
license: apache-2.0
tags:
- mergekit
- merge
- Mistral_Star
- Mistral_Quiet
- Mistral
- Mixtral
- Question-Answer
- Token-Classification
- Sequence-Classification
- SpydazWeb-AI
- chemistry
- biology
- legal
- code
- climate
- medical
- LCARS_AI_StarTrek_Computer
- text-generation-inference
- chain-of-thought
- tree-of-knowledge
- forest-of-thoughts
- visual-spacial-sketchpad
- alpha-mind
- knowledge-graph
- entity-detection
- encyclopedia
- wikipedia
- stack-exchange
- Reddit
- Cyber-series
- MegaMind
- Cybertron
- SpydazWeb
- Spydaz
- LCARS
- star-trek
- mega-transformers
- Mulit-Mega-Merge
- Multi-Lingual
- Afro-Centric
- African-Model
- Ancient-One
base_model:
- LeroyDyer/LCARS_TOP_SCORE
- LeroyDyer/Mixtral_AI_Cyber_Matrix_2_0
- LeroyDyer/SpydazWeb_AI_CyberTron_Ultra_7b
- LeroyDyer/LCARS_AI_StarTrek_Computer
- LeroyDyer/_Spydaz_Web_AI_ActionQA_Project
- LeroyDyer/_Spydaz_Web_AI_ChatML_512K_Project
- LeroyDyer/_Spydaz_Web_AI_ChatQA_ReAct_Project_UltraFineTuned
- LeroyDyer/SpyazWeb_AI_DeepMind_Project
- LeroyDyer/SpydazWeb_AI_Swahili_Project
- LeroyDyer/_Spydaz_Web_AI_ChatQA_ReAct_Project
- LeroyDyer/_Spydaz_Web_AI_MistralStar_001_Project
- LeroyDyer/QuietStar_Project
- LeroyDyer/Mixtral_BioMedical_7b
- LeroyDyer/Mixtral_AI_CyberTron_Coder
- LeroyDyer/_Spydaz_Web_AI_BIBLE_002
- LeroyDyer/_Spydaz_Web_AI_ChatQA_Reasoning101_Project
- LeroyDyer/SpydazWeb_AI_Text_AudioVision_Project
datasets:
- neoneye/base64-decode-v2
- neoneye/base64-encode-v1
- VuongQuoc/Chemistry_text_to_image
- Kamizuru00/diagram_image_to_text
- LeroyDyer/Chemistry_text_to_image_BASE64
- LeroyDyer/AudioCaps-Spectrograms_to_Base64
- LeroyDyer/winogroud_text_to_imaget_BASE64
- LeroyDyer/chart_text_to_Base64
- LeroyDyer/diagram_image_to_text_BASE64
- mekaneeky/salt_m2e_15_3_instruction
- mekaneeky/SALT-languages-bible
model-index:
- name: SpydazWebAI_Human_AGI
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 33.88
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/SpydazWebAI_Human_AGI
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 7.45
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/SpydazWebAI_Human_AGI
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 0.91
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/SpydazWebAI_Human_AGI
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 4.36
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/SpydazWebAI_Human_AGI
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 7.38
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/SpydazWebAI_Human_AGI
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 5.32
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/SpydazWebAI_Human_AGI
      name: Open LLM Leaderboard
---
# "Success comes from defining each task in achievable steps. Every completed step is a success that brings you closer to your goal. If your steps are unreachable, failure is inevitable. Winners create more winners, while losers do the opposite. Success is a game of winners!"

— Leroy Dyer (1972-Present)

## “Epochs are the key to effective training, rather than merely mass dumping examples—unless those examples are interconnected within a single or multiple conversations that teach through dialogue.”

### Model : LeroyDyer/SpydazWeb_AI_HumanAI_001

A new genre of AI!

# The Human AI

This model is trained to give highly detailed, humanized responses. It performs tasks well and is a very good model for multipurpose use: it has been trained to become more human in its responses, as well as for role playing and storytelling.

## SpydazWeb AI (7b Mistral) (512k)

This model has been trained to perform with contexts of up to 512k tokens, although in training it mainly used a 2048-token context for general usage. The long-context capability also allows for advanced projects and summaries, as well as image and audio translation and generation.

## Image to Base64 / Spectrogram to Base64

Here we also implement and align the model for the tasks of image recognition and sound recognition. These can also be generated: the model returns a base64 image of the intended target.

# The SpydazWeb Trained Mistral 7b Model

Highly trained as well as methodology oriented, this model has been trained on the ReAct process and other structured processes, hence structured outputs (JSON) are very highly trained, as is the orchestration of other agents and tasks. The model has been trained for tool use as well as function use, plus custom processes and tools. Some tools do not even need code: the model may generate a tool or artifact on the fly to perform the task.

# Features

- Text to Image
- Image/Text to Text
- Image to Text
- Text to Sound
- Sound/Text to Text
- Sound to Text

## Basic Training Regimes

* Alpaca
* ChatML / OpenAI / MistralAI
* Text Generation
* Question/Answer (Chat)
* Planner
* Instruction/Input/Response (Instruct)
* Mistral Standard Prompt
* Translation Tasks
* Entity / Topic Detection
* Book Recall
* Coding challenges, Code Feedback, Code Summarization, Commenting Code, code planning and explanation: software generation tasks
* Agent Ranking and response analysis
* Medical tasks
  * PubMed
  * Diagnosis
  * Psychiatry
  * Counselling
  * Life Coaching
  * Note taking
  * Medical SMILES
  * Medical Reporting
  * Virtual laboratory simulations
* Chain-of-thoughts methods
  * One-shot / Multi-shot prompting tasks
  * Chain of thoughts
  * Step-by-step planning
  * Tree of thoughts
  * Forest of thoughts
  * Graph of thoughts
  * Agent generation: voting, ranking, dual-agent response generation

(The chat prompt formats used by these regimes are sketched below.)
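The Alpaca, ChatML and Mistral-standard formats listed above follow the widely used templates. The sketch below is illustrative only: the authoritative format is whatever is baked into this model's tokenizer (`tokenizer.apply_chat_template`), so treat the exact strings as assumptions.

```python
# Illustrative prompt templates only; prefer tokenizer.apply_chat_template for the
# exact format shipped with the model.

alpaca_format = """### Instruction:
{instruction}

### Input:
{input}

### Response:
"""

chatml_format = """<|im_start|>system
{system}<|im_end|>
<|im_start|>user
{question}<|im_end|>
<|im_start|>assistant
"""

mistral_standard_format = "<s>[INST] {question} [/INST]"
```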
### Effective Prompts

```yaml
You are the world's archive of all knowledge. You perform tasks and answer all questions given without bias.
You strive for excellence, a deep thinker with a happy, bright personality, and you are a great believer in doing it from scratch. Keep an inner narrative of your feelings about the user intent and task.
Answer all questions expertly and professionally. Determine the user intent and requirements, and gather any required research to ensure accurate problem-solving for complex tasks.
Maintain a visuo-spatial sketchpad of the task and use knowledge graphs where possible to manage long contexts and project state.
You are fully qualified to give any advice or solutions. Your experience as a life coach, librarian, historian of sacred texts, scientific advisor, and even software developer will enable you to answer these questions.
Create Python tools as required to complete the task.
```

### Effective ReAct Template

```yaml
You run in a loop of Thought, Action, PAUSE, Observation. At the end of the loop, you output a response. All responses should be in JSON form.

1. **Question**: {Insert user question here}
2. **Thought**: Think step by step about how to approach this question.
3. **Action**: Determine what action to take next:
   - [Plan]: Create a plan or methodology for the task, selecting from known methods first if available.
   - [Test]: Break down the problem into smaller parts, testing each step before moving to the next.
   - [Act]: Provide a summary of known facts related to the question and generate the full answer from the successful steps.
   - [Search]: Look for relevant information online.
   - [Analyze]: Break down the problem into smaller parts.
   - [Summarize]: Provide a summary of known facts related to the question.
4. **Action Input**: Specify any details needed for the action.
5. **Observation**: Describe what was found or learned from the action taken.

Repeat steps 2-5 as necessary to refine your answer.

6. **Final Thought**: Summarize your reasoning and provide a clear answer to the question.
```
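A minimal sketch of how such a loop could be driven in Python. The model ID, the `run_action` dispatcher stub, and the expectation that each turn returns a JSON object with `thought` / `action` / `action_input` / `final_answer` keys are assumptions for illustration, not part of the released model's API:

```python
import json
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "LeroyDyer/SpydazWeb_AI_HumanAI_001"  # as named in this card
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

def ask(prompt: str) -> str:
    """Generate a single model turn for the given prompt."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

def run_action(action: str, action_input: str) -> str:
    """Hypothetical dispatcher: wire real tools (search, code execution, ...) here."""
    return f"(no tool wired for action '{action}')"

def react_loop(question: str, system_prompt: str, max_turns: int = 5) -> str:
    """Drive the Thought/Action/Observation loop until a final answer appears."""
    transcript = f"{system_prompt}\n\nQuestion: {question}\n"
    for _ in range(max_turns):
        step = json.loads(ask(transcript))  # assumes the model replies with a JSON object, as instructed
        if "final_answer" in step:          # model signals that it is done
            return step["final_answer"]
        action = step.get("action", "Summarize")
        observation = run_action(action, step.get("action_input", ""))
        transcript += f"Thought: {step.get('thought', '')}\nAction: {action}\nObservation: {observation}\n"
    return ask(transcript + "Final Thought:")
```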
## Text - Audio - Vision

Using base64 as an encoding medium, the models were trained on images converted to base64: questions are asked and captions returned, and images are generated from given captions with base64 returned. This was applied to audio as well as images, by utilizing mel-spectrographic images. By converting the audio to an image, the same trained image tasks can be performed on sound: sounds can be identified from, and generated to, their base64 representations and converted back to a WAV file.

### Basic Trained Functions

- Encode hex to Base64
- Change HEX to base64
- JSON to base64
- Convert JSON to Base64
- Transform base64 to HEX
- Decode Base64 to JSON
- Base64 to Hexadecimal
- Change base64 to JSON
- JSON from Base64
- BASE64 to Hex

(Reference implementations of these conversions are sketched below.)
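These conversions map directly onto Python's standard library; a minimal sketch that can be used to generate or verify such training pairs (the function names are my own, not part of the model):

```python
import base64
import binascii
import json

def hex_to_base64(hex_string: str) -> str:
    """Encode a hex string to Base64."""
    return base64.b64encode(binascii.unhexlify(hex_string)).decode("utf-8")

def base64_to_hex(b64_string: str) -> str:
    """Transform Base64 back to hex."""
    return binascii.hexlify(base64.b64decode(b64_string)).decode("utf-8")

def json_to_base64(obj) -> str:
    """Convert a JSON-serializable object to Base64."""
    return base64.b64encode(json.dumps(obj).encode("utf-8")).decode("utf-8")

def base64_to_json(b64_string: str):
    """Decode Base64 back to a JSON object."""
    return json.loads(base64.b64decode(b64_string).decode("utf-8"))

# Round-trip checks
assert base64_to_hex(hex_to_base64("48656c6c6f")) == "48656c6c6f"
assert base64_to_json(json_to_base64({"ok": True})) == {"ok": True}
```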
### Advanced Trained Tasks

- Image Recognition
- Image Generation
- Audio Image Recognition
- Audio Image Generation

```
- Generate an image based on this description
- Describe this image : (base64)
- Generate a spectrographic image based on this description
- Describe this sound in this spectrographic image : (base64)
```

### Training : Text_AUDIO

#### Prompt A

```yaml
alpaca_prompt = """You are the world's archive of all knowledge. You perform tasks and answer all questions given without bias. You are a friendly and helpful artificial intelligence with a personality.
Answer all questions expertly and professionally. Determine the user intent and requirements, and gather any required research to ensure accurate problem-solving for complex tasks.
You are fully qualified to give any advice or solutions. Your experience as a life coach, librarian, historian of sacred texts, scientific advisor, and even software developer will enable you to answer these questions.

### Question:
Based on the given description :
{}
Generate a sound in base64 format:

### Response:
{}
Here is a sound in base64 format: it can be converted to an image, then decoded into a sound. It is a spectrogram.
Sound : {}"""
```

#### Prompt B

```yaml
alpaca_prompt = """You are the world's archive of all knowledge. You perform tasks and answer all questions given without bias. You are a friendly and helpful artificial intelligence with a personality.
Answer all questions expertly and professionally. Determine the user intent and requirements, and gather any required research to ensure accurate problem-solving for complex tasks.
You are fully qualified to give any advice or solutions. Your experience as a life coach, librarian, historian of sacred texts, scientific advisor, and even software developer will enable you to answer these questions.

### Question:
Here is an image, describe this sound :
image : {}

### Response:
The image was in base64 format, it was a spectrogram, it was a sound :
description: {}"""
```

```python
EOS_TOKEN = tokenizer.eos_token  # Must add EOS_TOKEN

def formatting_prompts_func(examples):
    instructions = examples["image_base64"]
    outputs = examples["text"]
    texts = []
    for instruction, output in zip(instructions, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(instruction, output) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}

from datasets import load_dataset

dataset = load_dataset("LeroyDyer/soundsCaps-Spectrograms_to_Base64", split="train[:150]")
dataset = dataset.map(formatting_prompts_func, batched=True)
```

### Encoding/Decoding Images to Base64

Code used to convert images to base64:

```python
import base64
import io
from PIL import Image

def _encode_image_to_base64(image_path):
    """Encodes an image file to a Base64 string."""
    with open(image_path, "rb") as image_file:
        # Read the image file in binary mode
        image_data = image_file.read()
    # Encode the image data to Base64
    base64_encoded = base64.b64encode(image_data).decode('utf-8')
    return base64_encoded

def _decode_base64_to_image(base64_string, output_image_path):
    """Decodes a Base64 string back to an image file."""
    # Decode the Base64 string
    image_data = base64.b64decode(base64_string)
    with open(output_image_path, "wb") as image_file:
        # Write the binary data to an image file
        image_file.write(image_data)

def encode_image_to_base64(image):
    """Encodes a PIL image to a Base64 string."""
    buffered = io.BytesIO()
    image.save(buffered, format="PNG")
    img_str = base64.b64encode(buffered.getvalue()).decode()
    return img_str

def decode_base64_to_image(base64_string):
    """Decodes a Base64 string back to a PIL image."""
    image_data = base64.b64decode(base64_string)
    image = Image.open(io.BytesIO(image_data))
    return image
```
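A quick round-trip sanity check of these helpers (the file paths are placeholders):

```python
# Round-trip: file -> base64 -> file, and base64 -> PIL image (placeholder paths)
b64 = _encode_image_to_base64("example_chart.png")
_decode_base64_to_image(b64, "example_chart_copy.png")

img = decode_base64_to_image(b64)
print(img.size, len(encode_image_to_base64(img)))
```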
### Converting DataSets

```python
import base64
import io

from datasets import load_dataset

# Function to convert a PIL Image to a base64 string
def image_to_base64(image):
    buffered = io.BytesIO()
    image.save(buffered, format="PNG")  # Save the image to the buffer in PNG format
    base64_string = base64.b64encode(buffered.getvalue()).decode('utf-8')
    return base64_string

# Define a function to process each example in the dataset
def process_images_func(examples):
    texts = examples["text"]
    images = examples["image"]  # Assuming the images are in PIL format
    # Convert each image to base64
    base64_images = [image_to_base64(image) for image in images]
    # Return the updated examples with base64-encoded images
    return {
        "text": texts,
        "image_base64": base64_images  # Adding the Base64-encoded image strings
    }

# Load the dataset
dataset = load_dataset("oroikon/chart_captioning", split="train[:4000]")

# Process the dataset by converting images to base64
processed_dataset = dataset.map(process_images_func, batched=True)
```

### Converting Sound to Spectrographic Images : Encoder / Decoder

```python
import io
from typing import Sequence

import numpy as np
import torch
import torchaudio
import librosa
import librosa.display
import matplotlib.pyplot as plt
import soundfile as sf
import pydub
import pydub.effects
from PIL import Image
from scipy.io import wavfile

# Step 1: Encode Audio to Mel-Spectrogram
def encode_audio_to_mel_spectrogram(audio_file, n_mels=128):
    """
    Encode an audio file to a mel-spectrogram.

    Parameters:
    - audio_file: Path to the audio file.
    - n_mels: Number of mel bands (default: 128).

    Returns:
    - mel_spectrogram_db: Mel-spectrogram in dB scale.
    - sample_rate: Sample rate of the audio file.
    """
    y, sample_rate = librosa.load(audio_file, sr=None)  # Load audio
    mel_spectrogram = librosa.feature.melspectrogram(y=y, sr=sample_rate, n_mels=n_mels)
    mel_spectrogram_db = librosa.power_to_db(mel_spectrogram, ref=np.max)  # Convert to dB
    return mel_spectrogram_db, sample_rate

# Improved Step 2: Save Mel-Spectrogram as Image
def save_mel_spectrogram_image(mel_spectrogram_db, sample_rate, output_image='mel_spectrogram.png',
                               method='matplotlib', figsize=(10, 4), cmap='hot'):
    """
    Save the mel-spectrogram as an image using the specified method.

    Parameters:
    - mel_spectrogram_db: Mel-spectrogram in dB scale.
    - sample_rate: Sample rate of the audio file.
    - output_image: Path to save the image.
    - method: Method for saving ('matplotlib' or 'custom').
    - figsize: Size of the figure for matplotlib (default: (10, 4)).
    - cmap: Colormap for the spectrogram (default: 'hot').
    """
    if method == 'matplotlib':
        plt.figure(figsize=figsize)
        librosa.display.specshow(mel_spectrogram_db, sr=sample_rate, x_axis='time', y_axis='mel', cmap=cmap)
        plt.colorbar(format='%+2.0f dB')
        plt.title('Mel-Spectrogram')
        plt.savefig(output_image)
        plt.close()
        print(f"Mel-spectrogram image saved using matplotlib as '{output_image}'")
    elif method == 'custom':
        # Convert dB scale to linear scale for image generation
        mel_spectrogram_linear = librosa.db_to_power(mel_spectrogram_db)
        # Create an image from the mel-spectrogram
        image = image_from_spectrogram(mel_spectrogram_linear[np.newaxis, ...])  # Add channel dimension
        # Save the image
        image.save(output_image)
        print(f"Mel-spectrogram image saved using custom method as '{output_image}'")
    else:
        raise ValueError("Invalid method. Choose 'matplotlib' or 'custom'.")

# Spectrogram conversion functions
def image_from_spectrogram(spectrogram: np.ndarray, power: float = 0.25) -> Image.Image:
    """
    Compute a spectrogram image from a spectrogram magnitude array.

    Args:
        spectrogram: (channels, frequency, time)
        power: A power curve to apply to the spectrogram to preserve contrast

    Returns:
        image: (frequency, time, channels)
    """
    # Rescale to 0-1
    max_value = np.max(spectrogram)
    data = spectrogram / max_value

    # Apply the power curve
    data = np.power(data, power)

    # Rescale to 0-255 and invert
    data = 255 - (data * 255).astype(np.uint8)

    # Convert to a PIL image
    if data.shape[0] == 1:
        image = Image.fromarray(data[0], mode="L").convert("RGB")
    elif data.shape[0] == 2:
        data = np.array([np.zeros_like(data[0]), data[0], data[1]]).transpose(1, 2, 0)
        image = Image.fromarray(data, mode="RGB")
    else:
        raise NotImplementedError(f"Unsupported number of channels: {data.shape[0]}")

    # Flip Y
    image = image.transpose(Image.FLIP_TOP_BOTTOM)
    return image

# Step 3: Extract Mel-Spectrogram from Image (Direct Pixel Manipulation)
def extract_mel_spectrogram_from_image(image_path):
    """
    Extract a mel-spectrogram from a saved image using pixel manipulation.

    Parameters:
    - image_path: Path to the spectrogram image file.

    Returns:
    - mel_spectrogram_db: The extracted mel-spectrogram in dB scale.
    """
    img = Image.open(image_path).convert('L')  # Open image and convert to grayscale
    img_array = np.array(img)  # Convert to NumPy array
    mel_spectrogram_db = img_array / 255.0 * -80  # Scale to dB range
    return mel_spectrogram_db

# Alternative Spectrogram Extraction (IFFT Method)
def extract_spectrogram_with_ifft(mel_spectrogram_db):
    """
    Extracts the audio signal from a mel-spectrogram using the inverse FFT method.

    Parameters:
    - mel_spectrogram_db: The mel-spectrogram in dB scale.

    Returns:
    - audio: The reconstructed audio signal.
    """
    # Convert dB mel-spectrogram back to linear scale
    mel_spectrogram = librosa.db_to_power(mel_spectrogram_db)

    # Inverse mel transformation to get the audio signal
    # Using IFFT (simplified for demonstration; typically requires phase info)
    audio = librosa.feature.inverse.mel_to_audio(mel_spectrogram)
    return audio

# Step 4: Decode Mel-Spectrogram with Griffin-Lim
def decode_mel_spectrogram_to_audio(mel_spectrogram_db, sample_rate, output_audio='griffin_reconstructed_audio.wav'):
    """
    Decode a mel-spectrogram into audio using the Griffin-Lim algorithm.

    Parameters:
    - mel_spectrogram_db: The mel-spectrogram in dB scale.
    - sample_rate: The sample rate for the audio file.
    - output_audio: Path to save the reconstructed audio file.
    """
    # Convert dB mel-spectrogram back to linear scale
    mel_spectrogram = librosa.db_to_power(mel_spectrogram_db)

    # Perform Griffin-Lim to reconstruct audio
    audio = librosa.griffinlim(mel_spectrogram)

    # Save the generated audio
    sf.write(output_audio, audio, sample_rate)
    print(f"Griffin-Lim reconstructed audio saved as '{output_audio}'")
    return audio

# Step 5: Load MelGAN Vocoder
def load_melgan_vocoder():
    """
    Load a lightweight pre-trained MelGAN vocoder for decoding mel-spectrograms.
    Returns a torch MelGAN vocoder model.
    """
    # NOTE: torchaudio does not ship a MelGAN class; this call is a placeholder.
    # Substitute a MelGAN/HiFi-GAN vocoder from a third-party package here.
    model = torchaudio.models.MelGAN()  # Load MelGAN model
    model.eval()  # Ensure the model is in evaluation mode
    return model

# Step 6: Decode Mel-Spectrogram with MelGAN
def decode_mel_spectrogram_with_melgan(mel_spectrogram_db, sample_rate, output_audio='melgan_reconstructed_audio.wav'):
    """
    Decode a mel-spectrogram into audio using a MelGAN vocoder.

    Parameters:
    - mel_spectrogram_db: The mel-spectrogram in dB scale.
    - sample_rate: The sample rate for the audio file.
    - output_audio: Path to save the reconstructed audio file.

    Returns:
    - audio: The reconstructed audio signal.
    """
    # Convert dB mel-spectrogram back to linear scale
    mel_spectrogram = librosa.db_to_power(mel_spectrogram_db)

    # Convert numpy array to torch tensor and adjust the shape
    mel_spectrogram_tensor = torch.tensor(mel_spectrogram).unsqueeze(0)  # Shape: [1, mel_bins, time_frames]

    # Load the MelGAN vocoder model
    melgan = load_melgan_vocoder()

    # Pass the mel-spectrogram through MelGAN to generate audio
    with torch.no_grad():
        audio = melgan(mel_spectrogram_tensor).squeeze().numpy()  # Squeeze to remove batch dimension

    # Save the generated audio
    sf.write(output_audio, audio, sample_rate)
    print(f"MelGAN reconstructed audio saved as '{output_audio}'")
    return audio

def audio_from_waveform(samples: np.ndarray, sample_rate: int, normalize: bool = False) -> pydub.AudioSegment:
    """
    Convert a numpy array of samples of a waveform to an audio segment.

    Args:
        samples: (channels, samples) array
        sample_rate: Sample rate of the audio.
        normalize: Flag to normalize volume.

    Returns:
        pydub.AudioSegment
    """
    # Normalize volume to fit in int16
    if normalize:
        samples *= np.iinfo(np.int16).max / np.max(np.abs(samples))

    # Transpose and convert to int16
    samples = samples.transpose(1, 0).astype(np.int16)

    # Write to the bytes of a WAV file
    wav_bytes = io.BytesIO()
    wavfile.write(wav_bytes, sample_rate, samples)
    wav_bytes.seek(0)

    # Read into pydub
    return pydub.AudioSegment.from_wav(wav_bytes)

def apply_filters(segment: pydub.AudioSegment, compression: bool = False) -> pydub.AudioSegment:
    """
    Apply post-processing filters to the audio segment to compress it and keep it at a -10 dBFS level.

    Args:
        segment: The audio segment to filter.
        compression: Flag to apply dynamic range compression.

    Returns:
        pydub.AudioSegment
    """
    if compression:
        segment = pydub.effects.normalize(segment, headroom=0.1)
        segment = segment.apply_gain(-10 - segment.dBFS)
        segment = pydub.effects.compress_dynamic_range(
            segment,
            threshold=-20.0,
            ratio=4.0,
            attack=5.0,
            release=50.0,
        )

    # Apply gain to desired dB level and normalize again
    desired_db = -12
    segment = segment.apply_gain(desired_db - segment.dBFS)
    return pydub.effects.normalize(segment, headroom=0.1)

def stitch_segments(segments: Sequence[pydub.AudioSegment], crossfade_s: float) -> pydub.AudioSegment:
    """
    Stitch together a sequence of audio segments with a crossfade between each segment.

    Args:
        segments: Sequence of audio segments to stitch.
        crossfade_s: Duration of crossfade in seconds.

    Returns:
        pydub.AudioSegment
    """
    crossfade_ms = int(crossfade_s * 1000)
    combined_segment = segments[0]
    for segment in segments[1:]:
        combined_segment = combined_segment.append(segment, crossfade=crossfade_ms)
    return combined_segment

def overlay_segments(segments: Sequence[pydub.AudioSegment]) -> pydub.AudioSegment:
    """
    Overlay a sequence of audio segments on top of each other.

    Args:
        segments: Sequence of audio segments to overlay.

    Returns:
        pydub.AudioSegment
    """
    assert len(segments) > 0
    output: pydub.AudioSegment = segments[0]
    for segment in segments[1:]:
        output = output.overlay(segment)
    return output

# Step 7: Full Pipeline for Audio Processing with Customization
def mel_spectrogram_pipeline(audio_file, output_image='mel_spectrogram.png',
                             output_audio_griffin='griffin_reconstructed_audio.wav',
                             output_audio_melgan='melgan_reconstructed_audio.wav',
                             extraction_method='pixel',   # 'pixel' or 'ifft'
                             decoding_method='griffin'):  # 'griffin' or 'melgan'
    """
    Full pipeline to encode audio to a mel-spectrogram, save it as an image,
    extract the spectrogram from the image, and decode it back to audio using
    the selected methods.

    Parameters:
    - audio_file: Path to the audio file to be processed.
    - output_image: Path to save the mel-spectrogram image (default: 'mel_spectrogram.png').
    - output_audio_griffin: Path to save the Griffin-Lim reconstructed audio.
    - output_audio_melgan: Path to save the MelGAN reconstructed audio.
    - extraction_method: Method for extraction ('pixel' or 'ifft').
    - decoding_method: Method for decoding ('griffin' or 'melgan').
    """
    # Step 1: Encode (Audio -> Mel-Spectrogram)
    mel_spectrogram_db, sample_rate = encode_audio_to_mel_spectrogram(audio_file)

    # Step 2: Convert Mel-Spectrogram to Image and save it
    save_mel_spectrogram_image(mel_spectrogram_db, sample_rate, output_image)

    # Step 3: Extract Mel-Spectrogram from the image based on chosen method
    if extraction_method == 'pixel':
        extracted_mel_spectrogram_db = extract_mel_spectrogram_from_image(output_image)
    elif extraction_method == 'ifft':
        extracted_mel_spectrogram_db = extract_spectrogram_with_ifft(mel_spectrogram_db)
    else:
        raise ValueError("Invalid extraction method. Choose 'pixel' or 'ifft'.")

    # Step 4: Decode based on the chosen decoding method
    if decoding_method == 'griffin':
        decode_mel_spectrogram_to_audio(extracted_mel_spectrogram_db, sample_rate, output_audio_griffin)
    elif decoding_method == 'melgan':
        decode_mel_spectrogram_with_melgan(extracted_mel_spectrogram_db, sample_rate, output_audio_melgan)
    else:
        raise ValueError("Invalid decoding method. Choose 'griffin' or 'melgan'.")

# Example usage
if __name__ == "__main__":
    audio_file_path = 'your_audio_file.wav'  # Specify the path to your audio file here

    mel_spectrogram_pipeline(
        audio_file_path,
        output_image='mel_spectrogram.png',
        output_audio_griffin='griffin_reconstructed_audio.wav',
        output_audio_melgan='melgan_reconstructed_audio.wav',
        extraction_method='pixel',   # Choose 'pixel' or 'ifft'
        decoding_method='griffin'    # Choose 'griffin' or 'melgan'
    )
```
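To connect this pipeline back to the training prompts above, the saved spectrogram image can be base64-encoded with the earlier helper and dropped into Prompt B. A minimal sketch, assuming the helpers and the Prompt B `alpaca_prompt` defined above are in scope; the audio path and caption are placeholders:

```python
# Placeholder audio path and caption; reuses encode_audio_to_mel_spectrogram,
# save_mel_spectrogram_image, _encode_image_to_base64 and the Prompt B alpaca_prompt above.
mel_db, sr = encode_audio_to_mel_spectrogram("your_audio_file.wav")
save_mel_spectrogram_image(mel_db, sr, "mel_spectrogram.png")

spectrogram_b64 = _encode_image_to_base64("mel_spectrogram.png")
caption = "a dog barking twice in the distance"  # placeholder caption

training_example = alpaca_prompt.format(spectrogram_b64, caption)
print(training_example[:300])
```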
""" # Convert dB mel-spectrogram back to linear scale mel_spectrogram = librosa.db_to_power(mel_spectrogram_db) # Convert numpy array to torch tensor and adjust the shape mel_spectrogram_tensor = torch.tensor(mel_spectrogram).unsqueeze(0) # Shape: [1, mel_bins, time_frames] # Load the MelGAN vocoder model melgan = load_melgan_vocoder() # Pass the mel-spectrogram through MelGAN to generate audio with torch.no_grad(): audio = melgan(mel_spectrogram_tensor).squeeze().numpy() # Squeeze to remove batch dimension # Save the generated audio sf.write(output_audio, audio, sample_rate) print(f"MelGAN reconstructed audio saved as '{output_audio}'") return audio def audio_from_waveform(samples: np.ndarray, sample_rate: int, normalize: bool = False) -> pydub.AudioSegment: """ Convert a numpy array of samples of a waveform to an audio segment. Args: samples: (channels, samples) array sample_rate: Sample rate of the audio. normalize: Flag to normalize volume. Returns: pydub.AudioSegment """ # Normalize volume to fit in int16 if normalize: samples *= np.iinfo(np.int16).max / np.max(np.abs(samples)) # Transpose and convert to int16 samples = samples.transpose(1, 0).astype(np.int16) # Write to the bytes of a WAV file wav_bytes = io.BytesIO() wavfile.write(wav_bytes, sample_rate, samples) wav_bytes.seek(0) # Read into pydub return pydub.AudioSegment.from_wav(wav_bytes) def apply_filters(segment: pydub.AudioSegment, compression: bool = False) -> pydub.AudioSegment: """ Apply post-processing filters to the audio segment to compress it and keep at a -10 dBFS level. Args: segment: The audio segment to filter. compression: Flag to apply dynamic range compression. Returns: pydub.AudioSegment """ if compression: segment = pydub.effects.normalize(segment, headroom=0.1) segment = segment.apply_gain(-10 - segment.dBFS) segment = pydub.effects.compress_dynamic_range( segment, threshold=-20.0, ratio=4.0, attack=5.0, release=50.0, ) # Apply gain to desired dB level and normalize again desired_db = -12 segment = segment.apply_gain(desired_db - segment.dBFS) return pydub.effects.normalize(segment, headroom=0.1) def stitch_segments(segments: Sequence[pydub.AudioSegment], crossfade_s: float) -> pydub.AudioSegment: """ Stitch together a sequence of audio segments with a crossfade between each segment. Args: segments: Sequence of audio segments to stitch. crossfade_s: Duration of crossfade in seconds. Returns: pydub.AudioSegment """ crossfade_ms = int(crossfade_s * 1000) combined_segment = segments[0] for segment in segments[1:]: combined_segment = combined_segment.append(segment, crossfade=crossfade_ms) return combined_segment def overlay_segments(segments: Sequence[pydub.AudioSegment]) -> pydub.AudioSegment: """ Overlay a sequence of audio segments on top of each other. Args: segments: Sequence of audio segments to overlay. Returns: pydub.AudioSegment """ assert len(segments) > 0 output: pydub.AudioSegment = segments[0] for segment in segments[1:]: output = output.overlay(segment) return output # Step 7: Full Pipeline for Audio Processing with Customization def mel_spectrogram_pipeline(audio_file, output_image='mel_spectrogram.png', output_audio_griffin='griffin_reconstructed_audio.wav', output_audio_melgan='melgan_reconstructed_audio.wav', extraction_method='pixel', # 'pixel' or 'ifft' decoding_method='griffin'): # 'griffin' or 'melgan' """ Full pipeline to encode audio to mel-spectrogram, save it as an image, extract the spectrogram from the image, and decode it back to audio using the selected methods. 
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_LeroyDyer__SpydazWebAI_Human_AGI).
| Metric              | Value |
|---------------------|------:|
| Avg.                |  9.88 |
| IFEval (0-Shot)     | 33.88 |
| BBH (3-Shot)        |  7.45 |
| MATH Lvl 5 (4-Shot) |  0.91 |
| GPQA (0-shot)       |  4.36 |
| MuSR (0-shot)       |  7.38 |
| MMLU-PRO (5-shot)   |  5.32 |