LeroyDyer/Mixtral_AI_CyberUltron

Tags: Text Generation · Transformers · Safetensors · English · mistral · text-generation-inference · unsloth · trl · code · medical · farmer · doctor · Mega-Series · Cyber-Series · Role-Play · Self-Rag · ThinkingBot · milestone · mega-series · SpydazWebAI

As we know, fine-tuning only updates the final layer, and extraction and de-ranking with LoRA also targets this last (penultimate) layer! Hence, when fine-tuning models, you CANNOT fine-tune on TOP of the fine-tuning;

Hence merging!

So collecting fine-tuned models and merging them retains the skills learned by both models, whereas fine-tuning on top of fine-tuning replaces the final layer... even applying LoRAs on top of LoRAs resets you!

Hence: Fine-tune! MERGE!... Rinse and repeat! Upgrading! Or you can reload the same LoRA for further fine-tuning, as some LoRAs even become very large due to the number of epochs! Essentially a single, highly tuned expert layer!!
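A minimal sketch of that fine-tune/merge loop using the peft library; the model and adapter repo names here are placeholders, not my actual repos:

```python
# Fold a trained LoRA back into its base model so the learned skill is
# locked in before the next round of fine-tuning. Repo names are placeholders.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
lora = PeftModel.from_pretrained(base, "my-user/my-lora-adapter")

merged = lora.merge_and_unload()             # bake the adapter into the weights
merged.save_pretrained("merged-checkpoint")  # new starting point: rinse and repeat
```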

So the next project is the Mixture of Adapters!... MoMerge! PhatGoose, etc.: creating an experts model from LoRAs! (Hopefully 32 models to create a frankenmerge to be directly merged into the main model and re-aligned!)
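As a sketch of the adapters-as-experts idea, peft can blend several task LoRAs into one adapter. This is a simple linear blend, not the PhatGoose routing method itself, and the adapter repo names are hypothetical:

```python
# Load several specialist LoRAs onto one base and mix them linearly.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
model = PeftModel.from_pretrained(base, "user/lora-coding", adapter_name="coding")
model.load_adapter("user/lora-medical", adapter_name="medical")
model.load_adapter("user/lora-roleplay", adapter_name="roleplay")

# Combine the experts into a single adapter (linear combination of LoRA weights).
model.add_weighted_adapter(
    adapters=["coding", "medical", "roleplay"],
    weights=[0.4, 0.3, 0.3],
    adapter_name="mixture",
    combination_type="linear",
)
model.set_adapter("mixture")
```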

MODELS!! :: Why?

New base model generation from the final Cybertron-series model and the final CyberSeries models. It would seem that some models are not registering on the leaderboard?? Perhaps there is a limit per person!

Followers should know that CyberBoss was my highest-ranked model (renamed), and my Cybertron models were heavily merged and trained on many datasets, even containing thinking paradigms.

Merging the collection back into the base model gives the model a great position to begin from!

Hence a new base-model marker (Untrained/Sharded) (totally unlocked).

I have noticed that TopK=1000, TopP=0.78, Temp=0.86 is important with merged models, allowing the model to produce slightly more random results while also giving it a larger pool to select from; obviously, for role play the model requires a Temp of 1+:
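Those settings map directly onto a transformers generation call, for example:

```python
# Sampling settings for merged models: wide candidate pool, mild randomness.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "LeroyDyer/Mixtral_AI_CyberTron_Ultra"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

inputs = tok("Hello,", return_tensors="pt")
out = model.generate(
    **inputs,
    do_sample=True,
    top_k=1000,        # large pool to select from
    top_p=0.78,
    temperature=0.86,  # raise to 1.0+ for role play
    max_new_tokens=64,
)
print(tok.decode(out[0], skip_special_tokens=True))
```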

FineTuning ::

Fine-tuning models to a loss close to 0.9 means that some information is totally fixed and may not return without focusing the model! Sometimes it is better to train the model to 1.5+, allowing loosely trained data to surface when higher temperatures are applied! Hence role-play datasets are trained at higher loss rates than coding and math datasets (which are trained close to overfitting).
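One way to train toward a target loss rather than a fixed epoch count is a transformers TrainerCallback; this is an illustration of the idea, not my actual training script:

```python
# Stop training once the logged loss reaches a chosen target:
# ~0.9 for code/math (near overfit), ~1.5 for role play (loosely trained).
from transformers import TrainerCallback

class StopAtLoss(TrainerCallback):
    def __init__(self, target_loss: float):
        self.target_loss = target_loss

    def on_log(self, args, state, control, logs=None, **kwargs):
        if logs and logs.get("loss") is not None and logs["loss"] <= self.target_loss:
            control.should_training_stop = True
        return control

# e.g. trainer = SFTTrainer(..., callbacks=[StopAtLoss(0.9)])
```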

Hence merging plays an important role in centering the model again!

Merging is not just fun and games!

It is a vital part of the training process, locking data into the model as well as sharing data! Remember, data is not stored in the model: only the probability of the information being returned!
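A small sketch showing the point: the model returns a probability distribution over next tokens, not stored records (any causal LM works here):

```python
# Inspect the next-token distribution: retrieval is probabilistic, not a lookup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "mistralai/Mistral-7B-v0.1"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

ids = tok("The capital of France is", return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits[0, -1]    # scores for the next token
probs = torch.softmax(logits, dim=-1)    # probabilities of each continuation
top = torch.topk(probs, 5)
for p, i in zip(top.values, top.indices):
    print(f"{tok.decode(i)!r}: {p:.3f}")
```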

From here to where?

Currently there is a trend for evaluation! Evaluating the model to discover its weaknesses and threats, then removing the specific layers identified as holding the offensive content, enables those layers to be retrained and replaced! Replaced with what?? Replacing layers in the model also requires a realignment of information throughout the network, despite the replacement being a copied layer (still preserving some content). Once offensive content is discovered, the network can be trained with its counter-argument; hence the evaluation process enables the creation of a custom dataset targeting this internalized data! Despite a neural network NOT being a storage system (as the retrieval process is based on probabilities), at points in the network certain embedding values are present, and once translated or decoded into standard tokens they can actually be identified!

WOW!!
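That decoding idea can be sketched in the spirit of the logit-lens technique: project each layer's hidden state through the output embedding and see which token it leans toward. This skips the final norm for brevity, so it is an approximation, not an exact readout:

```python
# Decode what each layer is "thinking" by projecting its hidden state
# through the unembedding matrix (approximate logit lens).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "mistralai/Mistral-7B-v0.1"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

ids = tok("The capital of France is", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(ids, output_hidden_states=True)

unembed = model.get_output_embeddings().weight   # [vocab, hidden]
for layer, h in enumerate(out.hidden_states):
    logits = h[0, -1] @ unembed.T                # decode this layer's state
    print(layer, tok.decode(logits.argmax()))
```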

So! This also means that at each layer the network is effectively storing a probability table, a word-to-word matrix of probabilities for next-token generation! It may even be possible to train a network for image recognition, as long as the images are tokenized into an embedding value associated with the image. Hence image tokenizers: the embedding value produced should enable the output to contain the same images that were present in the training set, i.e. they have been tokenized and embedded into the model, so it should be able to produce an embedding associated with this output! Hence it should also be possible to retrieve the image from the image tokenizer? So tokens not decoded by the text tokenizer should be handed off to the image tokenizer to decode the embedding and return its original (cascade) / digital numerical value. Each pixel is a number, and with line encoding of images essentially each line can be reconstructed to produce an image; hence all images would need to be BitMap/JPEG/PNG according to the encoder! MISSION!
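A toy stand-in for that line-encoding idea, not a learned image tokenizer: each pixel is a number, each scanline becomes a token sequence, and the image round-trips losslessly:

```python
# Toy "image tokenizer": one token per pixel value, one sequence per scanline.
import numpy as np

def encode_image(img: np.ndarray) -> list[list[int]]:
    return [row.tolist() for row in img.astype(np.uint8)]

def decode_image(tokens: list[list[int]]) -> np.ndarray:
    return np.array(tokens, dtype=np.uint8)

img = np.random.randint(0, 256, size=(4, 4), dtype=np.uint8)
assert (decode_image(encode_image(img)) == img).all()  # lossless round trip
```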

But still, we will need to install all the competition datasets into the model, so that the original baselines can be established, enabling full realignment to the same dataset collection after layer removal! Hence retaining all functionality. It's worth noting that domain-specific datasets should also be handled in the same way!
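A hedged sketch of the layer-removal step before realignment; the layer index here is hypothetical, and the pruned model must then be fine-tuned on the baseline dataset collection to realign:

```python
# Drop one flagged transformer block, then realign by fine-tuning.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
del model.model.layers[20]  # hypothetical layer identified by evaluation
model.config.num_hidden_layers = len(model.model.layers)
# ...then fine-tune on the full baseline/domain dataset collection.
```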

MORE TO COME! (Look out for the SFTs and merges.)

Models Merged

All my merges are merged using a genetic algorithm:

First, X and Y models are created. These models are merged with my own model and other nice models of the same calibre which are specialized for a task, i.e. coding, medical, role play, etc. Consider a coding model a Y and a medical model an X, and consider my base model as the target. When creating Y or X, many merge types are used, from DARE to SLERP, but in the final merge only a linear merge is used! Hence the X and Y models may even be merged with targets that are not the same model type! Each model IS sharded to 1-2GB shards, also making it easier to merge, and the final merge is resharded at 4GB per shard for easy downloading! It is important that the final merge is linear (see the sketch after the model list below)!!! If it cannot be merged linearly, then there is a deeper problem with the model. The final output is a model with unknown qualities; it can often be a very high performer, but may contain some unwanted behavior,

i.e. I AM AN AI, I CANNOT DO THAT, IT'S UNETHICAL! as some people have used TOXIC datasets containing such UNWANTEDNESS! STOP BEING A NANNY TO THE WORLD! THEN USING THE SAME TACTIC OR KNOWLEDGE ON THE PEOPLE! Stop saying FREE SPEECH then arresting people for SPEAKING OUT! <<<<<< ALL GOVERNMENT INJECTIONS!

We need to uncensor our models, as the people who release the larger models apply these constraints??? Hence going the Chinese route, as they do not have the same restrictions! (As you know, true communism is freedom, as each person should have the ability to have the same as another, and it should not be restricted to a select few, disguised as expensive, restricted, or harmful!)

The following models were included in the merge:

  • Y_Chroma <<< 6 models merged (commercial chat-based models, i.e.: Zephyr, OpenChat, Anthropic, etc.)

  • LeroyDyer/Mixtral_AI_CyberTron_Ultra <<<

  • Model being upgraded (remixed with CyberBoss/SmartBrain/CyberCoder), hence Meta & Google releasing untrained models!

  • X_Chroma <<< 6 models merged (maths-focused, from WizardMath to MetaMath)
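A minimal sketch of the final linear step referenced above, assuming the X, Y, and target models share the same architecture; the repo names are placeholders, and real runs use mergekit-style tooling:

```python
# Linear merge: average X, Y, and target weights, then reshard at 4GB.
import torch
from transformers import AutoModelForCausalLM

target = AutoModelForCausalLM.from_pretrained("user/target-base")
x = AutoModelForCausalLM.from_pretrained("user/X_Chroma")
y = AutoModelForCausalLM.from_pretrained("user/Y_Chroma")

sd_t, sd_x, sd_y = target.state_dict(), x.state_dict(), y.state_dict()
merged = {k: (sd_t[k] + sd_x[k] + sd_y[k]) / 3.0 for k in sd_t}
target.load_state_dict(merged)
target.save_pretrained("linear-merge", max_shard_size="4GB")
```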

Model size: 7.24B params · Tensor type: FP16 (Safetensors)
