For some reason this model is SLOW!
But it is highly powerful: create a prompt and let it perform the TASK! I'm not sure about its conversational powers, but it is strong at task-based work!
Models :: Why?
New base model generation from the final Cybertron series model and the final CyberSeries models. It would seem that some models are not registering on the board; perhaps there is a limit per person!
Followers should know that CyberBoss was my highest-ranked model (since renamed), and my Cybertron models were heavily merged and trained on many datasets, even containing thinking paradigms.
Merging the collection back to the base model gives the model a great position to begin from! Hence a new base model marker: (Untrained/Sharded) (totally unlocked).
I had noticed that TopK=1000, TopP=0.78, Temp=0.86 are important settings with merged models: they allow the model to produce slightly more random results while also giving it a larger pool of tokens to select from. Obviously, for role play the model requires Temp to be 1+.
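As a minimal sketch of how these settings are applied (assuming a Hugging Face `transformers` causal LM; the model path and prompt are placeholders, not this repo's files):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("path/to/model")      # placeholder path
model = AutoModelForCausalLM.from_pretrained("path/to/model")

inputs = tok("Write a plan to sort a list of files by size.", return_tensors="pt")
out = model.generate(
    **inputs,
    do_sample=True,     # sampling must be enabled for top_k/top_p/temperature
    top_k=1000,         # large candidate pool, as noted above
    top_p=0.78,         # nucleus cutoff
    temperature=0.86,   # raise to 1.0+ for role play
    max_new_tokens=200,
)
print(tok.decode(out[0], skip_special_tokens=True))
```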
FineTuning ::
Fine-tuning a model down to a loss close to 0.9 means that some information is totally fixed and may not return without focusing the model! Sometimes it is better to train the model only to 1.5+, allowing loosely trained data to surface when higher temperatures are applied! Hence role-play datasets being trained at higher loss rates than coding and math datasets (which are trained close to overfitting).
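One way to realise this in practice, as a minimal sketch (assuming the standard `transformers` Trainer callback API; the target values are the loss levels discussed above, not tuned constants):

```python
from transformers import TrainerCallback

class StopAtTargetLoss(TrainerCallback):
    """Stop fine-tuning once the logged training loss reaches the target."""

    def __init__(self, target_loss: float):
        self.target_loss = target_loss

    def on_log(self, args, state, control, logs=None, **kwargs):
        # ~0.9 locks information in hard; ~1.5 leaves data loosely held
        if logs and logs.get("loss", float("inf")) <= self.target_loss:
            control.should_training_stop = True
        return control

# usage: trainer.add_callback(StopAtTargetLoss(target_loss=1.5))
```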
Hence merging playing an important role in centering the model again!
Merging is not just fun and games!
It is a vital part of the training process, locking data into the model as well as sharing data! Remember, data is not stored in the model: only the probability of the information being returned!
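As a minimal sketch of the idea (assuming two fine-tunes of the same base architecture; the paths are placeholders), a simple linear merge averages the weights and pulls the model back toward a common centre:

```python
import torch
from transformers import AutoModelForCausalLM

model_a = AutoModelForCausalLM.from_pretrained("path/to/finetune-a")  # placeholder
model_b = AutoModelForCausalLM.from_pretrained("path/to/finetune-b")  # placeholder

merged = model_a.state_dict()
with torch.no_grad():
    for name, param_b in model_b.state_dict().items():
        merged[name] = 0.5 * merged[name] + 0.5 * param_b  # equal-weight average

model_a.load_state_dict(merged)
model_a.save_pretrained("merged-base")  # the re-centred "base" checkpoint
```

Real merging tools offer weighted, SLERP, and TIES variants, but the centering effect is the same in spirit.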
From here to where?
Currently there is a trend for evaluation! Evaluating the model to discover its weaknesses and threats, then removing the specific layers identified as holding the offensive content, enables those layers to be retrained and replaced! Replaced with what? Replacing layers in the model also requires a realignment of information throughout the network, even when the replacement is a copied layer (still preserving some content). Once offensive content is discovered, the network can be trained with its counter-argument; hence the evaluation process enables the creation of a custom dataset targeting this internalized data!

Despite a neural network NOT being a storage system (the retrieval process is based on probabilities), at points in the network certain embedding values are present, and once translated or decoded into standard tokens they can actually be identified!
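As a minimal sketch of the layer-removal step (assuming a Llama-style decoder whose blocks live in `model.model.layers`; the path and the flagged indices are placeholders standing in for a real evaluation):

```python
import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("path/to/model")  # placeholder
flagged = {20, 21}  # hypothetical layers identified by the evaluation

kept = [blk for i, blk in enumerate(model.model.layers) if i not in flagged]
model.model.layers = nn.ModuleList(kept)
model.config.num_hidden_layers = len(kept)

# The surgered model now needs realignment: retrain on the original
# dataset collection so information flows coherently again.
model.save_pretrained("pruned-model")
```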
WOW!!
So! This also means that at each layer the network is effectively storing a probability table, a word-to-word matrix of probabilities for the next-token generation!

It may even be possible to train a network for image recognition, as long as the images are tokenized into an embedding value associated with each image. Hence image tokenizers: the embedding values produced should enable the output to contain the same images that were present in the training set, i.e. they have been tokenized and embedded into the model, so it should be able to produce an embedding associated with this output! Hence it should also be possible to retrieve the image from the image tokenizer? So tokens not decoded by the text tokenizer should be handed off to the image tokenizer to decode the embedding and return its original (cascade) digital numerical value. Each pixel is a number, and with line encoding of images essentially each line can be reconstructed to produce an image; hence all images would need to be BitMap/JPEG/PNG according to the encoder! MISSION!
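The per-layer probability-table idea can be made visible with a "logit lens"-style probe, shown here as a minimal sketch (assuming a decoder-only model exposing `lm_head` and `output_hidden_states`; the path is a placeholder, and a proper logit lens would also apply the model's final norm before unembedding):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("path/to/model")        # placeholder
model = AutoModelForCausalLM.from_pretrained("path/to/model")

inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

for i, hidden in enumerate(out.hidden_states):
    probs = model.lm_head(hidden[:, -1]).softmax(-1)  # layer i's "table" row
    top = probs.topk(1)
    print(f"layer {i:2d}: {tok.decode(top.indices[0, 0].item())!r}  p={top.values.item():.3f}")
```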
But we will still need to install all the competition datasets into the model, so that the original baselines can be established, enabling full realignment to the same dataset collection after layer removal, hence retaining all functionality! It's worth noting that domain-specific datasets should also be handled in the same way!
MORE TO COME! (Look out for the SFTs and merges.)