Discussion?
This model shows really good understanding and feels a lot less shallow than gemma-1.1, and slightly less shallow than Starling-beta, when asked to summarize a song about internal struggles. It seems to understand it with more depth, even in a smaller quant.
WizardLM-2-7B-IQ4_XS
The song's lyrics express feelings of inner turmoil and a struggle to cope with the thoughts and emotions overwhelming the singer. The chorus, "Bombs away," metaphorically describes the intensity of the singer's internal struggles, with the "bombs" being their thoughts and feelings that are metaphorically exploding in their head, making it hard to feel isolated despite the chaos.
In the verses, the singer reflects on their life, feeling that everything that once brought them joy now causes pain. They feel a sense of self-loathing for hurting others and a desire to isolate themselves, either by imagining a world that is empty and devoid of life ("vacant, dead") or one where life continues regardless ("alive, either way, it's all the same").
The bridge suggests a contemplation of the future, with the realization that the next day begins, regardless of whether the listener is present or not. It underscores the theme of facing the next day with whatever comes, whether the listener is there to experience it or not.
Overall, the song conveys a sense of despair and a desire to escape from the pain and problems of life, with an acknowledgment that the cycle of running away from these issues is ongoing. It touches on themes of mental health, self-perception, and the emotional weight of facing a new day.
Starling-LM-7B-beta-Q5_K_M
The song's lyrics express feelings of mental strain, isolation, and the struggle to cope with daily life. The chorus, "Bombs away," symbolizes the overwhelming thoughts and emotions filling the singer's mind, making them feel like their head is exploding. Despite the voices in their head, they find it difficult to feel alone. The repetitive chorus suggests a desire to escape these thoughts and feelings.
The verses delve into the singer's experiences of pain and joy, and how these emotions are intertwined. They've become what they once despised, and find it hard to face others due to feeling like they're faking their happiness. The singer imagines a world void of people or indifferent to their existence, reflecting their internal turmoil.
The bridge, "Tomorrow begins," hints at a sense of hope for the future, even if the singer is uncertain whether others will be there to support them. The chorus returns, emphasizing the desire to run away from their problems and seek relief from the mental chaos. Overall, the song addresses themes of mental health, self-discovery, and the struggle to connect with others.
gemma-1.1-7b-it-Q4_K_M
Song Summary:
The song expresses intense mental distress and a desire to escape from reality. The artist is overwhelmed by anxieties, voices in their head, and a sense of loneliness. They feel trapped in a cycle of pain and self-destructive behavior.
The lyrics suggest a feeling of emptiness and hopelessness, with a desire to isolate themselves from others. The song also hints at a belief that life is meaningless and suggests an impending collapse.
Personal preference is key though!
Lyrics for reference: Genius
I haven't had the chance to test it for long, but I tried it for coding and it seemed to handle basic concepts decently. Your tests look good. Also, Starling has an issue with extra tokens which ruined some of my merges, so I might actually try merging with the Wizard instead, but it's highly censored and aligned (even considering they deleted it because it hadn't been run through the toxicity eval...)
It seems that MS also quietly released more new models:
https://huggingface.co/microsoft/wavecoder-ultra-6.7b
The wavecoder-ultra seems to be scoring quite well; it even surpasses GPT-4 in one of the tests. Trained on carefully selected synthetic data, it seems; more info on their page and in the paper.
I ran into the censorship pretty quickly. I guess we can hope someone figures out how to de-align it while keeping the smarts?
I'd love to have a model like WizardLM2 that was uncensored from MS themselves; it would be so much more versatile. All those smarts wasted on "as an AI"
I'm not exactly one for coding so I don't run into those issues; as long as it can correct my JSON formatting I'm happy. When I fed WizardLM2 my JSON it automatically identified it as a Windows Terminal profile, but it deleted all of the colour profiles from the file, so it's not good with JSON I guess. Nous Hermes 2 8x7B on HuggingChat did it easily though.
I can upload the JSONs for reference, but I don't want to make this message giant
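For what it's worth, you can catch that kind of silent deletion mechanically instead of eyeballing the JSON. A quick sketch (function names are mine, not from any library): recursively diff the keys and flag anything the model dropped, like the colour schemes.

```python
import json

def dropped_keys(original: str, fixed: str) -> list:
    """List keys present in the original JSON but missing from the
    model-'fixed' version (e.g. deleted colour schemes)."""
    return sorted(_diff(json.loads(original), json.loads(fixed), ""))

def _diff(a, b, prefix):
    missing = []
    if isinstance(a, dict):
        for k, v in a.items():
            path = f"{prefix}.{k}" if prefix else k
            if not isinstance(b, dict) or k not in b:
                # key was dropped entirely in the fixed version
                missing.append(path)
            else:
                missing += _diff(v, b[k], path)
    return missing
```

Feeding it the original profile and the model's output would immediately show e.g. `["schemes"]` if the colour schemes were stripped.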
Edit - WizardLM2 seems to have a deeper understanding of the lyrics than Solar instruct, which I did not expect :D
- Solar-10.7B-Instruct-V1.0-Q3_K_M
The song's lyrics depict a struggle with inner turmoil and overwhelming feelings. The narrator describes having a mind on the verge of exploding, filled with voices making it difficult to feel lonely. They want to escape from these thoughts and continue running away from their problems. They describe life as a constant struggle between joy and pain, and feeling like a reflection of the things they once hated. The desire to hide from society and imagine a world without people or emotions seems strong. Regardless of the outcome, the narrator accepts that tomorrow comes regardless of their current state.
Interesting results, you may want to try swiping on Solar when it gives short responses, or instruct it to give a more in-depth analysis. I've gotten it to be quite prolific in the RP merges. But sometimes it's sorta short and bland.
This is otherwise quite a good test tbh, asking about the meaning of lyrics. I hadn't thought to do that, but it's a good way to test abstract thinking as well if the meaning is more veiled; however, it may be skewed if you pick older/popular songs that could have been in the training data.
I use just one response as an almost worst case scenario for models, especially instruct models.
In the sense that people using GPT-4 would get annoyed having to regen, because they just want a good answer every time, not an amazing answer every third time?
I use Solar models for RP and chat scenarios, as that is where the model shines, but it doesn't seem to be the one-size-fits-all, first-attempt model for instruct use?
Although it was funny asking WizardLM2 what ASUS motherboard error codes mean and watching it make up new, completely different answers every regen
And it's just full of "as an ai" as soon as you touch anything opinionated
But I won't be surprised if there is a wizardlm2-dolphin model coming, which would be amazing for instruct use
Edit - The song has ~11 million plays on Spotify, so I imagine it hasn't been in any datasets? But models are trained on hundreds of terabytes, so we'll never really know 🗿
If web data from scraped websites like Genius was included, it could have an advantage, but without closely inspecting the datasets it's hard to know for sure.
In any case, I don't really regen with my settings. This is the first response I get from an 11b:
The lyrics you've presented are reflective of a person grappling with internal turmoil and existential distress. The repeated refrain of "bombs away" suggests a metaphorical explosion of thoughts and emotions, indicating a mind that is overwhelmed and possibly seeking an outlet for release.
The lines "Voices in my head, it's kinda hard to feel lonely" imply that the singer feels isolated in their experiences, with an inner dialogue that may either be a manifestation of their thoughts or a sign of mental health struggles such as anxiety or depression. The sentiment of feeling lonely despite an overabundance of voices suggests a contradiction between the need for human connection and the desire for solitude or escape.
The verse highlights a sense of existential malaise, where the act of living ("livin' life in a fade") feels both necessary and unsatisfactory, as it "brings me joy brings pain." This paradox reflects a common human experience where life's pleasures are intertwined with their opposite, often leading to discontent. The singer laments their transformation into what they once despised, indicating a loss of identity or a struggle with self-acceptance.
The admission "All I do is hurt people, I don't wanna" speaks to the singer's awareness of their negative impact on others and their reluctance to engage with the world due to the fear of causing further harm. The desire to avoid confronting othersβ"Go outside and see everybody's faces, it bothers me just to watch everyone fakin' it"βreveals a sense of disconnection and cynicism towards perceived inauthenticity in social interactions.
The imagining of a world "vacant, dead / Or alive, either way, it's all the same" reflects a nihilistic outlook, where the singer sees no meaningful difference between existence and non-existence. This suggests a profound search for purpose and a coping mechanism to numb the pain of a life that feels meaningless or unbearable.
The bridge "End / Tomorrow begins / With or without you there" offers a sense of hope or continuity, implying that regardless of the current struggles, the singer will move forward. The repetition of the chorus at the end reaffirms the desire to escape from the overwhelming thoughts and feelings, to "look away from all my problems, keep running / And running, and running," which could be interpreted as a metaphor for coping mechanisms or a quest for a fresh start.
Overall, the lyrics depict a deeply personal and emotional journey through despair and the search for a way to endure. They resonate with listeners who have faced or are facing similar struggles, offering a poignant reflection on the complexities of mental health and the human condition.
Prompt:
Provide a thoughtful analysis of these lyrics:
[...]
And the character card is "bland assistant who answers any questions"
If you want to do these tests consistently you should be swiping a lot with each prompt and model to get a sense of its "range".
And here is a first response from Wizard-LM-2
The lyrics you've provided express a sense of mental turmoil and a feeling of disconnection from the world around the singer. The chorus, "Hey, bombs away," could metaphorically represent an internal conflict or emotional explosion that the singer is experiencing. The repeated phrase, "So much on my mind, think my head's exploding," indicates a mental overload and a struggle to cope with the overwhelming thoughts.
The line, "Voices in my head, it's kinda hard to feel lonely," suggests that the singer might be experiencing a sense of isolation, possibly due to mental health issues like anxiety or depression, where internal voices or thoughts make solitude more pronounced. The desire to "let it bang inside my brain" implies a desire to confront or release these internal pressures, but the singer is finding it challenging to do so.
The verses delve deeper into the singer's emotional state, where they feel that everything they once enjoyed now causes pain, and their actions have hurt others. The singer's aversion to going outside and being around people reflects a sense of alienation and a desire to escape from reality, which they describe as "everyone fakin' it." This line suggests a feeling of disillusionment with society and a belief that people are not being authentic.
The bridge, "End, tomorrow begins, with or without you there," suggests a sense of resignation and the realization that the singer has some control over their own narrative, regardless of external circumstances or the presence of others. This could imply a sense of hope or the acknowledgment that life goes on, with or without the singer's participation in the world as it currently is.
The chorus' repetition throughout the song emphasizes the cyclical nature of the singer's mental state, with a pattern of internal conflict and avoidance. The repeated lines also serve to embed the themes of the song into the listener's mind, potentially creating a shared experience of the emotional journey the singer is describing.
Overall, the lyrics present a raw and honest depiction of mental health struggles, highlighting themes of self-reflection, the internal battle with one's own thoughts and emotions, and the desire for escape or release from psychological pain. The song may resonate with listeners who have experienced similar feelings of internal conflict and isolation.
I would definitely go with the 11B on this test as it sounds a bit less AI-y to me, but that's why it's worth testing the same scenario many times in a row to get a real sense of the model.
OH, and BTW, the Wizard-2 model hallucinates like wild; in a super bland Alpaca preset in LM Studio it makes up stuff quite a lot:
The lyrics you've provided are from the song "Bombs Away" by British alternative rock band Blurgh, which was released in 20003 as part of their album "Silent Alarm."
And that's on Q8 :D
Just to comment more on hallucinations:
10 regenerations on 11B (Silver Sun) = 0 times starting the response with a hallucination about the author.
10 regenerations on Wizard-2 = 8/10 times hallucinated who the artist and song are.
(bonus) 10 regenerations on wavecoder-6.7B-Ultra (the other new model from Microsoft) = 0 times inventing who the artist/song are
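If anyone wants to script those regen counts instead of eyeballing them, here's a rough sketch. The regex is just a heuristic I made up for responses that open with a song/artist attribution, so it will miss rephrasings; treat it as a starting point.

```python
import re

# Heuristic: flag responses whose opening names a (possibly invented)
# song and artist, e.g. 'These lyrics are from the song "X" by Y'.
ATTRIBUTION = re.compile(
    r'(?:from|of)\s+the\s+song\s+["\u201c].+?["\u201d]\s+by\s+',
    re.IGNORECASE,
)

def starts_with_attribution(response: str) -> bool:
    """True if the first ~200 chars attribute the lyrics to an artist."""
    return bool(ATTRIBUTION.search(response[:200]))

def hallucination_rate(responses) -> float:
    """Fraction of regenerations that open with an attribution."""
    return sum(starts_with_attribution(r) for r in responses) / len(responses)
```

Run each model's 10 regens through `hallucination_rate` and you get the 8/10-style numbers above without manual counting.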
So for sure I'd be rocking the Solars as a daily driver vs this Wizard
EDIT: Actually, the hallucinations occur when using other instruction templates (like Alpaca, Vicuna, ChatML) but don't happen with the Wizard template, so I'll scratch this off. However, it's still a bit of a downside, because there are many models which won't hallucinate or break down across multiple instruction templates.
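For anyone wiring these templates up themselves, a small sketch of what the difference actually looks like. The Alpaca block is the widely used standard one; the Vicuna-style string is what WizardLM-2's model card reportedly uses, so double-check the card before relying on this.

```python
# Two common instruction templates; using the wrong one for a model
# often degrades output (e.g. the hallucinations discussed above).
ALPACA = "### Instruction:\n{prompt}\n\n### Response:\n"

# Vicuna-style system preamble, reportedly used by WizardLM-2:
VICUNA = (
    "A chat between a curious user and an artificial intelligence "
    "assistant. The assistant gives helpful, detailed, and polite "
    "answers to the user's questions. USER: {prompt} ASSISTANT:"
)

def build_prompt(template: str, prompt: str) -> str:
    """Fill a raw instruction template with the user's prompt."""
    return template.format(prompt=prompt)
```

Same question, two very different strings on the wire, which is why swapping presets in LM Studio changes behavior so much.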
Overall, I would still probably not have it as a daily driver due to how "AI-aligned" it is, and how much it writes like AI. It's highly censored as well, which almost defeats the purpose of local LLM.
I agree with you that making a dolphin variant would likely help a lot. So much training data and efforts wasted on this alignment bs.
I think that WizardLM could be a really useful base for training on a single small dataset for specific commercial purposes or training on support bots for low volume applications.
I don't think it's a contender for local LLM use at all. It's like selling a car without wheels; why do it? Nobody wants a car without wheels, and nobody wants an LLM that berates you for asking it about itself. Both make it harder for the end user to actually use the thing how they want.
They train it on human data then take away the human part.
Censorship sucks 🗿
Solar still wins for local chat use, I don't think anything will challenge it unless llama3 comes unaligned.
I fully agree, and I noticed some merges with the Wizard have already appeared, but I haven't tried them out yet. One involved Dolphin. However, I'd like real fine-tuning, not just merging.
On a side note, I got pulled into a fun project at work. I do QA, but my dev team knows I am concerningly obsessed with LLMs and decided to bring me in. My task was to sort out an open-source model for document analysis, and it's a 2-day project. All day I've been trying to get decent responses... I tried Starling beta, WizardLM-2, and a couple of Mistral variants. They all hallucinate so badly, and fail logically :/ I'm using Ollama models, which come with their correct instruction templates in the modelfiles.
Ultimately, I am heavily leaning towards the ol' reliable - Mistral 7B v0.2 instruct. But in truth, all of the 7B models are UNreliable. This is why I focused entirely on doing RP merges; I can't afford to work with 30B and beyond, so that's the best you can do at the lower sizes: hope for unhinged and creative RP :D You just cannot rely on these models much beyond that, and you always need to double-check their reasoning and responses.
SOLAR is only that good because it's Mistral, but with the added 3B tokens of training. This training really pulled it through; however, it's still not reliable in the full sense. (To me it seems to clearly outweigh 7B Mistral, yet it still makes dumb logical mistakes at times)
I'm very open to Wizard 2 fine-tunes, of course. And I really hope llama3 comes out this week, has a 7B, and is somehow really good.
EDIT: kinda funny, but dolphin-phi-2 has actually been doing the best in my use case for this fun document-analysis project... :D Mostly its logic is sound. And it's way faster on CPU and RAM. Smart little cookie
I wonder if a Yi-based model would help, like Faro, which is based on Yi-9B and focuses on long-context applications
Or InternLM2-7B, they claim nearly perfect context modelling.
"Nearly perfect at finding needles in the haystack with 200K-long context, with leading performance on long-context tasks like LongBench and L-Eval."
And
"InternLM2-Chat also provides data analysis capability."
Just some models I've ran across before, I haven't personally used them and haven't ever used a model for document analysis
I'm sure Mistral 8x22B Instruct would be amazing for the application with its 64k context window, but the 263 GB model size (plus context) is insane
58 x 4.8 GB shards :D
Mistral should retrain or make a new model at 1.58 bits; the actual compute is easy, but the RAM requirement is unfathomable, and 1.58-bit would solve those issues.
And probably a dumb concept, but the papers claim near 1:1 model performance at 1.58 bits. I wonder if it would be possible to then quantize 1.58-bit into the equivalent of, say, Q4 and have a coherent 0.4-bit model? Because that would allow for an 8x7B in 1.45 GB of RAM...
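The RAM math is easy to sanity-check. A back-of-envelope sketch, weights only (no KV cache or activations), assuming a Mixtral-style 8x7B has ~46.7B parameters because of its shared layers; note that at that count even 0.4 bits/weight lands around 2.3 GB, a bit above the 1.45 GB guess.

```python
def model_ram_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight memory only: params * bits / 8 bits-per-byte."""
    return params * bits_per_weight / 8 / 1e9

# Mixtral-style 8x7B is ~46.7e9 params (shared attention layers mean
# it's less than a naive 8 * 7e9).
P = 46.7e9
fp16 = model_ram_gb(P, 16)    # ~93.4 GB
q4   = model_ram_gb(P, 4.5)   # ~26.3 GB, roughly Q4_K_M territory
ter  = model_ram_gb(P, 1.58)  # ~9.2 GB at ternary weights
sub  = model_ram_gb(P, 0.4)   # ~2.3 GB for the hypothetical 0.4-bit
```

Real quant files add scales and metadata on top of this, so treat these as lower bounds.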
The thing is that this project is basically something we run in a Docker container locally to demo, so we are limited by our ridiculous laptops or AVD instances. Even a 3B takes minutes to respond; I don't even want to imagine a 9B with 200k context :D
Minutes to get something out of 4k-context phi-2...
Otherwise, I would have gone with something much bigger, or straight up hooked up the GPT-4 API.
As for 1bit era... still waiting on that :|
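Given those tiny context budgets, the usual workaround is chunking the document before it ever hits the model. A minimal sketch, assuming a rough 4-characters-per-token heuristic (real tokenizers vary, so the numbers are approximate) and reserving some of the window for the prompt and answer.

```python
def chunk_text(text: str, ctx_tokens: int = 4096, chars_per_token: int = 4,
               reserve: int = 512) -> list:
    """Split a document on paragraph boundaries into pieces that fit a
    small context window, reserving room for prompt + model answer."""
    max_chars = (ctx_tokens - reserve) * chars_per_token
    chunks, current, size = [], [], 0
    for para in text.split("\n\n"):
        # flush the current chunk before it would overflow the budget
        if size + len(para) > max_chars and current:
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(para)
        size += len(para) + 2  # +2 for the paragraph separator
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Each chunk then gets analyzed separately (or map-reduce style: summarize chunks, then summarize the summaries), which is how a 4k-context phi-2 can still cover a long document.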
I think at that point some phones would be faster :D
Or Gemini API wait no distraction
Llama3! It's alive!
Only 8k context, but it can be RoPE-extended better than roping up from 4k!
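One common way to stretch that 8k window is NTK-aware RoPE scaling, which raises the rotary base instead of linearly interpolating positions. A rough sketch: the formula is the community's NTK-aware one, not anything from Meta, while the 500000 base and head dim 128 are Llama-3's published config values.

```python
def ntk_rope_base(base: float, orig_ctx: int, target_ctx: int,
                  head_dim: int) -> float:
    """NTK-aware RoPE scaling: raise the rotary base by
    scale^(d / (d - 2)) so the model extrapolates to a longer
    context without retraining."""
    scale = target_ctx / orig_ctx
    return base * scale ** (head_dim / (head_dim - 2))

# e.g. stretching Llama-3's 8k window to 32k
new_base = ntk_rope_base(500_000, 8_192, 32_768, 128)  # ~2.04e6
```

Inference backends expose this as a rope-frequency/base override; quality still degrades past some multiple of the trained length, so it's a stopgap, not free context.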
- meta-llama/Meta-Llama-3-8B
That agreement box is huge :D