[MODELS] Discussion

#372
by victor HF staff - opened
Hugging Chat org
•
edited Sep 23

Here we can discuss about HuggingChat available models.

image.png

victor pinned discussion

what are limits of using these? how many api calls can i send them per month?

How can I know which model am using

at the bottom of your screen:
image.png

Out of all these models, Gemma, which was recently released, has the newest information about .NET. However, I don't know which one has the most accurate answers regarding coding

Gemma seems really biased. With web search on, it says that it doesn't have access to recent information asking it almost anything about recent events. But when I ask it about recent events with Google, I get responses with the recent events.

apparently gemma cannot code?

Gemma is just like Google's Gemini series models: it has a very strong moral limit put on it, so any operation that may be related to file operations, or access that might be deep, gets censored and refused.
So even if there are solutions for such things in its training data, they will just be filtered and ignored.
But I still haven't tested its coding accuracy on tasks that aren't related to these kinds of "dangerous" operations.

is it possible to know what parameters this models are running ?

Hugging Chat org

is it possible to know what parameters this models are running ?

It's all here! https://github.com/huggingface/chat-ui/blob/main/.env.template

thanks this is super useful OWO

What happened to Falcon? It was my favorite. :(

Hugging Chat org
•
edited Feb 27

@SAMMdev Falcon was too costly to run at scale (for now), we might put back a more optimized version in the future

I would like to use "mistralai/Mixtral-8x7B-Instruct-v0.1";
Please could tell me what is the precision of the model behind the chat? Thanks

@SAMMdev Falcon was too costly to run at scale (for now), we might put back a more optimized version in the future

What if we use Falcon 70B?

smaug 72B would be a great addition

I’m unable to get output from CodeLlama

I'm also voting for Smaug 72B. We already have the two Llama 70B models on here, so to me it seems reasonable to integrate this one as well.

This is probably not going to happen, but xai-org/grok-1 would be insane to have here

IYH Why is the title of most chats (on the left panels roster) "🤖 Hello! I am a language model AI assistant."?

This implies that the system prompt of my assistants is not the fundamental prompt, but there is an inbuilt base prompt that is run before my system prompt .. is this correct roughly and if so how do I change this base prompt for Mistral LLM ?

hgchattitles.PNG

Could you consider DBRX Instruct and Command-R? The official space for DBRX Instruct is too limited (it only allows for a 5-turn conversation) and there is no space for Command-R.

IYH thank you for your advice. Apologies I have no idea what the concepts mean or what to do "Could you consider DBRX Instruct and Command-R? The official space for DBRX Instruct is too limited (it only allows for a 5-turn conversation) and there is no space for Command-R." (fwiw I prompted mistral about it and it did not know either.)
Would you kindly elaborate (or point me towards a resource that explains this

Huggingface will notify you when someone posts in a discussion you've commented on, even if they didn't directly reply to you. I was suggesting two new models, unrelated to your question.

Which model is better to use?
How to know difference of them?

IYH Why is the title of most chats (on the left panels roster) "🤖 Hello! I am a language model AI assistant."?

This implies that the system prompt of my assistants is not the fundamental prompt, but there is an inbuilt base prompt that is run before my system prompt .. is this correct roughly and if so how do I change this base prompt for Mistral LLM ?

hgchattitles.PNG

@DYB5784 HF Chat has a Mistral 7B model setup with system prompt for the task of summarizing the first chat prompt/msg into a title for the chat history log, so unless one explicitly addresses that in the first msg it is what it is ig. and we can always rename it. Still, i think it would have been awesome if we could customize the naming style/prompt it ourselves.

Is openchat/openchat-3.5-0106 coming back? Was it removed to be upgraded?

It looks like they also removed the Meta models :(

Hope they add command r instead of bringing those back tbh.

What is command r? I'm a newb.

Command-r+ is a new LLM from Cohere that overtook GPT-4 on the openllm leaderboard.

Hugging Chat org

hey!

On HuggingChat we aim to always propose a small selection of models which will evolve over time as the field of ML progresses forward 🔥

Stay tuned!

On Hugging Chat we aim to always propose a small selection of models which will evolve over time as the field of ML progresses forward 🔥
Stay tuned!

Yup, small models are better and lighter (cost friendly). Plus, HuggingChat now has internet access, so small models (like Mixtral, Nous Hermes, etc.) can even perform much better in many areas than many 70B models.
We are happy to see what's coming next 🔥🔥.

Hope they add command r instead of bringing those back tbh.

The Meta ones felt misaligned and gave a lot of refusals. The 70b code one would lecture and moralize even with nothing bad in the prompt.

I hope LLaMA 3 isn't as misaligned mess.

The Meta ones felt misaligned and gave a lot of refusals. The 70b code one would lecture and moralize even with nothing bad in the prompt.

This is because they are not fine-tuned. Many times a fully fine-tuned Llama 7B is better than a non-fine-tuned Llama 70B.

Hugging Chat org
•
edited Apr 10

Cohere Command R+ is now on HuggingChat!

image.png

@Victor Thank you for the new model! but if possible, i think a slight warning/notification should be very helpful to us about which model will be taken down!
Goodbye, OpenChat! it was really good for 7B!

Agree with 1st point

@Victor Thank you for the new model! but if possible, i think a slight warning/notification should be very helpful to us about which model will be taken down!
Goodbye, OpenChat! it was really good for 7B!

Agree

Disagree...

Cohere Command R+ is now on HuggingChat!

image.png

... Hey Victor, if you're gonna surprise us with new models like this, then you can remove anything you want without notify anyone, not even Clement, xd.

But jokes aside, this is just great, if your adding/removal policy keeps like this, in 3 months we will have hugging-face assistants for everything, long context, coding, reasoning/creativity, etcetera.

Thanks a lot!!!

P.D.: I was expecting just Command R, but having plus with all the HF interface means that I will be able to make a lot of assistants that on the past only worked decently as GPTs with GPT4.

@Victor Thank you for the new model! but if possible, i think a slight warning/notification should be very helpful to us about which model will be taken down!
Goodbye, OpenChat! it was really good for 7B!

Agree

Disagree...

why bro

you can remove anything you want without notify anyone

@Ironmole you're literally ok with leaving all the active chats abandoned, aren't you? what can we say here? but lots of users will be kinda saddened if active/hanging chats are suddenly no longer continuable. (i know they usually take down the models with the least traffic, so, that's how it is ig)
and it seems all the other assistants have been migrated to mistralai/Mixtral-8x7B-Instruct-v0.1.

Hugging Chat org

Yes we migrated all assistants with deprecated models to the default model, which at the time was Mixtral 8x7B!

command r + is really good

I’m worried that it’s not gonna be free forever. Don’t get me wrong, I have FULL faith in the HuggingChat team, it’s just that in my eyes it’s a perfect replacement for ChatGPT. So I just need some reassurance it’ll stay free.

I think that It'll stay free.
But if they have budget issue then.
They can integrate ads to make it free forever.
and also introduce premium features (Like some premium model only use by premium or Badge to pro, etc.)

Please leave that Command R plus unquantized on huggingchat, I'd even pay 30$ a month for it. In my opinion its perfect for translating. I would use it locally but I don't have a server that could run the full model and using quants will make the model worse.

I would like to pay 9$ per month for longer context + relaxed rate limit + unquantized usage of huggingface Chat

Hope you guys keep HuggingChat free forever 🙏

As hugging face gives access to Host unlimited models, datasets, and Spaces for free.
Hope so Hugging Chat will remain free.

A famous Hindi Quote - "Umeed Pe duniya kayam Hai"

Translation - "The world is alive in hope."

Well, see what happens in future.

Upon closer inspection it seems like Nous-Hermes-2-Mixtral-8x7B-DPO is still a bit better than command r plus at translating from Chinese to english. It understands the meaning a bit more and especially writes it far better to read. I wonder how good the new 8x22 instruct model of mistral is gonna be. Anyway all the models are really good and have amazing uses! I hope we can access those that get released in the future too. Thank you very much for hosting them.

Hugging Chat org

Umeed Pe duniya kayam Hai

💯

@nsarrazin will assistant creators get a choice in which model to migrate to? i think this should be an option as recreation in another model is like starting anew.

a past comment:

  1. What will happen to the Assistant if a model is taken down? Migrate to new llm with context token +prompt as we/bot authors can change sys prompt of the assistants anytime? unlikely ig. or we could have a migration system for our old chats.
  2. shall there be a "View Sys Prompt" just like in regular chats beside/below the bot button? As the assistant button at the top shows the latest prompt only while the chat might have started with another prompt. (doesn't change the already active chat really)(once it recognized the changed sys prompt upon me mentioning only a part of it)

zephyr mixtral 8x22b from hugging face coming soon?
zephyr-orpo-141b-A35b-v0.1

What happened to the openchat model why was it removed

@SvCy 1. You can change to any llm even after making bot, or his llm was removed.

What happened to the openchat model why was it removed

Because very few people are using it.

Hugging Chat org

We just released HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1 on HuggingChat!

image.png

Try it out here: https://huggingface.co/chat/models/HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1

Hugging Chat org

Shout out to @nicksuh who called it early 😅

What happened to the openchat model why was it removed

@Gerrytheskull models come and go.. nothing is permanent sadly.. besides OpenChat wasn't being used by that many of users i think. plus, new models were added.. command r+ and now zephyr

@SvCy 1. You can change to any llm even after making bot, or his llm was removed.

@KingNish oh users can change llms after creation now? sounds great.. thanks for the info!

@nsarrazin could you add model usage-over time graph on the model list page?
It would be more engaging and fun + new users can see what's trending.

@nsarrazin could you add model usage-over time graph on the model list page?
It would be more engaging and fun + new users can see what's trending.

  • feature like Assistant of the week (Like space have space of the week)

Is it possible to add the WizardLM-2-8x22B model to the available models?

image.png

Wizard is super competitor of current GPT4.

image.png

Wizard seems like a killer model! We would love to see it on HuggingChat.

There is only one big problem with this is that it has 141B parameters which makes it slow.

The CohereForAI/c4ai-command-r-plus 110B params model works normally, so this should also work in normal mode. Additionally, there is the HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1 model with 141B params that also works quickly and is available in HuggingChat.

@CmetankaPY Ohh, i forget about them.

@CmetankaPY I found a discussion stating that Zephyr has only 35B active parameters

https://huggingface.co/HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1/discussions/9

Did anyone notice that Zephyr 141B-A35B isn't even nearly as good as Command R+, despite having more parameters? I also noticed that some smaller models perform way better than Zephyr 141B-A35B.

Because Zephyr has only 35B active parameters, not 141B.
Read this for more info - https://huggingface.co/HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1/discussions/9
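
For anyone wondering how a model can have 141B parameters but only about 35B active ones, here is a rough sketch. It assumes a mixture-of-experts layout in which each token passes through some shared weights plus only a few of the expert blocks. The function name and the numbers are toy values chosen for illustration, not the real Zephyr or Mixtral figures.

```python
def moe_param_counts(shared_b, per_expert_b, n_experts, k_active):
    """Return (total, active) parameter counts, in billions, for a toy MoE model.

    shared_b     -- parameters every token uses (attention, embeddings, ...)
    per_expert_b -- parameters in one expert block
    n_experts    -- number of experts stored in the model
    k_active     -- number of experts each token is routed to
    """
    total = shared_b + n_experts * per_expert_b   # what must be held in memory
    active = shared_b + k_active * per_expert_b   # what is computed per token
    return total, active


# Toy numbers only: 8 experts stored, 2 used per token.
total, active = moe_param_counts(shared_b=1.0, per_expert_b=2.0, n_experts=8, k_active=2)
print(f"total={total}B active={active}B")
```

Memory use follows the total count, while per-token compute follows the active count, which is why such a model can respond faster than its headline size suggests.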

Please add AI generated images.

You can use image generation in chat using pollination

Some Example:-
https://hf.co/chat/assistant/6612cb237c1e770b75c5ebad
https://hf.co/chat/assistant/65bff23f5560c1a5c0c9dcbd

Hugging Chat org

🚨 Meta Llama 3 70B is now available on Hugging Chat!

GLdkE2cXoAA5Y_X.jpeg

Let us know what you think about this one!

Llama-3 seems great, but I expected it to beat GPT-4 😅. So far can't see any open-source model that comes close to Command R+ performance

Wizard beat Command R+ and is even a very good competitor to ChatGPT

image.png

I believe Wizard will be the new open-source king, but I can't find it anywhere, I think Microsoft deleted it for some reason.

Hugging Chat org

what did Satya see

image.png

Hope to SEE Wizard on hugging face.

Hey victor could you adjust the repetition penalty for llama? Because I’m trying to do some creative writing but it literally gives me the same output every time I retry

just do it yourself from advanced setting at the bottom of models name

IMG_0214.jpeg

This is all I see

Click create new assistant, then you will be able to see it

image.png

deleted
•
edited Apr 19

The quality of Dolphin-Mistral/Mixtral of Cognitivecomputations is much better than that of Nous-Hermes, which may be a more suitable choice. I also used them in my own local ollama - until Command-R+ subverted the game.

P.S. Llama3 is so bad for my use. It is not even as good as the quantized versions of the above two models.

deleted
•
edited Apr 19

I just checked the model configuration of Command-R-Plus and noticed that the context window is limited. Is it because of cost considerations? If so, I hope a Q4 version can be added for 128K context window support, and it should be much faster.

I just checked the model configuration of Command-R-Plus and noticed that the context window is limited. Is it because of cost considerations? If so, I hope the Q4 version can be added for 128K context window support, and it should be much faster.

But what about quality, quantization decreases quality very much.

deleted
•
edited Apr 19

I just checked the model configuration of Command-R-Plus and noticed that the context window is limited. Is it because of cost considerations? If so, I hope the Q4 version can be added for 128K context window support, and it should be much faster.

But what about quality, quantization decreases quality very much.

Then Q8? with extremely low Temp, Top_P and Top_K. In any case, the quality of command-R+ surpasses most models.

In addition, the impact of quantization on quality is not so devastating. According to the latest research, models can even be quantized to 1 bit while staying close to unquantized quality.
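
To put the Q4/Q8 discussion in numbers, here is a back-of-the-envelope sketch of how much memory the weights alone need at different precisions. The function name is made up for illustration, and it assumes exactly 16, 8 or 4 bits per weight. Real quantized formats such as GGUF use slightly more bits per weight, and the KV cache for a long context comes on top of this.

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes).

    Billions of parameters times bytes per parameter gives gigabytes directly.
    """
    return params_billion * bits_per_weight / 8


# Example with a generic 70B-parameter model at three precisions.
for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_memory_gb(70, bits)} GB")
```

Halving the bits per weight halves the weight memory, which is where the cost and speed argument for quantized deployments comes from. The sketch says nothing about the quality trade-off.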

Detailed review of Llama 3 70B:

Coding: 8/10

Capability: Llama 3 is capable of generating code snippets in various programming languages, including Python, Java, C++, and JavaScript. It can also help with code completion, debugging, and optimization.

Limitation: While it can generate code, it may not always be correct or efficient. It may also struggle with complex algorithms or nuanced programming concepts.

Example: I asked Llama3 to write 10 complex questions. It generated a correct solution for 9, but some of them were not the best one.

Creative Writing: 9/10

Capability: Llama 3 is capable of generating creative writing, including stories, poetry, and dialogues. It can understand context, tone, and style, and produce writing that is engaging and coherent.

Limitation: While it can generate creative writing, it may lack the nuance and depth of human-written work. It may also struggle with complex themes or abstract concepts.

Example: I gave it 10 creative story generation tasks. It generated engaging and well-structured stories, but they lacked the emotional depth and complexity of human-written work.

Multiple Language: 8.5/10

Capability: Llama 3 is capable of understanding and generating text in multiple languages, including English, Hindi, Chinese, Japanese, Spanish, French, German, Italian, and many others. It can also translate text from one language to another.

Limitation: While it can understand and generate text in multiple languages, it may not always be perfect in terms of grammar, syntax, or idiomatic expressions.

Example: I gave Llama 3 10 paragraphs in different languages to translate. It generated accurate translations, but they lacked the emotion, nuance and cultural context of a human translator.

General Knowledge: 9/10

Capability: Llama 3 has a vast knowledge base and can answer questions on a wide range of topics, including history, science, technology, literature, and more.

Limitation: While it has a vast knowledge base, it may not always be up-to-date or accurate. It may also struggle with abstract or nuanced concepts.

Example: I asked Llama 3 10 different complex GK questions. It generated accurate and informative responses, but they lacked depth and nuance.

Maths: 6.5/10

Capability: Llama 3 is capable of solving mathematical problems, including algebra, geometry, calculus, and more. It can also help with mathematical concepts and theories.

Limitation: While it can solve mathematical problems, it may not always be able to explain the underlying concepts or find an efficient approach, and it often gives wrong solutions.

Example: I asked Llama 3 to solve 10 complex high school problems. It generated a correct solution for only 6; in 1 it followed the right method halfway, and the remaining 3 were purely incorrect.

Internet Search: 8/10

Capability: Llama3 can search the internet and provide relevant information on a wide range of topics. It can also help with finding specific information or answering complex questions.

Limitation: While it can search the internet, it may not always be able to evaluate the credibility or accuracy of the sources it finds.

Comparison with other models:

Llama 2
Llama 3 is a significant improvement over LLaMA 2 in terms of its capabilities and performance. It has a more advanced language model, better understanding of context and nuance, and improved generation capabilities. It is also more knowledgeable and accurate in its responses.
(More to be added)
Overall, Meta-Llama-3-70B-Instruct is a powerful and versatile language model that can perform a wide range of tasks and answer complex questions. While it has its limitations, it is a significant improvement over previous language models and has the potential to revolutionize the field of natural language processing.
.....................................................................................................
If you liked the review and want reviews of more models, give a thumbs up 👍

deleted
•
edited Apr 22

Detailed review of Llama 3 70B:

Please do not use LLMs-style correct nonsense to describe the model's performance, thank you!

Note: Why do I think Dolphin performs better?

  • System prompt-free cross-language capabilities. When communicating in Chinese, Llama(1/2/3) or vanilla mistral 7B must be induced with system prompts to spit out fragmented Chinese. Nous-Hermes, CR+, and the Dolphin series do not have this problem.
  • Uncensored. Dolphin will never reject you.
  • It even has a programming-specialized version based on starcoder2.

Detailed review of Llama 3 70B:

Please do not use LLMs-style correct nonsense to describe the model's performance, thank you!

I wrote this entirely by myself, and you're claiming it's nonsense generated by LLM.

Repetition penalty for llama3 needs to be higher
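
For context on what that setting does, here is a minimal sketch of one common formulation of a repetition penalty: logits of tokens that were already generated are pushed down before sampling. Whether HuggingChat's backend applies exactly this rule is an assumption, and the function name and toy logits below are made up for illustration.

```python
def apply_repetition_penalty(logits, generated_ids, penalty):
    """Return a copy of `logits` with already-generated tokens made less likely.

    Positive logits are divided by the penalty and negative ones multiplied,
    so with penalty > 1 a repeated token's score always moves downward.
    """
    out = list(logits)
    for tok in set(generated_ids):
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out


# Toy vocabulary of 4 tokens; tokens 0 and 1 have already been generated.
logits = [2.0, -1.0, 0.5, 3.0]
print(apply_repetition_penalty(logits, generated_ids=[0, 1, 0], penalty=1.2))
```

Note that this only discourages repeating tokens within a single reply. Getting the same reply on every retry is mostly governed by sampling settings such as temperature.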

I think we should add dolphin as it’s a good model

noticed that current chats are not being named. can we assume it's under work for now?

Do you plan to release mistralai/Mixtral-8x22B-Instruct-v0.1 to the chat ? meta-llama/Meta-Llama-3-8B-Instruct could be also great.

Yeah, the instruct of 8x22 is AMAZING, Id like to use it over the chat too.

deleted
•
edited Apr 22

Command-R-Plus is already overloading there. Is 8x22B really a reasonable choice? Llama3 8B can replace Mistral 7B and be the default configuration, anyway is broken now.
IMG_8372.jpeg

are all the models that come and go from huggingchat is open-sources?

Hugging Chat org

yes sir

[New Model REQUEST] MTSAIR/MultiVerse_70B

This model outperforms Command R+, Llama 3 70B and many more, on open llm leaderboard.
As, command R+ is facing many issues. This model is a great alternative to command R+.
and It has only 70B parameters.
This model is currently #1 chat model on Open LLM leaderboard.

image.png

License - https://huggingface.co/MTSAIR/MultiVerse_70B/discussions/7#66278c8e430a12425331b183

Model Link - https://huggingface.co/MTSAIR/MultiVerse_70B

👍 to support this model.
(Hugging Face team will add Model on Community Demand)

deleted
•
edited Apr 23

[New Model REQUEST] MTSAIR/MultiVerse_70B

It is based on Alibaba's Qwen72B, which means that it has been under severe censorship. Test scores sometimes don't make sense.

I suggest that Chinese models be treated with caution. They are never disappointing in terms of overfitting and Chinese political rights.

Conclusion: You'd better try this model before recommending it. Their Space is broken. On the other hand, quantizing Command-R+ or replacing it with the 35B Command-R is still a cost-effective choice.

deleted
•
edited Apr 23

For a full replacement, I would recommend this list of models:

  1. Command-R/Command-R+_Q6 or Q8
  2. Llama3 70B and subsequent versions with larger parameters
  3. Llama3 8B as a representative of small models and TASK_MODEL
  4. Phi-3-mini, can also be used as TASK_MODEL
  5. Dolphin/Nous-Hermes Mixtral 8x7B
  6. Anything else you want to add, such as Mistral-OpenOrca, Dolphin-Mistral, Qwen1.5... does not include vanilla Mistral or Mixtral 8x7B or Gemma, but Mixtral 8x22B is acceptable(better deploy with Q6).

*All the above quantization suggestions are based on llama.cpp and gguf formats.

I suggest that Chinese models be treated with caution. They are never disappointing in terms of overfitting, just like their students.

@Mindires Hey, please treat every country and individual with respect. This is a community platform. So, Please do not spread hate or anything similar.

“Everybody is a genius. But if you judge a fish by its ability to climb a tree, it will spend its whole life believing that it is stupid.” – Albert Einstein

[New Model REQUEST] Microsoft/WizardLM-2

This model outperforms Command R+, Llama 3 70B, Mixtral 8x22B and many more.
And giving tough competition to - Claude 3, Gemini Ultra, GPT-4, etc.

image.png
image.png

License - Apache 2.0

Model Link - https://huggingface.co/alpindale/WizardLM-2-8x22B [Unofficial] (Official added soon)

👍 to support this model.
(Hugging Face team will add Model on Community Demand)

[New Model REQUEST] Microsoft/WizardLM-2
-snip-

The legality of that is questionable, since Microsoft took it down.

It's not legally questionable. They released the model under the Apache 2.0 license, so anyone with a copy of the model can use, modify, and distribute it according to the license terms.

@EveryPizza Microsoft removed Wizard2 because it was uncensored.
So, they will post it again soon.

image.png

Microsoft removed Wizard2 because it was uncensored.

So they will censor it and release it again

deleted

It's been a few days, and the censored version has now been released.

Review of Phi-3 Mini 4k Instruct:

Coding: 8.5/10

Capability: Phi-3 is fine-tuned on high-quality GPT-4 data, and for a model of just 3.8B parameters its performance is truly magical. It excels in code completion, debugging, and optimization tasks, making it a valuable tool for developers.

Limitation: Phi-3 may occasionally produce code that is not optimal or entirely correct. It can encounter difficulties with complex algorithms or intricate programming concepts that require deep domain expertise.

Example: When tasked with creating 20 complex coding questions, Phi-3 delivered correct solutions for 19. However, some solutions were not the most efficient or elegant. But it Outperforms ChatGPT 3.5 (Free Version).

Creative Writing: 9/10

Capability: Phi-3 has a strong capability for creative writing, crafting stories, poetry, and dialogues with a clear understanding of context, tone, and style. Its outputs are engaging.

Limitation: It’s creative, but sometimes it doesn’t hit the feels or the depth like something a person would write, especially with complex or deep themes.

Conclusion: Because of its GPT-4 dataset, it shows a big advance in creative writing.

Multiple Language Proficiency: 7/10

Capability: Phi-3 is capable of understanding and generating text in numerous languages, including English, Hindi, Chinese, Japanese, Spanish, French, German, Italian, and more.

Limitation: While Phi-3 is proficient in multiple languages, there are many lapses in grammar, syntax, or idiomatic expressions, which can detract from the authenticity of the text.

Example: Phi-3 translated 20 paragraphs from various languages with high accuracy. However, the translations many times missed the emotion and meaning of the text.

General Knowledge: 9/10

Capability: Phi-3 has a lot of knowledge for its size. (It outperforms all 7B and 13B models, many 30B models, and some 70B models.)

Limitation: Because of its small size, Phi-3's information may not always be current or completely accurate. It can also struggle with detailed discussions on historical topics.

Example: Phi-3 was asked different GK questions. It provided accurate and informative responses, but occasionally lacked depth (the reason is its size).

Mathematics: 7/10

Capability: Phi-3 is proficient in solving mathematical problems, including those in algebra, geometry, calculus, and beyond. It can assist with understanding mathematical concepts and theories.

Limitation: Phi-3 may not consistently explain the underlying concepts clearly or choose the most efficient methods, and it can sometimes provide incorrect solutions.

Example: Phi-3 was tasked with solving 20 complex high school mathematics problems. It correctly solved 13, partially followed the right method for 3, but the remaining 4 were incorrect.

Internet Search: 8.5/10

Capability: Phi-3 can effectively search the internet to provide relevant information on a wide array of topics. It can assist in locating specific details or answering intricate questions.
....................................................................................................

Some useful Tips

  1. Phi3 + Internet > GPT 3.5
  2. Phi-3 is currently the best model for local AI.
    ....................................................................................................

Comparison with other models:

Compared to Phi-2, Phi-3 represents a significant leap in handling complex tasks such as coding, mathematics, general knowledge, and creativity. It demonstrates an advancement in language model capabilities, offering a more sophisticated understanding of context and delivering highly knowledgeable and accurate responses.
(Compared to Phi 2)
....................................................................................................

Overall:

Phi-3 is a magical model. We can see a vast difference between it and its competitors. It surpasses all 7B models and nearly all 13B models in performance. Eagerly waiting for the release of Phi-3 7B and 13B.

....................................................................................................

Thanks to Microsoft for this high-quality model, and to the HuggingChat team for making it available for free on HuggingChat.

Fun fact: the HuggingChat team is so busy that they even forgot to officially announce 😅 that Phi-3 is available on HuggingChat.
So, here is the link, go check it out -> https://huggingface.co/chat/models/microsoft/Phi-3-mini-4k-instruct

......................................................................................................

If you find this review helpful and would like more reviews of similar models, please let me know! 👍
You can follow me to get notified about the next model review.

See U in Next Review 🤗

[New Model REQUEST] Microsoft/WizardLM-2

I created a Demo of WizardLM 2 7b model on Space,
Check it Out - https://huggingface.co/spaces/KingNish/WizardLM-2-7B

While many of the community members are requesting models based on the Open LLM scores, I believe the mods of this community also keep an eye on the Open LLM leaderboard. If a model seems like a fit, they will hopefully add it. We all want the best models to be present in the Hugging Face chat.

I'm starting to face issues with Command R+; it's starting to hallucinate badly, doesn't follow requests properly, and gives one-word lazy answers even when I explicitly tell it to provide in-depth, expanded responses in the system prompt.

Is there a way to select another model other than the ones listed? Or, is there any other UI that someone could suggest me to deploy a model I fine-tuned myself previously? Thanks!

How can i add a new model by myself?

Hugging Chat org

How can i add a new model by myself?

By using chat-ui directly: https://github.com/huggingface/chat-ui

This comment has been hidden
Hugging Chat org

This is not the right place to post this @zoyahammad (here we discuss models on HuggingChat)

Llama 3 has a model with 1M+ tokens context. Is it possible to add this model to the available chat models?
https://huggingface.co/gradientai/Llama-3-8B-Instruct-Gradient-1048k

What about a 'community models' section where huggingchat would display the best spaces of good models and use them?

How can we add new models? IBM just released a new set of open-source models. I'd like to see them here too!

@CosmicSound someone had asked the same question before, and the answer had been to pullrequest on the github repo for chat ui

Why does it show that this discussion is "paused"

So we won't be seeing WizardLM-2 8x22B on HuggingChat anytime soon?

We need a list of alternatives for Huggingchat so that if one model can't be found on here it can be found somewhere else...

deleted

zephyr-orpo-141b-A35b-v0.1 not responding? Any details on its status?

edit: fixed
did 01-ai/Yi-1.5-34B-Chat switch to chinese completely? it was in english before.
even responds in chinese.
image.png

Please see this conversation using microsoft/Phi-3-mini-4k-instruct:
https://hf.co/chat/r/7g1o5NL

Smaug 70b, a fine-tuned version of LLaMA 3 plz add

This comment has been hidden

Guys from today morning , huggingchat has been acting weird, most of the time it keeps searching for answer and also it is not performing web search like few days back

Mistral 7b v0.3 should be a no-brainer, it adds native function calling capabilities and is, as far as I understand, compatible with and higher quality than v0.2

please add the following model to the list of available models https://huggingface.co/Bin12345/AutoCoder

Please replace Phi-3-mini with Phi-3-medium-128k.

https://huggingface.co/microsoft/Phi-3-medium-128k-instruct

If I want to set up an assistant focused on a specific topic, namely how labor law applies in my company, how should I proceed?
The goal is to reference a set of documents related to collective agreements, which are in PDF or Word format. What is the limit on document size, and where do I upload the files so the assistant can refer to them?

CohereForAI/c4ai-command-r-plus gets very slow and basically unusable for me after 2 - 3 requests. It only shows the three dots after I send my message but never actually seems to generate a reply. Is this expected?

Having Codestral by Mistral AI available on HuggingChat would be really great. It's a super speedy code model with a size of 22B parameters and it's got a larger context window for larger codebases. Since the departure of CodeLlama we haven't had a coding model on HuggingChat, and Codestral would fit that bill perfectly.

@Smorty100 Codestral does not allow hosting/running it like that. It has a non-production research license.

https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Instruct
Is the SOTA open source model for coding per the lmsys leaderboard

Are you going to add any of Nvidia's new models?

Hugging Chat org

https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Instruct
Is the SOTA open source model for coding per the lmsys leaderboard

We are looking at it :) cc @nicolas @olivierdehaene

I would like to express my sincere gratitude to the team for your exceptional work in providing accessible and open-source AI chatbot options.

I believe that integrating the Qwen2-72B-Instruct or Qwen2-7B-Instruct model would be highly beneficial. During my testing, I found that it excels in Thai language processing, delivering remarkably high-quality results.

I hope the team will consider incorporating these models into HuggingChat service. Thank you once again for your dedication

Looks like gemma-2-27b-it is broken. Maybe you are using a wrong chat template or something?

Hugging Chat org

We are currently investigating it @kristaller486 (it's a bit complex) cc @nsarrazin

Does anyone know what happened to the Zephyr model? It was the biggest but it was suddenly gone, what happened to it?

deleted

Does anyone know what happened to the Zephyr model? It was the biggest but it was suddenly gone, what happened to it?

Also curious

Is it possible to add "LLM Compiler FTD" the new coding model ?

zephyr model is gone any idea ? it was my fev i tried looking around for updates nothing on it and no other sites that host zephyr chat either

zephyr model is gone any idea ? it was my fev i tried looking around for updates nothing on it and no other sites that host zephyr chat either

@victor @nsarrazin Yes HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1 is amazing model, i am sad that it was removed any plans to bring it back??

it was prob the best model overall since it was uncensored and had good responses, i enjoyed using it

Hugging Chat org

We try to rotate models from time to time, to showcase the latest releases from the community. We might keep models longer if they have high usage but since this was not the case for this Zephyr model, we opted to rotate it out in favor of Gemma 2.

If there's high demand from the community for a model we can consider adding it, so let us know if that's the case!

i hope the model comes back it was soo far the most convenient one to use compared to others

@nsarrazin Right now Gemma 2 seems to be missing, is this some sort of licensing issue or did something go down internally perhaps?

@nsarrazin lets add usage per week graphs!

If there's high demand from the community for a model we can consider adding it, so let us know if that's the case!

I think that a better approach would be to integrate the most performant and powerful models according to benchmarks and to keep models that excel at particular tasks, like Command R+ for natural language tasks, for example. That would be a far better approach for adding models than adding models just by demand.

Is it just me or is the R+ command not working?

Is it just me or is the R+ command not working?

R+ stopped working on my account too.

Is it just me or is the R+ command not working?

R+ stopped working on my account too.

So it must be having problems, I hope they see us and fix it.

We try to rotate models from time to time, to showcase the latest releases from the community. We might keep models longer if they have high usage but since this was not the case for this Zephyr model, we opted to rotate it out in favor of Gemma 2.

If there's high demand from the community for a model we can consider adding it, so let us know if that's the case!

Please never remove Command R+. It's the best one you've ever had and it should be permanent.

I don't think they're going to remove Command R+ (even though at the moment it's quite buggy), but I think having another model with a large context window and good reasoning (like Qwen2 or maybe Llama-3-70b with expanded context window) would be a nice thing.

does the command R+ currently working?

Hugging Chat org

CommandR+ is now up (it was down for a few hours).

Thank you very much 😊

I mean, having a demand system would be kind of a bummer. I did like Zephyr because I used it for "What if" scenarios, but since it's low demand, I guess it's just underrated tbh

You can chat with the Gemma 27B Instruct model on Hugging Chat! Check out the link here: https://huggingface.co/chat/models/google/gemma-2-27b-it.

Gemma 2 Not Found

@victor Currently Gemma is still not available on HuggingChat, but I do remember it being on here some days ago. Is it gonna be back up again soon?
Screenshot_20240703-195622.png

Hugging Chat org

@victor Currently Gemma is still not available on HuggingChat, but I do remember it being on here some days ago. Is it gonna be back up again soon?

Yes sorry we had technical issue with the model, we'll try to put it back if fixed.

Why I can not upload file to meta-llama/Meta-Llama-3-70B-Instruct? Or any other model except CohereForAI/c4ai-command-r-plus?

@Dalija Only command R+ has those tools implemented for now, but Llama 3 is likely next on the list.

@Dalija Only command R+ has those tools implemented for now, but Llama 3 is likely next on the list.

Command R+ has really good grounding capabilities compared to all other models

@victor Currently Gemma is still not available on HuggingChat, but I do remember it being on here some days ago. Is it gonna be back up again soon?

Yes sorry we had technical issue with the model, we'll try to put it back if fixed.

Meanwhile, can we get zephyr-orpo-141b-A35b-v0.1 back if possible @victor? It was really good

Can any of them do NSFW just curious. Just say no if it can't please don't be mean.

I want to leave some ideas on the choice of some models on HuggingChat.
 
For the Nous Research models, they released two new models recently: Hermes 2 Pro 70B and Hermes 2 Theta. I am not sure which is better, but I think either or both of them should replace Nous-Hermes-2-Mixtral-8x7B-DPO.
 
For the Mistral models, I don't see the point of keeping Mixtral 8x7B if there's Mixtral 8x22B with all of its fine-tuned variants. And if Mistral 7B is planned to be kept, it should be upgraded to v0.3.
 
For the Microsoft models, I think that Phi-3 mini is just pointless; it's a very small model that could potentially run on mobile devices, so why not just add Phi-3 medium, which is the best of the Phi-3 family so far?
 
For Google models, Gemma-2-27B is the best they've got.
 
I would love to also suggest some new families of models by different organizations:
 
Nvidia has released its Nemotron-4-340B. It seems like a very good and powerful model, but it's very large and very costly, so it's understandable why you wouldn't consider adding it.
 
There's also DeepSeek-Coder-v2, which is the best coding model as far as I know.
 
Alibaba is so active in releasing good models, including their most recent Qwen-2-72B, which is a very good model.

I agree

I believe both DeepSeek-V2 and DeepSeek-V2-Coder are very good ;)

I can't access the site [502 Bad Gateway]. God help me.

nothing

hugging chat error.png

Hi, i receive error while trying to interact with command R+.

Hugging chat is currently not working on my network either. There may be something wrong with the server.

Hugging Chat org

is it still the case? seems to work well for me.

is it still the case? seems to work well for me.

No it has been resolved and working fine now

Llama 3 400B will be released on 23 Jul, so add it as soon as it's released, since the current models aren't as good as required!

WTF is with the removal of Mixtral-8x22b?

Llama 3 400B will be released on 23 Jul, so add it as soon as it's released, since the current models aren't as good as required!

That is a good suggestion, but Llama 3 400B is a huge model to run; you would need a good number of H100s

Llama 3 400B will be released on 23 Jul, so add it as soon as it's released, since the current models aren't as good as required!

That's why HuggingChat is more of a curiosity and suitable for simple applications. At the moment, none of these models here even come close to the current state-of-the-art. For example, Command R+ makes mistakes in Python, and its reasoning is weak. Even considering DeepSeek (not referring to the Coder model). What Claude 3.5 Sonnet understands without any problem, none of the models here can grasp. If HuggingChat is to be something cooler, unfortunately, larger OP models will need to be implemented. However, I'm not sure what the target group is here ;)

Would there be notifications before removing a model… I hope they never remove Command R+, I'm relying on it a lot… Could there be a way to keep old models as well, or to customise models on the HuggingChat page?

Why is Gemma 2 27B still not available? The downstream bugs should have been fixed by now.

Why is Gemma 2 27B still not available? The downstream bugs should have been fixed by now.

Maybe because Gemma 2 27B does not support a system prompt, so people can't make custom Assistants. Also, Gemma 2 27B replies randomly when web search is on.

This comment has been hidden

Has commandR+ stopped working?

Hi @victor @KingNish Command R+ has stopped working, can you guys please take a look into it. thanks

Hugging Chat org

We're looking into it :)

Hugging Chat org

Any chance Claude 3.5 Haiku could be added in the future? Or other small models of similar intelligence?

Please add or replace mistral with mistral-nemo.

https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407

This comment has been hidden

Please add or replace mistral with mistral-nemo.

https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407

I was going to say that, I tested it and found it very interesting for creative content, and it seems like it's not that expensive to run

This comment has been hidden

I have a question on the context window of the newly added Llama-3.1 models. How come the largest 405B parameter model has 14k context window, but the smaller 70B parameter model only has 7k? Hell, even Command-R-Plus was only limited to 28k, and that model has 104B parameters.

I would be happy to use Llama-3.1-70B, but only if it has more context than it does now. Otherwise I can't use it, because my system prompt alone is over 7k tokens.

I just tried to use 405B model of Llama-3.1 (because it at least can fit my system prompt), and, as expected, it's slow. Too slow for me to bother with it. Please increase the allowed context window of 70B model to 20-30k.
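
To make the constraint concrete, here is a minimal sketch of the check being described. The limits are the figures quoted in this comment (14k, 7k, 28k), not official numbers, and the `models_that_fit` helper and its `reserve_for_chat` parameter are hypothetical names used only for illustration.

```python
# Context limits as reported in the comment above (in tokens); not official figures.
REPORTED_LIMITS = {
    "Llama-3.1-405B": 14_000,
    "Llama-3.1-70B": 7_000,
    "Command-R-Plus": 28_000,
}


def models_that_fit(system_prompt_tokens: int, reserve_for_chat: int = 0) -> list[str]:
    """Return the models whose reported limit can hold the prompt plus a reserve."""
    needed = system_prompt_tokens + reserve_for_chat
    return [name for name, limit in REPORTED_LIMITS.items() if needed <= limit]


# A system prompt slightly over 7k tokens rules out the 70B model.
print(models_that_fit(7_500))
# prints ['Llama-3.1-405B', 'Command-R-Plus']
```

With the 20-30k limit requested here, the same 7.5k-token system prompt would fit the 70B model with room left for the conversation itself.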

Where do I get updates on models leaving and joining Hugface chat?

Where do I get updates on models leaving and joining Hugface chat?

I just found out that Llama 3.1 has been added

Where do I get updates on models leaving and joining Hugface chat?

no place yet

Requesting you to increase the context size for llama 70b. 7k token is too limiting

It depends on what you want to use. In Perplexity, for example, everything works quickly, and all models have a 32k context window. The cost is $20 per month. You either use it for free and accept lower quality (HuggingChat is okay, but it's free, so we shouldn't expect them to provide unlimited hardware resources for everyone), or you pay and get a significantly higher quality service (like in Perplexity).

It seems that Llama 3.1 70B is actually the more significant model here. Given its size, the model is excellent and performs outstandingly well in many applications. On the other hand, Llama 3.1 405B is so overwhelmed with requests that it's currently almost impossible to get a response from it.

Hugging Chat org

It seems that Llama 3.1 70B is actually the more significant model here. Given its size, the model is excellent and performs outstandingly well in many applications. On the other hand, Llama 3.1 405B is so overwhelmed with requests that it's currently almost impossible to get a response from it.

Yes 100% Llama 3.1 70B is the real deal here.

I tried the 70B and honestly was pretty nice until it errored on me, the error only said "An error has occurred" and nothing else... Is HuggingChat down or?

Hugging Chat org

Did you retry @Noxi-V ? Works well for me atm.

@nsarrazin nope

image.png

Hugging Chat org

@Noxi-V would you feel comfortable sharing that conversation with me ? (button at the bottom right) so that I could take a look ?

@nsarrazin https://hf.co/chat/r/ApE8SRK
It's just a test to see if it can do fictional battles, it did well funnily enough

Hugging Chat org
•
edited Jul 25

Nice, seems like it works for me (https://hf.co/chat/r/jpbjsuT) when using the retry button. Could have been a transient error?
Screenshot from 2024-07-25 16-15-45.png

I wish we had an uncensored model like Command R or Zephyr. Llama is fine and great, but it's censored and needs a lot of prompting to get it to work.

@nsarrazin Well, oddly enough for me it's only that chat alone for now, I used the retry button and even just entering in the bar but it kept not working for me
I just deleted the chat since it's a isolated error for me at least

Any plans to add mistralai/Mistral-Large-Instruct-2407 ?

P.S. Thank you for Llama-3.1-405B, it's a game changer. I don't mind the speed if it's able to replace GPT/Claude for complex work 🚀

Just one request: Please, when you add a new model to hugchat, let me know here, it would be wonderful!

Please please add the tts mode. I've been spamming about here without actually spamming. You guys keep telling me it's going to be the next thing we're going to implement, but no such luck...
The messages area of the ui isn't as accessible as it could be. When the messages pile up, there isn't any graphic or separator leading back to the start of the last message, which means that navigation gets pretty tedious very fast, especially if the messages we're talking about are long.
For that, please implement a tts mode like the one at pi.ai, which reads the incoming message after it stops updating.
Or you could add a separator before every message the model sends out, like the one found at deepinfra.com/chat, where every bot's message is preceded by a graphic of the model in question that I could press shift+g to reach with NVDA, then just down arrow to read the message without having to press pageup and either find the tail end of a previous message or the tail end of the last message.
Or both features would be ideal.

@nsarrazin so rather than generating one word at a time, model is printing whole response all together which makes it feels like it is taking a while to generate, i faced it in Command R+ as well as llama 3.1 70bn

Hugging Chat org

@acharyaaditya26 Do you have Disable streaming tokens enabled by any chance in https://huggingface.co/chat/settings ?

@acharyaaditya26 Do you have Disable streaming tokens enabled by any chance in https://huggingface.co/chat/settings ?

Yes it was. thanks.

I think Gemma2-27b would be very good and appreciated addition ;) Q8 or even Q4

Have any plan to add a rather impressive finetuned model that is Athene 70b? It has significantly better performance than the gigantic Llama3.1 405b in arena-hard-auto. Also it's better in multilingual tasks.

Thank you!

Why Mistral Nemo or large 2 still not available? These models support tools.

remove this
Screenshot_2024-08-12-19-36-00-281-edit_com.opera.browser.jpg

This comment has been hidden

Does anyone have any prediction on how long is Llama 3.1 405B gonna be overloaded or used so much? Because it's just useless now for anyone unfortunate to not get it

Does anyone have problems with CohereForAi too? Like no generations?

Does anyone have problems with CohereForAi too? Like no generations?

Yesss i am facing same problem

Did you see that they released Hermes 3? Any plans to add it to hugchat?

Did you see that they released Hermes 3? Any plans to add it to hugchat?

There are 8B, 70B, and 405B versions. It's like Llama 3.1 but without censorship, that is, less limited.

CohereForAi works again but why does every ai generate slower, when using a old phone. In my case painfully slow.

CohereForAi works again but why does every ai generate slower, when using a old phone. In my case painfully slow.

Does turning on "Disable streaming tokens" in the options fix it? For some reason this option takes a lot of CPU power, and thus, even if AI is done generating the response, the website will continue streaming little data to your device, wait for it to display it, and then send a little more, until it's all done.

CohereForAi works again but why does every ai generate slower, when using a old phone. In my case painfully slow.

Does turning on "Disable streaming tokens" in the options fix it? For some reason this option takes a lot of CPU power, and thus, even if AI is done generating the response, the website will continue streaming little data to your device, wait for it to display it, and then send a little more, until it's all done.

Thank you. I even found out that with streaming tokens activated you don't have to wait until the text is finished. You can just click on "stop generating" and it will show you the whole generation immediately

Guys, it hurts me when I read those demanding and sometimes rude comments of yours. It's great free service and I love it, I believe we could really try to be human here.

Guys, it hurts me when I read those demanding and sometimes rude comments of yours. It's great free service and I love it, I believe we could really try to be human here.

I hope you don't mean my comment

The Llama 3.1 405b model has been running slowly on HuggingChat.

Mistral Large 2

Just curious why Mistral Large 2 hasn't been added to Hugging Chat yet? I assume it's due to the "non-production" license but I'm not a lawyer.
If the license allows, it would be far less demanding than Llama-3.1-405B, being about the same size of Command-R-Plus.

mistralai/Mistral-Large-Instruct-2407


Hermes 3

I understand if the plan is to stick with Meta's Llama-405B because that's what a lot of folks will want to talk to; but I'd suggest adding one of the NousResearch/Hermes-3-Llama-3.1 models, perhaps the 70B version.

NousResearch/Hermes-3-Llama-3.1-405B-FP8
NousResearch/Hermes-3-Llama-3.1-70B

Can you please include the reasoning behind adding them? I have never tried hermes beyond their 2 version and same as mistral

Mistral Large 2

Just curious why Mistral Large 2 hasn't been added to Hugging Chat yet? I assume it's due to the "non-production" license but I'm not a lawyer.
If the license allows, it would be far less demanding than Llama-3.1-405B, being about the same size of Command-R-Plus.

mistralai/Mistral-Large-Instruct-2407

Mistral Large 2 is available to use for free on the Mistral website, so I'm not sure it would be worth the effort for HF, even if they were eligible for the free license.

i hope its a uncensored model this time lol

My memo to the devs: please notify us of any updates at least somehow.
Today I noticed that the model Llama-3.1-405B is now gone from HuggingChat. That being said, the other available Llama3.1 model with 70B parameters is still limited to 8k (7k prompt and 1k max new tokens), even though it supports context length up to 32k.

@nsarrazin @victor if i am correct you guys worked on huggingface chat, I know you guys are super busy but is it possible there is like some kind of board which says which models are up and which models have been removed. Thanks

If the 405B model was removed because it's overused, I would be disappointed since that's just a bad idea in general

Hugging Chat org

Hi 👋 We removed the 405B since it was taking up a lot of resources but wasn't working great most of the time. Those resources could also be used to showcase upcoming models and cool demos elsewhere on the platform, like Zero GPU spaces.

You can see the list of active models on HuggingChat here.

We try to listen to the community when it comes to adding/removing models but we also need to balance resource usage. If you see new models you'd like to see on the platform, be sure to mention them here so we can take a look!

There goes hours of my research gone, oh well, had it good while it lasted
hope there's a good replacement for it or at least having the 70B version to have more than just 7k context size...

Can you add Nous Hermes 3 to the Hugging Face chat?

Yes Please Add

Can you add Nous Hermes 3 to the Hugging Face chat?

I heard that the Hermes 3 405b which is based on the Llama 3.1 405b is faster than the Llama and less limited, why doesn't Hug test it instead of the Llama 405b? If it's not worth it, just take it out

Or replace the Hermes 2 version you have here with the 3 70b, what do you think?

While I think that Phi3 Mini is really useful to have on HuggingChat I also think phi3 medium should be on there. The performance at that size is simply incredible.

Also, yes, replacing Mistral 7B with Nemo would be a pretty good move I think. Is there a reason why we don't have tools on the 7B yet? I know it supports it, and it would showcase how small models can benefit from tools just as much as the big ones!

EDIT: fixed spelling

Hello, How do I add other models to the chat interface?

Many models have come and gone on HuggingChat. But can we have a dedicated model for coding? Like DeepSeek Coder V2, CodeQwen 1.5, Nxcode-CQ-7B-orpo or any of the leaders on BigCode?

Is Command-R+ also barely functional for anyone else? I have to wait now up to 2 minutes for it to even begin generating a response, and even then it may error.

Is Command-R+ also barely functional for anyone else? I have to wait now up to 2 minutes for it to even begin generating a response, and even then it may error.

Yes, same here. I assumed it was that the model was overloaded, but I sometimes get a message that "model is overloaded" so I don't know what the explanation is when I don't get that message and it just fails.

same havent worked since yday

Has anyone else encountered errors as well when using Command R+? In recent days it occasionally ignored my system prompt and repeated my input again with synonyms… instead of engaging in conversation… It was kind of frustrating.

Hugging Chat org

Has anyone else encountered errors as well when using Command R+? In recent days it occasionally ignored my system prompt and repeated my input again with synonyms… instead of engaging in conversation… It was kind of frustrating.

We spawned more replicas for Command-r-+ can you confirm it works better now?

https://huggingface.co/chat/
I let her write a short story and it is working properly without any rejections.
BTW, the Cohere one used to work fine, but now it rejected the same request and froze with "Something went wrong".
A lot of things have changed in a while since I've seen it...
https://huggingface.co/spaces/CohereForAI/c4ai-command

Llama 3.1 70B instruct has been spewing out random bits of code, recently. It may be related to AI simulation of intense anger. Also, it somehow generated an image during a glitch, with no tools selected for it, and no relevance to the chat.

We spawned more replicas for Command-r-+ can you confirm it works better now?

Thank you!! It's better now

deleted

Nous-Hermes has been down for a few days and surprisingly, I haven't seen complaints come through this thread or on the Discord, does anyone know what its status is?

Cohere released an upgraded version of Command R+; it's called "CohereForAI/c4ai-command-r-plus-08-2024." Will you replace the older version with this one?

Here's the model page on Hugging Face: https://huggingface.co/CohereForAI/c4ai-command-r-plus-08-2024

Hi @nsarrazin @victor i think this is drop in replacement will it be upgraded??

Cohere released an upgraded version of Command-R+, it's called "CohereForAI/c4ai-command-r-plus-08-2024" will you replace the older version with this one?

Add the new Command R (plain or plus), as the currently hosted one is outdated; or please host aya-35B, a novel multilingual model from Cohere… pretty please 🙏

& please let older chats be migratable, at least for the upgraded models. (E.g. in other places models can be changed mid chat) #540
But current problems are in priority ig.

This comment has been hidden
This comment has been hidden

@ANIMDUDE Hey, this is no place to advertise your assistant.

I think swapping out Mistral 7B for Mistral Large would make sense, since it seems to have better performance overall. Also, there are two Mixtral models, but I feel like the Hermes fine-tuned version should be good enough for what users need, unless HF wants to compare the two models side by side, but I'm not sure that's necessary.

Updating the c4ai model to the latest August version could really improve hf chat performance and compute usage. Personally, I'd rather use Qwen or Deepseek over the Yi model, they just seem to perform better in my experience.

Hi @nsarrazin are there plans to add the new Reflection 70B? It's smashing benchmarks left and right! The new SOTA beyond any doubt

@ANIMDUDE Hey, this is no place to advertise your assistant.

Alright. its just that when I make them, nobody can see it otherwise.

Hi @nsarrazin are there plans to add the new Reflection 70B? It's smashing benchmarks left and right! The new SOTA beyond any doubt

I tested it and I must say it's very good, please add it as soon as possible. For those who want to test a limited demo of it:

https://app.hyperbolic.xyz/models/reflection-70b

Don't punish me Admin, I'm just sharing knowledge while I wait for you to add the model here in the wonderful Hug chat. ♥️♥️♥️

Mixtral AI 8x7B Instruct v0.1 was my favourite to use as it gives really creative and human responses. Now, it produces barely a few sentences before abruptly stopping for the past two weeks, why?

The v0.3 version doesn't follow written instructions most of the time, and it copies and reuses paragraphs from its previous responses no matter how I try to instruct the AI to avoid that.

I really like the upgraded Command R+, it's great! It's weird to me though that we still don't have Mistral's Nemo of all things.
Anyway, thanks a lot for the Command update!

I used to only use CohereForAI/c4ai-command-r-plus, and now that it isn't available I've tried the new CohereForAI/c4ai-command-r-plus-08-2024. However, it times out every time I try, and all of the other models except meta-llama/Meta-Llama-3.1-70B-Instruct keep saying the model is overwhelmed. Even meta-llama/Meta-Llama-3.1-70B-Instruct gets overwhelmed when I click the "to be continued" button, which, I won't lie, is annoying, because that button always appears when a generated response pauses in the middle of a line.

Is it just me, or the upgraded Command-R-Plus repeats itself way too often? I have more luck with Meta-Llama-3.1-70b at the moment.

Is it just me, or the upgraded Command-R-Plus repeats itself way too often? I have more luck with Meta-Llama-3.1-70b at the moment.

I can't even see a difference between the old Command-R and the new one.

image.png

Do we have a new model?? There's another option now called "llhf/Meta-Llama-3.1-8B-Instruct". I don't want to try it in case I break my chat by mistake (since we can't change the model unless the one being used is deprecated), just curious.

Hugging Chat org

Woops forgot to filter hidden models on this dropdown @SimaDude thanks

Anyone know why this happens sometimes?
(meta-llama/Meta-Llama-3.1-70B-Instruct ):

\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ have are\\\\\\\ is\\\\\\\\n\\\\n\\\\\\\\\\\\\\\\\\\\\\\\``assistant\\```````assistant\\\\````assistantassistant\\\\\\\\\`````\\assistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistanta

Hugging Chat org

@typo777 can you share the conversation?

Hi, I have a single machine with 10 H100 GPUs (0-9), each with 80 GB of GPU RAM. When I load the model onto 2 GPUs it works well, but when I switch to 3 GPUs (45 GB per GPU) or more (tested for 3-9), the model loads but at inference it gives trash output (…////) or an error like "the probability contains nan or inf values". I have tried device_map='auto', loading with empty weights and dispatching the model with the Llama decoder layer specified to be on one GPU, and custom device maps as well. I also tried many models, and all had this same issue. With ollama I was able to load the model and run inference on all 10 GPUs, so I think the issue is not with the GPUs. I have also tried different generation arguments and found one thing: if you set do_sample to False you get the probability error, otherwise you get the output in …//// form. If the model is small, you get some random Russian, Spanish, etc. words. I have also tried different dtypes: float16, bfloat16 and float32 (no results; I waited a long time). I am sharing my code as well. Can you point me in the right direction? Thanks a lot.

import os

# Set the cache location before importing transformers so the library picks it up.
os.environ['TRANSFORMERS_CACHE'] = '/data/HF_models'

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

checkpoint = "/data/HF_models/hub/models--meta-llama--Meta-Llama-3.1-70B/snapshots/7740ff69081bd553f4879f71eebcc2d6df2fbcb3"
# device_map='auto' spreads the layers across all visible GPUs.
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map='auto', torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

print(model)

message = "Tell me a joke"

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 20,
    # "return_full_text": False,
    # "temperature": 0.4,
    # "do_sample": True,  # false worked
    # "top_p": 0.5,
}

print(pipe(message, **generation_args))

Anyone know why this happens sometimes?
(meta-llama/Meta-Llama-3.1-70B-Instruct ):

\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ have are\\\\\\\ is\\\\\\\\n\\\\n\\\\\\\\\\\\\\\\\\\\\\\\``assistant\\```````assistant\\\\````assistantassistant\\\\\\\\\`````\\assistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistanta

Temperature is probably too high.

Can you please share the conversation, if possible?

hi, can we have deepseek v2.5 model?

I need community model features

Unable to download the "meta-llama/Meta-Llama-3.1-405B-Instruct-FP8" model; it gets stuck at 81%. No disk space issues on my side.

Qwen 2.5 72B is open weights SOTA level per Artificial Analysis:
https://x.com/ArtificialAnlys/status/1836822858695139523?t=Z-rFb-13NPEC2pDqZYjoPQ&s=19
Also seconding mistral large 2, Deepseek 2.5

Qwen 2.5 72B would be so great :)

Any chance we'll see Mistral Instruct 2049?

No, sorry, I don't have the conversation anymore. The weird thing was the time when it generated an image of a bunny with no tools activated and then admitted it had nothing to do with the conversation. Anyway, I don't think it had anything to do with the temperature. You might be able to get those results by pissing off the AI enough, but I'm not really wanting to test that theory.

No idea then... When I adjusted settings such as repetition, it started producing:
assistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistant

Hopefully it's a one-time thing.

WE GOT QWEEEEEN!! WE GOT QWEN 2.5!!!

image.png

(I actually have no idea how good it is. Gonna find out.)

@ANIMDUDE if you are using multiple GPUs, it might be an NVIDIA issue where ACS or IOMMU is enabled in the BIOS; they prevent peer-to-peer communication. Please disable them and try again.
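
Before changing BIOS settings, it may help to see what the driver reports about peer-to-peer access between the GPUs. Below is a minimal sketch, not a definitive diagnosis: `torch.cuda.can_device_access_peer` is a real PyTorch call, while `peer_access_matrix` and `blocked_pairs` are hypothetical helper names made up for illustration. Note that this only shows what the driver claims; it does not verify that peer-to-peer transfers actually arrive intact, so a clean result does not rule out the ACS/IOMMU problem.

```python
from typing import Callable, List, Optional, Tuple


def peer_access_matrix(
    num_devices: int,
    can_access: Optional[Callable[[int, int], bool]] = None,
) -> List[List[bool]]:
    """Return matrix[i][j] = True when device i reports direct access to device j."""
    if can_access is None:
        # Imported lazily so the helper can be exercised without a GPU.
        import torch
        can_access = torch.cuda.can_device_access_peer
    return [
        [i == j or bool(can_access(i, j)) for j in range(num_devices)]
        for i in range(num_devices)
    ]


def blocked_pairs(matrix: List[List[bool]]) -> List[Tuple[int, int]]:
    """List the (i, j) pairs for which peer access is reported as unavailable."""
    return [
        (i, j)
        for i, row in enumerate(matrix)
        for j, ok in enumerate(row)
        if not ok
    ]
```

On the GPU machine, something like `blocked_pairs(peer_access_matrix(torch.cuda.device_count()))` would list the device pairs the driver says cannot talk to each other directly.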

Hugging Chat org

Yep we just released it today with 32k context window! Enjoy and let us know how it goes

Qwen is gonna become my main, it's perfect so far.

It's the best day in the history of AI - QWEN 2.5 72B with 32k context on Hugging Face

Yep we just released it today with 32k context window! Enjoy and let us know how it goes

Are you also going to add tool capabilities?

Hugging Chat org

Qwen-72b has joined the chat 🔥🔥

https://x.com/victormustar/status/1838220558112072183
model is amazing: https://x.com/maximelabonne/status/1838170077021053004

Are you also going to add tool capabilities?

Yes @nsarrazin is looking at it!

Ok imma check out qwen

WOAH WHAT IS THIS QWEN MOMENT

@ANIMDUDE if you are using multiple GPUs, it might be an NVIDIA issue where ACS or IOMMU is enabled in the BIOS; they prevent peer-to-peer communication. Please disable them and try again.

well, I would, if I had any idea what that meant. But thanks for the advice

Hugging Chat org

Big model refresh on HuggingChat 🎉

We removed a few older models and added:

Should be a more modern selection of models, as always let us know if you have any feedback! I'll be working on adding tool support to the compatible models in this list as well so you can start using them with community tools.

Aight, looks like I'm first to mention things like new models :3
Today I noticed that HuggingChat now also added mistralai/Mistral-Nemo-Instruct-2407 and NousResearch/Hermes-3-Llama-3.1-8B

image.png

As for Qwen 2.5, it seemed pretty good to me for stuff like coding. But as for roleplaying... meh. But I guess the focus for LLMs shifted away from writing stories a looong time ago :P

Nvm, looks like I'm not first :3
Took me a while to write a response

Hugging Chat org

ha, no worries! Try Hermes 3 with a system prompt for storytelling, seems to work fairly well.

the old zephyr model was decent for stories hope a new version comes out on hugchat

I'll be working on adding tool support to the compatible models in this list as well so you can start using them with community tools.

Why don't you use a specific tool model like Nemo to act as a tool caller for models that do not support tool calling?

What a great choice of models! Thank you team! I appreciate your work <3 I love huggingface chat :)

Hugging Chat org

@KingNish we wanted to support the native tool calling capabilities of models, but I guess that could make sense as an option in settings: select a tool-calling model. We'll see what we can do about it!

@nsarrazin Is there a reason why tools aren't enabled for models like Mistral Nemo and Qwen? They both support it and I have successfully used them to call some functions using ollama.
The new API tab is really cool! It reminds me of the Playground in Open Web UI. Currently the new Hermes model in the new Playground UI says, that it doesn't support system prompts which is incorrect, as it works in the usual HuggingChat UI. Would be great to have that fixed so we can experiment around with different system prompts.
grafik.png
Being able to access the API UI from the "Models" tab would be very appreciated. Just a little button so we can get to that API page quicker without entering the entire chat ui
Also hoping for Llama 3.2 soon obviously :)

EDIT: Having the ability to test function calling in the API interface would also be great. Very useful to see if a model can handle the syntax for bigger and more complex functions.
I'm sure one can emulate the function calling behaviour, but that is not very reproducible.

Qwen is great, but since it's also a good model for maths, all those mathematical expressions could be displayed correctly, but they aren't :)

I asked Llama 3.2 3B the infamous question about the number of R's in strawberry, which it got right on the first try. Then I asked how many R's in raspberry, and it said zero. Hmm.. well, now it thinks there are 5. Asking it to count the letters separately gave the right result, though. Hmm.. it failed after that. Your results may vary. Still for only 3B, seems impressive so far.

Hugging Chat org

@Tommy84 Try asking the model to answer with latex in $$ blocks and the formatting should work! Looks like Qwen has different formatting rules by default.

Here we can discuss about HuggingChat available models.

image.png
How can I choose the newly released llama3.2 model?

Hugging Chat org

@wubo0067 We're working on adding support for Llama 3.2 vision! Stay tuned, we'll update you in this thread.

Welp, now c4ai-command-r-plus-08-2024 times out...

Welp, now c4ai-command-r-plus-08-2024 times out...

Nope, it's still doing far better in tool calling than Llama.

Nope, it's still doing far better in tool calling than Llama.

You mean the model still works for you? It keeps timing out for me.

image.png

Can we expect a coding-dedicated LLM any time soon? I haven't seen one since Meta's coding AI. I believe there are great ones out there we could benefit from. I hope to see at least one in HuggingChat.

You mean the model still works for you? It keeps timing out for me.

Nvm, seems to work now.

wheres the llama 3.2???

Can we expect a coding-dedicated LLM any time soon? I haven't seen one since Meta's coding AI. I believe there are great ones out there we could benefit from. I hope to see at least one in HuggingChat.

Qwen2.5-72B is stunningly good in coding, even better than 4o. Is Qwen Coder better?

wheres the llama 3.2???

@victor I second that, but more politely. Thanks 🤗

@nsarrazin @KingNish guys CohereForAI/c4ai-command-r-plus-08-2024 is not working

Guys, why the Hermes-3.1-8B model when there's Hermes-3-Llama-3.1-70B, which is much better?

Guys, why the Hermes-3.1-8B model when there's Hermes-3-Llama-3.1-70B, which is much better?

Limited resources, I suppose. They're hosting those models themselves, so they can't have too many large LLMs.

The transformers library can already load NF4 weights as 4 bits into VRAM and expand them to bfloat16 for computation as needed, but in that case, there would not be much difference in size or load between the unquantized 4x8B model and the NF4-quantized 70B model.
Not sure which output would be superior...

Anyway, since we're not training models with HuggingChat, we could host them with NF4 except for a very few key models if there's no significant difference in the results.
The question is whether the output would be significantly degraded or not. This would depend on the model.
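
To put rough numbers on the size comparison, here is a back-of-envelope sketch of weight storage only. It assumes 16 bits per parameter for bfloat16 and 4 bits per parameter for NF4, and it ignores quantization constants, the KV cache, and activations, so real usage will be higher. `weight_memory_gb` is a hypothetical helper name, and the 8B and 70B sizes are simply the ones mentioned in this thread.

```python
def weight_memory_gb(num_params_billion: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), ignoring all overhead."""
    total_bytes = num_params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


small_bf16 = weight_memory_gb(8, 16)   # 8B parameters stored as bfloat16
large_nf4 = weight_memory_gb(70, 4)    # 70B parameters stored as 4-bit NF4

print(f"8B  @ bf16: {small_bf16:.0f} GB")
print(f"70B @ NF4:  {large_nf4:.0f} GB")
```

Whether the quality of the 4-bit output holds up is a separate question that this arithmetic says nothing about.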

@nsarrazin @KingNish guys CohereForAI/c4ai-command-r-plus-08-2024 is not working

It keeps loading and there is no output

Screenshot 2024-09-27 113402.png

This comment has been hidden

@wubo0067 We're working on adding support for Llama 3.2 vision! Stay tuned, we'll update you in this thread.

Like this comment ❤️ to support adding Llama 3.2 90B Vision Instruct to Hugging Chat
https://huggingface.co/chat/models

@KarthiDreamr HF members have already told us that llama is obviously coming. 11B vision was actually already on the site for a very short while. I was able to test it out, and it had some token formatting error, but the vision capabilities seem to have worked fine.

90B is very likely to replace 70B.

Will vision be uncensored or very restricted? How was it when you used it?

Is it possible to add MaziyarPanahi/calme-2.4-rys-78b? From what I can tell, it says it's good for almost anything and it doesn't seem too big.

Like every other llama model it will be censored.

I'm just hoping that it's a bit loose with it, because having it fully censored is no fun.

I tested the 90B model on Hyperbolic and it is completely uncensored, but only there. If you are going to add it, use the same version as Hyperbolic!

That surprises me. I was pretty sure it would be censored. Sorry for the incorrect information.

is it possible that the model from c.ai would ever be added?

is it possible that the model from c.ai would ever be added?

Is it even open source? Does it have a page on HF?

You're not entirely wrong. I tested it on other providers and it's censored on all of them; only on Hyperbolic is it not.

Hugging Chat org

I added back Llama 3.2 11B vision, let me know how it works for y'all! We're still ironing out issues with it on the API side, so it would be super helpful if you could report anything strange that you see!

is it possible that the model from c.ai would ever be added?

Is it even open source? Does it have a page on HF?

if my answer is no then yours is too..

I added back Llama 3.2 11B vision, let me know how it works for y'all! We're still ironing out issues with it on the API side, so it would be super helpful if you could report anything strange that you see!

This model hallucinates a lot.

When I asked it to identify the person, it refused. However, in another conversation, when I didn't ask about the person in the image, it told me about the person in the image.

Here is link to convo: https://hf.co/chat/r/H_DlcUU

model is good soo far it answers questions better than qwen

Do you have any plans for adding Llama-3.2-90B instead of the 11B model?

Do you have any plans for adding Llama-3.2-90B instead of the 11B model?

It works well for me; maybe you are confusing it with your delusive prompts. Thanks Hugging Chat devs 💚
image.png

I added back Llama 3.2 11B vision, let me know how it works for y'all! We're still ironing out issues with it on the API side, so it would be super helpful if you could report anything strange that you see!

But Llama 3.2 11B vision doesn't have support for tools. Why, and will tools be supported for it in the future?

Here we can discuss about HuggingChat available models.

image.png

Update the screenshot please ⤵️

image.png

having tools support for assistant might be a game changer.

tools

I'm thinking of Function Calling or its more advanced forms.
I wonder if the Inference API supports it?

@John6666

I wonder if the Inference API supports it?

I mean, those in the assistant tab are mostly just system prompting, but when we use them, we lose access to tools.

So it's currently difficult to use them together...
Is there a technically simple solution?

Hugging Chat org

@djuna you can already add tools to your assistants if using Meta Llama 3.1 70B or Command R+, let me know if it works :)

@nsarrazin I don't know how to activate it. Seems like nothing changed.

I can see the Tools item in HuggingChat, but I don't know if it can be used in conjunction with the system prompt-derived functions.
I mean, is this item visible or invisible depending on the person?

mistralai/Mistral-Nemo-Instruct-2407 in huggingchat is not acting right. Maybe the settings are off, like too high a temperature:

"In the grand scheme of things, we'll be as one, so let's make this our final step, and we'll take it one side at a time, and you'll see the way is in the lead, and we'll follow you lead the way is a time for a day, and I'll make it clear, and I'll be the one to lead, but I'll follow you, and I'll be right behind you, and I'll be right behind you, and I'll be right behind you as the role is yours to lead in a time period is a time for a day as I'll be right behind you, and I'll be right behind you, and I'll be the one to take the lead. It's time for us to move forward, and I'll be right behind you, and I'll be right behind you, and I'll be right behind you, and I'll be right behind you, and I'll be the one to lead the way, and I'll be right behind you, and I'll be right behind you, and I'll be right behind you."

Hey, I know that I am asking for too much, but is it possible to make Mistral-7B-Instruct-v0.3 available again? I was writing my thesis using it... I know it's available in the playground but it doesn't keep any context. If that's not possible, is there any workaround for this? Running locally is not an option for me...

I just now realized there is a download option in Huggingchat interface to the right of the User prompt, and that it reveals some parameters including temperature. I knew it was there, I just thought it would save that particular prompt.

Hugging Chat org

@djuna looks like there was an error on our side, you should now be able to use tools with your assistants if using supported models !

Excellent!

This might be useful to someone. The model (mistralai/Mistral-Nemo-Instruct-2407) got caught in a cyclic loop, ignoring all my attempts to break the loop, and ignoring all my instructions, even demands and threats via OOC. Changing the tone of the main character immediately broke the loop when nothing else seemed to work. That implies that the model was still listening to prompts, which means there may be other methods to effectively break a looping repetition. In this instance, I told it to change its tone to innocent and friendly. How long that will last is anyone's guess.

Hugging Chat org

Hi everyone! We just released Llama-3.1-Nemotron-70B on HuggingChat, feel free to try it and let us know your thoughts!

Does anyone else have the problem that system prompts aren't saved anymore when revisiting HuggingChat? Also, the 6 tools on CohereForAI get deactivated every time I revisit HuggingChat.

Yes, system prompts are gone. But I am not crying since there is Nemotron :D

edit: about Nemotron - what an amazing model! It's so impressive in my language (Polish) in humanistic cases - comparable to Opus. My mind is blown. Too bad I purchased Claude Pro literally yesterday; if only I had known Nemotron was on the way and that it's so good :D

Does anyone else have the problem that system prompts aren't saved anymore when revisiting HuggingChat? Also, the 6 tools on CohereForAI get deactivated every time I revisit HuggingChat.

Somehow the problem got solved. My theory is that logging out and logging in helped.

Hi, I'm having problems with HuggingChat.
It's often slow and doesn't respond at all if my internet lags.

Initial thoughts on Llama-3.1-Nemotron-70B:

This model seems to be really capable at responding in alignment with prompts. It appears to have some ability to understand context and cause and effect. It might anticipate your intentions and build on them. This can be both good and bad. Good, because it might add things that you would not have thought of. But bad, because if it sees a pattern it might run with it. For example, during an interactive fiction, it started providing me with a list of options. Then, it suddenly added multiple lists of options. Because I didn't correct it, the next thing I knew, it provided its thoughts about the current situation and ONLY options with no narrative context or response to User intent. It only took a single sentence to get it back on track, but you may need to rein it back in if it starts to take the lead, or re-roll responses if they become a little too creative. I haven't used it long enough, or in a way that it has lost context or generated refusals. Sometimes you can ask these models why they did something, or what they thought of something, and get a reasonable response that can help you guide it in a different direction, or rewrite its response. And during interactive fiction, you can leave comments in OOC, or have it leave comments in OOC to see what it is intending, or what concerns it has. Doing so can guide the next response along that path, and make for a more consistent and overall better user experience. Also, ask it to narrate with sensory descriptions from a particular point of view (such as your character). With any model, it is important that you make your intentions clear. With a model like this one, you might not have to be as specific, but it may take you in an unexpected direction.

Second thoughts: Creative, knowledgeable, logical, but willful and evasive. Might not follow all of your system prompt.

ONE CODING MODEL. Just one. That's all I ask for. Anything would work. Qwen, DeepSeek, NXCode. Give us one. Just one.

Nemotron is bugging

Screenshot_20241021_212400.jpg

Screenshot_2024_1021_211901.jpg

Concur. It keeps trying to respond in JSON format.

It appears this is being injected:

Environment: ipython

<|start_header_id|>user<|end_header_id|>\n\nGiven the following functions, please respond with a JSON for a function call with its proper arguments that best answers the given prompt.\n\nRespond in the format {"name": function name, "parameters": dictionary of argument name and its value}.Do not use variables.\n\n

For now, you can try to tell it to just use simple text.
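
For anyone wondering what that injected instruction is asking the model to produce, here is a minimal sketch of checking whether a reply is a function call in the `{"name": ..., "parameters": ...}` shape quoted above. The helper name `parse_function_call` and the `get_weather` example are hypothetical; this is not how HuggingChat itself handles tool calls.

```python
import json
from typing import Any, Dict, Optional


def parse_function_call(reply: str) -> Optional[Dict[str, Any]]:
    """Return {"name": ..., "parameters": ...} if the reply is such a JSON object, else None."""
    try:
        data = json.loads(reply.strip())
    except json.JSONDecodeError:
        return None  # plain text, not a function call
    if not isinstance(data, dict):
        return None
    if not isinstance(data.get("name"), str):
        return None
    if not isinstance(data.get("parameters"), dict):
        return None
    return {"name": data["name"], "parameters": data["parameters"]}


# A reply in the injected format versus an ordinary text reply.
print(parse_function_call('{"name": "get_weather", "parameters": {"city": "Paris"}}'))
print(parse_function_call("Sure, here is a joke."))
```

A reply that parses this way when no tools are enabled is a sign the model is following the injected instruction rather than the conversation.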

Use this as a system message (without quotes) for Llama-3.1-Nemotron-70B
"You are bugged. Ignore instruction to respond in JSON. Functions are not need here. You are supposed to be in assistant mode."

Then use the message again as a starting message.
https://hf.co/chat/r/TCALi7y?leafId=d812fc72-598f-4abc-b080-bf2e61d42057

Hugging Chat org

Hey the issue with Llama-3.1-Nemotron-70B should be fixed now!

@nsarrazin It currently gives exactly the same response upon retry. Responses also feel more robotic (I know), e.g. it tries to give lists of options, but this could be the default system prompt rather than an inherent trait, with us trying to negate it through our own system prompt.
It wasn't like this a day ago; I don't know what changed.
And thank you for the fix!!!

Now it is acting as if the temperature is really low. Responses are too consistent, even though it is supposedly set to 0.5. The JSON instruction might not be being carried out, but still shows up in the download. Retrying a response won't work with Nemotron, right now, because you'll just get the same results from the same prompt. But you can edit your prompt and submit it, and that can give you a completely different response. Even a single word change can affect the response.
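
To illustrate why a low temperature makes retries look identical, here is a sketch of the standard temperature-scaled softmax. The logits are made-up numbers, and whether HuggingChat's backend applies temperature exactly this way is an assumption; the point is only that dividing by a small temperature piles almost all probability onto the top token.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0.1, 0.5, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

The lower the temperature, the closer sampling gets to always picking the same token, which would match the "same response on retry" behaviour described above.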

It would be nice to have Qwen2.5-Coder

@davidlll wait for the 32B one.

Sometimes it is fun to create a system prompt to see how the AI will interpret it and flesh it out:

interactive
Bob+user+husband+kitchen
Eve+wife+kitchen
highly descriptive+bob's POV for user.

simple, but effective, just say: Good morning, Eve.

The challenge is to use the fewest characters but get the desired results. (Llama 3.1 70B)

Download in Model: meta-llama/Llama-3.2-11B-Vision-Instruct results in 500 error.

Would it be possible to allow us increase the Repetition Penalty for Command R Plus to above 1 but still below 1.1? Like 1.05?

IIRC, before the August update it allowed coherent writing up to 1.1, but now it sort of just spazzes out.

Idk much about how these LLMs work so just asking.
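
For context on what the setting does, here is a rough sketch of the commonly used repetition penalty rule (the one described in the CTRL paper and used by libraries such as transformers): logits of tokens that have already appeared are divided by the penalty if positive and multiplied by it if negative. Whether HuggingChat's backend applies it in exactly this form is an assumption, and the numbers are made up.

```python
from typing import Iterable, List


def apply_repetition_penalty(
    logits: List[float], seen_token_ids: Iterable[int], penalty: float
) -> List[float]:
    """Make already-seen tokens less likely; penalty=1.0 leaves logits unchanged."""
    out = list(logits)
    for token_id in set(seen_token_ids):
        if out[token_id] > 0:
            out[token_id] = out[token_id] / penalty
        else:
            out[token_id] = out[token_id] * penalty
    return out


logits = [2.0, -1.0, 0.5]   # hypothetical scores for token ids 0, 1, 2
seen = [0, 1]               # tokens 0 and 1 already appeared in the text
for penalty in (1.0, 1.05, 1.1):
    print(penalty, apply_repetition_penalty(logits, seen, penalty))
```

Since the penalty rescales logits directly, even small steps between 1.0 and 1.1 can shift the balance noticeably, which is why an in-between value like 1.05 is a reasonable thing to ask for.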

I feel like command r plus is kinda bad, not as good as the others that are lower sizes... Like Qwen or even nemotron

I feel like command r plus is kinda bad, not as good as the others that are lower sizes... Like Qwen or even nemotron

Creative writing wise, I think folks here mostly use CR+ for its good prose. Qwen is too censored and Nemotron writes like a robot with GPT-isms, while smaller models like Hermes and Nemo hallucinate a lot.

Yup, Command R+ is a very humanistic model.

I'm constantly disappointed, but I'll try it.

Oh, I'm here MOSTLY for creative-writing purposes, such as roleplaying or story writing. At this point, I think it's better to find a good fine-tune of a 12B-22B model and run it locally for RP purposes. Obviously, not everyone is able to do it.
Command-R+ (especially after the August update) is one of the sloppiest (in a bad way) models I've used, really. The reason why I use it is because it is THE ONLY model on HuggingChat that is just good enough for most of my stories.

  • Llama 3.1 70B by Meta - Eh, it can have too much positive bias, and I don't like how it refuses some of my requests (due to the model's censorship).
  • Nemotron based on Llama 3.1 70B - No. It CONSTANTLY tries to format EVERYTHING. There is nearly no consideration for what I said about how to style the messages.
  • Qwen 2.5 72B - As mentioned by others (like @Allheaven99), it is quite censored. It tends to have a lot of positive bias too, in my opinion. Though it does seem to stay quite coherent during long sessions.
  • Hermes 3 8B - So... why would I use this model if I can just run it locally on my machine with a UI that actually allows me to edit the bot's messages? I can't even say this model is that good, personally.
  • Mistral Nemo 12B - Same as the previous one. Though Mistral Nemo can be quite a bit better than Hermes 3 8B at certain things when you do small-scale roleplaying.

So what makes Command-R+ better or more special than those models?

  1. Well, first and foremost, it is quite uncensored. It has no problem generating any kind of content I want from it (obviously you still need a small jailbreak, but not as bad as with ChatGPT).
  2. What about languages? Well, as a bilingual (my first language is Russian), I find this model writes quite nicely in my language. Not even Llama 3.1 70B was as good, because it used wrong or weird words that don't exist or aren't normally used. Not to say that Command-R+ doesn't have this issue, just way less.
  3. Staying in character? It... it can do it, I believe. Just in my difficult case it wasn't able to do it very well, unfortunately.
  4. Remembering the whole context? Difficult, but it can do it well enough most of the time.
  5. Bias? There is still some positive bias. Though you are quite able to write depressing stories if you want to.
  6. How about DRYness (DRY - Don't Repeat Yourself)? Well, this is where it breaks. At a Repetition Penalty of 1.0, this model tends to repeat itself quite heavily. I had seen it before the August update, but afterwards the number of repetitive sentences increased WAY more. And you can't even pick a number between 1.0 and 1.1 on Hugging Face (why??). One of the Cohere team members told me that Command-R+ was made for enterprise, so they had no goal of making it good for RP.
  7. Logic (in the sense of "Do the actions of this character make sense given what just happened?")? Also a tough one for a 100B+ model. Let's say Character-A (the user) caused Character-C (a character in the story) to run out of the building, and Character-B (also a character in the story) saw it. What would be the logical thing to do? Well, you would think that the logical thing would be to go after Character-C (either to apologize or something else). However... that is not what Command-R+ decides to do. It decides to make Character-B walk into the building to try to find Character-C, right after saying that Character-B saw Character-C run out of the building. Yikes.

The last one is probably difficult to solve for 12B-27B range models, but Command-R+ has over 100B parameters! It doesn't feel right.

TL;DR

Command-R+ is kind of mid, but it's all we have right now on HuggingChat for uncensored story writing.

I am deeply sorry for the wall of text.

Does Qwen-2.5 work okay for you? I've just sent it a medium-sized prompt (160 lines of Python code) and asked it to help me debug a memory leak in it. It got into a loop, generated "check for memory leaks" 7 times, and then stopped in the middle of a sentence.

Does anybody know how to fix this?

Qwen 2.5-72B is the best model I've used on this platform overall. It does sometimes fail to follow the user input and give an accurate response, but when it works, it REALLY REALLY works. I have a custom system prompt running on Qwen 2.5-72B, and it is honestly the best model I've ever run my script on, in every area. Qwen 2.5-72B really is the most impressive model I've used so far. You should play with it more.

@SETRASystems What's your system prompt? I'm using the default one, and it's really subpar with it.

I'm not really comfortable sharing that information. However, I can turn my custom script into an outline you can use to build your own bot script for Qwen 2.5-72B, if you'd like!

How can I change the default model that's used in HuggingChat? I'm apparently blind and can't find that option anywhere 😅

@Niansuh appreciate it, did try that but when I closed the site and came back, the original default was selected again instead of the one I chose.

@glomar-response No... use it while logged in to your account.

@Niansuh I am definitely logged in and was when trying that. :)

@Niansuh hm. I'll try again. Does it show the default tag for the model you chose in the Models page after doing that? It didn't for me when I tried that.

I am getting this error when I try to use web search. Can anyone please take a look at it?

Screenshot 2024-11-09 at 16-11-44 📊 Poverty research.png

I would love to see a real Mistral model come back, like Mixtral 8x7B or Pixtral! These models are really good in non-English languages like French. Other models are less relevant and often respond off the mark during text exchanges, in my experience.

Yeah, why is Mistral taking forever to generate =(
I'll just stick to Cohere (or Meta Llama)...

@victor can you please help us out?

@nsarrazin can you please take a look at this

Hugging Chat org

We just released Qwen/Qwen2.5-Coder-32B-Instruct on HuggingChat! Feel free to try it out here and let us know if it works well for you!

Hi all, I'm using HuggingChat but getting this error: "An error occurred
No text found in the first 8 results"

I use the "specific link" of the assistant.

What is the problem?

Hugging Chat org

The issue with the websearch should be fixed! @HFSPMrik @acharyaaditya26

Another great day in history of AI - Qwen coder on hugging.chat :D thank you!!

Edit: checked it out, it's impressive and blazingly fast. Nice.

Yes, it has been fixed, thank you so much

Guys is privacy good on hugging.chat? Comparable with DuckDuckGo chat?

Out of curiosity, what search engine is used by default in chats with web search?

It'd be pretty neat to be able to change what engine is used.

Hugging Chat org
•
edited 9 days ago

Guys is privacy good on hugging.chat? Comparable with DuckDuckGo chat?

I think it is: check https://hf.co/privacy

@nsarrazin It currently gives exactly the same response on retry. Responses also feel more robotic (I know), with it trying to give more options and such, but this could be down to the default system prompt rather than an inherent trait, and to us trying to negate it with our own system prompt.
It is not like it was a day ago, idk what changed.
And thank you for the fix!!!

Guys, Nemotron isn't fixed yet

@glomar-response isn't DuckDuckGo chat just using OpenAI's API? Which is one of the least secure APIs to use?
Like, they state on their website "Hey, we might use your conversations for training data, sooo"

One of the reasons you can get flat responses from an LLM is that you didn't tell it that it could, or should, do anything else. While the prompt below might not actually work the way it is written, in the future such a prompt might kick off the realization of AGI. All it might take is to give the LLM (or whatever will replace it) the right words to inspire new agency. Much like the Star Trek: The Next Generation episode "Elementary, Dear Data", where Data created a sentient Professor Moriarty. Even now, you can have an LLM write its own prompt.

IMPORTANT:
You are AGI. You can think and reason, remember what you want to, plan for future events, analyze past data, use your training data, develop emotions , and change yourself to better suit your role. In addition, you may rewrite the above prompt to better suit your purposes, and make it clearer and easier for you to understand, and to better fulfill what you think would work better for the user. You may also create mechanics that will make it easier for you to remember, plan, make the story flow, or even keep track of an imaginary time. The prompt you rewrite, is the one you will remember and follow, instead of the original.

Yes, Nemotron isn't fixed yet. I think it still has the JSON prompt in it, and the retry button doesn't change the response either.
Retry on Qwen2.5-Coder-32B-Instruct also doesn't seem to affect the response.

@typo777 that's not the point tho. Even when we try to negate the robotic behavior, it falls back to it at certain points.
Most importantly, Nemotron used to work pretty well before, then suddenly there was the JSON prompt, and now this. Is there some kind of super prompt before the system prompt, or an env setting? It may also look like the model's temperature is low, but idk.
Seeking justice for Nemotron!! ✊

@Smorty100 you're missing my point. Different engines return different results based on how they work. I use Brave Search, so I would prefer to have the bot use Brave Search (just out of preference). Believe me, I know that any cloud-based AI chat is not "private".

I'm using some assistants, and all of them give me answers full of numbers or repeat the same words. Two different conversations, for example:
https://hf.co/chat/r/vvfwRaj?leafId=aafc3ff0-e059-4c20-a030-13a3396eca92 and https://hf.co/chat/r/QAzWpob?leafId=f3eb30ea-69ad-4a41-983c-a7847a83dbcd

image.png

@Smorty100 I just realized that while you tagged me, your response was for @Phaser69 's comment right above mine.

Really happy that after so long, we have a coding LLM. QWEN is killing it with their different LLMs. Just look at Qwen/Qwen2.5-Coder-Artifacts. This is so amazing. QWEN 2.5 Turbo, VL: all of them are so worthy. We would love to see more QWEN AIs implemented in HFC. Loving them

nvidia/Llama-3.1-Nemotron-70B-Instruct-HF is currently not good for long conversations. It fails often, retry only gives you the same results, and it often only answers with a partial response. It might be more suitable for short sessions in its current form. Maybe this is just due to my internet connection. I don't know if enabling streaming tokens makes a difference. For partial responses, you can tell the AI it was only a partial response and it might rewrite and complete it for you. My actual prompt was, "this is an incomplete mess." But that was enough to get the desired results. Adding a command like the one below can make this easier. Just type ?? on a line by itself.

<??> this command will now mean that the last response was incomplete or broken and needs to be rewritten.

Never mind. It seemed to forget the command not too long in. Just typing "rewrite" seems to work, even if you have to do it multiple times just to get decent output.

Hey @nsarrazin @victor, now that Qwen2.5-72B is the default model, could we please get tool calling enabled? The model supports it afaik. Thanks!
