question on computation cost

#3
by MicFizzy - opened

Hey BK-Lee,

I'm super impressed with your open-source model on Hugging Face and am considering it for a Chinese AI Q&A app on WeChat. However, I'm torn between developing with your open-source model and using APIs from big tech companies, which offer good performance at very low costs.

Right now, I have a 4090 GPU that runs a 20B model fine for a few users, but I'm worried about how the costs could skyrocket if the user base grows significantly. Could you share any insights on the necessary hardware if many users are online simultaneously? I’m trying to figure out the potential computational costs to better decide between the open-source approach and a commercial API.

Thanks a ton for your help and for making such an awesome tool available to everyone!

Cheers,

Liz

Honestly, I can't give a definitive answer here, because I don't have any experience handling queries from a large number of users.

To serve billions of users, OpenAI is known to have built out a large server infrastructure. There has also been a rumor that OpenAI used 128k GPUs for training and 128 GPUs for inference. Based on those numbers, my rough guess is that at least around 100 GPUs (A100-class, large-VRAM GPUs) would be needed once your user base grows significantly.
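If it helps your planning, here is a minimal back-of-envelope sketch of how such an estimate can be made. All the numbers (tokens per request, requests per user, per-GPU throughput) are illustrative assumptions, not measurements of this model or any specific GPU:

```python
import math

# Rough capacity estimate: how many GPUs are needed to keep up with
# the aggregate token-generation demand of concurrent users.
# Every default value below is an assumption for illustration only.
def gpus_needed(concurrent_users: int,
                tokens_per_request: int = 512,
                requests_per_user_per_min: float = 1.0,
                gpu_throughput_tok_s: float = 1000.0) -> int:
    """Estimate the GPU count from assumed per-user demand and per-GPU throughput."""
    demand_tok_s = (concurrent_users
                    * requests_per_user_per_min
                    * tokens_per_request) / 60.0
    return max(1, math.ceil(demand_tok_s / gpu_throughput_tok_s))

# Example: 10,000 concurrent users at the assumed rates
print(gpus_needed(10_000))  # → 86
```

In practice you would replace the assumed throughput with a measured tokens/sec figure from your own serving stack (batching and quantization change it a lot), but the arithmetic structure stays the same.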
