Phi-3-me?

#6
by supercharge19 - opened

I wonder how good a Phi-3 3B model would be when trained on calme. :)

I've been thinking of starting some Phi-3 fine-tunes, but I'm not sure whether to go with the 4K or the 128K context version. Which would be more useful?
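For reference, a minimal sketch of what I'd swap between runs; the checkpoint ids assume the public microsoft/Phi-3-mini-4k-instruct and microsoft/Phi-3-mini-128k-instruct repos on the Hub, and the only practical difference at load time should be the context window:

```python
# Minimal sketch: the two Phi-3-mini variants load identically; only the
# checkpoint id (and with it the usable context window) changes between runs.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "microsoft/Phi-3-mini-4k-instruct"  # or "microsoft/Phi-3-mini-128k-instruct"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",      # keep the dtype shipped in the checkpoint
    device_map="auto",       # needs `accelerate`; spreads across available GPUs
    trust_remote_code=True,  # Phi-3 shipped with custom modeling code at release
)

print(model.config.max_position_embeddings)  # 4096 vs 131072
```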

I don't think the people who want Phi-3 have a lot of VRAM, so I don't see much value in very long context for most of them, though I always think larger context is better, so I'm not really sure what to recommend here. If your audience is university students, they would probably like the larger context for testing models, and most industries would want larger context as well, but for enthusiasts at home just trying out models, better accuracy is probably preferred.
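To put rough numbers on the VRAM point, here's a back-of-envelope KV-cache estimate; the architecture numbers (32 layers, 32 KV heads, head dim 96, fp16 cache) are my assumptions from the published Phi-3-mini config, so treat this as a sketch rather than exact figures:

```python
# Rough KV-cache size estimate for Phi-3-mini at different context lengths.
# Config numbers below are assumptions (32 layers, 32 KV heads, head dim 96,
# fp16 cache); adjust if the actual config differs.
LAYERS, KV_HEADS, HEAD_DIM, BYTES = 32, 32, 96, 2

def kv_cache_gib(tokens: int) -> float:
    """GiB needed to cache keys + values for `tokens` tokens."""
    per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES  # the 2 covers K and V
    return tokens * per_token / 1024**3

for ctx in (4_096, 32_768, 131_072):
    print(f"{ctx:>7} tokens -> ~{kv_cache_gib(ctx):.1f} GiB of KV cache")
# ~3 GiB at 4K but ~96 GiB at the full 128K, on top of the model weights,
# which is why a full-length 128K context is out of reach on home GPUs
# (quantizing the cache or capping the actual window helps, of course).
```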

I just remembered a good use case for large context: LLM role-plays (ai-village or something, AI Town?).

Awesome! Since the model is small, I can experiment with both 4K and 128K. I'm excited; I think there is a place for LLMs of this size.

Yes, a special place, in our hearts <3. Actually, I really hope that someone soon creates a language model that is not so big yet is at least as powerful as GPT-4, something I could use on a phone and that could do agentic work very well. Big companies are keeping this to themselves or making things complicated by restricting their models, and even the methodology behind them, so I am truly happy to see Meta come out like that, swinging hard and not missing. Those models are still quite large, though, and of course not as powerful as GPT-4. Still, I believe, like many others, that they will use the monster 400B in the future (and even make it accessible for the public to work on AGI) to train better small models more easily and freely, without having to pay for APIs to collect good data, since it will be good enough to generate that itself. I hope that monster will make smaller and cheaper models more accessible.

You're a cool guy, I'm going to follow you from now on.
