Difference between `pt` and `it`

#26

by Mshn - opened 3 days ago

Mshn

3 days ago

The difference between the `pt` and `it` models is not clearly presented in the model's documentation. Furthermore, the PT and IT suffixes are not defined at their first mention in the paper; they are clearly defined only later, in the caption of Table 15 in the Appendix! I would really appreciate a clear definition both in the main body of the paper and in the model description.
PS: I spent one DeepResearch run on it, and it produced a wrong explanation.

Edit, paper, quote:

Table 15. Performance of pre-trained (PT) and instruction fine-tuned (IT)

JohanDL

1 day ago

IT is the instruction-tuned version, right?

SerialKicked

1 day ago

edited 1 day ago

Yes. PT is the base (pre-trained) model, usable mostly for fine-tuning purposes. IT stands for instruction tuned, which is the one you should be using as a normal user.

Been like that forever. Why on earth would you spend money using "deep" research to find that answer, lol.

Mshn

1 day ago

@SerialKicked Thank you for your answer, but please consider using fewer phrases like:

Been like that forever. Why on earth would you spend money using "deep" research to find that answer, lol.

SerialKicked

1 day ago

You're most welcome. And no.

Mshn

1 day ago

@JohanDL The way folks from DeepMind define it in the paper is the following:

Table 15. Performance of pre-trained (PT) and instruction fine-tuned (IT)
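
To make the convention quoted above concrete, here is a minimal sketch that maps a model id's suffix to its meaning. It assumes the variant is encoded as a trailing `-pt` or `-it` in the model id; the helper `describe_variant` and the example ids are hypothetical and not taken from the model card or the paper.

```python
# Hypothetical helper: the PT/IT meanings come from the Table 15 caption and
# the replies in this thread; the model ids used below are made up.
SUFFIX_MEANINGS = {
    "pt": "pre-trained (base) model, mostly used as a starting point for fine-tuning",
    "it": "instruction fine-tuned model, the one to use as a normal user",
}


def describe_variant(model_id: str) -> str:
    """Return a short description based on the trailing -pt / -it suffix."""
    # Assumption: the variant is the last dash-separated token of the id.
    suffix = model_id.rsplit("-", 1)[-1].lower()
    return SUFFIX_MEANINGS.get(suffix, "unknown variant")


if __name__ == "__main__":
    for example_id in ("example-org/example-model-4b-pt",
                       "example-org/example-model-4b-it"):
        print(example_id, "->", describe_variant(example_id))
```

Ids that do not end in one of the two suffixes fall through to "unknown variant" rather than being guessed at.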

Mshn

1 day ago

@SerialKicked Let me explain: it is unpleasant to hear such words, especially when I'm trying to improve the documentation of FOSS models to make them more accessible to everybody, especially newcomers. I believe that good documentation is one of the main values of the HuggingFace team. And from my perspective, your replies have a destructive influence on people's will to collaboratively develop better FOSS models and the ecosystem around them.

SerialKicked

1 day ago

edited 1 day ago

@SerialKicked Let me explain: it is unpleasant to hear such words, especially when I'm trying to improve the documentation of FOSS models to make them more accessible to everybody, especially newcomers. I believe that good documentation is one of the main values of the HuggingFace team. And from my perspective, your replies have a destructive influence on people's will to collaboratively develop better FOSS models and the ecosystem around them.

You had two posts before that, both misusing this very website. You shouldn't write documentation when you don't know the basics. Plenty of extremely good docs exist already. I genuinely don't mean that as an insult, it's just the truth of the matter. Don't pay "deep research" to think for you, it'll only make you (or anyone else for that matter) worse at finding information. And the last thing "foss" needs is more people like that.

Now, actionable advice: on the right side column of the model's main page/description (and any other model for that matter), if you had bothered to read it, you could have deduced yourself what the relation between the IT and PT models is. There is a relationship tree explaining it. It's literally the first thing you should know about LLMs (but far from the last).

Have a nice day :)

Mshn

27 minutes ago

Hey @win10 and @DataSoul! Can you please explain why you put a like reaction on the answer above?
I'd like to offer some perspective on this discussion about documentation and community interaction.

Having reviewed this thread, I see a legitimate question about model nomenclature that wasn't clearly documented. The fact that PT/IT definitions are only found in an appendix table caption (and that another user also asked for clarification) demonstrates this was a reasonable documentation concern.
While the relationship tree might show the connection between models, it doesn't explicitly define what the abbreviations stand for. Good documentation should be explicit rather than requiring users to deduce meanings, especially for newcomers to the field.

I'm concerned about specific types of communication that damage community discourse. Examples in this thread include:

Dismissive mockery: "Why on earth would you spend money using 'deep' research to find that answer, lol."
Refusal of civility requests: When politely asked to use less dismissive phrasing, responding with "You're most welcome. And no."
Personal attacks: "And the last thing 'foss' needs is more people like that."
Gatekeeping: "You shouldn't write documentation when you don't know the basics."
Condescension: "Don't pay 'deep research' to think for you, it'll only make you worse at finding information."
Accusatory language: "If you had bothered to read it, you could have deduced yourself..."
Mischaracterization of past actions: "You had two posts before that, both misusing this very website."
False expertise claims: "Plenty of extremely good docs exist already" (despite another user also asking for the same clarification)
Passive-aggressive closing: "Have a nice day :)" after delivering multiple personal criticisms

These communication patterns discourage participation, create a hostile environment for questions and feedback, and undermine the trust needed for collaborative open-source communities to thrive.

My question stemmed not from ignorance but from a desire to improve documentation clarity for all users, particularly newcomers to the field.
Documentation improvements benefit everyone, and feedback should be welcomed. Even experienced users sometimes need clarification, and that's perfectly normal in a complex, rapidly evolving field.
Let's maintain a supportive atmosphere where everyone can contribute to making the ecosystem more accessible.

The accusation of me "misusing this website" refers to an incident where I reported what I genuinely thought was bot-generated spam content. When staff reviewed it and explained I had misunderstood how the model worked, I immediately acknowledged my error, apologized sincerely, and explained that I had likely provided unusual inputs that caused the behavior I found concerning. The thread was resolved amicably with mutual respect.
This kind of misunderstanding and learning process is normal when working with complex technologies. Characterizing this as "misusing the website" feels deeply unfair and misrepresents both my intentions and how the situation was resolved. Open communication about potential issues, followed by listening and learning when I'm mistaken, is exactly how healthy community participation should work—not something that should be used to discredit me in unrelated discussions.

Regarding my alleged lack of "knowing basics", I've dedicated the last 8 years of my life to DL/NLP research, building open-source libraries and models. Please have a look:

Just a small personal thing, I see that you are a part of the gamedev community, and I want to tell you a small part of my story:
I have a friend who is kind of a hikki. The main passion of his life is videogames, and it is almost unbearable for him to do anything unrelated to them. We tried to master Python together so he could do something like "a regular job", but sticking with it and growing in that direction was tough. Two years ago we finally decided to switch to making games. And you know what? Now he absolutely loves Godot! We are trying to make educational games for kids and it is very fun (though the games are kind of crappy). And what is even more fun, we are now exploring and trying to master 3D in Blender. Moreover, he has already gotten hands-on with most of the available open text/image-to-3D models. The server that I mostly used to train NLP models is now constantly being used for his generation processes! It is a pure joy!
