llm course @ HSE and vk llm
A collection of SmolLM-135M models fine-tuned with DPO, PPO, and Reward Modeling to enhance human-like expressiveness
Daniil Tsesarev
tsessk
AI & ML interests
transformers
Recent Activity
Updated a dataset 6 days ago: tsessk/tldr-17-ChatML-tokenized-truncated
Updated a model 8 days ago: tsessk/Qwen2-0.5B-TLDR
Published a model 9 days ago: tsessk/Qwen2-0.5B-TLDR
Organizations
None yet
Collections: 1
Models: 11

tsessk/Qwen2-0.5B-TLDR • Updated
tsessk/qwen2-0.5b-tldr-lora • Updated
tsessk/llm-course-hw2-dpo • Text Generation • Updated • 2
tsessk/llm-course-hw2-reward-model • Text Classification • Updated • 2
tsessk/llm-course-hw2-ppo • Text Generation • Updated • 2
tsessk/content • Text Classification • Updated • 1
tsessk/llm-course-hw1 • Updated • 1
tsessk/SmolLM2-FT-ORPO • Text Generation • Updated
tsessk/SmolLM2-FT-DPO • Text Generation • Updated
tsessk/SmolLM2-FT-PyCodeZone • Text Generation • Updated • 2