llm-course-hw2
Collection: llm course @ HSE and vk llm
A collection of SmolLM-135M models fine-tuned with DPO, PPO, and Reward Modeling to enhance human-like expressiveness.
3 items • Updated Mar 8