Serdar ÇAĞLAR

serdarcaglar

https://www.linkedin.com/in/serdarildercaglar/

serdarildercaglar

AI & ML interests

None yet

Organizations

serdarcaglar's activity

upvoted a collection 2 days ago

Turkish Instruction Datasets

Collection

Collection of instruction datasets for Turkish. • 40 items • Updated 23 days ago • 9

New activity in unsloth/Llama-3.2-11B-Vision-Instruct-unsloth-bnb-4bit 5 days ago

What's the difference between a model with a name 'unsloth' and a model without it?

#3 opened 15 days ago by

1sn0treal

upvoted an article about 2 months ago

Article

SmolVLM2: Bringing Video Understanding to Every Device

Feb 20

• 237

New activity in aimped/nlp-health-translation-base-en-tr 4 months ago

Adding `safetensors` variant of this model

#1 opened 4 months ago by

SFconvertbot

New activity in serdarcaglar/roberta-base-turkish-scientific-cased 4 months ago

Adding `safetensors` variant of this model

#3 opened 4 months ago by

SFconvertbot

New activity in marduk-ra/F5-TTS-Turkish 4 months ago

Any plan to change licensing to MIT or Apache 2.0?

#10 opened 4 months ago by

KadirErturk

Technical questions

#7 opened 5 months ago by

alien79

updated a model 4 months ago

coqui/XTTS-v2

Text-to-Speech • Updated Dec 11, 2023 • 1.84M • 2.62k

New activity in serdarcaglar/roberta-base-turkish-scientific-cased 5 months ago

Onnx quantized version

#2 opened 5 months ago by

ysdede

updated a model 5 months ago

serdarcaglar/roberta-base-turkish-scientific-cased-ONNX

Updated Nov 28, 2024 • 1 • 2

updated a model 6 months ago

serdarcaglar/roberta-base-turkish-scientific-cased

Fill-Mask • Updated Dec 31, 2024 • 3 • 3

liked a model 6 months ago

serdarcaglar/roberta-base-turkish-scientific-cased

Fill-Mask • Updated Dec 31, 2024 • 3 • 3

updated a collection 7 months ago

GenerativeAI-Prompt

Collection

1 item • Updated Sep 21, 2024

updated a model 8 months ago

serdarcaglar/primary-school-math-question-multi-lang

Text Classification • Updated Aug 29, 2024

authored a paper 9 months ago

LLMs-in-the-loop Part-1: Expert Small AI Models for Bio-Medical Text Translation

Paper • 2407.12126 • Published Jul 16, 2024 • 52

upvoted a paper 9 months ago

LLMs-in-the-loop Part-1: Expert Small AI Models for Bio-Medical Text Translation

Paper • 2407.12126 • Published Jul 16, 2024 • 52

commented on a paper 9 months ago

YouTube-SL-25: A Large-Scale, Open-Domain Multilingual Sign Language Parallel Corpus

Paper • 2407.11144 • Published Jul 15, 2024 • 9

reacted to thomwolf's post with 🔥 10 months ago

Post

5301

A Little guide to building Large Language Models in 2024

This is a post-recording of a 75-minute lecture I gave two weeks ago on how to train an LLM from scratch in 2024. I tried to keep it short and comprehensive, focusing on concepts that are crucial for training a good LLM but are often hidden in tech reports.

In the lecture, I introduce the students to all the important concepts, tools, and techniques for training a well-performing LLM:
* finding, preparing and evaluating web scale data
* understanding model parallelism and efficient training
* fine-tuning/aligning models
* fast inference

There are of course many things and details missing that I should have added. Don't hesitate to tell me your most frustrating omission and I'll add it in a future part. In particular, I think I'll add more focus on how to filter topics well and extensively, and maybe more practical anecdotes and details.

Now that I've recorded it, I've been thinking this could be part 1 of a two-part series, with a second, fully hands-on video on how to run all these steps with some libraries and recipes we've released recently at HF around LLM training (which could easily be adapted to other frameworks anyway):
* datatrove for all things web-scale data preparation: https://github.com/huggingface/datatrove
* nanotron for lightweight 4D parallelism LLM training: https://github.com/huggingface/nanotron
* lighteval for in-training fast parallel LLM evaluations: https://github.com/huggingface/lighteval

Here is the link to watch the lecture on YouTube: https://www.youtube.com/watch?v=2-SPH9hIKT8
And here is the link to the Google slides: https://docs.google.com/presentation/d/1IkzESdOwdmwvPxIELYJi8--K3EZ98_cL6c5ZcLKSyVg/edit#slide=id.p

Enjoy! I'm happy to hear feedback on it and on what to add, correct, or extend in a second part.

2 replies