Sylvestre Bcht

AI & ML interests

None yet

Sylvestre's activity

Reacted to victor's post with 🚀 1 day ago
Perfect example of why Qwen/Qwen2.5-Coder-32B-Instruct is insane?

Introducing: AI Video Composer 🔥
huggingface-projects/ai-video-composer

Drag and drop your assets (images/videos/audios) to create any video you want using natural language!

It works by asking the model to output a valid FFmpeg command. These commands can be quite complex, but most of the time Qwen2.5-Coder-32B gets it right (that thing is a beast). It's an update of an old project built with GPT-4; back then (~1.5 years ago) it was almost impossible to make it work with open models, but not anymore. Let's go open weights 🚀.
Reacted to clem's post with 🔥 about 1 month ago
This is no Woodstock AI but it will be fun nonetheless haha. I'll be hosting a live workshop with team members next week about the Enterprise Hugging Face Hub.

1,000 spots available, first-come first-served, with some surprises during the stream!

You can register and add to your calendar here: https://streamyard.com/watch/JS2jHsUP3NDM
upvoted an article 4 months ago

Deprecation of Git Authentication using password

Reacted to bghira's post with 🔥 4 months ago
Wanted to share a brief comparison of early training of the two-stage PixArt e-diffi pipeline.

On the left, we have the full stage 1 model generating all 50 steps on its own. This model is not trained at all on the final 400 timesteps of the schedule. On the right, we have the combined pipeline where stage 1 output is fed into stage 2.

Currently, the difference is rather minimal - but the small details are reliably improved.

In the watercolour example, the full two-stage generation (right side) has the texture of the watercolour paper, while the stage-1-only generation (left side) has a flatter, digital-art look.

For the blacksmith robot, the sparks emitted from the operation have a more natural blend to them. The robot's clothing appears to be undergoing some interesting transformation due to the undertrained state of the weights.

The medieval battle image has improved blades of grass, settling dust particles, and fabric in the flag.

The stage 2 model being trained does not seem to resolve any global coherence issues despite having 400 steps in its schedule, but it still noticeably changes the local coherence; e.g., the consistency of fabrics and metals can be improved through stage 2 fine-tuning.

The stage 1 model is the workhorse of the output, as expected with the 600 timesteps in its schedule. Additional fine-tuning of this model will improve the overall global coherence of the outputs. I wish I could say it will not impact fine details, but a lot of that does seem to be carried forward.

As noted, these models are undertrained due to a lack of compute. But they are a promising look toward what an e-diffi PixArt might be capable of.

Does anyone want to build this out fully with me?
Reacted to Xenova's post with 🔥 4 months ago
Introducing Whisper Diarization: Multilingual speech recognition with word-level timestamps and speaker segmentation, running 100% locally in your browser thanks to 🤗 Transformers.js!

Tested on this iconic Letterman interview w/ Grace Hopper from 1983!
- Demo: Xenova/whisper-speaker-diarization
- Source code: Xenova/whisper-speaker-diarization