Oleksii Maryshchenko

omaryshchenko

AI & ML interests

None yet

Recent Activity

liked a Space 13 days ago
Qwen/Qwen2.5-Coder-Artifacts
upvoted an article about 1 month ago

Organizations

None yet

omaryshchenko's activity

upvoted an article about 1 month ago

Transformers.js v3: WebGPU support, new models & tasks, and more…

Reacted to Xenova's post with 🚀 5 months ago
Florence-2, the new vision foundation model by Microsoft, can now run 100% locally in your browser on WebGPU, thanks to Transformers.js! 🤗🤯

It supports tasks like image captioning, optical character recognition, object detection, and many more! 😍
- Demo: Xenova/florence2-webgpu
- Models: https://huggingface.co/models?library=transformers.js&other=florence2
- Source code: https://github.com/xenova/transformers.js/tree/v3/examples/florence2-webgpu
Reacted to merve's post with 🤗 5 months ago
Fine-tune Florence-2 on any task 🔥

Today we release a notebook and a walkthrough blog on fine-tuning Florence-2 on the DocVQA dataset @andito @SkalskiP

Blog: https://huggingface.co/blog 📕
Notebook: https://colab.research.google.com/drive/1hKDrJ5AH_o7I95PtZ9__VlCTNAo1Gjpf?usp=sharing 📖
Florence-2 is a great vision-language model thanks to its massive dataset and small size!

This model requires conditioning through task prefixes, and it's not a generalist model: it needs fine-tuning for a new task such as DocVQA 📝

We fine-tuned the model on an A100 (one can also use a smaller GPU with a smaller batch size) and saw that the model picks up new tasks 🥹

See below how it looks before and after fine-tuning 🤩
Play with the demo here: andito/Florence-2-DocVQA 🏄‍♀️
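The task-prefix conditioning described above can be sketched as a tiny prompt builder. This is a hypothetical helper, not the notebook's code; the prefix strings follow commonly documented Florence-2 task tokens, so double-check them against the model's processor config before relying on them:

```python
# Hypothetical sketch of Florence-2-style task-prefix conditioning:
# the model is steered by prepending a task token to the text input.

TASK_PREFIXES = {
    "caption": "<CAPTION>",
    "detailed_caption": "<DETAILED_CAPTION>",
    "ocr": "<OCR>",
    "object_detection": "<OD>",
    "docvqa": "<DocVQA>",  # prefix used when fine-tuning on DocVQA
}

def build_prompt(task: str, text: str = "") -> str:
    """Prepend the task prefix; tasks like captioning take no extra text."""
    return TASK_PREFIXES[task] + text

# A DocVQA-style input pairs the prefix with a question about the document:
prompt = build_prompt("docvqa", "What is the invoice total?")
```

Fine-tuning on a new task then amounts to training on (prefix + question, answer) pairs while keeping the same architecture.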
Reacted to merve's post with 👀 5 months ago
Florence-2 is a new vision foundation model capable of a wide variety of tasks 🤯
Demo πŸ‘‰πŸ» gokaygokay/Florence-2
Collection πŸ‘‰πŸ» microsoft/florence-6669f44df0d87d9c3bfb76de

This model can handle tasks that vary from OCR to semantic segmentation.

What sets it apart from previous models is the dataset: the authors compiled 126M images with 5.4B annotations, pseudo-labelled by their own data engine using smaller specialized models and APIs.

The model has a similar architecture to previous models: an image encoder and a multimodality encoder with a text decoder. The authors have compiled the multitask dataset with prompts for each task.

You can also fine-tune this model on any task of your choice. The authors also reported results on downstream tasks, with and without freezing the vision encoder 🤓📉
They have released fine-tuned models too; you can find them in the collection above 🤗
New activity in Xenova/Phi-3-mini-4k-instruct 7 months ago

Awww yes!

#2 opened 7 months ago by BoscoTheDog
liked a Space 7 months ago
Reacted to merve's post with ❤️ 7 months ago
just landed on the Hugging Face Hub: a community-led computer vision course 📖🤝
learn everything from the fundamentals to the details of bleeding-edge vision transformers!
Reacted to vikhyatk's post with 🚀 8 months ago
Released a new version of vikhyatk/moondream2 today! It's primarily focused on improving OCR and captioning (e.g. "Describe this image", "Describe this image in one sentence"), but we're also seeing general improvements across all benchmarks.
Reacted to Xenova's post with ❤️ 9 months ago
Introducing the 🤗 Transformers.js WebGPU Embedding Benchmark! ⚡️
👉 Xenova/webgpu-embedding-benchmark 👈

On my device, I was able to achieve a 64.04x speedup over WASM! 🤯 How much does WebGPU speed up ML models running locally in your browser? Try it out and share your results! 🚀
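The reported speedup is simply the ratio of baseline (WASM) latency to accelerated (WebGPU) latency. A minimal sketch; the timings below are hypothetical, chosen only to reproduce the 64.04x figure from the post:

```python
# Speedup factor = baseline time / accelerated time.
def speedup(wasm_ms: float, webgpu_ms: float) -> float:
    """Return how many times faster the WebGPU run is than the WASM run."""
    return wasm_ms / webgpu_ms

# Hypothetical timings matching the post's reported speedup:
factor = speedup(wasm_ms=6404.0, webgpu_ms=100.0)
print(f"{factor:.2f}x")  # prints "64.04x"
```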
Reacted to loubnabnl's post with 🤗 9 months ago
⭐ Today we're releasing The Stack v2 & StarCoder2: a series of 3B, 7B & 15B code generation models trained on 3.3 to 4.5 trillion tokens of code:

- StarCoder2-15B matches or outperforms CodeLlama 34B, and approaches DeepSeek-33B on multiple benchmarks.
- StarCoder2-3B outperforms StarCoderBase-15B and similar sized models.
- The Stack v2 is a 4x larger dataset than The Stack v1, resulting in 900B unique code tokens 🚀
As always, we released everything from models and datasets to curation code. Enjoy!

🔗 StarCoder2 collection: bigcode/starcoder2-65de6da6e87db3383572be1a
🔗 Paper: https://drive.google.com/file/d/17iGn3c-sYNiLyRSY-A85QOzgzGnGiVI3/view
🔗 Blog post: https://huggingface.co/blog/starcoder2
🔗 Code Leaderboard: bigcode/bigcode-models-leaderboard
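One common way code models like these are prompted is fill-in-the-middle (FIM): the model completes a gap between a known prefix and suffix. A minimal sketch assuming the StarCoder-style special tokens `<fim_prefix>`, `<fim_suffix>`, and `<fim_middle>`; verify the exact token names against the StarCoder2 tokenizer config before relying on them:

```python
# Build a fill-in-the-middle (FIM) prompt in the StarCoder style:
# the model generates the missing middle after the <fim_middle> token.
def fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a FIM prompt from the code before and after the gap."""
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

# Ask the model to fill in the body of a function:
prompt = fim_prompt("def add(a, b):\n    return ", "\n")
```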