Nguyen Bach

nguyenbh

AI & ML interests

None yet

Recent Activity

liked a model about 2 hours ago

microsoft/OmniParser-v2.0

updated a model 3 days ago

nguyenbh/Phi-3.5-mini-instruct-Q4_K_M-GGUF

published a model 3 days ago

nguyenbh/Phi-3.5-mini-instruct-Q4_K_M-GGUF

nguyenbh's activity

upvoted a paper 8 months ago

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

Paper • 2311.06242 • Published Nov 10, 2023 • 90

upvoted 7 collections 9 months ago

GIT

GIT (Generative Image-to-text Transformer) is a model useful for vision-language tasks such as image/video captioning and question answering. • 18 items • Updated Jan 8 • 11

UDOP

UDOP is a general multimodal model for document AI • 4 items • Updated Jan 8 • 24

Orca

The Orca family of LMs developed by Microsoft. • 2 items • Updated Jan 8 • 8

Table Transformer

The Table Transformer (TATR) is a series of object detection models useful for table extraction from PDF images. • 5 items • Updated Jan 8 • 23

TAPEX

TAPEX is a state-of-the-art table pre-training model that can be used for table-based question answering and table-based fact verification. • 10 items • Updated Jan 8 • 9

SpeechT5

The SpeechT5 framework consists of a shared seq2seq and six modal-specific (speech/text) pre/post-nets that can address a few audio-related tasks. • 8 items • Updated Jan 8 • 24

LayoutLM

The LayoutLM series are Transformer encoders useful for document AI tasks such as invoice parsing, document image classification and DocVQA. • 6 items • Updated 9 days ago • 17

upvoted a collection 10 months ago

Phi-3

Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths. • 26 items • Updated Jan 8 • 554

upvoted a paper 10 months ago

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published Apr 22, 2024 • 256

upvoted 2 papers over 1 year ago

DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design

Paper • 2310.15144 • Published Oct 23, 2023 • 14

Kosmos-2.5: A Multimodal Literate Model

Paper • 2309.11419 • Published Sep 20, 2023 • 50