Lj V. Miranda PRO

ljvmiranda921

https://ljvmiranda921.github.io

AI & ML interests

NLP - multilinguality, data-centric AI

Recent Activity

updated a dataset 1 day ago

ljvmiranda921/llm_metric-preferencecollection

published a dataset 1 day ago

ljvmiranda921/llm_metric-preferencecollection

updated a dataset 1 day ago

ai2-adapt-dev/interactive_tool_use_gpt4omini_fmt-cleaned

ljvmiranda921's activity

upvoted a collection 28 days ago

SEA-VL: Multicultural VL Dataset for Southeast Asia

Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia • 3 items • Updated 29 days ago • 16

upvoted a paper 29 days ago

Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia

Paper • 2503.07920 • Published about 1 month ago • 97

upvoted 2 papers 3 months ago

Bridging the Data Provenance Gap Across Text, Speech and Video

Paper • 2412.17847 • Published Dec 19, 2024 • 9

2 OLMo 2 Furious

Paper • 2501.00656 • Published Dec 31, 2024 • 19

upvoted a paper 4 months ago

Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 364

upvoted 3 collections 4 months ago

Multilingual LLM Evaluation

Multilingual Evaluation Benchmarks • 8 items • Updated Mar 3 • 25

SEACrowd: A Multilingual Multimodal Data Hub and Benchmark S

SEACrowd is a community movement project aimed at centralizing and standardizing AI resources for Southeast Asian languages, cultures, and/or regions. • 3 items • Updated Jun 18, 2024 • 8

OLMo 2

Artifacts for the second set of OLMo models. • 27 items • Updated 20 days ago • 108

upvoted a paper 4 months ago

TÜLU 3: Pushing Frontiers in Open Language Model Post-Training

Paper • 2411.15124 • Published Nov 22, 2024 • 63

upvoted a collection 5 months ago

Tulu 3 Datasets

All datasets released with Tulu 3 -- state of the art open post-training recipes. • 33 items • Updated 27 days ago • 78

upvoted a paper 5 months ago

Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback

Paper • 2410.19133 • Published Oct 24, 2024 • 11

upvoted a collection 6 months ago

Multilingual RewardBench (M-RewardBench)

Multilingual Reward Model Evaluation Dataset and Results • 3 items • Updated Mar 5 • 4

upvoted a paper 6 months ago

M-RewardBench: Evaluating Reward Models in Multilingual Settings

Paper • 2410.15522 • Published Oct 20, 2024 • 12

upvoted 2 papers 8 months ago

SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages

Paper • 2407.19672 • Published Jul 29, 2024 • 58

Consent in Crisis: The Rapid Decline of the AI Data Commons

Paper • 2407.14933 • Published Jul 20, 2024 • 12

upvoted a collection 9 months ago

Reward Bench

Datasets, spaces, and models for the reward model benchmark! • 5 items • Updated 27 days ago • 9

upvoted a paper 9 months ago

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Paper • 2406.08464 • Published Jun 12, 2024 • 68

upvoted a paper 10 months ago

SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages

Paper • 2406.10118 • Published Jun 14, 2024 • 33

upvoted a collection over 1 year ago

State-of-the-Art NER models - Tagalog

2 items • Updated 15 days ago • 2

upvoted a paper over 1 year ago

Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark

Paper • 2311.09122 • Published Nov 15, 2023 • 8