Nawaf Alampara

n0w0f

AI & ML interests

AI for science

Recent Activity

updated a dataset 12 days ago
n0w0f/ChemBench-dev
liked a dataset 13 days ago
atomind/alexandria
liked a model 13 days ago
fairchem/OMAT24

Organizations

None yet

n0w0f's activity

reacted to singhsidhukuldeep's post with 👀 about 1 month ago
Exciting new research alert! 🚀 A groundbreaking paper titled "Understanding LLM Embeddings for Regression" has just been released, and it's a game-changer for anyone working with large language models (LLMs) and regression tasks.

Key findings:

1. LLM embeddings outperform traditional feature engineering in high-dimensional regression tasks.

2. LLM embeddings preserve Lipschitz continuity over feature space, enabling better regression performance.

3. Surprisingly, factors like model size and language understanding don't always improve regression outcomes.

Technical details:

The researchers used both the T5 and Gemini model families to benchmark embedding-based regression. They represented each input as a key-value JSON string and used average-pooling to aggregate the Transformer's output token embeddings into a single vector.
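
To make the two preprocessing steps concrete, here is a minimal sketch under stated assumptions: features arrive as a Python dict, keys are sorted for a deterministic string, and the "Transformer outputs" are a hard-coded toy list of per-token vectors with an attention mask rather than real T5 or Gemini activations. The helper names `to_kv_json` and `average_pool` are hypothetical, not the paper's code.

```python
import json
from typing import Dict, List, Sequence


def to_kv_json(features: Dict[str, float]) -> str:
    """Serialize a feature dict as a key-value JSON string (keys sorted, an assumption)."""
    return json.dumps(features, sort_keys=True)


def average_pool(hidden_states: Sequence[Sequence[float]],
                 attention_mask: Sequence[int]) -> List[float]:
    """Average the per-token vectors whose mask entry is 1, ignoring padding tokens."""
    dim = len(hidden_states[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(hidden_states, attention_mask):
        if keep:
            for d in range(dim):
                total[d] += vec[d]
            count += 1
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [t / count for t in total]


# Toy stand-in for Transformer outputs: 3 tokens x 2 dims, last token is padding.
x_str = to_kv_json({"x1": 0.5, "x0": 1.25})
states = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
embedding = average_pool(states, mask)
print(x_str)      # {"x0": 1.25, "x1": 0.5}
print(embedding)  # [2.0, 3.0]
```

Masking out padding before averaging keeps the pooled vector independent of how much padding a batch happens to add.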

The study introduced a novel metric called Normalized Lipschitz Factor Distribution (NLFD) to analyze embedding continuity. This metric revealed a strong inverse relationship between the skewness of the NLFD and regression performance: the more skewed the distribution, the worse the regression results.
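
The post does not give the NLFD formula, so the sketch below uses one plausible, hypothetical reading rather than the paper's exact definition: compute the pairwise Lipschitz ratio |y_i - y_j| / ||e_i - e_j|| over all pairs of embeddings, normalize by the mean ratio (an assumed normalization), and summarize the resulting distribution by its skewness. All function names are hypothetical. Under the post's claim, a smooth target should give a symmetric distribution and a target with a sharp jump a positively skewed one.

```python
import math
from itertools import combinations
from typing import List, Sequence


def pairwise_lipschitz_factors(embeddings: Sequence[Sequence[float]],
                               targets: Sequence[float]) -> List[float]:
    """Ratio |y_i - y_j| / ||e_i - e_j|| for every pair of distinct embeddings."""
    factors = []
    for i, j in combinations(range(len(targets)), 2):
        dist = math.dist(embeddings[i], embeddings[j])  # Euclidean distance
        if dist > 0:  # skip duplicate embeddings to avoid division by zero
            factors.append(abs(targets[i] - targets[j]) / dist)
    return factors


def normalize_by_mean(factors: Sequence[float]) -> List[float]:
    """Divide by the mean ratio so different embedders are comparable (assumed normalization)."""
    mean = sum(factors) / len(factors)
    if mean == 0:
        raise ValueError("all targets are equal; factors cannot be normalized")
    return [f / mean for f in factors]


def skewness(values: Sequence[float]) -> float:
    """Population skewness m3 / m2**1.5; returns 0.0 for a constant sample."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


# Toy 1-D "embeddings" with a smooth target and a target that jumps at the end.
emb = [[0.0], [1.0], [2.0], [3.0]]
smooth = skewness(normalize_by_mean(pairwise_lipschitz_factors(emb, [0.0, 2.0, 4.0, 6.0])))
rough = skewness(normalize_by_mean(pairwise_lipschitz_factors(emb, [0.0, 1.0, 2.0, 10.0])))
print(smooth, rough)
```

Here the linear target yields identical ratios (zero skew), while the jump produces a few large ratios and a long right tail (positive skew).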

Interestingly, the paper reveals that applying forward passes of pre-trained models doesn't always significantly improve regression performance for certain tasks. In some cases, using only vocabulary embeddings without a forward pass yielded comparable results.
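
For contrast with the pooled forward-pass embedding, a vocabulary-only embedding skips every Transformer layer and simply averages the input embedding rows of the tokens. The sketch below assumes a hypothetical toy embedding table and a naive whitespace tokenizer; a real model would supply its own tokenizer and input embedding matrix.

```python
from typing import Dict, List

# Hypothetical toy vocabulary embedding table (token -> vector). In a real model
# these rows would come from the input embedding matrix, with no layers applied.
VOCAB_EMBEDDINGS: Dict[str, List[float]] = {
    "<unk>": [0.0, 0.0],
    "x0": [1.0, 0.0],
    "x1": [0.0, 1.0],
    "0.5": [0.5, 0.5],
}


def tokenize(text: str) -> List[str]:
    """Naive whitespace tokenizer: a stand-in for the model's real tokenizer."""
    return text.split()


def vocab_only_embedding(text: str) -> List[float]:
    """Average the vocabulary embeddings of the tokens; no Transformer forward pass."""
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text produced no tokens")
    vectors = [VOCAB_EMBEDDINGS.get(t, VOCAB_EMBEDDINGS["<unk>"]) for t in tokens]
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


vec = vocab_only_embedding("x0 0.5 x1")
print(vec)  # [0.5, 0.5]
```

Because this baseline costs only table lookups, it is a cheap control for checking whether the forward pass actually adds anything on a given regression task.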

The research also demonstrated that LLM embeddings are dimensionally robust, maintaining strong performance even with high-dimensional data where traditional representations falter.

This work opens up exciting possibilities for using LLM embeddings in various regression tasks, particularly those with high degrees of freedom. It's a must-read for anyone working on machine learning, natural language processing, or data science!
reacted to clem's post with 👍 3 months ago
Very few people realize that most of the successful AI startups got successful because they were focused on open science and open source for at least their first few years. To name a few: OpenAI (GPT and GPT-2 were open-source), Runway & Stability (Stable Diffusion), Cohere, Mistral and, of course, Hugging Face!

The reasons are not just altruistic: sharing your science and your models pushes you to build AI faster (which is key in a fast-moving domain like AI), attracts the best scientists and engineers, and generates much more visibility, usage and community contributions than being 100% closed-source. The same applies to big tech companies, as we're seeing with Meta and Google!

More startups and companies should release research and open-source AI; it's not just good for the world, it also increases their probability of success!