All HF Hub posts

posted an update about 2 hours ago
🔍 Today's pick in Interpretability & Analysis of LMs: Backward Lens: Projecting Language Model Gradients into the Vocabulary Space by @shaharkatz @belinkov @mega @liorwolf

Recent interpretability work explores intermediate model representations by projecting them to vocabulary space. This paper instead projects gradients computed in the backward pass to vocabulary space, to explain how a single forward-backward pass edits LM knowledge.

Authors identify a mechanism they dub “imprint and shift” in the feed-forward module of a transformer layer. Specifically, the “imprint” refers to the first matrix, to or from which the learning process adds or subtracts copies of the intermediate inputs encountered during the forward pass. The “shift” refers to the second matrix, where the weights are shifted by the embedding of the target token.

Authors note that the dominant components in constructing gradients are derived from the outer product of the last token’s input and the Vector-Jacobian Product, and that the latter contains the embedding of the target token.
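
The outer-product structure mentioned above is a general property of linear layers, and it can be checked on a toy example. The sketch below is not the paper's code: the layer sizes, values, and squared-error loss are assumptions chosen only to keep the check short. It compares the rank-1 gradient outer(x, delta), where delta is the VJP, against a finite-difference gradient.

```python
# Toy check: for y = x @ W with loss L(y), dL/dW = outer(x, delta),
# where delta = dL/dy is the vector-Jacobian product (VJP) from above.
# All values and the squared-error loss are illustrative assumptions.

def forward(x, W):
    """y_j = sum_i x_i * W[i][j] for a single token's input x."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def loss(y, target):
    """Squared error, used only because its gradient is easy to write."""
    return 0.5 * sum((yj - tj) ** 2 for yj, tj in zip(y, target))

def outer(u, v):
    return [[ui * vj for vj in v] for ui in u]

def numeric_grad(x, W, target, eps=1e-6):
    """Central finite differences over every entry of W."""
    grad = [[0.0] * len(W[0]) for _ in W]
    for i in range(len(W)):
        for j in range(len(W[0])):
            W_plus = [row[:] for row in W]
            W_minus = [row[:] for row in W]
            W_plus[i][j] += eps
            W_minus[i][j] -= eps
            diff = loss(forward(x, W_plus), target) - loss(forward(x, W_minus), target)
            grad[i][j] = diff / (2 * eps)
    return grad

x = [1.0, -2.0, 0.5]                       # last token's input (made-up values)
W = [[0.1, 0.2], [0.0, -0.3], [0.4, 0.1]]  # 3 inputs -> 2 outputs
target = [1.0, 0.0]

y = forward(x, W)
delta = [yj - tj for yj, tj in zip(y, target)]  # VJP: dL/dy
analytic_grad = outer(x, delta)                 # rank-1 gradient dL/dW

max_diff = max(
    abs(a - n)
    for row_a, row_n in zip(analytic_grad, numeric_grad(x, W, target))
    for a, n in zip(row_a, row_n)
)
```

Every row of `analytic_grad` is a scaled copy of `delta`, and every column is a scaled copy of `x`, which is why a single token's gradient is rank 1.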

In light of this, a new editing approach named “forward pass shifting” is proposed to update the shifting component of a layer’s feedforward module without backpropagation, using only layer inputs and target token embeddings. The method performs on par with significantly more expensive editing approaches like ROME for single-fact editing, but is less robust to paraphrasing.
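
A rough sketch of the idea of shifting without backpropagation, under loud assumptions: the update is written as a plain rank-1 addition of outer(layer input, target embedding), the token embeddings are reused to read out logits, and the step size, sizes, and values are invented for illustration. The paper's exact update rule, scaling, and choice of layer are not reproduced here.

```python
# Hypothetical rank-1 "shift" of a second feed-forward matrix W2.
# Assumption: W2 <- W2 + lr * outer(ff_input, target_embedding).

def vecmat(a, W):
    return [sum(a[i] * W[i][j] for i in range(len(a))) for j in range(len(W[0]))]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def shift_second_matrix(W2, ff_input, target_emb, lr):
    """Return a copy of W2 shifted toward the target token's embedding."""
    return [
        [W2[i][j] + lr * ff_input[i] * target_emb[j] for j in range(len(target_emb))]
        for i in range(len(ff_input))
    ]

ff_input = [0.5, -1.0, 2.0]                  # input to W2 from the forward pass
W2 = [[0.2, -0.1], [0.0, 0.3], [0.1, 0.1]]   # 3 hidden units -> 2 model dims
embeddings = {"Paris": [1.0, 0.0], "London": [0.0, 1.0]}  # toy, orthogonal

def logits(W):
    """Score each token by the dot product of the layer output and its embedding."""
    hidden = vecmat(ff_input, W)
    return {tok: dot(hidden, emb) for tok, emb in embeddings.items()}

before = logits(W2)
after = logits(shift_second_matrix(W2, ff_input, embeddings["Paris"], lr=0.1))
```

In this toy setup the target's logit rises by lr * ||ff_input||^2 * ||target_emb||^2, while the other token is unaffected only because the two toy embeddings are orthogonal.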

Authors note that these results provide promising evidence that shortcuts to fine-tuning may be found by directly injecting knowledge into model layers.

📄 Paper: 2402.12865

🔍 All daily picks in LM interpretability: gsarti/daily-picks-in-interpretability-and-analysis-of-lms-65ae3339949c5675d25de2f9
posted an update about 2 hours ago
🚀🔥🌟 New Research Alert! 🌟🔥🚀
📄 Title: FasterViT: Fast Vision Transformers with Hierarchical Attention

👥 Authors: @ahatamiz , @slivorezzz et al.

📅 Conference: ICLR, May 7-11, 2024 | Vienna, Austria 🇦🇹

🔗 Paper: 2306.06189

🔗 Model 🤖 : nvidia/FasterViT
🔗 Repo:

📚 More Papers: more cutting-edge research presented at other conferences is available in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

🔍 Keywords: #VisionTransformers #DeepLearning #ComputerVision #ICLR2024 #MachineLearning #HierarchicalAttention #NeuralNetworks #Research #ArtificialIntelligence #Innovation
posted an update about 14 hours ago
REALIGN is a new method designed to improve the alignment of Large Language Models (LLMs) with human values by reformatting instruction data. This approach enhances LLM performance across various metrics by aligning responses with predefined criteria and evidence.

Key points:

* REALIGN has three steps: criteria definition, retrieval augmentation, and response reformatting.
* It rewrites (query, response) pairs to improve the quality of data used for fine-tuning LLMs.
* It has shown significant improvements in general alignment, math reasoning, and other tasks.

Congrats to the authors for their work!

Paper: 2402.12219
posted an update about 18 hours ago
After working in fashion e-commerce for years I've come to the conclusion that in e-commerce we do not sell clothes... we sell images of clothes. Compressed, digital versions of physical products. As Roland Barthes pointed out in The Fashion System, a product image is a symbol or metaphor of a product. Media--in this case images--mediates the space between customer and product; viewer and object. Images can be altered, changed, corrupted, photoshopped, edited, deleted, or imagined. E-commerce products (or e-commerce photos) can be thought of as a possibility space of digital pixels. AI/ML can analyze, manipulate, and create within this "possibility space of pixels"--thus there are opportunities to intervene in the physical fashion world through the imagination of artificial intelligence. Not to replace human creativity--but to augment it. To make it ART-ificial. Art is an artificial representation of reality. AI images are an artificial representation of reality. The sewing machine greatly increased the efficiency of clothing production. Similarly, AI has greatly increased the efficiency of image production, in our case product photo production. The fashion design paradigm of the past century (design->produce->photograph) has been flipped on its head. Instead of going from physical clothing to digital image via photography--we can go from digital image to physical clothing via Stable Diffusion. We are writing the chapter of Understanding Media that Marshall McLuhan never imagined. Virtual production hasn't replaced physical production; it has simply made it out of style.
posted an update about 20 hours ago
Fantastic Beasts (*Hallucinations*) and Where to Find Them 🔎🧌

This paper breaks down LLM hallucinations into six different types:

1️⃣ Entity: Involves errors in nouns. Changing that single entity can make the sentence correct.

2️⃣ Relation: Involves errors in verbs, prepositions, or adjectives. They can be fixed by correcting the relation.

3️⃣ Contradictory: Sentences that contradict factually correct information.

4️⃣ Invented: When the LLM generates sentences with concepts that don't exist in the real world.

5️⃣ Subjective: When the LLM generates sentences influenced by personal beliefs, feelings, biases, etc.

6️⃣ Unverifiable: When the LLM comes up with sentences containing information that can't be verified. E.g., Personal or private matters.

The first two types of hallucinations are relatively easy to correct, given that we can rewrite them by changing the entity or relation. However, the other four would mostly need to be removed to make the sentence factually correct.
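
As a rough illustration of how this taxonomy could be represented in an annotation or filtering script, here is a minimal sketch. The enum, the set name, and the `suggested_fix` helper are all hypothetical and do not come from the paper; the replace/remove split simply mirrors the paragraph above, which itself says removal applies "mostly".

```python
from enum import Enum

class HallucinationType(Enum):
    ENTITY = "entity"                 # wrong noun; swapping it fixes the sentence
    RELATION = "relation"             # wrong verb, preposition, or adjective
    CONTRADICTORY = "contradictory"   # contradicts factually correct information
    INVENTED = "invented"             # concept that does not exist
    SUBJECTIVE = "subjective"         # driven by beliefs, feelings, biases
    UNVERIFIABLE = "unverifiable"     # cannot be checked, e.g. private matters

# Types the post describes as fixable by editing a single span.
EDITABLE = {HallucinationType.ENTITY, HallucinationType.RELATION}

def suggested_fix(kind: HallucinationType) -> str:
    """Return 'replace' for span-level edits, otherwise 'remove'."""
    return "replace" if kind in EDITABLE else "remove"

fixes = {kind.value: suggested_fix(kind) for kind in HallucinationType}
```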

Paper: 2401.06855
posted an update about 20 hours ago
ICYMI! Nomic Embed v1.5: Resizable Production Embeddings with Matryoshka Representation Learning

- Variable embedding dimension, from 64 to 768
- Outperforms text-embedding-ada-002 while achieving a 3x memory reduction
- Day 1 integrations with Langchain, LlamaIndex, MongoDB, and Sentence Transformers
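
To make the "resizable" part concrete: Matryoshka-style embeddings are typically shortened by keeping the leading dimensions and re-normalizing. The sketch below shows only that generic step on a stand-in vector. The function name is hypothetical, no model is loaded, and the model card may prescribe extra steps (such as a normalization before truncation) that are not shown here.

```python
import math

def resize_embedding(vec, dim):
    """Keep the first `dim` values and rescale the result to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(v * v for v in head))
    return [v / norm for v in head] if norm > 0 else head

# Stand-in for a 768-dimensional embedding; real values would come from the model.
full = [math.sin(i + 1) for i in range(768)]
small = resize_embedding(full, 256)

small_norm = math.sqrt(sum(v * v for v in small))
storage_ratio = len(full) / len(small)   # floats stored per vector, before vs after
```

Truncating from 768 to 256 dimensions stores a third as many floats per vector; how much retrieval quality is retained at each size is an empirical question for the technical report.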

Check out nomic-ai/nomic-embed-text-v1.5 for the model weights.

Technical report:
Blog Post:
Original Tweet Thread:
posted an update about 20 hours ago
Working through the Reddit dataset, one thing that occurs to me is we pretty much always train LLMs to be a conversation between 2 parties like Bot/Human or Instruction/Response.

In internet data, though, multi-speaker or group discussions with a varying number of speakers seem far more common. That setup is also closer to real-world conversation and takes a bit more understanding to model.

Is there some research into this? I have some ideas of how I'd like to implement it, but I wonder if some work has already been done here?
posted an update about 20 hours ago
Speculative Streaming: Fast LLM Inference without Auxiliary Models

Speculative decoding is a prominent technique to speed up the inference of a large target language model based on predictions of an auxiliary draft model. While effective, in application-specific settings, it often involves fine-tuning both draft and target models to achieve high acceptance rates. As the number of downstream tasks grows, these draft models add significant complexity to inference systems. We propose Speculative Streaming, a single-model speculative decoding method that fuses drafting into the target model by changing the fine-tuning objective from next token prediction to future n-gram prediction. Speculative Streaming speeds up decoding by 1.8 - 3.1X in a diverse set of tasks, such as Summarization, Structured Queries, and Meaning Representation, without sacrificing generation quality. Additionally, Speculative Streaming is parameter-efficient. It achieves on-par/higher speed-ups than Medusa-style architectures while using ~10000X fewer extra parameters, making it well-suited for resource-constrained devices.
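
For readers new to speculative decoding, here is a minimal sketch of the verification step in its greedy form. It is not the paper's method: Speculative Streaming produces the draft from the target model itself via future n-gram prediction, whereas here the draft is just passed in as a list, and `target_next` is a hypothetical stand-in for a model's greedy next-token function.

```python
from typing import Callable, List

def verify_draft(
    prefix: List[int],
    draft: List[int],
    target_next: Callable[[List[int]], int],
) -> List[int]:
    """Accept draft tokens while they match the target's greedy choice.

    Returns the tokens to append: the accepted draft prefix plus one token
    from the target (a correction on mismatch, or a bonus token if the whole
    draft was accepted). A real system scores all draft positions in one
    batched forward pass; the loop here calls the target per position only
    to keep the logic readable.
    """
    context = list(prefix)
    accepted: List[int] = []
    for tok in draft:
        expected = target_next(context)
        if tok != expected:
            accepted.append(expected)   # replace the first wrong draft token
            return accepted
        accepted.append(tok)
        context.append(tok)
    accepted.append(target_next(context))  # all drafts matched: one extra token
    return accepted

def toy_target(context: List[int]) -> int:
    """Stand-in 'model': the next token is the last token plus one, modulo 10."""
    return (context[-1] + 1) % 10

partly_right = verify_draft([1, 2, 3], [4, 5, 9], toy_target)
all_right = verify_draft([1, 2, 3], [4, 5, 6], toy_target)
all_wrong = verify_draft([1, 2, 3], [9], toy_target)
```

With greedy verification the output matches what the target alone would generate token by token; the speed-up comes from accepting several tokens per target pass when the draft is good.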
posted an update about 22 hours ago
I'm working on a templated Quarto website and am looking for some good placeholder content. Do you have any ideas? The ideal content would:
- Be widely legible
- Be short
- Include long-form text
- Benefit from some interactive applications
posted an update about 23 hours ago
We fine-tuned 25 mistral-7b models that outperform GPT-4 on task-specific use cases!

Check them out at LoRA Land:

You can prompt them and compare their results to mistral-7b-instruct in real time!

They're also on HF for you to play with:

Let us know what you think!