tomaarsen posted an update Mar 22
🏅 Quantized Embeddings are here! Unlike model quantization, embedding quantization is a post-processing step that converts e.g. float32 embeddings into binary or int8 embeddings. This saves 32x (binary) or 4x (int8) in memory & disk space, and these embeddings are much faster to compare!
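For instance, here is a minimal sketch of that post-processing step, assuming sentence-transformers v2.6.0+ (where the `quantize_embeddings` helper described in the blogpost was introduced):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.quantization import quantize_embeddings

model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

# Regular float32 embeddings
embeddings = model.encode(['The weather is lovely today', 'It is overcast outside'])

# Post-process into binary embeddings (bits packed into int8, 32x smaller) ...
binary_embeddings = quantize_embeddings(embeddings, precision='binary')

# ... or into int8 embeddings (4x smaller); for stable int8 ranges you would
# calibrate on a larger set of embeddings, as discussed in the blogpost
int8_embeddings = quantize_embeddings(embeddings, precision='int8')
```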

Our results show 25-45x speedups in retrieval compared to full-size embeddings, while keeping 96% of the performance!

Learn more about it in our blogpost in collaboration with mixedbread.ai: https://huggingface.co/blog/embedding-quantization
Or try out our demo where we use quantized embeddings to let you search all of Wikipedia (yes, 41,000,000 texts) in 1 second on a CPU Space: sentence-transformers/quantized-retrieval

Hi @tomaarsen, I am always following everything you are cooking up; what a cool update! One question: is there a way to go directly from a corpus to quantized embeddings, rather than quantizing as a post-processing step?

To give you my use case: I want to fuzzy-match a large corpus of names against another list of names, and I would love to use quantization to speed things up.
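Something like this one-step encoding is what I'm hoping for (just a sketch with toy data, assuming the `precision` argument that newer sentence-transformers releases appear to expose on `encode()`):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

names = ['Acme Corp', 'ACME Corporation', 'Globex LLC']  # toy example
# Hypothetical one-step version: quantized embeddings straight from encode(),
# with no separate post-processing call
binary_embeddings = model.encode(names, precision='binary')
```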

## Sharing an example snippet

```python
import numpy as np
import pandas as pd
from sentence_transformers import SentenceTransformer, util

def chunk_dataframe(df, n_chunks):
    # Simple chunking helper: split the DataFrame into roughly equal parts
    return np.array_split(df, n_chunks)

# Initialize the model (consider using a faster model for large datasets)
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

# master_file and df_entities are my two DataFrames of names
embeddings_name_clean = model.encode(master_file['name_clean'].tolist(), convert_to_tensor=True)

df_entities_chunks = chunk_dataframe(df_entities, 4)

# Process each chunk
all_best_matches = []  # List to store the best matches from all chunks
for chunk in df_entities_chunks:
    # Encode the 'name_2' column for the current chunk
    embeddings = model.encode(chunk['name_2'].tolist(), convert_to_tensor=True)

    # Calculate the cosine similarity matrix for the current chunk
    similarity_matrix = util.cos_sim(embeddings, embeddings_name_clean)

    # Find the best match for each name in the current chunk
    for idx, similarities in enumerate(similarity_matrix):
        highest_similarity_index = similarities.argmax().item()  # Convert to integer
        highest_similarity_score = similarities[highest_similarity_index].item()
        best_match = master_file['name_clean'].iloc[highest_similarity_index]
        original_value = chunk['name_2'].iloc[idx]
        all_best_matches.append((original_value, best_match, highest_similarity_score))

# Create a DataFrame with all the matches and similarity scores
matched_df = pd.DataFrame(all_best_matches, columns=['Original', 'Best Match', 'Similarity Score'])
matched_df = matched_df[matched_df['Similarity Score'] > 0.88].sort_values('Similarity Score')
```
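If direct quantization is possible, I imagine the chunk loop above could be adapted roughly like this. This is a hedged sketch reusing `model`, `master_file`, and `chunk` from the snippet; it assumes the `quantize_embeddings` helper and its `calibration_embeddings` parameter from sentence-transformers v2.6.0+, and it uses int8 (rather than binary) so that a plain dot product still approximates the cosine ranking:

```python
import numpy as np
from sentence_transformers.quantization import quantize_embeddings

# Normalize so that dot products approximate cosine similarity
master_emb = model.encode(master_file['name_clean'].tolist(), normalize_embeddings=True)
chunk_emb = model.encode(chunk['name_2'].tolist(), normalize_embeddings=True)

# Quantize both sides to int8, calibrating the ranges on the (larger) master set
master_int8 = quantize_embeddings(master_emb, precision='int8', calibration_embeddings=master_emb)
chunk_int8 = quantize_embeddings(chunk_emb, precision='int8', calibration_embeddings=master_emb)

# Score with int32 dot products to avoid overflow; the ranking should closely
# track the float32 cosine ranking, at a quarter of the memory
scores = chunk_int8.astype(np.int32) @ master_int8.astype(np.int32).T
best_indices = scores.argmax(axis=1)
```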