Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Sign Up
Pclanglais 
posted an update Feb 12
Post
Today I'm releasing marginalia, a python library to perform corpus analysis and retrieve structured annotations with open LLMs like Mistral Open-Hermes-2.5: https://github.com/Pleias/marginalia

marginalia leverages vllm inference speed to re-generate until all the output matches an expected json structure and to send batches of several unstructured elements for enhanced patterns detections. It works especially well for bibliographies. The demo transforms a very old list (Benjamin Franklin favorite's books from 1744) into well-structured data: https://colab.research.google.com/drive/1xKjK2mDDpXMaKG5YLpFhOM7jehxt0kEt?usp=sharing

While marginalia can be quite flexible, it definitely isn't a general purpose tool for json generation (like outlines). I don't intend so far to extend support to more complex json structure, but really looking forward potential feedbacks and suggestions.