inconsistencies in the output

by martineden - opened Oct 7, 2024

Oct 7, 2024 • edited Oct 7, 2024

Hello,
I have a relatively big dataset of paragraphs and used this model to extract the graph json schema, but some of the outputs are problematic.
Here are some issues I managed to identify:

The model inserts stray words such as "detailedal" or "detailedalb" in the middle of the JSON, without a key or value, which breaks JSON parsing.
I used your system prompt, and even though it explicitly addresses the repetition issue, some long JSON outputs contain a couple of repetitions, especially in the edges section.
Sometimes the model uses the wrong field names. For instance, instead of detailed_type, I saw detailedal_type a couple of times.
And sometimes it makes up words, like "detailedalloy" or {"id":"0.0%","type":"number","detailedalian population percentage"}, which garbles key:value pairs and values and makes the JSON unparsable overall.
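For anyone who wants to measure how often these failures occur across a large batch, here is a minimal sketch of a checker, using only the Python standard library. The node field names (id, type, detailed_type) come from this thread; the top-level "nodes" and "edges" keys and the function name check_graph are assumptions for illustration, so adjust them to the schema you actually get.

```python
import json
from typing import List

# Node field names as mentioned in this thread.
NODE_KEYS = {"id", "type", "detailed_type"}


def check_graph(raw: str) -> List[str]:
    """Return a list of problems found in one model output (empty list = OK)."""
    try:
        graph = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Stray tokens such as a bare "detailedal" end up here.
        return [f"unparsable JSON: {exc.msg} at char {exc.pos}"]
    if not isinstance(graph, dict):
        return ["top-level value is not an object"]

    # ASSUMPTION: nodes live under a top-level "nodes" list.
    nodes = graph.get("nodes")
    if not isinstance(nodes, list):
        return ["missing or non-list 'nodes'"]

    problems = []
    for i, node in enumerate(nodes):
        if not isinstance(node, dict):
            problems.append(f"node {i}: not an object")
            continue
        missing = NODE_KEYS - set(node)
        extra = set(node) - NODE_KEYS
        if missing:
            problems.append(f"node {i}: missing {sorted(missing)}")
        if extra:
            # Catches misspelled fields such as detailedal_type.
            problems.append(f"node {i}: unexpected {sorted(extra)}")

    # ASSUMPTION: edges live under a top-level "edges" list.
    # Exact repeats are flagged without assuming any edge field names.
    seen = set()
    edges = graph.get("edges", [])
    for i, edge in enumerate(edges if isinstance(edges, list) else []):
        fingerprint = json.dumps(edge, sort_keys=True)
        if fingerprint in seen:
            problems.append(f"edge {i}: duplicate of an earlier edge")
        seen.add(fingerprint)
    return problems
```

Running check_graph over every output and counting the non-empty results gives a rough failure rate, and the messages separate parse failures from wrong field names and repeated edges.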

I haven't tried the other two models, but have you seen such cases in your experiments, and if so, how did you resolve them?
I am using NVIDIA Tesla P100 with CUDA 11.4.

rcaulk

Emergent Methods org Oct 7, 2024

Hey,

Indeed, it can happen in a small percentage of cases, depending on the variations in your input text. It’s good you are sticking to the exact prompt we give you.

The simplest and most effective solution is to use Outlines to constrain your output schema. This allows you to use nearly any shape and structure of input text without running into problems with mini. Simply define a pydantic class with id, type, and detailed_type, and pass that model to Outlines when you run your generation.
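To make the shape of that constraint concrete, here is a rough, dependency-free sketch that writes it out as a plain JSON Schema instead of a Pydantic class. The three node fields are the ones named above; the "nodes"/"edges" wrapper and the open-ended edge object are assumptions for illustration only, and how a schema or class is handed to Outlines depends on the installed version, so that call is deliberately not shown.

```python
import json

# Node fields as named in this thread. All are typed as strings here,
# which is an assumption.
NODE_SCHEMA = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "type": {"type": "string"},
        "detailed_type": {"type": "string"},
    },
    "required": ["id", "type", "detailed_type"],
    # Disallow any other key, e.g. a misspelled detailedal_type.
    "additionalProperties": False,
}

# ASSUMPTION: the wrapper keys and the placeholder edge object are
# illustrative; replace the edge item with the real edge fields.
GRAPH_SCHEMA = {
    "type": "object",
    "properties": {
        "nodes": {"type": "array", "items": NODE_SCHEMA},
        "edges": {"type": "array", "items": {"type": "object"}},
    },
    "required": ["nodes", "edges"],
    "additionalProperties": False,
}

# Serialized form, for libraries that take the schema as a JSON string.
schema_str = json.dumps(GRAPH_SCHEMA, indent=2)
```

With a constraint like this enforced at decoding time, the generator cannot emit a key outside the listed fields, which is what rules out outputs such as detailedal_type.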

Another option is to use the medium-sized model instead, as it adapts better to varied input text.

Cheers,

Rob

martineden

Nov 7, 2024

I was trying to convert SQuAD contexts into graph-JSON representations for performance testing purposes.

Here are the results:

Seems pretty good to me.

rcaulk

Emergent Methods org Nov 15, 2024

Thanks for the report - keep in mind that using Outlines would yield 100% valid graph outputs!
