Simple low-code baseline with sentence transformer and catboost

#6
by tomgxt - opened

Just a quick test of how far we can get in Graphext using pretrained sentence embeddings as input to a simple classifier. It gets to about 38% accuracy in roughly 10 minutes.

concatenate(ds.movie_name, ds.synopsis, {"separator": ". "}) -> (ds.text)

embed_text_with_model(ds.text, {
    "collection": "SBERT",
    "name": "all-mpnet-base-v2"
}) -> (ds.embedding)

train_classification(ds[["embedding", "genre"]], {
    "target": "genre",
    "model": "CatboostClassifier",
    "encode_features": false,
    "params": {
        "iterations": 750,
        "rsm": 0.1
    },
    "validate": {
        "n_splits": 1,
        "test_size": 0.2
    }
}) -> (ds.predicted, ds.probs, "genre-model")
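For anyone who wants to try the same idea outside Graphext, here is a rough Python sketch of the three steps above. It assumes the `sentence-transformers`, `catboost`, and `scikit-learn` packages are installed, and the helper names `build_text` and `train_baseline` are made up for illustration. How Graphext does the split and feature handling internally may differ, so don't expect it to reproduce the same accuracy exactly.

```python
def build_text(movie_name: str, synopsis: str, separator: str = ". ") -> str:
    """Join title and synopsis, mirroring the concatenate step."""
    return f"{movie_name}{separator}{synopsis}"


def train_baseline(movie_names, synopses, genres, seed=0):
    """Embed the texts, fit CatBoost, and return accuracy on a 20% holdout."""
    # Third-party imports are kept local so build_text works without them.
    from catboost import CatBoostClassifier
    from sentence_transformers import SentenceTransformer
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    texts = [build_text(m, s) for m, s in zip(movie_names, synopses)]
    # Same pretrained model as in the recipe; encode returns one vector per text.
    embeddings = SentenceTransformer("all-mpnet-base-v2").encode(texts)

    # Single 80/20 split (assumed equivalent to n_splits=1, test_size=0.2).
    X_train, X_test, y_train, y_test = train_test_split(
        embeddings, genres, test_size=0.2, random_state=seed
    )
    model = CatBoostClassifier(iterations=750, rsm=0.1, verbose=False)
    model.fit(X_train, y_train)
    # Multiclass predictions come back as a column vector, so flatten first.
    predicted = model.predict(X_test).ravel()
    return accuracy_score(y_test, predicted)
```

Call `train_baseline` with three equal-length lists (titles, synopses, genre labels). The first run downloads the embedding model, which takes a while.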
