databio/attribute-standardizer-model6

Model Description

This repository hosts three pre-trained models designed to standardize attributes in genomic region metadata, one for each of three schemas: ENCODE, FAIRTRACKS, and BEDBASE. BEDMS (BED Metadata Standardizer) uses these models, together with their associated files and schema designs, to perform the standardization. To learn more about BEDMS, visit: https://github.com/databio/bedms

Directory structure

/attribute-standardizer-model6
    /bedbase
        - bedbase_schema_design.yaml # BEDBASE schema
        - label_encoder_bedbase.pkl # Unique label values derived from the training data; the set of labels the model can predict for the BEDBASE schema
        - model_bedbase.pth # Model trained for the BEDBASE schema
        - vectorizer_bedbase.pkl # CountVectorizer instance from the `scikit-learn` library that produces the Bag-of-Words encoding used as model input
        - config_bedbase.yaml # Config file with model parameters
    /encode
        - encode_schema_design.yaml # ENCODE schema
        - label_encoder_encode.pkl # Unique label values derived from the training data; the set of labels the model can predict for the ENCODE schema
        - model_encode.pth # Model trained for the ENCODE schema
        - vectorizer_encode.pkl # CountVectorizer instance from the `scikit-learn` library that produces the Bag-of-Words encoding used as model input
        - config_encode.yaml # Config file with model parameters
    /fairtracks
        - fairtracks_schema_design.yaml # FAIRTRACKS schema
        - label_encoder_fairtracks.pkl # Unique label values derived from the training data; the set of labels the model can predict for the FAIRTRACKS schema
        - model_fairtracks.pth # Model trained for the FAIRTRACKS schema
        - vectorizer_fairtracks.pkl # CountVectorizer instance from the `scikit-learn` library that produces the Bag-of-Words encoding used as model input
        - config_fairtracks.yaml # Config file with model parameters
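The file comments describe a three-stage flow: the vectorizer turns an attribute into a Bag-of-Words count vector, the model scores that vector, and the label encoder maps the winning index back to a schema label. Below is a minimal, stdlib-only sketch of that flow, not BEDMS's actual code. It imitates scikit-learn's `CountVectorizer` defaults (lowercasing, tokens of two or more word characters, alphabetically sorted vocabulary) and `LabelEncoder` (classes kept in sorted order). The trained model is replaced by a hypothetical list of scores, and the example texts and labels are invented for illustration; how BEDMS actually builds the text it feeds to the vectorizer is an assumption here.

```python
import re
from typing import Dict, List

# Mirrors scikit-learn CountVectorizer's default token_pattern (tokens of 2+ word characters).
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into tokens, like CountVectorizer's defaults."""
    return TOKEN_PATTERN.findall(text.lower())


def fit_vocabulary(corpus: List[str]) -> Dict[str, int]:
    """Map each distinct token in the corpus to a column index, in alphabetical order."""
    tokens = sorted({tok for doc in corpus for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def bag_of_words(text: str, vocab: Dict[str, int]) -> List[int]:
    """Count vocabulary tokens in the text; out-of-vocabulary tokens are ignored."""
    vector = [0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:
            vector[vocab[tok]] += 1
    return vector


def decode_label(scores: List[float], classes: List[str]) -> str:
    """Pick the highest-scoring index (argmax) and map it back to its class label."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return classes[best]


if __name__ == "__main__":
    # Hypothetical training texts; real BEDMS inputs may be built differently.
    corpus = ["cell line K562", "assay ChIP-seq", "genome hg38"]
    vocab = fit_vocabulary(corpus)
    print(vocab)
    print(bag_of_words("K562 cell line", vocab))

    # Hypothetical schema labels, sorted the way LabelEncoder.classes_ would be.
    classes = sorted(["assay", "cell_line", "genome"])
    scores = [0.1, 0.8, 0.1]  # stand-in for the trained model's output
    print(decode_label(scores, classes))
```

Running it prints the sorted vocabulary, the count vector `[0, 1, 0, 0, 0, 1, 1, 0]` for "K562 cell line", and the decoded label `cell_line`. In practice the shipped `vectorizer_*.pkl` is an already fitted scikit-learn object, so you would call its `transform` method instead of re-implementing the tokenization.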

Usage

To use these models, see the BEDMS repository on GitHub: https://github.com/databio/bedms

Contribution

To add a schema model:

1. Train the new model using BEDMS.
2. Create a new directory in this repository named after the new schema (for example, "new_schema").
3. Place the model files in that directory using the following structure:

/attribute-standardizer-model6
    /new_schema
        - new_schema_design.yaml
        - label_encoder_new_schema.pkl
        - model_new_schema.pth
        - vectorizer_new_schema.pkl
        - config_new_schema.yaml
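Before uploading, it can help to check that a new schema directory contains every file the structure above calls for. The sketch below is a hypothetical helper (`expected_files` and `missing_artifacts` are not part of BEDMS or this repository). It assumes the `label_encoder_<schema>.pkl`, `model_<schema>.pth`, `vectorizer_<schema>.pkl`, and `config_<schema>.yaml` naming used by the shipped schemas, and it accepts any `*_design.yaml` file for the schema design, because the example above and the existing directories name that file differently.

```python
from pathlib import Path
from typing import List


def expected_files(schema: str) -> List[str]:
    """Per-schema artifact names, following the naming of the existing schema directories."""
    return [
        f"label_encoder_{schema}.pkl",
        f"model_{schema}.pth",
        f"vectorizer_{schema}.pkl",
        f"config_{schema}.yaml",
    ]


def missing_artifacts(repo_root: Path, schema: str) -> List[str]:
    """Return the artifacts absent from <repo_root>/<schema>; an empty list means complete."""
    schema_dir = repo_root / schema
    if not schema_dir.is_dir():
        return [f"{schema}/ (directory)"]
    missing = [name for name in expected_files(schema) if not (schema_dir / name).is_file()]
    # The schema design file is matched loosely: the example uses "new_schema_design.yaml"
    # while existing schemas use "<schema>_schema_design.yaml", so any "*_design.yaml" counts.
    if not any(schema_dir.glob("*_design.yaml")):
        missing.append("*_design.yaml (schema design)")
    return missing
```

For example, `missing_artifacts(Path("."), "new_schema")` run from the repository root returns an empty list once all five files are in place, and otherwise lists what is still missing.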