---
license: cc-by-4.0
---

This model is a RoBERTa model trained on programming language code: the WolfSSL library, together with examples of the Singleton design pattern mixed with Linux kernel code. The model is pre-trained to understand the concept of a singleton in code. The training data is C/C++, but inference can also be applied to other languages.

Using the model to unmask can be done in the following way:

```python
from transformers import pipeline

unmasker = pipeline('fill-mask', model='mstaron/SingBERTa')

# RoBERTa models use <mask> as the mask token
unmasker("Hello I'm a <mask> model.")
```

Obtaining the embeddings for a downstream task can be done in the following way:

```python
# import the model via the huggingface library
from transformers import AutoTokenizer, AutoModelForMaskedLM

# load the tokenizer and the model for the pretrained SingBERTa
tokenizer = AutoTokenizer.from_pretrained('mstaron/SingBERTa')

# load the model
model = AutoModelForMaskedLM.from_pretrained("mstaron/SingBERTa")

# import the feature extraction pipeline
from transformers import pipeline

# create the pipeline, which will extract the embedding vectors
# the model is already pre-trained, so we do not need to train anything here
features = pipeline(
    "feature-extraction",
    model=model,
    tokenizer=tokenizer,
    return_tensors=False
)

# extract the features == embeddings
lstFeatures = features('Class SingletonX1')

# print the embedding of the first token (<s>, RoBERTa's counterpart of [CLS]),
# which is a good approximation of the whole sentence embedding;
# averaging all token embeddings with np.mean(lstFeatures[0], axis=0)
# is another common choice
lstFeatures[0][0]
```

In order to use the model for a downstream task, we need to fine-tune it on that task.
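As a minimal sketch of such fine-tuning, assuming a binary classification task (does a snippet contain a singleton?) and a hypothetical two-example toy dataset; the labels, hyperparameters, and the task itself are illustrative assumptions, not prescribed by this model card:

```python
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained('mstaron/SingBERTa')

# hypothetical toy dataset: 1 = contains a singleton, 0 = does not
texts = [
    "class SingletonX1 { static SingletonX1* instance; };",
    "int add(int a, int b) { return a + b; }",
]
labels = [1, 0]
encodings = tokenizer(texts, truncation=True, padding=True)
train_dataset = Dataset.from_dict({**encodings, "labels": labels})

# load the pre-trained encoder with a freshly initialized 2-label classification head
model = AutoModelForSequenceClassification.from_pretrained(
    'mstaron/SingBERTa', num_labels=2
)

training_args = TrainingArguments(
    output_dir='./singberta-finetuned',
    num_train_epochs=3,
    per_device_train_batch_size=8,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()
```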
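Once fine-tuned, the classifier can be applied directly to new code snippets, for example via the text-classification pipeline (the input string below is just an illustrative snippet):

```python
from transformers import pipeline

# reuse the in-memory fine-tuned model and tokenizer from the sketch above
classifier = pipeline('text-classification', model=model, tokenizer=tokenizer)
print(classifier("static Logger* instance = nullptr;"))
# output has the form [{'label': 'LABEL_1', 'score': ...}]
```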