

Model: CORTEX-334M
Lang: EN

[ PROTOTYPE ]


Model description

Description: Instruction-based encoder-only LLM, designed for multi-task and zero-shot inference in discriminative scenarios

Base model: Electra-Large

Number of parameters: 334M

Warning โš ๏ธ: This is an early-stage prototype, which can exhibit unstable behaviors and results. The access is currently limited to selected users for testing purposes


Technology overview

The main idea of the CORTEX technology is to create an effective, easy-to-use and easy-to-train prompt-based LLM with an encoder-only architecture, capable of tackling several different discriminative tasks with good zero-shot capabilities.

The model leverages a unified task format based on a token-classification setup, and therefore a single loss function to optimize.

Compared to prompt-based generative LLMs, the CORTEX technology doesn't suffer from hallucinations and is more compact and efficient, since it doesn't need to generate text, which requires many extra parameters and several inference steps to produce an output.

Moreover, it can provide confidence scores for its predictions and deliver results in a structured format without any output parsing.

The CORTEX-334M prototype can be prompted to perform the following tasks (an inference sketch follows the list):

  • Text Classification
  • Natural Language Inference
  • Entity Recognition
  • Boolean Question Answering
  • Extractive Question Answering
  • Text Similarity
  • Ranking / Retrieval
  • Sentiment Analysis
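
As an illustration of the prompt-based, token-classification inference described above, the sketch below queries such a model through the Hugging Face transformers token-classification API and reads the [CLS] prediction together with its confidence score. The repository name, prompt wording and label mapping are assumptions made for this example, not the model's documented interface.

```python
# Minimal zero-shot inference sketch (assumed prompt format and label mapping).
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_id = "cortex-334m"  # hypothetical repository name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

# A discriminative task expressed as a (prompt, context) pair.
prompt = "Is the following review positive?"
context = "The battery lasts two days and the screen is gorgeous."

inputs = tokenizer(prompt, context, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits          # shape: (1, seq_len, 3)

probs = torch.softmax(logits, dim=-1)        # per-token confidence scores
cls_label = probs[0, 0].argmax().item()      # [CLS] carries the sentence-level decision
cls_confidence = probs[0, 0].max().item()

# Assumed label mapping: 0 = neutral, 1 = positive, 2 = negative
print(cls_label, round(cls_confidence, 3))
```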

Training strategy

The model has been obtained using Electra-Large as a starting point, and trained for 1 epoch with a constant learning rate of 1e-5 on ~300,000 (question, context, answer) triplets sampled from benchmark datasets for classic discriminative tasks, such as Natural Language Inference, Named Entity Recognition, Boolean Question Answering, Extractive Question Answering and Sentiment Analysis. In particular, the MNLI, WikiNER, BoolQ, SQuAD v1 and MTEB Tweets datasets have been used as sources of examples.
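
For reference, the reported schedule (Electra-Large starting point, a single 3-way token-classification head, 1 epoch, constant learning rate of 1e-5) could be expressed with transformers roughly as sketched below; the batch size is a placeholder, not a reported value.

```python
# Hedged sketch of the reported setup: Electra-Large start, 3-way token labels,
# 1 epoch with a constant learning rate of 1e-5.
from transformers import AutoModelForTokenClassification, TrainingArguments

model = AutoModelForTokenClassification.from_pretrained(
    "google/electra-large-discriminator", num_labels=3
)

args = TrainingArguments(
    output_dir="cortex-334m",
    num_train_epochs=1,
    learning_rate=1e-5,
    lr_scheduler_type="constant",     # no decay, as described
    per_device_train_batch_size=16,   # placeholder, not a reported value
)
# A Trainer would then be built with these arguments and the ~300,000
# unified (question, context, answer) examples as the training dataset.
```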

The different tasks have all been cast into a unified token-classification setup, so the model has been trained on the different objectives using a single classification head and loss function, avoiding the complications related to combining different task-specific classifiers and losses (such as loss weighting, gradient modulation or task scheduling).

Using this framework, each token can be classified into 3 possible categories (0: neutral, 1: positive, 2: negative), with some constraints depending on the nature of the token (e.g. whether it is a [CLS] token, a prompt token or a context token).
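
To make the scheme concrete, the small example below lays out plausible per-token labels for one classification-style example and one extraction-style example; the prompt wording, token boundaries and label placement are assumptions for illustration only.

```python
# Illustrative per-token labels under the assumed 3-class scheme
# (0: neutral, 1: positive, 2: negative).

# Classification-style example (e.g. Boolean QA): only [CLS] carries the answer.
tokens_cls = ["[CLS]", "Is", "Paris", "in", "France", "?", "[SEP]",
              "Paris", "is", "the", "capital", "of", "France", ".", "[SEP]"]
labels_cls = [1] + [0] * (len(tokens_cls) - 1)   # [CLS] = positive, rest neutral

# Extraction-style example (e.g. Entity Recognition): context tokens belonging
# to the requested spans are positive, all other tokens stay neutral.
tokens_ner = ["[CLS]", "Find", "the", "locations", "[SEP]",
              "Paris", "is", "the", "capital", "of", "France", ".", "[SEP]"]
labels_ner = [0, 0, 0, 0, 0,
              1, 0, 0, 0, 0, 1, 0, 0]
```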

The categorical cross-entropy loss has been used for all the training examples, with two different computation methods: in text classification tasks (Natural Language Inference, Boolean Question Answering or Sentiment Analysis) only the [CLS] token contributes to the loss, while in information extraction tasks (Named Entity Recognition, Extractive Question Answering) the loss is computed over all tokens.
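
A minimal PyTorch sketch of these two computation modes is given below, using the standard -100 ignore index so that only the [CLS] position contributes for classification-style tasks; this is an assumed implementation, not the released training code.

```python
import torch
import torch.nn.functional as F

def unified_loss(logits, labels, is_extraction_task):
    """Categorical cross-entropy over the 3 token classes.

    logits: (batch, seq_len, 3); labels: (batch, seq_len) in {0, 1, 2}.
    For classification-style tasks only position 0 ([CLS]) is kept;
    positions labelled -100 are ignored by cross_entropy.
    """
    if not is_extraction_task:
        masked = torch.full_like(labels, -100)
        masked[:, 0] = labels[:, 0]     # keep only the [CLS] label
        labels = masked
    return F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1),
                           ignore_index=-100)

# Example: batch of 2 sequences of length 6 with random logits and labels
logits = torch.randn(2, 6, 3)
labels = torch.randint(0, 3, (2, 6))
print(unified_loss(logits, labels, is_extraction_task=False))  # [CLS]-only loss
print(unified_loss(logits, labels, is_extraction_task=True))   # all-token loss
```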


Limitations

This LLM is an early-stage technology and may exhibit unstable or unreliable behaviors. The prototype is meant only for experimentation and research, and its results should be used with caution.
