This model is trained by knowledge distillation on the ANLI dataset, with "princeton-nlp/unsup-simcse-roberta-large" as the teacher and "zen-E/bert-mini-sentence-distil-unsupervised" as the student.

The model can be loaded for inference with AutoModel from the transformers library.
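A minimal sketch of inference with AutoModel follows. It rests on several assumptions: the repository id is the one shown on this page, the repository ships a tokenizer that AutoTokenizer can load, and the sentence embedding is the pooled output (the second value returned by `self.bert(...)` in the forward function on this card). The helper name `embed` is hypothetical.

```python
def embed(sentences, model_name="zen-E/bert-mini-sentence-distil-supervised"):
    """Return one embedding per sentence (assumed to be the pooled output)."""
    # Imported lazily so that defining the helper does not require the
    # heavy dependencies to be installed.
    import torch
    from transformers import AutoModel, AutoTokenizer

    # Assumption: the repository provides both tokenizer and model weights.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**batch)

    # Assumption: the pooled output is the sentence embedding.
    return outputs.pooler_output
```

Calling `embed(["A man is playing a guitar.", "Someone plays an instrument."])` should return a tensor with one row per sentence, which can then be compared with cosine similarity.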

The model achieves a Pearson correlation of 0.836 and a Spearman correlation of 0.840 on the STS-B test set.
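For readers unfamiliar with the two metrics, here is a small dependency-free sketch of how Pearson and Spearman correlations between gold similarity scores and predicted cosine similarities can be computed. The names pearsonr and spearmanr match functions in scipy.stats, which is probably what was used, but that is an assumption. The scores below are made-up toy values, not STS-B data.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of std devs."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy values for illustration only.
gold = [0.0, 1.0, 2.0, 3.0, 4.0]
predicted = [0.1, 0.3, 0.2, 0.8, 0.9]
print(f"pearson={pearson(gold, predicted):.3f} spearman={spearman(gold, predicted):.3f}")
```

Spearman depends only on the ordering of the predictions, so it is insensitive to any monotonic rescaling of the cosine similarities, while Pearson measures linear agreement with the gold scores.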

For more training detail, the training config and the PyTorch forward function are given below. The teacher's features are first reduced to 256 dimensions by the PCA object in "zen-E/bert-mini-sentence-distil-unsupervised", which can be loaded by:

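import joblib

# Load the fitted PCA object and project the teacher features to 256 dimensions.
# `features` holds the teacher embeddings, one row per sentence.
pca = joblib.load('ANLI-simcse-roberta-large-embeddings-pca-256/pca_model.sav')
features_256 = pca.transform(features)

# Training configuration
config = {
  'epoch': 10,
  'learning_rate': 5e-5,
  'batch_size': 512,
  'temperature': 0.05
}
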
  def forward_cos_mse_kd(self, sentence1s, sentence2s, sentence3s, teacher_sentence1_embs, teacher_sentence2_embs, teacher_sentence3_embs):
    """forward function for the ANLI dataset"""
    _, o1 = self.bert(**sentence1s)
    _, o2 = self.bert(**sentence2s)
    _, o3 = self.bert(**sentence3s)

    # compute student's cosine similarity between sentences
    cos_o1_o2 = cosine_sim(o1, o2)
    cos_o1_o3 = cosine_sim(o1, o3)

    # compute teacher's cosine similarity between sentences
    cos_o1_o2_t = cosine_sim(teacher_sentence1_embs, teacher_sentence2_embs)
    cos_o1_o3_t = cosine_sim(teacher_sentence1_embs, teacher_sentence3_embs)

    cos_sim = torch.cat((cos_o1_o2, cos_o1_o3), dim=-1)
    cos_sim_t = torch.cat((cos_o1_o2_t, cos_o1_o3_t), dim=-1)

    # KL Divergence between student and teacher probabilities
    soft_teacher_probs = F.softmax(cos_sim_t / self.temperature, dim=1)
    kd_cos_loss = F.kl_div(F.log_softmax(cos_sim / self.temperature, dim=1),
                            soft_teacher_probs,
                            reduction='batchmean')

    # mse loss
    o = torch.cat([o1, o2, o3], dim=0)
    teacher_embs = torch.cat([teacher_sentence1_embs, teacher_sentence2_embs, teacher_sentence3_embs], dim=0)
    kd_mse_loss = nn.MSELoss()(o, teacher_embs)/3

    # equal weight for the two losses
    total_loss = kd_cos_loss*0.5+kd_mse_loss*0.5
    return total_loss, kd_cos_loss, kd_mse_loss
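To make the cosine-similarity distillation term concrete, the sketch below reproduces the arithmetic for a single row of similarities in plain Python: both teacher and student similarities are divided by the temperature, turned into probabilities with a softmax, and compared with KL(teacher || student), which is what `F.kl_div` computes when given student log-probabilities and teacher probabilities. The similarity values are invented for illustration, and the `cosine_sim` helper used in the forward function is not shown on this card, so nothing here should be read as its actual implementation.

```python
import math


def softmax(scores, temperature):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(teacher_probs, student_probs):
    """KL(teacher || student) for one row of probabilities."""
    return sum(
        t * math.log(t / s)
        for t, s in zip(teacher_probs, student_probs)
        if t > 0
    )


temperature = 0.05  # value from the training config

# Invented cosine similarities of one anchor sentence to three candidates.
teacher_cos = [0.9, 0.2, 0.1]
student_cos = [0.7, 0.3, 0.1]

teacher_probs = softmax(teacher_cos, temperature)
student_probs = softmax(student_cos, temperature)
kd_cos_loss = kl_divergence(teacher_probs, student_probs)
print(f"kd_cos_loss for this row: {kd_cos_loss:.6f}")
```

A small temperature such as 0.05 sharpens both distributions, so the loss mostly penalizes the student for disagreeing with the teacher about which candidate is closest to the anchor.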
