some problems

#6
by UnderController - opened

with torch.no_grad():
    v_self = student_baseline(xt_z1, t)

# Self-Reference Huber Loss (Elastic anchor to preserve 8-step trajectory)
sft_loss = F.huber_loss(v_pol_z1, v_self.detach(), delta=0.08)
So is v_pol_z1 equal to v_self? Are both of them computed with the adapter enabled?
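
To make the question concrete, here is a minimal sketch in plain Python (no torch needed), assuming the loss uses the standard piecewise Huber form with mean reduction. The numbers and the names `v_self_same` and `v_self_other` are made up for illustration and do not come from the training script. It shows why this matters: if both predictions come from the same adapter state, the loss and its gradient are exactly zero, so the anchor would only have an effect if `student_baseline` is evaluated in some different state.

```python
def huber(pred, target, delta=0.08):
    """Mean Huber loss: quadratic inside delta, linear outside."""
    total = 0.0
    for p, y in zip(pred, target):
        d = abs(p - y)
        total += 0.5 * d * d if d <= delta else delta * (d - 0.5 * delta)
    return total / len(pred)


def huber_grad(pred, target, delta=0.08):
    """Gradient of the mean Huber loss with respect to each prediction."""
    n = len(pred)
    # The derivative is (p - y) inside delta and +/- delta outside,
    # which is the same as clamping (p - y) to [-delta, delta].
    return [max(-delta, min(delta, p - y)) / n for p, y in zip(pred, target)]


# Hypothetical velocity predictions (illustrative values only).
v_pol_z1 = [0.10, -0.25, 0.40]
v_self_same = list(v_pol_z1)        # baseline from the same adapter state
v_self_other = [0.12, -0.05, 0.40]  # baseline from a different state

loss_same = huber(v_pol_z1, v_self_same)
grad_same = huber_grad(v_pol_z1, v_self_same)
loss_other = huber(v_pol_z1, v_self_other)
grad_other = huber_grad(v_pol_z1, v_self_other)

print("same state:     ", loss_same, grad_same)
print("different state:", loss_other, grad_other)
```

In the first case there is nothing to learn from, which is why I am asking how v_self differs from v_pol_z1.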
