Check hf-internal-testing/tiny-random-roberta ...

--------------------------Checking logits match--------------------------

Flax logits shape: (2, 64, 1000), PyTorch logits shape: torch.Size([2, 64, 1000])

✅ Difference between Flax and PyTorch is 1.7881393432617188e-07 (< 0.01)

--------------------------Checking losses match--------------------------

Flax loss: 6.887884140014648, PyTorch loss: 6.887884616851807

✅ Difference between Flax and PyTorch is 4.76837158203125e-07 (< 0.01)

--------------------------Checking gradients match--------------------------

✅ All grads pass

--------------------------Checking rel gradients match--------------------------

❌ Layer ('roberta', 'encoder', 'layer', '0', 'attention', 'self', 'key', 'bias') has PT grad norm 7.584575871001642e-13 and flax grad norm 6.388195094436666e-13.

...

=========================================
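
For reference, the logits check above can presumably be reproduced along these lines. This is a minimal sketch, not the actual checking script: the masked-LM head, the comparison inputs, and the max-absolute-difference metric are assumptions inferred from the log (only the 0.01 threshold is taken from it).

```python
import numpy as np
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer, FlaxAutoModelForMaskedLM

# Load the same tiny checkpoint in both frameworks; the head choice is an
# assumption, the real script may use a different task head.
model_id = "hf-internal-testing/tiny-random-roberta"
tokenizer = AutoTokenizer.from_pretrained(model_id)
pt_model = AutoModelForMaskedLM.from_pretrained(model_id)
fx_model = FlaxAutoModelForMaskedLM.from_pretrained(model_id, from_pt=True)

# Identical inputs for both models: batch of 2 padded to length 64,
# matching the (2, 64, vocab) logits shapes in the log.
inputs = tokenizer(
    ["hello world", "flax vs torch"],
    padding="max_length", max_length=64, truncation=True, return_tensors="np",
)
fx_logits = np.asarray(fx_model(**inputs).logits)

pt_inputs = {k: torch.tensor(np.asarray(v)) for k, v in inputs.items()}
with torch.no_grad():
    pt_logits = pt_model(**pt_inputs).logits.numpy()

diff = float(np.max(np.abs(fx_logits - pt_logits)))
assert diff < 1e-2, f"Difference between Flax and PyTorch is {diff} (>= 0.01)"
print(f"✅ Difference between Flax and PyTorch is {diff} (< 0.01)")
```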

Check hf-internal-testing/tiny-random-bert ...

--------------------------Checking logits match--------------------------

Flax logits shape: (2, 64, 1124), PyTorch logits shape: torch.Size([2, 64, 1124])

✅ Difference between Flax and PyTorch is 1.7881393432617188e-07 (< 0.01)

--------------------------Checking losses match--------------------------

Flax loss: 7.036032199859619, PyTorch loss: 7.036032676696777

✅ Difference between Flax and PyTorch is 4.76837158203125e-07 (< 0.01)

--------------------------Checking gradients match--------------------------

✅ All grads pass

--------------------------Checking rel gradients match--------------------------

❌ Layer ('bert', 'encoder', 'layer', '0', 'attention', 'self', 'key', 'bias') has PT grad norm 5.234438642080785e-13 and flax grad norm 4.935363641205004e-13.

...

=========================================
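
The loss check presumably extends the same setup: compute an identical objective in both frameworks and backprop through each. A rough sketch reusing fx_model, pt_model, inputs, and pt_inputs from the snippet above; the cross-entropy against the input ids is a placeholder objective, not the loss the actual script uses.

```python
import jax
import jax.numpy as jnp
import numpy as np
import torch

# Placeholder labels so both frameworks compute the exact same objective.
labels = np.asarray(inputs["input_ids"])

def fx_loss_fn(params):
    logits = fx_model(**inputs, params=params).logits
    log_probs = jax.nn.log_softmax(logits, axis=-1)
    # Mean token-level cross-entropy against the placeholder labels.
    return -jnp.mean(jnp.take_along_axis(log_probs, labels[..., None], axis=-1))

# Loss and the full gradient tree on the Flax side in one call.
fx_loss, fx_grads = jax.value_and_grad(fx_loss_fn)(fx_model.params)

# Same objective on the PyTorch side, then backprop to populate .grad.
pt_logits = pt_model(**pt_inputs).logits
pt_loss = torch.nn.functional.cross_entropy(
    pt_logits.reshape(-1, pt_logits.shape[-1]),
    pt_inputs["input_ids"].reshape(-1),
)
pt_loss.backward()

print(f"Flax loss: {float(fx_loss)}, PyTorch loss: {pt_loss.item()}")
assert abs(float(fx_loss) - pt_loss.item()) < 1e-2
```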

Check hf-internal-testing/tiny-random-t5 ...

--------------------------Checking logits match--------------------------

Flax logits shape: (2, 64, 1103), PyTorch logits shape: torch.Size([2, 64, 1103])

✅ Difference between Flax and PyTorch is 3.725290298461914e-09 (< 0.01)

--------------------------Checking losses match--------------------------

Flax loss: 7.006012916564941, PyTorch loss: 7.006012916564941

✅ Difference between Flax and PyTorch is 0.0 (< 0.01)

--------------------------Checking gradients match--------------------------

✅ All grads pass

--------------------------Checking rel gradients match--------------------------

✅ All rel grads pass

=========================================
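
"All grads pass" presumably means every parameter's gradient agrees between frameworks within an absolute tolerance. A sketch of that comparison: pt_to_flax_key is a hypothetical helper (the real PT-to-Flax name mapping, including weight/kernel renames and transposes, is model-specific), and ABS_TOL is a guess mirroring the 0.01 used for logits and losses.

```python
import numpy as np
from flax.core.frozen_dict import unfreeze
from flax.traverse_util import flatten_dict

ABS_TOL = 1e-2  # assumed, mirroring the 0.01 threshold above

def pt_to_flax_key(pt_name: str) -> tuple:
    """Hypothetical helper: 'bert.encoder.layer.0.attention.self.key.weight'
    -> ('bert', 'encoder', 'layer', '0', 'attention', 'self', 'key', 'kernel')."""
    return tuple("kernel" if p == "weight" else p for p in pt_name.split("."))

# flatten_dict turns the nested Flax grad tree into {tuple_key: array},
# with the same tuple keys printed in the log above.
flat_fx_grads = flatten_dict(unfreeze(fx_grads))
for name, param in pt_model.named_parameters():
    key = pt_to_flax_key(name)
    if key not in flat_fx_grads:
        continue  # skip params this toy name mapping gets wrong
    pt_g = param.grad.detach().numpy()
    fx_g = np.asarray(flat_fx_grads[key])
    if fx_g.shape != pt_g.shape:
        pt_g = pt_g.T  # linear kernels are stored transposed in Flax
    assert np.max(np.abs(fx_g - pt_g)) < ABS_TOL, f"grad mismatch in {name}"
print("✅ All grads pass")
```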

Check hf-internal-testing/tiny-random-bart ...

--------------------------Checking logits match--------------------------

Flax logits shape: (2, 64, 1000), PyTorch logits shape: torch.Size([2, 64, 1000])

✅ Difference between Flax and PyTorch is 8.940696716308594e-08 (< 0.01)

--------------------------Checking losses match--------------------------

Flax loss: 6.919522285461426, PyTorch loss: 6.919522285461426

✅ Difference between Flax and PyTorch is 0.0 (< 0.01)

--------------------------Checking gradients match--------------------------

✅ All grads pass

--------------------------Checking rel gradients match--------------------------

❌ Layer ('final_logits_bias',) has PT grad norm 0.0 and flax grad norm 0.0.

❌ Layer ('model', 'decoder', 'layers', '0', 'encoder_attn', 'k_proj', 'bias') has PT grad norm 1.1293364247239035e-13 and flax grad norm 7.444291358479557e-14.

❌ Layer ('model', 'decoder', 'layers', '0', 'self_attn', 'k_proj', 'bias') has PT grad norm 1.9028742882613858e-13 and flax grad norm 1.0847509820726894e-13.

...

=========================================
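
The layers flagged in the rel-gradients checks all have norms near or exactly zero (e.g. ('final_logits_bias',) above, with both norms 0.0): a relative comparison of vanishing quantities just amplifies floating-point noise, so such layers get reported rather than meaningfully compared. A sketch of what the check might look like; REL_TOL and NORM_FLOOR are assumed values, not taken from the log.

```python
import numpy as np
from flax.core.frozen_dict import unfreeze
from flax.traverse_util import flatten_dict

REL_TOL, NORM_FLOOR = 1e-3, 1e-10  # assumed tolerances

def check_rel_grads(fx_grads, pt_grad_norms):
    """Compare per-parameter gradient norms relatively.

    fx_grads: nested Flax grad tree (as from jax.value_and_grad above).
    pt_grad_norms: {flax_key_tuple: float}; mapping PyTorch parameter
    names onto Flax tuple keys is model-specific and assumed done elsewhere.
    """
    for key, g in flatten_dict(unfreeze(fx_grads)).items():
        fx_norm = float(np.linalg.norm(np.asarray(g)))
        pt_norm = pt_grad_norms[key]
        if min(pt_norm, fx_norm) < NORM_FLOOR:
            # Too small for a meaningful relative comparison; report it,
            # as the log does for the ~1e-13 attention-bias grads.
            print(f"❌ Layer {key} has PT grad norm {pt_norm} and flax grad norm {fx_norm}.")
        elif abs(pt_norm - fx_norm) / max(pt_norm, fx_norm) > REL_TOL:
            print(f"❌ Layer {key} has PT grad norm {pt_norm} and flax grad norm {fx_norm}.")
```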