Mistral_Sparse_refined_web_50p_cut_pre_mlp_2024-03-23

This model is a fine-tuned version of mistralai/Mistral-7B-v0.1 on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 2.1205
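
To put this number on a more familiar scale: if the loss is the mean per-token cross-entropy in nats (the usual convention for causal language models trained with the Hugging Face Trainer, though the card does not state it, and this model ships custom code), the corresponding perplexity is exp(loss). A minimal sketch of that conversion, under that assumption:

```python
import math

# Evaluation loss reported above. Assumption: mean per-token
# cross-entropy measured in nats (natural log).
eval_loss = 2.1205

# Perplexity is the exponential of the mean negative log-likelihood per token.
perplexity = math.exp(eval_loss)

print(f"perplexity = {perplexity:.2f}")  # prints "perplexity = 8.34"
```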

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 0
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • total_eval_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • training_steps: 10000
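
The two totals follow from the per-device sizes: with data parallelism, each optimizer step consumes train_batch_size × num_devices × gradient_accumulation_steps sequences, while evaluation does no accumulation. The sketch below checks that arithmetic and evaluates a linear learning-rate decay over the 10,000 training steps. It assumes zero warmup steps (the card lists none) and mirrors the shape of the linear schedule in Transformers; `linear_lr` is an illustrative helper, not a library function.

```python
# Hyperparameters copied from the list above.
learning_rate = 1e-05
train_batch_size = 1        # per device
eval_batch_size = 1         # per device
num_devices = 4
gradient_accumulation_steps = 4
training_steps = 10_000

# Sequences consumed per optimizer step across all GPUs.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
# Evaluation does not accumulate gradients, so only the device count multiplies.
total_eval_batch_size = eval_batch_size * num_devices


def linear_lr(step: int, base_lr: float = learning_rate,
              total_steps: int = training_steps, warmup_steps: int = 0) -> float:
    """Learning rate at `step`: linear warmup, then linear decay to zero.

    Assumption: warmup_steps defaults to 0 because the card lists no warmup.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


print(total_train_batch_size, total_eval_batch_size)  # prints "16 4"
for step in (0, 2_500, 5_000, 10_000):
    print(step, linear_lr(step))
```

At 16 sequences per step, the 10,000 steps amount to 160,000 training examples in total.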

Training results

Training Loss Epoch Step Validation Loss
2.5127 0.0 25 2.5938
2.3459 0.01 50 2.5549
2.3273 0.01 75 2.5028
2.3381 0.02 100 2.5017
2.2772 0.02 125 2.4983
2.2464 0.03 150 2.4843
2.2732 0.03 175 2.4808
2.3294 0.03 200 2.4697
2.1752 0.04 225 2.4677
2.3093 0.04 250 2.4660
2.3592 0.05 275 2.4681
2.3321 0.05 300 2.4595
2.2232 0.05 325 2.4572
2.2089 0.06 350 2.4553
2.204 0.06 375 2.4508
2.2677 0.07 400 2.4514
2.2544 0.07 425 2.4482
2.2969 0.08 450 2.4442
2.3415 0.08 475 2.4489
2.3428 0.08 500 2.4489
2.2938 0.09 525 2.4393
2.3459 0.09 550 2.4389
2.2487 0.1 575 2.4457
2.197 0.1 600 2.4433
2.272 0.11 625 2.4396
2.2425 0.11 650 2.4367
2.2543 0.11 675 2.4387
2.2598 0.12 700 2.4352
2.2381 0.12 725 2.4408
2.3656 0.13 750 2.4307
2.352 0.13 775 2.4299
2.1816 0.13 800 2.4344
2.24 0.14 825 2.4305
2.3039 0.14 850 2.4245
2.3169 0.15 875 2.4318
2.184 0.15 900 2.4287
2.2618 0.16 925 2.4308
2.2207 0.16 950 2.4327
2.2786 0.16 975 2.4244
2.3708 0.17 1000 2.4275
2.3165 0.17 1025 2.4286
2.2927 0.18 1050 2.4272
2.2849 0.18 1075 2.4297
2.2898 0.19 1100 2.4294
2.3798 0.19 1125 2.4188
2.4131 0.19 1150 2.4314
2.1314 0.2 1175 2.4265
2.3814 0.2 1200 2.4254
2.2761 0.21 1225 2.4238
2.2327 0.21 1250 2.4327
2.2236 0.22 1275 2.4245
2.2343 0.22 1300 2.4280
2.265 0.22 1325 2.4186
2.1813 0.23 1350 2.4303
2.2276 0.23 1375 2.4231
2.2444 0.24 1400 2.4234
2.3472 0.24 1425 2.4225
2.3111 0.24 1450 2.4240
2.3111 0.25 1475 2.4288
2.3205 0.25 1500 2.4291
2.3389 0.26 1525 2.4234
2.2517 0.26 1550 2.4255
2.3416 0.27 1575 2.4245
2.1858 0.27 1600 2.4184
2.1582 0.27 1625 2.4182
2.1512 0.28 1650 2.4246
2.248 0.28 1675 2.4253
2.2535 0.29 1700 2.4246
2.3005 0.29 1725 2.4195
2.2144 0.3 1750 2.4236
2.198 0.3 1775 2.4237
2.1911 0.3 1800 2.4203
2.2513 0.31 1825 2.4250
2.2442 0.31 1850 2.4231
2.2877 0.32 1875 2.4239
2.3341 0.32 1900 2.4187
2.2493 0.32 1925 2.4262
2.2687 0.33 1950 2.4222
2.2674 0.33 1975 2.4200
2.2928 0.34 2000 2.4126
2.2556 0.34 2025 2.4283
2.1929 0.35 2050 2.4195
2.1952 0.35 2075 2.4249
2.2114 0.35 2100 2.4234
2.2207 0.36 2125 2.4223
2.3071 0.36 2150 2.4223
2.2019 0.37 2175 2.4152
2.2224 0.37 2200 2.4230
2.1832 0.38 2225 2.4188
2.291 0.38 2250 2.4179
2.228 0.38 2275 2.4234
2.1592 0.39 2300 2.4178
2.2529 0.39 2325 2.4169
2.1175 0.4 2350 2.4169
2.3012 0.4 2375 2.4243
2.2626 0.4 2400 2.4165
2.1595 0.41 2425 2.4215
2.2097 0.41 2450 2.4179
2.2954 0.42 2475 2.4183
2.2535 0.42 2500 2.4167
2.2211 0.43 2525 2.4181
2.2505 0.43 2550 2.4264
2.1676 0.43 2575 2.4108
2.1906 0.44 2600 2.4152
2.2112 0.44 2625 2.4152
2.2729 0.45 2650 2.4147
2.2493 0.45 2675 2.4228
2.2266 0.46 2700 2.4186
2.2447 0.46 2725 2.4186
2.2216 0.46 2750 2.4132
2.3827 0.47 2775 2.4202
2.3067 0.47 2800 2.4126
2.1683 0.48 2825 2.4149
2.1962 0.48 2850 2.4131
2.2222 0.48 2875 2.4154
2.3168 0.49 2900 2.4141
2.2526 0.49 2925 2.4142
2.3378 0.5 2950 2.4183
2.2296 0.5 2975 2.4125
2.2563 0.51 3000 2.4137
2.3374 0.51 3025 2.4189
2.1736 0.51 3050 2.4094
2.3238 0.52 3075 2.4124
2.2334 0.52 3100 2.4152
2.3054 0.53 3125 2.4113
2.3322 0.53 3150 2.4123
2.2122 0.54 3175 2.4139
2.3256 0.54 3200 2.4085
2.2293 0.54 3225 2.4141
2.2341 0.55 3250 2.4148
2.2464 0.55 3275 2.4169
2.2551 0.56 3300 2.4115
2.3158 0.56 3325 2.4185
2.2789 0.56 3350 2.4138
2.3503 0.57 3375 2.4213
2.3434 0.57 3400 2.4154
2.3048 0.58 3425 2.4161
2.259 0.58 3450 2.4166
2.219 0.59 3475 2.4117
2.1541 0.59 3500 2.4193
2.2086 0.59 3525 2.4143
2.1673 0.6 3550 2.4184
2.1865 0.6 3575 2.4197
2.2537 0.61 3600 2.4141
2.2065 0.61 3625 2.4174
2.159 0.62 3650 2.4147
2.3402 0.62 3675 2.4175
2.2399 0.62 3700 2.4181
2.3507 0.63 3725 2.4153
2.2658 0.63 3750 2.4170
2.3211 0.64 3775 2.4088
2.2072 0.64 3800 2.4126
2.2433 0.65 3825 2.4160
2.225 0.65 3850 2.4088
2.1458 0.65 3875 2.4121
2.3704 0.66 3900 2.4097
2.2315 0.66 3925 2.4092
2.2295 0.67 3950 2.4141
2.2763 0.67 3975 2.4149
2.217 0.67 4000 2.4139
2.2287 0.68 4025 2.4113
2.2748 0.68 4050 2.4077
2.1584 0.69 4075 2.4121
2.2214 0.69 4100 2.4166
2.3557 0.7 4125 2.4076
2.2453 0.7 4150 2.4151
2.2167 0.7 4175 2.4140
2.3674 0.71 4200 2.4119
2.2979 0.71 4225 2.4146
2.2178 0.72 4250 2.4152
2.2091 0.72 4275 2.4101
2.3138 0.73 4300 2.4104
2.2504 0.73 4325 2.4136
2.2348 0.73 4350 2.4150
2.2141 0.74 4375 2.4174
2.1284 0.74 4400 2.4094
2.2926 0.75 4425 2.4178
2.1642 0.75 4450 2.4102
2.2263 0.75 4475 2.4196
2.3722 0.76 4500 2.4099
2.1992 0.76 4525 2.4114
2.2651 0.77 4550 2.4149
2.289 0.77 4575 2.4078
2.2911 0.78 4600 2.4073
2.2206 0.78 4625 2.4061
2.1851 0.78 4650 2.4094
2.2674 0.79 4675 2.4064
2.2032 0.79 4700 2.4055
2.1522 0.8 4725 2.4138
2.3039 0.8 4750 2.4096
2.2066 0.81 4775 2.4122
2.2193 0.81 4800 2.4156
2.2599 0.81 4825 2.4098
2.2994 0.82 4850 2.4053
2.2463 0.82 4875 2.4052
2.1318 0.83 4900 2.4072
2.1696 0.83 4925 2.4086
2.2104 0.83 4950 2.4082
2.3455 0.84 4975 2.4070
2.165 0.84 5000 2.4092
2.2742 0.85 5025 2.4096
2.3341 0.85 5050 2.4103
2.2294 0.86 5075 2.4082
2.2256 0.86 5100 2.4136
2.1586 0.86 5125 2.4132
2.2623 0.87 5150 2.4126
2.2405 0.87 5175 2.4120
2.1848 0.88 5200 2.4158
2.216 0.88 5225 2.4126
2.2648 0.89 5250 2.4093
2.2928 0.89 5275 2.4100
2.2365 0.89 5300 2.4081
2.1913 0.9 5325 2.4041
2.1835 0.9 5350 2.4097
2.2158 0.91 5375 2.4083
2.2001 0.91 5400 2.4067
2.2133 0.91 5425 2.4122
2.2104 0.92 5450 2.4169
2.3368 0.92 5475 2.4124
2.2057 0.93 5500 2.4108
2.1003 0.93 5525 2.4058
2.1589 0.94 5550 2.4154
2.1885 0.94 5575 2.4058
2.2291 0.94 5600 2.4113
2.2688 0.95 5625 2.4097
2.3387 0.95 5650 2.4123
2.2701 0.96 5675 2.4108
2.2732 0.96 5700 2.4070
2.2823 0.97 5725 2.4057
2.2029 0.97 5750 2.4096
2.2392 0.97 5775 2.4099
2.1963 0.98 5800 2.4165
2.2922 0.98 5825 2.4105
2.1884 0.99 5850 2.4119
2.2883 0.99 5875 2.4087
2.3162 1.0 5900 2.4069
2.2246 1.0 5925 2.4028
2.2586 1.0 5950 2.4107
2.1367 1.01 5975 2.4095
2.2341 1.01 6000 2.4152
2.2638 1.02 6025 2.4048
2.1898 1.02 6050 2.4097
2.1071 1.02 6075 2.4133
2.2763 1.03 6100 2.4056
2.159 1.03 6125 2.4060
2.2005 1.04 6150 2.4111
2.3398 1.04 6175 2.4146
2.2017 1.05 6200 2.4085
2.202 1.05 6225 2.4093
2.1532 1.05 6250 2.4086
2.1735 1.06 6275 2.4106
2.1104 1.06 6300 2.4105
2.2282 1.07 6325 2.4117
2.2969 1.07 6350 2.4063
2.2284 1.08 6375 2.4044
2.2823 1.08 6400 2.4114
2.1878 1.08 6425 2.4115
2.3074 1.09 6450 2.4090
2.238 1.09 6475 2.4104
2.2031 1.1 6500 2.4075
2.1617 1.1 6525 2.4113
2.1508 1.1 6550 2.4047
2.1803 1.11 6575 2.4170
2.2613 1.11 6600 2.4116
2.1954 1.12 6625 2.4092
2.3341 1.12 6650 2.4116
2.2835 1.13 6675 2.4058
2.2413 1.13 6700 2.4150
2.32 1.13 6725 2.4130
2.2163 1.14 6750 2.4042
2.3013 1.14 6775 2.4119
2.2821 1.15 6800 2.4124
2.1525 1.15 6825 2.4123
2.2313 1.16 6850 2.4108
2.1835 1.16 6875 2.4084
2.2945 1.16 6900 2.4134
2.233 1.17 6925 2.4033
2.3066 1.17 6950 2.4069
2.3245 1.18 6975 2.4074
2.1988 1.18 7000 2.4095
2.1995 1.18 7025 2.4101
2.2988 1.19 7050 2.4085
2.1385 1.19 7075 2.4079
2.2207 1.2 7100 2.3976
2.1971 1.2 7125 2.4097
2.2652 1.21 7150 2.4052
2.1848 1.21 7175 2.4023
2.2584 1.21 7200 2.4040
2.2193 1.22 7225 2.4069
2.2586 1.22 7250 2.3954
2.2102 1.23 7275 2.4041
2.2741 1.23 7300 2.3994
2.2261 1.24 7325 2.3986
2.2745 1.24 7350 2.3970
2.2266 1.24 7375 2.4001
2.2462 1.25 7400 2.4028
2.2968 1.25 7425 2.3983
2.1915 1.26 7450 2.3978
2.2201 1.26 7475 2.3957
2.126 1.26 7500 2.3922
2.2625 1.27 7525 2.4001
2.24 1.27 7550 2.3976
2.2113 1.28 7575 2.4051
2.1994 1.28 7600 2.4024
2.2568 1.29 7625 2.3984
2.243 1.29 7650 2.4095
2.2187 1.29 7675 2.4072
2.1955 1.3 7700 2.4030
2.2341 1.3 7725 2.3987
2.3218 1.31 7750 2.3983
2.1958 1.31 7775 2.3980
2.222 1.32 7800 2.4046
2.2631 1.32 7825 2.3974
2.1505 1.32 7850 2.3952
2.1824 1.33 7875 2.3976
2.2468 1.33 7900 2.4025
2.1383 1.34 7925 2.3926
2.0483 1.34 7950 2.3984
2.32 1.34 7975 2.3971
2.3582 1.35 8000 2.3988
2.2773 1.35 8025 2.3919
2.2302 1.36 8050 2.4016
2.152 1.36 8075 2.3958
2.2021 1.37 8100 2.4047
2.2351 1.37 8125 2.4041
2.1452 1.37 8150 2.4009
2.2575 1.38 8175 2.4004
2.1978 1.38 8200 2.3994
2.2648 1.39 8225 2.3982
2.2322 1.39 8250 2.3990
2.2488 1.4 8275 2.3997
2.2343 1.4 8300 2.3982
2.2011 1.4 8325 2.4020
2.2347 1.41 8350 2.3990
2.2446 1.41 8375 2.4003
2.2258 1.42 8400 2.4069
2.1781 1.42 8425 2.4104
2.3193 1.43 8450 2.4069
2.2015 1.43 8475 2.3985
2.2139 1.43 8500 2.3998
2.2006 1.44 8525 2.3986
2.2181 1.44 8550 2.4072
2.3598 1.45 8575 2.4098
2.3421 1.45 8600 2.4073
2.2152 1.45 8625 2.4090
2.2308 1.46 8650 2.4059
2.1773 1.46 8675 2.4078
2.2713 1.47 8700 2.4028
2.2826 1.47 8725 2.4051
2.2942 1.48 8750 2.4051
2.1512 1.48 8775 2.3998
2.1678 1.48 8800 2.4036
2.1948 1.49 8825 2.4052
2.1395 1.49 8850 2.3990
2.1999 1.5 8875 2.4053
2.2187 1.5 8900 2.4014
2.2549 1.51 8925 2.4035
2.1782 1.51 8950 2.4066
2.2073 1.51 8975 2.4083
2.1925 1.52 9000 2.3987
2.2846 1.52 9025 2.4008
2.1969 1.53 9050 2.4071
2.2831 1.53 9075 2.4040
2.3457 1.53 9100 2.4057
2.2346 1.54 9125 2.4002
2.2253 1.54 9150 2.4078
2.3162 1.55 9175 2.3958
2.2181 1.55 9200 2.4020
2.1335 1.56 9225 2.4077
2.2222 1.56 9250 2.4029
2.118 1.56 9275 2.4011
2.1778 1.57 9300 2.4068
2.1706 1.57 9325 2.4020
2.2519 1.58 9350 2.3994
2.1389 1.58 9375 2.4033
2.3475 1.59 9400 2.4030
2.2375 1.59 9425 2.4060
2.1758 1.59 9450 2.4113
2.2083 1.6 9475 2.4064
2.2299 1.6 9500 2.4085
2.1834 1.61 9525 2.4042
2.1631 1.61 9550 2.4086
2.3827 1.61 9575 2.4068
2.181 1.62 9600 2.4083
2.2252 1.62 9625 2.4039
2.2509 1.63 9650 2.4104
2.2198 1.63 9675 2.4096
2.2605 1.64 9700 2.4149
2.2177 1.64 9725 2.4067
2.0864 1.64 9750 2.4106
2.1742 1.65 9775 2.4012
2.254 1.65 9800 2.4116
2.2758 1.66 9825 2.4114
2.1822 1.66 9850 2.4149
2.2293 1.67 9875 2.4034
2.2322 1.67 9900 2.4086
2.2173 1.67 9925 2.4115
2.1781 1.68 9950 2.3963
2.2739 1.68 9975 2.4091
2.1899 1.69 10000 2.4050
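
The table is easier to summarize programmatically than by eye. Below is a minimal sketch that parses rows in the whitespace-separated layout above (Training Loss, Epoch, Step, Validation Loss) and picks the row with the lowest validation loss. Only a few rows are copied in. Across the full table, the minimum is 2.3919 at step 8025, compared with 2.4050 at the final step. `LogRow` and `parse_rows` are illustrative names.

```python
from typing import NamedTuple


class LogRow(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float


def parse_rows(text: str) -> list[LogRow]:
    """Parse lines of 'train_loss epoch step val_loss' into LogRow tuples."""
    rows = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss = line.split()
        rows.append(LogRow(float(train_loss), float(epoch), int(step), float(val_loss)))
    return rows


# A few rows copied verbatim from the table above.
excerpt = """
2.5127 0.0 25 2.5938
2.3582 1.35 8000 2.3988
2.2773 1.35 8025 2.3919
2.1899 1.69 10000 2.4050
"""

rows = parse_rows(excerpt)
best = min(rows, key=lambda r: r.val_loss)
print(best.step, best.val_loss)  # prints "8025 2.3919"
```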

Framework versions

  • Transformers 4.36.2
  • PyTorch 2.1.2+cu121
  • Datasets 2.15.0
  • Tokenizers 0.15.0
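
To compare a local environment against the versions above, the standard library's importlib.metadata can report mismatches. The sketch assumes the usual PyPI distribution names (transformers, torch, datasets, tokenizers). It compares version strings exactly, including the +cu121 local build suffix on PyTorch, which is a deliberately strict choice. `version_mismatches` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Optional

# Versions listed in this card, keyed by (assumed) PyPI distribution name.
EXPECTED = {
    "transformers": "4.36.2",
    "torch": "2.1.2+cu121",
    "datasets": "2.15.0",
    "tokenizers": "0.15.0",
}


def version_mismatches(
    expected: dict[str, str],
    lookup: Callable[[str], str] = version,
) -> dict[str, tuple[str, Optional[str]]]:
    """Return {package: (expected, installed)} for every package that differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed: Optional[str] = lookup(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


for name, (wanted, got) in version_mismatches(EXPECTED).items():
    print(f"{name}: card lists {wanted}, found {got}")
```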

Safetensors

  • Model size: 7.24B params
  • Tensor type: BF16

Inference API
Inference API (serverless) does not yet support model repos that contain custom code.
