flan-t5-xl (16-bit) outperforms flan-t5-xxl (8-bit)

#14 opened by dzmltzack

Hello, I've noticed a significant performance drop since the update to flan-t5-xxl (in 8-bit).

The drop is in instruction following: the xxl model does answer some questions better, since it has seen more data, but it doesn't follow the instructions as well.

Below are the results of the same two prompts on the new flan-t5-xxl (8-bit) and on flan-t5-xl (bfloat16, currently running on Colab), showing how flan-t5-xl (bfloat16) does a better job of following the instruction.
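For reference, here is a minimal sketch of how these two configurations can be loaded with transformers and bitsandbytes; the model IDs and settings here are my assumptions about the setup, not confirmed details from this thread:

```python
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

# flan-t5-xl with full bfloat16 weights
xl_tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xl")
xl_model = T5ForConditionalGeneration.from_pretrained(
    "google/flan-t5-xl",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# flan-t5-xxl quantized to 8-bit via bitsandbytes
xxl_tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xxl")
xxl_model = T5ForConditionalGeneration.from_pretrained(
    "google/flan-t5-xxl",
    load_in_8bit=True,
    device_map="auto",
)
```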

Example 1:
flan-t5-xl (bfloat16):
[screenshot: Screenshot from 2023-02-04 00-33-40.png]
flan-t5-xxl (8-bit):
[screenshot: Screenshot from 2023-02-04 00-33-28.png]

Example 2:
flan-t5-xl (bfloat16):
[screenshot: Screenshot from 2023-02-04 00-34-55.png]
flan-t5-xxl (8-bit):
[screenshot: Screenshot from 2023-02-04 00-35-11.png]

I ran tests on about 30 more examples: flan-t5-xl (bfloat16) got 27 correct, while flan-t5-xxl (8-bit) got 12 correct.

The same happens across multiple prompts. For example:

Summarize the following text: Peter and Elizabeth took a taxi to attend the night party in the city. While in the party, Elizabeth collapsed and was rushed to the hospital. Since she was diagnosed with a brain injury, the doctor told Peter to stay besides her until she gets well. Therefore, Peter stayed with her at the hospital for 3 days without leaving.

flan-t5-xl (bfloat16)
Peter was a good husband.

flan-t5-xxl (8-bit)
Peter stayed with Elizabeth at the hospital for 3 days.
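For anyone who wants to reproduce the comparison, a minimal sketch assuming the models were loaded as in the earlier snippet; the decoding settings are my assumption, not necessarily what was used above:

```python
prompt = (
    "Summarize the following text: Peter and Elizabeth took a taxi to "
    "attend the night party in the city. While in the party, Elizabeth "
    "collapsed and was rushed to the hospital. Since she was diagnosed "
    "with a brain injury, the doctor told Peter to stay besides her "
    "until she gets well. Therefore, Peter stayed with her at the "
    "hospital for 3 days without leaving."
)

# Run the same prompt through both models and print the outputs
for name, tok, model in [
    ("flan-t5-xl (bfloat16)", xl_tokenizer, xl_model),
    ("flan-t5-xxl (8-bit)", xxl_tokenizer, xxl_model),
]:
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=64)
    print(name, "->", tok.decode(out[0], skip_special_tokens=True))
```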

Interesting. I noticed that for a lot of common reasoning tasks, the xxl seemed to do better.

Would you mind sharing some examples? I believe the xxl in 8-bit only does better on tasks that depend on having more knowledge; I can't make it work better than BLOOM for even slightly complex instructions, while I could achieve amazing results with the xl in 32-bit.
