[4., 0.5. 2.] + q / k values trainable + 48 effective batch size Make your batch size as large as possible in order to get better results. Avoiding all_modules trainable keeps some level of intelligence.
[4., 0.5. 2.] + q / k values trainable + 48 effective batch size Make your batch size as large as possible in order to get better results. Avoiding all_modules trainable keeps some level of intelligence.