add remote code and hf-format "pytorch_model.bin"

#20

Modified by chuhac to provide a timm-free implementation.
The model can be loaded directly in Hugging Face format with from_pretrained and trust_remote_code=True.
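
Since these files live on a pull request rather than on the main branch, loading them would look roughly like the sketch below. It relies on the Hub exposing each pull request under the git ref refs/pr/<number>, which from_pretrained accepts through its revision argument. The helper names pr_revision and load_from_pr are hypothetical, the repository id is a placeholder that must be filled in, and AutoModel is an assumption: the right Auto class depends on what the repository's config maps to.

```python
def pr_revision(pr_number: int) -> str:
    """Build the git ref under which the Hub exposes a pull request's files."""
    return f"refs/pr/{pr_number}"


def load_from_pr(repo_id: str, pr_number: int):
    """Load a model whose weights and modeling code live on a Hub pull request.

    repo_id is a placeholder: pass the id of the repository this PR targets.
    """
    # Imported lazily so pr_revision can be used without transformers installed.
    from transformers import AutoModel

    return AutoModel.from_pretrained(
        repo_id,
        revision=pr_revision(pr_number),  # fetch files from the PR ref, not main
        trust_remote_code=True,  # run the custom modeling code shipped in the repo
    )


# Example (not executed here; needs network access and the real repository id):
# model = load_from_pr("<org>/<model>", 20)
```

Note that trust_remote_code=True executes Python code downloaded from the repository, so it is worth reviewing the modeling file before loading.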
Differences from the HF CLIP implementation:

  1. pre-norm instead of post-norm in Vision Tower (the original implementation is right but the module registration order is misleading)
  2. CLS Pooling with MLP in Text Tower
  3. Remove pre norm in Vision Tower
  4. CNN bias in Vision Tower
  5. Change layer_norm eps from 1e-5 to 1e-12, which introduces small numerical variations (on the order of 1e-5)
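
To give a feel for the size of the effect in item 5, here is a minimal sketch that normalizes the same vector with both eps values and compares the results. It is a plain-Python stand-in for layer normalization without the learned scale and bias, run in float64 on random data; the hidden size of 768 is a hypothetical choice, and the actual model's float32 outputs will differ in detail.

```python
import math
import random


def layer_norm(x, eps):
    """Normalize a vector to zero mean and unit variance (no learned scale or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)  # biased variance, as in LayerNorm
    return [(v - mean) / math.sqrt(var + eps) for v in x]


random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(768)]  # hypothetical hidden size

y_hf = layer_norm(x, eps=1e-5)   # eps used by the HF CLIP implementation
y_pr = layer_norm(x, eps=1e-12)  # eps used in this PR

# Largest element-wise gap between the two normalized vectors.
max_abs_diff = max(abs(a - b) for a, b in zip(y_hf, y_pr))
print(f"max abs difference: {max_abs_diff:.2e}")
```

For inputs with variance near 1, the relative change per element is roughly eps / (2 * var), which is why the gap stays small.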

Could you please give some instructions about how to use this? This PR has not been merged, so we cannot directly download your modification.

chuhac changed pull request status to closed
