Timm Models as AutoModel

#1
by mikehemberger - opened

Hi there,
Love your work 👍🏻
Are there any plans to integrate timm more into the hf framework and from_pretrained()?
I’m especially interested in those fine tuned models on the iNaturalist dataset.
All the best,
Mike

PyTorch Image Models org

@mikehemberger I'd say 'transformers framework', as timm and any other library that integrates with the hub is integrated with the broader HF ecosystem, but in their own ways. There are timm equivalents for from_pretrained(), etc., but I understand many users would prefer a consistent interface.

This has been discussed in the past; it would be possible to wrap timm models. Probably 90% coverage of functionality with a transformers-only wrapper and timm as-is, and I could prioritize adding some functionality on the timm side to close the gap and make timm-wrapped models in transformers almost indistinguishable from native transformers vision models. However, this was shot down in the past, so it'd be a matter of convincing the transformers maintainers. @lysandre @ArthurZ

Thank you for clarifying this @rwightman.
Indeed, I do appreciate the interface similarities and will continue to build with those. My code has been nagging me somehow: using hf-datasets, PyTorch for data loading (ok, off-topic), and timm for feature/embedding extraction alongside the AutoModel option.
While I expected that this is not a priority, I'm a bit puzzled about the lack of a long-term "solution" in the making.
Maybe @ArthurZ or @lysandre can share their thoughts or link to some previous discussion?
All the best,
M

Replacing the colon (":") in the string here would move in the right direction :)
(screenshot attached: IMG_4979.jpeg)

PyTorch Image Models org
edited Dec 5, 2023

If it's of help, I could plan to add functionality on the timm side to make it a bit more transformers-like...

  1. make a timm.AutoModel / AutoModelForImageClassification that'd wrap timm create_model and prepend the hf_hub: scheme (it's modeled after a URI), so it'd make timm.AutoModel.from_pretrained('timm/eva02...') possible. The builtin model configs in timm are code-based, not hub-based, so for backwards compat the hf_hub: scheme was added to specify sourcing from the hub. Aside from AutoModel, another method could also be added that assumes hub-based, like timm.from_pretrained()
  2. make a AutoImageProcessor that wraps the appropriate transforms
  3. make a Pipeline equivalent for feat extraction and image classification

I feel doing it transformers side so that timm is wrapped makes for a more seamless experience though. The above would also not address any of the model API differences, which I feel are more significant, like output_hidden_states, etc. That would require wrapping the model and, as mentioned, adding support in timm.

Thanks for giving me a better idea of the actual complexity of such a task.
I think point 1 is a bit too advanced for me. But I like idea number 2 and can see how this would smooth over some of the edges mentioned. Point 3 sounds great, too!

I didn't think about output_hidden_states yet. You're right there, too. Would it help if I bring this up on the timm/transformers GitHub? Maybe constrain the request to the transformers architectures? Let me know @rwightman

PyTorch Image Models org

2 would build on 1, 3 on both, etc. I think for now I could make a timm.from_pretrained() that mirrors AutoModel.from_pretrained() in behaviour. It's still fitting the timm design but making it a bit more 'Hub first' for those who don't care about builtin/offline configs (pretrained=False)

Mirroring Auto* interfaces in timm would be a bigger change, and I feel it makes more sense to do it transformers side by wrapping timm for a tighter integration, but this needs buy-in from the transformers team.

Is there an example of a wrapper for another library within transformers? I'd be happy to check it out and, if I can, draw some insights into how this could be achieved.
In the meantime, I think I can draw from my code to provide an illustration of those nagging interface differences.

PyTorch Image Models org

Someone had a sketch proposal here https://github.com/huggingface/transformers/issues/25282#issuecomment-1664960742 and looks like it got fleshed out into a full impl in their own codebase here https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/openvino/modeling_timm.py

Thanks! I will link that up :)
