---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
{}
---

# Vec2Vec GEO hg38

## Model Details

### Model Description

This is a Vec2Vec model that maps embedding vectors of natural language text to embedding vectors of BED files. It was trained on BED files and their natural language metadata from [GEO](https://www.ncbi.nlm.nih.gov/geo/). The natural language embeddings were produced by a [sentence-transformers](https://huggingface.co/sentence-transformers/all-MiniLM-L12-v2) model, and the BED files were embedded by a pretrained [Region2Vec](https://huggingface.co/databio/r2v-ChIP-atlas-hg38-v2) model.

- **Developed by:** Ziyang "Claude" Hu
- **Model type:** Vec2Vec
- **Genome assembly:** hg38

### Model Sources

- **Repository:** https://github.com/databio/geniml
- **Paper:** N/A

## Uses

This model can be used to search BED files with natural language query strings. In the search interface, the query string is first encoded by the same sentence-transformers model, and this Vec2Vec model then maps the resulting vector to the final query vector. The search returns the K BED files whose embedding vectors (produced by the same Region2Vec model) are closest to the final query vector.

The model is limited to hg38. It is not recommended for data from genome assemblies other than hg38.

## How to Get Started with the Model

You can download and load the model with the following code:

```python
from geniml.text2bednn import Vec2VecFNN

# Download and load the pretrained Vec2Vec model from the Hugging Face Hub
model = Vec2VecFNN("databio/v2v-geo-hg38")
```

## Training Details

### Training Data

TODO
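
## Search Example Sketch

The final step of the search described under Uses, picking the K BED files whose embeddings are closest to the query vector, can be sketched in plain Python. This is a minimal illustration, not the geniml search implementation. It assumes cosine similarity as the closeness measure (the card does not state the metric), and the file names, the 3-dimensional vectors, and the helper `top_k_bed_files` are hypothetical stand-ins for real Region2Vec embeddings and for a query vector already mapped by this Vec2Vec model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k_bed_files(query_vector, bed_embeddings, k):
    """Return the k (name, score) pairs most similar to the query vector.

    Hypothetical helper: bed_embeddings maps a BED file name to its
    embedding vector (in practice produced by Region2Vec).
    """
    scored = [
        (name, cosine_similarity(query_vector, vector))
        for name, vector in bed_embeddings.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy stand-ins (assumption): real embeddings have many more dimensions.
bed_embeddings = {
    "sample_a.bed": [1.0, 0.0, 0.0],
    "sample_b.bed": [0.0, 1.0, 0.0],
    "sample_c.bed": [0.7, 0.7, 0.0],
}

# Stand-in for the output of Vec2Vec applied to a text embedding.
query_vector = [0.9, 0.1, 0.0]

results = top_k_bed_files(query_vector, bed_embeddings, k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

In a real search, a vector index would usually replace the exhaustive loop, but the ranking idea is the same.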