|
# Multilingual ColBERT embeddings as a service
|
|
|
## Goal |
|
|
|
- Deploy [Antoine Louis](https://huggingface.co/antoinelouis)' [colbert-xm](https://huggingface.co/antoinelouis/colbert-xm) as an inference service: text(s) in, vector(s) out |
|
|
|
## Motivation |
|
|
|
- Use the service as the embedding component of a broader retrieval-augmented generation (RAG) solution
|
|
|
## Steps followed |
|
|
|
- Clone the original repo following [this procedure](https://huggingface.co/docs/hub/repositories-next-steps#how-to-duplicate-or-fork-a-repo-including-lfs-pointers) |
|
- Add a custom handler script as described [here](https://huggingface.co/docs/inference-endpoints/guides/custom_handler) |
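
A sketch of what such a `handler.py` can look like, following the Hugging Face custom-handler contract (a class named `EndpointHandler` with `__init__(path)` and `__call__(data)`). Loading the actual colbert-xm checkpoint is out of scope here, so a deterministic stub encoder stands in for the real model:

```python
# Sketch of a custom handler (handler.py). The class name and the
# __init__/__call__ signatures follow the Hugging Face custom-handler
# contract; the stub encoder below is a placeholder, NOT the real
# colbert-xm model.
from typing import Any, Dict, List
import hashlib


def _stub_encode(text: str, dim: int = 8) -> List[float]:
    """Stand-in for the ColBERT encoder: hash the text into a fixed-size
    vector. In the real handler, replace this with the model's embeddings."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]


class EndpointHandler:
    def __init__(self, path: str = "") -> None:
        # In the real handler, load the colbert-xm weights from `path` here,
        # so the (slow) model load happens once at container start-up.
        self.path = path

    def __call__(self, data: Dict[str, Any]) -> List[List[float]]:
        # Inference Endpoints send payloads of the form {"inputs": ...};
        # accept either a single string or a list of strings.
        inputs = data["inputs"]
        if isinstance(inputs, str):
            inputs = [inputs]
        return [_stub_encode(text) for text in inputs]
```

The "text(s) in, vector(s) out" shape of the service comes entirely from this `__call__` method: the endpoint framework handles HTTP and JSON, the handler only maps inputs to embeddings.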
|
|
|
## Local development and testing |
|
|
|
### Build and start the `hf_endpoints_emulator` Docker container
|
|
|
See [hf_endpoints_emulator](https://pypi.org/project/hf-endpoints-emulator/) |
|
|
|
```bash

docker-compose up -d --build

```
|
|
|
Starting the container can take a few moments, given the size of the model (> 3 GB)!
|
|
|
### How to test locally

Run the provided example scripts against the running emulator:

```bash

./embed_single_query.sh

./embed_two_chunks.sh

```
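
The script names suggest they send one query and two document chunks, respectively. The request bodies below are an assumption based on the standard `{"inputs": ...}` contract used by Inference Endpoints (the scripts' actual contents are not shown here):

```python
# Hypothetical payloads matching the two test scripts, assuming the
# standard Inference Endpoints request shape: {"inputs": ...} with a
# single string or a list of strings.
import json

# What embed_single_query.sh might send: one query, one vector back.
single_query = json.dumps({"inputs": "What is ColBERT-XM?"})

# What embed_two_chunks.sh might send: two chunks, two vectors back.
two_chunks = json.dumps(
    {"inputs": ["First chunk of text.", "Second chunk of text."]}
)

print(single_query)
print(two_chunks)
```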
|
|
|
Or run the unit tests inside the container:

```bash

docker-compose exec hf_endpoints_emulator pytest

```
|
|
|
### Check output

Follow the emulator logs to inspect incoming requests and any errors:

```bash

docker-compose logs --follow hf_endpoints_emulator

```
|
|