|
## Regex to remove IoU scores in `infer.json`
|
|
|
```json
},
"logits": {
    "iou_scores": [
        0.95166015625,
        0.94873046875,
        0.82177734375
    ]
}
```
|
|
|
```re
,\n\s*"logits": \{\n\s*"iou_scores":\s*\[\n\s*([\d.]+)\s*,\n\s*([\d.]+)\s*,\n\s*([\d.]+)\n\s*\]\n\s*\}
```
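
A minimal sketch of applying the pattern with Python's `re` module; the file path and the in-place rewrite are assumptions:

```python
import re

# Same pattern as above; it matches the trailing `"logits"` object together
# with its leading comma, so the remaining JSON stays valid after removal.
PATTERN = re.compile(
    r',\n\s*"logits": \{\n\s*"iou_scores":\s*\[\n'
    r'\s*([\d.]+)\s*,\n\s*([\d.]+)\s*,\n\s*([\d.]+)\n\s*\]\n\s*\}'
)

with open("infer.json") as f:  # hypothetical path
    text = f.read()

with open("infer.json", "w") as f:
    f.write(PATTERN.sub("", text))
```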
|
|
|
## List of captioner models |
|
|
|
- Salesforce/blip-image-captioning-large
- Salesforce/blip-image-captioning-base

- Salesforce/blip2-opt-2.7b
- Salesforce/blip2-opt-6.7b-coco
- Salesforce/blip2-opt-6.7b
- Salesforce/blip2-opt-2.7b-coco
|
|
|
<!-- Need prompts --> |
|
<!-- Salesforce/instructblip-vicuna-7b --> |
|
<!-- Salesforce/instructblip-vicuna-13b --> |
|
|
|
- microsoft/git-large-coco
- microsoft/git-large-textcaps
- microsoft/git-base
- microsoft/git-base-coco
- microsoft/git-base-textcaps
- microsoft/git-large
- microsoft/git-large-r
- microsoft/git-large-r-coco
- microsoft/git-large-r-textcaps
|
|
|
<!-- No official code --> |
|
<!-- laion/mscoco_finetuned_CoCa-ViT-L-14-laion2B-s13B-b90k --> |
|
<!-- laion/CoCa-ViT-B-32-laion2B-s13B-b90k --> |
|
<!-- laion/CoCa-ViT-L-14-laion2B-s13B-b90k --> |
|
<!-- laion/mscoco_finetuned_CoCa-ViT-B-32-laion2B-s13B-b90k --> |
|
|
|
Run inference with each BLIP-2 captioner:

```shell
for model in \
    Salesforce/blip2-opt-2.7b \
    Salesforce/blip2-opt-2.7b-coco \
    Salesforce/blip2-opt-6.7b \
    Salesforce/blip2-opt-6.7b-coco
do
    python \
        -m src.train \
        train_data='[vg-densecap-local]' eval_data='[vg-densecap-local]' \
        +model=base_sam_captioner \
        training.do_train=False \
        training.do_eval=False \
        training.do_inference=True \
        +data.streaming=False \
        training.fp16=True \
        training.output_dir=tmp/sam_captioner/$model \
        training.dataloader_num_workers=4 \
        model.captioner_model_name_or_path=$model
done
```
|
|
|
## The batch generation process of language models
|
|
|
The entry point is `transformers/generation/utils.py:GenerationMixin.generate`.
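
A minimal sketch of batched generation with a decoder-only model (the model and prompts are illustrative); left padding is required so the last position of every row is a real token rather than padding:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
tokenizer.padding_side = "left"  # decoder-only models must be left-padded for batching

model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer(["a photo of", "the region shows a"], return_tensors="pt", padding=True)
outputs = model.generate(**inputs, max_new_tokens=20, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```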
|
|
|
|
|
|
|
## Chunkified inference
|
|
|
The region chunk size is set to 16 unless noted otherwise. A sketch of the chunking idea follows the table.
|
|
|
| SAM Model | Captioner                              | fp16 | Region chunk size | Memory (GB) | Speed (s/it) |
| --------- | -------------------------------------- | ---- | ----------------- | ----------- | ------------ |
| ViT-huge  | Salesforce/blip-image-captioning-base  | Yes  | 16                | ~ 9         | ~ 5.02       |
| ViT-huge  | Salesforce/blip-image-captioning-base  | No   | 16                | ~ 8         | ~ 8.29       |
| ViT-huge  | Salesforce/blip-image-captioning-large | Yes  | 16                | ~ 10        | ~ 6.28       |
| ViT-huge  | Salesforce/blip-image-captioning-large | No   | 16                | ~ 9.7       | ~ 14.99      |
| ViT-huge  | Salesforce/blip2-opt-2.7b              | Yes  | 16                | ~ 34        | ~ 5.82       |
| ViT-huge  | Salesforce/blip2-opt-2.7b              | No   | 16                | ~ 32        | ~ 18.19      |
| ViT-huge  | Salesforce/blip2-opt-2.7b              | Yes  | 4                 | ~ 34        | ~ 11.56      |
| ViT-huge  | microsoft/git-large-coco               | Yes  | 16                | ~ 14        | ~ 7.06       |
| ViT-huge  | microsoft/git-base-coco                | Yes  | 16                | ~ 12        | ~ 3.26       |
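
A minimal sketch of the idea, assuming a per-chunk `captioner` callable (hypothetical names):

```python
def chunked_captions(regions, captioner, chunk_size=16):
    """Caption the regions of one image in fixed-size chunks so that peak
    GPU memory is bounded by chunk_size rather than by the region count."""
    captions = []
    for start in range(0, len(regions), chunk_size):
        chunk = regions[start:start + chunk_size]
        captions.extend(captioner(chunk))  # hypothetical per-chunk captioner call
    return captions
```

Note from the table that shrinking the chunk from 16 to 4 for blip2-opt-2.7b roughly doubled the time per iteration (~ 5.82 to ~ 11.56 s/it) without reducing peak memory in this measurement.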
|
|
|
## Bugs in SAM batch inference with `transformers<=4.30.2`
|
|
|
Remember to update `requirements.txt` accordingly; otherwise we must always set `batch_size=1`.
|
|
|
Here is the fixing PR, which was merged after version 4.30.2: https://github.com/huggingface/transformers/pull/25074
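
A hedged sketch of the corresponding upgrade (the floor version is an assumption; verify that PR #25074 is actually contained in the installed release):

```shell
# upgrade past 4.30.2 so the SAM batch-inference fix is included
pip install --upgrade "transformers>4.30.2"
```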
|
|
|
## Debug the distributed training |
|
|
|
Inside the trainer, we can drop into the main process with:
|
|
|
```python
if args.local_process_index == 0:
    breakpoint()
torch.distributed.barrier()
# the problematic line
labels_host = labels if labels_host is None else nested_concat(labels_host, labels, padding_index=-100)
```
|
|
|
`try`/`except` does not trigger the pdb interface:
|
|
|
```python
try:
    # the problematic line
    labels_host = labels if labels_host is None else nested_concat(labels_host, labels, padding_index=-100)
except Exception as e:  # the original `except Error` referenced an undefined name
    if args.local_process_index == 0:
        breakpoint()
finally:
    torch.distributed.barrier()
```
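
A hedged alternative sketch: register a process-level excepthook so an uncaught exception drops rank 0 into post-mortem pdb. This assumes each rank runs as its own Python process (e.g. under `torchrun`) and that `args` is the trainer's arguments object as above; other ranks simply re-raise:

```python
import pdb
import sys

def rank0_post_mortem(exc_type, exc_value, tb):
    # `args` comes from the surrounding trainer code (an assumption here)
    if args.local_process_index == 0:
        pdb.post_mortem(tb)
    else:
        sys.__excepthook__(exc_type, exc_value, tb)

sys.excepthook = rank0_post_mortem
```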
|
|
|
## Amulet maintains wrong information about the number of GPUs on T4 instances
|
|
|
Singularity maintains wrong T4 instance information: `Standard_NC{4,8,16,32}as_T4_v3` actually have 1, 1, 1, and 2 GPUs respectively, but they are shown as having 1, 2, 4, and 4 GPUs.
|
|
|
In `amlt/helpers/sing_instances.py`, we add the code below:
|
|
|
```python
# add at line 377, in amlt/helpers/sing_instances.py:fetch_instances_for_series
# NOTE(xiaoke): fix the wrong number of GPUs reported for T4 instances.
# `accelerator`, `instance_name`, `description`, and `match` come from the
# enclosing function.
if accelerator == "T4":
    # instance name -> [reported GPU count, actual GPU count]
    instance_name_to_num_gpu = {
        "NC8as_T4_v3": ["2", "1"],
        "NC16as_T4_v3": ["4", "1"],
        "NC32as_T4_v3": ["4", "2"],
    }
    if instance_name in instance_name_to_num_gpu:
        wrong, right = instance_name_to_num_gpu[instance_name]
        description = description.replace(f"GPU x {wrong}", f"GPU x {right}")
        info = re.search(match, description)
```
|
|
|
Note that we need to print the instance out explicitly: sometimes we ask for 4 GPUs but only get 1.
|
|
|
```python
# add at line 422, in amlt/client/sing_client.py:_setup_script_run_config
print(f"instance: {job.sku.instance}, sku: {job.sku}")
```
|
|
|
How to check: |
|
|
|
```shell
amlt cache instance-types
amlt cache instance-types -s NCast4v3
```
|
|
|
## Debug the commands generated by amlt |
|
|
|
```shell
amlt show EXP JOB
```
|
|
|
(deprecated) Alternatively, patch `create_context` to copy the generated job directory out for inspection:
|
```python
# /anaconda/envs/sca-v2/lib/python3.9/site-packages/amlt/client/aml_client.py:create_context
# At the end of this function; assumes `os` and `shutil` are imported in the module.
inspect_amlt_job_dir = "tmp/amlt_job/"
try:
    print(f"Copy code from {code_resource.remote_dir} to {temp_dir}.")
    if os.path.exists(inspect_amlt_job_dir):
        shutil.rmtree(inspect_amlt_job_dir, ignore_errors=True)
    shutil.copytree(temp_dir, inspect_amlt_job_dir)
except Exception as e:
    print(f"Cannot copy code from {temp_dir} to {inspect_amlt_job_dir} due to {e}")
yield temp_dir
```
|
|
|
## Test tokenizer |
|
|
|
|
|
```python
# Compare how GPT-2 (byte-level BPE, fast tokenizer) and OpenLLaMA
# (SentencePiece, slow tokenizer) tokenize case variants of the same word.
from transformers import AutoProcessor

gpt2_large_tokenizer_cfg = dict(
    pretrained_model_name_or_path="gpt2-large",
    use_fast=True)

openllama_tokenizer_cfg = dict(
    pretrained_model_name_or_path="openlm-research/open_llama_3b_v2",
    use_fast=False)

def print_func(tokenizer, list_of_str):
    print(f"{list_of_str}: {tokenizer(list_of_str)['input_ids']}")

tokenizer = AutoProcessor.from_pretrained(**gpt2_large_tokenizer_cfg)
print_func(tokenizer, ["car", "Car", "CAR"])
print_func(tokenizer, ["tokenizer", "Tokenizer", "TOKENIZER"])

tokenizer = AutoProcessor.from_pretrained(**openllama_tokenizer_cfg)
print_func(tokenizer, ["car", "Car", "CAR"])
print_func(tokenizer, ["tokenizer", "Tokenizer", "TOKENIZER"])
```