Regex to remove `iou_scores` in infer.json

The target blocks in infer.json look like this:

```json
},
"logits": {
    "iou_scores": [
        0.95166015625,
        0.94873046875,
        0.82177734375
    ]
}
```

They can be matched, including the leading comma, with:

```
,\n\s*"logits": \{\n\s*"iou_scores":\s*\[\n\s*([\d.]+)\s*,\n\s*([\d.]+)\s*,\n\s*([\d.]+)\n\s*\]\n\s*\}
```
List of captioner models
- Salesforce/blip-image-captioning-large
- Salesforce/blip-image-captioning-base
- Salesforce/blip2-opt-2.7b
- Salesforce/blip2-opt-6.7b-coco
- Salesforce/blip2-opt-6.7b
- Salesforce/blip2-opt-2.7b-coco
- microsoft/git-large-coco
- microsoft/git-large-textcaps
- microsoft/git-base
- microsoft/git-base-coco
- microsoft/git-base-textcaps
- microsoft/git-large
- microsoft/git-large-r
- microsoft/git-large-r-coco
- microsoft/git-large-r-textcaps
```bash
for model in \
    Salesforce/blip2-opt-2.7b \
    Salesforce/blip2-opt-2.7b-coco \
    Salesforce/blip2-opt-6.7b \
    Salesforce/blip2-opt-6.7b-coco
do
    python \
        -m src.train \
        train_data='[vg-densecap-local]' eval_data='[vg-densecap-local]' \
        +model=base_sam_captioner \
        training.do_train=False \
        training.do_eval=False \
        training.do_inference=True \
        +data.streaming=False \
        training.fp16=True \
        training.output_dir=tmp/sam_captioner/$model \
        training.dataloader_num_workers=4 \
        model.captioner_model_name_or_path=$model
done
```
The batch generation process of the language model

Batched decoding is implemented in `transformers/generation/utils.py:GenerationMixin:generate`.
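A minimal sketch of batched generation through `generate`, for reference; the `gpt2` checkpoint and the prompts are placeholders rather than this project's models. Decoder-only models need left padding so that generated tokens continue each prompt:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Decoder-only models must be padded on the left so generation continues the prompt.
tokenizer.padding_side = "left"
tokenizer.pad_token = tokenizer.eos_token

inputs = tokenizer(["a photo of", "two dogs playing"], return_tensors="pt", padding=True)
# GenerationMixin.generate runs the batched decoding loop internally.
output_ids = model.generate(**inputs, max_new_tokens=20, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True))
```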
Chunkified inference

The region chunk size is set to 16 in the runs below; a sketch of the chunking pattern follows the table.
| SAM model | Captioner | fp16 | Region chunk size | Memory (GB) | Speed (s/it) |
|---|---|---|---|---|---|
| ViT-huge | Salesforce/blip-image-captioning-base | Yes | 16 | ~9 | ~5.02 |
| ViT-huge | Salesforce/blip-image-captioning-base | No | 16 | ~8 | ~8.29 |
| ViT-huge | Salesforce/blip-image-captioning-large | Yes | 16 | ~10 | ~6.28 |
| ViT-huge | Salesforce/blip-image-captioning-large | No | 16 | ~9.7 | ~14.99 |
| ViT-huge | Salesforce/blip2-opt-2.7b | Yes | 16 | ~34 | ~5.82 |
| ViT-huge | Salesforce/blip2-opt-2.7b | No | 16 | ~32 | ~18.19 |
| ViT-huge | Salesforce/blip2-opt-2.7b | Yes | 4 | ~34 | ~11.56 |
| ViT-huge | microsoft/git-large-coco | Yes | 16 | ~14 | ~7.06 |
| ViT-huge | microsoft/git-base-coco | Yes | 16 | ~12 | ~3.26 |
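A minimal sketch of the chunking pattern, assuming `regions` is a batch of region tensors and `captioner` is a callable captioning model; both names are illustrative rather than this repo's API:

```python
import torch

def caption_in_chunks(captioner, regions: torch.Tensor, chunk_size: int = 16):
    """Run the captioner over the regions chunk by chunk to bound peak GPU memory."""
    captions = []
    for chunk in torch.split(regions, chunk_size, dim=0):
        captions.extend(captioner(chunk))
    return captions
```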
Bugs in SAM batch inference when transformers<=4.30.2

SAM batch inference is broken in transformers<=4.30.2. Remember to update the requirements.txt file accordingly; otherwise we must always set batch_size=1. Here is the fixing PR, which was merged after version 4.30.2 was released: https://github.com/huggingface/transformers/pull/25074
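A possible requirements.txt line; since the notes only say the fix was merged after 4.30.2, check which release first includes the PR before pinning an exact version:

```
# Require a release newer than the broken 4.30.2; confirm the first
# release that actually ships the fix before pinning an exact version.
transformers>4.30.2
```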
Debug the distributed training

Inside the trainer, we can drop only the main process into pdb while the other ranks wait at a barrier:

```python
# Only rank 0 enters pdb; the remaining ranks block at the barrier until it resumes.
if args.local_process_index == 0:
    breakpoint()
torch.distributed.barrier()

# the problematic line
labels_host = labels if labels_host is None else nested_concat(labels_host, labels, padding_index=-100)
```
A crash on its own does not drop us into the pdb interface, so wrap the problematic line in a try/except and call breakpoint() there (note: catch Exception, since Error is not a Python builtin):

```python
try:
    # the problematic line
    labels_host = labels if labels_host is None else nested_concat(labels_host, labels, padding_index=-100)
except Exception as e:
    if args.local_process_index == 0:
        breakpoint()
finally:
    torch.distributed.barrier()
```
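The pattern can be factored into a small helper; a sketch with a name of our choosing, assuming torch.distributed is already initialized:

```python
import torch.distributed as dist

def rank_breakpoint(rank: int = 0):
    """Drop only `rank` into pdb; the remaining ranks wait at the barrier."""
    if dist.get_rank() == rank:
        breakpoint()
    dist.barrier()
```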
Amulet keeps wrong information about the number of GPUs on T4 instances

The wrong T4 instance information is maintained by Singularity: Standard_NC{4,8,16,32}as_T4_v3 actually have 1, 1, 1, and 2 GPUs respectively, but they are shown as having 1, 2, 4, and 4 GPUs. In amlt/helpers/sing_instances.py, we add the code below:
```python
# Add at line 377, in amlt/helpers/sing_instances.py:fetch_instances_for_series;
# `accelerator`, `instance_name`, `description`, and `match` come from the surrounding function.
# NOTE(xiaoke): Fix T4 wrong number of GPU
if accelerator == "T4":
    # instance name -> [GPU count as reported, actual GPU count]
    instance_name_to_num_gpu = {
        "NC8as_T4_v3": ["2", "1"],
        "NC16as_T4_v3": ["4", "1"],
        "NC32as_T4_v3": ["4", "2"],
    }
    if instance_name in instance_name_to_num_gpu:
        reported, actual = instance_name_to_num_gpu[instance_name]
        description = description.replace(f"GPU x {reported}", f"GPU x {actual}")
        info = re.search(match, description)
```
Note that we need to print the resolved instance out explicitly; sometimes we ask for 4 cards but only get 1:

```python
# Add at line 422, in amlt/client/sing_client.py:_setup_script_run_config
print(f"instance: {job.sku.instance}, sku: {job.sku}")
```
How to check:

```bash
amlt cache instance-types
amlt cache instance-types -s NCast4v3
```
Debug the commands generated by amlt

```bash
amlt show EXP JOB
```

The manual patch below is deprecated:
```python
# /anaconda/envs/sca-v2/lib/python3.9/site-packages/amlt/client/aml_client.py:create_context
# At the end of this function, copy the staged job code out for inspection.
inspect_amlt_job_dir = "tmp/amlt_job/"
try:
    print(f"Copy code from {code_resource.remote_dir} to {temp_dir}.")
    if os.path.exists(inspect_amlt_job_dir):
        shutil.rmtree(inspect_amlt_job_dir, ignore_errors=True)
    shutil.copytree(temp_dir, inspect_amlt_job_dir)
except Exception as e:
    print(f"Cannot copy code from {temp_dir} to {inspect_amlt_job_dir} due to {e}")
yield temp_dir
```
Test tokenizer

Check how different tokenizers split case variants of the same word:

```python
from transformers import AutoProcessor

# AutoProcessor falls back to the tokenizer for text-only checkpoints.
gpt2_large_tokenizer_cfg = dict(
    pretrained_model_name_or_path="gpt2-large",
    use_fast=True,
)
openllama_tokenizer_cfg = dict(
    pretrained_model_name_or_path="openlm-research/open_llama_3b_v2",
    use_fast=False,
)

def print_func(tokenizer, list_of_str):
    print(f"{list_of_str}: {tokenizer(list_of_str)['input_ids']}")

tokenizer = AutoProcessor.from_pretrained(**gpt2_large_tokenizer_cfg)
print_func(tokenizer, ["car", "Car", "CAR"])
print_func(tokenizer, ["tokenizer", "Tokenizer", "TOKENIZER"])

tokenizer = AutoProcessor.from_pretrained(**openllama_tokenizer_cfg)
print_func(tokenizer, ["car", "Car", "CAR"])
print_func(tokenizer, ["tokenizer", "Tokenizer", "TOKENIZER"])
```