---
license: cc-by-nc-4.0
base_model: MCG-NJU/videomae-base
tags:
- generated_from_trainer
- vandalism
- video-classification
- ucf-crime
- vandalism-dectection
- videomae
metrics:
- accuracy
model-index:
- name: videomae-base-finetuned-ucfcrime-full2
  results: []
---

	# videomae-base-finetuned-ucfcrime-full2

This model is a fine-tuned version of [MCG-NJU/videomae-base](https://huggingface.co/MCG-NJU/videomae-base) on the [UCF-CRIME](https://paperswithcode.com/dataset/ucf-crime) dataset.
	It achieves the following results on the evaluation set:
	- Loss: 2.5014
	- Accuracy: 0.225

	## Model description

	More information needed

	## Intended uses & limitations

	Usage:
```python
import av
import torch
import numpy as np

from transformers import AutoImageProcessor, VideoMAEForVideoClassification
from huggingface_hub import hf_hub_download

# Fix the seed so the randomly sampled clip is reproducible.
np.random.seed(0)


def read_video_pyav(container, indices):
    '''
    Decode the video with PyAV decoder.
    Args:
        container (`av.container.input.InputContainer`): PyAV container.
        indices (`List[int]`): List of frame indices to decode.
    Returns:
        result (np.ndarray): np array of decoded frames of shape (num_frames, height, width, 3).
    '''
    frames = []
    container.seek(0)
    start_index = indices[0]
    end_index = indices[-1]
    for i, frame in enumerate(container.decode(video=0)):
        # Stop decoding once the last requested frame has been passed.
        if i > end_index:
            break
        if i >= start_index and i in indices:
            frames.append(frame)
    return np.stack([x.to_ndarray(format="rgb24") for x in frames])


def sample_frame_indices(clip_len, frame_sample_rate, seg_len):
    '''
    Sample a given number of frame indices from the video.
    Args:
        clip_len (`int`): Total number of frames to sample.
        frame_sample_rate (`int`): Sample every n-th frame.
        seg_len (`int`): Maximum allowed index of sample's last frame.
    Returns:
        indices (`List[int]`): List of sampled frame indices
    '''
    # Span of source frames covered by the clip; seg_len must exceed it.
    converted_len = int(clip_len * frame_sample_rate)
    end_idx = np.random.randint(converted_len, seg_len)
    start_idx = end_idx - converted_len
    indices = np.linspace(start_idx, end_idx, num=clip_len)
    indices = np.clip(indices, start_idx, end_idx - 1).astype(np.int64)
    return indices


# video clip consists of 300 frames (10 seconds at 30 FPS)
file_path = hf_hub_download(
    repo_id="nielsr/video-demo", filename="eating_spaghetti.mp4", repo_type="dataset"
)
container = av.open(file_path)

# sample 16 frames
indices = sample_frame_indices(clip_len=16, frame_sample_rate=1, seg_len=container.streams.video[0].frames)
video = read_video_pyav(container, indices)

# Replace the name below with the full Hub repo id (`<namespace>/<model-name>`)
# or with the path to a local checkpoint directory.
image_processor = AutoImageProcessor.from_pretrained("videomae-base-finetuned-ucfcrime-full")
model = VideoMAEForVideoClassification.from_pretrained("videomae-base-finetuned-ucfcrime-full")

inputs = image_processor(list(video), return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

# model predicts one of the 13 ucf-crime classes
predicted_label = logits.argmax(-1).item()
print(model.config.id2label[predicted_label])
```
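With an evaluation accuracy of about 0.225, the top-1 label is wrong for most evaluation clips, so it can help to look at several candidate classes and their probabilities instead of only the arg-max. The following is a minimal sketch and not part of the original card: `top_k_predictions` is a hypothetical helper, and the logits and label names in the demo call are placeholders.

```python
import math


def top_k_predictions(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs for one clip."""
    # Numerically stable softmax: subtract the max logit before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Placeholder logits and labels, for illustration only.
demo_labels = {0: "Vandalism", 1: "Robbery", 2: "Normal"}
demo = top_k_predictions([2.0, 0.5, 1.0], demo_labels, k=2)
for label, prob in demo:
    print(f"{label}: {prob:.3f}")
```

With the usage snippet above, the real inputs would be `logits[0].tolist()` and `model.config.id2label`.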
	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-05
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_ratio: 0.1
	- training_steps: 700

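To make the schedule concrete: with `training_steps: 700` and `lr_scheduler_warmup_ratio: 0.1`, a linear scheduler warms up over the first 70 steps to the peak rate of 5e-05 and then decays linearly to zero at step 700. The sketch below recomputes that curve in plain Python. `lr_at_step` is an illustrative helper, and it assumes the usual linear-warmup, linear-decay rule rather than reading the Trainer's actual state.

```python
import math

PEAK_LR = 5e-05      # learning_rate
TOTAL_STEPS = 700    # training_steps
WARMUP_RATIO = 0.1   # lr_scheduler_warmup_ratio
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)  # 70 warmup steps


def lr_at_step(step):
    """Learning rate at a given optimizer step (assumed linear warmup and decay)."""
    if step < WARMUP_STEPS:
        # Warmup: ramp linearly from 0 up to the peak learning rate.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Decay: ramp linearly from the peak down to 0 at the final step.
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


for step in (0, 35, 70, 385, 700):
    print(f"step {step:3d}: lr = {lr_at_step(step):.2e}")
```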
	### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 2.5836        | 0.13  | 88   | 2.4944          | 0.2080   |
| 2.3212        | 1.13  | 176  | 2.5855          | 0.1773   |
| 2.2333        | 2.13  | 264  | 2.6270          | 0.1046   |
| 1.985         | 3.13  | 352  | 2.4058          | 0.2109   |
| 2.194         | 4.13  | 440  | 2.3654          | 0.2235   |
| 1.9796        | 5.13  | 528  | 2.2609          | 0.2235   |
| 1.8786        | 6.13  | 616  | 2.2725          | 0.2341   |
| 1.71          | 7.12  | 700  | 2.2228          | 0.2226   |

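The evaluation log can also be inspected programmatically, for example to see which step had the highest accuracy and which had the lowest validation loss. This is a small illustrative sketch: the rows are copied from the training results table, and the variable names are hypothetical.

```python
# (step, validation_loss, accuracy) rows copied from the training results table.
EVAL_LOG = [
    (88, 2.4944, 0.2080),
    (176, 2.5855, 0.1773),
    (264, 2.6270, 0.1046),
    (352, 2.4058, 0.2109),
    (440, 2.3654, 0.2235),
    (528, 2.2609, 0.2235),
    (616, 2.2725, 0.2341),
    (700, 2.2228, 0.2226),
]

# Pick the evaluation step that is best under each criterion.
best_by_accuracy = max(EVAL_LOG, key=lambda row: row[2])
best_by_loss = min(EVAL_LOG, key=lambda row: row[1])

print(f"highest accuracy: step {best_by_accuracy[0]} ({best_by_accuracy[2]:.4f})")
print(f"lowest validation loss: step {best_by_loss[0]} ({best_by_loss[1]:.4f})")
```

On this log the two criteria disagree: accuracy peaks at step 616 (0.2341), while validation loss is lowest at step 700 (2.2228).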

	### Framework versions

- Transformers 4.38.1
- PyTorch 2.1.2
- Datasets 2.1.0
- Tokenizers 0.15.2