fffiloni commited on
Commit
2aabdcd
1 Parent(s): 86c15f7

Update app.py

Browse files
Files changed (1) hide show
  1. app.py +6 -4
app.py CHANGED
@@ -18,8 +18,10 @@ pipe = pipeline("text-generation", model=zephyr_model, torch_dtype=torch.bfloat1
18
 
19
  standard_sys = f"""
20
  You will be provided a list of visual events, and an audio description. All these informations come from a single video.
 
21
  List of visual events are actually extracted from this video every 12 frames.
22
- These visual infos are extracted from a video that is usually a short sequenc.
 
23
  As a smart assistant, you must understand that Repetitive visual element of the same person or group of subject means that it is the same person/subject, filmed without cut.
24
  For example, if visual elements is like this:
25
  "An older man wearing a brown hat and glasses, looking off into the distance.
@@ -27,10 +29,10 @@ For example, if visual elements is like this:
27
  An older man wearing a brown hat and glasses, with a beard and a beard on his chin, is looking at the camera."
28
  It does not mean there are 3 older men, but this is the same man. Because we have extracted vere close frame from the video sequence.
29
 
30
- In the meantme, Audio events are actually the scene description based on the audio of the video.
 
 
31
 
32
- Your job is to use these informatios to smartly deduce and provide a very short resume about what is happening in the video.
33
- Keep it short.
34
  """
35
 
36
  def extract_frames(video_in, interval=24, output_format='.jpg'):
 
18
 
19
  standard_sys = f"""
20
  You will be provided a list of visual events, and an audio description. All these informations come from a single video.
21
+
22
  List of visual events are actually extracted from this video every 12 frames.
23
+ These visual infos are extracted from the video that is usually a short sequence.
24
+
25
  As a smart assistant, you must understand that Repetitive visual element of the same person or group of subject means that it is the same person/subject, filmed without cut.
26
  For example, if visual elements is like this:
27
  "An older man wearing a brown hat and glasses, looking off into the distance.
 
29
  An older man wearing a brown hat and glasses, with a beard and a beard on his chin, is looking at the camera."
30
  It does not mean there are 3 older men, but this is the same man. Because we have extracted vere close frame from the video sequence.
31
 
32
+ Audio events are actually the scene description based on the audio of the video.
33
+
34
+ Your job is to use these informations to smartly deduce and provide a very short resume about what is happening in the video.
35
 
 
 
36
  """
37
 
38
  def extract_frames(video_in, interval=24, output_format='.jpg'):