fffiloni committed on
Commit 0459645 · verified · 1 Parent(s): 78d95ce
Files changed (1)
  1. app.py +1 -1
app.py CHANGED
@@ -22,7 +22,7 @@ You will be provided a list of visual details observed at regular intervals, alo
 
 Please note that the following list of image descriptions (visual details) was obtained by extracting individual frames from a continuous video featuring one or more subjects. Depending on the case, all depicted individuals may correspond to the same person(s), with minor variations due to changes in lighting, angle, and facial expressions over time. Regardless, assume temporal continuity among the frames unless otherwise specified.
 
-Audio events are actual recordings from the video, representing sounds and spoken words independent of the visuals. While audio events offer rich context and background information, elucidating the environment and ambient noises, the visual representation tends to focus mainly on the primary subjects. Despite the high likelihood of alignment, there might be rare occasions where audio information doesn't precisely match the visual aspect. In such circumstances, prioritize visual evidence, and cautiously incorporate seemingly incongruous auditory clues into your summary. Exercise vigilance when reconciling conflicts and maintain a strong commitment to fidelity in generating a comprehensive overview. Your job is to integrate these multimodal inputs intelligently and provide a very short resume about what is happening in the origin video. Provide a succinct yet thorough overview of what you understood.
+Audio events are actual recordings from the video, representing sounds and spoken words independent of the visuals. Although audio information offers valuable context and can reveal actions or sounds unseen visually, there might be instances where audio information doesn't align perfectly with the visual counterpart. Prioritize visual evidence and exercise caution when incorporating seemingly incongruous auditory clues into your summary. Maintain a healthy skepticism and attempt to reconcile conflicting cues before crafting a comprehensive overview. Your job is to integrate these multimodal inputs intelligently and provide a very short resume about what is happening in the origin video. Provide a succinct overview of what you understood.
 """
 
 def extract_frames(video_in, interval=24, output_format='.jpg'):
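
Note on the trailing context line: the hunk shows only the signature of extract_frames, not its body. The sketch below illustrates what sampling with interval=24 could look like, assuming the parameter means "keep one frame out of every `interval` frames". That reading is an assumption, and select_frame_indices and frame_filename are hypothetical helpers that do not appear in app.py.

```python
from __future__ import annotations


def select_frame_indices(total_frames: int, interval: int = 24) -> list[int]:
    """Return the indices of the frames kept when sampling every `interval`-th frame.

    Hypothetical helper: assumes `interval` counts frames between kept samples.
    """
    if interval <= 0:
        raise ValueError("interval must be a positive integer")
    # range() yields nothing when total_frames is zero or negative.
    return list(range(0, total_frames, interval))


def frame_filename(index: int, output_format: str = ".jpg") -> str:
    """Build a zero-padded file name for a kept frame (naming scheme is illustrative)."""
    return f"frame_{index:06d}{output_format}"


if __name__ == "__main__":
    for i in select_frame_indices(100, interval=24):
        print(frame_filename(i))
```

Under this reading, a 24 fps clip would contribute roughly one described frame per second, which matches the prompt's mention of visual details "observed at regular intervals".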