Incorrect Frame Number Despite Correct Frame Image in generate_video_frame()
Issue Description
When generating frames in the generate_video_frame function using OpenCV's cv2.VideoCapture, the reported VideoFrame.frame_number does not always correspond to the actual frame shown in the image. This results in mismatches between detected slide transitions and their true position in the video timeline.
Steps to Reproduce
- Use the lecture video dbwt1_01.mp4 and the corresponding PDF slide.
- Run detect_slide_transitions().
- Set frame_step to 1 when using generate_video_frame() to get an accurate result.
- When the second slide is detected, log the reported VideoFrame.frame_number and visualize the VideoFrame.full_frame using cv2.imshow(...) or show_image_resized(...).
- Compare the reported frame number to a ground-truth position identified in a frame-accurate tool such as DaVinci Resolve.
Expected Behavior
- The VideoFrame.frame_number should match the exact frame index at which the image was retrieved.
- For example, the second slide should be detected at frame 5281, and the content of the frame should visually match the correct slide.
Actual Behavior
- The frame number is incorrectly reported (e.g. 4545 for dbwt1_01), while the image itself is correct and clearly shows the second slide.
- This discrepancy leads to incorrect timestamping, even though the image is correct.
Root Cause (Hypothesized)
This issue likely arises due to how OpenCV handles frame seeking internally:
- When calling video_capture.set(cv2.CAP_PROP_POS_FRAMES, N), OpenCV does not guarantee that frame N is returned.
- Instead, it seeks to the nearest preceding keyframe, then decodes forward until it reaches the next displayable frame.
- The decoded image may correspond to the requested frame visually, but the reported position via .get(cv2.CAP_PROP_POS_FRAMES) may reflect the last keyframe index, not the actual frame.
This causes a mismatch between what is shown and what is reported.
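One way to probe this hypothesis is to seek to a frame, ask OpenCV which position it reports, and compare that with the requested index. The following is a minimal sketch, not project code: probe_seek and main are hypothetical names, and the default frame 5281 is simply the ground-truth value quoted above.

```python
def probe_seek(capture, requested, pos_frames_prop):
    """Seek to `requested` and return (requested, reported, read_ok).

    `capture` is expected to behave like cv2.VideoCapture, and
    `pos_frames_prop` should be cv2.CAP_PROP_POS_FRAMES.
    """
    capture.set(pos_frames_prop, requested)
    # Position OpenCV reports for the frame the next read() will return.
    reported = int(capture.get(pos_frames_prop))
    read_ok, _image = capture.read()
    return requested, reported, read_ok


def main(path="dbwt1_01.mp4", requested=5281):
    # Imported here so probe_seek itself has no OpenCV dependency.
    import cv2

    capture = cv2.VideoCapture(path)
    try:
        req, rep, ok = probe_seek(capture, requested, cv2.CAP_PROP_POS_FRAMES)
        print(f"requested={req} reported={rep} read_ok={ok} match={req == rep}")
    finally:
        capture.release()
```

A matching position only shows that the reported index is consistent with the request. Whether the decoded image is the right one still has to be checked visually or against a frame-accurate tool.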
Potential Solutions
1. Avoid Frame Seeking with .set(...) in Loops
- OpenCV’s video_capture.set(cv2.CAP_PROP_POS_FRAMES, N) does not guarantee accurate frame access.
- A more reliable approach may be to iterate over frames sequentially and manually skip frames using repeated .read() calls. This avoids decoder inconsistencies and improves frame number tracking.
- This may be slower because each frame must be read in sequence, which prevents skipping directly to a specific one.
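The sequential approach could look roughly like the sketch below. iter_frames is a hypothetical helper, not the existing generate_video_frame(), and it assumes the capture is freshly opened so that the first read() returns frame 0. The frame number comes from a local counter and never from OpenCV's reported position.

```python
def iter_frames(capture, frame_step=1):
    """Yield (frame_number, image) by reading sequentially, never seeking.

    `capture` is expected to behave like a freshly opened cv2.VideoCapture.
    """
    if frame_step < 1:
        raise ValueError("frame_step must be >= 1")
    frame_number = 0
    while True:
        ok, image = capture.read()
        if not ok:
            # End of stream or a decode failure: stop iterating.
            break
        if frame_number % frame_step == 0:
            yield frame_number, image
        frame_number += 1


def main(path="dbwt1_01.mp4", frame_step=25):
    # Imported here so iter_frames itself has no OpenCV dependency.
    import cv2

    capture = cv2.VideoCapture(path)
    try:
        for frame_number, image in iter_frames(capture, frame_step):
            print(frame_number, image.shape)
    finally:
        capture.release()
```

For skipped frames, cv2.VideoCapture.grab() could replace read() to avoid retrieving images that are discarded anyway, which may reduce some of the cost.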
2. Use a Dedicated Library for Frame Decoding
- OpenCV is not optimized for precise video decoding or seeking.
- Dedicated libraries like PyAV offer more accurate frame control and metadata access.
- May offer more reliable frame indexing.
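As a rough illustration of the PyAV route, the sketch below derives a frame index from each frame's presentation timestamp instead of relying on a reported position. pts_to_frame_index and iter_frames_pyav are hypothetical helpers, and the conversion assumes a constant frame rate; for variable-frame-rate files the index would only be approximate.

```python
from fractions import Fraction


def pts_to_frame_index(pts, time_base, frame_rate, start_pts=0):
    """Convert a presentation timestamp to a frame index.

    Assumes a constant frame rate (an assumption, not verified for
    any particular video file).
    """
    seconds = (pts - start_pts) * Fraction(time_base)
    return round(seconds * Fraction(frame_rate))


def iter_frames_pyav(path):
    """Yield (frame_index, bgr_image) for every decoded video frame."""
    # Imported here so pts_to_frame_index has no PyAV dependency.
    import av

    with av.open(path) as container:
        stream = container.streams.video[0]
        if stream.average_rate is None:
            raise ValueError("stream does not report an average frame rate")
        start = stream.start_time or 0
        for frame in container.decode(video=0):
            if frame.pts is None:
                # Without a timestamp there is no reliable index to derive.
                continue
            index = pts_to_frame_index(
                frame.pts, stream.time_base, stream.average_rate, start
            )
            # bgr24 matches the channel order OpenCV functions expect.
            yield index, frame.to_ndarray(format="bgr24")
```

Because the index is computed from the timestamp stored with each decoded frame, it stays tied to the image that is actually returned.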
Edited by Dyar Jankir