Incorrect Frame Number Despite Correct Frame Image in generate_video_frame()

Issue Description

When generating frames in the generate_video_frame function using OpenCV's cv2.VideoCapture, the reported VideoFrame.frame_number does not always correspond to the actual frame shown in the image. This results in mismatches between detected slide transitions and their true position in the video timeline.

Steps to Reproduce

1. Use the lecture video dbwt1_01.mp4 and the corresponding PDF slides.
2. Run detect_slide_transitions().
3. Set frame_step to 1 in generate_video_frame() so that no frames are skipped and the result is frame-accurate.
4. When the second slide is detected, log the reported VideoFrame.frame_number and display VideoFrame.full_frame using cv2.imshow(...) or show_image_resized(...).
5. Compare the reported frame number to a ground-truth position identified in a frame-accurate tool such as DaVinci Resolve.

Expected Behavior

The VideoFrame.frame_number should match the exact frame index at which the image was retrieved.
For example, the second slide should be detected at frame 5281, and the content of the frame should visually match the correct slide.

Actual Behavior

The reported frame number is wrong (for dbwt1_01, 4545 instead of the expected 5281), while the image itself clearly shows the second slide.
As a result, the detected transition is assigned an incorrect timestamp.

Root Cause (Hypothesized)

This issue likely arises due to how OpenCV handles frame seeking internally:

When calling video_capture.set(cv2.CAP_PROP_POS_FRAMES, N), OpenCV does not guarantee that the next .read() returns frame N.
Instead, depending on the backend and codec, it may seek to the nearest preceding keyframe and then decode forward until it reaches the next displayable frame.
The decoded image may correspond to the requested frame visually, but the position reported by .get(cv2.CAP_PROP_POS_FRAMES) may reflect the last keyframe index, not the actual frame.

This causes a mismatch between what is shown and what is reported.
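
To confirm or rule out this hypothesis, the reported position can be probed around a seek. The following is a minimal diagnostic sketch, not existing project code: probe_seek is a hypothetical helper, and it assumes a capture object with the set/get/read methods of cv2.VideoCapture. The property id is passed in by the caller (cv2.CAP_PROP_POS_FRAMES in practice). It only checks what the backend reports, not whether the image is the right one.

```python
from typing import Any, Dict


def probe_seek(capture: Any, target: int, pos_frames_prop: int) -> Dict[str, Any]:
    """Seek to `target`, read one frame, and record the positions the backend reports.

    `capture` is assumed to behave like cv2.VideoCapture; `pos_frames_prop`
    is expected to be cv2.CAP_PROP_POS_FRAMES.
    """
    capture.set(pos_frames_prop, target)
    # POS_FRAMES is documented as the 0-based index of the frame decoded next,
    # so right after the seek it should equal `target`.
    reported_before = int(capture.get(pos_frames_prop))
    ok, _image = capture.read()
    # After one successful read it should have advanced to `target + 1`.
    reported_after = int(capture.get(pos_frames_prop))
    return {
        "target": target,
        "read_ok": bool(ok),
        "reported_before_read": reported_before,
        "reported_after_read": reported_after,
        "consistent": bool(ok)
        and reported_before == target
        and reported_after == target + 1,
    }
```

Running this for a handful of targets around the second slide transition in dbwt1_01.mp4 would show whether the reported position drifts from the requested one.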

Potential Solutions

1. Avoid Frame Seeking with `.set(...)` in Loops

OpenCV’s video_capture.set(cv2.CAP_PROP_POS_FRAMES, N) does not guarantee accurate frame access.
A more reliable approach may be to read the video sequentially from the start, count frames locally, and skip unwanted frames with repeated .read() calls. The frame number then no longer depends on what the decoder reports after a seek.
The trade-off is speed: every frame up to the target has to be read in order, so it is not possible to jump directly to a specific frame.
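
A minimal sketch of the sequential approach, assuming a capture object that behaves like cv2.VideoCapture (read() returns (ok, image), grab() returns a bool). The name iter_frames_sequential is hypothetical and not part of the current code. The frame index is counted locally instead of being taken from CAP_PROP_POS_FRAMES.

```python
from typing import Any, Iterator, Tuple


def iter_frames_sequential(capture: Any, frame_step: int = 1) -> Iterator[Tuple[int, Any]]:
    """Yield (frame_index, image) for every `frame_step`-th frame without seeking.

    `capture` is assumed to behave like cv2.VideoCapture and to be positioned
    at the start of the video.
    """
    if frame_step < 1:
        raise ValueError("frame_step must be >= 1")

    frame_index = 0  # counted locally, independent of the backend's position
    while True:
        if frame_index % frame_step == 0:
            ok, image = capture.read()  # grab and retrieve the image
            if not ok:
                break
            yield frame_index, image
        else:
            # Advance by one frame without retrieving the image.
            if not capture.grab():
                break
        frame_index += 1
```

With OpenCV this could be called as iter_frames_sequential(cv2.VideoCapture("dbwt1_01.mp4"), frame_step). Note that grab() still advances the decoder through every skipped frame; it only avoids retrieving the image, so the time saved per skipped frame is limited.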

2. Use a Dedicated Library for Frame Decoding

OpenCV is not optimized for precise video decoding or seeking.
Dedicated libraries like PyAV offer more accurate frame control and metadata access, which may give more reliable frame indexing.
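
PyAV exposes a presentation timestamp per decoded frame (frame.pts) together with the stream's time_base and average_rate. A frame index can be derived from these instead of relying on a reported position. The helper below is a sketch of that calculation only; pts_to_frame_index is a hypothetical name, and it assumes a constant frame rate and that start_pts is the stream's first timestamp (0 if unknown).

```python
from fractions import Fraction
from typing import Optional


def pts_to_frame_index(
    pts: Optional[int],
    time_base: Fraction,
    frame_rate: Fraction,
    start_pts: int = 0,
) -> Optional[int]:
    """Convert a presentation timestamp to a 0-based frame index.

    Assumes a constant frame rate. `pts` and `start_pts` are in units of
    `time_base` (seconds per tick); `frame_rate` is in frames per second.
    Returns None if the frame carries no timestamp.
    """
    if pts is None:
        return None
    seconds = (pts - start_pts) * time_base  # exact, since Fractions are used
    return round(seconds * frame_rate)
```

In a PyAV decode loop (for frame in container.decode(video=0)), the inputs would come from frame.pts, the video stream's time_base, and its average_rate. For variable-frame-rate files this calculation does not hold, and counting decoded frames is the safer option.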

Edited Jul 11, 2025 by Dyar Jankir