- Shot detection using PySceneDetect
- Face detection with MediaPipe
- Object detection using YOLO (Ultralytics)
- Smooth camera path optimization to avoid jitter
- Output encoded with OpenCV
Install Python Dependencies: The script requires Python 3.8+ and the packages listed in `requirements.txt`:
`pip install -r requirements.txt`
Install FFmpeg (for audio processing): FrameShift uses FFmpeg to process and include audio from the original video into the reframed output. If FFmpeg is not installed or not found in your system's PATH, the script will still process the video but the output will not contain audio (a warning will be displayed).
- Download FFmpeg: You can download FFmpeg from ffmpeg.org.
- Installation: Follow the installation instructions for your operating system. Ensure the directory containing the `ffmpeg` (or `ffmpeg.exe`) executable is added to your system's PATH environment variable.
- Verify Installation: Open a new terminal/command prompt and run `ffmpeg -version`. If it is installed correctly, you should see version information.
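As an illustration of the kind of check involved, the sketch below uses only Python's standard library to detect whether `ffmpeg` is reachable on the PATH. The helper names are hypothetical; this is not FrameShift's actual detection code.

```python
import shutil
import subprocess
from typing import Optional

def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable can be found on the PATH."""
    return shutil.which("ffmpeg") is not None

def ffmpeg_version_line() -> Optional[str]:
    """Return the first line of `ffmpeg -version` output, or None if FFmpeg is missing."""
    if not ffmpeg_available():
        return None
    result = subprocess.run(["ffmpeg", "-version"], capture_output=True, text=True)
    return result.stdout.splitlines()[0] if result.stdout else None
```

A tool that degrades gracefully (silent output instead of a hard failure) would branch on `ffmpeg_available()` before attempting to mux audio.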
FrameShift automatically reframes videos, using a stationary (fixed) crop for each detected scene.
1. Default Behavior: Fill Frame (Pan & Scan): Converts a landscape video to portrait, filling the new 9:16 frame by cropping and centering.
   `python -m frameshift.main my_landscape_video.mp4 video_for_stories.mp4 --ratio 9:16`
2. Padding with Black Bars:
If you want to see the entire optimally cropped region without parts being cut off to fill the frame, use --padding. This will add black bars by default.
   `python -m frameshift.main input_video.mp4 square_video_fit.mp4 --ratio 1:1 --padding`
3. Padding with Blurred Background: To use blurred background bars instead of black:
   `python -m frameshift.main input_video.mp4 output_blur_padded.mp4 --ratio 16:9 --padding --padding_type blur --blur_amount 5`
   (Adjust `--blur_amount` from 0 (minimal) to 10 (maximal) for the desired blur intensity.)
4. Padding with a Custom Color: To use solid green bars:
   `python -m frameshift.main input_video.mp4 output_color_padded.mp4 --ratio 4:3 --padding --padding_type color --padding_color_value green`
   Or with a specific RGB color (e.g., blue):
   `python -m frameshift.main input_video.mp4 output_rgb_padded.mp4 --ratio 4:3 --padding --padding_type color --padding_color_value "(0,0,255)"`
5. Batch Processing:
Reframes all videos in my_video_folder to 4:5, saving to output_folder, using default fill behavior.
   `python -m frameshift.main my_video_folder/ output_folder/ --ratio 4:5 --batch`
6. Running the Test Suite:
To run a predefined suite of tests using my_video.mp4 as input, saving test outputs relative to test_run_output.mp4, and generating a detailed log:
   `python -m frameshift.main my_video.mp4 test_run_output.mp4 --ratio 16:9 --test`
   (Check the test_run_output_test_outputs/ directory and test_run_output_test_suite.log, or similar, for results. If an error occurs in one test scenario, the suite will log it and continue with the next.)
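All of the examples above pass `--ratio` either as `W:H` or as a plain decimal. As a minimal sketch of how such a value can be parsed into a single width/height number, a helper like the following would suffice (the function name is illustrative, not FrameShift's actual implementation):

```python
def parse_ratio(value: str) -> float:
    """Parse an aspect-ratio argument such as "9:16", "9/16", or "0.5625"
    into a width/height float."""
    for sep in (":", "/"):
        if sep in value:
            w, h = value.split(sep)
            return float(w) / float(h)
    return float(value)

print(parse_ratio("9:16"))    # → 0.5625
print(parse_ratio("1:1"))     # → 1.0
```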
FrameShift intelligently reframes videos using a stationary (fixed) crop per scene approach:
- Scene Detection: Divides the video into scenes using PySceneDetect.
- Content Analysis:
  - Faces: Detected using a specialized model, `yolov11n-face.pt`.
    - This model is trained to detect one primary class: `{0: 'face'}`.
    - If this model fails to load or operate, MediaPipe's face detection is used as a fallback.
  - Other Objects: Detected using the general-purpose `yolo11n.pt` model.
    - This model can identify a wide range of objects from the COCO dataset. Common detectable classes include: person, bicycle, car, motorcycle, airplane, bus, train, truck, boat, traffic light, fire hydrant, stop sign, parking meter, bench, bird, cat, dog, horse, sheep, cow, elephant, bear, zebra, giraffe, backpack, umbrella, handbag, tie, suitcase, frisbee, skis, snowboard, sports ball, kite, baseball bat, baseball glove, skateboard, surfboard, tennis racket, bottle, wine glass, cup, fork, knife, spoon, bowl, banana, apple, sandwich, orange, broccoli, carrot, hot dog, pizza, donut, cake, chair, couch, potted plant, bed, dining table, toilet, tv, laptop, mouse, remote, keyboard, cell phone, microwave, oven, toaster, sink, refrigerator, book, clock, vase, scissors, teddy bear, hair drier, toothbrush. (This is not an exhaustive list, but it covers the 80 standard COCO classes.)
    - This object detection step is performed only if you specify weights for object classes (other than 'face' and 'default') with a weight greater than 0 in the `--object_weights` argument. This optimizes performance when only face-centric reframing is needed.
  - Both detection models (`yolo11n.pt` and `yolov11n-face.pt`) are automatically downloaded to a local `models/` subdirectory if they are not found during the first run.
  - The importance of all detected elements in guiding the crop is determined by the `--object_weights` argument.
- Optimal Stationary Crop: For each scene, calculates a fixed crop window that best frames the weighted area of interest at the target aspect ratio.
- Output Generation (Cropping/Padding):
  - Default (Fill/Pan & Scan): If `--padding` is NOT specified, the determined crop is scaled to completely fill the output frame. Excess parts of the crop are trimmed to match the target aspect ratio without deforming the image. No bars are added.
  - With Padding (`--padding` is specified): The determined crop is scaled to fit entirely within the output frame, preserving its aspect ratio.
    - If `--padding_type` is `black` (the default when `--padding` is used) or an unrecognized type: black bars are added.
    - If `--padding_type` is `blur`: blurred bars from the original video background are added. Intensity is controlled by `--blur_amount`.
    - If `--padding_type` is `color`: bars of the color specified by `--padding_color_value` are added.
- The script will attempt to preserve the audio track from the original video using FFmpeg. If FFmpeg is not found, the output video will be silent.
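To make the "optimal stationary crop" step concrete, here is a simplified sketch of the underlying geometry: take the largest window of the target aspect ratio that fits the frame, centre it on the weighted centroid of the detections, and clamp it to the frame bounds. The function name and the simple centroid heuristic are illustrative assumptions, not FrameShift's actual algorithm.

```python
def stationary_crop(frame_w, frame_h, detections, target_ratio):
    """Compute a fixed crop (x, y, w, h) at `target_ratio` (width/height),
    centred on the weighted centroid of `detections`.

    detections: list of (cx, cy, weight) tuples in pixel coordinates.
    """
    # Largest crop that fits inside the frame at the target aspect ratio.
    crop_w = min(frame_w, frame_h * target_ratio)
    crop_h = crop_w / target_ratio

    # Weighted centroid of the areas of interest (frame centre if none).
    total = sum(w for _, _, w in detections)
    if total > 0:
        cx = sum(x * w for x, _, w in detections) / total
        cy = sum(y * w for _, y, w in detections) / total
    else:
        cx, cy = frame_w / 2, frame_h / 2

    # Clamp the window so it never leaves the frame.
    x = min(max(cx - crop_w / 2, 0), frame_w - crop_w)
    y = min(max(cy - crop_h / 2, 0), frame_h - crop_h)
    return x, y, crop_w, crop_h
```

Because the window is computed once per scene, the crop stays perfectly still for that scene, which is what avoids jitter without any path smoothing.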
Future Development Ideas: Enhancements could include automatic selection of reframing strategies (like dynamic tracking or panning) and more advanced path smoothing.
- `input`: Path to the input video file, or the input directory if `--batch` is used.
- `output`: Path to the output video file, or the output directory if `--batch` is used.
- `--ratio R`: Target aspect ratio for the output video (e.g., `9:16`, `1:1`, or `0.5625`). Default: `9/16`.
- `--padding`: (Flag, default: `False`) If set, enables padding so that the entire optimally cropped content is visible; defaults to black bars if no other padding type is specified. If not set, the content is cropped to fill the output frame (Pan & Scan).
- `--padding_type TYPE`: (Default: `black`) Type of padding when `--padding` is enabled. `black`: solid black bars; `blur`: blurred background bars; `color`: solid color bars (use with `--padding_color_value`).
- `--blur_amount INT`: (Default: `5`) Integer from 0 (minimal) to 10 (maximal) controlling blur intensity when `padding_type` is `blur`.
- `--padding_color_value STR`: (Default: `black`) Color for padding when `padding_type` is `color`.
  - Accepts names: `black`, `white`, `red`, `green`, `blue`, `yellow`, `cyan`, `magenta`.
  - Accepts an RGB tuple as a string: `"(R,G,B)"` (e.g., `"(255,0,0)"` for red).
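A small sketch of how a `--padding_color_value` string could be turned into an RGB tuple. Both the helper name and the colour table are hypothetical illustrations of the documented behaviour, not FrameShift's own parsing code:

```python
import ast

# Assumed name table for illustration; values follow common RGB conventions.
NAMED_COLORS = {
    "black": (0, 0, 0), "white": (255, 255, 255), "red": (255, 0, 0),
    "green": (0, 255, 0), "blue": (0, 0, 255), "yellow": (255, 255, 0),
    "cyan": (0, 255, 255), "magenta": (255, 0, 255),
}

def parse_padding_color(value: str):
    """Return an (R, G, B) tuple from a colour name or a "(R,G,B)" string."""
    value = value.strip().lower()
    if value in NAMED_COLORS:
        return NAMED_COLORS[value]
    r, g, b = ast.literal_eval(value)  # safely evaluates e.g. "(0,0,255)"
    return (int(r), int(g), int(b))
```

`ast.literal_eval` is used instead of `eval` so that only literal tuples are accepted from the command line.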
- `--output_height H`: (Default: `1080`, integer) Target height for the output video in pixels (e.g., 720, 1080, 1280, 1920). The width is calculated automatically from the target aspect ratio given by `--ratio`, which allows control over the final output resolution.
- `--interpolation METHOD`: (Default: `lanczos`) Interpolation algorithm used for resizing video frames.
  - Choices: `nearest`, `linear`, `cubic`, `area`, `lanczos`.
  - `lanczos` (Lanczos over an 8x8 neighborhood) or `cubic` are generally recommended for upscaling, as they can produce sharper results. `area` is often best for downscaling. `linear` is faster but may be less sharp than `cubic` or `lanczos`. `nearest` is the fastest but produces blocky results and is usually not recommended for video.
- `--content_opacity O`: (Default: `1.0`) Opacity of the main video content (0.0-1.0). If less than 1.0, the content (including any padding) is blended with a blurred version of the full original frame.
- `--object_weights "label:w,..."`: (Default: `"face:1.0,person:0.8,default:0.5"`) Comma-separated `label:weight` pairs.
  - Assigns importance weights to detected elements. The label `face` refers to faces detected by `yolov11n-face.pt` (or the MediaPipe fallback). Other labels (e.g., `person`, `car`, `dog`; see the list under "Content Analysis") correspond to objects detected by `yolo11n.pt`.
  - Detection of these other objects with `yolo11n.pt` is only run if their labels are given a weight > 0.
  - Example: `--object_weights "face:1.0,dog:0.7,default:0.2"` (this triggers `yolo11n.pt` to look for dogs).
- `--log_file FILE_PATH`: (Optional) Path to a file where verbose (DEBUG level) logs are written. If not specified, logs (INFO level and above for FrameShift's own messages, plus any output from underlying libraries) are printed to the console.
- `--test`: (Flag, default: `False`) Run in test mode. Executes a predefined suite of scenarios based on the provided `input`, `output`, and `ratio` arguments, varying other parameters such as padding and interpolation. Test outputs are saved to a subdirectory (e.g., `*_test_outputs/`) relative to the specified output path, and a detailed log of the test suite is also generated (e.g., `*_test_suite.log`).
- `--batch`: (Flag) Process all videos in the input directory.
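The `--object_weights` format described above can be parsed in a few lines; the sketch below also shows the documented rule that the general object model is only needed when some label other than `face`/`default` carries a positive weight. Both helper names are hypothetical, not the script's actual code:

```python
def parse_object_weights(spec: str) -> dict:
    """Parse "label:weight,..." (e.g. "face:1.0,person:0.8,default:0.5")
    into a {label: float} dict."""
    weights = {}
    for pair in spec.split(","):
        label, _, weight = pair.strip().partition(":")
        weights[label] = float(weight)
    return weights

def needs_object_model(weights: dict) -> bool:
    """True when a label other than 'face'/'default' has weight > 0,
    i.e. the general-purpose object model would have to run."""
    return any(w > 0 for label, w in weights.items()
               if label not in ("face", "default"))
```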
The cropping logic determines an optimal stationary (fixed) crop for each scene, prioritizing important content based on --object_weights. How this crop is presented in the final output (filled, or with padding) is controlled by --padding and its related arguments.
For users who prefer a graphical interface over the command line, FrameShift now includes a GUI.
- To Run: Ensure you are in the project's root directory and have installed all dependencies from `requirements.txt`, then run:
  `python frameshift_gui.py`
- Features: The GUI provides access to most of FrameShift's command-line functionality, including input/output selection, aspect ratio, padding options, object weights, and more, all through an intuitive interface.
- Documentation: For detailed instructions on using the GUI, please refer to the FrameShift GUI User Manual.
FrameShift is released under the MIT License.