Video-MAE
Sports and human action recognition in videos.
Video-MAE (Masked Autoencoder) is a video classification network built on a ViT (Vision Transformer) backbone.
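A ViT backbone treats a video clip as a sequence of space-time patch tokens. The sketch below counts those tokens for one clip. The clip length (16 frames), the patch size (16x16 pixels) and the tubelet depth (2 frames) are assumptions taken from the commonly used ViT-Base configuration of this architecture. They are not stated on this page, and the function name is hypothetical.

```python
def num_tokens(frames, height, width, tubelet=2, patch=16):
    """Count space-time patch tokens for one clip.

    Assumed defaults (not from this page): each token covers
    `tubelet` consecutive frames and a `patch` x `patch` pixel area.
    """
    if frames % tubelet or height % patch or width % patch:
        raise ValueError("clip dimensions must be divisible by the patch sizes")
    return (frames // tubelet) * (height // patch) * (width // patch)


# Assumed 16-frame clip at the 224x224 input resolution listed for this model.
tokens = num_tokens(16, 224, 224)
print(tokens)  # 8 * 14 * 14 = 1568
```

Under these assumptions, the transformer attends over 1568 tokens per clip. A different clip length or patch size changes that count accordingly.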
Technical Details
Model checkpoint: Kinetics-400
Input resolution: 224x224
Number of parameters: 87.7M
Model size: 335 MB
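The parameter count and model size listed above can be cross-checked with simple arithmetic. This minimal sketch assumes the weights are stored as 32-bit floats (4 bytes each) and that the size is reported in binary megabytes (MiB). Neither assumption is confirmed on this page, and the helper name is hypothetical.

```python
def weights_size_mib(num_params, bytes_per_param=4):
    """Approximate weight storage in MiB.

    Assumes float32 weights (4 bytes per parameter) by default;
    ignores any file-format overhead.
    """
    return num_params * bytes_per_param / 2**20


size = weights_size_mib(87.7e6)  # 87.7M parameters, as listed above
print(f"{size:.1f} MiB")         # rounds to 335 when shown as a whole number
```

Under these assumptions the estimate agrees with the listed 335 MB, which suggests the figure describes unquantized float32 weights.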
Applicable Scenarios
- Camera
- Action Recognition
Licenses
Source Model: CC-BY-4.0
Deployable Model: AI Model Hub License
Tags
- backbone
Supported Compute Devices
- Snapdragon X Elite CRD
- Snapdragon X Plus 8-Core CRD
Supported Compute Chipsets
- Snapdragon® X Elite
- Snapdragon® X Plus 8-Core