Video recognition is computationally expensive.
Here we propose a Temporal Shift Module (TSM) to enable
efficient video recognition on edge devices.
Here is a low-power board, the NVIDIA Jetson Nano.
It costs only $99 and it runs at only 8 watts.
We show a hand gesture recognition demo running in real time on this board.
Here is the output of our model and here is the frame rate of the demo.
And with the demo, we can recognize hand gestures like
thumb up and thumb down. It can also recognize zoom in and zoom out.
It's useful for driving scenarios where we can tell the map to zoom in or zoom out.
And it can also recognize gestures like swiping left.
You can also push your hand in to tell the car to stop.
It can also recognize gestures like drumming fingers.
Our model runs at about 300M MACs per frame and the model size is only 14 MB.
With the lightweight shift operation, our model can achieve 3D CNN performance at 2D cost,
enabling real-time AI applications.
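
To make the shift operation concrete, here is a minimal NumPy sketch of one way a temporal shift could work. It is not the code behind the demo. The function name `temporal_shift`, the `(T, C, H, W)` layout, and the `fold_div=8` split are assumptions made for illustration. The idea is that a fraction of the channels is moved one frame forward or backward in time, so a plain 2D convolution applied afterward sees information from neighboring frames without any extra multiply-accumulates.

```python
import numpy as np

def temporal_shift(x, fold_div=8):
    """Shift a fraction of channels along the time axis.

    x: array of shape (T, C, H, W), one feature map per frame.
    fold_div: hypothetical split; 1/fold_div of the channels move
              forward in time and another 1/fold_div move backward.
    """
    t, c, h, w = x.shape
    fold = c // fold_div
    out = np.zeros_like(x)

    # First group: frame t receives the channels of frame t-1.
    out[1:, :fold] = x[:-1, :fold]
    # Second group: frame t receives the channels of frame t+1.
    out[:-1, fold:2 * fold] = x[1:, fold:2 * fold]
    # Remaining channels are left in place.
    out[:, 2 * fold:] = x[:, 2 * fold:]
    return out

# Usage: 4 frames, 16 channels, 2x2 feature maps.
frames = np.random.rand(4, 16, 2, 2).astype(np.float32)
shifted = temporal_shift(frames)
```

The shift itself only moves data, which is why it adds no multiply-accumulate operations. For a streaming setting like this demo, a one-directional variant that only borrows channels from past frames would be the natural choice, since future frames are not yet available.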