How does PoseNet work?
Even with camera-based input, the process of performing pose detection does not change. We start with an input image (a single still of a video is good enough for this). The image is passed through a CNN to do the first part and identify where the people are in the scene. The next step takes the output from the CNN, passes it through a pose decoding algorithm (we'll come back to this in a moment), and uses this to decode poses.
The reason we said pose decoding algorithm was to gloss over the fact that we actually have two decoding algorithms. We can detect single poses, or if there are multiple people, we can detect multiple poses.
We have opted to go with the single pose algorithm because it is the simpler and faster ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access