Community Computer Vision Course documentation

Camera models


Pinhole Cameras

Pinhole camera from https://commons.wikimedia.org/wiki/File:Pinhole-camera.svg

The simplest kind of camera - perhaps one that you have made yourself - consists of a lightproof box, with a small hole made in one side and a screen or a photographic film on the other. Light rays passing through the hole generate an inverted image on the rear wall of the box. This simple model for a camera is commonly used in 3D graphics applications.

Camera axes conventions

[Figure: Blender camera axes conventions]

There are a number of different conventions for the direction of the camera axes. Here we follow the convention of Blender (see diagram), where the camera points along the negative Z-axis, the camera X-axis points to the right (looking from the camera) and the camera Y-axis points up.

Pinhole camera coordinate transformation

[Figure: Pinhole transformation]

Each point in 3D space maps to a single point on the 2D plane. To find the map between 3D and 2D coordinates, we first need to know the intrinsics of the camera, which for a pinhole camera are:

  • the focal lengths, $f_x$ and $f_y$.
  • the coordinates of the principal point, $c_x$ and $c_y$, which is the optical centre of the image. This point is where the optical axis intersects the image plane.

Using these intrinsic parameters, we construct the camera matrix:

$$
K = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}
$$

To apply this to a point $p = [x, y, z]$ in 3D space, we multiply the point by the camera matrix, $K @ p$, to give a new 3x1 vector $[u, v, w]$. This is a homogeneous vector in 2D, but its last component isn't 1. To find the position of the point in the image plane, we divide the first two coordinates by the last one, giving the point $[u/w, v/w]$.
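
The projection described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the helper names `make_camera_matrix` and `project`, and the intrinsic values, are made up for the example. It uses the textbook convention, in which points in front of the camera have positive z.

```python
import numpy as np


def make_camera_matrix(fx, fy, cx, cy):
    """Build the 3x3 pinhole camera matrix K from the intrinsics."""
    return np.array([
        [fx, 0.0, cx],
        [0.0, fy, cy],
        [0.0, 0.0, 1.0],
    ])


def project(K, p):
    """Project a 3D point p = [x, y, z] (camera coordinates) to the image plane."""
    u, v, w = K @ p                   # homogeneous 2D vector [u, v, w]
    return np.array([u / w, v / w])   # divide by the last component


# Illustrative intrinsics (assumed values, in pixels).
K = make_camera_matrix(fx=800.0, fy=800.0, cx=320.0, cy=240.0)

# A point 2 units in front of the camera (textbook convention: positive z).
p = np.array([0.5, -0.25, 2.0])
pixel = project(K, p)
print(pixel)
```

Note that `w` is simply the z-coordinate of the point here, so the division is the familiar perspective divide: points further from the camera land closer to the principal point.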

Whilst this is the textbook definition of the camera matrix, if we use the Blender camera convention it will flip the image left to right and top to bottom (as points in front of the camera will have negative z-values). One potential way to fix this is to change the signs of some of the elements of the camera matrix:

$$
K = \begin{pmatrix} -f_x & 0 & c_x \\ 0 & -f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}
$$
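
To see the flip concretely, the sketch below projects one point that lies in front of a Blender-style camera (negative z) with both versions of the matrix. The intrinsic values and the names `K_textbook`, `K_blender` and `project` are assumptions chosen for illustration.

```python
import numpy as np

# Illustrative intrinsics (assumed values, in pixels).
fx, fy, cx, cy = 800.0, 800.0, 320.0, 240.0

# Textbook camera matrix.
K_textbook = np.array([
    [fx, 0.0, cx],
    [0.0, fy, cy],
    [0.0, 0.0, 1.0],
])

# Sign-flipped focal lengths for a camera looking along negative Z.
K_blender = np.array([
    [-fx, 0.0, cx],
    [0.0, -fy, cy],
    [0.0, 0.0, 1.0],
])


def project(K, p):
    """Multiply by K, then divide the first two components by the last."""
    u, v, w = K @ p
    return np.array([u / w, v / w])


# A point in front of the camera in the Blender convention (z < 0),
# offset towards positive x and positive y.
p = np.array([0.5, 0.25, -2.0])

flipped = project(K_textbook, p)   # lands at u < cx, v < cy
fixed = project(K_blender, p)      # lands at u > cx, v > cy
print(flipped, fixed)
```

With the textbook matrix, the point with positive x ends up at a u-coordinate smaller than $c_x$, and likewise for y and $c_y$. With the modified matrix, both offsets keep their sign.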

Camera Transformation Matrices

Usually, the camera is not located at the origin of the world coordinate system, so we have to transform points from world coordinates to coordinates relative to the camera. To do so, we first apply the world-to-camera matrix to the points, and then apply the camera matrix.
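
As a rough sketch of this two-step pipeline, the example below assumes the camera pose is given as a camera-to-world rotation `R_c2w` and a camera position `t_c2w` in world coordinates, and builds the 4x4 world-to-camera matrix by inverting that pose. The function names, the pose representation and all numeric values are assumptions for illustration. The camera matrix uses the sign-flipped form for a camera looking along negative Z.

```python
import numpy as np


def world_to_camera_matrix(R_c2w, t_c2w):
    """Invert a camera-to-world pose (rotation R, position t) into a
    4x4 world-to-camera matrix: p_cam = R^T (p_world - t)."""
    M = np.eye(4)
    M[:3, :3] = R_c2w.T
    M[:3, 3] = -R_c2w.T @ t_c2w
    return M


def project_world_point(K, world_to_cam, p_world):
    """Apply the world-to-camera matrix first, then the camera matrix."""
    p_h = np.append(p_world, 1.0)        # homogeneous 3D point [x, y, z, 1]
    p_cam = (world_to_cam @ p_h)[:3]     # point in camera coordinates
    u, v, w = K @ p_cam
    return np.array([u / w, v / w])


# Camera matrix for a camera looking along negative Z (assumed intrinsics).
fx, fy, cx, cy = 800.0, 800.0, 320.0, 240.0
K = np.array([
    [-fx, 0.0, cx],
    [0.0, -fy, cy],
    [0.0, 0.0, 1.0],
])

# Camera placed at (0, 0, 5) in the world, axes aligned with the world axes,
# so it looks along the world's negative Z-axis.
R_c2w = np.eye(3)
t_c2w = np.array([0.0, 0.0, 5.0])
world_to_cam = world_to_camera_matrix(R_c2w, t_c2w)

p_world = np.array([1.0, 0.5, 1.0])
pixel = project_world_point(K, world_to_cam, p_world)
print(pixel)
```

Because a rotation matrix is orthogonal, its inverse is its transpose, which is why the pose can be inverted without calling a general matrix inverse.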

More complex camera models

More complicated camera models are possible, which also model the distortion introduced by a real lens. For a discussion of such models, see Multiple View Geometry in Computer Vision by Hartley and Zisserman.
