
Reconstructing 3D Scenes


In this article by Robert Laganiere, the author of the book OpenCV 3 Computer Vision Application Programming Cookbook Third Edition, we cover the following recipes:

  • Calibrating a camera
  • Recovering camera pose



Digital image formation


Let us now redraw a new version of the figure describing the pin-hole camera model. More specifically, we want to demonstrate the relation between a 3D point at position (X,Y,Z) and its image (x,y) on the camera, specified in pixel coordinates.

reconstructing-3d-scenes-img-0

Note the changes that have been made to the original figure. First, a reference frame was positioned at the center of the projection; then, the Y-axis was aligned to point downwards to get a coordinate system that is compatible with the usual convention, which places the image origin at the upper-left corner of the image. Finally, we have identified a special point on the image plane, by considering the line coming from the focal point that is orthogonal to the image plane. The point (u0,v0) is the pixel position at which this line pierces the image plane and is called the principal point. It would be logical to assume that this principal point is at the center of the image plane, but in practice, it might be off by a few pixels depending on the precision of the camera.

Since we are dealing with digital images, the number of pixels on the image plane (its resolution) is another important characteristic of a camera. We learned previously that a 3D point (X,Y,Z) will be projected onto the image plane at (fX/Z,fY/Z). Now, if we want to translate this coordinate into pixels, we need to divide the x coordinate by the pixel width (px) and the y coordinate by the pixel height (py). Note that by dividing the focal length given in world units (generally given in millimeters) by px, we obtain the focal length expressed in (horizontal) pixels. We will then define this term as fx. Similarly, fy=f/py is defined as the focal length expressed in vertical pixel units. The complete projective equation is therefore as shown:

reconstructing-3d-scenes-img-1
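The equation itself appears only as an image in the original article; reconstructed from the definitions above (a standard pin-hole formulation, a sketch rather than the author's exact typesetting), it reads:

$$x = f_x \frac{X}{Z} + u_0, \qquad y = f_y \frac{Y}{Z} + v_0$$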

We know that (u0,v0) is the principal point that is added to the result in order to move the origin to the upper-left corner of the image. Also, the physical size of a pixel can be obtained by dividing the size of the image sensor (generally in millimeters) by the number of pixels (horizontally or vertically). In modern sensors, pixels are generally square, that is, they have the same horizontal and vertical size.

The preceding equations can be rewritten in matrix form. Here is the complete projective equation in its most general form:

reconstructing-3d-scenes-img-2
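The matrix form is also shown as an image; its standard reconstruction, consistent with the intrinsic parameters defined above and with the entries r1 to r9 and t1, t2, t3 referred to later in the How it works... section, is:

$$s\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} =
\begin{bmatrix} f_x & 0 & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} r_1 & r_2 & r_3 & t_1 \\ r_4 & r_5 & r_6 & t_2 \\ r_7 & r_8 & r_9 & t_3 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$

where s is an arbitrary scale factor.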

Calibrating a camera


Camera calibration is the process by which the different camera parameters (that is, the ones appearing in the projective equation) are obtained. One can obviously use the specifications provided by the camera manufacturer, but for some tasks, such as 3D reconstruction, these specifications are not accurate enough. However, accurate calibration information can be obtained by undertaking an appropriate camera calibration step.

An active camera calibration procedure will proceed by showing known patterns to the camera and analyzing the obtained images. An optimization process will then determine the optimal parameter values that explain the observations. This is a complex process that has been made easy by the availability of OpenCV calibration functions.

How to do it...


To calibrate a camera, the idea is to show it a set of scene points for which the 3D positions are known. Then, you need to observe where these points project on the image. With the knowledge of a sufficient number of 3D points and associated 2D image points, the exact camera parameters can be inferred from the projective equation. Obviously, for accurate results, we need to observe as many points as possible. One way to achieve this would be to take a picture of a scene with known 3D points, but in practice, this is rarely feasible. A more convenient way is to take several images of a set of 3D points from different viewpoints. This approach is simpler, but it requires you to compute the position of each camera view in addition to the computation of the internal camera parameters, which is fortunately feasible.

OpenCV proposes that you use a chessboard pattern to generate the set of 3D scene points required for calibration. This pattern creates points at the corners of each square, and since this pattern is flat, we can freely assume that the board is located at Z=0, with the X and Y axes well-aligned with the grid. In this case, the calibration process simply consists of showing the chessboard pattern to the camera from different viewpoints. The following is an example of a calibration pattern image made of 7x5 inner corners as captured during the calibration step:

reconstructing-3d-scenes-img-3

The good thing is that OpenCV has a function that automatically detects the corners of this chessboard pattern. You simply provide an image and the size of the chessboard used (the number of horizontal and vertical inner corner points). The function will return the position of these chessboard corners on the image. If the function fails to find the pattern, then it simply returns false, as shown:

//output vectors of image points
std::vector<cv::Point2f> imageCorners;
//number of inner corners on the chessboard
cv::Size boardSize(7,5);
//Get the chessboard corners
bool found = cv::findChessboardCorners(image,         // image of chessboard pattern
                                       boardSize,     // size of pattern
                                       imageCorners); // list of detected corners


The output parameter, imageCorners, will simply contain the pixel coordinates of the detected inner corners of the shown pattern. Note that this function accepts additional parameters if you need to tune the algorithm, which are not discussed here. There is also a special function that draws the detected corners on the chessboard image, with lines connecting them in a sequence:

//Draw the corners
cv::drawChessboardCorners(image, boardSize, imageCorners, found); // corners have been found


The following image is obtained:

reconstructing-3d-scenes-img-4

The lines that connect the points show the order in which the points are listed in the vector of detected image points. To perform a calibration, we now need to specify the corresponding 3D points. You can specify these points in the units of your choice (for example, in centimeters or in inches); however, the simplest is to assume that each square represents one unit. In that case, the coordinates of the first point would be (0,0,0) (assuming that the board is located at a depth of Z=0), the coordinates of the second point would be (1,0,0), and so on, the last point being located at (6,4,0). There are a total of 35 points in this pattern, which is too few to obtain an accurate calibration. To get more points, you need to show more images of the same calibration pattern from various points of view. To do so, you can either move the pattern in front of the camera or move the camera around the board; from a mathematical point of view, this is completely equivalent. The OpenCV calibration function assumes that the reference frame is fixed on the calibration pattern and will calculate the rotation and translation of the camera with respect to the reference frame.

Let's now encapsulate the calibration process in a CameraCalibrator class. The attributes of this class are as follows:

// input points:
  // the points in world coordinates
  // (each square is one unit)
  std::vector<std::vector<cv::Point3f>> objectPoints;
  // the image point positions in pixels
  std::vector<std::vector<cv::Point2f>> imagePoints;
  // output Matrices
  cv::Mat cameraMatrix;
  cv::Mat distCoeffs;
  // flag to specify how calibration is done
  int flag;


Note that the input vectors of the scene and image points are in fact made of std::vector of point instances; each vector element is a vector of the points from one view. Here, we decided to add the calibration points by specifying a vector of chessboard image filenames as input; the method will take care of extracting the point coordinates from the images:

// Open chessboard images and extract corner points
int CameraCalibrator::addChessboardPoints(
        const std::vector<std::string>& filelist, // list of filenames
        cv::Size & boardSize) {                   // calibration board size

  // the points on the chessboard
  std::vector<cv::Point2f> imageCorners;
  std::vector<cv::Point3f> objectCorners;

  // 3D Scene Points:
  // Initialize the chessboard corners
  // in the chessboard reference frame
  // The corners are at 3D location (X,Y,Z) = (i,j,0)
  for (int i=0; i<boardSize.height; i++) {
    for (int j=0; j<boardSize.width; j++) {
      objectCorners.push_back(cv::Point3f(i, j, 0.0f));
    }
  }

  // 2D Image points:
  cv::Mat image; // to contain chessboard image
  int successes = 0;
  // for all viewpoints
  for (int i=0; i<filelist.size(); i++) {

    // Open the image
    image = cv::imread(filelist[i], 0);

    // Get the chessboard corners
    bool found = cv::findChessboardCorners(image,         // image of chessboard pattern
                                           boardSize,     // size of pattern
                                           imageCorners); // list of detected corners

    // Get subpixel accuracy on the corners
    if (found) {
      cv::cornerSubPix(image, imageCorners,
                       cv::Size(5, 5),   // half size of search window
                       cv::Size(-1, -1),
                       cv::TermCriteria(cv::TermCriteria::MAX_ITER +
                                        cv::TermCriteria::EPS,
                                        30,     // max number of iterations
                                        0.1));  // min accuracy

      // If we have a good board, add it to our data
      if (imageCorners.size() == boardSize.area()) {
        // Add image and scene points from one view
        addPoints(imageCorners, objectCorners);
        successes++;
      }
    }
  }
  return successes;
}
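The addPoints method called above is not listed in this excerpt. A minimal sketch consistent with the class attributes shown earlier (an assumption, not the author's exact listing) could be:

// Add scene points and corresponding image points from one view
void CameraCalibrator::addPoints(const std::vector<cv::Point2f> &imageCorners,
                                 const std::vector<cv::Point3f> &objectCorners) {
  // 2D image points from this view
  imagePoints.push_back(imageCorners);
  // corresponding 3D scene points
  objectPoints.push_back(objectCorners);
}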


The first loop inputs the 3D coordinates of the chessboard, and the corresponding image points are the ones provided by the cv::findChessboardCorners function; this is done for all the available viewpoints. Moreover, in order to obtain a more accurate image point location, the cv::cornerSubPix function can be used, and as the name suggests, the image points will then be localized at subpixel accuracy. The termination criterion that is specified by the cv::TermCriteria object defines the maximum number of iterations and the minimum accuracy in subpixel coordinates. The first of these two conditions that is reached will stop the corner refinement process.

When a set of chessboard corners have been successfully detected, these points are added to the vectors of the image and scene points using our addPoints method. Once a sufficient number of chessboard images have been processed (and consequently, a large number of 3D scene point / 2D image point correspondences are available), we can initiate the computation of the calibration parameters as shown:

// Calibrate the camera
// returns the re-projection error
double CameraCalibrator::calibrate(cv::Size &imageSize) {
  // Output rotations and translations
  std::vector<cv::Mat> rvecs, tvecs;

  // start calibration
  return
    calibrateCamera(objectPoints,  // the 3D points
                    imagePoints,   // the image points
                    imageSize,     // image size
                    cameraMatrix,  // output camera matrix
                    distCoeffs,    // output distortion matrix
                    rvecs, tvecs,  // Rs, Ts
                    flag);         // set options
}


In practice, 10 to 20 chessboard images are sufficient, but these must be taken from different viewpoints at different depths. The two important outputs of this function are the camera matrix and the distortion parameters. These will be described in the next section.
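To put the pieces together, here is a minimal usage sketch of the class; the file names are hypothetical, and the 536x356 image size simply matches the calibration images described in the next section:

// a hypothetical list of chessboard image files
std::vector<std::string> filelist;
for (int i = 1; i <= 20; i++)
  filelist.push_back("chessboard" + std::to_string(i) + ".jpg");

// detect the corners in all views, then calibrate
CameraCalibrator calibrator;
cv::Size boardSize(7, 5);                       // number of inner corners
calibrator.addChessboardPoints(filelist, boardSize);
cv::Size imageSize(536, 356);                   // size of the calibration images
double error = calibrator.calibrate(imageSize); // returns the re-projection error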

How it works...


In order to explain the result of the calibration, we need to go back to the projective equation presented in the introduction of this article. This equation describes the transformation of a 3D point into a 2D point through the successive application of two matrices. The first matrix includes all of the camera parameters, which are called the intrinsic parameters of the camera. This 3x3 matrix is one of the output matrices returned by the cv::calibrateCamera function. There is also a function called cv::calibrationMatrixValues that explicitly returns the value of the intrinsic parameters given by a calibration matrix.

The second matrix is there to have the input points expressed in camera-centric coordinates. It is composed of a rotation component (a 3x3 matrix) and a translation vector (a 3x1 matrix). Remember that in our calibration example, the reference frame was placed on the chessboard. Therefore, there is a rigid transformation (made of a rotation component represented by the matrix entries r1 to r9 and a translation represented by t1, t2, and t3) that must be computed for each view. These are in the output parameter list of the cv::calibrateCamera function. The rotation and translation components are often called the extrinsic parameters of the calibration and they are different for each view. The intrinsic parameters remain constant for a given camera/lens system.

The calibration results provided by cv::calibrateCamera are obtained through an optimization process. This process aims to find the intrinsic and extrinsic parameters that minimize the difference between the predicted image point positions, as computed from the projection of the 3D scene points, and the actual image point positions, as observed on the image. The sum of these differences for all the points specified during the calibration is called the re-projection error.

The intrinsic parameters of our test camera obtained from a calibration based on 27 chessboard images are fx=409 pixels, fy=408 pixels, u0=237, and v0=171. Our calibration images have a size of 536x356 pixels. From the calibration results, you can see that, as expected, the principal point is close to the center of the image, but off by a few pixels. The calibration images were taken using a Nikon D500 camera with an 18mm lens. Looking at the manufacturer specifications, we find that the sensor size of this camera is 23.5mm x 15.7mm, which gives us a pixel size of 0.0438mm. The estimated focal length is expressed in pixels, so multiplying the result by the pixel size gives us an estimated focal length of 17.8mm, which is consistent with the actual lens we used.

Let us now turn our attention to the distortion parameters. So far, we have mentioned that under the pin-hole camera model, we can neglect the effect of the lens. However, this is only possible if the lens that is used to capture an image does not introduce important optical distortions. Unfortunately, this is not the case with lower quality lenses or with lenses that have a very short focal length. Even the lens we used in this experiment introduced some distortion, that is, the edges of the rectangular board are curved in the image. Note that this distortion becomes more important as we move away from the center of the image. This is a typical distortion observed with a fish-eye lens and is called radial distortion.

It is possible to compensate for these deformations by introducing an appropriate distortion model. The idea is to represent the distortions induced by a lens by a set of mathematical equations. Once established, these equations can then be reverted in order to undo the distortions visible on the image. Fortunately, the exact parameters of the transformation, which will correct the distortions, can be obtained together with the other camera parameters during the calibration phase. Once this is done, any image from the newly calibrated camera can be undistorted. Therefore, we have added an additional method to our calibration class.

//remove distortion in an image (after calibration)
cv::Mat CameraCalibrator::remap(const cv::Mat &image) {

  cv::Mat undistorted;

  if (mustInitUndistort) { // called once per calibration

    cv::initUndistortRectifyMap(cameraMatrix, // computed camera matrix
                                distCoeffs,   // computed distortion matrix
                                cv::Mat(),    // optional rectification (none)
                                cv::Mat(),    // camera matrix to generate undistorted
                                image.size(), // size of undistorted
                                CV_32FC1,     // type of output map
                                map1, map2);  // the x and y mapping functions

    mustInitUndistort = false;
  }

  // Apply mapping functions
  cv::remap(image, undistorted, map1, map2,
            cv::INTER_LINEAR); // interpolation type

  return undistorted;
}
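Note that the remap method relies on three class attributes that were not included in the attribute list shown earlier; plausible declarations for them (an assumption made here so that the code above reads complete) would be:

  // additional attributes assumed by the remap method
  cv::Mat map1, map2;            // the x and y undistortion mapping functions
  bool mustInitUndistort = true; // the maps must be recomputed after each new calibration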


Running this code on one of our calibration images results in the following undistorted image:

reconstructing-3d-scenes-img-5

To correct the distortion, OpenCV uses a polynomial function that is applied to the image points in order to move them to their undistorted positions. By default, five coefficients are used; a model made of eight coefficients is also available. Once these coefficients are obtained, it is possible to compute two cv::Mat mapping functions (one for the x coordinate and one for the y coordinate) that will give the new undistorted position of an image point on a distorted image. This is computed by the cv::initUndistortRectifyMap function, and the cv::remap function remaps all the points of an input image to a new image. Note that because of the nonlinear transformation, some pixels of the input image now fall outside the boundary of the output image. You can expand the size of the output image to compensate for this loss of pixels, but you then obtain output pixels that have no values in the input image (they will be displayed as black pixels).
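For reference, OpenCV's default five-coefficient model uses the radial coefficients k1, k2, k3 and the tangential coefficients p1, p2; written in normalized image coordinates (a standard formulation, not reproduced from the original figure), it has the form:

$$x_{corr} = x\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + 2p_1 xy + p_2 (r^2 + 2x^2)$$
$$y_{corr} = y\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + p_1 (r^2 + 2y^2) + 2p_2 xy$$

with $r^2 = x^2 + y^2$.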

There's more...


More options are available when it comes to camera calibration.

Calibration with known intrinsic parameters


When a good estimate of the camera's intrinsic parameters is known, it can be advantageous to input them into the cv::calibrateCamera function. They will then be used as initial values in the optimization process. To do so, you just need to add the cv::CALIB_USE_INTRINSIC_GUESS flag and input these values in the calibration matrix parameter. It is also possible to impose a fixed value for the principal point (cv::CALIB_FIX_PRINCIPAL_POINT), which can often be assumed to be the central pixel. You can also impose a fixed ratio for the focal lengths fx and fy (cv::CALIB_FIX_ASPECT_RATIO); fixing this ratio to 1 amounts to assuming that the pixels have a square shape.
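As a sketch of how these flags could be passed directly to cv::calibrateCamera (the initial focal length values below are illustrative, and the principal point guess is simply the image center; these are not values from the text):

// an approximate intrinsic matrix used as the initial guess (illustrative values)
cv::Mat cameraMatrix = (cv::Mat_<double>(3, 3) <<
    410., 0.,   268.,
    0.,   410., 178.,
    0.,   0.,   1.);
cv::Mat distCoeffs;
std::vector<cv::Mat> rvecs, tvecs;
double error = cv::calibrateCamera(objectPoints, imagePoints, imageSize,
                                   cameraMatrix, distCoeffs, rvecs, tvecs,
                                   cv::CALIB_USE_INTRINSIC_GUESS |
                                   cv::CALIB_FIX_PRINCIPAL_POINT);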

Using a grid of circles for calibration


Instead of the usual chessboard pattern, OpenCV also offers the possibility to calibrate a camera by using a grid of circles. In this case, the centers of the circles are used as calibration points. The corresponding function is very similar to the function we used to locate the chessboard corners, for example:

cv::Size boardSize(7,7);
std::vector<cv::Point2f> centers;
bool found = cv::findCirclesGrid(image, boardSize, centers);

See also

  • The article A flexible new technique for camera calibration by Z. Zhang, in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 11, 2000, is a classic paper on the problem of camera calibration


Recovering camera pose


When a camera is calibrated, it becomes possible to relate the captured images to the outside world. If the 3D structure of an object is known, then one can predict how the object will be imaged on the sensor of the camera. The process of image formation is indeed completely described by the projective equation that was presented at the beginning of this article. When most of the terms of this equation are known, it becomes possible to infer the values of the other elements (2D or 3D) through the observation of some images. In this recipe, we will look at the camera pose recovery problem when a known 3D structure is observed.

How to do it...


Let's consider a simple object here: a bench in a park. We took an image of it using the camera/lens system calibrated in the previous recipe. We have manually identified eight distinct image points on the bench that we will use for our camera pose estimation.

reconstructing-3d-scenes-img-6

Having access to this object makes it possible to make some physical measurements. The bench is composed of a seat of size 242.5cmx53.5cmx9cm and a back of size 242.5cmx24cmx9cm that is fixed 12cm over the seat. Using this information, we can then easily derive the 3D coordinates of the eight identified points in an object-centric reference frame (here we fixed the origin at the left extremity of the intersection between the two planes). We can then create a vector of cv::Point3f containing these coordinates.

//Input object points
std::vector<cv::Point3f> objectPoints;
objectPoints.push_back(cv::Point3f(0, 45, 0));
objectPoints.push_back(cv::Point3f(242.5, 45, 0));
objectPoints.push_back(cv::Point3f(242.5, 21, 0));
objectPoints.push_back(cv::Point3f(0, 21, 0));
objectPoints.push_back(cv::Point3f(0, 9, -9));
objectPoints.push_back(cv::Point3f(242.5, 9, -9));
objectPoints.push_back(cv::Point3f(242.5, 9, 44.5));
objectPoints.push_back(cv::Point3f(0, 9, 44.5));


The question now is where the camera was with respect to these points when the shown picture was taken. Since the coordinates of the image of these known points on the 2D image plane are also known, it becomes easy to answer this question using the cv::solvePnP function. Here, the correspondence between the 3D and the 2D points has been established manually, but as a reader of this book, you should be able to come up with methods that would allow you to obtain this information automatically.

//Input image points
std::vector<cv::Point2f> imagePoints;
imagePoints.push_back(cv::Point2f(136, 113));
imagePoints.push_back(cv::Point2f(379, 114));
imagePoints.push_back(cv::Point2f(379, 150));
imagePoints.push_back(cv::Point2f(138, 135));
imagePoints.push_back(cv::Point2f(143, 146));
imagePoints.push_back(cv::Point2f(381, 166));
imagePoints.push_back(cv::Point2f(345, 194));
imagePoints.push_back(cv::Point2f(103, 161));

// Get the camera pose from 3D/2D points
cv::Mat rvec, tvec;
cv::solvePnP(objectPoints, imagePoints,      // corresponding 3D/2D pts
             cameraMatrix, cameraDistCoeffs, // calibration
             rvec, tvec);                    // output pose

// Convert to 3D rotation matrix
cv::Mat rotation;
cv::Rodrigues(rvec, rotation);


This function computes the rigid transformation (rotation and translation) that brings the object coordinates into the camera-centric reference frame (that is, the one that has its origin at the focal point). It is also important to note that the rotation computed by this function is given in the form of a 3D vector. This is a compact representation in which the rotation to apply is described by a unit vector (an axis of rotation) around which the object is rotated by a certain angle. This axis-angle representation is also known as the Rodrigues rotation vector. In OpenCV, the angle of rotation corresponds to the norm of the output rotation vector, which is aligned with the axis of rotation. This is why the cv::Rodrigues function is used to obtain the 3x3 rotation matrix that appears in our projective equation.
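For reference, writing θ = ||rvec|| for the angle and u = rvec/θ for the unit rotation axis, the matrix computed by cv::Rodrigues is given by the classical Rodrigues formula (a standard result, stated here for completeness):

$$R = I + \sin\theta\,[u]_\times + (1 - \cos\theta)\,[u]_\times^{2}$$

where $[u]_\times$ denotes the skew-symmetric cross-product matrix of u.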

The pose recovery procedure described here is simple, but how do we know we obtained the right camera/object pose information? We can visually assess the quality of the results using the cv::viz module, which gives us the ability to visualize 3D information. The use of this module is explained in the last section of this recipe. For now, let's display a simple 3D representation of our object and the camera that captured it:

reconstructing-3d-scenes-img-7

It might be difficult to judge the quality of the pose recovery just by looking at this image, but if you test the example of this recipe on your computer, you will have the possibility to move this representation in 3D using your mouse, which should give you a better sense of the solution obtained.

How it works...


In this recipe, we assumed that the 3D structure of the object was known, as well as the correspondence between sets of object points and image points. The camera's intrinsic parameters were also known through calibration. If you look at our projective equation presented at the end of the Digital image formation section of the introduction of this article, this means that we have points for which the coordinates (X,Y,Z) and (x,y) are known. The elements of the first matrix (the intrinsic parameters) are also known. Only the second matrix is unknown; this is the one that contains the extrinsic parameters of the camera, that is, the camera/object pose information. Our objective is to recover these unknown parameters from the observation of 3D scene points. This problem is known as the Perspective-n-Point or PnP problem.

Rotation has three degrees of freedom (for example, the angle of rotation around each of the three axes) and translation also has three degrees of freedom. We therefore have a total of six unknowns. For each object point/image point correspondence, the projective equation gives us three algebraic equations, but since the projective equation is defined up to a scale factor, we only have two independent equations. A minimum of three points is therefore required to solve this system of equations. Obviously, more points provide a more reliable estimate.

In practice, many different algorithms have been proposed to solve this problem, and OpenCV proposes a number of different implementations in its cv::solvePnP function. The default method consists of optimizing what is called the reprojection error. Minimizing this type of error is considered to be the best strategy to get accurate 3D information from camera images. In our problem, it corresponds to finding the optimal camera position that minimizes the 2D distance between the projected 3D points (as obtained by applying the projective equation) and the observed image points given as input.

Note that OpenCV also has a cv::solvePnPRansac function. As the name suggests, this function uses the RANSAC algorithm to solve the PnP problem. This means that some of the object point/image point correspondences may be wrong, and the function will return which ones have been identified as outliers. This is very useful when these correspondences have been obtained through an automatic process that can fail for some points.
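A possible call, reusing the correspondences and calibration data from above (the RANSAC parameter values shown are the usual defaults, given here only for illustration):

// solve the PnP problem robustly; correspondences flagged as outliers are rejected
cv::Mat rvec, tvec;
std::vector<int> inliers; // indices of the correspondences kept by RANSAC
cv::solvePnPRansac(objectPoints, imagePoints,      // corresponding 3D/2D points
                   cameraMatrix, cameraDistCoeffs, // calibration
                   rvec, tvec,                     // output pose
                   false,                          // no initial extrinsic guess
                   100,                            // RANSAC iterations
                   8.0,                            // reprojection error threshold (pixels)
                   0.99,                           // confidence
                   inliers);                       // output inlier indices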

There's more...


When working with 3D information, it is often difficult to validate the solutions obtained. To this end, OpenCV offers a simple yet powerful visualization module that facilitates the development and debugging of 3D vision algorithms. It allows you to insert points, lines, cameras, and other objects in a virtual 3D environment that you can interactively visualize from various points of view.

cv::Viz, a 3D Visualizer module


cv::viz is an extra module of the OpenCV library that is built on top of the VTK open source library. This Visualization Toolkit (VTK) is a powerful framework used for 3D computer graphics. With cv::viz, you create a 3D virtual environment to which you can add a variety of objects. A visualization window is created that displays the environment from a given point of view. You saw in this recipe an example of what can be displayed in a cv::viz window. This window responds to mouse events that are used to navigate inside the environment (through rotations and translations). This section describes the basic use of the cv::viz module.

The first thing to do is to create the visualization window. Here we use a white background:

// Create a viz window
cv::viz::Viz3d visualizer("Viz window");
visualizer.setBackgroundColor(cv::viz::Color::white());


Next, you create your virtual objects and insert them into the scene. There is a variety of predefined objects. One of them is particularly useful for us; it is the one that creates a virtual pin-hole camera:

// Create a virtual camera
cv::viz::WCameraPosition cam(cMatrix,  // matrix of intrinsics
                             image,    // image displayed on the plane
                             30.0,     // scale factor
                             cv::viz::Color::black());
// Add the virtual camera to the environment
visualizer.showWidget("Camera", cam);


The cMatrix variable is a cv::Matx33d (that is, a cv::Matx<double,3,3>) instance containing the intrinsic camera parameters as obtained from calibration. By default, this camera is inserted at the origin of the coordinate system. To represent the bench, we used two rectangular cuboid objects.

// Create a virtual bench from cuboids
cv::viz::WCube plane1(cv::Point3f(0.0, 45.0, 0.0),
                      cv::Point3f(242.5, 21.0, -9.0),
                      true,  // show wire frame
                      cv::viz::Color::blue());
plane1.setRenderingProperty(cv::viz::LINE_WIDTH, 4.0);
cv::viz::WCube plane2(cv::Point3f(0.0, 9.0, -9.0),
                      cv::Point3f(242.5, 0.0, 44.5),
                      true,  // show wire frame
                      cv::viz::Color::blue());
plane2.setRenderingProperty(cv::viz::LINE_WIDTH, 4.0);
// Add the virtual objects to the environment
visualizer.showWidget("top", plane1);
visualizer.showWidget("bottom", plane2);


This virtual bench is also added at the origin; it then needs to be moved to its camera-centric position as found by our cv::solvePnP function. The setWidgetPose method performs this operation; it simply applies the rotation and translation components of the estimated motion.

cv::Mat rotation;
// convert vector-3 rotation
// to a 3x3 rotation matrix
cv::Rodrigues(rvec, rotation);

// Move the bench
cv::Affine3d pose(rotation, tvec);
visualizer.setWidgetPose("top", pose);
visualizer.setWidgetPose("bottom", pose);


The final step is to create a loop that keeps displaying the visualization window. The 1ms pause is there to listen to mouse events.

// visualization loop
while (cv::waitKey(100) == -1 && !visualizer.wasStopped()) {

  visualizer.spinOnce(1,     // pause 1ms
                      true); // redraw
}


This loop will stop when the visualization window is closed or when a key is pressed over an OpenCV image window. Try applying some motion to an object inside this loop (using setWidgetPose); this is how animations can be created, as sketched below.
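For instance, a small sketch of such an animation (illustrative only, not from the original text) could slide the bench along its X axis a little on each iteration:

// inside the visualization loop: translate the bench on each iteration
double shift = 0.;
while (cv::waitKey(100) == -1 && !visualizer.wasStopped()) {
  shift += 1.; // current displacement along X (in object units)
  cv::Affine3d animated = pose.translate(cv::Vec3d(shift, 0., 0.));
  visualizer.setWidgetPose("top", animated);
  visualizer.setWidgetPose("bottom", animated);
  visualizer.spinOnce(1, true); // pause 1ms and redraw
}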

See also

  • Model-based object pose in 25 lines of code by D. DeMenthon and L. S. Davis, in European Conference on Computer Vision, 1992, pp.335–343 is a famous method for recovering camera pose from scene points.

Summary


This article taught us how, under specific conditions, the 3D structure of a scene and the 3D pose of the cameras that captured it can be recovered. We have seen how a good understanding of projective geometry concepts allows us to devise methods that enable 3D reconstruction.
