Hello! Earlier we showed that two images are enough to obtain the three-dimensional coordinates of terrain objects. If the spatial coordinates of a large number of objects are obtained this way, we get a real terrain model. That is why all modern photogrammetric algorithms focus on the problem of finding identical points on pairs of photographs. If sets of identical points can be obtained automatically for a large number of photographs, high-quality terrain models can be built automatically as well. All photogrammetric software operates on this principle. There are two main approaches to identifying points in a pair of photographs. The first group of methods is called area-based: these methods analyze pixel values within an image area, and they are subdivided into correlation methods and methods based on least squares. The second group is based on detecting image features (feature detection algorithms). In the first group, the computer compares image fragments with each other by sequential search. This method is very simple to implement and undemanding of computing resources. However, it has limitations: it works only on near-perfect images, with no noise, no highlights, and an ideal aerial photography route. Any rotation will immediately break the algorithm. Since perfect aerial photography is impossible to achieve, the area-based method will almost never work in practice. In all other cases, feature detection algorithms save us. There are many such methods, but they all follow a common three-stage scenario. In the first stage, special points (keypoints) are selected on all images. In the second stage, descriptors are computed for those keypoints. In the third stage, the descriptors from different photographs are compared and the keypoints are matched.
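To make the area-based idea concrete, here is a minimal sketch in Python with NumPy: template matching by normalized cross-correlation, the simplest of the correlation methods mentioned above. The function names are mine, not from any library.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equal-size patches (1 = identical)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom else 0.0

def match_patch(template, image):
    """Exhaustively slide the template over the image and return the
    (row, col) offset with the highest correlation score."""
    th, tw = template.shape
    best_score, best_rc = -2.0, (0, 0)
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            s = ncc(template, image[r:r + th, c:c + tw])
            if s > best_score:
                best_score, best_rc = s, (r, c)
    return best_rc, best_score
```

Note how fragile this is: the exhaustive search assumes the fragment appears in the second image at the same scale and rotation, which is exactly the limitation described above.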
Let's look at each step in more detail. What is a keypoint? As a rule, a keypoint is a point that differs significantly and unambiguously from the background and from neighboring points: a pixel of the digital image that can be easily identified. The mathematics behind the search can vary; there are several search operators, such as the Moravec operator, the Förstner operator, and the SIFT operator. Modern photogrammetric algorithms can automatically detect millions of such points in any photograph, because a computer sees far more keypoints than the human eye does. After these keypoints are found, they need to be described. A description is a set of data about a point and its neighborhood that allows it to be unambiguously matched to the same point on a different image. Descriptor computation can be based on many different algorithms, using, for example, brightness gradients. After the keypoints have been found and described, the last task remains: comparing these points. The quality of the matching depends directly on the quality of the keypoint search and on the quality of the descriptors. Obviously, even with a high-quality description, different pictures may have different brightness and illumination conditions that changed between exposures, so not all keypoints will be matched. This is why a large number of points must be searched for. If we manage to match a large number of descriptors in two images, the problem is considered solved. Let's go back to the theory for a moment and recall that to build an accurate model we need to know the image parameters. The interior orientation elements are known. Of the exterior orientation elements, we know the coordinates of the photographing point, determined from GNSS measurements.
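As an illustration of what such an operator measures, here is a minimal sketch of the Moravec response in Python with NumPy. This is a simplified version for a single pixel, with names of my own choosing; it shows the core idea that a corner changes strongly under a shift in every direction, while a flat area or an edge does not.

```python
import numpy as np

def moravec_response(img, r, c, win=1):
    """Moravec interest measure at pixel (r, c): the minimum sum of squared
    differences between the window around the pixel and the same window
    shifted one pixel in four directions. Large at corners, near zero on
    flat areas and along edges."""
    p = img[r - win:r + win + 1, c - win:c + win + 1].astype(float)
    ssds = []
    for dr, dc in [(0, 1), (1, 0), (1, 1), (1, -1)]:
        q = img[r - win + dr:r + win + 1 + dr,
                c - win + dc:c + win + 1 + dc].astype(float)
        ssds.append(((p - q) ** 2).sum())
    return min(ssds)
```

A corner of a bright square scores higher than a pixel inside the square or a pixel on one of its edges, which is why such points are "easily identified".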
Using the keypoint search, we can determine the angular exterior orientation elements and the relative orientation elements of the images. That is, after searching and matching, we know all the values needed to build a terrain model from the images. Note that the model will have the correct scale, because the coordinates of the photographing points are defined in a geodetic system. Let's talk about the SIFT algorithm. Its appearance greatly changed digital photogrammetry and computer vision. SIFT stands for scale-invariant feature transform; it was patented by David Lowe in 1999. This algorithm is widely used for creating panoramas, creating stereo images, reconstructing a 3D object from its two-dimensional photographs, tracking the movement of an object, recognizing objects, and so on. The human brain can easily compare two images and conclude that the same object is photographed in them, just from different angles. For a computer, however, an image is a set of meaningless ones and zeros arranged in some order. As we already mentioned, you can compare images by patches or by individual points, but any imperfection in the images affects this very negatively: displacement of objects, differences in photographing scale, noise, changing lighting, and so on. It would be better to compare images not at all points, among which there are bad ones, but only at good ones that give a more or less unambiguous result. As you understand, this is the first stage: the search for keypoints. We need a detector method for extracting these keypoints, and in fact we will compare not the photographs themselves but models of them composed of keypoints. Once again, recall the chain: detector, descriptor computation, comparison and matching, construction of a transformation model. How is the keypoint search implemented?
The main detection mechanism is the construction of Gaussians and differences of Gaussians (DoG) for each image. What is a Gaussian? It is an image blurred with the Gauss function. This is a well-known technique; such a tool exists in the popular Photoshop editor. Gaussian blur is an image-blur filter that uses the Gauss function (which also expresses the normal distribution in statistics) to calculate the transformation applied to each pixel. Applied in two dimensions, this formula produces a surface whose contours are concentric circles with a Gaussian distribution around a central point. The values of this distribution are used to build a convolution matrix that is applied to the original image: the new value of each pixel is the weighted average of that pixel's neighborhood. The original pixel receives the largest weight (having the largest Gaussian value), and neighboring pixels receive smaller weights as their distance from the original pixel increases. The result is a blur that preserves borders and edges better than other, more uniform blur filters. Clearly, we can vary the degree of blur and build any number of Gaussian images. The difference of Gaussians (DoG) is an image obtained by pixel-by-pixel subtraction of one Gaussian of the original image from a Gaussian with a different blur level. A point is considered a keypoint if it is a local extremum of the DoG. In each image of the pyramid, local extremum points are searched for: each point of the current image is compared with its eight neighbors and with the nine neighbors one level higher and one level lower in the pyramid. If the point is larger (or smaller) than all of these neighbors, it is taken as a local extremum. After the points are found, they must be filtered and refined; for this, their coordinates are computed to subpixel accuracy.
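The blur, the difference of Gaussians, and the 26-neighbor extremum test described above can be sketched in Python with NumPy roughly as follows. This is a simplified illustration, not Lowe's implementation: the kernel truncation at 3 sigma and all function names are my own choices.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with a kernel truncated at 3*sigma;
    image borders are handled by edge replication."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    kernel /= kernel.sum()                      # weights sum to 1
    padded = np.pad(img.astype(float), radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="same")
    both = np.apply_along_axis(np.convolve, 0, rows, kernel, mode="same")
    return both[radius:-radius, radius:-radius]

def difference_of_gaussians(img, sigma1, sigma2):
    """DoG: pixel-by-pixel subtraction of the stronger blur from the weaker."""
    return gaussian_blur(img, sigma1) - gaussian_blur(img, sigma2)

def is_extremum(dog_stack, s, r, c):
    """True if dog_stack[s, r, c] is the unique maximum or minimum among
    its 26 neighbours in the 3x3x3 scale-space cube (8 in its own layer,
    9 one level up, 9 one level down)."""
    cube = dog_stack[s - 1:s + 2, r - 1:r + 2, c - 1:c + 2]
    v = dog_stack[s, r, c]
    unique = (cube == v).sum() == 1
    return unique and (v == cube.max() or v == cube.min())
```

A single bright dot, for example, produces a DoG response that peaks exactly at the dot, which is what makes it a candidate keypoint.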
Mathematical transformations determine whether the point was chosen correctly or whether it should be shifted in a certain direction. The contrast can also be estimated mathematically: if the contrast is low, the point is thrown out (filtered). Likewise, if the point lies on the boundary of objects or is poorly lit, it is excluded from further calculations. Once we are sure that a point is a keypoint, we need to compute its orientation. The direction of the keypoint is calculated from the gradient directions of the adjacent points. All gradient calculations are performed on the image in the Gaussian pyramid whose scale is closest to the scale of the keypoint. Now, knowing that the point is a keypoint, we proceed to computing the descriptor. As mentioned earlier, a descriptor is information about the neighborhood of a keypoint. This is done for obvious reasons: the distortions we talked about (a change in the object's position in the picture, a change of scene, one object overlapping another, rotation) have minimal or no effect on small areas. In the SIFT method, the descriptor is a vector. Like the direction of the keypoint, the descriptor is computed on the Gaussian closest in scale to the keypoint, from the gradients in a certain window around the keypoint. The image shows an example of a keypoint descriptor with dimensions 2 x 2 x 8, where the first two numbers are the number of regions (not pixels; a region can contain several pixels) horizontally and vertically, and the third number is the number of histogram bins for each region. The keypoint descriptor consists of all these histograms. The descriptor in the picture has 32 components (2 x 2 x 8), but in practice descriptors with 128 components (4 x 4 x 8) are used.
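A toy version of such a region-histogram descriptor might look like the sketch below in Python with NumPy. It assumes plain square cells and omits SIFT's rotation normalization and Gaussian weighting; the function name is my own.

```python
import numpy as np

def sift_like_descriptor(patch, n_regions=2, n_bins=8):
    """Toy SIFT-style descriptor: split the patch into n_regions x n_regions
    cells and build an n_bins-bin gradient-orientation histogram per cell,
    weighted by gradient magnitude. 2x2x8 gives 32 components; real SIFT
    uses 4x4x8 = 128."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2.0 * np.pi)    # orientation in [0, 2*pi)
    h, w = patch.shape
    desc = []
    for i in range(n_regions):
        for j in range(n_regions):
            rs = slice(i * h // n_regions, (i + 1) * h // n_regions)
            cs = slice(j * w // n_regions, (j + 1) * w // n_regions)
            hist, _ = np.histogram(ang[rs, cs], bins=n_bins,
                                   range=(0.0, 2.0 * np.pi),
                                   weights=mag[rs, cs])
            desc.extend(hist)
    d = np.asarray(desc, dtype=float)
    norm = np.linalg.norm(d)
    return d / norm if norm else d    # normalise for illumination invariance
```

Concatenating the cell histograms is exactly what turns "information about the neighborhood" into a single vector that can be compared numerically.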
After the points have been found and correctly described, all that remains is to compare them across images. For this, the best-bin-first (BBF) search method is used, which can identify the nearest neighbor with high probability using only a limited number of computations. After the images have been matched, it remains only to calculate the point coordinates using photogrammetric formulas. The result is a point cloud: a terrain model with correct coordinates. Later we will talk about what happens next. Besides the SIFT algorithm, there are other implementations for solving such problems. Firstly, there are various SIFT variations, such as RIFT and PCA-SIFT. It is also worth mentioning the SURF method (Speeded Up Robust Features) and KAZE features. The latter is a relatively new method that has gained wide popularity because it is distributed freely and its source code is open.
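A brute-force stand-in for that nearest-neighbor search can be sketched in Python with NumPy as follows. BBF performs the same search approximately but much faster; the ratio test used here to reject ambiguous matches is a common addition of my own, not something stated in the lecture.

```python
import numpy as np

def match_descriptors(d1, d2, ratio=0.8):
    """For every descriptor in d1, find its nearest neighbour in d2 and keep
    the match only if the nearest distance is clearly smaller than the
    second-nearest (Lowe's ratio test). Brute force; BBF does this
    approximately with far fewer computations."""
    matches = []
    for i, d in enumerate(d1):
        dists = np.linalg.norm(d2 - d, axis=1)
        nearest, second = np.argsort(dists)[:2]
        if dists[nearest] < ratio * dists[second]:
            matches.append((i, nearest))
    return matches
```

The surviving pairs of keypoints are exactly the identical points fed into the photogrammetric formulas to produce the point cloud.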