A few words about pattern recognition: the raster image as a two-dimensional data array; classical filtering (Fourier, low-pass and high-pass filters).

The formulation of a problem is determined by its goal and by the possibilities for implementing it.

Goal: to develop a program that classifies rectangular parts as acceptable or defective.

The possibilities for implementing the task are determined by the capabilities of the computer. A computer can process numerical information in an algorithmic sequence. To make use of these capabilities, the problem being solved must be modeled.

Modeling using a computer implies a transition from a real object (world) to a coded description of its properties using data and operations on them. Such a transition is usually carried out in several stages:

Abstraction is the selection of the features of the object that are most essential from the point of view of the task.

Research must be conducted that allows us to move from the object of modeling to the subject of modeling, discarding everything unnecessary in accordance with the goal of the task.

How does a rectangle differ from other quadrilaterals?

  • Equality of opposite sides.
  • Parallelism of opposite sides.
  • Equality of diagonals.
  • All angles are right.

What minimum features are needed to solve the problem unambiguously?

  • Equality of 2 opposite sides + equality of diagonals.
  • Parallelism of 2 opposite sides + equality of diagonals.
  • Three angles are right.

So, thanks to abstraction, we obtained a verbal information model. But it is still incomprehensible to the computer: the computer understands a mathematical model, represented as an algorithm and implemented in software.

Methodology for implementing the task.

A drawing of an acceptable part (a rectangle) or a defective part (an arbitrary quadrilateral) is made from segments (the LINE command) in the AutoCAD graphics system and exported to DXF format. The kntrs.lsp program reads the data about the segments (vertex coordinates) from the DXF file and writes them to the text file vrtks.txt in circular order of vertex traversal.

The text file vrtks.txt can be created manually by taking vertex coordinates directly from the drawing.

The developed rct.lsp program must read the data from the vrtks.txt file, analyze it, and write a record to the result.txt file stating whether the part meets the requirements (is a rectangle or not).

Formalization of features

Equality of the lengths of segments (sides or diagonals): the length of each segment is determined as the hypotenuse of a right triangle (by the Pythagorean theorem) from the differences of the coordinates of its endpoints:

(setq DX12 (abs (- X1 X2)))                       ; |X1 - X2|
(setq DY12 (abs (- Y1 Y2)))                       ; |Y1 - Y2|
(setq DA1 (sqrt (+ (* DX12 DX12) (* DY12 DY12)))) ; segment length (Pythagoras)

Parallelism of lines: K2 = K1, where K is the slope of the line, K = (Y2 - Y1)/(X2 - X1).

How can the concept of a "right angle" be formalized? Within the framework of the task, the presence of a right angle can be determined from the perpendicularity of segments: K2 = -1/K1, where K is the slope of the line, K = (Y - Y0)/(X - X0).
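As a sketch of how such features could be checked in code (the original implementation is in AutoLISP; this illustrative version is Python, and the function and tolerance names are my own), here is a test built on equality of both pairs of opposite sides plus equality of the diagonals, with each length computed by the Pythagorean theorem as above:

```python
import math

def dist(p, q):
    # segment length as the hypotenuse of a right triangle (Pythagorean theorem),
    # mirroring the DX12/DY12/DA1 computation shown earlier
    return math.hypot(p[0] - q[0], p[1] - q[1])

def is_rectangle(verts, tol=1e-6):
    # verts: four (x, y) vertices in circular traversal order, as in vrtks.txt;
    # checks equality of both pairs of opposite sides plus equality of diagonals
    a, b, c, d = verts
    sides_equal = (abs(dist(a, b) - dist(c, d)) < tol and
                   abs(dist(b, c) - dist(d, a)) < tol)
    diags_equal = abs(dist(a, c) - dist(b, d)) < tol
    return sides_equal and diags_equal

print(is_rectangle([(0, 0), (4, 0), (4, 3), (0, 3)]))  # True  (a 4x3 rectangle)
print(is_rectangle([(0, 0), (4, 0), (5, 3), (0, 3)]))  # False (a mere quadrilateral)
```

Checking both pairs of sides makes the figure a parallelogram, and a parallelogram with equal diagonals is necessarily a rectangle; this also sidesteps the division-by-zero problem the slope formulas have for vertical sides.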

Displaying the model on the computer

Ultimately, any information in a computer is represented by binary numbers (codes) in an internal machine model. Previously, this coding was done by the programmer; nowadays, the bulk of programs are written in algorithmic (high-level) languages.

I have long wanted to write a general article containing the very basics of Image Recognition: a kind of guide to the basic methods, telling when to use them, what problems they solve, what can be done in an evening on one's knees, and what is better not to attempt without a team of 20 people.

I have been writing articles on optical recognition for a long time, so a couple of times a month various people write to me with questions on the topic. Sometimes you get the feeling that you live with them in different worlds. On the one hand, you understand that the person is most likely a professional in a related field, but knows very little about optical recognition methods. And the most annoying thing is that he tries to apply a method from a neighboring area of knowledge, which is logical but does not quite work in Image Recognition, yet he does not understand this and is very offended if you start explaining things from the very basics. And considering that explaining from the basics takes a lot of time, which is often unavailable, it all becomes even sadder.

This article is intended so that a person who has never worked with image recognition methods can, within 10-15 minutes, create in his head a certain basic picture of the world that corresponds to the topic, and understand in which direction to dig. Many of the techniques described here are applicable to radar and audio processing.
I'll start with a couple of principles that we always start telling to a potential customer, or a person who wants to start doing Optical Recognition:

  • When solving a problem, always go from the simplest. It is much easier to put an orange tag on a person than to follow a person, highlighting him in cascades. It is much easier to take a camera with a higher resolution than to develop a super-resolution algorithm.
  • A strict formulation of the problem in optical recognition methods is orders of magnitude more important than in system programming problems: one extra word in the technical specification can add 50% of the work.
  • There are no universal solutions to recognition problems. You cannot make an algorithm that will simply “recognize any inscription.” A sign on the street and a sheet of text are fundamentally different objects. It’s probably possible to create a general algorithm (here’s a good example from Google), but it will require a lot of work from a large team and consist of dozens of different subroutines.
  • OpenCV is a bible that has many methods and can solve 50% of almost any problem, but OpenCV is only a small part of what can actually be done. In one study, the conclusions were written: “The problem cannot be solved using OpenCV methods, therefore it is unsolvable.” Try to avoid this, don’t be lazy and soberly evaluate the current task from scratch each time, without using OpenCV templates.
It is very difficult to give any universal advice, or tell how to create some kind of structure around which you can build a solution to arbitrary computer vision problems. The purpose of this article is to structure what can be used. I will try to divide the existing methods into three groups. The first group is preliminary filtering and image preparation. The second group is the logical processing of filtering results. The third group is decision-making algorithms based on logical processing. The boundaries between groups are very arbitrary. To solve a problem, it is not always necessary to use methods from all groups; sometimes two are enough, and sometimes even one.

The list of methods given here is not complete. I suggest adding critical methods in the comments that I did not write and attributing 2-3 accompanying words to each.

Part 1. Filtering

In this group I placed methods that allow you to select areas of interest in images without analyzing them. Most of these methods apply some kind of single transformation to all points in the image. At the filtering level, no image analysis is performed, but points that are filtered can be considered as areas with special characteristics.
Binarization by threshold, selection of histogram area
The simplest transformation is binarization of the image by threshold. For RGB and grayscale images, the threshold is the color value. There are ideal problems in which such a transformation is sufficient. Suppose you want to automatically select objects on a white sheet of paper:




The choice of the threshold at which binarization occurs largely determines the result of the binarization itself. In this case, the image was binarized by its average color. Usually the threshold is chosen by an adaptive algorithm: for example, by taking the expected value or the mode of the histogram, or by selecting its largest peak.
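Binarization by the mean intensity can be sketched in a few lines (a pure-Python illustration; the function name and the sample values are mine):

```python
def binarize(img, threshold=None):
    # if no threshold is given, adapt it as the mean intensity of the image
    flat = [v for row in img for v in row]
    if threshold is None:
        threshold = sum(flat) / len(flat)
    return [[1 if v > threshold else 0 for v in row] for row in img]

# dark objects (low values) on a white sheet (high values)
img = [[250, 248, 30, 251],
       [249, 25, 28, 247],
       [252, 250, 249, 248]]
print(binarize(img))  # [[1, 1, 0, 1], [1, 0, 0, 1], [1, 1, 1, 1]]
```

The dark object pixels fall below the mean and become 0; the white background becomes 1.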

Binarization can give very interesting results when working with histograms, including in the situation where we consider an image not in RGB, but in HSV. For example, segment colors of interest. On this principle, you can build both a tag detector and a human skin detector.
Classical filtering: Fourier, low-pass filter, high-pass filter
Classic filtering and signal-processing methods from radar can be successfully applied to many Pattern Recognition tasks. A traditional method in radar that is almost never applied to images in its pure form is the Fourier transform (more precisely, the FFT). One of the few exceptions where the one-dimensional Fourier transform is used is image compression. For image analysis a one-dimensional transform is usually not enough; the much more resource-intensive two-dimensional transform is needed.

Few people actually compute it; usually it is much faster and easier to convolve the area of interest with a ready-made filter tuned for high (HPF) or low (LPF) frequencies. This method, of course, does not allow spectrum analysis, but in a typical video-processing task what is needed is usually not analysis but a result.


The simplest examples of filters emphasizing low frequencies (a Gaussian filter) and high frequencies (a Gabor filter).
For each image point, a window centred on it is selected and multiplied element-wise with a filter of the same size; the sum of the products becomes the new value of the point. Applying low-pass and high-pass filters yields images of the following kind:
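The window-by-window multiplication just described can be sketched as follows (pure Python, no padding at the borders; the kernel here is a 3x3 binomial approximation of a Gaussian, and all names are illustrative):

```python
def convolve(img, kernel):
    # for each pixel, multiply the surrounding window by the kernel and sum:
    # the sum becomes the new pixel value (borders are simply skipped)
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    out = []
    for y in range(oh):
        row = []
        for x in range(ow):
            s = sum(img[y + i][x + j] * kernel[i][j]
                    for i in range(kh) for j in range(kw))
            row.append(s)
        out.append(row)
    return out

# 3x3 binomial (Gaussian-like) low-pass kernel, weights summing to 1
lpf = [[w / 16 for w in row] for row in [[1, 2, 1], [2, 4, 2], [1, 2, 1]]]
img = [[0, 0, 0, 0],
       [0, 16, 16, 0],
       [0, 0, 0, 0]]
print(convolve(img, lpf))  # [[6.0, 6.0]] - the bright pixels are smeared out
```

A high-pass filter works the same way, only with a kernel whose weights sum to zero, so flat regions map to zero and edges stand out.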



Wavelets
But what if we use some arbitrary characteristic function for convolution with the signal? Then it is called a "wavelet transform". This definition of wavelets is not strictly correct, but traditionally, in many teams, wavelet analysis means searching for an arbitrary pattern in an image by convolution with a model of that pattern. There is a set of classical functions used in wavelet analysis, including the Haar wavelet, the Morlet wavelet, and the Mexican-hat wavelet. The Haar primitives, which several of my previous articles covered, are such functions for two-dimensional space.


Above are four examples of classical wavelets: a 3-dimensional Haar wavelet, a 2-dimensional Meyer wavelet, a Mexican-hat wavelet, and a Daubechies wavelet. A good example of the extended interpretation of wavelets is the problem of finding a glare in the eye, where the wavelet is the glare itself:

Classical wavelets are usually used for image compression, or for image classification (to be described below).
Correlation
After such a free interpretation of wavelets on my part, it is worth mentioning the actual correlation that underlies them. Correlation is an indispensable tool in image filtering. A classic application is correlating a video stream to find shifts or optical flow. The simplest shift detector is also, in a sense, a difference correlator: where the images do not correlate, there was movement.

Filtering functions
An interesting class of filters is filtering by functions. These are purely mathematical filters that detect a simple mathematical function in an image (a line, parabola, or circle). An accumulator image is constructed in which, for each point of the original image, the set of functions that could generate it is drawn. The most classic transformation is the Hough transform for lines: for each point (x, y), the set of points (a, b) in parameter space is drawn for which the line y = ax + b passes through (x, y). You get beautiful pictures:


(a plus to whoever first finds the catch in this picture and this definition and explains it, and another plus to whoever first says what is shown here)
The Hough transform allows you to find any parameterizable functions, for example circles. There is a modified transform that can search for arbitrary shapes. Mathematicians are terribly fond of this transform, but in image processing it unfortunately does not always work: it is very slow and very sensitive to the quality of binarization. Even in ideal situations I preferred to make do with other methods.
An analogue of the Hough transform for straight lines is the Radon transform. It is computed via the FFT, which gives a performance gain when there are many points; in addition, it can be applied to a non-binarized image.
Contour filtering
A separate class of filters is border and contour filtering. Outlines are very useful when we want to move from working with an image to working with the objects in that image. When an object is quite complex, but clearly distinguishable, then often the only way to work with it is to select its contours. There are a number of algorithms that solve the problem of filtering contours:

Most often, Canny is used: it works well and its implementation is available in OpenCV (Sobel is there too, but it finds contours worse).



Other filters
Above are the filters whose modifications help solve 80-90% of problems. Besides them, there are rarer filters used in local tasks; there are dozens of them, and I will not list them all. Interesting ones include iterative filters (for example, the active appearance model), as well as the ridgelet and curvelet transforms, which are a fusion of classical wavelet filtering and analysis in the Radon transform domain. The beamlet transform works beautifully on the border between the wavelet transform and logical analysis, allowing contours to be highlighted:

But these transformations are very specific and tailored for rare tasks.

Part 2. Logical processing of filtering results

Filtering provides a set of data suitable for processing. But often you cannot simply take and use this data without processing it. In this section there will be several classic methods that allow you to move from an image to the properties of objects, or to the objects themselves.
Morphology
The transition from filtering to logic is, in my opinion, best represented by the methods of mathematical morphology. In essence, these are the simplest operations of growing (dilating) and eroding binary images. They allow noise to be removed from a binary image by enlarging or shrinking the existing elements. There are contouring algorithms based on mathematical morphology, but usually hybrid algorithms or combinations with other methods are used.
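A minimal sketch of these grow/shrink operations on a binary image (pure Python, with a cross-shaped 4-neighbourhood as the structuring element; the function names and the toy image are mine):

```python
def dilate(img):
    # grow: a pixel becomes 1 if it or any 4-neighbour is 1
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            nbrs = [(y, x), (y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
            out[y][x] = int(any(0 <= i < h and 0 <= j < w and img[i][j]
                                for i, j in nbrs))
    return out

def erode(img):
    # shrink: a pixel stays 1 only if it and all 4-neighbours are 1
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            nbrs = [(y, x), (y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
            out[y][x] = int(all(0 <= i < h and 0 <= j < w and img[i][j]
                                for i, j in nbrs))
    return out

noisy = [[1, 0, 0, 0, 0],   # an isolated noise pixel in the corner
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1],   # plus a solid blob
         [0, 0, 1, 1, 1],
         [0, 0, 0, 0, 0]]
opened = dilate(erode(noisy))  # "opening": erosion then dilation
print(opened[0][0])            # 0 - the noise pixel is gone
```

Erosion wipes out the single noise pixel (it has no lit neighbours), and the subsequent dilation regrows the surviving part of the blob; with this small cross element the blob is only partially reconstructed, which is the usual trade-off of opening.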
Contour analysis
Algorithms for obtaining boundaries were already mentioned in the section on filtering. The resulting boundaries are quite simply converted into contours: for the Canny algorithm this happens automatically, while other algorithms require additional binarization. For a binary image, a contour can be obtained, for example, with the beetle (square-tracing) algorithm.
A contour is a unique characteristic of an object, and it often allows the object to be identified. There is a powerful mathematical apparatus for doing this, called contour analysis.

To be honest, I have never been able to apply contour analysis in real problems. Too ideal conditions are required. Either there is no boundary, or there is too much noise. But, if you need to recognize something under ideal conditions, then contour analysis is a great option. It works very quickly, beautiful mathematics and clear logic.
Special points
Singular points are unique characteristics of an object that allow it to be compared with itself or with similar classes of objects. There are several dozen ways to identify such points. Some methods identify singular points in adjacent frames; some do so after a long period of time and under changed lighting; some find points that remain singular even when the object is rotated. Let's start with methods that find less stable but quickly computed points, and then proceed in order of increasing complexity:
First class. Singular points that are stable over a period of seconds. Such points are used to track an object between adjacent video frames or to combine images from neighbouring cameras. They include local maxima of the image, corners in the image (the best detector is, perhaps, the Harris detector), points at which maximum dispersion is reached, certain gradients, and so on.
Second class. Singular points that are stable under lighting changes and small movements of the object. Such points serve primarily for training and subsequent classification of object types; for example, a pedestrian classifier or a face classifier is the product of a system built on precisely such points. Some of the previously mentioned wavelets can be the basis for such points: Haar primitives, the search for highlights, the search for other specific functions. These include the points found by the histogram of oriented gradients (HOG) method.
Third class. Stable points. I know of only two methods that provide complete stability, plus their modifications: SURF and SIFT. They allow singular points to be found even when the image is rotated. Computing such points takes longer than for the other methods, but within quite reasonable limits. Unfortunately, these methods are patented (although in Russia algorithms cannot be patented, so use them for the domestic market).

Part 3. Training

The third part of the story is devoted to methods that do not work with the image directly but allow decisions to be made. These are mainly various methods of machine learning and decision-making. Recently Yandex posted a course on this topic on Habr; there is a very good selection there, and here it is in text form. For a serious study of the topic I highly recommend watching it. Here I will try to outline several main methods used specifically in pattern recognition.
In 80% of situations, the essence of learning in the recognition task is as follows:
There is a test sample containing several classes of objects: say, the presence or absence of a person in a photo. For each image there is a set of features extracted by some feature detector, be it Haar, HOG, SURF, or some wavelet. The learning algorithm must build a model from which it can analyze a new image and decide which object is in it.
How is it done? Each of the test images is a point in feature space, whose coordinates are the weights of each feature in that image. Let our features be: "presence of eyes", "presence of a nose", "presence of two hands", "presence of ears", and so on. We extract all these features with the detectors we already have, trained on body parts similar to human ones. A person corresponds to one point in this space, a monkey to another, a horse to a third. The classifier is trained on a sample of examples. But not all the photographs show hands, in others there are no eyes, and in a third the monkey has a human nose due to a detector error. A trained human classifier automatically partitions the feature space so as to say: if the first feature lies within a certain range, then it is a person.

Essentially, the goal of the classifier is to draw the regions of feature space characteristic of each class of objects. This is what a successive approximation to the answer looks like for one classifier (AdaBoost) in two-dimensional space:


There are a lot of classifiers. Each of them works better in some particular task. The task of selecting a classifier for a specific task is largely an art. Here are some beautiful pictures on the topic.
Simple case, one-dimensional separation
Let's look at the simplest case of classification, when the feature space is one-dimensional and we need to separate two classes. The situation occurs more often than you might think: for example, when you need to distinguish two signals or compare a pattern with a sample. Suppose we have a training sample. It produces a picture in which the X axis is the measure of similarity and the Y axis is the number of events with that measure. When the object being sought is similar to itself, the right Gaussian is obtained; when it is not, the left one. The value X = 0.4 separates the samples so that the probability of making a wrong decision is minimized. Searching for such a separator is the task of classification.
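The search for such a separating threshold on a training sample can be sketched directly (a brute-force Python illustration; the function name and the similarity scores are invented):

```python
def best_threshold(pos, neg):
    # try each observed value as the cut and count training errors:
    # values above the threshold are labelled "match", below it "no match"
    candidates = sorted(pos + neg)
    best = None
    for t in candidates:
        errors = (sum(1 for x in pos if x <= t) +
                  sum(1 for x in neg if x > t))
        if best is None or errors < best[1]:
            best = (t, errors)
    return best  # (threshold, number of misclassified samples)

match = [0.6, 0.7, 0.8, 0.55, 0.9]      # similarity scores, same object
nonmatch = [0.1, 0.2, 0.35, 0.3, 0.45]  # similarity scores, different objects
print(best_threshold(match, nonmatch))  # (0.45, 0)
```

Here the two score clusters are cleanly separable, so a zero-error threshold exists; with overlapping Gaussians the minimum error is nonzero, and, as noted below, minimizing total error is not always the right criterion anyway.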


A small note: the criterion that minimizes the error will not always be optimal. The following graph comes from a real iris recognition system. For such a system, the criterion is chosen to minimize the probability of falsely admitting an unauthorized person to the facility. This probability is called a "type I error", "false alarm probability", or "false positive"; in the English-language literature, the "False Acceptance Rate".
AdaBoost is one of the most common classifiers; the Haar cascade, for example, is built on it. It is usually used when binary classification is needed, but nothing prevents training it for a larger number of classes.
SVM is one of the most powerful classifiers and has many implementations. On the learning tasks I have encountered, it worked similarly to AdaBoost. It is considered quite fast, but training it is more difficult than training AdaBoost, and it requires choosing the right kernel.

There are also neural networks and regression. But to briefly classify them and show how they differ, we need an article much longer than this.
________________________________________________
I hope I was able to give a quick overview of the methods used without diving into the mathematics and descriptions. Maybe it will help someone. Although, of course, the article is incomplete: there is not a word about working with stereo images, about least squares with a Kalman filter, or about the adaptive Bayes approach.
If you like the article, I will try to make a second part with a selection of examples of how existing Image Recognition problems are solved.

And finally

What to read?
1) I once really liked the book "Digital Image Processing" by B. Jähne, which is written simply and clearly while still giving almost all the mathematics. Good for getting acquainted with existing methods.
2) A classic of the genre is R. Gonzalez, R. Woods “Digital Image Processing”. For some reason it was more difficult for me than the first one. Much less mathematics, but more methods and pictures.
3) "Image processing and analysis in computer vision problems" - written on the basis of a course taught at one of the departments of Physics and Technology. There are a lot of methods and their detailed descriptions. But in my opinion the book has two big disadvantages: it is strongly tied to the software package that comes with it, and too often the description of a simple method turns into a mathematical jungle from which it is difficult to extract the structural diagram of the method. However, the authors have made a convenient website where almost all the content is presented: wiki.technicalvision.ru
4) For some reason it seems to me that a good book for structuring and linking the picture of the world that arises when studying Image Recognition and Machine Learning is "On Intelligence" by Jeff Hawkins. There are no direct methods in it, but there is a lot to think about concerning where direct image-processing methods come from. When you read it, you realize that you have already seen the mechanisms of the human brain, but in video-processing tasks.

When we look at a two-dimensional image of a three-dimensional scene (a painting, photograph, or monitor screen), it seems to us that all the objects we could see by directly observing the same scene in life are present in it. Meanwhile, all we are really given in a two-dimensional image is the visible field, which is just some distribution function of brightness or color on a two-dimensional plane: f(x, y), where x and y are the Cartesian coordinates describing the image plane.

Moreover, if you get close to a computer monitor, you can see that the image on the screen is not actually smooth and continuous but a discrete "mosaic" of individual colored rectangles arranged in a regular rectangular matrix. This is a digital image. From a mathematical point of view, a digital image is a two-dimensional matrix Im of size (DimX × DimY), where x is an integer from 0 to DimX - 1 giving the number of an element within a matrix row, and y is an integer from 0 to DimY - 1 giving the number of the matrix row in which the element is located. An element of the digital image (a cell of the rectangular matrix) is called a pixel (picture element). In the simplest case, each pixel Im(x, y) has a scalar integer value proportional to the value of the brightness distribution function f(x, y) at the corresponding point of the plane.
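To make this concrete, here is a minimal sketch (Python, with invented pixel values) of a digital image as nothing more than a DimY × DimX matrix of integers:

```python
# a 4x4 grayscale digital image: just a matrix of integer brightness values
DimX, DimY = 4, 4
im = [[ 12,  15, 200, 210],
      [ 10, 180, 220, 205],
      [  9, 175, 215, 198],
      [ 11,  14, 190, 202]]

# pixel access: Im(x, y) is simply the matrix element in row y, column x
x, y = 2, 1
print(im[y][x])                  # 220 - a light pixel (large value)
print(min(min(r) for r in im))   # 9   - the darkest pixel in the image
```

Everything else (shapes, objects, text) must be computed from these numbers; the matrix itself carries no other information.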

Fig. 1.1.1 shows, on the left, an image of a woman's face and, on the right, an enlarged fragment of the same face (the right eye), where for each image element the corresponding numerical pixel value is indicated. Light elements of the image correspond to larger matrix values, dark ones to smaller values. The digital image contains no other information.

Fig. 1.1.1. Digital image as a two-dimensional intensity matrix

When starting to study computer vision, you need to clearly understand that only a two-dimensional array of numbers of one format or another is stored in the computer as a digital image. Any other data we would like to extract from the image (shapes, lines, objects, dimensions, the content of depicted text, and so on) can be obtained only by applying a series of image processing and analysis procedures, which we must either program ourselves or take ready-made from well-known image analysis software packages. For simple computer vision problems, ready-made tools will most likely be found in standard libraries of image-processing procedures; for more complex problems, ready-made procedures will have to be combined; and for many quite "ordinary" tasks, which human "biological" vision seemingly solves easily and playfully, computer vision still has no solutions and is still looking for them. After all, using natural vision, a person can easily orient himself in any environment, recognize objects, choose a path, drive a car, and much, much more. Why can't a computer that receives an image from a video camera do all this? Maybe it is the structure of the human eye?

In fact, the human eye, like a video camera, only forms a “visible field” similar to a digital image. In this case, the optical system, consisting of the pupil and lens, projects a two-dimensional image onto the retina, where photosensitive cells (“rods” and “cones”) convert the resulting image into nerve impulses. And only after this, the complex mechanism for processing the received information, functioning in the corresponding part of our brain, interprets these impulses as an image of the visible scene that we understand. Thus, in humans, the “vision” function is performed not only by the eye, but by the “eye + brain” (“sensor + computer”) system. It is the information processing algorithms built into the brain that allow a person to understand what he sees. The role of these built-in algorithms can be explained using the following example.

When, in the middle of the 20th century, ophthalmic surgeons learned to operate on the lens of the eye, many people blind from birth gained the technical ability to see. That is, after such an operation in a person who was previously blind (light simply did not pass through the lens), an image began to form on the retina, and the corresponding signals began to enter the brain exactly as they do in healthy people. Unfortunately, "seeing the light" did not mean "beginning to see". As subsequent history showed, most "technically sighted" adult patients never achieved anything more significant in the realm of vision than recognizing simple geometric shapes, and even that required serious conscious effort. Recognizing people by their faces and orienting themselves in space remained impossible tasks for them. The point is that the built-in mechanisms of "automatic" visual analysis that develop in people in early childhood had not developed in these patients in time, and they found themselves in the position of a computer that has a device for inputting images but lacks the software needed for their analysis.

In order to be finally convinced of the complexity of the task before us of analyzing an image that is a two-dimensional array of numerical data, let's try to put ourselves in the place of a computer program dealing with abstract numbers. To do this, let's mentally change the modality of image perception, transferring it from the visual to the tactile domain. Imagine the two-dimensional array of intensity values as a chessboard whose size equals the size of the image (DimX × DimY), with a column inserted at the center of each cell whose height is proportional to the value of the corresponding pixel. In other words, consider the two-dimensional image as a kind of conditional three-dimensional surface. Fig. 1.1.2 shows, on the left, a fragment of a woman's face shown as an image and, on the right, the same fragment shown as a pseudo-3D relief.

Fig. 1.1.2. Digital image as a pseudo-3D relief

Now imagine that, without looking at the image, you must feel the corresponding "relief" and try to determine what exactly it depicts: a house, a dog, or a human eye? Experiments show that the average person cannot cope with such a task. Even recognizing the simplest geometric figures in such a "relief" representation demands significant effort and the conscious development of special skills, strategies, and sensing algorithms. Such, despite the apparent simplicity of the "digital image" object, is the true complexity of computer and machine vision problems.

UDC 004932:621.396

T.M. VLASOVA, V.G. KALMYKOV

ALGORITHM AND PROGRAM FOR RECOGNIZING IMAGE CONTOURS AS A SEQUENCE OF DIGITAL LINE SEGMENTS

Abstract: This paper considers an algorithm for recognizing digital straight-line segments in the contours of binary images, together with a software implementation of the algorithm. Using this algorithm in image processing yields a more natural and more economical description of images than known methods of image description. The algorithm and its software implementation can also be used to describe contours when processing halftone and color images.

Key words: image, contour, digital straight segments, algorithm, program.


1. Introduction

Structural analysis of image contours as sequences of straight line segments and curved arcs is one of the main tasks in image processing for the purpose of their interpretation in artificial intelligence systems.

In most cases, an image can be considered as a part of a plane divided into regions with constant or law-governed parameters, for example optical density, color, or texture, which determine the background and the objects of the image. An integral property of each such region is its boundary, that is, a contour: a simply connected sequence consisting of straight segments and arcs of curves. When processing a raster image, the contours of objects are usually extracted. However, contours of objects represented as collections of individual boundary pixels are of little use for further processing, since they do not sufficiently express their geometric essence.

Recognition of image contours as sequences of straight segments can be considered one of the main tasks in raster image processing. Representing a contour as a sequence of straight line segments yields a description of the image in a compact form that is natural for human perception, invariant under affine transformations, and convenient, in particular, for processing by neural networks. Straight segments are the main elements of a contour; an arc of a curve is also often replaced by an inscribed broken line, both in the foundations of mathematical analysis and in a large number of practical applications.

Well-known methods and algorithms, in particular those proposed in earlier work, make it possible to obtain only an approximate solution, which is not acceptable for all applications.

This paper examines the recognition of the contour of a binary image as a sequence of segments of digital straight lines without loss of information.

2. Contour as a sequence of segments of digital lines

This section discusses the structural analysis of image contours as a sequence of segments of digital lines, which are the initial data for segmenting the contour into arcs of digital curves and segments of digital lines.

We will limit ourselves to considering binary images, the objects of which are completely determined by the contours that limit them. Arcs of digital curves, like segments of digital straight lines, are formed by sampling images containing contours formed by segments of straight lines and arcs of curves.

The characteristic features of straight segments and arcs of curves are lost in the course of this transformation. When viewing a sampled image at sufficient magnification, it is often difficult to recognize individual straight segments and arcs of curves in the sequence of vertical and horizontal segments. Additional difficulties arise during processing because contour lines, mathematical lines without thickness, are displayed on the monitor screen as connected sequences of pixels, that is, as visible lines with thickness.

To avoid these problems, we will consider the image obtained from the original by discretization as a two-dimensional cell complex. In this case, pixels are the two-dimensional elements of the complex. Besides pixels, there are cracks and points. Cracks are the sides of pixels and are one-dimensional elements; points are the end points of cracks and the corner points of pixels, and are zero-dimensional elements. Thus, in the case under consideration, the contour of an object is a connected, closed sequence of cracks bordering between the pixels of the object and those of the background. A contour can be described as a sequence of the integer coordinates of the points bounding the contour cracks. As has been shown, representing the image plane as a cell complex provides many benefits; in particular, the boundary of a region becomes a thin curve with zero area.

Fig. 1 shows an example of an original object contour formed by an arc of a curve and a straight line segment, as well as its digital equivalent as a sequence of cracks. The points belonging to cracks of different directions are numbered. As in previous works, by an L-element we mean a connected sequence of cracks of the same direction, emanating from a certain point and ending with a crack of the same or perpendicular direction. Fig. 1 shows one of the possible partitions of the contour into L-elements, formed by the cracks between points (0-2), (2-4), (4-6), (6-8), (8-9), (9-11), (11-13), (13-15), (15-17), (17-19), (20-21), (21-23), (23-25), (25-27), (27-0). Each L-element is characterized by the following parameters: the direction g relative to its initial point (here g = 0 for the direction up, 1 for right, 2 for down, 3 for left); l, the number of cracks in direction g (l = 1, 2, ...); and the direction q of the last crack relative to the direction g of the preceding cracks (q = -1 if the last crack turns left relative to direction g, +1 if right, 0 if it coincides with direction g). The number of cracks l will be conventionally called the "length" of the L-element. For L-element (0-2): g = 0, l = 3, q = +1. For L-element (27-0): g = 3, l = 1, q = 0.

The method for selecting segments of digital straight lines in a contour uses the following property of the sequence of L-elements forming a segment. Such a sequence includes L-elements with the same values of g and q, and their lengths take the values l and l+1. Moreover, the alternation of L-elements of lengths l and l+1 is determined by the continued fraction obtained by dividing the integers n = Δx = |x1 - x2| and m = Δy = |y1 - y2|, where (x1, y1), (x2, y2) are the coordinates of the initial and end points of the segment:

n/m = l + 1/(k1 + 1/(k2 + 1/(k3 + ... + 1/kr)))     (1)

Let us assume for definiteness that n > m. As follows from formula (1), l, the integer part of dividing n by m, corresponds in a digital straight segment to l consecutive cracks of the same direction; together with the adjacent perpendicular crack they form an L-element of length l. k1 consecutive L-elements of length l and one L-element of length l+1 (or k1 consecutive L-elements of length l+1 and one L-element of length l) form a K1-element of "length" k1 (by analogy with the "length" of an L-element). The L-element that differs in length by 1 from the consecutive L-elements will be called the modified L-element of the given K1-element. Similarly, k2 consecutive K1-elements of "length" k1 and one K1-element of "length" k1+1 (or k2 consecutive K1-elements of "length" k1+1 and one K1-element of "length" k1) form a K2-element of "length" k2, and so on, until the terms k3, ..., kr of the continued fraction are exhausted. A Kr-1-element that differs in "length" by 1 from the consecutive Kr-1-elements will be called the modified Kr-1-element of the given Kr-element. Thus, every digital straight segment corresponds to a continued fraction whose terms determine the structure of this segment.

In the contour in Fig. 1, the following segments of digital straight lines can be distinguished: 0-3, 3-9, 9-10, 10-17, 17-0.

3. Selecting segments of a digital line in a contour

When processing image contours, in particular binary images, in a sequence of cracks forming a contour, it is necessary to select parts of the sequence that form straight segments. This problem can be considered as the problem of determining the elements of a continued fraction from a sequence of L-elements of a contour. This problem is inverse to the problem of determining the structure of a straight line segment from a sequence of terms of a continued fraction, obtained as the ratio of the differences in the coordinates of the beginning and end of the segment.

The method for selecting segments of a digital line consists of sequentially performing the following actions.

1. Identifying the sequence of L-elements in the sequence of cracks. This action corresponds to determining the integer part l of the continued fraction (1).

2. Identifying the sequence of Kr-elements with r = 1 in the sequence of L-elements, where one of the L-elements of each K1-element must contain one crack more or less than the others. This action corresponds to determining the element k1 of the continued fraction (1). After it is performed, the value of r is increased by 1.

3. Identifying the sequence of Kr-elements in the sequence of Kr-1-elements, where one of the Kr-1-elements of each Kr-element must contain one Kr-2-element more or less than the others. This action corresponds to determining the element kr of the continued fraction (1). After it is performed, the value of r is increased by 1.

4. Step 3 is repeated as long as it is still possible to form a Kr-element.

5. Determining the boundary points between two neighboring L-elements that are not included in the same Kr-element. These points are the end points of the digital straight segments that form the contour.

Let us consider an algorithm for selecting straight line segments in a sequence of L-elements.

Let {Ls(ls, gs, qs)}, s = 0, 1, ..., S, be the sequence of L-elements forming the contour; (xs, ys) the coordinates of the beginning of the s-th L-element; and {(xj, yj)}, j = 0, 1, ..., J, J < S, the set of contour break points. The break points define the end points of the straight segments that form the contour. Finding the break points means identifying the straight segments that form the contour.

Each segment under consideration is characterized by a Kr-element, as well as by a continued fraction. At the initial moment of recognition of a straight segment, the elements of the corresponding continued fraction are equal to 0. The segment can be considered recognized if the parameters of the Kr-element are recognized, including its order r and the values of the elements of the corresponding continued fraction.

1. Initial conditions.

Given: the sequences {Ls(ls, gs, qs)} and {(xs, ys)}.

It is necessary to find the coordinates of the break points {(xj, yj)}.

k0p := 0, k1p := 0, k2p := 0, ..., krp := 0 are the working values of the elements of the continued fraction.

Let us take the point s = 0 as the starting point of the first segment; j = 0; r = 0.

2. Take the first L-element of the sequence as the beginning of the first straight segment. Its starting point is (xs, ys). The length l = l0 is also the value of the first element of the continued fraction.

s := s + 1; k1p := k1p + 1.

3. Check whether the next L-element, together with the previous ones, forms a K0-element.

3.1. If ((gs == gs-1) && (qs == qs-1) && (ls == ls-1)), then the K0-element continues: k0p := k0p + 1; s := s + 1; and the straight segment continues. Go to step 3.

3.2. If ((gs != gs-1) || (qs != qs-1) || (|ls - ls-1| > 1)), then the straight segment ends. Go to step 5.

3.3. If ((gs == gs+1) && (qs == qs+1) && ((ls + 1 == ls+1) || (ls - 1 == ls+1))), then the K0-element is complete; r := r + 1.

4. Checking the continuation/completion of the Kr-element.

4.1. If (kr == 0), then kr := krp; krp := 0; k(r+1)p := k(r+1)p + 1; s := s + 1; a K(r+1)-element begins. Go to step 3.

4.2. If ((kr != 0) && (kr == krp)), then k(r+1)p := k(r+1)p + 1; s := s + 1; the K(r+1)-element continues. Go to step 3.

4.3. If ((kr != 0) && ((kr + 1 == krp) || (kr - 1 == krp))), then r := r + 1; the Kr-element is complete. Go to step 4.

4.4. If ((kr != 0) && (|kr - krp| > 1)), then the segment ends; go to step 5.

5. End of the segment.

xj := xs; yj := ys; k1p := 0, k2p := 0, ..., krp := 0; k1 := 0, k2 := 0, ..., kr := 0.

If (s < S), then s := s + 1; go to step 2.

Otherwise, the sequence of L-elements is exhausted and the algorithm ends.

Essentially, the proposed algorithm finds the elements of the continued fraction for each obtained Kt-element and checks whether the continued fraction of the newly constructed Kt-element matches the one already constructed.

4. Program for selecting digital line segments

As can be seen from its description, the algorithm contains a significant number of conditional branches, whose use contradicts the recommendations of structured programming because of the difficulties that arise when debugging programs. In addition, the number of parameters kt cannot be determined in advance, since the variable t is not bounded in advance, and limiting the value of t would mean limiting the image size in advance. For these reasons a software implementation, and especially its debugging, is significantly difficult with trivial means. The difficulties of developing and debugging a software implementation can be reduced by using modern object-oriented programming tools.

The proposed algorithm is implemented as the LINESEGM program, part of a laboratory software package for image processing, in the Visual C++ environment.

As initial information, the LINESEGM program uses sequences of L-elements constructed for each of the contours of the processed image.

The result of the program is a connected sequence of segments of digital lines constructed for each contour, represented by the coordinates of the end points of the segments.

As can be seen from the algorithm, the operations for constructing Kt-elements from Kt-1-elements are the same for all values of t. Note that t starts at 0 and increases by 1 as the algorithm runs. The special class CKForLn includes methods corresponding to the operations of the algorithm. While the program implementing the algorithm runs, each time t increases by 1 a new object is created containing the functions that perform the operations of the algorithm for that value of t.

Considering that at the zero level K0-elements are formed not from K-elements but from L-elements, a special modification of the CKForLn class, the Cmini class, was created to implement the algorithm at the zero level.

The principle of operation of the program is that for each value of t the program creates an object of the CKForLn class of level t, containing functions that determine the parameters of the Kt-element. The initial parameters of a Kt-element are the parameters of an already completed Kt-1-element, which were determined by the CKForLn object of level t-1.

Objects of the CKForLn class are created as the need arises, namely when a K-element of the next level must be constructed, and are accumulated in a special dynamic array. The zero-level object is created immediately at program start.

Creating objects in a dynamic array as t increases makes it possible to impose no restrictions on the size of the image; limits on image size are determined only by the resources of the computer used.

In describing the operation of the program, the notion of a completed Kt-element will be used. Each completed Kt-element contains kt Kt-1-elements and one modified Kt-1-element, which contains kt-1 ± 1 Kt-2-elements; in contrast, an incomplete Kt-element does not contain a modified Kt-1-element.

The CKForLn class includes the following methods.

1. Method DK() (define K-element): determine the K-element. To determine a Kt-element means to determine the number of Kt-1-elements that form the given Kt-element.

2. Method VK() (verify K-element): check the identity of the K-element under consideration with a K-element of the same level previously determined by the DK() method.

3. Method DE() (define the end of K-element): determine the end of the K-element, that is, when defining a Kt-element, find its modified Kt-1-element. The DE() function of level t-1 is called by the DK() function of level t.

4. Method VE() (verify the end of K-element): check the identity of the end of the K-element under consideration with the modified K-element previously determined by the DE() method.

The Cmini class includes the same methods, differing from those of the CKForLn class in that the Cmini methods work with L-elements and determine or verify K0-elements.

Methods of the Cmini class

Methods of the Cmini class use as initial data sequences of L -elements constructed for each of the contours of the processed image, starting from the current number of the L -element at the time the method function is called.

The DK() function of the Cmini class compares the parameters of each subsequent L-element with the parameters of the initial L-element until they cease to match. When the parameters no longer match, the DK() function checks whether the K0-element is complete and finishes its work. A K0-element is considered complete if it ends with a modified L-element, one whose length differs by 1 from the other L-elements of the K0-element (operation 3.1 for the beginning of a segment, t = 0).

The VK() function checks whether the parameters of the next k0 L-elements match the parameters of the L-elements of the K0-element previously defined by the DK() function of the same level. If the parameters of the current K0-element coincide with those previously defined, the VK() function sets the segment-continuation flag and finishes (operation 3.1 for segment continuation, t > 0). Otherwise, the VK() function sets the segment-completion flag and finishes.

The DE() function compares the parameters of the current K0-element with the parameters of the K0-element previously defined by the DK() function, to determine whether the current K0-element is a modified one. With other parameters equal, the number k0 of L-elements of the modified K0-element must differ by 1 from that of the K0-element previously determined by the DK() function (operations 3.2, 3.3 for determining the completion of the initial K0-element of a segment, t = 0). The result, the parameters of the modified K0-element, is used in the VE() method of the Cmini class.

The VE() function compares the parameters of the current K0-element with the parameters of the modified K0-element previously determined by the DE() function, to determine whether they coincide (operations 3.2, 3.3 for segment continuation, t > 0). The result, a match or mismatch flag, is used in the VK() method of the CKForLn class.

Methods of the CKForLn class

The methods use as initial data the parameters of the K-elements constructed at the next lower level; that is, to determine the parameters of a Kt-element, the parameters of already constructed Kt-1-elements are used.

The DK() function of level t of the CKForLn class, when defining the parameters of a Kt-element, calls the VK() function of level t-1 of the CKForLn class, which checks whether the already defined Kt-1-element is followed by a Kt-1-element with the same parameters. If so, the VK() call is repeated; the number of repetitions is counted, that is, the parameter kt is determined.

Otherwise, the DK() function of level t calls the DE() function of level t-1 to determine the modified Kt-1-element, and finishes. At the end of its work, the DK() function of level t determines the parameters and sets the flag of a completed or incomplete Kt-element (operations 4.1, 4.2 at the current maximum value of t).

The VK() function of level t of the CKForLn class checks whether the parameters of the following kt Kt-elements match the parameters of the Kt-element previously defined by the DK() function of the same level. If the parameters of the current Kt-element coincide with the Kt-element previously defined by the DK() function of the same level, the VK() function sets the segment-continuation flag and finishes.

Otherwise, the VK() function sets the segment-completion flag and finishes (operations 4.1, 4.2 with the current value of t less than the maximum).

The DE() function of level t of the CKForLn class, when determining the parameters of a Kt-element, compares the parameters of the current Kt-element with those of the Kt-element previously defined by the DK() function, to determine whether the current Kt-element is a modified one. With other parameters equal, their values of kt-1 must differ by 1. If this condition is met, the DE() function sets the modified-Kt-element flag and finishes (operations 4.3, 4.4 at the current maximum value of t).

The VE() function of level t of the CKForLn class compares the parameters of the current Kt-element with the parameters of the modified Kt-element previously identified by the DE() function, to determine whether their parameter values coincide. If the parameter values of the current Kt-element coincide with those previously determined by the DE() function of the same level, the VE() function sets the match flag and finishes (operations 4.3, 4.4 with the current value of t less than the maximum).

The timing diagram (Fig. 2) illustrates the operation of the LINESEGM program using the example of recognizing a straight line segment. The lower part of the figure shows part of a digital line consisting of L-elements of the same main and auxiliary directions and different lengths.

At step 0, an object of the Cmini class is created, which determines the parameters of the K0-element. At step 10, the determination of the parameters of the K0-element is completed and object 1 of the CKForLn class is created, which uses the functions of the previously created object to determine the parameters of the K1-element. At step 19, the determination of the parameters of the K1-element is completed and object 2 of the CKForLn class is created, which uses the functions of the previously created objects to determine the parameters of the K2-element. At step 49, the determination of the parameters of the K2-element is completed and object 3 of the CKForLn class is created, which uses the functions of the previously created objects to determine the parameters of the K3-element. At step 79, the condition for ending the segment is satisfied. The operation of the program is described in detail in the appendix.


Fig. 3 shows an example of the program's operation. The contour of the binary image of a curly arrow is divided by small squares into straight segments. The result of the program, a sequence of straight line segments, was used to identify the arcs of digital curves; large squares show the boundaries of the arcs.

The program has been tested on a significant number (more than 2000) of examples and is used in research on the structural analysis of halftone images.

5. Operation of the program for recognizing line segments

Let us examine the operation of the LINESEGM program using the example in Fig. 4. The lower part of the figure shows part of a digital line consisting of L-elements of the same main and auxiliary directions and different lengths. In section 0-6, two L-elements form an incomplete K0-element. Obviously, L-element 3-6 of length 3 completes a straight segment, since L-element 6-7 of length 1 cannot be its continuation. Thus, L-element 6-7 is the beginning of a new digital straight segment.

Fig. 3. An example of the operation of the program for structural analysis of a contour: segmentation of a contour into segments of digital straight lines

The program's work in determining the next straight segment begins with the zero-level DK() function, which determines the completed K0-element 6-10, consisting of L-elements of lengths 1, 1, 2; k0 = 2. This K0-element is the starting element for a K1-element. The program creates a first-level object and transfers control to its DK() function. The DK() function of level 1 calls the VK() function of level 0. The VK() function compares the parameters of the L-elements of K0-element 6-10 with the subsequent L-elements and confirms the presence of K0-element 10-14, identical to K0-element 6-10. Continuing its work, the VK() function finds that the next L-elements do not form the same K0-element, finishes, and returns control to the DK() function of level 1. The DK() function of level 1 calls the DE() function of level 0. The latter, starting from L-element 6-7, determines the presence of the modified K0-element 14-19, consisting of L-elements of lengths 1, 1, 1, 2; k0 = 3, finishes, and returns control to the DK() function of level 1. This function determines the presence of the completed K1-element 6-19, consisting of two K0-elements 1, 1, 2 (k1 = 2) and one modified K0-element 1, 1, 1, 2 (k0 = 3).

The program creates a second-level object and transfers control to its DK() function. The DK() function of level 2 calls the VK() function of level 1, which in turn calls the VK() function of level 0. The VK() function compares the parameters of the L-elements of K0-element 6-10 with the subsequent L-elements and confirms the presence of K0-elements 19-23 and 23-27, identical to K0-element 6-10, that is, the same number of such K0-elements as are contained in K1-element 6-19. Next, the VK() function of level 0 returns control, with the segment-continuation flag set, to the VK() function of level 1. The VK() function calls the VE() function of level 0, which determines the presence of the modified K0-element 27-32, identical to K0-element 14-19. Thus K1-element 19-32 is identified, identical to K1-element 6-19. Then the VK() function of level 1 fails to identify a further K1-element identical to K1-element 6-19, because the VE() function of level 0 does not find a modified K0-element identical to K0-element 14-19 starting from L-element 40-41, and returns control to the DK() function of level 2. The DK() function of level 2 calls the DE() function of level 1, which determines the presence of the modified K1-element 32-49, consisting of K0-elements 32-36, 36-40, 40-44, 44-49. Next, K2-element 6-49 is determined, a level-3 object is created, and the modified K2-element 49-79 is determined. These two K2-elements form K3-element 6-79. This completes the construction of the segment, since the next L-elements 79-81 and 81-83 do not form a K0-element identical to K0-element 6-10, and the VK() function of level 0 does not set the continuation flag. In the sequence of L-elements, the digital straight segment 6-79 has been identified. The program begins determining the next segment, starting from L-element 80-82.

6. Conclusions

1. A new algorithm for identifying line segments in image contours and a non-trivial software implementation of the algorithm are proposed, which make it possible to obtain an exact solution to the problem of recognizing image contours as sequences of line segments.

2. The software implementation of the algorithm for selecting straight line segments in image contours was made using modern object-oriented programming tools, which made it possible to avoid imposing explicit restrictions on the size of the processed image while making maximal use of the resources of the computer used.

3. Based on the proposed algorithm and its software implementation, a theoretical solution was obtained and experiments were conducted on recognizing arcs of digital curves and on segmenting the contours of binary images into segments of digital straight lines and arcs of digital curves.

BIBLIOGRAPHY

1. Kovalevsky V.A. Applications of Digital Straight Segments to Economical Image Encoding // Proceedings of the 7th International Workshop, DGCI'97, Montpellier, France, December 3-5, 1997. - P. 51-62.

2. Kalmykov V.G. Structural method of description and recognition of segments of digital straight lines in the contours of binary images // Artificial Intelligence. - 2002. - No. 4. - P. 450-457.

3. Kalmykov V.G., Vishnevsky V.V. Analysis of object contours in binary images // Mathematical machines and systems. - 1997. - No. 2. - P. 68 - 71.

4. Kalmykov V.G. Arc of a digital curve: definition and application // Signal Processing, Image Display and Pattern Recognition: Proceedings of the All-Ukrainian International Conference. - Kyiv, 2004. - October 11-15.


