Identifying two-dimensional shapes has been a classical problem in computer vision. Roads and vehicles in aerial images are examples of such shapes which are well approximated by straight lines and rectangles. Human facial features such as eyes and mouth can also be regarded as simple shapes that vary only in small ranges.
In most applications, it is difficult to model the intensity values of objects and their background. Therefore, it is reasonable to exploit the intensity differential along the object's boundary. The intensity change along the boundary is usually modeled as a step edge. Once the pixels with high intensity gradient are chosen, these pixels should be examined to check if they lie on an expected shape boundary.
Edge-based shape detection methods all suffer from two problems: loss of information at the edge detection stage, and the difficulty of statistical performance analysis. If we instead test whether the given shape is present at a position by computing the intensity changes around the whole hypothetical shape contour at the same time, we make only a single decision: whether the overall response is strong enough.
We define the shape-matched operator by extending the optimal step edge operator along the hypothetical object boundary contour so that responses from intensity differences along the boundary are collected simultaneously. We can prove that this filtering is in fact equivalent to collecting the gradient along the shape boundary. Moreover, since the responses are averaged over the neighborhood of the contour, our method is more robust than merely summing up the gradient magnitudes. Errors in shape position and response estimates averaged over boundary pixels are reduced in proportion to the size of the operator. In our filtering approach, edges with weak responses can contribute to the shape response, so that an object having very low contrast on all or some portion of its boundary can be correctly detected. By shifting the sampling of operator values, we can estimate the positions of objects having very small numbers of pixels, with sub-pixel accuracy.
In an effort to ensure an accurate shape detection scheme, we set up a criterion for an optimal step edge operator and derive a closed-form solution, which is the derivative of the double exponential (DODE) function. We have verified that the DODE operator gives better performance than the derivative of Gaussian (DOG) operator, which is widely used for edge detection. The simplicity of this approach facilitates statistical analysis based on simple assumptions: We have found a way to compute the response profile of such an operator, using the local linearity of the response. We are also able to formulate statistical properties of this detection procedure under the Gaussian white noise assumption, and thus predict its detection and localization performance.
Edge detection essentially amounts to finding points of high intensity gradient. In the ideal, noise-free case, the differential operator would be the optimal edge operator.
In the presence of noise, we need to suppress high frequency intensity structure while preserving the global step edge structure. Therefore, the derivative of the optimal smoothing operator should be the optimal edge operator by the following simple relation between differentiation and convolution:
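The relation referred to here is presumably the standard identity that differentiation commutes with convolution. The symbols $f$ (the input signal) and $h$ (the smoothing operator) are notation assumed for this sketch, not taken from the original:

```latex
\frac{d}{dx}\bigl(h * f\bigr)(x)
  \;=\; \Bigl(\frac{dh}{dx} * f\Bigr)(x)
  \;=\; \Bigl(h * \frac{df}{dx}\Bigr)(x)
```

Under this identity, smoothing the signal with $h$ and then differentiating is the same as convolving the signal once with the derivative of $h$.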
In this work, we derive a one-dimensional smoothing operator for a step function using a different criterion: minimizing the sum of the noise power and the mean squared error between input and output. Since this operator suppresses noise while preserving the step shape in an optimal way, the derivative of the response function is less noisy and close to an impulse function, thus achieving very accurate detection and localization of the step edge.
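Let the step edge with amplitude $A$ be $s(x) = A\,u(x)$, where $u(x)$ is the unit step function.
We want to find an optimal smoothing filter $h(x)$ that minimizes the squared sum $e_s^2 + e_n^2$, where $e_s^2$ is the mean squared difference between the input and output signals, and $e_n^2$ is the mean squared sum of the output noise response.
The mean squared errors, in terms of their frequency domain representations, are
$$e_s^2 = \frac{1}{2\pi}\int |S(\omega)|^2\,|1 - H(\omega)|^2\,d\omega, \qquad e_n^2 = \frac{1}{2\pi}\int |N(\omega)|^2\,|H(\omega)|^2\,d\omega .$$
After simple algebraic operations, it can be proved that the familiar Wiener filter $H(\omega) = |S(\omega)|^2 / \bigl(|S(\omega)|^2 + |N(\omega)|^2\bigr)$ minimizes this sum.
Since we have a step edge and white noise, $|S(\omega)|^2 = A^2/\omega^2$, and $|N(\omega)|^2 = \sigma^2$.
The optimal filter is then given by
$$H(\omega) = \frac{\alpha^2}{\alpha^2 + \omega^2}, \qquad h(x) = \frac{\alpha}{2}\,e^{-\alpha |x|}, \qquad \alpha = \frac{A}{\sigma},$$
which is the double exponential function.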
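As a minimal sketch of how such an operator might be used, the following Python code samples the derivative of a double exponential on an integer grid and locates a step edge at the peak of the convolution response. The names `dode_kernel` and `detect_step`, the scale parameter `alpha`, and the truncation `half_width` are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def dode_kernel(alpha, half_width):
    """Sampled derivative of the double exponential (alpha/2) * exp(-alpha*|x|).

    The kernel is truncated to the integer grid x = -half_width .. half_width.
    """
    x = np.arange(-half_width, half_width + 1, dtype=float)
    return -0.5 * alpha**2 * np.sign(x) * np.exp(-alpha * np.abs(x))

def detect_step(signal, alpha, half_width):
    """Return (position, strength) of the strongest rising edge in a 1-D signal."""
    response = np.convolve(signal, dode_kernel(alpha, half_width), mode="same")
    peak = int(np.argmax(response))
    return peak, float(response[peak])

# Noise-free rising step between samples 99 and 100.
step = np.zeros(200)
step[100:] = 1.0
peak, strength = detect_step(step, alpha=0.5, half_width=20)
```

Because the step lies between two samples, the two neighboring positions give equal responses in the noise-free case; a falling edge would produce a negative response with this sign convention.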
The one-dimensional smoothing operator derived above can be extended to a two-dimensional operator for application to images:
The performance of the optimal step edge operator derived in the previous section is compared with the difference-of-boxes (DOB) operator and the DOG operator. An ideal step edge is corrupted by iid Gaussian noise and convolved with the three operators. The position giving the maximum response is chosen to be the edge location and the magnitude is stored as the edge strength (Figure 1(b)). The detection performance for a given input noise level and step magnitude, measured by the mean squared error of the peak response (normalized by its signal power), is given in Figure 2(a). The corresponding localization performance, measured by the mean squared error, is shown in Figure 2(b). Observe that the detection performance of the DODE operator is very slightly poorer than that of the DOG operator at the same operator width. This poses no significant problem, since extending the width of the operator always yields better performance. For localization, there is a significant performance difference in favor of the DODE operator, which cannot be overcome by extending the operator width. Interestingly, the combined performance is invariant for DOG when the localization error is small, but not for DODE, as previously mentioned. This is shown in Figure 2(c), where the product of the detection and localization errors is plotted. In practice, the operator must be truncated to finite support before it is used for either edge detection or shape detection.
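The experiment described above can be approximated with a small Monte Carlo harness. This is only a sketch under assumed settings: the function names, kernel scales, signal length, and trial count are hypothetical, and how to match the DODE and DOG widths fairly is left to the reader. It makes no claim to reproduce the figures.

```python
import numpy as np

def dode_kernel(alpha, half_width):
    """Sampled derivative of the double exponential."""
    x = np.arange(-half_width, half_width + 1, dtype=float)
    return -0.5 * alpha**2 * np.sign(x) * np.exp(-alpha * np.abs(x))

def dog_kernel(sigma_g, half_width):
    """Sampled derivative of a unit-area Gaussian with standard deviation sigma_g."""
    x = np.arange(-half_width, half_width + 1, dtype=float)
    g = np.exp(-0.5 * (x / sigma_g) ** 2) / (np.sqrt(2.0 * np.pi) * sigma_g)
    return -x / sigma_g**2 * g

def localization_mse(kernel, amplitude, noise_sigma, trials=200, length=200, seed=0):
    """Mean squared error of the peak position for a noisy rising step."""
    rng = np.random.default_rng(seed)
    edge = length // 2                      # step rises between edge-1 and edge
    half_width = len(kernel) // 2
    step = np.zeros(length)
    step[edge:] = amplitude
    sq_err = np.empty(trials)
    for t in range(trials):
        noisy = step + noise_sigma * rng.standard_normal(length)
        response = np.convolve(noisy, kernel, mode="same")
        # Search only the interior so zero-padding at the borders is ignored.
        interior = response[half_width:length - half_width]
        peak = half_width + int(np.argmax(interior))
        sq_err[t] = (peak - (edge - 0.5)) ** 2
    return float(np.mean(sq_err))

mse_dode = localization_mse(dode_kernel(0.5, 20), amplitude=1.0, noise_sigma=0.5)
mse_dog = localization_mse(dog_kernel(2.0, 20), amplitude=1.0, noise_sigma=0.5)
```

Sweeping `noise_sigma` and the kernel scales, and recording the peak magnitude as well as its position, would give detection and localization curves of the kind the text describes.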
The performance comparison of the DODE and DOG operators for the case of vehicle detection is shown in Figure 3. The DODE operator again shows similar overall detection performance (Figure 3(a)) and significantly better localization performance (Figure 3(b)).
We model the intensity change at an object boundary point as a step function, and assume that the boundary is a smooth, simply connected contour. The smoothness condition can be dropped to accommodate the case of objects with piecewise smooth boundaries. The following derivation of the operator function can be easily generalized to include such shapes as polygons.
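Let the object image be represented by the function $f(\mathbf{x})$, with $f(\mathbf{x}) = A$ for $\mathbf{x} \in R$ and $f(\mathbf{x}) = 0$ otherwise, where $R$ is a simply connected region representing the shape, and let the boundary $\partial R$ be parametrized by $s$.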
This assumption of uniformity of intensity is not critical in real applications, as long as there is an intensity difference on some portion of the shape boundary. We can find a level function
satisfying
We can construct
for implementation:
Let the operator function
be
If we define shape detection as the process of identifying the intensity changes along the shape boundary, we can claim that our scheme is optimal; convolving the ideal image with the shape operator is equivalent to computing intensity gradients after optimal smoothing.
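One way to picture the construction is the sketch below. It assumes that the level function is the signed distance to the boundary and that the operator is the derivative of the double exponential evaluated on that distance; both are assumptions made for illustration, as are the function names, the disk shape, and the truncation `band`. The response is computed as a circular cross-correlation via the FFT.

```python
import numpy as np

def dode(t, alpha):
    """Derivative of the double exponential, evaluated at signed distance t."""
    return -0.5 * alpha**2 * np.sign(t) * np.exp(-alpha * np.abs(t))

def disk_operator(size, radius, alpha, band):
    """Shape operator for a disk, centered in a size x size array.

    phi is the signed distance to the circle (negative inside); the operator
    is truncated to the band |phi| <= band around the boundary.
    """
    yy, xx = np.mgrid[0:size, 0:size] - size // 2
    phi = np.hypot(xx, yy) - radius
    op = dode(phi, alpha)
    op[np.abs(phi) > band] = 0.0
    return op

def shape_response(image, operator):
    """Circular cross-correlation of a square image with a same-sized operator."""
    kernel = np.fft.ifftshift(operator)     # move the operator center to (0, 0)
    spectrum = np.fft.fft2(image) * np.conj(np.fft.fft2(kernel))
    return np.real(np.fft.ifft2(spectrum))

def detect(image, operator):
    """Return ((row, col), response) at the maximum of the shape response."""
    response = shape_response(image, operator)
    row, col = np.unravel_index(np.argmax(response), response.shape)
    return (int(row), int(col)), float(response[row, col])

# Bright disk of radius 10 centered at row 40, column 70, on a dark background.
yy, xx = np.mgrid[0:128, 0:128]
image = (np.hypot(xx - 70, yy - 40) <= 10).astype(float)
center, strength = detect(image, disk_operator(128, radius=10, alpha=1.0, band=5))
```

With this sign convention a bright object on a dark background gives a positive peak; for objects of unknown polarity one could search the absolute response instead. Unknown scale or orientation would require repeating the search over a set of operators.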
The shape operator is put into a position in an image, and the responses are collected at the centroid of the operator. This operation is repeated for all possible positions, and the maximum response is chosen as indicating the presence of the specified shape. If we know the scale and the orientation of the object a priori, we simply take the convolution. Without such information, every possible orientation and scale value should be tried.
We have found that the response profile, with or without prior information about size or orientation parameters, is the same as the convolution response with the correctly matched operator, locally around the true centroid. We can use this local linearity property to theoretically predict detection and localization performances.
We are able to derive some of the statistical properties of the shape detection process - its detection probability and localization error, assuming additive Gaussian iid noise.
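We can compute the probability density function of the responses at points around the true centroid. Since this filtering is locally linear, the responses are correlated Gaussian, and we can get the covariance matrix using the convolution profile. Let $\mathbf{x}_1, \ldots, \mathbf{x}_n$ be points around the true centroid, including the centroid; $\mathbf{r} = (r_1, \ldots, r_n)^T$ be the corresponding responses; and $\Sigma$ be the covariance matrix. The ideal response profile $\boldsymbol{\mu}$ has been calculated above.
The pdf of $\mathbf{r}$ is given by
$$p(\mathbf{r}) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}} \exp\!\Bigl(-\tfrac{1}{2}\,(\mathbf{r} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{r} - \boldsymbol{\mu})\Bigr).$$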
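For a linear operator applied to additive iid Gaussian noise of variance sigma squared, the covariance between responses at two positions is sigma squared times the autocorrelation of the operator at their offset. The one-dimensional sketch below illustrates this and evaluates the Gaussian log-density; the function names, the 1-D setting, and the example parameters are assumptions for illustration, not the authors' derivation.

```python
import numpy as np

def dode_kernel(alpha, half_width):
    """Sampled derivative of the double exponential."""
    x = np.arange(-half_width, half_width + 1, dtype=float)
    return -0.5 * alpha**2 * np.sign(x) * np.exp(-alpha * np.abs(x))

def response_covariance(kernel, offsets, noise_sigma):
    """Covariance of filter responses at integer offsets under white noise.

    Cov[r_i, r_j] = sigma^2 * sum_x g(x - p_i) g(x - p_j), i.e. sigma^2 times
    the kernel autocorrelation at lag p_i - p_j.
    """
    offsets = np.asarray(offsets, dtype=int)
    autocorr = np.correlate(kernel, kernel, mode="full")
    zero_lag = len(kernel) - 1              # index of lag 0 in `autocorr`
    lags = np.subtract.outer(offsets, offsets)
    cov = np.zeros(lags.shape)
    valid = np.abs(lags) <= zero_lag        # larger lags do not overlap
    cov[valid] = autocorr[zero_lag + lags[valid]]
    return noise_sigma**2 * cov

def gaussian_logpdf(r, mean, cov):
    """Log-density of a multivariate Gaussian with the given mean and covariance."""
    diff = np.asarray(r, dtype=float) - np.asarray(mean, dtype=float)
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return float(-0.5 * (len(diff) * np.log(2.0 * np.pi) + logdet + quad))

kernel = dode_kernel(0.5, 20)
cov = response_covariance(kernel, offsets=np.arange(-2, 3), noise_sigma=0.1)
logp_at_mean = gaussian_logpdf(np.zeros(5), np.zeros(5), cov)
```

The same idea carries over to a two-dimensional shape operator by replacing the 1-D autocorrelation with the 2-D one.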
The output of vehicle detection, where the site information (parking lot orientation) is used, is shown in Figure 9.
Figure 10 shows detection results for face images from the FERET database. To limit the search space, the face center region is estimated using an ellipse-shaped operator, and is marked by a white dotted ellipse having the matched ellipse size. The face region detection is biased because we tried to fit simple ellipses to faces without a precise model. Iris and eyelid detections are marked by the corresponding shapes.
We tested our algorithm on a group photo image which is more cluttered and has lower resolution than the FERET images. In this experiment we also used ellipse detection for face detection. The output is shown in Figure 11.
Figure 12 shows eye detection results on an MPEG-7 dataset. In these images, the person wears glasses, and the image acquisition conditions are worse than in the previous images: the face is rotated or shaded. As mentioned previously, we observe that the detection is robust and accurate despite unfavorable camera angle and illumination. Since we use the operator for irises as well as the operator for eyelids, the glasses do not give rise to false detections.
Once we have found the approximate geometric shape, a more accurate outline of an object can be computed by parametrizing the contour and searching for the maximum response. For example, we constructed a family of curves using a Fourier basis, computed the corresponding shape operators, and searched for the best set of coefficients by employing simulated annealing. Figure 13 shows a structured shape, its outline approximated by straight lines, and the computed contours.
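A rough sketch of this kind of refinement is given below. It assumes a radial Fourier-series contour around a known center and, as a stand-in for the shape operator response, scores a contour by the mean intensity difference just inside and just outside it. The parametrization, the score, the cooling schedule, and all names are hypothetical choices for illustration, not the authors' method.

```python
import numpy as np

def contour_radius(theta, coeffs):
    """r(theta) = c0 + sum_k (a_k cos(k theta) + b_k sin(k theta)).

    coeffs is [c0, a_1, b_1, a_2, b_2, ...] and must have odd length.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    r = np.full_like(theta, coeffs[0])
    for k, (a, b) in enumerate(coeffs[1:].reshape(-1, 2), start=1):
        r += a * np.cos(k * theta) + b * np.sin(k * theta)
    return r

def boundary_contrast(image, center, coeffs, n_samples=180, delta=1.5):
    """Mean of (intensity just inside) - (intensity just outside) along the contour."""
    theta = np.linspace(0.0, 2.0 * np.pi, n_samples, endpoint=False)
    r = contour_radius(theta, coeffs)

    def sample(radius):
        rows = np.rint(center[0] + radius * np.sin(theta)).astype(int)
        cols = np.rint(center[1] + radius * np.cos(theta)).astype(int)
        rows = np.clip(rows, 0, image.shape[0] - 1)
        cols = np.clip(cols, 0, image.shape[1] - 1)
        return image[rows, cols]

    return float(np.mean(sample(r - delta) - sample(r + delta)))

def anneal(image, center, init, steps=500, step_size=0.5, t0=0.05, seed=0):
    """Simulated annealing (Metropolis acceptance, linear cooling) over coefficients."""
    rng = np.random.default_rng(seed)
    current = np.array(init, dtype=float)
    cur_score = boundary_contrast(image, center, current)
    best, best_score = current.copy(), cur_score
    for i in range(steps):
        temp = t0 * (1.0 - i / steps) + 1e-9
        cand = current + step_size * rng.standard_normal(current.size)
        score = boundary_contrast(image, center, cand)
        # Always accept improvements; accept worse candidates with Metropolis probability.
        if score > cur_score or rng.random() < np.exp((score - cur_score) / temp):
            current, cur_score = cand, score
            if score > best_score:
                best, best_score = cand.copy(), score
    return best, best_score

# Synthetic bright ellipse (semi-axes 14 and 9) centered in a 64 x 64 image.
yy, xx = np.mgrid[0:64, 0:64]
image = (((xx - 32) / 14.0) ** 2 + ((yy - 32) / 9.0) ** 2 <= 1.0).astype(float)
center = (32, 32)
init = [8.0, 0.0, 0.0, 0.0, 0.0]            # a circle of radius 8 as the starting guess
init_score = boundary_contrast(image, center, init)
best, best_score = anneal(image, center, init)
```

In a fuller treatment the score would be replaced by the response of the shape operator built for each candidate contour, which is more expensive but uses the whole boundary neighborhood rather than two sampled rings.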
The translation was initiated by Hankyu Moon on 2001-02-21