Optimal Edge-Based Shape Detection


Summary

Identifying two-dimensional shapes has been a classical problem in computer vision. Roads and vehicles in aerial images are examples of such shapes which are well approximated by straight lines and rectangles. Human facial features such as eyes and mouth can also be regarded as simple shapes that vary only in small ranges.

In most applications, it is difficult to model the intensity values of objects and their background. Therefore, it is reasonable to exploit the intensity differential along the object's boundary. The intensity change along the boundary is usually modeled as a step edge. Once the pixels with high intensity gradient are chosen, these pixels should be examined to check if they lie on an expected shape boundary.

Edge-based shape detection methods all suffer from the same problem: loss of information at the edge detection stage and the difficulty of statistical performance analysis. If we examine whether the given shape is present at a position, by computing the intensity changes around the hypothetical shape contour at the same time, we only decide once whether the response is strong enough.

We define the shape matched operator by extending the optimal step edge operator along the hypothetical object boundary contour so that responses from intensity differences along the boundary are collected simultaneously. We can prove that this filtering is in fact equivalent to collecting the gradient along the shape boundary. Moreover, since the responses are averaged over the neighborhood of the contour, our method is more robust than merely summing up the gradient magnitudes. Errors in shape position and response estimates averaged over boundary pixels are reduced proportional to the size of the operator. In our filtering approach, edges with weak responses can contribute to the shape response, so that an object having very low contrast on all or some portion of its boundary can be correctly detected. By shifting the sampling of operator values, we can estimate the positions of objects having very small numbers of pixels, with sub-pixel accuracy.

In an effort to ensure an accurate shape detection scheme, we set up a criterion for an optimal step edge operator and derive a closed-form solution, which is the derivative of the double exponential (DODE) function. We have verified that the DODE operator gives better performance than the derivative of Gaussian (DOG) operator, which is widely used for edge detection. The simplicity of this approach facilitates statistical analysis based on simple assumptions: We have found a way to compute the response profile of such an operator, using the local linearity of the response. We are also able to formulate statistical properties of this detection procedure under the Gaussian white noise assumption, and thus predict its detection and localization performance.


Optimal edge operator

Edge detection is essentially finding a high intensity gradient. Ideally, the differential operator should be the optimal edge operator. In the presence of noise, we need to suppress high frequency intensity structure while preserving the global step edge structure. Therefore, the derivative of the optimal smoothing operator should be the optimal edge operator by the following simple relation between differentiation and convolution:

\begin{displaymath}(h*f)'=h'*f \end{displaymath}

Gaussian convolution has been the most popular scheme for smoothing, and this agrees with the similarity between the derivative of Gaussian and the optimal Canny operator.

In this work, we derive a one-dimensional smoothing operator for a step function using a different criterion: minimizing the sum of the noise power and the mean squared error between input and output. Since this operator suppresses noise while preserving the step shape in an optimal way, the derivative of the response function is less noisy and close to an impulse function, thus achieving very accurate detection and localization of the step edge.


Optimal smoothing operator

Let the step edge with amplitude $\alpha$ be

\begin{displaymath}X(t) = \lbrace{ \begin{array}{cc} 0, \;\;\; x < 0\\
\alpha, \;\;\; x\geq 0  \end{array} } \end{displaymath}

which is corrupted by noise $N(t)$ with spectral density $S_N(t) = \mu \cdot \delta(t)$:


\begin{displaymath}\tilde{X}(t) = X(t) + N(t) \nonumber \end{displaymath}

We want to find an optimal smoothing filter $h$ that minimizes the squared sum $\overline{E^2} + \overline{M^2}$, where $\overline{E^2}$ is the mean squared difference between the input and output signals, and $\overline{M^2}$ is the mean squared sum of the output noise response.

\begin{displaymath}\tilde{Y}(t) = h*\tilde{X}(t) = h*X(t) + h*N(t) \end{displaymath}

Let $Y(t)=h*X(t)$, $M(t)=h*N(t)$ and $E(t) = X(t)-Y(t)$.

The mean squared errors, in terms of their frequency domain representations, are

$\displaystyle \overline{E^2}$ $\textstyle =$ $\displaystyle \int \vert F_E(\omega)\vert^2 d\omega = \int F_E(\omega)F_E(-\omega) d\omega \nonumber$  
  $\textstyle =$ $\displaystyle \int S_X(\omega)(1-H(\omega))(1-H(-\omega)) d\omega$  


$\displaystyle \overline{M^2}$ $\textstyle =$ $\displaystyle \int \vert F_M(\omega)\vert^2 d\omega = \int F_M(\omega)F_M(-\omega) d\omega \nonumber$  
  $\textstyle =$ $\displaystyle \int S_N(\omega)H(\omega)H(-\omega) d\omega$  

where $S_X(\omega)$ and $S_N(\omega)$ are the spectral densities of $X$ and $N$.

After simple algebraic operations, it can be proved that the familiar Wiener filter

\begin{displaymath}H(\omega) = \frac{S_X(\omega)}{S_X(\omega)+S_N(\omega)} \end{displaymath}

minimizes our criterion function.

Since we have a step edge and white noise, $S_X(\omega) = (\alpha^2/4)\delta(\omega) + (\alpha^2)/(4\pi^2\omega^2)$, and $S_N(\omega) = \mu^2$.

The optimal filter is then given by

$\displaystyle H(\omega)$ $\textstyle =$ $\displaystyle \frac{\frac{\alpha^2}{4}\delta(\omega) + \frac{\alpha^2}{4\pi^2\o...
... + \frac{\alpha^2}{4\pi^2\omega^2} + \mu^2 } = \frac{d^2}{d^2 + 4\pi^2\omega^2}$  

where $d=(\alpha/\mu)$; therefore

\begin{displaymath}h(t) = \frac{d}{2}\exp(-d\vert t\vert) \end{displaymath}

The one-dimensional smoothing operator derived above can be extended to a two-dimensional operator for application to images:

\begin{displaymath}\overline{h}(x,y) = h((x^2+y^2)^\frac{1}{2}) \end{displaymath}

The optimal edge operator is merely the (piecewise) derivative $h'$ of the the smoothing operator, by the following relation:
$\displaystyle (h*I)' = h'*I$     (1)

The optimal step edge operator is shown in Figure 1(a). It is worthwhile to note that we can compute the optimal smoothing operator as long as we have models for edge profile and noise.

Figure 1: The optimal edge operator and its response to a noisy step edge
\begin{figure}\center{
\leavevmode
\epsfxsize = 10.5cm
\epsfbox{dexp_and_response.eps}}
\end{figure}


Experiments

The performance of the optimal step edge operator derived in the previous section is compared with the difference-of-boxes (DOB) operator and the DOG operator. An ideal step edge is corrupted by iid Gaussian noise and convolved with the three operators. The position giving the maximum response is chosen to be the edge location and the magnitude is stored as the edge strength (Figure 1(b)). The detection performance, for input noise level $\mu=4$ and step magnitude $\alpha=10$, measured by the mean squared error of the peak response (normalized by its signal power), is given in Figure 2(a). The corresponding localization performance measured by the mean squared error is shown in Figure 2(b). Observe that the detection performance of the DODE operator is very slightly poorer than that of the DOG operator, for the same scale of the width. This poses no significant problem, since extending the width of the operator always yields better performance. For localization, there is a significant performance difference in favor of the DODE operator, which cannot be overcome by extending the operator width. It is interesting to observe that the invariance of the combined performance exists for DOG when the localization error is small, but not with DODE, as previously mentioned. This is shown in Figure 2(c), where the product of the detection and localization errors is plotted. For implementation, the operator should be truncated to be used for either edge detection or shape detection.

Figure 2: Edge detection performance of DOB, DOG, and DODE operators
\begin{figure}\center{
\leavevmode
\epsfxsize = 16.5cm
\epsfbox{edgeperf.eps}}
\end{figure}

The performance comparison of DODE and DOG operators for the case of vehicle detection is shown in Figure 3. Use of the DODE operator again shows overall similar detection performance(Figure 3(a)) and significantly better localization performance (Figure 3(b)).

Figure 3: Vehicle detection performance of DOG and DODE operators
\begin{figure}\center{
\leavevmode
\epsfxsize = 11.0cm
\epsfbox{vd_operator_perf.eps}}
\end{figure}



Operator for arbitrary shapes

We model the intensity change at an object boundary point as a step function, and assume that the boundary is a smooth, simply connected contour. The smoothness condition can be dropped to accommodate the case of objects with piecewise smooth boundaries. The following derivation of the operator function can be easily generalized to include such shapes as polygons.

Let the object image be represented by the function $ I(\textbf{x})= \alpha \cdot 1_D(\textbf{x})$, where $D$ is a simply connected region representing the shape and let the boundary $C=\partial D$ be parametrized by $\textbf{c}(t), t \in J$. This assumption of uniformity of intensity is not critical in real applications, as long as there is an intensity difference on some portion of the shape boundary. We can find a level function $l$ satisfying

  $\textstyle \lbrace\textbf{x}: l(\textbf{x})=k \rbrace = C, \;\; \rm {for \;\; some} \;\; \it {k}>0$    
  $\textstyle \tilde{l}(s) = l(\textbf{c}(t)+s \cdot \bigtriangledown l(\textbf{c}(t))) = s$   (2)

The second equation holds where $l$ is well-defined.

We can construct $l$ for implementation:

\begin{displaymath}l(\textbf{x})=\left\{ \begin{array}{ll}
\min_{\textbf{z}\in ...
...rallel & \rm {for} \;\; \textbf{x} \in D^c \end{array} \right. \end{displaymath}

$l$ is well-defined and continuous, and locally smooth around the boundary contour. This choice of level function facilitates the computation and implementation of the operator for any arbitrary shape. The operator employing this choice of level function is shown in Figure 5. We can even accommodate open contours by replacing [inside:outside] by [one side:the other side] in the above definition.

Figure 4: An arbitrary shape and a matched operator
\begin{figure}\center{
\leavevmode
\epsfxsize = 12.5cm
\epsfbox{shape2.eps}}
\end{figure}

Let the operator function $f(\textbf{x})$ be

\begin{displaymath}f(\textbf{x})=h'(l(\textbf{x}),\sigma) \end{displaymath}

where $h(t)=h(t,\sigma)=\frac{1}{\sigma}\exp(-\vert t\vert/\sigma)$, the optimal one-dimensional smoothing operator. Here, the derivative is not defined at the origin. Note that the cross-section of $f(\textbf{x})$ along the normal direction at a point on the contour, i.e., the gradient direction for $l$, is $h'$, since the level function $l$ is constructed (2) so that it is the identity function along the cross-section. This operator is in fact a natural two-dimensional extension of edge detection.

If we define shape detection as the process of identifying the intensity changes along the shape boundary, we can claim that our scheme is optimal; convolving the ideal image with the shape operator is equivalent to computing intensity gradients after optimal smoothing.


Response Profile

The shape operator is put into a position in an image, and the responses are collected at the centroid of the operator. This operation is repeated for all possible positions, and the maximum response is chosen as indicating the presence of the specified shape. If we know the scale and the orientation of the object a priori, we simply take the convolution. Without such information, every possible orientation and scale value should be tried.

We have found that the response profile, with or without prior information about size or orientation parameters, is the same as the convolution response with the correctly matched operator, locally around the true centroid. We can use this local linearity property to theoretically predict detection and localization performances.



Statistical properties

We are able to derive some of the statistical properties of the shape detection process - its detection probability and localization error, assuming additive Gaussian iid noise.

We can compute the probability density function of the responses at points around the true centroid. Since this filtering is locally linear, the responses are correlated Gaussian, and we can get the covariance matrix using the convolution profile. Let $\underline{\textbf{x}}=(\textbf{x}_{1}, \textbf{x}_{2},
... , \textbf{x}_{N})$ be points around the true centroid, including the centroid; $\underline{y}=(y_{1}, y_{2}, ... , y_{N})$ be the corresponding responses; and $\Sigma$ be the covariance matrix. The ideal response profile $\underline{r}(\textbf{x}))=(r(\textbf{x}_{1}), r(\textbf{x}_{2}), ... , r(\textbf{x}_{N}))$ has been calculated above.

The pdf of $\underline{y}$ is given by

\begin{displaymath}p_Y(\underline{y})=\frac{1}{(2\pi)^{\frac{N}{2}}det\Sigma}\ex...
...{x}))^{T}\Sigma^{-1}(\underline{y}-\underline{r}(\textbf{x}))) \end{displaymath}

Note that the maximum response which is compared to the threshold is the largest order statistic $y_{(N)}$ from $y_{1}, y_{2}, ... ,y_{N}$, and the corresponding position $\textbf{x}_{i}$ is marked as the centroid. Therefore, the pdf of the maximum response is given by

\begin{displaymath}p_{max}(y) =\int \cdots \int^{y} N! p(u_{1},\cdots,u_{N-1},y) du_{N-1} \cdots du_{1} \end{displaymath}

The probability that this maximum occurs at position $x$, which gives the localization distribution, is given by
$\displaystyle p_{cent}(x_{i})$ $\textstyle =$ $\displaystyle Prob(\rm {maximum \;\; occurs \;\; at} \;\; x_{i})$  
  $\textstyle =$ $\displaystyle Prob(y_{1} \leq y_{i}, \cdots ,y_{N} \leq y_{i})$  
  $\textstyle =$ $\displaystyle \int \int^{y_{i}} \cdots \int^{y_{i}} p(y_{1},\cdots ,y_{N}) dy_{N} \cdots dy_{1} dy_{i}$  

We conducted experiments to verify the derived theoretical performance in the case of vehicle detection. First, the probability distribution $p_{\rm {centroid}}$ of the position estimate is computed by a randomized numerical integration algorithm according to the above formulation. It is compared with empirical distributions obtained by the following experiments.


Table I: Comparison of one-dimensional theoretical (top rows) and empirical (bottom rows) distributions

$\sigma$ -2 -1 0 +1 +2
12 0.00001 0.04416 0.91166 0.04416 0.00001
  0.00001 0.04329 0.91246 0.04424 0.00000
16 0.00051 0.09781 0.80336 0.09781 0.00051
  0.00047 0.09732 0.81841 0.09803 0.00044
20 0.00330 0.14343 0.70654 0.14343 0.00330
  0.00292 0.14337 0.70645 0.14242 0.00325


Figure 6: Two-dimensional theoretical and empirical distributions of the centroid estimate
\begin{figure}\center{
\leavevmode
\epsfxsize =13.5cm
\epsfbox{theo_emp_dist.eps}}
\end{figure}



Applications


Vehicle detection

Following figures shows the outputs of the vehicle detector, which gives the most probable vehicle dimensions as well as the locations of the centroids.

Figure 7: Parking lot image
\begin{figure}\center{
\leavevmode
\epsfxsize = 13cm
\epsfbox{pkg6_left.ps}}
\end{figure}

Figure 8: Detected vehicles
\begin{figure}\center{
\leavevmode
\epsfxsize = 13cm
\epsfbox{pkg6_left.thres35.vcent.ps}}
\end{figure}

The output of vehicle detection where the site information (parking lot orientation) is used, is shown in Figure 9.

Figure 9: Parking lot image and detected vehicles
\begin{figure}\center{
\leavevmode
\epsfxsize = 17cm
\epsfbox{fhn711_20.prior.eps}}
\end{figure}



human face feature detection

We designed a small set of operators for eyes and mouths using simple geoemtric curves, and applied them successfully to the facial feature detection problem.

Figure 10 shows detection results for face images from the FERET database. To limit the search space, the face center region is estimated using an ellipse-shaped operator, and is marked by a white dotted ellipse having the matched ellipse size. The face region detection is biased because we tried to fit simple ellipses to faces without a precise model. Iris and eyelid detections are marked by the corresponding shapes.

Figure 10: Faces with detected eyes and mouths
\begin{figure}\vspace{-0.5in}
\center{
\leavevmode
\epsfxsize = 16.0cm
\epsfbox{face4x4.eps}}
\end{figure}

We tested our algorithm on a group photo image which is more cluttered and has lower resolution than the FERET images. In this experiment we also used ellipse detection for face detection. The output is shown in Figure 11.

Figure 11: Face detection and facial feature detection for a group photo
\begin{figure}\center{
\leavevmode
\epsfxsize = 17.0cm
\epsfbox{voyager2.eps}}
\end{figure}

Figure 12 shows eye detection results on an MPEG-7 dataset. In these images, the person wears glasses, and the image acquisition conditions are worse than in the previous images: the face is rotated or shaded. As mentioned previously, we observe that the detection is robust and accurate despite unfavorable camera angle and illumination. Since we use the operator for irises as well as the operator for eyelids, the glasses don't give rise to false detections.

Figure 12: Eye detection under unfavorable imaging conditions
\begin{figure}\center{
\leavevmode
\epsfxsize = 16.5cm
\epsfbox{km_4.eps}}
\end{figure}

Once we have found the approximate geometric shape, more accurate outline of an object can be computed by parametrizing the contour and search for the maximum. For example, we constructed a family of curves using fourier basis, computed the corresponding shape operators, and searched for the best set of coefficients by employing simulated annealing. Figure 13 shows a structured shape, approximated outlines by straight lines, and the computed contours.

Figure 13: An arbitrary shape and fitted outline
\begin{figure}\center{
\leavevmode
\epsfxsize = 17.0cm
\epsfbox{shapefitting.eps}}
\end{figure}


Acknowledgments

We thank the Heinrich Hertz Institute of Germany for providing the MPEG-7 content set S4.


About this document ...

Optimal Edge-Based Shape Detection

This document was generated using the LaTeX2HTML translator Version 2K.1beta (1.48)

Copyright © 1993, 1994, 1995, 1996, Nikos Drakos, Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999, Ross Moore, Mathematics Department, Macquarie University, Sydney.

The command line arguments were:
latex2html -html_version 3.2[math] -split 0 -nonavigation -antialias shape_html.tex

The translation was initiated by Hankyu Moon on 2001-02-21


Hankyu Moon 2001-02-21