1. Introduction

Multisensor image fusion is the process of combining two or more images of a scene to create a single image that is more informative than any of the input images.1 Image-fusion technology is employed in numerous applications, including visual interpretation, image drawing, geographical information gathering, and military target reconnaissance and surveillance. In particular, research into techniques for image fusion by contrast reversal in local image regions has important theoretical and practical significance.1 Image-fusion methods are classified as spatial- or transform-domain techniques. Spatial-domain methods are simple but generally result in images with insufficient detail. Transform-domain strategies based on image-fusion arithmetic and wavelet transforms (WTs) represent the current state of the art. Wavelets can be used to resolve an original image into a series of subimages with different spatial resolutions and frequency-domain characteristics. This representation fully reflects local variations in the original image. In addition, WTs can effect multiresolution analysis,2,3 perfect reconstruction, and orthogonality.4 Image-fusion arithmetic based on WT coefficients can flexibly resolve multidimensional low-frequency and high-frequency image components. Wavelet transforms can also realize multisensor image fusion using rules that emphasize critical features of the scene.5,6 Traditional convolution-based WT methods for multiresolution analysis have been widely applied to image fusion for images with a large number of pixels, but the memory and computational requirements of these techniques, and of their Fourier-domain equivalents, can be substantial. Attempts to create more efficient algorithms in the transform domain have employed the lifting wavelet transform (LWT).7–9 Also known as the second-generation WT,10 the LWT does not depend on the Fourier transform. Rather, all operations are carried out in the spatial domain.
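As a concrete illustration of spatial-domain lifting, the sketch below implements one level of the simple Haar lifting step (split, predict, update) and its inverse. This is our own minimal example, not the paper's implementation; the spline lifting wavelet used later follows the same pattern with different predict and update filters.

```python
import numpy as np

def lwt_haar(x):
    """One level of the Haar lifting wavelet transform (LWT).

    All operations stay in the spatial domain: split into even/odd
    samples, predict the odd samples, then update the even samples.
    """
    even, odd = x[0::2].astype(float), x[1::2].astype(float)
    d = odd - even          # predict step: detail (high-frequency) coefficients
    s = even + d / 2        # update step: approximation (low-frequency) coefficients
    return s, d

def ilwt_haar(s, d):
    """Inverse LWT: reverse the order and the signs of the lifting steps."""
    even = s - d / 2
    odd = d + even
    x = np.empty(even.size + odd.size)
    x[0::2], x[1::2] = even, odd
    return x

x = np.array([4, 6, 10, 12, 8, 8, 2, 0])
s, d = lwt_haar(x)
assert np.allclose(ilwt_haar(s, d), x)   # perfect reconstruction
```

Because inversion only reverses the calculation order and signs, no Fourier-domain machinery is needed, which is the source of the LWT's memory and speed advantage.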
Image reconstruction is achieved by simply reversing the calculation order and signs of the decomposition process,11 thereby reducing two-dimensional image data computation by half and data storage to about 75%. One important motivation for the use of WTs in image processing is their ability to segregate the low-frequency content that is critical for interpretation. Traditional image-fusion methods are based on selecting these significant wavelet decomposition coefficients.12–14 Even with the effective separation and processing of low-frequency components afforded by WT decomposition, such an approach fails to take full account of the relationships among multiple input images. The result can be adverse fusion effects. Significant information can be lost when the local-area variance corresponding to pixels across images is small.8,9 Other algorithms use principal component analysis (PCA) to estimate the wavelet coefficients. This method works well in low-noise environments, but PCA breaks down when corruption is severe, even if only very few of the observations are affected.15 For example, consider the two PCA simulation results shown in Fig. 1. Suppose that the light line in Fig. 1(a) represents an object in an image, and that the scattered markers represent samples of that object that have been corrupted by low-level Gaussian noise. The reconstruction of the object from the samples using the classical PCA approach is shown as a heavy line. The results of a similar experiment are shown in Fig. 1(b), where the PCA reconstruction is seriously in error as the result of a single noise outlier in the sampling process. To remedy shortcomings in the current methods, this paper presents an improved image-fusion algorithm based on the LWT. For the low-frequency image components represented in the LWT decomposition, scale coefficients are determined through matrix completion16 instead of PCA.
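The failure mode illustrated in Fig. 1 is easy to reproduce numerically. The hedged sketch below (our own construction, not the paper's simulation) estimates the principal direction of noisy samples of a line by classical PCA; a single gross outlier is enough to drive the estimate far from the true direction.

```python
import numpy as np

rng = np.random.default_rng(0)

def pca_direction(pts):
    """First principal component (unit vector) of a 2-D point cloud."""
    centered = pts - pts.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

# samples of the line y = x, corrupted by low-level Gaussian noise
t = np.linspace(0, 1, 50)
pts = np.column_stack([t, t]) + 0.01 * rng.standard_normal((50, 2))
true_dir = np.array([1.0, 1.0]) / np.sqrt(2)

clean_err = 1 - abs(pca_direction(pts) @ true_dir)

# add a single gross outlier far off the line
pts_bad = np.vstack([pts, [0.5, 50.0]])
outlier_err = 1 - abs(pca_direction(pts_bad) @ true_dir)

assert clean_err < 1e-2    # low-level noise: PCA reconstruction is accurate
assert outlier_err > 0.1   # one outlier: the estimate is seriously in error
```

The second assertion mirrors Fig. 1(b): least-squares subspace fitting has no mechanism for rejecting even one grossly corrupted observation, which is the motivation for the robust methods of Sec. 2.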
For the high-frequency detail and edge information, the LWT coefficients are chosen through self-adaptive regional variance estimation.

2. Matrix Completion and Robust Principal Component Analysis

2.1 Overview

The matrix completion problem has been the subject of intense research in recent years. Candès et al.17 verify that the ℓ1-norm optimization problem is equivalent to the ℓ0-norm optimization problem under a restricted isometry property. Candès and Recht16 demonstrate exact matrix completion using convex optimization. The nuclear norm ‖A‖* = Σ_k σ_k(A) of a matrix A, in which σ_k(A) denotes the k'th largest singular value, can be used to approximate the matrix rank, rank(A). The method yields a convex minimization problem for which there are numerous efficient solutions. Candès and Recht16 prove that if the number m of sampled entries of an n × n rank-r matrix obeys m ≥ C n^1.2 r log n for some positive constant C, then the matrix can be perfectly recovered with high probability by solving a simple convex optimization problem.

Lin and Ma15 report a fast, scalable algorithm for solving the robust PCA (RPCA) problem. The method is based on recovering a low-rank matrix with an unknown fraction of corrupted entries. The mathematical model for estimating the low-dimensional subspace is to find a low-rank matrix A. The algorithm proceeds as follows: given an observation matrix D with rank r ≪ min(m, n), the rank r is the target dimension of the subspace. The observation matrix is modeled as D = P_Ω(A + E), in which P_Ω is a subsampling projection operator and E represents a matrix of unmodeled perturbations that is assumed sparse relative to A.

2.2 Matrix Completion

The objective of matrix completion is to recover, in the low-dimensional subspace, the truly low-rank matrix A from D, under the working assumption that E is zero. That is, we seek

min ‖A‖*  subject to  P_Ω(A) = P_Ω(D).

It has been shown that the solution to this convex relaxation represents an exact recovery of the matrix under quite general conditions.16 Further, the recovery is robust to noise with small magnitude bounds; that is, when the elements of E are small and bounded.
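The completion step can be illustrated with a toy example. For brevity, the sketch below uses a hard rank-projection heuristic (alternating between a truncated SVD and re-imposing the observed entries) rather than the nuclear-norm convex program of the text; the function name and parameters are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)

def complete_low_rank(D, mask, r=1, iters=300):
    """Fill in the unobserved entries of D (mask == True marks observed
    entries) by alternating a rank-r SVD projection with restoration
    of the known samples. A heuristic stand-in for nuclear-norm
    minimization, adequate for easy low-rank problems."""
    X = np.where(mask, D, 0.0)
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X = (U[:, :r] * s[:r]) @ Vt[:r]   # project onto rank-r matrices
        X[mask] = D[mask]                 # re-impose the observed entries
    return X

# rank-1 ground truth, half of the entries observed uniformly at random
M = np.outer(rng.standard_normal(20), rng.standard_normal(20))
mask = rng.random(M.shape) < 0.5
X = complete_low_rank(M, mask)
assert np.linalg.norm(X - M) / np.linalg.norm(M) < 0.1   # near-exact recovery
```

Even with half the entries missing, the low-rank structure pins down the remainder, which is exactly the property the fusion algorithm exploits for missing or broken pixels.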
For example, if E is a white noise matrix whose Frobenius norm is bounded by a small δ, then the recovered matrix will lie in a small neighborhood of the true A with high probability.18

2.3 Robust Principal Component Analysis

Conventional PCA is often used to estimate a low-dimensional subspace via a constrained optimization problem: in the observation model of Eq. (5), minimize the difference between the matrices D and A by solving

min ‖D − A‖_F  subject to  rank(A) ≤ r,

where r is the target dimension of the subspace, and the use of the Frobenius norm represents an assumption that the matrix elements are corrupted by additive i.i.d. Gaussian noise. PCA works well in practice as long as the magnitude of the noise is small. To use PCA, the singular value decomposition (SVD) of D is used to project the columns of D onto the subspace spanned by the r principal left singular vectors of D.

RPCA employs an identity sampling operator and a sparse matrix E that differ from those in the matrix completion and PCA approaches. Wright et al.19 and Candès et al.20 have shown that, for a sufficiently sparse error matrix, a low-rank matrix A can be recovered exactly from the observation matrix D = A + E by solving the following convex optimization problem:

min ‖A‖* + λ‖E‖_1  subject to  D = A + E,

where λ is a positive weighting parameter. RPCA has been used for background modeling, removing shadows from face images, alignment of the human face, and video denoising.21,22

In the present paper, RPCA is coupled with the inexact augmented Lagrange multiplier (IALM)15 method to determine the low-frequency LWT coefficients for fusion of corrupted images. The IALM method is described in Sec. 3.2 after the general procedure is introduced.

3. Frequency-Domain Fusion Rules

3.1 Overview

By adopting separate fusion strategies for the high- and low-frequency components, the WT can differentially preserve the critical features carried in these separate bands. The procedure that exploits this property is shown in Fig. 2. The source images are converted to frequency-domain coefficients by the LWT.
Frequency-band-dependent fusion rules are applied to the low- and high-frequency components of each image. The inverse lifting wavelet transform (ILWT) is used to reconstruct the fused image.

3.2 Low-Frequency Fusion Based on the Inexact Augmented Lagrange Multiplier

Weighted average coefficients are often employed to fuse low-frequency wavelet coefficients. This method is effective when the coefficients of the fused images are similar. However, when contrast reversal occurs in local regions of an image, this procedure results in a loss of image detail in the fused image due to reduced contrast. Further, erroneous or missing regions of corrupted images strongly affect PCA results. These inadequacies of the weighted average method and of PCA motivate the use of RPCA to determine the weighting of the low-frequency coefficients. There is ordinarily little difference in the low-frequency coefficient values extracted by the LWT from different images of the same scene. RPCA coefficients are used to represent the low-frequency content in an attempt to preserve fidelity and coherency between the subbands. Algorithms have been developed in this research to solve the RPCA problem that is the basis for the recovery of the low-rank matrix A and the estimation of the sparse matrix E from the observation matrix D. We employ the IALM method to compute the low-frequency subband coefficients. The method is sketched as follows. Let a set of corrupted images of the same scene, acquired from multiple sensors, be given, and let the corresponding low-frequency subimages be computed using a multilevel LWT. For simplicity, we assume square subimages. Stack all columns of each low-frequency subimage into a single vector, then use these vectors as the columns of an observation matrix D. After normalizing the data, the cumulative low-frequency subimage matrix is modeled similarly to Eq. (3), in which A denotes the noise-free and integrated low-frequency subimage sequence matrix, and E denotes the sparse error matrix from which high-frequency content has been attenuated by the selection of LWT coefficients. The low-frequency LWT coefficients are similar across multiple subimages of the same scene. According to the model, A is noise-free and will ideally, therefore, consist of identical columns. Accordingly, A will be of low rank as required by the matrix completion procedure. Thus, A can be estimated via matrix completion and RPCA by solving the RPCA program, for which the augmented Lagrangian is

L(A, E, Y, μ) = ‖A‖* + λ‖E‖_1 + tr[Yᵀ(D − A − E)] + (μ/2)‖D − A − E‖_F².

In this equation, λ is an estimated positive weighting parameter representing the proportion of the sparse matrix E in the low-rank matrix A; a common default is λ = 1/√max(m, n) for an m × n observation matrix. μ is a positive tuning parameter balancing accuracy and computational effort, tr[Yᵀ(D − A − E)] is the trace of the indicated matrix product, and Y is the iterated Lagrange multiplier.

A flowchart of the IALM algorithm is shown in Fig. 3. Definitions of the notation used in the flowchart appear in Table 1. The algorithm is recursive, with a superscript indicating the iteration number. The recovered low-rank matrix is taken at a sufficiently large iteration number. A reasonable strategy for transforming the resulting A to the final low-frequency subimage is to unwrap its first column to form the original image structure.

Table 1. Notation used in the IALM algorithm.
In this process, the multiplier matrix Y is initialized from the normalized observation matrix D; the sparse matrix E is initialized to the zero matrix of the same size as D; μ is given a small positive initial value that depends on the column size of D; the tolerance for the stopping criterion is fixed at a small value; and the iteration counter is set to zero for the loop computation.

3.3 High-Frequency Fusion Based on Self-Adapting Regional Variance Estimation

Processing of the high-frequency wavelet coefficients has a direct effect on the salient details that determine the overall clarity of the image. As the variance of a subimage characterizes the degree of gray-level change in the corresponding image region, the variance is a key indicator in the processing of high-frequency components. In addition, there is generally a strong correlation among adjacent pixels in a local area, so that a significant amount of information is shared among neighboring pixels. When the variances in corresponding local regions across subimages differ widely, a high-frequency fusion rule that selects the source image of greatest variance has been shown to be effective at preserving image features.8,9 However, if the local variances of two source images are similar, this method can result in the loss of information by discarding subtle variations among the different subimages. An empirical procedure has been developed in which a thresholding procedure is used to segregate local areas that have sufficiently large variance differences. This allows such an area to be represented by the single maximum-variance set member. The selection of this difference threshold is discussed below. Let us return to the original set of source images, and consider the gray-scale value at each pixel of each image. With each image, associate a matrix whose elements contain the normalized sample variance of the window of pixels centered on the corresponding pixel; normalization places all variance values in the interval [0, 1]. Without loss of generality, we select two of the images with which to describe the steps of the high-frequency fusion algorithm.
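The regional-variance rule described above can be sketched as follows. This is our own illustration of the two branches (maximum-variance selection when local variances differ strongly, variance-weighted averaging otherwise); the window size and threshold values are assumptions, not the paper's tuned settings.

```python
import numpy as np

def local_variance(img, w=3):
    """Normalized sample variance over a w-by-w window centered at each
    pixel; values are scaled into the interval [0, 1]."""
    pad = w // 2
    p = np.pad(img.astype(float), pad, mode='reflect')
    v = np.empty(img.shape, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            v[i, j] = p[i:i + w, j:j + w].var()
    vmax = v.max()
    return v / vmax if vmax > 0 else v

def fuse_high(h1, h2, T=0.1):
    """Self-adapting regional variance rule for two high-frequency
    subbands: pick the larger-variance source where local variances
    differ by more than T; otherwise take a variance-weighted average."""
    v1, v2 = local_variance(h1), local_variance(h2)
    pick = np.abs(v1 - v2) > T                  # strongly differing regions
    w1 = v1 / (v1 + v2 + 1e-12)
    fused = w1 * h1 + (1 - w1) * h2             # weighted-average branch
    fused[pick] = np.where(v1 > v2, h1, h2)[pick]   # max-variance branch
    return fused

h1 = np.zeros((6, 6)); h1[2:4, 2:4] = 10.0      # subband with a strong detail
f = fuse_high(h1, np.zeros((6, 6)))
assert f[2, 2] == 10.0                          # detail region: taken from h1
assert f[0, 0] == 0.0                           # flat region: averaged
```

The threshold T thus decides, per pixel, whether the two subbands disagree enough to warrant a hard selection, which preserves strong edges without discarding subtle shared variations.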
In summary, IALM is used to determine the low-frequency component to be fused, and self-adapting regional variance estimation is employed to determine the high-frequency contribution. The fused wavelet coefficients are combined by the ILWT to create the final result.

4. Experimental Results and Analysis

4.1 Comparison of Robust Principal Component Analysis Algorithms

To validate the new procedure, four groups of experiments are reported. The objective of the first is to compare the performance of other RPCA algorithms with that of IALM. The results are shown in Table 2. Two mainstream algorithms, singular value thresholding (SVT) and the accelerated proximal gradient (APG) method, are compared with IALM.

Table 2. Comparison of RPCA algorithms.
In this table, the input dataset, the observation matrix of Eq. (6), is square with randomly missing or broken pixels. For a fair comparison, we fix the rank of the recovered matrix and define a normalized mean squared error (NMSE) for the recovery. In Table 2, the column labeled #SVD indicates the number of iterations. The "time" column displays the number of seconds needed to run the algorithm. The oversampling rate is six relative to the number of degrees of freedom in the set of n × n rank-r matrices, d_r = r(2n − r); the corresponding number of elements is sampled uniformly to form the known entries of the observation matrix.16 Among the three algorithms, IALM exhibits superior performance in all three measures. The results indicate that the run time increases proportionately with the matrix dimension. Note, however, that #SVD is not dependent upon the dimension.

4.2 Fusion of Clean Images

For convenience, we refer to the method simply as the new algorithm. The next two groups of experiments involve the processing of left-focus/right-focus images and visible-light/infrared images, comparing several image-fusion algorithms with the new algorithm. The source images are not corrupted by noise or errors. The spline wavelet basis23 was selected for the LWT process; through factorization, the equivalent lifting wavelet was obtained. The experimental results are shown in Figs. 4 and 5. The first group of source images involves eccentric focus, and the second contains visible-light and infrared images. Figure 4(a) shows a left-focused source image, whereas Fig. 4(b) is right-focused; Fig. 5(a) is a visible-light source image, while Fig. 5(b) uses an infrared source. Figures 4(c)–4(f) and 5(c)–5(f) show, respectively, the fusion results of the weighted average over low frequencies with the absolute-value maximum over high frequencies (WA_AM); the weighted average over low frequencies with the local-area maximum over high frequencies; the improved pulse-coupled neural network (PCNN) method;24,25 PCA weighting over low frequencies with self-adaptive regional variance estimation over high frequencies; and the algorithm developed in this paper. The processed images empirically suggest that a clearer fused image is obtained with the new algorithm. More detailed information is evident, e.g., in Figs. 4(e) and 4(f), in which the image information on the left edge of the large alarm clock is apparently richer than the same feature in the other fused images. This also suggests that the new algorithm is at least as effective as the PCA-based variant, while recovering more detailed information (Table 2). Furthermore, the new algorithm achieves a fusion result with finer detail. For example, the barbed wire in Fig. 5(d) is more clearly visible than the same feature in Fig. 5(c). In Fig. 5, the person in Fig. 5(c) is better defined than in Fig. 5(d), while in Figs. 5(e) and 5(f), the barbed wire, the person, and even the smoke in the upper-right corner of the image are easier to identify than in the others. This enhanced clarity admits more effective subsequent processing. Several objective criteria, including mutual information (MI) and average gradient (AG), were evaluated.
Tables 3 and 4 report the objective performance evaluation measures for the fusion algorithms.

Table 3. Objective evaluation measures for Fig. 4.
Table 4. Evaluation comparison for Fig. 5.
Relative to the other algorithms, the new algorithm obtains the largest MI and AG values for the fused images, suggesting that it can provide fused images with higher information content and better clarity. The objective indicators of fidelity to the source image also favor the performance of the IALM and self-adaptive regional variance estimation algorithm.

4.3 Fusion of Corrupted Images

To assess whether the new algorithm is robust to missing data and image corruption, we continue with the multifocus clock images. At a 0.15 error rate, 15% of the pixels of the original image are corrupted, and an additional 15% are missing (gray-level values set to zero). This implies an effective data corruption rate of 30%. The results of the test of the four algorithms are shown in Fig. 6. Figures 6(a) and 6(b) show, respectively, Figs. 4(a) and 4(b) with errors. Figure 6(c) shows the result of fusing without a denoising filter, while Fig. 6(d) shows the corresponding result with an adaptive median filter. The result of using PCNN with an adaptive median filter appears in Fig. 6(e). To achieve this outcome, we use the adaptive median filtering strategy proposed by Chen and Wu27 to identify pixels corrupted by impulsive noise and to replace each damaged pixel by the median of its neighborhood. The adaptive median filter can employ varying window sizes to accommodate different noise conditions and to reduce distortions such as excessive thinning or thickening of object boundaries. Figure 6(f) shows the result of the new algorithm without denoising. The clarity of the result in Fig. 6(f) relative to those in Figs. 6(c), 6(d), and 6(e) is quite apparent. The empirical image quality tracks the improvement in PSNR reported in the captions. Figures 6(g) and 6(h) show 400% enlargements of portions of Figs. 6(e) and 6(f). These results demonstrate the ability of the new algorithm to recover the missing or erroneous data while preserving image detail in both corrupted and clean images.
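The adaptive median filtering stage used for the comparison methods in Fig. 6 can be sketched as follows. This is the textbook adaptive median filter in the spirit of the Chen and Wu detector, with our own window-size choices; it is not the authors' exact implementation.

```python
import numpy as np

def adaptive_median(img, smax=7):
    """Adaptive median filter sketch: at each pixel, grow the window until
    the window median is not itself an impulse, then replace the pixel
    only if the pixel looks like an impulse (equals a window extreme)."""
    out = img.astype(float).copy()
    pad = smax // 2
    p = np.pad(img.astype(float), pad, mode='reflect')
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            for w in range(3, smax + 1, 2):
                h = w // 2
                win = p[i + pad - h:i + pad + h + 1,
                        j + pad - h:j + pad + h + 1]
                zmin, zmed, zmax = win.min(), np.median(win), win.max()
                if zmin < zmed < zmax:                 # median is not an impulse
                    if not (zmin < img[i, j] < zmax):  # pixel is an impulse
                        out[i, j] = zmed
                    break
            else:
                out[i, j] = zmed   # window reached smax: fall back to median
    return out

img = np.full((8, 8), 5.0)
img[3, 3] = 255.0                 # a single salt (impulse) pixel
f = adaptive_median(img)
assert f[3, 3] == 5.0             # impulse replaced by neighborhood median
assert f[0, 0] == 5.0             # uncorrupted pixels pass through
```

Growing the window before committing to the median is what limits the thinning and thickening distortions mentioned above: clean fine detail satisfies the zmin < zmed < zmax test early and is left untouched.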
5. Conclusions

Traditional convolution-based wavelet transform processing for image fusion has shortcomings, including large memory requirements and high computational complexity. The approach to fusion taken in this research uses different fusion rules for the low-frequency and high-frequency decomposition components represented on a lifting wavelet basis set. Low-frequency components are characterized by matrix completion and RPCA via the IALM method, whereas the high-frequency components critical for image detail are represented by taking into account the variance differences among proximal neighborhoods. Furthermore, the strong correlation between pixels in a local area is captured by a self-adaptive regional variance assessment. Experimental results show that the new algorithm not only improves the amount of information and the correlation between the fused and source images, but also reduces the level of distortion. Significant clarity improvement relative to state-of-the-art methods is also demonstrated for corrupted images.

Acknowledgments

This research was supported in part by the National Natural Science Foundation of China (Grant No. 30970780) and by the General Program of Science and Technology Development Project of Beijing Municipal Education Commission of China (Grant No. KM201110005033). J.D. and D.B. efforts were supported in part by the U.S. National Science Foundation under Cooperative Agreement DBI-0939454. Any opinions, conclusions, or recommendations expressed are those of the authors and do not necessarily reflect the views of the NSF. This work was undertaken in part while Z.W. was a visiting research scholar at Michigan State University. The authors thank the Beijing University of Technology's Multimedia Information Processing Lab for assistance.

References

1. B. Khaleghi et al., "Multisensor data fusion: a review of the state-of-the-art," Inf. Fusion 14, 28–44 (2013). http://dx.doi.org/10.1016/j.inffus.2011.08.001
2. G. Piella, "A general framework for multiresolution image fusion: from pixels to regions," Inf. Fusion 4(4), 259–280 (2003). http://dx.doi.org/10.1016/S1566-2535(03)00046-0
3. Y. Chai, H. Li, and Z. Li, "Multifocus image fusion scheme using focused region detection and multiresolution," Opt. Commun. 284(19), 4376–4389 (2011). http://dx.doi.org/10.1016/j.optcom.2011.05.046
4. R. K. Sharma and M. Pavel, Probabilistic Model-Based Multisensor Image Fusion, pp. 1–35, Oregon Graduate Institute of Science and Technology (1999).
5. Y. Zheng, "An orientation-based fusion algorithm for multisensor image fusion," Proc. SPIE 7710, 77100K (2010). http://dx.doi.org/10.1117/12.849656
6. R. Nava, B. Escalante-Ramírez, and G. Cristóbal, "A novel multi-focus image fusion algorithm based on feature extraction and wavelets," Proc. SPIE 7000, 700028 (2008). http://dx.doi.org/10.1117/12.781403
7. C. Ramesh and T. Ranjith, "Fusion performance measures and a lifting wavelet transform based algorithm for image fusion," Inf. Fusion 1, 317–320 (2002). http://dx.doi.org/10.1109/ICIF.2002.1021168
8. G. Liu and C. Liu, "A novel algorithm for image fusion based on wavelet multi-resolution decomposition," J. Optoelectron. 15, 334–347 (2004).
9. Z. Qiang and J. Peng, "Remote sensing image fusion based on small wavelet transform's local variance," J. Huazhong Univ. Sci. Technol. 6, 89–91 (2003).
10. W. Sweldens, "The lifting scheme: a construction of second generation wavelets," SIAM J. Math. Anal. 29(2), 511–546 (1998). http://dx.doi.org/10.1137/S0036141095289051
11. M. Chen and H. Di, "Study on optimal wavelet decomposition level for multi-focus image fusion," Opto-Electron. Eng. 31, 64–67 (2004).
12. Q. Lin and F. Gui, "A novel image fusion algorithm based on wavelet transforms," Proc. SPIE 7001, 70010M (2008). http://dx.doi.org/10.1117/12.780162
13. S. Arivazhagan, L. Ganesan, and T. Kumar, "A modified statistical approach for image fusion using wavelet transform," Signal Image Video Process. 3(2), 137–144 (2009). http://dx.doi.org/10.1007/s11760-008-0065-4
14. S. El-Khamy et al., "Regularized super-resolution reconstruction of images using wavelet fusion," Opt. Eng. 44(9), 097001 (2005). http://dx.doi.org/10.1117/1.2042947
15. Z. Lin and Y. Ma, "The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices," (2011).
16. E. J. Candès and B. Recht, "Exact matrix completion via convex optimization," Found. Comput. Math. 9, 717–772 (2009). http://dx.doi.org/10.1007/s10208-009-9045-5
17. E. J. Candès, J. K. Romberg, and T. Tao, "Stable signal recovery from incomplete and inaccurate measurements," Commun. Pure Appl. Math. 59, 1207–1223 (2006). http://dx.doi.org/10.1002/cpa.20124
18. E. J. Candès and Y. Plan, "Matrix completion with noise," Proc. IEEE 98, 925–936 (2010). http://dx.doi.org/10.1109/JPROC.2009.2035722
19. J. Wright et al., "Robust principal component analysis: exact recovery of corrupted low-rank matrices via convex optimization," Proc. Neural Inf. Process. Syst. 3, 1–9 (2009).
20. E. J. Candès et al., "Robust principal component analysis?," J. ACM 58, 11 (2011).
21. W. Tan, G. Cheung, and Y. Ma, "Face recovery in conference video streaming using robust principal component analysis," in Proc. IEEE Int. Conf. on Image Processing, pp. 3225–3228 (2011).
22. H. Ji et al., "Robust video denoising using low rank matrix completion," in Proc. IEEE Int. Conf. on Computer Vision and Pattern Recognition, pp. 1791–1798 (2010).
23. A. Z. Averbuch and V. A. Zheludev, "Image compression using spline based wavelet transforms," Wavelets Signal Image Anal. 19, 341–376 (2001).
24. X. Qu et al., "Image fusion algorithm based on spatial frequency-motivated pulse coupled neural networks in nonsubsampled contourlet transform domain," Acta Autom. Sin. 34, 1508–1514 (2008). http://dx.doi.org/10.1016/S1874-1029(08)60174-3
25. Y. Chai, H. F. Li, and M. Y. Guo, "Multifocus image fusion scheme based on features of multiscale products and PCNN in lifting stationary wavelet domain," Opt. Commun. 284, 1146–1158 (2011). http://dx.doi.org/10.1016/j.optcom.2010.10.056
26. C. S. Xydeas and V. Petrovic, "Objective image fusion performance measure," Electron. Lett. 36, 308–309 (2000). http://dx.doi.org/10.1049/el:20000267
27. T. Chen and H. Wu, "Adaptive impulse detection using center-weighted median filters," IEEE Signal Process. Lett. 8, 1–3 (2001). http://dx.doi.org/10.1109/97.889633
Biography

Zhuozheng Wang is an associate professor at Beijing University of Technology and a visiting scholar at Michigan State University sponsored by the China Scholarship Council. He received his MS and PhD degrees in electronic engineering from Beijing University of Technology in 2005 and 2013. He is the first author of more than 10 academic papers and has written one book chapter. His current research interests include image processing, electroencephalography, and virtual reality technology. He has been a reviewer and is a member of SPIE.

J. R. Deller Jr. is an IEEE fellow and professor of electrical and computer engineering at Michigan State University, where he received the distinguished faculty award in 2004. He received a PhD in biomedical engineering in 1979, an MS degree in electrical and computer engineering in 1976, and an MS degree in biomedical engineering in 1975 from the University of Michigan, and his BS degree in electrical engineering (summa cum laude) in 1974 from Ohio State University. His research interests include statistical signal processing with applications to speech and hearing, genomics, and other aspects of biomedicine.

Blair D. Fleet received her BS degree (summa cum laude) from Morgan State University, Baltimore, MD, in 2010, and her MS degree from Michigan State University in 2012, both in electrical engineering. She is a National Science Foundation graduate research fellowship award recipient, as well as a GEM (National Consortium for Graduate Degrees for Minorities in Engineering and Science, Inc.) fellow. She is currently pursuing her PhD in electrical engineering at Michigan State University. Her research interests include merging signal/image processing with evolutionary computation to solve challenging engineering processing problems, especially in the biomedical domain.