1. Introduction

Objective image quality assessment is crucial for image processing applications because it allows the results of different methods to be compared. Although objective metrics are used to measure the performance of image correction, compression, and enhancement methods, such as denoising, JPEG compression, super-resolution, and frame rate upconversion,1–7 almost no objective evaluation metric completely agrees with the perceived subjective visibility of humans, while subjective evaluation is usually too inconvenient, time-consuming, and expensive.8 The simplest and most widely used metrics are the mean squared error (MSE) and the peak signal-to-noise ratio (PSNR); the MSE is computed by averaging the squared differences of two signals, and the PSNR is the ratio between the maximum value (Max) of a signal and the MSE as follows:

$$ \mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}(x_i - y_i)^2, \tag{1} $$

$$ \mathrm{PSNR} = 10\log_{10}\frac{\mathrm{Max}^2}{\mathrm{MSE}}, \tag{2} $$

where $x_i$ and $y_i$ are the $i$'th elements of the two signals and $N$ is the number of elements; e.g., the elements of image signals are pixels, and both images must contain $N$ pixels. However, the MSE and PSNR are not well matched to perceived visible quality.9–13 Many image quality assessment methods based on error sensitivity have been proposed;14–19 they use the human visual system (HVS), the contrast sensitivity function, the discrete cosine transform, the wavelet transform, and so forth. However, the similarity errors they assess may differ considerably from the actual loss of quality, so some distortions may be clearly visible yet produce errors that these metrics barely register.8

Recently, the structural similarity (SSIM) index has typically been used to determine visible quality.8,20 It is a full-reference image quality assessment method that indicates how similar an image is to the original image. It has three main components: structure, luminance, and contrast. However, these components, especially the structure component, are highly sensitive to translation, scaling, and rotation of an image.
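As a quick illustration, the two baseline metrics of Eqs. (1) and (2) can be computed in a few lines of NumPy (a sketch; the function names and the default peak value of 255 for 8-bit images are our choices, not taken from this paper):

```python
import numpy as np

def mse(x, y):
    """Mean squared error: average of squared element differences (Eq. 1)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    return float(np.mean((x - y) ** 2))

def psnr(x, y, max_val=255.0):
    """Peak signal-to-noise ratio in dB (Eq. 2); max_val is the peak signal
    value (255 for 8-bit images -- an assumed default)."""
    err = mse(x, y)
    if err == 0.0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / err)
```

Identical signals give an infinite PSNR, which is why the MSE is guarded before taking the logarithm.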
This means that even when images are translated or rotated by an unrecognizably small amount, the SSIM decreases sharply.21 Moreover, it may overestimate images that have undergone regional distortions such as JPEG compression. In this paper, we aim at developing an improved structural similarity metric that outperforms the typical SSIM and overcomes these drawbacks. The proposed metric uses an improved structure comparison and, additionally, a sharpness comparison.

2. SSIM and Its Drawbacks

Since humans usually use contrast, color, and frequency changes in their image quality judgments,22 the SSIM uses the luminance, contrast, and structure comparisons shown in Fig. 1.8,22 The SSIM of two images $x$ and $y$ is defined by the combination of three components as follows:8

$$ \mathrm{SSIM}(x,y) = [l(x,y)]^{\alpha}\,[c(x,y)]^{\beta}\,[s(x,y)]^{\gamma}, \tag{3} $$

where $l(x,y)$, $c(x,y)$, and $s(x,y)$ are the luminance, contrast, and structure comparison functions, respectively, defined by

$$ l(x,y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}, \tag{4} $$

$$ c(x,y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}, \tag{5} $$

$$ s(x,y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}, \tag{6} $$

where $\mu_x$ and $\sigma_x$ denote the mean and the standard deviation of $x$; $\mu_y$ and $\sigma_y$ denote the mean and the standard deviation of $y$; $\sigma_{xy}$ denotes the covariance between $x$ and $y$; and $C_1$, $C_2$, and $C_3$ are constants used to avoid instability when the denominators are very close to zero. The values of $l(x,y)$, $c(x,y)$, and $s(x,y)$ lie in [0, 1], and values close to 1 indicate higher similarity for each comparison function. The local statistics are calculated within a local window having circular-symmetric Gaussian weights $\mathbf{w} = \{w_i \mid i = 1, \ldots, M\}$ as follows:

$$ \mu_x = \sum_{i=1}^{M} w_i x_i, \quad \sigma_x = \Bigl(\sum_{i=1}^{M} w_i (x_i - \mu_x)^2\Bigr)^{1/2}, \quad \sigma_{xy} = \sum_{i=1}^{M} w_i (x_i - \mu_x)(y_i - \mu_y), \tag{7} $$

where $i$ is an index of the pixels in the Gaussian window and $M$ is the total number of pixels in the Gaussian window. In the combination of all comparisons between the two images $x$ and $y$ in Eq. (3), the exponents $\alpha$, $\beta$, and $\gamma$ are parameters used to adjust the relative importance of the components.
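The three comparison functions and the Gaussian-weighted local statistics above can be sketched as follows. The constants use the conventional choices $C_1=(0.01\cdot 255)^2$, $C_2=(0.03\cdot 255)^2$, and $C_3=C_2/2$ from Ref. 8; assuming these exact values here is ours:

```python
import numpy as np

def gaussian_window(size=11, sigma=1.5):
    """Circular-symmetric Gaussian weights w_i, normalized to unit sum."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    w = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return w / w.sum()

def ssim_components(x, y, C1=6.5025, C2=58.5225, C3=29.26125):
    """Luminance l, contrast c, and structure s comparisons (Eqs. 4-6) for
    one window-sized patch pair, using Gaussian-weighted statistics (Eq. 7)."""
    w = gaussian_window(x.shape[0])
    mu_x, mu_y = np.sum(w * x), np.sum(w * y)
    var_x = np.sum(w * (x - mu_x) ** 2)
    var_y = np.sum(w * (y - mu_y) ** 2)
    sd_x, sd_y = np.sqrt(var_x), np.sqrt(var_y)
    cov_xy = np.sum(w * (x - mu_x) * (y - mu_y))
    l = (2 * mu_x * mu_y + C1) / (mu_x ** 2 + mu_y ** 2 + C1)
    c = (2 * sd_x * sd_y + C2) / (var_x + var_y + C2)
    s = (cov_xy + C3) / (sd_x * sd_y + C3)
    return l, c, s
```

For identical patches all three components equal 1; a pure luminance shift lowers only $l$, which is why the components are analyzed separately in Sec. 3.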
To simplify the expression and equalize the relative importance of the three components, they are generally set to $\alpha = \beta = \gamma = 1$ and $C_3 = C_2/2$, so we also set the parameters in this manner.8,21 This results in a specific form of the SSIM index:

$$ \mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}. \tag{8} $$

To obtain a single overall quality measure for the entire image, a mean SSIM (MSSIM) index is used:

$$ \mathrm{MSSIM}(X,Y) = \frac{1}{N}\sum_{j=1}^{N} \mathrm{SSIM}(x_j, y_j), \tag{9} $$

where $X$ and $Y$ are the original and the distorted images, respectively, $x_j$ and $y_j$ are the image contents at the $j$'th local window, and $N$ is the number of pixels of the images as used in Eq. (1).8 The MSSIM can be interpreted as the mean value of the SSIM index map.23 Because SSIM values lie in [0, 1], the MSSIM has the same range.

The SSIM and MSSIM can be used to measure the similarity of two images. However, they have some drawbacks, as shown in Fig. 2 and Table 2. First, images filtered by a low-pass operation, such as a mean filter (MF), a median filter (MedF), or JPEG compression, are evaluated as having high similarity scores. Second, images that have been slightly distorted by geometric transformations, such as spatial translation (ST) and rotation (RT), are evaluated as having low similarity scores.

3. New Structural Similarity

The main component of the SSIM that causes these drawbacks is the structure comparison defined by Eq. (6). When we use Eq. (3) combining only Eqs. (4) and (5), slightly geometrically transformed images no longer receive low similarities, as shown in Fig. 3 and Table 1, where $\bar{l}$, $\bar{c}$, and $\bar{s}$ are the means of $l(x,y)$ in Eq. (4), $c(x,y)$ in Eq. (5), and $s(x,y)$ in Eq. (6). In Table 1, $\bar{s}$ of the ST image is very low, while $\bar{s}$ of the JPEG image is higher than that of the ST image. This example shows the limitation of the SSIM: it is sensitive to ST, scaling, and RT. Table 1: Comparison of the MSSIM and its components with the MSSIM-S and its components for Fig. 3.
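A minimal sketch of the simplified SSIM index of Eq. (8) evaluated as an index map, together with its mean (MSSIM). The explicit per-window loop is ours for clarity; practical implementations use separable Gaussian filtering instead:

```python
import numpy as np

def gaussian_window(size=11, sigma=1.5):
    """Circular-symmetric Gaussian weights, normalized to unit sum."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    w = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return w / w.sum()

def ssim_map(X, Y, size=11, sigma=1.5, C1=6.5025, C2=58.5225):
    """SSIM index in the simplified alpha=beta=gamma=1, C3=C2/2 form (Eq. 8),
    computed at every fully contained window position."""
    X = X.astype(np.float64); Y = Y.astype(np.float64)
    w = gaussian_window(size, sigma)
    H, W = X.shape
    out = np.empty((H - size + 1, W - size + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            x, y = X[i:i + size, j:j + size], Y[i:i + size, j:j + size]
            mu_x, mu_y = np.sum(w * x), np.sum(w * y)
            var_x = np.sum(w * (x - mu_x) ** 2)
            var_y = np.sum(w * (y - mu_y) ** 2)
            cov = np.sum(w * (x - mu_x) * (y - mu_y))
            out[i, j] = ((2 * mu_x * mu_y + C1) * (2 * cov + C2)
                         / ((mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2)))
    return out

def mssim(X, Y):
    """Mean of the SSIM index map: a single overall quality score."""
    return float(ssim_map(X, Y).mean())
```

The map itself is what is visualized as the index maps in Figs. 5 to 7; its mean is the scalar score reported in the tables.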
To reduce this weakness of $s(x,y)$, we define the structure comparison in a new way, in terms of $\sigma_x^-$ and $\sigma_x^+$, the standard deviations of the elements of $x$ smaller and larger than $\mu_x$, respectively, and $\sigma_y^-$ and $\sigma_y^+$, which denote the same for $y$. In Ref. 8, structural information in an image is defined as those attributes that represent the structure of objects in the scene, independent of the average luminance and contrast, and the structure comparison is conducted after luminance subtraction and variance normalization. Thus, $s(x,y)$ is defined as the correlation between the standard scores ($z$-scores)24 of $x$ and $y$. We instead define the new structure comparison as the correlation between the standard deviations of the pixels having positive and negative standard scores, because $\sigma^-$ and $\sigma^+$ can represent the structure of objects by dividing each window into locally darker and brighter regions. As shown in Fig. 3 and Table 1, the weakness of $s(x,y)$ is reduced relative to the original SSIM; however, the similarity of the ST image is still lower than that of the JPEG image. That is to say, the SSIM still overestimates blurred images when only the new structure term is used as the structure comparison. Therefore, we add one more component, the sharpness comparison, which is the correlation between the normalized digital Laplacians of the two windows. The new similarity components satisfy the properties required of measurement metrics: symmetry, boundedness, and a unique maximum. As shown in Fig. 4, the mean of the sharpness comparison of the ST image is higher than that of the JPEG image. Finally, the improved SSIM including the sharpness comparison (ISSIM-S) is defined as the combination of the luminance, contrast, new structure, and sharpness comparisons, and the proposed ISSIM-S measurement system can be configured (Fig.
4). To obtain a single overall quality measure for the entire image, a mean ISSIM-S (MISSIM-S) index may be used, analogous to the MSSIM. The values of the ISSIM-S and MISSIM-S also lie in [0, 1], and values close to 1 indicate higher similarity.

4. Experimental Results

To evaluate the proposed similarity metric in comparison with the PSNR and the SSIM, we tested the distorted images shown in Fig. 2. In this test, we used a circular-symmetric Gaussian weight function with a standard deviation of 1.5, normalized to unit sum. The constants were selected as in Ref. 8. These values may seem somewhat arbitrary, but Wang et al. found that, in their experiments, the performance of the SSIM index algorithm is fairly insensitive to variations of these values. The local variance similarity between the original and the histogram-equalized images is quite different because histogram equalization (HE) is a nonlinear intensity transform; nevertheless, the SSIM assigns it a high similarity score, while our new metric evaluates the image as less similar. The ISSIM-S values of the images filtered by low-pass operations, such as MF, MedF, and JPEG compression, are likewise lower than the corresponding SSIM values. In addition, the ISSIM-S values of images slightly geometrically transformed by ST and RT are higher than the SSIM values. The results for the mean luminance shifting (MLS) and impulsive noise (IN) images show that the SSIM and the ISSIM-S evaluate the same image with different resulting values. To compare the index maps of the SSIM and the ISSIM-S, the results for HE, MedF, JPEG, and MF are shown in Fig. 5. The pixel values of an index map are normalized SSIM or ISSIM-S values. The index maps differ, and those of the ISSIM-S are darker than those of the SSIM because the MISSIM-S values are lower than the MSSIM values.
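The new comparison functions of Sec. 3 rely on two ingredients: standard deviations computed separately over the locally darker and brighter pixels, and a normalized digital Laplacian. Since the paper's exact equations are not reproduced in this excerpt, the sketch below is only illustrative: the 4-neighbour Laplacian kernel, the zero-mean/unit-variance normalization, the Eq. (6)-style stabilized pairing, and the constant C4 are all our assumptions, not the paper's definitions:

```python
import numpy as np

def split_stddevs(x, w):
    """Standard deviations of the locally darker (below the Gaussian-weighted
    mean) and brighter (above it) pixels of a patch x with weights w."""
    mu = np.sum(w * x)
    def wstd(mask):
        wm = w[mask]
        if wm.sum() == 0:
            return 0.0
        wm = wm / wm.sum()
        return float(np.sqrt(np.sum(wm * (x[mask] - mu) ** 2)))
    return wstd(x < mu), wstd(x >= mu)

def normalized_laplacian(X):
    """4-neighbour digital Laplacian of X, normalized to zero mean and unit
    standard deviation (our assumed normalization)."""
    L = (-4.0 * X[1:-1, 1:-1] + X[:-2, 1:-1] + X[2:, 1:-1]
         + X[1:-1, :-2] + X[1:-1, 2:])
    return (L - L.mean()) / (L.std() + 1e-12)

def sharpness_comparison(X, Y, C4=29.26125):
    """Correlation between the normalized Laplacians of X and Y, stabilized
    in the same style as Eq. (6); the constant C4 is our assumption."""
    lx, ly = normalized_laplacian(X), normalized_laplacian(Y)
    return float((np.mean(lx * ly) + C4) / (lx.std() * ly.std() + C4))
```

Blurring attenuates the Laplacian response much more than a small translation shifts it, which is the intuition behind adding the sharpness term.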
The index maps of the ISSIM-S for IN, ST, and RT, in contrast, are brighter than those of the SSIM, because the ISSIM-S similarities are higher than the SSIM similarities, as shown in Fig. 6. The index maps for MLS are very similar, as shown in Fig. 7. To compare against the mean opinion scores (MOSs), the ranks of the PSNR, the mean of the SSIM, the mean of the ISSIM-S, and the MOS are shown in Table 2. To measure the MOSs, we showed subjects the result image of each processing together with the original image and collected their opinion scores, which range from 1 (not similar) to 5 (very similar). Each comparison was performed one-on-one against the original image, and we randomized the order of the distorted images to minimize order effects. There were 17 test subjects, none of whom had any vision problems. The experiments were conducted under regulated illumination and display conditions. Table 2: Comparison of the PSNR, mean of the SSIM, mean of the ISSIM-S, and MOS rank for the “Lena” image (the rank for each metric is shown in parentheses).
The scores themselves are subjective and not conclusive, but they are meaningful in relative comparison. Therefore, we used MOS ranks instead of the MOS itself. The rank correlations with the MOS rank are also shown, where the rank correlation is computed by Spearman's rank correlation coefficient ($\rho$),25 defined as

$$ \rho = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}, $$

where $d_i$ denotes the difference of the $i$'th ranks and $n$ denotes the ranking size. The rank correlation of the mean of the ISSIM-S is closer to 1 than the others.

We also compared the PSNR, SSIM, ISSIM-S, and MOS on another image, shown in Fig. 8, and the results are given in Table 3. The types of distortion are exactly the same as those of Table 2; the only difference is the filter size, which was scaled to the different resolution of the test image in Fig. 8. Table 3: Comparison of the PSNR, mean of the SSIM, mean of the ISSIM-S, and MOS rank for the “Einstein” image (the rank for each metric is shown in parentheses).
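Spearman's rank correlation as used above can be computed directly from two rank lists (a sketch, assuming untied ranks):

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rank correlation coefficient for two untied rank lists:
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), d_i = rank difference."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))
```

Identical rankings give $\rho = 1$ and exactly reversed rankings give $\rho = -1$, so a metric whose ranking of the distorted images matches the MOS ranking scores close to 1.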
To evaluate the performance with different distortion levels, we tested a few more images: blurred images with different MF sizes, images that have undergone various losses via JPEG compression, and images translated by different amounts of ST (shown in Fig. 9 and Table 4). As the distortion level increases, the PSNR, MSSIM, and mean ISSIM-S all decrease, regardless of the processing type. However, for ST, the PSNR and MSSIM reach their lowest values when the image is translated by only 3 pixels along one axis, while the mean ISSIM-S does not. The ISSIM-S is also affected by translation, but it is less sensitive than the PSNR and SSIM. Table 4: Comparison of the PSNR, mean of the SSIM, and mean of the ISSIM-S for different distortion levels.
We conducted two additional experiments. First, comparisons of ST, MF, and JPEG compression for various scene contents are shown in Fig. 10 and Table 5. The PSNR and the mean of the SSIM score the images in the same order, whereas the mean of the ISSIM-S shows a different pattern. The order given by the ISSIM-S is more reasonable than that of the PSNR or SSIM. This result shows that the proposed image quality assessment method does not overestimate blurred images and is much less sensitive to geometric transformations, which were the identified drawbacks of the SSIM. Second, as shown in Fig. 11 and Table 6, we compared the PSNR, the mean of the SSIM, and the mean of the ISSIM-S for various combinations of degradations. The oversensitivity of the SSIM to geometric translation can also be observed when degradations are combined: the MSSIM overvalues HE+IN, while the MISSIM-S evaluates it moderately. This suggests that the MISSIM-S is much closer to the HVS, because, like the HVS, it is less sensitive to a small amount of geometric translation. Table 5: Comparison of the PSNR, mean of the SSIM, and mean of the ISSIM-S for different scene contents.
Table 6: Comparison of the PSNR, mean of the SSIM, and mean of the ISSIM-S for various combinations of degradations.
In addition, we tested the variation of the MSSIM and MISSIM-S with the size of the Gaussian window, as shown in Fig. 12; the variations are very small when the window size is larger than 11, so the window size used is large enough.

5. Conclusion

In this paper, we have proposed an improved structural similarity metric using new structure and sharpness comparison functions to overcome the drawbacks of the SSIM metric. The structure comparison uses standard deviations segmented by the mean, and the sharpness comparison uses the normalized digital Laplacian. The proposed metric evaluates geometrically transformed images with appropriately high similarities and does not overestimate blurred images such as JPEG-compressed ones. The experimental results indicate that our similarity metric is superior to existing methods with respect to the perceived visibility of humans. Therefore, our method can be used to evaluate the performance of various methods such as image enhancement, frame rate upconversion, image compression, super-resolution, and image restoration.

Acknowledgments

This research was partly supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2015R1D1A1A01059091), and by an Institute for Information and communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. B0101-16-0033, Research and Development of 5G Mobile Communications Technologies using CCN-based Multi-dimensional Scalability).

References

U. S. Kim and M. H. Sunwoo,
“New frame rate up-conversion algorithm with low computational complexity,”
IEEE Trans. Circuits Syst. Video Technol., 24
(3), 384
–393
(2014). http://dx.doi.org/10.1109/TCSVT.2013.2278142 Google Scholar
R. V. Babu, S. Suresh and A. Perkis,
“No-reference JPEG-image assessment using GAP_RBF,”
Signal Process., 87
(6), 1493
–1503
(2007). http://dx.doi.org/10.1016/j.sigpro.2006.12.014 Google Scholar
A. Shnayderman, A. Gusev and A. M. Eskicioglu,
“An SVD-based grayscale image quality measure for local and global assessment,”
IEEE Trans. Image Process., 15
(2), 422
–429
(2006). http://dx.doi.org/10.1109/TIP.2005.860605 IIPRE4 1057-7149 Google Scholar
W. T. Freeman, T. R. Jones and E. C. Pasztor,
“Example-based super-resolution,”
IEEE Comput. Graph. Appl., 22
(2), 56
–65
(2002). http://dx.doi.org/10.1109/38.988747 ICGADZ 0272-1716 Google Scholar
R. R. Schultz, L. Meng and R. L. Stevenson,
“Subpixel motion estimation for super-resolution image sequence enhancement,”
J. Visual Commun. Image Represent., 9
(1), 38
–50
(1998). http://dx.doi.org/10.1006/jvci.1997.0370 JVCRE7 1047-3203 Google Scholar
S. G. Chang, B. Yu and M. Vetterli,
“Spatially adaptive wavelet thresholding with context modeling for image denoising,”
IEEE Trans. Image Process., 9
(9), 1522
–1531
(2000). http://dx.doi.org/10.1109/83.862630 IIPRE4 1057-7149 Google Scholar
A. Buades, B. Coll and J. M. Morel,
“A non-local algorithm for image denoising,”
in Proc. IEEE CS Conf. Computer Vision and Pattern Recognition,
60
–65
(2005). http://dx.doi.org/10.1109/CVPR.2005.38 Google Scholar
Z. Wang et al.,
“Image quality assessment: from error visibility to structural similarity,”
IEEE Trans. Image Process., 13
(4), 600
–612
(2004). http://dx.doi.org/10.1109/TIP.2003.819861 IIPRE4 1057-7149 Google Scholar
M. P. Eckert and A. P. Bradley,
“Perceptual quality metrics applied to still image compression,”
Signal Process., 70
(3), 177
–200
(1998). http://dx.doi.org/10.1016/S0165-1684(98)00124-8 Google Scholar
A. M. Eskicioglu and P. S. Fisher,
“Image quality measures and their performance,”
IEEE Trans. Commun., 43
(12), 2959
–2965
(1995). http://dx.doi.org/10.1109/26.477498 Google Scholar
S. Winkler,
“A perceptual distortion metric for digital color video,”
Proc. SPIE, 3644 175
–184
(1999). http://dx.doi.org/10.1117/12.348438 PSISDG 0277-786X Google Scholar
P. C. Teo and D. J. Heeger,
“Perceptual image distortion,”
Proc. SPIE, 2179 127
–141
(1994). http://dx.doi.org/10.1117/12.172664 PSISDG 0277-786X Google Scholar
Z. Wang, A. C. Bovik and L. Lu,
“Why is image quality assessment so difficult?,”
in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing,
3313
–3316
(2002). Google Scholar
W. Osberger, N. Bergmann and A. Maeder,
“An automatic image quality assessment technique incorporating high level perceptual factors,”
in Proc. IEEE Int. Conf. Image Processing,
414
–418
(1998). http://dx.doi.org/10.1109/ICIP.1998.727227 Google Scholar
A. B. Watson, J. Hu and J. F. McGowan III,
“DVQ: a digital video quality metric based on human vision,”
J. Electron. Imaging, 10
(1), 20
–29
(2001). http://dx.doi.org/10.1117/1.1329896 JEIME5 1017-9909 Google Scholar
A. B. Watson et al.,
“Visibility of wavelet quantization noise,”
IEEE Trans. Image Process., 6
(8), 1164
–1175
(1997). http://dx.doi.org/10.1109/83.605413 IIPRE4 1057-7149 Google Scholar
Y. K. Lai and C. C. J. Kuo,
“A Haar wavelet approach to compressed image quality measurement,”
J. Visual Commun. Image Represent., 11
(1), 17
–40
(2000). http://dx.doi.org/10.1006/jvci.1999.0433 JVCRE7 1047-3203 Google Scholar
A. B. Watson,
“DCT quantization matrices visually optimized for individual images,”
Proc. SPIE, 1913 202
–216
(1993). http://dx.doi.org/10.1117/12.152694 PSISDG 0277-786X Google Scholar
W. Xu and G. Hauske,
“Picture quality evaluation based on error segmentation,”
Proc. SPIE, 2308 1454
–1465
(1994). http://dx.doi.org/10.1117/12.185904 PSISDG 0277-786X Google Scholar
Z. Wang and A. C. Bovik,
“A universal image quality index,”
IEEE Signal Process Lett., 9
(3), 81
–84
(2002). http://dx.doi.org/10.1109/97.995823 Google Scholar
Z. Wang and E. P. Simoncelli,
“Translation insensitive image similarity in complex wavelet domain,”
in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing,
573
–576
(2005). http://dx.doi.org/10.1109/ICASSP.2005.1415469 Google Scholar
Y. A. Y. Al-Najjar and D. C. Soong,
“Comparison of image quality assessment: PSNR, HVS, SSIM, UIQI,”
Int. J. Sci. Eng. Res., 3
(8), I041
–I045
(2012). Google Scholar
Z. Wang, L. Lu and A. C. Bovik,
“Video quality assessment based on structural distortion measurement,”
Signal Process. Image Commun., 19
(2), 121
–132
(2004). http://dx.doi.org/10.1016/S0923-5965(03)00076-6 SPICEF 0923-5965 Google Scholar
K. Ma et al.,
“Objective quality assessment for color-to-gray image conversion,”
IEEE Trans. Image Process., 24
(12), 4673
–4685
(2015). http://dx.doi.org/10.1109/TIP.2015.2460015 IIPRE4 1057-7149 Google Scholar
J. L. Myers and A. D. Well, Research Design and Statistical Analysis, 2nd ed., p. 508, Lawrence Erlbaum Associates, New Jersey
(2003). Google Scholar
Biography

Daeho Lee received his MS and PhD degrees in electronics engineering from Kyung Hee University, Republic of Korea, in 2001 and 2005, respectively. He has been an associate professor in the Humanities College at Kyung Hee University, Republic of Korea, since 2005. His research interests include computer vision, pattern recognition, machine learning, image processing, image fusion, 3-D image reconstruction, computer games, ITS, HCI, electrical impedance tomography analysis, and digital signal processing.

Sungsoo Lim received his BS degrees in electronics and radio engineering and in biomedical engineering, and his MS degree in electronics and radio engineering, from Kyung Hee University, Republic of Korea, in 2014 and 2016, respectively. He is currently pursuing his PhD in electronic engineering at Kyung Hee University. His research interests include computer vision, image processing, intelligent transportation systems (ITS), human-computer interaction (HCI), and medical image processing.