Depth completion, which predicts dense depth from sparse depth, has important applications in robotics, autonomous driving, and virtual reality, and compensates for the low accuracy of monocular depth estimation. However, previous depth completion works treat every depth pixel equally and ignore the statistical properties of the depth value distribution. In this paper, we propose a self-supervised framework that can generate accurate dense depth from RGB images and sparse depth without the need for dense depth labels. We propose a novel attention-based loss that takes into account the statistical properties of the depth value distribution. We evaluate our approach on the KITTI dataset. The experimental results show that our method achieves state-of-the-art performance, and an ablation study confirms that the proposed loss effectively improves accuracy.
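As a concrete illustration of the attention idea, the sketch below (PyTorch; the binning scheme, weight normalization, and all names are our own assumptions, not the paper's exact formulation) weights each supervised pixel by the inverse frequency of its depth value in the batch histogram, so rare depths contribute more to the loss.

```python
import torch

def attention_depth_loss(pred, target, valid_mask, num_bins=80, max_depth=80.0):
    """L1 loss whose per-pixel weights follow the depth value distribution."""
    d = target[valid_mask]                        # supervised depth values
    hist = torch.histc(d, bins=num_bins, min=0.0, max=max_depth)
    density = hist / hist.sum().clamp(min=1)      # empirical depth distribution
    idx = (d / max_depth * (num_bins - 1)).long().clamp(0, num_bins - 1)
    weights = 1.0 / (density[idx] + 1e-3)         # rare depths get more attention
    weights = weights / weights.mean()            # keep the overall loss scale stable
    return (weights * (pred[valid_mask] - d).abs()).mean()
```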
In this paper, we propose a new automatic laser sheet calibration method for laser triangulation. Existing methods are cumbersome, inefficient, and require substantial human intervention. Our method has three advantages: first, it is automated, whereas traditional methods are manual; second, it simplifies the calibration process and takes less time; finally, it obtains results of higher accuracy. We calibrate the parameters of the camera and the laser plane with the new method and compute the height of corresponding points on the laser line by laser triangulation. The validity of the method is verified by analyzing and comparing the errors of the height values.
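For reference, a minimal sketch of the triangulation step that follows calibration: with camera intrinsics K and a laser plane n·X + d = 0 in camera coordinates (this parameterization and the variable names are assumptions for illustration), each laser-line pixel is back-projected to a viewing ray and intersected with the plane to recover its 3-D position, from which the height follows.

```python
import numpy as np

def triangulate_laser_point(pixel, K, n, d):
    """Intersect the back-projected ray of a laser-line pixel with the laser plane."""
    u, v = pixel
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # viewing ray direction
    t = -d / (n @ ray)                              # ray-plane intersection parameter
    return t * ray                                  # 3-D point in the camera frame
```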
Machine vision is widely used for detecting surface defects in industrial products. However, traditional detection algorithms are usually specialized and cannot be generalized to all types of defects. Object detection algorithms based on deep learning have powerful learning ability and can identify various types of defects. This paper applies an object detection algorithm to defect detection on paper dishes. We first captured images containing defects of different shapes. The defects in these images were then annotated and integrated for model training. Next, a Mask R-CNN model was trained for defect detection. Finally, we tested the model on different defect categories. The model yields not only the category and location of each defect in the image but also its pixel-level segmentation. The experiments show that Mask R-CNN is a successful approach for the defect detection task and can quickly detect defects with high accuracy.
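A minimal inference sketch with torchvision's Mask R-CNN, assuming a model fine-tuned on the annotated paper-dish images; the class count, checkpoint name, and score threshold below are illustrative assumptions. A single forward pass yields the defect category, bounding-box location, and pixel-level mask.

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

model = maskrcnn_resnet50_fpn(num_classes=5)               # background + 4 defect types (assumed)
model.load_state_dict(torch.load("defect_maskrcnn.pth"))   # hypothetical fine-tuned checkpoint
model.eval()

with torch.no_grad():
    image = torch.rand(3, 480, 640)        # stand-in for a captured dish image
    out = model([image])[0]
    keep = out["scores"] > 0.5             # illustrative confidence threshold
    boxes = out["boxes"][keep]             # defect locations
    labels = out["labels"][keep]           # defect categories
    masks = out["masks"][keep] > 0.5       # binary pixel-level segmentation
```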
A typical texture retrieval system performs feature comparison and might not be able to make human-like judgments of image similarity. Meanwhile, it is commonly known that perceptual texture similarity is difficult to describe with traditional image features. In this paper, we propose a new texture retrieval scheme based on perceptual texture similarity. The key idea is to predict perceptual similarity by learning a non-linear mapping from image feature space to perceptual texture space using Random Forests. We test the method on a natural texture dataset and apply it to a new wallpaper dataset. Experimental results demonstrate that the proposed retrieval scheme with perceptual similarity improves retrieval performance over traditional image features.
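A minimal sketch of the mapping step with scikit-learn, using synthetic stand-ins for the image features and perceptual scores (the feature and attribute dimensions are assumptions): a Random Forest regressor learns the non-linear mapping into perceptual space, and retrieval then ranks the database by distance in that space.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_train = rng.random((500, 128))     # stand-in image feature vectors
y_train = rng.random((500, 8))       # stand-in perceptual scores per texture
query_feat = rng.random(128)
db_feats = rng.random((200, 128))

rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X_train, y_train)                          # features -> perceptual space

query_p = rf.predict(query_feat.reshape(1, -1))   # project query
db_p = rf.predict(db_feats)                       # project database
ranking = np.argsort(np.linalg.norm(db_p - query_p, axis=1))  # retrieval order
```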
Semantic attributes are commonly used for texture description. They can describe the information of a texture, such as patterns, textons, distributions, brightness, and so on. Generally speaking, semantic attributes are more concrete descriptors than perceptual features, so it is practical to generate texture images from semantic attributes. In this paper, we propose to generate high-quality texture images from semantic attributes. Over the last two decades, much work has been done on texture synthesis and generation, most of it focusing on example-based texture synthesis and procedural texture generation; texture generation from semantic attributes still deserves more attention. Gan et al. proposed a useful joint model for perception-driven texture generation. However, perceptual features are non-objective spatial statistics used by humans to distinguish different textures in pre-attentive situations. To convey more information about texture appearance, semantic attributes, which are more in line with human description habits, are desired. In this paper, we use a sigmoid cross entropy loss in an auxiliary model to provide enough information to the generator. Consequently, the discriminator is released from the relatively intractable task of modeling the joint distribution of condition vectors and samples. To demonstrate the validity of our method, we compare it with Gan et al.'s method on texture generation through experiments on PTD and DTD. All experimental results show that our model can generate textures from semantic attributes.
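A minimal PyTorch sketch of the auxiliary-loss idea, with stand-in tensors in place of real network outputs: besides the adversarial real/fake output, an auxiliary head predicts the multi-label semantic attributes under a sigmoid cross entropy loss, so attribute information reaches the generator directly and the discriminator no longer has to model the joint distribution of conditions and samples. Batch size and attribute count are assumptions.

```python
import torch
import torch.nn as nn

adv_criterion = nn.BCEWithLogitsLoss()   # adversarial real/fake loss
aux_criterion = nn.BCEWithLogitsLoss()   # sigmoid cross entropy over attributes

# Suppose the discriminator returns (adv_logit, attr_logits) for a generated batch.
adv_logit = torch.randn(16, 1)                  # stand-in real/fake logits
attr_logits = torch.randn(16, 40)               # stand-in auxiliary-head logits
attrs = torch.randint(0, 2, (16, 40)).float()   # conditioning attribute vectors

# Generator objective: fool the discriminator AND reproduce the attributes.
g_loss = adv_criterion(adv_logit, torch.ones_like(adv_logit)) \
       + aux_criterion(attr_logits, attrs)
```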
Neurodegenerative diseases (NDs) usually cause gait and postural disorders, which provide an important basis for ND diagnosis. By observing and analyzing these clinical manifestations, medical specialists give diagnostic results to the patient, which is inefficient and easily affected by the doctor's subjectivity. In this paper, we propose a two-layer Long Short-Term Memory (LSTM) model to learn the gait patterns exhibited in three NDs. The model was trained and tested on temporal data recorded by force-sensitive resistors, including time series such as stride interval and swing interval. Our proposed method outperforms other methods in the literature in terms of the accuracy of the predicted diagnosis. Our approach aims to provide a quantitative assessment to support the diagnosis and treatment of these neurodegenerative diseases in the clinic.
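A minimal PyTorch sketch of a two-layer LSTM classifier of this kind, assuming fixed-length gait sequences of per-step features such as stride and swing intervals; the feature and class counts are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GaitLSTM(nn.Module):
    def __init__(self, in_dim=4, hidden=64, num_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, num_layers=2, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)   # healthy control + three NDs (assumed)

    def forward(self, x):                  # x: (batch, time_steps, features)
        out, _ = self.lstm(x)
        return self.fc(out[:, -1])         # classify from the final time step

logits = GaitLSTM()(torch.randn(8, 100, 4))  # 8 sequences of 100 gait cycles
```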
Semantic scene parsing matters in many intelligent fields, including perceptual robotics. In the past few years, pixel-wise prediction tasks such as semantic segmentation with RGB images have been extensively studied and have reached remarkable parsing levels, thanks to convolutional neural networks (CNNs) and large scene datasets. With the development of stereo cameras and RGBD sensors, additional depth information is expected to help improve accuracy. In this paper, we propose a semantic segmentation framework incorporating RGB and complementary depth information. Motivated by the success of fully convolutional networks (FCNs) in semantic segmentation, we design a fully convolutional network consisting of two branches that extract features from RGB and depth data simultaneously and fuse them as the network goes deeper. Instead of aggregating multiple models, our goal is to utilize RGB and depth data more effectively within a single model. We evaluate our approach on the NYU-Depth V2 dataset, which consists of 1449 cluttered indoor scenes, and achieve results competitive with state-of-the-art methods.
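A minimal PyTorch sketch of the two-branch idea, not the paper's exact architecture: separate convolutional stems encode RGB and depth, their feature maps are fused (summed here, as one plausible choice) as the network goes deeper, and a pixel-wise head produces the class scores. Layer sizes and the class count are assumptions.

```python
import torch
import torch.nn as nn

class RGBDSegNet(nn.Module):
    def __init__(self, num_classes=40):
        super().__init__()
        def stem(in_ch):   # small convolutional encoder for one modality
            return nn.Sequential(
                nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
                nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
        self.rgb_stem, self.depth_stem = stem(3), stem(1)
        self.head = nn.Conv2d(64, num_classes, 1)   # per-pixel class scores

    def forward(self, rgb, depth):
        fused = self.rgb_stem(rgb) + self.depth_stem(depth)  # mid-level fusion
        return self.head(fused)

scores = RGBDSegNet()(torch.randn(1, 3, 240, 320), torch.randn(1, 1, 240, 320))
```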
Metal corrosion can cause many problems, so quickly and effectively assessing the grade of corrosion and remediating it in time is an important issue. Typically, this assessment is done by trained surveyors at great cost. Assisting them in the inspection process with computer vision and artificial intelligence would decrease the inspection cost. In this paper, we propose a dataset of metal surface corrosion for computer vision detection and present a comparison between standard computer vision techniques using OpenCV and a deep learning method for automatic corrosion grade estimation from a single image on this dataset. The test was performed by classifying images and computing the accuracy of the two approaches.
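As an illustration of the classical side of such a comparison, the sketch below pairs an OpenCV color-histogram feature with an SVM for grade classification on synthetic stand-in data; the histogram configuration and the number of grades are assumptions, and the deep learning counterpart would replace this pipeline with a CNN classifier.

```python
import cv2
import numpy as np
from sklearn.svm import SVC

def color_hist_feature(image_bgr):
    """Hue-saturation histogram as a simple corrosion color descriptor."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, [16, 16], [0, 180, 0, 256])
    return cv2.normalize(hist, None).flatten()

rng = np.random.default_rng(0)
images = [rng.integers(0, 256, (64, 64, 3), dtype=np.uint8) for _ in range(40)]
grades = rng.integers(0, 4, 40)            # stand-in corrosion grades 0-3
X = np.stack([color_hist_feature(im) for im in images])
clf = SVC().fit(X, grades)                 # classical baseline classifier
pred = clf.predict(X[:1])
```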
Typical human actions last several seconds and exhibit characteristic spatio-temporal structure. The challenge for action recognition is to capture and fuse the multi-dimensional information in video data. To take these characteristics into account simultaneously, we present a novel method that fuses features from multiple dimensions, such as chromatic images, depth, and optical flow fields. We build our model on multi-stream deep convolutional networks with the help of temporal segment networks and extract discriminative spatial and temporal features by fusing ConvNet towers across dimensions, assigning different feature weights to take full advantage of the multi-dimensional information. Our architecture is trained and evaluated on the currently largest and most challenging benchmark, the NTU RGB-D dataset. The experiments demonstrate that our method outperforms state-of-the-art methods.
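A minimal sketch of weighted late fusion across modality streams, with made-up scores and weights: each ConvNet tower produces class scores for its modality, and a per-stream weight combines them before the final decision.

```python
import numpy as np

def fuse_streams(scores_by_stream, weights):
    """Weighted sum of per-modality class scores, then argmax."""
    fused = sum(weights[m] * s for m, s in scores_by_stream.items())
    return int(np.argmax(fused))

scores = {"rgb": np.array([0.2, 0.7, 0.1]),     # stand-in softmax outputs
          "depth": np.array([0.3, 0.4, 0.3]),
          "flow": np.array([0.1, 0.8, 0.1])}
action = fuse_streams(scores, {"rgb": 1.0, "depth": 0.5, "flow": 1.5})
```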
In this paper, we propose a method for accurate 3D reconstruction based on photometric stereo. Instead of applying a global least squares solution to the entire over-determined system, we randomly sample the images to form a set of overlapping groups and recover the surface normals for each group using the least squares method. We then employ four-dimensional Tensor Robust Principal Component Analysis (TenRPCA) to obtain the accurate 3D reconstruction. Our method outperforms global least squares in handling sparse noise such as shadows and specular highlights. Experiments demonstrate the reconstruction accuracy of our approach.
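A minimal sketch of the per-group least squares step under the Lambertian model: for each sampled group, solving L n = I per pixel gives albedo-scaled normals, which are then normalized; the groups' normal maps would subsequently be stacked into a four-dimensional tensor for TenRPCA (not shown). Shapes and names are illustrative.

```python
import numpy as np

def group_normals(intensities, light_dirs):
    """Least squares photometric stereo for one group of images."""
    k, h, w = intensities.shape                        # k images in the group
    I = intensities.reshape(k, -1)                     # (k, h*w) intensity matrix
    G, *_ = np.linalg.lstsq(light_dirs, I, rcond=None) # albedo-scaled normals
    n = G / (np.linalg.norm(G, axis=0, keepdims=True) + 1e-8)
    return n.reshape(3, h, w)                          # unit normal map

rng = np.random.default_rng(0)
normals = group_normals(rng.random((6, 32, 32)), rng.random((6, 3)))
```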
Laser triangulation and photometric stereo are commonly used three-dimensional (3-D) reconstruction methods, but they have limitations in underwater environments. One important reason is the refraction occurring at the interface (usually glass) of the underwater housing: the image formation process does not follow the commonly used pinhole camera model, and the image captured by the camera is a refracted projection of the object. We introduce a flat refraction model to describe the geometric relation between the refracted image and the real object. The model parameters are estimated in a calibration step with a standard chessboard. The proposed geometric relation is then used for reconstructing underwater 3-D shapes in laser triangulation and photometric stereo. The experimental results indicate that our method can effectively correct the distortion in underwater 3-D reconstruction.
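A minimal sketch of the refraction geometry underlying a flat-interface model, simplified to a single thin interface with assumed refractive indices: Snell's law in vector form bends each in-air viewing ray into the water, giving the ray actually used for triangulation.

```python
import numpy as np

def refract(d, normal, n1=1.0, n2=1.33):
    """Refract ray direction d at a flat interface with the given normal."""
    d = d / np.linalg.norm(d)              # incident ray, pointing at the interface
    cos_i = -normal @ d                    # normal points back toward the camera
    r = n1 / n2
    cos_t2 = 1.0 - r**2 * (1.0 - cos_i**2) # Snell's law, squared cosine of refraction
    t = r * d + (r * cos_i - np.sqrt(cos_t2)) * normal
    return t / np.linalg.norm(t)           # refracted ray direction in water

ray_water = refract(np.array([0.1, 0.0, 1.0]), np.array([0.0, 0.0, -1.0]))
```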
Classical photometric stereo requires uniform collimated light, but point light sources are usually employed in practical setups, which introduces errors into the recovered surface shape. We found that when the light sources are evenly placed around the object with the same slant angle, the main component of the error is a low-frequency deformation that can be approximately described by a quadratic function. We propose a postprocessing method to correct the deviation caused by the nonuniform illumination. The method refines the surface shape with prior information obtained from calibration using a flat plane or the object itself. We further introduce an optimization scheme that improves the reconstruction accuracy when the three-dimensional information of some locations is available. Experiments were conducted on surfaces captured with our device and on those from a public dataset, and the results demonstrate the effectiveness of the proposed approach.
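A minimal sketch of the quadratic correction, with a synthetic stand-in for the calibration measurement: a bivariate quadratic is fit to the low-frequency deviation observed on a reference such as a flat plane and subtracted from reconstructed height maps; the fitting setup is an assumption for illustration.

```python
import numpy as np

def fit_quadratic(z):
    """Fit z(x, y) = ax^2 + by^2 + cxy + dx + ey + f over the whole map."""
    h, w = z.shape
    y, x = np.mgrid[0:h, 0:w]
    A = np.stack([x**2, y**2, x*y, x, y, np.ones_like(x)], -1) \
          .reshape(-1, 6).astype(float)
    coef, *_ = np.linalg.lstsq(A, z.ravel(), rcond=None)
    return (A @ coef).reshape(h, w)        # low-frequency deformation estimate

rng = np.random.default_rng(0)
plane = rng.random((64, 64)) * 0.01        # stand-in: reconstructed flat plane
corrected = plane - fit_quadratic(plane)   # same model would correct object scans
```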
Surface height map estimation is an important task in high-resolution 3D reconstruction. It differs from general scene depth estimation in that surface height maps contain more high-frequency information, i.e., fine details. Existing methods based on radar or other equipment can be used for large-scale scene depth recovery but may fail in small-scale surface height map estimation. Although some methods are available for surface height reconstruction from multiple images, e.g., photometric stereo, height map estimation directly from a single image remains a challenging issue. In this paper, we present a novel method based on convolutional neural networks (CNNs) for estimating the height map from a single image, without any additional equipment or prior knowledge of the image contents. Experimental results on procedural and real texture datasets show that the proposed algorithm is effective and reliable.
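For orientation only, a tiny fully convolutional PyTorch sketch of single-image height regression, mapping a grayscale texture image to a per-pixel height map; the layer configuration is an assumption and not the paper's architecture.

```python
import torch
import torch.nn as nn

height_net = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1))        # regress one height value per pixel

height_map = height_net(torch.randn(1, 1, 128, 128))  # same spatial size as input
```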
Underwater images are blurred due to light scattering and absorption, so image restoration is important in many underwater research and practical tasks. In this paper, we propose an effective two-stage method to restore underwater scene images. Based on an underwater light propagation model, we first remove backscatter by fitting a binary quadratic function. We then eliminate forward scattering and non-uniform lighting attenuation using a blue-green dark channel prior. The proposed method requires no additional calibration, and we show its effectiveness and robustness by restoring images captured in various underwater scenes.
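A minimal sketch of the two stages on a synthetic stand-in image (BGR channel order, window size, and the fitting setup are assumptions): stage one fits a binary quadratic function of the pixel coordinates to a channel and subtracts it as the backscatter estimate; stage two computes a blue-green dark channel as the local-window minimum over the blue and green channels.

```python
import numpy as np
from scipy.ndimage import minimum_filter

def remove_backscatter(channel):
    """Fit a bivariate quadratic to the channel and subtract it."""
    h, w = channel.shape
    y, x = np.mgrid[0:h, 0:w]
    A = np.stack([x**2, y**2, x*y, x, y, np.ones_like(x)], -1) \
          .reshape(-1, 6).astype(float)
    coef, *_ = np.linalg.lstsq(A, channel.ravel(), rcond=None)
    return channel - (A @ coef).reshape(h, w)   # backscatter-subtracted channel

def blue_green_dark_channel(img, win=15):
    bg_min = np.minimum(img[..., 0], img[..., 1])   # min over blue and green (BGR assumed)
    return minimum_filter(bg_min, size=win)         # local-window minimum

rng = np.random.default_rng(0)
img = rng.random((120, 160, 3))                     # stand-in underwater image
descattered_b = remove_backscatter(img[..., 0])
dark = blue_green_dark_channel(img)
```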