We previously proposed a deep learning method for removing reflection artifacts in photoacoustic images. Our approach relies on simulated photoacoustic channel data to train a convolutional neural network (CNN) that distinguishes sources from artifacts based on unique differences in their spatial impulse responses (manifested as depth-based differences in wavefront shapes). In this paper, we directly compare a CNN trained with our previous continuous transducer model to a CNN trained with an updated discrete acoustic receiver model that more closely matches an experimental ultrasound transducer. Both CNNs were trained with simulated data and tested on experimental data. The CNN trained using the continuous receiver model correctly classified 100% of sources and 70.3% of artifacts in the experimental data. In contrast, the CNN trained using the discrete receiver model correctly classified 100% of sources and 89.7% of artifacts. This 19.4% increase in artifact classification accuracy indicates that an acoustic receiver model that closely mimics the experimental transducer plays an important role in improving the classification of artifacts in experimental photoacoustic data. These results are promising for developing a display method that removes artifacts from CNN-based images, in addition to only displaying network-identified sources as previously proposed.
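As a rough illustration of this kind of pipeline, the sketch below shows a small CNN that could classify patches of channel data as source or artifact. This is not the authors' network; the patch size, layer widths, and two-class head are all illustrative assumptions.

```python
# Minimal sketch (not the authors' architecture): a small CNN that labels a
# patch of photoacoustic channel data as "source" or "artifact" based on
# wavefront shape. Patch size and layer widths are illustrative assumptions.
import torch
import torch.nn as nn

class WavefrontClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Assumes 64x64 input patches (receive channels x time samples).
        self.classifier = nn.Linear(32 * 16 * 16, 2)  # source vs. artifact

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

# Example: one batch of (hypothetical) simulated channel-data patches.
patches = torch.randn(8, 1, 64, 64)
logits = WavefrontClassifier()(patches)
```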
Interventional applications of photoacoustic imaging often require visualization of point-like targets, including the circular cross-sectional tips of needles and catheters or the circular cross-sectional views of small cylindrical implants such as brachytherapy seeds. When these point-like targets are imaged in the presence of highly echogenic structures, the resulting photoacoustic wave creates a reflection artifact that may appear as a true signal. We propose to use machine learning principles to identify these types of noise artifacts for removal. A convolutional neural network was trained to identify the location of individual point targets from pre-beamformed data simulated with k-Wave to contain various medium sound speeds (1440-1640 m/s), target locations (5-25 mm), and absorber sizes (1-5 mm). Based on 2,412 randomly selected test images, the mean axial and lateral point location errors were 0.28 mm and 0.37 mm, respectively, which can be regarded as the average imaging system resolution for our trained network. This trained network successfully identified the location of two point targets in a single image with mean axial and lateral errors of 2.6 mm and 2.1 mm, respectively. A true signal and a corresponding reflection artifact were then simulated. The same trained network identified the location of the artifact with mean axial and lateral errors of 2.1 mm and 3.0 mm, respectively. Identified artifacts may be rejected based on wavefront shape differences. These results demonstrate strong promise for identifying point targets without requiring traditional geometry-based beamforming, leading to the eventual elimination of reflection artifacts from interventional images.
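The error metric reported above is straightforward to reproduce; a minimal sketch follows, in which the coordinate convention and array names are illustrative assumptions.

```python
# Minimal sketch of the reported metric: mean axial (depth) and lateral
# distances between predicted and true point-target locations, in mm.
# The example coordinates are hypothetical.
import numpy as np

true_xy = np.array([[12.0, 0.5], [18.3, -2.1]])   # (axial mm, lateral mm)
pred_xy = np.array([[12.3, 0.8], [18.0, -1.7]])

axial_err = np.abs(pred_xy[:, 0] - true_xy[:, 0]).mean()
lateral_err = np.abs(pred_xy[:, 1] - true_xy[:, 1]).mean()
print(f"mean axial error: {axial_err:.2f} mm, lateral: {lateral_err:.2f} mm")
```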
With the increasing amount of patient information collected today, the idea of using this information to inform future patient care has gained momentum. In many cases, this information comes in the form of medical images. Several algorithms have been presented to automatically segment these images and to extract structures relevant to different diagnostic or surgical procedures. This allows us to obtain large datasets of shapes, in the form of triangular meshes, segmented from these images. Given correspondences between these shapes, statistical shape models (SSMs) can be built using methods like Principal Component Analysis (PCA). Often, the initial correspondences between the shapes need to be improved, and SSMs can be used to improve these correspondences. However, just as often, initial segmentations also need to be improved. Unlike many correspondence improvement algorithms, which do not affect segmentation, many segmentation improvement algorithms negatively affect correspondences between shapes. We present a method that iteratively improves both segmentation and correspondence by using SSMs not only to improve correspondence, but also to constrain the movement of vertices during segmentation improvement. We show that our method maintains correspondence while achieving segmentations as good as or better than those produced by methods that improve segmentation without maintaining correspondence. We additionally achieve segmentations with better triangle quality than segmentations produced without correspondence improvement.
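The PCA-based SSM construction mentioned above can be sketched compactly. This is a generic formulation under the assumption that the meshes are already in vertex correspondence, not the paper's specific implementation.

```python
# Minimal sketch of building an SSM with PCA: each shape in correspondence is
# flattened to a vector of vertex coordinates, and the principal components
# give the population's modes of variation. Array names are illustrative.
import numpy as np

def build_ssm(shapes):
    """shapes: (n_shapes, n_vertices, 3) array with vertex correspondence."""
    X = shapes.reshape(len(shapes), -1)          # one row per shape
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    variances = s**2 / (len(shapes) - 1)         # per-mode variance
    return mean, Vt, variances                   # modes are rows of Vt

def synthesize(mean, modes, weights):
    """New shape instance: mean plus a weighted sum of the first k modes."""
    return (mean + weights @ modes[:len(weights)]).reshape(-1, 3)
```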
We present an automatic segmentation and statistical shape modeling system for the paranasal sinuses, which allows us to locate structures in and around the sinuses and to observe the variability in these structures. This system deformably registers a given patient image to a manually segmented template image and uses the resulting deformation field to transfer labels from the template to the patient image. We use 3D snake splines to correct errors in this initial segmentation. Once we have several accurately segmented images, we build statistical shape models to observe the population mean and variance for each structure. These shape models are useful to us in several ways. Standard registration methods are insufficient to accurately register pre-operative computed tomography (CT) images with intra-operative endoscopy video of the sinuses because of deformations that occur in structures containing erectile tissue. Our aim is to estimate these deformations using our shape models in order to improve video-CT registration, to distinguish normal variations in anatomy from abnormal variations, and to automatically detect and stage pathology. We can also compare the mean shapes and variances in different populations, such as different genders or ethnicities, in order to observe differences and similarities, as well as in different age groups in order to observe the developmental changes that occur in the sinuses.
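The label-transfer step can be illustrated with a short sketch. The registration producing the deformation field is assumed; only the warping of template labels into the patient frame is shown, and the array layout is an assumption.

```python
# Minimal sketch of label transfer: sample the template's label volume at the
# voxel locations given by the deformation field. Nearest-neighbor
# interpolation (order=0) keeps labels as integers. The deformable
# registration that produces `deformation` is assumed.
import numpy as np
from scipy.ndimage import map_coordinates

def transfer_labels(template_labels, deformation):
    """template_labels: (Z, Y, X) integer volume.
    deformation: (3, Z, Y, X) voxel coordinates into the template."""
    return map_coordinates(template_labels, deformation, order=0)
```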
Functional Endoscopic Sinus Surgery (FESS) is a challenging procedure for otolaryngologists and is the main surgical approach for treating chronic sinusitis, removing nasal polyps, and opening up passageways. To reach the source of the problem and ultimately remove it, surgeons must often remove several layers of cartilage and tissue. Often, the cartilage occludes or is within a few millimeters of critical anatomical structures such as nerves, arteries, and ducts. To make FESS safer, surgeons use navigation systems that register a patient to his/her CT scan and track the position of the tools inside the patient. Current navigation systems, however, suffer from tracking errors greater than 1 mm, which is large compared to the scale of the sinus cavities, and errors of this magnitude prevent the accurate overlay of virtual structures on endoscope images. In this paper, we present a method to facilitate this task by 1) registering endoscopic images to CT data and 2) overlaying areas of interest on endoscope images to improve the safety of the procedure. First, our system uses structure from motion (SfM) to generate a small cloud of 3D points from a short video sequence. Then, it uses the iterative closest point (ICP) algorithm to register the points to a 3D mesh that represents a section of a patient's sinuses. The scale of the point cloud is approximated by measuring the magnitude of the endoscope's motion during the sequence. We recorded several video sequences from five patients and, given a reasonable initial registration estimate, our results demonstrate an average registration error of 1.21 mm when the endoscope is viewing erectile tissues and an average registration error of 0.91 mm when it is viewing non-erectile tissues. Our implementation of SfM + ICP can execute in less than 7 seconds and can use as few as 15 frames (0.5 seconds of video). Future work will involve clinical validation of our results and strengthening robustness to initial guesses and erectile tissues.
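The ICP step named above is a standard algorithm; the following is a minimal point-to-point variant, not the authors' implementation. It assumes the SfM cloud has already been scaled using the measured endoscope motion.

```python
# Minimal point-to-point ICP sketch: iteratively match SfM points to their
# nearest mesh vertices and update the rigid transform with the Kabsch/SVD
# solution. The pre-scaled source cloud and an initial alignment are assumed.
import numpy as np
from scipy.spatial import cKDTree

def icp(src, mesh_pts, iters=30):
    """src: (N, 3) SfM points; mesh_pts: (M, 3) sinus mesh vertices."""
    R, t = np.eye(3), np.zeros(3)
    tree = cKDTree(mesh_pts)
    for _ in range(iters):
        moved = src @ R.T + t
        _, idx = tree.query(moved)               # nearest mesh vertices
        tgt = mesh_pts[idx]
        # Kabsch: best incremental rotation aligning centered point sets.
        sc, tc = moved.mean(0), tgt.mean(0)
        U, _, Vt = np.linalg.svd((moved - sc).T @ (tgt - tc))
        dR = Vt.T @ U.T
        if np.linalg.det(dR) < 0:                # guard against reflections
            Vt[-1] *= -1
            dR = Vt.T @ U.T
        # Compose the increment with the running transform.
        R, t = dR @ R, dR @ (t - sc) + tc
    return R, t
```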
In this work we present a method for dense reconstruction of anatomical structures from white light endoscopic imagery based on a learning process that estimates a mapping between light reflectance and surface geometry. Our method is unique in that few unrealistic assumptions are made (we assume neither a Lambertian reflectance model nor a point light source) and we learn a model on a per-patient basis, thus increasing the accuracy and extensibility to different endoscopic sequences. The proposed method assumes accurate video-CT registration through a combination of Structure-from-Motion (SfM) and Trimmed-ICP, and then uses the registered 3D structure and motion to generate training data with which to learn a multivariate regression of observed pixel values to known 3D surface geometry. We demonstrate a non-linear regression technique that uses a neural network to estimate depth images and surface normal maps, resulting in high-resolution spatial 3D reconstructions with an average error ranging from 0.53 mm (on the low side, when anatomy matches the CT precisely) to 1.12 mm (on the high side, when the presence of liquids causes scene geometry that is not present in the CT for evaluation). Our results are exhibited on patient data and validated with associated CT scans. In total, we processed 206 endoscopic images from patient data, each yielding approximately 1 million reconstructed 3D points.
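A minimal sketch of the per-patient regression idea follows. The feature choice (RGB plus image coordinates), network size, and random placeholder data are assumptions for illustration; in the actual pipeline the training pairs would come from the registered CT as described above.

```python
# Minimal sketch: regress observed pixel appearance to depth with a small
# neural network. Features, network size, and the placeholder data are
# illustrative assumptions, not the authors' configuration.
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical training pairs: [R, G, B, u, v] -> depth in mm.
X_train = np.random.rand(10000, 5)
y_train = np.random.rand(10000) * 50.0

model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=200)
model.fit(X_train, y_train)
depth_pred = model.predict(X_train[:5])   # per-pixel depth estimates
```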
We present a system for registering the coordinate frame of an endoscope to pre- or intra-operatively acquired CT data based on optimizing the similarity metric between an endoscopic image and an image predicted via rendering of CT. Our method is robust and semi-automatic because it takes into account physical constraints, specifically collisions between the endoscope and the anatomy, to initialize and constrain the search. The proposed optimization method is based on a stochastic optimization algorithm that evaluates a large number of similarity metric functions in parallel on a graphics processing unit. Images from a cadaver and a patient were used for evaluation. The registration error was 0.83 mm for cadaver images and 1.97 mm for patient images. The average registration time for 60 trials was 4.4 seconds. The patient study demonstrated robustness of the proposed algorithm against moderate anatomical deformation.
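The batched similarity evaluation can be illustrated as below. Normalized cross-correlation is used here as a stand-in metric (the abstract does not specify one), and the vectorized NumPy pass is a CPU analogue of the GPU-parallel evaluation; the CT rendering step is assumed.

```python
# Minimal sketch: score a batch of candidate CT renderings against one
# endoscopic image with normalized cross-correlation, evaluated in a single
# vectorized pass. The metric choice and rendering step are assumptions.
import numpy as np

def ncc_batch(endo, renders):
    """endo: (H, W) grayscale image; renders: (N, H, W). Returns N scores."""
    a = endo - endo.mean()
    b = renders - renders.mean(axis=(1, 2), keepdims=True)
    num = (a * b).sum(axis=(1, 2))
    den = np.sqrt((a**2).sum() * (b**2).sum(axis=(1, 2)))
    return num / den

# best_pose = candidate_poses[np.argmax(ncc_batch(endo_img, rendered_imgs))]
```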
KEYWORDS: Video, Video surveillance, Unmanned aerial vehicles, Detection and tracking algorithms, Cameras, Signal to noise ratio, Data acquisition, Analytical research, Kinematics, Sensors
Persistent aerial video surveillance from small UAV (SUAV) platforms requires accurate and robust target tracking capabilities. However, video tracks can break due to excessive camera motion, target resolution, low signal-to-noise ratio, video frame dropout, and frame-to-frame registration errors. Connecting broken tracks (video track repair) is thus essential for maintaining high quality target tracks. In this paper we present an approach to track repair based on multi-hypothesis sequential probability ratio tests (MHSPRT) that is suitable for real-time video tracking applications. To reduce computational complexity, the approach uses a target dynamics model whose state estimation covariance matrix has an analytic eigendecomposition. Chi-square gating is used to form feasible track-to-track associations, and a set of local hypothesis tests is defined for associating new tracks with coasted tracks. Evidence is accumulated across video frames by propagating posterior probabilities associated with each track repair hypothesis in the MHSPRT framework. Global maximum likelihood and maximum a posteriori estimation techniques resolve conflicts between local track association hypotheses. The approach also supports fusion of appearance-based features to augment statistical distributions of the track state and enhance performance during periods of kinematic ambiguity. First, an overview of the video tracker technology is presented. Next, the track repair algorithm is described. Finally, numerical results are reported demonstrating performance on real video data acquired from an SUAV.
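The chi-square gating step named above has a standard form, sketched below. The state dimension, gate probability, and the use of the summed covariances are generic textbook choices, not details taken from this paper.

```python
# Minimal sketch of chi-square gating for track-to-track association: a new
# track and a coasted track form a feasible pair if the Mahalanobis distance
# between their state estimates falls below a chi-square threshold.
# Gate probability and state dimension are illustrative assumptions.
import numpy as np
from scipy.stats import chi2

def gate(x_new, P_new, x_coast, P_coast, prob=0.99):
    """x_*: state estimates; P_*: their covariances. Returns feasibility."""
    d = x_new - x_coast
    S = P_new + P_coast                 # combined innovation covariance
    m2 = d @ np.linalg.solve(S, d)      # squared Mahalanobis distance
    return m2 <= chi2.ppf(prob, df=len(d))
```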