The capabilities and sales volume of present-day UAVs (unmanned aerial vehicles) create a strong demand for counter-UAV systems in many applications to protect facilities or areas from misused or threatening drones. To reach maximum detection and information-gathering performance, such systems need to combine different detection subsystems, i.e. based on visual-optical, radar, and radio sensors. However, systems available on the market are very expensive, typically costing well over half a million dollars. Therefore, a far more cost-efficient solution has been developed, which is presented in this paper. Four high-resolution visual-optical cameras provide full 360-degree observation at distances of up to several hundred meters. As soon as UAVs become visible in an image as small dots, they are detected and tracked with a GPU-based point target detector. Radar and radio sensor subsystems detect UAVs at larger distances. A full-HD camera on a pan-and-tilt unit successively focuses on each found object, enabling a convolutional neural network (CNN) to classify it at a higher local image resolution in order to identify UAVs and discard false alarms, e.g. from birds. Furthermore, drone type and payload are also determined with CNNs, and a laser rangefinder on the pan-and-tilt unit measures the object distance. All information is collected and visualized in a 2D or 3D environmental map or situation representation on the basis of geo-coordinates computed from an RTK GNSS sensor self-localization. All software and hardware components are described in detail. The overall system is powerful, modular, scalable, and cost-efficient.
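The abstract above does not detail the GPU-based point target detector. As a rough illustration of the general idea behind point-target detection (detecting small bright dots against a locally estimated background), the following minimal sketch subtracts a local-mean background computed with an integral image and thresholds the residual. The function name and parameters are hypothetical, not taken from the paper:

```python
import numpy as np

def detect_point_targets(img, bg_size=9, thresh=30.0):
    """Toy point-target detector: subtract a local-mean background
    estimate and threshold the residual. Illustrative only; the
    paper's GPU-based detector is not specified in the abstract."""
    img = img.astype(np.float64)
    pad = bg_size // 2
    padded = np.pad(img, pad, mode='edge')
    # integral image (summed-area table) with a zero row/column prepended
    ii = np.pad(padded, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    h, w = img.shape
    # sum over the bg_size x bg_size window around each pixel
    s = (ii[bg_size:bg_size + h, bg_size:bg_size + w]
         - ii[:h, bg_size:bg_size + w]
         - ii[bg_size:bg_size + h, :w]
         + ii[:h, :w])
    bg = s / (bg_size * bg_size)
    residual = img - bg
    ys, xs = np.nonzero(residual > thresh)
    return list(zip(ys.tolist(), xs.tolist()))
```

A pixel whose intensity clearly exceeds the mean of its neighborhood is reported as a candidate point target; a real system would add temporal filtering and tracking on top of such candidates.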
Object detection is the basis for several computer vision applications and autonomous functionalities. The task has been studied extensively, and since the onset of deep learning, detection accuracy has increased significantly. Every year, several new models based on convolutional neural networks (CNNs) are developed and released. However, the development is driven by large research datasets, such as ImageNet and MS COCO, which aim to cover a large range of classes and contain very strong biases with respect to object size and position. Thus, existing models and design choices are biased towards such situations. More specialized domains, such as maritime vessel detection, can have very different requirements, and not all mainstream models are equally suited to this task. Specific challenges of maritime vessel detection in surface-to-surface view are the large variety of object sizes due to varying distances from the camera, but also the large range of vessel types, atmospheric effects, and strong overlap between objects. Furthermore, the lack of large training datasets in such specialized domains is a limitation that needs to be considered. Finally, the existing smaller datasets often contain strong biases themselves, as they were usually recorded in a single location with unique visual characteristics and vessel types that may be very distinct from those in other datasets. In this work, we analyze the performance of several of the latest state-of-the-art object detectors in the context of maritime vessel detection. We evaluate the detectors on the limited existing public datasets, including the specialized Singapore Maritime Dataset and the SeaShips dataset, but also on ship images included in general object detection datasets such as MS COCO. We specifically analyze how existing dataset biases impact the ability of the resulting detectors to generalize.
In addition, we create our own maritime vessel training data from online sources and investigate the impact of adding such data to the training process. Our evaluation results in a set of models that achieve strong vessel detection accuracy on all datasets. In summary, this work does not aim at methodological novelty but rather seeks to provide an empirical basis for the choice of object detector and the composition of training data for future work on maritime vessel detection.
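Cross-dataset evaluations like the one described above typically match predicted boxes to ground truth by intersection-over-union (IoU), e.g. at the 0.5 threshold used in PASCAL VOC-style evaluation. As a minimal, self-contained helper (an assumption about the matching criterion, since the abstract does not state the exact protocol):

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes given as
    (x1, y1, x2, y2) corner coordinates."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```

A prediction with IoU above the threshold against an unmatched ground-truth box counts as a true positive; everything else counts toward false positives or false negatives when aggregating per-dataset accuracy.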
Reliable vehicle detection and tracking in wide area motion imagery (WAMI), a novel class of imagery captured by airborne sensor arrays and characterized by large ground coverage and low frame rate, are the basis for higher-level image analysis tasks in wide area aerial surveillance. Possible applications include real-time traffic monitoring, driver behavior analysis, and anomaly detection. Most frameworks for detection and tracking in WAMI data rely on motion-based input detections generated by frame differencing or background subtraction. Subsequently employed tracking approaches aim at recovering missing motion detections to enable persistent tracking, i.e. continuous tracking also for vehicles that become stationary. Recently, a moving object detection method based on convolutional neural networks (CNNs) showed promising results on WAMI data. Therefore, in this work we analyze how CNN-based detection methods can improve persistent WAMI tracking compared to detection methods based on difference images. To find detections, we employ a network that takes consecutive frames as input and computes detection heatmaps as output. The high quality of the output heatmaps allows for detection localization by non-maximum suppression without further post-processing. For quantitative evaluation, we use several regions of interest defined on the publicly available, annotated WPAFB 2009 dataset. We employ the common metrics precision, recall, and f-score to evaluate detection performance, and additionally consider track identity switches and multiple object tracking accuracy to assess tracking performance. We first evaluate the moving object detection performance of our deep network in comparison to a previous analysis of difference-image-based detection methods. Subsequently, we apply a persistent multiple hypothesis tracker with WAMI-specific adaptations to the CNN-based motion detections, and evaluate the tracking results with respect to a persistent tracking ground truth.
We achieve significant improvements in both the motion-based input detections and the output tracking quality, demonstrating the potential of CNNs in the context of persistent WAMI tracking.
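The abstract states that detections are localized directly from the heatmaps by non-maximum suppression. One common way to realize this (a sketch under the assumption of simple local-maximum NMS, not the authors' exact code) is to keep each pixel that equals the maximum of its neighborhood and exceeds a score threshold:

```python
import numpy as np

def heatmap_nms(heat, thresh=0.5, k=3):
    """Extract detections as local maxima of a heatmap: a pixel is
    kept if it equals the maximum of its k x k neighborhood and its
    score exceeds `thresh`. Illustrative sketch only."""
    pad = k // 2
    padded = np.pad(heat, pad, mode='constant', constant_values=-np.inf)
    h, w = heat.shape
    # neighborhood maximum via shifted views of the padded heatmap
    nmax = np.full_like(heat, -np.inf)
    for dy in range(k):
        for dx in range(k):
            nmax = np.maximum(nmax, padded[dy:dy + h, dx:dx + w])
    ys, xs = np.nonzero((heat == nmax) & (heat > thresh))
    return list(zip(ys.tolist(), xs.tolist()))
```

Because the network output is already peaked at object centers, this single step yields point detections without any further post-processing such as blob extraction or morphological cleanup.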
Wide area motion imagery (WAMI) acquired by an airborne multi-camera sensor enables continuous monitoring of large urban areas. Each image can cover regions of several square kilometers and contain thousands of vehicles. Reliable vehicle tracking in this imagery is an important prerequisite for surveillance tasks, but remains challenging due to the low frame rate and small object size. Most WAMI tracking approaches rely on moving object detections generated by frame differencing or background subtraction. These detection methods fail when objects slow down or stop. Recent approaches for persistent tracking compensate for missing motion detections by combining a detection-based tracker with a second tracker based on appearance or local context. To avoid the additional complexity introduced by combining two trackers, we employ an alternative single-tracker framework that is based on multiple hypothesis tracking and recovers missing motion detections with a classifier-based detector. We integrate an appearance-based similarity measure, merge handling, vehicle-collision tests, and clutter handling to adapt the approach to the specific context of WAMI tracking. We apply the tracking framework to a region of interest of the publicly available WPAFB 2009 dataset for quantitative evaluation; a comparison to other persistent WAMI trackers demonstrates state-of-the-art performance of the proposed approach. Furthermore, we analyze in detail the impact of different object detection methods and detector settings on the quality of the output tracking results. For this purpose, we choose four different motion-based detection methods that vary in detection performance and computation time to generate the input detections.
As detector parameters can be adjusted to achieve different precision and recall performance, we combine each detection method with different detector settings that yield (1) high precision and low recall, (2) high recall and low precision, and (3) best f-score. Comparing the tracking performance achieved with all generated sets of input detections allows us to quantify the sensitivity of the tracker to different types of detector errors and to derive recommendations for detector and parameter choice.
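The detector operating points described above are compared via precision, recall, and f-score, which follow directly from the counts of matched and unmatched detections. A minimal helper for these standard metrics (the function name is illustrative):

```python
def detection_scores(tp, fp, fn):
    """Precision, recall, and f-score from detection match counts.
    A detection matched to a ground-truth object is a true positive
    (tp); an unmatched detection is a false positive (fp); an
    unmatched ground-truth object is a false negative (fn)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score
```

Sweeping a detector's confidence threshold trades tp against fp and fn, producing the high-precision, high-recall, and best-f-score operating points that the study feeds into the tracker.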