Knowledge Distillation (KD) aims to use a low-capacity model, called the student, to learn from a high-capacity one, termed the teacher, so that the student's performance can be improved. Previous KD methods typically train the student by minimizing a task-related loss and the KD loss simultaneously, relying on a loss-weight hyper-parameter to balance the two terms. In this work, we propose to first transfer the backbone knowledge from the teacher to the student, and then learn only the task head of the student network. This training decomposition avoids the loss weight, which can be hard to define, and allows our method to be applied to different datasets and tasks with strong stability. Importantly, the decomposition enables the core of our method, Stage-by-Stage Knowledge Distillation (SSKD), which facilitates progressive feature mimicking from teacher to student. Extensive experiments on CIFAR-100 and ImageNet show that SSKD significantly narrows the performance gap between student and teacher, outperforming state-of-the-art approaches. We also demonstrate the generalization ability of SSKD on object detection on the COCO dataset. On both tasks SSKD shows significant improvements.
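The stage-wise mimicking idea can be sketched with a toy linear model (a minimal illustration under assumed dimensions, not the paper's actual networks): each student stage is fit to reproduce the corresponding teacher stage's features before the next stage is trained, so no KD-vs-task loss weight is needed.

```python
import numpy as np

# Toy sketch of stage-by-stage feature mimicking. The teacher backbone is
# modeled as two frozen linear stages; all names and sizes are hypothetical.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 16))     # input batch
T1 = rng.normal(size=(16, 32))     # teacher stage-1 weights (frozen)
T2 = rng.normal(size=(32, 8))      # teacher stage-2 weights (frozen)

def mimic(inputs, targets):
    """Fit a linear student stage to the teacher's stage output
    (least-squares feature mimicking)."""
    W, *_ = np.linalg.lstsq(inputs, targets, rcond=None)
    return W

# Stage 1 is trained first and frozen; stage 2 then builds on its output.
S1 = mimic(X, X @ T1)
S2 = mimic(X @ S1, X @ T1 @ T2)

# With the backbone transferred, only a task head remains to be trained.
feat_err = np.linalg.norm(X @ S1 @ S2 - X @ T1 @ T2)
```

In this linear toy case the student recovers the teacher's features exactly; with real networks each stage would instead be trained by gradient descent on a feature-matching loss.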
Automatically recognizing people through visual surveillance is important for public security. Human gait identification aims to recognize a person automatically from walking video using computer vision and image processing techniques. As a promising biometric, it has attracted growing research interest. Current human gait identification methods fall into two categories: model-based methods and motion-based methods. This paper proposes a human gait identification method based on two-dimensional Principal Component Analysis and temporal-space analysis. Using background estimation and image subtraction, we obtain a binary image sequence from the surveillance video. By comparing each pair of adjacent images in the gait sequence, we obtain a sequence of binary difference images, each of which indicates how the body moves as the person walks. Temporal-space features are extracted from this sequence as follows: projecting one difference image onto the Y axis and the X axis yields two vectors; projecting every difference image in the sequence yields two matrices, which together characterize one walk. Two-Dimensional Principal Component Analysis (2DPCA) is then used to transform these two matrices into two vectors while preserving maximum separability. Finally, the similarity of two gait sequences is measured by the Euclidean distance between the two vectors. The performance of our method is illustrated on the CASIA Gait Database.
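The feature-extraction steps above can be sketched as follows (a minimal toy version: random binary frames stand in for real silhouettes, and a simple eigendecomposition of the column covariance stands in for full 2DPCA; all sizes are hypothetical).

```python
import numpy as np

rng = np.random.default_rng(1)

def compress(M, k=2):
    """2DPCA-style compression: project a matrix onto the top-k
    eigenvectors of its column covariance, then flatten to a vector."""
    D = M - M.mean(axis=0)
    _, vecs = np.linalg.eigh(D.T @ D)
    return (M @ vecs[:, -k:]).ravel()

def gait_features(frames):
    """Temporal-space features of a binary silhouette sequence (T, H, W)."""
    diffs = np.abs(np.diff(frames, axis=0))   # adjacent-frame differences
    proj_y = diffs.sum(axis=2)                # (T-1, H) row profiles
    proj_x = diffs.sum(axis=1)                # (T-1, W) column profiles
    return np.concatenate([compress(proj_y), compress(proj_x)])

# Hypothetical toy silhouettes; gaits are compared by Euclidean distance.
seq_a = (rng.random(size=(12, 32, 24)) > 0.7).astype(float)
seq_b = (rng.random(size=(12, 32, 24)) > 0.7).astype(float)
dist_same = np.linalg.norm(gait_features(seq_a) - gait_features(seq_a))
dist_diff = np.linalg.norm(gait_features(seq_a) - gait_features(seq_b))
```

Identical sequences yield zero distance while different sequences do not, which is the property the Euclidean-distance matcher relies on.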
Gait-based human identification is useful for automatic person recognition through visual surveillance and has attracted growing research interest. A key step in gait-based identification is extracting the human silhouette from an image sequence. Current silhouette extraction methods are mainly based on simple color subtraction and perform poorly when the color of some body parts is similar to the background. This paper proposes a cosegmentation-based human silhouette extraction method. Cosegmentation is typically defined as the task of jointly segmenting "something similar" in a given set of images. A human gait image sequence can be divided into several step cycles, each consisting of 10-15 frames. The frames in the sequence exhibit the following similarities: every frame is similar to the next or previous frame; every frame is similar to the corresponding frame in the next or previous step cycle; and every pixel can find similar pixels in other frames. The cosegmentation-based silhouette extraction proceeds as follows: initially, only points with high contrast to the background are used as foreground kernel points, and points in the background are used as background kernel points; then points similar to foreground points are added to the foreground set, and points similar to background points are added to the background set. The similarity definition takes the context of each point into account. Experimental results show that our method outperforms traditional human silhouette extraction methods.
Keywords: Human gait
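The seed-and-grow procedure described above can be sketched on a toy frame (a minimal illustration: intensity difference stands in for the paper's context-aware similarity, and all thresholds and values are hypothetical).

```python
import numpy as np
from collections import deque

# Toy frame: uniform background at 0.1, a bright "body" core, and one body
# part whose intensity is close to the background (missed by thresholding).
frame = np.full((10, 10), 0.1)
frame[3:7, 3:7] = 0.9      # high-contrast body core (seed region)
frame[7, 3:7] = 0.25       # low-contrast body part

background = 0.1
fg = np.abs(frame - background) > 0.5   # initial foreground kernel points

def grow(fg, frame, tol=0.7):
    """Add pixels similar to an already-accepted neighbour to the
    foreground set (simplified stand-in for cosegmentation growing)."""
    fg = fg.copy()
    q = deque(zip(*np.nonzero(fg)))
    while q:
        r, c = q.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < frame.shape[0] and 0 <= nc < frame.shape[1]
                    and not fg[nr, nc]
                    and abs(frame[nr, nc] - frame[r, c]) < tol
                    and abs(frame[nr, nc] - background) > 0.1):
                fg[nr, nc] = True
                q.append((nr, nc))
    return fg

mask = grow(fg, frame)
```

The low-contrast body part, invisible to the initial thresholding, is recovered because it is similar to its already-accepted neighbours, while true background pixels are not.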
To improve the retrieval accuracy of content-based video retrieval systems, researchers face the hard challenge of reducing the 'semantic gap' between the features such systems extract and the richness of human semantics. This paper presents a novel video retrieval system to bridge that gap. First, the video captions are segmented from the video and transformed into text. To extract semantic information from the video stream, we apply a text mining process, with a clustering algorithm at its core, to the caption text. In addition, users are asked to comment on the videos they download from the system once they have watched them, and we associate these comments with the corresponding videos. The same text mining process is applied to the comment texts. By combining a video's captions with its comments, we extract the video's semantic information more accurately. Finally, taking advantage of both the comments and the captions, we performed experiments on a set of videos and obtained promising results.
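The caption-plus-comment idea can be sketched as a tiny bag-of-words retriever (a minimal stand-in for the paper's text mining pipeline; the corpus, query, and similarity measure are all hypothetical).

```python
import math
from collections import Counter

# Hypothetical corpus: each video has caption text plus user comments.
videos = {
    "v1": ("goal scored in the final minute", "amazing football match"),
    "v2": ("stock market closes higher today", "good finance summary"),
}

def bag(caption, comments):
    """Merge caption and comment words into one term-frequency vector,
    so both sources contribute to the video's semantic description."""
    return Counter((caption + " " + comments).lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    return dot / (math.sqrt(sum(v * v for v in a.values())) *
                  math.sqrt(sum(v * v for v in b.values())))

index = {vid: bag(cap, com) for vid, (cap, com) in videos.items()}
query = bag("football final goal", "")
best = max(index, key=lambda vid: cosine(query, index[vid]))
```

Here the user comment supplies the word "football", which the caption alone lacks, illustrating why combining the two sources sharpens the semantic description.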
A novel content-based image retrieval data structure is developed in the present work that significantly improves search efficiency. All images are organized into a tree in which every node comprises images with similar features: images in a child node are more similar to one another (lower variance) than those in its parent, so every node is a cluster and each of its children is a sub-cluster. A node stores not only the number of images it contains but also their center and variance. As new images are added, the tree dynamically restructures itself to keep its total variance minimal. A heuristic method is then designed to retrieve information from this tree. Given a query image, the probability that a tree node contains similar images is computed from the node's center and variance. If the probability exceeds a threshold, the node is checked recursively to locate similar images, and likewise for any of its children whose probability also exceeds the threshold. If insufficient similar images are found, a reduced threshold is adopted and a new search is initiated from the root. The search terminates when it finds enough similar images or when the threshold becomes too low to be meaningful. Experiments show that the proposed dynamic cluster tree improves search efficiency notably.
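The node statistics and threshold-relaxing search can be sketched as follows (a minimal static version: a Gaussian-style score stands in for the paper's probability estimate, and the dynamic restructuring on insertion is omitted; all names and values are hypothetical).

```python
import numpy as np

class Node:
    """Cluster-tree node: image count, feature center, and variance."""
    def __init__(self, feats, children=()):
        self.n = len(feats)
        self.center = np.mean(feats, axis=0)
        self.var = float(np.var(feats)) + 1e-9
        self.feats = feats if not children else None  # leaves hold images
        self.children = list(children)

def score(node, q):
    """Likelihood-style score that the node contains images similar to q."""
    d2 = float(np.sum((q - node.center) ** 2))
    return float(np.exp(-d2 / (2 * node.var)))

def search(node, q, threshold, found=None):
    found = [] if found is None else found
    if score(node, q) >= threshold:          # promising cluster: descend
        if node.feats is not None:
            found.extend(node.feats)
        for child in node.children:
            search(child, q, threshold, found)
    return found

def retrieve(root, q, want=1, threshold=0.5, floor=1e-3):
    """Relax the threshold and restart from the root until enough hits."""
    while threshold >= floor:
        hits = search(root, q, threshold)
        if len(hits) >= want:
            return hits
        threshold /= 2
    return []

# Usage: two tight leaf clusters under one loose root cluster.
leaf_a = Node([np.array([0.0, 0.0]), np.array([0.1, 0.0])])
leaf_b = Node([np.array([5.0, 5.0]), np.array([5.1, 5.0])])
root = Node(leaf_a.feats + leaf_b.feats, children=(leaf_a, leaf_b))
hits = retrieve(root, np.array([0.05, 0.0]), want=2)
```

The query near the first cluster scores high on that leaf and near zero on the other, so only the matching sub-cluster is expanded; a query matching nothing at the initial threshold triggers a restart with the threshold halved.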