The current Internet is not well suited to the transmission of high-quality video such as MPEG-2 because video quality degrades severely during network congestion episodes. One possible solution is to combine layered video coding with the Differentiated Services (DiffServ) architecture: different video layers are mapped to different priority levels, and packets with different priorities receive different dropping treatment in the network. With layering and priority dropping, video quality is expected to degrade gracefully during congestion episodes. We consider various layering mechanisms defined in the MPEG-2 standard, namely temporal scalability, data partitioning (DP), and Signal-to-Noise Ratio (SNR) scalability. The main issue addressed in this paper is how layers should be created to maximize perceived video quality over a given range of network conditions. Key to our study is the use of real-life video sequences and a video quality measure consisting of a perceptual distortion metric based on the Human Visual System (HVS). Our results show that video quality is sensitive to how layering is accomplished, and that there is an optimum layering that maximizes quality for a given network condition. Our results also show that, compared with non-layered video, layering can achieve higher network loading for a given minimum quality target and graceful degradation over a wider range of network conditions. We also find that the wider the range of network conditions, the more layers are required to remain at the highest possible quality level for each network condition. In particular, we demonstrate that three or four layers achieve better results than two layers, whereas additional layers beyond four provide only marginal improvement. Therefore, from a practical point of view, three or four layers are sufficient to attain most of the benefits of layering.
We compare the various scalability techniques in terms of complexity and video quality. Temporal scalability, which restricts layering to frame boundaries, is the simplest to implement and introduces no overhead, but it performs poorly compared with data partitioning, which allows coefficients to be grouped into layers independently of the frames to which they belong. This shows that, contrary to customary belief, dropping data in B frames before dropping data in P or I frames is a poor layering technique. DP is much simpler to implement and introduces significantly lower overhead than SNR scalability. However, SNR scalability provides higher quality than DP when network conditions are particularly poor.
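The priority-dropping behavior described above can be illustrated with a minimal sketch. It is not the simulator used in this study. It assumes a hypothetical `Packet` record whose `layer` field gives the priority (0 is the base layer, which is dropped last) and a single bottleneck that can forward a fixed byte budget per interval; these names and the budget model are illustrative assumptions only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Packet:
    seq: int    # position in the original stream (hypothetical field)
    layer: int  # 0 = base layer (highest priority); larger values are dropped first
    size: int   # packet size in bytes


def priority_drop(packets: List[Packet], budget: int) -> List[Packet]:
    """Return the packets that survive a congested interval.

    Packets are served in priority order (lowest layer first, then by
    sequence number). Once a packet does not fit in the remaining byte
    budget, it and every later packet in priority order is dropped, so a
    higher layer is never forwarded at the expense of a lower one.
    """
    kept = []
    remaining = budget
    blocked = False
    for pkt in sorted(packets, key=lambda p: (p.layer, p.seq)):
        if blocked or pkt.size > remaining:
            blocked = True
            continue
        kept.append(pkt)
        remaining -= pkt.size
    # Restore stream order for the receiver.
    return sorted(kept, key=lambda p: p.seq)


if __name__ == "__main__":
    stream = [
        Packet(seq=0, layer=0, size=100),
        Packet(seq=1, layer=2, size=100),
        Packet(seq=2, layer=1, size=100),
        Packet(seq=3, layer=0, size=100),
    ]
    survivors = priority_drop(stream, budget=300)
    print([p.seq for p in survivors])
```

With this model, shrinking the budget removes the highest-numbered layers first, which is the mechanism expected to produce graceful degradation. How the coded data is assigned to layers (by frame type, by coefficient partition, or by SNR refinement) is exactly the design choice this paper evaluates and is outside the scope of the sketch.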