

An Improved Adaptive Quantization Method Based on Perceptual CU Early Splitting for HEVC

Guoqing Xiang1, Huizhu Jia*1, Mingyuan Yang2, Jie Liu1, Chuang Zhu1, Xiaodong Xie1
1 EECS, Peking University, Beijing 100871, China
2 Beijing BOYA-HUALU Technology Inc., Beijing 100080, China
Email: {gqxiang, hzjia, liuzimin, czhu, xdxie}@jdl.ac.cn; [email protected]

Abstract—High Efficiency Video Coding (HEVC) is the latest video coding standard, and it achieves significant compression performance with numerous coding tools. It supports a different quantization parameter for each coding unit in a coding tree unit. However, the adaptive quantization (AQ) tool in the current video standard suffers from underestimated spatial activity and a Lagrange multiplier selection problem. Moreover, no perceptual considerations are taken into account in the AQ model, so it cannot achieve reasonable perceptual performance. Therefore, in this paper, we first solve the problems in the current AQ method, and we then propose a novel perceptual AQ method (PAQ) based on an adaptive perceptual CU early splitting algorithm to further improve performance. Experiments demonstrate that, with SSIM as the metric, the PAQ method achieves a -9.3% BD-Rate gain compared with the HM10.0 encoder and outperforms existing methods.

I. INTRODUCTION

Nowadays the High Efficiency Video Coding (HEVC) standard [1] has become the mainstream video coding standard. It achieves significantly better compression performance than preceding coding standards [2]. HEVC employs a flexible quadtree coding unit (CU) partitioning structure where the size of each CU can vary from 8×8 to 64×64 pixels. Each coding tree unit (CTU) is recursively divided into CUs in quadtree fashion during the RDO process to obtain the final partition and mode decisions. The quantization operation is the main source of distortion, and this distortion is inevitably introduced into the final reconstructed video frame. Therefore, studying efficient adaptive quantization algorithms for video coding is very important for improving coding performance.

Since HEVC supports changing the quantization parameter (QP) at each CU level [3], rather than only at the macroblock level as in H.264/AVC [2], AQ algorithms can be applied to all CUs. One spatial AQ method from MPEG-2 TM5 [4] has been incorporated into the current HEVC reference software, such as HM10.0. This method scales the quantization step size according to the spatial activity of each CU relative to the average activity level of its frame. A perceptual SSIM-based adaptive quantization algorithm has been proposed in [5]. The authors view AQ as the problem of optimizing the SSIM of the reconstructed picture, and they update the QP offset of each CU by deriving the relationship between the SSIM-based Lagrange multiplier and the QP.

However, there are some important issues with these existing AQ methods. First, by our analysis, the current TM5 AQ method is too conservative when evaluating the spatial activity of each CU, and cannot sufficiently reflect the real spatial characteristics of each CU.

Second, when the TM5 AQ method is implemented in the HM software, all the Lagrange multipliers are kept the same as the slice-level one, which is not reasonable because the slice-level Lagrange multiplier can only reflect the characteristics of the entire frame and not the CU-level differences. Finally, from a perceptual point of view, even though the authors of [5] proposed an SSIM-based method, when the perceptual characteristics of the content within one large CTU differ to a certain degree, we should give more opportunity to split it into small CUs and encode them with different QPs. There are several early splitting methods for fast CU decision [6], but they only focus on larger time reductions with smaller PSNR losses, and none of them takes visual quality into consideration with subjective assessment, so these methods cannot be used for our purpose.

In order to solve these problems, we first fix the issues in the TM5 AQ method, and we further develop a perceptual CU early splitting algorithm based on a JND (just noticeable distortion) model. Finally, we present our perceptual AQ method (PAQ) to further improve perceptual coding performance. Our simulation results show that the PAQ method achieves more than a -9.3% BD-Rate gain.

The remainder of this paper is organized as follows. In Section II, the AQ method of MPEG-2 TM5 is first introduced and analyzed as implemented in the HM10.0 reference software, and its problems are then analyzed and solved. Section III first introduces a novel spatial JND model and then presents the perceptual CU early splitting algorithm built on it. With these two models, we propose the PAQ method to further improve subjective performance. Experiments and results analysis are described in Section IV. Finally, we draw conclusions in Section V.

II. RELATED WORK AND PROBLEM ANALYSIS

A spatial AQ method is presented in the MPEG-2 TM5 reference software, and it has been implemented in HM10.0 by taking the spatial activity at both the CU level and the frame level into consideration. For the kth CTU of size 2N×2N at depth d (d = 0, 1, 2, 3) within one frame, the variances of its four N×N sub-CUs, $\{\sigma^2_{d,k,i}\}_{i=0}^{3}$, are calculated first, and the minimum one is selected as the spatial activity of the whole CTU:

$SA_{d,k} = 1 + \min_{i \in \{0,1,2,3\}} \sigma^2_{d,k,i}$    (1)

Then the average activity of the entire frame at the dth depth is obtained by averaging the corresponding $SA_{d,k}$:

$A_d = \frac{1}{N_d} \sum_{k=0}^{N_d - 1} SA_{d,k}$    (2)

where $N_d$ is the total number of CUs at depth d in the frame. Finally, the QP offset for each CU is computed from the normalized CU-level and frame-level spatial activities as

$\Delta QP_{d,k} = 6 \log_2 \left( \frac{S \cdot SA_{d,k} + A_d}{SA_{d,k} + S \cdot A_d} \right)$    (3)

where $S = 2^{\Delta QP_{max}/6}$ is the strength parameter and $\Delta QP_{max}$ is the maximum allowable absolute difference from the slice QP, which can be adjusted by users.

We can see that the TM5 AQ method uses variance as the spatial characteristic and takes both the CU-level and the averaged frame-level influence into consideration to achieve adaptive quantization. However, by studying the procedure in HEVC, we find that the variance of each CTU may be heavily underestimated when only the minimum variance among the sub-CUs is used. This may be acceptable in MPEG-2 TM5 coders with their smaller coding units, but HEVC usually uses coding units larger than 16×16, so a single sub-block's variance cannot sufficiently represent the spatial characteristics of a 2N×2N CTU. What is more, when running HM10.0, we find that all CUs keep the same Lagrange multiplier as the slice-level value even though different QPs are applied to them. This violates the principle of AQ, namely treating CUs with different spatial activities differently, because the slice-level Lagrange multiplier only relates to the spatial characteristics of the entire frame. It is therefore hard to achieve good performance by directly implementing the method. An experiment in [5] demonstrated that there is no performance gain in exactly this setting, and the same fact is verified by our experiments later.

In order to solve these implementation problems of the MPEG-2 TM5 AQ method in the current HEVC reference software, we compute the spatial activity of each CTU from its own whole variance instead of using the minimum among its sub-block variances. Thus we update (1) as

$SA_{d,k} = 1 + \sigma^2_{d,k}$    (4)

where $\sigma^2_{d,k}$ denotes the variance of the kth CTU at depth d. The other steps remain the same as (2) and (3). From the QP offset and the original slice QP we obtain each updated QP. We then update the Lagrange multiplier with the relationship derived in the HEVC reference software:

$\lambda = c \cdot 2^{(QP - 12)/3}$    (5)

where c is a constant depending on the coding configuration. Note that when CUs get different QPs, they also get different Lagrange multipliers. For the RD cost comparison problem, we select the Lagrange multiplier of the immediately enclosing CTU to compute the RD cost when deciding whether a CTU should be split or not; a similar approach is described in [5]. As shown in Fig. 1, we define $X_0$ as a 2N×2N CTU and $X$ as the combination of its four sub-CUs, and we compute $\lambda_0$ for $X_0$ and $\lambda_{0,i}$ for each N×N sub-CU $X_{0,i}$. Finally, we use $\lambda_0$ for computing the RD costs of both $X_0$ and $X$, comparing (6) and (7) for the final decision:

$J_{X_0} = D(X_0) + \lambda_0 R(X_0)$    (6)

$J_X = \sum_i D(X_{0,i}) + \lambda_0 \sum_i R(X_{0,i})$    (7)
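For concreteness, the following Python sketch (our own illustration, not code from the HM software; all function and variable names, and the placeholder values for c and the frame average, are hypothetical) shows the activity measures of (1) and (4), the QP offset of (3), and the Lagrange multiplier update of (5):

```python
import numpy as np

def tm5_activity(ctu):
    """Spatial activity of (1): 1 + the minimum variance over the
    four N x N sub-CUs of a 2N x 2N CTU."""
    h, w = ctu.shape
    subs = [ctu[i:i + h // 2, j:j + w // 2]
            for i in (0, h // 2) for j in (0, w // 2)]
    return 1.0 + min(float(np.var(s)) for s in subs)

def proposed_activity(ctu):
    """Proposed activity of (4): 1 + the variance of the whole CTU."""
    return 1.0 + float(np.var(ctu))

def qp_offset(sa, a_d, delta_qp_max=6.0):
    """QP offset of (3) with strength S = 2^(dQPmax / 6); sa is the
    CTU activity and a_d the frame-average activity of (2)."""
    s = 2.0 ** (delta_qp_max / 6.0)
    return 6.0 * np.log2((s * sa + a_d) / (sa + s * a_d))

def cu_lambda(qp, c=0.57):
    """Lagrange multiplier of (5); c depends on the coding
    configuration (the value here is only a placeholder)."""
    return c * 2.0 ** ((qp - 12.0) / 3.0)

# Example for one 32x32 CTU; A_d would be averaged over the frame as in (2).
ctu = np.random.randint(0, 256, (32, 32)).astype(np.float64)
a_d = 900.0  # hypothetical frame-average activity
qp = int(np.clip(32 + round(qp_offset(proposed_activity(ctu), a_d)), 0, 51))
lam = cu_lambda(qp)
```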

[Fig. 1: a 2N×2N CTU $X_0$ with RD cost $J_{X_0}$, and the combination $X$ of its four N×N sub-CUs $X_{0,0}, X_{0,1}, X_{0,2}, X_{0,3}$ with RD cost $J_X$]

Fig. 1 Illustration of CTU and its sub-CUs combination mode
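A minimal sketch of this split decision, assuming the distortions and rates of the parent CTU and its four sub-CUs have already been measured; the point being illustrated is the use of the enclosing CTU's multiplier $\lambda_0$ on both sides of the comparison in (6) and (7):

```python
def should_split(d0, r0, sub_d, sub_r, lambda0):
    """Decide CTU splitting by comparing (6) and (7), using the
    enclosing CTU's multiplier lambda0 on both sides so that the
    two RD costs are directly comparable."""
    j_x0 = d0 + lambda0 * r0                  # (6): cost of keeping X0
    j_x = sum(sub_d) + lambda0 * sum(sub_r)   # (7): cost of the four sub-CUs
    return j_x < j_x0

# Hypothetical distortion/rate values for one CTU and its sub-CUs.
print(should_split(1200.0, 300.0, [250.0, 280.0, 240.0, 260.0],
                   [90.0, 95.0, 80.0, 85.0], 4.0))
```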

III. PERCEPTUAL ADAPTIVE QUANTIZATION METHOD

A. Just Noticeable Distortion (JND) Model

For any video frame, human eyes are the ultimate receivers, and it is well known that the visual resolution of the human visual system (HVS) is limited: it can only perceive distortion above a certain threshold. Just Noticeable Distortion (JND) techniques can estimate such thresholds well in the pixel domain [7, 9] and the subband domain [8], usually considering luminance adaptation, the spatial texture visual masking effect, the contrast sensitivity function, etc. In this paper, we adopt the improved pixel-domain JND estimation model of [9], which considers the effect of content regularity on visual masking. Research in cognitive science shows that the HVS adaptively extracts visual regularities from an input scene for content perception and understanding. The model in [9] has therefore demonstrated better performance:

$JND(x,y) = L_A(x,y) + V_M(x,y) - 0.3 \cdot \min\{L_A(x,y), V_M(x,y)\}$    (8)

where x and y denote the pixel position and $L_A$ is the luminance adaptation factor, calculated as

$L_A(x,y) = \begin{cases} 17 \times \left(1 - \sqrt{B(x,y)/127}\right), & \text{if } B(x,y) \le 127 \\ \frac{3}{128} \times (B(x,y) - 127) + 3, & \text{else} \end{cases}$    (9)

where $B(x,y)$ is the average luminance of a local region. Since image regions with high luminance contrast and weak regularity present strong visual masking, the term $V_M$ in (8), denoting the visual masking, is described as

$V_M(x,y) = L_c(x,y) \cdot \mathcal{N}(x,y) = \frac{1.84 \cdot L_c^{2.4}}{L_c^2 + 26^2} \cdot \frac{0.3 \cdot \mathcal{N}^{2.4}}{\mathcal{N}^2 + 1}$    (10)

where $L_c$ is the luminance contrast and $\mathcal{N}$ is the number of quantified orientation differences, which denotes the orientation complexity.
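The threshold of (8)-(10) can be sketched as follows (our own illustration of the formulas above; the per-pixel inputs B, L_c, and N are assumed to be computed by the methods of [9], which are not reproduced here):

```python
import numpy as np

def luminance_adaptation(bg):
    """L_A of (9), from the local average background luminance B(x, y)."""
    low = 17.0 * (1.0 - np.sqrt(bg / 127.0))
    high = 3.0 / 128.0 * (bg - 127.0) + 3.0
    return np.where(bg <= 127, low, high)

def visual_masking(lc, n):
    """V_M of (10), from luminance contrast L_c and the number of
    quantified orientation differences N."""
    return (1.84 * lc ** 2.4 / (lc ** 2 + 26.0 ** 2)) * \
           (0.3 * n ** 2.4 / (n ** 2 + 1.0))

def jnd(bg, lc, n):
    """JND of (8): L_A + V_M - 0.3 * min(L_A, V_M), element-wise."""
    la = luminance_adaptation(bg)
    vm = visual_masking(lc, n)
    return la + vm - 0.3 * np.minimum(la, vm)
```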

More details can be found in [9].

B. Perceptual CU Early Splitting Algorithm

Although we can apply a different QP and Lagrange multiplier to each CU as described above, the CU partitioning or splitting principle in the RDO procedure is still MSE based, which does not take HVS characteristics into consideration. From the perceptual point of view, HEVC supports large CU sizes such as 32×32 and 64×64. When the visual characteristics of the four sub-CUs of such a large CTU vary considerably, we should split it intentionally; that is, we should give more opportunity to apply different degrees of quantization to the sub-CUs. Even if the large CU size becomes the optimal one as the MSE-based RDO result, it makes all the sub-CUs quantized at the same level without considering their perceptual differences.

Considering that the JND variance of each CU reflects its perceptual homogeneity, a small JND variance means a relatively homogeneous distribution of visual characteristics, while a CU with relatively large inhomogeneous content fluctuations produces a large JND variance. Obviously, when the JND variances of the four sub-CUs within one CTU differ a lot, the perceptual characteristics within the CTU differ a lot for human eyes, and we should directly split it and enter the RDO procedure at the next depth.

Fig. 2 The whole flowchart of the proposed PAQ algorithm: for the dth-depth CTU, the spatial activity is calculated as (4), the QP offset as (2) and (3), and the Lagrange multiplier updated as (5); the CTU is split early when $D_p > P_T$

For a CTU at a certain depth d, we first calculate the JND values of all its pixels with (8), and then compute the JND variances $V_J$ of the four corresponding sub-CUs. We map every JND variance to a CU-level perceptual distortion factor as follows, where k and i are the indices of the CTU and its sub-CUs:

$P_F(d,k,i) = \log_2(V_J(d,k,i))$    (11)

The factor $P_F$ reflects each sub-CU's perceptual characteristics, and different $P_F$ values mean different visual distortion between the sub-CUs. We then compute the maximum difference of $P_F$ over the sub-CUs of the depth-d CTU as

$D_p(d,k) = \max_{m,n \in \{0,1,2,3\}} |P_F(d,k,m) - P_F(d,k,n)|$    (12)

where m and n are sub-CU indices. The larger $D_p$ is, the larger the perceptual difference between the sub-CUs, and hence the stronger the case for splitting and quantizing them at different levels. So when $D_p$ exceeds a certain perceptual difference degree of the CTU, we can claim that the perceptual differences among the sub-CUs are more noticeable to human eyes than the content of the CTU alone. We therefore define the threshold for early splitting of the kth CTU as

$P_T(d,k) = \omega(d) P_F(d,k)$    (13)

where $\omega(d)$ is the perceptual difference degree that can be perceived between sub-CUs and $P_F(d,k)$ denotes the CTU-level perceptual distortion. Since the perceptual characteristics within a 16×16 block do not change much and 8×8 is the minimum CU size in HEVC, we only apply this method at depths 0 and 1, with sizes 64×64 and 32×32. Based on extensive experiments, we set $\omega$ to 0.75 and 0.6 for these two depths, respectively, which gives good performance. Therefore, we realize the PAQ method by integrating the perceptual early splitting method with the improved spatial AQ method, as illustrated in Fig. 2.
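The early-splitting test of (11)-(13) then reduces to a few lines. This sketch assumes a per-pixel JND map for the CTU (from the model in Section III-A) and that the CTU-level factor $P_F(d,k)$ in (13) is the log2 JND variance of the whole CTU, by analogy with (11); the names are hypothetical.

```python
import numpy as np
from itertools import combinations

OMEGA = {0: 0.75, 1: 0.6}   # w(d) for depths 0 (64x64) and 1 (32x32) only

def perceptual_early_split(jnd_map, depth, eps=1e-12):
    """Return True when the CTU should be split early: the maximum
    pairwise difference D_p of the sub-CU factors P_F (11)-(12)
    exceeds the CTU-level threshold P_T of (13)."""
    if depth not in OMEGA:
        return False
    h, w = jnd_map.shape
    subs = [jnd_map[i:i + h // 2, j:j + w // 2]
            for i in (0, h // 2) for j in (0, w // 2)]
    p_f = [np.log2(np.var(s) + eps) for s in subs]            # (11)
    d_p = max(abs(a - b) for a, b in combinations(p_f, 2))    # (12)
    p_t = OMEGA[depth] * np.log2(np.var(jnd_map) + eps)       # (13)
    return d_p > p_t
```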

IV. EXPERIMENTS AND RESULTS ANALYSIS

We implement the proposed PAQ method on the HM10.0 reference software under the common test conditions [10] and measure the SSIM-based BD-Rate of each method against the HM10.0 anchor. The results are summarized in Table I.

TABLE I PERFORMANCE COMPARISON WITH DIFFERENT AQ METHODS (SSIM-BASED BD-RATE)

Resolution  Sequence          TM5 [4]   YEO [5]   Proposed
ClassA      Traffic           -2.6%     -9.6%     -12.1%
            PeopleOnStreet     1.1%     -7.0%      -8.1%
            NebutaFestival     1.0%      0.2%      -0.6%
            SteamLocomotive    3.0%      0.7%     -13.6%
ClassB      BasketballDrive    2.1%     -8.3%      -7.4%
            Cactus             1.3%      0.0%      -0.9%
            ParkScene          0.2%     -5.4%      -8.4%
            BQTerrace          1.0%     -7.2%     -10.0%
            Kimono             3.3%      2.3%       1.9%
ClassC      BasketballDrill   -0.2%    -18.1%     -22.0%
            BQMall             2.7%     -4.1%      -3.1%
            PartyScene         1.5%     -4.0%      -8.9%
            RaceHorses         2.0%     -2.2%      -1.9%
ClassD      BasketballPass    -0.7%    -14.7%     -16.6%
            BlowingBubbles     1.7%     -4.0%      -6.3%
            BQSquare          -2.8%    -21.9%     -33.1%
            RaceHorses         0.0%     -8.4%      -9.4%
Average                        0.9%     -6.6%      -9.3%

Fig. 3 illustrates the SSIM-Rate performance of the PAQ method for the sequences BasketballPass_416x240 and BasketballDrill_832x480 as representatives, while the full results are listed in Table I. As demonstrated before, the original TM5 AQ method cannot achieve any gain. With our perceptual CU early splitting, the proposed PAQ method achieves a -9.3% average gain and outperforms the other methods. Observing the results, we can see that most sequences achieve better subjective quality with the PAQ method, except the sequence Kimono: because most regions of a Kimono frame are complex textures, the CU-level spatial variances cannot be distinguished from each other well.

Fig. 3 SSIM vs. Rate comparison for the anchor and the PAQ method: (a) BasketballPass, (b) BasketballDrill

In the proposed perceptual adaptive quantization scheme, the calculation of the JND model has little influence on the coding complexity. What is more, because some CUs are skipped in the RDO recursion, the final encoding complexity is reduced. We measured the encoding runtime of the anchor and the PAQ method, and found that our method saves more than 9.16% of the encoding time on average thanks to the perceptual CU early splitting method. Finally, the subjective quality for sequence BQSquare_416x240 is shown in Fig. 4. Our method delivers significantly better visual quality at almost the same bitrate, which demonstrates that it protects perceptually sensitive regions well.

Fig. 4 Subjective quality comparison for sequence BQSquare: (a) Anchor: bitrate 98.79 kbit/s, SSIM 0.8457; (b) PAQ: bitrate 99.00 kbit/s, SSIM 0.8561

V. CONCLUSION

In this paper, we have proposed an improved adaptive quantization method based on a perceptual CU early splitting operation. We first analyzed the problems of the MPEG-2 TM5 AQ method in the HM10.0 reference software and the reasons for its performance loss, and then solved these problems properly. We then presented a novel perceptual CU early splitting method and integrated it into our PAQ method to improve the adaptive quantization performance. Experiments show that with the PAQ method we achieve a -9.3% BD-Rate gain with better subjective quality, while more than 9.16% of the encoding time is saved. In the future, we will take both spatial and temporal characteristics into consideration to further improve performance.

REFERENCES

[1] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, "Overview of the High Efficiency Video Coding (HEVC) standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1649-1668, Dec. 2012.
[2] J.-R. Ohm, G. J. Sullivan, H. Schwarz, T. K. Tan, and T. Wiegand, "Comparison of the coding efficiency of video coding standards including High Efficiency Video Coding (HEVC)," IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1669-1684, Dec. 2012.
[3] B. Bross, W.-J. Han, J.-R. Ohm, G. J. Sullivan, and T. Wiegand, "WD9: Working Draft 9 of High-Efficiency Video Coding," JCTVC-K1003, Shanghai, China, Oct. 2012.
[4] ISO/IEC JTC1/SC29/WG11, "MPEG-2 Test Model 5, Chapter 10: Rate control and quantization control," Mar. 1993.
[5] C. Yeo, H. L. Tan, and Y. H. Tan, "SSIM-based adaptive quantization in HEVC," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, pp. 1690-1694.
[6] J. Liu, H. Jia, G. Xiang, X. Huang, B. Cai, C. Zhu, and D. Xie, "An adaptive inter CU depth decision algorithm for HEVC," in Proc. Visual Communications and Image Processing (VCIP), Singapore, Dec. 13-16, 2015.
[7] X. K. Yang, W. S. Lin, Z. K. Lu, E. P. Ong, and S. S. Yao, "Just noticeable distortion model and its applications in video coding," Signal Processing: Image Communication, vol. 20, no. 7, pp. 662-680, 2005.
[8] Z. Wei and K. Ngan, "Spatio-temporal just noticeable distortion profile for grey scale image/video in DCT domain," IEEE Transactions on Circuits and Systems for Video Technology, vol. 19, no. 3, pp. 337-346, Mar. 2009.
[9] J. Wu, G. Shi, W. Lin, and C.-C. J. Kuo, "Enhanced just noticeable difference model with visual regularity consideration," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, 2016, pp. 1581-1585.
[10] F. Bossen, "Common test conditions and software reference configurations," JCTVC-J1100, Stockholm, Sweden, Jul. 2012.
