Video Codec Acceleration

Video codecs can be accelerated in two main ways. The first is hardware acceleration, which offers very low latency and low power consumption and can achieve real-time encoding. Its downsides are higher cost and less flexibility.

The second is software-based acceleration, which is more flexible. Software approaches include parallelization and statistics-based or machine-learning-based acceleration. Parallelizing encoding algorithms to make full use of multi-core processors or SIMD units is an effective way to speed up video encoding. Statistics-based and machine-learning-based acceleration are also becoming more popular. The following sections present related work on software-based acceleration.

Parallelism

Parallelism can be divided into task-level and data-level parallelism. Task-level parallelism assigns different functions to different computing units. Because the functions differ in complexity, distributing the tasks evenly across the cores of a multi-core processor is a challenge [1].
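Task-level parallelism can be pictured as a pipeline in which each function runs on its own worker. The following is a minimal sketch, assuming Python threads connected by FIFO queues; `run_pipeline` and the two stage functions are hypothetical names chosen for illustration and do not come from any encoder.

```python
import queue
import threading


def run_pipeline(items, stages):
    """Run each stage function on its own thread, connected by FIFO queues."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    sentinel = object()  # marks the end of the input stream

    def worker(stage, inbox, outbox):
        while True:
            item = inbox.get()
            if item is sentinel:
                outbox.put(sentinel)  # pass the shutdown signal downstream
                return
            outbox.put(stage(item))

    threads = [
        threading.Thread(target=worker, args=(stage, queues[i], queues[i + 1]))
        for i, stage in enumerate(stages)
    ]
    for t in threads:
        t.start()

    for item in items:
        queues[0].put(item)
    queues[0].put(sentinel)

    results = []
    while True:
        out = queues[-1].get()
        if out is sentinel:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


def predict(sample):
    """Hypothetical stand-in for a prediction stage: subtract a fixed predictor."""
    return sample - 128


def quantize(residual):
    """Hypothetical stand-in for a quantization stage: integer division by a step."""
    return residual // 4


if __name__ == "__main__":
    print(run_pipeline([128, 136, 200], [predict, quantize]))  # [0, 2, 18]
```

The throughput of such a pipeline is bounded by its slowest stage, which is why unequal function complexity makes load balancing difficult.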

In data-level parallelism, the data is processed on many units running the same program. It can be further divided into several levels: Group of Pictures (GOP), frame, tile, block, and instruction level. Of these, GOP-level parallelism offers more flexibility and better preserves compression efficiency, and several studies have already used it to improve encoding speed [2]. Its downside is high memory consumption. Another strategy is tile-level parallelism, which divides each frame into several tiles and encodes them in parallel. Block-level parallelism is also possible but rarely used, because communication and synchronization between blocks consume too much time. Finally, the pixels within a block can be processed in parallel at the instruction level using SIMD. SIMD is among the most effective acceleration techniques and is supported by most modern processors; it has been used for HEVC decoding in [3].
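Tile-level parallelism can be illustrated with a short sketch. The code below assumes a frame stored as a 2-D list of pixel values and uses a hypothetical `encode_tile` function (it only sums the pixels) as a stand-in for a real tile encoder; all function names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tiles(frame, tile_rows, tile_cols):
    """Split a 2-D list of pixels into a row-major list of rectangular tiles."""
    height, width = len(frame), len(frame[0])
    tiles = []
    for r in range(tile_rows):
        y0, y1 = r * height // tile_rows, (r + 1) * height // tile_rows
        for c in range(tile_cols):
            x0, x1 = c * width // tile_cols, (c + 1) * width // tile_cols
            tiles.append([row[x0:x1] for row in frame[y0:y1]])
    return tiles


def encode_tile(tile):
    """Hypothetical stand-in for a tile encoder: returns the tile's pixel sum."""
    return sum(sum(row) for row in tile)


def encode_frame_in_tiles(frame, tile_rows=2, tile_cols=2, workers=4):
    """Process all tiles of one frame concurrently; results keep tile order."""
    tiles = split_into_tiles(frame, tile_rows, tile_cols)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(encode_tile, tiles))


if __name__ == "__main__":
    frame = [[x + 8 * y for x in range(8)] for y in range(4)]
    print(encode_frame_in_tiles(frame))  # [44, 76, 172, 204]
```

Because the tiles share no state, they need no synchronization until their results are collected. In CPython, threads do not run pure-Python code on several cores at once, so this sketch shows the structure rather than a real speed-up; a production encoder would use native threads.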

AV1 supports both tile-level parallelism (multi-threading) and instruction-level parallelism (SIMD). Because most operations are pixel-wise and can therefore be accelerated with SIMD, most coding functions in AV1 have SIMD support.

Statistics-Based Approaches

Statistical analysis is another common strategy for speeding up encoding. It uses intermediate encoding data, collected on-line or off-line, to decide whether some encoding steps can be skipped. Analyzing these statistics may reveal indicators for terminating part of the encoding process early, so that unnecessary steps are avoided. Designing such algorithms requires a profound understanding of the encoding process. For example, [4] uses the relation between R-D costs and the variances of pixel motion vectors to skip specific inter CUs early in HEVC.
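The general idea of an early-termination test can be sketched as follows. This is an illustration only, not the method of [4]: the function names and both thresholds are hypothetical, and in practice the thresholds would be derived from statistics gathered during or before encoding.

```python
from statistics import pvariance


def motion_vector_variance(motion_vectors):
    """Sum of the population variances of the x and y motion components."""
    xs = [mv[0] for mv in motion_vectors]
    ys = [mv[1] for mv in motion_vectors]
    return pvariance(xs) + pvariance(ys)


def should_skip_split(rd_cost, motion_vectors,
                      cost_threshold=1000.0, var_threshold=0.5):
    """Return True when the statistics suggest that splitting is unlikely to help.

    A block is skipped only if its rate-distortion cost is already low AND its
    motion is nearly uniform. Both thresholds are hypothetical placeholders.
    """
    return (rd_cost < cost_threshold
            and motion_vector_variance(motion_vectors) < var_threshold)


if __name__ == "__main__":
    uniform_motion = [(2, 1), (2, 1), (2, 1), (2, 1)]
    diverging_motion = [(0, 0), (4, 0), (0, 4), (4, 4)]
    print(should_skip_split(800.0, uniform_motion))    # True
    print(should_skip_split(800.0, diverging_motion))  # False
```

A check like this costs a few arithmetic operations, whereas evaluating every split candidate requires a full rate-distortion search, which is where the time saving comes from.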

Machine Learning Based Approaches

Machine learning has a long history in video coding. Like statistical analysis, it can take intermediate encoding data as input. The major difference is that the relation between the input information and the output decisions is learned by training on large amounts of data.

[5] uses data mining to build decision trees that decide the best coding tree structure for HEVC. The results show an average complexity reduction of 65% with a BD-BR increase of only 1.36% and a minor BD-PSNR loss compared to the HEVC reference software (HM). In [6], an SVM is used for both CU and PU splitting decisions, with selected features including the sum of absolute differences (SAD) between blocks, the depth of the current block, and the quantization parameter. The method achieves up to 68.3% time saving with a BD-PSNR loss of only 0.093 dB and a BD-BR increase of 4.191%.
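To make the idea of a learned split decision concrete, the sketch below computes the SAD between two blocks and trains a one-level decision tree (a decision stump) on feature tuples of the form (SAD, depth, QP). It is a deliberately simplified stand-in for the decision trees of [5] and the SVM of [6]; the training data is invented for illustration and all function names are hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def train_stump(samples, labels):
    """Fit a one-level decision tree by exhaustive search.

    Returns (feature_index, threshold) with the fewest training errors.
    The stump predicts 1 (split) when sample[feature_index] > threshold.
    """
    best = None
    for feature in range(len(samples[0])):
        for threshold in sorted({s[feature] for s in samples}):
            errors = sum((s[feature] > threshold) != bool(y)
                         for s, y in zip(samples, labels))
            if best is None or errors < best[0]:
                best = (errors, feature, threshold)
    return best[1], best[2]


def predict_split(stump, sample):
    """Return 1 if the stump predicts that the block should be split, else 0."""
    feature, threshold = stump
    return int(sample[feature] > threshold)


if __name__ == "__main__":
    # Invented toy data: (SAD, depth, QP) -> 1 if the block was split.
    samples = [(120, 0, 32), (900, 0, 32), (80, 1, 27),
               (1500, 1, 27), (60, 2, 37), (1100, 0, 22)]
    labels = [0, 1, 0, 1, 0, 1]
    stump = train_stump(samples, labels)
    print(stump)                                 # (0, 120)
    print(predict_split(stump, (1000, 1, 30)))   # 1
    print(sad([[1, 2], [3, 4]], [[1, 0], [5, 4]]))  # 4
```

At encoding time, only the cheap prediction step runs in place of an exhaustive search over partitions; the training cost is paid off-line.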

In recent years, deep learning has attracted growing attention because of its impressive performance in many fields, and many research groups have started to apply it to video coding. Some of this work targets encoding acceleration and is discussed below.

[7] achieves an average 65% reduction in HEVC encoding time in inter mode by using a CNN and Long Short-Term Memory (LSTM).

[8] uses a CNN to predict quantization without computing rate and distortion, but does not report the time cost.

[9] tests three CNN models for partition classification, with heterogeneous texture characteristics as input, for intra coding in HEVC. Encoding time is reduced by 62.13% at the cost of a small BD-rate increase of 2.01%.

[10] uses DenseNet for the in-loop filter in HEVC and achieves an average BD-BR saving of 11.62% (a BD-BR of −11.62%) and a BD-PSNR gain of 0.39 dB.

[11] and [12] review video coding using deep learning.

The strength of deep learning models is that they can cover more situations as more training data becomes available, whereas it is almost impossible for a hand-designed algorithm to account for every situation.

[1]	Yang, J. Tham, S. Rahardja and D. Wu, “Real-time H.264 encoder implementation on a low-power digital signal processor,” 2009 IEEE International Conference on Multimedia and Expo, New York, NY, 2009, pp. 1150-1153.

[2]	Sreeramula, Dr. Sankaraiah & Lam, H.S. & Eswaran, C. & Abdullah, Junaidi. (2011). GOP level parallelism on H.264 video encoder for multicore architecture. International Conference on Circuits, System and Simulation IPCIST. 7. 127-132.

[3]	Chi, M. Alvarez-Mesa, B. Bross, B. Juurlink and T. Schierl, “SIMD Acceleration for HEVC Decoding,” in IEEE Transactions on Circuits and Systems for Video Technology, vol. 25, no. 5, pp. 841-855, May 2015.

[4]	Xiong, H. Li, Q. Wu and F. Meng, “A Fast HEVC Inter CU Selection Method Based on Pyramid Motion Divergence,” in IEEE Transactions on Multimedia, vol. 16, no. 2, pp. 559-564, Feb. 2014.

[5]	Correa, P. A. Assuncao, L. V. Agostini and L. A. da Silva Cruz, “Fast HEVC Encoding Decisions Using Data Mining,” in IEEE Transactions on Circuits and Systems for Video Technology, vol. 25, no. 4, pp. 660-673, April 2015.

[6]	Zhu, Y. Zhang, Z. Pan, R. Wang, S. Kwong and Z. Peng, “Binary and Multi-Class Learning Based Low Complexity Optimization for HEVC Encoding,” in IEEE Transactions on Broadcasting, vol. 63, no. 3, pp. 547-561, Sept. 2017.

[7]	Xu, T. Li, Z. Wang, X. Deng, R. Yang and Z. Guan, “Reducing Complexity of HEVC: A Deep Learning Approach,” in IEEE Transactions on Image Processing, vol. 27, no. 10, pp. 5044-5059, Oct. 2018.

[8]	Nguyen Canh, Thuong & Xu, Motong & Jeon, Byeungwoo. (2018). Rate-Distortion Optimized Quantization: A Deep Learning Approach.

[9]	Zhang, G. Wang, R. Tian, M. Xu and C. C. J. Kuo, “Texture-Classification Accelerated CNN Scheme for Fast Intra CU Partition in HEVC,” 2019 Data Compression Conference (DCC), Snowbird, UT, USA, 2019, pp. 241-249.

[10]	Li, M. Xu, R. Yang and X. Tao, “A DenseNet Based Approach for Multi-frame In-loop Filter in HEVC,” 2019 Data Compression Conference (DCC), Snowbird, UT, USA, 2019, pp. 270-279.

[11]	Xu, T. Li, Z. Wang, X. Deng, R. Yang and Z. Guan, “Reducing Complexity of HEVC: A Deep Learning Approach,” in IEEE Transactions on Image Processing, vol. 27, no. 10, pp. 5044-5059, Oct. 2018.

[12]	Liu, Dong & Li, Yue & Lin, Jianping & Li, Houqiang. (2019). Deep Learning-Based Video Coding: A Review and A Case Study.

[13]	Zhang, Yun & Kwong, Sam & Wang, Shiqi. (2019). Machine Learning based Video Coding Optimizations: A Survey. Information Sciences. 10.1016/j.ins.2019.07.096.