Directory: ../../../ffmpeg/
File: src/libavcodec/vp9dsp_template.c
Date: 2025-08-04 00:43:16
            Exec  Total  Coverage
Lines:      1374   1406     97.7%
Functions:   274    308     89.0%
Branches:    326    337     96.7%

Line Branch Exec Source
1 /*
2 * VP9 compatible video decoder
3 *
4 * Copyright (C) 2013 Ronald S. Bultje <rsbultje gmail com>
5 * Copyright (C) 2013 Clément Bœsch <u pkh me>
6 *
7 * This file is part of FFmpeg.
8 *
9 * FFmpeg is free software; you can redistribute it and/or
10 * modify it under the terms of the GNU Lesser General Public
11 * License as published by the Free Software Foundation; either
12 * version 2.1 of the License, or (at your option) any later version.
13 *
14 * FFmpeg is distributed in the hope that it will be useful,
15 * but WITHOUT ANY WARRANTY; without even the implied warranty of
16 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
17 * Lesser General Public License for more details.
18 *
19 * You should have received a copy of the GNU Lesser General Public
20 * License along with FFmpeg; if not, write to the Free Software
21 * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
22 */
23
24 #include "libavutil/common.h"
25 #include "bit_depth_template.c"
26 #include "vp9dsp.h"
27
28 #if BIT_DEPTH != 12
29
30 // FIXME see whether we can merge parts of this (perhaps at least 4x4 and 8x8)
31 // back with h264pred.[ch]
32
33 127788 static void vert_4x4_c(uint8_t *restrict _dst, ptrdiff_t stride,
34 const uint8_t *left, const uint8_t *_top)
35 {
36 127788 pixel *dst = (pixel *) _dst;
37 127788 const pixel *top = (const pixel *) _top;
38 127788 pixel4 p4 = AV_RN4PA(top);
39
40 127788 stride /= sizeof(pixel);
41 127788 AV_WN4PA(dst + stride * 0, p4);
42 127788 AV_WN4PA(dst + stride * 1, p4);
43 127788 AV_WN4PA(dst + stride * 2, p4);
44 127788 AV_WN4PA(dst + stride * 3, p4);
45 127788 }
46
47 27099 static void vert_8x8_c(uint8_t *restrict _dst, ptrdiff_t stride,
48 const uint8_t *left, const uint8_t *_top)
49 {
50 27099 pixel *dst = (pixel *) _dst;
51 27099 const pixel *top = (const pixel *) _top;
52 #if BIT_DEPTH == 8
53 24056 uint64_t p8 = AV_RN64A(top);
54 #else
55 3043 pixel4 p4a = AV_RN4PA(top + 0);
56 3043 pixel4 p4b = AV_RN4PA(top + 4);
57 #endif
58 int y;
59
60 27099 stride /= sizeof(pixel);
61
2/2
✓ Branch 0 taken 216792 times.
✓ Branch 1 taken 27099 times.
243891 for (y = 0; y < 8; y++) {
62 #if BIT_DEPTH == 8
63 192448 AV_WN64A(dst, p8);
64 #else
65 24344 AV_WN4PA(dst + 0, p4a);
66 24344 AV_WN4PA(dst + 4, p4b);
67 #endif
68 216792 dst += stride;
69 }
70 27099 }
71
72 6390 static void vert_16x16_c(uint8_t *restrict _dst, ptrdiff_t stride,
73 const uint8_t *left, const uint8_t *_top)
74 {
75 6390 pixel *dst = (pixel *) _dst;
76 6390 const pixel *top = (const pixel *) _top;
77 #if BIT_DEPTH == 8
78 5605 uint64_t p8a = AV_RN64A(top);
79 5605 uint64_t p8b = AV_RN64A(top + 8);
80 #else
81 785 pixel4 p4a = AV_RN4PA(top + 0);
82 785 pixel4 p4b = AV_RN4PA(top + 4);
83 785 pixel4 p4c = AV_RN4PA(top + 8);
84 785 pixel4 p4d = AV_RN4PA(top + 12);
85 #endif
86 int y;
87
88 6390 stride /= sizeof(pixel);
89
2/2
✓ Branch 0 taken 102240 times.
✓ Branch 1 taken 6390 times.
108630 for (y = 0; y < 16; y++) {
90 #if BIT_DEPTH == 8
91 89680 AV_WN64A(dst + 0, p8a);
92 89680 AV_WN64A(dst + 8, p8b);
93 #else
94 12560 AV_WN4PA(dst + 0, p4a);
95 12560 AV_WN4PA(dst + 4, p4b);
96 12560 AV_WN4PA(dst + 8, p4c);
97 12560 AV_WN4PA(dst + 12, p4d);
98 #endif
99 102240 dst += stride;
100 }
101 6390 }
102
103 1042 static void vert_32x32_c(uint8_t *restrict _dst, ptrdiff_t stride,
104 const uint8_t *left, const uint8_t *_top)
105 {
106 1042 pixel *dst = (pixel *) _dst;
107 1042 const pixel *top = (const pixel *) _top;
108 #if BIT_DEPTH == 8
109 778 uint64_t p8a = AV_RN64A(top);
110 778 uint64_t p8b = AV_RN64A(top + 8);
111 778 uint64_t p8c = AV_RN64A(top + 16);
112 778 uint64_t p8d = AV_RN64A(top + 24);
113 #else
114 264 pixel4 p4a = AV_RN4PA(top + 0);
115 264 pixel4 p4b = AV_RN4PA(top + 4);
116 264 pixel4 p4c = AV_RN4PA(top + 8);
117 264 pixel4 p4d = AV_RN4PA(top + 12);
118 264 pixel4 p4e = AV_RN4PA(top + 16);
119 264 pixel4 p4f = AV_RN4PA(top + 20);
120 264 pixel4 p4g = AV_RN4PA(top + 24);
121 264 pixel4 p4h = AV_RN4PA(top + 28);
122 #endif
123 int y;
124
125 1042 stride /= sizeof(pixel);
126
2/2
✓ Branch 0 taken 33344 times.
✓ Branch 1 taken 1042 times.
34386 for (y = 0; y < 32; y++) {
127 #if BIT_DEPTH == 8
128 24896 AV_WN64A(dst + 0, p8a);
129 24896 AV_WN64A(dst + 8, p8b);
130 24896 AV_WN64A(dst + 16, p8c);
131 24896 AV_WN64A(dst + 24, p8d);
132 #else
133 8448 AV_WN4PA(dst + 0, p4a);
134 8448 AV_WN4PA(dst + 4, p4b);
135 8448 AV_WN4PA(dst + 8, p4c);
136 8448 AV_WN4PA(dst + 12, p4d);
137 8448 AV_WN4PA(dst + 16, p4e);
138 8448 AV_WN4PA(dst + 20, p4f);
139 8448 AV_WN4PA(dst + 24, p4g);
140 8448 AV_WN4PA(dst + 28, p4h);
141 #endif
142 33344 dst += stride;
143 }
144 1042 }
145
146 278319 static void hor_4x4_c(uint8_t *_dst, ptrdiff_t stride,
147 const uint8_t *_left, const uint8_t *top)
148 {
149 278319 pixel *dst = (pixel *) _dst;
150 278319 const pixel *left = (const pixel *) _left;
151
152 278319 stride /= sizeof(pixel);
153 278319 AV_WN4PA(dst + stride * 0, PIXEL_SPLAT_X4(left[3]));
154 278319 AV_WN4PA(dst + stride * 1, PIXEL_SPLAT_X4(left[2]));
155 278319 AV_WN4PA(dst + stride * 2, PIXEL_SPLAT_X4(left[1]));
156 278319 AV_WN4PA(dst + stride * 3, PIXEL_SPLAT_X4(left[0]));
157 278319 }
158
159 74390 static void hor_8x8_c(uint8_t *_dst, ptrdiff_t stride,
160 const uint8_t *_left, const uint8_t *top)
161 {
162 74390 pixel *dst = (pixel *) _dst;
163 74390 const pixel *left = (const pixel *) _left;
164 int y;
165
166 74390 stride /= sizeof(pixel);
167
2/2
✓ Branch 0 taken 595120 times.
✓ Branch 1 taken 74390 times.
669510 for (y = 0; y < 8; y++) {
168 595120 pixel4 p4 = PIXEL_SPLAT_X4(left[7 - y]);
169
170 595120 AV_WN4PA(dst + 0, p4);
171 595120 AV_WN4PA(dst + 4, p4);
172 595120 dst += stride;
173 }
174 74390 }
175
176 13695 static void hor_16x16_c(uint8_t *_dst, ptrdiff_t stride,
177 const uint8_t *_left, const uint8_t *top)
178 {
179 13695 pixel *dst = (pixel *) _dst;
180 13695 const pixel *left = (const pixel *) _left;
181 int y;
182
183 13695 stride /= sizeof(pixel);
184
2/2
✓ Branch 0 taken 219120 times.
✓ Branch 1 taken 13695 times.
232815 for (y = 0; y < 16; y++) {
185 219120 pixel4 p4 = PIXEL_SPLAT_X4(left[15 - y]);
186
187 219120 AV_WN4PA(dst + 0, p4);
188 219120 AV_WN4PA(dst + 4, p4);
189 219120 AV_WN4PA(dst + 8, p4);
190 219120 AV_WN4PA(dst + 12, p4);
191 219120 dst += stride;
192 }
193 13695 }
194
195 1474 static void hor_32x32_c(uint8_t *_dst, ptrdiff_t stride,
196 const uint8_t *_left, const uint8_t *top)
197 {
198 1474 pixel *dst = (pixel *) _dst;
199 1474 const pixel *left = (const pixel *) _left;
200 int y;
201
202 1474 stride /= sizeof(pixel);
203
2/2
✓ Branch 0 taken 47168 times.
✓ Branch 1 taken 1474 times.
48642 for (y = 0; y < 32; y++) {
204 47168 pixel4 p4 = PIXEL_SPLAT_X4(left[31 - y]);
205
206 47168 AV_WN4PA(dst + 0, p4);
207 47168 AV_WN4PA(dst + 4, p4);
208 47168 AV_WN4PA(dst + 8, p4);
209 47168 AV_WN4PA(dst + 12, p4);
210 47168 AV_WN4PA(dst + 16, p4);
211 47168 AV_WN4PA(dst + 20, p4);
212 47168 AV_WN4PA(dst + 24, p4);
213 47168 AV_WN4PA(dst + 28, p4);
214 47168 dst += stride;
215 }
216 1474 }
217
218 #endif /* BIT_DEPTH != 12 */
219
220 78331 static void tm_4x4_c(uint8_t *_dst, ptrdiff_t stride,
221 const uint8_t *_left, const uint8_t *_top)
222 {
223 78331 pixel *dst = (pixel *) _dst;
224 78331 const pixel *left = (const pixel *) _left;
225 78331 const pixel *top = (const pixel *) _top;
226 78331 int y, tl = top[-1];
227
228 78331 stride /= sizeof(pixel);
229
2/2
✓ Branch 0 taken 313324 times.
✓ Branch 1 taken 78331 times.
391655 for (y = 0; y < 4; y++) {
230 313324 int l_m_tl = left[3 - y] - tl;
231
232 313324 dst[0] = av_clip_pixel(top[0] + l_m_tl);
233 313324 dst[1] = av_clip_pixel(top[1] + l_m_tl);
234 313324 dst[2] = av_clip_pixel(top[2] + l_m_tl);
235 313324 dst[3] = av_clip_pixel(top[3] + l_m_tl);
236 313324 dst += stride;
237 }
238 78331 }
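
As a reading aid, the unrolled tm_*_c functions in this listing can be collapsed into a single loop. What follows is a minimal sketch for 8-bit pixels only, not FFmpeg code: `tm_pred_sketch` and `clip_u8` are hypothetical names introduced here, with `clip_u8` standing in for `av_clip_pixel` at BIT_DEPTH == 8. It assumes, as the listing does, that `left[]` is stored bottom-to-top and that `top[-1]` holds the top-left neighbour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: clamp to the 8-bit pixel range. It plays the role
 * that av_clip_pixel() plays in the listing when BIT_DEPTH == 8. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t) v;
}

/* TM prediction for an n x n block, written as loops instead of unrolled.
 * Row y uses left[n - 1 - y], so left[] is read bottom-to-top, and
 * top[-1] is the top-left pixel. The caller must make top[-1] readable. */
static void tm_pred_sketch(uint8_t *dst, ptrdiff_t stride,
                           const uint8_t *left, const uint8_t *top, int n)
{
    int tl = top[-1];

    assert(n > 0);
    for (int y = 0; y < n; y++) {
        /* Same quantity as l_m_tl in the listing: left minus top-left. */
        int l_m_tl = left[n - 1 - y] - tl;

        for (int x = 0; x < n; x++)
            dst[x] = clip_u8(top[x] + l_m_tl);
        dst += stride;
    }
}
```

Each output pixel is `top[x] + left[row] - topleft`, clamped to the pixel range. The clamp matters whenever the left column is much brighter or darker than the top-left pixel.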
239
240 20021 static void tm_8x8_c(uint8_t *_dst, ptrdiff_t stride,
241 const uint8_t *_left, const uint8_t *_top)
242 {
243 20021 pixel *dst = (pixel *) _dst;
244 20021 const pixel *left = (const pixel *) _left;
245 20021 const pixel *top = (const pixel *) _top;
246 20021 int y, tl = top[-1];
247
248 20021 stride /= sizeof(pixel);
249
2/2
✓ Branch 0 taken 160168 times.
✓ Branch 1 taken 20021 times.
180189 for (y = 0; y < 8; y++) {
250 160168 int l_m_tl = left[7 - y] - tl;
251
252 160168 dst[0] = av_clip_pixel(top[0] + l_m_tl);
253 160168 dst[1] = av_clip_pixel(top[1] + l_m_tl);
254 160168 dst[2] = av_clip_pixel(top[2] + l_m_tl);
255 160168 dst[3] = av_clip_pixel(top[3] + l_m_tl);
256 160168 dst[4] = av_clip_pixel(top[4] + l_m_tl);
257 160168 dst[5] = av_clip_pixel(top[5] + l_m_tl);
258 160168 dst[6] = av_clip_pixel(top[6] + l_m_tl);
259 160168 dst[7] = av_clip_pixel(top[7] + l_m_tl);
260 160168 dst += stride;
261 }
262 20021 }
263
264 2568 static void tm_16x16_c(uint8_t *_dst, ptrdiff_t stride,
265 const uint8_t *_left, const uint8_t *_top)
266 {
267 2568 pixel *dst = (pixel *) _dst;
268 2568 const pixel *left = (const pixel *) _left;
269 2568 const pixel *top = (const pixel *) _top;
270 2568 int y, tl = top[-1];
271
272 2568 stride /= sizeof(pixel);
273
2/2
✓ Branch 0 taken 41088 times.
✓ Branch 1 taken 2568 times.
43656 for (y = 0; y < 16; y++) {
274 41088 int l_m_tl = left[15 - y] - tl;
275
276 41088 dst[ 0] = av_clip_pixel(top[ 0] + l_m_tl);
277 41088 dst[ 1] = av_clip_pixel(top[ 1] + l_m_tl);
278 41088 dst[ 2] = av_clip_pixel(top[ 2] + l_m_tl);
279 41088 dst[ 3] = av_clip_pixel(top[ 3] + l_m_tl);
280 41088 dst[ 4] = av_clip_pixel(top[ 4] + l_m_tl);
281 41088 dst[ 5] = av_clip_pixel(top[ 5] + l_m_tl);
282 41088 dst[ 6] = av_clip_pixel(top[ 6] + l_m_tl);
283 41088 dst[ 7] = av_clip_pixel(top[ 7] + l_m_tl);
284 41088 dst[ 8] = av_clip_pixel(top[ 8] + l_m_tl);
285 41088 dst[ 9] = av_clip_pixel(top[ 9] + l_m_tl);
286 41088 dst[10] = av_clip_pixel(top[10] + l_m_tl);
287 41088 dst[11] = av_clip_pixel(top[11] + l_m_tl);
288 41088 dst[12] = av_clip_pixel(top[12] + l_m_tl);
289 41088 dst[13] = av_clip_pixel(top[13] + l_m_tl);
290 41088 dst[14] = av_clip_pixel(top[14] + l_m_tl);
291 41088 dst[15] = av_clip_pixel(top[15] + l_m_tl);
292 41088 dst += stride;
293 }
294 2568 }
295
296 386 static void tm_32x32_c(uint8_t *_dst, ptrdiff_t stride,
297 const uint8_t *_left, const uint8_t *_top)
298 {
299 386 pixel *dst = (pixel *) _dst;
300 386 const pixel *left = (const pixel *) _left;
301 386 const pixel *top = (const pixel *) _top;
302 386 int y, tl = top[-1];
303
304 386 stride /= sizeof(pixel);
305
2/2
✓ Branch 0 taken 12352 times.
✓ Branch 1 taken 386 times.
12738 for (y = 0; y < 32; y++) {
306 12352 int l_m_tl = left[31 - y] - tl;
307
308 12352 dst[ 0] = av_clip_pixel(top[ 0] + l_m_tl);
309 12352 dst[ 1] = av_clip_pixel(top[ 1] + l_m_tl);
310 12352 dst[ 2] = av_clip_pixel(top[ 2] + l_m_tl);
311 12352 dst[ 3] = av_clip_pixel(top[ 3] + l_m_tl);
312 12352 dst[ 4] = av_clip_pixel(top[ 4] + l_m_tl);
313 12352 dst[ 5] = av_clip_pixel(top[ 5] + l_m_tl);
314 12352 dst[ 6] = av_clip_pixel(top[ 6] + l_m_tl);
315 12352 dst[ 7] = av_clip_pixel(top[ 7] + l_m_tl);
316 12352 dst[ 8] = av_clip_pixel(top[ 8] + l_m_tl);
317 12352 dst[ 9] = av_clip_pixel(top[ 9] + l_m_tl);
318 12352 dst[10] = av_clip_pixel(top[10] + l_m_tl);
319 12352 dst[11] = av_clip_pixel(top[11] + l_m_tl);
320 12352 dst[12] = av_clip_pixel(top[12] + l_m_tl);
321 12352 dst[13] = av_clip_pixel(top[13] + l_m_tl);
322 12352 dst[14] = av_clip_pixel(top[14] + l_m_tl);
323 12352 dst[15] = av_clip_pixel(top[15] + l_m_tl);
324 12352 dst[16] = av_clip_pixel(top[16] + l_m_tl);
325 12352 dst[17] = av_clip_pixel(top[17] + l_m_tl);
326 12352 dst[18] = av_clip_pixel(top[18] + l_m_tl);
327 12352 dst[19] = av_clip_pixel(top[19] + l_m_tl);
328 12352 dst[20] = av_clip_pixel(top[20] + l_m_tl);
329 12352 dst[21] = av_clip_pixel(top[21] + l_m_tl);
330 12352 dst[22] = av_clip_pixel(top[22] + l_m_tl);
331 12352 dst[23] = av_clip_pixel(top[23] + l_m_tl);
332 12352 dst[24] = av_clip_pixel(top[24] + l_m_tl);
333 12352 dst[25] = av_clip_pixel(top[25] + l_m_tl);
334 12352 dst[26] = av_clip_pixel(top[26] + l_m_tl);
335 12352 dst[27] = av_clip_pixel(top[27] + l_m_tl);
336 12352 dst[28] = av_clip_pixel(top[28] + l_m_tl);
337 12352 dst[29] = av_clip_pixel(top[29] + l_m_tl);
338 12352 dst[30] = av_clip_pixel(top[30] + l_m_tl);
339 12352 dst[31] = av_clip_pixel(top[31] + l_m_tl);
340 12352 dst += stride;
341 }
342 386 }
343
344 #if BIT_DEPTH != 12
345
346 353773 static void dc_4x4_c(uint8_t *_dst, ptrdiff_t stride,
347 const uint8_t *_left, const uint8_t *_top)
348 {
349 353773 pixel *dst = (pixel *) _dst;
350 353773 const pixel *left = (const pixel *) _left;
351 353773 const pixel *top = (const pixel *) _top;
352 353773 pixel4 dc = PIXEL_SPLAT_X4((left[0] + left[1] + left[2] + left[3] +
353 top[0] + top[1] + top[2] + top[3] + 4) >> 3);
354
355 353773 stride /= sizeof(pixel);
356 353773 AV_WN4PA(dst + stride * 0, dc);
357 353773 AV_WN4PA(dst + stride * 1, dc);
358 353773 AV_WN4PA(dst + stride * 2, dc);
359 353773 AV_WN4PA(dst + stride * 3, dc);
360 353773 }
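
The dc_*_c functions differ only in how many edge pixels they sum and in the matching rounding term and shift. A minimal sketch of that shared arithmetic for 8-bit pixels follows. `dc_value_sketch` is a hypothetical name used for illustration, and the division by `2 * n` is assumed equivalent to the shifts in the listing because the sum is non-negative and `n` is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Rounded average of n left pixels and n top pixels.
 * For n = 4 this computes (sum + 4) >> 3, for n = 8 (sum + 8) >> 4,
 * and so on, as in the dc_NxN_c functions of the listing. */
static uint8_t dc_value_sketch(const uint8_t *left, const uint8_t *top, int n)
{
    int sum = n; /* rounding term: half of the divisor 2 * n */

    assert(n > 0);
    for (int i = 0; i < n; i++)
        sum += left[i] + top[i];

    return (uint8_t) (sum / (2 * n));
}
```

The dc_left_* and dc_top_* variants follow the same pattern with only one edge: they sum n pixels, add n / 2, and divide by n.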
361
362 142833 static void dc_8x8_c(uint8_t *_dst, ptrdiff_t stride,
363 const uint8_t *_left, const uint8_t *_top)
364 {
365 142833 pixel *dst = (pixel *) _dst;
366 142833 const pixel *left = (const pixel *) _left;
367 142833 const pixel *top = (const pixel *) _top;
368 142833 pixel4 dc = PIXEL_SPLAT_X4
369 ((left[0] + left[1] + left[2] + left[3] + left[4] + left[5] +
370 left[6] + left[7] + top[0] + top[1] + top[2] + top[3] +
371 top[4] + top[5] + top[6] + top[7] + 8) >> 4);
372 int y;
373
374 142833 stride /= sizeof(pixel);
375
2/2
✓ Branch 0 taken 1142664 times.
✓ Branch 1 taken 142833 times.
1285497 for (y = 0; y < 8; y++) {
376 1142664 AV_WN4PA(dst + 0, dc);
377 1142664 AV_WN4PA(dst + 4, dc);
378 1142664 dst += stride;
379 }
380 142833 }
381
382 20497 static void dc_16x16_c(uint8_t *_dst, ptrdiff_t stride,
383 const uint8_t *_left, const uint8_t *_top)
384 {
385 20497 pixel *dst = (pixel *) _dst;
386 20497 const pixel *left = (const pixel *) _left;
387 20497 const pixel *top = (const pixel *) _top;
388 20497 pixel4 dc = PIXEL_SPLAT_X4
389 ((left[0] + left[1] + left[2] + left[3] + left[4] + left[5] + left[6] +
390 left[7] + left[8] + left[9] + left[10] + left[11] + left[12] +
391 left[13] + left[14] + left[15] + top[0] + top[1] + top[2] + top[3] +
392 top[4] + top[5] + top[6] + top[7] + top[8] + top[9] + top[10] +
393 top[11] + top[12] + top[13] + top[14] + top[15] + 16) >> 5);
394 int y;
395
396 20497 stride /= sizeof(pixel);
397
2/2
✓ Branch 0 taken 327952 times.
✓ Branch 1 taken 20497 times.
348449 for (y = 0; y < 16; y++) {
398 327952 AV_WN4PA(dst + 0, dc);
399 327952 AV_WN4PA(dst + 4, dc);
400 327952 AV_WN4PA(dst + 8, dc);
401 327952 AV_WN4PA(dst + 12, dc);
402 327952 dst += stride;
403 }
404 20497 }
405
406 11002 static void dc_32x32_c(uint8_t *_dst, ptrdiff_t stride,
407 const uint8_t *_left, const uint8_t *_top)
408 {
409 11002 pixel *dst = (pixel *) _dst;
410 11002 const pixel *left = (const pixel *) _left;
411 11002 const pixel *top = (const pixel *) _top;
412 11002 pixel4 dc = PIXEL_SPLAT_X4
413 ((left[0] + left[1] + left[2] + left[3] + left[4] + left[5] + left[6] +
414 left[7] + left[8] + left[9] + left[10] + left[11] + left[12] +
415 left[13] + left[14] + left[15] + left[16] + left[17] + left[18] +
416 left[19] + left[20] + left[21] + left[22] + left[23] + left[24] +
417 left[25] + left[26] + left[27] + left[28] + left[29] + left[30] +
418 left[31] + top[0] + top[1] + top[2] + top[3] + top[4] + top[5] +
419 top[6] + top[7] + top[8] + top[9] + top[10] + top[11] + top[12] +
420 top[13] + top[14] + top[15] + top[16] + top[17] + top[18] + top[19] +
421 top[20] + top[21] + top[22] + top[23] + top[24] + top[25] + top[26] +
422 top[27] + top[28] + top[29] + top[30] + top[31] + 32) >> 6);
423 int y;
424
425 11002 stride /= sizeof(pixel);
426
2/2
✓ Branch 0 taken 352064 times.
✓ Branch 1 taken 11002 times.
363066 for (y = 0; y < 32; y++) {
427 352064 AV_WN4PA(dst + 0, dc);
428 352064 AV_WN4PA(dst + 4, dc);
429 352064 AV_WN4PA(dst + 8, dc);
430 352064 AV_WN4PA(dst + 12, dc);
431 352064 AV_WN4PA(dst + 16, dc);
432 352064 AV_WN4PA(dst + 20, dc);
433 352064 AV_WN4PA(dst + 24, dc);
434 352064 AV_WN4PA(dst + 28, dc);
435 352064 dst += stride;
436 }
437 11002 }
438
439 9204 static void dc_left_4x4_c(uint8_t *_dst, ptrdiff_t stride,
440 const uint8_t *_left, const uint8_t *top)
441 {
442 9204 pixel *dst = (pixel *) _dst;
443 9204 const pixel *left = (const pixel *) _left;
444 9204 pixel4 dc = PIXEL_SPLAT_X4((left[0] + left[1] + left[2] + left[3] + 2) >> 2);
445
446 9204 stride /= sizeof(pixel);
447 9204 AV_WN4PA(dst + stride * 0, dc);
448 9204 AV_WN4PA(dst + stride * 1, dc);
449 9204 AV_WN4PA(dst + stride * 2, dc);
450 9204 AV_WN4PA(dst + stride * 3, dc);
451 9204 }
452
453 3370 static void dc_left_8x8_c(uint8_t *_dst, ptrdiff_t stride,
454 const uint8_t *_left, const uint8_t *top)
455 {
456 3370 pixel *dst = (pixel *) _dst;
457 3370 const pixel *left = (const pixel *) _left;
458 3370 pixel4 dc = PIXEL_SPLAT_X4
459 ((left[0] + left[1] + left[2] + left[3] +
460 left[4] + left[5] + left[6] + left[7] + 4) >> 3);
461 int y;
462
463 3370 stride /= sizeof(pixel);
464
2/2
✓ Branch 0 taken 26960 times.
✓ Branch 1 taken 3370 times.
30330 for (y = 0; y < 8; y++) {
465 26960 AV_WN4PA(dst + 0, dc);
466 26960 AV_WN4PA(dst + 4, dc);
467 26960 dst += stride;
468 }
469 3370 }
470
471 1205 static void dc_left_16x16_c(uint8_t *_dst, ptrdiff_t stride,
472 const uint8_t *_left, const uint8_t *top)
473 {
474 1205 pixel *dst = (pixel *) _dst;
475 1205 const pixel *left = (const pixel *) _left;
476 1205 pixel4 dc = PIXEL_SPLAT_X4
477 ((left[0] + left[1] + left[2] + left[3] + left[4] + left[5] +
478 left[6] + left[7] + left[8] + left[9] + left[10] + left[11] +
479 left[12] + left[13] + left[14] + left[15] + 8) >> 4);
480 int y;
481
482 1205 stride /= sizeof(pixel);
483
2/2
✓ Branch 0 taken 19280 times.
✓ Branch 1 taken 1205 times.
20485 for (y = 0; y < 16; y++) {
484 19280 AV_WN4PA(dst + 0, dc);
485 19280 AV_WN4PA(dst + 4, dc);
486 19280 AV_WN4PA(dst + 8, dc);
487 19280 AV_WN4PA(dst + 12, dc);
488 19280 dst += stride;
489 }
490 1205 }
491
492 1566 static void dc_left_32x32_c(uint8_t *_dst, ptrdiff_t stride,
493 const uint8_t *_left, const uint8_t *top)
494 {
495 1566 pixel *dst = (pixel *) _dst;
496 1566 const pixel *left = (const pixel *) _left;
497 1566 pixel4 dc = PIXEL_SPLAT_X4
498 ((left[0] + left[1] + left[2] + left[3] + left[4] + left[5] +
499 left[6] + left[7] + left[8] + left[9] + left[10] + left[11] +
500 left[12] + left[13] + left[14] + left[15] + left[16] + left[17] +
501 left[18] + left[19] + left[20] + left[21] + left[22] + left[23] +
502 left[24] + left[25] + left[26] + left[27] + left[28] + left[29] +
503 left[30] + left[31] + 16) >> 5);
504 int y;
505
506 1566 stride /= sizeof(pixel);
507
2/2
✓ Branch 0 taken 50112 times.
✓ Branch 1 taken 1566 times.
51678 for (y = 0; y < 32; y++) {
508 50112 AV_WN4PA(dst + 0, dc);
509 50112 AV_WN4PA(dst + 4, dc);
510 50112 AV_WN4PA(dst + 8, dc);
511 50112 AV_WN4PA(dst + 12, dc);
512 50112 AV_WN4PA(dst + 16, dc);
513 50112 AV_WN4PA(dst + 20, dc);
514 50112 AV_WN4PA(dst + 24, dc);
515 50112 AV_WN4PA(dst + 28, dc);
516 50112 dst += stride;
517 }
518 1566 }
519
520 11040 static void dc_top_4x4_c(uint8_t *_dst, ptrdiff_t stride,
521 const uint8_t *left, const uint8_t *_top)
522 {
523 11040 pixel *dst = (pixel *) _dst;
524 11040 const pixel *top = (const pixel *) _top;
525 11040 pixel4 dc = PIXEL_SPLAT_X4((top[0] + top[1] + top[2] + top[3] + 2) >> 2);
526
527 11040 stride /= sizeof(pixel);
528 11040 AV_WN4PA(dst + stride * 0, dc);
529 11040 AV_WN4PA(dst + stride * 1, dc);
530 11040 AV_WN4PA(dst + stride * 2, dc);
531 11040 AV_WN4PA(dst + stride * 3, dc);
532 11040 }
533
534 7211 static void dc_top_8x8_c(uint8_t *_dst, ptrdiff_t stride,
535 const uint8_t *left, const uint8_t *_top)
536 {
537 7211 pixel *dst = (pixel *) _dst;
538 7211 const pixel *top = (const pixel *) _top;
539 7211 pixel4 dc = PIXEL_SPLAT_X4
540 ((top[0] + top[1] + top[2] + top[3] +
541 top[4] + top[5] + top[6] + top[7] + 4) >> 3);
542 int y;
543
544 7211 stride /= sizeof(pixel);
545
2/2
✓ Branch 0 taken 57688 times.
✓ Branch 1 taken 7211 times.
64899 for (y = 0; y < 8; y++) {
546 57688 AV_WN4PA(dst + 0, dc);
547 57688 AV_WN4PA(dst + 4, dc);
548 57688 dst += stride;
549 }
550 7211 }
551
552 2624 static void dc_top_16x16_c(uint8_t *_dst, ptrdiff_t stride,
553 const uint8_t *left, const uint8_t *_top)
554 {
555 2624 pixel *dst = (pixel *) _dst;
556 2624 const pixel *top = (const pixel *) _top;
557 2624 pixel4 dc = PIXEL_SPLAT_X4
558 ((top[0] + top[1] + top[2] + top[3] + top[4] + top[5] +
559 top[6] + top[7] + top[8] + top[9] + top[10] + top[11] +
560 top[12] + top[13] + top[14] + top[15] + 8) >> 4);
561 int y;
562
563 2624 stride /= sizeof(pixel);
564
2/2
✓ Branch 0 taken 41984 times.
✓ Branch 1 taken 2624 times.
44608 for (y = 0; y < 16; y++) {
565 41984 AV_WN4PA(dst + 0, dc);
566 41984 AV_WN4PA(dst + 4, dc);
567 41984 AV_WN4PA(dst + 8, dc);
568 41984 AV_WN4PA(dst + 12, dc);
569 41984 dst += stride;
570 }
571 2624 }
572
573 1340 static void dc_top_32x32_c(uint8_t *_dst, ptrdiff_t stride,
574 const uint8_t *left, const uint8_t *_top)
575 {
576 1340 pixel *dst = (pixel *) _dst;
577 1340 const pixel *top = (const pixel *) _top;
578 1340 pixel4 dc = PIXEL_SPLAT_X4
579 ((top[0] + top[1] + top[2] + top[3] + top[4] + top[5] +
580 top[6] + top[7] + top[8] + top[9] + top[10] + top[11] +
581 top[12] + top[13] + top[14] + top[15] + top[16] + top[17] +
582 top[18] + top[19] + top[20] + top[21] + top[22] + top[23] +
583 top[24] + top[25] + top[26] + top[27] + top[28] + top[29] +
584 top[30] + top[31] + 16) >> 5);
585 int y;
586
587 1340 stride /= sizeof(pixel);
588
2/2
✓ Branch 0 taken 42880 times.
✓ Branch 1 taken 1340 times.
44220 for (y = 0; y < 32; y++) {
589 42880 AV_WN4PA(dst + 0, dc);
590 42880 AV_WN4PA(dst + 4, dc);
591 42880 AV_WN4PA(dst + 8, dc);
592 42880 AV_WN4PA(dst + 12, dc);
593 42880 AV_WN4PA(dst + 16, dc);
594 42880 AV_WN4PA(dst + 20, dc);
595 42880 AV_WN4PA(dst + 24, dc);
596 42880 AV_WN4PA(dst + 28, dc);
597 42880 dst += stride;
598 }
599 1340 }
600
601 #endif /* BIT_DEPTH != 12 */
602
603 386 static void dc_128_4x4_c(uint8_t *_dst, ptrdiff_t stride,
604 const uint8_t *left, const uint8_t *top)
605 {
606 386 pixel *dst = (pixel *) _dst;
607 386 pixel4 val = PIXEL_SPLAT_X4(128 << (BIT_DEPTH - 8));
608
609 386 stride /= sizeof(pixel);
610 386 AV_WN4PA(dst + stride * 0, val);
611 386 AV_WN4PA(dst + stride * 1, val);
612 386 AV_WN4PA(dst + stride * 2, val);
613 386 AV_WN4PA(dst + stride * 3, val);
614 386 }
615
616 223 static void dc_128_8x8_c(uint8_t *_dst, ptrdiff_t stride,
617 const uint8_t *left, const uint8_t *top)
618 {
619 223 pixel *dst = (pixel *) _dst;
620 223 pixel4 val = PIXEL_SPLAT_X4(128 << (BIT_DEPTH - 8));
621 int y;
622
623 223 stride /= sizeof(pixel);
624
2/2
✓ Branch 0 taken 1784 times.
✓ Branch 1 taken 223 times.
2007 for (y = 0; y < 8; y++) {
625 1784 AV_WN4PA(dst + 0, val);
626 1784 AV_WN4PA(dst + 4, val);
627 1784 dst += stride;
628 }
629 223 }
630
631 155 static void dc_128_16x16_c(uint8_t *_dst, ptrdiff_t stride,
632 const uint8_t *left, const uint8_t *top)
633 {
634 155 pixel *dst = (pixel *) _dst;
635 155 pixel4 val = PIXEL_SPLAT_X4(128 << (BIT_DEPTH - 8));
636 int y;
637
638 155 stride /= sizeof(pixel);
639
2/2
✓ Branch 0 taken 2480 times.
✓ Branch 1 taken 155 times.
2635 for (y = 0; y < 16; y++) {
640 2480 AV_WN4PA(dst + 0, val);
641 2480 AV_WN4PA(dst + 4, val);
642 2480 AV_WN4PA(dst + 8, val);
643 2480 AV_WN4PA(dst + 12, val);
644 2480 dst += stride;
645 }
646 155 }
647
648 139 static void dc_128_32x32_c(uint8_t *_dst, ptrdiff_t stride,
649 const uint8_t *left, const uint8_t *top)
650 {
651 139 pixel *dst = (pixel *) _dst;
652 139 pixel4 val = PIXEL_SPLAT_X4(128 << (BIT_DEPTH - 8));
653 int y;
654
655 139 stride /= sizeof(pixel);
656
2/2
✓ Branch 0 taken 4448 times.
✓ Branch 1 taken 139 times.
4587 for (y = 0; y < 32; y++) {
657 4448 AV_WN4PA(dst + 0, val);
658 4448 AV_WN4PA(dst + 4, val);
659 4448 AV_WN4PA(dst + 8, val);
660 4448 AV_WN4PA(dst + 12, val);
661 4448 AV_WN4PA(dst + 16, val);
662 4448 AV_WN4PA(dst + 20, val);
663 4448 AV_WN4PA(dst + 24, val);
664 4448 AV_WN4PA(dst + 28, val);
665 4448 dst += stride;
666 }
667 139 }
668
669 2528 static void dc_127_4x4_c(uint8_t *_dst, ptrdiff_t stride,
670 const uint8_t *left, const uint8_t *top)
671 {
672 2528 pixel *dst = (pixel *) _dst;
673 2528 pixel4 val = PIXEL_SPLAT_X4((128 << (BIT_DEPTH - 8)) - 1);
674
675 2528 stride /= sizeof(pixel);
676 2528 AV_WN4PA(dst + stride * 0, val);
677 2528 AV_WN4PA(dst + stride * 1, val);
678 2528 AV_WN4PA(dst + stride * 2, val);
679 2528 AV_WN4PA(dst + stride * 3, val);}
680
681 334 static void dc_127_8x8_c(uint8_t *_dst, ptrdiff_t stride,
682 const uint8_t *left, const uint8_t *top)
683 {
684 334 pixel *dst = (pixel *) _dst;
685 334 pixel4 val = PIXEL_SPLAT_X4((128 << (BIT_DEPTH - 8)) - 1);
686 int y;
687
688 334 stride /= sizeof(pixel);
689
2/2
✓ Branch 0 taken 2672 times.
✓ Branch 1 taken 334 times.
3006 for (y = 0; y < 8; y++) {
690 2672 AV_WN4PA(dst + 0, val);
691 2672 AV_WN4PA(dst + 4, val);
692 2672 dst += stride;
693 }
694 334 }
695
696 178 static void dc_127_16x16_c(uint8_t *_dst, ptrdiff_t stride,
697 const uint8_t *left, const uint8_t *top)
698 {
699 178 pixel *dst = (pixel *) _dst;
700 178 pixel4 val = PIXEL_SPLAT_X4((128 << (BIT_DEPTH - 8)) - 1);
701 int y;
702
703 178 stride /= sizeof(pixel);
704
2/2
✓ Branch 0 taken 2848 times.
✓ Branch 1 taken 178 times.
3026 for (y = 0; y < 16; y++) {
705 2848 AV_WN4PA(dst + 0, val);
706 2848 AV_WN4PA(dst + 4, val);
707 2848 AV_WN4PA(dst + 8, val);
708 2848 AV_WN4PA(dst + 12, val);
709 2848 dst += stride;
710 }
711 178 }
712
713 83 static void dc_127_32x32_c(uint8_t *_dst, ptrdiff_t stride,
714 const uint8_t *left, const uint8_t *top)
715 {
716 83 pixel *dst = (pixel *) _dst;
717 83 pixel4 val = PIXEL_SPLAT_X4((128 << (BIT_DEPTH - 8)) - 1);
718 int y;
719
720 83 stride /= sizeof(pixel);
721
2/2
✓ Branch 0 taken 2656 times.
✓ Branch 1 taken 83 times.
2739 for (y = 0; y < 32; y++) {
722 2656 AV_WN4PA(dst + 0, val);
723 2656 AV_WN4PA(dst + 4, val);
724 2656 AV_WN4PA(dst + 8, val);
725 2656 AV_WN4PA(dst + 12, val);
726 2656 AV_WN4PA(dst + 16, val);
727 2656 AV_WN4PA(dst + 20, val);
728 2656 AV_WN4PA(dst + 24, val);
729 2656 AV_WN4PA(dst + 28, val);
730 2656 dst += stride;
731 }
732 83 }
733
734 2229 static void dc_129_4x4_c(uint8_t *_dst, ptrdiff_t stride,
735 const uint8_t *left, const uint8_t *top)
736 {
737 2229 pixel *dst = (pixel *) _dst;
738 2229 pixel4 val = PIXEL_SPLAT_X4((128 << (BIT_DEPTH - 8)) + 1);
739
740 2229 stride /= sizeof(pixel);
741 2229 AV_WN4PA(dst + stride * 0, val);
742 2229 AV_WN4PA(dst + stride * 1, val);
743 2229 AV_WN4PA(dst + stride * 2, val);
744 2229 AV_WN4PA(dst + stride * 3, val);
745 2229 }
746
747 348 static void dc_129_8x8_c(uint8_t *_dst, ptrdiff_t stride,
748 const uint8_t *left, const uint8_t *top)
749 {
750 348 pixel *dst = (pixel *) _dst;
751 348 pixel4 val = PIXEL_SPLAT_X4((128 << (BIT_DEPTH - 8)) + 1);
752 int y;
753
754 348 stride /= sizeof(pixel);
755
2/2
✓ Branch 0 taken 2784 times.
✓ Branch 1 taken 348 times.
3132 for (y = 0; y < 8; y++) {
756 2784 AV_WN4PA(dst + 0, val);
757 2784 AV_WN4PA(dst + 4, val);
758 2784 dst += stride;
759 }
760 348 }
761
762 258 static void dc_129_16x16_c(uint8_t *_dst, ptrdiff_t stride,
763 const uint8_t *left, const uint8_t *top)
764 {
765 258 pixel *dst = (pixel *) _dst;
766 258 pixel4 val = PIXEL_SPLAT_X4((128 << (BIT_DEPTH - 8)) + 1);
767 int y;
768
769 258 stride /= sizeof(pixel);
770
2/2
✓ Branch 0 taken 4128 times.
✓ Branch 1 taken 258 times.
4386 for (y = 0; y < 16; y++) {
771 4128 AV_WN4PA(dst + 0, val);
772 4128 AV_WN4PA(dst + 4, val);
773 4128 AV_WN4PA(dst + 8, val);
774 4128 AV_WN4PA(dst + 12, val);
775 4128 dst += stride;
776 }
777 258 }
778
779 140 static void dc_129_32x32_c(uint8_t *_dst, ptrdiff_t stride,
780 const uint8_t *left, const uint8_t *top)
781 {
782 140 pixel *dst = (pixel *) _dst;
783 140 pixel4 val = PIXEL_SPLAT_X4((128 << (BIT_DEPTH - 8)) + 1);
784 int y;
785
786 140 stride /= sizeof(pixel);
787
2/2
✓ Branch 0 taken 4480 times.
✓ Branch 1 taken 140 times.
4620 for (y = 0; y < 32; y++) {
788 4480 AV_WN4PA(dst + 0, val);
789 4480 AV_WN4PA(dst + 4, val);
790 4480 AV_WN4PA(dst + 8, val);
791 4480 AV_WN4PA(dst + 12, val);
792 4480 AV_WN4PA(dst + 16, val);
793 4480 AV_WN4PA(dst + 20, val);
794 4480 AV_WN4PA(dst + 24, val);
795 4480 AV_WN4PA(dst + 28, val);
796 4480 dst += stride;
797 }
798 140 }
799
800 #if BIT_DEPTH != 12
801
802 #if BIT_DEPTH == 8
803 #define memset_bpc memset
804 #else
805 8040 static inline void memset_bpc(uint16_t *dst, int val, int len) {
806 int n;
807
2/2
✓ Branch 0 taken 45272 times.
✓ Branch 1 taken 8040 times.
53312 for (n = 0; n < len; n++) {
808 45272 dst[n] = val;
809 }
810 8040 }
811 #endif
812
813 #define DST(x, y) dst[(x) + (y) * stride]
814
815 20364 static void diag_downleft_4x4_c(uint8_t *_dst, ptrdiff_t stride,
816 const uint8_t *left, const uint8_t *_top)
817 {
818 20364 pixel *dst = (pixel *) _dst;
819 20364 const pixel *top = (const pixel *) _top;
820 20364 int a0 = top[0], a1 = top[1], a2 = top[2], a3 = top[3],
821 20364 a4 = top[4], a5 = top[5], a6 = top[6], a7 = top[7];
822
823 20364 stride /= sizeof(pixel);
824 20364 DST(0,0) = (a0 + a1 * 2 + a2 + 2) >> 2;
825 20364 DST(1,0) = DST(0,1) = (a1 + a2 * 2 + a3 + 2) >> 2;
826 20364 DST(2,0) = DST(1,1) = DST(0,2) = (a2 + a3 * 2 + a4 + 2) >> 2;
827 20364 DST(3,0) = DST(2,1) = DST(1,2) = DST(0,3) = (a3 + a4 * 2 + a5 + 2) >> 2;
828 20364 DST(3,1) = DST(2,2) = DST(1,3) = (a4 + a5 * 2 + a6 + 2) >> 2;
829 20364 DST(3,2) = DST(2,3) = (a5 + a6 * 2 + a7 + 2) >> 2;
830 20364 DST(3,3) = a7; // note: the bottom-right pixel is a7 unfiltered; this is different from vp8 and such
831 20364 }
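
The unrolled assignments in `diag_downleft_4x4_c` follow one rule: every pixel on the anti-diagonal `x + y = i` gets the 3-tap filter `(top[i] + 2 * top[i + 1] + top[i + 2] + 2) >> 2`, except the last pixel, which copies `top[7]`. The loop form below is a sketch of that rule for 8-bit pixels only. The names `filter3` and `sketch_diag_downleft_4x4` are hypothetical, and `stride` is assumed to be given in pixels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the rounded 1-2-1 smoothing filter from the listing. */
static uint8_t filter3(int a, int b, int c)
{
    return (uint8_t)((a + 2 * b + c + 2) >> 2);
}

/* Hypothetical 8-bit loop form of the 4x4 diagonal down-left predictor.
 * `stride` is in pixels, so no division by sizeof(pixel) is needed. */
static void sketch_diag_downleft_4x4(uint8_t *dst, ptrdiff_t stride,
                                     const uint8_t *top)
{
    int x, y;

    assert(stride >= 4);
    for (y = 0; y < 4; y++) {
        for (x = 0; x < 4; x++) {
            int i = x + y; /* pixels on one anti-diagonal share a value */

            if (i == 6)    /* bottom-right corner: copied, not filtered */
                dst[x + y * stride] = top[7];
            else
                dst[x + y * stride] = filter3(top[i], top[i + 1], top[i + 2]);
        }
    }
}
```

The function reads `top[0]` through `top[7]`, matching the eight `a0`..`a7` loads in the listed code.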
832
833 #define def_diag_downleft(size) \
834 static void diag_downleft_##size##x##size##_c(uint8_t *_dst, ptrdiff_t stride, \
835 const uint8_t *left, const uint8_t *_top) \
836 { \
837 pixel *dst = (pixel *) _dst; \
838 const pixel *top = (const pixel *) _top; \
839 int i, j; \
840 pixel v[size - 1]; \
841 \
842 stride /= sizeof(pixel); \
843 for (i = 0; i < size - 2; i++) \
844 v[i] = (top[i] + top[i + 1] * 2 + top[i + 2] + 2) >> 2; \
845 v[size - 2] = (top[size - 2] + top[size - 1] * 3 + 2) >> 2; \
846 \
847 for (j = 0; j < size; j++) { \
848 memcpy(dst + j*stride, v + j, (size - 1 - j) * sizeof(pixel)); \
849 memset_bpc(dst + j*stride + size - 1 - j, top[size - 1], j + 1); \
850 } \
851 }
852
853
5/5
✓ Branch 0 taken 34014 times.
✓ Branch 1 taken 5669 times.
✓ Branch 2 taken 44112 times.
✓ Branch 3 taken 6754 times.
✓ Branch 4 taken 155 times.
85035 def_diag_downleft(8)
854
5/5
✓ Branch 0 taken 15792 times.
✓ Branch 1 taken 1128 times.
✓ Branch 2 taken 17856 times.
✓ Branch 3 taken 1308 times.
✓ Branch 4 taken 12 times.
34968 def_diag_downleft(16)
855
5/5
✓ Branch 0 taken 4560 times.
✓ Branch 1 taken 152 times.
✓ Branch 2 taken 4288 times.
✓ Branch 3 taken 710 times.
✓ Branch 4 taken 18 times.
9576 def_diag_downleft(32)
856
857 57827 static void diag_downright_4x4_c(uint8_t *_dst, ptrdiff_t stride,
858 const uint8_t *_left, const uint8_t *_top)
859 {
860 57827 pixel *dst = (pixel *) _dst;
861 57827 const pixel *top = (const pixel *) _top;
862 57827 const pixel *left = (const pixel *) _left;
863 57827 int tl = top[-1], a0 = top[0], a1 = top[1], a2 = top[2], a3 = top[3],
864 57827 l0 = left[3], l1 = left[2], l2 = left[1], l3 = left[0];
865
866 57827 stride /= sizeof(pixel);
867 57827 DST(0,3) = (l1 + l2 * 2 + l3 + 2) >> 2;
868 57827 DST(0,2) = DST(1,3) = (l0 + l1 * 2 + l2 + 2) >> 2;
869 57827 DST(0,1) = DST(1,2) = DST(2,3) = (tl + l0 * 2 + l1 + 2) >> 2;
870 57827 DST(0,0) = DST(1,1) = DST(2,2) = DST(3,3) = (l0 + tl * 2 + a0 + 2) >> 2;
871 57827 DST(1,0) = DST(2,1) = DST(3,2) = (tl + a0 * 2 + a1 + 2) >> 2;
872 57827 DST(2,0) = DST(3,1) = (a0 + a1 * 2 + a2 + 2) >> 2;
873 57827 DST(3,0) = (a1 + a2 * 2 + a3 + 2) >> 2;
874 57827 }
875
876 #define def_diag_downright(size) \
877 static void diag_downright_##size##x##size##_c(uint8_t *_dst, ptrdiff_t stride, \
878 const uint8_t *_left, const uint8_t *_top) \
879 { \
880 pixel *dst = (pixel *) _dst; \
881 const pixel *top = (const pixel *) _top; \
882 const pixel *left = (const pixel *) _left; \
883 int i, j; \
884 pixel v[size + size - 1]; \
885 \
886 stride /= sizeof(pixel); \
887 for (i = 0; i < size - 2; i++) { \
888 v[i ] = (left[i] + left[i + 1] * 2 + left[i + 2] + 2) >> 2; \
889 v[size + 1 + i] = (top[i] + top[i + 1] * 2 + top[i + 2] + 2) >> 2; \
890 } \
891 v[size - 2] = (left[size - 2] + left[size - 1] * 2 + top[-1] + 2) >> 2; \
892 v[size - 1] = (left[size - 1] + top[-1] * 2 + top[ 0] + 2) >> 2; \
893 v[size ] = (top[-1] + top[0] * 2 + top[ 1] + 2) >> 2; \
894 \
895 for (j = 0; j < size; j++) \
896 memcpy(dst + j*stride, v + size - 1 - j, size * sizeof(pixel)); \
897 }
898
899
4/4
✓ Branch 0 taken 41100 times.
✓ Branch 1 taken 6850 times.
✓ Branch 2 taken 54800 times.
✓ Branch 3 taken 6850 times.
102750 def_diag_downright(8)
900
4/4
✓ Branch 0 taken 12754 times.
✓ Branch 1 taken 911 times.
✓ Branch 2 taken 14576 times.
✓ Branch 3 taken 911 times.
28241 def_diag_downright(16)
901
4/4
✓ Branch 0 taken 5100 times.
✓ Branch 1 taken 170 times.
✓ Branch 2 taken 5440 times.
✓ Branch 3 taken 170 times.
10710 def_diag_downright(32)
902
903 44165 static void vert_right_4x4_c(uint8_t *_dst, ptrdiff_t stride,
904 const uint8_t *_left, const uint8_t *_top)
905 {
906 44165 pixel *dst = (pixel *) _dst;
907 44165 const pixel *top = (const pixel *) _top;
908 44165 const pixel *left = (const pixel *) _left;
909 44165 int tl = top[-1], a0 = top[0], a1 = top[1], a2 = top[2], a3 = top[3],
910 44165 l0 = left[3], l1 = left[2], l2 = left[1];
911
912 44165 stride /= sizeof(pixel);
913 44165 DST(0,3) = (l0 + l1 * 2 + l2 + 2) >> 2;
914 44165 DST(0,2) = (tl + l0 * 2 + l1 + 2) >> 2;
915 44165 DST(0,0) = DST(1,2) = (tl + a0 + 1) >> 1;
916 44165 DST(0,1) = DST(1,3) = (l0 + tl * 2 + a0 + 2) >> 2;
917 44165 DST(1,0) = DST(2,2) = (a0 + a1 + 1) >> 1;
918 44165 DST(1,1) = DST(2,3) = (tl + a0 * 2 + a1 + 2) >> 2;
919 44165 DST(2,0) = DST(3,2) = (a1 + a2 + 1) >> 1;
920 44165 DST(2,1) = DST(3,3) = (a0 + a1 * 2 + a2 + 2) >> 2;
921 44165 DST(3,0) = (a2 + a3 + 1) >> 1;
922 44165 DST(3,1) = (a1 + a2 * 2 + a3 + 2) >> 2;
923 44165 }
924
925 #define def_vert_right(size) \
926 static void vert_right_##size##x##size##_c(uint8_t *_dst, ptrdiff_t stride, \
927 const uint8_t *_left, const uint8_t *_top) \
928 { \
929 pixel *dst = (pixel *) _dst; \
930 const pixel *top = (const pixel *) _top; \
931 const pixel *left = (const pixel *) _left; \
932 int i, j; \
933 pixel ve[size + size/2 - 1], vo[size + size/2 - 1]; \
934 \
935 stride /= sizeof(pixel); \
936 for (i = 0; i < size/2 - 2; i++) { \
937 vo[i] = (left[i*2 + 3] + left[i*2 + 2] * 2 + left[i*2 + 1] + 2) >> 2; \
938 ve[i] = (left[i*2 + 4] + left[i*2 + 3] * 2 + left[i*2 + 2] + 2) >> 2; \
939 } \
940 vo[size/2 - 2] = (left[size - 1] + left[size - 2] * 2 + left[size - 3] + 2) >> 2; \
941 ve[size/2 - 2] = (top[-1] + left[size - 1] * 2 + left[size - 2] + 2) >> 2; \
942 \
943 ve[size/2 - 1] = (top[-1] + top[0] + 1) >> 1; \
944 vo[size/2 - 1] = (left[size - 1] + top[-1] * 2 + top[0] + 2) >> 2; \
945 for (i = 0; i < size - 1; i++) { \
946 ve[size/2 + i] = (top[i] + top[i + 1] + 1) >> 1; \
947 vo[size/2 + i] = (top[i - 1] + top[i] * 2 + top[i + 1] + 2) >> 2; \
948 } \
949 \
950 for (j = 0; j < size / 2; j++) { \
951 memcpy(dst + j*2 *stride, ve + size/2 - 1 - j, size * sizeof(pixel)); \
952 memcpy(dst + (j*2 + 1)*stride, vo + size/2 - 1 - j, size * sizeof(pixel)); \
953 } \
954 }
955
956
6/6
✓ Branch 0 taken 9738 times.
✓ Branch 1 taken 4869 times.
✓ Branch 2 taken 34083 times.
✓ Branch 3 taken 4869 times.
✓ Branch 4 taken 19476 times.
✓ Branch 5 taken 4869 times.
68166 def_vert_right(8)
957
6/6
✓ Branch 0 taken 5610 times.
✓ Branch 1 taken 935 times.
✓ Branch 2 taken 14025 times.
✓ Branch 3 taken 935 times.
✓ Branch 4 taken 7480 times.
✓ Branch 5 taken 935 times.
28050 def_vert_right(16)
958
6/6
✓ Branch 0 taken 2646 times.
✓ Branch 1 taken 189 times.
✓ Branch 2 taken 5859 times.
✓ Branch 3 taken 189 times.
✓ Branch 4 taken 3024 times.
✓ Branch 5 taken 189 times.
11718 def_vert_right(32)
959
960 47549 static void hor_down_4x4_c(uint8_t *_dst, ptrdiff_t stride,
961 const uint8_t *_left, const uint8_t *_top)
962 {
963 47549 pixel *dst = (pixel *) _dst;
964 47549 const pixel *top = (const pixel *) _top;
965 47549 const pixel *left = (const pixel *) _left;
966 47549 int l0 = left[3], l1 = left[2], l2 = left[1], l3 = left[0],
967 47549 tl = top[-1], a0 = top[0], a1 = top[1], a2 = top[2];
968
969 47549 stride /= sizeof(pixel);
970 47549 DST(2,0) = (tl + a0 * 2 + a1 + 2) >> 2;
971 47549 DST(3,0) = (a0 + a1 * 2 + a2 + 2) >> 2;
972 47549 DST(0,0) = DST(2,1) = (tl + l0 + 1) >> 1;
973 47549 DST(1,0) = DST(3,1) = (a0 + tl * 2 + l0 + 2) >> 2;
974 47549 DST(0,1) = DST(2,2) = (l0 + l1 + 1) >> 1;
975 47549 DST(1,1) = DST(3,2) = (tl + l0 * 2 + l1 + 2) >> 2;
976 47549 DST(0,2) = DST(2,3) = (l1 + l2 + 1) >> 1;
977 47549 DST(1,2) = DST(3,3) = (l0 + l1 * 2 + l2 + 2) >> 2;
978 47549 DST(0,3) = (l2 + l3 + 1) >> 1;
979 47549 DST(1,3) = (l1 + l2 * 2 + l3 + 2) >> 2;
980 47549 }
981
982 #define def_hor_down(size) \
983 static void hor_down_##size##x##size##_c(uint8_t *_dst, ptrdiff_t stride, \
984 const uint8_t *_left, const uint8_t *_top) \
985 { \
986 pixel *dst = (pixel *) _dst; \
987 const pixel *top = (const pixel *) _top; \
988 const pixel *left = (const pixel *) _left; \
989 int i, j; \
990 pixel v[size * 3 - 2]; \
991 \
992 stride /= sizeof(pixel); \
993 for (i = 0; i < size - 2; i++) { \
994 v[i*2 ] = (left[i + 1] + left[i + 0] + 1) >> 1; \
995 v[i*2 + 1] = (left[i + 2] + left[i + 1] * 2 + left[i + 0] + 2) >> 2; \
996 v[size*2 + i] = (top[i - 1] + top[i] * 2 + top[i + 1] + 2) >> 2; \
997 } \
998 v[size*2 - 2] = (top[-1] + left[size - 1] + 1) >> 1; \
999 v[size*2 - 4] = (left[size - 1] + left[size - 2] + 1) >> 1; \
1000 v[size*2 - 1] = (top[0] + top[-1] * 2 + left[size - 1] + 2) >> 2; \
1001 v[size*2 - 3] = (top[-1] + left[size - 1] * 2 + left[size - 2] + 2) >> 2; \
1002 \
1003 for (j = 0; j < size; j++) \
1004 memcpy(dst + j*stride, v + size*2 - 2 - j*2, size * sizeof(pixel)); \
1005 }
1006
1007
4/4
✓ Branch 0 taken 30348 times.
✓ Branch 1 taken 5058 times.
✓ Branch 2 taken 40464 times.
✓ Branch 3 taken 5058 times.
75870 def_hor_down(8)
1008
4/4
✓ Branch 0 taken 8148 times.
✓ Branch 1 taken 582 times.
✓ Branch 2 taken 9312 times.
✓ Branch 3 taken 582 times.
18042 def_hor_down(16)
1009
4/4
✓ Branch 0 taken 2310 times.
✓ Branch 1 taken 77 times.
✓ Branch 2 taken 2464 times.
✓ Branch 3 taken 77 times.
4851 def_hor_down(32)
1010
1011 36962 static void vert_left_4x4_c(uint8_t *_dst, ptrdiff_t stride,
1012 const uint8_t *left, const uint8_t *_top)
1013 {
1014 36962 pixel *dst = (pixel *) _dst;
1015 36962 const pixel *top = (const pixel *) _top;
1016 36962 int a0 = top[0], a1 = top[1], a2 = top[2], a3 = top[3],
1017 36962 a4 = top[4], a5 = top[5], a6 = top[6];
1018
1019 36962 stride /= sizeof(pixel);
1020 36962 DST(0,0) = (a0 + a1 + 1) >> 1;
1021 36962 DST(0,1) = (a0 + a1 * 2 + a2 + 2) >> 2;
1022 36962 DST(1,0) = DST(0,2) = (a1 + a2 + 1) >> 1;
1023 36962 DST(1,1) = DST(0,3) = (a1 + a2 * 2 + a3 + 2) >> 2;
1024 36962 DST(2,0) = DST(1,2) = (a2 + a3 + 1) >> 1;
1025 36962 DST(2,1) = DST(1,3) = (a2 + a3 * 2 + a4 + 2) >> 2;
1026 36962 DST(3,0) = DST(2,2) = (a3 + a4 + 1) >> 1;
1027 36962 DST(3,1) = DST(2,3) = (a3 + a4 * 2 + a5 + 2) >> 2;
1028 36962 DST(3,2) = (a4 + a5 + 1) >> 1;
1029 36962 DST(3,3) = (a4 + a5 * 2 + a6 + 2) >> 2;
1030 36962 }
1031
1032 #define def_vert_left(size) \
1033 static void vert_left_##size##x##size##_c(uint8_t *_dst, ptrdiff_t stride, \
1034 const uint8_t *left, const uint8_t *_top) \
1035 { \
1036 pixel *dst = (pixel *) _dst; \
1037 const pixel *top = (const pixel *) _top; \
1038 int i, j; \
1039 pixel ve[size - 1], vo[size - 1]; \
1040 \
1041 stride /= sizeof(pixel); \
1042 for (i = 0; i < size - 2; i++) { \
1043 ve[i] = (top[i] + top[i + 1] + 1) >> 1; \
1044 vo[i] = (top[i] + top[i + 1] * 2 + top[i + 2] + 2) >> 2; \
1045 } \
1046 ve[size - 2] = (top[size - 2] + top[size - 1] + 1) >> 1; \
1047 vo[size - 2] = (top[size - 2] + top[size - 1] * 3 + 2) >> 2; \
1048 \
1049 for (j = 0; j < size / 2; j++) { \
1050 memcpy(dst + j*2 * stride, ve + j, (size - j - 1) * sizeof(pixel)); \
1051 memset_bpc(dst + j*2 * stride + size - j - 1, top[size - 1], j + 1); \
1052 memcpy(dst + (j*2 + 1) * stride, vo + j, (size - j - 1) * sizeof(pixel)); \
1053 memset_bpc(dst + (j*2 + 1) * stride + size - j - 1, top[size - 1], j + 1); \
1054 } \
1055 }
1056
1057
6/6
✓ Branch 0 taken 39852 times.
✓ Branch 1 taken 6642 times.
✓ Branch 2 taken 25396 times.
✓ Branch 3 taken 6349 times.
✓ Branch 4 taken 1172 times.
✓ Branch 5 taken 293 times.
73062 def_vert_left(8)
1058
6/6
✓ Branch 0 taken 19390 times.
✓ Branch 1 taken 1385 times.
✓ Branch 2 taken 10376 times.
✓ Branch 3 taken 1297 times.
✓ Branch 4 taken 704 times.
✓ Branch 5 taken 88 times.
31855 def_vert_left(16)
1059
6/6
✓ Branch 0 taken 5760 times.
✓ Branch 1 taken 192 times.
✓ Branch 2 taken 2912 times.
✓ Branch 3 taken 182 times.
✓ Branch 4 taken 160 times.
✓ Branch 5 taken 10 times.
9024 def_vert_left(32)
1060
1061 51466 static void hor_up_4x4_c(uint8_t *_dst, ptrdiff_t stride,
1062 const uint8_t *_left, const uint8_t *top)
1063 {
1064 51466 pixel *dst = (pixel *) _dst;
1065 51466 const pixel *left = (const pixel *) _left;
1066 51466 int l0 = left[0], l1 = left[1], l2 = left[2], l3 = left[3];
1067
1068 51466 stride /= sizeof(pixel);
1069 51466 DST(0,0) = (l0 + l1 + 1) >> 1;
1070 51466 DST(1,0) = (l0 + l1 * 2 + l2 + 2) >> 2;
1071 51466 DST(0,1) = DST(2,0) = (l1 + l2 + 1) >> 1;
1072 51466 DST(1,1) = DST(3,0) = (l1 + l2 * 2 + l3 + 2) >> 2;
1073 51466 DST(0,2) = DST(2,1) = (l2 + l3 + 1) >> 1;
1074 51466 DST(1,2) = DST(3,1) = (l2 + l3 * 3 + 2) >> 2;
1075 51466 DST(0,3) = DST(1,3) = DST(2,2) = DST(2,3) = DST(3,2) = DST(3,3) = l3;
1076 51466 }
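
One way to read `hor_up_4x4_c`: build a sequence that interleaves 2-tap averages and 3-tap filtered values of the left edge, pad it with `left[3]`, and let each row start two entries further along. The sketch below uses that formulation for 8-bit pixels. It is an illustrative rewrite, not the listed implementation; `sketch_hor_up_4x4` and the padded array `e` are hypothetical names, and `stride` is assumed to be in pixels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 8-bit sketch of the 4x4 horizontal-up predictor. */
static void sketch_hor_up_4x4(uint8_t *dst, ptrdiff_t stride,
                              const uint8_t *left)
{
    int l0 = left[0], l1 = left[1], l2 = left[2], l3 = left[3];
    uint8_t e[10];
    int j;

    assert(stride >= 4);
    e[0] = (uint8_t)((l0 + l1 + 1) >> 1);
    e[1] = (uint8_t)((l0 + l1 * 2 + l2 + 2) >> 2);
    e[2] = (uint8_t)((l1 + l2 + 1) >> 1);
    e[3] = (uint8_t)((l1 + l2 * 2 + l3 + 2) >> 2);
    e[4] = (uint8_t)((l2 + l3 + 1) >> 1);
    e[5] = (uint8_t)((l2 + l3 * 3 + 2) >> 2); /* edge: l3 counted twice */
    memset(e + 6, l3, 4);                     /* padding past the edge */

    for (j = 0; j < 4; j++)                   /* row j starts at e[2 * j] */
        memcpy(dst + j * stride, e + 2 * j, 4);
}
```

The padding plays the role of the `memset_bpc` call in the `def_hor_up` macro, which fills the tail of the lower rows with `left[size - 1]`.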
1077
1078 #define def_hor_up(size) \
1079 static void hor_up_##size##x##size##_c(uint8_t *_dst, ptrdiff_t stride, \
1080 const uint8_t *_left, const uint8_t *top) \
1081 { \
1082 pixel *dst = (pixel *) _dst; \
1083 const pixel *left = (const pixel *) _left; \
1084 int i, j; \
1085 pixel v[size*2 - 2]; \
1086 \
1087 stride /= sizeof(pixel); \
1088 for (i = 0; i < size - 2; i++) { \
1089 v[i*2 ] = (left[i] + left[i + 1] + 1) >> 1; \
1090 v[i*2 + 1] = (left[i] + left[i + 1] * 2 + left[i + 2] + 2) >> 2; \
1091 } \
1092 v[size*2 - 4] = (left[size - 2] + left[size - 1] + 1) >> 1; \
1093 v[size*2 - 3] = (left[size - 2] + left[size - 1] * 3 + 2) >> 2; \
1094 \
1095 for (j = 0; j < size / 2; j++) \
1096 memcpy(dst + j*stride, v + j*2, size * sizeof(pixel)); \
1097 for (j = size / 2; j < size; j++) { \
1098 memcpy(dst + j*stride, v + j*2, (size*2 - 2 - j*2) * sizeof(pixel)); \
1099 memset_bpc(dst + j*stride + size*2 - 2 - j*2, left[size - 1], \
1100 2 + j*2 - size); \
1101 } \
1102 }
1103
1104
7/7
✓ Branch 0 taken 58146 times.
✓ Branch 1 taken 9691 times.
✓ Branch 2 taken 38764 times.
✓ Branch 3 taken 9691 times.
✓ Branch 4 taken 37252 times.
✓ Branch 5 taken 10825 times.
✓ Branch 6 taken 378 times.
145365 def_hor_up(8)
1105
7/7
✓ Branch 0 taken 24976 times.
✓ Branch 1 taken 1784 times.
✓ Branch 2 taken 14272 times.
✓ Branch 3 taken 1784 times.
✓ Branch 4 taken 14080 times.
✓ Branch 5 taken 1952 times.
✓ Branch 6 taken 24 times.
55304 def_hor_up(16)
1106
7/7
✓ Branch 0 taken 5040 times.
✓ Branch 1 taken 168 times.
✓ Branch 2 taken 2688 times.
✓ Branch 3 taken 168 times.
✓ Branch 4 taken 2432 times.
✓ Branch 5 taken 408 times.
✓ Branch 6 taken 16 times.
10584 def_hor_up(32)
1107
1108 #undef DST
1109
1110 #endif /* BIT_DEPTH != 12 */
1111
1112 #if BIT_DEPTH != 8
1113 void ff_vp9dsp_intrapred_init_10(VP9DSPContext *dsp);
1114 #endif
1115 #if BIT_DEPTH != 10
1116 static
1117 #endif
1118 749 av_cold void FUNC(ff_vp9dsp_intrapred_init)(VP9DSPContext *dsp)
1119 {
1120 #define init_intra_pred_bd_aware(tx, sz) \
1121 dsp->intra_pred[tx][TM_VP8_PRED] = tm_##sz##_c; \
1122 dsp->intra_pred[tx][DC_128_PRED] = dc_128_##sz##_c; \
1123 dsp->intra_pred[tx][DC_127_PRED] = dc_127_##sz##_c; \
1124 dsp->intra_pred[tx][DC_129_PRED] = dc_129_##sz##_c
1125
1126 #if BIT_DEPTH == 12
1127 75 ff_vp9dsp_intrapred_init_10(dsp);
1128 #define init_intra_pred(tx, sz) \
1129 init_intra_pred_bd_aware(tx, sz)
1130 #else
1131 #define init_intra_pred(tx, sz) \
1132 dsp->intra_pred[tx][VERT_PRED] = vert_##sz##_c; \
1133 dsp->intra_pred[tx][HOR_PRED] = hor_##sz##_c; \
1134 dsp->intra_pred[tx][DC_PRED] = dc_##sz##_c; \
1135 dsp->intra_pred[tx][DIAG_DOWN_LEFT_PRED] = diag_downleft_##sz##_c; \
1136 dsp->intra_pred[tx][DIAG_DOWN_RIGHT_PRED] = diag_downright_##sz##_c; \
1137 dsp->intra_pred[tx][VERT_RIGHT_PRED] = vert_right_##sz##_c; \
1138 dsp->intra_pred[tx][HOR_DOWN_PRED] = hor_down_##sz##_c; \
1139 dsp->intra_pred[tx][VERT_LEFT_PRED] = vert_left_##sz##_c; \
1140 dsp->intra_pred[tx][HOR_UP_PRED] = hor_up_##sz##_c; \
1141 dsp->intra_pred[tx][LEFT_DC_PRED] = dc_left_##sz##_c; \
1142 dsp->intra_pred[tx][TOP_DC_PRED] = dc_top_##sz##_c; \
1143 init_intra_pred_bd_aware(tx, sz)
1144 #endif
1145
1146 749 init_intra_pred(TX_4X4, 4x4);
1147 749 init_intra_pred(TX_8X8, 8x8);
1148 749 init_intra_pred(TX_16X16, 16x16);
1149 749 init_intra_pred(TX_32X32, 32x32);
1150
1151 #undef init_intra_pred
1152 #undef init_intra_pred_bd_aware
1153 749 }
1154
1155 #define itxfm_wrapper(type_a, type_b, sz, bits, has_dconly) \
1156 static void type_a##_##type_b##_##sz##x##sz##_add_c(uint8_t *_dst, \
1157 ptrdiff_t stride, \
1158 int16_t *_block, int eob) \
1159 { \
1160 int i, j; \
1161 pixel *dst = (pixel *) _dst; \
1162 dctcoef *block = (dctcoef *) _block, tmp[sz * sz], out[sz]; \
1163 \
1164 stride /= sizeof(pixel); \
1165 if (has_dconly && eob == 1) { \
1166 const int t = ((((dctint) block[0] * 11585 + (1 << 13)) >> 14) \
1167 * 11585 + (1 << 13)) >> 14; \
1168 block[0] = 0; \
1169 for (i = 0; i < sz; i++) { \
1170 for (j = 0; j < sz; j++) \
1171 dst[j * stride] = av_clip_pixel(dst[j * stride] + \
1172 (bits ? \
1173 (int)(t + (1U << (bits - 1))) >> bits : \
1174 t)); \
1175 dst++; \
1176 } \
1177 return; \
1178 } \
1179 \
1180 for (i = 0; i < sz; i++) \
1181 type_a##sz##_1d(block + i, sz, tmp + i * sz, 0); \
1182 memset(block, 0, sz * sz * sizeof(*block)); \
1183 for (i = 0; i < sz; i++) { \
1184 type_b##sz##_1d(tmp + i, sz, out, 1); \
1185 for (j = 0; j < sz; j++) \
1186 dst[j * stride] = av_clip_pixel(dst[j * stride] + \
1187 (bits ? \
1188 (int)(out[j] + (1U << (bits - 1))) >> bits : \
1189 out[j])); \
1190 dst++; \
1191 } \
1192 }
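
When `has_dconly` is set and `eob == 1`, the wrapper skips both 1-D passes: it scales the DC coefficient by 11585/16384 twice (once per pass), rounds by `bits`, and adds the same value to every pixel. The following is a rough 8-bit sketch of that path. `sketch_dc_only_add` and `clip_u8` are hypothetical names, `stride` is assumed to be in pixels, and right-shifting a negative `int` is assumed to be arithmetic, as the listed code also assumes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for av_clip_pixel at 8 bits per sample. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical 8-bit sketch of the DC-only branch of itxfm_wrapper. */
static void sketch_dc_only_add(uint8_t *dst, ptrdiff_t stride,
                               int16_t *block, int sz, int bits)
{
    int i, j, t, delta;

    assert(sz > 0 && bits >= 0);
    /* Two rounded multiplications by 11585 / 2^14, one per 1-D pass. */
    t = (((block[0] * 11585 + (1 << 13)) >> 14) * 11585 + (1 << 13)) >> 14;
    delta = bits ? (t + (1 << (bits - 1))) >> bits : t;
    block[0] = 0; /* the coefficient is consumed, as in the listing */

    for (j = 0; j < sz; j++)
        for (i = 0; i < sz; i++)
            dst[i + j * stride] = clip_u8(dst[i + j * stride] + delta);
}
```

With `block[0] = 64`, `sz = 4` and `bits = 4`, the intermediate values are 45, then 32, and the per-pixel delta is 2.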
1193
1194 #define itxfm_wrap(sz, bits) \
1195 itxfm_wrapper(idct, idct, sz, bits, 1) \
1196 itxfm_wrapper(iadst, idct, sz, bits, 0) \
1197 itxfm_wrapper(idct, iadst, sz, bits, 0) \
1198 itxfm_wrapper(iadst, iadst, sz, bits, 0)
1199
1200 #define IN(x) ((dctint) in[(x) * stride])
1201
1202 3507364 static av_always_inline void idct4_1d(const dctcoef *in, ptrdiff_t stride,
1203 dctcoef *out, int pass)
1204 {
1205 dctint t0, t1, t2, t3;
1206
1207 3507364 t0 = ((IN(0) + IN(2)) * 11585 + (1 << 13)) >> 14;
1208 3507364 t1 = ((IN(0) - IN(2)) * 11585 + (1 << 13)) >> 14;
1209 3507364 t2 = (IN(1) * 6270 - IN(3) * 15137 + (1 << 13)) >> 14;
1210 3507364 t3 = (IN(1) * 15137 + IN(3) * 6270 + (1 << 13)) >> 14;
1211
1212 3507364 out[0] = t0 + t3;
1213 3507364 out[1] = t1 + t2;
1214 3507364 out[2] = t1 - t2;
1215 3507364 out[3] = t0 - t3;
1216 3507364 }
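
`idct4_1d` is a 4-point butterfly: the even inputs are scaled by 11585/16384, the odd inputs are rotated with the constants 6270 and 15137, and the outputs are sums and differences of the two halves. The stand-alone sketch below mirrors the listed arithmetic. It uses plain `int` for both `dctcoef` and `dctint`, which is an assumption, and `sketch_idct4_1d` is a hypothetical name. Right-shifting a negative `int` is assumed to be arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-alone copy of the 4-point inverse DCT pass.
 * `stride` is the distance between consecutive inputs, as in IN(x). */
static void sketch_idct4_1d(const int *in, ptrdiff_t stride, int *out)
{
    int t0, t1, t2, t3;

    assert(stride > 0);
    t0 = ((in[0] + in[2 * stride]) * 11585 + (1 << 13)) >> 14;
    t1 = ((in[0] - in[2 * stride]) * 11585 + (1 << 13)) >> 14;
    t2 = (in[1 * stride] * 6270 - in[3 * stride] * 15137 + (1 << 13)) >> 14;
    t3 = (in[1 * stride] * 15137 + in[3 * stride] * 6270 + (1 << 13)) >> 14;

    out[0] = t0 + t3;
    out[1] = t1 + t2;
    out[2] = t1 - t2;
    out[3] = t0 - t3;
}
```

A DC-only input `{64, 0, 0, 0}` produces four equal outputs of 45, which is the first of the two scalings applied by the DC-only shortcut in `itxfm_wrapper`.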
1217
1218 1519020 static av_always_inline void iadst4_1d(const dctcoef *in, ptrdiff_t stride,
1219 dctcoef *out, int pass)
1220 {
1221 dctint t0, t1, t2, t3;
1222
1223 1519020 t0 = 5283 * IN(0) + 15212 * IN(2) + 9929 * IN(3);
1224 1519020 t1 = 9929 * IN(0) - 5283 * IN(2) - 15212 * IN(3);
1225 1519020 t2 = 13377 * (IN(0) - IN(2) + IN(3));
1226 1519020 t3 = 13377 * IN(1);
1227
1228 1519020 out[0] = (t0 + t3 + (1 << 13)) >> 14;
1229 1519020 out[1] = (t1 + t3 + (1 << 13)) >> 14;
1230 1519020 out[2] = (t2 + (1 << 13)) >> 14;
1231 1519020 out[3] = (t0 + t1 - t3 + (1 << 13)) >> 14;
1232 1519020 }
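
Unlike the DCT pass, `iadst4_1d` keeps full-precision products in `t0`..`t3` and rounds only once, when the outputs are formed. A stand-alone sketch of the same arithmetic follows. Plain `int` stands in for `dctcoef` and `dctint`, which is an assumption, and `sketch_iadst4_1d` is a hypothetical name. Right-shifting a negative `int` is assumed to be arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-alone copy of the 4-point inverse ADST pass.
 * `stride` is the distance between consecutive inputs, as in IN(x). */
static void sketch_iadst4_1d(const int *in, ptrdiff_t stride, int *out)
{
    int i0 = in[0], i1 = in[1 * stride];
    int i2 = in[2 * stride], i3 = in[3 * stride];
    int t0, t1, t2, t3;

    assert(stride > 0);
    t0 =  5283 * i0 + 15212 * i2 +  9929 * i3;
    t1 =  9929 * i0 -  5283 * i2 - 15212 * i3;
    t2 = 13377 * (i0 - i2 + i3);
    t3 = 13377 * i1;

    /* Single rounding step: add 2^13, then shift right by 14. */
    out[0] = (t0 + t3      + (1 << 13)) >> 14;
    out[1] = (t1 + t3      + (1 << 13)) >> 14;
    out[2] = (t2           + (1 << 13)) >> 14;
    out[3] = (t0 + t1 - t3 + (1 << 13)) >> 14;
}
```

For the input `{64, 0, 0, 0}` the outputs rise monotonically (21, 39, 52, 59), in contrast to the flat response of the DCT pass to the same input.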
1233
1234
13/13
✓ Branch 0 taken 113652 times.
✓ Branch 1 taken 1514444 times.
✓ Branch 2 taken 2113814 times.
✓ Branch 3 taken 454608 times.
✓ Branch 4 taken 5180720 times.
✓ Branch 5 taken 1295180 times.
✓ Branch 6 taken 1181528 times.
✓ Branch 7 taken 1627046 times.
✓ Branch 8 taken 332916 times.
✓ Branch 10 taken 5326656 times.
✓ Branch 11 taken 1331664 times.
✓ Branch 12 taken 1331664 times.
✓ Branch 13 taken 332916 times.
36188284 itxfm_wrap(4, 4)
1235
1236 2969888 static av_always_inline void idct8_1d(const dctcoef *in, ptrdiff_t stride,
1237 dctcoef *out, int pass)
1238 {
1239 dctint t0, t0a, t1, t1a, t2, t2a, t3, t3a, t4, t4a, t5, t5a, t6, t6a, t7, t7a;
1240
1241 2969888 t0a = ((IN(0) + IN(4)) * 11585 + (1 << 13)) >> 14;
1242 2969888 t1a = ((IN(0) - IN(4)) * 11585 + (1 << 13)) >> 14;
1243 2969888 t2a = (IN(2) * 6270 - IN(6) * 15137 + (1 << 13)) >> 14;
1244 2969888 t3a = (IN(2) * 15137 + IN(6) * 6270 + (1 << 13)) >> 14;
1245 2969888 t4a = (IN(1) * 3196 - IN(7) * 16069 + (1 << 13)) >> 14;
1246 2969888 t5a = (IN(5) * 13623 - IN(3) * 9102 + (1 << 13)) >> 14;
1247 2969888 t6a = (IN(5) * 9102 + IN(3) * 13623 + (1 << 13)) >> 14;
1248 2969888 t7a = (IN(1) * 16069 + IN(7) * 3196 + (1 << 13)) >> 14;
1249
1250 2969888 t0 = t0a + t3a;
1251 2969888 t1 = t1a + t2a;
1252 2969888 t2 = t1a - t2a;
1253 2969888 t3 = t0a - t3a;
1254 2969888 t4 = t4a + t5a;
1255 2969888 t5a = t4a - t5a;
1256 2969888 t7 = t7a + t6a;
1257 2969888 t6a = t7a - t6a;
1258
1259 2969888 t5 = ((t6a - t5a) * 11585 + (1 << 13)) >> 14;
1260 2969888 t6 = ((t6a + t5a) * 11585 + (1 << 13)) >> 14;
1261
1262 2969888 out[0] = t0 + t7;
1263 2969888 out[1] = t1 + t6;
1264 2969888 out[2] = t2 + t5;
1265 2969888 out[3] = t3 + t4;
1266 2969888 out[4] = t3 - t4;
1267 2969888 out[5] = t2 - t5;
1268 2969888 out[6] = t1 - t6;
1269 2969888 out[7] = t0 - t7;
1270 2969888 }
1271
1272 761408 static av_always_inline void iadst8_1d(const dctcoef *in, ptrdiff_t stride,
1273 dctcoef *out, int pass)
1274 {
1275 dctint t0, t0a, t1, t1a, t2, t2a, t3, t3a, t4, t4a, t5, t5a, t6, t6a, t7, t7a;
1276
1277 761408 t0a = 16305 * IN(7) + 1606 * IN(0);
1278 761408 t1a = 1606 * IN(7) - 16305 * IN(0);
1279 761408 t2a = 14449 * IN(5) + 7723 * IN(2);
1280 761408 t3a = 7723 * IN(5) - 14449 * IN(2);
1281 761408 t4a = 10394 * IN(3) + 12665 * IN(4);
1282 761408 t5a = 12665 * IN(3) - 10394 * IN(4);
1283 761408 t6a = 4756 * IN(1) + 15679 * IN(6);
1284 761408 t7a = 15679 * IN(1) - 4756 * IN(6);
1285
1286 761408 t0 = (t0a + t4a + (1 << 13)) >> 14;
1287 761408 t1 = (t1a + t5a + (1 << 13)) >> 14;
1288 761408 t2 = (t2a + t6a + (1 << 13)) >> 14;
1289 761408 t3 = (t3a + t7a + (1 << 13)) >> 14;
1290 761408 t4 = (t0a - t4a + (1 << 13)) >> 14;
1291 761408 t5 = (t1a - t5a + (1 << 13)) >> 14;
1292 761408 t6 = (t2a - t6a + (1 << 13)) >> 14;
1293 761408 t7 = (t3a - t7a + (1 << 13)) >> 14;
1294
1295 761408 t4a = 15137U * t4 + 6270U * t5;
1296 761408 t5a = 6270U * t4 - 15137U * t5;
1297 761408 t6a = 15137U * t7 - 6270U * t6;
1298 761408 t7a = 6270U * t7 + 15137U * t6;
1299
1300 761408 out[0] = t0 + t2;
1301 761408 out[7] = -(t1 + t3);
1302 761408 t2 = t0 - t2;
1303 761408 t3 = t1 - t3;
1304
1305 761408 out[1] = -((dctint)((1U << 13) + t4a + t6a) >> 14);
1306 761408 out[6] = (dctint)((1U << 13) + t5a + t7a) >> 14;
1307 761408 t6 = (dctint)((1U << 13) + t4a - t6a) >> 14;
1308 761408 t7 = (dctint)((1U << 13) + t5a - t7a) >> 14;
1309
1310 761408 out[3] = -((dctint)((t2 + t3) * 11585U + (1 << 13)) >> 14);
1311 761408 out[4] = (dctint)((t2 - t3) * 11585U + (1 << 13)) >> 14;
1312 761408 out[2] = (dctint)((t6 + t7) * 11585U + (1 << 13)) >> 14;
1313 761408 out[5] = -((dctint)((t6 - t7) * 11585U + (1 << 13)) >> 14);
1314 761408 }
1315
1316
13/13
✓ Branch 0 taken 34570 times.
✓ Branch 1 taken 755189 times.
✓ Branch 2 taken 2287049 times.
✓ Branch 3 taken 276560 times.
✓ Branch 4 taken 5048976 times.
✓ Branch 5 taken 631122 times.
✓ Branch 6 taken 596552 times.
✓ Branch 7 taken 1343665 times.
✓ Branch 8 taken 158637 times.
✓ Branch 10 taken 10152768 times.
✓ Branch 11 taken 1269096 times.
✓ Branch 12 taken 1269096 times.
✓ Branch 13 taken 158637 times.
42826592 itxfm_wrap(8, 5)
1317
1318 1130576 static av_always_inline void idct16_1d(const dctcoef *in, ptrdiff_t stride,
1319 dctcoef *out, int pass)
1320 {
1321 dctint t0, t1, t2, t3, t4, t5, t6, t7, t8, t9, t10, t11, t12, t13, t14, t15;
1322 dctint t0a, t1a, t2a, t3a, t4a, t5a, t6a, t7a;
1323 dctint t8a, t9a, t10a, t11a, t12a, t13a, t14a, t15a;
1324
1325 1130576 t0a = (dctint)((IN(0) + IN(8)) * 11585U + (1 << 13)) >> 14;
1326 1130576 t1a = (dctint)((IN(0) - IN(8)) * 11585U + (1 << 13)) >> 14;
1327 1130576 t2a = (dctint)(IN(4) * 6270U - IN(12) * 15137U + (1 << 13)) >> 14;
1328 1130576 t3a = (dctint)(IN(4) * 15137U + IN(12) * 6270U + (1 << 13)) >> 14;
1329 1130576 t4a = (dctint)(IN(2) * 3196U - IN(14) * 16069U + (1 << 13)) >> 14;
1330 1130576 t7a = (dctint)(IN(2) * 16069U + IN(14) * 3196U + (1 << 13)) >> 14;
1331 1130576 t5a = (dctint)(IN(10) * 13623U - IN(6) * 9102U + (1 << 13)) >> 14;
1332 1130576 t6a = (dctint)(IN(10) * 9102U + IN(6) * 13623U + (1 << 13)) >> 14;
1333 1130576 t8a = (dctint)(IN(1) * 1606U - IN(15) * 16305U + (1 << 13)) >> 14;
1334 1130576 t15a = (dctint)(IN(1) * 16305U + IN(15) * 1606U + (1 << 13)) >> 14;
1335 1130576 t9a = (dctint)(IN(9) * 12665U - IN(7) * 10394U + (1 << 13)) >> 14;
1336 1130576 t14a = (dctint)(IN(9) * 10394U + IN(7) * 12665U + (1 << 13)) >> 14;
1337 1130576 t10a = (dctint)(IN(5) * 7723U - IN(11) * 14449U + (1 << 13)) >> 14;
1338 1130576 t13a = (dctint)(IN(5) * 14449U + IN(11) * 7723U + (1 << 13)) >> 14;
1339 1130576 t11a = (dctint)(IN(13) * 15679U - IN(3) * 4756U + (1 << 13)) >> 14;
1340 1130576 t12a = (dctint)(IN(13) * 4756U + IN(3) * 15679U + (1 << 13)) >> 14;
1341
1342 1130576 t0 = t0a + t3a;
1343 1130576 t1 = t1a + t2a;
1344 1130576 t2 = t1a - t2a;
1345 1130576 t3 = t0a - t3a;
1346 1130576 t4 = t4a + t5a;
1347 1130576 t5 = t4a - t5a;
1348 1130576 t6 = t7a - t6a;
1349 1130576 t7 = t7a + t6a;
1350 1130576 t8 = t8a + t9a;
1351 1130576 t9 = t8a - t9a;
1352 1130576 t10 = t11a - t10a;
1353 1130576 t11 = t11a + t10a;
1354 1130576 t12 = t12a + t13a;
1355 1130576 t13 = t12a - t13a;
1356 1130576 t14 = t15a - t14a;
1357 1130576 t15 = t15a + t14a;
1358
1359 1130576 t5a = (dctint)((t6 - t5) * 11585U + (1 << 13)) >> 14;
1360 1130576 t6a = (dctint)((t6 + t5) * 11585U + (1 << 13)) >> 14;
1361 1130576 t9a = (dctint)( t14 * 6270U - t9 * 15137U + (1 << 13)) >> 14;
1362 1130576 t14a = (dctint)( t14 * 15137U + t9 * 6270U + (1 << 13)) >> 14;
1363 1130576 t10a = (dctint)(-(t13 * 15137U + t10 * 6270U) + (1 << 13)) >> 14;
1364 1130576 t13a = (dctint)( t13 * 6270U - t10 * 15137U + (1 << 13)) >> 14;
1365
1366 1130576 t0a = t0 + t7;
1367 1130576 t1a = t1 + t6a;
1368 1130576 t2a = t2 + t5a;
1369 1130576 t3a = t3 + t4;
1370 1130576 t4 = t3 - t4;
1371 1130576 t5 = t2 - t5a;
1372 1130576 t6 = t1 - t6a;
1373 1130576 t7 = t0 - t7;
1374 1130576 t8a = t8 + t11;
1375 1130576 t9 = t9a + t10a;
1376 1130576 t10 = t9a - t10a;
1377 1130576 t11a = t8 - t11;
1378 1130576 t12a = t15 - t12;
1379 1130576 t13 = t14a - t13a;
1380 1130576 t14 = t14a + t13a;
1381 1130576 t15a = t15 + t12;
1382
1383 1130576 t10a = (dctint)((t13 - t10) * 11585U + (1 << 13)) >> 14;
1384 1130576 t13a = (dctint)((t13 + t10) * 11585U + (1 << 13)) >> 14;
1385 1130576 t11 = (dctint)((t12a - t11a) * 11585U + (1 << 13)) >> 14;
1386 1130576 t12 = (dctint)((t12a + t11a) * 11585U + (1 << 13)) >> 14;
1387
1388 1130576 out[ 0] = t0a + t15a;
1389 1130576 out[ 1] = t1a + t14;
1390 1130576 out[ 2] = t2a + t13a;
1391 1130576 out[ 3] = t3a + t12;
1392 1130576 out[ 4] = t4 + t11;
1393 1130576 out[ 5] = t5 + t10a;
1394 1130576 out[ 6] = t6 + t9;
1395 1130576 out[ 7] = t7 + t8a;
1396 1130576 out[ 8] = t7 - t8a;
1397 1130576 out[ 9] = t6 - t9;
1398 1130576 out[10] = t5 - t10a;
1399 1130576 out[11] = t4 - t11;
1400 1130576 out[12] = t3a - t12;
1401 1130576 out[13] = t2a - t13a;
1402 1130576 out[14] = t1a - t14;
1403 1130576 out[15] = t0a - t15a;
1404 1130576 }
1405
1406 334320 static av_always_inline void iadst16_1d(const dctcoef *in, ptrdiff_t stride,
1407 dctcoef *out, int pass)
1408 {
1409 dctint t0, t1, t2, t3, t4, t5, t6, t7, t8, t9, t10, t11, t12, t13, t14, t15;
1410 dctint t0a, t1a, t2a, t3a, t4a, t5a, t6a, t7a;
1411 dctint t8a, t9a, t10a, t11a, t12a, t13a, t14a, t15a;
1412
1413 334320 t0 = IN(15) * 16364U + IN(0) * 804U;
1414 334320 t1 = IN(15) * 804U - IN(0) * 16364U;
1415 334320 t2 = IN(13) * 15893U + IN(2) * 3981U;
1416 334320 t3 = IN(13) * 3981U - IN(2) * 15893U;
1417 334320 t4 = IN(11) * 14811U + IN(4) * 7005U;
1418 334320 t5 = IN(11) * 7005U - IN(4) * 14811U;
1419 334320 t6 = IN(9) * 13160U + IN(6) * 9760U;
1420 334320 t7 = IN(9) * 9760U - IN(6) * 13160U;
1421 334320 t8 = IN(7) * 11003U + IN(8) * 12140U;
1422 334320 t9 = IN(7) * 12140U - IN(8) * 11003U;
1423 334320 t10 = IN(5) * 8423U + IN(10) * 14053U;
1424 334320 t11 = IN(5) * 14053U - IN(10) * 8423U;
1425 334320 t12 = IN(3) * 5520U + IN(12) * 15426U;
1426 334320 t13 = IN(3) * 15426U - IN(12) * 5520U;
1427 334320 t14 = IN(1) * 2404U + IN(14) * 16207U;
1428 334320 t15 = IN(1) * 16207U - IN(14) * 2404U;
1429
1430 334320 t0a = (dctint)((1U << 13) + t0 + t8 ) >> 14;
1431 334320 t1a = (dctint)((1U << 13) + t1 + t9 ) >> 14;
1432 334320 t2a = (dctint)((1U << 13) + t2 + t10) >> 14;
1433 334320 t3a = (dctint)((1U << 13) + t3 + t11) >> 14;
1434 334320 t4a = (dctint)((1U << 13) + t4 + t12) >> 14;
1435 334320 t5a = (dctint)((1U << 13) + t5 + t13) >> 14;
1436 334320 t6a = (dctint)((1U << 13) + t6 + t14) >> 14;
1437 334320 t7a = (dctint)((1U << 13) + t7 + t15) >> 14;
1438 334320 t8a = (dctint)((1U << 13) + t0 - t8 ) >> 14;
1439 334320 t9a = (dctint)((1U << 13) + t1 - t9 ) >> 14;
1440 334320 t10a = (dctint)((1U << 13) + t2 - t10) >> 14;
1441 334320 t11a = (dctint)((1U << 13) + t3 - t11) >> 14;
1442 334320 t12a = (dctint)((1U << 13) + t4 - t12) >> 14;
1443 334320 t13a = (dctint)((1U << 13) + t5 - t13) >> 14;
1444 334320 t14a = (dctint)((1U << 13) + t6 - t14) >> 14;
1445 334320 t15a = (dctint)((1U << 13) + t7 - t15) >> 14;
1446
1447 334320 t8 = t8a * 16069U + t9a * 3196U;
1448 334320 t9 = t8a * 3196U - t9a * 16069U;
1449 334320 t10 = t10a * 9102U + t11a * 13623U;
1450 334320 t11 = t10a * 13623U - t11a * 9102U;
1451 334320 t12 = t13a * 16069U - t12a * 3196U;
1452 334320 t13 = t13a * 3196U + t12a * 16069U;
1453 334320 t14 = t15a * 9102U - t14a * 13623U;
1454 334320 t15 = t15a * 13623U + t14a * 9102U;
1455
1456 334320 t0 = t0a + t4a;
1457 334320 t1 = t1a + t5a;
1458 334320 t2 = t2a + t6a;
1459 334320 t3 = t3a + t7a;
1460 334320 t4 = t0a - t4a;
1461 334320 t5 = t1a - t5a;
1462 334320 t6 = t2a - t6a;
1463 334320 t7 = t3a - t7a;
1464 334320 t8a = (dctint)((1U << 13) + t8 + t12) >> 14;
1465 334320 t9a = (dctint)((1U << 13) + t9 + t13) >> 14;
1466 334320 t10a = (dctint)((1U << 13) + t10 + t14) >> 14;
1467 334320 t11a = (dctint)((1U << 13) + t11 + t15) >> 14;
1468 334320 t12a = (dctint)((1U << 13) + t8 - t12) >> 14;
1469 334320 t13a = (dctint)((1U << 13) + t9 - t13) >> 14;
1470 334320 t14a = (dctint)((1U << 13) + t10 - t14) >> 14;
1471 334320 t15a = (dctint)((1U << 13) + t11 - t15) >> 14;
1472
1473 334320 t4a = t4 * 15137U + t5 * 6270U;
1474 334320 t5a = t4 * 6270U - t5 * 15137U;
1475 334320 t6a = t7 * 15137U - t6 * 6270U;
1476 334320 t7a = t7 * 6270U + t6 * 15137U;
1477 334320 t12 = t12a * 15137U + t13a * 6270U;
1478 334320 t13 = t12a * 6270U - t13a * 15137U;
1479 334320 t14 = t15a * 15137U - t14a * 6270U;
1480 334320 t15 = t15a * 6270U + t14a * 15137U;
1481
1482 334320 out[ 0] = t0 + t2;
1483 334320 out[15] = -(t1 + t3);
1484 334320 t2a = t0 - t2;
1485 334320 t3a = t1 - t3;
1486 334320 out[ 3] = -((dctint)((1U << 13) + t4a + t6a) >> 14);
1487 334320 out[12] = (dctint)((1U << 13) + t5a + t7a) >> 14;
1488 334320 t6 = (dctint)((1U << 13) + t4a - t6a) >> 14;
1489 334320 t7 = (dctint)((1U << 13) + t5a - t7a) >> 14;
1490 334320 out[ 1] = -(t8a + t10a);
1491 334320 out[14] = t9a + t11a;
1492 334320 t10 = t8a - t10a;
1493 334320 t11 = t9a - t11a;
1494 334320 out[ 2] = (dctint)((1U << 13) + t12 + t14) >> 14;
1495 334320 out[13] = -((dctint)((1U << 13) + t13 + t15) >> 14);
1496 334320 t14a = (dctint)((1U << 13) + t12 - t14) >> 14;
1497 334320 t15a = (dctint)((1U << 13) + t13 - t15) >> 14;
1498
1499 334320 out[ 7] = (dctint)(-(t2a + t3a) * 11585U + (1 << 13)) >> 14;
1500 334320 out[ 8] = (dctint)( (t2a - t3a) * 11585U + (1 << 13)) >> 14;
1501 334320 out[ 4] = (dctint)( (t7 + t6) * 11585U + (1 << 13)) >> 14;
1502 334320 out[11] = (dctint)( (t7 - t6) * 11585U + (1 << 13)) >> 14;
1503 334320 out[ 6] = (dctint)( (t11 + t10) * 11585U + (1 << 13)) >> 14;
1504 334320 out[ 9] = (dctint)( (t11 - t10) * 11585U + (1 << 13)) >> 14;
1505 334320 out[ 5] = (dctint)(-(t14a + t15a) * 11585U + (1 << 13)) >> 14;
1506 334320 out[10] = (dctint)( (t14a - t15a) * 11585U + (1 << 13)) >> 14;
1507 334320 }
1508
1509
13/13
✓ Branch 0 taken 8406 times.
✓ Branch 1 taken 309838 times.
✓ Branch 2 taken 2169540 times.
✓ Branch 3 taken 134496 times.
✓ Branch 4 taken 4641120 times.
✓ Branch 5 taken 290070 times.
✓ Branch 6 taken 281664 times.
✓ Branch 7 taken 468388 times.
✓ Branch 8 taken 28174 times.
✓ Branch 10 taken 7212544 times.
✓ Branch 11 taken 450784 times.
✓ Branch 12 taken 450784 times.
✓ Branch 13 taken 28174 times.
31049360 itxfm_wrap(16, 6)
1510
1511 649536 static av_always_inline void idct32_1d(const dctcoef *in, ptrdiff_t stride,
1512 dctcoef *out, int pass)
1513 {
1514 649536 dctint t0a = (dctint)((IN(0) + IN(16)) * 11585U + (1 << 13)) >> 14;
1515 649536 dctint t1a = (dctint)((IN(0) - IN(16)) * 11585U + (1 << 13)) >> 14;
1516 649536 dctint t2a = (dctint)(IN( 8) * 6270U - IN(24) * 15137U + (1 << 13)) >> 14;
1517 649536 dctint t3a = (dctint)(IN( 8) * 15137U + IN(24) * 6270U + (1 << 13)) >> 14;
1518 649536 dctint t4a = (dctint)(IN( 4) * 3196U - IN(28) * 16069U + (1 << 13)) >> 14;
1519 649536 dctint t7a = (dctint)(IN( 4) * 16069U + IN(28) * 3196U + (1 << 13)) >> 14;
1520 649536 dctint t5a = (dctint)(IN(20) * 13623U - IN(12) * 9102U + (1 << 13)) >> 14;
1521 649536 dctint t6a = (dctint)(IN(20) * 9102U + IN(12) * 13623U + (1 << 13)) >> 14;
1522 649536 dctint t8a = (dctint)(IN( 2) * 1606U - IN(30) * 16305U + (1 << 13)) >> 14;
1523 649536 dctint t15a = (dctint)(IN( 2) * 16305U + IN(30) * 1606U + (1 << 13)) >> 14;
1524 649536 dctint t9a = (dctint)(IN(18) * 12665U - IN(14) * 10394U + (1 << 13)) >> 14;
1525 649536 dctint t14a = (dctint)(IN(18) * 10394U + IN(14) * 12665U + (1 << 13)) >> 14;
1526 649536 dctint t10a = (dctint)(IN(10) * 7723U - IN(22) * 14449U + (1 << 13)) >> 14;
1527 649536 dctint t13a = (dctint)(IN(10) * 14449U + IN(22) * 7723U + (1 << 13)) >> 14;
1528 649536 dctint t11a = (dctint)(IN(26) * 15679U - IN( 6) * 4756U + (1 << 13)) >> 14;
1529 649536 dctint t12a = (dctint)(IN(26) * 4756U + IN( 6) * 15679U + (1 << 13)) >> 14;
1530 649536 dctint t16a = (dctint)(IN( 1) * 804U - IN(31) * 16364U + (1 << 13)) >> 14;
1531 649536 dctint t31a = (dctint)(IN( 1) * 16364U + IN(31) * 804U + (1 << 13)) >> 14;
1532 649536 dctint t17a = (dctint)(IN(17) * 12140U - IN(15) * 11003U + (1 << 13)) >> 14;
1533 649536 dctint t30a = (dctint)(IN(17) * 11003U + IN(15) * 12140U + (1 << 13)) >> 14;
1534 649536 dctint t18a = (dctint)(IN( 9) * 7005U - IN(23) * 14811U + (1 << 13)) >> 14;
1535 649536 dctint t29a = (dctint)(IN( 9) * 14811U + IN(23) * 7005U + (1 << 13)) >> 14;
1536 649536 dctint t19a = (dctint)(IN(25) * 15426U - IN( 7) * 5520U + (1 << 13)) >> 14;
1537 649536 dctint t28a = (dctint)(IN(25) * 5520U + IN( 7) * 15426U + (1 << 13)) >> 14;
1538 649536 dctint t20a = (dctint)(IN( 5) * 3981U - IN(27) * 15893U + (1 << 13)) >> 14;
1539 649536 dctint t27a = (dctint)(IN( 5) * 15893U + IN(27) * 3981U + (1 << 13)) >> 14;
1540 649536 dctint t21a = (dctint)(IN(21) * 14053U - IN(11) * 8423U + (1 << 13)) >> 14;
1541 649536 dctint t26a = (dctint)(IN(21) * 8423U + IN(11) * 14053U + (1 << 13)) >> 14;
1542 649536 dctint t22a = (dctint)(IN(13) * 9760U - IN(19) * 13160U + (1 << 13)) >> 14;
1543 649536 dctint t25a = (dctint)(IN(13) * 13160U + IN(19) * 9760U + (1 << 13)) >> 14;
1544 649536 dctint t23a = (dctint)(IN(29) * 16207U - IN( 3) * 2404U + (1 << 13)) >> 14;
1545 649536 dctint t24a = (dctint)(IN(29) * 2404U + IN( 3) * 16207U + (1 << 13)) >> 14;
1546
1547 649536 dctint t0 = t0a + t3a;
1548 649536 dctint t1 = t1a + t2a;
1549 649536 dctint t2 = t1a - t2a;
1550 649536 dctint t3 = t0a - t3a;
1551 649536 dctint t4 = t4a + t5a;
1552 649536 dctint t5 = t4a - t5a;
1553 649536 dctint t6 = t7a - t6a;
1554 649536 dctint t7 = t7a + t6a;
1555 649536 dctint t8 = t8a + t9a;
1556 649536 dctint t9 = t8a - t9a;
1557 649536 dctint t10 = t11a - t10a;
1558 649536 dctint t11 = t11a + t10a;
1559 649536 dctint t12 = t12a + t13a;
1560 649536 dctint t13 = t12a - t13a;
1561 649536 dctint t14 = t15a - t14a;
1562 649536 dctint t15 = t15a + t14a;
1563 649536 dctint t16 = t16a + t17a;
1564 649536 dctint t17 = t16a - t17a;
1565 649536 dctint t18 = t19a - t18a;
1566 649536 dctint t19 = t19a + t18a;
1567 649536 dctint t20 = t20a + t21a;
1568 649536 dctint t21 = t20a - t21a;
1569 649536 dctint t22 = t23a - t22a;
1570 649536 dctint t23 = t23a + t22a;
1571 649536 dctint t24 = t24a + t25a;
1572 649536 dctint t25 = t24a - t25a;
1573 649536 dctint t26 = t27a - t26a;
1574 649536 dctint t27 = t27a + t26a;
1575 649536 dctint t28 = t28a + t29a;
1576 649536 dctint t29 = t28a - t29a;
1577 649536 dctint t30 = t31a - t30a;
1578 649536 dctint t31 = t31a + t30a;
1579
1580 649536 t5a = (dctint)((t6 - t5) * 11585U + (1 << 13)) >> 14;
1581 649536 t6a = (dctint)((t6 + t5) * 11585U + (1 << 13)) >> 14;
1582 649536 t9a = (dctint)( t14 * 6270U - t9 * 15137U + (1 << 13)) >> 14;
1583 649536 t14a = (dctint)( t14 * 15137U + t9 * 6270U + (1 << 13)) >> 14;
1584 649536 t10a = (dctint)(-(t13 * 15137U + t10 * 6270U) + (1 << 13)) >> 14;
1585 649536 t13a = (dctint)( t13 * 6270U - t10 * 15137U + (1 << 13)) >> 14;
1586 649536 t17a = (dctint)( t30 * 3196U - t17 * 16069U + (1 << 13)) >> 14;
1587 649536 t30a = (dctint)( t30 * 16069U + t17 * 3196U + (1 << 13)) >> 14;
1588 649536 t18a = (dctint)(-(t29 * 16069U + t18 * 3196U) + (1 << 13)) >> 14;
1589 649536 t29a = (dctint)( t29 * 3196U - t18 * 16069U + (1 << 13)) >> 14;
1590 649536 t21a = (dctint)( t26 * 13623U - t21 * 9102U + (1 << 13)) >> 14;
1591 649536 t26a = (dctint)( t26 * 9102U + t21 * 13623U + (1 << 13)) >> 14;
1592 649536 t22a = (dctint)(-(t25 * 9102U + t22 * 13623U) + (1 << 13)) >> 14;
1593 649536 t25a = (dctint)( t25 * 13623U - t22 * 9102U + (1 << 13)) >> 14;
1594
1595 649536 t0a = t0 + t7;
1596 649536 t1a = t1 + t6a;
1597 649536 t2a = t2 + t5a;
1598 649536 t3a = t3 + t4;
1599 649536 t4a = t3 - t4;
1600 649536 t5 = t2 - t5a;
1601 649536 t6 = t1 - t6a;
1602 649536 t7a = t0 - t7;
1603 649536 t8a = t8 + t11;
1604 649536 t9 = t9a + t10a;
1605 649536 t10 = t9a - t10a;
1606 649536 t11a = t8 - t11;
1607 649536 t12a = t15 - t12;
1608 649536 t13 = t14a - t13a;
1609 649536 t14 = t14a + t13a;
1610 649536 t15a = t15 + t12;
1611 649536 t16a = t16 + t19;
1612 649536 t17 = t17a + t18a;
1613 649536 t18 = t17a - t18a;
1614 649536 t19a = t16 - t19;
1615 649536 t20a = t23 - t20;
1616 649536 t21 = t22a - t21a;
1617 649536 t22 = t22a + t21a;
1618 649536 t23a = t23 + t20;
1619 649536 t24a = t24 + t27;
1620 649536 t25 = t25a + t26a;
1621 649536 t26 = t25a - t26a;
1622 649536 t27a = t24 - t27;
1623 649536 t28a = t31 - t28;
1624 649536 t29 = t30a - t29a;
1625 649536 t30 = t30a + t29a;
1626 649536 t31a = t31 + t28;
1627
1628 649536 t10a = (dctint)((t13 - t10) * 11585U + (1 << 13)) >> 14;
1629 649536 t13a = (dctint)((t13 + t10) * 11585U + (1 << 13)) >> 14;
1630 649536 t11 = (dctint)((t12a - t11a) * 11585U + (1 << 13)) >> 14;
1631 649536 t12 = (dctint)((t12a + t11a) * 11585U + (1 << 13)) >> 14;
1632 649536 t18a = (dctint)( t29 * 6270U - t18 * 15137U + (1 << 13)) >> 14;
1633 649536 t29a = (dctint)( t29 * 15137U + t18 * 6270U + (1 << 13)) >> 14;
1634 649536 t19 = (dctint)( t28a * 6270U - t19a * 15137U + (1 << 13)) >> 14;
1635 649536 t28 = (dctint)( t28a * 15137U + t19a * 6270U + (1 << 13)) >> 14;
1636 649536 t20 = (dctint)(-(t27a * 15137U + t20a * 6270U) + (1 << 13)) >> 14;
1637 649536 t27 = (dctint)( t27a * 6270U - t20a * 15137U + (1 << 13)) >> 14;
1638 649536 t21a = (dctint)(-(t26 * 15137U + t21 * 6270U) + (1 << 13)) >> 14;
1639 649536 t26a = (dctint)( t26 * 6270U - t21 * 15137U + (1 << 13)) >> 14;
1640
1641 649536 t0 = t0a + t15a;
1642 649536 t1 = t1a + t14;
1643 649536 t2 = t2a + t13a;
1644 649536 t3 = t3a + t12;
1645 649536 t4 = t4a + t11;
1646 649536 t5a = t5 + t10a;
1647 649536 t6a = t6 + t9;
1648 649536 t7 = t7a + t8a;
1649 649536 t8 = t7a - t8a;
1650 649536 t9a = t6 - t9;
1651 649536 t10 = t5 - t10a;
1652 649536 t11a = t4a - t11;
1653 649536 t12a = t3a - t12;
1654 649536 t13 = t2a - t13a;
1655 649536 t14a = t1a - t14;
1656 649536 t15 = t0a - t15a;
1657 649536 t16 = t16a + t23a;
1658 649536 t17a = t17 + t22;
1659 649536 t18 = t18a + t21a;
1660 649536 t19a = t19 + t20;
1661 649536 t20a = t19 - t20;
1662 649536 t21 = t18a - t21a;
1663 649536 t22a = t17 - t22;
1664 649536 t23 = t16a - t23a;
1665 649536 t24 = t31a - t24a;
1666 649536 t25a = t30 - t25;
1667 649536 t26 = t29a - t26a;
1668 649536 t27a = t28 - t27;
1669 649536 t28a = t28 + t27;
1670 649536 t29 = t29a + t26a;
1671 649536 t30a = t30 + t25;
1672 649536 t31 = t31a + t24a;
1673
1674 649536 t20 = (dctint)((t27a - t20a) * 11585U + (1 << 13)) >> 14;
1675 649536 t27 = (dctint)((t27a + t20a) * 11585U + (1 << 13)) >> 14;
1676 649536 t21a = (dctint)((t26 - t21 ) * 11585U + (1 << 13)) >> 14;
1677 649536 t26a = (dctint)((t26 + t21 ) * 11585U + (1 << 13)) >> 14;
1678 649536 t22 = (dctint)((t25a - t22a) * 11585U + (1 << 13)) >> 14;
1679 649536 t25 = (dctint)((t25a + t22a) * 11585U + (1 << 13)) >> 14;
1680 649536 t23a = (dctint)((t24 - t23 ) * 11585U + (1 << 13)) >> 14;
1681 649536 t24a = (dctint)((t24 + t23 ) * 11585U + (1 << 13)) >> 14;
1682
1683 649536 out[ 0] = t0 + t31;
1684 649536 out[ 1] = t1 + t30a;
1685 649536 out[ 2] = t2 + t29;
1686 649536 out[ 3] = t3 + t28a;
1687 649536 out[ 4] = t4 + t27;
1688 649536 out[ 5] = t5a + t26a;
1689 649536 out[ 6] = t6a + t25;
1690 649536 out[ 7] = t7 + t24a;
1691 649536 out[ 8] = t8 + t23a;
1692 649536 out[ 9] = t9a + t22;
1693 649536 out[10] = t10 + t21a;
1694 649536 out[11] = t11a + t20;
1695 649536 out[12] = t12a + t19a;
1696 649536 out[13] = t13 + t18;
1697 649536 out[14] = t14a + t17a;
1698 649536 out[15] = t15 + t16;
1699 649536 out[16] = t15 - t16;
1700 649536 out[17] = t14a - t17a;
1701 649536 out[18] = t13 - t18;
1702 649536 out[19] = t12a - t19a;
1703 649536 out[20] = t11a - t20;
1704 649536 out[21] = t10 - t21a;
1705 649536 out[22] = t9a - t22;
1706 649536 out[23] = t8 - t23a;
1707 649536 out[24] = t7 - t24a;
1708 649536 out[25] = t6a - t25;
1709 649536 out[26] = t5a - t26a;
1710 649536 out[27] = t4 - t27;
1711 649536 out[28] = t3 - t28a;
1712 649536 out[29] = t2 - t29;
1713 649536 out[30] = t1 - t30a;
1714 649536 out[31] = t0 - t31;
1715 649536 }
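The butterfly stages above all share one fixed-point pattern: multiply by a 14-bit constant, add `1 << 13` for round-to-nearest, then shift right by 14. A minimal standalone sketch of that pattern follows. The names `q14_mul` and `q14_rotate` are hypothetical, and the sketch uses a 64-bit intermediate instead of the listing's unsigned multiplication and `dctint` cast. It also assumes that `>>` on a negative value is an arithmetic shift, which is implementation-defined in C.

```c
#include <stdint.h>

/* Hypothetical helper: x * c / 2^14 with round-to-nearest.
 * Assumes arithmetic right shift for negative values. */
static int32_t q14_mul(int32_t x, int32_t c)
{
    return (int32_t)(((int64_t)x * c + (1 << 13)) >> 14);
}

/* Hypothetical helper: one rotation of the form used in the listing,
 *   lo = a * s - b * c
 *   hi = a * c + b * s
 * with both results rounded back to integer precision. */
static void q14_rotate(int32_t a, int32_t b, int32_t c, int32_t s,
                       int32_t *lo, int32_t *hi)
{
    *lo = (int32_t)(((int64_t)a * s - (int64_t)b * c + (1 << 13)) >> 14);
    *hi = (int32_t)(((int64_t)a * c + (int64_t)b * s + (1 << 13)) >> 14);
}
```

For instance, `q14_mul(x, 11585)` approximates `x / sqrt(2)`, since 11585 is 16384 / sqrt(2) rounded to the nearest integer.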
1716
1717
12/12
✓ Branch 0 taken 3419 times.
✓ Branch 1 taken 10149 times.
✓ Branch 2 taken 3501056 times.
✓ Branch 3 taken 109408 times.
✓ Branch 4 taken 109408 times.
✓ Branch 5 taken 3419 times.
✓ Branch 7 taken 324768 times.
✓ Branch 8 taken 10149 times.
✓ Branch 10 taken 10392576 times.
✓ Branch 11 taken 324768 times.
✓ Branch 12 taken 324768 times.
✓ Branch 13 taken 10149 times.
14666144 itxfm_wrapper(idct, idct, 32, 6, 1)
1718
1719 229432 static av_always_inline void iwht4_1d(const dctcoef *in, ptrdiff_t stride,
1720 dctcoef *out, int pass)
1721 {
1722 int t0, t1, t2, t3, t4;
1723
1724
2/2
✓ Branch 0 taken 114716 times.
✓ Branch 1 taken 114716 times.
229432 if (pass == 0) {
1725 114716 t0 = IN(0) >> 2;
1726 114716 t1 = IN(3) >> 2;
1727 114716 t2 = IN(1) >> 2;
1728 114716 t3 = IN(2) >> 2;
1729 } else {
1730 114716 t0 = IN(0);
1731 114716 t1 = IN(3);
1732 114716 t2 = IN(1);
1733 114716 t3 = IN(2);
1734 }
1735
1736 229432 t0 += t2;
1737 229432 t3 -= t1;
1738 229432 t4 = (t0 - t3) >> 1;
1739 229432 t1 = t4 - t1;
1740 229432 t2 = t4 - t2;
1741 229432 t0 -= t1;
1742 229432 t3 += t2;
1743
1744 229432 out[0] = t0;
1745 229432 out[1] = t1;
1746 229432 out[2] = t2;
1747 229432 out[3] = t3;
1748 229432 }
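The lossless inverse Walsh-Hadamard pass above can be tried in isolation. The sketch below mirrors its arithmetic on a plain contiguous array. The function name `iwht4_1d_sketch` and the array-based signature are assumptions for illustration, since the listing reads its input through the `IN()` macro and a stride.

```c
/* Hypothetical standalone version of one 1-D inverse WHT pass.
 * first_pass != 0 applies the >> 2 input scaling, as in pass 0 above. */
static void iwht4_1d_sketch(const int in[4], int out[4], int first_pass)
{
    int t0, t1, t2, t3, t4;

    if (first_pass) {
        t0 = in[0] >> 2;
        t1 = in[3] >> 2;
        t2 = in[1] >> 2;
        t3 = in[2] >> 2;
    } else {
        t0 = in[0];
        t1 = in[3];
        t2 = in[1];
        t3 = in[2];
    }

    /* Lifting steps, in the same order as the listing. */
    t0 += t2;
    t3 -= t1;
    t4 = (t0 - t3) >> 1;
    t1 = t4 - t1;
    t2 = t4 - t2;
    t0 -= t1;
    t3 += t2;

    out[0] = t0;
    out[1] = t1;
    out[2] = t2;
    out[3] = t3;
}
```

A DC-only input spreads evenly across all four outputs, which is a quick sanity check on the lifting steps.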
1749
1750
6/6
✓ Branch 1 taken 114716 times.
✓ Branch 2 taken 28679 times.
✓ Branch 4 taken 458864 times.
✓ Branch 5 taken 114716 times.
✓ Branch 6 taken 114716 times.
✓ Branch 7 taken 28679 times.
716975 itxfm_wrapper(iwht, iwht, 4, 0, 0)
1751
1752 #undef IN
1753 #undef itxfm_wrapper
1754 #undef itxfm_wrap
1755
1756 674 static av_cold void vp9dsp_itxfm_init(VP9DSPContext *dsp)
1757 {
1758 #define init_itxfm(tx, sz) \
1759 dsp->itxfm_add[tx][DCT_DCT] = idct_idct_##sz##_add_c; \
1760 dsp->itxfm_add[tx][DCT_ADST] = iadst_idct_##sz##_add_c; \
1761 dsp->itxfm_add[tx][ADST_DCT] = idct_iadst_##sz##_add_c; \
1762 dsp->itxfm_add[tx][ADST_ADST] = iadst_iadst_##sz##_add_c
1763
1764 #define init_idct(tx, nm) \
1765 dsp->itxfm_add[tx][DCT_DCT] = \
1766 dsp->itxfm_add[tx][ADST_DCT] = \
1767 dsp->itxfm_add[tx][DCT_ADST] = \
1768 dsp->itxfm_add[tx][ADST_ADST] = nm##_add_c
1769
1770 674 init_itxfm(TX_4X4, 4x4);
1771 674 init_itxfm(TX_8X8, 8x8);
1772 674 init_itxfm(TX_16X16, 16x16);
1773 674 init_idct(TX_32X32, idct_idct_32x32);
1774 674 init_idct(4 /* lossless */, iwht_iwht_4x4);
1775
1776 #undef init_itxfm
1777 #undef init_idct
1778 674 }
1779
1780 4996710 static av_always_inline void loop_filter(pixel *dst, int E, int I, int H,
1781 ptrdiff_t stridea, ptrdiff_t strideb,
1782 int wd)
1783 {
1784 4996710 int i, F = 1 << (BIT_DEPTH - 8);
1785
1786 4996710 E <<= (BIT_DEPTH - 8);
1787 4996710 I <<= (BIT_DEPTH - 8);
1788 4996710 H <<= (BIT_DEPTH - 8);
1789
2/2
✓ Branch 0 taken 39973680 times.
✓ Branch 1 taken 4996710 times.
44970390 for (i = 0; i < 8; i++, dst += stridea) {
1790 int p7, p6, p5, p4;
1791 39973680 int p3 = dst[strideb * -4], p2 = dst[strideb * -3];
1792 39973680 int p1 = dst[strideb * -2], p0 = dst[strideb * -1];
1793 39973680 int q0 = dst[strideb * +0], q1 = dst[strideb * +1];
1794 39973680 int q2 = dst[strideb * +2], q3 = dst[strideb * +3];
1795 int q4, q5, q6, q7;
1796
2/2
✓ Branch 0 taken 33441230 times.
✓ Branch 1 taken 2003670 times.
35444900 int fm = FFABS(p3 - p2) <= I && FFABS(p2 - p1) <= I &&
1797
4/4
✓ Branch 0 taken 32270870 times.
✓ Branch 1 taken 1170360 times.
✓ Branch 2 taken 30904412 times.
✓ Branch 3 taken 1366458 times.
33441230 FFABS(p1 - p0) <= I && FFABS(q1 - q0) <= I &&
1798
6/6
✓ Branch 0 taken 35444900 times.
✓ Branch 1 taken 4528780 times.
✓ Branch 2 taken 30068454 times.
✓ Branch 3 taken 835958 times.
✓ Branch 4 taken 29487645 times.
✓ Branch 5 taken 580809 times.
104906225 FFABS(q2 - q1) <= I && FFABS(q3 - q2) <= I &&
1799
2/2
✓ Branch 0 taken 29102155 times.
✓ Branch 1 taken 385490 times.
29487645 FFABS(p0 - q0) * 2 + (FFABS(p1 - q1) >> 1) <= E;
1800 int flat8out, flat8in;
1801
1802
2/2
✓ Branch 0 taken 10871525 times.
✓ Branch 1 taken 29102155 times.
39973680 if (!fm)
1803 10871525 continue;
1804
1805
2/2
✓ Branch 0 taken 8619979 times.
✓ Branch 1 taken 20482176 times.
29102155 if (wd >= 16) {
1806 8619979 p7 = dst[strideb * -8];
1807 8619979 p6 = dst[strideb * -7];
1808 8619979 p5 = dst[strideb * -6];
1809 8619979 p4 = dst[strideb * -5];
1810 8619979 q4 = dst[strideb * +4];
1811 8619979 q5 = dst[strideb * +5];
1812 8619979 q6 = dst[strideb * +6];
1813 8619979 q7 = dst[strideb * +7];
1814
1815
2/2
✓ Branch 0 taken 3512551 times.
✓ Branch 1 taken 246819 times.
12379349 flat8out = FFABS(p7 - p0) <= F && FFABS(p6 - p0) <= F &&
1816
4/4
✓ Branch 0 taken 3412022 times.
✓ Branch 1 taken 100529 times.
✓ Branch 2 taken 3355779 times.
✓ Branch 3 taken 56243 times.
3512551 FFABS(p5 - p0) <= F && FFABS(p4 - p0) <= F &&
1817
4/4
✓ Branch 0 taken 2852287 times.
✓ Branch 1 taken 503492 times.
✓ Branch 2 taken 2731550 times.
✓ Branch 3 taken 120737 times.
3355779 FFABS(q4 - q0) <= F && FFABS(q5 - q0) <= F &&
1818
6/6
✓ Branch 0 taken 3759370 times.
✓ Branch 1 taken 4860609 times.
✓ Branch 2 taken 2626731 times.
✓ Branch 3 taken 104819 times.
✓ Branch 4 taken 2532582 times.
✓ Branch 5 taken 94149 times.
12379349 FFABS(q6 - q0) <= F && FFABS(q7 - q0) <= F;
1819 }
1820
1821
2/2
✓ Branch 0 taken 19971256 times.
✓ Branch 1 taken 9130899 times.
29102155 if (wd >= 8)
1822
2/2
✓ Branch 0 taken 11060909 times.
✓ Branch 1 taken 522752 times.
31554917 flat8in = FFABS(p3 - p0) <= F && FFABS(p2 - p0) <= F &&
1823
4/4
✓ Branch 0 taken 10947564 times.
✓ Branch 1 taken 113345 times.
✓ Branch 2 taken 10101974 times.
✓ Branch 3 taken 845590 times.
11060909 FFABS(p1 - p0) <= F && FFABS(q1 - q0) <= F &&
1824
6/6
✓ Branch 0 taken 11583661 times.
✓ Branch 1 taken 8387595 times.
✓ Branch 2 taken 9431485 times.
✓ Branch 3 taken 670489 times.
✓ Branch 4 taken 8879484 times.
✓ Branch 5 taken 552001 times.
31554917 FFABS(q2 - q0) <= F && FFABS(q3 - q0) <= F;
1825
1826
6/6
✓ Branch 0 taken 8619979 times.
✓ Branch 1 taken 20482176 times.
✓ Branch 2 taken 2532582 times.
✓ Branch 3 taken 6087397 times.
✓ Branch 4 taken 2509697 times.
✓ Branch 5 taken 22885 times.
29102155 if (wd >= 16 && flat8out && flat8in) {
1827 2509697 dst[strideb * -7] = (p7 + p7 + p7 + p7 + p7 + p7 + p7 + p6 * 2 +
1828 2509697 p5 + p4 + p3 + p2 + p1 + p0 + q0 + 8) >> 4;
1829 2509697 dst[strideb * -6] = (p7 + p7 + p7 + p7 + p7 + p7 + p6 + p5 * 2 +
1830 2509697 p4 + p3 + p2 + p1 + p0 + q0 + q1 + 8) >> 4;
1831 2509697 dst[strideb * -5] = (p7 + p7 + p7 + p7 + p7 + p6 + p5 + p4 * 2 +
1832 2509697 p3 + p2 + p1 + p0 + q0 + q1 + q2 + 8) >> 4;
1833 2509697 dst[strideb * -4] = (p7 + p7 + p7 + p7 + p6 + p5 + p4 + p3 * 2 +
1834 2509697 p2 + p1 + p0 + q0 + q1 + q2 + q3 + 8) >> 4;
1835 2509697 dst[strideb * -3] = (p7 + p7 + p7 + p6 + p5 + p4 + p3 + p2 * 2 +
1836 2509697 p1 + p0 + q0 + q1 + q2 + q3 + q4 + 8) >> 4;
1837 2509697 dst[strideb * -2] = (p7 + p7 + p6 + p5 + p4 + p3 + p2 + p1 * 2 +
1838 2509697 p0 + q0 + q1 + q2 + q3 + q4 + q5 + 8) >> 4;
1839 2509697 dst[strideb * -1] = (p7 + p6 + p5 + p4 + p3 + p2 + p1 + p0 * 2 +
1840 2509697 q0 + q1 + q2 + q3 + q4 + q5 + q6 + 8) >> 4;
1841 2509697 dst[strideb * +0] = (p6 + p5 + p4 + p3 + p2 + p1 + p0 + q0 * 2 +
1842 2509697 q1 + q2 + q3 + q4 + q5 + q6 + q7 + 8) >> 4;
1843 2509697 dst[strideb * +1] = (p5 + p4 + p3 + p2 + p1 + p0 + q0 + q1 * 2 +
1844 2509697 q2 + q3 + q4 + q5 + q6 + q7 + q7 + 8) >> 4;
1845 2509697 dst[strideb * +2] = (p4 + p3 + p2 + p1 + p0 + q0 + q1 + q2 * 2 +
1846 2509697 q3 + q4 + q5 + q6 + q7 + q7 + q7 + 8) >> 4;
1847 2509697 dst[strideb * +3] = (p3 + p2 + p1 + p0 + q0 + q1 + q2 + q3 * 2 +
1848 2509697 q4 + q5 + q6 + q7 + q7 + q7 + q7 + 8) >> 4;
1849 2509697 dst[strideb * +4] = (p2 + p1 + p0 + q0 + q1 + q2 + q3 + q4 * 2 +
1850 2509697 q5 + q6 + q7 + q7 + q7 + q7 + q7 + 8) >> 4;
1851 2509697 dst[strideb * +5] = (p1 + p0 + q0 + q1 + q2 + q3 + q4 + q5 * 2 +
1852 2509697 q6 + q7 + q7 + q7 + q7 + q7 + q7 + 8) >> 4;
1853 2509697 dst[strideb * +6] = (p0 + q0 + q1 + q2 + q3 + q4 + q5 + q6 * 2 +
1854 2509697 q7 + q7 + q7 + q7 + q7 + q7 + q7 + 8) >> 4;
1855
4/4
✓ Branch 0 taken 17461559 times.
✓ Branch 1 taken 9130899 times.
✓ Branch 2 taken 6369787 times.
✓ Branch 3 taken 11091772 times.
26592458 } else if (wd >= 8 && flat8in) {
1856 6369787 dst[strideb * -3] = (p3 + p3 + p3 + 2 * p2 + p1 + p0 + q0 + 4) >> 3;
1857 6369787 dst[strideb * -2] = (p3 + p3 + p2 + 2 * p1 + p0 + q0 + q1 + 4) >> 3;
1858 6369787 dst[strideb * -1] = (p3 + p2 + p1 + 2 * p0 + q0 + q1 + q2 + 4) >> 3;
1859 6369787 dst[strideb * +0] = (p2 + p1 + p0 + 2 * q0 + q1 + q2 + q3 + 4) >> 3;
1860 6369787 dst[strideb * +1] = (p1 + p0 + q0 + 2 * q1 + q2 + q3 + q3 + 4) >> 3;
1861 6369787 dst[strideb * +2] = (p0 + q0 + q1 + 2 * q2 + q3 + q3 + q3 + 4) >> 3;
1862 } else {
1863
4/4
✓ Branch 0 taken 11557538 times.
✓ Branch 1 taken 8665133 times.
✓ Branch 2 taken 3254020 times.
✓ Branch 3 taken 8303518 times.
20222671 int hev = FFABS(p1 - p0) > H || FFABS(q1 - q0) > H;
1864
1865
2/2
✓ Branch 0 taken 11919153 times.
✓ Branch 1 taken 8303518 times.
20222671 if (hev) {
1866 11919153 int f = av_clip_intp2(p1 - q1, BIT_DEPTH - 1), f1, f2;
1867 11919153 f = av_clip_intp2(3 * (q0 - p0) + f, BIT_DEPTH - 1);
1868
1869 11919153 f1 = FFMIN(f + 4, (1 << (BIT_DEPTH - 1)) - 1) >> 3;
1870 11919153 f2 = FFMIN(f + 3, (1 << (BIT_DEPTH - 1)) - 1) >> 3;
1871
1872 11919153 dst[strideb * -1] = av_clip_pixel(p0 + f2);
1873 11919153 dst[strideb * +0] = av_clip_pixel(q0 - f1);
1874 } else {
1875 8303518 int f = av_clip_intp2(3 * (q0 - p0), BIT_DEPTH - 1), f1, f2;
1876
1877 8303518 f1 = FFMIN(f + 4, (1 << (BIT_DEPTH - 1)) - 1) >> 3;
1878 8303518 f2 = FFMIN(f + 3, (1 << (BIT_DEPTH - 1)) - 1) >> 3;
1879
1880 8303518 dst[strideb * -1] = av_clip_pixel(p0 + f2);
1881 8303518 dst[strideb * +0] = av_clip_pixel(q0 - f1);
1882
1883 8303518 f = (f1 + 1) >> 1;
1884 8303518 dst[strideb * -2] = av_clip_pixel(p1 + f);
1885 8303518 dst[strideb * +1] = av_clip_pixel(q1 - f);
1886 }
1887 }
1888 }
1889 4996710 }
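To make the narrow (`wd == 4`) path of the loop filter easier to follow, here is a sketch that applies it to a single line of eight 8-bit pixels. Everything in it is a simplification under stated assumptions: `lf4_sketch`, `clip_int8` and `clip_u8` are hypothetical names, the bit depth is fixed at 8 (so the clip range is -128..127), and the pixels sit in one contiguous array rather than being addressed through `stridea` and `strideb`.

```c
#include <stdlib.h>

static int clip_int8(int v) { return v < -128 ? -128 : v > 127 ? 127 : v; }
static int clip_u8(int v)   { return v < 0 ? 0 : v > 255 ? 255 : v; }

/* px[0..7] holds p3 p2 p1 p0 q0 q1 q2 q3 across one edge.
 * Returns 1 if the filter mask passed and pixels were modified. */
static int lf4_sketch(unsigned char px[8], int E, int I, int H)
{
    int p3 = px[0], p2 = px[1], p1 = px[2], p0 = px[3];
    int q0 = px[4], q1 = px[5], q2 = px[6], q3 = px[7];
    int f, f1, f2;

    /* Filter mask: neighbours are smooth and the edge step is small. */
    int fm = abs(p3 - p2) <= I && abs(p2 - p1) <= I &&
             abs(p1 - p0) <= I && abs(q1 - q0) <= I &&
             abs(q2 - q1) <= I && abs(q3 - q2) <= I &&
             abs(p0 - q0) * 2 + (abs(p1 - q1) >> 1) <= E;
    if (!fm)
        return 0;

    if (abs(p1 - p0) > H || abs(q1 - q0) > H) {
        /* High edge variance: adjust only p0 and q0. */
        f  = clip_int8(p1 - q1);
        f  = clip_int8(3 * (q0 - p0) + f);
        f1 = (f + 4 > 127 ? 127 : f + 4) >> 3;
        f2 = (f + 3 > 127 ? 127 : f + 3) >> 3;
        px[3] = (unsigned char)clip_u8(p0 + f2);
        px[4] = (unsigned char)clip_u8(q0 - f1);
    } else {
        /* Low variance: adjust p0/q0 and, by half as much, p1/q1. */
        f  = clip_int8(3 * (q0 - p0));
        f1 = (f + 4 > 127 ? 127 : f + 4) >> 3;
        f2 = (f + 3 > 127 ? 127 : f + 3) >> 3;
        px[3] = (unsigned char)clip_u8(p0 + f2);
        px[4] = (unsigned char)clip_u8(q0 - f1);
        f = (f1 + 1) >> 1;
        px[2] = (unsigned char)clip_u8(p1 + f);
        px[5] = (unsigned char)clip_u8(q1 - f);
    }
    return 1;
}
```

With a step of 8 between two flat regions and thresholds `E = 20`, `I = 10`, `H = 2`, the sketch softens the step over the four pixels nearest the edge. A step of 200 fails the mask and is left untouched.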
1890
1891 #define lf_8_fn(dir, wd, stridea, strideb) \
1892 static void loop_filter_##dir##_##wd##_8_c(uint8_t *_dst, \
1893 ptrdiff_t stride, \
1894 int E, int I, int H) \
1895 { \
1896 pixel *dst = (pixel *) _dst; \
1897 stride /= sizeof(pixel); \
1898 loop_filter(dst, E, I, H, stridea, strideb, wd); \
1899 }
1900
1901 #define lf_8_fns(wd) \
1902 lf_8_fn(h, wd, stride, 1) \
1903 lf_8_fn(v, wd, 1, stride)
1904
1905 3267118 lf_8_fns(4)
1906 4002492 lf_8_fns(8)
1907 2723810 lf_8_fns(16)
1908
1909 #undef lf_8_fn
1910 #undef lf_8_fns
1911
1912 #define lf_16_fn(dir, stridea) \
1913 static void loop_filter_##dir##_16_16_c(uint8_t *dst, \
1914 ptrdiff_t stride, \
1915 int E, int I, int H) \
1916 { \
1917 loop_filter_##dir##_16_8_c(dst, stride, E, I, H); \
1918 loop_filter_##dir##_16_8_c(dst + 8 * stridea, stride, E, I, H); \
1919 }
1920
1921 337992 lf_16_fn(h, stride)
1922 341929 lf_16_fn(v, sizeof(pixel))
1923
1924 #undef lf_16_fn
1925
1926 #define lf_mix_fn(dir, wd1, wd2, stridea) \
1927 static void loop_filter_##dir##_##wd1##wd2##_16_c(uint8_t *dst, \
1928 ptrdiff_t stride, \
1929 int E, int I, int H) \
1930 { \
1931 loop_filter_##dir##_##wd1##_8_c(dst, stride, E & 0xff, I & 0xff, H & 0xff); \
1932 loop_filter_##dir##_##wd2##_8_c(dst + 8 * stridea, stride, E >> 8, I >> 8, H >> 8); \
1933 }
1934
1935 #define lf_mix_fns(wd1, wd2) \
1936 lf_mix_fn(h, wd1, wd2, stride) \
1937 lf_mix_fn(v, wd1, wd2, sizeof(pixel))
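The mixed-width wrappers above receive two sets of 8-bit thresholds packed into each `int` argument: the low byte goes to the first 8-pixel half, the next byte to the second. The sketch below illustrates that layout. `unpack_mix` follows the `& 0xff` and `>> 8` split visible in the macro, while `pack_mix` is a hypothetical caller-side counterpart that does not appear in this listing.

```c
/* Split a packed threshold the way lf_mix_fn does:
 * low byte for the first half, remaining bits for the second half. */
static void unpack_mix(int packed, int *first, int *second)
{
    *first  = packed & 0xff;
    *second = packed >> 8;
}

/* Hypothetical inverse: how a caller might build the packed value.
 * Assumes both inputs are in the range 0..255. */
static int pack_mix(int first, int second)
{
    return first | (second << 8);
}
```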
1938
1939 1036834 lf_mix_fns(4, 4)
1940 260876 lf_mix_fns(4, 8)
1941 266456 lf_mix_fns(8, 4)
1942 1698914 lf_mix_fns(8, 8)
1943
1944 #undef lf_mix_fn
1945 #undef lf_mix_fns
1946
1947 674 static av_cold void vp9dsp_loopfilter_init(VP9DSPContext *dsp)
1948 {
1949 674 dsp->loop_filter_8[0][0] = loop_filter_h_4_8_c;
1950 674 dsp->loop_filter_8[0][1] = loop_filter_v_4_8_c;
1951 674 dsp->loop_filter_8[1][0] = loop_filter_h_8_8_c;
1952 674 dsp->loop_filter_8[1][1] = loop_filter_v_8_8_c;
1953 674 dsp->loop_filter_8[2][0] = loop_filter_h_16_8_c;
1954 674 dsp->loop_filter_8[2][1] = loop_filter_v_16_8_c;
1955
1956 674 dsp->loop_filter_16[0] = loop_filter_h_16_16_c;
1957 674 dsp->loop_filter_16[1] = loop_filter_v_16_16_c;
1958
1959 674 dsp->loop_filter_mix2[0][0][0] = loop_filter_h_44_16_c;
1960 674 dsp->loop_filter_mix2[0][0][1] = loop_filter_v_44_16_c;
1961 674 dsp->loop_filter_mix2[0][1][0] = loop_filter_h_48_16_c;
1962 674 dsp->loop_filter_mix2[0][1][1] = loop_filter_v_48_16_c;
1963 674 dsp->loop_filter_mix2[1][0][0] = loop_filter_h_84_16_c;
1964 674 dsp->loop_filter_mix2[1][0][1] = loop_filter_v_84_16_c;
1965 674 dsp->loop_filter_mix2[1][1][0] = loop_filter_h_88_16_c;
1966 674 dsp->loop_filter_mix2[1][1][1] = loop_filter_v_88_16_c;
1967 674 }
1968
1969 #if BIT_DEPTH != 12
1970
1971 169249 static av_always_inline void copy_c(uint8_t *restrict dst, ptrdiff_t dst_stride,
1972 const uint8_t *restrict src,
1973 ptrdiff_t src_stride, int w, int h)
1974 {
1975 do {
1976 2002600 memcpy(dst, src, w * sizeof(pixel));
1977
1978 2002600 dst += dst_stride;
1979 2002600 src += src_stride;
1980
2/2
✓ Branch 0 taken 1833351 times.
✓ Branch 1 taken 169249 times.
2002600 } while (--h);
1981 169249 }
1982
1983 6636 static av_always_inline void avg_c(uint8_t *restrict _dst, ptrdiff_t dst_stride,
1984 const uint8_t *restrict _src,
1985 ptrdiff_t src_stride, int w, int h)
1986 {
1987 6636 pixel *dst = (pixel *) _dst;
1988 6636 const pixel *src = (const pixel *) _src;
1989
1990 6636 dst_stride /= sizeof(pixel);
1991 6636 src_stride /= sizeof(pixel);
1992 do {
1993 int x;
1994
1995
2/2
✓ Branch 0 taken 303764 times.
✓ Branch 1 taken 66444 times.
370208 for (x = 0; x < w; x += 4)
1996 303764 AV_WN4PA(&dst[x], rnd_avg_pixel4(AV_RN4PA(&dst[x]), AV_RN4P(&src[x])));
1997
1998 66444 dst += dst_stride;
1999 66444 src += src_stride;
2000
2/2
✓ Branch 0 taken 59808 times.
✓ Branch 1 taken 6636 times.
66444 } while (--h);
2001 6636 }
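`avg_c` averages four pixels per step through `rnd_avg_pixel4`, whose definition is not part of this listing. One well-known way to compute a rounded average of four packed 8-bit lanes in a single 32-bit word is sketched below. The name `rnd_avg4_sketch` is hypothetical and this is an illustration of the general technique, not necessarily the implementation behind the macro used above.

```c
#include <stdint.h>

/* Per-byte (a + b + 1) >> 1 on four packed 8-bit lanes.
 * Uses the identity  avg_up(a, b) = (a | b) - ((a ^ b) >> 1),
 * masking the low bit of each lane so the shift cannot leak
 * bits across lane boundaries. */
static uint32_t rnd_avg4_sketch(uint32_t a, uint32_t b)
{
    return (a | b) - (((a ^ b) & 0xFEFEFEFEu) >> 1);
}
```

For example, averaging the lanes `01 02 03 04` with `03 04 05 06` yields `02 03 04 05`.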
2002
2003 #define fpel_fn(type, sz) \
2004 static void type##sz##_c(uint8_t *dst, ptrdiff_t dst_stride, \
2005 const uint8_t *src, ptrdiff_t src_stride, \
2006 int h, int mx, int my) \
2007 { \
2008 type##_c(dst, dst_stride, src, src_stride, sz, h); \
2009 }
2010
2011 #define copy_avg_fn(sz) \
2012 fpel_fn(copy, sz) \
2013 fpel_fn(avg, sz)
2014
2015 9132 copy_avg_fn(64)
2016 37580 copy_avg_fn(32)
2017 74040 copy_avg_fn(16)
2018 132390 copy_avg_fn(8)
2019 98628 copy_avg_fn(4)
2020
2021 #undef fpel_fn
2022 #undef copy_avg_fn
2023
2024 #endif /* BIT_DEPTH != 12 */
2025
2026 #define FILTER_8TAP(src, x, F, stride) \
2027 av_clip_pixel((F[0] * src[x + -3 * stride] + \
2028 F[1] * src[x + -2 * stride] + \
2029 F[2] * src[x + -1 * stride] + \
2030 F[3] * src[x + +0 * stride] + \
2031 F[4] * src[x + +1 * stride] + \
2032 F[5] * src[x + +2 * stride] + \
2033 F[6] * src[x + +3 * stride] + \
2034 F[7] * src[x + +4 * stride] + 64) >> 7)
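`FILTER_8TAP` is a dot product of eight taps centred between `src[x]` and `src[x + stride]`, rounded with `+ 64` and scaled by `>> 7`, so a kernel is expected to sum to 128. The sketch below expresses the same computation as a function for 8-bit pixels. The name `filter_8tap_sketch` and the two kernels mentioned afterwards are hypothetical illustrations. They are not entries of `ff_vp9_subpel_filters`, whose values are not shown in this listing.

```c
static int clip_u8(int v) { return v < 0 ? 0 : v > 255 ? 255 : v; }

/* Hypothetical function form of the macro for 8-bit pixels.
 * The caller must guarantee that src[x - 3 * stride] through
 * src[x + 4 * stride] are readable. */
static int filter_8tap_sketch(const unsigned char *src, int x,
                              const short F[8], int stride)
{
    int sum = 64; /* rounding offset for the >> 7 below */
    int k;

    for (k = 0; k < 8; k++)
        sum += F[k] * src[x + (k - 3) * stride];

    return clip_u8(sum >> 7);
}
```

An identity kernel `{0, 0, 0, 128, 0, 0, 0, 0}` returns `src[x]` unchanged, and a two-tap kernel `{0, 0, 0, 64, 64, 0, 0, 0}` returns the average of `src[x]` and `src[x + stride]`.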
2035
2036 475587 static av_always_inline void do_8tap_1d_c(uint8_t *_dst, ptrdiff_t dst_stride,
2037 const uint8_t *_src, ptrdiff_t src_stride,
2038 int w, int h, ptrdiff_t ds,
2039 const int16_t *filter, int avg)
2040 {
2041 475587 pixel *dst = (pixel *) _dst;
2042 475587 const pixel *src = (const pixel *) _src;
2043
2044 475587 dst_stride /= sizeof(pixel);
2045 475587 src_stride /= sizeof(pixel);
2046 do {
2047 int x;
2048
2049
2/2
✓ Branch 0 taken 66093232 times.
✓ Branch 1 taken 4286152 times.
70379384 for (x = 0; x < w; x++)
2050
2/2
✓ Branch 0 taken 5447824 times.
✓ Branch 1 taken 60645408 times.
66093232 if (avg) {
2051 5447824 dst[x] = (dst[x] + FILTER_8TAP(src, x, filter, ds) + 1) >> 1;
2052 } else {
2053 60645408 dst[x] = FILTER_8TAP(src, x, filter, ds);
2054 }
2055
2056 4286152 dst += dst_stride;
2057 4286152 src += src_stride;
2058
2/2
✓ Branch 0 taken 3810565 times.
✓ Branch 1 taken 475587 times.
4286152 } while (--h);
2059 475587 }
2060
2061 #define filter_8tap_1d_fn(opn, opa, dir, ds) \
2062 static av_noinline void opn##_8tap_1d_##dir##_c(uint8_t *dst, ptrdiff_t dst_stride, \
2063 const uint8_t *src, ptrdiff_t src_stride, \
2064 int w, int h, const int16_t *filter) \
2065 { \
2066 do_8tap_1d_c(dst, dst_stride, src, src_stride, w, h, ds, filter, opa); \
2067 }
2068
2069 188113 filter_8tap_1d_fn(put, 0, v, src_stride / sizeof(pixel))
2070 261154 filter_8tap_1d_fn(put, 0, h, 1)
2071 10787 filter_8tap_1d_fn(avg, 1, v, src_stride / sizeof(pixel))
2072 15533 filter_8tap_1d_fn(avg, 1, h, 1)
2073
2074 #undef filter_8tap_1d_fn
2075
2076 1103022 static av_always_inline void do_8tap_2d_c(uint8_t *_dst, ptrdiff_t dst_stride,
2077 const uint8_t *_src, ptrdiff_t src_stride,
2078 int w, int h, const int16_t *filterx,
2079 const int16_t *filtery, int avg)
2080 {
2081 1103022 int tmp_h = h + 7;
2082 1103022 pixel tmp[64 * 71], *tmp_ptr = tmp;
2083 1103022 pixel *dst = (pixel *) _dst;
2084 1103022 const pixel *src = (const pixel *) _src;
2085
2086 1103022 dst_stride /= sizeof(pixel);
2087 1103022 src_stride /= sizeof(pixel);
2088 1103022 src -= src_stride * 3;
2089 do {
2090 int x;
2091
2092
2/2
✓ Branch 0 taken 168564992 times.
✓ Branch 1 taken 16239078 times.
184804070 for (x = 0; x < w; x++)
2093 168564992 tmp_ptr[x] = FILTER_8TAP(src, x, filterx, 1);
2094
2095 16239078 tmp_ptr += 64;
2096 16239078 src += src_stride;
2097
2/2
✓ Branch 0 taken 15136056 times.
✓ Branch 1 taken 1103022 times.
16239078 } while (--tmp_h);
2098
2099 1103022 tmp_ptr = tmp + 64 * 3;
2100 do {
2101 int x;
2102
2103
2/2
✓ Branch 0 taken 108601648 times.
✓ Branch 1 taken 8517924 times.
117119572 for (x = 0; x < w; x++)
2104
2/2
✓ Branch 0 taken 9056928 times.
✓ Branch 1 taken 99544720 times.
108601648 if (avg) {
2105 9056928 dst[x] = (dst[x] + FILTER_8TAP(tmp_ptr, x, filtery, 64) + 1) >> 1;
2106 } else {
2107 99544720 dst[x] = FILTER_8TAP(tmp_ptr, x, filtery, 64);
2108 }
2109
2110 8517924 tmp_ptr += 64;
2111 8517924 dst += dst_stride;
2112
2/2
✓ Branch 0 taken 7414902 times.
✓ Branch 1 taken 1103022 times.
8517924 } while (--h);
2113 1103022 }
2114
2115 #define filter_8tap_2d_fn(opn, opa) \
2116 static av_noinline void opn##_8tap_2d_hv_c(uint8_t *dst, ptrdiff_t dst_stride, \
2117 const uint8_t *src, ptrdiff_t src_stride, \
2118 int w, int h, const int16_t *filterx, \
2119 const int16_t *filtery) \
2120 { \
2121 do_8tap_2d_c(dst, dst_stride, src, src_stride, w, h, filterx, filtery, opa); \
2122 }
2123
2124 1051983 filter_8tap_2d_fn(put, 0)
2125 51039 filter_8tap_2d_fn(avg, 1)
2126
2127 #undef filter_8tap_2d_fn
2128
2129 #define filter_fn_1d(sz, dir, dir_m, type, type_idx, avg) \
2130 static void avg##_8tap_##type##_##sz##dir##_c(uint8_t *dst, ptrdiff_t dst_stride, \
2131 const uint8_t *src, ptrdiff_t src_stride, \
2132 int h, int mx, int my) \
2133 { \
2134 avg##_8tap_1d_##dir##_c(dst, dst_stride, src, src_stride, sz, h, \
2135 ff_vp9_subpel_filters[type_idx][dir_m]); \
2136 }
2137
2138 #define filter_fn_2d(sz, type, type_idx, avg) \
2139 static void avg##_8tap_##type##_##sz##hv_c(uint8_t *dst, ptrdiff_t dst_stride, \
2140 const uint8_t *src, ptrdiff_t src_stride, \
2141 int h, int mx, int my) \
2142 { \
2143 avg##_8tap_2d_hv_c(dst, dst_stride, src, src_stride, sz, h, \
2144 ff_vp9_subpel_filters[type_idx][mx], \
2145 ff_vp9_subpel_filters[type_idx][my]); \
2146 }
2147
2148 #if BIT_DEPTH != 12
2149
2150 #define FILTER_BILIN(src, x, mxy, stride) \
2151 (src[x] + ((mxy * (src[x + stride] - src[x]) + 8) >> 4))
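`FILTER_BILIN` interpolates linearly between `src[x]` and `src[x + stride]` in sixteenths: `mxy` is the fractional position and `+ 8` rounds before the `>> 4`. A small function form, with the hypothetical name `filter_bilin_sketch`, makes the arithmetic easy to check by hand.

```c
/* Hypothetical function form of the bilinear macro.
 * mxy is the fractional offset in 1/16 units (0..15). */
static int filter_bilin_sketch(const unsigned char *src, int x,
                               int mxy, int stride)
{
    return src[x] + ((mxy * (src[x + stride] - src[x]) + 8) >> 4);
}
```

With `src = {100, 116}`, an offset of `mxy = 4` (one quarter) gives 104 and `mxy = 8` (one half) gives 108.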
2152
2153 4285 static av_always_inline void do_bilin_1d_c(uint8_t *_dst, ptrdiff_t dst_stride,
2154 const uint8_t *_src, ptrdiff_t src_stride,
2155 int w, int h, ptrdiff_t ds, int mxy, int avg)
2156 {
2157 4285 pixel *dst = (pixel *) _dst;
2158 4285 const pixel *src = (const pixel *) _src;
2159
2160 4285 dst_stride /= sizeof(pixel);
2161 4285 src_stride /= sizeof(pixel);
2162 do {
2163 int x;
2164
2165
2/2
✓ Branch 0 taken 652832 times.
✓ Branch 1 taken 41296 times.
694128 for (x = 0; x < w; x++)
2166
2/2
✓ Branch 0 taken 65472 times.
✓ Branch 1 taken 587360 times.
652832 if (avg) {
2167 65472 dst[x] = (dst[x] + FILTER_BILIN(src, x, mxy, ds) + 1) >> 1;
2168 } else {
2169 587360 dst[x] = FILTER_BILIN(src, x, mxy, ds);
2170 }
2171
2172 41296 dst += dst_stride;
2173 41296 src += src_stride;
2174
2/2
✓ Branch 0 taken 37011 times.
✓ Branch 1 taken 4285 times.
41296 } while (--h);
2175 4285 }
2176
2177 #define bilin_1d_fn(opn, opa, dir, ds) \
2178 static av_noinline void opn##_bilin_1d_##dir##_c(uint8_t *dst, ptrdiff_t dst_stride, \
2179 const uint8_t *src, ptrdiff_t src_stride, \
2180 int w, int h, int mxy) \
2181 { \
2182 do_bilin_1d_c(dst, dst_stride, src, src_stride, w, h, ds, mxy, opa); \
2183 }
2184
2185 1527 bilin_1d_fn(put, 0, v, src_stride / sizeof(pixel))
2186 2698 bilin_1d_fn(put, 0, h, 1)
2187 30 bilin_1d_fn(avg, 1, v, src_stride / sizeof(pixel))
2188 30 bilin_1d_fn(avg, 1, h, 1)
2189
2190 #undef bilin_1d_fn
2191
2192 8301 static av_always_inline void do_bilin_2d_c(uint8_t *_dst, ptrdiff_t dst_stride,
2193 const uint8_t *_src, ptrdiff_t src_stride,
2194 int w, int h, int mx, int my, int avg)
2195 {
2196 8301 pixel tmp[64 * 65], *tmp_ptr = tmp;
2197 8301 int tmp_h = h + 1;
2198 8301 pixel *dst = (pixel *) _dst;
2199 8301 const pixel *src = (const pixel *) _src;
2200
2201 8301 dst_stride /= sizeof(pixel);
2202 8301 src_stride /= sizeof(pixel);
2203 do {
2204 int x;
2205
2206
2/2
✓ Branch 0 taken 796840 times.
✓ Branch 1 taken 71877 times.
868717 for (x = 0; x < w; x++)
2207 796840 tmp_ptr[x] = FILTER_BILIN(src, x, mx, 1);
2208
2209 71877 tmp_ptr += 64;
2210 71877 src += src_stride;
2211
2/2
✓ Branch 0 taken 63576 times.
✓ Branch 1 taken 8301 times.
71877 } while (--tmp_h);
2212
2213 8301 tmp_ptr = tmp;
2214 do {
2215 int x;
2216
2217
2/2
✓ Branch 0 taken 730080 times.
✓ Branch 1 taken 63576 times.
793656 for (x = 0; x < w; x++)
2218
2/2
✓ Branch 0 taken 32736 times.
✓ Branch 1 taken 697344 times.
730080 if (avg) {
2219 32736 dst[x] = (dst[x] + FILTER_BILIN(tmp_ptr, x, my, 64) + 1) >> 1;
2220 } else {
2221 697344 dst[x] = FILTER_BILIN(tmp_ptr, x, my, 64);
2222 }
2223
2224 63576 tmp_ptr += 64;
2225 63576 dst += dst_stride;
2226
2/2
✓ Branch 0 taken 55275 times.
✓ Branch 1 taken 8301 times.
63576 } while (--h);
2227 8301 }
2228
2229 #define bilin_2d_fn(opn, opa) \
2230 static av_noinline void opn##_bilin_2d_hv_c(uint8_t *dst, ptrdiff_t dst_stride, \
2231 const uint8_t *src, ptrdiff_t src_stride, \
2232 int w, int h, int mx, int my) \
2233 { \
2234 do_bilin_2d_c(dst, dst_stride, src, src_stride, w, h, mx, my, opa); \
2235 }
2236
2237 8271 bilin_2d_fn(put, 0)
2238 30 bilin_2d_fn(avg, 1)
2239
2240 #undef bilin_2d_fn
2241
2242 #define bilinf_fn_1d(sz, dir, dir_m, avg) \
2243 static void avg##_bilin_##sz##dir##_c(uint8_t *dst, ptrdiff_t dst_stride, \
2244 const uint8_t *src, ptrdiff_t src_stride, \
2245 int h, int mx, int my) \
2246 { \
2247 avg##_bilin_1d_##dir##_c(dst, dst_stride, src, src_stride, sz, h, dir_m); \
2248 }
2249
2250 #define bilinf_fn_2d(sz, avg) \
2251 static void avg##_bilin_##sz##hv_c(uint8_t *dst, ptrdiff_t dst_stride, \
2252 const uint8_t *src, ptrdiff_t src_stride, \
2253 int h, int mx, int my) \
2254 { \
2255 avg##_bilin_2d_hv_c(dst, dst_stride, src, src_stride, sz, h, mx, my); \
2256 }
2257
2258 #else
2259
2260 #define bilinf_fn_1d(a, b, c, d)
2261 #define bilinf_fn_2d(a, b)
2262
2263 #endif
2264
2265 #define filter_fn(sz, avg) \
2266 filter_fn_1d(sz, h, mx, regular, FILTER_8TAP_REGULAR, avg) \
2267 filter_fn_1d(sz, v, my, regular, FILTER_8TAP_REGULAR, avg) \
2268 filter_fn_2d(sz, regular, FILTER_8TAP_REGULAR, avg) \
2269 filter_fn_1d(sz, h, mx, smooth, FILTER_8TAP_SMOOTH, avg) \
2270 filter_fn_1d(sz, v, my, smooth, FILTER_8TAP_SMOOTH, avg) \
2271 filter_fn_2d(sz, smooth, FILTER_8TAP_SMOOTH, avg) \
2272 filter_fn_1d(sz, h, mx, sharp, FILTER_8TAP_SHARP, avg) \
2273 filter_fn_1d(sz, v, my, sharp, FILTER_8TAP_SHARP, avg) \
2274 filter_fn_2d(sz, sharp, FILTER_8TAP_SHARP, avg) \
2275 bilinf_fn_1d(sz, h, mx, avg) \
2276 bilinf_fn_1d(sz, v, my, avg) \
2277 bilinf_fn_2d(sz, avg)
2278
2279 #define filter_fn_set(avg) \
2280 filter_fn(64, avg) \
2281 filter_fn(32, avg) \
2282 filter_fn(16, avg) \
2283 filter_fn(8, avg) \
2284 filter_fn(4, avg)
2285
2286 3027492 filter_fn_set(put)
2287 154898 filter_fn_set(avg)
2288
2289 #undef filter_fn
2290 #undef filter_fn_set
2291 #undef filter_fn_1d
2292 #undef filter_fn_2d
2293 #undef bilinf_fn_1d
2294 #undef bilinf_fn_2d
2295
2296 #if BIT_DEPTH != 8
2297 void ff_vp9dsp_mc_init_10(VP9DSPContext *dsp);
2298 #endif
2299 #if BIT_DEPTH != 10
2300 static
2301 #endif
2302 749 av_cold void FUNC(ff_vp9dsp_mc_init)(VP9DSPContext *dsp)
2303 {
2304 #if BIT_DEPTH == 12
2305 75 ff_vp9dsp_mc_init_10(dsp);
2306 #else /* BIT_DEPTH == 12 */
2307
2308 #define init_fpel(idx1, idx2, sz, type) \
2309 dsp->mc[idx1][FILTER_8TAP_SMOOTH ][idx2][0][0] = type##sz##_c; \
2310 dsp->mc[idx1][FILTER_8TAP_REGULAR][idx2][0][0] = type##sz##_c; \
2311 dsp->mc[idx1][FILTER_8TAP_SHARP ][idx2][0][0] = type##sz##_c; \
2312 dsp->mc[idx1][FILTER_BILINEAR ][idx2][0][0] = type##sz##_c
2313
2314 #define init_copy_avg(idx, sz) \
2315 init_fpel(idx, 0, sz, copy); \
2316 init_fpel(idx, 1, sz, avg)
2317
2318 674 init_copy_avg(0, 64);
2319 674 init_copy_avg(1, 32);
2320 674 init_copy_avg(2, 16);
2321 674 init_copy_avg(3, 8);
2322 674 init_copy_avg(4, 4);
2323
2324 #undef init_copy_avg
2325 #undef init_fpel
2326
2327 #endif /* BIT_DEPTH == 12 */
2328
2329 #define init_subpel1_bd_aware(idx1, idx2, idxh, idxv, sz, dir, type) \
2330 dsp->mc[idx1][FILTER_8TAP_SMOOTH ][idx2][idxh][idxv] = type##_8tap_smooth_##sz##dir##_c; \
2331 dsp->mc[idx1][FILTER_8TAP_REGULAR][idx2][idxh][idxv] = type##_8tap_regular_##sz##dir##_c; \
2332 dsp->mc[idx1][FILTER_8TAP_SHARP ][idx2][idxh][idxv] = type##_8tap_sharp_##sz##dir##_c
2333
2334 #if BIT_DEPTH == 12
2335 #define init_subpel1 init_subpel1_bd_aware
2336 #else
2337 #define init_subpel1(idx1, idx2, idxh, idxv, sz, dir, type) \
2338 init_subpel1_bd_aware(idx1, idx2, idxh, idxv, sz, dir, type); \
2339 dsp->mc[idx1][FILTER_BILINEAR ][idx2][idxh][idxv] = type##_bilin_##sz##dir##_c
2340 #endif
2341
2342 #define init_subpel2(idx, idxh, idxv, dir, type) \
2343 init_subpel1(0, idx, idxh, idxv, 64, dir, type); \
2344 init_subpel1(1, idx, idxh, idxv, 32, dir, type); \
2345 init_subpel1(2, idx, idxh, idxv, 16, dir, type); \
2346 init_subpel1(3, idx, idxh, idxv, 8, dir, type); \
2347 init_subpel1(4, idx, idxh, idxv, 4, dir, type)
2348
2349 #define init_subpel3(idx, type) \
2350 init_subpel2(idx, 1, 1, hv, type); \
2351 init_subpel2(idx, 0, 1, v, type); \
2352 init_subpel2(idx, 1, 0, h, type)
2353
2354 749 init_subpel3(0, put);
2355 749 init_subpel3(1, avg);
2356
2357 #undef init_subpel1
2358 #undef init_subpel2
2359 #undef init_subpel3
2360 #undef init_subpel1_bd_aware
2361 749 }
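The macro pattern used above (a shared worker that takes the width at run time, fixed-width wrappers generated by token pasting, and an init function that stores the wrappers in a table) can be reduced to the toy sketch below. All names here (`sum_n`, `SUM_FN`, `sum_table_init`) are hypothetical and stand in for the `*_bilin_1d_*_c` workers, `bilinf_fn_1d`, and `ff_vp9dsp_mc_init` respectively; the sketch only illustrates the mechanism, not the real `VP9DSPContext` layout.

```c
#include <assert.h>

typedef int (*sum_fn)(const int *v);

/* Shared worker: the width is a run-time argument. */
static int sum_n(const int *v, int n)
{
    int i, s = 0;

    for (i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Token pasting generates one fixed-width wrapper per size, so callers can
 * go through a uniform function-pointer signature without a width argument. */
#define SUM_FN(sz) \
static int sum_##sz##_c(const int *v) \
{ \
    return sum_n(v, sz); \
}

SUM_FN(4)
SUM_FN(2)
SUM_FN(1)

#undef SUM_FN

/* The init function fills the dispatch table, one slot per size index. */
static void sum_table_init(sum_fn table[3])
{
    assert(table);

#define init_sum(idx, sz) table[idx] = sum_##sz##_c
    init_sum(0, 4);
    init_sum(1, 2);
    init_sum(2, 1);
#undef init_sum
}
```

Defining and undefining the helper macros right next to their use, as the listing does, keeps the short macro names from leaking into the rest of the translation unit.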
2362
2363 2067 static av_always_inline void do_scaled_8tap_c(uint8_t *_dst, ptrdiff_t dst_stride,
2364 const uint8_t *_src, ptrdiff_t src_stride,
2365 int w, int h, int mx, int my,
2366 int dx, int dy, int avg,
2367 const int16_t (*filters)[8])
2368 {
2369 2067 int tmp_h = (((h - 1) * dy + my) >> 4) + 8;
2370 2067 pixel tmp[64 * 135], *tmp_ptr = tmp;
2371 2067 pixel *dst = (pixel *) _dst;
2372 2067 const pixel *src = (const pixel *) _src;
2373
2374 2067 dst_stride /= sizeof(pixel);
2375 2067 src_stride /= sizeof(pixel);
2376 2067 src -= src_stride * 3;
2377 do {
2378 int x;
2379 30118 int imx = mx, ioff = 0;
2380
2381
2/2
✓ Branch 0 taken 335448 times.
✓ Branch 1 taken 30118 times.
365566 for (x = 0; x < w; x++) {
2382 335448 tmp_ptr[x] = FILTER_8TAP(src, ioff, filters[imx], 1);
2383 335448 imx += dx;
2384 335448 ioff += imx >> 4;
2385 335448 imx &= 0xf;
2386 }
2387
2388 30118 tmp_ptr += 64;
2389 30118 src += src_stride;
2390
2/2
✓ Branch 0 taken 28051 times.
✓ Branch 1 taken 2067 times.
30118 } while (--tmp_h);
2391
2392 2067 tmp_ptr = tmp + 64 * 3;
2393 do {
2394 int x;
2395 17088 const int16_t *filter = filters[my];
2396
2397
2/2
✓ Branch 0 taken 224064 times.
✓ Branch 1 taken 17088 times.
241152 for (x = 0; x < w; x++)
2398
1/2
✗ Branch 0 not taken.
✓ Branch 1 taken 224064 times.
224064 if (avg) {
2399 dst[x] = (dst[x] + FILTER_8TAP(tmp_ptr, x, filter, 64) + 1) >> 1;
2400 } else {
2401 224064 dst[x] = FILTER_8TAP(tmp_ptr, x, filter, 64);
2402 }
2403
2404 17088 my += dy;
2405 17088 tmp_ptr += (my >> 4) * 64;
2406 17088 my &= 0xf;
2407 17088 dst += dst_stride;
2408
2/2
✓ Branch 0 taken 15021 times.
✓ Branch 1 taken 2067 times.
17088 } while (--h);
2409 2067 }
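The position stepping in `do_scaled_8tap_c` above uses 1/16-pel fixed point: the step (`dx` or `dy`) is added to the phase, the whole-pixel part (`>> 4`) advances the read offset, and the low four bits select the next filter. The sketch below isolates that arithmetic so it can be checked by hand. The helper names are hypothetical, and the precondition that steps are non-negative is an assumption made for the sketch.

```c
#include <assert.h>

/* Hypothetical helper: rows the horizontal pass must produce, mirroring the
 * tmp_h expression in the listing. taps is 8 for the 8-tap path and 2 for
 * the bilinear path. */
static int scaled_tmp_rows(int h, int my, int dy, int taps)
{
    assert(h >= 1 && my >= 0 && my < 16 && dy >= 0);
    return (((h - 1) * dy + my) >> 4) + taps;
}

/* Hypothetical helper: for each output pixel, record the integer source
 * offset and the 1/16-pel phase that the horizontal loop would use. */
static void scaled_positions(int w, int mx, int dx, int *offs, int *phases)
{
    int x, imx = mx, ioff = 0;

    for (x = 0; x < w; x++) {
        offs[x]   = ioff;
        phases[x] = imx;
        imx  += dx;        /* advance by the step in 1/16 pel    */
        ioff += imx >> 4;  /* carry whole pixels into the offset */
        imx  &= 0xf;       /* keep only the fractional phase     */
    }
}
```

For example, with `h = 64`, `my = 15`, and a step of 32 (assuming the step never exceeds a 2x ratio), `scaled_tmp_rows` gives 134 for the 8-tap path and 128 for the bilinear path, which fit the 135-row and 129-row scratch buffers declared in the listing.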
2410
2411 #define scaled_filter_8tap_fn(opn, opa) \
2412 static av_noinline void opn##_scaled_8tap_c(uint8_t *dst, ptrdiff_t dst_stride, \
2413 const uint8_t *src, ptrdiff_t src_stride, \
2414 int w, int h, int mx, int my, int dx, int dy, \
2415 const int16_t (*filters)[8]) \
2416 { \
2417 do_scaled_8tap_c(dst, dst_stride, src, src_stride, w, h, mx, my, dx, dy, \
2418 opa, filters); \
2419 }
2420
2421 2067 scaled_filter_8tap_fn(put, 0)
2422 scaled_filter_8tap_fn(avg, 1)
2423
2424 #undef scaled_filter_8tap_fn
2425
2426 #undef FILTER_8TAP
2427
2428 #define scaled_filter_fn(sz, type, type_idx, avg) \
2429 static void avg##_scaled_##type##_##sz##_c(uint8_t *dst, ptrdiff_t dst_stride, \
2430 const uint8_t *src, ptrdiff_t src_stride, \
2431 int h, int mx, int my, int dx, int dy) \
2432 { \
2433 avg##_scaled_8tap_c(dst, dst_stride, src, src_stride, sz, h, mx, my, dx, dy, \
2434 ff_vp9_subpel_filters[type_idx]); \
2435 }
2436
2437 #if BIT_DEPTH != 12
2438
2439 static av_always_inline void do_scaled_bilin_c(uint8_t *_dst, ptrdiff_t dst_stride,
2440 const uint8_t *_src, ptrdiff_t src_stride,
2441 int w, int h, int mx, int my,
2442 int dx, int dy, int avg)
2443 {
2444 pixel tmp[64 * 129], *tmp_ptr = tmp;
2445 int tmp_h = (((h - 1) * dy + my) >> 4) + 2;
2446 pixel *dst = (pixel *) _dst;
2447 const pixel *src = (const pixel *) _src;
2448
2449 dst_stride /= sizeof(pixel);
2450 src_stride /= sizeof(pixel);
2451 do {
2452 int x;
2453 int imx = mx, ioff = 0;
2454
2455 for (x = 0; x < w; x++) {
2456 tmp_ptr[x] = FILTER_BILIN(src, ioff, imx, 1);
2457 imx += dx;
2458 ioff += imx >> 4;
2459 imx &= 0xf;
2460 }
2461
2462 tmp_ptr += 64;
2463 src += src_stride;
2464 } while (--tmp_h);
2465
2466 tmp_ptr = tmp;
2467 do {
2468 int x;
2469
2470 for (x = 0; x < w; x++)
2471 if (avg) {
2472 dst[x] = (dst[x] + FILTER_BILIN(tmp_ptr, x, my, 64) + 1) >> 1;
2473 } else {
2474 dst[x] = FILTER_BILIN(tmp_ptr, x, my, 64);
2475 }
2476
2477 my += dy;
2478 tmp_ptr += (my >> 4) * 64;
2479 my &= 0xf;
2480 dst += dst_stride;
2481 } while (--h);
2482 }
2483
2484 #define scaled_bilin_fn(opn, opa) \
2485 static av_noinline void opn##_scaled_bilin_c(uint8_t *dst, ptrdiff_t dst_stride, \
2486 const uint8_t *src, ptrdiff_t src_stride, \
2487 int w, int h, int mx, int my, int dx, int dy) \
2488 { \
2489 do_scaled_bilin_c(dst, dst_stride, src, src_stride, w, h, mx, my, dx, dy, opa); \
2490 }
2491
2492 scaled_bilin_fn(put, 0)
2493 scaled_bilin_fn(avg, 1)
2494
2495 #undef scaled_bilin_fn
2496
2497 #undef FILTER_BILIN
2498
2499 #define scaled_bilinf_fn(sz, avg) \
2500 static void avg##_scaled_bilin_##sz##_c(uint8_t *dst, ptrdiff_t dst_stride, \
2501 const uint8_t *src, ptrdiff_t src_stride, \
2502 int h, int mx, int my, int dx, int dy) \
2503 { \
2504 avg##_scaled_bilin_c(dst, dst_stride, src, src_stride, sz, h, mx, my, dx, dy); \
2505 }
2506
2507 #else
2508
2509 #define scaled_bilinf_fn(a, b)
2510
2511 #endif
2512
2513 #define scaled_filter_fns(sz, avg) \
2514 scaled_filter_fn(sz, regular, FILTER_8TAP_REGULAR, avg) \
2515 scaled_filter_fn(sz, smooth, FILTER_8TAP_SMOOTH, avg) \
2516 scaled_filter_fn(sz, sharp, FILTER_8TAP_SHARP, avg) \
2517 scaled_bilinf_fn(sz, avg)
2518
2519 #define scaled_filter_fn_set(avg) \
2520 scaled_filter_fns(64, avg) \
2521 scaled_filter_fns(32, avg) \
2522 scaled_filter_fns(16, avg) \
2523 scaled_filter_fns(8, avg) \
2524 scaled_filter_fns(4, avg)
2525
2526 4134 scaled_filter_fn_set(put)
2527 scaled_filter_fn_set(avg)
2528
2529 #undef scaled_filter_fns
2530 #undef scaled_filter_fn_set
2531 #undef scaled_filter_fn
2532 #undef scaled_bilinf_fn
2533
2534 #if BIT_DEPTH != 8
2535 void ff_vp9dsp_scaled_mc_init_10(VP9DSPContext *dsp);
2536 #endif
2537 #if BIT_DEPTH != 10
2538 static
2539 #endif
2540 749 av_cold void FUNC(ff_vp9dsp_scaled_mc_init)(VP9DSPContext *dsp)
2541 {
2542 #define init_scaled_bd_aware(idx1, idx2, sz, type) \
2543 dsp->smc[idx1][FILTER_8TAP_SMOOTH ][idx2] = type##_scaled_smooth_##sz##_c; \
2544 dsp->smc[idx1][FILTER_8TAP_REGULAR][idx2] = type##_scaled_regular_##sz##_c; \
2545 dsp->smc[idx1][FILTER_8TAP_SHARP ][idx2] = type##_scaled_sharp_##sz##_c
2546
2547 #if BIT_DEPTH == 12
2548 75 ff_vp9dsp_scaled_mc_init_10(dsp);
2549 #define init_scaled(a,b,c,d) init_scaled_bd_aware(a,b,c,d)
2550 #else
2551 #define init_scaled(idx1, idx2, sz, type) \
2552 init_scaled_bd_aware(idx1, idx2, sz, type); \
2553 dsp->smc[idx1][FILTER_BILINEAR ][idx2] = type##_scaled_bilin_##sz##_c
2554 #endif
2555
2556 #define init_scaled_put_avg(idx, sz) \
2557 init_scaled(idx, 0, sz, put); \
2558 init_scaled(idx, 1, sz, avg)
2559
2560 749 init_scaled_put_avg(0, 64);
2561 749 init_scaled_put_avg(1, 32);
2562 749 init_scaled_put_avg(2, 16);
2563 749 init_scaled_put_avg(3, 8);
2564 749 init_scaled_put_avg(4, 4);
2565
2566 #undef init_scaled_put_avg
2567 #undef init_scaled
2568 #undef init_scaled_bd_aware
2569 749 }
2570
2571 674 av_cold void FUNC(ff_vp9dsp_init)(VP9DSPContext *dsp)
2572 {
2573 674 FUNC(ff_vp9dsp_intrapred_init)(dsp);
2574 674 vp9dsp_itxfm_init(dsp);
2575 674 vp9dsp_loopfilter_init(dsp);
2576 674 FUNC(ff_vp9dsp_mc_init)(dsp);
2577 674 FUNC(ff_vp9dsp_scaled_mc_init)(dsp);
2578 674 }
2579
