3 |
* XVID MPEG-4 VIDEO CODEC |
* XVID MPEG-4 VIDEO CODEC |
4 |
* - Forward DCT - |
* - Forward DCT - |
5 |
* |
* |
6 |
* These routines are from Independent JPEG Group's free JPEG software |
* Copyright (C) 2006-2011 Xvid Solutions GmbH |
|
* Copyright (C) 1991-1998, Thomas G. Lane (see the file README.IJG) |
|
7 |
* |
* |
8 |
* This program is free software ; you can redistribute it and/or modify |
* This program is free software ; you can redistribute it and/or modify |
9 |
* it under the terms of the GNU General Public License as published by |
* it under the terms of the GNU General Public License as published by |
23 |
* |
* |
24 |
****************************************************************************/ |
****************************************************************************/ |
25 |
|
|
|
/* Copyright (C) 1996, MPEG Software Simulation Group. All Rights Reserved. */ |
|
|
|
|
26 |
/* |
/* |
27 |
* Disclaimer of Warranty |
* Authors: Skal |
28 |
* |
* |
29 |
* These software programs are available to the user without any license fee or |
* "Fast and precise" LLM implementation of FDCT/IDCT, where |
30 |
* royalty on an "as is" basis. The MPEG Software Simulation Group disclaims |
* rotations are decomposed using: |
31 |
* any and all warranties, whether express, implied, or statuary, including any |
* tmp = (x+y).cos t |
32 |
* implied warranties or merchantability or of fitness for a particular |
* x' = tmp + y.(sin t - cos t) |
33 |
* purpose. In no event shall the copyright-holder be liable for any |
* y' = tmp - x.(sin t + cos t) |
34 |
* incidental, punitive, or consequential damages of any kind whatsoever |
* |
35 |
* arising from the use of these programs. |
* See details at the end of this file... |
36 |
* |
* |
37 |
* This disclaimer of warranty extends to the user of these programs and user's |
* Reference (e.g.): |
38 |
* customers, employees, agents, transferees, successors, and assigns. |
* Loeffler C., Ligtenberg A., and Moschytz C.S.: |
39 |
* |
* Practical Fast 1D DCT Algorithm with Eleven Multiplications, |
40 |
* The MPEG Software Simulation Group does not represent or warrant that the |
* Proc. ICASSP 1989, 988-991. |
41 |
* programs furnished hereunder are free of infringement of any third-party |
* |
42 |
* patents. |
* IEEE-1180-like error specs for FDCT: |
43 |
* |
* Peak error: 1.0000 |
44 |
* Commercial implementations of MPEG-1 and MPEG-2 video, including shareware, |
* Peak MSE: 0.0340 |
45 |
* are subject to royalty fees to patent holders. Many of these patents are |
* Overall MSE: 0.0200 |
46 |
* general enough such that they are unavoidable regardless of implementation |
* Peak ME: 0.0191 |
47 |
* design. |
* Overall ME: -0.0033 |
48 |
* |
* |
49 |
*/ |
********************************************************/ |
|
|
|
|
/* This routine is a slow-but-accurate integer implementation of the |
|
|
* forward DCT (Discrete Cosine Transform). Taken from the IJG software |
|
|
* |
|
|
* A 2-D DCT can be done by 1-D DCT on each row followed by 1-D DCT |
|
|
* on each column. Direct algorithms are also available, but they are |
|
|
* much more complex and seem not to be any faster when reduced to code. |
|
|
* |
|
|
* This implementation is based on an algorithm described in |
|
|
* C. Loeffler, A. Ligtenberg and G. Moschytz, "Practical Fast 1-D DCT |
|
|
* Algorithms with 11 Multiplications", Proc. Int'l. Conf. on Acoustics, |
|
|
* Speech, and Signal Processing 1989 (ICASSP '89), pp. 988-991. |
|
|
* The primary algorithm described there uses 11 multiplies and 29 adds. |
|
|
* We use their alternate method with 12 multiplies and 32 adds. |
|
|
* The advantage of this method is that no data path contains more than one |
|
|
* multiplication; this allows a very simple and accurate implementation in |
|
|
* scaled fixed-point arithmetic, with a minimal number of shifts. |
|
|
* |
|
|
* The poop on this scaling stuff is as follows: |
|
|
* |
|
|
* Each 1-D DCT step produces outputs which are a factor of sqrt(N) |
|
|
* larger than the true DCT outputs. The final outputs are therefore |
|
|
* a factor of N larger than desired; since N=8 this can be cured by |
|
|
* a simple right shift at the end of the algorithm. The advantage of |
|
|
* this arrangement is that we save two multiplications per 1-D DCT, |
|
|
* because the y0 and y4 outputs need not be divided by sqrt(N). |
|
|
* In the IJG code, this factor of 8 is removed by the quantization step |
|
|
* (in jcdctmgr.c), here it is removed. |
|
|
* |
|
|
* We have to do addition and subtraction of the integer inputs, which |
|
|
* is no problem, and multiplication by fractional constants, which is |
|
|
* a problem to do in integer arithmetic. We multiply all the constants |
|
|
* by CONST_SCALE and convert them to integer constants (thus retaining |
|
|
* CONST_BITS bits of precision in the constants). After doing a |
|
|
* multiplication we have to divide the product by CONST_SCALE, with proper |
|
|
* rounding, to produce the correct output. This division can be done |
|
|
* cheaply as a right shift of CONST_BITS bits. We postpone shifting |
|
|
* as long as possible so that partial sums can be added together with |
|
|
* full fractional precision. |
|
|
* |
|
|
* The outputs of the first pass are scaled up by PASS1_BITS bits so that |
|
|
* they are represented to better-than-integral precision. These outputs |
|
|
* require 8 + PASS1_BITS + 3 bits; this fits in a 16-bit word |
|
|
* with the recommended scaling. (For 12-bit sample data, the intermediate |
|
|
* array is INT32 anyway.) |
|
|
* |
|
|
* To avoid overflow of the 32-bit intermediate results in pass 2, we must |
|
|
* have 8 + CONST_BITS + PASS1_BITS <= 26. Error analysis |
|
|
* shows that the values given below are the most effective. |
|
|
* |
|
|
* We can gain a little more speed, with a further compromise in accuracy, |
|
|
* by omitting the addition in a descaling shift. This yields an incorrectly |
|
|
* rounded result half the time... |
|
|
*/ |
|
50 |
|
|
51 |
#include "fdct.h" |
#include "fdct.h" |
52 |
|
|
53 |
#define USE_ACCURATE_ROUNDING |
/* function pointer */ |
54 |
|
fdctFuncPtr fdct; |
55 |
|
|
56 |
#define RIGHT_SHIFT(x, shft) ((x) >> (shft)) |
/* |
57 |
|
////////////////////////////////////////////////////////// |
58 |
|
*/ |
59 |
|
|
60 |
|
#define BUTF(a, b, tmp) \ |
61 |
|
(tmp) = (a)+(b); \ |
62 |
|
(b) = (a)-(b); \ |
63 |
|
(a) = (tmp) |
64 |
|
|
65 |
|
#define LOAD_BUTF(m1, m2, a, b, tmp, S) \ |
66 |
|
(m1) = (S)[(a)] + (S)[(b)]; \ |
67 |
|
(m2) = (S)[(a)] - (S)[(b)] |
68 |
|
|
69 |
|
#define ROTATE(m1,m2,c,k1,k2,tmp,Fix,Rnd) \ |
70 |
|
(tmp) = ( (m1) + (m2) )*(c); \ |
71 |
|
(m1) *= k1; \ |
72 |
|
(m2) *= k2; \ |
73 |
|
(tmp) += (Rnd); \ |
74 |
|
(m1) = ((m1)+(tmp))>>(Fix); \ |
75 |
|
(m2) = ((m2)+(tmp))>>(Fix); |
76 |
|
|
77 |
|
#define ROTATE2(m1,m2,c,k1,k2,tmp) \ |
78 |
|
(tmp) = ( (m1) + (m2) )*(c); \ |
79 |
|
(m1) *= k1; \ |
80 |
|
(m2) *= k2; \ |
81 |
|
(m1) = (m1)+(tmp); \ |
82 |
|
(m2) = (m2)+(tmp); |
83 |
|
|
84 |
|
#define ROTATE0(m1,m2,c,k1,k2,tmp) \ |
85 |
|
(m1) = ( (m2) )*(c); \ |
86 |
|
(m2) = (m2)*k2+(m1); |
87 |
|
|
88 |
|
#define SHIFTL(x,n) ((x)<<(n)) |
89 |
|
#define SHIFTR(x, n) ((x)>>(n)) |
90 |
|
#define HALF(n) (1<<((n)-1)) |
91 |
|
|
92 |
|
#define IPASS 3 |
93 |
|
#define FPASS 2 |
94 |
|
#define FIX 16 |
95 |
|
|
96 |
|
#if 1 |
97 |
|
|
98 |
|
#define ROT6_C 35468 |
99 |
|
#define ROT6_SmC 50159 |
100 |
|
#define ROT6_SpC 121095 |
101 |
|
#define ROT17_C 77062 |
102 |
|
#define ROT17_SmC 25571 |
103 |
|
#define ROT17_SpC 128553 |
104 |
|
#define ROT37_C 58981 |
105 |
|
#define ROT37_SmC 98391 |
106 |
|
#define ROT37_SpC 19571 |
107 |
|
#define ROT13_C 167963 |
108 |
|
#define ROT13_SmC 134553 |
109 |
|
#define ROT13_SpC 201373 |
110 |
|
|
|
#ifdef USE_ACCURATE_ROUNDING |
|
|
#define ONE ((int) 1) |
|
|
#define DESCALE(x, n) RIGHT_SHIFT((x) + (ONE << ((n) - 1)), n) |
|
111 |
#else |
#else |
|
#define DESCALE(x, n) RIGHT_SHIFT(x, n) |
|
|
#endif |
|
112 |
|
|
113 |
#define CONST_BITS 13 |
#include <math.h> |
114 |
#define PASS1_BITS 2 |
#define FX(x) ( (int)floor((x)*(1<<FIX) + .5 ) ) |
115 |
|
|
116 |
#define FIX_0_298631336 ((int) 2446) /* FIX(0.298631336) */ |
static const double c1 = cos(1.*M_PI/16); |
117 |
#define FIX_0_390180644 ((int) 3196) /* FIX(0.390180644) */ |
static const double c2 = cos(2.*M_PI/16); |
118 |
#define FIX_0_541196100 ((int) 4433) /* FIX(0.541196100) */ |
static const double c3 = cos(3.*M_PI/16); |
119 |
#define FIX_0_765366865 ((int) 6270) /* FIX(0.765366865) */ |
static const double c4 = cos(4.*M_PI/16); |
120 |
#define FIX_0_899976223 ((int) 7373) /* FIX(0.899976223) */ |
static const double c5 = cos(5.*M_PI/16); |
121 |
#define FIX_1_175875602 ((int) 9633) /* FIX(1.175875602) */ |
static const double c6 = cos(6.*M_PI/16); |
122 |
#define FIX_1_501321110 ((int) 12299) /* FIX(1.501321110) */ |
static const double c7 = cos(7.*M_PI/16); |
123 |
#define FIX_1_847759065 ((int) 15137) /* FIX(1.847759065) */ |
|
124 |
#define FIX_1_961570560 ((int) 16069) /* FIX(1.961570560) */ |
static const int ROT6_C = FX(c2-c6); // 0.541 |
125 |
#define FIX_2_053119869 ((int) 16819) /* FIX(2.053119869) */ |
static const int ROT6_SmC = FX(2*c6); // 0.765 |
126 |
#define FIX_2_562915447 ((int) 20995) /* FIX(2.562915447) */ |
static const int ROT6_SpC = FX(2*c2); // 1.847 |
127 |
#define FIX_3_072711026 ((int) 25172) /* FIX(3.072711026) */ |
|
128 |
|
static const int ROT17_C = FX(c1+c7); // 1.175 |
129 |
|
static const int ROT17_SmC = FX(2*c7); // 0.390 |
130 |
|
static const int ROT17_SpC = FX(2*c1); // 1.961 |
131 |
|
|
132 |
|
static const int ROT37_C = FX((c3-c7)/c4); // 0.899 |
133 |
|
static const int ROT37_SmC = FX(2*(c5+c7)); // 1.501 |
134 |
|
static const int ROT37_SpC = FX(2*(c1-c3)); // 0.298 |
135 |
|
|
136 |
|
static const int ROT13_C = FX((c1+c3)/c4); // 2.562 |
137 |
|
static const int ROT13_SmC = FX(2*(c3+c7)); // 2.053 |
138 |
|
static const int ROT13_SpC = FX(2*(c1+c5)); // 3.072 |
139 |
|
|
140 |
/* function pointer */ |
#endif |
|
fdctFuncPtr fdct; |
|
141 |
|
|
142 |
/* |
/* |
143 |
* Perform an integer forward DCT on one block of samples. |
////////////////////////////////////////////////////////// |
144 |
|
// Forward transform |
145 |
*/ |
*/ |
146 |
|
|
147 |
void |
void fdct_int32( short *const In ) |
|
fdct_int32(short *const block) |
|
148 |
{ |
{ |
149 |
int tmp0, tmp1, tmp2, tmp3, tmp4, tmp5, tmp6, tmp7; |
short *pIn; |
|
int tmp10, tmp11, tmp12, tmp13; |
|
|
int z1, z2, z3, z4, z5; |
|
|
short *blkptr; |
|
|
int *dataptr; |
|
|
int data[64]; |
|
150 |
int i; |
int i; |
151 |
|
|
152 |
/* Pass 1: process rows. */ |
pIn = In; |
153 |
/* Note results are scaled up by sqrt(8) compared to a true DCT; */ |
for(i=8; i>0; --i) |
154 |
/* furthermore, we scale the results by 2**PASS1_BITS. */ |
{ |
155 |
|
int mm0, mm1, mm2, mm3, mm4, mm5, mm6, mm7, Spill; |
|
dataptr = data; |
|
|
blkptr = block; |
|
|
for (i = 0; i < 8; i++) { |
|
|
tmp0 = blkptr[0] + blkptr[7]; |
|
|
tmp7 = blkptr[0] - blkptr[7]; |
|
|
tmp1 = blkptr[1] + blkptr[6]; |
|
|
tmp6 = blkptr[1] - blkptr[6]; |
|
|
tmp2 = blkptr[2] + blkptr[5]; |
|
|
tmp5 = blkptr[2] - blkptr[5]; |
|
|
tmp3 = blkptr[3] + blkptr[4]; |
|
|
tmp4 = blkptr[3] - blkptr[4]; |
|
156 |
|
|
157 |
/* Even part per LL&M figure 1 --- note that published figure is faulty; |
// even |
|
* rotator "sqrt(2)*c1" should be "sqrt(2)*c6". |
|
|
*/ |
|
158 |
|
|
159 |
tmp10 = tmp0 + tmp3; |
LOAD_BUTF(mm1,mm6, 1, 6, mm0, pIn); |
160 |
tmp13 = tmp0 - tmp3; |
LOAD_BUTF(mm2,mm5, 2, 5, mm0, pIn); |
161 |
tmp11 = tmp1 + tmp2; |
LOAD_BUTF(mm3,mm4, 3, 4, mm0, pIn); |
162 |
tmp12 = tmp1 - tmp2; |
LOAD_BUTF(mm0,mm7, 0, 7, Spill, pIn); |
|
|
|
|
dataptr[0] = (tmp10 + tmp11) << PASS1_BITS; |
|
|
dataptr[4] = (tmp10 - tmp11) << PASS1_BITS; |
|
|
|
|
|
z1 = (tmp12 + tmp13) * FIX_0_541196100; |
|
|
dataptr[2] = |
|
|
DESCALE(z1 + tmp13 * FIX_0_765366865, CONST_BITS - PASS1_BITS); |
|
|
dataptr[6] = |
|
|
DESCALE(z1 + tmp12 * (-FIX_1_847759065), CONST_BITS - PASS1_BITS); |
|
|
|
|
|
/* Odd part per figure 8 --- note paper omits factor of sqrt(2). |
|
|
* cK represents cos(K*pi/16). |
|
|
* i0..i3 in the paper are tmp4..tmp7 here. |
|
|
*/ |
|
163 |
|
|
164 |
z1 = tmp4 + tmp7; |
BUTF(mm1, mm2, Spill); |
165 |
z2 = tmp5 + tmp6; |
BUTF(mm0, mm3, Spill); |
|
z3 = tmp4 + tmp6; |
|
|
z4 = tmp5 + tmp7; |
|
|
z5 = (z3 + z4) * FIX_1_175875602; /* sqrt(2) * c3 */ |
|
|
|
|
|
tmp4 *= FIX_0_298631336; /* sqrt(2) * (-c1+c3+c5-c7) */ |
|
|
tmp5 *= FIX_2_053119869; /* sqrt(2) * ( c1+c3-c5+c7) */ |
|
|
tmp6 *= FIX_3_072711026; /* sqrt(2) * ( c1+c3+c5-c7) */ |
|
|
tmp7 *= FIX_1_501321110; /* sqrt(2) * ( c1+c3-c5-c7) */ |
|
|
z1 *= -FIX_0_899976223; /* sqrt(2) * (c7-c3) */ |
|
|
z2 *= -FIX_2_562915447; /* sqrt(2) * (-c1-c3) */ |
|
|
z3 *= -FIX_1_961570560; /* sqrt(2) * (-c3-c5) */ |
|
|
z4 *= -FIX_0_390180644; /* sqrt(2) * (c5-c3) */ |
|
|
|
|
|
z3 += z5; |
|
|
z4 += z5; |
|
|
|
|
|
dataptr[7] = DESCALE(tmp4 + z1 + z3, CONST_BITS - PASS1_BITS); |
|
|
dataptr[5] = DESCALE(tmp5 + z2 + z4, CONST_BITS - PASS1_BITS); |
|
|
dataptr[3] = DESCALE(tmp6 + z2 + z3, CONST_BITS - PASS1_BITS); |
|
|
dataptr[1] = DESCALE(tmp7 + z1 + z4, CONST_BITS - PASS1_BITS); |
|
166 |
|
|
167 |
dataptr += 8; /* advance pointer to next row */ |
ROTATE(mm3, mm2, ROT6_C, ROT6_SmC, -ROT6_SpC, Spill, FIX-FPASS, HALF(FIX-FPASS)); |
168 |
blkptr += 8; |
pIn[2] = mm3; |
169 |
} |
pIn[6] = mm2; |
170 |
|
|
171 |
/* Pass 2: process columns. |
BUTF(mm0, mm1, Spill); |
172 |
* We remove the PASS1_BITS scaling, but leave the results scaled up |
pIn[0] = SHIFTL(mm0, FPASS); |
173 |
* by an overall factor of 8. |
pIn[4] = SHIFTL(mm1, FPASS); |
|
*/ |
|
174 |
|
|
|
dataptr = data; |
|
|
for (i = 0; i < 8; i++) { |
|
|
tmp0 = dataptr[0] + dataptr[56]; |
|
|
tmp7 = dataptr[0] - dataptr[56]; |
|
|
tmp1 = dataptr[8] + dataptr[48]; |
|
|
tmp6 = dataptr[8] - dataptr[48]; |
|
|
tmp2 = dataptr[16] + dataptr[40]; |
|
|
tmp5 = dataptr[16] - dataptr[40]; |
|
|
tmp3 = dataptr[24] + dataptr[32]; |
|
|
tmp4 = dataptr[24] - dataptr[32]; |
|
175 |
|
|
176 |
/* Even part per LL&M figure 1 --- note that published figure is faulty; |
// odd |
|
* rotator "sqrt(2)*c1" should be "sqrt(2)*c6". |
|
|
*/ |
|
177 |
|
|
178 |
tmp10 = tmp0 + tmp3; |
mm3 = mm5 + mm7; |
179 |
tmp13 = tmp0 - tmp3; |
mm2 = mm4 + mm6; |
180 |
tmp11 = tmp1 + tmp2; |
ROTATE(mm2, mm3, ROT17_C, -ROT17_SpC, -ROT17_SmC, mm0, FIX-FPASS, HALF(FIX-FPASS)); |
181 |
tmp12 = tmp1 - tmp2; |
ROTATE(mm4, mm7, -ROT37_C, ROT37_SpC, ROT37_SmC, mm0, FIX-FPASS, HALF(FIX-FPASS)); |
182 |
|
mm7 += mm3; |
183 |
dataptr[0] = DESCALE(tmp10 + tmp11, PASS1_BITS); |
mm4 += mm2; |
184 |
dataptr[32] = DESCALE(tmp10 - tmp11, PASS1_BITS); |
pIn[1] = mm7; |
185 |
|
pIn[7] = mm4; |
|
z1 = (tmp12 + tmp13) * FIX_0_541196100; |
|
|
dataptr[16] = |
|
|
DESCALE(z1 + tmp13 * FIX_0_765366865, CONST_BITS + PASS1_BITS); |
|
|
dataptr[48] = |
|
|
DESCALE(z1 + tmp12 * (-FIX_1_847759065), CONST_BITS + PASS1_BITS); |
|
|
|
|
|
/* Odd part per figure 8 --- note paper omits factor of sqrt(2). |
|
|
* cK represents cos(K*pi/16). |
|
|
* i0..i3 in the paper are tmp4..tmp7 here. |
|
|
*/ |
|
186 |
|
|
187 |
z1 = tmp4 + tmp7; |
ROTATE(mm5, mm6, -ROT13_C, ROT13_SmC, ROT13_SpC, mm0, FIX-FPASS, HALF(FIX-FPASS)); |
188 |
z2 = tmp5 + tmp6; |
mm5 += mm3; |
189 |
z3 = tmp4 + tmp6; |
mm6 += mm2; |
190 |
z4 = tmp5 + tmp7; |
pIn[3] = mm6; |
191 |
z5 = (z3 + z4) * FIX_1_175875602; /* sqrt(2) * c3 */ |
pIn[5] = mm5; |
|
|
|
|
tmp4 *= FIX_0_298631336; /* sqrt(2) * (-c1+c3+c5-c7) */ |
|
|
tmp5 *= FIX_2_053119869; /* sqrt(2) * ( c1+c3-c5+c7) */ |
|
|
tmp6 *= FIX_3_072711026; /* sqrt(2) * ( c1+c3+c5-c7) */ |
|
|
tmp7 *= FIX_1_501321110; /* sqrt(2) * ( c1+c3-c5-c7) */ |
|
|
z1 *= -FIX_0_899976223; /* sqrt(2) * (c7-c3) */ |
|
|
z2 *= -FIX_2_562915447; /* sqrt(2) * (-c1-c3) */ |
|
|
z3 *= -FIX_1_961570560; /* sqrt(2) * (-c3-c5) */ |
|
|
z4 *= -FIX_0_390180644; /* sqrt(2) * (c5-c3) */ |
|
|
|
|
|
z3 += z5; |
|
|
z4 += z5; |
|
|
|
|
|
dataptr[56] = DESCALE(tmp4 + z1 + z3, CONST_BITS + PASS1_BITS); |
|
|
dataptr[40] = DESCALE(tmp5 + z2 + z4, CONST_BITS + PASS1_BITS); |
|
|
dataptr[24] = DESCALE(tmp6 + z2 + z3, CONST_BITS + PASS1_BITS); |
|
|
dataptr[8] = DESCALE(tmp7 + z1 + z4, CONST_BITS + PASS1_BITS); |
|
192 |
|
|
193 |
dataptr++; /* advance pointer to next column */ |
pIn += 8; |
194 |
} |
} |
195 |
/* descale */ |
|
196 |
for (i = 0; i < 64; i++) |
pIn = In; |
197 |
block[i] = (short int) DESCALE(data[i], 3); |
for(i=8; i>0; --i) |
198 |
|
{ |
199 |
|
int mm0, mm1, mm2, mm3, mm4, mm5, mm6, mm7, Spill; |
200 |
|
|
201 |
|
// even |
202 |
|
|
203 |
|
LOAD_BUTF(mm1,mm6, 1*8, 6*8, mm0, pIn); |
204 |
|
LOAD_BUTF(mm2,mm5, 2*8, 5*8, mm0, pIn); |
205 |
|
BUTF(mm1, mm2, mm0); |
206 |
|
|
207 |
|
LOAD_BUTF(mm3,mm4, 3*8, 4*8, mm0, pIn); |
208 |
|
LOAD_BUTF(mm0,mm7, 0*8, 7*8, Spill, pIn); |
209 |
|
BUTF(mm0, mm3, Spill); |
210 |
|
|
211 |
|
ROTATE(mm3, mm2, ROT6_C, ROT6_SmC, -ROT6_SpC, Spill, 0, HALF(FIX+FPASS+3)); |
212 |
|
pIn[2*8] = (int16_t)SHIFTR(mm3,FIX+FPASS+3); |
213 |
|
pIn[6*8] = (int16_t)SHIFTR(mm2,FIX+FPASS+3); |
214 |
|
|
215 |
|
mm0 += HALF(FPASS+3) - 1; |
216 |
|
BUTF(mm0, mm1, Spill); |
217 |
|
pIn[0*8] = (int16_t)SHIFTR(mm0, FPASS+3); |
218 |
|
pIn[4*8] = (int16_t)SHIFTR(mm1, FPASS+3); |
219 |
|
|
220 |
|
// odd |
221 |
|
|
222 |
|
mm3 = mm5 + mm7; |
223 |
|
mm2 = mm4 + mm6; |
224 |
|
|
225 |
|
ROTATE(mm2, mm3, ROT17_C, -ROT17_SpC, -ROT17_SmC, mm0, 0, HALF(FIX+FPASS+3)); |
226 |
|
ROTATE2(mm4, mm7, -ROT37_C, ROT37_SpC, ROT37_SmC, mm0); |
227 |
|
mm7 += mm3; |
228 |
|
mm4 += mm2; |
229 |
|
pIn[7*8] = (int16_t)SHIFTR(mm4,FIX+FPASS+3); |
230 |
|
pIn[1*8] = (int16_t)SHIFTR(mm7,FIX+FPASS+3); |
231 |
|
|
232 |
|
ROTATE2(mm5, mm6, -ROT13_C, ROT13_SmC, ROT13_SpC, mm0); |
233 |
|
mm5 += mm3; |
234 |
|
mm6 += mm2; |
235 |
|
pIn[5*8] = (int16_t)SHIFTR(mm5,FIX+FPASS+3); |
236 |
|
pIn[3*8] = (int16_t)SHIFTR(mm6,FIX+FPASS+3); |
237 |
|
|
238 |
|
pIn++; |
239 |
} |
} |
240 |
|
} |
241 |
|
#undef FIX |
242 |
|
#undef FPASS |
243 |
|
#undef IPASS |
244 |
|
|
245 |
|
#undef BUTF |
246 |
|
#undef LOAD_BUTF |
247 |
|
#undef ROTATE |
248 |
|
#undef ROTATE2 |
249 |
|
#undef SHIFTL |
250 |
|
#undef SHIFTR |
251 |
|
|
252 |
|
////////////////////////////////////////////////////////// |
253 |
|
// - Data flow schematics for FDCT - |
254 |
|
// Output is scaled by 2.sqrt(2) |
255 |
|
// Initial butterflies (in0/in7, etc.) are not fully depicted. |
256 |
|
// Note: Rot6 coeffs are multiplied by sqrt(2). |
257 |
|
////////////////////////////////////////////////////////// |
258 |
|
/* |
259 |
|
<---------Stage1 =even part=-----------> |
260 |
|
|
261 |
|
in3 mm3 +_____.___-___________.____* out6 |
262 |
|
x \ / | |
263 |
|
in4 mm4 \ | |
264 |
|
/ \ | |
265 |
|
in0 mm0 +_____o___+__.___-___ | ___* out4 |
266 |
|
x \ / | |
267 |
|
in7 mm7 \ (Rot6) |
268 |
|
/ \ | |
269 |
|
in1 mm1 +_____o___+__o___+___ | ___* out0 |
270 |
|
x \ / | |
271 |
|
in6 mm6 / | |
272 |
|
/ \ | |
273 |
|
in2 mm2 +_____.___-___________o____* out2 |
274 |
|
x |
275 |
|
in5 mm5 |
276 |
|
|
277 |
|
<---------Stage2 =odd part=----------------> |
278 |
|
|
279 |
|
mm7*___._________.___-___[xSqrt2]___* out3 |
280 |
|
| \ / |
281 |
|
(Rot3) \ |
282 |
|
| / \ |
283 |
|
mm5*__ | ___o____o___+___.___-______* out7 |
284 |
|
| | \ / |
285 |
|
| (Rot1) \ |
286 |
|
| | / \ |
287 |
|
mm6*__ |____.____o___+___o___+______* out1 |
288 |
|
| \ / |
289 |
|
| / |
290 |
|
| / \ |
291 |
|
mm4*___o_________.___-___[xSqrt2]___* out5 |
292 |
|
|
293 |
|
|
294 |
|
|
295 |
|
Alternative schematics for stage 2: |
296 |
|
----------------------------------- |
297 |
|
|
298 |
|
mm7 *___[xSqrt2]____o___+____o_______* out1 |
299 |
|
\ / | |
300 |
|
/ (Rot1) |
301 |
|
/ \ | |
302 |
|
mm6 *____o___+______.___-___ | __.___* out5 |
303 |
|
\ / | | |
304 |
|
/ | | |
305 |
|
/ \ | | |
306 |
|
mm5 *____.___-______.___-____.__ | __* out7 |
307 |
|
\ / | |
308 |
|
\ (Rot3) |
309 |
|
/ \ | |
310 |
|
mm4 *___[xSqrt2]____o___+________o___* out3 |
311 |
|
|
312 |
|
*/ |