 +--------------------------------------------------------------------+
        Short explanation for the XviD data structures and routines

                            The encoding part

          If you have further questions, visit http://www.xvid.org

 +--------------------------------------------------------------------+

Document version :
$Id: xvid-encoder.txt,v 1.3 2002-06-27 14:49:05 edgomez Exp $

 +--------------------------------------------------------------------+
 | Abstract
 +--------------------------------------------------------------------+

This document presents the basic structures and API of XviD. It tries
to explain how to use them to obtain a simple profile compliant MPEG4
stream by feeding the encoder with a sequence of frames.

 +-------------------------------------------------------------------+
 | Document
 +-------------------------------------------------------------------+

Chapter 1 : The XviD version
 +-----------------------------------------------------------------+

The XviD version is defined at library compilation time using the
constant defined in xvid.h

#define API_VERSION ((2 << 16) | (1))

Where 2 stands for the major XviD version, and 1 for the minor version
number.

The current version of the API is 2.1. It should be incremented each
time a user defined structure is modified (XVID_INIT_PARAM,
XVID_ENC_PARAM, ... we will discuss them later).

When you write a program or library which uses the XviD library, you
must check your XviD API version against the available library
version. We will see how to check the version number in the next
chapter.


Chapter 2 : The XVID_INIT_PARAM
 +-----------------------------------------------------------------+

typedef struct
{
    int cpu_flags;      // [in/out]
    int api_version;    // [out]
    int core_build;     // [out]
} XVID_INIT_PARAM;

Used in: xvid_init(NULL, 0, &xinit, NULL);

This structure is used and filled by the xvid_init function depending
on the cpu_flags value.

List of valid flags for the cpu_flags member :

 - XVID_CPU_MMX      : cpu feature
 - XVID_CPU_MMXEXT   : cpu feature
 - XVID_CPU_SSE      : cpu feature
 - XVID_CPU_SSE2     : cpu feature
 - XVID_CPU_3DNOW    : cpu feature
 - XVID_CPU_3DNOWEXT : cpu feature
 - XVID_CPU_TSC      : cpu feature
 - XVID_CPU_IA64     : cpu feature
 - XVID_CPU_CHKONLY  : command
 - XVID_CPU_FORCE    : command

In order to set a flag : xinit.cpu_flags |= desired_flag_constant;

1st case : you call xvid_init without setting the XVID_CPU_CHKONLY or
the XVID_CPU_FORCE flag. The xvid_init function automatically detects
the host cpu features and fills the cpu_flags member. It also
initializes all internal function pointers according to the detected
features and then returns XVID_ERR_OK.

2nd case : you call xvid_init with the XVID_CPU_CHKONLY flag set. The
xvid_init function only detects the host cpu features and returns
XVID_ERR_OK without initializing the internal function pointers (NB:
the XviD library is not usable after such a call to xvid_init).

3rd case : you call xvid_init with XVID_CPU_FORCE and the desired
feature flags set in cpu_flags (eg : XVID_CPU_SSE | XVID_CPU_MMX). In
this case you force XviD to use the cpu features passed in the
cpu_flags member. Use this only if you know what you are doing.

NB for PowerPC archs : the ppc arch has no automatic detection. The
library must be compiled for a specific ppc target using the right
Makefile (cpu_flags is irrelevant for these archs). Use
Makefile.linuxppc for standard ppc optimized functions and
Makefile.linuxppc_altivec for altivec simd optimized functions.

NB for IA64 archs : the library provides optimized ia64 assembly
functions, but they must be forced using the
XVID_CPU_FORCE|XVID_CPU_IA64 pair of flags.

To check the XviD library version against your own XviD header file,
you just have to call the xvid_init function (no matter the cpu_flags)
and compare the returned xinit.api_version integer with your
API_VERSION number.

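The version comparison described above can be sketched in plain C. This is only an illustration: the helper names (api_major, api_minor, check_api_version) are hypothetical and not part of xvid.h, and the split of the integer into a 16 bit major and a 16 bit minor field is an assumption read off the API_VERSION definition quoted in Chapter 1.

```c
#include <stdio.h>

/* Same packing as the API_VERSION constant quoted in Chapter 1. */
#define API_VERSION ((2 << 16) | (1))

/* Hypothetical helpers (not in xvid.h). Assumes the major number sits
 * in the upper 16 bits and the minor number in the lower 16 bits. */
int api_major(int version) { return (version >> 16) & 0xffff; }
int api_minor(int version) { return version & 0xffff; }

/* Compares the version reported by the library with the version of the
 * header this program was compiled against.
 * Returns 1 on match; prints a diagnostic and returns 0 otherwise. */
int check_api_version(int library_version)
{
    if (library_version != API_VERSION) {
        fprintf(stderr,
                "XviD API mismatch: header %d.%d, library %d.%d\n",
                api_major(API_VERSION), api_minor(API_VERSION),
                api_major(library_version), api_minor(library_version));
        return 0;
    }
    return 1;
}
```

In a real client the argument would be the xinit.api_version value filled in by xvid_init(NULL, 0, &xinit, NULL); the sketch takes a plain integer so that it compiles without the library.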
The core_build member is not relevant at the moment. It is reserved
for future use (once XviD has reached a certain stability in its API
and releases).


Chapter 3 : XVID_ENC_PARAM structure
 +-----------------------------------------------------------------+

typedef struct
{
    int width, height;              // [in]
    int fincr, fbase;               // [in]
    int rc_bitrate;                 // [in]
    int rc_reaction_delay_factor;   // [in]
    int rc_averaging_period;        // [in]
    int rc_buffer;                  // [in]
    int max_quantizer;              // [in]
    int min_quantizer;              // [in]
    int max_key_interval;           // [in]

    void *handle;                   // [out]
} XVID_ENC_PARAM;

Used in: xerr = xvid_encore(NULL, XVID_ENC_CREATE, &xparam, NULL);

This structure has to be filled to create a new encoding instance:

 - width and height.

   They have to be set to the size of the image to be encoded.

 - fincr and fbase (<0 forces default value 25fps - [25,1]).

   They are the MPEG way of defining the framerate. If you have an
   integer framerate, say 24, 25 or 30fps, use fincr=1,
   fbase=framerate. However, if the framerate is non-integer, like
   23.976fps, you can e.g. multiply by 1000, getting fincr=1000 and
   fbase=23976, which gives you integer values again.

 - rc_bitrate (<0 forces default value : 900000).

   This is the desired target bitrate. XviD will do its best to
   respect this setting, but keep in mind that XviD is still in
   development and has not been tuned for very low bitrates.

 - All other rc_xxxx parameters are used by the bit rate controller to
   respect your rc_bitrate setting as well as it can. (<0 forces
   default values)

   The defaults are good enough and you should not change them.

   ToDo : describe briefly their impact on the bit rate variations and
          the rc_bitrate setting respect.

 - min_quantizer and max_quantizer (<0 forces default values : 1,31).

   These 2 members limit the range of allowed quantizers. Normally,
   the quantizer's range is [1..31], so min=1 and max=31.

   NB : the HIGHER the quantizer, the LOWER the quality.
        the HIGHER the quantizer, the HIGHER the compression ratio.
   min_quant=1 is somewhat overkill, min_quant=2 is good enough.
   max_quant depends on what you encode: leave it at 31, or lower it
   to something like 15 or 10 for better quality (but encoding at a
   very low bitrate might then fail).

 - max_key_interval (<0 forces default value : 10*framerate == 10s)

   This is the maximum number of frames between two keyframes
   (I-frames). Keyframes are also inserted dynamically at scene
   breaks. It is important to have some keyframes, even in longer
   scenes, if you want to seek in the resulting file, because seeking
   is only possible from one keyframe to the next. However, keyframes
   are much larger than non-keyframes, so do not use too many of them.
   A value of framerate*10 is normally a good choice.

 - handle

   This is the returned internal encoder instance.


Chapter 4 : the XVID_ENC_FRAME structure.
 +-----------------------------------------------------------------+

typedef struct
{
    int general;                          // [in]
    int motion;                           // [in]
    void *bitstream;                      // [in]
    int length;                           // [out]

    void *image;                          // [in]
    int colorspace;                       // [in]

    unsigned char *quant_intra_matrix;    // [in]
    unsigned char *quant_inter_matrix;    // [in]
    int quant;                            // [in]
    int intra;                            // [in/out]

    HINTINFO hint;                        // [in/out]
} XVID_ENC_FRAME;

Used in:
 xerr = xvid_encore(enchandle, XVID_ENC_ENCODE, &xframe, &xstats);

This is the main structure used to encode a frame. It gives hints to
the encoder on how to process an image.

 - general flag member.

   The general flag member informs XviD of the general algorithm
   choices made by the library client.

   Valid flags :

   - XVID_CUSTOM_QMATRIX : informs xvid to use the custom user
                           matrices.
   - XVID_H263QUANT      : informs xvid to use the H263 quantization
                           algorithm.
   - XVID_MPEGQUANT      : informs xvid to use the MPEG quantization
                           algorithm.
   - XVID_HALFPEL        : informs xvid to perform a half pixel motion
                           estimation.
   - XVID_ADAPTIVEQUANT  : informs xvid to perform an adaptive
                           quantization.
   - XVID_LUMIMASKING    : informs xvid to use a lumimasking algorithm.
   - XVID_LATEINTRA      : ???
   - XVID_INTERLACING    : informs xvid to use the MPEG4 interlaced
                           mode.
   - XVID_TOPFIELDFIRST  : ???
   - XVID_ALTERNATESCAN  : ???
   - XVID_HINTEDME_GET   : informs xvid to return the Motion Estimation
                           vectors from the ME encoder algorithm. Used
                           during a first pass.
   - XVID_HINTEDME_SET   : informs xvid to use the user given motion
                           estimation vectors as hints for the encoder
                           ME algorithms. Used during a 2nd pass.
   - XVID_INTER4V        : forces XviD to search a vector for each 8x8
                           block within the 16x16 Macro Block. This
                           mode should be used only if the XVID_HALFPEL
                           mode is activated (this could change in the
                           future).
   - XVID_ME_ZERO        : forces XviD to use the zero ME algorithm.
   - XVID_ME_LOGARITHMIC : forces XviD to use the logarithmic ME
                           algorithm.
   - XVID_ME_FULLSEARCH  : forces XviD to use the full search ME
                           algorithm.
   - XVID_ME_PMVFAST     : forces XviD to use the PMVFAST ME algorithm.
   - XVID_ME_EPZS        : forces XviD to use the EPZS ME algorithm.

   ToDo : fill the void entries in flags, and describe briefly each ME
          algorithm.

 - motion member.

   Valid flags for 16x16 motion estimation (no XVID_INTER4V flag in
   the general flag).

   - PMV_ADVANCEDDIAMOND16 : XviD has a modified diamond algorithm
                             that performs a bit faster than the
                             original one. Use this flag if you want
                             the speed optimized diamond search. The
                             quality loss is not big (better quality
                             than square search but less than the
                             normal diamond search).
   - PMV_HALFPELDIAMOND16  : switches the search algorithm from a
                             precision of 1 or 2 full pixels to a
                             precision of 1 or 2 half pixels.
   - PMV_HALFPELREFINE16   : After the normal diamond search, an extra
                             halfpel refinement step is performed.
                             Should always be used if XVID_HALFPEL is
                             on, because it gives a rather big
                             increase in quality.
   - PMV_EXTSEARCH16       : Normal PMVfast predicts one start vector
                             and does a diamond search around this
                             position. EXTSEARCH means that 2 more
                             start vectors are used, (0,0) and the
                             median predictor, and a diamond search is
                             done for those, too.
                             This makes the search slightly slower,
                             but quality sometimes gets better.
   - PMV_EARLYSTOP16       : PMVfast and EPZS stop the search if the
                             current best is below some dynamic
                             threshold. No diamond search is done,
                             only halfpel refinement (if active).
                             Without EARLYSTOP the diamond search is
                             always done. That would be much slower,
                             but would not really lead to better
                             quality.
   - PMV_QUICKSTOP16       : like EARLYSTOP, but not even halfpel
                             refinement is done. Normally worse
                             quality, so it defaults to off. Might be
                             removed, too.
   - PMV_UNRESTRICTED16    : "unrestricted ME" is a feature of MPEG4.
                             It is not implemented, so this flag is
                             ignored (not even checked).
   - PMV_OVERLAPPING16     : same as unrestricted. Not implemented,
                             nor checked.
   - PMV_USESQUARES16      : Replaces the diamond search with a square
                             search.

   Valid flags when using the 4 vectors prediction mode. They have the
   same meaning as their 16x16 counterparts, so we only give the
   list :

   - PMV_ADVANCEDDIAMOND8
   - PMV_HALFPELDIAMOND8
   - PMV_HALFPELREFINE8
   - PMV_EXTSEARCH8
   - PMV_EARLYSTOP8
   - PMV_QUICKSTOP8
   - PMV_UNRESTRICTED8
   - PMV_OVERLAPPING8
   - PMV_USESQUARES8

 - quant member.

   The quantizer is the value by which the DCT coefficients are
   divided, which zeroes the coefficients that are not important
   (according to the target bitrate, not the image quality :-)

   Valid values :

   - 0 (zero) : The rate controller chooses the right quantizer for
                you. Typically used in ABR encoding or in the first
                pass of a VBR encoding session.

   - != 0     : You force the encoder to use this specific quantizer
                value. It is clamped to the interval [1..31].
                Typically used during the 2nd pass of a VBR encoding
                session.

 - intra member.

   [in usage]
   The intra value decides whether the frame is going to be a keyframe
   or not.

   Valid values :

   - 1  : forces the encoder to create a keyframe. Mainly used during
          a VBR 2nd pass.
   - 0  : forces the encoder not to create a keyframe. Mainly used
          during a VBR 2nd pass.
   - -1 : lets the encoder decide (based on contents and
          max_key_interval). Mainly used in ABR mode and during a 1st
          VBR pass.

   [out usage]
   When first set to -1, the encoder returns the effective keyframe
   state of the frame.

   - 0 : the resulting frame is not a keyframe.
   - 1 : the resulting frame is a keyframe (scene change).
   - 2 : the resulting frame is a keyframe (max_key_interval reached).

 - quant_intra_matrix and quant_inter_matrix members.

   These are pointers to a pair of user quantization matrices. You
   must set the general XVID_CUSTOM_QMATRIX flag to make sure XviD
   uses them.

   When set to NULL, the default XviD matrices are used.

   NB : each time the matrices change, XviD must write a header into
        the bitstream, so try not to change these matrices very
        often. This will save space.


Chapter 5 : The XVID_ENC_STATS structure
 +-----------------------------------------------------------------+

typedef struct
{
    int quant;                  // [out] frame quantizer
    int hlength;                // [out] header length (bytes)
    int kblks, mblks, ublks;    // [out]
} XVID_ENC_STATS;

Used in:
 xerr = xvid_encore(enchandle, XVID_ENC_ENCODE, &xframe, &xstats);

In this structure the encoder returns statistical data about the
encoding process, e.g. to be saved for two-pass encoding.

 - quant is the quantizer chosen for this frame (if you let the rate
   control choose it).
 - hlength is the length of the frame's header, including motion
   information etc.
 - kblks, mblks, ublks are unused at the moment.


Chapter 6 : The xvid_encore function
 +-----------------------------------------------------------------+

int xvid_encore(void * handle,
                int opt,
                void * param1,
                void * param2);

XviD uses a single-function API, so everything you want to do is done
by this routine. The opt parameter chooses the behaviour of the
routine:

 - XVID_ENC_CREATE  : creates a new encoder. Pass an XVID_ENC_PARAM in
                      param1; a handle to the new encoder is returned
                      in its handle member.
 - XVID_ENC_ENCODE  : encodes one frame. Pass an XVID_ENC_FRAME
                      structure in param1 and an XVID_ENC_STATS in
                      param2 (or NULL, if you are not interested in
                      statistical data).
 - XVID_ENC_DESTROY : shuts down this encoder. Do not use the handle
                      afterwards.
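
The single-function calling convention (create, encode, destroy through one entry point) can be illustrated with a small stand-alone mock. Everything prefixed demo_ or DEMO_ below is hypothetical and is NOT XviD code: the mock only mirrors the shape of the convention described in this chapter (an opt selector, two untyped parameters, a handle returned through the creation structure) so that it compiles and runs without the library.

```c
#include <stdlib.h>

/* Hypothetical stand-ins: these names and values are NOT XviD's. */
enum { DEMO_CREATE, DEMO_ENCODE, DEMO_DESTROY };
enum { DEMO_OK = 0, DEMO_FAIL = -1 };

typedef struct { int width, height; void *handle; } DEMO_PARAM; /* handle: [out] */
typedef struct { int length; } DEMO_FRAME;                      /* length: [out] */
typedef struct { int frames; } demo_instance;                   /* private state */

/* Single entry point: opt selects the action, param1/param2 carry data. */
int demo_encore(void *handle, int opt, void *param1, void *param2)
{
    (void)param2; /* a real encoder would fill optional statistics here */

    switch (opt) {
    case DEMO_CREATE: {
        DEMO_PARAM *p = param1;
        demo_instance *inst;
        if (p == NULL || p->width <= 0 || p->height <= 0)
            return DEMO_FAIL;
        inst = calloc(1, sizeof *inst);
        if (inst == NULL)
            return DEMO_FAIL;
        p->handle = inst;          /* instance is returned via the struct */
        return DEMO_OK;
    }
    case DEMO_ENCODE: {
        demo_instance *inst = handle;
        DEMO_FRAME *f = param1;
        if (inst == NULL || f == NULL)
            return DEMO_FAIL;
        inst->frames++;
        f->length = 0;             /* the mock produces no bitstream */
        return DEMO_OK;
    }
    case DEMO_DESTROY:
        free(handle);              /* handle must not be used afterwards */
        return DEMO_OK;
    default:
        return DEMO_FAIL;
    }
}

/* Creates an instance, encodes n frames, destroys the instance.
 * Returns the number of frames accepted, or -1 if creation failed. */
int demo_run(int n)
{
    DEMO_PARAM param = { 320, 240, NULL };
    DEMO_FRAME frame = { 0 };
    int i, encoded = 0;

    if (demo_encore(NULL, DEMO_CREATE, &param, NULL) != DEMO_OK)
        return -1;
    for (i = 0; i < n; i++)
        if (demo_encore(param.handle, DEMO_ENCODE, &frame, NULL) == DEMO_OK)
            encoded++;
    demo_encore(param.handle, DEMO_DESTROY, NULL, NULL);
    return encoded;
}
```

A real client would follow the same order with the calls quoted in Chapters 2 to 5: xvid_init first, then xvid_encore with XVID_ENC_CREATE, one XVID_ENC_ENCODE call per frame, and finally XVID_ENC_DESTROY.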