<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> <head> <title>high.definition.x264.standards.revision.3.1.addendum.1.nfo</title> <style type="text/css"> @font-face { font-family: nfo; font-style: normal; font-weight: normal; src: url(nfo.eot); } .nfo { padding: 12px; font-family: nfo, courier new; font-size: 11px; line-height: 1em; } </style> </head> <body> <pre class="nfo">
High.Definition.x264.Standards.Revision.3.1.Addendum.1-HDX

As there have lately been several pres with the wrong number of reference frames,
we thought we'd explain the reasoning behind the rule set.

On various x264 pages on the internet the following can be read about reference frames:

"Selects the maximum number of reference frames that can be used. Referenced frames
are frames that refer to other frames (eg. if both frames are similar). Having a high
referenced frame will improve quality but slow up encoding. For typical content,
a reference frame of 3 to 5 is recommended. For content with a lot of repetition
(eg. animation), a reference frame of 8 to 10 can be used."

So, more reference frames means higher quality. Why, then, do we enforce a maximum
of 4 reference frames on high resolution video? It has to do with hardware players.
The popcorn hour, twix, wd etc. all support Level 4.1 (L4.1) of the ITU-T H.264
specification. All graphics cards that have DXVA also support L4.1. So let's see
what L4.1 says about reference frames.

In table 'A-1 Level limits' on page 283 (pdf 305) of the ITU-T specification it
says that MaxDPB for L4.1 is 12288 KiB. MaxDPB is the Maximum Decoded Picture Buffer,
the largest size the decoded picture buffer is allowed to reach when decoding a video.
By supporting L4.1, the hardware players must have at least 12 MiB of buffer for
storing the decoded pictures.
This means that a video that requires a buffer of 13 MiB is not guaranteed to work
on one of these players.

As 16 * 16 pixel macroblocks are used, all resolutions need to be mod16. For ease of
reading, the maths to make them mod16 is not included below.

The DPB in KiB is calculated as follows:

DPB = vertical resolution * horizontal resolution * 1.5 * reference frames / 1024

If we rearrange this formula to solve for the reference frames, with the DPB set to
the L4.1 limit of 12288 KiB, we get:

ref = 12288 * 1024 / (vertical resolution * horizontal resolution * 1.5)

We of course can't use partial frames for referencing, so the reference frames
should be rounded down to the nearest integer. We can also rearrange this to get the
maximum vertical resolution for a specific reference frames value; here we need to
round the vertical resolution down to mod16:

vertical res = 12288 * 1024 / (horizontal resolution * 1.5 * reference frames)

With the above formula we can conclude that, at a horizontal resolution of 1920, the
highest vertical resolution that we can have ref 5 on and still be L4.1 compliant
is 864 pixels.

873.813333 = 12288 * 1024 / ( 1920 * 1.5 * 5 )
864 = floor( 873.813333 / 16 ) * 16

The 1.5 in the calculations above comes from the YV12 colourspace, which needs 12 bits
to store 1 pixel. In other words, 1.5 bytes per pixel.

So, to conclude, the reason the rules set ref 4 as the maximum for movies with a
vertical resolution greater than 864 is not that we want to be able to encode
releases faster. It's because we want releases to be L4.1 compliant and thus possible
to play on the popcorn hour, twix and other hardware players.
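
As a rough sketch, the calculations above can be written in Python. The function
names are our own illustration and not part of x264 or the specification, and
rounding the dimensions up to mod16 is an assumption about the step that was left
out above.

```python
import math

MAX_DPB_KIB = 12288      # MaxDPB for L4.1, as quoted from table A-1 above
BYTES_PER_PIXEL = 1.5    # YV12: 12 bits per pixel


def mod16_up(value):
    """Round a dimension up to the next multiple of 16 (assumed mod16 step)."""
    return math.ceil(value / 16) * 16


def max_ref_frames(width, height):
    """Largest whole number of reference frames that fits in the L4.1 DPB."""
    frame_bytes = mod16_up(width) * mod16_up(height) * BYTES_PER_PIXEL
    return math.floor(MAX_DPB_KIB * 1024 / frame_bytes)


def max_height(width, ref):
    """Largest mod16 vertical resolution that still allows `ref` frames."""
    raw = MAX_DPB_KIB * 1024 / (mod16_up(width) * BYTES_PER_PIXEL * ref)
    return math.floor(raw / 16) * 16


print(max_height(1920, 5))         # 864
print(max_ref_frames(1920, 1080))  # 4
print(max_ref_frames(1280, 720))   # 9
```

The first result matches the 864 worked out above, and the second shows why a
1920 wide video at 1080 lines ends up at ref 4.
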
We also require at least ref 5 on all videos where that is possible while still
respecting L4.1; this is to ensure high quality.

ITU-T specification:
http://www.itu.int/rec/dologin_pub.asp?lang=e&id=T-REC-H.264-200711-I!!PDF-E&type=items

12 bits per pixel for YV12:
http://msdn.microsoft.com/en-us/library/aa904813.aspx#yuvformats_420formats_12bitsperpixel</pre> </body> </html>