Multimedia – Interact^Influence

Building WebRTC for Android

ENV
Ubuntu

Getting started and downloading the source
https://webrtc.org/native-code/development/
https://webrtc.org/native-code/android/

gclient config --name=src https://chromium.googlesource.com/external/webrtc.git
echo "target_os = ['android']" >> .gclient
gclient sync --force

gclient runhooks --force

List the supported build arguments

gn args --list out/Debug

Set the build arguments

gn gen out/Debug --args='target_os="android" rtc_include_tests=false enable_nocompile_tests=true libyuv_include_tests=false'

Start the build

ninja -C out/Debug or ninja -C out/Release

If memory runs short, limit parallelism with -j1 or -j2

When you need some of the tools bundled with the project, first run

source ./build/android/envsetup.sh

Possible problems

guohai@ubuntu:/home/guohai/WebRTC/src$ ninja -C out/Debug
ninja: Entering directory `out/Debug'
[4/3003] ACTION //base:android_runtime_jni_headers__jni_Runtime(//build/toolchain/android:android_clang_arm)
FAILED: gen/base/android_runtime_jni_headers/base/jni/Runtime_jni.h 
python ../../base/android/jni_generator/jni_generator.py --jar_file ../../third_party/android_tools/sdk/platforms/android-28/android.jar --input_file java/lang/Runtime.class --ptr_type=long --output_dir gen/base/android_runtime_jni_headers/base/jni --includes ../../../../../../../base/android/jni_generator/jni_generator_helper.h
Traceback (most recent call last):
  File "../../base/android/jni_generator/jni_generator.py", line 1405, in <module>
    sys.exit(main(sys.argv))
  File "../../base/android/jni_generator/jni_generator.py", line 1401, in main
    GenerateJNIHeader(input_file, output_file, options)
  File "../../base/android/jni_generator/jni_generator.py", line 1308, in GenerateJNIHeader
    jni_from_javap = JNIFromJavaP.CreateFromClass(input_file, options)
  File "../../base/android/jni_generator/jni_generator.py", line 773, in CreateFromClass
    stderr=subprocess.PIPE)
  File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1327, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
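The Java environment is not set up correctly: this step needs the javap command, so make sure a JDK is installed and javap is on PATH.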

/home/guohai/WebRTC/src/third_party/android_tools/sdk//build-tools/22.0.0/aapt: error while loading shared libraries: libz.so.1: cannot open shared object file: No such file or directory
sudo apt-get install lib32z1

H.264 Prediction: Inter Prediction

These are reading notes. Click an image to view it at full size.

References
The H.264 Advanced Video Compression Standard, Second Edition, hereafter THAVCS
Information technology – Coding of audio-visual objects – Part 10: Advanced Video Coding, hereafter SPEC

CodecVisa
JM
foreman_part_qcif.264, converted from foreman_part_qcif.yuv with JM 8.6

http://blog.csdn.net/stpeace/article/details/8115392

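Inter prediction uses already-coded frames as reference frames (in display order they may come before or after the current frame), determines the prediction region, generates the prediction block, and computes the residual. The reference frames are all stored in the Decoded Picture Buffer. Unlike intra prediction, inter prediction predicts from reconstructed frames, whereas intra prediction predicts from samples taken before the loop filter. The displacement between the current block and the prediction region is the motion vector (MV). Each block has its own MV, and an MV can have integer, half-sample or quarter-sample precision (for 4:2:0 video, C has one-eighth-sample precision). Prediction regions at non-integer positions are computed from the reference frame by interpolation.

Note that in practice an MV is expressed in units of its finest precision: the Y-component MV is in quarter-sample units and the C-component MV is in one-eighth-sample units. Again, this applies to 4:2:0 video.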
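
As a small illustration of the quarter-sample unit, the sketch below splits a luma MV component into a whole-sample offset and a fractional part. The function name `split_mv` and the two input values are illustrative assumptions; the shift and mask mirror the usual way of separating the integer and fractional parts of a quarter-sample value.

```python
def split_mv(mv_quarter):
    """Split a luma MV component given in quarter-sample units.

    Returns (integer_offset, fraction), where fraction counts the
    remaining quarter-sample steps (0..3) to the right of or below
    the integer position.
    """
    integer_offset = mv_quarter >> 2   # arithmetic shift, rounds toward -infinity
    fraction = mv_quarter & 3          # low two bits: quarter-sample remainder
    return integer_offset, fraction


# -28 quarter samples is exactly 7 whole samples to the left.
print(split_mv(-28))   # (-7, 0)
# -11 quarter samples is -2.75 samples: 3 samples left, then 1/4 sample right.
print(split_mv(-11))   # (-3, 1)
```

A fraction of 0 means the prediction can be copied straight from the reference frame; any other value means interpolation is needed.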

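The main steps are: select the reference frame, interpolate, determine the prediction block, determine the prediction type, determine the motion vector, predict the motion vector, code the motion vector difference and the residual, apply the deblocking filter, and so on.
For interpolation, note that the half-sample positions are computed first and the quarter-sample positions second; see THAVCS 6.4.2.1, Generating interpolated sub-pixel samples.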

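Let's look at an example, starting with an integer-valued MV, which needs no interpolation.

Using the same bitstream as for intra prediction, open it in CodecVisa, select the second frame (the stream has three frames in total: I, P, P), and pick the MB in row 4, column 9.
See the figure inter-prediction-p-slice.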

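Because the Y-component MV is in quarter-sample units, dividing the stored value by 4 gives the displacement in pixels. Shifting the highlighted block 7 pixels to the left lands exactly on the highlighted position in the first frame, and the values match. In other words, the current block uses that region of the first frame as its prediction block. Note that inter-prediction reference frames are always reconstructed frames, so look at the Final values, not the Pre-LP values used for intra prediction. See the figure inter-prediction-reference-picture.

That is one example of inter prediction.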
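
The integer-MV case can be sketched in a few lines: the prediction block is copied from the reference frame at the displaced position, and the residual is the current block minus that prediction. The tiny frame, the block position and the MV below are made-up values for illustration only, not data from the foreman stream, and `predict_block` and `residual` are hypothetical helper names.

```python
def predict_block(ref, x, y, mv_q, size):
    """Copy a size x size block from ref for an integer-valued MV.

    mv_q is (mv_x, mv_y) in quarter-sample units, as stored in the stream.
    """
    mv_x_q, mv_y_q = mv_q
    if mv_x_q % 4 or mv_y_q % 4:
        raise ValueError("fractional MV: interpolation would be needed")
    dx, dy = mv_x_q // 4, mv_y_q // 4          # displacement in whole samples
    if x + dx < 0 or y + dy < 0:
        raise ValueError("block falls outside this toy reference frame")
    return [row[x + dx : x + dx + size] for row in ref[y + dy : y + dy + size]]


def residual(cur, pred):
    """Sample-wise difference between the current block and its prediction."""
    return [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(cur, pred)]


# Hypothetical 4 x 6 reconstructed reference frame.
ref = [
    [10, 20, 30, 40, 50, 60],
    [11, 21, 31, 41, 51, 61],
    [12, 22, 32, 42, 52, 62],
    [13, 23, 33, 43, 53, 63],
]
# Hypothetical current 2 x 2 block located at x = 3, y = 1.
cur = [[22, 31], [22, 33]]

pred = predict_block(ref, x=3, y=1, mv_q=(-8, 0), size=2)  # 2 samples to the left
res = residual(cur, pred)
print(pred)  # [[21, 31], [22, 32]]
print(res)   # [[1, 0], [0, 1]]
```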

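Now consider the non-integer case, with an MV of (-2.75, -2). The vertical component is an integer, so we can ignore it and look only at the horizontal direction.

The two figures above show the current block and the prediction block. Because this MV is not an integer, the prediction block has to be interpolated first. The data we need is extracted below.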

187    185    187    195 A   199    201    200    200

187    184    191    198 B   199    201    199    200

188    183    174    169 C   183    202    198    200

189    185    166    130 D   132    172    199    202

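The data above is the reconstructed reference frame. A, B, C and D are the values to be interpolated, that is, the predicted values for the current block.

First compute the half-sample values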

Aa = round((1 * 185 - 5 * 187 + 20 * 195 + 20 * 199 - 5 * 201 + 1 * 200) / 32) = 198
Bb = round((1 * 184 - 5 * 191 + 20 * 198 + 20 * 199 - 5 * 201 + 1 * 199) / 32) = 199
Cc = round((1 * 183 - 5 * 174 + 20 * 169 + 20 * 183 - 5 * 202 + 1 * 198) / 32) = 173
Dd = round((1 * 185 - 5 * 166 + 20 * 130 + 20 * 132 - 5 * 172 + 1 * 199) / 32) = 123

Then the quarter-sample values

A = round((198 + 195) / 2) = 197
B = round((199 + 198) / 2) = 199
C = round((173 + 169) / 2) = 171
D = round((123 + 130) / 2) = 127

The second column is computed the same way.
First compute the half-sample values

Aa + 1 = round((1 * 187 - 5 * 195 + 20 * 199 + 20 * 201 - 5 * 200 + 1 * 200) / 32) = 200
Bb + 1 = round((1 * 191 - 5 * 198 + 20 * 199 + 20 * 201 - 5 * 199 + 1 * 200) / 32) = 200
Cc + 1 = round((1 * 174 - 5 * 169 + 20 * 183 + 20 * 202 - 5 * 198 + 1 * 200) / 32) = 195
Dd + 1 = round((1 * 166 - 5 * 130 + 20 * 132 + 20 * 172 - 5 * 199 + 1 * 202) / 32) = 150

Then the quarter-sample values

A = round((200 + 199) / 2) = 200
B = round((200 + 199) / 2) = 200
C = round((195 + 183) / 2) = 189
D = round((150 + 132) / 2) = 141
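
The two columns worked out by hand above can be reproduced with a short script. It applies the 6-tap filter (1, -5, 20, 20, -5, 1) for the half-sample value and then averages with the neighbouring integer sample for the quarter-sample value. Rounding is written in integer form, `(sum + 16) >> 5` and `(a + b + 1) >> 1`, which gives the same results as the `round(...)` calls above for these inputs. The function names are illustrative; the sample rows are copied from the table above.

```python
# Reconstructed reference samples for rows A, B, C and D.
ROWS = [
    [187, 185, 187, 195, 199, 201, 200, 200],  # A
    [187, 184, 191, 198, 199, 201, 199, 200],  # B
    [188, 183, 174, 169, 183, 202, 198, 200],  # C
    [189, 185, 166, 130, 132, 172, 199, 202],  # D
]


def clip255(value):
    """Clamp to the 8-bit sample range."""
    return max(0, min(255, value))


def half_sample(row, i):
    """Half-sample value between row[i] and row[i + 1] (6-tap filter)."""
    e, f, g, h, k, m = row[i - 2 : i + 4]
    return clip255((e - 5 * f + 20 * g + 20 * h - 5 * k + m + 16) >> 5)


def quarter_sample(row, i):
    """Quarter-sample value a quarter step to the right of row[i]."""
    return (row[i] + half_sample(row, i) + 1) >> 1


first_column = [quarter_sample(row, 3) for row in ROWS]
second_column = [quarter_sample(row, 4) for row in ROWS]
print(first_column)   # [197, 199, 171, 127]
print(second_column)  # [200, 200, 189, 141]
```

Only two columns can be computed from this excerpt, because the filter for the third column would need a sample to the right of the last value in each row.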

The remaining columns are not listed. Our predicted result

197    200    ....
199    200    ....
171    189    ....
127    141    ....

differs slightly from what CodecVisa shows, but the trend is the same, so the experiment served its basic purpose.

H.264 Prediction: Intra Prediction

These are reading notes. Click an image to view it at full size.

References
The H.264 Advanced Video Compression Standard, Second Edition, hereafter THAVCS
Information technology – Coding of audio-visual objects – Part 10: Advanced Video Coding, hereafter SPEC

CodecVisa
JM
foreman_part_qcif.264, converted from foreman_part_qcif.yuv with JM 8.6

http://blog.csdn.net/stpeace/article/details/8114826

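Put simply, intra prediction codes data using blocks in the same frame that have already been coded and reconstructed. Concretely, the prediction block is subtracted from the current block and the resulting difference is coded.

For example:

Already-coded block: B
Prediction block: P
Current block: C

Delta = C - P
P is predicted from B, and Delta is the data that is finally coded, which is what we usually call the residual.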

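For the luma component, P is usually a 4 x 4 block or a 16 x 16 macroblock (8 x 8 only appears in High profile). Detailed areas use 4 x 4 blocks and flat areas use 16 x 16 blocks. This is a trade-off: the finer the partitioning, the more bits are needed to signal the blocks themselves, but the smaller the prediction residual; the larger the blocks, the fewer bits are spent on block information, but the more bits the residual takes. These are usually written as 4 × 4 Luma Prediction and 16 × 16 Luma Prediction.
4 × 4 Luma Prediction has 9 prediction modes; see Figures 6.10 and 6.11 in THAVCS. The names are not listed here; see the screenshot.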

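How do we decide which of these prediction modes to use? This is where the SAE (Sum of Absolute Errors) comes in. The SAE measures the prediction error: the larger the SAE, the less accurate the prediction, so we choose the mode with the smallest SAE. How the SAE is computed is left aside for now.
16 × 16 Luma Prediction has 4 prediction modes; see Figure 6.13 in THAVCS (screenshot below). The mode is chosen the same way as above, by SAE.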
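
A minimal sketch of SAE-based mode selection, assuming the SAE is the sum of absolute sample differences between the current block and a candidate prediction. The 2 x 2 block and the three candidate predictions are made-up values chosen to keep the arithmetic visible; a real encoder would compare full 4 x 4 or 16 x 16 predictions and may use a different cost measure.

```python
def sae(block, pred):
    """Sum of absolute differences between a block and its prediction."""
    return sum(abs(c - p)
               for block_row, pred_row in zip(block, pred)
               for c, p in zip(block_row, pred_row))


def best_mode(block, candidates):
    """Return the name of the candidate prediction with the smallest SAE."""
    return min(candidates, key=lambda name: sae(block, candidates[name]))


# Hypothetical current block and candidate predictions.
block = [[10, 10], [20, 20]]
candidates = {
    "vertical":   [[10, 10], [10, 10]],
    "horizontal": [[10, 10], [20, 20]],
    "dc":         [[15, 15], [15, 15]],
}

for name, pred in candidates.items():
    print(name, sae(block, pred))
print(best_mode(block, candidates))  # horizontal
```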

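For the chroma components, P is usually an 8 x 8 block, written as 8 × 8 Chroma Prediction. It also has 4 prediction modes, the same as 16 × 16 Luma Prediction except for the mode numbering: DC (mode 0), Horizontal (mode 1), Vertical (mode 2) and Plane (mode 3).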

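With this overview in hand, let's dig a little deeper, which is where CodecVisa comes in. Open the bitstream and the MB (macroblock) information looks like this.
According to the MB statistics (h.264-foreman-1st-frame-mb-statistics), this picture is an I slice with 99 I-MBs in total: 94 of type I_NxN, 2 of type I_16x16_0_0_0, 1 of type I_16x16_2_0_0, 1 of type I_16x16_3_1_0 and 1 of type I_16x16_2_0_1. For the meaning of these types, see SPEC Table 7-11 – Macroblock types for I slices.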

The distribution of the MBs is shown below; in the figure only the five 16 x 16 MBs are easy to tell apart.

The details of the first MB are as follows.

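I_16x16_0_0_0 means the prediction mode is 0 (Vertical), and we know this is the Y component, so we can check it against CodecVisa. In h.264-foreman-1st-frame-i_16x16_0_0_0-mb every predicted value of this MB is 237. Because the prediction mode is Vertical, the corresponding samples of the MB above should also all be 237. The CodecVisa screenshot of that position (h.264-foreman-1st-frame-i_16x16_0_0_0-mb-up-mb) confirms that they are 237, which matches Vertical mode. The actual value is 236; a prediction naturally differs from the actual value, and this error of -1 is recorded in the residual.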
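
The Vertical case can be sketched as follows: each row of the predicted MB repeats the 16 samples directly above it, and the residual is the actual block minus the prediction. The value 237 for the row above comes from the text; treating the actual block as uniformly 236 is a simplifying assumption for illustration, and the helper names are hypothetical.

```python
def predict_vertical_16x16(top_row):
    """Vertical mode: every row of the MB repeats the 16 samples above it."""
    assert len(top_row) == 16
    return [list(top_row) for _ in range(16)]


def residual(actual, pred):
    """Sample-wise difference between the actual block and its prediction."""
    return [[a - p for a, p in zip(ar, pr)] for ar, pr in zip(actual, pred)]


top_row = [237] * 16                       # bottom row of the MB above
pred = predict_vertical_16x16(top_row)
actual = [[236] * 16 for _ in range(16)]   # assumption: a uniform block of 236
res = residual(actual, pred)
print(pred[0][0], res[0][0])  # 237 -1
```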

The predicted values are shown below.

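I_16x16_2_0_0 means the prediction mode is 2 (DC), again for the Y component, and we can verify it the same way. DC is the mean of the Up and Left samples, i.e. ((237 * 12 + 229 + 222 + 213 + 206) + (236 * 16)) / (16 * 2), which is approximately 234; the rounding error is acceptable. CodecVisa confirms that the predicted value is indeed 234.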
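
The DC arithmetic can be checked with a few lines. The sketch uses the integer form `(sum + 16) >> 5`, which rounds the mean of the 32 neighbouring samples to the nearest integer. Which of the two sample groups is Up and which is Left follows the order in the formula and is an assumption; the result does not depend on it.

```python
def predict_dc_16x16(up, left):
    """DC mode when both neighbours are available: rounded mean of 32 samples."""
    assert len(up) == 16 and len(left) == 16
    return (sum(up) + sum(left) + 16) >> 5


up = [237] * 12 + [229, 222, 213, 206]   # first group in the formula
left = [236] * 16                        # second group in the formula
print(sum(up) + sum(left))               # 7490
print(predict_dc_16x16(up, left))        # 234
```
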
The Up and Left MBs are as follows


The predicted values are

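Other modes that need extra interpolation, such as the 4 x 4 Horizontal Up/Down modes, work on the same principle; they are just slightly more complex.
A brief analysis: first the MB is partitioned into 8 x 8 blocks, each of which is further split into 4 x 4 blocks. The processing order is to handle the four 4 x 4 blocks inside the first 8 x 8 block in raster order, then move on to the second 8 x 8 block, as shown below.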
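
The processing order described above can be written as a mapping from the 4 x 4 block index (0..15) to the top-left corner of that block inside the MB. The function name `luma4x4_origin` is illustrative; the arithmetic just nests one raster scan (the 4 x 4 blocks inside an 8 x 8 block) inside another (the 8 x 8 blocks inside the MB).

```python
def luma4x4_origin(blk_idx):
    """Top-left (x, y) of 4 x 4 luma block blk_idx inside a 16 x 16 MB."""
    blk8 = blk_idx // 4      # which 8 x 8 block (raster order inside the MB)
    blk4 = blk_idx % 4       # which 4 x 4 block inside that 8 x 8 block
    x = (blk8 % 2) * 8 + (blk4 % 2) * 4
    y = (blk8 // 2) * 8 + (blk4 // 2) * 4
    return x, y


order = [luma4x4_origin(i) for i in range(16)]
print(order[:8])
# [(0, 0), (4, 0), (0, 4), (4, 4), (8, 0), (12, 0), (8, 4), (12, 4)]
```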

Predicting a particular 4 x 4 block is then straightforward. Taking Horizontal Up as an example, see SPEC 8.3.1.2.9, Specification of Intra_4x4_Horizontal_Up prediction mode (screenshot below).

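This is the prediction formula. p[-1, y] are the pixels used for prediction; if they are unavailable, this mode cannot be used. The formula is easy to read: x and y are the coordinates of each pixel in the 4 x 4 block being predicted, so they can only range from 0 to 3.
The figure below, which I drew, illustrates this more concretely. p[-1, y] corresponds to I, J, K and L. Positions drawn with the same shape hold the same value, and positions with the same number inside the shape are computed with the same formula (same formula, different inputs, hence different results; I did not mark all of these, only one pair to illustrate the case, namely the positions at [1, 0] and [1, 1]). See the figure.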

The JM code also shows this quite directly.

  case  HOR_UP_PRED:/* diagonal prediction -22.5 deg to horizontal plane */
    if (!block_available_left)
      printf ("warning: Intra_4x4_Horizontal_Up prediction mode not allowed at mb %d\n",img->current_mb_nr);

    // An MB is split into 16 4 x 4 blocks (after first being split into four 8 x 8 blocks), and each 4 x 4 block contains 16 pixels, which are the ones set here

    // zHU = x + 2 * y; // x = 0..3, y = 0..3

    img->mpr[0+ioff][0+joff] = (P_I + P_J + 1) / 2; // (p[−1, y + (x >> 1)] + p[−1, y + (x >> 1) + 1] + 1) >> 1 with x = 0, y = 0
    img->mpr[1+ioff][0+joff] = (P_I + 2*P_J + P_K + 2) / 4; // (p[−1, y + (x >> 1)] + 2 * p[−1, y + (x >> 1) + 1] + p[−1, y + (x >> 1) + 2] + 2) >> 2 // with x = 1, y = 0; note the right shift
    img->mpr[2+ioff][0+joff] = 
    img->mpr[0+ioff][1+joff] = (P_J + P_K + 1) / 2;
    img->mpr[3+ioff][0+joff] = 
    img->mpr[1+ioff][1+joff] = (P_J + 2*P_K + P_L + 2) / 4;
    img->mpr[2+ioff][1+joff] = 
    img->mpr[0+ioff][2+joff] = (P_K + P_L + 1) / 2;
    img->mpr[3+ioff][1+joff] = 
    img->mpr[1+ioff][2+joff] = (P_K + 2*P_L + P_L + 2) / 4;
    img->mpr[3+ioff][2+joff] = 
    img->mpr[1+ioff][3+joff] = 
    img->mpr[0+ioff][3+joff] = 
    img->mpr[2+ioff][2+joff] = 
    img->mpr[2+ioff][3+joff] = 
    img->mpr[3+ioff][3+joff] = P_L; // p[ −1, 3 ]
    break;

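That is Horizontal Up prediction, which should now be fairly well understood. The photographed figure is displayed here with the wrong orientation; click through to the full-size image, which is correct.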
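
For comparison with the JM excerpt, here is a sketch that evaluates the Horizontal Up rule per pixel from zHU = x + 2 * y instead of listing the assignments by hand. The four left-neighbour values are made-up inputs, and the function name is illustrative. The result is stored as `pred[y][x]`, whereas in the JM excerpt the first index of `img->mpr` is x.

```python
def predict_horizontal_up(left):
    """Intra 4 x 4 Horizontal Up prediction.

    left holds p[-1, 0..3], i.e. the samples I, J, K, L.
    Returns pred[y][x] for x, y in 0..3.
    """
    pred = [[0] * 4 for _ in range(4)]
    for y in range(4):
        for x in range(4):
            z_hu = x + 2 * y
            k = y + (x >> 1)
            if z_hu > 5:
                value = left[3]
            elif z_hu == 5:
                value = (left[2] + 3 * left[3] + 2) >> 2
            elif z_hu % 2 == 0:
                value = (left[k] + left[k + 1] + 1) >> 1
            else:
                value = (left[k] + 2 * left[k + 1] + left[k + 2] + 2) >> 2
            pred[y][x] = value
    return pred


I, J, K, L = 100, 110, 130, 160   # hypothetical left-neighbour samples
pred = predict_horizontal_up([I, J, K, L])
for row in pred:
    print(row)
# [105, 113, 120, 133]
# [120, 133, 145, 153]
# [145, 153, 160, 160]
# [160, 160, 160, 160]
```

Positions that share a value here are the same ones the JM excerpt chains together in one assignment, for example x = 2, y = 0 and x = 0, y = 1.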

THAVCS describes the loop filter as follows.
A filter is applied to every decoded macroblock to reduce blocking distortion [iv]. The de-blocking filter is applied after the inverse transform in the encoder before reconstructing and storing the macroblock for future predictions and in the decoder before reconstructing and displaying the macroblock.

Intra-coded macroblocks are filtered, but intra prediction (section 6.3) is carried out using unfiltered reconstructed
macroblocks to form the prediction.

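In CodecVisa, Pre-LP means the data before the loop filter is applied, and IDCT Coefficient means Inverse DCT Coefficient.

The ldecod and lencod programs in JM contain the prediction code; see the functions in block.c whose names start with intrapred, which implement all the prediction modes.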

H.264 Bookmarks

A place to bookmark material I have read
Exploring H.264. Part 1: Color models
Exploring H.264. Part 2: H.264 Bitstream format

WORLD’S SMALLEST H.264 ENCODER

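Parsing the NALUs of an H.264 Bitstream Step by Step (SPS, PPS, IDR)

I said earlier that H.264 is fairly complex, so I plan to analyze it piece by piece. These are study notes with my own understanding; much of it may be excerpted from other material, with thanks to the original authors. Please point out anything I have misunderstood.
Note: this is currently a draft and is updated irregularly.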

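I went through introductory material on H.264 coding (Overview of the H.264/AVC Video Coding Standard, H.264-MPEG-4 Part 10 White Paper, ISO_IEC_14496-10_2012, 新一代视频压缩编码标准H.264, H.264码流结构解析 and so on; there is far too much to list and plenty is available online, including the articles by 天之骄子/firstime and 李世平/Peter Lee). I did not understand all of it. My approach is to understand what I can and reread what I cannot (switching between sources and rereading them helps with the parts you did not get at first). In any case I ended up with a rough understanding of H.264.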

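When studying a codec like this, I like to start from the encoded/decoded data file. However complex the codec, the final file (byte stream) must conform to some specification.

If you plan to read on, you should already know the basics: what the VCL and NAL are, Exp-Golomb/Huffman coding, that the bitstream is made up of NALUs, and so on.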
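
As a quick refresher on the Exp-Golomb prerequisite, the sketch below decodes unsigned Exp-Golomb codes, the ue(v) descriptor used for many header fields such as seq_parameter_set_id and pic_width_in_mbs_minus1. It works on a string of '0'/'1' characters rather than real bytes to keep the bit handling visible; `read_ue` is a hypothetical helper name, and a real parser would read from the RBSP after removing emulation prevention bytes.

```python
def read_ue(bits, pos=0):
    """Decode one ue(v) value from a '0'/'1' string starting at pos.

    Returns (value, next_pos).
    """
    leading_zeros = 0
    while bits[pos + leading_zeros] == "0":
        leading_zeros += 1
    pos += leading_zeros + 1                       # skip the zeros and the '1'
    suffix = bits[pos : pos + leading_zeros]       # same number of info bits
    value = (1 << leading_zeros) - 1 + (int(suffix, 2) if suffix else 0)
    return value, pos + leading_zeros


print(read_ue("1"))          # (0, 1)
print(read_ue("00101"))      # (4, 5)
print(read_ue("0001011"))    # (10, 7)

# Several codes back to back: 0, 1, 2
pos, values = 0, []
stream = "1010011"
while pos < len(stream):
    value, pos = read_ue(stream, pos)
    values.append(value)
print(values)                # [0, 1, 2]
```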

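First we want a tool that can analyze an H.264 bitstream so that we can inspect it directly. Unfortunately such tools are expensive, with most licenses costing over a thousand US dollars. Fortunately, 21-day trial versions and feature-limited demo versions are available.
I searched online for such tools; CodecVisa, StreamEye and VM Analyzer are all worth trying.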

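Another question: where do you find a bitstream file? I found a download on someone's website, but the archive needed a password. I emailed him hoping for the password, but he has not replied, so I kept looking. It turns out there is no need to go to that trouble: we can generate the bitstream ourselves. The JM code ships with foreman_part_qcif.yuv, which you can encode into an H.264 bitstream with lencod.exe. That means first learning what JM is and how to build and use it, but all of that is simple. YUV video sequences can also be downloaded online; there are plenty.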

Download YUV video sequences http://trace.eas.asu.edu/yuv/
H.264 test model/documents http://iphome.hhi.de/suehring/tml/
http://wftp3.itu.int/av-arch/jvt-site/
李世平 http://blog.csdn.net/sunshine1314
天之骄子 http://bbs.chinavideo.org/viewthread.php?tid=988

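The tool I use here to inspect the bitstream is StreamEye, and the bitstream file was encoded from foreman_part_qcif.yuv.
I will not explain how to use these tools; it comes down to seeing which frames/frame types the video consists of and what the various properties (Header/MacroBlock/Picture) are.

The information below was copied from Headers Info. At a glance it is the SPS, PPS and Slice Header, but what is it for, and what does the data mean?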

  [00]seq_parameter_set_rbsp() {
    profile_idc                                    = 66 (Baseline)
    constraint_set0_flag                           = 0 (false)
    constraint_set1_flag                           = 0 (false)
    constraint_set2_flag                           = 0 (false)
    constraint_set3_flag                           = 0 (false)
    constraint_set4_flag                           = 0 (false)
    constraint_set5_flag                           = 0 (false)
    reserved_zero_2bits                            = 0
    level_idc                                      = 30
    seq_parameter_set_id                           = 0
    if (profile_idc == 100 || profile_idc == 110 || profile_idc == 122 || profile_idc == 144) {
      chroma_format_idc                            = na
      if (chroma_format_idc == 3)
        separate_colour_plane_flag                 = na
      bit_depth_luma_minus8                        = na
      bit_depth_chroma_minus8                      = na
      qpprime_y_zero_transform_bypass_flag         = na
      seq_scaling_matrix_present_flag              = na
      if (seq_scaling_matrix_present_flag)
        for (i = 0; i < 8; i++) {
          seq_scaling_list_present_flag[0]         = na
          if (seq_scaling_list_present_flag[0])
            scaling_list_4x4[00]                   = na
            scaling_list_4x4[01]                   = na
            scaling_list_4x4[02]                   = na
            scaling_list_4x4[03]                   = na
            scaling_list_4x4[04]                   = na
            scaling_list_4x4[05]                   = na
            scaling_list_4x4[06]                   = na
            scaling_list_4x4[07]                   = na
            scaling_list_4x4[08]                   = na
            scaling_list_4x4[09]                   = na
            scaling_list_4x4[10]                   = na
            scaling_list_4x4[11]                   = na
            scaling_list_4x4[12]                   = na
            scaling_list_4x4[13]                   = na
            scaling_list_4x4[14]                   = na
            scaling_list_4x4[15]                   = na
          seq_scaling_list_present_flag[1]         = na
          if (seq_scaling_list_present_flag[1])
            scaling_list_4x4[00]                   = na
            scaling_list_4x4[01]                   = na
            scaling_list_4x4[02]                   = na
            scaling_list_4x4[03]                   = na
            scaling_list_4x4[04]                   = na
            scaling_list_4x4[05]                   = na
            scaling_list_4x4[06]                   = na
            scaling_list_4x4[07]                   = na
            scaling_list_4x4[08]                   = na
            scaling_list_4x4[09]                   = na
            scaling_list_4x4[10]                   = na
            scaling_list_4x4[11]                   = na
            scaling_list_4x4[12]                   = na
            scaling_list_4x4[13]                   = na
            scaling_list_4x4[14]                   = na
            scaling_list_4x4[15]                   = na
          seq_scaling_list_present_flag[2]         = na
          if (seq_scaling_list_present_flag[2])
            scaling_list_4x4[00]                   = na
            scaling_list_4x4[01]                   = na
            scaling_list_4x4[02]                   = na
            scaling_list_4x4[03]                   = na
            scaling_list_4x4[04]                   = na
            scaling_list_4x4[05]                   = na
            scaling_list_4x4[06]                   = na
            scaling_list_4x4[07]                   = na
            scaling_list_4x4[08]                   = na
            scaling_list_4x4[09]                   = na
            scaling_list_4x4[10]                   = na
            scaling_list_4x4[11]                   = na
            scaling_list_4x4[12]                   = na
            scaling_list_4x4[13]                   = na
            scaling_list_4x4[14]                   = na
            scaling_list_4x4[15]                   = na
          seq_scaling_list_present_flag[3]         = na
          if (seq_scaling_list_present_flag[3])
            scaling_list_4x4[00]                   = na
            scaling_list_4x4[01]                   = na
            scaling_list_4x4[02]                   = na
            scaling_list_4x4[03]                   = na
            scaling_list_4x4[04]                   = na
            scaling_list_4x4[05]                   = na
            scaling_list_4x4[06]                   = na
            scaling_list_4x4[07]                   = na
            scaling_list_4x4[08]                   = na
            scaling_list_4x4[09]                   = na
            scaling_list_4x4[10]                   = na
            scaling_list_4x4[11]                   = na
            scaling_list_4x4[12]                   = na
            scaling_list_4x4[13]                   = na
            scaling_list_4x4[14]                   = na
            scaling_list_4x4[15]                   = na
          seq_scaling_list_present_flag[4]         = na
          if (seq_scaling_list_present_flag[4])
            scaling_list_4x4[00]                   = na
            scaling_list_4x4[01]                   = na
            scaling_list_4x4[02]                   = na
            scaling_list_4x4[03]                   = na
            scaling_list_4x4[04]                   = na
            scaling_list_4x4[05]                   = na
            scaling_list_4x4[06]                   = na
            scaling_list_4x4[07]                   = na
            scaling_list_4x4[08]                   = na
            scaling_list_4x4[09]                   = na
            scaling_list_4x4[10]                   = na
            scaling_list_4x4[11]                   = na
            scaling_list_4x4[12]                   = na
            scaling_list_4x4[13]                   = na
            scaling_list_4x4[14]                   = na
            scaling_list_4x4[15]                   = na
          seq_scaling_list_present_flag[5]         = na
          if (seq_scaling_list_present_flag[5])
            scaling_list_4x4[00]                   = na
            scaling_list_4x4[01]                   = na
            scaling_list_4x4[02]                   = na
            scaling_list_4x4[03]                   = na
            scaling_list_4x4[04]                   = na
            scaling_list_4x4[05]                   = na
            scaling_list_4x4[06]                   = na
            scaling_list_4x4[07]                   = na
            scaling_list_4x4[08]                   = na
            scaling_list_4x4[09]                   = na
            scaling_list_4x4[10]                   = na
            scaling_list_4x4[11]                   = na
            scaling_list_4x4[12]                   = na
            scaling_list_4x4[13]                   = na
            scaling_list_4x4[14]                   = na
            scaling_list_4x4[15]                   = na
          seq_scaling_list_present_flag[6]         = na
          if (seq_scaling_list_present_flag[6])
            scaling_list_8x8[00]                   = na
            scaling_list_8x8[01]                   = na
            scaling_list_8x8[02]                   = na
            scaling_list_8x8[03]                   = na
            scaling_list_8x8[04]                   = na
            scaling_list_8x8[05]                   = na
            scaling_list_8x8[06]                   = na
            scaling_list_8x8[07]                   = na
            scaling_list_8x8[08]                   = na
            scaling_list_8x8[09]                   = na
            scaling_list_8x8[10]                   = na
            scaling_list_8x8[11]                   = na
            scaling_list_8x8[12]                   = na
            scaling_list_8x8[13]                   = na
            scaling_list_8x8[14]                   = na
            scaling_list_8x8[15]                   = na
            scaling_list_8x8[16]                   = na
            scaling_list_8x8[17]                   = na
            scaling_list_8x8[18]                   = na
            scaling_list_8x8[19]                   = na
            scaling_list_8x8[20]                   = na
            scaling_list_8x8[21]                   = na
            scaling_list_8x8[22]                   = na
            scaling_list_8x8[23]                   = na
            scaling_list_8x8[24]                   = na
            scaling_list_8x8[25]                   = na
            scaling_list_8x8[26]                   = na
            scaling_list_8x8[27]                   = na
            scaling_list_8x8[28]                   = na
            scaling_list_8x8[29]                   = na
            scaling_list_8x8[30]                   = na
            scaling_list_8x8[31]                   = na
            scaling_list_8x8[32]                   = na
            scaling_list_8x8[33]                   = na
            scaling_list_8x8[34]                   = na
            scaling_list_8x8[35]                   = na
            scaling_list_8x8[36]                   = na
            scaling_list_8x8[37]                   = na
            scaling_list_8x8[38]                   = na
            scaling_list_8x8[39]                   = na
            scaling_list_8x8[40]                   = na
            scaling_list_8x8[41]                   = na
            scaling_list_8x8[42]                   = na
            scaling_list_8x8[43]                   = na
            scaling_list_8x8[44]                   = na
            scaling_list_8x8[45]                   = na
            scaling_list_8x8[46]                   = na
            scaling_list_8x8[47]                   = na
            scaling_list_8x8[48]                   = na
            scaling_list_8x8[49]                   = na
            scaling_list_8x8[50]                   = na
            scaling_list_8x8[51]                   = na
            scaling_list_8x8[52]                   = na
            scaling_list_8x8[53]                   = na
            scaling_list_8x8[54]                   = na
            scaling_list_8x8[55]                   = na
            scaling_list_8x8[56]                   = na
            scaling_list_8x8[57]                   = na
            scaling_list_8x8[58]                   = na
            scaling_list_8x8[59]                   = na
            scaling_list_8x8[60]                   = na
            scaling_list_8x8[61]                   = na
            scaling_list_8x8[62]                   = na
            scaling_list_8x8[63]                   = na
          seq_scaling_list_present_flag[7]         = na
          if (seq_scaling_list_present_flag[7])
            scaling_list_8x8[00]                   = na
            scaling_list_8x8[01]                   = na
            scaling_list_8x8[02]                   = na
            scaling_list_8x8[03]                   = na
            scaling_list_8x8[04]                   = na
            scaling_list_8x8[05]                   = na
            scaling_list_8x8[06]                   = na
            scaling_list_8x8[07]                   = na
            scaling_list_8x8[08]                   = na
            scaling_list_8x8[09]                   = na
            scaling_list_8x8[10]                   = na
            scaling_list_8x8[11]                   = na
            scaling_list_8x8[12]                   = na
            scaling_list_8x8[13]                   = na
            scaling_list_8x8[14]                   = na
            scaling_list_8x8[15]                   = na
            scaling_list_8x8[16]                   = na
            scaling_list_8x8[17]                   = na
            scaling_list_8x8[18]                   = na
            scaling_list_8x8[19]                   = na
            scaling_list_8x8[20]                   = na
            scaling_list_8x8[21]                   = na
            scaling_list_8x8[22]                   = na
            scaling_list_8x8[23]                   = na
            scaling_list_8x8[24]                   = na
            scaling_list_8x8[25]                   = na
            scaling_list_8x8[26]                   = na
            scaling_list_8x8[27]                   = na
            scaling_list_8x8[28]                   = na
            scaling_list_8x8[29]                   = na
            scaling_list_8x8[30]                   = na
            scaling_list_8x8[31]                   = na
            scaling_list_8x8[32]                   = na
            scaling_list_8x8[33]                   = na
            scaling_list_8x8[34]                   = na
            scaling_list_8x8[35]                   = na
            scaling_list_8x8[36]                   = na
            scaling_list_8x8[37]                   = na
            scaling_list_8x8[38]                   = na
            scaling_list_8x8[39]                   = na
            scaling_list_8x8[40]                   = na
            scaling_list_8x8[41]                   = na
            scaling_list_8x8[42]                   = na
            scaling_list_8x8[43]                   = na
            scaling_list_8x8[44]                   = na
            scaling_list_8x8[45]                   = na
            scaling_list_8x8[46]                   = na
            scaling_list_8x8[47]                   = na
            scaling_list_8x8[48]                   = na
            scaling_list_8x8[49]                   = na
            scaling_list_8x8[50]                   = na
            scaling_list_8x8[51]                   = na
            scaling_list_8x8[52]                   = na
            scaling_list_8x8[53]                   = na
            scaling_list_8x8[54]                   = na
            scaling_list_8x8[55]                   = na
            scaling_list_8x8[56]                   = na
            scaling_list_8x8[57]                   = na
            scaling_list_8x8[58]                   = na
            scaling_list_8x8[59]                   = na
            scaling_list_8x8[60]                   = na
            scaling_list_8x8[61]                   = na
            scaling_list_8x8[62]                   = na
            scaling_list_8x8[63]                   = na
          }
        }
      }
    log2_max_frame_num_minus4                      = 0 (4)
    pic_order_cnt_type                             = 0
    if (pic_order_cnt_type == 0)
      log2_max_pic_order_cnt_lsb_minus4            = 0 (4)
    else if (pic_order_cnt_type == 1) {
      delta_pic_order_always_zero_flag             = na
      offset_for_non_ref_pic                       = na
      offset_for_top_to_bottom_field               = na
      num_ref_frames_in_pic_order_cnt_cycle        = na
      for(i = 0; i < num_ref_frames_in_pic_order_cnt_cycle; i++)
      }
    max_num_ref_frames                             = 10
    gaps_in_frame_num_value_allowed_flag           = 0
    pic_width_in_mbs_minus1                        = 10 (176)
    pic_height_in_map_units_minus1                 = 8 (144)
    frame_mbs_only_flag                            = 1
    if (!frame_mbs_only_flag)
      mb_adaptive_frame_field_flag                 = na
    direct_8x8_inference_flag                      = 0 (false)
    frame_cropping_flag                            = 0 (false)
    if (frame_cropping_flag) {
      frame_crop_left_offset                       = na
      frame_crop_right_offset                      = na
      frame_crop_top_offset                        = na
      frame_crop_bottom_offset                     = na
      }
    vui_parameters_present_flag                    = 0 (false)
    if (vui_parameters_present_flag)
      }
      vui_parameters()
    }
  [00]pic_parameter_set_rbsp() {
    pic_parameter_set_id                           = 0
    seq_parameter_set_id                           = 0
    entropy_coding_mode_flag                       = 0 (CAVLC)
    pic_order_present_flag                         = 0 (false)
    num_slice_groups_minus1                        = 0 (1)
    if (num_slice_groups_minus1 > 0) {
      slice_group_map_type                         = na
      if (slice_group_map_type == 0)
        for (iGroup = 0; iGroup <= num_slice_groups_minus1; iGroup++) {
        }
      else if (slice_group_map_type == 2)
        for (iGroup = 0; iGroup < num_slice_groups_minus1; iGroup++) {
        }
      else if ((slice_group_map_type == 3) || (slice_group_map_type == 4) || (slice_group_map_type == 5)) {
        slice_group_change_direction_flag          = na
        slice_group_change_rate_minus1             = na
      } else if (slice_group_map_type == 6) {
        pic_size_in_map_units_minus1               = na
        for (i = 0; i <= pic_size_in_map_units_minus1; i++)
        }
      }
    num_ref_idx_l0_active_minus1                   = 9 (10)
    num_ref_idx_l1_active_minus1                   = 9 (10)
    weighted_pred_flag                             = 0 (false)
    weighted_bipred_idc                            = 0
    pic_init_qp_minus26                            = 0 (26)
    pic_init_qs_minus26                            = 0 (26)
    chroma_qp_index_offset                         = 0
    deblocking_filter_control_present_flag         = 0 (false)
    constrained_intra_pred_flag                    = 0 (false)
    redundant_pic_cnt_present_flag                 = 0 (false)
    }
    if (more_rbsp_data()) {
      transform_8x8_mode_flag                      = na
      pic_scaling_matrix_present_flag              = na
      if (pic_scaling_matrix_present_flag) {
        for (i = 0; i < 6 + 2 * transform_8x8_mode_flag; i++) {
          pic_scaling_list_present_flag[0]         = na
          if (pic_scaling_list_present_flag[0])
            scaling_list_4x4[0][00]                = na
            scaling_list_4x4[0][01]                = na
            scaling_list_4x4[0][02]                = na
            scaling_list_4x4[0][03]                = na
            scaling_list_4x4[0][04]                = na
            scaling_list_4x4[0][05]                = na
            scaling_list_4x4[0][06]                = na
            scaling_list_4x4[0][07]                = na
            scaling_list_4x4[0][08]                = na
            scaling_list_4x4[0][09]                = na
            scaling_list_4x4[0][10]                = na
            scaling_list_4x4[0][11]                = na
            scaling_list_4x4[0][12]                = na
            scaling_list_4x4[0][13]                = na
            scaling_list_4x4[0][14]                = na
            scaling_list_4x4[0][15]                = na
          pic_scaling_list_present_flag[1]         = na
          if (pic_scaling_list_present_flag[1])
            scaling_list_4x4[1][00]                = na
            scaling_list_4x4[1][01]                = na
            scaling_list_4x4[1][02]                = na
            scaling_list_4x4[1][03]                = na
            scaling_list_4x4[1][04]                = na
            scaling_list_4x4[1][05]                = na
            scaling_list_4x4[1][06]                = na
            scaling_list_4x4[1][07]                = na
            scaling_list_4x4[1][08]                = na
            scaling_list_4x4[1][09]                = na
            scaling_list_4x4[1][10]                = na
            scaling_list_4x4[1][11]                = na
            scaling_list_4x4[1][12]                = na
            scaling_list_4x4[1][13]                = na
            scaling_list_4x4[1][14]                = na
            scaling_list_4x4[1][15]                = na
          pic_scaling_list_present_flag[2]         = na
          if (pic_scaling_list_present_flag[2])
            scaling_list_4x4[2][00]                = na
            scaling_list_4x4[2][01]                = na
            scaling_list_4x4[2][02]                = na
            scaling_list_4x4[2][03]                = na
            scaling_list_4x4[2][04]                = na
            scaling_list_4x4[2][05]                = na
            scaling_list_4x4[2][06]                = na
            scaling_list_4x4[2][07]                = na
            scaling_list_4x4[2][08]                = na
            scaling_list_4x4[2][09]                = na
            scaling_list_4x4[2][10]                = na
            scaling_list_4x4[2][11]                = na
            scaling_list_4x4[2][12]                = na
            scaling_list_4x4[2][13]                = na
            scaling_list_4x4[2][14]                = na
            scaling_list_4x4[2][15]                = na
          pic_scaling_list_present_flag[3]         = na
          if (pic_scaling_list_present_flag[3])
            scaling_list_4x4[3][00]                = na
            scaling_list_4x4[3][01]                = na
            scaling_list_4x4[3][02]                = na
            scaling_list_4x4[3][03]                = na
            scaling_list_4x4[3][04]                = na
            scaling_list_4x4[3][05]                = na
            scaling_list_4x4[3][06]                = na
            scaling_list_4x4[3][07]                = na
            scaling_list_4x4[3][08]                = na
            scaling_list_4x4[3][09]                = na
            scaling_list_4x4[3][10]                = na
            scaling_list_4x4[3][11]                = na
            scaling_list_4x4[3][12]                = na
            scaling_list_4x4[3][13]                = na
            scaling_list_4x4[3][14]                = na
            scaling_list_4x4[3][15]                = na
          pic_scaling_list_present_flag[4]         = na
          if (pic_scaling_list_present_flag[4])
            scaling_list_4x4[4][00]                = na
            scaling_list_4x4[4][01]                = na
            scaling_list_4x4[4][02]                = na
            scaling_list_4x4[4][03]                = na
            scaling_list_4x4[4][04]                = na
            scaling_list_4x4[4][05]                = na
            scaling_list_4x4[4][06]                = na
            scaling_list_4x4[4][07]                = na
            scaling_list_4x4[4][08]                = na
            scaling_list_4x4[4][09]                = na
            scaling_list_4x4[4][10]                = na
            scaling_list_4x4[4][11]                = na
            scaling_list_4x4[4][12]                = na
            scaling_list_4x4[4][13]                = na
            scaling_list_4x4[4][14]                = na
            scaling_list_4x4[4][15]                = na
          pic_scaling_list_present_flag[5]         = na
          if (pic_scaling_list_present_flag[5])
            scaling_list_4x4[5][00]                = na
            scaling_list_4x4[5][01]                = na
            scaling_list_4x4[5][02]                = na
            scaling_list_4x4[5][03]                = na
            scaling_list_4x4[5][04]                = na
            scaling_list_4x4[5][05]                = na
            scaling_list_4x4[5][06]                = na
            scaling_list_4x4[5][07]                = na
            scaling_list_4x4[5][08]                = na
            scaling_list_4x4[5][09]                = na
            scaling_list_4x4[5][10]                = na
            scaling_list_4x4[5][11]                = na
            scaling_list_4x4[5][12]                = na
            scaling_list_4x4[5][13]                = na
            scaling_list_4x4[5][14]                = na
            scaling_list_4x4[5][15]                = na
          }
        }
      second_chroma_qp_index_offset                = na
      }
    }
  [00]slice_header() {
    nal_unit_header_svc_extension() {
      idr_flag                                     = na
      priority_id                                  = na
      no_inter_layer_pred_flag                     = na
      dependency_id                                = na
      quality_id                                   = na
      temporal_id                                  = na
      use_ref_base_pic_flag                        = na
      discardable_flag                             = na
      output_flag                                  = na
      }
    first_mb_in_slice                              = 0
    slice_type                                     = 7 (I slice)
    pic_parameter_set_id                           = 0
    frame_num                                      = 0
    if (!frame_mbs_only_flag) {
      field_pic_flag                               = na
      if (field_pic_flag)
        bottom_field_flag                          = na
      }
    if (nal_unit_type == 5)
      idr_pic_id                                   = 0
    if (pic_order_cnt_type == 0) {
      pic_order_cnt_lsb                            = 0
      if (pic_order_present_flag && !field_pic_flag)
        delta_pic_order_cnt_bottom                 = na
      }
    if (pic_order_cnt_type == 1 && !delta_pic_order_always_zero_flag) {
      delta_pic_order_cnt[0]                       = na
      if (pic_order_present_flag && !field_pic_flag)
        delta_pic_order_cnt[1]                     = na
      }
    if (redundant_pic_cnt_present_flag)
      redundant_pic_cnt                            = na
    if (slice_type == B)
      direct_spatial_mv_pred_flag                  = na
    if (slice_type == P || slice_type == SP || slice_type == B) {
      num_ref_idx_active_override_flag             = na
      if (num_ref_idx_active_override_flag) {
        num_ref_idx_l0_active_minus1               = na
        if (slice_type == B )
          num_ref_idx_l1_active_minus1             = na
        }
      }
    if (nal_unit_type == 20)
      ref_pic_list_mvc_modification()
    else
      ref_pic_list_modification()
    if ((weighted_pred_flag && (slice_type == P || slice_type == SP)) || (weighted_bipred_idc == 1 && slice_type == B))
      pred_weight_table()
    if (nal_ref_idc != 0)
      dec_ref_pic_marking()
    if (entropy_coding_mode_flag && slice_type != I && slice_type != SI)
      cabac_init_idc                               = na
    slice_qp_delta                                 = 2
    if (slice_type == SP || slice_type == SI) {
      if (slice_type == SP)
        sp_for_switch_flag                         = na
      slice_qs_delta                               = na
      }
    if (deblocking_filter_control_present_flag) {
      disable_deblocking_filter_idc                = na
      if (disable_deblocking_filter_idc != 1) {
        slice_alpha_c0_offset_div2                 = na
        slice_beta_offset_div2                     = na
        }
      }
    if (num_slice_groups_minus1 > 0 && slice_group_map_type >= 3 && slice_group_map_type <= 5)
      slice_group_change_cycle                     = na
    }

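Now let's analyze it. A bitstream is made up of NAL units (NALUs), and each NALU consists of a NALU header and RBSP data. The RBSP may be an SPS, a PPS, a slice or SEI. SEI does not appear in our stream; the SPS is in the first NALU, the PPS is in the second, and the rest are slices (strictly speaking, IDR slices and so on could be separated out further). foreman_part_qcif.yuv has only 3 frames, so does the encoded stream contain 5 NALUs? We can make that bold guess here and then verify it carefully.
What is the NALU header? See section 7.3, Syntax in tabular form, in the Spec. If you have read 天之骄子's articles, you know that sections 7.3 and 7.4 of the Spec correspond to each other, so both need to be read; 7.3 is essentially pseudocode for walking through the bitstream. You also need to be familiar with the C (Category) and Descriptor columns and with what f(1), u(2), b(8), ue(v) and so on mean; these are explained in detail in 7.2 and 9.1. Briefly, ue(v), se(v) and the like are Exp-Golomb codes, while f(1) and u(2) are plain fixed-width fields, for example a 2-bit unsigned integer or a 32-bit integer.

So make sure you understand 7.3 and 7.4; only then can you analyze H.264 at the bitstream level.

Now let's start. Below is a piece of hexadecimal data from an H.264 bitstream file, so you will need a hex editor.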

00 00 00 01 67 42 00 1E F1 61 62 62 00 00 00 01 68 C8 A1 43 88 00

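We know that 00 00 00 01 is the start code of a NALU, so if you open the complete bitstream file you should see 5 occurrences of 00 00 00 01. These are the 5 NALUs we mentioned earlier: the SPS, the PPS and 3 slices.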
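The splitting step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: every start code is the 4-byte form 00 00 00 01, and 3-byte start codes and emulation-prevention bytes are ignored. The helper name split_nalus is made up for this example, and the input is the SPS, the PPS and the first 7 bytes of the first slice as they appear in this walkthrough.

```python
START_CODE = bytes.fromhex("00 00 00 01")


def split_nalus(stream):
    """Split a byte stream on 4-byte start codes.

    Assumption: no 3-byte start codes and no emulation-prevention handling.
    """
    return [chunk for chunk in stream.split(START_CODE) if chunk]


# SPS, PPS and the first bytes of the IDR slice from this walkthrough.
stream = bytes.fromhex(
    "00 00 00 01 67 42 00 1E F1 61 62 62 "
    "00 00 00 01 68 C8 A1 43 88 "
    "00 00 00 01 65 88 84 02 63 61 7C "
)
nalus = split_nalus(stream)
for nalu in nalus:
    # The low 5 bits of the first byte are nal_unit_type.
    print(len(nalu), "bytes, nal_unit_type =", nalu[0] & 0x1F)
```

On the complete file the same call should return 5 chunks if the guess above is right.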

First, some reference data: the NALU types defined by the Spec (Table 7-1 – NAL unit type codes, syntax element categories, and NAL unit type classes). For now we only need to look at SPS, PPS, IDR and Slice.

#define NALU_TYPE_SLICE    1
#define NALU_TYPE_DPA      2
#define NALU_TYPE_DPB      3
#define NALU_TYPE_DPC      4
#define NALU_TYPE_IDR      5
#define NALU_TYPE_SEI      6
#define NALU_TYPE_SPS      7
#define NALU_TYPE_PPS      8
#define NALU_TYPE_AUD      9
#define NALU_TYPE_EOSEQ    10
#define NALU_TYPE_EOSTREAM 11
#define NALU_TYPE_FILL     12

A) Let's look at the first NALU first (8 bytes: a 1-byte NALU header followed by the RBSP)

67 42 00 1E F1 61 62 62

Converted to a binary stream

01100111 01000010 00000000 00011110 11110001 01100001 01100010 01100010

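Look at the NALU header first
forbidden_zero_bit
nal_ref_idc
nal_unit_type
These three fields take up 8 bits in total (the Spec gives them 1, 2 and 5 bits respectively), so parsing them against the bits gives
forbidden_zero_bit = 0 // 0
nal_ref_idc = 3 // 11
nal_unit_type = 7 // 00111
That is correct; compare with
#define NALU_TYPE_SPS 7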
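The 1 + 2 + 5 bit split can be checked with a little bit masking. The sketch below is an illustration only; parse_nalu_header is a hypothetical helper name, and 0x68 and 0x65 are the header bytes of the two NALUs analyzed later in this walkthrough.

```python
def parse_nalu_header(first_byte):
    """Split the 8-bit NALU header into its 1 + 2 + 5 bit fields."""
    return {
        "forbidden_zero_bit": first_byte >> 7,
        "nal_ref_idc": (first_byte >> 5) & 0b11,
        "nal_unit_type": first_byte & 0b11111,
    }


for byte in (0x67, 0x68, 0x65):
    print(hex(byte), parse_nalu_header(byte))
```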
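Some of the later fields in the Spec sit inside if statements and only appear when the condition holds. Our nal_unit_type is 7, which does not satisfy those conditions, so we skip them and go straight into the RBSP. This one is an SPS, so following the Spec we have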
profile_idc
constraint_set0_flag
constraint_set1_flag
constraint_set2_flag
constraint_set3_flag
constraint_set4_flag
constraint_set5_flag
reserved_zero_2bits
level_idc
seq_parameter_set_id
Of these fields, everything before seq_parameter_set_id is easy to parse, so we write the values out directly
profile_idc = 66 // 01000010
constraint_set0_flag = 0 // 0
constraint_set1_flag = 0 // 0
constraint_set2_flag = 0 // 0
constraint_set3_flag = 0 // 0
constraint_set4_flag = 0 // 0
constraint_set5_flag = 0 // 0
reserved_zero_2bits = 0 // 00
level_idc = 30 // 00011110
seq_parameter_set_id is ue(v), an Exp-Golomb code whose length in bits is not fixed. The data we have left is 11110001 01100001 01100010 01100010.
For the formula, see the Spec (9.1 Parsing process for Exp-Golomb codes):

leadingZeroBits = −1
for (b = 0; !b; leadingZeroBits++)
	b = read_bits(1)

codeNum = 2^(leadingZeroBits) − 1 + read_bits(leadingZeroBits)

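Notation like 2^k means exponentiation

The process reads 1 bit, which here is 1, so the loop exits; leadingZeroBits++ still executes once, so leadingZeroBits is 0 and the following read_bits reads no data.
codeNum = 2^0 - 1 + 0 = 0
In other words, a field coded as the single bit 1 has the value 0
seq_parameter_set_id = 0 // Exp-Golomb decode of 1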
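The same parsing process can be written as a small Python function operating on a string of '0'/'1' characters. This is a minimal sketch for following along by hand, not production code; read_ue and its (value, new position) return convention are choices made for this example. The input is the remaining SPS data quoted above.

```python
def read_ue(bits, pos=0):
    """Decode one ue(v) value from a '0'/'1' string, starting at pos.

    Returns (codeNum, position of the next unread bit).
    """
    leading_zero_bits = 0
    while bits[pos] == "0":
        leading_zero_bits += 1
        pos += 1
    pos += 1  # consume the terminating 1 bit
    suffix = bits[pos:pos + leading_zero_bits]
    code_num = (1 << leading_zero_bits) - 1 + (int(suffix, 2) if suffix else 0)
    return code_num, pos + leading_zero_bits


# Remaining SPS bits after level_idc.
bits = "11110001" "01100001" "01100010" "01100010"
pos = 0
values = []
for _ in range(5):
    value, pos = read_ue(bits, pos)
    values.append(value)
print(values, "next bit index:", pos)
```

The first four values each consume a single 1 bit; the fifth consumes the 7 bits 0001011.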
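The if branches that follow are again not taken, so we go straight to
log2_max_frame_num_minus4 = 0 // Exp-Golomb decode of 1
pic_order_cnt_type = 0 // Exp-Golomb decode of 1
log2_max_pic_order_cnt_lsb_minus4 = 0 // Exp-Golomb decode of 1

max_num_ref_frames = 10 // the bit stream now starts at 0001 (the four leading 1s were consumed by the four fields above), so leadingZeroBits is 3 and the result is 2^3 - 1 + read_bits(3), where the 3 bits read are 011, i.e. 7 + 3

gaps_in_frame_num_value_allowed_flag = 0 // 0

pic_width_in_mbs_minus1 = 10 // Exp-Golomb decode of 0001 011, i.e. 11 macroblocks = 176 pixels
pic_height_in_map_units_minus1 = 8 // Exp-Golomb decode of 00010 01, i.e. 9 macroblocks = 144 pixels

If Exp-Golomb is still unclear, see the decoding part of Exponential-Golomb coding. What remains is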

frame_mbs_only_flag = 1 // 1

direct_8x8_inference_flag = 0 // 0
frame_cropping_flag = 0 // 0
vui_parameters_present_flag = 0 // 0

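Two bits, 10, are left unused. Everything so far (apart from the NALU header) is seq_parameter_set_data, and from the Spec we know that trailing alignment bits follow it

seq_parameter_set_rbsp( ) {
	seq_parameter_set_data( ) // data
	rbsp_trailing_bits( ) // pad to a byte boundary
}

For the padding rule see 7.3.2.11 RBSP trailing bits syntax: a stop bit 1 is followed by 0 bits until the data is byte-aligned, which is where these two bits, 10, come from.

Looking back, this is the SPS data, i.e. the first NALU, and it matches the SPS copied from Headers Info earlier exactly, so we have now tied the Spec to a real bitstream. One more point worth mentioning: in the data copied from Headers Info, "na" means undefined, i.e. the field sits in an if branch that was not taken.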
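To double-check the hand parse, the whole SPS can be run through a small bit reader. The sketch below only covers the branches this particular stream takes (Baseline profile, pic_order_cnt_type 0, frame_mbs_only_flag 1, no cropping, no VUI) and does not remove emulation-prevention bytes; BitReader and parse_sps are names invented for this example.

```python
class BitReader:
    """Minimal MSB-first bit reader over a bytes object."""

    def __init__(self, data):
        self.bits = "".join("{:08b}".format(b) for b in data)
        self.pos = 0

    def u(self, n):
        value = int(self.bits[self.pos:self.pos + n], 2)
        self.pos += n
        return value

    def ue(self):
        zeros = 0
        while self.u(1) == 0:
            zeros += 1
        return (1 << zeros) - 1 + (self.u(zeros) if zeros else 0)


def parse_sps(nalu):
    r = BitReader(nalu[1:])  # skip the 1-byte NALU header
    sps = {}
    sps["profile_idc"] = r.u(8)
    sps["constraint_flags"] = r.u(6)  # constraint_set0_flag .. constraint_set5_flag
    r.u(2)  # reserved_zero_2bits
    sps["level_idc"] = r.u(8)
    sps["seq_parameter_set_id"] = r.ue()
    # Assumption: profile_idc 66, so the chroma_format_idc branch is absent.
    sps["log2_max_frame_num_minus4"] = r.ue()
    sps["pic_order_cnt_type"] = r.ue()
    # Assumption: pic_order_cnt_type == 0, the only case handled here.
    sps["log2_max_pic_order_cnt_lsb_minus4"] = r.ue()
    sps["max_num_ref_frames"] = r.ue()
    sps["gaps_in_frame_num_value_allowed_flag"] = r.u(1)
    sps["pic_width_in_mbs_minus1"] = r.ue()
    sps["pic_height_in_map_units_minus1"] = r.ue()
    sps["frame_mbs_only_flag"] = r.u(1)
    # Assumption: frame_mbs_only_flag == 1, so mb_adaptive_frame_field_flag is absent.
    sps["direct_8x8_inference_flag"] = r.u(1)
    sps["frame_cropping_flag"] = r.u(1)
    sps["vui_parameters_present_flag"] = r.u(1)
    sps["trailing_bits"] = r.bits[r.pos:]
    return sps


sps = parse_sps(bytes.fromhex("67 42 00 1E F1 61 62 62"))
width = (sps["pic_width_in_mbs_minus1"] + 1) * 16
height = (sps["pic_height_in_map_units_minus1"] + 1) * 16
print(sps)
print("picture size:", width, "x", height)
```

The picture size derived from the two macroblock counts is the QCIF resolution, which is consistent with the name of the input file.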

B) Now let's look at the PPS in the same way (5 bytes)

68 C8 A1 43 88

Converted to a binary stream

01101000 11001000 10100001 01000011 10001000

As before, look at the NALU header first; it parses as
forbidden_zero_bit = 0 // 0
nal_ref_idc = 3 // 11
nal_unit_type = 8 // 01000
which corresponds to
#define NALU_TYPE_PPS 8
so we know the RBSP here is a PPS
pic_parameter_set_id = 0 // Exp-Golomb decode of 1
seq_parameter_set_id = 0 // Exp-Golomb decode of 1
entropy_coding_mode_flag = 0 // 0
bottom_field_pic_order_in_frame_present_flag = 0 // 0
num_slice_groups_minus1 = 0 // Exp-Golomb decode of 1

num_ref_idx_l0_default_active_minus1 = 9 // Exp-Golomb decode of 000 1010
num_ref_idx_l1_default_active_minus1 = 9 // Exp-Golomb decode of 0001 010

weighted_pred_flag = 0 // 0
weighted_bipred_idc = 0 // 00

pic_init_qp_minus26 = 0 // Exp-Golomb decode of 1
pic_init_qs_minus26 = 0 // Exp-Golomb decode of 1
chroma_qp_index_offset = 0 // Exp-Golomb decode of 1

deblocking_filter_control_present_flag = 0 // 0
constrained_intra_pred_flag = 0 // 0
redundant_pic_cnt_present_flag = 0 // 0

The remaining four bits, 1000, are the byte-alignment padding.

C) This is where the slice data starts
Look at part of the data first (the first 4 bytes)

65 88 84 02

Converted to a binary stream

01100101 10001000 10000100 00000010

As before, look at the NALU header first; it parses as
forbidden_zero_bit = 0 // 0
nal_ref_idc = 3 // 11
nal_unit_type = 5 // 00101
which corresponds to
#define NALU_TYPE_IDR 5
so we know this is an IDR picture (on what an IDR is, and how an IDR differs from an I slice)

first_mb_in_slice = 0 // Exp-Golomb decode of 1
slice_type = 7 // Exp-Golomb decode of 0001000, i.e. an I slice; for slice_type see Table 7-6 – Name association to slice_type
pic_parameter_set_id = 0 // Exp-Golomb decode of 1

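frame_num = 0 // u(v): decoded from the number of bits it occupies (log2_max_frame_num_minus4 + 4) // 0000

frame_num deserves a special note. Its Descriptor is u(v), and looking up u(v) we find
u(n): unsigned integer using n bits. When n is “v” in the syntax table, the number of bits varies in a manner dependent on the value of other syntax elements.
In other words, the number of bits this field occupies depends on other fields, so we search for frame_num again and get
frame_num is used as an identifier for pictures and shall be represented by log2_max_frame_num_minus4 + 4 bits in the bitstream.
Now it is clear: the number of bits frame_num occupies depends on log2_max_frame_num_minus4. From the SPS we know log2_max_frame_num_minus4 = 0, so frame_num occupies 4 bits here, namely 0000, which parses to 0. Also note that frame_num has many constraints, for example it must be 0 in an IDR picture; see 7.4.3 Slice header semantics for details. The point here is that this is a complete, excellent Spec that covers essentially everything we need; we just have to look for it and analyze it (the process can be tedious and sometimes baffling, but trust that the answer we need is in there).

idr_pic_id = 0 // Exp-Golomb decode of 1

pic_order_cnt_lsb = 0 // u(v): decoded from the number of bits it occupies (log2_max_pic_order_cnt_lsb_minus4 + 4) // 0000

What remains is 000010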

Now we enter the ref_pic_list_modification( ) function, but none of the if conditions inside it are satisfied

Then we enter dec_ref_pic_marking( )
no_output_of_prior_pics_flag = 0 // 0
long_term_reference_flag = 0 // 0

Only the four bits 0010 are left now, so we pull in 3 more bytes (63 61 7C): 01100011 01100001 01111100
Next we decode slice_qp_delta. Note that its Descriptor is se(v), so we first do the Exp-Golomb decode and then apply the mapping to get the value.
0010 01100011 01100001 01111100 // there are two leading 0s, so the Exp-Golomb codeword is 00100 (length 5, suffix 00). Exp-Golomb gives codeNum = 2^2 - 1 + 0 = 3; substituting k = 3 into (-1)^(k + 1) * Ceil(k / 2) gives 2. For details see 9.1.1 Mapping process for signed Exp-Golomb codes.

slice_qp_delta = 2 // 00100 // se(v)
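The slice header walk above, including the u(v) widths and the se(v) mapping, can be reproduced with a short script. This is a sketch for this specific IDR I slice only: it reads just the fields that are present for nal_unit_type 5 with the SPS/PPS values decoded earlier, and the names BitReader and parse_idr_slice_header are invented for the example.

```python
class BitReader:
    """Minimal MSB-first bit reader over a bytes object."""

    def __init__(self, data):
        self.bits = "".join("{:08b}".format(b) for b in data)
        self.pos = 0

    def u(self, n):
        value = int(self.bits[self.pos:self.pos + n], 2)
        self.pos += n
        return value

    def ue(self):
        zeros = 0
        while self.u(1) == 0:
            zeros += 1
        return (1 << zeros) - 1 + (self.u(zeros) if zeros else 0)

    def se(self):
        k = self.ue()
        magnitude = (k + 1) // 2  # Ceil(k / 2)
        return magnitude if k % 2 else -magnitude  # sign is (-1)^(k + 1)


def parse_idr_slice_header(nalu, log2_max_frame_num_minus4,
                           log2_max_pic_order_cnt_lsb_minus4):
    r = BitReader(nalu[1:])  # skip the 1-byte NALU header
    hdr = {}
    hdr["first_mb_in_slice"] = r.ue()
    hdr["slice_type"] = r.ue()
    hdr["pic_parameter_set_id"] = r.ue()
    hdr["frame_num"] = r.u(log2_max_frame_num_minus4 + 4)
    hdr["idr_pic_id"] = r.ue()  # present because nal_unit_type == 5
    hdr["pic_order_cnt_lsb"] = r.u(log2_max_pic_order_cnt_lsb_minus4 + 4)
    # An I slice reads nothing in ref_pic_list_modification().
    # dec_ref_pic_marking() for an IDR picture:
    hdr["no_output_of_prior_pics_flag"] = r.u(1)
    hdr["long_term_reference_flag"] = r.u(1)
    hdr["slice_qp_delta"] = r.se()
    return hdr


header = parse_idr_slice_header(bytes.fromhex("65 88 84 02 63 61 7C"), 0, 0)
print(header)
```

Both width arguments are the log2_..._minus4 values from the SPS, which is why frame_num and pic_order_cnt_lsb are read as 4-bit fields.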

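That completes the parsing of the Slice Header.

We stop here for now. Note that we have only worked through part of the first three NALUs (for the first slice, i.e. the IDR, we only covered the header; the data part is left for later), and the remaining two slices will be analyzed when there is a need.

Panning

Learning H.264 is not easy: it touches on many areas (code, the structure of the human visual system, real shooting scenarios and so on, because it exploits these principles to compress data). After reading papers and books for a while I have a basic grasp of the overall theory, but it is unrealistic to absorb everything at once, so for now I will start with simple topics and record things from my reading that I consider valuable.

Today I will translate (or rather, read some English material and then write notes on it) something about photography. The main reference is Panning (camera), which is what we usually call 追拍.

For example, your dog is running and you want to photograph it. You have to move the camera to follow the direction it runs in while the shot is being taken, and you have to move at about the same speed as the dog, so that the camera and the dog are relatively stationary and the background appears to be moving. The shutter speed must be neither too fast nor too slow (too short an exposure and the background will not blur; too long and the whole picture blurs).
For an example image, see the Racing Car picture on Wikipedia, https://en.wikipedia.org/wiki/File:DTM_Mercedes_W204_DiResta09_amk.jpg.

The panning discussed here means moving in the horizontal plane. The vertical counterpart is called tilting, which we do not consider.

So to take such photos well, moving the camera steadily is also important. Some people mount the camera on a tripod or monopod, or, more professionally, set up a fixed track for the camera to run along (you often see this when TV dramas are filmed, don't you).

After reading this description you may think, isn't this simple? What does it have to do with H.264? In fact, it may come into play in Motion Compensation (MC).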

References:
https://en.wikipedia.org/wiki/Tilt_(camera)
https://en.wikipedia.org/wiki/Panning_(camera)

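Huffman-Encoding an Image by Hand

There is an article that explains Huffman coding, JPEG Huffman Coding Tutorial, and it is quite clearly written. I originally planned to translate it, but decided instead to follow its line of thought and write another article from a different angle: it describes decoding, I describe encoding; its Y:Cb:Cr is 1:1:1, mine is 4:1:1; and of course it is in English while I wrote mine in Chinese @@.

If you are not familiar with the basics of JPEG/Huffman coding, learn them first (see JPEG学习笔记); otherwise you may not be able to follow what is said here.
A quick recap of the JPEG encoding process, assuming RGB->JPEG here:

Color space conversion
Forward discrete cosine transform
Quantization (needs quantization tables)
Huffman coding (needs Huffman code tables)
Generating the image stream (in JFIF format)

We will skip everything else and get started. If the concepts and principles above are still unclear, there is still time: jump back to the top of the article and read from the beginning.
Some of the intermediate data was derived with JPEGsnoop or by hand, so you should be familiar with that too.

Suppose we have RGB data. Here I use test.bmp, a 16×8 white-and-black bitmap (left DU white, right DU black). You can open it with a hex tool; the actual data is: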

FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

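Suppose we have the following SOF data
8-bit sample precision (1 byte)
Image height is 8 (2 bytes)
Image width is 16 (2 bytes)
Number of color components: this is a YCbCr image, so 3 (1 byte)
Color component information, whose size is the number of color components multiply 3
because each component entry uses 1 byte for the component ID, 1 byte for the horizontal/vertical sampling factors (high 4 bits horizontal, low 4 bits vertical) and 1 byte for the quantization table number
H 2:1:1
V 2:1:1
So the overall sampling ratio is (2 * 2):(1 * 1):(1 * 1), i.e. 4:1:1 (four Y data units for each Cb and each Cr data unit; in the usual J:a:b notation this layout is called 4:2:0)

The MCU width is the largest horizontal sampling factor multiply 8 (call this maximum Hmax)
The MCU height is the largest vertical sampling factor multiply 8 (call this maximum Vmax)
So here it is (Hmax * 8):(Vmax * 8) = 16:16

If the image height or width is not an integer multiple of the MCU size, padding is needed; after decoding, the data beyond the width or height is discarded
In the data stream, MCUs are arranged left to right, top to bottom.
Each MCU consists of several data units, and a data unit must be 8×8, so the number of Y data units in the MCU is 4 (Hmax * Vmax)

Assembled, it might look like this: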

FF C0 00 11 08 00 08 00 10 03 01 22 00 02 11 01 03 11 01
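As a cross-check, the SOF0 segment can be decoded with Python's struct module and the MCU geometry derived from the sampling factors. This is a minimal sketch, assuming a baseline SOF0 segment laid out exactly as described above; parse_sof0 and the dictionary keys are names chosen for this example.

```python
import struct


def parse_sof0(segment):
    """Decode a baseline SOF0 segment (marker included) and derive MCU geometry."""
    marker, length, precision, height, width, count = struct.unpack(
        ">HHBHHB", segment[:10])
    components = []
    for i in range(count):
        comp_id, factors, table = segment[10 + 3 * i:13 + 3 * i]
        components.append({
            "id": comp_id,
            "h": factors >> 4,    # high 4 bits: horizontal sampling factor
            "v": factors & 0x0F,  # low 4 bits: vertical sampling factor
            "qt": table,
        })
    h_max = max(c["h"] for c in components)
    v_max = max(c["v"] for c in components)
    mcu_w, mcu_h = 8 * h_max, 8 * v_max
    mcus_x = (width + mcu_w - 1) // mcu_w   # round up: partial MCUs are padded
    mcus_y = (height + mcu_h - 1) // mcu_h
    return {
        "marker": marker, "length": length, "precision": precision,
        "width": width, "height": height, "components": components,
        "mcu_size": (mcu_w, mcu_h), "mcu_count": mcus_x * mcus_y,
        "dus_per_mcu": [c["h"] * c["v"] for c in components],
    }


sof = parse_sof0(bytes.fromhex(
    "FF C0 00 11 08 00 08 00 10 03 01 22 00 02 11 01 03 11 01"))
print(sof)
```

The per-component data unit counts come out as 4 for Y and 1 each for Cb and Cr, so one MCU carries 6 data units in total.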

The DHT (FF C4) is as follows (the Huffman tables here are the standard ones, not optimized)

FF C4 00 1F 00 00 01 05 01 01 01 01 01 01 00 00 00 00 00 00 00 00 01 02 03 04 05 06 07 08 09 0A 0B FF C4 00 B5 10 00 02 01 03 03 02 04 03 05 05 04 04 00 00 01 7D 01 02 03 00 04 11 05 12 21 31 41 06 13 51 61 07 22 71 14 32 81 91 A1 08 23 42 B1 C1 15 52 D1 F0 24 33 62 72 82 09 0A 16 17 18 19 1A 25 26 27 28 29 2A 34 35 36 37 38 39 3A 43 44 45 46 47 48 49 4A 53 54 55 56 57 58 59 5A 63 64 65 66 67 68 69 6A 73 74 75 76 77 78 79 7A 83 84 85 86 87 88 89 8A 92 93 94 95 96 97 98 99 9A A2 A3 A4 A5 A6 A7 A8 A9 AA B2 B3 B4 B5 B6 B7 B8 B9 BA C2 C3 C4 C5 C6 C7 C8 C9 CA D2 D3 D4 D5 D6 D7 D8 D9 DA E1 E2 E3 E4 E5 E6 E7 E8 E9 EA F1 F2 F3 F4 F5 F6 F7 F8 F9 FA FF C4 00 1F 01 00 03 01 01 01 01 01 01 01 01 01 00 00 00 00 00 00 01 02 03 04 05 06 07 08 09 0A 0B FF C4 00 B5 11 00 02 01 02 04 04 03 04 07 05 04 04 00 01 02 77 00 01 02 03 11 04 05 21 31 06 12 41 51 07 61 71 13 22 32 81 08 14 42 91 A1 B1 C1 09 23 33 52 F0 15 62 72 D1 0A 16 24 34 E1 25 F1 17 18 19 1A 26 27 28 29 2A 35 36 37 38 39 3A 43 44 45 46 47 48 49 4A 53 54 55 56 57 58 59 5A 63 64 65 66 67 68 69 6A 73 74 75 76 77 78 79 7A 82 83 84 85 86 87 88 89 8A 92 93 94 95 96 97 98 99 9A A2 A3 A4 A5 A6 A7 A8 A9 AA B2 B3 B4 B5 B6 B7 B8 B9 BA C2 C3 C4 C5 C6 C7 C8 C9 CA D2 D3 D4 D5 D6 D7 D8 D9 DA E2 E3 E4 E5 E6 E7 E8 E9 EA F2 F3 F4 F5 F6 F7 F8 F9 FA

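Reconstruct the Huffman code tables (this makes them easier to understand; for how the tables are built, see [JPEG学习笔记]; only the entries we need are listed and some are omitted). The 4 tables are DC 0 (DC Y), AC 0 (AC Y), DC 1 (DC C) and AC 1 (AC C)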

*** Marker: DHT (Define Huffman Table) (xFFC4) ***
  OFFSET: 0x000000B1
  Huffman table length = 31
  ----
  Destination ID = 0
  Class = 0 (DC / Lossless Table)
    Codes of length 01 bits (000 total): 
    Codes of length 02 bits (001 total): 00 
    Codes of length 03 bits (005 total): 01 02 03 04 05 
    Codes of length 04 bits (001 total): 06 
    Codes of length 05 bits (001 total): 07 
    Codes of length 06 bits (001 total): 08 
    Codes of length 07 bits (001 total): 09 
    Codes of length 08 bits (001 total): 0A 
    Codes of length 09 bits (001 total): 0B 
    Codes of length 10 bits (000 total): 
    Codes of length 11 bits (000 total): 
    Codes of length 12 bits (000 total): 
    Codes of length 13 bits (000 total): 
    Codes of length 14 bits (000 total): 
    Codes of length 15 bits (000 total): 
    Codes of length 16 bits (000 total): 
    Total number of codes: 012

Length       Codeword        Code
2            00              00 (category 0: DC difference = 0)
3            010             01
3            011             02
3            100             03
3            101             04
3            110             05
4            1110            06
5            1111 0          07
6            1111 10         08
7            1111 110        09
8            1111 1110       0A
9            1111 1111 0     0B
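The codewords in this table follow mechanically from the 16 length counts and the symbol list stored in the DHT segment: codes of the same length are consecutive integers, and the running code is shifted left by one bit whenever the length grows. The sketch below applies that rule to the first DHT segment quoted above; build_codes is a hypothetical helper name.

```python
def build_codes(counts, symbols):
    """Assign codewords from the 16 per-length counts and the symbol list."""
    codes = {}
    code = 0
    index = 0
    for length, count in enumerate(counts, start=1):
        for _ in range(count):
            codes[symbols[index]] = format(code, "0{}b".format(length))
            code += 1
            index += 1
        code <<= 1  # moving on to the next code length
    return codes


# First DHT segment: marker, length, class/ID byte, 16 counts, 12 symbols.
segment = bytes.fromhex(
    "FF C4 00 1F 00 "
    "00 01 05 01 01 01 01 01 01 00 00 00 00 00 00 00 "
    "00 01 02 03 04 05 06 07 08 09 0A 0B "
)
table_class, table_id = segment[4] >> 4, segment[4] & 0x0F
counts = list(segment[5:21])
symbols = list(segment[21:21 + sum(counts)])
dc_y_codes = build_codes(counts, symbols)
for symbol in symbols:
    print("{:02X} -> {}".format(symbol, dc_y_codes[symbol]))
```

The same function can be applied to the other three segments to rebuild the AC Y, DC C and AC C tables.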


*** Marker: DHT (Define Huffman Table) (xFFC4) ***
  OFFSET: 0x000000D2
  Huffman table length = 181
  ----
  Destination ID = 0
  Class = 1 (AC Table)
    Codes of length 01 bits (000 total): 
    Codes of length 02 bits (002 total): 01 02 
    Codes of length 03 bits (001 total): 03 
    Codes of length 04 bits (003 total): 00 04 11 
    Codes of length 05 bits (003 total): 05 12 21 
    Codes of length 06 bits (002 total): 31 41 
    Codes of length 07 bits (004 total): 06 13 51 61 
    Codes of length 08 bits (003 total): 07 22 71 
    Codes of length 09 bits (005 total): 14 32 81 91 A1 
    Codes of length 10 bits (005 total): 08 23 42 B1 C1 
    Codes of length 11 bits (004 total): 15 52 D1 F0 
    Codes of length 12 bits (004 total): 24 33 62 72 
    Codes of length 13 bits (000 total): 
    Codes of length 14 bits (000 total): 
    Codes of length 15 bits (001 total): 82 
    Codes of length 16 bits (125 total): 09 0A 16 17 18 19 1A 25 26 27 28 29 2A 34 35 36 
                                         37 38 39 3A 43 44 45 46 47 48 49 4A 53 54 55 56 
                                         57 58 59 5A 63 64 65 66 67 68 69 6A 73 74 75 76 
                                         77 78 79 7A 83 84 85 86 87 88 89 8A 92 93 94 95 
                                         96 97 98 99 9A A2 A3 A4 A5 A6 A7 A8 A9 AA B2 B3 
                                         B4 B5 B6 B7 B8 B9 BA C2 C3 C4 C5 C6 C7 C8 C9 CA 
                                         D2 D3 D4 D5 D6 D7 D8 D9 DA E1 E2 E3 E4 E5 E6 E7 
                                         E8 E9 EA F1 F2 F3 F4 F5 F6 F7 F8 F9 FA 
    Total number of codes: 162

Length  Codeword                    Code
2       00                          01
2       01                          02
3       100                         03
4       1010                        00(End of Block)
4       1011                        04
4       1100                        11
5       1101 0                      05
5       1101 1                      12
5       1110 0                      21
6       1110 10                     31
6       1110 11                     41
7       1111 000                    06
7       1111 001                    13
7       1111 010                    51
7       1111 011                    61
......


*** Marker: DHT (Define Huffman Table) (xFFC4) ***
  OFFSET: 0x00000189
  Huffman table length = 31
  ----
  Destination ID = 1
  Class = 0 (DC / Lossless Table)
    Codes of length 01 bits (000 total): 
    Codes of length 02 bits (003 total): 00 01 02 
    Codes of length 03 bits (001 total): 03 
    Codes of length 04 bits (001 total): 04 
    Codes of length 05 bits (001 total): 05 
    Codes of length 06 bits (001 total): 06 
    Codes of length 07 bits (001 total): 07 
    Codes of length 08 bits (001 total): 08 
    Codes of length 09 bits (001 total): 09 
    Codes of length 10 bits (001 total): 0A 
    Codes of length 11 bits (001 total): 0B 
    Codes of length 12 bits (000 total): 
    Codes of length 13 bits (000 total): 
    Codes of length 14 bits (000 total): 
    Codes of length 15 bits (000 total): 
    Codes of length 16 bits (000 total): 
    Total number of codes: 012

Length     Codeword            Code
2          00                  00 (category 0: DC difference = 0)
2          01                  01
2          10                  02
3          110                 03
4          1110                04
5          1111 0              05
6          1111 10             06
7          1111 110            07
8          1111 1110           08
9          1111 1111 0         09
10         1111 1111 10        0A
11         1111 1111 110       0B


*** Marker: DHT (Define Huffman Table) (xFFC4) ***
  OFFSET: 0x000001AA
  Huffman table length = 181
  ----
  Destination ID = 1
  Class = 1 (AC Table)
    Codes of length 01 bits (000 total): 
    Codes of length 02 bits (002 total): 00 01 
    Codes of length 03 bits (001 total): 02 
    Codes of length 04 bits (002 total): 03 11 
    Codes of length 05 bits (004 total): 04 05 21 31 
    Codes of length 06 bits (004 total): 06 12 41 51 
    Codes of length 07 bits (003 total): 07 61 71 
    Codes of length 08 bits (004 total): 13 22 32 81 
    Codes of length 09 bits (007 total): 08 14 42 91 A1 B1 C1 
    Codes of length 10 bits (005 total): 09 23 33 52 F0 
    Codes of length 11 bits (004 total): 15 62 72 D1 
    Codes of length 12 bits (004 total): 0A 16 24 34 
    Codes of length 13 bits (000 total): 
    Codes of length 14 bits (001 total): E1 
    Codes of length 15 bits (002 total): 25 F1 
    Codes of length 16 bits (119 total): 17 18 19 1A 26 27 28 29 2A 35 36 37 38 39 3A 43 
                                         44 45 46 47 48 49 4A 53 54 55 56 57 58 59 5A 63 
                                         64 65 66 67 68 69 6A 73 74 75 76 77 78 79 7A 82 
                                         83 84 85 86 87 88 89 8A 92 93 94 95 96 97 98 99 
                                         9A A2 A3 A4 A5 A6 A7 A8 A9 AA B2 B3 B4 B5 B6 B7 
                                         B8 B9 BA C2 C3 C4 C5 C6 C7 C8 C9 CA D2 D3 D4 D5 
                                         D6 D7 D8 D9 DA E2 E3 E4 E5 E6 E7 E8 E9 EA F2 F3 
                                         F4 F5 F6 F7 F8 F9 FA 
    Total number of codes: 162

Length  Codeword                    Code
2       00                          00(End of Block)
2       01                          01
3       100                         02
4       1010                        03
4       1011                        11
5       1100 0                      04
5       1100 1                      05
5       1101 0                      21
5       1101 1                      31
6       1110 00                     06
6       1110 01                     12
6       1110 10                     41
6       1110 11                     51
......

The DQT (FF DB) is as follows

FF DB 00 43 00 05 03 04 04 04 03 05 04 04 04 05 05 05 06 07 0C 08 07 07 07 07 0F 0B 0B 09 0C 11 0F 12 12 11 0F 11 11 13 16 1C 17 13 14 1A 15 11 11 18 21 18 1A 1D 1D 1F 1F 1F 13 17 22 24 22 1E 24 1C 1E 1F 1E
FF DB 00 43 01 05 05 05 07 06 07 0E 08 08 0E 1E 14 11 14 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E

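0x0043 is the segment length
1 byte of QT information: the high 4 bits are the QT precision (0 for both tables here), the low 4 bits are the QT number
64 bytes of QT data
Restored to matrix form (the segment stores the values in zigzag order)
QT 0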

5   3   3   5   7   12  15  18

4   4   4   6   8   17  18  17

4   4   5   7   12  17  21  17

4   5   7   9   15  26  24  19

5   7   11  17  20  33  31  23

7   11  17  19  24  31  34  28

15  19  23  26  31  36  36  30

22  28  29  29  34  30  31  30
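The matrix can be rebuilt from the 64 zigzag-ordered bytes of the first DQT segment with a short script. This is a sketch: zigzag_positions generates the usual 8×8 zigzag scan by walking the anti-diagonals, and both function names are made up for this example.

```python
def zigzag_positions(n=8):
    """Return the (row, col) visited at each step of the zigzag scan."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)  # even diagonals run bottom-left to top-right
        for row in rows:
            order.append((row, s - row))
    return order


def dezigzag(values):
    matrix = [[0] * 8 for _ in range(8)]
    for value, (row, col) in zip(values, zigzag_positions()):
        matrix[row][col] = value
    return matrix


# The 64 table bytes of QT 0 (everything after "FF DB 00 43 00").
qt0_zigzag = bytes.fromhex(
    "05 03 04 04 04 03 05 04 04 04 05 05 05 06 07 0C "
    "08 07 07 07 07 0F 0B 0B 09 0C 11 0F 12 12 11 0F "
    "11 11 13 16 1C 17 13 14 1A 15 11 11 18 21 18 1A "
    "1D 1D 1F 1F 1F 13 17 22 24 22 1E 24 1C 1E 1F 1E "
)
qt0 = dezigzag(qt0_zigzag)
for row in qt0:
    print(" ".join("{:3d}".format(v) for v in row))
```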

QT 1
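Omitted

All the prerequisites are in place, so let's start the transforms.
From the SOF data we know there is 1 MCU: Y has 4 DUs (2 hold real data, 2 are padding), Cb has one DU and Cr has one DU.

RGB
Omitted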

DU 1(Y)

255  255  255  255  255  255  255  255

255  255  255  255  255  255  255  255

255  255  255  255  255  255  255  255

255  255  255  255  255  255  255  255

255  255  255  255  255  255  255  255

255  255  255  255  255  255  255  255

255  255  255  255  255  255  255  255

255  255  255  255  255  255  255  255

DU 1(-128)

127  127  127  127  127  127  127  127

127  127  127  127  127  127  127  127

127  127  127  127  127  127  127  127

127  127  127  127  127  127  127  127

127  127  127  127  127  127  127  127

127  127  127  127  127  127  127  127

127  127  127  127  127  127  127  127

127  127  127  127  127  127  127  127

DU 1(FDCT)

1016 0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

DU 1(Quantization)

203  0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

DU 2(Y)

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

DU 2(-128)

-128 -128 -128 -128 -128 -128 -128 -128

-128 -128 -128 -128 -128 -128 -128 -128

-128 -128 -128 -128 -128 -128 -128 -128

-128 -128 -128 -128 -128 -128 -128 -128

-128 -128 -128 -128 -128 -128 -128 -128

-128 -128 -128 -128 -128 -128 -128 -128

-128 -128 -128 -128 -128 -128 -128 -128

-128 -128 -128 -128 -128 -128 -128 -128

DU 2(FDCT)

-1024 0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

DU 2(Quantization)

-205 0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

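So we get
The DCs are 203 and -205 (absolute); converted to relative (differential) values they are 203 and -408
All ACs are 0

The other DUs are either all 0 or padding data, so we can ignore them for now.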
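The numbers for the two Y data units can be reproduced with a direct (slow but simple) implementation of the 8×8 forward DCT formula. This is a sketch under the assumptions used in this walkthrough: flat blocks of 255 and 0, a level shift of 128, and a DC quantizer of 5 taken from QT 0; fdct_8x8 is a name chosen for this example.

```python
import math


def fdct_8x8(block):
    """Textbook 8x8 forward DCT (direct O(n^4) evaluation of the formula)."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            total = 0.0
            for x in range(8):
                for y in range(8):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * total
    return out


DC_QUANT = 5  # QT 0 entry at position (0, 0)

quantized_dc = []
largest_ac = 0.0
for luma in (255, 0):  # DU 1 is all white, DU 2 is all black
    shifted = [[luma - 128] * 8 for _ in range(8)]
    coeffs = fdct_8x8(shifted)
    quantized_dc.append(round(coeffs[0][0] / DC_QUANT))
    for u in range(8):
        for v in range(8):
            if (u, v) != (0, 0):
                largest_ac = max(largest_ac, abs(coeffs[u][v]))

# DC values are coded as differences from the previous DU (starting from 0).
dc_diffs = []
previous = 0
for dc in quantized_dc:
    dc_diffs.append(dc - previous)
    previous = dc

print("quantized DC:", quantized_dc)
print("DC differences:", dc_diffs)
```

For a flat block the DC term is simply 8 times the level-shifted sample value, and every AC term vanishes up to floating-point noise.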

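Now do the Huffman coding
203 = 1100 1011 (found by table lookup or by calculation; a leading 1 means a positive number, a leading 0 means a negative number whose magnitude is obtained by inverting the bits), and the DC code prefix is therefore 1111 10 (from the Y DC table, because the value occupies 8 bits)
-408 = 0011 0011 1 (the bitwise inverse of 408 = 1100 1100 0; it occupies 9 bits, so the DC code prefix is 1111 110)
so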

1111 10 1100 1011 // DC Y
1111 10 1100 1011 1010 // DC Y + AC Y
1111 10 1100 1011 1010 1111 110 0011 0011 1 1010 // DC Y + AC Y + DC Y + AC Y
1111 10 1100 1011 1010 1111 110 0011 0011 1 1010 00 1010 00 1010 // DC Y + AC Y + DC Y + AC Y + 2 padding DUs (because the MCU has 4 Y DUs)
1111 10 1100 1011 1010 1111 110 0011 0011 1 1010 00 1010 00 1010 00 00 // DC Y + AC Y + DC Y + AC Y + 2 padding DUs (because the MCU has 4 Y DUs) + DC Cb + AC Cb
1111 10 1100 1011 1010 1111 110 0011 0011 1 1010 00 1010 00 1010 00 00 00 00 // DC Y + AC Y + DC Y + AC Y + 2 padding DUs (because the MCU has 4 Y DUs) + DC Cb + AC Cb + DC Cr + AC Cr
1111 10 1100 1011 1010 1111 110 0011 0011 1 1010 00 1010 00 1010 00 00 00 00 111111 // DC Y + AC Y + DC Y + AC Y + 2 padding DUs (because the MCU has 4 Y DUs) + DC Cb + AC Cb + DC Cr + AC Cr + padding bits

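Remove the spaces
1111101100101110101111110001100111101000101000101000000000111111
That is, the data itself is only 58 bits; padded with 1 bits to a byte boundary it occupies 64 bits, which gives us the final data.
In hexadecimal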
FB 2E BF 19 E8 A2 80 3F
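The whole bit-assembly step can be replayed in Python. This is a sketch for this one MCU only: it hard-codes the few Huffman codes needed from the tables above, takes the DC differences of the two padding Y DUs and of Cb and Cr as 0 (as in the bit strings above), treats every AC block as a bare End of Block, and skips the FF byte stuffing a full encoder needs. The helper names are invented for this example.

```python
# Only the codes needed here, copied from the reconstructed tables.
DC_Y = {0: "00", 8: "111110", 9: "1111110"}
DC_C = {0: "00"}
EOB_Y, EOB_C = "1010", "00"


def magnitude_bits(diff):
    """Return (category, extra bits) for a DC difference."""
    if diff == 0:
        return 0, ""
    size = abs(diff).bit_length()
    if diff < 0:
        diff += (1 << size) - 1  # negative values are stored bit-inverted
    return size, format(diff, "0{}b".format(size))


def encode_flat_du(dc_diff, dc_table, eob):
    """Encode a data unit whose AC coefficients are all zero."""
    size, extra = magnitude_bits(dc_diff)
    return dc_table[size] + extra + eob


bits = ""
for diff in (203, -408, 0, 0):           # the 4 Y data units
    bits += encode_flat_du(diff, DC_Y, EOB_Y)
bits += encode_flat_du(0, DC_C, EOB_C)   # Cb
bits += encode_flat_du(0, DC_C, EOB_C)   # Cr

payload_bits = len(bits)
bits += "1" * (-len(bits) % 8)           # pad with 1 bits to a byte boundary
scan = int(bits, 2).to_bytes(len(bits) // 8, "big")
hex_text = " ".join("{:02X}".format(b) for b in scan)
print(payload_bits, "payload bits ->", hex_text)
```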

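By now you may have some understanding of how the rather complex JPEG format works, but remember this is only the beginning (of course you cannot hand-encode a complex image; this article only helps with understanding the principles). What is truly complex about JPEG is the optimization of its various algorithms, and that is the real focus.

NanoJPEG: Analysis of a Simple JPEG Decoder

After studying some JPEG theory on my own, I found a simple decoder (NanoJPEG) to try out. The original is at http://keyj.emphy.de/nanojpeg/. It is only a few hundred lines and fairly easy to read. My understanding is detailed in the code comments; if I have misunderstood anything, corrections are welcome, and if you have questions or are interested, leave a comment to discuss.

$ gcc -O3 -D_NJ_EXAMPLE_PROGRAM -o nanojpeg nanojpeg.c
$ ./nanojpeg testorig.jpg

Following the main function and combining theory with practice is a good approach!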

// NanoJPEG -- KeyJ's Tiny Baseline JPEG Decoder
// version 1.3 (2012-03-05)
// by Martin J. Fiedler <martin.fiedler@gmx.net>
//
// This software is published under the terms of KeyJ's Research License,
// version 0.2. Usage of this software is subject to the following conditions:
// 0. There's no warranty whatsoever. The author(s) of this software can not
//    be held liable for any damages that occur when using this software.
// 1. This software may be used freely for both non-commercial and commercial
//    purposes.
// 2. This software may be redistributed freely as long as no fees are charged
//    for the distribution and this license information is included.
// 3. This software may be modified freely except for this license information,
//    which must not be changed in any way.
// 4. If anything other than configuration, indentation or comments have been
//    altered in the code, the original author(s) must receive a copy of the
//    modified code.


///////////////////////////////////////////////////////////////////////////////
// DOCUMENTATION SECTION                                                     //
// read this if you want to know what this is all about                      //
///////////////////////////////////////////////////////////////////////////////

// INTRODUCTION
// ============
//
// This is a minimal decoder for baseline JPEG images. It accepts memory dumps
// of JPEG files as input and generates either 8-bit grayscale or packed 24-bit
// RGB images as output. It does not parse JFIF or Exif headers; all JPEG files
// are assumed to be either grayscale or YCbCr. CMYK or other color spaces are
// not supported. All YCbCr subsampling schemes with power-of-two ratios are
// supported, as are restart intervals. Progressive or lossless JPEG is not
// supported.
// Summed up, NanoJPEG should be able to decode all images from digital cameras
// and most common forms of other non-progressive JPEG images.
// The decoder is not optimized for speed, it's optimized for simplicity and
// small code. Image quality should be at a reasonable level. A bicubic chroma
// upsampling filter ensures that subsampled YCbCr images are rendered in
// decent quality. The decoder is not meant to deal with broken JPEG files in
// a graceful manner; if anything is wrong with the bitstream, decoding will
// simply fail.
// The code should work with every modern C compiler without problems and
// should not emit any warnings. It uses only (at least) 32-bit integer
// arithmetic and is supposed to be endianness independent and 64-bit clean.
// However, it is not thread-safe.


// COMPILE-TIME CONFIGURATION
// ==========================
//
// The following aspects of NanoJPEG can be controlled with preprocessor
// defines:
//
// _NJ_EXAMPLE_PROGRAM     = Compile a main() function with an example
//                           program.
// _NJ_INCLUDE_HEADER_ONLY = Don't compile anything, just act as a header
//                           file for NanoJPEG. Example:
//                               #define _NJ_INCLUDE_HEADER_ONLY
//                               #include "nanojpeg.c"
//                               int main(void) {
//                                   njInit();
//                                   // your code here
//                                   njDone();
//                               }
// NJ_USE_LIBC=1           = Use the malloc(), free(), memset() and memcpy()
//                           functions from the standard C library (default).
// NJ_USE_LIBC=0           = Don't use the standard C library. In this mode,
//                           external functions njAllocMem(), njFreeMem(),
//                           njFillMem() and njCopyMem() need to be defined
//                           and implemented somewhere.
// NJ_USE_WIN32=0          = Normal mode (default).
// NJ_USE_WIN32=1          = If compiling with MSVC for Win32 and
//                           NJ_USE_LIBC=0, NanoJPEG will use its own
//                           implementations of the required C library
//                           functions (default if compiling with MSVC and
//                           NJ_USE_LIBC=0).
// NJ_CHROMA_FILTER=1      = Use the bicubic chroma upsampling filter
//                           (default). Bicubic interpolation is an image
//                           resizing algorithm.
// NJ_CHROMA_FILTER=0      = Use simple pixel repetition for chroma upsampling
//                           (bad quality, but faster and less code).


// API
// ===
//
// For API documentation, read the "header section" below.


// EXAMPLE
// =======
//
// A few pages below, you can find an example program that uses NanoJPEG to
// convert JPEG files into PGM or PPM. To compile it, use something like
//     gcc -O3 -D_NJ_EXAMPLE_PROGRAM -o nanojpeg nanojpeg.c
// You may also add -std=c99 -Wall -Wextra -pedantic -Werror, if you want.


///////////////////////////////////////////////////////////////////////////////
// HEADER SECTION                                                            //
// copy and paste this into nanojpeg.h if you want                           //
///////////////////////////////////////////////////////////////////////////////

#ifndef _NANOJPEG_H
#define _NANOJPEG_H

// nj_result_t: Result codes for njDecode().
typedef enum _nj_result {
    NJ_OK = 0,        // no error, decoding successful
    NJ_NO_JPEG,       // not a JPEG file
    NJ_UNSUPPORTED,   // unsupported format
    NJ_OUT_OF_MEM,    // out of memory
    NJ_INTERNAL_ERR,  // internal error
    NJ_SYNTAX_ERROR,  // syntax error
    __NJ_FINISHED,    // used internally, will never be reported
} nj_result_t;

// njInit: Initialize NanoJPEG.
// For safety reasons, this should be called at least one time before using
// any of the other NanoJPEG functions.
void njInit(void);

// njDecode: Decode a JPEG image.
// Decodes a memory dump of a JPEG file into internal buffers.
// Parameters:
//   jpeg = The pointer to the memory dump.
//   size = The size of the JPEG file.
// Return value: The error code in case of failure, or NJ_OK (zero) on success.
nj_result_t njDecode(const void* jpeg, const int size);

// njGetWidth: Return the width (in pixels) of the most recently decoded
// image. If njDecode() failed, the result of njGetWidth() is undefined.
int njGetWidth(void);

// njGetHeight: Return the height (in pixels) of the most recently decoded
// image. If njDecode() failed, the result of njGetHeight() is undefined.
int njGetHeight(void);

// njIsColor: Return 1 if the most recently decoded image is a color image
// (RGB) or 0 if it is a grayscale image. If njDecode() failed, the result
// of njIsColor() is undefined.
int njIsColor(void);

// njGetImage: Returns the decoded image data.
// Returns a pointer to the most recently decoded image. The memory layout is
// byte-oriented, top-down, without any padding between lines. Pixels of color
// images will be stored as three consecutive bytes for the red, green and
// blue channels. This data format is thus compatible with the PGM or PPM
// file formats and the OpenGL texture formats GL_LUMINANCE8 or GL_RGB8.
// If njDecode() failed, the result of njGetImage() is undefined.
unsigned char* njGetImage(void);

// njGetImageSize: Returns the size (in bytes) of the image data returned
// by njGetImage(). If njDecode() failed, the result of njGetImageSize() is
// undefined.
int njGetImageSize(void);

// njDone: Uninitialize NanoJPEG.
// Resets NanoJPEG's internal state and frees all memory that has been
// allocated at run-time by NanoJPEG. It is still possible to decode another
// image after a njDone() call.
void njDone(void);

#endif//_NANOJPEG_H


///////////////////////////////////////////////////////////////////////////////
// CONFIGURATION SECTION                                                     //
// adjust the default settings for the NJ_ defines here                      //
///////////////////////////////////////////////////////////////////////////////

#ifndef NJ_USE_LIBC
    #define NJ_USE_LIBC 1
#endif

#ifndef NJ_USE_WIN32
  #ifdef _MSC_VER
    #define NJ_USE_WIN32 (!NJ_USE_LIBC)
  #else
    #define NJ_USE_WIN32 0
  #endif
#endif

#ifndef NJ_CHROMA_FILTER
    #define NJ_CHROMA_FILTER 1
#endif


///////////////////////////////////////////////////////////////////////////////
// EXAMPLE PROGRAM                                                           //
// just define _NJ_EXAMPLE_PROGRAM to compile this (requires NJ_USE_LIBC)    //
///////////////////////////////////////////////////////////////////////////////

#ifdef  _NJ_EXAMPLE_PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char* argv[]) {
    int size;
    char *buf;
    FILE *f;

    if (argc < 2) {
        printf("Usage: %s <input.jpg> [<output.ppm>]\n", argv[0]);
        return 2;
    }
    f = fopen(argv[1], "rb");
    if (!f) {
        printf("Error opening the input file.\n");
        return 1;
    }
    fseek(f, 0, SEEK_END);
    size = (int) ftell(f); // file size in bytes
    buf = malloc(size);
    if (!buf) {
        printf("Out of memory.\n");
        fclose(f);
        return 1;
    }
    fseek(f, 0, SEEK_SET);
    size = (int) fread(buf, 1, size, f); // read the whole file into buf
    fclose(f);

    njInit(); // initialize nj_context_t
    if (njDecode(buf, size)) {
        printf("Error decoding the input file.\n");
        return 1;
    }

    f = fopen((argc > 2) ? argv[2] : (njIsColor() ? "nanojpeg_out.ppm" : "nanojpeg_out.pgm"), "wb");
    if (!f) {
        printf("Error opening the output file.\n");
        return 1;
    }
    fprintf(f, "P%d\n%d %d\n255\n", njIsColor() ? 6 : 5, njGetWidth(), njGetHeight());
    fwrite(njGetImage(), 1, njGetImageSize(), f);
    fclose(f);
    njDone();
    return 0;
}

#endif

// For an explanation of stride, see http://msdn.microsoft.com/en-us/library/windows/desktop/aa473780(v=vs.85).aspx

///////////////////////////////////////////////////////////////////////////////
// IMPLEMENTATION SECTION                                                    //
// you may stop reading here                                                 //
///////////////////////////////////////////////////////////////////////////////

#ifndef _NJ_INCLUDE_HEADER_ONLY

#ifdef _MSC_VER
    #define NJ_INLINE static __inline
    #define NJ_FORCE_INLINE static __forceinline
#else
    #define NJ_INLINE static inline
    #define NJ_FORCE_INLINE static inline
#endif

#if NJ_USE_LIBC
    #include <stdlib.h>
    #include <string.h>
    #define njAllocMem malloc
    #define njFreeMem  free
    #define njFillMem  memset
    #define njCopyMem  memcpy
#elif NJ_USE_WIN32
    #include <windows.h>
    #define njAllocMem(size) ((void*) LocalAlloc(LMEM_FIXED, (SIZE_T)(size)))
    #define njFreeMem(block) ((void) LocalFree((HLOCAL) block))
    NJ_INLINE void njFillMem(void* block, unsigned char value, int count) { __asm {
        mov edi, block
        mov al, value
        mov ecx, count
        rep stosb
    } }
    NJ_INLINE void njCopyMem(void* dest, const void* src, int count) { __asm {
        mov edi, dest
        mov esi, src
        mov ecx, count
        rep movsb
    } }
#else
    extern void* njAllocMem(int size);
    extern void njFreeMem(void* block);
    extern void njFillMem(void* block, unsigned char byte, int size);
    extern void njCopyMem(void* dest, const void* src, int size);
#endif

typedef struct _nj_code {
    unsigned char bits, code;
} nj_vlc_code_t;

typedef struct _nj_cmp {
    int cid;
    int ssx, ssy; // horizontal/vertical sampling factors
    int width, height;
    int stride;
    int qtsel; // quantization table selector
    int actabsel, dctabsel; // AC/DC Huffman Table
    int dcpred;
    unsigned char *pixels;
} nj_component_t; // color component

typedef struct _nj_ctx {
    nj_result_t error;
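    const unsigned char *pos; // pointer to the data still to be decoded (byte granularity)
    int size; // size of the data (decremented as bytes are consumed)
    int length; // remaining length of the current marker segment
    int width, height; // image width and height
    int mbwidth, mbheight; // number of MCUs horizontally/vertically
    int mbsizex, mbsizey; // MCU width/height in pixels
    int ncomp; // number of color components
    nj_component_t comp[3]; // YCbCr
    int qtused, qtavail; // bitmasks of quantization tables referenced/defined; only written, never read in this code
    unsigned char qtab[4][64]; // up to 4 quantization tables, though typically only 2 are present
    nj_vlc_code_t vlctab[4][65536]; // Huffman lookup tables indexed by every possible 16-bit value;
                                    // 4 tables: DC 0, DC 1, AC 0, AC 1
    int buf, bufbits; // bit reader state: buf holds the buffered bits, bufbits counts how many bits are buffered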
    int block[64];
    int rstinterval;
    unsigned char *rgb; // memory for the decoded RGB output; 3 bytes per pixel, in R, G, B order
} nj_context_t;

static nj_context_t nj;

static const char njZZ[64] = { 0, 1, 8, 16, 9, 2, 3, 10, 17, 24, 32, 25, 18,
11, 4, 5, 12, 19, 26, 33, 40, 48, 41, 34, 27, 20, 13, 6, 7, 14, 21, 28, 35,
42, 49, 56, 57, 50, 43, 36, 29, 22, 15, 23, 30, 37, 44, 51, 58, 59, 52, 45,
38, 31, 39, 46, 53, 60, 61, 54, 47, 55, 62, 63 };

/*
0   1   2   3   4   5   6   7

8   9   10  11  12  13  14  15

16  17  18  19  20  21  22  23

24  25  26  27  28  29  30  31

32  33  34  35  36  37  38  39

40  41  42  43  44  45  46  47

48  49  50  51  52  53  54  55

56  57  58  59  60  61  62  63
*/

NJ_FORCE_INLINE unsigned char njClip(const int x) { // clamp to the range 0..255
    return (x < 0) ? 0 : ((x > 0xFF) ? 0xFF : (unsigned char) x);
}

#define W1 2841
#define W2 2676
#define W3 2408
#define W5 1609
#define W6 1108
#define W7 565

NJ_INLINE void njRowIDCT(int* blk) { // operates on one row at a time: 0..7, 8..15, ...
    int x0, x1, x2, x3, x4, x5, x6, x7, x8;
    if (!((x1 = blk[4] << 11)
        | (x2 = blk[6])
        | (x3 = blk[2])
        | (x4 = blk[1])
        | (x5 = blk[7])
        | (x6 = blk[5])
        | (x7 = blk[3])))
    {
        blk[0] = blk[1] = blk[2] = blk[3] = blk[4] = blk[5] = blk[6] = blk[7] = blk[0] << 3;
        return;
    }
    x0 = (blk[0] << 11) + 128;
    x8 = W7 * (x4 + x5);
    x4 = x8 + (W1 - W7) * x4;
    x5 = x8 - (W1 + W7) * x5;
    x8 = W3 * (x6 + x7);
    x6 = x8 - (W3 - W5) * x6;
    x7 = x8 - (W3 + W5) * x7;
    x8 = x0 + x1;
    x0 -= x1;
    x1 = W6 * (x3 + x2);
    x2 = x1 - (W2 + W6) * x2;
    x3 = x1 + (W2 - W6) * x3;
    x1 = x4 + x6;
    x4 -= x6;
    x6 = x5 + x7;
    x5 -= x7;
    x7 = x8 + x3;
    x8 -= x3;
    x3 = x0 + x2;
    x0 -= x2;
    x2 = (181 * (x4 + x5) + 128) >> 8;
    x4 = (181 * (x4 - x5) + 128) >> 8;
    blk[0] = (x7 + x1) >> 8;
    blk[1] = (x3 + x2) >> 8;
    blk[2] = (x0 + x4) >> 8;
    blk[3] = (x8 + x6) >> 8;
    blk[4] = (x8 - x6) >> 8;
    blk[5] = (x0 - x4) >> 8;
    blk[6] = (x3 - x2) >> 8;
    blk[7] = (x7 - x1) >> 8;
}

NJ_INLINE void njColIDCT(const int* blk, unsigned char *out, int stride) {
    int x0, x1, x2, x3, x4, x5, x6, x7, x8;
    if (!((x1 = blk[8*4] << 8)
        | (x2 = blk[8*6])
        | (x3 = blk[8*2])
        | (x4 = blk[8*1])
        | (x5 = blk[8*7])
        | (x6 = blk[8*5])
        | (x7 = blk[8*3])))
    {
        x1 = njClip(((blk[0] + 32) >> 6) + 128);
        for (x0 = 8;  x0;  --x0) {
            *out = (unsigned char) x1;
            out += stride;
        }
        return;
    }
    x0 = (blk[0] << 8) + 8192;
    x8 = W7 * (x4 + x5) + 4;
    x4 = (x8 + (W1 - W7) * x4) >> 3;
    x5 = (x8 - (W1 + W7) * x5) >> 3;
    x8 = W3 * (x6 + x7) + 4;
    x6 = (x8 - (W3 - W5) * x6) >> 3;
    x7 = (x8 - (W3 + W5) * x7) >> 3;
    x8 = x0 + x1;
    x0 -= x1;
    x1 = W6 * (x3 + x2) + 4;
    x2 = (x1 - (W2 + W6) * x2) >> 3;
    x3 = (x1 + (W2 - W6) * x3) >> 3;
    x1 = x4 + x6;
    x4 -= x6;
    x6 = x5 + x7;
    x5 -= x7;
    x7 = x8 + x3;
    x8 -= x3;
    x3 = x0 + x2;
    x0 -= x2;
    x2 = (181 * (x4 + x5) + 128) >> 8; // samples had 128 subtracted before the FDCT (range -128..127), so 128 is added back after the IDCT below
    x4 = (181 * (x4 - x5) + 128) >> 8;
    *out = njClip(((x7 + x1) >> 14) + 128);  out += stride;
    *out = njClip(((x3 + x2) >> 14) + 128);  out += stride;
    *out = njClip(((x0 + x4) >> 14) + 128);  out += stride;
    *out = njClip(((x8 + x6) >> 14) + 128);  out += stride;
    *out = njClip(((x8 - x6) >> 14) + 128);  out += stride;
    *out = njClip(((x0 - x4) >> 14) + 128);  out += stride;
    *out = njClip(((x3 - x2) >> 14) + 128);  out += stride;
    *out = njClip(((x7 - x1) >> 14) + 128);
}

#define njThrow(e) do { nj.error = e; return; } while (0)
#define njCheckError() do { if (nj.error) return; } while (0)

static int njShowBits(int bits) { // Q: can this return more than 32 bits? No, but callers here never request more than 16
    unsigned char newbyte;
    if (!bits) return 0;
    while (nj.bufbits < bits) { // refill while fewer bits are buffered than requested; otherwise the value is read straight from buf
        if (nj.size <= 0) {
            nj.buf = (nj.buf << 8) | 0xFF;
            nj.bufbits += 8;
            continue;
        }
        newbyte = *nj.pos++; // the data pointer advances one byte at a time
        nj.size--;
        nj.bufbits += 8;
        nj.buf = (nj.buf << 8) | newbyte; // older high-order bits eventually shift out, so buf could not serve e.g. a 64-bit request
        if (newbyte == 0xFF) {
            if (nj.size) {
                unsigned char marker = *nj.pos++;
                nj.size--;
                switch (marker) {
                    case 0x00:
                    case 0xFF:
                        break;
                    case 0xD9: nj.size = 0; break;
                    default:
                        if ((marker & 0xF8) != 0xD0)
                            nj.error = NJ_SYNTAX_ERROR;
                        else {
                            nj.buf = (nj.buf << 8) | marker;
                            nj.bufbits += 8;
                        }
                }
            } else
                nj.error = NJ_SYNTAX_ERROR;
        }
    }
    return (nj.buf >> (nj.bufbits - bits)) & ((1 << bits) - 1);
}

NJ_INLINE void njSkipBits(int bits) {
    if (nj.bufbits < bits)
        (void) njShowBits(bits);
    nj.bufbits -= bits;
}

NJ_INLINE int njGetBits(int bits) {
    int res = njShowBits(bits);
    njSkipBits(bits);
    return res;
}

NJ_INLINE void njByteAlign(void) {
    nj.bufbits &= 0xF8; // 0xF8 = 1111 1000: round down to a multiple of 8, discarding the leftover bits
}

static void njSkip(int count) {
    nj.pos += count; // advance the data pointer
    nj.size -= count; // reduce the remaining data size by count
    nj.length -= count; // reduce the remaining length of the current marker segment by count
    if (nj.size < 0) nj.error = NJ_SYNTAX_ERROR;
}

NJ_INLINE unsigned short njDecode16(const unsigned char *pos) {
    return (pos[0] << 8) | pos[1]; // big-endian: high byte first, e.g. 00000000 00001101 = 13
}

static void njDecodeLength(void) { // decode the length field; normally called right after entering a specific marker segment
    if (nj.size < 2) njThrow(NJ_SYNTAX_ERROR);
    nj.length = njDecode16(nj.pos); // length of this marker segment (excluding the 2 bytes of the marker itself)
    if (nj.length > nj.size) njThrow(NJ_SYNTAX_ERROR);
    njSkip(2);
}

NJ_INLINE void njSkipMarker(void) {
    njDecodeLength();
    njSkip(nj.length);
}

NJ_INLINE void njDecodeSOF(void) { // all required memory is allocated while parsing Start of Frame
    int i, ssxmax = 0, ssymax = 0;
    nj_component_t* c;
    njDecodeLength(); // decode the length and advance the data pointer
    if (nj.length < 9) njThrow(NJ_SYNTAX_ERROR);
    if (nj.pos[0] != 8) njThrow(NJ_UNSUPPORTED); // sample precision, normally 8 bits
    nj.height = njDecode16(nj.pos + 1); // image height and width
    nj.width = njDecode16(nj.pos + 3);
    nj.ncomp = nj.pos[5]; // number of color components, normally 3
    njSkip(6); // 6 bytes consumed so far, so advance the data pointer by 6
    switch (nj.ncomp) { // only 1 and 3 components are supported
        case 1:
        case 3:
            break;
        default:
            njThrow(NJ_UNSUPPORTED);
    }
    if (nj.length < (nj.ncomp * 3)) njThrow(NJ_SYNTAX_ERROR); // each component entry that follows takes 3 bytes:
                                                              // component ID (1 byte), sampling factors (1 byte: high 4 bits
                                                              // horizontal, low 4 bits vertical), quantization table selector (1 byte)
    for (i = 0, c = nj.comp;  i < nj.ncomp;  ++i, ++c) {
        c->cid = nj.pos[0]; // component ID
        if (!(c->ssx = nj.pos[1] >> 4)) njThrow(NJ_SYNTAX_ERROR); // high 4 bits: horizontal sampling factor
        if (c->ssx & (c->ssx - 1)) njThrow(NJ_UNSUPPORTED);  // non-power of two
        if (!(c->ssy = nj.pos[1] & 15)) njThrow(NJ_SYNTAX_ERROR); // low 4 bits (mask 0000 1111): vertical sampling factor
        if (c->ssy & (c->ssy - 1)) njThrow(NJ_UNSUPPORTED);  // non-power of two
        if ((c->qtsel = nj.pos[2]) & 0xFC) njThrow(NJ_SYNTAX_ERROR); // 0xFC = 1111 1100: the table selector must be 0..3
        njSkip(3); // advance to the next component entry
        nj.qtused |= 1 << c->qtsel; // mark this quantization table as referenced (bitmask)
        if (c->ssx > ssxmax) ssxmax = c->ssx; // track the largest horizontal factor
        if (c->ssy > ssymax) ssymax = c->ssy; // track the largest vertical factor
    }
    if (nj.ncomp == 1) { // a single component is the simple case
        c = nj.comp;
        c->ssx = c->ssy = ssxmax = ssymax = 1;
    }
    nj.mbsizex = ssxmax << 3; // MCU width = largest horizontal sampling factor * 8
    nj.mbsizey = ssymax << 3; // MCU height = largest vertical sampling factor * 8
    nj.mbwidth = (nj.width + nj.mbsizex - 1) / nj.mbsizex; // number of MCUs horizontally; adding mbsizex - 1 before
                                                           // dividing rounds up, so the MCUs cover the full width
    nj.mbheight = (nj.height + nj.mbsizey - 1) / nj.mbsizey; // number of MCUs vertically
    for (i = 0, c = nj.comp;  i < nj.ncomp;  ++i, ++c) {
        c->width = (nj.width * c->ssx + ssxmax - 1) / ssxmax; // component width: the full image width for the largest factor, proportionally less otherwise
        c->stride = (c->width + 7) & 0x7FFFFFF8; // round up to a multiple of 8: add 7, then clear the low 3 bits
                                                 // (e.g. 227 becomes 232). This only rounds to whole 8x8 data units,
                                                 // not whole MCUs, so it is recomputed below.
		printf("%d, stride %d\n", i, c->stride);
        c->height = (nj.height * c->ssy + ssymax - 1) / ssymax;
        c->stride = nj.mbwidth * nj.mbsizex * c->ssx / ssxmax; // final stride: the value above ignored the MCU width;
                                                               // mbwidth * mbsizex is already rounded up, so just scale it
		printf("%d, stride again %d\n", i, c->stride);
        if (((c->width < 3) && (c->ssx != ssxmax)) || ((c->height < 3) && (c->ssy != ssymax))) njThrow(NJ_UNSUPPORTED);
        if (!(c->pixels = njAllocMem(c->stride * (nj.mbheight * nj.mbsizey * c->ssy / ssymax)))) njThrow(NJ_OUT_OF_MEM); // allocate the component plane; it covers
                                                                                                                         // whole MCUs, so it may be larger than
                                                                                                                         // the actual image
    }
    if (nj.ncomp == 3) { // an RGB buffer is only needed with 3 components
        nj.rgb = njAllocMem(nj.width * nj.height * nj.ncomp);
        if (!nj.rgb) njThrow(NJ_OUT_OF_MEM);
    }
    njSkip(nj.length);
}

NJ_INLINE void njDecodeDHT(void) {
    int codelen, currcnt, remain, spread, i, j;
    nj_vlc_code_t *vlc;
    static unsigned char counts[16]; // number of codes for each code length 1..16
    njDecodeLength();
    while (nj.length >= 17) { // 16 count bytes + 1 byte of table class and ID
        i = nj.pos[0]; // table class and ID
        if (i & 0xEC) njThrow(NJ_SYNTAX_ERROR); // (11101100)
        if (i & 0x02) njThrow(NJ_UNSUPPORTED); // (00000010)
        i = (i | (i >> 3)) & 3;  // combined DC/AC + tableid value
                                 // DC 0, DC 1, AC 0, AC 1
        for (codelen = 1;  codelen <= 16;  ++codelen) // for each code length
            counts[codelen - 1] = nj.pos[codelen]; // read how many codes have this length
        njSkip(17);
        vlc = &nj.vlctab[i][0];
        remain = spread = 65536;
        for (codelen = 1;  codelen <= 16;  ++codelen) {
            spread >>= 1; // table entries covered by one code of this length: 65536 >> codelen
            currcnt = counts[codelen - 1];
            if (!currcnt) continue; // no codes of this length
            if (nj.length < currcnt) njThrow(NJ_SYNTAX_ERROR);
            remain -= currcnt << (16 - codelen);
            if (remain < 0) njThrow(NJ_SYNTAX_ERROR);
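            for (i = 0;  i < currcnt;  ++i) { // several codes can share the same length
                register unsigned char code = nj.pos[i];
                for (j = spread;  j;  --j) { // fill every 16-bit index that starts with this code, so decoding is one lookup
                    vlc->bits = (unsigned char) codelen; // code length in bits
                    vlc->code = code; // symbol this code decodes to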
                    ++vlc;
                }
            }
            njSkip(currcnt);
        }
        while (remain--) {
            vlc->bits = 0;
            ++vlc;
        }
    }
    if (nj.length) njThrow(NJ_SYNTAX_ERROR);
}

NJ_INLINE void njDecodeDQT(void) {
    int i;
    unsigned char *t;
    njDecodeLength();
    while (nj.length >= 65) {
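        i = nj.pos[0]; // table info: high 4 bits = precision, low 4 bits = table ID
        if (i & 0xFC) njThrow(NJ_SYNTAX_ERROR); // 0xFC = 1111 1100: the ID must be 0..3 and the precision 0 (8-bit entries)
        nj.qtavail |= 1 << i; // mark table i as defined (a bitmask, not a count)
        t = &nj.qtab[i][0];
        for (i = 0;  i < 64;  ++i)
            t[i] = nj.pos[i + 1]; // copy into the table, keeping the order used in the file (zigzag)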
        njSkip(65);
    }
    if (nj.length) njThrow(NJ_SYNTAX_ERROR);
}

NJ_INLINE void njDecodeDRI(void) {
    njDecodeLength();
    if (nj.length < 2) njThrow(NJ_SYNTAX_ERROR);
    nj.rstinterval = njDecode16(nj.pos);
    njSkip(nj.length);
}

static int njGetVLC(nj_vlc_code_t* vlc, unsigned char* code) { // Variable Length Coding
    int value = njShowBits(16);
    int bits = vlc[value].bits;
    if (!bits) { nj.error = NJ_SYNTAX_ERROR; return 0; }
    njSkipBits(bits);
    value = vlc[value].code;
    if (code) *code = (unsigned char) value;
    bits = value & 15;
    if (!bits) return 0;
    value = njGetBits(bits);
    if (value < (1 << (bits - 1)))
        value -= (1 << bits) - 1; // same result as adding ((-1) << bits) + 1, without left-shifting a negative value
    return value;
}

NJ_INLINE void njDecodeBlock(nj_component_t* c, unsigned char* out) {
    unsigned char code = 0;
    int value, coef = 0;
    njFillMem(nj.block, 0, sizeof(nj.block));
    c->dcpred += njGetVLC(&nj.vlctab[c->dctabsel][0], NULL); // DC tables are 0/1, so they never collide with the AC tables
    nj.block[0] = (c->dcpred) * nj.qtab[c->qtsel][0]; // DC coefficient, dequantized
    do {
        value = njGetVLC(&nj.vlctab[c->actabsel][0], &code); // AC tables are 2/3
        if (!code) break;  // EOB
        if (!(code & 0x0F) && (code != 0xF0)) njThrow(NJ_SYNTAX_ERROR);
        coef += (code >> 4) + 1; // skip the run of zeros, then advance to this coefficient
        if (coef > 63) njThrow(NJ_SYNTAX_ERROR);
        nj.block[(int) njZZ[coef]] = value * nj.qtab[c->qtsel][coef]; // AC coefficient, dequantized and moved out of zigzag order
    } while (coef < 63);
    for (coef = 0;  coef < 64;  coef += 8)
        njRowIDCT(&nj.block[coef]); // Huffman decoding and dequantization are done above; this is the row pass of the inverse DCT
    for (coef = 0;  coef < 8;  ++coef)
        njColIDCT(&nj.block[coef], &out[coef], c->stride);
}

NJ_INLINE void njDecodeScan(void) {
    int i, mbx, mby, sbx, sby;
    int rstcount = nj.rstinterval, nextrst = 0;
    nj_component_t* c;
    njDecodeLength();
    if (nj.length < (4 + 2 * nj.ncomp)) njThrow(NJ_SYNTAX_ERROR);
    if (nj.pos[0] != nj.ncomp) njThrow(NJ_UNSUPPORTED);
    njSkip(1); // number of components
    for (i = 0, c = nj.comp;  i < nj.ncomp;  ++i, ++c) {
        if (nj.pos[0] != c->cid) njThrow(NJ_SYNTAX_ERROR); // component ID
        if (nj.pos[1] & 0xEE) njThrow(NJ_SYNTAX_ERROR);
        c->dctabsel = nj.pos[1] >> 4; // high 4 bits: DC table
        c->actabsel = (nj.pos[1] & 1) | 2; // low 4 bits: AC table (OR-ing in 2 keeps AC table indices distinct from DC ones)

		printf("DC/AC Huffman table ids: %d/%d\n", c->dctabsel, c->actabsel);	

        njSkip(2);
    }
    if (nj.pos[0] || (nj.pos[1] != 63) || nj.pos[2]) njThrow(NJ_UNSUPPORTED);
    njSkip(nj.length); // skip the last 3 bytes, normally 00 3F 00
                       // for 3 components: 2 + 1 + 6 + 3 = 12 bytes, exactly the length of this segment
                       // everything after this is entropy-coded image data
    for (mbx = mby = 0;;) {
        for (i = 0, c = nj.comp;  i < nj.ncomp;  ++i, ++c) // every component is decoded
            for (sby = 0;  sby < c->ssy;  ++sby) // vertical/horizontal sampling factors
                for (sbx = 0;  sbx < c->ssx;  ++sbx) {
                    njDecodeBlock(c, &c->pixels[((mby * c->ssy + sby) * c->stride + mbx * c->ssx + sbx) << 3]); // read the coded data
                                                                                                                // into block, then dequantize
                                                                                                                // and apply the inverse DCT
                    njCheckError();
                }
        if (++mbx >= nj.mbwidth) { // past the rightmost MCU: wrap to the start of the next row
            mbx = 0;
            if (++mby >= nj.mbheight) break; // past the bottom row: decoding is finished
        }
        if (nj.rstinterval && !(--rstcount)) { // restart marker
            njByteAlign();
            i = njGetBits(16);
            if (((i & 0xFFF8) != 0xFFD0) || ((i & 7) != nextrst)) njThrow(NJ_SYNTAX_ERROR);
            nextrst = (nextrst + 1) & 7;
            rstcount = nj.rstinterval;
            for (i = 0;  i < 3;  ++i)
                nj.comp[i].dcpred = 0;
        }
    }
    nj.error = __NJ_FINISHED;
}

#if NJ_CHROMA_FILTER

#define CF4A (-9)
#define CF4B (111)
#define CF4C (29)
#define CF4D (-3)
#define CF3A (28)
#define CF3B (109)
#define CF3C (-9)
#define CF3X (104)
#define CF3Y (27)
#define CF3Z (-3)
#define CF2A (139)
#define CF2B (-11)
#define CF(x) njClip(((x) + 64) >> 7)

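// Enlarging an image requires upsampling and shrinking it requires downsampling; both are forms of resampling.
// The Cb/Cr components here may have fewer samples than Y, so they need upsampling.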

NJ_INLINE void njUpsampleH(nj_component_t* c) {
	printf("njUpsampleH %d\n", c->cid);
    const int xmax = c->width - 3;
    unsigned char *out, *lin, *lout;
    int x, y;
    out = njAllocMem((c->width * c->height) << 1);
    if (!out) njThrow(NJ_OUT_OF_MEM);
    lin = c->pixels;
    lout = out;
    for (y = c->height;  y;  --y) {
        lout[0] = CF(CF2A * lin[0] + CF2B * lin[1]);
        lout[1] = CF(CF3X * lin[0] + CF3Y * lin[1] + CF3Z * lin[2]);
        lout[2] = CF(CF3A * lin[0] + CF3B * lin[1] + CF3C * lin[2]);
        for (x = 0;  x < xmax;  ++x) {
            lout[(x << 1) + 3] = CF(CF4A * lin[x] + CF4B * lin[x + 1] + CF4C * lin[x + 2] + CF4D * lin[x + 3]);
            lout[(x << 1) + 4] = CF(CF4D * lin[x] + CF4C * lin[x + 1] + CF4B * lin[x + 2] + CF4A * lin[x + 3]);
        }
        lin += c->stride;
        lout += c->width << 1;
        lout[-3] = CF(CF3A * lin[-1] + CF3B * lin[-2] + CF3C * lin[-3]);
        lout[-2] = CF(CF3X * lin[-1] + CF3Y * lin[-2] + CF3Z * lin[-3]);
        lout[-1] = CF(CF2A * lin[-1] + CF2B * lin[-2]);
    }
    c->width <<= 1;
    c->stride = c->width;
    njFreeMem(c->pixels);
    c->pixels = out;
}

NJ_INLINE void njUpsampleV(nj_component_t* c) {
	printf("njUpsampleV %d\n", c->cid);
    const int w = c->width, s1 = c->stride, s2 = s1 + s1;
    unsigned char *out, *cin, *cout;
    int x, y;
    out = njAllocMem((c->width * c->height) << 1);
    if (!out) njThrow(NJ_OUT_OF_MEM);
    for (x = 0;  x < w;  ++x) {
        cin = &c->pixels[x];
        cout = &out[x];
        *cout = CF(CF2A * cin[0] + CF2B * cin[s1]);  cout += w;
        *cout = CF(CF3X * cin[0] + CF3Y * cin[s1] + CF3Z * cin[s2]);  cout += w;
        *cout = CF(CF3A * cin[0] + CF3B * cin[s1] + CF3C * cin[s2]);  cout += w;
        cin += s1;
        for (y = c->height - 3;  y;  --y) {
            *cout = CF(CF4A * cin[-s1] + CF4B * cin[0] + CF4C * cin[s1] + CF4D * cin[s2]);  cout += w;
            *cout = CF(CF4D * cin[-s1] + CF4C * cin[0] + CF4B * cin[s1] + CF4A * cin[s2]);  cout += w;
            cin += s1;
        }
        cin += s1;
        *cout = CF(CF3A * cin[0] + CF3B * cin[-s1] + CF3C * cin[-s2]);  cout += w;
        *cout = CF(CF3X * cin[0] + CF3Y * cin[-s1] + CF3Z * cin[-s2]);  cout += w;
        *cout = CF(CF2A * cin[0] + CF2B * cin[-s1]);
    }
    c->height <<= 1;
    c->stride = c->width;
    njFreeMem(c->pixels);
    c->pixels = out;
}

#else

NJ_INLINE void njUpsample(nj_component_t* c) {
	printf("njUpsample %d\n", c->cid);
    int x, y, xshift = 0, yshift = 0;
    unsigned char *out, *lin, *lout;
    while (c->width < nj.width) { c->width <<= 1; ++xshift; }
    while (c->height < nj.height) { c->height <<= 1; ++yshift; }
    out = njAllocMem(c->width * c->height); // size after enlargement
    if (!out) njThrow(NJ_OUT_OF_MEM);
    lin = c->pixels;
    lout = out;
    for (y = 0;  y < c->height;  ++y) {
        lin = &c->pixels[(y >> yshift) * c->stride];
        for (x = 0;  x < c->width;  ++x)
            lout[x] = lin[x >> xshift];
        lout += c->width;
    }
    c->stride = c->width;
    njFreeMem(c->pixels);
    c->pixels = out;
}

#endif

NJ_INLINE void njConvert() {
    int i;
    nj_component_t* c;
    for (i = 0, c = nj.comp;  i < nj.ncomp;  ++i, ++c) { // upsample where needed
        #if NJ_CHROMA_FILTER
            while ((c->width < nj.width) || (c->height < nj.height)) {
                if (c->width < nj.width) njUpsampleH(c);
                njCheckError();
                if (c->height < nj.height) njUpsampleV(c);
                njCheckError();
            }
        #else
            if ((c->width < nj.width) || (c->height < nj.height))
                njUpsample(c);
        #endif
        if ((c->width < nj.width) || (c->height < nj.height)) njThrow(NJ_INTERNAL_ERR);
    }
    if (nj.ncomp == 3) { // SEE njGetImage()
        // convert to RGB
        int x, yy;
        unsigned char *prgb = nj.rgb;
        const unsigned char *py  = nj.comp[0].pixels;
        const unsigned char *pcb = nj.comp[1].pixels;
        const unsigned char *pcr = nj.comp[2].pixels;
        // Padding added for alignment during encoding/decoding is dropped here: only width x height pixels are read.
        for (yy = nj.height;  yy;  --yy) { // for each row
            for (x = 0;  x < nj.width;  ++x) { // for each pixel in the row
                register int y = py[x] << 8; // scale by 256 for the fixed-point conversion below
                register int cb = pcb[x] - 128; // Cb and Cr are normally signed, but JPEG stores them unsigned (offset by 128)
                register int cr = pcr[x] - 128;
                *prgb++ = njClip((y            + 359 * cr + 128) >> 8); // color space conversion, YCbCr to RGB
                *prgb++ = njClip((y -  88 * cb - 183 * cr + 128) >> 8);
                *prgb++ = njClip((y + 454 * cb            + 128) >> 8);
            }
            py += nj.comp[0].stride; // advance by the stride: once the needed pixels of a row are read, the rest is skipped
            pcb += nj.comp[1].stride;
            pcr += nj.comp[2].stride;
        }
    } else if (nj.comp[0].width != nj.comp[0].stride) { // nothing to do when width equals stride
        // grayscale -> only remove stride
        unsigned char *pin = &nj.comp[0].pixels[nj.comp[0].stride];
        unsigned char *pout = &nj.comp[0].pixels[nj.comp[0].width];
        int y;
        for (y = nj.comp[0].height - 1;  y;  --y) {
            njCopyMem(pout, pin, nj.comp[0].width);
            pin += nj.comp[0].stride;
            pout += nj.comp[0].width;
        }
        nj.comp[0].stride = nj.comp[0].width;
    }
}

void njInit(void) {
    njFillMem(&nj, 0, sizeof(nj_context_t)); // zero-initialize nj_context_t
}

void njDone(void) {
    int i;
    for (i = 0;  i < 3;  ++i)
        if (nj.comp[i].pixels) njFreeMem((void*) nj.comp[i].pixels);
    if (nj.rgb) njFreeMem((void*) nj.rgb);
    njInit();
}

nj_result_t njDecode(const void* jpeg, const int size) {
    njDone();
    nj.pos = (const unsigned char*) jpeg;
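    nj.size = size & 0x7FFFFFFF; // clear the sign bit so that size is never negative
    if (nj.size < 2) return NJ_NO_JPEG;
    if ((nj.pos[0] ^ 0xFF) | (nj.pos[1] ^ 0xD8)) return NJ_NO_JPEG; // must start with 0xFFD8 (SOI); the XOR/OR result is non-zero if either byte differs
    njSkip(2);
    while (!nj.error) { // loop until an "error" is set (__NJ_FINISHED also ends it)
        if ((nj.size < 2) || (nj.pos[0] != 0xFF)) return NJ_SYNTAX_ERROR; // too short, or not starting with 0xFF
        njSkip(2); // move past the marker, to just before the length field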
        switch (nj.pos[-1]) {
            case 0xC0: njDecodeSOF();  break;
            case 0xC4: njDecodeDHT();  break;
            case 0xDB: njDecodeDQT();  break;
            case 0xDD: njDecodeDRI();  break;
            case 0xDA: njDecodeScan(); break;
            case 0xFE: njSkipMarker(); break;
            default:
                if ((nj.pos[-1] & 0xF0) == 0xE0) // APP0..APP15 segments, ignored for now
                    njSkipMarker();
                else
                    return NJ_UNSUPPORTED;
        }
    }
    if (nj.error != __NJ_FINISHED) return nj.error;
    nj.error = NJ_OK;
    njConvert();
    return nj.error;
}

int njGetWidth(void)            { return nj.width; }
int njGetHeight(void)           { return nj.height; }
int njIsColor(void)             { return (nj.ncomp != 1); }
unsigned char* njGetImage(void) { return (nj.ncomp == 1) ? nj.comp[0].pixels : nj.rgb; } // one component (grayscale) / three components (RGB)
int njGetImageSize(void)        { return nj.width * nj.height * nj.ncomp; }

#endif // _NJ_INCLUDE_HEADER_ONLY
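
The njZZ table in the listing above gives, for each position in zigzag scan order, the raster index of that coefficient inside the 8 x 8 block (the 0..63 grid in the comment that follows it). As a rough sketch, the same table can be generated by walking the 15 anti-diagonals of the block and alternating direction. build_zigzag is a hypothetical helper name used only for illustration; it is not part of NanoJPEG.

```c
/* Fill zz[k] with the raster index (row * 8 + col) of the k-th coefficient
   in zigzag scan order. Hypothetical helper, not part of NanoJPEG. */
static void build_zigzag(unsigned char zz[64]) {
    int k = 0;
    for (int s = 0; s < 15; ++s) {        /* s = row + col: one anti-diagonal */
        int lo = (s < 8) ? 0 : s - 7;     /* smallest valid row on this diagonal */
        int hi = (s < 8) ? s : 7;         /* largest valid row on this diagonal */
        if (s & 1) {
            /* odd diagonals run from top-right to bottom-left */
            for (int row = lo; row <= hi; ++row)
                zz[k++] = (unsigned char)(row * 8 + (s - row));
        } else {
            /* even diagonals run from bottom-left to top-right */
            for (int row = hi; row >= lo; --row)
                zz[k++] = (unsigned char)(row * 8 + (s - row));
        }
    }
}
```

Calling build_zigzag on a 64-byte array should reproduce the sequence 0, 1, 8, 16, 9, 2, ... that njZZ starts with.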

JPEG Study Notes

Reading material:
JPEG File Interchange Format Version 1.02
JEITA CP-3451 Exif Version 2.2
DIGITAL COMPRESSION AND CODING OF CONTINUOUS-TONE STILL IMAGES – REQUIREMENTS AND GUIDELINES
The JPEG Still Picture Compression Standard
JPEG 简易文档 V2.15 云风
libjpeg 6b
JPEG 原理详细实例分析及其在嵌入式 Linux 中的应用 http://www.ibm.com/developerworks/cn/linux/l-cn-jpeg/
JPEG文件编/解码详解 http://blog.csdn.net/lpt19832003/article/details/1713718
Huffman 编码压缩算法 http://coolshell.cn/articles/7459.html
A SIMPLE EXAMPLE OF HUFFMAN CODING ON A STRING http://nerdaholyc.com/a-simple-example-of-huffman-coding-on-a-string/

这是一篇自己学习JPEG的笔记，会不断完善，多数是别人的啦，少许自己理解，如有理解错误还请指出！

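1. A JPEG file consists of two parts: markers and compressed data
A marker is 2 bytes long. It starts with 0xFF, and the second byte identifies the marker. Any number of extra 0xFF fill bytes may precede a marker; they carry no meaning, so a run of consecutive 0xFF bytes is read as a single 0xFF.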
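
The fill-byte rule can be sketched as a small scanner. This is an illustration only: `next_marker` is a hypothetical helper, and it is meant for the marker segments outside the entropy-coded data (inside scan data, 0xFF 0x00 is a stuffed byte, which this sketch does not handle).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the marker code (the byte after the 0xFF
 * prefix) found at *pos, skipping any extra 0xFF fill bytes. On success
 * *pos is advanced past the marker. Returns -1 if the data at *pos does
 * not start with 0xFF or the buffer ends before a marker code is seen. */
int next_marker(const unsigned char *buf, size_t size, size_t *pos) {
    assert(buf != NULL && pos != NULL);
    size_t i = *pos;
    if (i >= size || buf[i] != 0xFF) return -1;  /* must start with 0xFF   */
    while (i < size && buf[i] == 0xFF) ++i;      /* collapse the 0xFF run  */
    if (i >= size) return -1;                    /* ran off the end        */
    *pos = i + 1;
    return buf[i];
}
```

For the bytes FF FF FF DB this returns 0xDB, the same result as for a bare FF DB.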

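2. Basic conventions (JFIF)
The compression algorithm is JPEG
The color space is YCbCr
The APP0 marker is mandatory

All multi-byte values are big-endian

Cb and Cr are normally signed quantities, but JPEG stores them as unsigned values. So when converting RGB to YCbCr, 128 is added to Cb and Cr once they have been computed, and the later steps work on the offset values.
When converting YCbCr back to RGB, 128 is first subtracted from Cb and Cr.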
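
A minimal sketch of the two conversions with the 128 offset. The coefficients are the ones given in the JFIF 1.02 specification listed in the reading material; the function names and the rounding helper are my own choices.

```c
#include <assert.h>

/* Round to nearest and clamp to the 0..255 range of an 8-bit sample. */
unsigned char clamp_u8(double v) {
    int r = (int)(v < 0 ? v - 0.5 : v + 0.5);
    if (r < 0) return 0;
    if (r > 255) return 255;
    return (unsigned char)r;
}

/* RGB -> YCbCr: Cb and Cr are shifted by +128 so they fit in 0..255. */
void rgb_to_ycbcr(int r, int g, int b, unsigned char out[3]) {
    assert(out != 0);
    out[0] = clamp_u8( 0.299  * r + 0.587  * g + 0.114  * b);
    out[1] = clamp_u8(-0.1687 * r - 0.3313 * g + 0.5    * b + 128.0);
    out[2] = clamp_u8( 0.5    * r - 0.4187 * g - 0.0813 * b + 128.0);
}

/* YCbCr -> RGB: the 128 offset is removed from Cb and Cr first. */
void ycbcr_to_rgb(int y, int cb, int cr, unsigned char out[3]) {
    assert(out != 0);
    double cbs = cb - 128.0, crs = cr - 128.0;
    out[0] = clamp_u8(y + 1.402 * crs);
    out[1] = clamp_u8(y - 0.34414 * cbs - 0.71414 * crs);
    out[2] = clamp_u8(y + 1.772 * cbs);
}
```

Any gray pixel (R = G = B) maps to Cb = Cr = 128, which is the "zero" of the offset representation.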

Take one data unit, an 8 x 8 block of the original image (the values after conversion to YCbCr):

52   55   61   66   70   61   64   73

63   59   55   90   109  85   69   72

62   59   68   113  144  104  66   73

63   58   71   122  154  106  70   69

67   61   68   104  126  88   68   70

79   65   60   70   77   68   58   75

85   71   64   59   55   61   65   83

87   79   69   68   65   76   78   94

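3. DCT (Discrete Cosine Transform)
The FDCT (Forward DCT) expects input samples in the range -128 ~ 127, so 128 is subtracted from every Y, Cb and Cr sample first.
After this level shift (each sample minus 128):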

-76  -73  -67  -62  -58  -67  -64  -55

-65  -69  -73  -38  -19  -43  -59  -56

-66  -69  -60  -15  16   -24  -62  -55

-65  -70  -57  -6   26   -22  -58  -59

-61  -67  -60  -24  -2   -40  -60  -58

-49  -63  -68  -58  -51  -60  -70  -53

-43  -57  -64  -69  -73  -67  -63  -45

-41  -49  -59  -60  -63  -52  -50  -34

Applying the FDCT and rounding each result to the nearest integer gives:

-415 -30  -61  27   56   -20  -2   0

4    -22  -61  10   13   -7   -9   5

-47  7    77   -25  -29  10   5    -6

-49  12   34   -15  -10  6    2    2

12   -7   -13  -4   -2   2    -3   3

-8   3    2    -6   -2   1    4    2

-1   0    0    -2   -1   -3   4    -1

0    0    -1   -4   -1   0    1    2

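DC (Direct Current) / AC (Alternating Current)
What are DC and AC? The top-left coefficient (-415 here) is the DC coefficient; it is proportional to the average of the whole block. The other 63 coefficients are the AC coefficients, which describe how the block varies at increasing horizontal and vertical frequencies.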
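
As a small check of what the DC term is: with the usual JPEG normalisation of the 8 x 8 FDCT, the (0,0) coefficient equals one eighth of the sum of the 64 level-shifted samples, so it can be computed without any cosine. The helper name `fdct_dc` is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: DC term of the 8x8 FDCT, rounded to nearest.
 * F(0,0) = (1/8) * sum of the 64 level-shifted samples. */
int fdct_dc(const unsigned char block[64]) {
    assert(block != 0);
    int sum = 0;
    for (int i = 0; i < 64; ++i)
        sum += (int)block[i] - 128;          /* level shift to -128..127 */
    /* divide by 8, rounding to the nearest integer */
    return sum >= 0 ? (sum + 4) / 8 : -((-sum + 4) / 8);
}
```

For the sample block above, the level-shifted samples sum to -3323, and -3323 / 8 = -415.375, which rounds to the -415 shown in the FDCT result.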

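4. Quantization: each coefficient is divided by the corresponding entry of the quantization table and rounded to the nearest integer: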

-26  -3   -6   2    2    -1   0    0

0    -2   -4   1    1    0    0    0

-3   1    5    -1   -1   0    0    0

-4   1    2    -1   0    0    0    0

1    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0
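
The notes do not say which quantization table produced these numbers. The sketch below assumes the example luminance table from Annex K of the JPEG standard and rounds ties away from zero; under those two assumptions it reproduces the quantized values shown above. The names `kLumaQ`, `div_round` and `quantize` are mine.

```c
#include <assert.h>

/* Example luminance quantization table (JPEG standard, Annex K), in
 * row-major order. Assumed here; the text above does not name its table. */
const int kLumaQ[64] = {
    16, 11, 10, 16,  24,  40,  51,  61,
    12, 12, 14, 19,  26,  58,  60,  55,
    14, 13, 16, 24,  40,  57,  69,  56,
    14, 17, 22, 29,  51,  87,  80,  62,
    18, 22, 37, 56,  68, 109, 103,  77,
    24, 35, 55, 64,  81, 104, 113,  92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103,  99
};

/* Integer division rounded to nearest, ties away from zero. */
int div_round(int num, int den) {
    assert(den > 0);
    return num >= 0 ? (num + den / 2) / den : -((-num + den / 2) / den);
}

/* Quantize 64 DCT coefficients with table q (both in row-major order). */
void quantize(const int coef[64], const int q[64], int out[64]) {
    for (int i = 0; i < 64; ++i)
        out[i] = div_round(coef[i], q[i]);
}
```

For example -415 / 16 = -25.9 becomes -26, and -49 / 14 = -3.5 becomes -4.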

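5. Zigzag
The zigzag scan keeps coefficients that are neighbours in the block close together in memory, and it pushes the zeros to the end as one long run that RLE can then compress.
The block becomes: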

−26,−3,0,−3,−2,−6,2,−4,1,−4,1,1,5,1,2,−1,1,−1,2,0,0,0,0,0,−1,−1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

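In practice the DC coefficient is not part of the zigzag sequence when encoding. DC values are relatively large, but the DC values of adjacent data units differ only slightly, so JPEG codes the difference between consecutive DC coefficients with differential pulse-code modulation (DPCM), exploiting the correlation between neighbouring data units.
The remaining 63 AC coefficients are coded in zigzag order.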
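
A minimal sketch of the scan, using the standard JPEG zigzag index table. The names `kZigzag` and `zigzag_scan` are hypothetical; the sketch reorders all 64 values and leaves the separate handling of the DC term to the caller.

```c
#include <assert.h>

/* kZigzag[k] is the row-major index of the k-th coefficient in zigzag order. */
const unsigned char kZigzag[64] = {
     0,  1,  8, 16,  9,  2,  3, 10,
    17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34,
    27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36,
    29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46,
    53, 60, 61, 54, 47, 55, 62, 63
};

/* Reorder a row-major 8x8 block into zigzag order. */
void zigzag_scan(const int block[64], int out[64]) {
    assert(block != 0 && out != 0);
    for (int k = 0; k < 64; ++k)
        out[k] = block[kZigzag[k]];
}
```

Applied to the quantized block above, the first entries come out as -26, -3, 0, -3, -2, -6, and everything after the 26th entry is zero.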

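6. RLE (Run-Length Encoding)
A simple example: suppose these are the 63 AC coefficients in zigzag order:
57,45,0,0,0,0,23,0,-30,-8,0,0,1,0,0,0,0,0,0,0,..,0
They can be written as
(0,57) ; (0,45) ; (4,23) ; (1,-30) ; (0,-8) ; (2,1) ; EOB
EOB marks the end of the block and can also be written as (0,0); it is omitted when the sequence does not end in zeros.
Each pair means "this many zeros, then this value": 0 zeros before 57, 0 zeros before 45, 4 zeros before 23, 1 zero before -30, and so on.
Note that this data is Huffman-coded later, and that step stores the zero count in 4 bits, so the count can only range over 0 ~ 15.
A run of more than 15 zeros is therefore split, with (15,0) standing for 16 consecutive zeros. For example (15,0) ; (3,2) ; … ; (15,0) ; (15,0) ; (1,4) means:
19 zeros, 2, … , 33 zeros, 4.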
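
The pairing rule, including the (15,0) split and the trailing EOB, can be sketched as follows. The type `rle_pair` and the function `rle_encode_ac` are hypothetical names for illustration.

```c
#include <assert.h>

typedef struct { int run; int value; } rle_pair;

/* Encode 63 zigzag-ordered AC coefficients as (run, value) pairs.
 * (15,0) stands for 16 zeros; (0,0) is EOB and is emitted only when the
 * block ends in zeros. Returns the number of pairs written; out needs
 * room for 64 pairs. */
int rle_encode_ac(const int ac[63], rle_pair out[64]) {
    assert(ac != 0 && out != 0);
    int last = 62;
    while (last >= 0 && ac[last] == 0) --last;   /* last non-zero index */
    int n = 0, run = 0;
    for (int i = 0; i <= last; ++i) {
        if (ac[i] == 0) { ++run; continue; }
        while (run > 15) {                        /* split long zero runs */
            out[n].run = 15; out[n].value = 0; ++n;
            run -= 16;
        }
        out[n].run = run; out[n].value = ac[i]; ++n;
        run = 0;
    }
    if (last < 62) { out[n].run = 0; out[n].value = 0; ++n; }  /* EOB */
    return n;
}
```

A block that holds 19 zeros, then a 2, then only zeros encodes as (15,0) ; (3,2) ; (0,0).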

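7. Canonical Huffman Code
After these steps JPEG compresses the data further with canonical Huffman codes, giving shorter codewords to values that occur more often.
Values are first divided into 16 groups by the number of bits they need.

               Value               Group            Stored bits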
                0                   0                   -
              -1,1                  1                  0,1
           -3,-2,2,3                2              00,01,10,11
     -7,-6,-5,-4,4,5,6,7            3    000,001,010,011,100,101,110,111
       -15,..,-8,8,..,15            4       0000,..,0111,1000,..,1111
      -31,..,-16,16,..,31           5     00000,..,01111,10000,..,11111
      -63,..,-32,32,..,63           6                   .
     -127,..,-64,64,..,127          7                   .
    -255,..,-128,128,..,255         8                   .
    -511,..,-256,256,..,511         9                   .
   -1023,..,-512,512,..,1023       10                   .
  -2047,..,-1024,1024,..,2047      11                   .
  -4095,..,-2048,2048,..,4095      12                   .
  -8191,..,-4096,4096,..,8191      13                   .
 -16383,..,-8192,8192,..,16383     14                   .
-32767,..,-16384,16384,..,32767    15                   .

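The RLE result from before:
(0,57) ; (0,45) ; (4,23) ; (1,-30) ; (0,-8) ; (2,1) ; EOB
Only the second number of each pair is transformed; the zero count in front is left alone.

    57 is in group 6 and its stored bits are 111001, so it is coded as (6,111001)
    45, in the same way, is coded as (6,101101)
    23  ->  (5,10111)
   -30  ->  (5,00001)
    -8  ->  (4,0111)
     1  ->  (1,1)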

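The sequence above then becomes:
(0,6), 111001 ; (0,6), 101101 ; (4,5), 10111; (1,5), 00001; (0,4) , 0111 ; (2,1), 1 ; (0,0)

The two numbers in each pair of parentheses pack into exactly one byte; the coded value that follows can represent -32767 ~ 32767.
In that byte the high 4 bits are the number of preceding zeros and the low 4 bits are the bit length of the value that follows.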
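
The grouping table and the packed byte can be written as three small functions. The names are hypothetical; the rule for negative values (store v + 2^size - 1) is just a compact way of stating the table above.

```c
#include <assert.h>

/* Number of bits needed for |v|: the "group" in the table; 0 for v == 0. */
int bit_size(int v) {
    int a = v < 0 ? -v : v, n = 0;
    while (a) { ++n; a >>= 1; }
    return n;
}

/* Bits actually stored for v: v itself when positive, otherwise
 * v + 2^size - 1, i.e. the bitwise complement of |v| within size bits. */
unsigned stored_bits(int v) {
    int size = bit_size(v);
    return (unsigned)(v >= 0 ? v : v + (1 << size) - 1);
}

/* Pack the zero-run length (high nibble) and the bit size (low nibble). */
unsigned char run_size_byte(int run, int v) {
    assert(run >= 0 && run <= 15);
    assert(bit_size(v) <= 15);
    return (unsigned char)((run << 4) | bit_size(v));
}
```

For instance -30 needs 5 bits and is stored as 00001, and the pair (4,23) packs into the byte 0x45.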

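The next step needs a Huffman table, which must be built before compression. It maps the 8-bit fixed-length values 0 ~ 255 to variable-length codes of 1 ~ 16 bits according to how often each value occurs: frequent values get codes shorter than 8 bits, rare ones get longer codes. This matters, and so does how the table is built.
Suppose the table lookup gives: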

 6 = (0,6)    ---  111000    (note: 6 = 0 * 16 + 6 = 0x06)
69 = (4,5)    ---  1111111110011001    (note: 69 = 4 * 16 + 5 = 0x45)
21 = (1,5)    ---  11111110110
4  = (0,4)    ---  1011
33 = (2,1)    ---  11011
 0 = EOB = (0,0) ---  1010

The final bit stream for the AC coefficients is then:
111000 111001 111000 101101 1111111110011001 10111 11111110110 00001 1011 0111 11011 1 1010

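Question: the byte in parentheses (range 0 ~ 255) and the value bits that follow it (range -32767 ~ 32767) use different codings. Can they be confused?
No. The byte (high 4 bits: zero count, low 4 bits: bit length of the non-zero value) is recovered through the Huffman table, and it tells the decoder exactly how many raw bits to read next.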
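
On the decoding side, once the Huffman symbol has given the bit length, the raw bits are turned back into a signed value. A minimal sketch, with the hypothetical name `extend_value`: a leading 1 bit means the value is positive, a leading 0 bit means it is negative.

```c
#include <assert.h>

/* Recover the signed coefficient from `size` raw bits read from the
 * stream. `bits` holds them right-aligned. */
int extend_value(unsigned bits, int size) {
    assert(size >= 0 && size <= 15);
    if (size == 0) return 0;
    assert(bits < (1u << size));
    if (bits & (1u << (size - 1)))
        return (int)bits;                    /* leading 1: positive value */
    return (int)bits - (1 << size) + 1;      /* leading 0: negative value */
}
```

So the 5 bits 00001 decode to -30 and the 6 bits 111001 decode to 57.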

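8. DC coding
Each data unit has exactly one DC coefficient, and the DC values of consecutive data units are closely related, so JPEG codes them with differential pulse-code modulation (DPCM).
For adjacent units
Diff = DC(i) - DC(i-1)
so
DC(i) = DC(i-1) + Diff

Every stored DC value is a difference from the previous one, so what is the predictor for DC(0)? 云风 says it is 0. I doubted that at first, but it matches the JPEG standard: the prediction is reset to 0 at the start of each scan and at each restart interval.
Suppose Diff is -511.
It is coded as (9, 000000000).
Suppose the table lookup (a JPEG file normally has separate tables for Y and C, and for DC and AC, so usually 4 Huffman tables in total) gives 1111110 as the Huffman code for 9. The whole DC value is then 1111110 000000000 in binary.

Placing the DC bits in front of the AC bits gives the complete coded stream: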

1111110 000000000  111000 111001  111000 101101  1111111110011001 10111  11111110110 00001  1011 0111  11011 1  1010
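
The DPCM step for the DC values of one component can be sketched like this. The function names are hypothetical, and the predictor starts at 0, as the JPEG standard specifies for the start of a scan.

```c
#include <assert.h>
#include <stddef.h>

/* Turn a sequence of DC coefficients into differences. */
void dc_diff_encode(const int *dc, int *diff, size_t n) {
    assert(dc != NULL && diff != NULL);
    int pred = 0;                        /* predictor starts at 0 */
    for (size_t i = 0; i < n; ++i) {
        diff[i] = dc[i] - pred;          /* Diff = DC(i) - DC(i-1) */
        pred = dc[i];
    }
}

/* Rebuild the DC coefficients from the differences. */
void dc_diff_decode(const int *diff, int *dc, size_t n) {
    assert(diff != NULL && dc != NULL);
    int pred = 0;
    for (size_t i = 0; i < n; ++i) {
        pred += diff[i];                 /* DC(i) = DC(i-1) + Diff */
        dc[i] = pred;
    }
}
```

Each component keeps its own predictor, so in a real codec one such running value exists for each of Y, Cb and Cr.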

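9. Questions:
How are MCUs and DUs (data units) related? All of the compression above seems to happen per DU.
Is each of Y, Cb and Cr compressed separately, or are all three compressed together? Or can some components be grouped depending on the sampling, for example Cb with Cr and Y on its own?
It looks as if Cb could be compressed together with Cr.
UPDATE:
An MCU consists of one or more DUs. Compression works per DU and per component: the Y, Cb and Cr DUs of an MCU are each compressed separately and then assembled into the byte stream MCU by MCU.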

10. A worked example of the JFIF file format:

SOI(Start of Image)

APP0(JFIF application segment)

APP1(application reserved)
…
APP15

DQT(Define Quantization Table)

DHT(Define Huffman Table)

SOF(Start of Frame)

SOS(Start of Scan)

EOI(End of Image)

IFD(Image File Directories)

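Each segment starts with the 2-byte marker described above. The next 2 bytes give the segment length, which does not count the 2 marker bytes but does count the length field itself.
Take this data:
FF E0 00 10 4A 46 49 46 00 01 01 00 00 01 00 01 00 00
Here
0xFFE0 is the marker, APP0
0x0010 is the segment length, 16
The 14 bytes that follow ("4A 46 49 46 00 01 01 00 00 01 00 01 00 00") are the segment content
According to the spec:
5 bytes: identifier, here "JFIF" followed by a zero byte
2 bytes: version (1 byte major, 1 byte minor), here 1.01
1 byte: density unit, 0 = no unit, 1 = dots per inch, 2 = dots per centimetre; here 0
2 bytes: horizontal pixel density, here 1
2 bytes: vertical pixel density, here 1
1 byte: thumbnail width in pixels, here 0
1 byte: thumbnail height in pixels, here 0
Without a thumbnail both thumbnail fields must be 0. If they are non-zero, the thumbnail's RGB data follows, and its size is a multiple of 3 bytes.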
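
The field layout above maps directly onto a small parser. This is a sketch under the layout just described; the struct and function names (`jfif_app0`, `parse_jfif_app0`, `be16`) are hypothetical, and thumbnail data is not read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    int version_major, version_minor;
    int units;                  /* 0: none, 1: dots per inch, 2: dots per cm */
    int x_density, y_density;
    int thumb_w, thumb_h;
} jfif_app0;

/* Read a big-endian 16-bit value. */
int be16(const unsigned char *p) { return (p[0] << 8) | p[1]; }

/* Parse an APP0 segment that starts at its FF E0 marker.
 * Returns 0 on success, -1 if the data is not a JFIF APP0 segment. */
int parse_jfif_app0(const unsigned char *seg, size_t size, jfif_app0 *out) {
    assert(seg != NULL && out != NULL);
    if (size < 18 || seg[0] != 0xFF || seg[1] != 0xE0) return -1;
    int len = be16(seg + 2);              /* counts itself, not the marker */
    if (len < 16 || (size_t)len + 2 > size) return -1;
    if (memcmp(seg + 4, "JFIF\0", 5) != 0) return -1;
    out->version_major = seg[9];
    out->version_minor = seg[10];
    out->units     = seg[11];
    out->x_density = be16(seg + 12);
    out->y_density = be16(seg + 14);
    out->thumb_w   = seg[16];
    out->thumb_h   = seg[17];
    return 0;
}
```

Fed the 18 example bytes, it reports version 1.01, no density unit, densities 1 and 1, and no thumbnail.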

For DQT:

FF DB 00 43 00 08 06 06 07 06 05 08 07 07 07 09 09 08 0A 0C 14 0D 0C 0B 0B 0C 19 12 13 0F 14 1D 1A 1F 1E 1D 1A 1C 1C 20 24 2E 27 20 22 2C 23 1C 1C 28 37 29 2C 30 31 34 34 34 1F 27 39 3D 38 32 3C 2E 33 34 32
FF DB 00 43 01 09 09 09 0C 0B 0C 18 0D 0D 18 32 21 1C 21 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32

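0xFFDB is the marker
0x0043 is the segment length, 67: 2 bytes for the length field, 1 byte of QT information and 64 table entries (the entry size depends on the precision)
1 byte of QT information: the high 4 bits are the QT precision (0 = 8-bit, 1 = 16-bit), the low 4 bits are the QT number
Undoing the zigzag order gives the table in matrix form: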

8   6   5   8   12  20  26  31

6   6   7   10  13  29  30  28

7   7   8   12  20  29  35  28

7   9   11  15  26  44  40  31

9   11  19  28  34  55  52  39

12  18  28  32  41  52  57  46 

25  32  39  44  52  61  60  51

36  46  48  49  56  50  52  50

That is the result (I believe the figure in the article "JPEG 原理详细实例分析及其在嵌入式 Linux 中的应用" is drawn incorrectly).
The other QT can be restored the same way.
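
Restoring the matrix is the zigzag scan run backwards: the k-th byte of the segment goes to position kZigzag[k] of the row-major table. The sketch below handles only a segment holding a single 8-bit table, like the two shown here; `parse_dqt8` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* kZigzag[k] is the row-major index of the k-th value in zigzag order. */
const unsigned char kZigzag[64] = {
     0,  1,  8, 16,  9,  2,  3, 10,
    17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34,
    27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36,
    29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46,
    53, 60, 61, 54, 47, 55, 62, 63
};

/* Parse one DQT segment (starting at FF DB) that holds a single 8-bit
 * table. Writes the table in row-major order and returns the table
 * number (0..3), or -1 if the segment has a different shape. */
int parse_dqt8(const unsigned char *seg, size_t size, unsigned char table[64]) {
    assert(seg != NULL && table != NULL);
    if (size < 69 || seg[0] != 0xFF || seg[1] != 0xDB) return -1;
    int len = (seg[2] << 8) | seg[3];
    if (len != 67) return -1;             /* 2 + 1 + 64 */
    int precision = seg[4] >> 4, id = seg[4] & 0x0F;
    if (precision != 0 || id > 3) return -1;
    for (int k = 0; k < 64; ++k)
        table[kZigzag[k]] = seg[5 + k];   /* undo the zigzag order */
    return id;
}
```

Run on the first DQT segment, it yields table number 0 with the first row 8 6 5 8 12 20 26 31, matching the matrix above.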

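For this SOF0 segment
FF C0 00 11 08 00 95 00 E3 03 01 22 00 02 11 01 03 11 01
we have
0xFFC0: the SOF0 marker
2 bytes: length, 0x0011 = 17
1 byte: sample precision, here 8 bits
2 bytes: image height, here 149
2 bytes: image width, here 227
1 byte: number of color components, here 3 (a JFIF image is either YCbCr with 3 components or grayscale with 1)
9 bytes: component information, 3 bytes per component
Each component entry has 1 byte for the component ID, 1 byte for the sampling factors (high 4 bits horizontal, low 4 bits vertical) and 1 byte for the quantization table number
H 2:1:1
V 2:1:1
so the overall sampling ratio is (2 * 2):(1 * 1):(1 * 1), that is 4:1:1 (this layout is more commonly called 4:2:0 chroma subsampling)

MCU width = the largest horizontal sampling factor (Hmax) * 8
MCU height = the largest vertical sampling factor (Vmax) * 8
so here it is (Hmax * 8):(Vmax * 8) = 16:16

If the image width or height is not a multiple of the MCU size, the image is padded; after decoding, the data beyond the real width or height is discarded

In the data stream, MCUs are ordered left to right, top to bottom.

An MCU is made of data units, and a data unit is always 8:8, so the MCU here spans Hmax * Vmax = 4 data units. Y contributes 4 data units per MCU, while Cb and Cr, being subsampled, contribute 1 each.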
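
The SOF0 fields and the MCU arithmetic can be sketched together. The struct and function names (`sof0_info`, `parse_sof0`) are hypothetical, and the sketch accepts at most 3 components.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int precision, height, width, ncomp;
    struct { int id, h, v, tq; } comp[3];
    int mcu_w, mcu_h;       /* MCU size in pixels                   */
    int mcus_x, mcus_y;     /* MCUs per row and per column (padded) */
} sof0_info;

/* Parse a SOF0 segment that starts at its FF C0 marker.
 * Returns 0 on success, -1 on malformed input. */
int parse_sof0(const unsigned char *seg, size_t size, sof0_info *out) {
    assert(seg != NULL && out != NULL);
    if (size < 10 || seg[0] != 0xFF || seg[1] != 0xC0) return -1;
    int len = (seg[2] << 8) | seg[3];
    if ((size_t)len + 2 > size) return -1;
    out->precision = seg[4];
    out->height = (seg[5] << 8) | seg[6];
    out->width  = (seg[7] << 8) | seg[8];
    out->ncomp  = seg[9];
    if (out->ncomp < 1 || out->ncomp > 3 || len != 8 + 3 * out->ncomp) return -1;
    int hmax = 0, vmax = 0;
    for (int i = 0; i < out->ncomp; ++i) {
        const unsigned char *c = seg + 10 + 3 * i;
        out->comp[i].id = c[0];
        out->comp[i].h  = c[1] >> 4;      /* horizontal sampling factor */
        out->comp[i].v  = c[1] & 0x0F;    /* vertical sampling factor   */
        out->comp[i].tq = c[2];           /* quantization table number  */
        if (out->comp[i].h > hmax) hmax = out->comp[i].h;
        if (out->comp[i].v > vmax) vmax = out->comp[i].v;
    }
    if (hmax == 0 || vmax == 0) return -1;
    out->mcu_w = 8 * hmax;
    out->mcu_h = 8 * vmax;
    out->mcus_x = (out->width  + out->mcu_w - 1) / out->mcu_w;  /* round up */
    out->mcus_y = (out->height + out->mcu_h - 1) / out->mcu_h;
    return 0;
}
```

For the 227 x 149 example the MCU is 16 x 16, so the image is covered by 15 x 10 MCUs and decodes to a padded 240 x 160 area before cropping.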

For DHT, there are 4 Huffman tables here:

FF C4 00 1F 00 00 01 05 01 01 01 01 01 01 00 00 00 00 00 00 00 00 01 02 03 04 05 06 07 08 09 0A 0B
FF C4 00 B5 10 00 02 01 03 03 02 04 03 05 05 04 04 00 00 01 7D 01 02 03 00 04 11 05 12 21 31 41 06 13 51 61 07 22 71 14 32 81 91 A1 08 23 42 B1 C1 15 52 D1 F0 24 33 62 72 82 09 0A 16 17 18 19 1A 25 26 27 28 29 2A 34 35 36 37 38 39 3A 43 44 45 46 47 48 49 4A 53 54 55 56 57 58 59 5A 63 64 65 66 67 68 69 6A 73 74 75 76 77 78 79 7A 83 84 85 86 87 88 89 8A 92 93 94 95 96 97 98 99 9A A2 A3 A4 A5 A6 A7 A8 A9 AA B2 B3 B4 B5 B6 B7 B8 B9 BA C2 C3 C4 C5 C6 C7 C8 C9 CA D2 D3 D4 D5 D6 D7 D8 D9 DA E1 E2 E3 E4 E5 E6 E7 E8 E9 EA F1 F2 F3 F4 F5 F6 F7 F8 F9 FA
FF C4 00 1F 01 00 03 01 01 01 01 01 01 01 01 01 00 00 00 00 00 00 01 02 03 04 05 06 07 08 09 0A 0B
FF C4 00 B5 11 00 02 01 02 04 04 03 04 07 05 04 04 00 01 02 77 00 01 02 03 11 04 05 21 31 06 12 41 51 07 61 71 13 22 32 81 08 14 42 91 A1 B1 C1 09 23 33 52 F0 15 62 72 D1 0A 16 24 34 E1 25 F1 17 18 19 1A 26 27 28 29 2A 35 36 37 38 39 3A 43 44 45 46 47 48 49 4A 53 54 55 56 57 58 59 5A 63 64 65 66 67 68 69 6A 73 74 75 76 77 78 79 7A 82 83 84 85 86 87 88 89 8A 92 93 94 95 96 97 98 99 9A A2 A3 A4 A5 A6 A7 A8 A9 AA B2 B3 B4 B5 B6 B7 B8 B9 BA C2 C3 C4 C5 C6 C7 C8 C9 CA D2 D3 D4 D5 D6 D7 D8 D9 DA E2 E3 E4 E5 E6 E7 E8 E9 EA F2 F3 F4 F5 F6 F7 F8 F9 FA

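2 bytes: marker
2 bytes: length
1 byte: class and ID; the high 4 bits are the class (0 = DC, 1 = AC), the low 4 bits are the ID
So there are 4 Huffman tables here: DC 0, AC 0, DC 1, AC 1
16 bytes: the number of codewords of each length. In the first table there are no 1-bit codewords, one 2-bit codeword, five 3-bit codewords, one codeword each for 4 to 9 bits, and none longer than 9 bits, so 12 codewords in total
The remaining 12 bytes are the coded values (12 comes from the codeword counts: 1 + 5 + 1 + 1 + 1 + 1 + 1 + 1):
00 01 02 03 04 05 06 07 08 09 0A 0B
Converted to binary, that would be:

Length    Code                 Codeword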
2         00                  00
3         01/02/03/04/05      001/010/011/100/101
4         06                  0110
5         07                  00111
6         08                  001000
7         09                  0001001
8         0A                  00001010
9         0B                  000001011

UPDATE: the reading of the coded values above, which follows http://www.ibm.com/developerworks/cn/linux/l-cn-jpeg/, is wrong.
It should be:

Length    Code                Codeword    
2         00                  00
3         01                  010
3         02                  011
3         03                  100
3         04                  101
3         05                  110
4         06                  1110
5         07                  11110
6         08                  111110
7         09                  1111110
8         0A                  11111110
9         0B                  111111110

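How this Huffman table is built deserves careful study; the later compression steps all rely on it.
How the Huffman code table is constructed:
The first codeword is always 0 (written as 0, 00, 000 and so on, depending on its length)
Each following codeword is the previous one plus 1; when the length increases, zeros are appended after the increment to reach the new length
The value 00 is the end-of-block marker (EOB); take note of its codeword when encoding or decoding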
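
These rules amount to a short loop over the 16 count bytes: add 1 after each codeword, and shift left by one bit whenever the length grows. A sketch follows; the function name `build_huffman_codes` is hypothetical.

```c
#include <assert.h>

/* counts[i] is the number of codewords of length i + 1 (the 16 count
 * bytes of a DHT segment). Fills code[k] and len[k] for the k-th value
 * in DHT order and returns the number of codewords, or -1 if the counts
 * cannot form a valid code. */
int build_huffman_codes(const unsigned char counts[16],
                        unsigned short code[256], unsigned char len[256]) {
    assert(counts != 0 && code != 0 && len != 0);
    int n = 0;
    unsigned next = 0;                        /* the first codeword is 0 */
    for (int bits = 1; bits <= 16; ++bits) {
        for (int j = 0; j < counts[bits - 1]; ++j) {
            if (n >= 256 || next >= (1u << bits)) return -1;
            code[n] = (unsigned short)next++; /* previous codeword plus 1 */
            len[n]  = (unsigned char)bits;
            ++n;
        }
        next <<= 1;                           /* longer length: append a 0 */
    }
    return n;
}
```

With the counts of the first table (00 01 05 01 01 01 01 01 01 00 ...), value 00 gets the codeword 00, values 01 to 05 get 010 to 110, and value 0B gets 111111110, as in the corrected table.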

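EXIF data is normally stored in APP1
FF E1 66 59 …… FF D9
0xFFE1 (2 bytes) is the APP1 marker, and 0x6659 (2 bytes) is the segment length, 26201, which does not include the 2 marker bytes. The actual content after the length field is therefore 26199 bytes, running up to 0xFFD9; that is the EXIF data. Why is there a 0xFFD9? 0xFFD9 is EOI: the thumbnail inside EXIF is itself a complete JPEG image, with its own SOI and the other markers.
See Figure 7, "Structure of Exif file with compressed thumbnail", in JEITA CP-3451 Exif Version 2.2
EXIF is not required by JPEG, so you can remove this whole block (easy with a hex editor, try it if you are curious) without affecting the picture; only the EXIF information is lost.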
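
The hex-editor experiment can also be done in code. The sketch below walks the segments before SOS using their length fields and drops every APP1 segment. Because it jumps by the length, the SOI/EOI of the embedded thumbnail are skipped along with the rest of the segment. `strip_app1` is a hypothetical helper; it does not handle fill bytes or markers without a length field before SOS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a JPEG stream from in to out, dropping APP1 (FF E1) segments
 * that precede the first SOS. out must hold at least in_size bytes.
 * Returns the number of bytes written, or 0 on malformed input. */
size_t strip_app1(const unsigned char *in, size_t in_size, unsigned char *out) {
    assert(in != NULL && out != NULL);
    if (in_size < 4 || in[0] != 0xFF || in[1] != 0xD8) return 0;
    size_t i = 2, o = 0;
    out[o++] = 0xFF; out[o++] = 0xD8;                   /* SOI */
    while (i + 4 <= in_size) {
        if (in[i] != 0xFF) return 0;
        unsigned char marker = in[i + 1];
        if (marker == 0xDA) break;                      /* SOS: stop parsing */
        /* marker bytes plus the bytes covered by the length field */
        size_t seg = 2 + (((size_t)in[i + 2] << 8) | in[i + 3]);
        if (seg < 4 || i + seg > in_size) return 0;
        if (marker != 0xE1) { memcpy(out + o, in + i, seg); o += seg; }
        i += seg;
    }
    memcpy(out + o, in + i, in_size - i);               /* rest, verbatim */
    return o + (in_size - i);
}
```

Everything from SOS onward is copied untouched, so the entropy-coded image data is never interpreted.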

11. Building libjpeg on Linux
Download the source tree
Uncompress it
cd into the source tree folder
Clean the source tree

$ ./configure --enable-shared --enable-static

configure may print this message
./configure: 1562: ./ltconfig: not found // Here
but it is harmless and can be ignored.

$ make

If make complains 'make: ./libtool: Command not found',
change this line in your configure file from
LIBTOOL="./libtool"
to
LIBTOOL="libtool"
The source tree contains no file named libtool, so ./libtool cannot be executed; the change makes the build use the system-wide libtool instead. This requires libtool to be installed on the host machine.

Then compile again and everything should be fine.

When the build completes, a folder named '.libs' is generated under the source tree. It is a HIDDEN folder and holds everything you need.

$ ./cjpeg testimg.bmp > hello.jpeg

A new compressed JPEG file is generated.

$ ./djpeg -bmp testimg.jpg > hello.bmp

A new BMP file is generated.

From here, use your imagination.

Good luck!

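SurfaceFlinger Source Code Analysis

These notes are based on the Jelly Bean code.
For an introduction to what SurfaceFlinger is, search the web; here we go straight to the code.

First we need a common programming pattern, the producer/consumer model. It may look trivial, but many of its basic ideas are used here.
BufferQueue: data is queued into it. Before that, the producer first takes an empty unit out of the BufferQueue; this unit is called a buffer and is actually of type GraphicBuffer.

ConsumerBase: the interface used on the consumer side. It implements BufferQueue::ConsumerListener, so it is notified (onFrameAvailable) when a buffer is queued into the BufferQueue. Likewise it is notified (onBuffersReleased) when the producer disconnects from the BufferQueue or when setBufferCount is called (this method frees all buffers and returns them to the BufferQueue; it fails if any buffer is in the DEQUEUED state).

BufferItemConsumer and CpuConsumer: both are subclasses of ConsumerBase. BufferItemConsumer can acquire several buffers at a time, whereas ConsumerBase can acquire only one: BufferItemConsumer changes the BufferQueue's mMaxAcquiredBufferCount parameter, while ConsumerBase uses the default of 1. CpuConsumer can lock a buffer for CPU access, which it also does through GRALLOC.
FramebufferSurface: a subclass of ConsumerBase that posts the data it receives to the screen through HWComposer.
SurfaceTexture: a subclass of ConsumerBase that turns a GraphicBuffer into a texture image and hands it to OpenGL.

SurfaceTextureLayer is a customized BufferQueue; requests coming from NATIVE_WINDOW_API_MEDIA/NATIVE_WINDOW_API_CAMERA switch the BufferQueue to asynchronous mode.

The states of a buffer in the BufferQueue are simple but important.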

// BufferState represents the different states in which a buffer slot
// can be.
enum BufferState {
    // FREE indicates that the buffer is not currently being used and
    // will not be used in the future until it gets dequeued and
    // subsequently queued by the client.
    // aka "owned by BufferQueue, ready to be dequeued"
    FREE = 0,

    // DEQUEUED indicates that the buffer has been dequeued by the
    // client, but has not yet been queued or canceled. The buffer is
    // considered 'owned' by the client, and the server should not use
    // it for anything.
    //
    // Note that when in synchronous-mode (mSynchronousMode == true),
    // the buffer that's currently attached to the texture may be
    // dequeued by the client.  That means that the current buffer can
    // be in either the DEQUEUED or QUEUED state.  In asynchronous mode,
    // however, the current buffer is always in the QUEUED state.
    // aka "owned by producer, ready to be queued"
    DEQUEUED = 1,

    // QUEUED indicates that the buffer has been queued by the client,
    // and has not since been made available for the client to dequeue.
    // Attaching the buffer to the texture does NOT transition the
    // buffer away from the QUEUED state. However, in Synchronous mode
    // the current buffer may be dequeued by the client under some
    // circumstances. See the note about the current buffer in the
    // documentation for DEQUEUED.
    // aka "owned by BufferQueue, ready to be acquired"
    QUEUED = 2,

    // aka "owned by consumer, ready to be released"
    ACQUIRED = 3
};

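Main BufferQueue methods
dequeueBuffer hands a buffer to the client (it returns a slot; the buffer is taken from those in the FREE state). When necessary (the buffer is null, or its height/width/format/usage does not match the request) it allocates one with GraphicBufferAlloc::createGraphicBuffer()

requestBuffer returns the buffer for a given slot. It is mainly used right after a buffer has been allocated (or when the buffer for a slot is unexpectedly found to be null); currently SurfaceTextureClient (Surface) uses it

queueBuffer tells the BufferQueue that a filled buffer has been queued. QueueBufferInput describes the buffer, and QueueBufferOutput reports the BufferQueue's current state (default height/width, transformHint and a slot count; the count covers only the slots currently handed back to the BufferQueue)

acquireBuffer takes ownership of a pending buffer. The buffer comes from mQueue, that is, it is in the QUEUED state (does it hold data?).

releaseBuffer gives up the buffer held for the given slot

freeBuffer and cancelBuffer both put the buffer into the FREE state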
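
The state changes driven by these methods can be modelled as a small state machine. This is a simplified model for illustration, not the AOSP code: the names `buffer_op` and `buffer_transition` are hypothetical, and the synchronous-mode special case mentioned in the enum comments is ignored.

```c
#include <assert.h>

typedef enum { FREE = 0, DEQUEUED = 1, QUEUED = 2, ACQUIRED = 3 } buffer_state;
typedef enum { OP_DEQUEUE, OP_QUEUE, OP_CANCEL, OP_ACQUIRE, OP_RELEASE } buffer_op;

/* Apply op to *s. Returns 0 and updates *s when the transition is legal
 * in this simplified model; returns -1 and leaves *s unchanged otherwise. */
int buffer_transition(buffer_state *s, buffer_op op) {
    assert(s != 0);
    switch (op) {
        case OP_DEQUEUE:                       /* producer takes a free buffer */
            if (*s != FREE) return -1;
            *s = DEQUEUED; return 0;
        case OP_QUEUE:                         /* producer hands back a full one */
            if (*s != DEQUEUED) return -1;
            *s = QUEUED; return 0;
        case OP_CANCEL:                        /* producer gives up unfilled */
            if (*s != DEQUEUED) return -1;
            *s = FREE; return 0;
        case OP_ACQUIRE:                       /* consumer takes a queued one */
            if (*s != QUEUED) return -1;
            *s = ACQUIRED; return 0;
        case OP_RELEASE:                       /* consumer is done with it */
            if (*s != ACQUIRED) return -1;
            *s = FREE; return 0;
    }
    return -1;
}
```

The normal life cycle of a slot is FREE, DEQUEUED, QUEUED, ACQUIRED and back to FREE; a cancel shortcuts from DEQUEUED straight to FREE.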

Main ConsumerBase methods
acquireBufferLocked/releaseBufferLocked/freeBufferLocked/abandonLocked

This protected array is also important. Subclasses can read buffer information directly from it; it effectively caches the essential BufferQueue state.

// mSlots stores the buffers that have been allocated by the BufferQueue
// for each buffer slot.  It is initialized to null pointers, and gets
// filled in with the result of BufferQueue::acquire when the
// client dequeues a buffer from a
// slot that has not yet been used. The buffer allocated to a slot will also
// be replaced if the requested buffer usage or geometry differs from that
// of the buffer allocated to a slot.
Slot mSlots[BufferQueue::NUM_BUFFER_SLOTS];

SurfaceTextureClient is an ANativeWindow. It provides the concrete implementation behind the native_window_api_* and native_window_* functions (all declared in system/core/include/system/window.h), and it also holds a SurfaceTexture.

// Initialize the ANativeWindow function pointers.
ANativeWindow::setSwapInterval  = hook_setSwapInterval;
ANativeWindow::dequeueBuffer    = hook_dequeueBuffer;
ANativeWindow::cancelBuffer     = hook_cancelBuffer;
ANativeWindow::queueBuffer      = hook_queueBuffer;
ANativeWindow::query            = hook_query;
ANativeWindow::perform          = hook_perform;

ANativeWindow::dequeueBuffer_DEPRECATED = hook_dequeueBuffer_DEPRECATED;
ANativeWindow::cancelBuffer_DEPRECATED  = hook_cancelBuffer_DEPRECATED;
ANativeWindow::lockBuffer_DEPRECATED    = hook_lockBuffer_DEPRECATED;
ANativeWindow::queueBuffer_DEPRECATED   = hook_queueBuffer_DEPRECATED;

const_cast<int&>(ANativeWindow::minSwapInterval) = 0;
const_cast<int&>(ANativeWindow::maxSwapInterval) = 1;

Some important renames

Early Jelly Bean (4.1/4.2)                        4.3
================================================================================
SurfaceTextureClient and Surface (which           Simplified to Surface (ANativeWindow)
inherits from SurfaceTextureClient) are
in fact an ANativeWindow

================================================================================
ISurfaceTexture                                   IGraphicBufferProducer,
                                                  a Binder IPC interface used to pass
                                                  data between components (across
                                                  processes); BufferQueue implements
                                                  BnGraphicBufferProducer

================================================================================
SurfaceTexture(ConsumerBase)                      GLConsumer(ConsumerBase): it takes
                                                  data from the BufferQueue and offers
                                                  it to OpenGL as a texture

That covers the background; what follows relates directly to SurfaceFlinger.
Arrows point from a class to the class it inherits from.

                             BpSurface       ---->>>>      ISurface
                                                           sp<ISurfaceTexture> ISurface::getSurfaceTexture()

BSurface       ---->>>>      BnSurface       ---->>>>      ISurface
sp<ISurfaceTexture> BSurface::getSurfaceTexture()
        SurfaceTexture::getBufferQueue()

Layer       ---->>>>      LayerBaseClient       ---->>>>       LayerBase
sp<ISurface> Layer::createSurface()
        new BSurface

                          sp<ISurface> LayerBaseClient::getSurface()
                                  sp<ISurface> LayerBaseClient::createSurface()

                           BpSurfaceComposerClient       ---->>>>      ISurfaceComposerClient
                                                                       sp<ISurface> ISurfaceComposerClient::createSurface()

Client       ---->>>>      BnSurfaceComposerClient       ---->>>>      ISurfaceComposerClient
Client::createSurface()
        SurfaceFlinger::createLayer()
                createXXXLayer()
                        new LayerXXX
                Layer::getSurface()
                	Layer::createSurface()

sp<SurfaceControl> SurfaceComposerClient::createSurface()
              ISurfaceComposerClient::createSurface()
              new SurfaceControl(ISurface)
// SurfaceComposerClient is just an ordinary helper class; its createSurface calls ISurfaceComposerClient::createSurface

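Now consider what happens when a client creates a SurfaceView.
You should first understand how SurfaceView, SurfaceHolder and Surface relate to each other in the Java layer.

=================================Java=====================================================
new SurfaceView
	surface = new Surface // the Surface held by the SurfaceView (an empty shell; no Surface is actually created on the server side)
	newSurface = new Surface // the new Surface; whenever the Surface is changed/created/destroyed/needs redrawing,
							 // it is first prepared by the system and then copied over the original Surface
							 // in our SurfaceView (via transferFrom)

The method that really creates the Surface is called by the system, never directly by the app. Once called, it enters the corresponding JNI method,
which uses a SurfaceSession. Officially this represents one session with SurfaceFlinger: the client has to talk to the server, hence the notion of a session. It is in fact an instance of the native SurfaceComposerClient.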

=================================JNI&Native========================================
android_view_Surface.cpp nativeCreate()
	android_view_SurfaceSession_getClient
	SurfaceComposerClient->createSurface
		ISurfaceComposerClient->createSurface // IPC
			Client->createSurface
				SurfaceFlinger->createLayer
					createXXXLayer()
						new LayerXXX
					Layer->getSurface()
		new SurfaceControl // the SurfaceControl holds the ISurface that was created
	setSurfaceControl // saved in the JNI context

The ISurface has now been created.

Now look at the other path. The Window/View system has to initialize the whole window, so some SurfaceView callbacks (such as resize/new-surface/onWindowVisibilityChanged/setVisibility/onDetachedFromWindow) get called. These end up calling updateWindow; after IWindowSession.relayout a new Surface is produced, which is then copied into the SurfaceView's Surface with Surface.transferFrom.

Also note how the Java-layer Surface (Surface.java) maps to the native Surface (Surface.h|cpp, i.e. SurfaceTextureClient). Surface.java holds a pointer to the Surface.h|cpp object in a field named mNativeSurface. Each time a native Surface is created it is saved in the JNI context, and that is how Java and native code convert between the two.

From here on we look only at how the native Surface is managed. android_view_Surface.h|cpp has a method android_view_Surface_getNativeWindow,
which calls the internal method getSurface:

static sp<Surface> getSurface(JNIEnv* env, jobject surfaceObj) {
    sp<Surface> result(android_view_Surface_getSurface(env, surfaceObj)); // look up the existing native Surface; it may be NULL
    if (result == NULL) {
        /*
         * if this method is called from the WindowManager's process, it means
         * the client is not remote, and therefore is allowed to have
         * a Surface (data), so we create it here.
         * If we don't have a SurfaceControl, it means we're in a different
         * process.
         */

        SurfaceControl* const control = reinterpret_cast<SurfaceControl*>(
                env->GetIntField(surfaceObj, gSurfaceClassInfo.mNativeSurfaceControl));
        if (control) {
            result = control->getSurface(); // creates the Surface (SurfaceTextureClient)
            if (result != NULL) {
                result->incStrong(surfaceObj);
                env->SetIntField(surfaceObj, gSurfaceClassInfo.mNativeSurface, // the field tied to the native object, see gui/Surface.h
                        reinterpret_cast<jint>(result.get()));
            }
        }
    }
    return result;
}

sp<ANativeWindow> android_view_Surface_getNativeWindow(JNIEnv* env, jobject surfaceObj) { // used by Native Activity
    return getSurface(env, surfaceObj);
}

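This looks like the place where the Surface is created, but it is not; this path is for Native Activity. An ordinary Java Activity goes through createFromParcel.
During creation the ISurface member is initialized; it comes from a Layer in SurfaceFlinger. The BufferQueue is also obtained through ISurface->getSurfaceTexture(). This links the Surface (SurfaceTextureClient) to the BufferQueue, so data can be pushed into the BufferQueue through native_window_* or ANativeWindow.

Take Camera as an example. In the HAL, every Stream is created with a camera2_stream_ops argument, and the Stream's callback calls camera2_stream_ops->enqueue_buffer, which calls ANativeWindow->queueBuffer and finally reaches the BufferQueue. So if the ANativeWindow we hand to the Camera HAL was created in SurfaceFlinger, the Stream's data flows back into SurfaceFlinger, which merges the data of the relevant Layers and hands the result to the FB for display. That is how Camera preview works.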

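Analysis of some of the more important functions and classes inside SurfaceFlinger:
Jelly Bean introduced Project Butter, which mainly brought in VSYNC and Triple Buffer; Triple Buffer is mentioned in Layer.h|cpp.
What is VSYNC? Put simply, it is a fixed-frequency clock, normally provided by the display hardware. If the hardware does not provide one, Android simulates it; see the VSyncThread class in HWComposer.h|cpp, whose implementation is short and clear enough to understand by reading the code. VSYNC and Triple Buffer are in fact old techniques that have been used on PCs for many years; search for them if you are interested.

In simple terms, with hardware VSYNC we register a callback with the hardware that is invoked on each VSYNC and eventually reaches onVSyncReceived. With software VSYNC a timer calls onVSyncReceived at a fixed interval.
Something unrelated to VSYNC also shows up here: onHotplugReceived, the event raised when an external or virtual display is unplugged or plugged in. Unplugging switches from hardware VSYNC back to software, and plugging in switches from software back to hardware; hardware is always preferred.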
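
For the software VSYNC case, the core of such a timer is deciding when the next tick is due. The following is a sketch of that idea only; it is not the VSyncThread code from AOSP, and the function name and parameters are hypothetical. Times are in nanoseconds on a monotonic clock.

```c
#include <assert.h>
#include <stdint.h>

/* Given the current time, the tick that was scheduled, and the refresh
 * period, return the time of the next tick to wait for. If the thread
 * woke up late, missed ticks are skipped but the phase is kept. */
int64_t next_fake_vsync(int64_t now, int64_t scheduled, int64_t period) {
    assert(period > 0);
    if (scheduled > now)
        return scheduled;                     /* still in the future */
    int64_t late = now - scheduled;           /* how far behind we are */
    return now + (period - late % period);    /* next tick on the same grid */
}
```

After sleeping until the returned time, the thread would deliver the tick (the equivalent of calling onVSyncReceived) and schedule the following one a period later.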

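IDisplayEventConnection is the channel, implemented over Binder, through which a client talks to SurfaceFlinger about VSYNC. Its methods setVsyncRate/requestNextVsync/getDataChannel are largely self-explanatory: set configures how often VSYNC events are delivered, request manually asks for one VSYNC event, and data channel returns the channel that carries the data. That channel is a BitTube, a cross-process pipe built on a socket, on which you can register for events of interest and be notified when they arrive (using epoll).
So each client can choose the VSYNC event rate it wants and then simply listen for notifications; the Java-layer Choreographer is built on this. Note that there can be several IDisplayEventConnections: the View system and the Animation system both use one, and an app that does not use the system View/Animation facilities can register through Choreographer itself.
How does SurfaceFlinger manage these event requests and listeners?
Through EventThread, an ordinary Thread. Each client call to SurfaceFlinger's createDisplayEventConnection creates a Connection, which is added to mDisplayEventConnections in the EventThread and wakes the thread's threadLoop (the thread sleeps when there is nothing to do, because waitForEvent contains a wait). The result is finally delivered to the BitTube through postEvent, so everyone who registered a listener on it receives the event.
For a detailed walk-through see the Chinese comments (and possibly a few English comments I added) at https://github.com/guohai/and-notes/tree/master/surfaceflinger-jb-4.2.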

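Miscellaneous:
FrameBufferNativeWindow is no longer used.

It is often said that on newer devices and systems with hardware acceleration, TextureView is preferred over SurfaceView. Why?
A SurfaceView creates a separate Surface, which appears in SurfaceFlinger as an extra Layer; that Layer is composited with the existing ones, such as Window and Status Bar, before being drawn on the display.
Does TextureView avoid this? Yes. TextureView uses a SurfaceTexture, and creating a SurfaceTexture does not add a Layer in SurfaceFlinger because the work is done by the hardware. TextureView therefore works only when hardware acceleration is supported and enabled; otherwise it does nothing. A Layer is still created, but it is created and managed by the hardware, so the software side is spared that work. The native SurfaceTexture (ConsumerBase) receives the incoming data and passes it up to the View layer through JNI, and that is where the software side's job ends.

P.S. More details to be added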