September 2013 – Interact^Influence

用Wicd替换Ubuntu 11.10 Network Manager

Ubuntu 11.10的Network Manager不是很稳定，经常断线，特别是网络状态不是很好的时候(根据自己经验总结出来，而且断了就很难连上)，dmesg显示超时。实在忍不住了在网上搜搜，别人说可以用Wicd替换掉Network Manager。
动手开始

sudo apt-get install wicd

sudo aptitude remove network-manager

其实如果你比较谨慎，不想立即删除Network Manager的话，你可以先停止服务，然后再禁止掉它。

sudo /etc/init.d/network-manager stop

sudo update-rc.d network-manager disable

然后再重启wicd

sudo /etc/init.d/wicd restart

或者重启系统。

注意了，我一般是通过WPA2来加密无线连接的(也就是路由器开了这个)，我们连接Access Point的时候也应当选择这个，通常我们都会输入对应的明文密码，比如SSID为”TA’S WIFI”，PASSWORD为”cheatbot”，我们直接选择对应的SSID并且输入这个明文密码就好了。

但是Wicd这里不一样的，它需要你输入一个Passphrase，也就是Use Encryption下的WPA 1/2 (Passphrase)，这个不是直接的明文密码，如果你输入直接的明文密码，连接的时候就会出现密码错误，Bad Password也就是这么出来的，那么这个值从哪里来？

man wpa_passphrase

WPA_PASSPHRASE(8)                                            WPA_PASSPHRASE(8)

NAME
       wpa_passphrase - Generate a WPA PSK from an ASCII passphrase for a SSID

SYNOPSIS
       wpa_passphrase [ ssid ] [ passphrase ]

OVERVIEW
       wpa_passphrase  pre-computes  PSK  entries  for  network  configuration
       blocks of a wpa_supplicant.conf file. An ASCII passphrase and SSID  are
       used to generate a 256-bit PSK.

OPTIONS
       ssid   The SSID whose passphrase should be derived.

       passphrase
              The  passphrase  to  use.  If  not included on the command line,
              passphrase will be read from standard input.

SEE ALSO
       wpa_supplicant.conf(5) wpa_supplicant(8)

LEGAL
       wpa_supplicant is copyright (c) 2003-2007, Jouni Malinen <j@w1.fi>  and
       contributors.  All Rights Reserved.

       This  program  is  dual-licensed  under  both the GPL version 2 and BSD
       license. Either license may be used at your option.

                               07 September 2010             WPA_PASSPHRASE(8)

这里就可以看到实际是执行如下命令

guohai@KNIGHT:~$ wpa_passphrase "TA'S WIFI" "cheatbot"
network={
	ssid="TA'S WIFI"
	#psk="cheatbot"
	psk=f122b3c5a2c1b50064805d44fa8b23802dadfdfa6e2dbb73bdf3d2ee702a4146
}

这里长长的psk就是需要我们填到Properties的Key里面去的，填好这些之后，试着连连看(如果连不上，就重启下系统，我是这样的，然后我还勾选了前面的Automatically connect to this network)，然后它就能成功连接上了。

当然稳定性还有待观测，不过到现在还没有出现过之前断开的情况，虽然Wicd没有默认的Network Manager直观，好看，但是只要它稳定好用，我也就满意了。

手工进行图片Huffman编码

这里有篇文章讲解如何进行Huffman编码的JPEG Huffman Coding Tutorial，写的还是很清晰，我本来打算翻译一下这篇文章，但是想想觉得还是配合它的思路以不同的角度再写一篇，它是解码方式，我这里是以编码方式来描述，它的Y:Cb:Cr是1:1:1，我这里是4:1:1，当然它是英文的，我是中文的 @@。

JPEG/Huffman编码基础知识不了解的还是要先学习下基础知识，可以参考JPEG学习笔记，否则你可能无法看懂这里在说什么。
简单回顾下JPEG编码过程，这里我们假设RGB->JPEG：

色彩空间转换
正向离散余弦变换
量化(需要量化表)
Huffman编码(需要Huffman编码表)
生成图片流(按照JFIF格式)

其它的我们都不说了，直接开始吧，如果你现在觉得上面的概念/原理还不是很理解，还来的急，请跳转到文章顶部从头开始读。
有些中间数据的推理是通过JPEGsnoop或者手工进行的，所以你也要熟悉。

假设我们有RGB数据，这里我用的是test.bmp，一张16×8的的白黑位图(左DU白，右DU黑)，可以用16进制工具打开来查看，实际数据就是：

FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

假设我们有SOF数据
8位数据样本(1字节)
图像高度为8(2字节)
图像宽度为16(2字节)
颜色分量数，JPEG都是YCrCb，即3(1字节)
颜色分量信息，大小是颜色分量数 multiply 3
因为分量信息中颜色分量ID占用1个字节，水平/垂直因子占用1个字节(高4位水平，低4位垂直)，量化表占用1个字节
H 2:1:1
V 2:1:1
所以总体采样因子就是(2 * 2):(1 * 1):(1 * 1)，即4:1:1

MCU宽是水平采样因子最大值 multiply 8(记该最大值为Hmax)
MCU高是垂直采样因子最大值 multiply 8(记该最大值为Vmax)
因此这里就是(Hmax * 8):(Vmax * 8) = 16:16

如果整幅图片的高度或者宽度不是MCU的整数倍，就需要padding，解码之后丢弃大于宽度或者高度部分的数据
在数据流当中，MCU是按从左到右，从上到下来排列的。
因为每个MCU由若干数据单元组成，而数据单元又必须是8:8的，所以MCU当中数据单元的个数就是4(Hmax * Vmax)

组装起来可能就如下：

FF C0 00 11 08 00 08 00 10 03 01 22 00 02 11 01 03 11 01

DHT(FF C4)如下(这里的Huffman编码表是标准的，没有优化过的)

FF C4 00 1F 00 00 01 05 01 01 01 01 01 01 00 00 00 00 00 00 00 00 01 02 03 04 05 06 07 08 09 0A 0B FF C4 00 B5 10 00 02 01 03 03 02 04 03 05 05 04 04 00 00 01 7D 01 02 03 00 04 11 05 12 21 31 41 06 13 51 61 07 22 71 14 32 81 91 A1 08 23 42 B1 C1 15 52 D1 F0 24 33 62 72 82 09 0A 16 17 18 19 1A 25 26 27 28 29 2A 34 35 36 37 38 39 3A 43 44 45 46 47 48 49 4A 53 54 55 56 57 58 59 5A 63 64 65 66 67 68 69 6A 73 74 75 76 77 78 79 7A 83 84 85 86 87 88 89 8A 92 93 94 95 96 97 98 99 9A A2 A3 A4 A5 A6 A7 A8 A9 AA B2 B3 B4 B5 B6 B7 B8 B9 BA C2 C3 C4 C5 C6 C7 C8 C9 CA D2 D3 D4 D5 D6 D7 D8 D9 DA E1 E2 E3 E4 E5 E6 E7 E8 E9 EA F1 F2 F3 F4 F5 F6 F7 F8 F9 FA FF C4 00 1F 01 00 03 01 01 01 01 01 01 01 01 01 00 00 00 00 00 00 01 02 03 04 05 06 07 08 09 0A 0B FF C4 00 B5 11 00 02 01 02 04 04 03 04 07 05 04 04 00 01 02 77 00 01 02 03 11 04 05 21 31 06 12 41 51 07 61 71 13 22 32 81 08 14 42 91 A1 B1 C1 09 23 33 52 F0 15 62 72 D1 0A 16 24 34 E1 25 F1 17 18 19 1A 26 27 28 29 2A 35 36 37 38 39 3A 43 44 45 46 47 48 49 4A 53 54 55 56 57 58 59 5A 63 64 65 66 67 68 69 6A 73 74 75 76 77 78 79 7A 82 83 84 85 86 87 88 89 8A 92 93 94 95 96 97 98 99 9A A2 A3 A4 A5 A6 A7 A8 A9 AA B2 B3 B4 B5 B6 B7 B8 B9 BA C2 C3 C4 C5 C6 C7 C8 C9 CA D2 D3 D4 D5 D6 D7 D8 D9 DA E2 E3 E4 E5 E6 E7 E8 E9 EA F2 F3 F4 F5 F6 F7 F8 F9 FA

重构Huffman编码表(这样才比较好理解，编码表如何构建出来，参考[JPEG学习笔记]，这里只列出需要用到的，部分有省略)，4张表分别为直流0(直流Y)，交流0(交流Y)，直流1(直流C)，交流1(交流C)

*** Marker: DHT (Define Huffman Table) (xFFC4) ***
  OFFSET: 0x000000B1
  Huffman table length = 31
  ----
  Destination ID = 0
  Class = 0 (DC / Lossless Table)
    Codes of length 01 bits (000 total): 
    Codes of length 02 bits (001 total): 00 
    Codes of length 03 bits (005 total): 01 02 03 04 05 
    Codes of length 04 bits (001 total): 06 
    Codes of length 05 bits (001 total): 07 
    Codes of length 06 bits (001 total): 08 
    Codes of length 07 bits (001 total): 09 
    Codes of length 08 bits (001 total): 0A 
    Codes of length 09 bits (001 total): 0B 
    Codes of length 10 bits (000 total): 
    Codes of length 11 bits (000 total): 
    Codes of length 12 bits (000 total): 
    Codes of length 13 bits (000 total): 
    Codes of length 14 bits (000 total): 
    Codes of length 15 bits (000 total): 
    Codes of length 16 bits (000 total): 
    Total number of codes: 012

Length       Codeword        Code
2            00              00(End of Block)
3            010             01
3            011             02
3            100             03
3            101             04
3            110             05
4            1110            06
5            1111 0          07
6            1111 10         08
7            1111 110        09
8            1111 1110       0A
9            1111 1111 0     0B


*** Marker: DHT (Define Huffman Table) (xFFC4) ***
  OFFSET: 0x000000D2
  Huffman table length = 181
  ----
  Destination ID = 0
  Class = 1 (AC Table)
    Codes of length 01 bits (000 total): 
    Codes of length 02 bits (002 total): 01 02 
    Codes of length 03 bits (001 total): 03 
    Codes of length 04 bits (003 total): 00 04 11 
    Codes of length 05 bits (003 total): 05 12 21 
    Codes of length 06 bits (002 total): 31 41 
    Codes of length 07 bits (004 total): 06 13 51 61 
    Codes of length 08 bits (003 total): 07 22 71 
    Codes of length 09 bits (005 total): 14 32 81 91 A1 
    Codes of length 10 bits (005 total): 08 23 42 B1 C1 
    Codes of length 11 bits (004 total): 15 52 D1 F0 
    Codes of length 12 bits (004 total): 24 33 62 72 
    Codes of length 13 bits (000 total): 
    Codes of length 14 bits (000 total): 
    Codes of length 15 bits (001 total): 82 
    Codes of length 16 bits (125 total): 09 0A 16 17 18 19 1A 25 26 27 28 29 2A 34 35 36 
                                         37 38 39 3A 43 44 45 46 47 48 49 4A 53 54 55 56 
                                         57 58 59 5A 63 64 65 66 67 68 69 6A 73 74 75 76 
                                         77 78 79 7A 83 84 85 86 87 88 89 8A 92 93 94 95 
                                         96 97 98 99 9A A2 A3 A4 A5 A6 A7 A8 A9 AA B2 B3 
                                         B4 B5 B6 B7 B8 B9 BA C2 C3 C4 C5 C6 C7 C8 C9 CA 
                                         D2 D3 D4 D5 D6 D7 D8 D9 DA E1 E2 E3 E4 E5 E6 E7 
                                         E8 E9 EA F1 F2 F3 F4 F5 F6 F7 F8 F9 FA 
    Total number of codes: 162

Length  Codeword                    Code
2       00                          01
2       01                          02
3       100                         03
4       1010                        00(End of Block)
4       1011                        04
4       1100                        11
5       1101 0                      05
5       1101 1                      12
5       1110 0                      21
6       1110 10                     31
6       1110 11                     41
7       1111 000                    06
7       1111 001                    13
7       1111 010                    51
7       1111 011                    61
......


*** Marker: DHT (Define Huffman Table) (xFFC4) ***
  OFFSET: 0x00000189
  Huffman table length = 31
  ----
  Destination ID = 1
  Class = 0 (DC / Lossless Table)
    Codes of length 01 bits (000 total): 
    Codes of length 02 bits (003 total): 00 01 02 
    Codes of length 03 bits (001 total): 03 
    Codes of length 04 bits (001 total): 04 
    Codes of length 05 bits (001 total): 05 
    Codes of length 06 bits (001 total): 06 
    Codes of length 07 bits (001 total): 07 
    Codes of length 08 bits (001 total): 08 
    Codes of length 09 bits (001 total): 09 
    Codes of length 10 bits (001 total): 0A 
    Codes of length 11 bits (001 total): 0B 
    Codes of length 12 bits (000 total): 
    Codes of length 13 bits (000 total): 
    Codes of length 14 bits (000 total): 
    Codes of length 15 bits (000 total): 
    Codes of length 16 bits (000 total): 
    Total number of codes: 012

Length     Codeword            Code
2          00                  00(End of Block)
2          01                  01
2          10                  02
3          110                 03
4          1110                04
5          1111 0              05
6          1111 10             06
7          1111 110            07
8          1111 1110           08
9          1111 1111 0         09
10         1111 1111 10        0A
11         1111 1111 110       0B


*** Marker: DHT (Define Huffman Table) (xFFC4) ***
  OFFSET: 0x000001AA
  Huffman table length = 181
  ----
  Destination ID = 1
  Class = 1 (AC Table)
    Codes of length 01 bits (000 total): 
    Codes of length 02 bits (002 total): 00 01 
    Codes of length 03 bits (001 total): 02 
    Codes of length 04 bits (002 total): 03 11 
    Codes of length 05 bits (004 total): 04 05 21 31 
    Codes of length 06 bits (004 total): 06 12 41 51 
    Codes of length 07 bits (003 total): 07 61 71 
    Codes of length 08 bits (004 total): 13 22 32 81 
    Codes of length 09 bits (007 total): 08 14 42 91 A1 B1 C1 
    Codes of length 10 bits (005 total): 09 23 33 52 F0 
    Codes of length 11 bits (004 total): 15 62 72 D1 
    Codes of length 12 bits (004 total): 0A 16 24 34 
    Codes of length 13 bits (000 total): 
    Codes of length 14 bits (001 total): E1 
    Codes of length 15 bits (002 total): 25 F1 
    Codes of length 16 bits (119 total): 17 18 19 1A 26 27 28 29 2A 35 36 37 38 39 3A 43 
                                         44 45 46 47 48 49 4A 53 54 55 56 57 58 59 5A 63 
                                         64 65 66 67 68 69 6A 73 74 75 76 77 78 79 7A 82 
                                         83 84 85 86 87 88 89 8A 92 93 94 95 96 97 98 99 
                                         9A A2 A3 A4 A5 A6 A7 A8 A9 AA B2 B3 B4 B5 B6 B7 
                                         B8 B9 BA C2 C3 C4 C5 C6 C7 C8 C9 CA D2 D3 D4 D5 
                                         D6 D7 D8 D9 DA E2 E3 E4 E5 E6 E7 E8 E9 EA F2 F3 
                                         F4 F5 F6 F7 F8 F9 FA 
    Total number of codes: 162

Length  Codeword                    Code
2       00                          00(End of Block)
2       01                          01
3       100                         02
4       1010                        03
4       1011                        11
5       1100 0                      04
5       1100 1                      05
5       1101 0                      21
5       1101 1                      31
6       1110 00                     06
6       1110 01                     12
6       1110 10                     41
6       1110 11                     51
......

DQT(FF DB)如下

FF DB 00 43 00 05 03 04 04 04 03 05 04 04 04 05 05 05 06 07 0C 08 07 07 07 07 0F 0B 0B 09 0C 11 0F 12 12 11 0F 11 11 13 16 1C 17 13 14 1A 15 11 11 18 21 18 1A 1D 1D 1F 1F 1F 13 17 22 24 22 1E 24 1C 1E 1F 1E
FF DB 00 43 01 05 05 05 07 06 07 0E 08 08 0E 1E 14 11 14 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E 1E

0x0043为所占大小
1个字节为QT信息，高4位为QT精度(在这里精度数据都是0)，低4位为QT号
64字节的QT数据
还原成矩阵形式(Zigzag)
QT 0

5   3   3   5   7   12  15  18

4   4   4   6   8   17  18  17

4   4   5   7   12  17  21  17

4   5   7   9   15  16  24  19

5   7   11  17  20  33  31  23

7   11  17  19  24  31  34  28

15  19  23  26  31  36  36  30

22  28  29  29  34  30  31  30

QT 1
略

前提条件都有了，开始做变换了。
通过SOF数据我们知道有1个MCU，Y有4个DU(2个是真实的数据，2个是填充)，Cb有一个DU，Cr有一个DU。

RGB
略

DU 1(Y)

255  255  255  255  255  255  255  255

255  255  255  255  255  255  255  255

255  255  255  255  255  255  255  255

255  255  255  255  255  255  255  255

255  255  255  255  255  255  255  255

255  255  255  255  255  255  255  255

255  255  255  255  255  255  255  255

255  255  255  255  255  255  255  255

DU 1(-128)

127  127  127  127  127  127  127  127

127  127  127  127  127  127  127  127

127  127  127  127  127  127  127  127

127  127  127  127  127  127  127  127

127  127  127  127  127  127  127  127

127  127  127  127  127  127  127  127

127  127  127  127  127  127  127  127

127  127  127  127  127  127  127  127

DU 1(FDCT)

1016 0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

DU 1(Quantization)

203  0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

DU 2(Y)

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

DU 2(-128)

-128 -128 -128 -128 -128 -128 -128 -128

-128 -128 -128 -128 -128 -128 -128 -128

-128 -128 -128 -128 -128 -128 -128 -128

-128 -128 -128 -128 -128 -128 -128 -128

-128 -128 -128 -128 -128 -128 -128 -128

-128 -128 -128 -128 -128 -128 -128 -128

-128 -128 -128 -128 -128 -128 -128 -128

-128 -128 -128 -128 -128 -128 -128 -128

DU 2(FDCT)

-1024 0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

DU 2(Quantization)

-205 0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

于是得到
DC分别为203和-205(绝对)，转换为相对则是203和-408
AC都为0

其它为0或者填充数据暂时就不用管了。

现在进行Huffman编码
203 = 1100 1011(查表或者计算都可以得出，最高位为1表示正数，最高位为0表示负数，取反得出其值)，继而DC编码前缀为1111 10(根据Y的DC表可以得出，数据占8位)
-408 = 0011 0011 1
则

1111 10 1100 1011 // 直流Y
1111 10 1100 1011 1010 // 直流Y + 交流Y
1111 10 1100 1011 1010 1111 110 0011 0011 1 1010 // 直流Y + 交流Y + 直流Y + 交流Y
1111 10 1100 1011 1010 1111 110 0011 0011 1 1010 00 1010 00 1010 // 直流Y + 交流Y + 直流Y + 交流Y + 2个填充DU(因为MCU为4个DU)
1111 10 1100 1011 1010 1111 110 0011 0011 1 1010 00 1010 00 1010 00 00 // 直流Y + 交流Y + 直流Y + 交流Y + 2个填充DU(因为MCU为4个DU + 直流Cb + 交流Cb
1111 10 1100 1011 1010 1111 110 0011 0011 1 1010 00 1010 00 1010 00 00 00 00 // 直流Y + 交流Y + 直流Y + 交流Y + 2个填充DU(因为MCU为4个DU + 直流Cb + 交流Cb + 直流Cr + 交流Cr
1111 10 1100 1011 1010 1111 110 0011 0011 1 1010 00 1010 00 1010 00 00 00 00 111111 // 直流Y + 交流Y + 直流Y + 交流Y + 2个填充DU(因为MCU为4个DU + 直流Cb + 交流Cb + 直流Cr + 交流Cr + 填充位

去掉空格
1111101100101110101111110001100111101000101000101000000000111111
也就是说数据其实只有58位，补齐到字节对齐，共占64位，就得到了我们最终要的数据。
16进制表示
FB 2E BF 19 E8 A2 80 3F

这下你或许对这复杂的JPEG原理有所了解了吧，但是记住这才是开始(当然你无法全手工的为一张复杂的图片来编码，这篇文章只是帮助理解原理)，JPEG正真复杂的是它对各种算法的优化，那才是重点。

NanoJPEG，一个简单的JPEG解码器分析

自己学习了下JPEG理论知识以后，找了个简单的解码器(NanoJPEG)试试看，原始地址在这里http://keyj.emphy.de/nanojpeg/，短短几百行，还是比较容易看懂的，本人理解详细参见代码注释，如理解有误欢迎指出，有问题/兴趣也可以留言和我讨论。

$ gcc -O3 -D_NJ_EXAMPLE_PROGRAM -o nanojpeg nanojpeg.c
$ ./nanojpeg testorig.jpg

跟着main函数，边理论边实践，不错的方法！

// NanoJPEG -- KeyJ's Tiny Baseline JPEG Decoder
// version 1.3 (2012-03-05)
// by Martin J. Fiedler <martin.fiedler@gmx.net>
//
// This software is published under the terms of KeyJ's Research License,
// version 0.2. Usage of this software is subject to the following conditions:
// 0. There's no warranty whatsoever. The author(s) of this software can not
//    be held liable for any damages that occur when using this software.
// 1. This software may be used freely for both non-commercial and commercial
//    purposes.
// 2. This software may be redistributed freely as long as no fees are charged
//    for the distribution and this license information is included.
// 3. This software may be modified freely except for this license information,
//    which must not be changed in any way.
// 4. If anything other than configuration, indentation or comments have been
//    altered in the code, the original author(s) must receive a copy of the
//    modified code.


///////////////////////////////////////////////////////////////////////////////
// DOCUMENTATION SECTION                                                     //
// read this if you want to know what this is all about                      //
///////////////////////////////////////////////////////////////////////////////

// INTRODUCTION
// ============
//
// This is a minimal decoder for baseline JPEG images. It accepts memory dumps
// of JPEG files as input and generates either 8-bit grayscale or packed 24-bit
// RGB images as output. It does not parse JFIF or Exif headers; all JPEG files
// are assumed to be either grayscale or YCbCr. CMYK or other color spaces are
// not supported. All YCbCr subsampling schemes with power-of-two ratios are
// supported, as are restart intervals. Progressive or lossless JPEG is not
// supported.
// Summed up, NanoJPEG should be able to decode all images from digital cameras
// and most common forms of other non-progressive JPEG images.
// The decoder is not optimized for speed, it's optimized for simplicity and
// small code. Image quality should be at a reasonable level. A bicubic chroma
// upsampling filter ensures that subsampled YCbCr images are rendered in
// decent quality. The decoder is not meant to deal with broken JPEG files in
// a graceful manner; if anything is wrong with the bitstream, decoding will
// simply fail.
// The code should work with every modern C compiler without problems and
// should not emit any warnings. It uses only (at least) 32-bit integer
// arithmetic and is supposed to be endianness independent and 64-bit clean.
// However, it is not thread-safe.


// COMPILE-TIME CONFIGURATION
// ==========================
//
// The following aspects of NanoJPEG can be controlled with preprocessor
// defines:
//
// _NJ_EXAMPLE_PROGRAM     = Compile a main() function with an example
//                           program.
// _NJ_INCLUDE_HEADER_ONLY = Don't compile anything, just act as a header
//                           file for NanoJPEG. Example:
//                               #define _NJ_INCLUDE_HEADER_ONLY
//                               #include "nanojpeg.c"
//                               int main(void) {
//                                   njInit();
//                                   // your code here
//                                   njDone();
//                               }
// NJ_USE_LIBC=1           = Use the malloc(), free(), memset() and memcpy()
//                           functions from the standard C library (default).
// NJ_USE_LIBC=0           = Don't use the standard C library. In this mode,
//                           external functions njAlloc(), njFreeMem(),
//                           njFillMem() and njCopyMem() need to be defined
//                           and implemented somewhere.
// NJ_USE_WIN32=0          = Normal mode (default).
// NJ_USE_WIN32=1          = If compiling with MSVC for Win32 and
//                           NJ_USE_LIBC=0, NanoJPEG will use its own
//                           implementations of the required C library
//                           functions (default if compiling with MSVC and
//                           NJ_USE_LIBC=0).
// NJ_CHROMA_FILTER=1      = Use the bicubic chroma upsampling filter
//                           (default). // 图像resize的一种算法
// NJ_CHROMA_FILTER=0      = Use simple pixel repetition for chroma upsampling
//                           (bad quality, but faster and less code).


// API
// ===
//
// For API documentation, read the "header section" below.


// EXAMPLE
// =======
//
// A few pages below, you can find an example program that uses NanoJPEG to
// convert JPEG files into PGM or PPM. To compile it, use something like
//     gcc -O3 -D_NJ_EXAMPLE_PROGRAM -o nanojpeg nanojpeg.c
// You may also add -std=c99 -Wall -Wextra -pedantic -Werror, if you want 


///////////////////////////////////////////////////////////////////////////////
// HEADER SECTION                                                            //
// copy and pase this into nanojpeg.h if you want                            //
///////////////////////////////////////////////////////////////////////////////

#ifndef _NANOJPEG_H
#define _NANOJPEG_H

// nj_result_t: Result codes for njDecode().
typedef enum _nj_result {
    NJ_OK = 0,        // no error, decoding successful
    NJ_NO_JPEG,       // not a JPEG file
    NJ_UNSUPPORTED,   // unsupported format
    NJ_OUT_OF_MEM,    // out of memory
    NJ_INTERNAL_ERR,  // internal error
    NJ_SYNTAX_ERROR,  // syntax error
    __NJ_FINISHED,    // used internally, will never be reported
} nj_result_t;

// njInit: Initialize NanoJPEG.
// For safety reasons, this should be called at least one time before using
// using any of the other NanoJPEG functions.
void njInit(void);

// njDecode: Decode a JPEG image.
// Decodes a memory dump of a JPEG file into internal buffers.
// Parameters:
//   jpeg = The pointer to the memory dump.
//   size = The size of the JPEG file.
// Return value: The error code in case of failure, or NJ_OK (zero) on success.
nj_result_t njDecode(const void* jpeg, const int size);

// njGetWidth: Return the width (in pixels) of the most recently decoded
// image. If njDecode() failed, the result of njGetWidth() is undefined.
int njGetWidth(void);

// njGetHeight: Return the height (in pixels) of the most recently decoded
// image. If njDecode() failed, the result of njGetHeight() is undefined.
int njGetHeight(void);

// njIsColor: Return 1 if the most recently decoded image is a color image
// (RGB) or 0 if it is a grayscale image. If njDecode() failed, the result
// of njGetWidth() is undefined.
int njIsColor(void);

// njGetImage: Returns the decoded image data.
// Returns a pointer to the most recently image. The memory layout it byte-
// oriented, top-down, without any padding between lines. Pixels of color
// images will be stored as three consecutive bytes for the red, green and
// blue channels. This data format is thus compatible with the PGM or PPM
// file formats and the OpenGL texture formats GL_LUMINANCE8 or GL_RGB8.
// If njDecode() failed, the result of njGetImage() is undefined.
unsigned char* njGetImage(void);

// njGetImageSize: Returns the size (in bytes) of the image data returned
// by njGetImage(). If njDecode() failed, the result of njGetImageSize() is
// undefined.
int njGetImageSize(void);

// njDone: Uninitialize NanoJPEG.
// Resets NanoJPEG's internal state and frees all memory that has been
// allocated at run-time by NanoJPEG. It is still possible to decode another
// image after a njDone() call.
void njDone(void);

#endif//_NANOJPEG_H


///////////////////////////////////////////////////////////////////////////////
// CONFIGURATION SECTION                                                     //
// adjust the default settings for the NJ_ defines here                      //
///////////////////////////////////////////////////////////////////////////////

#ifndef NJ_USE_LIBC
    #define NJ_USE_LIBC 1
#endif

#ifndef NJ_USE_WIN32
  #ifdef _MSC_VER
    #define NJ_USE_WIN32 (!NJ_USE_LIBC)
  #else
    #define NJ_USE_WIN32 0
  #endif
#endif

#ifndef NJ_CHROMA_FILTER
    #define NJ_CHROMA_FILTER 1
#endif


///////////////////////////////////////////////////////////////////////////////
// EXAMPLE PROGRAM                                                           //
// just define _NJ_EXAMPLE_PROGRAM to compile this (requires NJ_USE_LIBC)    //
///////////////////////////////////////////////////////////////////////////////

#ifdef  _NJ_EXAMPLE_PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char* argv[]) {
    int size;
    char *buf;
    FILE *f;

    if (argc < 2) {
        printf("Usage: %s <input.jpg> [<output.ppm>]\n", argv[0]);
        return 2;
    }
    f = fopen(argv[1], "rb");
    if (!f) {
        printf("Error opening the input file.\n");
        return 1;
    }
    fseek(f, 0, SEEK_END);
    size = (int) ftell(f); // 字节
    buf = malloc(size);
    fseek(f, 0, SEEK_SET);
    size = (int) fread(buf, 1, size, f); // 读取整个文件内容到buf
    fclose(f);

    njInit(); // 初始化nj_context_t
    if (njDecode(buf, size)) {
        printf("Error decoding the input file.\n");
        return 1;
    }

    f = fopen((argc > 2) ? argv[2] : (njIsColor() ? "nanojpeg_out.ppm" : "nanojpeg_out.pgm"), "wb");
    if (!f) {
        printf("Error opening the output file.\n");
        return 1;
    }
    fprintf(f, "P%d\n%d %d\n255\n", njIsColor() ? 6 : 5, njGetWidth(), njGetHeight());
    fwrite(njGetImage(), 1, njGetImageSize(), f);
    fclose(f);
    njDone();
    return 0;
}

#endif

// 解释什么是stride http://msdn.microsoft.com/en-us/library/windows/desktop/aa473780(v=vs.85).aspx

///////////////////////////////////////////////////////////////////////////////
// IMPLEMENTATION SECTION                                                    //
// you may stop reading here                                                 //
///////////////////////////////////////////////////////////////////////////////

#ifndef _NJ_INCLUDE_HEADER_ONLY

#ifdef _MSC_VER
    #define NJ_INLINE static __inline
    #define NJ_FORCE_INLINE static __forceinline
#else
    #define NJ_INLINE static inline
    #define NJ_FORCE_INLINE static inline
#endif

#if NJ_USE_LIBC
    #include <stdlib.h>
    #include <string.h>
    #define njAllocMem malloc
    #define njFreeMem  free
    #define njFillMem  memset
    #define njCopyMem  memcpy
#elif NJ_USE_WIN32
    #include <windows.h>
    #define njAllocMem(size) ((void*) LocalAlloc(LMEM_FIXED, (SIZE_T)(size)))
    #define njFreeMem(block) ((void) LocalFree((HLOCAL) block))
    NJ_INLINE void njFillMem(void* block, unsigned char value, int count) { __asm {
        mov edi, block
        mov al, value
        mov ecx, count
        rep stosb
    } }
    NJ_INLINE void njCopyMem(void* dest, const void* src, int count) { __asm {
        mov edi, dest
        mov esi, src
        mov ecx, count
        rep movsb
    } }
#else
    extern void* njAllocMem(int size);
    extern void njFreeMem(void* block);
    extern void njFillMem(void* block, unsigned char byte, int size);
    extern void njCopyMem(void* dest, const void* src, int size);
#endif

typedef struct _nj_code {
    unsigned char bits, code;
} nj_vlc_code_t;

typedef struct _nj_cmp {
    int cid;
    int ssx, ssy; // 水平/垂直因子
    int width, height;
    int stride;
    int qtsel; // Quantization Table量化表
    int actabsel, dctabsel; // AC/DC Huffman Table
    int dcpred;
    unsigned char *pixels;
} nj_component_t; // 颜色分量

typedef struct _nj_ctx {
    nj_result_t error;
    const unsigned char *pos; // 待解码数据指针(按字节来)
    int size; // 整个数据的长度
    int length; // 某一个marker内容的长度
    int width, height; // 图片宽和高度
    int mbwidth, mbheight; // MCU水平/垂直个数
    int mbsizex, mbsizey; // MCU宽/高
    int ncomp; // 颜色分量数
    nj_component_t comp[3]; // YCbCr
    int qtused, qtavail; // 这两个目前看不出来很大用处
    unsigned char qtab[4][64]; // 但是目前似乎只有2个
    nj_vlc_code_t vlctab[4][65536]; // 构造所有16位数的Huffman基数
									// 目前基本上是4个(直/交/0/1)
    int buf, bufbits; // 这是用来做什么的 buf是存放内容的 bufbits是计数器，存放了多少个bits
    int block[64];
    int rstinterval;
    unsigned char *rgb; // 解析出来的RGB所要占用的内存 // 每1个点包含3个字节，按找RGB的顺序
} nj_context_t;

static nj_context_t nj;

static const char njZZ[64] = { 0, 1, 8, 16, 9, 2, 3, 10, 17, 24, 32, 25, 18,
11, 4, 5, 12, 19, 26, 33, 40, 48, 41, 34, 27, 20, 13, 6, 7, 14, 21, 28, 35,
42, 49, 56, 57, 50, 43, 36, 29, 22, 15, 23, 30, 37, 44, 51, 58, 59, 52, 45,
38, 31, 39, 46, 53, 60, 61, 54, 47, 55, 62, 63 };

/*
0   1   2   3   4   5   6   7

8   9   10  11  12  13  14  15

16  17  18  19  20  21  22  23

24  25  26  27  28  29  30  31

32  33  34  35  36  37  38  39

40  41  42  43  44  45  46  47

48  49  50  51  52  53  54  55

56  57  58  59  60  61  62  63
*/

NJ_FORCE_INLINE unsigned char njClip(const int x) { // 限定范围是0 ~ 255之间
    return (x < 0) ? 0 : ((x > 0xFF) ? 0xFF : (unsigned char) x);
}

#define W1 2841
#define W2 2676
#define W3 2408
#define W5 1609
#define W6 1108
#define W7 565

NJ_INLINE void njRowIDCT(int* blk) { // 按行来操作的 0 ~ 7 // 8 ~ 15
    int x0, x1, x2, x3, x4, x5, x6, x7, x8;
    if (!((x1 = blk[4] << 11)
        | (x2 = blk[6])
        | (x3 = blk[2])
        | (x4 = blk[1])
        | (x5 = blk[7])
        | (x6 = blk[5])
        | (x7 = blk[3])))
    {
        blk[0] = blk[1] = blk[2] = blk[3] = blk[4] = blk[5] = blk[6] = blk[7] = blk[0] << 3;
        return;
    }
    x0 = (blk[0] << 11) + 128;
    x8 = W7 * (x4 + x5);
    x4 = x8 + (W1 - W7) * x4;
    x5 = x8 - (W1 + W7) * x5;
    x8 = W3 * (x6 + x7);
    x6 = x8 - (W3 - W5) * x6;
    x7 = x8 - (W3 + W5) * x7;
    x8 = x0 + x1;
    x0 -= x1;
    x1 = W6 * (x3 + x2);
    x2 = x1 - (W2 + W6) * x2;
    x3 = x1 + (W2 - W6) * x3;
    x1 = x4 + x6;
    x4 -= x6;
    x6 = x5 + x7;
    x5 -= x7;
    x7 = x8 + x3;
    x8 -= x3;
    x3 = x0 + x2;
    x0 -= x2;
    x2 = (181 * (x4 + x5) + 128) >> 8;
    x4 = (181 * (x4 - x5) + 128) >> 8;
    blk[0] = (x7 + x1) >> 8;
    blk[1] = (x3 + x2) >> 8;
    blk[2] = (x0 + x4) >> 8;
    blk[3] = (x8 + x6) >> 8;
    blk[4] = (x8 - x6) >> 8;
    blk[5] = (x0 - x4) >> 8;
    blk[6] = (x3 - x2) >> 8;
    blk[7] = (x7 - x1) >> 8;
}

NJ_INLINE void njColIDCT(const int* blk, unsigned char *out, int stride) {
    int x0, x1, x2, x3, x4, x5, x6, x7, x8;
    if (!((x1 = blk[8*4] << 8)
        | (x2 = blk[8*6])
        | (x3 = blk[8*2])
        | (x4 = blk[8*1])
        | (x5 = blk[8*7])
        | (x6 = blk[8*5])
        | (x7 = blk[8*3])))
    {
        x1 = njClip(((blk[0] + 32) >> 6) + 128);
        for (x0 = 8;  x0;  --x0) {
            *out = (unsigned char) x1;
            out += stride;
        }
        return;
    }
    x0 = (blk[0] << 8) + 8192;
    x8 = W7 * (x4 + x5) + 4;
    x4 = (x8 + (W1 - W7) * x4) >> 3;
    x5 = (x8 - (W1 + W7) * x5) >> 3;
    x8 = W3 * (x6 + x7) + 4;
    x6 = (x8 - (W3 - W5) * x6) >> 3;
    x7 = (x8 - (W3 + W5) * x7) >> 3;
    x8 = x0 + x1;
    x0 -= x1;
    x1 = W6 * (x3 + x2) + 4;
    x2 = (x1 - (W2 + W6) * x2) >> 3;
    x3 = (x1 + (W2 - W6) * x3) >> 3;
    x1 = x4 + x6;
    x4 -= x6;
    x6 = x5 + x7;
    x5 -= x7;
    x7 = x8 + x3;
    x8 -= x3;
    x3 = x0 + x2;
    x0 -= x2;
    x2 = (181 * (x4 + x5) + 128) >> 8; // Y，Cb和Cr的值都范围都是-128 ~ 127，并且在FDCT的时候有先减去128，所以现在要IDCT之后再加上128
    x4 = (181 * (x4 - x5) + 128) >> 8;
    *out = njClip(((x7 + x1) >> 14) + 128);  out += stride;
    *out = njClip(((x3 + x2) >> 14) + 128);  out += stride;
    *out = njClip(((x0 + x4) >> 14) + 128);  out += stride;
    *out = njClip(((x8 + x6) >> 14) + 128);  out += stride;
    *out = njClip(((x8 - x6) >> 14) + 128);  out += stride;
    *out = njClip(((x0 - x4) >> 14) + 128);  out += stride;
    *out = njClip(((x3 - x2) >> 14) + 128);  out += stride;
    *out = njClip(((x7 - x1) >> 14) + 128);
}

#define njThrow(e) do { nj.error = e; return; } while (0)
#define njCheckError() do { if (nj.error) return; } while (0)

static int njShowBits(int bits) { // 能放得下大于32位的值么？
    unsigned char newbyte;
    if (!bits) return 0;
    while (nj.bufbits < bits) { // 也就是说要buf的位数小于已经buf的位数的时候，就直接读出来？
        if (nj.size <= 0) {
            nj.buf = (nj.buf << 8) | 0xFF;
            nj.bufbits += 8;
            continue;
        }
        newbyte = *nj.pos++; // 数据指针是按字节
        nj.size--;
        nj.bufbits += 8;
        nj.buf = (nj.buf << 8) | newbyte; // 高位最终会被覆盖掉，比如我要buf一个64位的值怎么办？
        if (newbyte == 0xFF) {
            if (nj.size) {
                unsigned char marker = *nj.pos++;
                nj.size--;
                switch (marker) {
                    case 0x00:
                    case 0xFF:
                        break;
                    case 0xD9: nj.size = 0; break;
                    default:
                        if ((marker & 0xF8) != 0xD0)
                            nj.error = NJ_SYNTAX_ERROR;
                        else {
                            nj.buf = (nj.buf << 8) | marker;
                            nj.bufbits += 8;
                        }
                }
            } else
                nj.error = NJ_SYNTAX_ERROR;
        }
    }
    return (nj.buf >> (nj.bufbits - bits)) & ((1 << bits) - 1);
}

NJ_INLINE void njSkipBits(int bits) {
    if (nj.bufbits < bits)
        (void) njShowBits(bits);
    nj.bufbits -= bits;
}

NJ_INLINE int njGetBits(int bits) {
    int res = njShowBits(bits);
    njSkipBits(bits);
    return res;
}

NJ_INLINE void njByteAlign(void) {
    nj.bufbits &= 0xF8; // (1111 1000)8的倍数，不满8的部分丢弃
}

static void njSkip(int count) {
    nj.pos += count; // 数据指针增加
    nj.size -= count; // 总体数据大小减去count
    nj.length -= count; // 当前marker长度减去count
    if (nj.size < 0) nj.error = NJ_SYNTAX_ERROR;
}

NJ_INLINE unsigned short njDecode16(const unsigned char *pos) {
    return (pos[0] << 8) | pos[1]; // 00000000 00001101
}

static void njDecodeLength(void) { // decode长度字段，这个方法调用一般都是已经进入到特定的marker之后
    if (nj.size < 2) njThrow(NJ_SYNTAX_ERROR);
    nj.length = njDecode16(nj.pos); // 该marker的长度(除去marker名字所占用的2个字节)
    if (nj.length > nj.size) njThrow(NJ_SYNTAX_ERROR);
    njSkip(2);
}

NJ_INLINE void njSkipMarker(void) {
    njDecodeLength();
    njSkip(nj.length);
}

NJ_INLINE void njDecodeSOF(void) { // 解析Start of Frame的时候就会把所需要的内存都分配好
    int i, ssxmax = 0, ssymax = 0;
    nj_component_t* c;
    njDecodeLength(); // 解析长度并移动数据指针
    if (nj.length < 9) njThrow(NJ_SYNTAX_ERROR);
    if (nj.pos[0] != 8) njThrow(NJ_UNSUPPORTED); // 样本精度，一般都是8
    nj.height = njDecode16(nj.pos + 1); // 图片高度/宽度
    nj.width = njDecode16(nj.pos + 3);
    nj.ncomp = nj.pos[5]; // 颜色分量数据，一般都是3
    njSkip(6); // 之前共6个字节数据，所以移动数据指针6个字节
    switch (nj.ncomp) { // 目前只支持1和3这两种
        case 1:
        case 3:
            break;
        default:
            njThrow(NJ_UNSUPPORTED);
    }
    if (nj.length < (nj.ncomp * 3)) njThrow(NJ_SYNTAX_ERROR); // 数据量肯定是要大于颜色分量数 multiply 3，因为接着存颜色分量信息的每个结构占3个字节
															  // 颜色分量ID占用1个字节，水平/垂直因子占用1个字节(高4位水平，低4位垂直)，量化表占用1个字节
    for (i = 0, c = nj.comp;  i < nj.ncomp;  ++i, ++c) {
        c->cid = nj.pos[0]; // 颜色分量ID
        if (!(c->ssx = nj.pos[1] >> 4)) njThrow(NJ_SYNTAX_ERROR); // 高4位(水平因子)
        if (c->ssx & (c->ssx - 1)) njThrow(NJ_UNSUPPORTED);  // non-power of two
        if (!(c->ssy = nj.pos[1] & 15)) njThrow(NJ_SYNTAX_ERROR); // (00001111)低4位(垂直因子)
        if (c->ssy & (c->ssy - 1)) njThrow(NJ_UNSUPPORTED);  // non-power of two
        if ((c->qtsel = nj.pos[2]) & 0xFC) njThrow(NJ_SYNTAX_ERROR); // (11111101) 这里0xFC是用在这里干什么的？
        njSkip(3); // 移动数据指针到下一个颜色分量
        nj.qtused |= 1 << c->qtsel; // 这里是做什么用的？看不出来
        if (c->ssx > ssxmax) ssxmax = c->ssx; // 记录最大水平因子
        if (c->ssy > ssymax) ssymax = c->ssy; // 记录最大垂直因子
    }
    if (nj.ncomp == 1) { // 只有一种颜色分量的时候就简单啦
        c = nj.comp;
        c->ssx = c->ssy = ssxmax = ssymax = 1;
    }
    nj.mbsizex = ssxmax << 3; // MCU宽 是 水平采样因子最大值 multiply 8
    nj.mbsizey = ssymax << 3; // MCU高 是 垂直采样因子最大值 multiply 8
    nj.mbwidth = (nj.width + nj.mbsizex - 1) / nj.mbsizex; // 分子采用+ nj.mbsizex - 1就取到大于但是最接近(等于)宽度的值，
														   // 并且这个值是MCU宽度整数倍 // 这里是水平方向MCU的个数
    nj.mbheight = (nj.height + nj.mbsizey - 1) / nj.mbsizey; // 这里是垂直方向MCU的个数
    for (i = 0, c = nj.comp;  i < nj.ncomp;  ++i, ++c) {
        c->width = (nj.width * c->ssx + ssxmax - 1) / ssxmax; // 采样宽度？ 最大水平/垂直因子的值就是图片原来的值，否则就会根据因子做相应的减少
        c->stride = (c->width + 7) & 0x7FFFFFF8; // (0111 1111 1111 1111 1111 1111 1111 1000) 做什么？以1234567结尾的都省略掉？
												 // 变成8的整数
												 // 补齐8位，注意前面有加7，所以总是不会比原来的少，比如原来是227，那么这里就会变成232
												 // 这是按照数据单元计算的，所以不对
		printf("%d, stride %d\n", i, c->stride);
        c->height = (nj.height * c->ssy + ssymax - 1) / ssymax;
        c->stride = nj.mbwidth * nj.mbsizex * c->ssx / ssxmax; // 再计算一遍stride有什么用？前面计算的是错误的，没有考虑MCU宽度
															   // 这里都已经是round过的了，所以直接计算
		printf("%d, stride again %d\n", i, c->stride);
        if (((c->width < 3) && (c->ssx != ssxmax)) || ((c->height < 3) && (c->ssy != ssymax))) njThrow(NJ_UNSUPPORTED);
        if (!(c->pixels = njAllocMem(c->stride * (nj.mbheight * nj.mbsizey * c->ssy / ssymax)))) njThrow(NJ_OUT_OF_MEM); // 为分量分配内存
																														 // 大小是所有MCU的
																														 // 可能比图片实际
																														 // 尺寸大
    }
    if (nj.ncomp == 3) { // 只有有3个颜色分量的时候才需要
        nj.rgb = njAllocMem(nj.width * nj.height * nj.ncomp);
        if (!nj.rgb) njThrow(NJ_OUT_OF_MEM);
    }
    njSkip(nj.length);
}

NJ_INLINE void njDecodeDHT(void) {
    int codelen, currcnt, remain, spread, i, j;
    nj_vlc_code_t *vlc;
    static unsigned char counts[16]; // 码字
    njDecodeLength();
    while (nj.length >= 17) { // 码字的数量(16) + 类型和ID(1)
        i = nj.pos[0]; // 类型和ID
        if (i & 0xEC) njThrow(NJ_SYNTAX_ERROR); // (11101100)
        if (i & 0x02) njThrow(NJ_UNSUPPORTED); // (00000010)
        i = (i | (i >> 3)) & 3;  // combined DC/AC + tableid value
								 // 直流0，直流1，交流0，交流1
        for (codelen = 1;  codelen <= 16;  ++codelen) // 码字长度
            counts[codelen - 1] = nj.pos[codelen]; // 读取码字
        njSkip(17);
        vlc = &nj.vlctab[i][0];
        remain = spread = 65536;
        for (codelen = 1;  codelen <= 16;  ++codelen) {
            spread >>= 1; // 干什么？
            currcnt = counts[codelen - 1];
            if (!currcnt) continue; // 如果该位数没有码字
            if (nj.length < currcnt) njThrow(NJ_SYNTAX_ERROR);
            remain -= currcnt << (16 - codelen);
            if (remain < 0) njThrow(NJ_SYNTAX_ERROR);
            for (i = 0;  i < currcnt;  ++i) { // 码字个数，同样位数的码字可以有多个
                register unsigned char code = nj.pos[i];
                for (j = spread;  j;  --j) { // 保存这么多个有什么作用？
                    vlc->bits = (unsigned char) codelen; // 码字位数
                    vlc->code = code; // 码字值
                    ++vlc;
                }
            }
            njSkip(currcnt);
        }
        while (remain--) {
            vlc->bits = 0;
            ++vlc;
        }
    }
    if (nj.length) njThrow(NJ_SYNTAX_ERROR);
}

NJ_INLINE void njDecodeDQT(void) {
    int i;
    unsigned char *t;
    njDecodeLength();
    while (nj.length >= 65) {
        i = nj.pos[0]; // QT信息，高4位为QT精度，低4位为QT号
        if (i & 0xFC) njThrow(NJ_SYNTAX_ERROR); // (1111 1110)这个用来检测QT号码是否正确的吗？目前精度好像都为0，所以这么写？
        nj.qtavail |= 1 << i; // XXX 直接通过这里转换为数量？
        t = &nj.qtab[i][0];
        for (i = 0;  i < 64;  ++i)
            t[i] = nj.pos[i + 1]; // 读取到QT数组当中，但应该还是按照文件流当中的排列
        njSkip(65);
    }
    if (nj.length) njThrow(NJ_SYNTAX_ERROR);
}

NJ_INLINE void njDecodeDRI(void) {
    njDecodeLength();
    if (nj.length < 2) njThrow(NJ_SYNTAX_ERROR);
    nj.rstinterval = njDecode16(nj.pos);
    njSkip(nj.length);
}

static int njGetVLC(nj_vlc_code_t* vlc, unsigned char* code) { // Variable Length Coding
    int value = njShowBits(16);
    int bits = vlc[value].bits;
    if (!bits) { nj.error = NJ_SYNTAX_ERROR; return 0; }
    njSkipBits(bits);
    value = vlc[value].code;
    if (code) *code = (unsigned char) value;
    bits = value & 15;
    if (!bits) return 0;
    value = njGetBits(bits);
    if (value < (1 << (bits - 1)))
        value += ((-1) << bits) + 1;
    return value;
}

NJ_INLINE void njDecodeBlock(nj_component_t* c, unsigned char* out) {
    unsigned char code = 0;
    int value, coef = 0;
    njFillMem(nj.block, 0, sizeof(nj.block));
    c->dcpred += njGetVLC(&nj.vlctab[c->dctabsel][0], NULL); // DC 0/1 不会和AC重复
    nj.block[0] = (c->dcpred) * nj.qtab[c->qtsel][0]; // DC // 这里是反量化？
    do {
        value = njGetVLC(&nj.vlctab[c->actabsel][0], &code); // DC 2/3
        if (!code) break;  // EOB
        if (!(code & 0x0F) && (code != 0xF0)) njThrow(NJ_SYNTAX_ERROR);
        coef += (code >> 4) + 1; // coefficient 系数
        if (coef > 63) njThrow(NJ_SYNTAX_ERROR);
        nj.block[(int) njZZ[coef]] = value * nj.qtab[c->qtsel][coef]; // AC 这里是反量化？
    } while (coef < 63);
    for (coef = 0;  coef < 64;  coef += 8)
        njRowIDCT(&nj.block[coef]); // 上面先Huffman解码/反量化，这里行(反DCT)
    for (coef = 0;  coef < 8;  ++coef)
        njColIDCT(&nj.block[coef], &out[coef], c->stride);
}

NJ_INLINE void njDecodeScan(void) {
    int i, mbx, mby, sbx, sby;
    int rstcount = nj.rstinterval, nextrst = 0;
    nj_component_t* c;
    njDecodeLength();
    if (nj.length < (4 + 2 * nj.ncomp)) njThrow(NJ_SYNTAX_ERROR);
    if (nj.pos[0] != nj.ncomp) njThrow(NJ_UNSUPPORTED);
    njSkip(1); // 颜色分量数量
    for (i = 0, c = nj.comp;  i < nj.ncomp;  ++i, ++c) {
        if (nj.pos[0] != c->cid) njThrow(NJ_SYNTAX_ERROR); // 颜色分量ID
        if (nj.pos[1] & 0xEE) njThrow(NJ_SYNTAX_ERROR);
        c->dctabsel = nj.pos[1] >> 4; // 高4位为直流表DC Table
        c->actabsel = (nj.pos[1] & 1) | 2; // 低4位为交流表AC Table(这里有做特殊处理，所以AC的表名不会和DC相同)

		printf("DC/AC Huffman table ids: %d/%d\n", c->dctabsel, c->actabsel);	

        njSkip(2);
    }
    if (nj.pos[0] || (nj.pos[1] != 63) || nj.pos[2]) njThrow(NJ_UNSUPPORTED);
    njSkip(nj.length); // 忽略3个字节 通常为 00 3F 00
					   // 2 + 1 + 6 + 3为12字节，这个marker的长度刚好为12字节
					   // 接下来都是编码过的图像数据
    for (mbx = mby = 0;;) {
        for (i = 0, c = nj.comp;  i < nj.ncomp;  ++i, ++c) // 每个分量都要decode
            for (sby = 0;  sby < c->ssy;  ++sby) // 水平/垂直因子
                for (sbx = 0;  sbx < c->ssx;  ++sbx) {
                    njDecodeBlock(c, &c->pixels[((mby * c->ssy + sby) * c->stride + mbx * c->ssx + sbx) << 3]); // 读取原始编码过
																												// 的图片数据到block中
																												// 并反量化，反离散余弦变换
                    njCheckError();
                }
        if (++mbx >= nj.mbwidth) { // 读完所有的MCU，到达最右就返回从下一行开始
            mbx = 0;
            if (++mby >= nj.mbheight) break; // 到达最底行的时候推出，decode结束
        }
        if (nj.rstinterval && !(--rstcount)) { // restart marker
            njByteAlign();
            i = njGetBits(16);
            if (((i & 0xFFF8) != 0xFFD0) || ((i & 7) != nextrst)) njThrow(NJ_SYNTAX_ERROR);
            nextrst = (nextrst + 1) & 7;
            rstcount = nj.rstinterval;
            for (i = 0;  i < 3;  ++i)
                nj.comp[i].dcpred = 0;
        }
    }
    nj.error = __NJ_FINISHED;
}

#if NJ_CHROMA_FILTER

#define CF4A (-9)
#define CF4B (111)
#define CF4C (29)
#define CF4D (-3)
#define CF3A (28)
#define CF3B (109)
#define CF3C (-9)
#define CF3X (104)
#define CF3Y (27)
#define CF3Z (-3)
#define CF2A (139)
#define CF2B (-11)
#define CF(x) njClip(((x) + 64) >> 7)

// 通常我们放大图片的时候就需要upsampling，缩小的时候就downsampling，通称为resampling
// 这里Cb/Cr分量的会少些，所以需要upsampling

NJ_INLINE void njUpsampleH(nj_component_t* c) {
	printf("njUpsampleH %d\n", c->cid);
    const int xmax = c->width - 3;
    unsigned char *out, *lin, *lout;
    int x, y;
    out = njAllocMem((c->width * c->height) << 1);
    if (!out) njThrow(NJ_OUT_OF_MEM);
    lin = c->pixels;
    lout = out;
    for (y = c->height;  y;  --y) {
        lout[0] = CF(CF2A * lin[0] + CF2B * lin[1]);
        lout[1] = CF(CF3X * lin[0] + CF3Y * lin[1] + CF3Z * lin[2]);
        lout[2] = CF(CF3A * lin[0] + CF3B * lin[1] + CF3C * lin[2]);
        for (x = 0;  x < xmax;  ++x) {
            lout[(x << 1) + 3] = CF(CF4A * lin[x] + CF4B * lin[x + 1] + CF4C * lin[x + 2] + CF4D * lin[x + 3]);
            lout[(x << 1) + 4] = CF(CF4D * lin[x] + CF4C * lin[x + 1] + CF4B * lin[x + 2] + CF4A * lin[x + 3]);
        }
        lin += c->stride;
        lout += c->width << 1;
        lout[-3] = CF(CF3A * lin[-1] + CF3B * lin[-2] + CF3C * lin[-3]);
        lout[-2] = CF(CF3X * lin[-1] + CF3Y * lin[-2] + CF3Z * lin[-3]);
        lout[-1] = CF(CF2A * lin[-1] + CF2B * lin[-2]);
    }
    c->width <<= 1;
    c->stride = c->width;
    njFreeMem(c->pixels);
    c->pixels = out;
}

NJ_INLINE void njUpsampleV(nj_component_t* c) {
	printf("njUpsampleV %d\n", c->cid);
    const int w = c->width, s1 = c->stride, s2 = s1 + s1;
    unsigned char *out, *cin, *cout;
    int x, y;
    out = njAllocMem((c->width * c->height) << 1);
    if (!out) njThrow(NJ_OUT_OF_MEM);
    for (x = 0;  x < w;  ++x) {
        cin = &c->pixels[x];
        cout = &out[x];
        *cout = CF(CF2A * cin[0] + CF2B * cin[s1]);  cout += w;
        *cout = CF(CF3X * cin[0] + CF3Y * cin[s1] + CF3Z * cin[s2]);  cout += w;
        *cout = CF(CF3A * cin[0] + CF3B * cin[s1] + CF3C * cin[s2]);  cout += w;
        cin += s1;
        for (y = c->height - 3;  y;  --y) {
            *cout = CF(CF4A * cin[-s1] + CF4B * cin[0] + CF4C * cin[s1] + CF4D * cin[s2]);  cout += w;
            *cout = CF(CF4D * cin[-s1] + CF4C * cin[0] + CF4B * cin[s1] + CF4A * cin[s2]);  cout += w;
            cin += s1;
        }
        cin += s1;
        *cout = CF(CF3A * cin[0] + CF3B * cin[-s1] + CF3C * cin[-s2]);  cout += w;
        *cout = CF(CF3X * cin[0] + CF3Y * cin[-s1] + CF3Z * cin[-s2]);  cout += w;
        *cout = CF(CF2A * cin[0] + CF2B * cin[-s1]);
    }
    c->height <<= 1;
    c->stride = c->width;
    njFreeMem(c->pixels);
    c->pixels = out;
}

#else

NJ_INLINE void njUpsample(nj_component_t* c) {
	printf("njUpsample %d\n", c->cid);
    int x, y, xshift = 0, yshift = 0;
    unsigned char *out, *lin, *lout;
    while (c->width < nj.width) { c->width <<= 1; ++xshift; }
    while (c->height < nj.height) { c->height <<= 1; ++yshift; }
    out = njAllocMem(c->width * c->height); // 放大后的尺寸
    if (!out) njThrow(NJ_OUT_OF_MEM);
    lin = c->pixels;
    lout = out;
    for (y = 0;  y < c->height;  ++y) {
        lin = &c->pixels[(y >> yshift) * c->stride];
        for (x = 0;  x < c->width;  ++x)
            lout[x] = lin[x >> xshift];
        lout += c->width;
    }
    c->stride = c->width;
    njFreeMem(c->pixels);
    c->pixels = out;
}

#endif

NJ_INLINE void njConvert() {
    int i;
    nj_component_t* c;
    for (i = 0, c = nj.comp;  i < nj.ncomp;  ++i, ++c) { // 如果需要的话就upsampling
        #if NJ_CHROMA_FILTER
            while ((c->width < nj.width) || (c->height < nj.height)) {
                if (c->width < nj.width) njUpsampleH(c);
                njCheckError();
                if (c->height < nj.height) njUpsampleV(c);
                njCheckError();
            }
        #else
            if ((c->width < nj.width) || (c->height < nj.height))
                njUpsample(c);
        #endif
        if ((c->width < nj.width) || (c->height < nj.height)) njThrow(NJ_INTERNAL_ERR);
    }
    if (nj.ncomp == 3) { // SEE njGetImage()
        // convert to RGB
        int x, yy;
        unsigned char *prgb = nj.rgb;
        const unsigned char *py  = nj.comp[0].pixels;
        const unsigned char *pcb = nj.comp[1].pixels;
        const unsigned char *pcr = nj.comp[2].pixels;
		// 多余的数据(编/解码是对齐用的)会被丢弃吗？
        for (yy = nj.height;  yy;  --yy) { // 列
            for (x = 0;  x < nj.width;  ++x) { // 行
                register int y = py[x] << 8; // 这是为什么？ 色彩空间转换公式计算需要
                register int cb = pcb[x] - 128; // YCbCr的Cb和Cr一般都是有符号数，但是在JPEG当中都是无符号数
                register int cr = pcr[x] - 128;
                *prgb++ = njClip((y            + 359 * cr + 128) >> 8); // 色彩空间转换，YCbCr到RGB
                *prgb++ = njClip((y -  88 * cb - 183 * cr + 128) >> 8);
                *prgb++ = njClip((y + 454 * cb            + 128) >> 8);
            }
            py += nj.comp[0].stride; // 移动YCbCr数据指针，每一行都是有stride的，所以当需要的数据都得到时，后面的就不管，直接丢弃，移动到下一行
            pcb += nj.comp[1].stride;
            pcr += nj.comp[2].stride;
        }
    } else if (nj.comp[0].width != nj.comp[0].stride) { // 如果宽度和stride都一样，什么都不用做
        // grayscale -> only remove stride
        unsigned char *pin = &nj.comp[0].pixels[nj.comp[0].stride];
        unsigned char *pout = &nj.comp[0].pixels[nj.comp[0].width];
        int y;
        for (y = nj.comp[0].height - 1;  y;  --y) {
            njCopyMem(pout, pin, nj.comp[0].width);
            pin += nj.comp[0].stride;
            pout += nj.comp[0].width;
        }
        nj.comp[0].stride = nj.comp[0].width;
    }
}

void njInit(void) {
    njFillMem(&nj, 0, sizeof(nj_context_t)); // 初始化nj_context_t
}

void njDone(void) {
    int i;
    for (i = 0;  i < 3;  ++i)
        if (nj.comp[i].pixels) njFreeMem((void*) nj.comp[i].pixels);
    if (nj.rgb) njFreeMem((void*) nj.rgb);
    njInit();
}

nj_result_t njDecode(const void* jpeg, const int size) {
    njDone();
    nj.pos = (const unsigned char*) jpeg;
    nj.size = size & 0x7FFFFFFF; // ？
    if (nj.size < 2) return NJ_NO_JPEG;
    if ((nj.pos[0] ^ 0xFF) | (nj.pos[1] ^ 0xD8)) return NJ_NO_JPEG; // 不以0xFFD8打头(为什么要用异或来判断？)
    njSkip(2);
    while (!nj.error) { // 有“错误”的时候离开
        if ((nj.size < 2) || (nj.pos[0] != 0xFF)) return NJ_SYNTAX_ERROR; // 太小，或者不以0xFF打头
        njSkip(2); // 移动到标签的后面(长度字段的前面)
        switch (nj.pos[-1]) {
            case 0xC0: njDecodeSOF();  break;
            case 0xC4: njDecodeDHT();  break;
            case 0xDB: njDecodeDQT();  break;
            case 0xDD: njDecodeDRI();  break;
            case 0xDA: njDecodeScan(); break;
            case 0xFE: njSkipMarker(); break;
            default:
                if ((nj.pos[-1] & 0xF0) == 0xE0) // JPG0和APP0字段，目前都忽略
                    njSkipMarker();
                else
                    return NJ_UNSUPPORTED;
        }
    }
    if (nj.error != __NJ_FINISHED) return nj.error;
    nj.error = NJ_OK;
    njConvert();
    return nj.error;
}

int njGetWidth(void)            { return nj.width; }
int njGetHeight(void)           { return nj.height; }
int njIsColor(void)             { return (nj.ncomp != 1); }
unsigned char* njGetImage(void) { return (nj.ncomp == 1) ? nj.comp[0].pixels : nj.rgb; } // 一/三个分量
int njGetImageSize(void)        { return nj.width * nj.height * nj.ncomp; }

#endif // _NJ_INCLUDE_HEADER_ONLY

JPEG学习笔记

阅读资料:
JPEG File Interchange Format Version 1.02
JEITA CP-3451 Exif Version 2.2
DIGITAL COMPRESSION AND CODING OF CONTINUOUS-TONE STILL IMAGES – REQUIREMENTS AND GUIDELINES
The JPEG Still Picture Compression Standard
JPEG 简易文档 V2.15 云风
libjpeg 6b
JPEG 原理详细实例分析及其在嵌入式 Linux 中的应用 http://www.ibm.com/developerworks/cn/linux/l-cn-jpeg/
JPEG文件编/解码详解 http://blog.csdn.net/lpt19832003/article/details/1713718
Huffman 编码压缩算法 http://coolshell.cn/articles/7459.html
A SIMPLE EXAMPLE OF HUFFMAN CODING ON A STRING http://nerdaholyc.com/a-simple-example-of-huffman-coding-on-a-string/

这是一篇自己学习JPEG的笔记，会不断完善，多数是别人的啦，少许自己理解，如有理解错误还请指出！

1、JPEG File格式分为两个部分：标记码和压缩数据
标记码由2个字节组成，以0xFF打头，后面的一个字节根据含义不同而定。每个标记码之前还可以加任意个0xFF来填充，他们没有什么意义，也就是说连续的多个0xFF被解释成一个0xFF。

2、
压缩算法是JPEG
色彩空间是YCbCr
APP0标记码是必须的

字节序都是大端

YCbCr的Cb和Cr一般都是有符号数，但是在JPEG当中都是无符号数，所以目前在JPEG当中的做法是RGB到YCbCr的转换是计算好Cb和Cr之后将其值增加128，再做后续的运算。
YCbCr到RGB的计算是先将Cb和Cr的值减去128后再来做转换。

假设一个数据单元，8 x 8的原始图像如下(转化为YCbCr之后的数据)：

52   55   61   66   70   61   64   73

63   59   55   90   109  85   69   72

62   59   68   113  144  104  66   73

63   58   71   122  154  106  70   69

67   61   68   104  126  88   68   70

79   65   60   70   77   68   58   75

85   71   64   59   55   61   65   83

87   79   69   68   65   76   78   94

3、DCT(Discrete Cosine Transform)
在做FDCT(Forward DCT)变换的时候，Y，Cb和Cr的值都范围都是-128 ~ 127，所以都要被减去128。
适配范围，每个点都减去128：

-76  -73  -67  -62  -58  -67  -64  -55

-65  -69  -73  -38  -19  -43  -59  -56

-66  -69  -60  -15  16   -24  -62  -55

-65  -70  -57  -6   26   -22  -58  -59

-61  -67  -60  -24  -2   -40  -60  -58

-49  -63  -68  -58  -51  -60  -70  -53

-43  -57  -64  -69  -73  -67  -63  -45

-41  -49  -59  -60  -63  -52  -50  -34

使用FDCT并四舍五入取最接近的整数：

-415 -30  -61  27   56   -20  -2   0

4    -22  -61  10   13   -7   -9   5

-47  7    77   -25  -29  10   5    -6

-49  12   34   -15  -10  6    2    2

12   -7   -13  -4   -2   2    -3   3

-8   3    2    -6   -2   1    4    2

-1   0    0    -2   -1   -3   4    -1

0    0    -1   -4   -1   0    1    2

DC(Direct Current)/AC(Alternating Current)
什么是DC？什么是AC？

4、量化：

-26  -3   -6   2    2    -1   0    0

0    -2   -4   1    1    0    0    0

-3   1    5    -1   -1   0    0    0

-4   1    2    -1   0    0    0    0

1    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

0    0    0    0    0    0    0    0

5、Zigzag
Zigzag的好处，在内存当中连续的点在图片上也是相邻的了，而且后面都是连续的0，可以继续使用RLE压缩。
于是就变成：

−26,−3,0,−3,−2,−6,2,−4,1,−4,1,1,5,1,2,−1,1,−1,2,0,0,0,0,0,−1,−1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

实际上我们的DC系数在编码的时候不会算到Zigzag里面，因为DC系数值比较大，并且相邻的两个数据单元的DC差值不会很大，所以JPEG使用了差分脉冲调制编码(DPCM)对相邻两个数据单元之间的DC系数进行差值编码，这是利用了两个数据单元之间的相关性。
对其他的63个AC系数会采用Zigzag编码。

6、RLE(Run Length Coding)
我们来用一个简单的例子来详细说明一下，假设以下是使用Zigzag编码过的63个AC系数数据：
57,45,0,0,0,0,23,0,-30,-16,0,0,1,0,0,0,0,0,0,0,..,0
可以表示为
(0,57) ; (0,45) ; (4,23) ; (1,-30) ; (0,-16) ; (2，1) ; EOB
EOB实际就是结束，也可以用(0,0)表示，如果最后面不是以0结束的就不需要
解释下这个含义，编码后的含义就是说在57之前有0个0，在45之前有0个0，在23之前有4个0，在-30之前有1个0，以此类推。
需要值得注意的是，我们后面还会对这样的数据使用Huffman压缩，但是该算法要求这个表示0的个数的值的位数是4bit，也就是说能保存的值的范围是0 ~ 15。
所以碰到连续大于15个0的情况的时候，我们会拆分(15,0) ; (3,2) ; … ; (15,0) ; (15,0) ; (1,4)，用(15,0)表示连续的16个0，也就是表示：
19个0,2, … ,33个0,4。

7、Canonical Huffman Code
在做好这些工作之后，JPEG还会对数据进行压缩，利用Canonical Huffman Code编码，对出现频率更高的数据采用更短的码字来编码。
这里将数值按照位数分为了16组。

               数值                 组              实际保存值
                0                   0                   -
              -1,1                  1                  0,1
           -3,-2,2,3                2              00,01,10,11
     -7,-6,-5,-4,4,5,6,7            3    000,001,010,011,100,101,110,111
       -15,..,-8,8,..,15            4       0000,..,0111,1000,..,1111
      -31,..,-16,16,..,31           5     00000,..,01111,10000,..,11111
      -63,..,-32,32,..,63           6                   .
     -127,..,-64,64,..,127          7                   .
    -255,..,-128,128,..,255         8                   .
    -511,..,-256,256,..,511         9                   .
   -1023,..,-512,512,..,1023       10                   .
  -2047,..,-1024,1024,..,2047      11                   .
  -4095,..,-2048,2048,..,4095      12                   .
  -8191,..,-4096,4096,..,8191      13                   .
 -16383,..,-8192,8192,..,16383     14                   .
-32767,..,-16384,16384,..,32767    15                   .

之前RLE之后的结果：
(0,57) ; (0,45) ; (4,23) ; (1,-30) ; (0,-16) ; (2，1) ; EOB
现在仅对后面的值进行变换，前面表示0的个数的值不动。

    57是第6组的，实际保存值为111001，所以被编码为(6,111001)
    45，同样的操作，编码为 (6,101101)
    23  ->  (5,10111)
   -30  ->  (5,00001)
    -8  ->  (4,0111)
     1  ->  (1,1)

前面的那串数字就变成了:
(0,6), 111001 ; (0,6), 101101 ; (4,5), 10111; (1,5), 00001; (0,4) , 0111 ; (2,1), 1 ; (0,0)

括号里的数值正好合成一个字节，后面被编码的数字表示范围是-32767 ~ 32767。
合成的字节里，高4位是前续0的个数，低4位描述了后面数字的位数。

接着变，下面的变化就需要去查找Huffman编码表了，这个是在压缩之前就应该构建好的，这个表是将0 ~ 255的8位定长数根据其出现的频率不同映射成为1 ~ 16位不定长数，频率大的小于8位，频率小的高于8位。这点很重要，当然这个表是如何构建出来的也很重要。
现在我们假设查表得知：

 6 = (0,6)    ---  111000    (注: 6 = 0 * 16 + 6 = 0x06)
69 = (4,5)    ---  1111111110011001    (注: 69 = 4 * 16 + 5 = 0x45)
21 = (1,5)    ---  11111110110
4  = (0,4)    ---  1011
33 = (2,1)    ---  11011
 0 = EOB = (0,0) ---  1010

那么最终我们得到的AC系数按位流就如下：
111000 111001 111000 101101 1111111110011001 10111 11111110110 00001 1011 0111 11011 1 1010

好奇：看起来前面括号里用的编码(范围为0 ~ 255)和后面编码的数字范围为-32767 ~ 32767的来自不同的编码基础？这样不会混淆么？
前面的编码数据(也就是说之前组合起来的一字节的数据，高4位0的个数，低4位非0数据编码后所占的位数)可以根据Huffman表查询出来，这样我们就可以得到紧跟着的数据的位数，读取相应的数据，所以不会混淆。

8、DC编码
DC在每个数据单元之中只有一个，并且连续数据单元之间的DC有紧密的联系，所以JPEG当中就采用的差分脉冲调制编码(DPCM)。
相邻单元之间
Diff = DC(i) – DC(i-1)
所以
DC(i) = DC(i-1) + Diff

我们保存的DC都是和上一个DC的差值，所以DC(0)等于多少？云风说是0，我觉得不应该啊。
目前DC相关这部分需要进一步证实。
假设Diff为-511
就会被编码成(9, 000000000)
假设查表(一般在JPEG文件当中Y和C是不同的表，另外还分DC和AC，所以通常就有4个Huffman表)的9的Huffman编码为1111110，那么整个DC的二进制表示就为1111110 000000000。

将DC的数据流放到AC之前就可以组成完整的编码数据流。

1111110 000000000  111000 111001  111000 101101  1111111110011001 10111  11111110110 00001  1011 0111  11011 1  1010

9、疑问：
MCU和DU之间有关系吗？看起来这些压缩都是在DU当中进行的。
每个Y，Cb，Cr分量都要来这样压缩一次吗？还是3个分量一起压缩？或者说是可以根据采样的不同，将部分分量放在一起压缩，比如Cb和Cr一起压缩，Y单独压缩。
看起来Cb可以和Cr放在一起压缩。
UPDATE：
MCU可能有一个或者多个DU组成，压缩都是按DU来的，分量都是分开压缩的，一个MCU当中的Y，Cb，Cr都是分别压缩的，然后按MCU组装成字节流。

10、JFIF文件格式分析实例：

SOI(Start of Image)

APP0(JFIF application segment)

APP1(application reserved)
…
APP15

DQT(Define Quantization Table)

DHT(Difine Huffman Table)

SOF(Start of Frame)

SOS(Start of Scan)

EOI(End of Image)

IFD(Image File Directories)

标记的前2个字节为名字，也就是前面所说的标记码，接着的2个字节为该标记的所占空间大小(不包含标记名字的2个字节，但包含自己在内)
比如这一段数据
FF E0 00 10 4A 46 49 46 00 01 01 00 00 01 00 01 00 00
当中
0xFFE0为标记码，也就是APP0
0x0010为该标记所占大小，也就是16
后面跟的14个字节(“4A 46 49 46 00 01 01 00 00 01 00 01 00 00”)就是标记内容
根据SPEC的描述：
5字节为标识符，这里就是JFIF0
2字节为版本号(主版本和次版本各占1个字节)，这里就是1.01
1字节为密度单位，0表示没有单位，1表示每英寸点数，2表示每厘米点数
2字节为水平像素密度，这里是1
2字节为垂直像素密度，这里是1
1字节为缩略图水平像素总数，这里是0
1字节为缩略图垂直像素总数，这里为0
如果没有缩略图，这两个缩略图相关的字段值都必须要为0，如果不为0就表示后面还有缩略图的RGB数据，大小是3字节的倍数。

对于DQT

FF DB 00 43 00 08 06 06 07 06 05 08 07 07 07 09 09 08 0A 0C 14 0D 0C 0B 0B 0C 19 12 13 0F 14 1D 1A 1F 1E 1D 1A 1C 1C 20 24 2E 27 20 22 2C 23 1C 1C 28 37 29 2C 30 31 34 34 34 1F 27 39 3D 38 32 3C 2E 33 34 32
FF DB 00 43 01 09 09 09 0C 0B 0C 18 0D 0D 18 32 21 1C 21 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32

0xFFDB为标记码
0x0043为所占大小，也就是67(除去本身所占2字节，大小跟精度有关)
1个字节为QT信息，高4位为QT精度，低4位为QT号
还原成矩阵形式(Zigzag)

8   6   5   8   12  20  26  31

6   6   7   10  13  29  30  28

7   7   8   12  20  29  35  28

7   9   11  15  26  44  40  31

9   11  19  28  34  55  52  39

12  18  28  32  41  52  57  46 

25  32  39  44  52  61  60  51

36  46  48  49  56  50  52  50

就是这样(JPEG 原理详细实例分析及其在嵌入式 Linux 中的应用这篇文章中应该画错了)
另外一个QT也是同样的方法可以还原。

对于SOF0这段数据
FF C0 00 11 08 00 95 00 E3 03 01 22 00 02 11 01 03 11 01
当中
0xFFC0表示SOF0
2个字节为长度，0x0011，即17
1个字节为数据样本的精度，这里是8位
2个字节表示图像的高度，这里是149
2个字节表示图像的宽度，这里是227
1个字节表示颜色分量数，JPEG都是YCrCb，即3
9个字节表示颜色分量信息，这里字节数是颜色分量数 multiply 3
因为分量信息中颜色分量ID占用1个字节，水平/垂直因子占用1个字节(高4位水平，低4位垂直)，量化表占用1个字节
H 2:1:1
V 2:1:1
所以总体采样因子就是(2 * 2):(1 * 1):(1 * 1)，即4:1:1

MCU宽是水平采样因子最大值 multiply 8(记该最大值为Hmax)
MCU高是垂直采样因子最大值 multiply 8(记该最大值为Vmax)
因此这里就是(Hmax * 8):(Vmax * 8) = 16:16

如果整幅图片的高度或者宽度不是MCU的整数倍，就需要padding，解码之后丢弃大于宽度或者高度部分的数据

在数据流当中，MCU是按从左到右，从上到下来排列的。

因为每个MCU由若干数据单元组成，而数据单元又必须是8:8的，所以MCU当中数据单元的个数就是4(Hmax * Vmax)

对于DHT，如下就有4个Huffman表

FF C4 00 1F 00 00 01 05 01 01 01 01 01 01 00 00 00 00 00 00 00 00 01 02 03 04 05 06 07 08 09 0A 0B
FF C4 00 B5 10 00 02 01 03 03 02 04 03 05 05 04 04 00 00 01 7D 01 02 03 00 04 11 05 12 21 31 41 06 13 51 61 07 22 71 14 32 81 91 A1 08 23 42 B1 C1 15 52 D1 F0 24 33 62 72 82 09 0A 16 17 18 19 1A 25 26 27 28 29 2A 34 35 36 37 38 39 3A 43 44 45 46 47 48 49 4A 53 54 55 56 57 58 59 5A 63 64 65 66 67 68 69 6A 73 74 75 76 77 78 79 7A 83 84 85 86 87 88 89 8A 92 93 94 95 96 97 98 99 9A A2 A3 A4 A5 A6 A7 A8 A9 AA B2 B3 B4 B5 B6 B7 B8 B9 BA C2 C3 C4 C5 C6 C7 C8 C9 CA D2 D3 D4 D5 D6 D7 D8 D9 DA E1 E2 E3 E4 E5 E6 E7 E8 E9 EA F1 F2 F3 F4 F5 F6 F7 F8 F9 FA
FF C4 00 1F 01 00 03 01 01 01 01 01 01 01 01 01 00 00 00 00 00 00 01 02 03 04 05 06 07 08 09 0A 0B
FF C4 00 B5 11 00 02 01 02 04 04 03 04 07 05 04 04 00 01 02 77 00 01 02 03 11 04 05 21 31 06 12 41 51 07 61 71 13 22 32 81 08 14 42 91 A1 B1 C1 09 23 33 52 F0 15 62 72 D1 0A 16 24 34 E1 25 F1 17 18 19 1A 26 27 28 29 2A 35 36 37 38 39 3A 43 44 45 46 47 48 49 4A 53 54 55 56 57 58 59 5A 63 64 65 66 67 68 69 6A 73 74 75 76 77 78 79 7A 82 83 84 85 86 87 88 89 8A 92 93 94 95 96 97 98 99 9A A2 A3 A4 A5 A6 A7 A8 A9 AA B2 B3 B4 B5 B6 B7 B8 B9 BA C2 C3 C4 C5 C6 C7 C8 C9 CA D2 D3 D4 D5 D6 D7 D8 D9 DA E2 E3 E4 E5 E6 E7 E8 E9 EA F2 F3 F4 F5 F6 F7 F8 F9 FA

2个字节为标记码
2个字节为长度
1个字节为类型和ID，高4位为类型，0为DC直流，1为AC交流；低4位为ID
也就是说这里有4个Huffman编码表，直流0，交流0，直流1，交流1
16个字节为码字的数量，以第一张表为例子，没有1位的码字，2位的码字1个，3位的码字5个，4到9位的码字各1个，没有9位以上的码字，所以一共是12个码字
剩下的12字节为编码内容(这12是根据码字数量得出的，1 + 5 + 1 + 1 + 1 + 1 + 1 + 1)，
00 01 02 03 04 05 06 07 08 09 0A 0B
所以转换成二进制的话就应当如下：

位数      代码                 码字
2         00                  00
3         01/02/03/04/05      001/010/011/100/101
4         06                  0110
5         07                  00111
6         08                  001000
7         09                  0001001
8         0A                  00001010
9         0B                  000001011

UPDATE: 这上面的编码内容的理解应该是错误的，这是参照http://www.ibm.com/developerworks/cn/linux/l-cn-jpeg/来的。
实际应该是：

Length    Code                Codeword    
2         00                  00
3         01                  010
3         02                  011
3         03                  100
3         04                  101
3         05                  110
4         06                  1110
5         07                  11110
6         08                  111110
7         09                  1111110
8         0A                  11111110
9         0B                  111111110

这个Huffman表是如何构建的，得好好研究，另外后面的压缩都会用到这个表的内容。
Huffman编码表的构建原理
第1个码字必须是0(根据其位数具体表现为0或00或000等等，以此类推)
下一个码字在前面一个的基础上加1，如果位数有增加，则加1之后补零到相应位数
值为00的为结束标志(EOB)，编码或者解码时候注意其码字

EXIF通常是放在APP1当中
FF E1 66 59 …… FF D9
0xFFE1(2字节)就是APP1的Marker名字，0x6559(2字节)是该字段的长度，为26201，但是不包含TAG名字所占用的2个字节。也就是说后面的实际内容占用了26199个字节，直到0xFFD9，这就是EXIF的内容。为什么会有个0xFFD9呢？我们都知道0xFFD9是EOI，因为EXIF里面的缩略图实际上也是一个完整的JPEG图片，它也有SOI等等这一些Marker。
参见JEITA CP-3451 Exif Version 2.2的Figure 7 Structure of Exif file with compressed thumbnail
因为EXIF并不是JPEG必须的，所以你把这之间的内容整个都拿掉(用16进制编辑器很好做到，有兴趣的可以试试看)，对图片也没有什么影响，只是少了所有EXIF的信息。

11、Linux下编译libjpeg
download source tree
uncompress it
cd to source tree folder
clean the source tree

$ ./configure --enable-shared --enable-static

Maybe it will complain this message
./configure: 1562: ./ltconfig: not found // Here
but it does not matter, so we do not care it.

$ make

if it complains ‘make: ./libtool: Command not found’
then fix your configure file from
LIBTOOL=”./libtool”
to
LIBTOOL=”libtool”
‘Cause under your current source tree, there is of course no file named libtool, so it can not exec that command, so replace it with the system-wide libtool. BUT the prerequisite thing is that you have installed libtool on your host machine.

Then compile it again, everything should be okay.

After completed, remember there will be a folder name ‘.libs’ generated under your source tree folder, this is a HIDDEN folder, everything you need just under it.

$ ./cjpeg testimg.bmp > hello.jpeg

a new compressed jpeg file is generated

$ ./djpeg -bmp testimg.jpg > hello.bmp

a new bmp file is generated

use your imagination to do anything.

Good luck!