+==========================================================================================+
|Keen 4-6 DICT format:                                                                     |
+==========================================================================================+

The 'DICT' of huffman dictionary files are the (Usually internal) huffman binary trees that
are used to decompress individual files from group files.

There are three maind DICTs, AUDIODICT (Sound), EGADICT (Graphics) and MAPDIC (Game maps)
They are the first component of the 'group file trilogy' (Dictionary, header, data)

Each DICT is exactly 1024 bytes long, consisting of 255 'nodes' and 4 blank bytes at the
file end. The 255 nodes, numbered 0-254, are connected together to form a 'binary tree';
this is covered more fully below.



-------------------------------------------------------------------------------
FILE STRUCTURE:
NODES
...
..
.
PADDING
-------------------------------------------------------------------------------


-------------------------------------------------------------------------------
STRUCTURE:
0    4    Node       Huffman node
1020 4    Padding    Blank
-------------------------------------------------------------------------------


------------------------------------------------
NODE STRUCTURE:
?    1    Left val   Left branch character or node value
+1   1    Val type   Left branch type; $00 = character, $01 = node
+2   1    Right val  Right branch character or node value
+3   1    Val type   Right branch type; $00 = character, $01 = node
------------------------------------------------



-------------------------------------------------------------------------------
HUFFMAN COMPRESSION:

Huffman compression compresses data by representing characters as strings of bits, (1s and
0s) with commonly used characters having shorter strings. This is just what normal data
is, (For example the space character is represented by $00100000) You can think of normal
data then as 256 different characters each with its own individual 8-bit string.

We can organize these strings into a neat 'tree' shape, with left being 0 and right being 1,
we'll start off with 1 trunk and branch in two over and over until we have 256 tips. To find
the character we want, we start at the trunk and move onto right or left branches as we
read the bit string for a character.

In the example below a 4-character tree has been built for the numbers 0-3. To find, say,
the number two, we take its bit string (In this case only two bits long, 10) and move as
you can see:

0  1  2  3
\ /   \ /
 *     *
  \   /
   \ /
    *
   ROOT

Each of the '*' is called a NODE, and there are n-1 of them, where n is the number of types
of characters you have in your data. The trunk node is called the ROOT, and is always the
highest numbered node (In this case node 3)

Right, so how does Huffman compress? If I have the string '03323330', this tree doesn't
really let me compress it at all! But what Huffman can do is 'rearrange' the tree, so that
more common characters (Like 3) get shorter 'branches' and less common ones get longer.

There is only one other possible tree and it is shown below. Notice that '3' now only needs
one bit while 0 and 1 need 3. BUT, because 3 is more common we'll have a lot more 1 bit
strings than 3 bits, so our data will shrink:

0  1
\ /
 *   2
  \ /
   *   3
    \ /
     *
   ROOT

Our string '0133233310' is 00,11,11,10,11,11,11,00 in 2-binary Using the first tree
we don't get any change, the output is 0011111011111100, 16 bits of data. Using our
second tree however we get 000,1,1,01,1,1,1,000 or 0001101111000 13 bits, and almost 50%
reduction in size.

Since Huffman relies on rearrangement, it can 'only' work with data where some characters
occur a lot more than others. Fortunately, this is almost always the case, indeed you
can often identify a huffman tree's end by the string $00 $00 $FD $01 which is usually node
254 (The root) which has $00 (Blank) as its character and node 253 as a branch for the whole
rest of the tree.

You may have already been able to guess how the nodes in the DICT are joined together, each
node has two branches, leading either to another node or to a character. The second and
fourth bytes of each node determine that, 1 for a node, 0 for a character. From the nodes
then you can build up the binary tree you need to decompress huffman.


NOTE: Our example string of 13 bits would be encoded as two bytes, with the extra bit space
being filled with zeroes. All huffman programs need to be told when to stop decoding, in
case they come across 'spare bits' This is usually done by telling the program how long the
decompressed file is.
-------------------------------------------------------------------------------