In this project, we put Huffman coding into practice by implementing a
compress/uncompress utility. The utility can compress a regular file, and it can
uncompress a file that it previously compressed.
Basic idea:
1. Scan the file and count how many times each character occurs in it.
2. Create the Huffman tree from these counts.
3. Compute the Huffman code of each character from the Huffman tree built in step 2.
4. Encode the source file: write the Huffman code of each character into the
compressed file. For example:
The binary value of the character 'A' is 01000001, which has 8 bits. Suppose the
Huffman code of 'A' is 0110. Then we only need to write 4 bits instead of 8 to
represent 'A', which saves 4 bits. Since each byte has 8 bits, we use the 4 saved
bits to store the Huffman code of the next character. Suppose the Huffman code of 'B'
is 110 and the binary value of 'B' is 01000010. The original file needs two bytes
(16 bits) to store 'A' and 'B', but the compressed file needs only 7 bits. However,
output is written in whole bytes, so one more bit is needed to fill the byte. It is
either the first bit of the following character's Huffman code or, if the end of the
file has been reached, a padding 0.
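The packing described above can be sketched in a few lines of Python. This is only an illustration: the function name pack_bits is hypothetical, and the codes 0110 and 110 are the ones assumed in the text, not the output of a real Huffman tree.

```python
def pack_bits(bits):
    """Pack a string of '0'/'1' characters into bytes.

    The last byte is padded with 0 bits when the bit count is not a
    multiple of 8, as described in the text.
    """
    padding = (-len(bits)) % 8           # zero bits needed to fill the final byte
    bits += "0" * padding
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


# 'A' -> 0110 and 'B' -> 110 (codes assumed in the text): 7 bits plus 1 padding bit
packed = pack_bits("0110" + "110")
print(format(packed[0], "08b"))  # 01101100
```

Because the padding bits look like ordinary code bits, a decoder also needs some way to know where the real data ends, for example the total number of characters stored alongside the encoding information.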
When you create the compressed file, you must also store the encoding information in
it so that it can be used when the file is uncompressed.
When uncompressing the file, read the encoding information first and reconstruct
the Huffman tree. Then decode the rest of the file using the Huffman tree.
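The project does not prescribe a format for the encoding information. One possible layout, shown here purely as an assumption, stores the character counts at the start of the compressed file: a 2-byte entry count followed by one (1-byte symbol, 4-byte count) pair per character. The names write_header and read_header are hypothetical.

```python
import struct


def write_header(freqs):
    """Serialize {byte value: count} as a 2-byte entry count followed by
    (1-byte symbol, 4-byte count) pairs, all big-endian."""
    out = struct.pack(">H", len(freqs))
    for symbol, count in sorted(freqs.items()):
        out += struct.pack(">BI", symbol, count)
    return out


def read_header(data):
    """Return (frequency table, offset where the encoded payload starts)."""
    (entries,) = struct.unpack_from(">H", data, 0)
    offset = 2
    freqs = {}
    for _ in range(entries):
        symbol, count = struct.unpack_from(">BI", data, offset)
        freqs[symbol] = count
        offset += 5                      # 1 byte symbol + 4 bytes count
    return freqs, offset


header = write_header({ord("A"): 24, ord("B"): 12})
print(read_header(header))  # ({65: 24, 66: 12}, 12)
```

With the counts available, the decoder can rebuild the same tree the encoder used, provided both sides break ties the same way. The sum of the counts also tells it how many characters to decode, so trailing padding bits are ignored.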
The following example is based on a data source that uses a set of five different
symbols. The symbol frequencies are:
Symbol  Frequency
A       24
B       12
C       10
D       8
E       8
----> total 186 bits
(with 3 bits per code word)
The two rarest symbols 'E' and 'D' are connected first, followed by 'C' and 'B'. The
new parent nodes have frequencies 16 and 22 respectively and are joined under a
common parent in the next step. The resulting node (frequency 38) and the remaining
symbol 'A' are placed under the root node, which is created in the final step.
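The construction just described can be reproduced with a priority queue. The following is a minimal sketch using Python's heapq module, with the function name huffman_codes chosen for illustration. Whether a branch is labeled 0 or 1 is an arbitrary choice, so the exact bit patterns may differ from other implementations, but the code lengths will not.

```python
import heapq
from itertools import count


def huffman_codes(freqs):
    """Return {symbol: bit string} for a non-empty {symbol: frequency} table."""
    tiebreak = count()  # unique counter so equal frequencies never compare nodes
    # A node is either a symbol (leaf) or a (left, right) tuple (internal node).
    heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)     # rarest node
        f2, _, right = heapq.heappop(heap)    # second rarest node
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"       # a lone symbol still gets one bit

    walk(heap[0][2], "")
    return codes


freqs = {"A": 24, "B": 12, "C": 10, "D": 8, "E": 8}
codes = huffman_codes(freqs)
total_bits = sum(freqs[s] * len(codes[s]) for s in freqs)
print({s: len(c) for s, c in codes.items()})
print(total_bits)  # 138
```

For these frequencies 'A' gets a 1-bit code and the other four symbols get 3-bit codes, so the data takes 24 * 1 + 38 * 3 = 138 bits instead of 186 bits with fixed 3-bit code words.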
Basic Requirement:
1. Build the Huffman code from the frequency counts of all the characters in a file.
2. Output the file encoded with the Huffman code. You may output the plain Huffman
code of each character.
3. Decode an “encoded file”.
Example:
Text:
Abbdc
Huffman code:
'A': 10
'b': 01
'c': 11
'd': 00
Your “compressed” file should show:
1001010011
If the input is 101010110000,
your output should be:
AAAcdd
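The example can be checked with a short sketch that works on plain strings of 0 and 1 characters, as the basic requirement allows. The code table is the one given above; the function names encode and decode are illustrative.

```python
CODES = {"A": "10", "b": "01", "c": "11", "d": "00"}  # table from the example


def encode(text, codes):
    """Concatenate the code of every character in text."""
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    """Read bits one at a time and emit a symbol whenever a full code is seen."""
    reverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:        # safe because no code is a prefix of another
            out.append(reverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)


print(encode("Abbdc", CODES))          # 1001010011
print(decode("101010110000", CODES))   # AAAcdd
```

Matching against a reverse lookup table is equivalent to walking the Huffman tree from the root for each bit. It is used here only because it keeps the sketch short.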