【哈夫曼编码】HDU1053-Entropy

Entropy
Time Limit: 2000/1000 MS (Java/Others)    Memory Limit: 65536/32768 K (Java/Others)
Total Submission(s): 4178    Accepted Submission(s): 1707

Problem Description
An entropy encoder is a data encoding method that achieves lossless data compression by encoding a message with “wasted” or “extra” information removed. In other words, entropy encoding removes information that was not necessary in the first place to accurately
encode the message. A high degree of entropy implies a message with a great deal of wasted information; english text encoded in ASCII is an example of a message type that has very high entropy. Already compressed messages, such as JPEG graphics or ZIP archives,
have very little entropy and do not benefit from further attempts at entropy encoding.

English text encoded in ASCII has a high degree of entropy because all characters are encoded using the same number of bits, eight. It is a known fact that the letters E, L, N, R, S and T occur at a considerably higher frequency than do most other letters
in english text. If a way could be found to encode just these letters with four bits, then the new encoding would be smaller, would contain all the original information, and would have less entropy. ASCII uses a fixed number of bits for a reason, however:
it’s easy, since one is always dealing with a fixed number of bits to represent each possible glyph or character. How would an encoding scheme that used four bits for the above letters be able to distinguish between the four-bit codes and eight-bit codes?
This seemingly difficult problem is solved using what is known as a “prefix-free variable-length” encoding.

In such an encoding, any number of bits can be used to represent any glyph, and glyphs not present in the message are simply not encoded. However, in order to be able to recover the information, no bit pattern that encodes a glyph is allowed to be the prefix
of any other encoding bit pattern. This allows the encoded bitstream to be read bit by bit, and whenever a set of bits is encountered that represents a glyph, that glyph can be decoded. If the prefix-free constraint was not enforced, then such a decoding would
be impossible.

Consider the text “AAAAABCD”. Using ASCII, encoding this would require 64 bits. If, instead, we encode “A” with the bit pattern “00”, “B” with “01”, “C” with “10”, and “D” with “11” then we can encode this text in only 16 bits; the resulting bit pattern
would be “0000000000011011”. This is still a fixed-length encoding, however; we’re using two bits per glyph instead of eight. Since the glyph “A” occurs with greater frequency, could we do better by encoding it with fewer bits? In fact we can, but in order
to maintain a prefix-free encoding, some of the other bit patterns will become longer than two bits. An optimal encoding is to encode “A” with “0”, “B” with “10”, “C” with “110”, and “D” with “111”. (This is clearly not the only optimal encoding, as it is
obvious that the encodings for B, C and D could be interchanged freely for any given encoding without increasing the size of the final encoded message.) Using this encoding, the message encodes in only 13 bits to “0000010110111”, a compression ratio of 4.9
to 1 (that is, each bit in the final encoded message represents as much information as did 4.9 bits in the original encoding). Read through this bit pattern from left to right and you’ll see that the prefix-free encoding makes it simple to decode this into
the original text even though the codes have varying bit lengths.

As a second example, consider the text “THE CAT IN THE HAT”. In this text, the letter “T” and the space character both occur with the highest frequency, so they will clearly have the shortest encoding bit patterns in an optimal encoding. The letters “C”,
“I’ and “N” only occur once, however, so they will have the longest codes.

There are many possible sets of prefix-free variable-length bit patterns that would yield the optimal encoding, that is, that would allow the text to be encoded in the fewest number of bits. One such optimal encoding is to encode spaces with “00”, “A” with
“100”, “C” with “1110”, “E” with “1111”, “H” with “110”, “I” with “1010”, “N” with “1011” and “T” with “01”. The optimal encoding therefore requires only 51 bits compared to the 144 that would be necessary to encode the message with 8-bit ASCII encoding, a
compression ratio of 2.8 to 1.

 

Input
The input file will contain a list of text strings, one per line. The text strings will consist only of uppercase alphanumeric characters and underscores (which are used in place of spaces). The end of the input will be signalled by a line containing only the
word “END” as the text string. This line should not be processed.

 

Output
For each text string in the input, output the length in bits of the 8-bit ASCII encoding, the length in bits of an optimal prefix-free variable-length encoding, and the compression ratio accurate to one decimal point.

 

Sample Input
AAAAABCD
THE_CAT_IN_THE_HAT
END
 

Sample Output
64 13 4.9
144 51 2.8
 

Source
Greater New York 2000
 

 

 

 

哈夫曼编码。
题意是,给出一排字符串,要求求出字符的8位编码的长度,哈夫曼编码值,以及之间的比值
因为仅仅只要求求出哈夫曼编码值,所以不用建立哈夫曼树,可以建立优先队列,只要将每次最小的
出队的两个元素合成一个新的大数,然后放进优先队列中,直到只剩下一个元素为止,那个元素就是哈夫曼编码值。

AC代码:

#include<stdio.h>
#include<string.h>
#include<queue>
#include<algorithm>
using namespace std;
char a[1500];
int flag[27];
int main()
{
    int i,j,n,m,x,y,sum;
    while(scanf("%s",a),strcmp(a,"END")!=0)
    {
       memset(flag,0,sizeof(flag));
       n=strlen(a);
       for(i=0;i<n;i++)
       {
          if(a[i]!='_')
          flag[a[i]-'A']++;
          else
          flag[26]++;
       }

       priority_queue<int,vector<int>,greater<int> > q;
       for(i=0;i<=26;i++)
       if(flag[i]!=0)
       q.push(flag[i]);

       sum=0;
       while(q.size()>1)
       {
          x=q.top(); q.pop();
          y=q.top(); q.pop();
          q.push(x+y);
          sum+=(x+y);
       }

       if(sum==0)
       sum=n;

       printf("%d %d %.1lf\n",n*8,sum,(n*8.0)/(sum*1.0));
    }
    return 0;
}
时间: 2024-11-03 20:02:57

【哈夫曼编码】HDU1053-Entropy的相关文章

哈夫曼(huffman)树和哈夫曼编码

哈夫曼树 哈夫曼树也叫最优二叉树(哈夫曼树)    问题:什么是哈夫曼树? 例:将学生的百分制成绩转换为五分制成绩:≥90 分: A,80-89分: B,70-79分: C,60-69分: D,<60分: E. if (a < 60){ b = 'E'; } else if (a < 70) { b = 'D'; } else if (a<80) { b = 'C'; } else if (a<90){ b = 'B'; } else { b = 'A'; } 判别树:用于描

关于哈夫曼编码的程序运行时出错,我分析是由于cd定义出现了问题,导致后边cd[--start]出错

问题描述 关于哈夫曼编码的程序运行时出错,我分析是由于cd定义出现了问题,导致后边cd[--start]出错 void HuffmanCoding(HuffmanTree &HT,HuffmanCode &HC,int *w,int n){ //w存放n个字符的权值(均>0),构造赫夫曼树 HT,并求出n个字符的赫夫曼编码 HC printf("123"); system("pause"); int s1,s2,i,start; int f=0

简单快速的哈夫曼编码

介绍 本文描述在网上能够找到的最简单,最快速的哈夫曼编码.本方 法不使用任何扩展动态库,比如STL或者组件.只使用简单的C函数,比如: memset,memmove,qsort,malloc,realloc和memcpy. 因此,大家都会发现 ,理解甚至修改这个编码都是很容易的. 背景 哈夫曼压缩是个无损的压缩算法,一般用来压缩文本和程序文件.哈夫 曼压缩属于可变代码长度算法一族.意思是个体符号(例如,文本文件中的字符 )用一个特定长度的位序列替代.因此,在文件中出现频率高的符号,使用短的 位序

哈夫曼编码长度问题,证明题

问题描述 哈夫曼编码长度问题,证明题 已知字符串中某一字符出现的频率大于0.4,证明该字符串中存在某一字符的哈夫曼编码长度为1;如果字符串中所有字符出现的频率都小于三分之一,证明字符串中不存在哈夫曼编码长度为1的字符 解决方案 http://www.zybang.com/question/041154e0c4adf3af1f0dfac8629a346f.html

c-跪求大神 帮忙,这段关于哈夫曼编码 的程序着实看不懂啊。。。。。。。

问题描述 跪求大神 帮忙,这段关于哈夫曼编码 的程序着实看不懂啊....... struct Codetype{//哈弗曼编码数据类型 char bits;//编码流-数组,n为为哈夫曼树中叶子结点的数目,编码的长度不可能超过n int start;//编码实际在编码流数组里的开始位置 }; Codetype *HuffmanCode(hufmtree *tree){//哈弗曼编码的生成 int i,j,p,k; Codetype *code; if(tree==NULL) return NUL

【Matlab编程】哈夫曼编码的Matlab实现

       在前年暑假的时候,用C实现了哈夫曼编译码的功能,见文章<哈夫曼树及编译码>.不过在通信仿真中,经常要使用到Matlab编程,所以为了方便起见,这里用Matlab实现的哈夫曼编码的功能.至于哈夫曼编译码的基本原理,我们可以参考之前的文章<哈夫曼树及编译码>,里面有详细的说明及图解过程.下面直接给出具体的Matlab实现的哈夫曼编码函数,由于程序中注释还算比较详细,在此就不予与说明: function [ h,e ] = Huffman_code( p ) %p为概率分布

电工4课程作业2:哈夫曼编码实验

问题描述 电工4课程作业2:哈夫曼编码实验 1) 问题描述设某编码系统共有n个字符,使用频率分别为{w1, w2, -, wn},设计一个不等长的编码方案,使得该编码系统的空间效率最好.2) 基本要求(1) 设计数据结构:(2) 设计编码算法:(3) 分析时间复杂度和空间复杂度. 解决方案 http://blog.csdn.net/fduan/article/details/7837444http://blog.sina.com.cn/s/blog_686d0fb001012qmh.html 解

范式霍夫曼编码 码长不连续 产生不能识别部分编码

问题描述 范式霍夫曼编码 码长不连续 产生不能识别部分编码 如图 范式霍夫曼编码 遇到这种 编码长度不连续的情况 网上说 extern KBitInputStream bs; int len = 1; int code = bs.ReadBit(); while(code >= first[len]) { code <<= 1; code |= (bs.ReadBit()); // append next input bit to code len++; } len--; // 至此,识

【哈夫曼编码】HDU2527-Safe Or Unsafe

Safe Or Unsafe Time Limit: 2000/1000 MS (Java/Others)    Memory Limit: 32768/32768 K (Java/Others) Total Submission(s): 1474    Accepted Submission(s): 582 Problem Description Javac++ 一天在看计算机的书籍的时候,看到了一个有趣的东西!每一串字符都可以被编码成一些数字来储存信息,但是不同的编码方式得到的储存空间是不一