Python String Formatting: Study Notes

In Python, formatted string output uses the % operator. The general form is

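•format string % values
The format string on the left can follow exactly the same conventions as in C. If the values on the right consist of two or more items, they must be wrapped in parentheses and separated by commas. Let us focus on the left side. Its simplest form is: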

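•%code
There are many possible codes, but since everything in Python can be converted to a string, you can simply use '%s' everywhere unless you have special requirements. For example: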

•'%s %s %s' % (1, 2.3, ['one', 'two', 'three'])
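Its output is '1 2.3 ['one', 'two', 'three']', following the markers to the left of %. It does not matter that the first and second values are not strings: for each %s, Python calls str() on the corresponding value and inserts the result. If you want the repr() form instead, use %r. Besides %s there are many other codes; they are listed in the table at the end of these notes.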
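To make the difference between %s and %r concrete, here is a minimal sketch written for Python 3. The variable names are illustrative only and do not come from the original notes.

```python
# Illustrative values; any objects can be formatted with %s and %r.
values = (1, 2.3, ['one', 'two', 'three'])

# %s formats each value with str().
with_str = '%s %s %s' % values

# %r formats each value with repr(); the difference is visible for strings.
s_form = '%s' % 'two'   # no quotes in the result
r_form = '%r' % 'two'   # quotes are part of the result

print(with_str)
print(s_form)
print(r_form)
```

For numbers and lists str() and repr() usually give the same text, which is why the difference only shows up once a bare string is formatted.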

String formatting:

format = "hello %s, %s enough for ya?"
values = ('world', 'hot')
print format % values

Result: hello world, hot enough for ya?

Note: this works in 2.7 but not in 3.0, where print is a function. In 3.0 you must write print(format % values), with the argument in parentheses.
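The same example in Python 3 form might look like the sketch below. The name fmt is my own substitution (it avoids shadowing the built-in format function), and the one-element tuple case is an extra illustration that is not in the original notes.

```python
fmt = 'hello %s, %s enough for ya?'
values = ('world', 'hot')

# In Python 3, print is a function, so the expression goes inside parentheses.
message = fmt % values
print(message)

# A single value needs no parentheses around it.
single = 'hello %s' % 'world'

# To format a tuple itself as ONE value, wrap it in a one-element tuple;
# otherwise its items would be matched against the codes one by one.
as_one = 'values: %s' % (values,)
print(as_one)
```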

Functions or methods that behave like their PHP counterparts but have different names:

php explode => python split
php trim => python strip
php implode => python join
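A short sketch of the three Python methods named above, using a made-up input string:

```python
line = '  a,b,c  '          # illustrative input

trimmed = line.strip()      # like PHP trim: remove surrounding whitespace
parts = trimmed.split(',')  # like PHP explode: split on a separator
joined = '-'.join(parts)    # like PHP implode: the separator is the receiver

print(trimmed, parts, joined)
```

Note the argument order: in Python the separator string calls join on the list, whereas PHP's implode takes the separator and the array as arguments.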

At work I ran into a UnicodeDecodeError while formatting strings, so I looked into how string formatting works and want to share what I found.

C:\Users\zhuangyan>python
Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> a = '你好世界'
>>> print 'Say this: %s' % a
Say this: 你好世界
>>> print 'Say this: %s and say that: %s' % (a, 'hello world')
Say this: 你好世界 and say that: hello world
>>> print 'Say this: %s and say that: %s' % (a, u'hello world')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 10: ordinal
not in range(128)

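Notice the UnicodeDecodeError raised by print 'Say this: %s and say that: %s' % (a, u'hello world')? The only difference from the previous statement is that 'hello world' became u'hello world', so a str object became a unicode object. But 'hello world' is plain English and contains nothing outside ASCII, so why would decoding fail? Look closely at the message attached to the exception: it mentions 0xc4, which clearly does not come from 'hello world', so the Chinese string must be the culprit.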

>>> a
'\xc4\xe3\xba\xc3\xca\xc0\xbd\xe7'

Printing its byte sequence confirms it: the very first byte is 0xc4.

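So during string formatting Python tries to decode a into a unicode object, and it does so with the default ASCII codec rather than the encoding the bytes are actually in (GBK in this Windows console session, judging by the bytes shown above). Why does this happen? Let's continue the experiment: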
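The failing decode can be reproduced in isolation. The sketch below is written for Python 3, where the byte string must be spelled as a bytes literal, and it assumes the bytes are GBK-encoded, as they appear to be in the session above.

```python
# The byte sequence shown in the interpreter session above.
raw = b'\xc4\xe3\xba\xc3\xca\xc0\xbd\xe7'

# Decoding with the encoding the bytes were actually written in works.
text = raw.decode('gbk')

# Decoding with ASCII fails on the very first byte, 0xc4.
failed_byte = None
try:
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    failed_byte = raw[exc.start]   # the byte the codec could not decode
    print(exc.reason)

print(text, hex(failed_byte))
```

In the traceback the position is 10 rather than 0 because Python 2 was decoding the whole formatted string, and the prefix 'Say this: ' takes up the first ten bytes.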

>>> 'Say this: %s' % 'hello'
'Say this: hello'
>>> 'Say this: %s' % u'hello'
u'Say this: hello'
>>>

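Look closely: 'hello' is an ordinary string and the result is also a string (a str object); u'hello' is a unicode object, and the formatted result becomes unicode as well (note the leading u in the result).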

Here is what the Python documentation says:

If format is a Unicode object, or if any of the objects being converted using the %s conversion are Unicode objects, the result will also be a Unicode object.

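This kind of problem appears easily when code mixes str and unicode. In my colleague's code, the Chinese string came from user input and had been encoded correctly, so it was a UTF-8 encoded str object. The troublesome unicode object contained only ASCII characters, but it came from a sqlite3 database query, and the sqlite API returns all strings as unicode objects, which led to this odd result.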
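One common way to avoid the mix is to convert everything to text at the boundary, before formatting. The sketch below is written for Python 3; to_text is a hypothetical helper of my own, not part of the colleague's code, and UTF-8 is assumed as the input encoding.

```python
def to_text(value, encoding='utf-8'):
    """Return value as str, decoding bytes with the given encoding.

    Hypothetical helper for illustration; the encoding default is an assumption.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)


# Simulated inputs: encoded bytes from user input, text from a database query.
user_input = '你好世界'.encode('utf-8')
db_value = 'hello world'

message = 'Say this: %s and say that: %s' % (to_text(user_input), to_text(db_value))
print(message)
```

Decoding explicitly with a named encoding means the ASCII default never gets a chance to be applied implicitly.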

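Finally, I tested formatting with the format method, and it does not raise the exception above! (Here the format string is a str, so the result stays a str and the ASCII-only unicode argument is encoded without trouble; a unicode argument containing non-ASCII characters would still fail, this time with a UnicodeEncodeError.)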

>>> print 'Say this:{0} and say that:{1}'.format(a,u'hello world')
Say this:你好世界 and say that:hello world

Next, let's look at the basic usage of format.

>>> '{0}, {1}, {2}'.format('a', 'b', 'c')
'a, b, c'
>>> '{2}, {1}, {0}'.format('a', 'b', 'c')
'c, b, a'
>>> '{2}, {1}, {0}'.format(*'abc') # unpacking argument sequence
'c, b, a'
>>> '{0}{1}{0}'.format('abra', 'cad') # arguments' indices can be repeated
'abracadabra'
>>> 'Coordinates: {latitude}, {longitude}'.format(latitude='37.24N', longitude='-115.81W')
'Coordinates: 37.24N, -115.81W'
>>> coord = {'latitude': '37.24N', 'longitude': '-115.81W'}
>>> 'Coordinates: {latitude}, {longitude}'.format(**coord)
'Coordinates: 37.24N, -115.81W'
>>> coord = (3, 5)
>>> 'X: {0[0]}; Y: {0[1]}'.format(coord)
'X: 3; Y: 5'

The examples above were run under 2.x; in 3.x the format method has even more powerful features.
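As a small taste of what format offers beyond positional and keyword substitution, here is a sketch of a few format specifications. Several of these also work in 2.7; the f-string at the end assumes Python 3.6 or later. All values are made up for illustration.

```python
# Alignment and width: right-align in a field of 8 characters.
right = '{:>8}'.format('abc')

# Zero padding and precision for floats.
padded = '{:08.3f}'.format(3.14159)

# Thousands separator and hexadecimal output for integers.
grouped = '{:,}'.format(1234567)
hexed = '{:x}'.format(255)

# f-strings (Python 3.6+) accept the same specifications inline.
name, temp = 'world', 21.456
greeting = f'hello {name}, {temp:.1f} degrees'

print(repr(right), padded, grouped, hexed, greeting)
```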

As with the sprintf function in C, you can use "%" to format strings.

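Table 3.1. String formatting codes

Format	Description
%%	Literal percent sign
%c	Single character (accepts an integer code or a one-character string)
%s	String (converted with str())
%d	Signed integer (decimal)
%u	Integer (decimal); obsolete in Python, identical to %d
%o	Integer (octal)
%x	Integer (hexadecimal, lowercase)
%X	Integer (hexadecimal, uppercase)
%e	Floating point number (scientific notation)
%E	Floating point number (scientific notation, with E instead of e)
%f	Floating point number (decimal point notation)
%g	Floating point number (uses %e or %f depending on the magnitude of the value)
%G	Floating point number (like %g, but with E instead of e)
%p	Pointer, printed as a hexadecimal memory address (C printf only; not supported by Python's % operator)
%n	Stores the number of characters written so far into the next argument (C printf only; not supported by Python's % operator)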
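A few of the codes from the table can be tried directly. This is a minimal Python 3 sketch with made-up values; it also checks how Python reacts to %p, one of the codes that exists only in C.

```python
decimal = '%d' % 42            # signed decimal integer
octal = '%o' % 8               # octal
hex_lower = '%x' % 255         # hexadecimal, lowercase
hex_upper = '%X' % 255         # hexadecimal, uppercase
sci = '%e' % 1234.5            # scientific notation
fixed = '%.2f' % 3.14159       # fixed point with two decimals
char = '%c' % 65               # integer code to character
percent = '%d%%' % 50          # %% produces a literal percent sign

# %p is a C printf code; Python rejects it with a ValueError.
try:
    '%p' % 1
    p_supported = True
except ValueError:
    p_supported = False

print(decimal, octal, hex_lower, hex_upper, sci, fixed, char, percent, p_supported)
```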

Time: 2024-09-11 10:17:47
