Reading UTF-8 (utf8) Encoded Strings from gzip (.gz) Files in Python (Python 2 and Python 3)

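A UTF-8 encoded string can be read from a gzip (.gz) file in both Python 2 and Python 3, but the approach differs between the two versions. This article shows how to read UTF-8 text from a .gz file in each version and covers common gzip file operations.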

1. Reading a gzip file with a specified encoding in Python 2

import gzip
fp = gzip.open('foo.gz')
contents = fp.read()  # contents holds the decompressed bytes of foo.gz
fp.close()
u_str = contents.decode('utf-8')  # u_str is now a unicode string
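
When decoding the whole file in one call is not desirable, the byte stream can instead be wrapped in a decoding reader. The following is a minimal sketch rather than the only way to do it: `read_gzip_utf8_lines` is a hypothetical helper name introduced here, and it relies on `codecs.getreader`, which is available in both Python 2 and Python 3.

```python
import codecs
import gzip


def read_gzip_utf8_lines(path):
    """Return the lines of a UTF-8 encoded gzip file as unicode strings.

    Hypothetical helper, for illustration only.
    """
    fp = gzip.open(path, 'rb')  # binary mode: reads yield decompressed bytes
    try:
        # Wrap the byte stream so that every read is decoded as UTF-8.
        reader = codecs.getreader('utf-8')(fp)
        return [line.rstrip(u'\n') for line in reader]
    finally:
        fp.close()
```

Calling `read_gzip_utf8_lines('foo.gz')` would return a list of unicode strings, one per line, with the trailing newline removed.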

2. Reading a gzip file with a specified encoding in Python 3

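import gzip
# 'rt' opens the file in text mode and decodes it with the given encoding
with gzip.open('file.gz', 'rt', encoding='utf-8') as f:
    text = f.read()  # text is a str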
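
For large files it is often preferable to read the text line by line instead of loading everything at once. A minimal sketch, assuming Python 3; the helper name `iter_gzip_lines` is hypothetical and not part of the gzip module.

```python
import gzip


def iter_gzip_lines(path, encoding='utf-8'):
    """Yield decoded lines from a gzip file one at a time (hypothetical helper)."""
    # In 'rt' mode, gzip.open returns a text stream that decodes the
    # decompressed bytes with the given encoding.
    with gzip.open(path, 'rt', encoding=encoding) as f:
        for line in f:
            yield line.rstrip('\n')
```

Iterating with `for line in iter_gzip_lines('file.gz'):` then processes one decoded line at a time, so only a small part of the file is held in memory.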

3. Common gzip file operations in Python 2

1) Example of reading a compressed file:

import gzip
with gzip.open('file.txt.gz', 'rb') as f:
    file_content = f.read()

2) Example of creating a compressed gzip file:

import gzip
content = "Lots of content here"
with gzip.open('file.txt.gz', 'wb') as f:
    f.write(content)

3) Example of gzip-compressing an existing file:

import gzip
import shutil
with open('file.txt', 'rb') as f_in, gzip.open('file.txt.gz', 'wb') as f_out:
    shutil.copyfileobj(f_in, f_out)

Official documentation: https://docs.python.org/2/library/gzip.html

4. Common gzip file operations in Python 3

1) Example of reading a compressed file:

import gzip
with gzip.open('/home/joe/file.txt.gz', 'rb') as f:
    file_content = f.read()

2) Example of creating a compressed gzip file:

import gzip
content = b"Lots of content here"
with gzip.open('/home/joe/file.txt.gz', 'wb') as f:
    f.write(content)
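
In binary mode ('wb'), Python 3 only accepts bytes. To write a str directly, the file can be opened in text mode with an explicit encoding. A minimal sketch under that assumption; `write_gzip_text` is a hypothetical helper name.

```python
import gzip


def write_gzip_text(path, text, encoding='utf-8'):
    """Compress a str into a gzip file, encoding it on write (hypothetical helper)."""
    # 'wt' opens the file in text mode; the str is encoded, then compressed.
    with gzip.open(path, 'wt', encoding=encoding) as f:
        f.write(text)
```

A file written this way can be read back with `gzip.open(path, 'rt', encoding='utf-8')`.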

3) Example of gzip-compressing an existing file:

import gzip
import shutil
with open('/home/joe/file.txt', 'rb') as f_in:
    with gzip.open('/home/joe/file.txt.gz', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)
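
The reverse operation, decompressing a .gz file into a plain file, can be sketched the same way by swapping the roles of the two file objects. The helper name `gunzip_file` and its parameters are hypothetical.

```python
import gzip
import shutil


def gunzip_file(src_path, dst_path):
    """Decompress the gzip file at src_path into dst_path (hypothetical helper)."""
    # gzip.open yields decompressed bytes; copyfileobj streams them in chunks,
    # so the whole file never has to fit in memory.
    with gzip.open(src_path, 'rb') as f_in, open(dst_path, 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)
```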

4) Example of gzip-compressing a bytes string:

import gzip
s_in = b"Lots of content here"
s_out = gzip.compress(s_in)
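
To tie this back to UTF-8 text, a str can be encoded before compression and decoded again after `gzip.decompress`. A minimal in-memory round trip, assuming Python 3; the variable names and sample text are illustrative.

```python
import gzip

text = "Lots of content here: 你好"
# str -> UTF-8 bytes -> gzip-compressed bytes
compressed = gzip.compress(text.encode('utf-8'))
# gzip-compressed bytes -> UTF-8 bytes -> str
restored = gzip.decompress(compressed).decode('utf-8')
```

After these steps `restored` equals the original `text`, and no file is touched at any point.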

Official documentation: https://docs.python.org/3.7/library/gzip.html
