Using Python to batch-convert file encodings, copy files, and write txt files into docx files.
Batch-converting file encodings
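While building a website with GitHub, I found that the txt files I uploaded were garbled when opened, even though they displayed correctly in the local preview. The cause: the files contain Chinese text and were saved with the ANSI encoding, so they need to be converted to UTF-8 before uploading.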
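The mismatch can be reproduced in a few lines. This is a minimal sketch, assuming the ANSI code page is GBK (the usual case on Simplified Chinese Windows; other systems may differ). The variable names are illustrative only.

```python
text = "中文"

# Bytes as an ANSI (assumed GBK) file would store them.
ansi_bytes = text.encode("gbk")
# Bytes as a UTF-8 file would store them.
utf8_bytes = text.encode("utf-8")

# A reader that expects UTF-8 cannot decode the GBK bytes.
try:
    ansi_bytes.decode("utf-8")
    decoded_as_utf8 = True
except UnicodeDecodeError:
    decoded_as_utf8 = False

# With errors="replace" the undecodable bytes become U+FFFD marks,
# which is what shows up on screen as garbled text.
garbled = ansi_bytes.decode("utf-8", errors="replace")

print(decoded_as_utf8)
print(utf8_bytes.decode("utf-8") == text)
```

The same characters are stored as different byte sequences in the two encodings, so the file has to be re-encoded; the reader cannot simply be told to try harder.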
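Changing the encoding of a single file is easy: open the txt file in Notepad, click File in the upper left, choose Save As, and select UTF-8 in the Encoding box at the bottom.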
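With hundreds or thousands of files, the job has to be done in bulk by a program. My approach is to copy every file in the original folder to a new file written as UTF-8.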
Example: copy all txt files in the folder C:/文件/12 into the folder C:/Users/liyil/Desktop/1 as UTF-8. The full code is as follows:
```python
import os


def convert(oldfile, newfile):
    # Read with the system default encoding (the ANSI code page on Windows).
    with open(oldfile, "r") as file:
        data = file.read()
    # Write the same text to the new file as UTF-8.
    with open(newfile, "w", encoding="utf-8") as file_copy:
        file_copy.write(data)


def travel(olddirpath, newdirpath):
    i = 0  # number of txt files converted
    for root, dirs, files in os.walk(olddirpath):
        # Mirror the current folder under the destination, using its
        # path relative to the source root.
        newroot = os.path.normpath(
            os.path.join(newdirpath, os.path.relpath(root, olddirpath)))
        os.makedirs(newroot, exist_ok=True)

        for name in files:
            if os.path.splitext(name)[1] == ".txt":
                oldfilepath = os.path.join(root, name)
                print("oldfilepath: %s" % oldfilepath)
                i = i + 1
                newfilepath = os.path.join(newroot, name)
                print("newfilePath: %s" % newfilepath)
                convert(oldfilepath, newfilepath)
                print("")
    print(i)


oldDirPath = "C:/文件/12"
newDirPath = "C:/Users/liyil/Desktop/1"
travel(oldDirPath, newDirPath)
```
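After converting the encoding and uploading to GitHub, the Chinese text displays correctly there, but now the local preview is garbled instead.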
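Once some files have been converted and others have not, it can help to check which encoding a given file is actually in. The helper below is a rough sketch, not part of the original scripts: the name `read_text_with_fallback` is hypothetical, and it assumes the only two candidates are UTF-8 and GBK. It tries UTF-8 first because Chinese text encoded as GBK is very unlikely to be valid UTF-8.

```python
def read_text_with_fallback(path, encodings=("utf-8", "gbk")):
    """Return (text, encoding) using the first encoding that decodes the file."""
    # Read the raw bytes once, then try each candidate encoding in order.
    with open(path, "rb") as f:
        raw = f.read()
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of %r could decode %s" % (encodings, path))
```

Calling `read_text_with_fallback("some.txt")` returns the decoded text together with the encoding that worked, which shows whether a file still needs converting.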
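Batch-copying files
Python can copy images, videos, PDFs, txt files and so on. In general, open the source with "rb" and write the copy with "wb" so that the data is handled in binary mode. For Chinese UTF-8 documents, open(file, "r", encoding="utf-8") also works.
The traversal function travel() is much the same as in the program above, and the main copy() function is simple. The full code is as follows: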
```python
import os


def copy(oldfile, newfile):
    # Binary mode copies the bytes unchanged, so it works for any file type.
    with open(oldfile, "rb") as file:
        data = file.read()
    with open(newfile, "wb") as file_copy:
        file_copy.write(data)


def travel(olddirpath, newdirpath):
    for root, dirs, files in os.walk(olddirpath):
        # Mirror the current folder under the destination, using its
        # path relative to the source root.
        newroot = os.path.normpath(
            os.path.join(newdirpath, os.path.relpath(root, olddirpath)))
        os.makedirs(newroot, exist_ok=True)

        for name in files:
            oldfilepath = os.path.join(root, name)
            print("oldfilepath: %s" % oldfilepath)
            newfilepath = os.path.join(newroot, name)
            print("newfilePath: %s" % newfilepath)
            copy(oldfilepath, newfilepath)
            print("")


oldDirPath = "C:/文件/12"
newDirPath = "C:/Users/liyil/Desktop/1"
travel(oldDirPath, newDirPath)
```
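For a plain copy with no per-file processing, the standard library's shutil module can stand in for the hand-written copy() and travel(). A minimal sketch, assuming Python 3.8 or later (needed for `dirs_exist_ok`); the wrapper name `copy_tree` is only illustrative.

```python
import shutil


def copy_tree(olddirpath, newdirpath):
    # copytree recreates the sub-folders and copies every file's bytes
    # unchanged; by default it also copies file metadata such as timestamps.
    # dirs_exist_ok=True lets the destination folder exist already.
    shutil.copytree(olddirpath, newdirpath, dirs_exist_ok=True)


# Usage, with the same paths as above:
# copy_tree("C:/文件/12", "C:/Users/liyil/Desktop/1")
```

The hand-written version is still the better starting point when each file needs extra handling, as in the encoding and docx conversions.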
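Batch-writing txt files into docx files
Simply renaming the file extension from txt to docx produces a file that will not open correctly; the txt content has to be written into a new docx file. First install the python-docx library with the command: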
pip install python-docx
The full code is as follows:
```python
import os

from docx import Document
from docx.oxml.ns import qn


def convert(oldfile, newfile):
    # Read the txt file as UTF-8 and split it into lines.
    with open(oldfile, "r", encoding="utf-8") as txt:
        data = txt.read().splitlines()

    document = Document()
    # One paragraph per line of the txt file.
    for line in data:
        document.add_paragraph(line)
    # Set the font of the Normal style; the East Asian font is set
    # separately so that Chinese text uses it too.
    document.styles["Normal"].font.name = "楷体"
    document.styles["Normal"]._element.rPr.rFonts.set(qn("w:eastAsia"), "楷体")
    document.save(newfile)


def travel(olddirpath, newdirpath):
    for root, dirs, files in os.walk(olddirpath):
        # Mirror the current folder under the destination, using its
        # path relative to the source root.
        newroot = os.path.normpath(
            os.path.join(newdirpath, os.path.relpath(root, olddirpath)))
        os.makedirs(newroot, exist_ok=True)

        for name in files:
            base, ext = os.path.splitext(name)
            if ext != ".txt":
                continue  # only txt files are converted
            oldfilepath = os.path.join(root, name)
            print("oldfilepath: %s" % oldfilepath)
            # Same file name, with the extension changed to .docx.
            newfilepath = os.path.join(newroot, base + ".docx")
            print("newfilePath: %s" % newfilepath)
            convert(oldfilepath, newfilepath)
            print("")


oldDirPath = "C:/文件/12"
newDirPath = "C:/Users/liyil/Desktop/1"
travel(oldDirPath, newDirPath)
```
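The reason a plain rename does not work can be checked with the standard library alone: a .docx file is a ZIP archive of XML parts, while a renamed txt file is still plain text. The sketch below uses temporary files and hypothetical names; the second file is only a ZIP container for comparison, not a valid Word document.

```python
import os
import tempfile
import zipfile


def looks_like_docx_container(path):
    """Return True if the file is a ZIP archive, as every .docx file is."""
    return zipfile.is_zipfile(path)


with tempfile.TemporaryDirectory() as tmp:
    # A text file that was merely given a .docx extension.
    renamed = os.path.join(tmp, "renamed.docx")
    with open(renamed, "w", encoding="utf-8") as f:
        f.write("plain text, only the extension was changed")
    renamed_is_zip = looks_like_docx_container(renamed)

    # A ZIP container with a placeholder part, for comparison only.
    archive = os.path.join(tmp, "archive.docx")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("word/document.xml", "<placeholder/>")
    archive_is_zip = looks_like_docx_container(archive)

print(renamed_is_zip, archive_is_zip)
```

Word expects the ZIP structure, which is why the content has to be written through a library such as python-docx rather than renamed.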