I used pdfplumber to extract a table from a PDF, and the output looks like this:
print(table)
Output:
[['3.42', 'EVERGY INC', 'EVRG', '30034W106', '14,208', '66.56', '713,222', '232,462'], ['3.35', 'EQUITY LIFESTYLE\nPROPERTIES INC', 'ELS', '29472R108', '6,926', '133.60', '572,177', '353,136'], ['2.94', 'RENAISSANCERE\nHOLDINGS LTD', 'RNR', 'G7496G103', '4,198', '193.45', '565,600', '246,503']]
[['1.54', 'WEYERHAEUSER CO', 'WY', '962166104', '15,396', '27.70', '458,523', '-32,054'], ['1.42', 'LAMB WESTON\nHOLDINGS INC', 'LW', '513272104', '5,407', '72.72', '347,519', '45,678'], ['1.35', 'GLOBAL PAYMENTS INC', 'GPN', '37940X102', '2,344', '159.00', '165,855', '206,841']]
[['0.91', 'CHECK POINT\nSOFTWARE\nTECHNOLOGIES LTD', 'CHKP', 'M22465104', '2,288', '109.50', '236,679', '13,857'], ['0.79', 'CARLISLE COS INC', 'CSL', '142339100', '1,501', '145.54', '151,642', '66,814'], ['0.79', 'AMETEK INC', 'AME', '031100100', '2,374', '91.82', '140,321', '77,659']]
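The output above is three separate lists of lists. I want to combine them and convert them into a single pandas DataFrame. Could you help me with the best way to iterate over the table output and put it into a DataFrame? Thank you very much!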
Solution:
You can use pd.DataFrame.from_records to convert it to a DataFrame.
In [53]: t
Out[53]:
[['3.42',
'EVERGY INC',
'EVRG',
'30034W106',
'14,208',
'66.56',
'713,222',
'232,462'],
['3.35',
'EQUITY LIFESTYLE\nPROPERTIES INC',
'ELS',
'29472R108',
'6,926',
'133.60',
'572,177',
'353,136'],
['2.94',
'RENAISSANCERE\nHOLDINGS LTD',
'RNR',
'G7496G103',
'4,198',
'193.45',
'565,600',
'246,503'],
['1.54',
'WEYERHAEUSER CO',
'WY',
'962166104',
'15,396',
'27.70',
'458,523',
'-32,054'],
['1.42',
'LAMB WESTON\nHOLDINGS INC',
'LW',
'513272104',
'5,407',
'72.72',
'347,519',
'45,678'],
['1.35',
'GLOBAL PAYMENTS INC',
'GPN',
'37940X102',
'2,344',
'159.00',
'165,855',
'206,841'],
['0.91',
'CHECK POINT\nSOFTWARE\nTECHNOLOGIES LTD',
'CHKP',
'M22465104',
'2,288',
'109.50',
'236,679',
'13,857'],
['0.79',
'CARLISLE COS INC',
'CSL',
'142339100',
'1,501',
'145.54',
'151,642',
'66,814'],
['0.79',
'AMETEK INC',
'AME',
'031100100',
'2,374',
'91.82',
'140,321',
'77,659']]
In [51]: df = pd.DataFrame.from_records(t)
In [52]: df
Out[52]:
0 1 2 3 4 5 6 7
0 3.42 EVERGY INC EVRG 30034W106 14,208 66.56 713,222 232,462
1 3.35 EQUITY LIFESTYLE\nPROPERTIES INC ELS 29472R108 6,926 133.60 572,177 353,136
2 2.94 RENAISSANCERE\nHOLDINGS LTD RNR G7496G103 4,198 193.45 565,600 246,503
3 1.54 WEYERHAEUSER CO WY 962166104 15,396 27.70 458,523 -32,054
4 1.42 LAMB WESTON\nHOLDINGS INC LW 513272104 5,407 72.72 347,519 45,678
5 1.35 GLOBAL PAYMENTS INC GPN 37940X102 2,344 159.00 165,855 206,841
6 0.91 CHECK POINT\nSOFTWARE\nTECHNOLOGIES LTD CHKP M22465104 2,288 109.50 236,679 13,857
7 0.79 CARLISLE COS INC CSL 142339100 1,501 145.54 151,642 66,814
8 0.79 AMETEK INC AME 031100100 2,374 91.82 140,321 77,659
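As a small follow-up sketch: from_records also accepts a columns argument, so the frame does not have to keep the default 0..7 labels. The column names below are guesses made up for illustration (the original table header is not shown in the question), and only three of the rows are reproduced. The newline inside the company name is replaced with a space as an optional cleanup step.

```python
import pandas as pd

# Three rows copied from the extracted table above.
rows = [
    ['3.42', 'EVERGY INC', 'EVRG', '30034W106', '14,208', '66.56', '713,222', '232,462'],
    ['3.35', 'EQUITY LIFESTYLE\nPROPERTIES INC', 'ELS', '29472R108', '6,926', '133.60', '572,177', '353,136'],
    ['1.54', 'WEYERHAEUSER CO', 'WY', '962166104', '15,396', '27.70', '458,523', '-32,054'],
]

# Hypothetical column labels: the real header is not part of the question.
columns = ['weight', 'name', 'ticker', 'cusip', 'shares', 'price', 'value_a', 'value_b']

df = pd.DataFrame.from_records(rows, columns=columns)

# pdfplumber keeps line breaks inside a cell; join wrapped names with a space.
df['name'] = df['name'].str.replace('\n', ' ', regex=False)

print(df)
```

All values are still strings at this point; converting them to numbers would be a separate step.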
Update:
If your list looks like this:
[[['3.42', 'EVERGY INC', 'EVRG', '30034W106', '14,208', '66.56', '713,222', '232,462'], ['3.35', 'EQUITY LIFESTYLE\nPROPERTIES INC', 'ELS', '29472R108', '6,926', '133.60', '572,177', '353,136'], ['2.94', 'RENAISSANCERE\nHOLDINGS LTD', 'RNR', 'G7496G103', '4,198', '193.45', '565,600', '246,503']], [['1.54', 'WEYERHAEUSER CO', 'WY', '962166104', '15,396', '27.70', '458,523', '-32,054'], ['1.42', 'LAMB WESTON\nHOLDINGS INC', 'LW', '513272104', '5,407', '72.72', '347,519', '45,678'], ['1.35', 'GLOBAL PAYMENTS INC', 'GPN', '37940X102', '2,344', '159.00', '165,855', '206,841']], [['0.91', 'CHECK POINT\nSOFTWARE\nTECHNOLOGIES LTD', 'CHKP', 'M22465104', '2,288', '109.50', '236,679', '13,857'], ['0.79', 'CARLISLE COS INC', 'CSL', '142339100', '1,501', '145.54', '151,642', '66,814'], ['0.79', 'AMETEK INC', 'AME', '031100100', '2,374', '91.82', '140,321', '77,659']]]
you can flatten it into a list of lists:
flatten_list = [item for sublist in new_list for item in sublist]
print(flatten_list,sep=' ')
[['3.42', 'EVERGY INC', 'EVRG', '30034W106', '14,208', '66.56', '713,222', '232,462'], ['3.35', 'EQUITY LIFESTYLE\nPROPERTIES INC', 'ELS', '29472R108', '6,926', '133.60', '572,177', '353,136'], ['2.94', 'RENAISSANCERE\nHOLDINGS LTD', 'RNR', 'G7496G103', '4,198', '193.45', '565,600', '246,503'], ['1.54', 'WEYERHAEUSER CO', 'WY', '962166104', '15,396', '27.70', '458,523', '-32,054'], ['1.42', 'LAMB WESTON\nHOLDINGS INC', 'LW', '513272104', '5,407', '72.72', '347,519', '45,678'], ['1.35', 'GLOBAL PAYMENTS INC', 'GPN', '37940X102', '2,344', '159.00', '165,855', '206,841'], ['0.91', 'CHECK POINT\nSOFTWARE\nTECHNOLOGIES LTD', 'CHKP', 'M22465104', '2,288', '109.50', '236,679', '13,857'], ['0.79', 'CARLISLE COS INC', 'CSL', '142339100', '1,501', '145.54', '151,642', '66,814'], ['0.79', 'AMETEK INC', 'AME', '031100100', '2,374', '91.82', '140,321', '77,659']]
Then you can use pd.DataFrame.from_records on the flattened list.
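Put together, the flatten-then-convert steps might look like the sketch below. It assumes the three tables were collected into one outer list named new_list (the same name used in the snippet above); only one or two rows per table are reproduced here to keep it short.

```python
import pandas as pd

# One outer list holding three tables, each table being a list of rows.
new_list = [
    [['3.42', 'EVERGY INC', 'EVRG', '30034W106', '14,208', '66.56', '713,222', '232,462'],
     ['3.35', 'EQUITY LIFESTYLE\nPROPERTIES INC', 'ELS', '29472R108', '6,926', '133.60', '572,177', '353,136']],
    [['1.54', 'WEYERHAEUSER CO', 'WY', '962166104', '15,396', '27.70', '458,523', '-32,054']],
    [['0.79', 'AMETEK INC', 'AME', '031100100', '2,374', '91.82', '140,321', '77,659']],
]

# Remove one level of nesting: tables -> rows.
flatten_list = [row for table in new_list for row in table]

# Each inner list becomes one DataFrame row.
df = pd.DataFrame.from_records(flatten_list)
print(df)
```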
Update 2:
Another approach is to build a DataFrame from each table and then combine the three DataFrames into one with the pd.concat function.
df_combine = pd.concat([df1,df2,df3],axis=0, ignore_index=True)
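Here axis=0 concatenates along the index (rows are stacked). If ignore_index is True, the index values along the concatenation axis are not used, and the resulting axis is labeled 0, …, n - 1.
Please refer to the pandas documentation for pd.concat.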
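A minimal sketch of the pd.concat route, assuming each extracted table is first turned into its own DataFrame. The variable names tables and frames are made up for this example, and only a few rows are reproduced. It also shows what the index looks like when ignore_index is left at its default of False.

```python
import pandas as pd

# Three extracted tables (shortened); each is a list of rows.
tables = [
    [['3.42', 'EVERGY INC', 'EVRG', '30034W106', '14,208', '66.56', '713,222', '232,462'],
     ['3.35', 'EQUITY LIFESTYLE\nPROPERTIES INC', 'ELS', '29472R108', '6,926', '133.60', '572,177', '353,136']],
    [['1.54', 'WEYERHAEUSER CO', 'WY', '962166104', '15,396', '27.70', '458,523', '-32,054']],
    [['0.91', 'CHECK POINT\nSOFTWARE\nTECHNOLOGIES LTD', 'CHKP', 'M22465104', '2,288', '109.50', '236,679', '13,857']],
]

# One DataFrame per table.
frames = [pd.DataFrame.from_records(t) for t in tables]

# Stack the frames row-wise and renumber the index from 0.
df_combine = pd.concat(frames, axis=0, ignore_index=True)

# Without ignore_index, each frame keeps its own 0-based index.
df_keep_index = pd.concat(frames, axis=0)

print(df_combine.index.tolist())
print(df_keep_index.index.tolist())
```

Keeping the original index values can lead to duplicate labels, which is why ignore_index=True is usually the more convenient choice when the per-table row numbers carry no meaning.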