How to convert multiple lists of lists into a pandas DataFrame

I extracted a table from a PDF using pdfplumber, and the output looks like this:

print(table)

Output:

[['3.42', 'EVERGY INC', 'EVRG', '30034W106', '14,208', '66.56', '713,222', '232,462'], ['3.35', 'EQUITY LIFESTYLE\nPROPERTIES INC', 'ELS', '29472R108', '6,926', '133.60', '572,177', '353,136'], ['2.94', 'RENAISSANCERE\nHOLDINGS LTD', 'RNR', 'G7496G103', '4,198', '193.45', '565,600', '246,503']]

[['1.54', 'WEYERHAEUSER CO', 'WY', '962166104', '15,396', '27.70', '458,523', '-32,054'], ['1.42', 'LAMB WESTON\nHOLDINGS INC', 'LW', '513272104', '5,407', '72.72', '347,519', '45,678'], ['1.35', 'GLOBAL PAYMENTS INC', 'GPN', '37940X102', '2,344', '159.00', '165,855', '206,841']]

[['0.91', 'CHECK POINT\nSOFTWARE\nTECHNOLOGIES LTD', 'CHKP', 'M22465104', '2,288', '109.50', '236,679', '13,857'], ['0.79', 'CARLISLE COS INC', 'CSL', '142339100', '1,501', '145.54', '151,642', '66,814'], ['0.79', 'AMETEK INC', 'AME', '031100100', '2,374', '91.82', '140,321', '77,659']]

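The output above is three lists of lists. I would like to combine them and convert them into a single pandas DataFrame. What is the best way to iterate over the table output and put it all into one DataFrame? Thank you very much!

Solution:

You can use pd.DataFrame.from_records to convert it to a DataFrame.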

In [53]: t
Out[53]:
[['3.42',
  'EVERGY INC',
  'EVRG',
  '30034W106',
  '14,208',
  '66.56',
  '713,222',
  '232,462'],
 ['3.35',
  'EQUITY LIFESTYLE\nPROPERTIES INC',
  'ELS',
  '29472R108',
  '6,926',
  '133.60',
  '572,177',
  '353,136'],
 ['2.94',
  'RENAISSANCERE\nHOLDINGS LTD',
  'RNR',
  'G7496G103',
  '4,198',
  '193.45',
  '565,600',
  '246,503'],
 ['1.54',
  'WEYERHAEUSER CO',
  'WY',
  '962166104',
  '15,396',
  '27.70',
  '458,523',
  '-32,054'],
 ['1.42',
  'LAMB WESTON\nHOLDINGS INC',
  'LW',
  '513272104',
  '5,407',
  '72.72',
  '347,519',
  '45,678'],
 ['1.35',
  'GLOBAL PAYMENTS INC',
  'GPN',
  '37940X102',
  '2,344',
  '159.00',
  '165,855',
  '206,841'],
 ['0.91',
  'CHECK POINT\nSOFTWARE\nTECHNOLOGIES LTD',
  'CHKP',
  'M22465104',
  '2,288',
  '109.50',
  '236,679',
  '13,857'],
 ['0.79',
  'CARLISLE COS INC',
  'CSL',
  '142339100',
  '1,501',
  '145.54',
  '151,642',
  '66,814'],
 ['0.79',
  'AMETEK INC',
  'AME',
  '031100100',
  '2,374',
  '91.82',
  '140,321',
  '77,659']]

In [51]: df = pd.DataFrame.from_records(t)

In [52]: df
Out[52]:
      0                                        1     2          3       4       5        6        7
0  3.42                               EVERGY INC  EVRG  30034W106  14,208   66.56  713,222  232,462
1  3.35         EQUITY LIFESTYLE\nPROPERTIES INC   ELS  29472R108   6,926  133.60  572,177  353,136
2  2.94              RENAISSANCERE\nHOLDINGS LTD   RNR  G7496G103   4,198  193.45  565,600  246,503
3  1.54                          WEYERHAEUSER CO    WY  962166104  15,396   27.70  458,523  -32,054
4  1.42                LAMB WESTON\nHOLDINGS INC    LW  513272104   5,407   72.72  347,519   45,678
5  1.35                      GLOBAL PAYMENTS INC   GPN  37940X102   2,344  159.00  165,855  206,841
6  0.91  CHECK POINT\nSOFTWARE\nTECHNOLOGIES LTD  CHKP  M22465104   2,288  109.50  236,679   13,857
7  0.79                         CARLISLE COS INC   CSL  142339100   1,501  145.54  151,642   66,814
8  0.79                               AMETEK INC   AME  031100100   2,374   91.82  140,321   77,659
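
As a minimal sketch of the call above, the snippet below builds the frame from the first three rows and also passes column labels. The column names are hypothetical (the question does not show the table's header row), so replace them with whatever the PDF actually uses. Tidying the embedded line breaks is an optional extra step, not part of the original answer.

```python
import pandas as pd

# First three rows from the question's output.
rows = [
    ['3.42', 'EVERGY INC', 'EVRG', '30034W106', '14,208', '66.56', '713,222', '232,462'],
    ['3.35', 'EQUITY LIFESTYLE\nPROPERTIES INC', 'ELS', '29472R108', '6,926', '133.60', '572,177', '353,136'],
    ['2.94', 'RENAISSANCERE\nHOLDINGS LTD', 'RNR', 'G7496G103', '4,198', '193.45', '565,600', '246,503'],
]

# Hypothetical column labels; the source table's real headers are not shown.
columns = ['weight', 'name', 'ticker', 'cusip', 'shares', 'price', 'value', 'gain']

df = pd.DataFrame.from_records(rows, columns=columns)

# Cell text that wrapped across lines in the PDF carries '\n'; replace it with a space.
df['name'] = df['name'].str.replace('\n', ' ', regex=False)
print(df)
```

All values remain strings at this point, exactly as pdfplumber returned them.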

Update:

If your list looks like this:

[[['3.42', 'EVERGY INC', 'EVRG', '30034W106', '14,208', '66.56', '713,222', '232,462'], ['3.35', 'EQUITY LIFESTYLE\nPROPERTIES INC', 'ELS', '29472R108', '6,926', '133.60', '572,177', '353,136'], ['2.94', 'RENAISSANCERE\nHOLDINGS LTD', 'RNR', 'G7496G103', '4,198', '193.45', '565,600', '246,503']], [['1.54', 'WEYERHAEUSER CO', 'WY', '962166104', '15,396', '27.70', '458,523', '-32,054'], ['1.42', 'LAMB WESTON\nHOLDINGS INC', 'LW', '513272104', '5,407', '72.72', '347,519', '45,678'], ['1.35', 'GLOBAL PAYMENTS INC', 'GPN', '37940X102', '2,344', '159.00', '165,855', '206,841']], [['0.91', 'CHECK POINT\nSOFTWARE\nTECHNOLOGIES LTD', 'CHKP', 'M22465104', '2,288', '109.50', '236,679', '13,857'], ['0.79', 'CARLISLE COS INC', 'CSL', '142339100', '1,501', '145.54', '151,642', '66,814'], ['0.79', 'AMETEK INC', 'AME', '031100100', '2,374', '91.82', '140,321', '77,659']]]

you can flatten it into a list of lists:

flatten_list = [item for sublist in new_list for item in sublist]
print(flatten_list)
[['3.42', 'EVERGY INC', 'EVRG', '30034W106', '14,208', '66.56', '713,222', '232,462'], ['3.35', 'EQUITY LIFESTYLE\nPROPERTIES INC', 'ELS', '29472R108', '6,926', '133.60', '572,177', '353,136'], ['2.94', 'RENAISSANCERE\nHOLDINGS LTD', 'RNR', 'G7496G103', '4,198', '193.45', '565,600', '246,503'], ['1.54', 'WEYERHAEUSER CO', 'WY', '962166104', '15,396', '27.70', '458,523', '-32,054'], ['1.42', 'LAMB WESTON\nHOLDINGS INC', 'LW', '513272104', '5,407', '72.72', '347,519', '45,678'], ['1.35', 'GLOBAL PAYMENTS INC', 'GPN', '37940X102', '2,344', '159.00', '165,855', '206,841'], ['0.91', 'CHECK POINT\nSOFTWARE\nTECHNOLOGIES LTD', 'CHKP', 'M22465104', '2,288', '109.50', '236,679', '13,857'], ['0.79', 'CARLISLE COS INC', 'CSL', '142339100', '1,501', '145.54', '151,642', '66,814'], ['0.79', 'AMETEK INC', 'AME', '031100100', '2,374', '91.82', '140,321', '77,659']]

Then you can pass the flattened list to pd.DataFrame.from_records.
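
The two steps can be sketched end to end as follows. This version uses itertools.chain.from_iterable instead of the nested comprehension; the two are equivalent here, and the choice is only a matter of taste. The rows are shortened to three fields for readability, which is an assumption for illustration, not the real table width.

```python
from itertools import chain

import pandas as pd

# Nested structure: one list of rows per extracted table (rows shortened here).
new_list = [
    [['3.42', 'EVERGY INC', 'EVRG'], ['3.35', 'EQUITY LIFESTYLE\nPROPERTIES INC', 'ELS']],
    [['1.54', 'WEYERHAEUSER CO', 'WY']],
    [['0.91', 'CHECK POINT\nSOFTWARE\nTECHNOLOGIES LTD', 'CHKP'], ['0.79', 'AMETEK INC', 'AME']],
]

# Remove one level of nesting: list of tables -> list of rows.
flatten_list = list(chain.from_iterable(new_list))

df = pd.DataFrame.from_records(flatten_list)
print(df.shape)  # (5, 3)
```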

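Update 2:

Another approach is to build one DataFrame per table and combine them into a single DataFrame with pd.concat:

df_combine = pd.concat([df1,df2,df3],axis=0, ignore_index=True)

axis=0 concatenates along the index, that is, it stacks the frames row-wise. With ignore_index=True, the index values along the concatenation axis are not used; the resulting axis is labeled 0, …, n - 1.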
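
To make the effect of ignore_index concrete, here is a small sketch. The variable names tables, frames and stacked are introduced for illustration only, and the rows are again shortened to three fields.

```python
import pandas as pd

# Three tables as they might come out of separate extraction calls (rows shortened).
tables = [
    [['3.42', 'EVERGY INC', 'EVRG'], ['3.35', 'EQUITY LIFESTYLE\nPROPERTIES INC', 'ELS']],
    [['1.54', 'WEYERHAEUSER CO', 'WY']],
    [['0.79', 'CARLISLE COS INC', 'CSL'], ['0.79', 'AMETEK INC', 'AME']],
]

# One DataFrame per table.
frames = [pd.DataFrame.from_records(t) for t in tables]

# Without ignore_index, each frame keeps its own row labels, so they repeat.
stacked = pd.concat(frames, axis=0)
# With ignore_index=True, the rows are renumbered 0..n-1.
df_combine = pd.concat(frames, axis=0, ignore_index=True)

print(list(stacked.index))     # [0, 1, 0, 0, 1]
print(list(df_combine.index))  # [0, 1, 2, 3, 4]
```

Repeated row labels make label-based lookups such as stacked.loc[0] return several rows, which is usually why ignore_index=True is preferred when the original labels carry no meaning.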

Please refer to the pandas documentation.
