Python Pandas: Using the pandas.DataFrame.join Method - CJavaPy

DataFrame.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)

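Join the columns of another DataFrame, either on the index or on a key column. Passing a list joins multiple DataFrame objects by index efficiently in a single call.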

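Parameters:

other: DataFrame, Series with its name attribute set, or list of DataFrame
Its index should be similar to one of the columns in the caller. If a Series is passed, its name attribute must be set; that name is used as the column name in the joined DataFrame.

on: name, tuple/list of names, or array-like
Column or index level name(s) in the caller to join on the index of other. If omitted, the join is index-on-index. If multiple values are given, other must have a MultiIndex. An array can be passed as the join key if it is not already contained in the calling DataFrame. This works like an Excel VLOOKUP operation.

how: {'left', 'right', 'outer', 'inner'}, default 'left'
How to handle the operation on the two objects.
1) left: use the calling frame's index (or the column, if on is specified).
2) right: use other's index.
3) outer: form the union of the calling frame's index (or the column, if on is specified) with other's index, and sort it lexicographically.
4) inner: form the intersection of the calling frame's index (or the column, if on is specified) with other's index, preserving the order of the calling frame's index.

lsuffix: str
Suffix to append to overlapping column names from the left frame.

rsuffix: str
Suffix to append to overlapping column names from the right frame.

sort: bool, default False
Sort the resulting DataFrame lexicographically by the join key. If False, the order of the join key depends on the join type (the how keyword).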
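The effect of the how parameter is easiest to see on two frames whose indexes only partly overlap. The following is a minimal sketch; the frames left and right and their labels are made up for illustration and are not part of the original examples.

```python
import pandas as pd

# Hypothetical frames chosen so that the two indexes only partly overlap.
left = pd.DataFrame({"A": ["A0", "A1", "A2"]}, index=["K0", "K1", "K2"])
right = pd.DataFrame({"B": ["B1", "B2", "B3"]}, index=["K1", "K2", "K3"])

# Collect the index of the result for each join type.
index_by_how = {
    how: list(left.join(right, how=how).index)
    for how in ("left", "right", "outer", "inner")
}

for how, idx in index_by_how.items():
    print(how, idx)
# left ['K0', 'K1', 'K2']
# right ['K1', 'K2', 'K3']
# outer ['K0', 'K1', 'K2', 'K3']
# inner ['K1', 'K2']
```

Rows that have no match on the other side are kept or dropped according to how; kept rows get NaN in the columns that come from the frame without a match.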

Returns:

The joined DataFrame
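The result is a DataFrame even when other is a Series. The sketch below illustrates the name requirement described for the other parameter; the frame and Series contents are assumptions made up for this illustration.

```python
import pandas as pd

caller = pd.DataFrame({"A": ["A0", "A1", "A2"]}, index=["K0", "K1", "K2"])

# A Series must carry a name; the name becomes the column label in the result.
b = pd.Series(["B0", "B1"], index=["K0", "K1"], name="B")
joined = caller.join(b)
print(type(joined).__name__)   # DataFrame
print(list(joined.columns))    # ['A', 'B']

# A Series without a name is rejected with a ValueError.
unnamed = pd.Series(["B0", "B1"], index=["K0", "K1"])
try:
    caller.join(unnamed)
    rejected = False
except ValueError as exc:
    rejected = True
    print("ValueError:", exc)
```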

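Notes:

The on, lsuffix, and rsuffix options are not supported when a list of DataFrame objects is passed.
Support for specifying index levels as the on parameter was added in version 0.23.0.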
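A short sketch of joining a list of frames by index may help here. The frames base, extra_b, and extra_c are hypothetical, and their column names deliberately do not overlap because suffixes cannot be used in this mode.

```python
import pandas as pd

# Hypothetical frames that share index labels but have distinct column names.
base = pd.DataFrame({"A": [1, 2, 3]}, index=["K0", "K1", "K2"])
extra_b = pd.DataFrame({"B": [10, 20]}, index=["K0", "K1"])
extra_c = pd.DataFrame({"C": [100, 300]}, index=["K0", "K2"])

# Join both frames onto base in one call (index-on-index, how='left').
combined = base.join([extra_b, extra_c])
print(combined)

# Passing on together with a list is not supported and raises ValueError.
try:
    base.join([extra_b, extra_c], on="A")
    on_rejected = False
except ValueError as exc:
    on_rejected = True
    print("ValueError:", exc)
```

Labels missing from extra_b or extra_c show up as NaN in the corresponding column of combined.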

For example:

import pandas as pd

# Create the caller DataFrame
caller = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'],
                       'A': ['A0', 'A1', 'A2', 'A3', 'A4', 'A5']})

# Create the other DataFrame
other = pd.DataFrame({'key': ['K0', 'K1', 'K2'],
                      'B': ['B0', 'B1', 'B2']})

# Join on the key column: other is indexed by 'key', the caller supplies 'key' through on
result = caller.join(other.set_index('key'), on='key', lsuffix='_caller', rsuffix='_other')

print(result)

'''
  key   A    B
0  K0  A0   B0
1  K1  A1   B1
2  K2  A2   B2
3  K3  A3  NaN
4  K4  A4  NaN
5  K5  A5  NaN
'''

Joining DataFrames using their indexes

>>> caller.join(other, lsuffix='_caller', rsuffix='_other')
  key_caller   A key_other    B
0         K0  A0        K0   B0
1         K1  A1        K1   B1
2         K2  A2        K2   B2
3         K3  A3       NaN  NaN
4         K4  A4       NaN  NaN
5         K5  A5       NaN  NaN
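The suffixes in the call above are needed because both frames contain a column named key. The following sketch, using smaller made-up frames, shows what happens when the suffixes are left out.

```python
import pandas as pd

# Smaller stand-ins for the frames used above; both have a 'key' column.
caller = pd.DataFrame({"key": ["K0", "K1", "K2"], "A": ["A0", "A1", "A2"]})
other = pd.DataFrame({"key": ["K0", "K1"], "B": ["B0", "B1"]})

# Without suffixes, the overlapping 'key' column makes the join fail.
try:
    caller.join(other)
    overlap_rejected = False
except ValueError as exc:
    overlap_rejected = True
    print("ValueError:", exc)

# With suffixes, each overlapping column is renamed and both are kept.
joined = caller.join(other, lsuffix="_caller", rsuffix="_other")
print(list(joined.columns))
# ['key_caller', 'A', 'key_other', 'B']
```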

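To join on a key column, one approach is to set the key as the index of both the caller and other; the joined DataFrame then has the key as its index. The example below takes a related route for frames whose key columns have different names: it sets key_other as the index of other and passes key_caller through on, so the result keeps the caller's original index.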

import pandas as pd

# Create the first DataFrame
data_caller = {
    'A': ['A0', 'A1', 'A2', 'A3', 'A4', 'A5'],
    'key_caller': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5']
}
df_caller = pd.DataFrame(data_caller)

# Create the second DataFrame
data_other = {
    'B': ['B0', 'B1', 'B2'],
    'key_other': ['K0', 'K1', 'K2']
}
df_other = pd.DataFrame(data_other)

# Join: match key_caller in the caller against the key_other index of the other frame
result = df_caller.join(df_other.set_index('key_other'), on='key_caller', lsuffix='_caller', rsuffix='_other')

print(result)
'''
    A key_caller    B
0  A0         K0   B0
1  A1         K1   B1
2  A2         K2   B2
3  A3         K3  NaN
4  A4         K4  NaN
5  A5         K5  NaN
'''
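For comparison, here is a minimal sketch of the approach where the key is set as the index of both frames. The frames are made-up stand-ins that share a column named key.

```python
import pandas as pd

# Hypothetical frames with a shared key column named 'key'.
df_caller = pd.DataFrame({"key": ["K0", "K1", "K2", "K3"],
                          "A": ["A0", "A1", "A2", "A3"]})
df_other = pd.DataFrame({"key": ["K0", "K1"],
                         "B": ["B0", "B1"]})

# Move the key into the index on both sides, then join index-on-index.
result = df_caller.set_index("key").join(df_other.set_index("key"))

print(result.index.name)      # key
print(list(result.index))     # ['K0', 'K1', 'K2', 'K3']
print(list(result.columns))   # ['A', 'B']
```

Unlike the on-based join, the key no longer appears as a regular column; calling reset_index() on the result would bring it back.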

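Another option for joining on a key column is the on parameter. DataFrame.join always matches against the index of other, but it can use any column of the caller as the key. This approach preserves the caller's original index in the result.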

import pandas as pd

# Create the caller DataFrame
caller = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3', 'A4', 'A5'],
    'key': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5']
})

# Create the other DataFrame
other = pd.DataFrame({
    'B': ['B0', 'B1', 'B2'],
    'key': ['K0', 'K1', 'K2']
})

# Join other onto caller by matching the caller's 'key' column against other's 'key' index
result = caller.join(other.set_index('key'), on='key')

print(result)

'''
    A key    B
0  A0  K0   B0
1  A1  K1   B1
2  A2  K2   B2
3  A3  K3  NaN
4  A4  K4  NaN
5  A5  K5  NaN
'''
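The same left join on a key column can also be written with DataFrame.merge, which matches column against column without requiring set_index. The sketch below compares the two on smaller made-up frames; it is meant as an illustration, not as a statement about which form is preferable.

```python
import pandas as pd

caller = pd.DataFrame({"A": ["A0", "A1", "A2", "A3"],
                       "key": ["K0", "K1", "K2", "K3"]})
other = pd.DataFrame({"B": ["B0", "B1"],
                      "key": ["K0", "K1"]})

# join: other must be indexed by the key, the caller supplies it through on.
joined = caller.join(other.set_index("key"), on="key")

# merge: both frames keep 'key' as an ordinary column.
merged = caller.merge(other, on="key", how="left")

# Compare the two results after normalizing the row index.
same = joined.reset_index(drop=True).equals(merged.reset_index(drop=True))
print(list(joined.columns), list(merged.columns))
print(same)
```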
