1、内连接(inner join)
内连接是满足条件时,左边的和右边的DataFrame都存在的数据。
1)单列条件
import numpy as np import pandas as pd left = pd.DataFrame({'Id': ['1', '2', '3', '4', '5'], 'Name': ['Levi', 'John', 'Lynn', 'Mickey', 'Rose'], 'order': ['1', '2', '3', '4', '5']}) right = pd.DataFrame({'Id': ['2', '4', '6', '7', '8'], 'Gender': ['F', 'M', 'M', 'F', 'F'], 'Like': ['C', 'Java', 'JS', 'Linux','Python']}) #合并 df = pd.merge(left, right, how ='inner', on ='Id') print(df) #两个DataFrame列名不同,使用left_on和right_on指定列名 #df = pd.merge(left,right,how ='inner',left_on = "XId",right_on = "YId")
输出:
Id Name order Gender Like 0 2 John 2 F C 1 4 Mickey 4 M Java
2)多列条件
import numpy as np import pandas as pd left = pd.DataFrame({'Id': ['1', '2', '3', '4', '5'], 'UserId':['1', '2', '3', '4', '5'], 'Name': ['Levi', 'John', 'Lynn', 'Mickey', 'Rose'], 'order': ['1', '2', '3', '4', '5']}) right = pd.DataFrame({'Id': ['2', '9', '3', '4', '8'], 'UserId': ['1', '2', '3', '4', '5'], 'Gender': ['F', 'M', 'M', 'F', 'F'], 'Like': ['C', 'Java', 'JS', 'Linux','Python']}) #合并 df = pd.merge(left, right, how ='inner', on =['Id','UserId']) print(df) #两个DataFrame列名不同,使用left_on和right_on指定列名 #df = pd.merge(left,right,how ='inner',left_on = ["XId","UserId"],right_on = ["YId","UserId"])
输出:
Id UserId Name order Gender Like 0 3 3 Lynn 3 M JS 1 4 4 Mickey 4 F Linux
2、外连接(outer join)
外连接返回左侧DataFrame中的所有行,右侧DataFrame中的所有行,并在可能的情况下匹配行,在其他地方使用NaN
。
1)单列条件
import numpy as np import pandas as pd left = pd.DataFrame({'Id': ['1', '2', '3', '4', '5'], 'Name': ['Levi', 'John', 'Lynn', 'Mickey', 'Rose'], 'order': ['1', '2', '3', '4', '5']}) right = pd.DataFrame({'Id': ['2', '4', '6', '7', '8'], 'Gender': ['F', 'M', 'M', 'F', 'F'], 'Like': ['C', 'Java', 'JS', 'Linux','Python']}) #合并 df = pd.merge(left, right, how ='outer', on ='Id') print(df) #两个DataFrame列名不同,使用left_on和right_on指定列名 #df = pd.merge(left,right,how ='outer',left_on = "XId",right_on = "YId")
输出:
Id Name order Gender Like 0 1 Levi 1 NaN NaN 1 2 John 2 F C 2 3 Lynn 3 NaN NaN 3 4 Mickey 4 M Java 4 5 Rose 5 NaN NaN 5 6 NaN NaN M JS 6 7 NaN NaN F Linux 7 8 NaN NaN F Python
2)多列条件
import numpy as np import pandas as pd left = pd.DataFrame({'Id': ['1', '2', '3', '4', '5'], 'UserId':['1', '2', '3', '4', '5'], 'Name': ['Levi', 'John', 'Lynn', 'Mickey', 'Rose'], 'order': ['1', '2', '3', '4', '5']}) right = pd.DataFrame({'Id': ['2', '9', '3', '4', '8'], 'UserId': ['1', '2', '3', '4', '5'], 'Gender': ['F', 'M', 'M', 'F', 'F'], 'Like': ['C', 'Java', 'JS', 'Linux','Python']}) #合并 df = pd.merge(left, right, how ='outer', on =['Id','UserId']) print(df) #两个DataFrame列名不同,使用left_on和right_on指定列名 #df = pd.merge(left,right,how ='outer',left_on = ["XId","UserId"],right_on = ["YId","UserId"])
输出:
Id UserId Name order Gender Like 0 1 1 Levi 1 NaN NaN 1 2 2 John 2 NaN NaN 2 3 3 Lynn 3 M JS 3 4 4 Mickey 4 F Linux 4 5 5 Rose 5 NaN NaN 5 2 1 NaN NaN F C 6 9 2 NaN NaN M Java 7 8 5 NaN NaN F Python
3、左连接(left join)
使用左连接,将显示第一个DataFrame中的所有记录,而不管第一个DataFrame中的键是否可以在第二个DataFrame中找到。而对于第二个DataFrame,只会显示在第一个DataFrame中可以找到的具有第二个DataFrame中的key的记录。
1)单列条件
import numpy as np import pandas as pd left = pd.DataFrame({'Id': ['1', '2', '3', '4', '5'], 'Name': ['Levi', 'John', 'Lynn', 'Mickey', 'Rose'], 'order': ['1', '2', '3', '4', '5']}) right = pd.DataFrame({'Id': ['2', '4', '6', '7', '8'], 'Gender': ['F', 'M', 'M', 'F', 'F'], 'Like': ['C', 'Java', 'JS', 'Linux','Python']}) #合并 df = pd.merge(left, right, how ='left', on ='Id') print(df) #两个DataFrame列名不同,使用left_on和right_on指定列名 #df = pd.merge(left,right,how ='left',left_on = "XId",right_on = "YId")
输出:
Id Name order Gender Like 0 1 Levi 1 NaN NaN 1 2 John 2 F C 2 3 Lynn 3 NaN NaN 3 4 Mickey 4 M Java 4 5 Rose 5 NaN NaN
2)多列条件
import numpy as np import pandas as pd left = pd.DataFrame({'Id': ['1', '2', '3', '4', '5'], 'UserId':['1', '2', '3', '4', '5'], 'Name': ['Levi', 'John', 'Lynn', 'Mickey', 'Rose'], 'order': ['1', '2', '3', '4', '5']}) right = pd.DataFrame({'Id': ['2', '9', '3', '4', '8'], 'UserId': ['1', '2', '3', '4', '5'], 'Gender': ['F', 'M', 'M', 'F', 'F'], 'Like': ['C', 'Java', 'JS', 'Linux','Python']}) #合并 df = pd.merge(left, right, how ='left', on =['Id','UserId']) print(df) #两个DataFrame列名不同,使用left_on和right_on指定列名 #df = pd.merge(left,right,how ='left',left_on = ["XId","UserId"],right_on = ["YId","UserId"])
输出:
Id UserId Name order Gender Like 0 1 1 Levi 1 NaN NaN 1 2 2 John 2 NaN NaN 2 3 3 Lynn 3 M JS 3 4 4 Mickey 4 F Linux 4 5 5 Rose 5 NaN NaN
4、右连接(right join)
对于右连接,将显示来自第二个DataFrame的所有记录。但是,只会显示在第二个DataFrame中可以找到的第一个DataFrame中具有Key的记录。
1)单列条件
import numpy as np import pandas as pd left = pd.DataFrame({'Id': ['1', '2', '3', '4', '5'], 'Name': ['Levi', 'John', 'Lynn', 'Mickey', 'Rose'], 'order': ['1', '2', '3', '4', '5']}) right = pd.DataFrame({'Id': ['2', '4', '6', '7', '8'], 'Gender': ['F', 'M', 'M', 'F', 'F'], 'Like': ['C', 'Java', 'JS', 'Linux','Python']}) #合并 df = pd.merge(left, right, how ='right', on ='Id') print(df) #两个DataFrame列名不同,使用left_on和right_on指定列名 #df = pd.merge(left,right,how ='right',left_on = "XId",right_on = "YId")
输出:
Id Name order Gender Like 0 2 John 2 F C 1 4 Mickey 4 M Java 2 6 NaN NaN M JS 3 7 NaN NaN F Linux 4 8 NaN NaN F Python
2)多列条件
import numpy as np import pandas as pd left = pd.DataFrame({'Id': ['1', '2', '3', '4', '5'], 'UserId':['1', '2', '3', '4', '5'], 'Name': ['Levi', 'John', 'Lynn', 'Mickey', 'Rose'], 'order': ['1', '2', '3', '4', '5']}) right = pd.DataFrame({'Id': ['2', '9', '3', '4', '8'], 'UserId': ['1', '2', '3', '4', '5'], 'Gender': ['F', 'M', 'M', 'F', 'F'], 'Like': ['C', 'Java', 'JS', 'Linux','Python']}) #合并 df = pd.merge(left, right, how ='right', on =['Id','UserId']) print(df) #两个DataFrame列名不同,使用left_on和right_on指定列名 #df = pd.merge(left,right,how ='right',left_on = ["XId","UserId"],right_on = ["YId","UserId"])
输出:
Id UserId Name order Gender Like 0 2 1 NaN NaN F C 1 9 2 NaN NaN M Java 2 3 3 Lynn 3 M JS 3 4 4 Mickey 4 F Linux 4 8 5 NaN NaN F Python