两个DataFrame:
import pandas as pd
df1 = pd.DataFrame({
'Langs': ['C','Py','Java'],
'Values': [1,2,3]
})
df2 = pd.DataFrame({
'Langs': ['C语言','Python','CJavaPy'],
'Names': ['coder1','coder2','coder3']
})
预期结果:
Langs Values Names 0 C 1 coder1 1 Py 2 coder2 2 Java 3 coder3
或
Langs Values Names 0 C 1 coder1 1 C 1 coder3 2 Java 3 coder3 3 Py 2 coder2 4 Py 2 coder3
1、使用thefuzz包和extractOne函数
安装thefuzz:
pip install thefuzz
pip install python-Levenshtein
实现代码:
import pandas as pd
from thefuzz import process
df1 = pd.DataFrame({
'Langs': ['C','Py','Java'],
'Values': [1,2,3]
})
df2 = pd.DataFrame({
'Langs': ['C语言','Python','CJavaPy'],
'Names': ['coder1','coder2','coder3']
})
langs = lambda x: process.extractOne(x, df2["Langs"])[2]
df1['Names'] = df2.loc[df1["Langs"].map(langs).values, 'Names'].values
print(df1)
2、使用merge()和groupby()
import pandas as pd
from thefuzz import process
df1 = pd.DataFrame({
'Langs': ['C','Py','Java'],
'Values': [1,2,3]
})
df2 = pd.DataFrame({
'Langs_y': ['C语言','Python','CJavaPy'],
'Names': ['coder1','coder2','coder3']
})
df1['join'] = 1
df2['join'] = 1
dfFull = df1.merge(df2, on='join',).drop('join', axis=1)
df2.drop('join', axis=1, inplace=True)
dfFull['match'] = dfFull.apply(lambda x: x.Langs_y.find(x.Langs), axis=1).ge(0)
dfFull = dfFull[dfFull['match']==True]
dfResult = dfFull.groupby(["Langs", "Names"]).max().reset_index()[['Langs', 'Values' , 'Names' ]]
print(dfResult)