两个DataFrame:
import pandas as pd df1 = pd.DataFrame({ 'Langs': ['C','Py','Java'], 'Values': [1,2,3] }) df2 = pd.DataFrame({ 'Langs': ['C语言','Python','CJavaPy'], 'Names': ['coder1','coder2','coder3'] })
预期结果:
Langs Values Names 0 C 1 coder1 1 Py 2 coder2 2 Java 3 coder3
或
Langs Values Names 0 C 1 coder1 1 C 1 coder3 2 Java 3 coder3 3 Py 2 coder2 4 Py 2 coder3
1、使用thefuzz包和extractOne函数
安装thefuzz:
pip install thefuzz
pip install python-Levenshtein
实现代码:
import pandas as pd from thefuzz import process df1 = pd.DataFrame({ 'Langs': ['C','Py','Java'], 'Values': [1,2,3] }) df2 = pd.DataFrame({ 'Langs': ['C语言','Python','CJavaPy'], 'Names': ['coder1','coder2','coder3'] }) langs = lambda x: process.extractOne(x, df2["Langs"])[2] df1['Names'] = df2.loc[df1["Langs"].map(langs).values, 'Names'].values print(df1)
2、使用merge()和groupby()
import pandas as pd from thefuzz import process df1 = pd.DataFrame({ 'Langs': ['C','Py','Java'], 'Values': [1,2,3] }) df2 = pd.DataFrame({ 'Langs_y': ['C语言','Python','CJavaPy'], 'Names': ['coder1','coder2','coder3'] }) df1['join'] = 1 df2['join'] = 1 dfFull = df1.merge(df2, on='join',).drop('join', axis=1) df2.drop('join', axis=1, inplace=True) dfFull['match'] = dfFull.apply(lambda x: x.Langs_y.find(x.Langs), axis=1).ge(0) dfFull = dfFull[dfFull['match']==True] dfResult = dfFull.groupby(["Langs", "Names"]).max().reset_index()[['Langs', 'Values' , 'Names' ]] print(dfResult)