Python: Matching list values in a DataFrame column and adding the matches as a new column

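This article shows how to search a DataFrame column whose cells are lists for the values of another list, and how to store what is found in each row in a new DataFrame column. It covers np.intersect1d, set intersection, any / np.any, and pd.concat with merge, with example code for each.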

Data:

      ID                   Found_IDs  
0  12345        [15443, 15533, 3433]   
1  15533  [2234, 16608, 12002, 7654]   
2   6789      [43322, 876544, 36789] 

Values to look up:

bad_ids = [15533, 876544]
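The snippets in this article assume that pandas and NumPy are imported and that df already exists. A minimal setup sketch that rebuilds the sample data is given here; the way the DataFrame is constructed is an assumption, since the article only shows its printed form.

```python
import numpy as np  # used by the np.intersect1d and np.any examples
import pandas as pd

# Sample data from the table above; each Found_IDs cell holds a Python list.
df = pd.DataFrame(
    {
        "ID": [12345, 15533, 6789],
        "Found_IDs": [
            [15443, 15533, 3433],
            [2234, 16608, 12002, 7654],
            [43322, 876544, 36789],
        ],
    }
)

# IDs to look for in every row.
bad_ids = [15533, 876544]

print(df)
```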

1. Using np.intersect1d

df['bad_id'] = df['Found_IDs'].apply(lambda x: np.intersect1d(x, bad_ids))
print(df)

      ID                   Found_IDs    bad_id
0  12345        [15443, 15533, 3433]   [15533]
1  15533  [2234, 16608, 12002, 7654]        []
2   6789      [43322, 876544, 36789]  [876544]
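Note that np.intersect1d returns a sorted NumPy array of unique values, not a Python list. The short sketch below illustrates this on a single row; the repeated ID in found is a made-up example, not part of the sample data.

```python
import numpy as np

found = [15443, 15533, 3433, 15533]  # hypothetical row with a repeated ID
bad_ids = [876544, 15533]

common = np.intersect1d(found, bad_ids)

print(type(common).__name__)  # the result is a NumPy array
print(common.tolist())        # .tolist() converts it to a plain Python list
```

If plain lists are preferred in the new column, calling .tolist() on the result inside the lambda should be enough.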

2. Using set intersection

bad_ids_set = set(bad_ids)
df['bad_id'] = df['Found_IDs'].apply(lambda x: list(set(x) & bad_ids_set))
print(df)

      ID                   Found_IDs    bad_id
0  12345        [15443, 15533, 3433]   [15533]
1  15533  [2234, 16608, 12002, 7654]        []
2   6789      [43322, 876544, 36789]  [876544]
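A set intersection does not guarantee the order of the matches and drops duplicates. If the original order within Found_IDs matters, one possible variant (a sketch, not part of the original approach) keeps the set only for fast membership tests and filters each list in place:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "ID": [12345, 15533, 6789],
        "Found_IDs": [
            [15443, 15533, 3433],
            [2234, 16608, 12002, 7654],
            [43322, 876544, 36789],
        ],
    }
)
bad_ids_set = {15533, 876544}

# Keep every element of the row's list that is in bad_ids_set,
# preserving the order in which it appears in Found_IDs.
df["bad_id"] = df["Found_IDs"].apply(
    lambda ids: [v for v in ids if v in bad_ids_set]
)

print(df)
```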

3. Using a list comprehension

bad_ids = [15533, 876544]
df['bad_id'] = [any(c in l for c in bad_ids) for l in df['Found_IDs']]
print(df)

      ID                   Found_IDs  bad_id
0  12345        [15443, 15533, 3433]    True
1  15533  [2234, 16608, 12002, 7654]   False
2   6789      [43322, 876544, 36789]    True
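When only a True/False flag is needed, the same check can also be written with set.isdisjoint, and the resulting boolean column can be used directly as a row filter. This is a sketch under the same sample data; the column name has_bad_id and the variable flagged are illustrative names, not from the original.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "ID": [12345, 15533, 6789],
        "Found_IDs": [
            [15443, 15533, 3433],
            [2234, 16608, 12002, 7654],
            [43322, 876544, 36789],
        ],
    }
)
bad_ids_set = {15533, 876544}

# A row is flagged when its list shares at least one element with bad_ids_set.
df["has_bad_id"] = [not bad_ids_set.isdisjoint(ids) for ids in df["Found_IDs"]]

# The boolean column doubles as a mask for selecting the flagged rows.
flagged = df[df["has_bad_id"]]
print(flagged)
```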

4. Using apply and np.any

df['bad_id'] = df['Found_IDs'].apply(lambda x: np.any([c in x for c in bad_ids]))

Or, to collect the matching IDs instead of a boolean flag:

df['bad_id'] = df['Found_IDs'].apply(lambda x: [*filter(lambda c: c in x, bad_ids)])

5. Using pd.concat and merge

bad_ids = [15533, 876544, 36789, 11111]
df2 = pd.concat(
    [
        df,
        pd.merge(
            df["Found_IDs"].explode().reset_index(),
            pd.Series(bad_ids, name="bad_ids"),
            left_on="Found_IDs",
            right_on="bad_ids",
            how="inner",
        )
        .groupby("index")
        .agg(bad_ids=("bad_ids", list)),
    ],
    axis=1,
).fillna(False)
print(df2)

      ID                   Found_IDs          bad_ids
0  12345        [15443, 15533, 3433]          [15533]
1  15533  [2234, 16608, 12002, 7654]            False
2   6789      [43322, 876544, 36789]  [876544, 36789]
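In the output above, rows without a match hold False while the other rows hold lists. If a uniform column of lists is preferred, a possible alternative (a sketch that swaps the merge for explode, isin and groupby, not the method shown above) is:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "ID": [12345, 15533, 6789],
        "Found_IDs": [
            [15443, 15533, 3433],
            [2234, 16608, 12002, 7654],
            [43322, 876544, 36789],
        ],
    }
)
bad_ids = [15533, 876544, 36789, 11111]

# One row per list element; the original row label is kept in the index.
exploded = df["Found_IDs"].explode()

# Keep only the elements that appear in bad_ids, then regroup them per row.
matched = exploded[exploded.isin(bad_ids)]
grouped = matched.groupby(level=0).agg(list)

# Rows without any match come back as NaN after reindexing; use [] for them.
df["bad_ids"] = grouped.reindex(df.index)
df["bad_ids"] = df["bad_ids"].apply(lambda v: v if isinstance(v, list) else [])

print(df)
```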