Python 3 BeautifulSoup: Installation and Example Code for Scraping Web Pages - CJavaPy

1. Installing BeautifulSoup

Install the BeautifulSoup and requests libraries with pip:

pip install beautifulsoup4

pip install requests
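To confirm that both packages were installed into the interpreter you intend to use, a quick check is to import them and print their versions. This is only a minimal sketch; the version numbers printed depend on your environment.

```python
# Import both packages; an ImportError here means the install did not
# reach this interpreter (for example, a different virtual environment).
import bs4
import requests

# Both packages expose a __version__ string.
print("beautifulsoup4", bs4.__version__)
print("requests", requests.__version__)
```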

2. Example code for scraping web pages with BeautifulSoup

Use the requests library to send the HTTP request, and BeautifulSoup to parse the page content.

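import bs4  # Provides BeautifulSoup for parsing HTML
import requests  # Sends HTTP requests

# Send an HTTP GET request to the Wikipedia page and get the response
response = requests.get("https://en.wikipedia.org/wiki/Mathematics")

# Check that the request succeeded (response.ok is True for status codes below 400)
if response.ok:
    # Parse the response body as HTML; 'html.parser' is Python's built-in parser
    html = bs4.BeautifulSoup(response.text, 'html.parser')

    # Select the element with id 'firstHeading' (usually the page title) and print its text
    title = html.select("#firstHeading")[0].text
    print(title)

    # Select every <p> tag on the page; these usually hold the paragraphs
    paragraphs = html.select("p")

    # Iterate over the paragraphs and print the text of each one
    for para in paragraphs:
        print(para.text)

    # Take the text of the first 5 paragraphs and join it into one string
    # to serve as the introduction (a list comprehension over paragraphs[0:5])
    intro = '\n'.join([para.text for para in paragraphs[0:5]])

    # Print the introduction text
    print(intro)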
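The selector calls above can also be tried without a network connection. The sketch below parses a small hand-written HTML string (the markup is made up for illustration and is not Wikipedia's actual page) and applies the same CSS selectors. It uses select_one(), which returns None when nothing matches, as an alternative to indexing select(...)[0], which raises IndexError in that case.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a downloaded page
SAMPLE_HTML = """
<html><body>
  <h1 id="firstHeading">Mathematics</h1>
  <p>First paragraph.</p>
  <p>Second <b>paragraph</b>.</p>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# select_one() returns the first match or None, so a missing heading
# can be handled without an exception.
heading = soup.select_one("#firstHeading")
title = heading.get_text().strip() if heading is not None else ""

# select("p") returns a list of every <p> element in document order.
paragraphs = [p.get_text().strip() for p in soup.select("p")]

# Join at most the first five paragraphs into one introduction string.
intro = "\n".join(paragraphs[:5])

print(title)
print(intro)
```

Slicing with paragraphs[:5] is safe even when the page has fewer than five paragraphs, so no length check is needed.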

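Alternatively, the following example removes an unwanted element from the page before extracting link text: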

# Import the requests library for sending HTTP requests
import requests

# Import the BeautifulSoup class from bs4 for parsing HTML documents
from bs4 import BeautifulSoup

# Send a GET request to the given URL to fetch the page
page = requests.get('https://www.nga.gov/collection/anZ1.htm')

# Parse the page content; 'html.parser' selects Python's built-in HTML parser
soup = BeautifulSoup(page.text, 'html.parser')

# Find the element with class 'AlphaNav'; its links are not needed, so remove it
last_links = soup.find(class_='AlphaNav')

# .decompose() removes the element from the parsed tree entirely;
# find() returns None when nothing matches, so check first
if last_links is not None:
    last_links.decompose()

# Find the element with class 'BodyText', which contains the list of artist names
artist_name_list = soup.find(class_='BodyText')

# Collect every <a> tag inside 'BodyText'; each one links to an artist
artist_name_list_items = artist_name_list.find_all('a') if artist_name_list else []

# Loop over the <a> tags and extract their text
# .contents[0] is the first child of the <a> tag, i.e. the artist name
for artist_name in artist_name_list_items:
    # Get the artist name
    names = artist_name.contents[0]

    # Print the artist name
    print(names)
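To see what decompose() and find_all() do in isolation, here is a minimal offline sketch. The HTML string, the placeholder names, and the href values are all invented for illustration; only the class names 'AlphaNav' and 'BodyText' are taken from the example above. It also collects each link's href attribute with .get(), which is a small extension of the original loop.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the structure the example expects
SAMPLE_HTML = """
<div class="BodyText">
  <a href="/artist/1">Example Artist A</a>
  <a href="/artist/2">Example Artist B</a>
  <div class="AlphaNav"><a href="/a.htm">A</a> <a href="/b.htm">B</a></div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Remove the navigation block so its links are not picked up below.
nav = soup.find(class_="AlphaNav")
if nav is not None:
    nav.decompose()

# Collect (name, href) pairs from the remaining links.
artists = []
body = soup.find(class_="BodyText")
if body is not None:
    for link in body.find_all("a"):
        # .get("href") returns None instead of raising if the attribute is absent
        artists.append((link.get_text().strip(), link.get("href")))

for name, href in artists:
    print(name, href)
```

Without the decompose() call, find_all('a') would also return the two navigation links, because it searches all descendants of the 'BodyText' element.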
