This article shows how a public-facing Python Django site can use middleware to throttle how often a client may hit the site within a given time window when malicious requests are suspected, and how to check whether an IP address belongs to a search-engine spider.

1. Middleware code

import time

from django.utils.deprecation import MiddlewareMixin

MAX_REQUEST_PER_SECOND = 2  # allowed requests per second

class RequestBlockingMiddleware(MiddlewareMixin):
    def process_request(self, request):
        now = time.time()
        request_queue = request.session.get('request_queue', [])
        if len(request_queue) < MAX_REQUEST_PER_SECOND:
            # Queue not full yet: just record this request's timestamp.
            request_queue.append(now)
            request.session['request_queue'] = request_queue
        else:
            # Queue full: compare against the oldest recorded timestamp.
            time0 = request_queue[0]
            if (now - time0) < 1:
                # More than MAX_REQUEST_PER_SECOND requests arrived within
                # one second -- delay this request before letting it through.
                time.sleep(5)
            # Record this request and drop the oldest timestamp.
            request_queue.append(time.time())
            request.session['request_queue'] = request_queue[1:]
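The session-based queue above is a small sliding-window rate limiter. The same idea can be sketched as a standalone class, with no Django dependency (the class and method names here are illustrative, not part of the article's middleware):

```python
import time

MAX_REQUESTS_PER_SECOND = 2  # allowed requests per second

class SlidingWindowThrottle:
    """Keeps timestamps of recent requests and reports whether a new
    request stays within the per-second limit."""

    def __init__(self, max_per_second=MAX_REQUESTS_PER_SECOND):
        self.max_per_second = max_per_second
        self.timestamps = []

    def allow(self, now=None):
        now = time.time() if now is None else now
        # Drop timestamps that fell out of the 1-second window.
        self.timestamps = [t for t in self.timestamps if now - t < 1]
        if len(self.timestamps) < self.max_per_second:
            self.timestamps.append(now)
            return True
        return False
```

In the middleware, the point where `allow()` would return False corresponds to the branch that sleeps; an alternative to sleeping is returning an HTTP 429 response immediately.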

2. Register the app.middleware.RequestBlockingMiddleware middleware in settings.py

# Enable the RequestBlocking middleware
MIDDLEWARE = [
    'django.middleware.security.SecurityMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'app.middleware.RequestBlockingMiddleware',  # after SessionMiddleware, before AuthenticationMiddleware
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',
]

3. Checking whether an IP belongs to a search engine

import socket

def getHost(ip):
    # Reverse-DNS lookup: return the hostname for an IP, or None.
    try:
        result = socket.gethostbyaddr(ip)
        if result:
            return result[0]
        return None
    except socket.herror:
        return None

>>> getHost("203.208.60.11")
'crawl-203-208-60-11.googlebot.com'

# The returned hostname tells you whether the IP belongs to a search engine
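A reverse lookup on its own can be spoofed, so the usual check is: reverse DNS, match the hostname against the crawler's known domains, then a forward lookup to confirm the hostname resolves back to the same IP. A sketch of that flow (the suffix list and function names are illustrative, not exhaustive):

```python
import socket

# Hostname suffixes used by some major crawlers (illustrative, not exhaustive).
CRAWLER_SUFFIXES = ('.googlebot.com', '.google.com', '.search.msn.com')

def is_crawler_hostname(hostname):
    """Pure string check: does the hostname end with a known crawler suffix?"""
    return hostname is not None and hostname.endswith(CRAWLER_SUFFIXES)

def verify_crawler_ip(ip):
    """Reverse DNS, suffix check, then forward DNS to confirm the IP."""
    try:
        hostname = socket.gethostbyaddr(ip)[0]
    except socket.herror:
        return False
    if not is_crawler_hostname(hostname):
        return False
    try:
        # Forward-confirm: the claimed hostname must resolve back to this IP.
        return ip in socket.gethostbyname_ex(hostname)[2]
    except socket.gaierror:
        return False
```

`verify_crawler_ip` needs network access; the hostname check itself is pure string logic and can be tested offline.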

Note: Python 2 and Python 3 use slightly different syntax for the except clause, which is easy to trip over:

1) Python 2

try:
    print (1/0)
except ZeroDivisionError, err:      # a comma introduces the exception variable
    print ('Exception: ', err)

2) Python 3

try:
    print (1/0)
except ZeroDivisionError as err:        # "as" introduces the exception variable
    print ('Exception: ', err)

