robotparser.py
Copyright (C) 2000 Bastian Kleineidam
You can choose between two licenses when using this package:
1) GNU GPLv2
2) PSF license for Python 2.2
The robots.txt Exclusion Protocol is implemented as specified in
http://www.robotstxt.org/norobots-rfc.txt

__all__ = ["RobotFileParser"]
RequestRate = collections.namedtuple("RequestRate", "requests seconds")

class RobotFileParser
This class provides a set of methods to read, parse and answer
questions about a single robots.txt file.
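The sketch below shows that read/parse/answer cycle offline. It passes robots.txt text directly to parse() instead of fetching it with read(). The rules and the example.com URLs are made-up sample data. It assumes a Python 3 release whose urllib.robotparser behaves like this 3.8 module.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt content (made up), parsed without any network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

print(rp.can_fetch("mybot", "https://example.com/index.html"))    # True
print(rp.can_fetch("mybot", "https://example.com/private/data"))  # False
```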

Compiled from /usr/lib64/python3.8/urllib/robotparser.py.

RobotFileParser.__init__(url='')
Initializes entries and sitemaps to empty lists, default_entry to None,
and disallow_all and allow_all to False. It then calls set_url(url) and
sets last_checked to 0.
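You can inspect the initial state that __init__ sets up directly. This is a small sketch; the URL is a placeholder and nothing is fetched.

```python
from urllib.robotparser import RobotFileParser

# A freshly constructed parser: the URL is stored, but nothing has been read.
rp = RobotFileParser("https://example.com/robots.txt")

print(rp.mtime())                                     # 0
print(rp.can_fetch("mybot", "https://example.com/"))  # False: nothing read yet
print(rp.crawl_delay("mybot"))                        # None
```

Because last_checked starts at 0, can_fetch() answers False until read() or parse() has run.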

RobotFileParser.mtime()
Returns the time the robots.txt file was last fetched (the last_checked attribute).
This is useful for long-running web spiders that need to
check for new robots.txt files periodically.

RobotFileParser.modified()
Sets the time the robots.txt file was last fetched to the
current time (time.time()).
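For the periodic re-check that mtime() is meant to support, a crawler can compare mtime() with the current time. The needs_refresh() helper and the one-day MAX_AGE below are hypothetical choices and are not part of the module.

```python
import time
from urllib.robotparser import RobotFileParser

MAX_AGE = 24 * 60 * 60  # hypothetical refresh policy: one day, in seconds

def needs_refresh(rp, max_age=MAX_AGE):
    """Hypothetical helper: True if robots.txt was never fetched
    or was fetched more than max_age seconds ago."""
    return time.time() - rp.mtime() > max_age

rp = RobotFileParser()
print(needs_refresh(rp))  # True: mtime() is 0 until something is read
rp.modified()             # parse() calls this; here it is called directly
print(needs_refresh(rp))  # False
```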
rrrr�modified.szRobotFileParser.modifiedcCs&||_tj�|�dd�\|_|_dS)z,Sets the URL referring to a robots.txt file.��N)r�urllib�parse�urlparseZhost�pathrrrrr
6szRobotFileParser.set_urlc
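Here is a quick check of what set_url() stores, using a placeholder URL.

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")

# set_url() keeps the full URL plus the netloc and path parts from urlparse().
print(rp.url)   # https://example.com/robots.txt
print(rp.host)  # example.com
print(rp.path)  # /robots.txt
```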
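
RobotFileParser.read()
Reads the robots.txt URL and feeds it to the parser.
The URL is opened with urllib.request.urlopen. If an HTTPError has code
401 or 403, disallow_all is set to True. For any other 4xx code,
allow_all is set to True. Other HTTP errors leave both flags unset. On
success, the body is decoded as UTF-8, split into lines and passed to parse().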
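read() does its own fetching with urlopen(). If a crawler uses a different HTTP client, it can reproduce the same decisions by hand. The feed_response() helper below is a hypothetical sketch that mirrors the read() logic of this 3.8 module:

- 401 or 403 means disallow all.
- Any other 4xx means allow all.
- A success response is parsed.

The status codes and body are sample data.

```python
from urllib.robotparser import RobotFileParser

def feed_response(rp, status, body=b""):
    """Hypothetical helper mirroring the 3.8 read() logic for a response
    that was fetched by some other HTTP client."""
    if status in (401, 403):
        rp.disallow_all = True      # access denied: treat everything as disallowed
    elif 400 <= status < 500:
        rp.allow_all = True         # e.g. 404, no robots.txt: treat everything as allowed
    elif status < 400:
        rp.parse(body.decode("utf-8").splitlines())
    # 5xx: no flag is set and nothing is parsed, so can_fetch() stays False

url = "https://example.com/page"

results = {}
for status in (403, 404, 503):
    rp = RobotFileParser()
    feed_response(rp, status)
    results[status] = rp.can_fetch("mybot", url)
print(results)  # {403: False, 404: True, 503: False}

rp = RobotFileParser()
feed_response(rp, 200, b"User-agent: *\nDisallow: /page\n")
allowed_200 = rp.can_fetch("mybot", url)
print(allowed_200)  # False
```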
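
RobotFileParser._add_entry(entry)
Internal helper. If an entry's useragents include "*", it becomes
default_entry, but only when no default entry has been set yet. Every
other entry is appended to entries.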
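_add_entry() keeps only the first "User-agent: *" group. This sketch uses made-up rules to show a second catch-all group being ignored.

```python
from urllib.robotparser import RobotFileParser

# Two "User-agent: *" groups: only the first becomes the default entry.
lines = [
    "User-agent: *",
    "Disallow: /a",
    "",
    "User-agent: *",
    "Disallow: /b",   # this group is dropped by _add_entry()
]
rp = RobotFileParser()
rp.parse(lines)

print(len(rp.entries))                                  # 0: "*" groups never go into entries
print(rp.can_fetch("mybot", "https://example.com/a"))  # False
print(rp.can_fetch("mybot", "https://example.com/b"))  # True
```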
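
RobotFileParser.parse(lines)
Parse the input lines from a robots.txt file.
We allow that a user-agent: line is not preceded by
one or more blank lines.
parse() first calls modified(). It then processes each line as follows:
text from "#" onward is dropped and the remainder is stripped. The line is
split at the first ":", the field name is lowercased and the value is
URL-unquoted.
The recognized fields are user-agent, disallow, allow, crawl-delay,
request-rate and sitemap. Allow, disallow, crawl-delay and request-rate
lines are ignored until a user-agent line has been seen. Crawl-delay is
stored only if its value is all digits. Request-rate is stored only if it
has the form requests/seconds with both parts numeric. Sitemap lines are
recorded wherever they appear.
A blank line ends the current group. The group is added with
_add_entry() if it already contains a field after its user-agent lines;
otherwise it is dropped. A user-agent line that follows such fields also
starts a new entry.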
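Here is a sketch of parse() on made-up input. The input contains a comment line, a rule before any User-agent line, mixed-case field names, and two groups with no blank line between them.

```python
from urllib.robotparser import RobotFileParser

lines = [
    "# comment lines are skipped",
    "Disallow: /ignored       # before any User-agent: ignored",
    "User-agent: alpha",
    "Disallow: /x",
    "USER-AGENT: beta         # starts a new entry, no blank line needed",
    "disallow: /y",
]
rp = RobotFileParser()
rp.parse(lines)

print(rp.can_fetch("alpha", "https://example.com/x"))        # False
print(rp.can_fetch("alpha", "https://example.com/y"))        # True
print(rp.can_fetch("beta", "https://example.com/y"))         # False
print(rp.can_fetch("gamma", "https://example.com/ignored"))  # True: no entry applies
```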
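
RobotFileParser.can_fetch(useragent, url)
Using the parsed robots.txt, decide whether useragent can fetch url.
It returns False if disallow_all is set and True if allow_all is set.
Until the file has been read (last_checked is 0), it returns False.
Otherwise the URL is prepared as follows: it is unquoted, reduced to its
path, params, query and fragment, and quoted again ("/" if empty).
The first entry in entries that applies to useragent decides. The default
entry is consulted last, and if no entry applies, access is allowed.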
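This sketch shows how can_fetch() treats hosts, query strings and the catch-all entry. The rules and URLs are made up, and the behavior assumed is that of this 3.8 module.

```python
from urllib.robotparser import RobotFileParser

lines = [
    "User-agent: figtree",
    "Disallow: /tmp",
    "",
    "User-agent: *",
    "Disallow: /",
]
rp = RobotFileParser()
rp.parse(lines)

# Scheme and host are discarded; only path, params, query and fragment are compared.
print(rp.can_fetch("figtree", "https://any-host.example/tmp/a?b=1"))  # False
print(rp.can_fetch("figtree", "https://example.com/index.html"))      # True
# No named entry applies to "otherbot", so the default ("*") entry decides.
print(rp.can_fetch("otherbot", "https://example.com/index.html"))     # False
```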
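
RobotFileParser.crawl_delay(useragent)
Returns the Crawl-delay of the first entry that applies to useragent,
falling back to the default entry. It returns None in three cases: the
file has not been read yet (mtime() is 0), no entry applies, or the
matching entry has no Crawl-delay.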
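A crawler would typically turn crawl_delay() into a sleep interval. polite_delay() and its 1.0-second fallback are hypothetical, and the rules are sample data.

```python
from urllib.robotparser import RobotFileParser

lines = [
    "User-agent: *",
    "Crawl-delay: 5",
    "",
    "User-agent: fastbot",
    "Crawl-delay: 1",
    "",
    "User-agent: fuzzybot",
    "Crawl-delay: 2.5",   # not all digits, so parse() ignores the value
]
rp = RobotFileParser()
rp.parse(lines)

def polite_delay(rp, agent, fallback=1.0):
    """Hypothetical helper: seconds to wait between requests for agent."""
    delay = rp.crawl_delay(agent)
    return fallback if delay is None else delay

print(polite_delay(rp, "fastbot"))   # 1
print(polite_delay(rp, "slowbot"))   # 5 (from the default entry)
print(polite_delay(rp, "fuzzybot"))  # 1.0 (fallback)
```

fuzzybot's own entry matches, so its unusable Crawl-delay yields None. It does not fall back to the default entry's 5.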
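
RobotFileParser.request_rate(useragent)
Returns the Request-rate of the first entry that applies to useragent, as
a RequestRate(requests, seconds) named tuple, falling back to the default
entry. It returns None in three cases: the file has not been read yet
(mtime() is 0), no entry applies, or the matching entry has no Request-rate.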
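You can convert the RequestRate tuple into a minimum spacing between requests. min_interval() is a hypothetical helper, and the 3/60 rate is sample data.

```python
from urllib.robotparser import RobotFileParser

lines = [
    "User-agent: *",
    "Request-rate: 3/60",   # 3 requests per 60 seconds
]
rp = RobotFileParser()
rp.parse(lines)

rate = rp.request_rate("mybot")
print(rate)  # RequestRate(requests=3, seconds=60)

def min_interval(rate):
    """Hypothetical helper: minimum spacing between requests in seconds,
    or None if there is no usable rate."""
    if rate is None or rate.requests == 0:
        return None
    return rate.seconds / rate.requests

print(min_interval(rate))  # 20.0
```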

RobotFileParser.site_maps()
Returns the list of Sitemap URLs found in robots.txt, or None if there
were none.
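This sketch shows two things. site_maps() returns None when there are no Sitemap lines. Sitemap lines are also collected independently of User-agent groups. The URLs are sample data.

```python
from urllib.robotparser import RobotFileParser

rp_empty = RobotFileParser()
rp_empty.parse(["User-agent: *", "Disallow:"])
print(rp_empty.site_maps())  # None: no Sitemap lines

rp = RobotFileParser()
rp.parse([
    "Sitemap: https://example.com/sitemap-1.xml",  # before any User-agent: still recorded
    "User-agent: *",
    "Disallow: /private",
    "Sitemap: https://example.com/sitemap-2.xml",
])
print(rp.site_maps())
# ['https://example.com/sitemap-1.xml', 'https://example.com/sitemap-2.xml']
```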

RobotFileParser.__str__()
Returns the parsed entries as robots.txt-style text, with the default
entry (if any) placed last.
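Because the default entry is appended at the end, __str__ can reorder a file. This sketch uses sample rules. It checks only the order of the groups, not the exact separator between entries.

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private",
    "",
    "User-agent: figtree",
    "Disallow: /tmp",
])
text = str(rp)
print(text)
# The "*" group was written first, but __str__ lists it after the named entries.
print(text.index("User-agent: figtree") < text.index("User-agent: *"))  # True
```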
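
class RuleLine(path, allowance)
A rule line is a single "Allow:" (allowance==True) or "Disallow:"
(allowance==False) followed by a path.
An empty Disallow path is turned into an allow-all rule. The path is
normalized with urllib.parse.urlunparse(urllib.parse.urlparse(path))
and then quoted with urllib.parse.quote.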
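Here is a sketch of the empty-Disallow special case. RuleLine is an internal class (it is not listed in __all__) and is imported here only for illustration.

```python
from urllib.robotparser import RobotFileParser, RuleLine  # RuleLine: internal

rule = RuleLine("", False)  # an empty Disallow value
print(rule.allowance)       # True: treated as "allow everything"

rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow:"])
print(rp.can_fetch("mybot", "https://example.com/anything"))  # True
```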

RuleLine.applies_to(filename)
Returns True if the rule's path is "*" or filename starts with the path.
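Because applies_to() is a plain string-prefix test, a rule for /tmp also covers /tmpfile. This sketch uses the internal RuleLine class directly, with sample paths.

```python
from urllib.robotparser import RuleLine  # internal helper class

rule = RuleLine("/tmp", False)
# String-prefix matching, not path-segment matching:
print(rule.applies_to("/tmp/cache"))  # True
print(rule.applies_to("/tmpfile"))    # True (also matched)
print(rule.applies_to("/var/tmp"))    # False
```

Writing the rule as /tmp/ would limit the match to that directory.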

RuleLine.__str__()
Returns "Allow: <path>" or "Disallow: <path>".

class Entry
An entry has one or more user-agents and zero or more rulelines.
Entry() starts with empty useragents and rulelines lists, and with delay
and req_rate set to None.

Entry.__str__()
Returns one "User-agent: <agent>" line per agent. These are followed by
"Crawl-delay: <delay>" and "Request-rate: <requests>/<seconds>" lines when
those values are set, and then by the rule lines. All lines are joined
with newlines.
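This sketch builds an Entry by hand to show the text that __str__ produces. Entry, RuleLine and RequestRate are module-level names in this file but are not exported in __all__, so treat this as an illustration of internals. The agent name and values are sample data.

```python
from urllib.robotparser import Entry, RequestRate, RuleLine  # internal names

entry = Entry()
entry.useragents.append("figtree")
entry.delay = 3
entry.req_rate = RequestRate(1, 10)
# Allow is listed first so it is not shadowed by the broader Disallow.
entry.rulelines.append(RuleLine("/tmp/public", True))
entry.rulelines.append(RuleLine("/tmp", False))

print(str(entry))
# User-agent: figtree
# Crawl-delay: 3
# Request-rate: 1/10
# Allow: /tmp/public
# Disallow: /tmp
```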
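
Entry.applies_to(useragent)
Check if this entry applies to the specified agent. Only the part of
useragent before the first "/" is used, lowercased. "*" matches every
agent. Otherwise, the entry applies if one of its user-agent names,
lowercased, is a substring of that name.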
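The substring test means a short agent name can match more crawlers than intended. This sketch uses the internal Entry class and made-up agent strings.

```python
from urllib.robotparser import Entry  # internal class

entry = Entry()
entry.useragents.append("Bot")

# Only the token before "/" is compared, case-insensitively, as a substring.
print(entry.applies_to("bot/1.0"))        # True
print(entry.applies_to("GoogleBot/2.1"))  # True: "bot" is a substring of "googlebot"
print(entry.applies_to("crawler"))        # False
```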

Entry.allowance(filename)
Preconditions:
- our agent applies to this entry
- filename is URL decoded
Returns the allowance of the first rule line that applies to filename,
or True if none does.
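Because allowance() returns on the first matching rule line, the order of Allow and Disallow lines matters. parser_for() is a hypothetical helper, and the rules and URL are sample data.

```python
from urllib.robotparser import RobotFileParser

def parser_for(rules):
    """Hypothetical helper: a parser for one "User-agent: *" group with the given rules."""
    rp = RobotFileParser()
    rp.parse(["User-agent: *", *rules])
    return rp

url = "https://example.com/private/public/page.html"

allow_first = parser_for(["Allow: /private/public", "Disallow: /private"])
disallow_first = parser_for(["Disallow: /private", "Allow: /private/public"])

print(allow_first.can_fetch("mybot", url))     # True: the Allow line matches first
print(disallow_first.can_fetch("mybot", url))  # False: the Disallow line matches first
```

RFC 9309 instead asks crawlers to use the most specific (longest) matching rule, so other parsers may answer differently for the second file.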

Module-level imports: collections, urllib.parse, urllib.request.