3
\�"�@s\dZddlZddlZddlZdgZejdd�ZGdd�d�ZGdd�d�Z Gd d
�d
�Z
dS)a% robotparser.py
Copyright (C) 2000 Bastian Kleineidam
You can choose between two licenses when using this package:
1) GNU GPLv2
2) PSF license for Python 2.2
The robots.txt Exclusion Protocol is implemented as specified in
http://www.robotstxt.org/norobots-rfc.txt
�N�RobotFileParser�RequestRatezrequests secondsc@sjeZdZdZddd�Zdd�Zdd�Zd d
�Zdd�Zd
d�Z dd�Z
dd�Zdd�Zdd�Z
dd�ZdS)rzs This class provides a set of methods to read, parse and answer
questions about a single robots.txt file.
�cCs,g|_d|_d|_d|_|j|�d|_dS)NFr)�entries�
default_entry�disallow_all� allow_all�set_url�last_checked)�self�url�r
�*/usr/lib64/python3.6/urllib/robotparser.py�__init__s
zRobotFileParser.__init__cCs|jS)z�Returns the time the robots.txt file was last fetched.
This is useful for long-running web spiders that need to
check for new robots.txt files periodically.
)r
)rr
r
r�mtime$szRobotFileParser.mtimecCsddl}|j�|_dS)zYSets the time the robots.txt file was last fetched to the
current time.
rN)�timer
)rrr
r
r�modified-szRobotFileParser.modifiedcCs&||_tjj|�dd�\|_|_dS)z,Sets the URL referring to a robots.txt file.��N)r�urllib�parse�urlparse�host�path)rrr
r
rr 5szRobotFileParser.set_urlcCs�ytjj|j�}WnRtjjk
rd}z2|jdkr:d|_n|jdkrT|jdkrTd|_WYdd}~XnX|j �}|j
|jd�j��dS) z4Reads the robots.txt URL and feeds it to the parser.��Ti�i�Nzutf-8)rr)
rZrequestZurlopenr�errorZ HTTPError�coderr�readr�decode�
splitlines)r�f�err�rawr
r
rr:s
zRobotFileParser.readcCs,d|jkr|jdkr(||_n|jj|�dS)N�*)�
useragentsrr�append)r�entryr
r
r�
_add_entryGs
zRobotFileParser._add_entrycCs6d}t�}|j��x|D�]�}|sT|dkr8t�}d}n|dkrT|j|�t�}d}|jd�}|dkrr|d|�}|j�}|s�q|jdd�}t|�dkr|dj�j�|d<tj j
|dj��|d<|ddk�r|dkr�|j|�t�}|jj|d�d}q|ddk�r4|dk�r|j
jt|dd ��d}q|dd
k�rh|dk�r|j
jt|dd��d}q|ddk�r�|dk�r|dj�j��r�t|d�|_d}q|dd
kr|dkr|djd�}t|�dk�r|dj�j��r|dj�j��rtt|d�t|d��|_d}qW|dk�r2|j|�dS)z�Parse the input lines from a robots.txt file.
We allow that a user-agent: line is not preceded by
one or more blank lines.
rr��#N�:z
user-agentZdisallowFZallowTzcrawl-delayzrequest-rate�/)�Entryrr(�find�strip�split�len�lowerrr�unquoter%r&� rulelines�RuleLine�isdigit�int�delayr�req_rate)r�lines�stater'�line�iZnumbersr
r
rrPsd
zRobotFileParser.parsecCs�|jr
dS|jrdS|jsdStjjtjj|��}tjjdd|j|j |j
|jf�}tjj|�}|sfd}x"|j
D]}|j|�rn|j|�SqnW|jr�|jj|�SdS)z=using the parsed robots.txt decide if useragent can fetch urlFTrr,)rrr
rrrr3�
urlunparserZparamsZqueryZfragment�quoter�
applies_to� allowancer)r� useragentrZ
parsed_urlr'r
r
r� can_fetch�s$
zRobotFileParser.can_fetchcCs4|j�sdSx|jD]}|j|�r|jSqW|jjS)N)rrr@r8r)rrBr'r
r
r�crawl_delay�s
zRobotFileParser.crawl_delaycCs4|j�sdSx|jD]}|j|�r|jSqW|jjS)N)rrr@r9r)rrBr'r
r
r�request_rate�s
zRobotFileParser.request_ratecCs0|j}|jdk r||jg}djtt|��dS)N�
)rr�join�map�str)rrr
r
r�__str__�s
zRobotFileParser.__str__N)r)�__name__�
__module__�__qualname__�__doc__rrrr rr(rrCrDrErJr
r
r
rrs
Cc@s(eZdZdZdd�Zdd�Zdd�ZdS) r5zoA rule line is a single "Allow:" (allowance==True) or "Disallow:"
(allowance==False) followed by a path.cCs>|dkr|rd}tjjtjj|��}tjj|�|_||_dS)NrT)rrr>rr?rrA)rrrAr
r
rr�s
zRuleLine.__init__cCs|jdkp|j|j�S)Nr$)r�
startswith)r�filenamer
r
rr@�szRuleLine.applies_tocCs|jr
dndd|jS)NZAllowZDisallowz: )rAr)rr
r
rrJ�szRuleLine.__str__N)rKrLrMrNrr@rJr
r
r
rr5�sr5c@s0eZdZdZdd�Zdd�Zdd�Zdd �Zd
S)r-z?An entry has one or more user-agents and zero or more rulelinescCsg|_g|_d|_d|_dS)N)r%r4r8r9)rr
r
rr�szEntry.__init__cCs�g}x|jD]}|jd|���qW|jdk r@|jd|j���|jdk rj|j}|jd|j�d|j���|jtt|j ��|jd�dj
|�S)NzUser-agent: z
Crawl-delay: zRequest-rate: r,rrF)r%r&r8r9ZrequestsZseconds�extendrHrIr4rG)rZret�agentZrater
r
rrJ�s
z
Entry.__str__cCsF|jd�dj�}x.|jD]$}|dkr*dS|j�}||krdSqWdS)z2check if this entry applies to the specified agentr,rr$TF)r0r2r%)rrBrRr
r
rr@�szEntry.applies_tocCs$x|jD]}|j|�r|jSqWdS)zZPreconditions:
- our agent applies to this entry
- filename is URL decodedT)r4r@rA)rrPr<r
r
rrA�s
zEntry.allowanceN)rKrLrMrNrrJr@rAr
r
r
rr-�s
r-)rN�collectionsZurllib.parserZurllib.request�__all__�
namedtuplerrr5r-r
r
r
r�<module>s2