B
� f9E�@s�dZddlZddlZddlZddlmZdgZe�d�Ze�d�Z e�d�Z
e�d�Ze�d �Ze�d
�Z
e�d�Ze�d�Ze�d
�Ze�dej�Ze�d
�Ze�d�ZGdd�dej�ZdS)zA parser for HTML and XHTML.�N)�unescape�
HTMLParserz[&<]z
&[a-zA-Z#]z%&([a-zA-Z][-.a-zA-Z0-9]*)[^a-zA-Z0-9]z)&#(?:[0-9]+|[xX][0-9a-fA-F]+)[^0-9a-fA-F]z <[a-zA-Z]�>z--\s*>z+([a-zA-Z][^\t\n\r\f />\x00]*)(?:\s|/(?!>))*z]((?<=[\'"\s/])[^\s/>][^\s/=>]*)(\s*=+\s*(\'[^\']*\'|"[^"]*"|(?![\'"])[^>\s]*))?(?:\s|/(?!>))*aF
<[a-zA-Z][^\t\n\r\f />\x00]* # tag name
(?:[\s/]* # optional whitespace before attribute name
(?:(?<=['"\s/])[^\s/>][^\s/=>]* # attribute name
(?:\s*=+\s* # value indicator
(?:'[^']*' # LITA-enclosed value
|"[^"]*" # LIT-enclosed value
|(?!['"])[^>\s]* # bare value
)
(?:\s*,)* # possibly followed by a comma
)?(?:\s|/(?!>))*
)*
)?
\s* # trailing whitespace
z#</\s*([a-zA-Z][-.a-zA-Z0-9:_]*)\s*>c@s�eZdZdZdZdd�dd�Zdd�Zd d
�Zdd�Zd
Z dd�Z
dd�Zdd�Zdd�Z
dd�Zd9dd�Zdd�Zdd�Zdd �Zd!d"�Zd#d$�Zd%d&�Zd'd(�Zd)d*�Zd+d,�Zd-d.�Zd/d0�Zd1d2�Zd3d4�Zd5d6�Zd7d8�Zd
S):raEFind tags and other markup and call handler functions.
Usage:
p = HTMLParser()
p.feed(data)
...
p.close()
Start tags are handled by calling self.handle_starttag() or
self.handle_startendtag(); end tags by self.handle_endtag(). The
data between tags is passed from the parser to the derived class
by calling self.handle_data() with the data as argument (the data
may be split up in arbitrary chunks). If convert_charrefs is
True, character references are converted automatically to the
corresponding Unicode characters (and the data passed to
self.handle_data() is no longer split in chunks); otherwise they are passed by calling
self.handle_entityref() or self.handle_charref() with the string
containing respectively the named or numeric reference as the
argument.
)ZscriptZstyleT)�convert_charrefscCs||_|��dS)z�Initialize and reset this instance.
If convert_charrefs is True (the default), all character references
are automatically converted to the corresponding Unicode characters.
N)r�reset)�selfr�r�0/opt/alt/python37/lib64/python3.7/html/parser.py�__init__WszHTMLParser.__init__cCs(d|_d|_t|_d|_tj�|�dS)z1Reset this instance. Loses all unprocessed data.�z???N)�rawdata�lasttag�interesting_normal�interesting�
cdata_elem�_markupbase�
ParserBaser)rrrr r`s
zHTMLParser.resetcCs|j||_|�d�dS)z�Feed data to the parser.
Call this as often as you want, with as little or as much text
as you want (may include '\n').
rN)r�goahead)r�datarrr �feedhszHTMLParser.feedcCs|�d�dS)zHandle any buffered data.�N)r)rrrr �closeqszHTMLParser.closeNcCs|jS)z)Return full source of start tag: '<...>'.)�_HTMLParser__starttag_text)rrrr �get_starttag_textwszHTMLParser.get_starttag_textcCs$|��|_t�d|jtj�|_dS)Nz</\s*%s\s*>)�lowerr�re�compile�Ir)r�elemrrr �set_cdata_mode{s
zHTMLParser.set_cdata_modecCst|_d|_dS)N)rrr)rrrr �clear_cdata_modeszHTMLParser.clear_cdata_modecCsN|j}d}t|�}�x�||k�r�|jrx|jsx|�d|�}|dkr�|�dt||d��}|dkrrt�d�� ||�srP|}n(|j
� ||�}|r�|��}n|jr�P|}||kr�|jr�|js�|�t
|||���n|�|||��|�||�}||kr�P|j}|d|��rDt�||��r |�|�} n�|d|��r8|�|�} nl|d|��rP|�|�} nT|d|��rh|�|�} n<|d |��r�|�|�} n$|d
|k�r�|�d�|d
} nP| dk�r6|�s�P|�d|d
�} | dk�r�|�d|d
�} | dk�r�|d
} n| d
7} |j�r$|j�s$|�t
||| ���n|�||| ��|�|| �}q|d|��r�t�||�}|�r�|��d
d�}
|�|
�|��} |d| d
��s�| d
} |�|| �}qn:d||d�k�r�|�|||d
��|�||d
�}Pq|d|��r�t�||�}|�rH|�d
�}
|�|
�|��} |d| d
��s:| d
} |�|| �}qt�||�}|�r�|�r�|��||d�k�r�|��} | |k�r�|} |�||d
�}Pn,|d
|k�r�|�d�|�||d
�}nPqdstd��qW|�r<||k�r<|j�s<|j�r|j�s|�t
|||���n|�|||��|�||�}||d�|_dS)Nr�<�&�"z[\s;]z</z<!--z<?z<!rrz&#�����;zinteresting.search() lied)r�lenrr�find�rfind�maxrr�searchr�start�handle_datarZ updatepos�
startswith�starttagopen�match�parse_starttag�parse_endtag�
parse_comment�parse_pi�parse_html_declaration�charref�group�handle_charref�end� entityref�handle_entityref�
incomplete�AssertionError)rr9r�i�n�jZampposr0r.�k�namerrr r�s�
zHTMLParser.goaheadcCs�|j}|||d�dks"td��|||d�dkr@|�|�S|||d�dkr^|�|�S|||d���d kr�|�d
|d�}|dkr�dS|�||d|��|dS|�|�SdS)
Nr$z<!z+unexpected call to parse_html_declaration()�z<!--�z<![� z <!doctyperr%r)rr=r3Zparse_marked_sectionrr(�handle_decl�parse_bogus_comment)rr>r�gtposrrr r5s
z!HTMLParser.parse_html_declarationrcCs`|j}|||d�dks"td��|�d|d�}|dkr>dS|rX|�||d|��|dS)Nr$)z<!z</z"unexpected call to parse_comment()rr%r)rr=r(�handle_comment)rr>Zreportr�posrrr rGszHTMLParser.parse_bogus_commentcCsd|j}|||d�dks"td��t�||d�}|s:dS|��}|�||d|��|��}|S)Nr$z<?zunexpected call to parse_pi()r%)rr=�picloser+r,� handle_pir9)rr>rr0r@rrr r4!szHTMLParser.parse_picCs�d|_|�|�}|dkr|S|j}|||�|_g}t�||d�}|sPtd��|��}|�d���|_ }x�||k�r.t
�||�}|s�P|�ddd�\} }
}|
s�d}n\|dd�dkr�|dd�ks�n|dd�dkr�|dd�k�rnn|dd�}|�rt|�}|�| ��|f�|��}qnW|||��
�}|d k�r�|��\}
}d
|jk�r�|
|j�d
�}
t|j�|j�d
�}n|t|j�}|�|||��|S|�d��r�|�||�n"|�||�||jk�r�|�|�|S)Nrrz#unexpected call to parse_starttag()r$rD�'r%�")rz/>�
z/>)r�check_for_whole_start_tagr�tagfind_tolerantr0r=r9r7rr
�attrfind_tolerantr�append�stripZgetpos�countr'r)r-�endswith�handle_startendtag�handle_starttag�CDATA_CONTENT_ELEMENTSr)rr>�endposr�attrsr0rA�tag�mZattrname�restZ attrvaluer9�lineno�offsetrrr r1-sR
&*
zHTMLParser.parse_starttagcCs�|j}t�||�}|r�|��}|||d�}|dkr>|dS|dkr~|�d|�rZ|dS|�d|�rjdS||krv|S|dS|dkr�dS|dkr�dS||kr�|S|dStd ��dS)
Nrr�/z/>r$r%rz6abcdefghijklmnopqrstuvwxyz=/ABCDEFGHIJKLMNOPQRSTUVWXYZzwe should not get here!)r�locatestarttagend_tolerantr0r9r.r=)rr>rr]r@�nextrrr rP`s.z$HTMLParser.check_for_whole_start_tagcCs.|j}|||d�dks"td��t�||d�}|s:dS|��}t�||�}|s�|jdk rr|�|||��|St �||d�}|s�|||d�dkr�|dS|�
|�S|�d���}|�
d|���}|�|�|dS|�d���}|jdk �r||jk�r|�|||��|S|�|�|��|S) Nr$z</zunexpected call to parse_endtagrr%rDz</>r)rr=� endendtagr+r9�
endtagfindr0rr-rQrGr7rr(�
handle_endtagr )rr>rr0rHZ namematchZtagnamerrrr r2�s8
zHTMLParser.parse_endtagcCs|�||�|�|�dS)N)rXrf)rr\r[rrr rW�szHTMLParser.handle_startendtagcCsdS)Nr)rr\r[rrr rX�szHTMLParser.handle_starttagcCsdS)Nr)rr\rrr rf�szHTMLParser.handle_endtagcCsdS)Nr)rrBrrr r8�szHTMLParser.handle_charrefcCsdS)Nr)rrBrrr r;�szHTMLParser.handle_entityrefcCsdS)Nr)rrrrr r-�szHTMLParser.handle_datacCsdS)Nr)rrrrr rI�szHTMLParser.handle_commentcCsdS)Nr)rZdeclrrr rF�szHTMLParser.handle_declcCsdS)Nr)rrrrr rL�szHTMLParser.handle_picCsdS)Nr)rrrrr �unknown_decl�szHTMLParser.unknown_declcCstjdtdd�t|�S)NzZThe unescape method is deprecated and will be removed in 3.5, use html.unescape() instead.r$)�
stacklevel)�warnings�warn�DeprecationWarningr)r�srrr r�s
zHTMLParser.unescape)r)�__name__�
__module__�__qualname__�__doc__rYr
rrrrrrr rr5rGr4r1rPr2rWrXrfr8r;r-rIrFrLrgrrrrr r?s8 z
3"()rprrirZhtmlr�__all__rrr<r:r6r/rKZcommentcloserQrR�VERBOSErbrdrerrrrrr �<module>s(