A cleanup tool for HTML.
Removes unwanted tags and content. See the `Cleaner` class for details.

Regular expressions and XPath expressions defined by the module:
    @\s*import
    ^data:image/.+;base64
    (?:javascript|jscript|livescript|vbscript|data|about|mocha):
    [\s\x00-\x08\x0B\x0C\x0E-\x19]+
    \[if[\s\n\r]+.*?][\s\n\r]*>
    descendant-or-self::*[@style]
    descendant-or-self::a [normalize-space(@href) and substring(normalize-space(@href),1,1) != '#'] |descendant-or-self::x:a[normalize-space(@href) and substring(normalize-space(@href),1,1) != '#']

_is_javascript_scheme(s) (defined in /opt/alt/python37/lib64/python3.7/site-packages/lxml/html/):
Returns None if _is_image_dataurl(s) is true; otherwise returns _is_possibly_malicious_scheme(s).
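The scheme check performed by `_is_javascript_scheme` can be sketched with the standard `re` module alone. This is a minimal sketch, not the library's code: the patterns are the ones that appear in this module, but the `re.I` flag, the use of `.search`, and the whitespace-stripping step in the hypothetical helper `looks_like_script_url` are assumptions.

```python
import re

# Patterns as they appear in the module; the re.I flag is an assumption.
_is_image_dataurl = re.compile(r'^data:image/.+;base64', re.I).search
_is_possibly_malicious_scheme = re.compile(
    r'(?:javascript|jscript|livescript|vbscript|data|about|mocha):', re.I).search
_whitespace_re = re.compile(r'[\s\x00-\x08\x0B\x0C\x0E-\x19]+')


def is_javascript_scheme(s):
    # Base64 image data URLs are treated as harmless.
    if _is_image_dataurl(s):
        return None
    # Otherwise report a match object if a risky scheme occurs in the string.
    return _is_possibly_malicious_scheme(s)


def looks_like_script_url(url):
    # Hypothetical helper: strip whitespace and control characters first
    # (assumed usage of the whitespace pattern), then run the scheme check.
    return bool(is_javascript_scheme(_whitespace_re.sub('', url)))
```

Under these assumptions, `looks_like_script_url("java\tscript:alert(1)")` is true because the tab is stripped before the scheme is matched, while an inline `data:image/png;base64,...` URL is let through.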
Instances clean the document of each of the possible offending
elements. The cleaning is controlled by attributes; you can
override attributes in a subclass, or set them in the constructor.
Removes any ``<script>`` tags.
Removes any JavaScript, like an ``onclick`` attribute. Also removes stylesheets
as they could contain JavaScript.
Removes any comments.
Removes any style tags.
Removes any style attributes. Defaults to the value of the ``style`` option.
Removes any ``<link>`` tags.
Removes any ``<meta>`` tags.
Structural parts of a page: ``<head>``, ``<html>``, ``<title>``.
Removes any processing instructions.
Removes any embedded objects (flash, iframes).
Removes any frame-related tags.
Removes any form tags.
Tags that aren't *wrong*, but are annoying: ``<blink>`` and ``<marquee>``.
A list of tags to remove. Only the tags will be removed,
their content will get pulled up into the parent tag.
A list of tags to kill. Killing also removes the tag's content,
i.e. the whole subtree, not just the tag itself.
A list of tags to include (by default, all tags are included).
Remove any tags that aren't standard parts of HTML.
If true, only include 'safe' attributes (specifically the list
from the feedparser HTML sanitisation web site).
A set of attribute names to override the default list of attributes
considered 'safe' (when safe_attrs_only=True).
If true, then any ``<a>`` tags will have ``rel="nofollow"`` added to them.
A list or set of hosts that you can use for embedded content
(for content like ``<object>``, ``<link rel="stylesheet">``, etc).
You can also implement/override the method
``allow_embedded_url(el, url)`` or ``allow_element(el)`` to
implement more complex rules for what can be embedded.
Anything that passes this test will be shown, regardless of
the value of (for instance) ``embedded``.
Note that this parameter might not work as intended if you do not
make the links absolute before doing the cleaning.
Note that you may also need to set ``whitelist_tags``.
A set of tags that can be included with ``host_whitelist``.
The default is ``iframe`` and ``embed``; you may wish to
include other tags like ``script``, or you may want to
implement ``allow_embedded_url`` for more control. Set to None to
include all tags.
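The option descriptions above distinguish removing a tag (its content is pulled up into the parent) from killing a tag (the whole subtree goes away). The sketch below illustrates that difference with the standard library's `xml.etree.ElementTree` rather than lxml; `remove_tag`, `kill_tag`, and `_append_text` are hypothetical helper names, and in both cases the text that follows the element is assumed to be kept.

```python
import xml.etree.ElementTree as ET


def _append_text(parent, index, text):
    # Attach text so that it ends up just before the child at `index`.
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + text


def remove_tag(parent, el):
    # Drop only the tag: its text and children move up into the parent.
    index = list(parent).index(el)
    parent.remove(el)
    _append_text(parent, index, el.text)
    for child in list(el):
        parent.insert(index, child)
        index += 1
    _append_text(parent, index, el.tail)


def kill_tag(parent, el):
    # Drop the whole subtree; only the text following the element survives.
    index = list(parent).index(el)
    parent.remove(el)
    _append_text(parent, index, el.tail)
```

For the input `<div>a<b>bold <i>it</i> end</b> tail</div>`, removing `<b>` keeps `bold <i>it</i> end` inside the `<div>`, whereas killing `<b>` discards it.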
Cleaner.__call__:
This modifies the document *in place*.

Cleaner.allow_follow(anchor):
Override to suppress rel="nofollow" on some anchors.
The default implementation returns False.

Cleaner.allow_element(el):
Decide whether an element is configured to be accepted or rejected.
:param el: an element.
:return: true to accept the element or false to reject/discard it.
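The decision described here can be sketched as a plain function: look up which attribute or attributes of the tag carry a URL, then require every one of them to be present and acceptable. This is a minimal sketch under assumptions: the contents of `TAG_LINK_ATTRS` are hypothetical (the real table is `Cleaner._tag_link_attrs`, whose contents are not shown in this section), and `allow_url` stands in for `allow_embedded_url`.

```python
# Hypothetical mapping of tag name -> attribute(s) that hold a URL.
TAG_LINK_ATTRS = {"iframe": "src", "embed": "src", "applet": ["code", "object"]}


def allow_element(tag, attrib, allow_url):
    # Tags without a known link attribute are never accepted.
    if tag not in TAG_LINK_ATTRS:
        return False
    attr = TAG_LINK_ATTRS[tag]
    names = attr if isinstance(attr, (list, tuple)) else [attr]
    for name in names:
        url = attrib.get(name)
        # A missing URL or a rejected URL rejects the whole element.
        if not url or not allow_url(url):
            return False
    return True


def trusted(url):
    # Stand-in URL check used only for this example.
    return url.startswith("https://trusted.example/")
```

With this stand-in, `allow_element("iframe", {"src": "https://trusted.example/v"}, trusted)` accepts the element, while an `applet` with one untrusted attribute is rejected.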

Cleaner.allow_embedded_url(el, url):
Decide whether a URL that was found in an element's attributes or text
is configured to be accepted or rejected.
:param el: an element.
:param url: a URL found on the element.
:return: true to accept the URL and false to reject it.
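The URL test can be sketched with `urllib.parse.urlsplit`: check the tag against ``whitelist_tags``, require an ``http`` or ``https`` scheme, and compare the lowercased host (without port) against ``host_whitelist``. This is a minimal standalone sketch that takes the tag name and both settings as plain arguments instead of reading them from a `Cleaner` instance; that signature is an assumption made for illustration.

```python
from urllib.parse import urlsplit


def allow_embedded_url(tag, url, host_whitelist, whitelist_tags=("iframe", "embed")):
    # whitelist_tags=None means every tag may use the host whitelist.
    if whitelist_tags is not None and tag not in whitelist_tags:
        return False
    parts = urlsplit(url)
    # Compare hosts case-insensitively and ignore any port.
    host = parts.netloc.lower().split(":", 1)[0]
    if parts.scheme not in ("http", "https"):
        return False
    return host in host_whitelist
```

A relative URL such as `/embed/x` has no scheme and is rejected, which is why links need to be made absolute before cleaning for the whitelist to behave as intended.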

Cleaner.kill_conditional_comments:
IE conditional comments basically embed HTML that the parser
doesn't normally see. We can't allow anything like that, so
we'll kill any comments that could be conditional.
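A comment can be tested for the conditional-comment shape with the pattern this module defines, ``\[if[\s\n\r]+.*?][\s\n\r]*>``. The sketch below assumes the pattern is compiled with `re.I | re.S` and applied with `.search` to the comment text; `is_conditional_comment` is a hypothetical helper name.

```python
import re

# Pattern taken from the module; the flags are an assumption.
conditional_comment_re = re.compile(r'\[if[\s\n\r]+.*?][\s\n\r]*>', re.I | re.S)


def is_conditional_comment(text):
    # `text` is the comment body, i.e. what sits between <!-- and -->.
    return conditional_comment_re.search(text) is not None
```

For example, the body of `<!--[if IE]><script>alert(1)</script><![endif]-->` matches, while an ordinary comment such as ` just a note ` does not.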
Comments matching `_conditional_comment_re` are removed through `_kill_elements`.