3
�]9Y��@s0ddlZddlZddlmZGdd�de�ZdS)�N�)�ProbingStatec@sneZdZdZddd�Zdd�Zedd��Zd d
�Zedd��Z d
d�Z
edd��Zedd��Z
edd��ZdS)�
CharSetProbergffffff�?NcCsd|_||_tjt�|_dS)N)�_state�lang_filter�loggingZ getLogger�__name__Zlogger)�selfr�r
�#/usr/lib/python3.6/charsetprober.py�__init__'szCharSetProber.__init__cCstj|_dS)N)rZ DETECTINGr)r r
r
r�reset,szCharSetProber.resetcCsdS)Nr
)r r
r
r�charset_name/szCharSetProber.charset_namecCsdS)Nr
)r �bufr
r
r�feed3szCharSetProber.feedcCs|jS)N)r)r r
r
r�state6szCharSetProber.statecCsdS)Ngr
)r r
r
r�get_confidence:szCharSetProber.get_confidencecCstjdd|�}|S)Ns([-])+� )�re�sub)rr
r
r�filter_high_byte_only=sz#CharSetProber.filter_high_byte_onlycCsbt�}tjd|�}xJ|D]B}|j|dd��|dd�}|j�rP|dkrPd}|j|�qW|S)u9
We define three types of bytes:
alphabet: english alphabets [a-zA-Z]
international: international characters [\x80-\xFF]
marker: everything else [^a-zA-Z\x80-\xFF]
The input buffer can be thought to contain a series of words delimited
by markers. This function works to filter all words that contain at
least one international character. All contiguous sequences of markers
are replaced by a single space ascii character.
This filter applies to all scripts which do not use English characters.
s%[a-zA-Z]*[�-�]+[a-zA-Z]*[^a-zA-Z�-�]?Nr��r���r)� bytearrayr�findall�extend�isalpha)r�filteredZwordsZwordZ last_charr
r
r�filter_international_wordsBs
z(CharSetProber.filter_international_wordscCs�t�}d}d}x�tt|��D]r}|||d�}|dkr>d}n|dkrJd}|dkr|j�r||kr�|r�|j|||��|jd�|d}qW|s�|j||d ��|S)
a�
Returns a copy of ``buf`` that retains only the sequences of English
alphabet and high byte characters that are not between <> characters.
Also retains English alphabet and high byte characters immediately
before occurrences of >.
This filter can be applied to all scripts which contain both English
characters and extended ASCII characters, but is currently only used by
``Latin1Prober``.
Frr�>�<TrrN)r�range�lenrr)rrZin_tag�prevZcurrZbuf_charr
r
r�filter_with_english_lettersgs"
z)CharSetProber.filter_with_english_letters)N)r�
__module__�__qualname__ZSHORTCUT_THRESHOLDrr
�propertyrrrr�staticmethodrrr$r
r
r
rr#s
%r)rrZenumsr�objectrr
r
r
r�<module>s
?>