Module containing the ``UniversalDetector`` class, which is the primary
class a user of ``chardet`` should use.
:author: Mark Pilgrim (initial port to Python)
:author: Shy Shalom (original C code)
:author: Dan Blanchard (major refactoring for 3.0)
:author: Ian Cordasco
The ``UniversalDetector`` class underlies the ``chardet.detect`` function
and coordinates all of the different charset probers.

To get a ``dict`` containing the detected encoding, its confidence, and
the detected language, run:

.. code::

    from chardet.universaldetector import UniversalDetector

    some_bytes = "naïve café".encode("utf-8")  # the document to analyze
    u = UniversalDetector()
    u.feed(some_bytes)
    u.close()
    detected = u.result
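The compiled constants in this module include eight pairs that link ISO-8859 labels to Windows code pages (``iso-8859-1`` with ``Windows-1252``, and so on). One plausible reading is that a detected ISO-8859 label is swapped for its Windows counterpart when the input contains bytes 0x80-0x9F. ISO-8859 assigns C1 control codes to that range, while the Windows code pages use most of it for printable characters such as the curly quotes of Windows-1252. The following is a minimal sketch under that assumption. ``WIN_BYTES``, ``ISO_TO_WINDOWS`` and ``upgrade_iso_label`` are hypothetical names.

.. code:: python

    import re

    # Bytes 0x80-0x9F: C1 controls in ISO-8859-x, mostly printable in Windows-125x.
    WIN_BYTES = re.compile(b"[\x80-\x9f]")

    # The eight pairs that appear among this module's compiled constants.
    ISO_TO_WINDOWS = {
        "iso-8859-1": "Windows-1252",
        "iso-8859-2": "Windows-1250",
        "iso-8859-5": "Windows-1251",
        "iso-8859-6": "Windows-1256",
        "iso-8859-7": "Windows-1253",
        "iso-8859-8": "Windows-1255",
        "iso-8859-9": "Windows-1254",
        "iso-8859-13": "Windows-1257",
    }


    def upgrade_iso_label(charset_name, seen_win_bytes):
        """Return the Windows counterpart of an ISO-8859 label when
        0x80-0x9F bytes were seen; otherwise return the label unchanged."""
        lower = charset_name.lower()
        if seen_win_bytes and lower.startswith("iso-8859"):
            return ISO_TO_WINDOWS.get(lower, charset_name)
        return charset_name

For example, ``upgrade_iso_label("ISO-8859-1", WIN_BYTES.search(b"\x93hi\x94") is not None)`` returns ``"Windows-1252"``. Labels without a pair, such as ``ISO-8859-3``, pass through unchanged.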
�| _d | _| �
� d S )N)�_esc_charset_prober�_charset_probers�result�done� _got_data�_input_state�
_last_char�lang_filter�logging� getLogger�__name__�logger�_has_win_bytes�reset)�selfr � r �J/opt/alt/python37/lib/python3.7/site-packages/chardet/universaldetector.py�__init__Q s zUniversalDetector.__init__c C sZ dddd�| _ d| _d| _d| _tj| _d| _| jr>| j� � x| j
D ]}|� � qFW dS )z�
Reset the ``UniversalDetector`` and all of its probers to their initial
states. ``__init__`` calls this method, so you only need to call it
directly between analyses of different documents.
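Reuse requires ``reset`` because a detector that has already reached a verdict ignores further input: ``feed`` returns immediately once ``done`` is ``True``. The hypothetical ``ToyDetector`` below is not ``chardet``'s implementation. It mimics only two behaviors: ``__init__`` delegating to ``reset``, and input being ignored after ``done``. It recognizes nothing but the UTF-8 byte order mark.

.. code:: python

    import codecs


    class ToyDetector:
        """Hypothetical stand-in that mimics only reset/done handling."""

        def __init__(self):
            # Construction delegates to reset(), so both paths share one state.
            self.reset()

        def reset(self):
            self.result = {"encoding": None, "confidence": 0.0, "language": None}
            self.done = False

        def feed(self, chunk):
            if self.done:
                return  # a verdict was already reached; ignore new input
            if chunk.startswith(codecs.BOM_UTF8):
                self.result = {"encoding": "UTF-8-SIG", "confidence": 1.0,
                               "language": ""}
                self.done = True

If a second document is fed to the same ``ToyDetector``, ``result`` stays at ``UTF-8-SIG`` until ``reset`` is called.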
Takes a chunk of a document and feeds it through all of the relevant
charset probers.

After calling ``feed``, check the ``done`` attribute to see whether the
``UniversalDetector`` needs more data or has already made a prediction
(stored in the ``result`` attribute).

.. note::

    Always call ``close`` when you are done feeding in your document if
    ``done`` is not already ``True``.
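The constants compiled into ``feed`` point to two early steps. The sketch below is based on that reading. On the first chunk, a byte order mark settles the answer outright with confidence 1.0. Otherwise, each chunk can move an input state from pure ASCII to one of two states: high-byte (any byte 0x80-0xFF) or escaped ASCII (an ESC byte, as used by ISO-2022 encodings, or the HZ shift sequence ``~{``). That state decides which probers run. ``sniff_bom``, ``next_input_state``, ``BOM_TABLE`` and the string states are hypothetical stand-ins for the module's own code and ``InputState`` values.

.. code:: python

    import codecs
    import re

    HIGH_BYTE = re.compile(b"[\x80-\xff]")
    ESC_SEQ = re.compile(b"(\x1b|~{)")  # ESC byte or HZ "~{" shift

    # Order matters: the UTF-32 LE BOM begins with the UTF-16 LE BOM, and
    # b"\xfe\xff\x00\x00" begins with the UTF-16 BE BOM.
    BOM_TABLE = [
        ((codecs.BOM_UTF8,), "UTF-8-SIG"),
        ((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE), "UTF-32"),
        ((b"\xfe\xff\x00\x00",), "X-ISO-10646-UCS-4-3412"),
        ((b"\x00\x00\xff\xfe",), "X-ISO-10646-UCS-4-2143"),
        ((codecs.BOM_LE, codecs.BOM_BE), "UTF-16"),
    ]


    def sniff_bom(first_chunk):
        """Return a result dict if the chunk starts with a known BOM, else None."""
        for prefixes, name in BOM_TABLE:
            if first_chunk.startswith(prefixes):
                return {"encoding": name, "confidence": 1.0, "language": ""}
        return None


    def next_input_state(state, chunk, last_char):
        """Advance from "PURE_ASCII"; any other state is kept as-is."""
        if state == "PURE_ASCII":
            if HIGH_BYTE.search(chunk):
                return "HIGH_BYTE"
            # Prepend the previous chunk's last byte so a "~{" split
            # across two chunks is still caught.
            if ESC_SEQ.search(last_char + chunk):
                return "ESC_ASCII"
        return state

A caller would keep ``last_char = chunk[-1:]`` after each call. Once the state leaves ``PURE_ASCII``, later chunks cannot change it.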