|
:mod:`email` Package Architecture |
|
================================= |
|
|
|
Overview |
|
|
|
|
|
The email package consists of three major components: |
|
|
|
Model |
|
An object structure that represents an email message, and provides an |
|
API for creating, querying, and modifying a message. |
|
|
|
Parser |
|
Takes a sequence of characters or bytes and produces a model of the |
|
email message represented by those characters or bytes. |
|
|
|
Generator |
|
Takes a model and turns it into a sequence of characters or bytes. The |
|
sequence can either be intended for human consumption (a printable |
|
unicode string) or bytes suitable for transmission over the wire. In |
|
the latter case all data is properly encoded using the content transfer |
|
encodings specified by the relevant RFCs. |
|
|
|
Conceptually the package is organized around the model. The model provides both |
|
"external" APIs intended for use by application programs using the library, |
|
and "internal" APIs intended for use by the Parser and Generator components. |
|
This division is intentionally a bit fuzzy; the API described by this |
|
documentation is all a public, stable API. This allows for an application |
|
with special needs to implement its own parser and/or generator. |
|
|
|
In addition to the three major functional components, there is a third key |
|
component to the architecture: |
|
|
|
Policy |
|
An object that specifies various behavioral settings and carries |
|
implementations of various behavior-controlling methods. |
|
|
|
The Policy framework provides a simple and convenient way to control the |
|
behavior of the library, making it possible for the library to be used in a |
|
very flexible fashion while leveraging the common code required to parse, |
|
represent, and generate message-like objects. For example, in addition to the |
|
default :rfc:`5322` email message policy, we also have a policy that manages |
|
HTTP headers in a fashion compliant with :rfc:`2616`. Individual policy |
|
controls, such as the maximum line length produced by the generator, can also |
|
be controlled individually to meet specialized application requirements. |
|
|
|
|
|
The Model |
|
|
|
|
|
The message model is implemented by the :class:`~email.message.Message` class. |
|
The model divides a message into the two fundamental parts discussed by the |
|
RFC: the header section and the body. The `Message` object acts as a |
|
pseudo-dictionary of named headers. Its dictionary interface provides |
|
convenient access to individual headers by name. However, all headers are kept |
|
internally in an ordered list, so that the information about the order of the |
|
headers in the original message is preserved. |
|
|
|
The `Message` object also has a `payload` that holds the body. A `payload` can |
|
be one of two things: data, or a list of `Message` objects. The latter is used |
|
to represent a multipart MIME message. Lists can be nested arbitrarily deeply |
|
in order to represent the message, with all terminal leaves having non-list |
|
data payloads. |
|
|
|
|
|
Message Lifecycle |
|
|
|
|
|
The general lifecycle of a message is: |
|
|
|
Creation |
|
A `Message` object can be created by a Parser, or it can be |
|
instantiated as an empty message by an application. |
|
|
|
Manipulation |
|
The application may examine one or more headers, and/or the |
|
payload, and it may modify one or more headers and/or |
|
the payload. This may be done on the top level `Message` |
|
object, or on any sub-object. |
|
|
|
Finalization |
|
The Model is converted into a unicode or binary stream, |
|
or the model is discarded. |
|
|
|
|
|
|
|
Header Policy Control During Lifecycle |
|
|
|
|
|
One of the major controls exerted by the Policy is the management of headers |
|
during the `Message` lifecycle. Most applications don't need to be aware of |
|
this. |
|
|
|
A header enters the model in one of two ways: via a Parser, or by being set to |
|
a specific value by an application program after the Model already exists. |
|
Similarly, a header exits the model in one of two ways: by being serialized by |
|
a Generator, or by being retrieved from a Model by an application program. The |
|
Policy object provides hooks for all four of these pathways. |
|
|
|
The model storage for headers is a list of (name, value) tuples. |
|
|
|
The Parser identifies headers during parsing, and passes them to the |
|
:meth:`~email.policy.Policy.header_source_parse` method of the Policy. The |
|
result of that method is the (name, value) tuple to be stored in the model. |
|
|
|
When an application program supplies a header value (for example, through the |
|
`Message` object `__setitem__` interface), the name and the value are passed to |
|
the :meth:`~email.policy.Policy.header_store_parse` method of the Policy, which |
|
returns the (name, value) tuple to be stored in the model. |
|
|
|
When an application program retrieves a header (through any of the dict or list |
|
interfaces of `Message`), the name and value are passed to the |
|
:meth:`~email.policy.Policy.header_fetch_parse` method of the Policy to |
|
obtain the value returned to the application. |
|
|
|
When a Generator requests a header during serialization, the name and value are |
|
passed to the :meth:`~email.policy.Policy.fold` method of the Policy, which |
|
returns a string containing line breaks in the appropriate places. The |
|
:meth:`~email.policy.Policy.cte_type` Policy control determines whether or |
|
not Content Transfer Encoding is performed on the data in the header. There is |
|
also a :meth:`~email.policy.Policy.binary_fold` method for use by generators |
|
that produce binary output, which returns the folded header as binary data, |
|
possibly folded at different places than the corresponding string would be. |
|
|
|
|
|
Handling Binary Data |
|
|
|
|
|
In an ideal world all message data would conform to the RFCs, meaning that the |
|
parser could decode the message into the idealized unicode message that the |
|
sender originally wrote. In the real world, the email package must also be |
|
able to deal with badly formatted messages, including messages containing |
|
non-ASCII characters that either have no indicated character set or are not |
|
valid characters in the indicated character set. |
|
|
|
Since email messages are *primarily* text data, and operations on message data |
|
are primarily text operations (except for binary payloads of course), the model |
|
stores all text data as unicode strings. Un-decodable binary inside text |
|
data is handled by using the `surrogateescape` error handler of the ASCII |
|
codec. As with the binary filenames the error handler was introduced to |
|
handle, this allows the email package to "carry" the binary data received |
|
during parsing along until the output stage, at which time it is regenerated |
|
in its original form. |
|
|
|
This carried binary data is almost entirely an implementation detail. The one |
|
place where it is visible in the API is in the "internal" API. A Parser must |
|
do the `surrogateescape` encoding of binary input data, and pass that data to |
|
the appropriate Policy method. The "internal" interface used by the Generator |
|
to access header values preserves the `surrogateescaped` bytes. All other |
|
interfaces convert the binary data either back into bytes or into a safe form |
|
(losing information in some cases). |
|
|
|
|
|
Backward Compatibility |
|
|
|
|
|
The :class:`~email.policy.Policy.Compat32` Policy provides backward |
|
compatibility with version 5.1 of the email package. It does this via the |
|
following implementation of the four+1 Policy methods described above: |
|
|
|
header_source_parse |
|
Splits the first line on the colon to obtain the name, discards any spaces |
|
after the colon, and joins the remainder of the line with all of the |
|
remaining lines, preserving the linesep characters to obtain the value. |
|
Trailing carriage return and/or linefeed characters are stripped from the |
|
resulting value string. |
|
|
|
header_store_parse |
|
Returns the name and value exactly as received from the application. |
|
|
|
header_fetch_parse |
|
If the value contains any `surrogateescaped` binary data, return the value |
|
as a :class:`~email.header.Header` object, using the character set |
|
`unknown-8bit`. Otherwise just returns the value. |
|
|
|
fold |
|
Uses :class:`~email.header.Header`'s folding to fold headers in the |
|
same way the email5.1 generator did. |
|
|
|
binary_fold |
|
Same as fold, but encodes to 'ascii'. |
|
|
|
|
|
New Algorithm |
|
|
|
|
|
header_source_parse |
|
Same as legacy behavior. |
|
|
|
header_store_parse |
|
Same as legacy behavior. |
|
|
|
header_fetch_parse |
|
If the value is already a header object, returns it. Otherwise, parses the |
|
value using the new parser, and returns the resulting object as the value. |
|
`surrogateescaped` bytes get turned into unicode unknown character code |
|
points. |
|
|
|
fold |
|
Uses the new header folding algorithm, respecting the policy settings. |
|
surrogateescaped bytes are encoded using the ``unknown-8bit`` charset for |
|
``cte_type=7bit`` or ``8bit``. Returns a string. |
|
|
|
At some point there will also be a ``cte_type=unicode``, and for that |
|
policy fold will serialize the idealized unicode message with RFC-like |
|
folding, converting any surrogateescaped bytes into the unicode |
|
unknown character glyph. |
|
|
|
binary_fold |
|
Uses the new header folding algorithm, respecting the policy settings. |
|
surrogateescaped bytes are encoded using the `unknown-8bit` charset for |
|
``cte_type=7bit``, and get turned back into bytes for ``cte_type=8bit``. |
|
Returns bytes. |
|
|
|
At some point there will also be a ``cte_type=unicode``, and for that |
|
policy binary_fold will serialize the message according to :rfc:``5335``. |
|
|