
utils.xml

Warning

These are internal functions that you probably don't want to interact with directly.

Functions for XML serialization.

TAGS_WITHOUT_LINEBREAKS module-attribute

TAGS_WITHOUT_LINEBREAKS = {
    "author",
    "editor",
    "speaker",
    "title",
    "booktitle",
    "variant",
}

XML tags that should always be serialized without line breaks.

TAGS_WITH_MARKUP module-attribute

TAGS_WITH_MARKUP = {
    "b",
    "i",
    "fixed-case",
    "title",
    "abstract",
    "booktitle",
    "shortbooktitle",
}

XML tags which contain MarkupText.

TAGS_WITH_ORDER_SEMANTICS module-attribute

TAGS_WITH_ORDER_SEMANTICS = {
    "author",
    "editor",
    "speaker",
    "erratum",
    "revision",
    "talk",
    "venue",
    "attachment",
    "award",
    "video",
}

XML tags that may appear multiple times per parent tag, and whose relative order matters even if their parent tag belongs to TAGS_WITH_UNORDERED_CHILDREN.

TAGS_WITH_UNORDERED_CHILDREN module-attribute

TAGS_WITH_UNORDERED_CHILDREN = {
    "talk",
    "paper",
    "meta",
    "frontmatter",
    "event",
    "colocated",
    "author",
    "editor",
    "speaker",
    "variant",
}

XML tags that may contain child elements that can logically appear in arbitrary order.

append_text

append_text(elem, text)

Append text to an XML element.

If the XML element has children, the text will be appended to the tail of the last child; otherwise, it will be appended to its text attribute.

Parameters:

elem (_Element, required): The XML element.
text (str, required): The text string to append to the XML element.

Returns:

None: the XML element is modified in-place.
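A minimal sketch of the text/tail logic described above. The documented `_Element` type suggests lxml. This sketch uses the standard library's `xml.etree.ElementTree`, which has the same `text`/`tail` model, so it is an illustration rather than the module's actual code:

```python
import xml.etree.ElementTree as ET


def append_text(elem: ET.Element, text: str) -> None:
    """Sketch: append text to elem in place."""
    if len(elem):
        # Element has children: text after the last child lives in that child's tail.
        last = elem[-1]
        last.tail = (last.tail or "") + text
    else:
        # No children: the text belongs in the element's own text attribute.
        elem.text = (elem.text or "") + text


title = ET.fromstring("<title>A <b>bold</b> move</title>")
append_text(title, "!")
print(ET.tostring(title, encoding="unicode"))  # <title>A <b>bold</b> move!</title>
```

Using `len(elem)` rather than the truth value of the element is deliberate, because an element's truthiness reflects its children and is deprecated as a test.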

assert_equals

assert_equals(elem, other)

Assert that two Anthology XML elements are logically equivalent.

Parameters:

elem (_Element, required): The first element to compare.
other (_Element, required): The second element to compare.

Raises:

AssertionError: If the two elements are not logically equivalent.
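One way to read "logically equivalent" in terms of the module attributes above is sketched below. This is an assumption, not the module's actual comparison. Children of tags in TAGS_WITH_UNORDERED_CHILDREN are put into a canonical order. Children whose tag is in TAGS_WITH_ORDER_SEMANTICS keep their order relative to same-tag siblings. Attribute order and whitespace-only text are ignored. The helpers `_norm` and `_canonical` are hypothetical, and the standard library's ElementTree stands in for lxml:

```python
import xml.etree.ElementTree as ET

# Values copied from the module attributes documented above.
TAGS_WITH_ORDER_SEMANTICS = {"author", "editor", "speaker", "erratum", "revision",
                             "talk", "venue", "attachment", "award", "video"}
TAGS_WITH_UNORDERED_CHILDREN = {"talk", "paper", "meta", "frontmatter", "event",
                                "colocated", "author", "editor", "speaker", "variant"}


def _norm(s):
    # Hypothetical: whitespace-only text is treated as layout and ignored.
    return s if s and s.strip() else ""


def _canonical(elem):
    """Hypothetical helper: an order-normalised, comparable form of elem."""
    children = [_canonical(child) for child in elem]
    if elem.tag in TAGS_WITH_UNORDERED_CHILDREN:
        # Children without order semantics may appear in any order, so sort them.
        free = sorted(c for c in children if c[0] not in TAGS_WITH_ORDER_SEMANTICS)
        # A stable sort by tag groups e.g. all <author>s while keeping their order.
        ordered = sorted((c for c in children if c[0] in TAGS_WITH_ORDER_SEMANTICS),
                         key=lambda c: c[0])
        children = free + ordered
    return (elem.tag, tuple(sorted(elem.attrib.items())), _norm(elem.text),
            tuple(children), _norm(elem.tail))


def assert_equals(elem, other):
    # The last slot is elem's own tail, which belongs to its parent, so drop it.
    if _canonical(elem)[:4] != _canonical(other)[:4]:
        raise AssertionError(f"<{elem.tag}> and <{other.tag}> are not logically equivalent")
```

Raising `AssertionError` explicitly, instead of using an `assert` statement, keeps the check active under `python -O`.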

ensure_minimal_diff

ensure_minimal_diff(elem, reference)

Change a node to minimize the diff compared to a reference node, without changing logical equivalence.

This will change the order of nodes and attributes to match the order in the reference whenever this makes no functional difference. Elements that are logically equivalent to those in the reference will be copied exactly.

Parameters:

elem (_Element, required): The XML element whose children should be matched.
reference (_Element, required): The XML element that serves as a reference.

Raises:

ValueError: If elem and reference do not have identical tags.
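The attribute half of this behaviour can be sketched as follows. Attribute order never affects logical equivalence in XML, so it can always be matched to the reference. The helper name `reorder_attributes_like` is hypothetical. The sketch covers neither the child reordering nor the exact copying of equivalent elements that the real function also performs. It uses the standard library's ElementTree, which serialises attributes in insertion order on Python 3.8+:

```python
import xml.etree.ElementTree as ET


def reorder_attributes_like(elem: ET.Element, reference: ET.Element) -> None:
    """Hypothetical helper: reorder elem's attributes to follow reference's order."""
    if elem.tag != reference.tag:
        raise ValueError(f"tag mismatch: {elem.tag!r} != {reference.tag!r}")
    remaining = dict(elem.attrib)
    elem.attrib.clear()
    # Attributes shared with the reference come first, in the reference's order.
    for key in reference.attrib:
        if key in remaining:
            elem.set(key, remaining.pop(key))
    # Attributes the reference lacks keep their original relative order at the end.
    for key, value in remaining.items():
        elem.set(key, value)


elem = ET.fromstring('<meta b="2" c="3" a="1"/>')
reorder_attributes_like(elem, ET.fromstring('<meta a="x" b="y"/>'))
print(list(elem.attrib))  # ['a', 'b', 'c']
```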

indent

indent(elem, level=0, internal=False)

Enforce canonical indentation.

"Canonical indentation" is two spaces, with each tag on a new line, except that 'author', 'editor', 'title', and 'booktitle' tags are placed on a single line.

Parameters:

elem (_Element, required): The XML element to apply canonical indentation to.
level (int, default 0): Indentation level; used for recursive calls of this function.
internal (bool, default False): If True, assume we are within a single-line element.

Note

Adapted from https://stackoverflow.com/a/33956544.
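A rough sketch of this kind of recursive indentation appears below, under stated assumptions. It treats every tag in TAGS_WITHOUT_LINEBREAKS as single-line, which is slightly broader than the four tags named above. Inside such an element, only whitespace that contains a newline is dropped, so a space between two inline tags survives. The helper `_is_layout` is hypothetical, and the standard library's ElementTree is used instead of lxml:

```python
import xml.etree.ElementTree as ET

# Value copied from TAGS_WITHOUT_LINEBREAKS above.
TAGS_WITHOUT_LINEBREAKS = {"author", "editor", "speaker", "title", "booktitle", "variant"}


def _is_layout(s):
    # Hypothetical: whitespace containing a newline is indentation, not content.
    return s is not None and not s.strip() and "\n" in s


def indent(elem, level=0, internal=False):
    """Sketch of canonical two-space indentation (modifies elem in place)."""
    pad = "\n" + "  " * level
    if internal or elem.tag in TAGS_WITHOUT_LINEBREAKS:
        # Single-line element: remove layout whitespace inside it, recursively.
        if _is_layout(elem.text):
            elem.text = None
        for child in elem:
            indent(child, level + 1, internal=True)
            if _is_layout(child.tail):
                child.tail = None
        return
    if len(elem) == 0:
        return  # leaf element: its text is content, leave it alone
    if not elem.text or not elem.text.strip():
        elem.text = pad + "  "  # first child goes on a new, deeper line
    for child in elem:
        indent(child, level + 1)
        if not child.tail or not child.tail.strip():
            child.tail = pad + "  "  # each following sibling on its own line
    last = elem[-1]
    if not last.tail.strip():
        last.tail = pad  # closing tag returns to this element's indentation


root = ET.fromstring('<paper id="1"><title>A <b>bold</b> move</title>'
                     "<author>\n<first>Ada</first>\n<last>Lovelace</last>\n</author></paper>")
indent(root)
print(ET.tostring(root, encoding="unicode"))
```

Python 3.9+ also ships `xml.etree.ElementTree.indent`, but it indents every element uniformly and has no notion of single-line tags, which is why a custom recursive function is needed here.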

stringify_children

stringify_children(node)

Parameters:

node (_Element, required): An XML element.

Returns:

str: The full content of the input node, including tags.

Used for nodes that can have mixed text and HTML elements (like <b> and <i>).
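A sketch of how mixed content can be serialised: take the node's own text, then each child's markup followed by that child's tail. The assumption here is that the node's own tag is excluded, as the function name suggests. Each child is serialised from a shallow copy with its tail cleared, so the tail is escaped and appended exactly once. It uses the standard library's ElementTree rather than lxml:

```python
import copy
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def stringify_children(node: ET.Element) -> str:
    """Sketch: serialise node's text and children (with their tags) as a string."""
    parts = [escape(node.text or "")]
    for child in node:
        detached = copy.copy(child)  # shallow copy, so the original tail is untouched
        detached.tail = None         # the tail is serialised separately below
        parts.append(ET.tostring(detached, encoding="unicode"))
        parts.append(escape(child.tail or ""))
    return "".join(parts)


title = ET.fromstring("<title>A <b>bold</b> &amp; <i>it</i>al</title>")
print(stringify_children(title))  # A <b>bold</b> &amp; <i>it</i>al
```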

xml_escape_or_none

xml_escape_or_none(t)

Like xml.sax.saxutils.escape, but accepts None.
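A one-line sketch, assuming that None is passed through unchanged rather than converted to an empty string. The text above only says None is accepted, so the return value for None is an assumption:

```python
from typing import Optional
from xml.sax.saxutils import escape


def xml_escape_or_none(t: Optional[str]) -> Optional[str]:
    # Assumption: None is returned as-is; strings get &, < and > escaped.
    return None if t is None else escape(t)


print(xml_escape_or_none("a < b & c"))  # a &lt; b &amp; c
```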

xsd_boolean

xsd_boolean(value)

Converts an xsd:boolean value to a bool.
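XML Schema defines the lexical space of `xsd:boolean` as exactly `true`, `false`, `1` and `0`, with whitespace collapsed, so the mapping is small. The sketch below follows that definition. Raising `ValueError` for any other input is an assumption about how invalid values are handled:

```python
def xsd_boolean(value: str) -> bool:
    """Sketch: convert an xsd:boolean lexical value to a Python bool."""
    # xsd:boolean has whitespace="collapse", so surrounding whitespace is ignored.
    normalized = value.strip()
    if normalized in ("true", "1"):
        return True
    if normalized in ("false", "0"):
        return False
    # Assumption: anything else (including "True" or "yes") is rejected.
    raise ValueError(f"not a valid xsd:boolean: {value!r}")
```

Note that the comparison is case-sensitive, as in the schema specification, so `"True"` is not a valid value.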