text¶
Classes and functions for text markup manipulation.
MarkupText ¶
Text with optional markup.
Warning
This class should not be instantiated directly. Use its class method constructors instead.
Example
title = MarkupText.from_("A Structured Review of the Validity of BLEU")
title = MarkupText.from_("TTCS$^{\mathcal{E}}$: a Vectorial Resource for Computing Conceptual Similarity")
Note
This class implements a limited subset of string methods to make common operations more convenient, for example:
title = MarkupText.from_string("A Structured Review of the Validity of BLEU")
title == "A Structured Review of the Validity of BLEU" # True
"BLEU" in title # True
title.startswith("A ") # True
These operate on the stringified XML representation of the class.
contains_markup
property
¶
contains_markup
True if this text contains markup; False if it is a plain string.
as_html ¶
as_html(allow_url=True)
as_latex ¶
as_latex()
Returns:
| Type | Description |
|---|---|
str
|
Text with markup transformed into LaTeX commands. |
as_text ¶
as_text()
Returns:
| Type | Description |
|---|---|
str
|
The plain text with any markup stripped. The only transformation that will be performed is replacing TeX-math expressions with their corresponding Unicode representation, if possible. |
endswith ¶
endswith(suffix, start=None, end=None)
Return True if the string ends with the specified suffix, False otherwise.
Equivalent to self.as_xml().endswith(...).
from_
classmethod
¶
from_(content)
Instantiate MarkupText from an XML element or a string, heuristically parsing any supported markup.
- If called with an XML element, assumes it uses the Anthology's markup format and calls
from_xml(). - If called with a string, assumes the string might contain markup and will try to intelligently parse it. At the moment, only LaTeX markup is supported, which means that the effect of this is identical to calling
from_latex_maybe(), but this may change if we support different types of markup in the future.
Note
If you want more fine-grained control over markup detection, call one of the more specific builder functions instead.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
content
|
_Element | str
|
A string potentially containing markup, or an XML element containing valid MarkupText according to the schema. |
required |
Returns:
| Type | Description |
|---|---|
MarkupText
|
Instantiated MarkupText object corresponding to the content. |
from_latex
classmethod
¶
from_latex(text, clean=True)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
text
|
str
|
A text string potentially containing LaTeX markup. |
required |
clean
|
bool
|
If True, applies the Anthology's Unicode normalization. |
True
|
Returns:
| Type | Description |
|---|---|
MarkupText
|
Instantiated MarkupText object corresponding to the string. |
from_latex_maybe
classmethod
¶
from_latex_maybe(text, clean=True)
Like from_latex(), but can be used if it is unclear if the string is plain text or LaTeX. Will prevent percentage signs being interpreted as LaTeX comments, and apply a heuristic to decide if a tilde is literal or a non-breaking space.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
text
|
str
|
A text string potentially in plain text or LaTeX format. |
required |
clean
|
bool
|
If True, applies the Anthology's Unicode normalization. |
True
|
Returns:
| Type | Description |
|---|---|
MarkupText
|
Instantiated MarkupText object corresponding to the string. |
from_string
classmethod
¶
from_string(text, clean=True)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
text
|
str
|
A simple text string without any markup. |
required |
clean
|
bool
|
If True, applies the Anthology's Unicode normalization. |
True
|
Returns:
| Type | Description |
|---|---|
MarkupText
|
Instantiated MarkupText object corresponding to the string. |
from_xml
classmethod
¶
from_xml(element)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
element
|
_Element
|
An XML element containing valid MarkupText according to the schema. |
required |
Returns:
| Type | Description |
|---|---|
MarkupText
|
Instantiated MarkupText object corresponding to the element. |
startswith ¶
startswith(prefix, start=None, end=None)
Return True if the string starts with the specified prefix, False otherwise.
Equivalent to self.as_xml().startswith(...).