collections.collection

Collection

Bases: SlottedDict[Volume]

A collection of volumes and events, corresponding to an XML file in the data/xml/ directory of the Anthology repo.

Provides dictionary-like functionality mapping volume IDs to Volume objects in the collection.

Info

To create a new collection, use CollectionIndex.create().

Required Attributes:

Name Type Description
id str

The ID of this collection (e.g. "L06" or "2022.emnlp").

parent CollectionIndex

The parent CollectionIndex instance to which this collection belongs.

path Path

The path of the XML file representing this collection.

Non-Init Attributes:

Name Type Description
event Optional[Event]

An event represented by this collection.

is_data_loaded bool

A flag indicating whether the XML file has already been loaded.

is_modified bool

A flag indicating whether any of the data in this collection has been modified after loading.

raised_error_on_load bool

A flag indicating whether loading the XML file raised an error. If True, subsequent calls to load() do nothing. This prevents the same exception from being raised repeatedly when other functions access this collection (and thereby trigger a load) multiple times.
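The interplay between is_data_loaded and raised_error_on_load can be illustrated with a small stand-alone sketch. The LazyResource class, its loader argument, and the load_attempts counter are hypothetical names invented for this illustration; this is not the library's implementation, only one way the described behaviour could look.

```python
from typing import Any, Callable, Optional


class LazyResource:
    """Hypothetical stand-in for an object that loads its data on demand."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader
        self.data: Optional[Any] = None
        self.is_data_loaded = False
        self.raised_error_on_load = False
        self.load_attempts = 0  # only here to make the behaviour observable

    def load(self) -> None:
        # Skip if the data is already there, or if a previous attempt failed.
        if self.is_data_loaded or self.raised_error_on_load:
            return
        self.load_attempts += 1
        try:
            self.data = self._loader()
        except Exception:
            # Remember the failure so later calls do not retry, then re-raise.
            self.raised_error_on_load = True
            raise
        self.is_data_loaded = True


def broken_loader() -> Any:
    raise ValueError("malformed XML")


resource = LazyResource(broken_loader)
try:
    resource.load()  # first call: the error propagates
except ValueError:
    pass
resource.load()  # second call: silently does nothing
```

In this sketch the exception surfaces exactly once; any later access that triggers load() returns immediately instead of re-raising.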

root property

root

The Anthology instance to which this object belongs.

create_event

create_event(id=None, **kwargs)

Create a new (explicit) Event object in this collection.

Parameters:

Name Type Description Default
id Optional[str]

The ID of the event; must follow RE_EVENT_ID. If None (default) and this collection has a new-style ID, an event ID is generated from the collection ID (e.g., collection "2022.emnlp" yields event "emnlp-2022").

None
**kwargs Any

Any valid list or optional attribute of Event.

{}

Returns:

Type Description
Event

The created Event object.

Raises:

Type Description
ValueError

If an explicitly defined event already exists in this collection, or if id is None and this collection has an old-style ID.

Danger

If the event index is loaded and an event with the given ID is already implicitly defined, the newly created event replaces it but inherits its co-located IDs. It is currently not possible to explicitly create an event without also explicitly linking all co-located item IDs to it. Because this linking requires loading the entire Anthology data, it only happens when the event index is already loaded. As a result, entirely new proceedings can be created without the performance cost of loading everything, but when adding new events to existing proceedings, the event index should probably be loaded first.
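The default event ID described for the id parameter (collection "2022.emnlp" giving event "emnlp-2022") can be sketched as a plain string transformation. The helper default_event_id and the regular expression below are assumptions made for illustration; the library's own RE_EVENT_ID pattern and ID handling may differ in detail.

```python
import re

# Assumed shape of a new-style collection ID: four-digit year, a dot, a venue slug.
NEW_STYLE_ID = re.compile(r"^(?P<year>\d{4})\.(?P<venue>[a-z0-9]+)$")


def default_event_id(collection_id: str) -> str:
    """Hypothetical helper: derive an event ID from a new-style collection ID."""
    match = NEW_STYLE_ID.match(collection_id)
    if match is None:
        # Mirrors the documented ValueError for old-style IDs such as "L06".
        raise ValueError(
            f"cannot derive an event ID from old-style ID {collection_id!r}"
        )
    return f"{match['venue']}-{match['year']}"


print(default_event_id("2022.emnlp"))  # emnlp-2022
```

Passing an old-style ID such as "L06" raises ValueError in this sketch, which matches the documented behaviour when id is None.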

create_volume

create_volume(
    id, title, year=None, type=PROCEEDINGS, **kwargs
)

Create a new Volume object in this collection.

Parameters:

Name Type Description Default
id str

The ID of the new volume.

required
title MarkupText | str

The title of the new volume. If given as a string, it will be heuristically parsed for markup.

required
year Optional[str]

The year of the new volume (optional); if None, the year is inferred from this collection's ID.

None
type VolumeType

Whether this is a journal or proceedings volume; defaults to VolumeType.PROCEEDINGS.

PROCEEDINGS
**kwargs Any

Any valid list or optional attribute of Volume.

{}

Returns:

Type Description
Volume

The created Volume object.

Raises:

Type Description
AnthologyDuplicateIDError

If a volume with the given ID already exists.

ValueError

If this collection has an old-style ID.
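The year fallback for create_volume can be pictured with a short stand-alone sketch. The function resolve_year and its regular expression are hypothetical and only model the case where year is None; how the library itself parses IDs, and whether it rejects old-style IDs before looking at year, is not modelled here.

```python
import re
from typing import Optional


def resolve_year(collection_id: str, year: Optional[str] = None) -> str:
    """Hypothetical helper: return an explicit year, else infer it from the ID."""
    if year is not None:
        return year
    # Assumption: new-style IDs start with a four-digit year followed by a dot.
    match = re.match(r"^(\d{4})\.[a-z0-9]+$", collection_id)
    if match is None:
        raise ValueError(f"cannot infer a year from collection ID {collection_id!r}")
    return match.group(1)


print(resolve_year("2022.emnlp"))          # 2022
print(resolve_year("2022.emnlp", "2023"))  # 2023
```

The year is kept as a string here because the documented parameter type is Optional[str].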

get_event

get_event()

An Event explicitly defined in this collection, if any.

load

load()

Loads the XML file belonging to this collection.

papers

papers()

An iterator over all Paper objects in all volumes in this collection.

save

save(path=None, minimal_diff=True)

Saves this collection as an XML file.

Parameters:

Name Type Description Default
path Optional[StrPath]

The filename to save to. If None, defaults to self.path.

None
minimal_diff bool

If True (default), will compare against an existing XML file in self.path to minimize the difference, i.e., to prevent noise from changes in the XML that make no semantic difference. See utils.xml.ensure_minimal_diff for details.

True

validate_schema

validate_schema()

Validates the XML file belonging to this collection against the RelaxNG schema.

Raises:

Type Description
DocumentInvalid

If the XML file does not validate against the schema.

volumes

volumes()

An iterator over all Volume objects in this collection.