anthology

Anthology

Anthology(datadir, verbose=None)

An instance of the ACL Anthology data.

Attributes:

Name Type Description
datadir

The path to the data folder.

verbose

Whether or not to show progress bars during longer operations. If this argument is not supplied explicitly, it will default to True if the standard output is a terminal.

collections instance-attribute

collections = CollectionIndex(self)

The CollectionIndex for accessing collections, volumes, and papers.

events instance-attribute

events = EventIndex(self)

The EventIndex for accessing events.

people instance-attribute

people = PersonIndex(self)

The PersonIndex for accessing authors and editors.

relaxng property

relaxng

The RelaxNG schema for the Anthology's XML data files.

sigs instance-attribute

sigs = SIGIndex(self)

The SIGIndex for accessing SIGs.

venues instance-attribute

venues = VenueIndex(self)

The VenueIndex for accessing venues.

create_collection

create_collection(id)

Create a new Collection object.

Alias for CollectionIndex.create().

find_people

find_people(name_def)

Find people by name.

See also: get_person() to find a person by their ID.

Parameters:

Name Type Description Default
name_def ConvertableIntoName

Anything that can be resolved to a name; see below for examples.

required

Returns:

Type Description
list[Person]

A list of Person objects with the given name.

Examples:

>>> anthology.find_people("Doe, Jane")       # all of these calls are equivalent
>>> anthology.find_people(("Jane", "Doe"))
>>> anthology.find_people({"first": "Jane", "last": "Doe"})
>>> anthology.find_people(Name("Jane", "Doe"))

from_repo classmethod

from_repo(
    repo_url="https://github.com/acl-org/acl-anthology.git",
    path=None,
    verbose=None,
)

Instantiates the Anthology from a Git repo.

Parameters:

Name Type Description Default
repo_url str

The URL of a Git repo with Anthology data. If not given, defaults to the official ACL Anthology repo.

'https://github.com/acl-org/acl-anthology.git'
path Optional[StrPath]

The local path for the repo data. If not given, automatically determines a path within the user's data directory.

None
verbose Optional[bool]

Whether or not to show progress bars during longer operations. If this argument is not supplied explicitly, it will default to True if the standard output is a terminal.

None
Note

If no explicit path is supplied, calling any save function on the returned objects will emit a warning, since doing so is likely a user error.
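
As a minimal sketch of supplying an explicit path, the helper below builds a location under a chosen working directory and passes it to from_repo(). The helper name `open_for_editing`, the `workdir` argument, and the `acl-anthology` subfolder name are hypothetical; the class is passed in as `anthology_cls` so that the sketch makes no assumption about how it was imported.

```python
from pathlib import Path


def open_for_editing(anthology_cls, workdir):
    """Instantiate from the Git repo with an explicit local path.

    `anthology_cls` is expected to be the Anthology class; the subfolder
    name below is an arbitrary choice made for illustration.
    """
    repo_path = Path(workdir).expanduser() / "acl-anthology"
    # repo_url is omitted, so the documented default repo URL is used.
    return anthology_cls.from_repo(path=repo_path, verbose=False)
```

With a known path in place, later save calls write to a location the caller chose deliberately rather than to an automatically determined one.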

from_within_repo classmethod

from_within_repo(verbose=None)

Instantiates the Anthology from within its own Git repo, using the repo's main data folder.

Assumes that you have cloned the acl-org/acl-anthology repo and run a script that imports this library from within the repo.

Parameters:

Name Type Description Default
verbose Optional[bool]

Whether or not to show progress bars during longer operations. If this argument is not supplied explicitly, it will default to True if the standard output is a terminal.

None

Raises:

Type Description
InvalidGitRepositoryError

If this module is not within a Git repository, e.g. if it was pip-installed.

get

get(full_id)

Access collections, volumes, and papers, depending on the provided ID.

Parameters:

Name Type Description Default
full_id AnthologyID

An Anthology ID that refers to a collection, volume, or paper.

required

Returns:

Type Description
Optional[Collection | Volume | Paper]

The object corresponding to the given ID.
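
Because get() may return a collection, a volume, a paper, or None, callers usually need to check the result before using it. A minimal sketch, where `describe` is a hypothetical helper and `anthology` is assumed to be an already instantiated Anthology:

```python
def describe(anthology, full_id):
    """Return a short label for whatever `full_id` refers to."""
    obj = anthology.get(full_id)
    if obj is None:
        return f"{full_id}: not found"
    # The class name distinguishes Collection, Volume, and Paper results.
    return f"{full_id}: {type(obj).__name__}"
```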

get_collection

get_collection(full_id)

Access a collection by its ID or the ID of a contained volume or paper.

Parameters:

Name Type Description Default
full_id AnthologyID

An Anthology ID.

required

Returns:

Type Description
Optional[Collection]

The collection associated with the given ID.

get_event

get_event(event_id)

Access an event by its ID.

Parameters:

Name Type Description Default
event_id str

An ID that refers to an event, e.g. "acl-2022".

required

Returns:

Type Description
Optional[Event]

The event associated with the given ID.

get_paper

get_paper(full_id)

Access a paper by its ID.

Parameters:

Name Type Description Default
full_id AnthologyID

An Anthology ID that refers to a paper.

required

Returns:

Type Description
Optional[Paper]

The paper associated with the given ID.

get_paper_by_bibkey

get_paper_by_bibkey(bibkey)

Access a paper by its citation key/bibkey.

Parameters:

Name Type Description Default
bibkey str

A bibkey belonging to an Anthology paper, e.g. 'devlin-etal-2019-bert'.

required

Returns:

Type Description
Optional[Paper]

The paper associated with the given bibkey.
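
Since the return type is Optional[Paper], a lookup for an unknown bibkey is expected to yield None. The sketch below partitions a list of bibkeys accordingly; `split_bibkeys` is a hypothetical helper, not part of the library.

```python
def split_bibkeys(anthology, bibkeys):
    """Partition bibkeys into found papers and unknown keys."""
    found, missing = {}, []
    for key in bibkeys:
        paper = anthology.get_paper_by_bibkey(key)
        if paper is None:
            missing.append(key)
        else:
            found[key] = paper
    return found, missing
```

This can be useful for checking the citation keys of a bibliography against the Anthology before relying on them.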

get_person

get_person(person_id)

Access a person by their ID.

See also: find_people() to find a person by name.

Parameters:

Name Type Description Default
person_id str

An ID that refers to a person.

required

Returns:

Type Description
Optional[Person]

The person associated with the given ID.
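
get_person() and find_people() differ in their return types: the former returns a single Person or None, the latter a list. A minimal sketch of a wrapper that always returns a list follows; `lookup_people` and its keyword arguments are hypothetical names chosen for illustration.

```python
def lookup_people(anthology, person_id=None, name=None):
    """Look up people by ID or by name, always returning a list."""
    if person_id is not None:
        person = anthology.get_person(person_id)
        return [] if person is None else [person]
    if name is not None:
        # `name` can be anything find_people() accepts, e.g. "Doe, Jane".
        return list(anthology.find_people(name))
    raise ValueError("either person_id or name is required")
```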

get_volume

get_volume(full_id)

Access a volume by its ID or the ID of a contained paper.

Parameters:

Name Type Description Default
full_id AnthologyID

An Anthology ID that refers to a volume or paper.

required

Returns:

Type Description
Optional[Volume]

The volume associated with the given ID.

load_all

load_all()

Load all Anthology data files.

Calling this function is not strictly necessary. If you access Anthology data through object methods or SlottedDict functionality, data will be loaded on the fly as required. However, if you know that your program will eventually load all data files (particularly the XML files), for example by iterating over all volumes or papers, loading everything at once with this function can result in a considerable speed-up.

Important

Exceptions raised during the index creation are sent to the logger, and only a generic exception is raised at the end.
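
A minimal sketch of the pattern described above: preload everything before a full iteration, and point to the log if loading fails. The helper name `count_papers` and the `preload` flag are hypothetical, and `anthology` is assumed to be an already instantiated Anthology.

```python
import logging

log = logging.getLogger(__name__)


def count_papers(anthology, preload=True):
    """Count all papers, optionally loading all data files up front."""
    if preload:
        try:
            anthology.load_all()
        except Exception:
            # Only a generic exception arrives here; the individual
            # errors were already sent to the logger by the library.
            log.exception("load_all() failed; check earlier log records")
            raise
    return sum(1 for _ in anthology.papers())
```

Configuring logging before the call is what makes the underlying errors visible, since the exception raised at the end carries no details.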

papers

papers(full_id=None)

Returns an iterator over all papers.

Parameters:

Name Type Description Default
full_id Optional[AnthologyID]

If provided, only papers matching the given ID will be included.

None
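
Because papers() returns an iterator, a sample can be taken without walking through every paper. A minimal sketch, where `first_papers` and `limit` are hypothetical names:

```python
from itertools import islice


def first_papers(anthology, full_id=None, limit=5):
    """Return up to `limit` papers, optionally restricted by an ID."""
    # islice stops consuming the iterator once `limit` items are taken.
    return list(islice(anthology.papers(full_id), limit))
```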

reset_indices

reset_indices()

Reset all non-collection indices.

Note
  • Calling this function should normally not be necessary, as indices (and their child objects) update automatically when making changes.
  • Any modifications to data stored directly by the indices (i.e. stored in the YAML files, rather than inferred from the XML) need to be saved before calling this, or they will be lost.
  • This will not update any Event, Person, or Venue objects you may have already obtained, but any objects returned by an index after the reset will reflect the new data.
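
The notes above suggest an order of operations: save first, then reset, then obtain fresh objects from the indices. A minimal sketch under that reading follows; `reset_and_refetch` is a hypothetical helper, and using save_all() for the save step is an assumption that may be broader than a given workflow needs, since it writes all data files.

```python
def reset_and_refetch(anthology, person_id):
    """Save pending changes, reset the indices, and re-fetch a person."""
    # Persist changes first, otherwise data stored directly by the
    # indices would be lost on reset.
    anthology.save_all()
    anthology.reset_indices()
    # Previously obtained Person objects are stale after the reset,
    # so ask the index again instead of reusing them.
    return anthology.get_person(person_id)
```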

resolve

resolve(name_spec: NameSpecification) -> Person
resolve(
    name_spec: Iterator[NameSpecification],
) -> list[Person]
resolve(name_spec)

Resolve a name specification (e.g. as attached to papers) to a natural person.

Warning

Deprecated in favor of NameSpecification.resolve(). Alternatively, use PersonIndex.get_by_namespec() to see what a hypothetical NameSpecification that is not yet attached to a paper would resolve to.

Parameters:

Name Type Description Default
name_spec NameSpecificationOrIter

A name specification, or an iterator over name specifications.

required

Returns:

Type Description
PersonOrList

A single Person object if a single name specification was given; otherwise, a list of Person objects of the same length as the input iterable.

Examples:

>>> paper = anthology.get("C92-1025")
>>> anthology.resolve(paper.authors)
[Person(id='lauri-karttunen', ...), Person(id='ronald-kaplan', ...), Person(id='annie-zaenen', ...)]
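
Since this method is deprecated, the same result can presumably be obtained by calling resolve() on each name specification directly. A minimal sketch, assuming NameSpecification.resolve() takes no arguments and that `paper.authors` is iterable as in the example above; `resolve_authors` is a hypothetical helper.

```python
def resolve_authors(paper):
    """Resolve each author name specification on `paper` to a person."""
    return [name_spec.resolve() for name_spec in paper.authors]
```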

save_all

save_all()

Save all Anthology data files.

volumes

volumes(collection_id=None)

Returns an iterator over all volumes.

Parameters:

Name Type Description Default
collection_id Optional[str]

If provided, only volumes belonging to the given collection ID will be included.

None
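
As a minimal sketch of the `collection_id` filter, the helper below counts volumes for several collections. `volumes_per_collection` is a hypothetical name, and how volumes() behaves for an unknown collection ID is not covered here.

```python
def volumes_per_collection(anthology, collection_ids):
    """Map each collection ID to the number of volumes it contains."""
    return {
        # volumes() returns an iterator, so count by consuming it.
        cid: sum(1 for _ in anthology.volumes(cid))
        for cid in collection_ids
    }
```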