anthology¶
Anthology ¶
Anthology(datadir, verbose=None)
An instance of the ACL Anthology data.
Attributes:
| Name | Type | Description |
|---|---|---|
| datadir | | The path to the data folder. |
| verbose | | Whether or not to show progress bars during longer operations. If this argument is not supplied explicitly, it will default to True if the standard output is a terminal. |
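The default for verbose described above can be pictured with the following minimal sketch. It assumes that "the standard output is a terminal" corresponds to `sys.stdout.isatty()`; the helper name `resolve_verbose` is hypothetical and not part of the library, whose actual check may differ.

```python
import sys


def resolve_verbose(verbose=None):
    """Return the effective verbosity for an optional flag (illustrative only)."""
    if verbose is not None:
        # An explicit True/False always wins over the terminal check.
        return bool(verbose)
    # Assumption: "standard output is a terminal" maps to isatty().
    return sys.stdout.isatty()
```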
collections instance-attribute ¶
collections = CollectionIndex(self)
The CollectionIndex for accessing collections, volumes, and papers.
people instance-attribute ¶
people = PersonIndex(self)
The PersonIndex for accessing authors and editors.
create_collection ¶
create_collection(id)
Create a new Collection object.
Alias for CollectionIndex.create().
find_people ¶
find_people(name_def)
Find people by name.
See also: get_person() to find a person by their ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name_def | ConvertableIntoName | Anything that can be resolved to a name; see below for examples. | required |
Returns:
| Type | Description |
|---|---|
| list[Person] | A list of Person objects. |
Examples:
>>> anthology.find_people("Doe, Jane") # all of these are identical
>>> anthology.find_people(("Jane", "Doe"))
>>> anthology.find_people({"first": "Jane", "last": "Doe"})
>>> anthology.find_people(Name("Jane", "Doe"))
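Because find_people() returns a list, a name may match no one or several people. A minimal sketch of one way to handle that is shown below; `find_unique_person` is a hypothetical helper, not a library function, and `anthology` is assumed to be an already instantiated Anthology.

```python
def find_unique_person(anthology, name_def):
    """Return the single person matching name_def, or None if zero or several match."""
    matches = anthology.find_people(name_def)
    # find_people() returns a list, so an ambiguous name yields several entries.
    return matches[0] if len(matches) == 1 else None
```

Any of the name forms from the examples above can be passed as `name_def`.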
from_repo classmethod ¶
from_repo(
    repo_url="https://github.com/acl-org/acl-anthology.git",
    path=None,
    verbose=None,
)
Instantiates the Anthology from a Git repo.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| repo_url | str | The URL of a Git repo with Anthology data. If not given, defaults to the official ACL Anthology repo. | 'https://github.com/acl-org/acl-anthology.git' |
| path | Optional[StrPath] | The local path for the repo data. If not given, automatically determines a path within the user's data directory. | None |
| verbose | Optional[bool] | Whether or not to show progress bars during longer operations. If this argument is not supplied explicitly, it will default to True if the standard output is a terminal. | None |
Note
If no explicit path is supplied, calling any save function on the returned objects will emit a warning, since this is likely a user error.
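As a sketch of the case where data will be modified and saved, the helper below passes an explicit path to from_repo(). The import path `acl_anthology` and the helper name `open_anthology` are assumptions for illustration; only `from_repo` and its `path` and `verbose` parameters come from the reference above.

```python
from pathlib import Path


def open_anthology(path, verbose=False):
    """Instantiate the Anthology from the official repo at an explicit local path."""
    # Imported lazily so this sketch loads even without the package installed.
    from acl_anthology import Anthology  # assumed import path

    # An explicit path avoids the warning about save functions noted above.
    return Anthology.from_repo(path=Path(path), verbose=verbose)
```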
from_within_repo classmethod ¶
from_within_repo(verbose=None)
Instantiates the Anthology from within its own Git repo, using the repo's main data folder.
Assumes that you have cloned the acl-org/acl-anthology repo and run a script that imports this library from within the repo.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| verbose | Optional[bool] | Whether or not to show progress bars during longer operations. If this argument is not supplied explicitly, it will default to True if the standard output is a terminal. | None |
Raises:
| Type | Description |
|---|---|
| InvalidGitRepositoryError | If this module is not within a Git repository, e.g. if it was pip-installed. |
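A script that should work both inside and outside the repo could fall back to from_repo() when from_within_repo() fails. The sketch below is one possible pattern, not a library feature: `load_anthology` is a hypothetical helper, and the exception class is passed in by the caller because this page does not state which package defines InvalidGitRepositoryError.

```python
def load_anthology(anthology_cls, fallback_errors):
    """Try the enclosing repo first, then fall back to a cloned copy."""
    try:
        return anthology_cls.from_within_repo()
    except fallback_errors:
        # E.g. the library was pip-installed, so there is no enclosing repo.
        return anthology_cls.from_repo()
```

In practice `anthology_cls` would be the Anthology class and `fallback_errors` the InvalidGitRepositoryError class listed above.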
get ¶
get(full_id)
Access collections, volumes, and papers, depending on the provided ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| full_id | AnthologyID | An Anthology ID that refers to a collection, volume, or paper. | required |
Returns:
| Type | Description |
|---|---|
| Optional[Collection \| Volume \| Paper] | The object corresponding to the given ID. |
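Since get() can return three different object types or nothing, callers usually need to branch on the result. A minimal sketch follows, assuming that None signals an unknown ID, as the Optional return type suggests; `describe` is a hypothetical helper.

```python
def describe(anthology, full_id):
    """Return a short label for whatever full_id refers to."""
    item = anthology.get(full_id)
    if item is None:
        # get() returns an Optional, so a missing ID is handled explicitly.
        return f"{full_id}: not found"
    # The class name distinguishes Collection, Volume, and Paper.
    return f"{full_id}: {type(item).__name__}"
```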
get_collection ¶
get_collection(full_id)
Access a collection by its ID or the ID of a contained volume or paper.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| full_id | AnthologyID | An Anthology ID. | required |
Returns:
| Type | Description |
|---|---|
| Optional[Collection] | The collection associated with the given ID. |
get_event ¶
get_event(event_id)
get_paper ¶
get_paper(full_id)
Access a paper by its ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| full_id | AnthologyID | An Anthology ID that refers to a paper. | required |
Returns:
| Type | Description |
|---|---|
| Optional[Paper] | The paper associated with the given ID. |
get_paper_by_bibkey ¶
get_paper_by_bibkey(bibkey)
get_person ¶
get_person(person_id)
Access a person by their ID.
See also: find_people() to find a person by name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| person_id | str | An ID that refers to a person. | required |
Returns:
| Type | Description |
|---|---|
| Optional[Person] | The person associated with the given ID. |
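get_person() and find_people() complement each other: one looks up by ID, the other by name. The sketch below combines them, assuming get_person() returns None for an unknown ID (as its Optional return type suggests); `lookup_people` is a hypothetical helper.

```python
def lookup_people(anthology, query):
    """Treat query as a person ID first, then as a name."""
    person = anthology.get_person(query)
    if person is not None:
        return [person]
    # No such ID: fall back to a name search, which may return several people.
    return anthology.find_people(query)
```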
get_volume ¶
get_volume(full_id)
Access a volume by its ID or the ID of a contained paper.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| full_id | AnthologyID | An Anthology ID that refers to a volume or paper. | required |
Returns:
| Type | Description |
|---|---|
| Optional[Volume] | The volume associated with the given ID. |
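Because get_collection() and get_volume() also accept the ID of a contained item, a single paper ID is enough to walk up the hierarchy. A minimal sketch, with `locate` as a hypothetical helper:

```python
def locate(anthology, paper_id):
    """Return the (collection, volume, paper) triple for a paper ID."""
    # All three accessors accept the same paper ID; the first two
    # resolve it to the containing collection and volume.
    return (
        anthology.get_collection(paper_id),
        anthology.get_volume(paper_id),
        anthology.get_paper(paper_id),
    )
```

Each element of the triple may be None if the ID is unknown.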
load_all ¶
load_all()
Load all Anthology data files.
Calling this function is not strictly necessary. If you access Anthology data through object methods or SlottedDict functionality, data will be loaded on-the-fly as required. However, if you know that your program will load all data files (particularly the XML files) eventually, for example by iterating over all volumes/papers, loading everything at once with this function can result in a considerable speed-up.
Important
Exceptions raised during the index creation are sent to the logger, and only a generic exception is raised at the end.
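The typical use case described above, preloading before a full iteration, might look like the sketch below. `count_papers` is a hypothetical helper; the error handling only illustrates that details of a failure end up in the log rather than in the raised exception.

```python
import logging


def count_papers(anthology):
    """Preload every data file, then count all papers."""
    try:
        anthology.load_all()
    except Exception:
        # Only a generic exception surfaces here; details are in the log.
        logging.getLogger(__name__).error("load_all() failed; see earlier log records")
        raise
    return sum(1 for _ in anthology.papers())
```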
papers ¶
papers(full_id=None)
Returns an iterator over all papers.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| full_id | Optional[AnthologyID] | If provided, only papers matching the given ID will be included. | None |
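Since papers() returns an iterator, a sample can be taken without materializing every paper. The sketch below shows this with an optional ID filter; `first_papers` and its `limit` parameter are hypothetical.

```python
from itertools import islice


def first_papers(anthology, full_id=None, limit=5):
    """Return up to `limit` papers, optionally restricted to one ID."""
    # islice stops consuming the iterator once `limit` items were taken.
    return list(islice(anthology.papers(full_id), limit))
```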
reset_indices ¶
reset_indices()
Reset all non-collection indices.
Note
- Calling this function should normally not be necessary, as indices (and their child objects) update automatically when making changes.
- Any modifications to data stored directly by the indices (i.e. stored in the YAML files, rather than inferred from the XML) need to be saved before calling this, or they will be lost.
- This will not update any Event, Person, or Venue objects you may have already obtained, but any objects returned by an index after the reset will reflect the new data.
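The last point of the note implies that objects obtained before a reset should be fetched again afterwards. A minimal sketch of that pattern, with `refetch_person` as a hypothetical helper and the preceding save step deliberately left out because this page does not name the save functions:

```python
def refetch_person(anthology, person_id):
    """Reset the indices, then fetch a fresh Person object."""
    # Unsaved changes to index-stored data would be lost here,
    # so save them first (the save call is omitted in this sketch).
    anthology.reset_indices()
    # Objects obtained before the reset are stale; ask the index again.
    return anthology.get_person(person_id)
```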
resolve ¶
resolve(name_spec: NameSpecification) -> Person
resolve(
    name_spec: Iterator[NameSpecification],
) -> list[Person]
resolve(name_spec)
Resolve a name specification (e.g. as attached to papers) to a natural person.
Warning
Deprecated in favor of NameSpecification.resolve(). Alternatively, use PersonIndex.get_by_namespec() to see what a hypothetical NameSpecification, one that is not yet attached to a paper, would resolve to.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name_spec | NameSpecificationOrIter | A name specification, or an iterator over name specifications. | required |
Returns:
| Type | Description |
|---|---|
| PersonOrList | A single Person object if a single name specification was given, or a list of Person objects with equal length to the input iterable otherwise. |
Examples:
>>> paper = anthology.get("C92-1025")
>>> anthology.resolve(paper.authors)
[Person(id='lauri-karttunen', ...), Person(id='ronald-kaplan', ...), Person(id='annie-zaenen', ...)]
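Given the deprecation, the same result can presumably be obtained by resolving each name specification directly. The sketch below assumes that NameSpecification.resolve() takes no arguments and that paper.authors is a list of name specifications, as the example above suggests; `resolve_authors` is a hypothetical helper.

```python
def resolve_authors(paper):
    """Resolve each author name specification to a Person."""
    # Uses the per-object method named in the deprecation warning
    # instead of the deprecated Anthology.resolve().
    return [name_spec.resolve() for name_spec in paper.authors]
```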