Title: | Parse and Deduplicate Author Names |
---|---|
Description: | Utilities to parse author fields from DESCRIPTION files and general-purpose functions to deduplicate names in databases, beyond the specific case of R package authors. |
Authors: | Hugo Gruson [aut, cre, cph] |
Maintainer: | Hugo Gruson <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.0.9000 |
Built: | 2025-02-04 09:31:26 UTC |
Source: | https://github.com/epiverse-connect/authoritative |
A data.frame of historical metadata from CRAN epidemiology packages.
cran_epidemiology_packages
A data.frame with 5 variables:
- package name
- package version
- authors as listed in the Authors@R field from the DESCRIPTION file
- authors as listed in the Author field from the DESCRIPTION file
- package maintainer
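The examples for expand_names() and parse_authors_r() keep only the non-missing Authors@R entries with subset(). A minimal sketch of that idiom on a hypothetical two-row toy data.frame (made-up values that mimic only the Author and Authors@R columns; this is not the real dataset) shows that drop = TRUE returns a plain character vector rather than a one-column data.frame.

```r
# Hypothetical toy data: made-up values mimicking only two columns of
# cran_epidemiology_packages
toy <- data.frame(
  Author = c("Jane Doe [aut, cre], John Smith [ctb]", "Ann Lee"),
  `Authors@R` = c(
    'c(person("Jane", "Doe", role = c("aut", "cre")), person("John", "Smith", role = "ctb"))',
    NA
  ),
  check.names = FALSE # keep the non-syntactic column name `Authors@R`
)

# Keep rows where Authors@R is present and return that column as a vector
authors_r <- subset(toy, !is.na(`Authors@R`), `Authors@R`, drop = TRUE)
authors_r
```

check.names = FALSE matters here: without it, data.frame() would rename the column to Authors.R and the backticked references would fail.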
Expand names from abbreviated forms or initials
expand_names(short, expanded)
Argument | Description |
---|---|
short | A character vector of potentially abbreviated names |
expanded | A character vector of potentially expanded names |
When you have a list x of abbreviated and non-abbreviated names and you want to deduplicate them, this function can be used as expand_names(x, x), which will return the most expanded version available in x for each name.
A character vector with the same length as short
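The exact matching rules used by expand_names() are not spelled out here. As a hypothetical sketch only, one plausible way to get "the most expanded version" is to treat a name as an abbreviation of another when the first tokens agree (exactly or as an initial), the family names are identical, and the middle tokens appear in order; among compatible candidates, the longest string wins. The helpers token_matches(), is_abbreviation_of() and expand_names_sketch() are invented for illustration and are not part of the package.

```r
# Hypothetical helper: does token `a` equal token `b`, or is it its initial?
token_matches <- function(a, b) {
  a <- sub("\\.$", "", a) # treat "W." like "W"
  a == b || (nchar(a) == 1 && startsWith(b, a))
}

# Hypothetical helper: can `short` be an abbreviated form of `long`?
# First tokens must match, last tokens (family names) must be identical,
# and middle tokens of `short` must match middle tokens of `long` in order.
is_abbreviation_of <- function(short, long) {
  s <- strsplit(short, "\\s+")[[1]]
  l <- strsplit(long, "\\s+")[[1]]
  if (length(s) > length(l)) return(FALSE)
  if (!token_matches(s[1], l[1]) || s[length(s)] != l[length(l)]) return(FALSE)
  j <- 2
  for (tok in s[-c(1, length(s))]) {
    # advance through the middle tokens of `long` until one matches
    while (j < length(l) && !token_matches(tok, l[j])) j <- j + 1
    if (j >= length(l)) return(FALSE)
    j <- j + 1
  }
  TRUE
}

# Sketch: replace each name in `short` by the longest compatible name
# found in `expanded`, or keep it unchanged if none is compatible
expand_names_sketch <- function(short, expanded) {
  vapply(short, function(x) {
    ok <- vapply(expanded, is_abbreviation_of, logical(1), short = x)
    candidates <- expanded[ok]
    if (length(candidates) == 0) return(x)
    candidates[which.max(nchar(candidates))]
  }, character(1), USE.NAMES = FALSE)
}

expand_names_sketch(
  c("W A Mozart", "Wolfgang Mozart", "Wolfgang A Mozart"),
  "Wolfgang Amadeus Mozart"
)
```

Calling expand_names_sketch(x, x) mirrors the deduplication pattern described in the details: every name is compatible with itself, so names without a longer match come back unchanged. This naive version picks the longest candidate even when two different people share the same initials and family name, a case a real implementation would need to handle more carefully.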
expand_names(
  c("W A Mozart", "Wolfgang Mozart", "Wolfgang A Mozart"),
  "Wolfgang Amadeus Mozart"
)

# Real-case application example
# Deduplicate names in list, as described in "details"
epi_pkg_authors <- cran_epidemiology_packages |>
  subset(!is.na(`Authors@R`), `Authors@R`, drop = TRUE) |>
  parse_authors_r() |>
  # Drop email, role, ORCID and format as string rather than person object
  lapply(function(x) format(x, include = c("given", "family"))) |>
  unlist()

# With all duplicates
length(unique(epi_pkg_authors))

# Deduplicate
epi_pkg_authors_normalized <- expand_names(epi_pkg_authors, epi_pkg_authors)
length(unique(epi_pkg_authors_normalized))
Parse the Author field from a DESCRIPTION file
Parse the Author field from a DESCRIPTION file into a character vector of author names
parse_authors(author_string)
Argument | Description |
---|---|
author_string | A character vector containing the Author field from a DESCRIPTION file |
A character vector, or a list of character vectors with the same length as author_string
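Because the Author field is free text, splitting it into individual names is necessarily heuristic. A minimal base-R sketch of one such heuristic, assuming roles appear in square brackets and names are separated by commas or the word "and"; split_author_field() is a hypothetical helper, not the implementation behind parse_authors(). Unlike parse_authors(), it always returns a list, with one character vector per input string.

```r
# Hypothetical sketch, NOT the package's implementation: split a free-text
# Author field into individual names
split_author_field <- function(author_string) {
  # Drop bracketed role annotations such as "[aut, cre]"
  x <- gsub("\\[[^\\]]*\\]", "", author_string, perl = TRUE)
  # Drop parenthesised notes such as email addresses or ORCID links
  x <- gsub("\\([^)]*\\)", "", x, perl = TRUE)
  # Split on commas and on the standalone word "and"
  parts <- strsplit(x, ",|\\band\\b", perl = TRUE)
  # Trim surrounding spaces and discard empty pieces
  lapply(parts, function(p) {
    p <- trimws(p)
    p[nzchar(p)]
  })
}

split_author_field("Jane Doe [aut, cre], John Smith [ctb] and Ann Lee")
```

Such a heuristic cannot tell people from organisations: a made-up field like "Some Team and contributors" would be split into two entries, neither of which is a person.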
# Read from a DESCRIPTION file directly
utils_description <- system.file("DESCRIPTION", package = "utils")
utils_authors <- read.dcf(utils_description, "Author")
parse_authors(utils_authors)

# Read from a database of CRAN metadata
cran_epidemiology_packages$Author |>
  parse_authors() |>
  unlist() |>
  unique() |>
  sort()
Parse the Authors@R field from a DESCRIPTION file
Parse the Authors@R field from a DESCRIPTION file into a person object
parse_authors_r(authors_r_string)
Argument | Description |
---|---|
authors_r_string | A character vector containing the Authors@R field from a DESCRIPTION file |
A person object, or a list of person objects with the same length as authors_r_string
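The Authors@R field stores R code that evaluates to a person object, so base R alone can recover the authors from it. The sketch below writes a hypothetical DESCRIPTION snippet (made-up package and people) to a temporary file, reads the field with read.dcf(), evaluates it, and keeps only given and family names as in the expand_names() example. Whether parse_authors_r() works this way internally is an assumption, not something stated here.

```r
# Hypothetical DESCRIPTION content written to a temporary file
desc_file <- tempfile(fileext = ".dcf")
writeLines(c(
  "Package: example",
  "Authors@R: c(",
  '    person("Jane", "Doe", role = c("aut", "cre")),',
  '    person("John", "Smith", role = "ctb"))'
), desc_file)

# read.dcf() returns a character matrix with one column per requested field
fields <- read.dcf(desc_file, fields = "Authors@R")
authors_r_string <- fields[1, "Authors@R"]

# Evaluate the R expression stored in the field to obtain person objects
authors <- eval(parse(text = authors_r_string))

# Keep only given and family names, dropping roles, emails and comments
author_names <- format(authors, include = c("given", "family"))
author_names

unlink(desc_file) # clean up the temporary file
```

Evaluating the field with eval(parse()) runs whatever code it contains, so this approach is only appropriate for DESCRIPTION files from trusted sources.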
# Read from a DESCRIPTION file directly
pkg_description <- system.file("DESCRIPTION", package = "authoritative")
authors_r_pkg <- read.dcf(pkg_description, "Authors@R")
parse_authors_r(authors_r_pkg)

# Read from a database of CRAN metadata
cran_epidemiology_packages |>
  subset(!is.na(`Authors@R`), `Authors@R`, drop = TRUE) |>
  parse_authors_r() |>
  head()