Design Principles for linelist

This vignette outlines the design decisions taken during the development of the linelist R package, along with the reasoning behind each decision and its possible pros and cons.

This document is primarily intended for those interested in understanding the code within the package, and for potential package contributors.

Scope

linelist provides a lightweight layer to add tags to data.frame columns. This allows:

  • column identification without renaming
  • extra features for the tagged columns, such as the ability to warn when a tagged column is dropped, or when its data type is incompatible with the expected one.
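
As a rough illustration of the idea, and not of linelist's actual internals, the following base R sketch stores a tag-to-column mapping in an attribute, so that a column can be located by its role without being renamed. The helper names `tag_columns()` and `tagged_column()`, the attribute name `"tags"`, and the example tags are all assumptions made for this example.

```r
# Hypothetical helpers, for illustration only; this is not the linelist API.
tag_columns <- function(x, ...) {
  stopifnot(is.data.frame(x))
  tags <- list(...)
  # Every tag must point at an existing column
  unknown <- setdiff(unlist(tags), names(x))
  if (length(unknown) > 0) {
    stop("Unknown column(s): ", paste(unknown, collapse = ", "))
  }
  # The mapping lives in an attribute; the columns themselves are untouched
  attr(x, "tags") <- tags
  x
}

tagged_column <- function(x, tag) {
  x[[attr(x, "tags")[[tag]]]]
}

df <- data.frame(
  dates = as.Date("2024-01-01") + 0:2,
  sex = c("f", "m", "f")
)
tagged <- tag_columns(df, date_onset = "dates", gender = "sex")

names(tagged)                        # column names are unchanged
tagged_column(tagged, "date_onset")  # column retrieved through its tag
```

Because the mapping sits alongside the data rather than in the column names, code that relies on the original names keeps working.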

Input/Output/Interoperability

Because of its scope, linelist is intended to provide maximum compatibility with data.frames, or packages defining subclasses of data.frames.

We would rather not add a new feature than have that feature alter the usual behaviour of a data.frame.

One notable exception to this rule is data.table, which differs too much from the standard data.frame behaviour for compatibility to be ensured.

make_linelist() is the main user-facing function of this package. It takes a data.frame/tibble/X as input and returns an output of class c("linelist", "data.frame")/c("linelist", "tibble")/c("linelist", "X"). As a consequence, differences in behaviour between data.frame and tibble are still present after conversion to a linelist object.
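
The class behaviour described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the code of make_linelist(): the constructor `as_tagged()` and the class name `"tagged"` are hypothetical, and `"X"` stands in for an arbitrary data.frame subclass.

```r
# Hypothetical constructor, for illustration only; not linelist's code.
as_tagged <- function(x) {
  stopifnot(is.data.frame(x))
  # Prepend the new class and keep whatever classes the input already had
  class(x) <- c("tagged", setdiff(class(x), "tagged"))
  x
}

df <- data.frame(a = 1:3, b = letters[1:3])
# Stand-in for an object of a data.frame subclass "X"
sub <- structure(df, class = c("X", "data.frame"))

class(as_tagged(df))   # "tagged" "data.frame"
class(as_tagged(sub))  # "tagged" "X" "data.frame"

# The input's own behaviour is retained: with a plain data.frame underneath,
# selecting a single column with [, j] still drops to a vector.
as_tagged(df)[, 1]
```

Since the original classes stay in the class vector, any method defined for them is still found once the new class has no method of its own.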

Design decisions

  • Wherever possible, linelist should not provide its own method, but degrade gracefully and rely on the superclass method. This is to ensure that linelist is as compatible as possible with other packages and data types.
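
One way to picture this principle is with S3 dispatch in base R. The sketch below is an assumption-laden illustration rather than linelist's implementation: the class `"tagged"`, the `"tags"` attribute and the warning text are hypothetical. Functions without a dedicated method fall through to the data.frame method, and the single method that is defined does its extra work and then delegates with NextMethod().

```r
# Hypothetical class "tagged", for illustration only; not linelist's code.
df <- data.frame(id = 1:3, dates = as.Date("2024-01-01") + 0:2)
x <- structure(
  df,
  class = c("tagged", "data.frame"),
  tags = list(date_onset = "dates")
)

# No summary.tagged() or dim.tagged() exists: the data.frame methods are used
dim(x)
summary(x)

# Where a method is genuinely needed, let the superclass method do the
# subsetting, then add only the tag-specific behaviour on top.
`[.tagged` <- function(x, ...) {
  out <- NextMethod()
  lost <- setdiff(unlist(attr(x, "tags")), names(out))
  if (length(lost) > 0) {
    warning("Tagged column(s) dropped: ", paste(lost, collapse = ", "))
  }
  out
}

kept <- x[c("id", "dates")]  # silent: the tagged column is kept
dropped <- x["id"]           # warns that 'dates' was dropped
```

Keeping the method this thin means the result of subsetting is whatever the underlying class would have returned, which is the behaviour users of that class already expect.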

Dependencies

Because of its strong interoperability with the tidyverse packages, it is acceptable for linelist to depend on low-level tidyverse or r-lib packages, such as rlang, vctrs or tidyselect.