In This Vignette

  • Introduction
  • Pivot Table
  • Base Data
  • Data Groups
  • Calculation Definitions
  • Cells
  • Filter Classes
  • Calculator Classes
  • Styling Classes
  • Rendering Classes
  • Summary
  • Further Reading

Introduction

The pivottabler package is comprised of R6 classes. This section provides a short description of the main classes, their function and relationships in the package.

Pivot Table

The PivotTable class represents a single pivot table. In all of the example code, the instance of the pivot table is represented by the pt variable, on which various functions/methods are invoked e.g. pivot tables are created by:

pt <- PivotTable$new()
pt$addData(bhmtrains)
...

Base Data

The PivotData class contains the references to the source data (i.e. data frames) used to build a pivot table and is accessible via pt$data.

A pivot table can be built from more than one data frame, though this is typically only viable if the variable names and values in the data frames are consistent - e.g. if a given variable is present in multiple data frames, it should have the same name in all of them.

Data Groups

A pivot table has a set of column headings and a set of row headings, each of which is an instance of the PivotDataGroup class. A data group can (and normally does) have child groups.

The data groups exist in a hierarchy. A single invisible data group pt$rowGroup acts as the top-level parent for the row groups, and another single invisible data group pt$columnGroup acts as the top-level parent for the column groups.

The first level of visible data groups headings can be accessed via pt$topRowGroups or pt$topColumnGroups. The leaf level data groups can be accessed using pt$leafRowGroups or pt$leafColumnGroups.

The parent of a given PivotDataGroup can be accessed using the parentGroup property.

The children for a given PivotDataGroup can be accessed using the childGroups property, e.g. the child data groups of a data group in the top level level can then be accessed via pt$topRowGroups[[1]]$childGroups.

Calling pt$addColumnDataGroups(...) or pt$addRowDataGroups(...) generates a new level of child data groups. Each child data group is also an instance of the PivotDataGroup.

The display value of a data group is accessible via the caption property, e.g. pt$topRowGroups[[1]]$caption.

See the Data Groups vignette for more details.

Calculation Definitions

A pivot table normally has one or more calculations. Each calculation definition is an instance of the PivotCalculation class. Calculation definitions are grouped together by a PivotCalculationGroup, of which there is normally only a single default group in a pivot table. If this calculation group contains three calculation definitions, then in the populated pivot table there will be three cells underneath each data group (one for each calculation), e.g.

library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="NumberOfTrains", caption="Number of Trains",
                     summariseExpression="n()")
pt$defineCalculation(calculationName="MinimumpeedMPH", caption="Minimum Speed (MPH)",
                     summariseExpression="min(SchedSpeedMPH, na.rm=TRUE)")
pt$defineCalculation(calculationName="MaximumSpeedMPH", caption="Maximum Speed (MPH)",
                     summariseExpression="max(SchedSpeedMPH, na.rm=TRUE)")
pt$renderPivot()

It is possible (though not typical) for a pivot table to contain multiple calculation groups. See the Irregular Layout vignette for an example.

pt$calculationGroups contains the calculation groups present in a pivot table. The calculation definitions within the default group are accessed via pt$calculationGroups$defaultGroup$calculations.

see the Calculations vignette for more details.

Cells

A pivot table contains a set of cells represented by the PivotCells class. The cells of a pivot table are accessed via pt$cells. The dimensions of a pivot table (excluding data group headings) can be checked with pt$rowCount and pt$columnCount.

Individual cells are represented by the PivotCell class. The easiest way to access individual cells (i.e. individual PivotCell objects) is pt$cells$getCell(r, c).

Within the PivotCells object is a list of rows, each of which is a list of PivotCell objects. So a more direct (but less-safe) way to access individual cells is pt$cells$rows[[r]][[c]].

Each cell has a raw value (typically of data type numeric) and a formatted value (typically of data type character). These can be accessed via cell$rawValue and cell$formattedValue respectively.

It is possible to find the leaf-level data groups (i.e. right-most row heading and bottom column heading) that relate to a cell using cell$rowLeafGroup and cell$columnLeafGroup. If the data group has other parent groups, then these can be accessed recursively via cell$rowLeafGroup, cell$rowLeafGroup$parentGroup, cell$rowLeafGroup$parentGroup$parentGroup, etc.

The Finding and Formatting vignette describes other ways of accessing cells and data group headings.

Filter Classes

Each data group typically acts as a filter for the data in that row/column of the pivot table. E.g. a heading of “France” typically implies “Country=France”. This filter condition is represented by the PivotFilter class. Every cell in a pivot table has a set of filters (one filter from each row/column heading) and these filters are represented by the PivotFilters class. See the Cell Context vignette for more details.

The PivotFilterOverrides class provides a mechanism for individual calculation definitions to override the default filters associated with a cell. See the Calculations and Appendix: Calculations vignettes for more details.

Calculator Classes

The PivotCalculator class provides much of the functionality for calculating the values in a pivot table. Users of the pivottabler package typically do not directly interact with this internal class.

The PivotBatch, PivotBatchCalculator and PivotBatchStatistics classes provide functionality for calculating the values of batches of cells in one/a small number of dplyr/data.table calculations. These classes are also internal.

Styling Classes

The PivotStyle class represents a list of style declarations in the form of name-value pairs (similar to a CSS style declaration) that would be applied to a single data group or cell in the pivot table (or sets of data groups / cells).

The PivotStyles class is a set of styles that are applied to different parts of a pivot table (headings, cells, totals, etc). The PivotStyles class can be thought of as a theme applied to a pivot table.

The PivotOpenXlsxStyle and PivotOpenXlsxStyles classes are similar, except these are specific to Excel export.

See the Styling vignette for more details.

Rendering Classes

The PivotHtmlRenderer, PivotLatexRenderer and PivotOpenXlsxRenderer classes provide rendering logic for each of the formats that a pivot table can be output in.

See the Outputs vignette for more details.

Summary

  • The PivotTable class represents a single pivot table, typically named pt in the package examples.
  • pt$data is an instance of the PivotData class that wraps the data frames used to build the pivot table.
  • The data groups that comprise the row/column headings exist in a hierarchy.
  • pt$rowGroup and pt$columnGroup are the invisible top-level data groups on each axis - each is an instance of the PivotDataGroup class.
  • pt$topRowGroups and pt$topColumnGroups provide the direct way to access the first visible level of row or column data groups in a pivot table.
  • pt$leafRowGroups and pt$leafColumnGroups provide the direct way to access the lowest level of row or column data groups in a pivot table.
  • The parent of a given PivotDataGroup can be accessed using the parentGroup property.
  • Child data groups can be accessed recursively, e.g. pt$topRowGroups[[1]]$childGroups[[1]]$childGroups, etc.
  • pt$calculationGroups contains the calculation groups present in a pivot table. The calculation definitions within the default calculation group are accessed via pt$calculationGroups$defaultGroup$calculations.
  • Check pivot table size using pt$rowCount and pt$columnCount.
  • Individual cells in a pivot table are represented by the PivotCell class and can be accessed via pt$cells$getCell(r, c).
  • The data groups related to a cell can be accessed recursively via cell$rowLeafGroup, cell$rowLeafGroup$parentGroup, cell$rowLeafGroup$parentGroup$parentGroup, etc.