Downloads and returns vaccination data from SI-PNI. For years 1994–2019, data is downloaded from DATASUS FTP (aggregated doses/coverage). For years 2020+, data is downloaded from OpenDataSUS as monthly CSV bulk files (individual-level microdata with one row per vaccination dose).

Usage

sipni_data(
  year,
  type = "DPNI",
  uf = NULL,
  month = NULL,
  vars = NULL,
  parse = TRUE,
  col_types = NULL,
  cache = TRUE,
  cache_dir = NULL,
  lazy = FALSE,
  backend = c("arrow", "duckdb")
)

Arguments

year

Integer. Year(s) of the data. Required.

type

Character. File type for FTP data (1994–2019). Default: "DPNI" (doses applied). Use "CPNI" for vaccination coverage. Ignored for years >= 2020 (CSV data is always microdata).

uf

Character. Two-letter state abbreviation(s) to download. If NULL (default), downloads all 27 states. Example: "SP", c("SP", "RJ").

month

Integer. Month(s) to download (1–12). For years >= 2020 (CSV), selects which monthly CSV files to download. For years <= 2019 (FTP), this parameter is ignored (FTP files are annual). If NULL (default), downloads all 12 months.

vars

Character vector. Variables to keep. If NULL (default), returns all available variables. Use sipni_variables() to see available variables.

parse

Logical. If TRUE (default), converts columns to appropriate types (integer, double, Date) based on the variable metadata. Use sipni_variables() to see the target type for each variable. Set to FALSE for backward-compatible all-character output.

col_types

Named list. Override the default type for specific columns. Names are column names, values are type strings: "character", "integer", "double", "date_dmy", "date_ymd", "date_ym", "date". Example: list(QT_DOSE = "character") to keep QT_DOSE as character.

cache

Logical. If TRUE (default), caches downloaded data for faster future access.

cache_dir

Character. Directory for caching. Default: tools::R_user_dir("healthbR", "cache").

lazy

Logical. If TRUE, returns a lazy query object instead of a tibble. Requires the arrow package. The lazy object supports dplyr verbs (filter, select, mutate, etc.) which are pushed down to the query engine before collecting into memory. Call dplyr::collect() to materialize the result. Default: FALSE.

backend

Character. Backend for lazy evaluation: "arrow" (default) or "duckdb". Only used when lazy = TRUE. DuckDB backend requires the duckdb package.
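
A minimal sketch of how lazy and backend might be combined. It assumes sipni_data() is exported by the healthbR package (the name is taken from the default cache_dir) and that dplyr and duckdb are installed. The helper count_by_vaccine() is hypothetical; descricao_vacina is one of the column names used in the Examples section. The download is guarded, so nothing is fetched outside an interactive session.

```r
# Hypothetical helper: count rows (doses) per vaccine.
# On a lazy query the count is added to the query and only the
# summarised result is brought into memory by collect().
count_by_vaccine <- function(tbl) {
  tbl |>
    dplyr::count(descricao_vacina) |>
    dplyr::collect()
}

if (interactive() && requireNamespace("healthbR", quietly = TRUE)) {
  # Lazy query over Acre, January 2024, using the DuckDB backend
  ac_lazy <- healthbR::sipni_data(
    year = 2024, uf = "AC", month = 1,
    lazy = TRUE, backend = "duckdb"
  )
  count_by_vaccine(ac_lazy)
}
```

The same helper also works on an in-memory tibble, because dplyr::collect() returns a local data frame unchanged.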

Value

A tibble with vaccination data, or a lazy query object when lazy = TRUE. Includes columns year and uf_source to identify the source when multiple years/states are combined.

Output differs by year range:

  • 1994–2019 (FTP): Aggregated data with DPNI (12 vars) or CPNI (7 vars) columns, all character when parse = FALSE.

  • 2020+ (CSV): Individual-level microdata with ~47 columns (snake_case Portuguese), all character when parse = FALSE. Use sipni_variables(type = "API") to see the full list.

Details

FTP data (1994–2019): Downloaded as plain .DBF files. SI-PNI FTP data is aggregated (dose counts and coverage rates per municipality, vaccine, and age group). Two file types: DPNI (doses) and CPNI (coverage).

CSV data (2020+): Downloaded from OpenDataSUS as monthly CSV bulk files (national, semicolon-delimited, latin1 encoding). Each monthly ZIP is ~1.4 GB. This is individual-level microdata (one row per vaccination dose, ~47 fields per record). The type parameter is ignored for CSV years. Data is filtered by UF during chunked reading to avoid loading the full national file into memory.
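
The idea of filtering by UF during chunked reading can be sketched in base R. This is an illustration of the technique, not the package's implementation: filter_csv_by_uf(), its arguments, and the toy file with its column names (uf, vacina) are all hypothetical, and the real CSV files have ~47 fields.

```r
# Hypothetical helper, not part of healthbR: keep only rows whose
# `uf_col` value is in `uf`, reading `chunk_size` lines at a time.
filter_csv_by_uf <- function(path, uf, uf_col, chunk_size = 10000L) {
  con <- file(path, open = "r", encoding = "latin1")
  on.exit(close(con))

  # First line holds the semicolon-separated column names
  header <- strsplit(readLines(con, n = 1L), ";", fixed = TRUE)[[1]]
  kept <- list()

  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break
    chunk <- utils::read.table(
      text = lines, sep = ";", header = FALSE, col.names = header,
      colClasses = "character", quote = "\"", comment.char = "",
      check.names = FALSE
    )
    # Only matching rows are retained between chunks
    kept[[length(kept) + 1L]] <- chunk[chunk[[uf_col]] %in% uf, , drop = FALSE]
  }

  out <- do.call(rbind, kept)
  if (!is.null(out)) rownames(out) <- NULL
  out
}

# Toy file standing in for a monthly national CSV (made-up columns)
toy <- tempfile(fileext = ".csv")
writeLines(c("uf;vacina", "AC;BCG", "SP;BCG", "AC;HPV", "RJ;BCG"), toy)
ac_rows <- filter_csv_by_uf(toy, uf = "AC", uf_col = "uf", chunk_size = 2L)
```

Because each chunk is filtered before the next one is read, memory use is bounded by the chunk size plus the rows that match the requested states.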

See also

Examples

if (FALSE) { # interactive()
# FTP: doses applied in Acre, 2019
ac_doses <- sipni_data(year = 2019, uf = "AC")

# FTP: vaccination coverage in Acre, 2019
ac_cob <- sipni_data(year = 2019, type = "CPNI", uf = "AC")

# API: microdata for Acre, January 2024
ac_api <- sipni_data(year = 2024, uf = "AC", month = 1)

# API: select specific variables
sipni_data(year = 2024, uf = "AC", month = 1,
           vars = c("descricao_vacina", "tipo_sexo_paciente",
                    "data_vacina"))
}
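
A further sketch showing col_types and cache_dir together. It assumes sipni_data() is exported by healthbR and that QT_DOSE is present in the DPNI output (the name comes from the col_types description); project_cache is a hypothetical directory chosen for illustration. The download is guarded and runs only in an interactive session.

```r
# Keep QT_DOSE as character; other columns follow the default parsing
overrides <- list(QT_DOSE = "character")

# Default cache location documented for cache_dir
default_cache <- tools::R_user_dir("healthbR", "cache")

# Hypothetical project-local cache directory
project_cache <- file.path(tempdir(), "sipni-cache")
dir.create(project_cache, recursive = TRUE, showWarnings = FALSE)

if (interactive() && requireNamespace("healthbR", quietly = TRUE)) {
  # FTP: doses applied in Acre, 2019, with a type override and custom cache
  ac_doses <- healthbR::sipni_data(
    year = 2019, uf = "AC",
    col_types = overrides,
    cache_dir = project_cache
  )
  str(ac_doses$QT_DOSE)
}
```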