Downloads and returns vaccination data from SI-PNI. For years 1994–2019, data is downloaded from DATASUS FTP (aggregated doses/coverage). For years 2020+, data is downloaded from OpenDataSUS as monthly CSV bulk files (individual-level microdata with one row per vaccination dose).
Usage
sipni_data(
  year,
  type = "DPNI",
  uf = NULL,
  month = NULL,
  vars = NULL,
  parse = TRUE,
  col_types = NULL,
  cache = TRUE,
  cache_dir = NULL,
  lazy = FALSE,
  backend = c("arrow", "duckdb")
)
Arguments
- year
Integer. Year(s) of the data. Required.
- type
Character. File type for FTP data (1994–2019). Default: "DPNI" (doses applied). Use "CPNI" for vaccination coverage. Ignored for years >= 2020 (API data is always microdata).
- uf
Character. Two-letter state abbreviation(s) to download. If NULL (default), downloads all 27 states. Example: "SP", c("SP", "RJ").
- month
Integer. Month(s) to download (1–12). For years >= 2020 (CSV), selects which monthly CSV files to download. For years <= 2019 (FTP), this parameter is ignored (FTP files are annual). If NULL (default), downloads all 12 months.
- vars
Character vector. Variables to keep. If NULL (default), returns all available variables. Use sipni_variables() to see available variables.
- parse
Logical. If TRUE (default), converts columns to appropriate types (integer, double, Date) based on the variable metadata. Use sipni_variables() to see the target type for each variable. Set to FALSE for backward-compatible all-character output.
- col_types
Named list. Override the default type for specific columns. Names are column names, values are type strings: "character", "integer", "double", "date_dmy", "date_ymd", "date_ym", "date". Example: list(QT_DOSE = "character") to keep QT_DOSE as character.
- cache
Logical. If TRUE (default), caches downloaded data for faster future access.
- cache_dir
Character. Directory for caching. Default: tools::R_user_dir("healthbR", "cache").
- lazy
Logical. If TRUE, returns a lazy query object instead of a tibble. Requires the arrow package. The lazy object supports dplyr verbs (filter, select, mutate, etc.), which are pushed down to the query engine before collecting into memory. Call dplyr::collect() to materialize the result. Default: FALSE.
- backend
Character. Backend for lazy evaluation: "arrow" (default) or "duckdb". Only used when lazy = TRUE. The DuckDB backend requires the duckdb package.
Value
A tibble with vaccination data (or a lazy query object when lazy = TRUE). Includes columns year and uf_source to identify the source when multiple years/states are combined.
Output differs by year range:
- 1994–2019 (FTP): Aggregated data with DPNI (12 vars) or CPNI (7 vars) columns.
- 2020+ (CSV): Individual-level microdata with ~47 columns (snake_case Portuguese names). Use sipni_variables(type = "API") to see the full list.
In both cases, columns are converted according to the variable metadata when parse = TRUE (the default) and are all character when parse = FALSE.
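The parse and col_types arguments decide which types the returned columns have. The sketch below illustrates how type strings such as "integer" or "date_dmy" could be applied to an all-character table, with per-column overrides taking precedence over defaults. It is not the package's implementation: convert_column(), apply_col_types(), the DT_EXEMPLO column, and the concrete date formats are all assumptions inferred from the type names.

```r
# Hypothetical converter: maps a col_types-style type string to a conversion.
# The date formats are assumptions inferred from the type names.
convert_column <- function(x, type) {
  switch(
    type,
    character = as.character(x),
    integer   = as.integer(x),
    double    = as.double(x),
    date_dmy  = as.Date(x, format = "%d/%m/%Y"),
    date_ymd  = as.Date(x, format = "%Y-%m-%d"),
    date_ym   = as.Date(paste0(x, "-01"), format = "%Y-%m-%d"),
    date      = as.Date(x),
    stop("Unknown type: ", type)
  )
}

# Hypothetical helper: apply default types, letting overrides win per column.
apply_col_types <- function(df, defaults, overrides = NULL) {
  types <- utils::modifyList(as.list(defaults), as.list(overrides))
  for (col in intersect(names(types), names(df))) {
    df[[col]] <- convert_column(df[[col]], types[[col]])
  }
  df
}

# Illustrative all-character input (DT_EXEMPLO is a made-up column).
raw <- data.frame(
  QT_DOSE    = c("10", "25"),
  DT_EXEMPLO = c("31/01/2019", "15/02/2019"),
  stringsAsFactors = FALSE
)
defaults <- list(QT_DOSE = "integer", DT_EXEMPLO = "date_dmy")

parsed <- apply_col_types(raw, defaults)
kept   <- apply_col_types(raw, defaults, overrides = list(QT_DOSE = "character"))
```

Here parsed mimics the effect of parse = TRUE, while kept mirrors the documented col_types = list(QT_DOSE = "character") override.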
Details
FTP data (1994–2019): Downloaded as plain .DBF files. SI-PNI FTP data is aggregated (dose counts and coverage rates per municipality, vaccine, and age group). Two file types: DPNI (doses) and CPNI (coverage).
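For orientation, a plain .DBF file of this kind can be read in R with the foreign package. The sketch below writes a tiny stand-in file and reads it back as character columns; it does not claim that sipni_data() uses foreign internally, and the ANO and UF column names and all values are illustrative only (QT_DOSE is the one column name taken from this page).

```r
# Build a tiny stand-in for an aggregated DBF file (illustrative values only).
sample_rows <- data.frame(
  ANO     = c("2019", "2019"),
  UF      = c("AC", "AC"),
  QT_DOSE = c("10", "25"),
  stringsAsFactors = FALSE
)
dbf_path <- tempfile(fileext = ".dbf")
foreign::write.dbf(sample_rows, dbf_path)

# as.is = TRUE keeps character fields as character instead of factors.
aggregated <- foreign::read.dbf(dbf_path, as.is = TRUE)

# Convert the dose count explicitly, as a parsing step might do.
aggregated$QT_DOSE <- as.integer(aggregated$QT_DOSE)
```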
CSV data (2020+):
Downloaded from OpenDataSUS as monthly CSV bulk files (national,
semicolon-delimited, latin1 encoding). Each monthly ZIP is ~1.4 GB.
This is individual-level microdata (one row per vaccination dose,
~47 fields per record). The type parameter is ignored for CSV
years. Data is filtered by UF during chunked reading to avoid loading
the full national file into memory.
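The chunked, filter-while-reading approach described above can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package's code: read_filtered_chunks() is a hypothetical helper, sigla_uf is an assumed name for the state column, and the input is a small uncompressed file rather than a monthly ZIP.

```r
# Hypothetical helper: read a semicolon-delimited file chunk by chunk and
# keep only rows whose state column matches `uf`.
read_filtered_chunks <- function(path, uf, uf_col, chunk_size = 10000L,
                                 encoding = "latin1") {
  con <- file(path, open = "r", encoding = encoding)
  on.exit(close(con))

  header <- readLines(con, n = 1L)
  # Start with a zero-row frame so the result always has the right columns.
  kept <- list(utils::read.csv2(text = header, colClasses = "character",
                                check.names = FALSE))
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break
    chunk <- utils::read.csv2(text = c(header, lines),
                              colClasses = "character", check.names = FALSE)
    # Only the matching rows of each chunk are retained in memory.
    kept[[length(kept) + 1L]] <- chunk[chunk[[uf_col]] %in% uf, , drop = FALSE]
  }

  out <- do.call(rbind, kept)
  rownames(out) <- NULL
  out
}

# Small stand-in for a national monthly file (made-up rows).
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "sigla_uf;descricao_vacina",
  "AC;vacina_a", "SP;vacina_b", "AC;vacina_c", "RJ;vacina_a", "SP;vacina_a"
), tmp)

acre <- read_filtered_chunks(tmp, uf = "AC", uf_col = "sigla_uf",
                             chunk_size = 2L)
```

Peak memory then scales with the chunk size plus the rows kept so far, rather than with the full national file.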
See also
sipni_info() for type descriptions,
censo_populacao() for population denominators.
Other sipni:
sipni_cache_status(),
sipni_clear_cache(),
sipni_dictionary(),
sipni_info(),
sipni_variables(),
sipni_years()
Examples
if (FALSE) { # interactive()
  # FTP: doses applied in Acre, 2019
  ac_doses <- sipni_data(year = 2019, uf = "AC")
  # FTP: vaccination coverage in Acre, 2019
  ac_cob <- sipni_data(year = 2019, type = "CPNI", uf = "AC")
  # API: microdata for Acre, January 2024
  ac_api <- sipni_data(year = 2024, uf = "AC", month = 1)
  # API: select specific variables
  sipni_data(year = 2024, uf = "AC", month = 1,
             vars = c("descricao_vacina", "tipo_sexo_paciente",
                      "data_vacina"))
}
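The examples above return tibbles. With lazy = TRUE, dplyr verbs can be applied before anything is collected into memory. The wrapper below is a hedged sketch of that workflow: doses_by_vaccine() is a hypothetical helper, it assumes the package exporting sipni_data() is attached and that dplyr and arrow are installed, and it relies on the 2020+ microdata having one row per dose so that counting rows counts doses.

```r
# Hypothetical helper: count doses per vaccine without materializing the
# full microdata. Assumes sipni_data() is available on the search path.
doses_by_vaccine <- function(year, uf, month = NULL) {
  lazy_tbl <- sipni_data(
    year = year, uf = uf, month = month,
    lazy = TRUE, backend = "arrow"
  )
  lazy_tbl |>
    dplyr::count(descricao_vacina, name = "n_doses") |>  # one row per dose
    dplyr::collect()                                     # materialize result
}

# Not run here: this would trigger a large download.
# doses_by_vaccine(year = 2024, uf = "AC", month = 1)
```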