Downloads and returns notifiable disease microdata from DATASUS FTP. Each row represents one notification record (Ficha de Notificacao). Data is downloaded as national .dbc files (one file per disease per year), decompressed internally, and returned as a tibble.
Usage
sinan_data(
year,
disease = "DENG",
vars = NULL,
parse = TRUE,
col_types = NULL,
cache = TRUE,
cache_dir = NULL,
lazy = FALSE,
backend = c("arrow", "duckdb")
)Arguments
- year
Integer. Year(s) of the data. Required.
- disease
Character. Disease code to download. Default:
"DENG"(Dengue). Usesinan_diseases()to see all available codes.- vars
Character vector. Variables to keep. If NULL (default), returns all available variables. Use
sinan_variables()to see available variables.- parse
Logical. If TRUE (default), converts columns to appropriate types (integer, double, Date) based on the variable metadata. Use
sinan_variables()to see the target type for each variable. Set to FALSE for backward-compatible all-character output.- col_types
Named list. Override the default type for specific columns. Names are column names, values are type strings:
"character","integer","double","date_dmy","date_ymd","date_ym","date". Example:list(DT_NOTIFIC = "character")to keep DT_NOTIFIC as character.- cache
Logical. If TRUE (default), caches downloaded data for faster future access.
- cache_dir
Character. Directory for caching. Default:
tools::R_user_dir("healthbR", "cache").- lazy
Logical. If TRUE, returns a lazy query object instead of a tibble. Requires the arrow package. The lazy object supports dplyr verbs (filter, select, mutate, etc.) which are pushed down to the query engine before collecting into memory. Call
dplyr::collect()to materialize the result. Default: FALSE.- backend
Character. Backend for lazy evaluation:
"arrow"(default) or"duckdb". Only used whenlazy = TRUE. DuckDB backend requires the duckdb package.
Value
A tibble with notifiable disease microdata. Includes columns
year and disease to identify the source when multiple years are
combined.
Details
SINAN files are national (not per-state). Each file contains all
notifications for a given disease in a given year across all of Brazil.
To filter by state, use the SG_UF_NOT (UF of notification) or
ID_MUNICIP (municipality code) columns after download.
Data is downloaded from DATASUS FTP as .dbc files. The .dbc format is decompressed internally using vendored C code from the blast library. No external dependencies are required.
Examples
if (FALSE) { # interactive()
# dengue notifications, 2022
dengue_2022 <- sinan_data(year = 2022)
# tuberculosis, 2020-2022
tb <- sinan_data(year = 2020:2022, disease = "TUBE")
# only key variables
sinan_data(year = 2022, disease = "DENG",
vars = c("DT_NOTIFIC", "CS_SEXO", "NU_IDADE_N",
"CS_RACA", "ID_MUNICIP", "CLASSI_FIN"))
}