Downloads and returns mortality microdata from the DATASUS FTP server. Each row represents one death record (Declaracao de Obito). Data is downloaded per state (UF) as compressed .dbc files, decompressed internally, and returned as a tibble.
Usage
sim_data(
  year,
  vars = NULL,
  uf = NULL,
  cause = NULL,
  decode_age = TRUE,
  parse = TRUE,
  col_types = NULL,
  cache = TRUE,
  cache_dir = NULL,
  lazy = FALSE,
  backend = c("arrow", "duckdb")
)
Arguments
- year
Integer. Year(s) of the data. Required.
- vars
Character vector. Variables to keep. If NULL (default), returns all available variables. Use sim_variables() to see available variables.
- uf
Character. Two-letter state abbreviation(s) to download. If NULL (default), downloads all 27 states. Examples: "SP", c("SP", "RJ").
- cause
Character. CID-10 (ICD-10) code pattern(s) to filter by underlying cause of death (CAUSABAS). Supports partial matching (prefix). If NULL (default), returns all causes. Examples: "I21" (acute myocardial infarction), "C" (malignant neoplasms, C00-C97).
- decode_age
Logical. If TRUE (default), adds a numeric column age_years with age in years decoded from the IDADE variable.
- parse
Logical. If TRUE (default), converts columns to appropriate types (integer, double, Date) based on the variable metadata. Use sim_variables() to see the target type for each variable. Set to FALSE for backward-compatible all-character output.
- col_types
Named list. Override the default type for specific columns. Names are column names, values are type strings: "character", "integer", "double", "date_dmy", "date_ymd", "date_ym", "date". Example: list(PESO = "character") to keep PESO as character.
- cache
Logical. If TRUE (default), caches downloaded data for faster future access.
- cache_dir
Character. Directory for caching. Default: tools::R_user_dir("healthbR", "cache").
- lazy
Logical. If TRUE, returns a lazy query object instead of a tibble. Requires the arrow package. The lazy object supports dplyr verbs (filter, select, mutate, etc.), which are pushed down to the query engine before collecting into memory. Call dplyr::collect() to materialize the result. Default: FALSE.
- backend
Character. Backend for lazy evaluation: "arrow" (default) or "duckdb". Only used when lazy = TRUE. The DuckDB backend requires the duckdb package.
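The prefix matching described for cause can be pictured with a short base R sketch. It only illustrates the idea and is not the package's implementation; the helper name match_cause_prefix and the sample codes are hypothetical.

```r
# Hypothetical helper: TRUE where a code starts with any of the given prefixes.
# Illustration only; sim_data() may implement its filter differently.
match_cause_prefix <- function(codes, patterns) {
  hits <- lapply(patterns, function(p) startsWith(codes, p))
  # Combine the per-pattern matches with a logical OR.
  Reduce(`|`, hits, rep(FALSE, length(codes)))
}

# Sample CAUSABAS-style values (assumed to be stored without a dot).
codes <- c("I210", "I219", "C349", "D126", "J189")

match_cause_prefix(codes, "I21")          # matches I210 and I219
match_cause_prefix(codes, c("I21", "C"))  # also matches C349, but not D126
```

Note that a prefix of "C" does not match D-coded neoplasms such as "D126".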
Value
A tibble with mortality microdata (or a lazy query object when lazy = TRUE). Includes the columns year and uf_source, which identify the source of each row when multiple years or states are combined.
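As a sketch of how the year and uf_source columns can be used, the base R snippet below counts records per year and state. The rows are made up for illustration and are not real SIM records; storing uf_source as a two-letter abbreviation is an assumption.

```r
# Made-up rows shaped like the documented output (not real SIM data).
deaths <- data.frame(
  year      = c(2021L, 2021L, 2022L, 2022L, 2022L),
  uf_source = c("SP", "RJ", "SP", "SP", "RJ"),  # assumed format
  CAUSABAS  = c("I219", "C349", "I210", "J189", "I219")
)

# Count records for each combination of year and source state.
counts <- as.data.frame(
  table(year = deaths$year, uf = deaths$uf_source),
  responseName = "n_deaths"
)
counts
```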
Details
Data is downloaded from DATASUS FTP as .dbc files (one per state per year). The .dbc format is decompressed internally using vendored C code from the blast library. No external dependencies are required.
When uf is specified, only the requested state(s) are downloaded,
making the operation much faster than downloading the entire country.
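Because there is one .dbc file per state per year, the number of files requested grows with both arguments. The sketch below is only that arithmetic; n_files is a hypothetical helper, it ignores caching, and file count is a rough proxy for download time because state files differ in size.

```r
# Hypothetical helper: files requested = number of years x number of states.
n_files <- function(n_years, n_states) {
  n_years * n_states
}

years <- 2020:2022

n_files(length(years), n_states = 1)   # uf = "SP": 3 files
n_files(length(years), n_states = 27)  # uf = NULL (all states): 81 files
```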
See also
censo_populacao() for population denominators to calculate
mortality rates.
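A crude mortality rate is the death count divided by the population, scaled to a common base. The sketch below shows only that calculation; crude_rate is a hypothetical helper, the numbers are invented, and it does not call censo_populacao(), whose arguments are not described here.

```r
# Hypothetical helper: crude mortality rate per `per` inhabitants.
crude_rate <- function(deaths, population, per = 1e5) {
  stopifnot(all(population > 0))
  deaths / population * per
}

# Invented figures: 50 deaths in a population of 200,000.
crude_rate(deaths = 50, population = 200000)  # 25 per 100,000
```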
Other sim:
sim_cache_status(),
sim_clear_cache(),
sim_dictionary(),
sim_info(),
sim_variables(),
sim_years()
Examples
if (FALSE) { # interactive()
  # all deaths in Acre, 2022
  ac_2022 <- sim_data(year = 2022, uf = "AC")

  # deaths from acute myocardial infarction in Sao Paulo, 2020-2022
  infarct_sp <- sim_data(year = 2020:2022, uf = "SP", cause = "I21")

  # only key variables, Rio de Janeiro, 2022
  sim_data(year = 2022, uf = "RJ",
           vars = c("DTOBITO", "SEXO", "IDADE",
                    "RACACOR", "CODMUNRES", "CAUSABAS"))
}
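A lazy workflow might look like the sketch below, built from the lazy and backend arguments described under Arguments. It assumes the arrow and dplyr packages are installed and that the returned object accepts these dplyr verbs; it is guarded in the same way as the examples above, so it does not run non-interactively.

```r
if (FALSE) { # interactive()
  # lazy query for Acre, 2022 (requires the arrow package)
  ac_lazy <- sim_data(year = 2022, uf = "AC", lazy = TRUE, backend = "arrow")

  # verbs are applied by the query engine; collect() loads the result
  ac_causes <- ac_lazy |>
    dplyr::select(CAUSABAS, SEXO) |>
    dplyr::count(CAUSABAS) |>
    dplyr::collect()
}
```

Only the summarized counts are brought into memory, not the full set of death records.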