Overview
The SIM (Sistema de Informações sobre Mortalidade) is Brazil’s national mortality information system, managed by the Ministry of Health through DATASUS. It records individual death certificates (Declaração de Óbito), with the cause of death coded in ICD-10.
| Feature | Details |
|---|---|
| Coverage | Per federative unit (UF); all 27 UFs (26 states and the Federal District) |
| Years | 1996–2024 (ICD-10 era) |
| Unit | One row per death certificate |
| Format | .dbc files from DATASUS FTP |
Getting started
Module information
sim_info()
Downloading data
Basic download (one state, one year)
deaths_ac <- sim_data(year = 2022, uf = "AC")
All states (default)
# downloads all 27 UFs; may take several minutes
deaths_all <- sim_data(year = 2022)
Age decoding
The IDADE variable uses a 3-digit encoding where the first digit indicates the unit and the remaining two indicate the value:
| First digit | Unit | Example |
|---|---|---|
| 0 | Minutes | 005 = 5 minutes |
| 1 | Hours | 112 = 12 hours |
| 2 | Days | 215 = 15 days |
| 3 | Months | 306 = 6 months |
| 4 | Years | 445 = 45 years |
| 5 | 100+ years | 502 = 102 years |
By default, decode_age = TRUE adds an age_years column.
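The encoding in the table can also be decoded by hand, which helps when checking individual records. The following is a minimal base R sketch, not the package's implementation: `decode_idade()` is a hypothetical helper, and the conversion factors for sub-year units (365.25 days per year, 12 months per year) are assumptions that may differ from what `decode_age = TRUE` uses internally.

```r
# Hypothetical helper (not part of the package): convert 3-digit IDADE
# codes to age in years, following the unit table above.
decode_idade <- function(idade) {
  code  <- as.character(idade)
  # Accept only codes made of a unit digit 0-5 followed by two digits
  valid <- !is.na(code) & grepl("^[0-5][0-9]{2}$", code)

  unit  <- rep(NA_character_, length(code))
  value <- rep(NA_real_, length(code))
  unit[valid]  <- substr(code[valid], 1, 1)
  value[valid] <- as.numeric(substr(code[valid], 2, 3))

  # Divisors that turn each unit into years (assumed conventions)
  divisor <- c(
    "0" = 60 * 24 * 365.25, # minutes
    "1" = 24 * 365.25,      # hours
    "2" = 365.25,           # days
    "3" = 12,               # months
    "4" = 1                 # years
  )
  years <- value / unname(divisor[unit])

  # Unit 5 means 100 years plus the two-digit value
  over100 <- valid & unit == "5"
  years[over100] <- 100 + value[over100]
  years
}

# 6 months, 45 years and 102 years become 0.5, 45 and 102
decode_idade(c("306", "445", "502"))
```

Codes that do not match the pattern, including missing values, come back as `NA` rather than raising an error.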
Key variables
| Variable | Description |
|---|---|
| CAUSABAS | Underlying cause of death (ICD-10) |
| DTOBITO | Date of death |
| SEXO | Sex (1=Male, 2=Female, 0=Unknown) |
| IDADE | Age (3-digit encoded) |
| RACACOR | Race/color (1=White, 2=Black, 3=Yellow, 4=Brown, 5=Indigenous) |
| CODMUNRES | Municipality of residence (6-digit IBGE code) |
| LINHAA to LINHAD | Cause of death lines A to D from the certificate |
| ESCMAE | Mother’s education level |
| ESTCIV | Marital status |
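The numeric codes listed for SEXO and RACACOR can be mapped to labels with a named lookup vector. This is a minimal base R sketch on made-up records; `recode_codes()` is a hypothetical helper, and it assumes the codes arrive as character or integer values exactly as shown in the table. For real work, `sim_dictionary()` is presumably the better source for the full set of categories.

```r
# Toy records using the codes from the table above (values are made up)
toy <- data.frame(
  SEXO    = c("1", "2", "0", "2"),
  RACACOR = c("1", "4", "5", NA),
  stringsAsFactors = FALSE
)

# Lookup vectors: names are the stored codes, values are the labels
sexo_labels <- c("1" = "Male", "2" = "Female", "0" = "Unknown")
raca_labels <- c(
  "1" = "White", "2" = "Black", "3" = "Yellow",
  "4" = "Brown", "5" = "Indigenous"
)

# Hypothetical helper: look up each code by name; codes that are
# missing or not in the lookup vector come back as NA
recode_codes <- function(x, labels) {
  unname(labels[as.character(x)])
}

toy$sex  <- recode_codes(toy$SEXO, sexo_labels)
toy$race <- recode_codes(toy$RACACOR, raca_labels)
toy
```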
Data dictionary
sim_dictionary()
sim_dictionary("SEXO")
sim_dictionary("RACACOR")
Explore variables
sim_variables()
sim_variables(search = "causa")
Example: Age-specific mortality rate
Combine SIM data with Census population denominators:
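An age-specific rate is the number of deaths in an age group divided by that group's population, conventionally scaled to 100,000. The arithmetic can be sketched in base R with made-up counts. The column names `age_group`, `deaths` and `population` are assumptions for illustration; in particular, the shape of the table returned by `censo_populacao()` is not shown here and may need reshaping before a join like this works.

```r
# Made-up counts for illustration only (not real SIM or Census figures)
deaths <- data.frame(
  age_group = c("[0,1)", "[1,5)", "[5,15)"),
  deaths    = c(120, 30, 45)
)
pop <- data.frame(
  age_group  = c("[0,1)", "[1,5)", "[5,15)"),
  population = c(60000, 250000, 900000)
)

# Join on the shared age-group key, then scale deaths / population
rates <- merge(deaths, pop, by = "age_group")
rates$rate_per_100k <- rates$deaths / rates$population * 1e5
rates
```

With real data, the deaths and population tables would be built along these lines: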
library(dplyr)
# deaths by age group (age_years comes from the default decode_age = TRUE)
deaths <- sim_data(year = 2022, uf = "SP") |>
  filter(!is.na(age_years)) |>
  mutate(age_group = cut(age_years,
    breaks = c(0, 1, 5, 15, 30, 45, 60, 80, Inf),
    right = FALSE # left-closed intervals: [0,1), [1,5), ...
  )) |>
  count(age_group, name = "deaths")
# population from Census 2022
pop <- censo_populacao(year = 2022, territorial_level = "state", geo_code = "35")
# join and calculate rates per 100,000
Cache and lazy evaluation
sim_cache_status()
sim_clear_cache()
# lazy query (requires arrow)
lazy <- sim_data(year = 2022, uf = "SP", lazy = TRUE)
# ischaemic heart diseases (ICD-10 I20 to I25), selected by string comparison
lazy |>
  filter(CAUSABAS >= "I20", CAUSABAS < "I26") |>
  collect()
Further reading
- SIM on DATASUS (datasus.saude.gov.br)
- SINASC vignette for live birth data
- Census vignette for population denominators