Overview

The healthbR package provides access to eight DATASUS information systems, covering mortality, live births, hospital admissions, outpatient production, notifiable diseases, the health facility registry, vaccination data, and primary care coverage:

| Module | Function | Source document | Granularity | Years |
|---|---|---|---|---|
| SIM | sim_data() | Declaracao de Obito (DO) | Annual/UF | 1996–2024 |
| SINASC | sinasc_data() | Declaracao de Nascido Vivo (DN) | Annual/UF | 1996–2024 |
| SIH | sih_data() | AIH (Autorizacao de Internacao Hospitalar) | Monthly/UF | 2008–2024 |
| SIA | sia_data() | BPA / APAC | Monthly/type/UF | 2008–2024 |
| SINAN | sinan_data() | Ficha de Notificacao | Annual/National | 2007–2024 |
| CNES | cnes_data() | Cadastro de Estabelecimentos | Monthly/type/UF | 2005–2024 |
| SI-PNI | sipni_data() | PNI (doses, cobertura, microdados) | Annual/UF | 1994–2025 |
| SISAB | sisab_data() | Cobertura da Atencao Primaria | Monthly | 2007–present |

The DATASUS modules share the same infrastructure:

  • DBC decompression: .dbc files (compressed DBF) are decompressed internally using vendored C code – no external dependencies required.
  • FTP download: files are fetched from ftp.datasus.gov.br with automatic retry and exponential backoff.
  • Cache: downloaded data is cached locally in Parquet (if arrow is installed) or .rds format.
  • Consistent API: every module exposes *_years(), *_info(), *_variables(), *_dictionary(), *_data(), *_cache_status(), and *_clear_cache().
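
The retry-with-backoff behaviour mentioned above can be sketched in a few lines of base R. This is an illustration only, not healthbR's internal code: with_retry(), max_tries, base_wait, and the simulated flaky_download() are hypothetical names, and the package's actual retry count and wait times may differ.

```r
# Hypothetical helper (not part of healthbR): call f() until it succeeds,
# doubling the wait after each failure.
with_retry <- function(f, max_tries = 4, base_wait = 1, sleep = Sys.sleep) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (attempt == max_tries) stop(result)   # give up: re-raise the last error
    sleep(base_wait * 2^(attempt - 1))       # wait 1, 2, 4, ... seconds
  }
}

# Simulated download that fails twice before succeeding
calls <- 0
waits <- numeric(0)
flaky_download <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("connection timed out")
  "file contents"
}

# Record the waits instead of actually sleeping
result <- with_retry(flaky_download, sleep = function(s) waits <<- c(waits, s))
result  # "file contents", returned on the third call
waits   # 1 and 2 seconds
```

Passing the sleep function as an argument keeps the sketch testable without real delays.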

Getting started

Common helper functions

Each module provides the same set of helper functions. Here is a quick tour using SIM as an example:

# available years
sim_years()
#> [1] 1996 1997 1998 ... 2024

# module information (data source, key variables, usage tips)
sim_info()

# list all variables with descriptions
sim_variables()

# search for a specific variable
sim_variables(search = "causa")

# data dictionary with category labels
sim_dictionary("SEXO")

The same pattern works for sinasc_*(), sih_*(), sia_*(), sinan_*(), cnes_*(), and sipni_*().

SIM – Mortality

The SIM (Sistema de Informacoes sobre Mortalidade) contains individual death records based on the Declaracao de Obito (DO).

Basic download

# all deaths in Acre, 2022
obitos_ac <- sim_data(year = 2022, uf = "AC")
obitos_ac

Filter by cause of death

The cause parameter filters by underlying cause of death (CAUSABAS) using CID-10 prefix matching:

# deaths from acute myocardial infarction (I21)
obitos_iam <- sim_data(year = 2022, uf = "AC", cause = "I21")

# all cardiovascular deaths (CID-10 chapter IX, codes starting with I)
obitos_cardio <- sim_data(year = 2022, uf = "AC", cause = "I")
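
Prefix matching of this kind can be pictured with base R's startsWith(). The sketch below is not the package's implementation: matches_prefix() is a hypothetical helper, and the example codes are made up and assume codes are stored without a dot (for example "I219").

```r
# Hypothetical helper: TRUE where a code starts with any of the prefixes
matches_prefix <- function(codes, prefixes) {
  hits <- lapply(prefixes, function(p) startsWith(codes, p))
  Reduce(`|`, hits, rep(FALSE, length(codes)))
}

causabas <- c("I219", "I10", "J189", "C349")   # made-up example codes

matches_prefix(causabas, "I21")
#> [1]  TRUE FALSE FALSE FALSE

matches_prefix(causabas, c("I", "J18"))
#> [1]  TRUE  TRUE  TRUE FALSE
```

A shorter prefix always selects a superset of a longer one, which is why "I" returns every code that "I21" returns.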

Key variables

| Variable | Description |
|---|---|
| CAUSABAS | Underlying cause of death (CID-10) |
| DTOBITO | Date of death |
| SEXO | Sex (M = Male, F = Female, I = Unknown) |
| IDADE | Age (encoded: 1st digit = unit, digits 2-3 = value) |
| CODMUNRES | Municipality of residence (IBGE code) |
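
Because IDADE is encoded, it usually has to be decoded before analysis. The sketch below converts it to completed years. decode_sim_age() is a hypothetical helper, and the unit codes it uses (0 to 3 for ages under one year, 4 for years, 5 for 100 years or more) are an assumption based on the usual DATASUS layout; check them against sim_dictionary("IDADE") before relying on the result.

```r
# Hypothetical helper: IDADE (unit digit + two-digit value) -> completed years.
# Unit codes are assumed, not taken from the package:
#   0-3 = minutes/hours/days/months (under one year), 4 = years, 5 = 100 + value
decode_sim_age <- function(idade) {
  idade <- as.character(idade)
  unit  <- substr(idade, 1, 1)
  value <- suppressWarnings(as.numeric(substr(idade, 2, 3)))

  years    <- rep(NA_real_, length(idade))   # unknown or unexpected codes stay NA
  infant   <- which(unit %in% c("0", "1", "2", "3"))
  in_years <- which(unit == "4")
  over_100 <- which(unit == "5")

  years[infant]   <- 0
  years[in_years] <- value[in_years]
  years[over_100] <- value[over_100] + 100
  years
}

decode_sim_age(c("445", "501", "305", "999"))
# 45, 101, 0, NA
```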

Example: deaths by cause chapter

obitos_ac <- sim_data(year = 2022, uf = "AC")

obitos_ac |>
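  # first letter of the code: a rough proxy for the CID-10 chapter
  # (some chapters span or share letters)
  mutate(chapter = substr(CAUSABAS, 1, 1)) |>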
  count(chapter, sort = TRUE)

SINASC – Live births

The SINASC (Sistema de Informacoes sobre Nascidos Vivos) contains individual live birth records from the Declaracao de Nascido Vivo (DN).

Basic download

nasc_ac <- sinasc_data(year = 2022, uf = "AC")
nasc_ac

Filter by congenital anomaly

The anomaly parameter filters by the CODANOMAL variable using CID-10 prefix matching:

# births with any congenital anomaly (CID-10 chapter XVII, codes starting with Q)
anomalias <- sinasc_data(year = 2022, uf = "AC", anomaly = "Q")

Key variables

| Variable | Description |
|---|---|
| DTNASC | Date of birth |
| SEXO | Sex (1 = Male, 2 = Female, 0 = Unknown) |
| PESO | Birth weight (grams) |
| IDADEMAE | Mother’s age |
| CODMUNRES | Municipality of residence (IBGE code) |
| CODANOMAL | Congenital anomaly code (CID-10) |

Example: birth weight distribution

nasc_ac <- sinasc_data(year = 2022, uf = "AC")

nasc_ac |>
  mutate(peso_num = as.numeric(PESO)) |>
  filter(!is.na(peso_num), peso_num > 0) |>
  mutate(weight_group = case_when(
    peso_num < 1500 ~ "Very low (<1500g)",
    peso_num < 2500 ~ "Low (1500-2499g)",
    peso_num < 4000 ~ "Normal (2500-3999g)",
    TRUE            ~ "High (>=4000g)"
  )) |>
  count(weight_group)

SIH – Hospital admissions

The SIH (Sistema de Informacoes Hospitalares) contains individual hospital admission records from the AIH (Autorizacao de Internacao Hospitalar). Unlike SIM and SINASC, data is organized monthly.

Basic download

# admissions in Acre, January 2022
intern_jan <- sih_data(year = 2022, month = 1, uf = "AC")
intern_jan

The month parameter

SIH data is monthly – one file per UF per month. Use month to control which months to download:

# single month
sih_data(year = 2022, month = 6, uf = "AC")

# first semester
sih_data(year = 2022, month = 1:6, uf = "AC")

# all 12 months (default when month = NULL -- downloads 12 files per UF)
sih_data(year = 2022, uf = "AC")

Filter by diagnosis

The diagnosis parameter filters by the principal diagnosis (DIAG_PRINC) using CID-10 prefix matching:

# respiratory admissions (CID-10 chapter X, codes starting with J)
resp <- sih_data(year = 2022, month = 1, uf = "AC", diagnosis = "J")

# pneumonia specifically (J12-J18)
pneum <- sih_data(year = 2022, month = 1, uf = "AC",
                  diagnosis = c("J12", "J13", "J14", "J15", "J16", "J17", "J18"))

Key variables

| Variable | Description |
|---|---|
| DIAG_PRINC | Principal diagnosis (CID-10) |
| DT_INTER | Admission date |
| SEXO | Sex (1 = Male, 3 = Female, 0 = Unknown) |
| MORTE | In-hospital death (1 = Yes, 0 = No) |
| VAL_TOT | Total value (R$) |
| DIAS_PERM | Length of stay (days) |
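
Two common indicators follow directly from these variables: in-hospital mortality and mean length of stay. The sketch below uses a small made-up data frame in place of a real sih_data() result, and assumes the columns arrive as character, as the as.numeric() calls elsewhere in this article suggest.

```r
# Made-up records standing in for a sih_data() result
intern <- data.frame(
  DIAG_PRINC = c("J189", "J159", "I219", "J440"),
  MORTE      = c("0", "1", "0", "0"),
  DIAS_PERM  = c("3", "10", "5", "2"),
  stringsAsFactors = FALSE
)

# in-hospital mortality: share of admissions ending in death (MORTE = 1)
mortality_pct <- 100 * sum(intern$MORTE == "1") / nrow(intern)

# mean length of stay in days
mean_stay <- mean(as.numeric(intern$DIAS_PERM), na.rm = TRUE)

mortality_pct  # 25
mean_stay      # 5
```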

Example: admissions by diagnosis chapter

intern <- sih_data(year = 2022, month = 1, uf = "AC")

intern |>
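  # first letter of the code: a rough proxy for the CID-10 chapter
  # (some chapters span or share letters)
  mutate(chapter = substr(DIAG_PRINC, 1, 1)) |>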
  count(chapter, sort = TRUE)

SIA – Outpatient production

The SIA (Sistema de Informacoes Ambulatoriais) contains outpatient production records. Like SIH, data is monthly, but SIA also has 13 file types covering different categories of outpatient care.

File types

| Code | Name | Description |
|---|---|---|
| PA | Producao Ambulatorial | BPA consolidated (default) |
| BI | Boletim Individualizado | BPA individualized |
| AD | APAC Laudos Diversos | High-complexity authorizations |
| AM | APAC Medicamentos | High-cost medications |
| AN | APAC Nefrologia | Nephrology procedures |
| AQ | APAC Quimioterapia | Oncology chemotherapy |
| AR | APAC Radioterapia | Oncology radiotherapy |
| AB | APAC Cirurgia Bariatrica | Bariatric surgery |
| ACF | APAC Confeccao de Fistula | Arteriovenous fistula |
| ATD | APAC Tratamento Dialitico | Dialysis |
| AMP | APAC Acompanhamento Multiprofissional | Multiprofessional follow-up |
| SAD | RAAS Atencao Domiciliar | Home care services |
| PS | RAAS Psicossocial | CAPS and psychosocial services |

Basic download

# outpatient production in Acre, January 2022 (default type = "PA")
ambul_jan <- sia_data(year = 2022, month = 1, uf = "AC")
ambul_jan

# different file type: high-cost medications
med <- sia_data(year = 2022, month = 1, uf = "AC", type = "AM")

Filter by procedure and diagnosis

# filter by SIGTAP procedure code (prefix match on PA_PROC_ID)
consult <- sia_data(year = 2022, month = 1, uf = "AC", procedure = "0301")

# filter by CID-10 diagnosis (prefix match on PA_CIDPRI)
resp <- sia_data(year = 2022, month = 1, uf = "AC", diagnosis = "J")

Key variables (PA type)

| Variable | Description |
|---|---|
| PA_PROC_ID | Procedure code (SIGTAP) |
| PA_CIDPRI | Principal diagnosis (CID-10) |
| PA_SEXO | Sex (1 = Male, 2 = Female) |
| PA_IDADE | Patient age |
| PA_VALAPR | Approved value (R$) |
| PA_QTDAPR | Approved quantity |

Example: production by procedure group

ambul <- sia_data(year = 2022, month = 1, uf = "AC")

ambul |>
  mutate(proc_group = substr(PA_PROC_ID, 1, 2)) |>
  count(proc_group, sort = TRUE)

SINAN – Notifiable diseases

The SINAN (Sistema de Informacao de Agravos de Notificacao) contains individual notification records for 31 compulsorily notifiable diseases. Unlike other DATASUS modules, SINAN files are national (one file per disease per year, covering all of Brazil).

Available diseases

SINAN covers 31 diseases. Use sinan_diseases() to see all available codes:

sinan_diseases()
#> # A tibble: 31 x 3
#>    code  name                      description
#>    <chr> <chr>                     <chr>
#>  1 DENG  Dengue                    Dengue
#>  2 CHIK  Chikungunya               Febre de Chikungunya
#>  3 ZIKA  Zika                      Zika virus
#>  4 TUBE  Tuberculose               Tuberculose
#>  ...

# search for a specific disease
sinan_diseases(search = "sifilis")

Basic download

# dengue notifications, 2022 (default disease)
dengue <- sinan_data(year = 2022)

# tuberculosis notifications, 2020-2022
tb <- sinan_data(year = 2020:2022, disease = "TUBE")

# select specific variables
sinan_data(year = 2022, disease = "DENG",
           vars = c("DT_NOTIFIC", "CS_SEXO", "NU_IDADE_N",
                    "ID_MUNICIP", "CLASSI_FIN"))

Filtering by state

Since files are national, filter by UF after download:

dengue <- sinan_data(year = 2022)

# filter by state of notification
dengue_sp <- dengue |>
  filter(SG_UF_NOT == "35")  # Sao Paulo (IBGE code)

# or by the state prefix (first two digits) of the municipality code
dengue_rio <- dengue |>
  filter(substr(ID_MUNICIP, 1, 2) == "33")  # Rio de Janeiro state
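
Since these filters use numeric IBGE state codes rather than abbreviations, a small lookup can make them easier to read. The table below lists the standard two-digit IBGE codes for the 27 UFs. uf_codes and uf_code() are hypothetical names, not part of healthbR, and the data frame is made up to stand in for a sinan_data() result.

```r
# IBGE two-digit codes for the 27 UFs
uf_codes <- c(
  RO = "11", AC = "12", AM = "13", RR = "14", PA = "15", AP = "16", TO = "17",
  MA = "21", PI = "22", CE = "23", RN = "24", PB = "25", PE = "26", AL = "27",
  SE = "28", BA = "29", MG = "31", ES = "32", RJ = "33", SP = "35",
  PR = "41", SC = "42", RS = "43", MS = "50", MT = "51", GO = "52", DF = "53"
)

# Hypothetical helper: UF abbreviation -> IBGE code (NA if not found)
uf_code <- function(sigla) unname(uf_codes[toupper(sigla)])

# Made-up records standing in for a sinan_data() result
dengue <- data.frame(SG_UF_NOT = c("35", "33", "35", "12"),
                     stringsAsFactors = FALSE)

dengue_sp <- dengue[dengue$SG_UF_NOT == uf_code("SP"), , drop = FALSE]
nrow(dengue_sp)  # 2
```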

Key variables

| Variable | Description |
|---|---|
| DT_NOTIFIC | Notification date |
| ID_AGRAVO | Disease code (CID-10) |
| CS_SEXO | Sex (M = Male, F = Female, I = Unknown) |
| NU_IDADE_N | Age (encoded: 1st digit = unit, digits 2-4 = value) |
| ID_MUNICIP | Municipality of notification (IBGE code) |
| CLASSI_FIN | Final classification (1 = Confirmed, 2 = Discarded) |
| EVOLUCAO | Outcome (1 = Cure, 2 = Death from disease) |

Example: confirmed dengue by month

dengue <- sinan_data(year = 2022, disease = "DENG")

dengue |>
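  # confirmed cases (code 1 in the Key variables table); codes can vary by
  # disease and year, so check sinan_dictionary("CLASSI_FIN") first
  filter(CLASSI_FIN == "1") |>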
  mutate(month = substr(DT_NOTIFIC, 4, 5)) |>
  count(month, sort = TRUE)

CNES – Health facility registry

The CNES (Cadastro Nacional de Estabelecimentos de Saude) is the national registry of all health facilities in Brazil. Like SIH and SIA, data is organized monthly (one file per type/UF/month), and there are 13 file types covering different aspects of the registry.

File types

| Code | Name | Description |
|---|---|---|
| ST | Estabelecimentos | Facility registry (default) |
| LT | Leitos | Hospital beds |
| PF | Profissional | Health professionals |
| DC | Dados Complementares | Complementary facility data |
| EQ | Equipamentos | Health equipment |
| SR | Servico Especializado | Specialized services |
| HB | Habilitacao | Facility certifications |
| EP | Equipes | Health teams |
| RC | Regra Contratual | Contractual rules |
| IN | Incentivos | Financial incentives |
| EE | Estab. de Ensino | Teaching facilities |
| EF | Estab. Filantropico | Philanthropic facilities |
| GM | Gestao e Metas | Management and targets |

Basic download

# establishments in Acre, January 2023
estab <- cnes_data(year = 2023, month = 1, uf = "AC")

# hospital beds
leitos <- cnes_data(year = 2023, month = 1, uf = "AC", type = "LT")

# health professionals
prof <- cnes_data(year = 2023, month = 1, uf = "AC", type = "PF")

Key variables (ST type)

| Variable | Description |
|---|---|
| CNES | Facility CNES code |
| CODUFMUN | Municipality (UF + IBGE 6-digit code) |
| TP_UNID | Facility type (22 categories) |
| VINC_SUS | SUS-linked (0 = No, 1 = Yes) |
| TP_GESTAO | Management type (M = Municipal, E = State, D = Dual) |
| ESFERA_A | Administrative sphere (1-4) |

Example: facility types in a state

estab <- cnes_data(year = 2023, month = 1, uf = "AC")

estab |>
  count(TP_UNID, sort = TRUE) |>
  left_join(
    cnes_dictionary("TP_UNID") |> select(code, label),
    by = c("TP_UNID" = "code")
  )

SI-PNI – Vaccination data

The SI-PNI (Sistema de Informacao do Programa Nacional de Imunizacoes) provides vaccination data from two sources:

  • FTP (1994–2019): Aggregated data with dose counts and coverage rates per municipality/vaccine/age group. Plain .DBF files (not DBC-compressed).
  • OpenDataSUS API (2020–2025): Individual-level microdata with one row per vaccination dose (~47 fields per record).

sipni_data() transparently routes to the correct source based on the requested year.
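
The routing rule can be pictured as a simple split on the year. The function below is a sketch based only on the year ranges stated above. sipni_source() is a hypothetical name, not the package's internal routine.

```r
# Hypothetical sketch of the year-based routing described above
sipni_source <- function(year) {
  stopifnot(all(year >= 1994 & year <= 2025))   # range covered by the two sources
  ifelse(year <= 2019, "FTP", "API")
}

sipni_source(c(1994, 2019, 2020, 2024))
#> [1] "FTP" "FTP" "API" "API"
```

A request that spans 2019 and 2020 would therefore touch both sources, which differ in structure (aggregated counts versus individual-level records).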

File types

| Code | Name | Description |
|---|---|---|
| DPNI | Doses Aplicadas | Doses applied per municipality, age group, vaccine, and dose type (FTP, default) |
| CPNI | Cobertura Vacinal | Vaccination coverage per municipality and vaccine (FTP) |
| API | Microdados | Individual-level microdata via OpenDataSUS (2020+, automatic) |

Basic download

# FTP: doses applied in Acre, 2019 (default type = "DPNI")
doses_ac <- sipni_data(year = 2019, uf = "AC")
doses_ac

# FTP: vaccination coverage
cob_ac <- sipni_data(year = 2019, type = "CPNI", uf = "AC")

# API: individual-level microdata, Acre, January 2024
micro_ac <- sipni_data(year = 2024, uf = "AC", month = 1)
micro_ac

Key variables (DPNI)

| Variable | Description |
|---|---|
| IMUNO | Vaccine code (immunobiological) |
| QT_DOSE | Number of doses applied |
| DOSE | Dose type (1st, 2nd, booster, etc.) |
| FX_ETARIA | Age group (coded) |
| MUNIC | Municipality (IBGE 6-digit code) |
| ANOMES | Year and month (YYYYMM) |

Key variables (CPNI)

| Variable | Description |
|---|---|
| IMUNO | Vaccine code |
| QT_DOSE | Number of doses applied |
| POP | Target population |
| COBERT | Vaccination coverage (%) |
| MUNIC | Municipality (IBGE 6-digit code) |
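
The three numeric fields are related: coverage is conventionally doses applied divided by the target population, times 100. The sketch below recomputes it on made-up rows, which can serve as a sanity check on COBERT. That the published COBERT values follow exactly this formula is an assumption, and the municipality codes and counts are illustrative only.

```r
# Made-up rows standing in for a sipni_data(type = "CPNI") result
cpni <- data.frame(
  MUNIC   = c("120001", "120002"),   # example codes only
  QT_DOSE = c("950", "400"),
  POP     = c("1000", "500"),
  stringsAsFactors = FALSE
)

# coverage (%) = doses applied / target population * 100
cpni$coverage_pct <- 100 * as.numeric(cpni$QT_DOSE) / as.numeric(cpni$POP)
cpni$coverage_pct  # 95 and 80
```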

Example: doses by vaccine

doses <- sipni_data(year = 2019, uf = "AC")

doses |>
  group_by(IMUNO) |>
  summarize(total_doses = sum(as.numeric(QT_DOSE), na.rm = TRUE)) |>
  arrange(desc(total_doses)) |>
  left_join(
    sipni_dictionary("IMUNO") |> select(code, label),
    by = c("IMUNO" = "code")
  )

Cross-module analyses

A key strength of healthbR is the ability to combine data from different DATASUS modules and Census denominators in a single workflow. Below are three practical examples.

Mortality rate (SIM + Census)

Calculate the crude cardiovascular mortality rate per 100,000 population:

# step 1: count cardiovascular deaths in Sao Paulo, 2022
obitos_cardio <- sim_data(year = 2022, uf = "SP", cause = "I")
n_obitos <- nrow(obitos_cardio)

# step 2: get population denominator from Census 2022
pop_sp <- censo_populacao(year = 2022, territorial_level = "state") |>
  filter(grepl("Paulo", territorial_unit))

# step 3: calculate rate
taxa_mortalidade <- n_obitos / pop_sp$population * 100000
taxa_mortalidade
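
The arithmetic in step 3 can be checked in isolation with round numbers. crude_rate() is a hypothetical helper, and the counts below are illustrative, not real São Paulo figures.

```r
# Hypothetical helper: events per `per` inhabitants
crude_rate <- function(events, population, per = 100000) {
  stopifnot(all(population > 0))
  events / population * per
}

# 250 deaths in a population of 1,000,000
crude_rate(250, 1000000)  # 25 deaths per 100,000 population
```

In the workflow above, it is worth confirming that pop_sp has exactly one row before dividing, since a filter that matches several rows would silently return several rates.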

Live births to deaths ratio (SINASC + SIM)

Compare the number of live births and deaths in a state:

# births and deaths in Acre, 2022
nascimentos <- sinasc_data(year = 2022, uf = "AC")
obitos <- sim_data(year = 2022, uf = "AC")

razao <- nrow(nascimentos) / nrow(obitos)
razao
# ratio > 1 means more births than deaths (natural increase, ignoring migration)

Hospital vs. outpatient care (SIH + SIA)

Compare volumes and costs of respiratory care (CID-10 chapter X, codes starting with J) between hospital and outpatient settings:

# hospital admissions for respiratory diseases, January 2022
intern_resp <- sih_data(year = 2022, month = 1, uf = "AC", diagnosis = "J")

# outpatient production for respiratory diseases, January 2022
ambul_resp <- sia_data(year = 2022, month = 1, uf = "AC", diagnosis = "J")

# compare volumes
n_internacoes <- nrow(intern_resp)
n_ambulatorial <- nrow(ambul_resp)

# compare costs
custo_intern <- sum(as.numeric(intern_resp$VAL_TOT), na.rm = TRUE)
custo_ambul <- sum(as.numeric(ambul_resp$PA_VALAPR), na.rm = TRUE)

tibble::tibble(
  setting = c("Hospital (SIH)", "Outpatient (SIA)"),
  records = c(n_internacoes, n_ambulatorial),
  total_cost_brl = c(custo_intern, custo_ambul)
)

Cache and performance

Automatic caching

All DATASUS modules cache downloaded data automatically. When the arrow package is installed, data is saved in Parquet format (fast and compact); otherwise, .rds is used as a fallback.

# install arrow for optimized caching (recommended)
install.packages("arrow")
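
The Parquet-or-.rds choice can be sketched with requireNamespace(), which tests for a package without attaching it. This illustrates the fallback idea only: cache_write() and cache_read() are hypothetical names, and the package's real cache layout and file naming are not shown here.

```r
# Hypothetical sketch: write Parquet when arrow is available, .rds otherwise
cache_write <- function(data, path_stem) {
  if (requireNamespace("arrow", quietly = TRUE)) {
    path <- paste0(path_stem, ".parquet")
    arrow::write_parquet(data, path)
  } else {
    path <- paste0(path_stem, ".rds")
    saveRDS(data, path)
  }
  path
}

# Read back whichever format was written, based on the file extension
cache_read <- function(path) {
  if (grepl("\\.parquet$", path)) arrow::read_parquet(path) else readRDS(path)
}

df <- data.frame(uf = c("AC", "SP"), n = c(1L, 2L), stringsAsFactors = FALSE)
path <- cache_write(df, file.path(tempdir(), "example_cache"))
restored <- cache_read(path)
nrow(restored)  # 2
```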

Cache management

Each module provides *_cache_status() and *_clear_cache():

# check what is cached
sim_cache_status()
sih_cache_status()
sia_cache_status()

# clear cache for a specific module
sim_clear_cache()

Tips for managing downloads

  • Use uf to download only the states you need instead of all 27 (SIM, SINASC, SIH, SIA, CNES).
  • Use month (SIH, SIA, CNES) to limit monthly downloads. Downloading a full year for all states takes 324 files (27 UFs x 12 months) per file type.
  • Use vars to keep only the variables you need, reducing memory usage.
  • SIM and SINASC are annual (one file per UF per year), so a full-year download is 27 files.
  • SINAN files are national (one file per disease per year), so downloads are fast but files can be large.
  • SIH, SIA, and CNES are monthly, so a full-year download is 324 files per type. SIA and CNES each have 13 file types – always filter by type, uf, and month.
  • SI-PNI FTP is annual with plain .DBF files (one per type/UF/year, 1994–2019). API data (2020+) is per-UF/year; use month to limit months.
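
The file counts quoted in these tips follow from simple multiplication, which makes it easy to estimate a request before running it. n_files() below is a hypothetical helper for that arithmetic. It counts files only and says nothing about their size.

```r
# Hypothetical helper: files needed = UFs x months x file types
n_files <- function(n_uf = 27, n_months = 12, n_types = 1) {
  n_uf * n_months * n_types
}

n_files()                        # 324: monthly module, all UFs, one file type
n_files(n_months = 1)            # 27: annual module (one file per UF per year)
n_files(n_types = 13)            # 4212: all 13 SIA or CNES file types, full year
n_files(n_uf = 1, n_months = 6)  # 6: one UF, first semester
```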

Additional resources

  • DATASUS TabNet (datasus.saude.gov.br) – online tabulation tool for DATASUS data
  • DATASUS FTP (ftp.datasus.gov.br) – public FTP server with raw data files
  • CID-10 (WHO ICD-10) – International Classification of Diseases, 10th revision
  • SIGTAP (wiki.saude.gov.br/sigtap) – procedure code table for SUS (SIA/SIH)