DATASUS Modules: SIM, SINASC, SIH, SIA, SINAN, CNES, SI-PNI, and SISAB
Source: vignettes/datasus-modules.Rmd
Overview
The healthbR package provides access to eight DATASUS
information systems, covering mortality, live births, hospital
admissions, outpatient production, notifiable diseases, the health
facility registry, vaccination data, and primary care coverage:
| Module | Function | Source document | Granularity | Years |
|---|---|---|---|---|
| SIM | sim_data() | Declaracao de Obito (DO) | Annual/UF | 1996–2024 |
| SINASC | sinasc_data() | Declaracao de Nascido Vivo (DN) | Annual/UF | 1996–2024 |
| SIH | sih_data() | AIH (Autorizacao de Internacao Hospitalar) | Monthly/UF | 2008–2024 |
| SIA | sia_data() | BPA / APAC | Monthly/type/UF | 2008–2024 |
| SINAN | sinan_data() | Ficha de Notificacao | Annual/National | 2007–2024 |
| CNES | cnes_data() | Cadastro de Estabelecimentos | Monthly/type/UF | 2005–2024 |
| SI-PNI | sipni_data() | PNI (doses, cobertura, microdados) | Annual/UF | 1994–2025 |
| SISAB | sisab_data() | Cobertura da Atencao Primaria | Monthly | 2007–present |
These modules share a common infrastructure:
- DBC decompression: .dbc files (compressed DBF) are decompressed internally using vendored C code – no external dependencies required.
- FTP download: files are fetched from ftp.datasus.gov.br with automatic retry and exponential backoff.
- Cache: downloaded data is cached locally in Parquet (if arrow is installed) or .rds format.
- Consistent API: every module exposes *_years(), *_info(), *_variables(), *_dictionary(), *_data(), *_cache_status(), and *_clear_cache().
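To make "automatic retry and exponential backoff" concrete, here is a minimal base R sketch of the general technique. It is not the package's internal code: retry_with_backoff() and flaky_download() are hypothetical names used only for illustration.

```r
# Hypothetical helper: call `fn` until it succeeds, doubling the wait each time.
retry_with_backoff <- function(fn, max_tries = 4, base_delay = 1) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(fn(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    if (attempt < max_tries) {
      # wait base_delay, 2 * base_delay, 4 * base_delay, ... seconds
      Sys.sleep(base_delay * 2^(attempt - 1))
    }
  }
  stop("all ", max_tries, " attempts failed: ", conditionMessage(result))
}

# toy download that fails twice before succeeding
calls <- 0
flaky_download <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("connection reset")
  "file contents"
}

# base_delay = 0 keeps the demonstration instant
backoff_result <- retry_with_backoff(flaky_download, base_delay = 0)
```

With a non-zero base_delay the waits grow geometrically, which gives a busy server time to recover between attempts.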
Getting started
Common helper functions
Each module provides the same set of helper functions. Here is a quick tour using SIM as an example:
# available years
sim_years()
#> [1] 1996 1997 1998 ... 2023
# module information (data source, key variables, usage tips)
sim_info()
# list all variables with descriptions
sim_variables()
# search for a specific variable
sim_variables(search = "causa")
# data dictionary with category labels
sim_dictionary("SEXO")
The same pattern works for sinasc_*(), sih_*(), sia_*(), sinan_*(), cnes_*(), and sipni_*().
SIM – Mortality
The SIM (Sistema de Informacoes sobre Mortalidade) contains individual death records based on the Declaracao de Obito (DO).
Basic download
# all deaths in Acre, 2022
obitos_ac <- sim_data(year = 2022, uf = "AC")
obitos_ac
Filter by cause of death
The cause parameter filters by underlying cause of death (CAUSABAS) using CID-10 prefix matching, so a prefix such as "I" selects every code in the chapter of circulatory diseases.
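Prefix matching can be pictured with a few lines of base R. The sketch below uses a toy vector in place of the CAUSABAS column and shows the idea only; how the package applies the filter internally is not shown in this vignette. The same prefix is used with sim_data(year = 2022, uf = "SP", cause = "I") in the cross-module section.

```r
# toy stand-in for the CAUSABAS column (illustrative values, not real data)
causabas <- c("I219", "J189", "I64", "C349", "I10")

# prefix matching: keep codes that start with the requested prefix
circulatory <- causabas[startsWith(causabas, "I")]
circulatory

# a longer prefix narrows the match to a single three-character category
infarction <- causabas[startsWith(causabas, "I21")]
infarction
```

A one-letter prefix selects a whole chapter, while a three-character prefix selects one category within it.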
SINASC – Live births
The SINASC (Sistema de Informacoes sobre Nascidos Vivos) contains individual live birth records from the Declaracao de Nascido Vivo (DN).
Basic download
nasc_ac <- sinasc_data(year = 2022, uf = "AC")
nasc_ac
Filter by congenital anomaly
The anomaly parameter filters by the CODANOMAL variable
using CID-10 prefix matching:
# births with any congenital anomaly (chapter Q)
anomalias <- sinasc_data(year = 2022, uf = "AC", anomaly = "Q")
Key variables
| Variable | Description |
|---|---|
| DTNASC | Date of birth |
| SEXO | Sex (1 = Male, 2 = Female, 0 = Unknown) |
| PESO | Birth weight (grams) |
| IDADEMAE | Mother’s age |
| CODMUNRES | Municipality of residence (IBGE code) |
| CODANOMAL | Congenital anomaly code (CID-10) |
Example: birth weight distribution
library(dplyr)

nasc_ac <- sinasc_data(year = 2022, uf = "AC")
nasc_ac |>
  # PESO is converted to numeric before grouping
  mutate(peso_num = as.numeric(PESO)) |>
  filter(!is.na(peso_num), peso_num > 0) |>
  mutate(weight_group = case_when(
    peso_num < 1500 ~ "Very low (<1500g)",
    peso_num < 2500 ~ "Low (1500-2499g)",
    peso_num < 4000 ~ "Normal (2500-3999g)",
    TRUE ~ "High (>=4000g)"
  )) |>
  count(weight_group)
SIH – Hospital admissions
The SIH (Sistema de Informacoes Hospitalares) contains individual hospital admission records from the AIH (Autorizacao de Internacao Hospitalar). Unlike SIM and SINASC, data is organized monthly.
Basic download
# admissions in Acre, January 2022
intern_jan <- sih_data(year = 2022, month = 1, uf = "AC")
intern_jan
The month parameter
SIH data is monthly – one file per UF per month. Use month to control which months to download.
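Because each UF and month pair maps to one file, it helps to enumerate what a request implies before downloading. The base R sketch below lists the files a first-quarter request for two states would need. It does not call the package, and whether month accepts a vector such as 1:3 is an assumption to check against the sih_data() documentation.

```r
# one row per file, assuming one file per UF per month as described above
requests <- expand.grid(
  uf = c("AC", "AM"),
  month = 1:3,          # assumption: several months requested at once
  year = 2022,
  stringsAsFactors = FALSE
)
requests

# 2 UFs x 3 months
n_files <- nrow(requests)
n_files
```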
Filter by diagnosis
The diagnosis parameter filters by the principal diagnosis (DIAG_PRINC) using CID-10 prefix matching. For example, diagnosis = "J" selects respiratory diseases (CID-10 chapter J).
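A single prefix cannot express a block that spans several three-character categories, such as J09 to J18 (influenza and pneumonia). One option is to download the whole chapter and narrow it afterwards. The sketch below does this in base R on a toy vector standing in for DIAG_PRINC; it is an illustration, not a feature of sih_data().

```r
# toy stand-in for the DIAG_PRINC column (illustrative values, not real data)
diag_princ <- c("J189", "J459", "J111", "I10", "J128")

# three-character categories J09, J10, ..., J18
flu_pneumonia <- sprintf("J%02d", 9:18)

# compare the first three characters of each code against the block
is_flu_pneumonia <- substr(diag_princ, 1, 3) %in% flu_pneumonia
flu_pneumonia_codes <- diag_princ[is_flu_pneumonia]
flu_pneumonia_codes
```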
SIA – Outpatient production
The SIA (Sistema de Informacoes Ambulatoriais) contains outpatient production records. Like SIH, data is monthly, but SIA also has 13 file types covering different categories of outpatient care.
File types
| Code | Name | Description |
|---|---|---|
| PA | Producao Ambulatorial | BPA consolidated (default) |
| BI | Boletim Individualizado | BPA individualized |
| AD | APAC Laudos Diversos | High-complexity authorizations |
| AM | APAC Medicamentos | High-cost medications |
| AN | APAC Nefrologia | Nephrology procedures |
| AQ | APAC Quimioterapia | Oncology chemotherapy |
| AR | APAC Radioterapia | Oncology radiotherapy |
| AB | APAC Cirurgia Bariatrica | Bariatric surgery |
| ACF | APAC Confeccao de Fistula | Arteriovenous fistula |
| ATD | APAC Tratamento Dialitico | Dialysis |
| AMP | APAC Acompanhamento Multiprofissional | Multiprofessional follow-up |
| SAD | RAAS Atencao Domiciliar | Home care services |
| PS | RAAS Psicossocial | CAPS and psychosocial services |
SINAN – Notifiable diseases
The SINAN (Sistema de Informacao de Agravos de Notificacao) contains individual notification records for 31 compulsorily notifiable diseases. Unlike other DATASUS modules, SINAN files are national (one file per disease per year, covering all of Brazil).
Available diseases
SINAN covers 31 diseases. Use sinan_diseases() to see
all available codes:
sinan_diseases()
#> # A tibble: 31 x 3
#> code name description
#> <chr> <chr> <chr>
#> 1 DENG Dengue Dengue
#> 2 CHIK Chikungunya Febre de Chikungunya
#> 3 ZIKA Zika Zika virus
#> 4 TUBE Tuberculose Tuberculose
#> ...
# search for a specific disease
sinan_diseases(search = "sifilis")
Basic download
# dengue notifications, 2022 (default disease)
dengue <- sinan_data(year = 2022)
# tuberculosis notifications, 2020-2022
tb <- sinan_data(year = 2020:2022, disease = "TUBE")
# select specific variables
sinan_data(year = 2022, disease = "DENG",
           vars = c("DT_NOTIFIC", "CS_SEXO", "NU_IDADE_N",
                    "ID_MUNICIP", "CLASSI_FIN"))
Filtering by state
Since files are national, filter by UF after download:
dengue <- sinan_data(year = 2022)
# filter by state of notification
dengue_sp <- dengue |>
  filter(SG_UF_NOT == "35") # Sao Paulo (IBGE code)
# or by the state prefix (first two digits) of the municipality code
dengue_rio <- dengue |>
  filter(substr(ID_MUNICIP, 1, 2) == "33") # Rio de Janeiro state
Key variables
| Variable | Description |
|---|---|
| DT_NOTIFIC | Notification date |
| ID_AGRAVO | Disease code (CID-10) |
| CS_SEXO | Sex (M = Male, F = Female, I = Unknown) |
| NU_IDADE_N | Age (encoded: 1st digit = unit, digits 2-4 = value) |
| ID_MUNICIP | Municipality of notification (IBGE code) |
| CLASSI_FIN | Final classification (1 = Confirmed, 2 = Discarded) |
| EVOLUCAO | Outcome (1 = Cure, 2 = Death from disease) |
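The NU_IDADE_N encoding usually needs decoding before analysis. The sketch below converts it to age in years. decode_sinan_age() is a hypothetical helper, and the unit codes (1 = hours, 2 = days, 3 = months, 4 = years) are an assumption based on the usual SINAN convention; confirm them with sinan_dictionary() before relying on the result.

```r
# Hypothetical helper: convert NU_IDADE_N to age in years.
# Assumed unit codes: 1 = hours, 2 = days, 3 = months, 4 = years.
decode_sinan_age <- function(x) {
  x <- as.character(x)
  unit <- substr(x, 1, 1)                  # 1st digit: unit
  value <- as.numeric(substr(x, 2, 4))     # digits 2-4: value
  divisor <- c("1" = 24 * 365.25, "2" = 365.25, "3" = 12, "4" = 1)
  # unknown unit codes fall through to NA
  unname(value / divisor[unit])
}

# 35 years, 6 months, and 10 days, respectively
ages <- decode_sinan_age(c("4035", "3006", "2010"))
ages
```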
CNES – Health facility registry
The CNES (Cadastro Nacional de Estabelecimentos de Saude) is the national registry of all health facilities in Brazil. Like SIH and SIA, data is organized monthly (one file per type/UF/month), and there are 13 file types covering different aspects of the registry.
File types
| Code | Name | Description |
|---|---|---|
| ST | Estabelecimentos | Facility registry (default) |
| LT | Leitos | Hospital beds |
| PF | Profissional | Health professionals |
| DC | Dados Complementares | Complementary facility data |
| EQ | Equipamentos | Health equipment |
| SR | Servico Especializado | Specialized services |
| HB | Habilitacao | Facility certifications |
| EP | Equipes | Health teams |
| RC | Regra Contratual | Contractual rules |
| IN | Incentivos | Financial incentives |
| EE | Estab. de Ensino | Teaching facilities |
| EF | Estab. Filantropico | Philanthropic facilities |
| GM | Gestao e Metas | Management and targets |
SI-PNI – Vaccination data
The SI-PNI (Sistema de Informacao do Programa Nacional de Imunizacoes) provides vaccination data from two sources:
- FTP (1994–2019): Aggregated data with dose counts and coverage rates per municipality/vaccine/age group. Plain .DBF files (not DBC-compressed).
- OpenDataSUS API (2020–2025): Individual-level microdata with one row per vaccination dose (~47 fields per record).
sipni_data() transparently routes to the correct source
based on the requested year.
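The routing rule can be summarised in a few lines. The sketch below mirrors the year ranges listed above; sipni_source() is a hypothetical helper written for illustration, not a function exported by the package.

```r
# Hypothetical helper mirroring the year ranges listed above.
sipni_source <- function(year) {
  ifelse(year >= 1994 & year <= 2019, "FTP",
    ifelse(year >= 2020 & year <= 2025, "API", NA_character_)
  )
}

# years outside both ranges map to NA
sources <- sipni_source(c(1994, 2019, 2020, 2024, 1990))
sources
```

The practical consequence is that a request spanning 2019 and 2020 mixes aggregated counts with individual-level records, so the two parts are best analysed separately.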
File types
| Code | Name | Description |
|---|---|---|
| DPNI | Doses Aplicadas | Doses applied per municipality, age group, vaccine, and dose type (FTP, default) |
| CPNI | Cobertura Vacinal | Vaccination coverage per municipality and vaccine (FTP) |
| API | Microdados | Individual-level microdata via OpenDataSUS (2020+, automatic) |
Basic download
# FTP: doses applied in Acre, 2019 (default type = "DPNI")
doses_ac <- sipni_data(year = 2019, uf = "AC")
doses_ac
# FTP: vaccination coverage
cob_ac <- sipni_data(year = 2019, type = "CPNI", uf = "AC")
# API: individual-level microdata, Acre, January 2024
micro_ac <- sipni_data(year = 2024, uf = "AC", month = 1)
micro_ac
Key variables (DPNI)
| Variable | Description |
|---|---|
| IMUNO | Vaccine code (immunobiological) |
| QT_DOSE | Number of doses applied |
| DOSE | Dose type (1st, 2nd, booster, etc.) |
| FX_ETARIA | Age group (coded) |
| MUNIC | Municipality (IBGE 6-digit code) |
| ANOMES | Year and month (YYYYMM) |
Key variables (CPNI)
| Variable | Description |
|---|---|
| IMUNO | Vaccine code |
| QT_DOSE | Number of doses applied |
| POP | Target population |
| COBERT | Vaccination coverage (%) |
| MUNIC | Municipality (IBGE 6-digit code) |
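As a sanity check on CPNI data, coverage can be recomputed from the dose count and the target population. The sketch below assumes the conventional definition (doses divided by target population, times 100); whether COBERT is derived exactly this way for every vaccine is an assumption. The rows are toy values with placeholder municipality codes, not real data.

```r
# toy CPNI-like rows (placeholder codes and illustrative numbers)
cpni <- data.frame(
  MUNIC = c("000001", "000002"),
  QT_DOSE = c("950", "400"),
  POP = c("1000", "500"),
  stringsAsFactors = FALSE
)

# fields are converted with as.numeric() first, as in the other examples
cpni$coverage_pct <- 100 * as.numeric(cpni$QT_DOSE) / as.numeric(cpni$POP)
cpni
```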
Example: doses by vaccine
doses <- sipni_data(year = 2019, uf = "AC")
doses |>
  group_by(IMUNO) |>
  summarize(total_doses = sum(as.numeric(QT_DOSE), na.rm = TRUE)) |>
  arrange(desc(total_doses)) |>
  left_join(
    sipni_dictionary("IMUNO") |> select(code, label),
    by = c("IMUNO" = "code")
  )
Cross-module analyses
A key strength of healthbR is the ability to combine
data from different DATASUS modules and Census denominators in a single
workflow. Below are three practical examples.
Mortality rate (SIM + Census)
Calculate the crude cardiovascular mortality rate per 100,000 population:
# step 1: count cardiovascular deaths in Sao Paulo, 2022
obitos_cardio <- sim_data(year = 2022, uf = "SP", cause = "I")
n_obitos <- nrow(obitos_cardio)
# step 2: get population denominator from Census 2022
pop_sp <- censo_populacao(year = 2022, territorial_level = "state") |>
  filter(grepl("Paulo", territorial_unit))
# step 3: calculate rate
taxa_mortalidade <- n_obitos / pop_sp$population * 100000
taxa_mortalidade
Live births to deaths ratio (SINASC + SIM)
Compare the number of live births and deaths in a state:
# births and deaths in Acre, 2022
nascimentos <- sinasc_data(year = 2022, uf = "AC")
obitos <- sim_data(year = 2022, uf = "AC")
razao <- nrow(nascimentos) / nrow(obitos)
razao
# a ratio > 1 means more births than deaths (natural population increase)
Hospital vs. outpatient care (SIH + SIA)
Compare volumes and costs of respiratory care (CID-10 chapter J) between hospital and outpatient settings:
# hospital admissions for respiratory diseases, January 2022
intern_resp <- sih_data(year = 2022, month = 1, uf = "AC", diagnosis = "J")
# outpatient production for respiratory diseases, January 2022
ambul_resp <- sia_data(year = 2022, month = 1, uf = "AC", diagnosis = "J")
# compare volumes
n_internacoes <- nrow(intern_resp)
n_ambulatorial <- nrow(ambul_resp)
# compare costs
custo_intern <- sum(as.numeric(intern_resp$VAL_TOT), na.rm = TRUE)
custo_ambul <- sum(as.numeric(ambul_resp$PA_VALAPR), na.rm = TRUE)
tibble::tibble(
  setting = c("Hospital (SIH)", "Outpatient (SIA)"),
  records = c(n_internacoes, n_ambulatorial),
  total_cost_brl = c(custo_intern, custo_ambul)
)
Cache and performance
Automatic caching
All DATASUS modules cache downloaded data automatically. When the
arrow package is installed, data is saved in Parquet format
(fast and compact); otherwise, .rds is used as fallback.
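The fallback rule described above can be expressed as a small check. This is a sketch of the decision only, with cache_format() as a hypothetical helper; the package's own cache code is not shown in this vignette.

```r
# Hypothetical helper: pick the cache format from arrow's availability.
cache_format <- function(has_arrow = requireNamespace("arrow", quietly = TRUE)) {
  if (has_arrow) "parquet" else "rds"
}

# the default result depends on whether arrow is installed on this machine
cache_format()

# both branches, forced explicitly
with_arrow <- cache_format(TRUE)
without_arrow <- cache_format(FALSE)
```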
# install arrow for optimized caching (recommended)
install.packages("arrow")
Cache management
Each module provides *_cache_status() and
*_clear_cache():
# check what is cached
sim_cache_status()
sih_cache_status()
sia_cache_status()
# clear cache for a specific module
sim_clear_cache()
Tips for managing downloads
- Use uf to download only the states you need instead of all 27 (SIM, SINASC, SIH, SIA, CNES).
- Use month (SIH, SIA, CNES) to limit monthly downloads. Downloading a full year for all states requires 324 files per file type (27 UFs x 12 months).
- Use vars to keep only the variables you need, reducing memory usage.
- SIM and SINASC are annual (one file per UF per year), so a full-year download is 27 files.
- SINAN files are national (one file per disease per year), so downloads are fast but files can be large.
- SIH, SIA, and CNES are monthly, so a full-year download is 324 files per type. SIA and CNES each have 13 file types – always filter by type, uf, and month.
- SI-PNI FTP is annual with plain .DBF files (one per type/UF/year, 1994–2019). API data (2020+) is per-UF/year; use month to limit months.
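The file counts in these tips follow from simple multiplication, and the same arithmetic shows why an unfiltered SIA or CNES request is expensive. The sketch below uses only the figures stated above (27 UFs, 12 months, 13 file types).

```r
n_uf <- 27
n_months <- 12
n_types <- 13

files_annual <- n_uf                         # SIM or SINASC, one year: 27
files_monthly <- n_uf * n_months             # SIH, or one SIA/CNES type: 324
files_all_types <- files_monthly * n_types   # SIA or CNES, every type: 4212

c(annual = files_annual, monthly = files_monthly, all_types = files_all_types)
```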
Additional resources
- DATASUS TabNet (datasus.saude.gov.br) – online tabulation tool for DATASUS data
- DATASUS FTP (ftp.datasus.gov.br) – public FTP server with raw data files
- CID-10 (WHO ICD-10) – International Classification of Diseases, 10th revision
- SIGTAP (wiki.saude.gov.br/sigtap) – procedure code table for SUS (SIA/SIH)