Overview

SINAN (Sistema de Informação de Agravos de Notificação, the Notifiable Diseases Information System) is Brazil’s national notifiable disease surveillance system, managed by the Ministry of Health through DATASUS. It stores one record for each individual notification form filed for a compulsory-notification disease.

The healthbR package provides access to SINAN microdata from the DATASUS FTP:

| Feature | Details |
|---|---|
| Coverage | National (one file per disease per year) |
| Diseases | 31 notifiable disease codes |
| Years | 2007–2024 (final and preliminary) |
| Unit | One row per notification record |
| Format | .dbc files, decompressed internally |

Getting started

Check available years

sinan_years()
#> [1] 2007 2008 2009 ... 2022

sinan_years(status = "all")
#> [1] 2007 2008 ... 2022 2023 2024

Module information

Exploring diseases

SINAN covers 31 notifiable diseases. Use sinan_diseases() to browse them:

# all available diseases
sinan_diseases()

# search by name or code
sinan_diseases(search = "dengue")
sinan_diseases(search = "sifilis")
sinan_diseases(search = "tuberculose")

Common disease codes:

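| Code | Disease |
|---|---|
| DENG | Dengue |
| CHIK | Chikungunya |
| ZIKA | Zika |
| TUBE | Tuberculose (tuberculosis) |
| HANS | Hanseniase (leprosy) |
| HEPA | Hepatites virais (viral hepatitis) |
| SIFA | Sifilis adquirida (acquired syphilis) |
| SIFC | Sifilis congenita (congenital syphilis) |
| LEPT | Leptospirose (leptospirosis) |
| MENI | Meningite (meningitis) |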

Downloading data

Basic download (dengue, single year)

# disease is left at its default here, which this section treats as dengue
dengue_2022 <- sinan_data(year = 2022)
dengue_2022

Multiple years

tb <- sinan_data(year = 2020:2022, disease = "TUBE")
tb

Selecting variables

# only key variables (faster and less memory)
dengue_key <- sinan_data(
  year = 2022,
  disease = "DENG",
  vars = c("DT_NOTIFIC", "CS_SEXO", "NU_IDADE_N",
           "CS_RACA", "ID_MUNICIP", "CLASSI_FIN")
)

Exploring variables

sinan_variables()
sinan_variables(search = "sexo")
sinan_variables(search = "municipio")

Filtering by state

SINAN files are national, not split by state. To restrict the data to a geographic unit, filter on SG_UF_NOT (state, or UF, of notification) or ID_MUNICIP (municipality code) after download:

library(dplyr)  # provides filter()

# filter by UF
dengue_sp <- sinan_data(year = 2022) |>
  filter(SG_UF_NOT == "35")  # 35 = São Paulo

# filter by municipality
dengue_rj_capital <- sinan_data(year = 2022) |>
  filter(ID_MUNICIP == "330455")  # Rio de Janeiro capital
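
Because the first two digits of an IBGE municipality code identify the state, a state key can also be derived from ID_MUNICIP. The sketch below illustrates this on a small hand-written vector rather than a real download. uf_from_municipality() is a hypothetical helper, not part of healthbR.

```r
# Hypothetical helper (not part of healthbR): take the 2-digit IBGE
# state code from the start of an IBGE municipality code.
uf_from_municipality <- function(code) {
  code <- as.character(code)
  substr(code, 1, 2)
}

# Illustrative codes only: Rio de Janeiro, São Paulo, Salvador
municipalities <- c("330455", "355030", "292740")
uf_from_municipality(municipalities)
```

This can be useful when a table only carries municipality codes and a state-level summary is needed.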

Key variables

| Variable | Description |
|---|---|
| DT_NOTIFIC | Notification date |
| ID_AGRAVO | Disease code (ICD-10; CID-10 in Portuguese) |
| SG_UF_NOT | State (UF) of notification (IBGE 2-digit code) |
| ID_MUNICIP | Municipality of notification (IBGE 6 digits) |
| CS_SEXO | Sex (M/F/I) |
| NU_IDADE_N | Age (encoded: 1st digit = unit, digits 2-4 = value; e.g. 4025 = 25 years) |
| CS_RACA | Race/color (1=White, 2=Black, 3=Yellow, 4=Brown, 5=Indigenous) |
| CLASSI_FIN | Final classification (e.g. 1=Confirmed, 2=Discarded; codes vary by disease) |
| EVOLUCAO | Outcome (1=Cured, 2=Death by disease, 3=Death other causes) |
| CRITERIO | Confirmation criteria (1=Lab, 2=Clinical-epi) |
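
The NU_IDADE_N encoding is easier to work with once it is converted to a single number. The following is a minimal sketch of a decoder, assuming the first digit is the unit (1 = hours, 2 = days, 3 = months, 4 = years) and the remaining three digits are the value. decode_sinan_age() is a hypothetical helper, not part of healthbR, and the conversions to years are approximate.

```r
# Hypothetical helper (not part of healthbR): convert an encoded
# NU_IDADE_N value to age in years.
# Assumed unit codes: 1 = hours, 2 = days, 3 = months, 4 = years.
decode_sinan_age <- function(x) {
  x <- as.character(x)
  unit <- substr(x, 1, 1)
  value <- suppressWarnings(as.numeric(substr(x, 2, 4)))
  # how many of each unit fit in one year (approximate for hours and days)
  per_year <- c("1" = 24 * 365.25, "2" = 365.25, "3" = 12, "4" = 1)
  # unknown units and missing inputs fall through to NA
  unname(value / per_year[unit])
}

decode_sinan_age(c("4025", "3006", NA))
```

Under these assumptions, "4025" decodes to 25 years and "3006" to half a year, while missing or unrecognized codes yield NA.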

Using the dictionary

# all coded variables
sinan_dictionary()

# specific variable
sinan_dictionary("CS_SEXO")
sinan_dictionary("EVOLUCAO")
sinan_dictionary("CLASSI_FIN")

Preliminary vs. final data

SINAN publishes both final (definitive) and preliminary data. By default, sinan_years() returns only final years:

# final data only (default)
sinan_years(status = "final")

# preliminary data
sinan_years(status = "preliminary")

# both
sinan_years(status = "all")

Preliminary data (2023–2024) may still be revised by the Ministry of Health.

Example: confirmed dengue cases by month

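library(dplyr)

dengue <- sinan_data(year = 2022, disease = "DENG") |>
  # codes treated as confirmed here; check sinan_dictionary("CLASSI_FIN"),
  # since classification codes can differ by disease and year
  filter(CLASSI_FIN %in% c("1", "5")) |>
  mutate(month = as.integer(format(DT_NOTIFIC, "%m")))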

cases_by_month <- dengue |>
  count(month) |>
  arrange(month)

cases_by_month
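
Note that count() only returns months that appear in the data, so a month with no records is missing from the result rather than shown as zero. A minimal base R sketch of one way to keep all 12 months, using made-up dates for illustration only:

```r
# Made-up notification dates, for illustration only (not real data)
dates <- as.Date(c("2022-01-15", "2022-01-20", "2022-03-02", "2022-12-30"))

# Month as a factor with all 12 levels, so empty months are kept
month <- factor(as.integer(format(dates, "%m")), levels = 1:12)

# One row per month, including months with zero cases
cases_by_month <- as.data.frame(table(month = month), responseName = "cases")
cases_by_month
```

Keeping the zero rows matters mostly for rarer diseases or small municipalities, where a time series would otherwise have gaps.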

Example: tuberculosis by sex and age group

tb <- sinan_data(year = 2022, disease = "TUBE")

# decode age: a leading digit of 4 means the value (digits 2-4) is in years
tb_age <- tb |>
  filter(CLASSI_FIN == "1") |>
  mutate(
    age_unit = substr(NU_IDADE_N, 1, 1),
    age_value = as.integer(substr(NU_IDADE_N, 2, 4)),
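    # ages recorded in hours, days, or months (units 1-3) are under 1 year
    age_years = case_when(
      age_unit == "4" ~ age_value,
      age_unit %in% c("1", "2", "3") ~ 0L
    ),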
    age_group = cut(age_years,
                    breaks = c(0, 15, 30, 45, 60, Inf),
                    labels = c("<15", "15-29", "30-44", "45-59", "60+"),
                    right = FALSE)
  )

tb_age |>
  filter(!is.na(age_group)) |>
  count(CS_SEXO, age_group) |>
  tidyr::pivot_wider(names_from = CS_SEXO, values_from = n)

Example: incidence rate with Census denominators

Combine SINAN data with Census population to calculate incidence rates:

# step 1: confirmed dengue by UF
dengue_uf <- sinan_data(year = 2022, disease = "DENG") |>
  filter(CLASSI_FIN %in% c("1", "5")) |>
  count(SG_UF_NOT, name = "cases")

# step 2: population from Census 2022
pop <- censo_populacao(year = 2022, territorial_level = "state")

# step 3: calculate incidence rate per 100,000
# incidence <- dengue_uf |>
#   left_join(pop, by = ...) |>
#   mutate(rate_100k = (cases / population) * 100000) |>
#   arrange(desc(rate_100k))
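
The rate calculation in step 3 can be tried end to end on toy data. The sketch below uses base R and made-up numbers. The column names uf and population are assumptions chosen for illustration; the columns actually returned by censo_populacao() may be named differently.

```r
# Made-up counts and populations, for illustration only (not real data)
cases <- data.frame(uf = c("11", "12", "13"), cases = c(500, 120, 2000))
pop <- data.frame(uf = c("11", "12", "13"),
                  population = c(1600000, 800000, 4000000))

# Left join on the state code, then compute cases per 100,000 inhabitants
incidence <- merge(cases, pop, by = "uf", all.x = TRUE)
incidence$rate_100k <- incidence$cases / incidence$population * 1e5

# Highest rate first
incidence <- incidence[order(-incidence$rate_100k), ]
incidence
```

When adapting this to real data, the join keys must have the same type in both tables (for example, both character), otherwise states will fail to match.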

Smart type parsing

By default, sinan_data() parses columns to appropriate types (dates, integers):

# parsed types (default)
dengue <- sinan_data(year = 2022, disease = "DENG")
class(dengue$DT_NOTIFIC)  # Date
class(dengue$NU_ANO)      # integer

# raw character columns (backward-compatible)
dengue_raw <- sinan_data(year = 2022, disease = "DENG", parse = FALSE)

# override specific columns
dengue_custom <- sinan_data(
  year = 2022,
  col_types = list(DT_NOTIFIC = "character")
)

Cache management

Downloaded data is cached locally for faster future access:

# check cache status
sinan_cache_status()

# clear cache if needed
sinan_clear_cache()

If the arrow package is installed, data is cached in Parquet format for faster loading. You can also use lazy evaluation:

# lazy query (requires arrow)
dengue_lazy <- sinan_data(year = 2022, disease = "DENG", lazy = TRUE)
dengue_lazy |>
  filter(CLASSI_FIN == "1") |>
  select(DT_NOTIFIC, CS_SEXO, NU_IDADE_N, ID_MUNICIP) |>
  collect()

Additional resources