Overview

The SI-PNI (Sistema de Informação do Programa Nacional de Imunizações) is Brazil’s national immunization information system, managed by the Ministry of Health. It tracks vaccination doses applied and coverage rates across the country.

The healthbR package provides access to SI-PNI data from two sources:

| Source | Years | Data type | Granularity | Format |
| --- | --- | --- | --- | --- |
| FTP DATASUS | 1994–2019 | Aggregated counts | Annual per UF | .DBF files |
| OpenDataSUS CSV | 2020–2025 | Individual-level microdata | Monthly national | CSV bulk downloads |

sipni_data() automatically routes to the correct source based on the requested year.
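
The routing rule can be pictured with a small sketch. The helper route_source() below is hypothetical (it is not part of healthbR) and only encodes the year ranges from the table above; the package's internal logic may differ.

```r
# Hypothetical helper (not part of healthbR): map each requested year
# to the source listed in the table above.
route_source <- function(year) {
  stopifnot(is.numeric(year))
  src <- rep(NA_character_, length(year))
  src[year >= 1994 & year <= 2019] <- "FTP"
  src[year >= 2020 & year <= 2025] <- "CSV"
  src  # years outside both ranges stay NA
}

route_source(c(1994, 2019, 2020, 2024))
#> [1] "FTP" "FTP" "CSV" "CSV"
```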

Data sources comparison

| Feature | FTP (1994–2019) | CSV (2020–2025) |
| --- | --- | --- |
| Record type | Aggregated (dose counts per municipality/vaccine/age) | Individual (one row per vaccination dose) |
| File types | DPNI (doses) or CPNI (coverage) | Single type (microdata) |
| Variables | 7–12 per type | ~47 per record |
| File size | Small (~100 KB per UF/year) | Large (~1.4 GB ZIP per month, national) |
| Naming | UPPERCASE column names | snake_case column names |

Getting started

Check available years

sipni_years()
#> [1] 1994 1995 ... 2024 2025

Module information

FTP path: doses applied (DPNI)

The default type, DPNI, downloads aggregated dose counts (1994–2019):

# doses applied in Acre, 2019
ac_doses <- sipni_data(year = 2019, uf = "AC")
ac_doses

Key variables (DPNI)

| Variable | Description |
| --- | --- |
| ANO | Reference year |
| UF | UF code (IBGE 2 digits) |
| MUNIC | Municipality code (IBGE 6 digits) |
| IMUNO | Immunobiological code |
| DOSE | Dose type (1st, 2nd, booster, etc.) |
| QT_DOSE | Number of doses applied |
| FX_ETARIA | Age group (coded) |

Using the dictionary

# vaccine codes
sipni_dictionary("IMUNO")

# dose types
sipni_dictionary("DOSE")

# age groups
sipni_dictionary("FX_ETARIA")

FTP path: vaccination coverage (CPNI)

The CPNI type provides coverage rates per municipality:

# vaccination coverage in Acre, 2019
ac_coverage <- sipni_data(year = 2019, type = "CPNI", uf = "AC")
ac_coverage

Key variables (CPNI)

| Variable | Description |
| --- | --- |
| ANO | Reference year |
| UF | UF code (IBGE 2 digits) |
| MUNIC | Municipality code (IBGE 6 digits) |
| IMUNO | Immunobiological code |
| QT_DOSE | Number of doses applied |
| POP | Target population |
| COBERT | Vaccination coverage (%) |

CSV path: individual-level microdata (2020+)

For years 2020 and later, SI-PNI provides individual-level microdata (one row per vaccination dose). The type parameter is ignored for these years:

# microdata for Acre, January 2024
ac_micro <- sipni_data(year = 2024, uf = "AC", month = 1)
ac_micro

Key variables (CSV microdata)

| Variable | Description |
| --- | --- |
| sigla_uf_estabelecimento | UF of the health facility |
| codigo_municipio_estabelecimento | Municipality (IBGE) |
| tipo_sexo_paciente | Sex (M/F) |
| numero_idade_paciente | Patient age |
| nome_raca_cor_paciente | Race/color (descriptive) |
| descricao_vacina | Vaccine name |
| descricao_dose_vacina | Dose description |
| data_vacina | Vaccination date |

Exploring variables

# DPNI variables (FTP)
sipni_variables()

# CPNI variables (FTP)
sipni_variables(type = "CPNI")

# API/CSV variables (2020+)
sipni_variables(type = "API")

# search
sipni_variables(search = "dose")

Month parameter for CSV data

For years >= 2020, each month is a separate ~1.4 GB national CSV file. Use month to select specific months:

# single month
jan <- sipni_data(year = 2024, uf = "AC", month = 1)

# first quarter
q1 <- sipni_data(year = 2024, uf = "AC", month = 1:3)

# all 12 months (default, downloads ~17 GB total)
full_year <- sipni_data(year = 2024, uf = "AC")

For FTP data (1994–2019), the month parameter is ignored because FTP files are annual.
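
Before requesting several months, it can help to estimate the download volume. The helper below is a hypothetical back-of-the-envelope sketch (not a healthbR function) that assumes the ~1.4 GB per month figure quoted above; actual file sizes vary by month.

```r
# Hypothetical helper: rough download size for a set of months,
# assuming ~1.4 GB per monthly national file.
estimate_download_gb <- function(months = 1:12, gb_per_month = 1.4) {
  stopifnot(all(months %in% 1:12))
  length(unique(months)) * gb_per_month
}

estimate_download_gb(1)    # single month
estimate_download_gb(1:3)  # first quarter
estimate_download_gb()     # all 12 months
```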

Example: vaccine doses by immunobiological (FTP)

library(dplyr)

ac_2019 <- sipni_data(year = 2019, uf = "AC")

# decode immunobiological names
imuno_labels <- sipni_dictionary("IMUNO") |>
  select(code, label)

doses_by_vaccine <- ac_2019 |>
  group_by(IMUNO) |>
  summarize(total_doses = sum(as.integer(QT_DOSE), na.rm = TRUE),
            .groups = "drop") |>
  left_join(imuno_labels, by = c("IMUNO" = "code")) |>
  arrange(desc(total_doses))

doses_by_vaccine

Example: vaccination coverage by year (FTP)

# coverage data for São Paulo, 2015-2019
sp_cov <- sipni_data(
  year = 2015:2019,
  type = "CPNI",
  uf = "SP"
)

# average coverage by year
sp_cov |>
  group_by(year) |>
  summarize(
    mean_coverage = mean(as.numeric(COBERT), na.rm = TRUE),
    .groups = "drop"
  )
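
An unweighted mean of municipal coverage gives a small municipality the same weight as a large one. If a state-level figure is wanted, one option is to recompute coverage from dose and population totals. The sketch below uses made-up numbers and assumes that COBERT equals QT_DOSE / POP × 100, which this vignette does not confirm; check the definition against the DATASUS documentation before relying on it.

```r
library(dplyr)

# Made-up CPNI-like rows for two municipalities of very different size
toy_cov <- data.frame(
  year    = c(2019, 2019),
  MUNIC   = c("000001", "000002"),  # placeholder codes
  QT_DOSE = c(900, 50),
  POP     = c(1000, 100)
)

toy_summary <- toy_cov |>
  mutate(COBERT = QT_DOSE / POP * 100) |>  # assumed definition
  group_by(year) |>
  summarize(
    mean_coverage     = mean(COBERT),                   # (90 + 50) / 2
    weighted_coverage = sum(QT_DOSE) / sum(POP) * 100,  # 950 / 1100 * 100
    .groups = "drop"
  )

toy_summary
```

With these numbers the unweighted mean is 70%, while the population-weighted figure is about 86%, because the larger municipality dominates the totals.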

Example: individual-level analysis (2020+)

# vaccinations recorded in Acre, January 2024
ac_jan <- sipni_data(year = 2024, uf = "AC", month = 1)

# vaccines administered
ac_jan |>
  count(descricao_vacina, sort = TRUE)

# doses by sex
ac_jan |>
  count(tipo_sexo_paciente)

# age distribution
ac_jan |>
  mutate(age = as.integer(numero_idade_paciente)) |>
  filter(!is.na(age)) |>
  mutate(age_group = cut(age,
                         breaks = c(0, 5, 12, 18, 30, 60, Inf),
                         right = FALSE)) |>
  count(age_group)
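
Because right = FALSE, each interval is closed on the left and open on the right, so a 5-year-old falls in [5,12) rather than [0,5). A quick check on a few made-up ages shows where boundary values land:

```r
# Made-up ages, including values that sit exactly on a break
ages <- c(0, 4, 5, 17, 18, 75)

age_group <- cut(ages,
                 breaks = c(0, 5, 12, 18, 30, 60, Inf),
                 right = FALSE)

data.frame(age = ages, age_group = age_group)
```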

Mixed year requests

When requesting years that span both sources (e.g., 2019 and 2024), sipni_data() fetches from FTP and CSV respectively and combines the results. Note that column names and structure differ between sources:

# this downloads FTP (2019) + CSV (2024)
mixed <- sipni_data(year = c(2019, 2024), uf = "AC", month = 1)

# columns from FTP (UPPERCASE) and CSV (snake_case) are combined
# with NAs where columns don't overlap
names(mixed)
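
To see what such a combined table looks like, here is a toy illustration with made-up rows. It assumes the combination behaves like dplyr::bind_rows(), which is an assumption about the package internals rather than something documented here.

```r
library(dplyr)

# Made-up stand-ins for one FTP row and one CSV row
ftp_toy <- data.frame(ANO = "2019", UF = "12", QT_DOSE = 10L)
csv_toy <- data.frame(sigla_uf_estabelecimento = "AC",
                      descricao_vacina = "example vaccine")

combined <- bind_rows(ftp_toy, csv_toy)

names(combined)
colSums(is.na(combined))  # every column is NA in the row from the other source
```

In practice it is often simpler to request the two periods separately and harmonize only the columns you need before combining them.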

Download tips

  • FTP files (1994–2019) are small (~100 KB each) and download quickly.
  • CSV files (2020+) are large (~1.4 GB per month, national). Start with a single month and UF.
  • The first download of a CSV month caches all 27 UFs. A second request for a different UF from the same month is instant from cache.
  • Multiple months are downloaded concurrently when possible.

Smart type parsing

# parsed types (default)
ac <- sipni_data(year = 2019, uf = "AC")
class(ac$QT_DOSE)  # integer

# raw character columns
ac_raw <- sipni_data(year = 2019, uf = "AC", parse = FALSE)
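
If you already hold raw character columns and want typed values without downloading again, you can convert selected columns yourself. The data frame below is made up, and this is not necessarily the rule the package applies when parsing is enabled:

```r
# Made-up raw rows with every column stored as character
raw_toy <- data.frame(
  MUNIC   = c("120001", "120040"),
  QT_DOSE = c("15", "230"),
  stringsAsFactors = FALSE
)

typed_toy <- raw_toy
typed_toy$QT_DOSE <- as.integer(typed_toy$QT_DOSE)  # counts become integer
# Codes such as MUNIC are left as character so they are never
# reinterpreted as numbers.

sapply(typed_toy, class)
```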

Cache management

Downloaded data is cached locally for faster future access:

# check cache status
sipni_cache_status()

# clear cache if needed
sipni_clear_cache()

If the arrow package is installed, data is cached in Parquet format. You can also use lazy evaluation:

# lazy query for FTP data (requires arrow)
sipni_lazy <- sipni_data(year = 2019, uf = "AC", lazy = TRUE)
sipni_lazy |>
  filter(QT_DOSE > 0) |>
  select(IMUNO, DOSE, QT_DOSE) |>
  collect()

Additional resources

  • OpenDataSUS (dadosabertos.saude.gov.br)
  • Census vignette for population denominators