Overview
The SI-PNI (Sistema de Informacao do Programa Nacional de Imunizacoes) is Brazil’s national immunization information system, managed by the Ministry of Health. It tracks vaccination doses applied and coverage rates across the country.
The healthbR package provides access to SI-PNI data from
two sources:
| Source | Years | Data type | Granularity | Format |
|---|---|---|---|---|
| FTP DATASUS | 1994–2019 | Aggregated counts | Annual per UF | .DBF files |
| OpenDataSUS CSV | 2020–2025 | Individual-level microdata | Monthly national | CSV bulk downloads |
sipni_data() automatically routes to the correct source
based on the requested year.
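The year-based routing can be pictured with the base R sketch below. `route_sipni_source()` is a hypothetical helper written for illustration, not an exported healthbR function. Its 1994/2019/2025 bounds are hard-coded from the table above, whereas the package itself reports the available range through `sipni_years()`.

```r
# Hypothetical helper: map each requested year to the SI-PNI source that
# covers it (FTP for 1994-2019, OpenDataSUS CSV for 2020-2025).
route_sipni_source <- function(year) {
  year <- as.integer(year)
  # bounds hard-coded from the source table; assumption for this sketch
  if (any(is.na(year) | year < 1994 | year > 2025)) {
    stop("SI-PNI data is available for 1994-2025 only")
  }
  ifelse(year <= 2019, "FTP", "CSV")
}

route_sipni_source(c(2018, 2019, 2020, 2024))
#> [1] "FTP" "FTP" "CSV" "CSV"
```

A request such as `year = c(2019, 2024)` therefore touches both sources.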
Data sources comparison
| Feature | FTP (1994–2019) | CSV (2020–2025) |
|---|---|---|
| Record type | Aggregated (dose counts per municipality/vaccine/age) | Individual (one row per vaccination dose) |
| File types | DPNI (doses) or CPNI (coverage) | Single type (microdata) |
| Variables | 7–12 per type | ~47 per record |
| File size | Small (~100 KB per UF/year) | Large (~1.4 GB ZIP per month, national) |
| Naming | UPPERCASE column names | snake_case column names |
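Because the two sources share no column names, rows from each can only be stacked by filling the columns a source lacks with NA. The base R sketch below shows that idea. `bind_rows_fill()` is a hypothetical helper (`dplyr::bind_rows()` behaves the same way), and the two rows are placeholders, not real SI-PNI records.

```r
# Hypothetical helper: stack two data frames with different columns,
# filling any column missing from one of them with NA.
bind_rows_fill <- function(a, b) {
  cols <- union(names(a), names(b))
  for (col in setdiff(cols, names(a))) a[[col]] <- rep(NA, nrow(a))
  for (col in setdiff(cols, names(b))) b[[col]] <- rep(NA, nrow(b))
  rbind(a[cols], b[cols])
}

# placeholder rows: one aggregated FTP record, one CSV microdata record
ftp <- data.frame(ANO = "2019", QT_DOSE = 10L, stringsAsFactors = FALSE)
csv <- data.frame(data_vacina = "2024-01-15", stringsAsFactors = FALSE)

combined <- bind_rows_fill(ftp, csv)
combined  # ANO/QT_DOSE are NA for the CSV row, data_vacina is NA for the FTP row
```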
Getting started
Check available years
```r
library(healthbR)
library(dplyr)  # used by the analysis examples in this vignette

sipni_years()
#> [1] 1994 1995 ... 2024 2025
```
FTP path: doses applied (DPNI)
The default type downloads aggregated dose counts (1994–2019):
```r
# doses applied in Acre, 2019
ac_doses <- sipni_data(year = 2019, uf = "AC")
ac_doses
```
Key variables (DPNI)
| Variable | Description |
|---|---|
| ANO | Reference year |
| UF | UF code (IBGE 2 digits) |
| MUNIC | Municipality code (IBGE 6 digits) |
| IMUNO | Immunobiological code |
| DOSE | Dose type (1st, 2nd, booster, etc.) |
| QT_DOSE | Number of doses applied |
| FX_ETARIA | Age group (coded) |
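With `parse = FALSE` (see Smart type parsing) these columns come back as character, so counts such as QT_DOSE must be converted before summing. Code columns (UF, MUNIC, IMUNO, ...) are safer kept as character so that any leading zeros survive. The sketch below assumes a hypothetical `parse_counts()` helper and an explicit list of count columns. It illustrates the idea only and is not necessarily how healthbR's own parser works.

```r
# Hypothetical helper: convert selected count columns from character to
# integer, leaving code columns (UF, MUNIC, IMUNO, ...) untouched.
parse_counts <- function(df, count_cols = c("QT_DOSE")) {
  for (col in intersect(count_cols, names(df))) {
    x <- trimws(df[[col]])
    x[!is.na(x) & x == ""] <- NA_character_  # blank cells become NA
    df[[col]] <- as.integer(x)
  }
  df
}

# placeholder character data, shaped like a parse = FALSE result
raw <- data.frame(
  IMUNO = c("001", "002"),
  QT_DOSE = c(" 12", ""),
  stringsAsFactors = FALSE
)
parsed <- parse_counts(raw)
str(parsed)
```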
Using the dictionary
```r
# vaccine codes
sipni_dictionary("IMUNO")

# dose types
sipni_dictionary("DOSE")

# age groups
sipni_dictionary("FX_ETARIA")
```
FTP path: vaccination coverage (CPNI)
The CPNI type provides coverage rates per municipality:
```r
# vaccination coverage in Acre, 2019
ac_coverage <- sipni_data(year = 2019, type = "CPNI", uf = "AC")
ac_coverage
```
CSV path: individual-level microdata (2020+)
For years 2020 and later, SI-PNI provides individual-level microdata
(one row per vaccination dose). The type parameter is
ignored for these years:
```r
# microdata for Acre, January 2024
ac_micro <- sipni_data(year = 2024, uf = "AC", month = 1)
ac_micro
```
Key variables (CSV microdata)
| Variable | Description |
|---|---|
| sigla_uf_estabelecimento | UF of the health facility |
| codigo_municipio_estabelecimento | Municipality (IBGE) |
| tipo_sexo_paciente | Sex (M/F) |
| numero_idade_paciente | Patient age |
| nome_raca_cor_paciente | Race/color (descriptive) |
| descricao_vacina | Vaccine name |
| descricao_dose_vacina | Dose description |
| data_vacina | Vaccination date |
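Because each CSV row is a single dose, the microdata can be collapsed into counts similar in spirit to the FTP aggregates by counting rows per municipality and vaccine. The sketch below uses base R `aggregate()` on placeholder rows (the municipality and vaccine values are invented). With real data you would pass the output of `sipni_data()`, or use the dplyr equivalent `count(codigo_municipio_estabelecimento, descricao_vacina)`. The FTP files also break counts down by dose and age group and use coded vaccines, so a faithful comparison would need matching groupings and code mappings, which this sketch does not attempt.

```r
# placeholder microdata rows using the CSV column names above
micro <- data.frame(
  codigo_municipio_estabelecimento = c("mun_1", "mun_1", "mun_1", "mun_2"),
  descricao_vacina = c("Vaccine A", "Vaccine A", "Vaccine B", "Vaccine A"),
  stringsAsFactors = FALSE
)

# one row per dose -> number of doses per municipality and vaccine
micro$n_doses <- 1L
doses <- aggregate(
  n_doses ~ codigo_municipio_estabelecimento + descricao_vacina,
  data = micro,
  FUN = sum
)
doses
```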
Exploring variables
```r
# DPNI variables (FTP)
sipni_variables()

# CPNI variables (FTP)
sipni_variables(type = "CPNI")

# API/CSV variables (2020+)
sipni_variables(type = "API")

# search
sipni_variables(search = "dose")
```
Month parameter for CSV data
For years >= 2020, each month is a separate ~1.4 GB national CSV
file. Use month to select specific months:
```r
# single month
jan <- sipni_data(year = 2024, uf = "AC", month = 1)

# first quarter
q1 <- sipni_data(year = 2024, uf = "AC", month = 1:3)

# all 12 months (default; downloads ~17 GB in total)
full_year <- sipni_data(year = 2024, uf = "AC")
```
For FTP data (1994–2019), the month parameter is ignored because FTP files are annual.
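Since each CSV month is a ~1.4 GB national file and FTP files are small annual ones, the download cost of a request can be estimated before running it. `estimate_sipni_download_gb()` below is a hypothetical helper built only from the approximate sizes quoted in this vignette (1.4 GB per CSV month, ~100 KB per FTP UF/year file). It assumes a single UF for FTP years, and real file sizes vary from month to month.

```r
# Hypothetical helper: rough download size (GB) of a sipni_data() request,
# using the approximate sizes quoted in this vignette and one UF per FTP year.
estimate_sipni_download_gb <- function(year, month = 1:12,
                                       csv_gb_per_month = 1.4,
                                       ftp_gb_per_file = 100 / 1024^2) {
  year <- unique(year)
  csv_years <- sum(year >= 2020)
  ftp_years <- sum(year <= 2019)
  n_months <- length(unique(month))  # month only matters for CSV years
  csv_years * n_months * csv_gb_per_month + ftp_years * ftp_gb_per_file
}

estimate_sipni_download_gb(2024, month = 1)  # about 1.4 GB
estimate_sipni_download_gb(2024)             # about 16.8 GB, i.e. ~17 GB
```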
Example: vaccine doses by immunobiological (FTP)
```r
ac_2019 <- sipni_data(year = 2019, uf = "AC")

# decode immunobiological names
imuno_labels <- sipni_dictionary("IMUNO") |>
  select(code, label)

# total doses per vaccine, labelled and sorted from most to least applied
doses_by_vaccine <- ac_2019 |>
  group_by(IMUNO) |>
  summarize(total_doses = sum(as.integer(QT_DOSE), na.rm = TRUE),
            .groups = "drop") |>
  left_join(imuno_labels, by = c("IMUNO" = "code")) |>
  arrange(desc(total_doses))

doses_by_vaccine
```
Example: coverage trends over time
```r
# coverage data for Sao Paulo, 2015-2019
sp_cov <- sipni_data(
  year = 2015:2019,
  type = "CPNI",
  uf = "SP"
)

# unweighted mean of municipal coverage rates, by year
sp_cov |>
  group_by(year) |>
  summarize(
    mean_coverage = mean(as.numeric(COBERT), na.rm = TRUE),
    .groups = "drop"
  )
```
Example: individual-level analysis (2020+)
```r
# all vaccinations recorded in Acre, January 2024
ac_jan <- sipni_data(year = 2024, uf = "AC", month = 1)

# vaccines administered
ac_jan |>
  count(descricao_vacina, sort = TRUE)

# doses by sex
ac_jan |>
  count(tipo_sexo_paciente)

# age distribution in groups [0,5), [5,12), ..., [60,Inf)
ac_jan |>
  mutate(age = as.integer(numero_idade_paciente)) |>
  filter(!is.na(age)) |>
  mutate(age_group = cut(age,
                         breaks = c(0, 5, 12, 18, 30, 60, Inf),
                         right = FALSE)) |>
  count(age_group)
```
Mixed year requests
When requesting years that span both sources (e.g., 2019 and 2024),
sipni_data() fetches from FTP and CSV respectively and
combines the results. Note that column names and structure differ
between sources:
```r
# this downloads FTP (2019) + CSV (2024)
mixed <- sipni_data(year = c(2019, 2024), uf = "AC", month = 1)

# columns from FTP (UPPERCASE) and CSV (snake_case) are combined,
# with NAs where columns don't overlap
names(mixed)
```
Download tips
- FTP files (1994–2019) are small (~100 KB each) and download quickly.
- CSV files (2020+) are large (~1.4 GB per month, national). Start with a single month and UF.
- The first download of a CSV month caches all 27 UFs. A second request for a different UF from the same month is instant from cache.
- Multiple months are downloaded concurrently when possible.
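The cache behaviour described above (one national download, then instant access to any UF of that month) can be pictured as splitting the national file by UF and storing one file per UF. The sketch below is a hypothetical illustration using base R `split()`, `saveRDS()` and `readRDS()` in a temporary directory. The helper names and file layout are invented, and healthbR's actual cache layout and format (Parquet when arrow is installed) may differ.

```r
# Hypothetical sketch: cache one national month as one file per UF.
cache_month_by_uf <- function(national, year, month,
                              cache_dir = file.path(tempdir(), "sipni_cache")) {
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  by_uf <- split(national, national$sigla_uf_estabelecimento)
  for (uf in names(by_uf)) {
    path <- file.path(cache_dir, sprintf("%d_%02d_%s.rds", year, month, uf))
    saveRDS(by_uf[[uf]], path)
  }
  invisible(cache_dir)
}

# Read one UF back from the cache, or return NULL if it is not cached.
read_cached_uf <- function(uf, year, month,
                           cache_dir = file.path(tempdir(), "sipni_cache")) {
  path <- file.path(cache_dir, sprintf("%d_%02d_%s.rds", year, month, uf))
  if (file.exists(path)) readRDS(path) else NULL
}

# placeholder national data covering two UFs
national <- data.frame(
  sigla_uf_estabelecimento = c("AC", "AC", "SP"),
  descricao_vacina = c("Vaccine A", "Vaccine B", "Vaccine A"),
  stringsAsFactors = FALSE
)
cache_month_by_uf(national, year = 2024, month = 1)
sp <- read_cached_uf("SP", year = 2024, month = 1)  # served from the cache
```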
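Smart type parsing
By default, sipni_data() converts columns to appropriate types (for example, QT_DOSE becomes integer). Set parse = FALSE to keep every column as character.
```r
# parsed types (default)
ac <- sipni_data(year = 2019, uf = "AC")
class(ac$QT_DOSE)
#> [1] "integer"

# raw character columns
ac_raw <- sipni_data(year = 2019, uf = "AC", parse = FALSE)
```
Cache management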
Downloaded data is cached locally for faster future access:
```r
# check cache status
sipni_cache_status()

# clear cache if needed
sipni_clear_cache()
```
If the arrow package is installed, data is cached in Parquet format. You can also use lazy evaluation:
```r
# lazy query for FTP data (requires arrow)
sipni_lazy <- sipni_data(year = 2019, uf = "AC", lazy = TRUE)

sipni_lazy |>
  filter(QT_DOSE > 0) |>
  select(IMUNO, DOSE, QT_DOSE) |>
  collect()
```
Additional resources
- OpenDataSUS (dadosabertos.saude.gov.br)
- Census vignette for population denominators