Chronic Disease Risk Factors from VIGITEL with healthbR
Source: vignettes/vigitel-telephone-survey.Rmd
Overview
VIGITEL (Vigilância de Fatores de Risco e Proteção para Doenças Crônicas por Inquérito Telefônico) is an annual telephone survey conducted by the Brazilian Ministry of Health since 2006. It monitors risk and protective factors for chronic non-communicable diseases among adults (18+) in all 26 state capitals and the Federal District.
| Topic | Examples |
|---|---|
| Tobacco | Smoking prevalence, cessation |
| Alcohol | Consumption patterns, binge drinking |
| Diet | Fruit/vegetable intake, ultra-processed foods |
| Physical activity | Leisure, commuting, sedentary behavior |
| Chronic diseases | Diabetes, hypertension, obesity self-report |
| Preventive exams | Mammography, Pap smear, colonoscopy |
Each annual edition interviews approximately 54,000 adults by landline telephone. Post-stratification weights (the pesorake variable) adjust the sample to match the adult population of each city.
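To see what the weights do in practice, here is a minimal sketch using made-up data rather than VIGITEL records. The data frame `toy` and its columns `smoker` and `weight` are hypothetical; `weight` plays the role that pesorake plays in the real file.

```r
# Toy data: 1 = smoker, 0 = non-smoker, with made-up survey weights
toy <- data.frame(
  smoker = c(1, 0, 0, 1, 0),
  weight = c(0.5, 1.5, 1.0, 2.0, 1.0)
)

# Unweighted prevalence treats every respondent equally
unweighted <- mean(toy$smoker) * 100

# Weighted prevalence lets each respondent count in proportion to the weight
weighted <- weighted.mean(toy$smoker, toy$weight) * 100

round(c(unweighted = unweighted, weighted = weighted), 1)
```

The two estimates differ whenever weights are correlated with the outcome, which is why the weight variable should be included in any prevalence calculation.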
Getting started
Check available years
library(healthbR)
library(dplyr) # used in the examples below

vigitel_years()
#> [1] 2006 2007 2008 ... 2023 2024
Downloading data
All years at once
VIGITEL is distributed as a single consolidated file covering 2006–2024. By default, all years are downloaded:
df <- vigitel_data()
Specific years
df <- vigitel_data(year = 2020:2024)
Select variables
df <- vigitel_data(
  year = 2024,
  vars = c("cidade", "sexo", "idade", "pesorake", "q6", "q7", "q9")
)
Data format
Two formats are available: Stata (.dta, default) and CSV. The Stata format preserves variable labels:
df_dta <- vigitel_data(format = "dta") # default, with labels
df_csv <- vigitel_data(format = "csv") # alternative
Example: Smoking prevalence over time
# Download smoking-related variables
df <- vigitel_data(
  year = 2006:2024,
  vars = c("ano", "cidade", "sexo", "pesorake", "q6")
)

# q6: "Atualmente, o(a) sr(a) fuma?" ("Do you currently smoke?"; 1 = yes, 2 = no)
smoking <- df |>
  # Keep only valid yes/no answers
  filter(q6 %in% c("1", "2")) |>
  # Make sure the weight is numeric before summing
  mutate(pesorake = as.numeric(pesorake)) |>
  group_by(ano) |>
  summarise(
    smokers = sum(pesorake[q6 == "1"], na.rm = TRUE),
    total = sum(pesorake, na.rm = TRUE),
    prevalence = smokers / total * 100
  )
Example: Obesity by capital city
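df <- vigitel_data(
  year = 2024,
  vars = c("cidade", "sexo", "pesorake", "q8", "q9")
)

# q8 = weight (kg), q9 = height (cm)
# BMI >= 30 = obesity
obesity <- df |>
  # Convert to numeric first so the comparisons below are numeric, not textual
  mutate(
    weight_kg = as.numeric(q8),
    height_cm = as.numeric(q9),
    pesorake = as.numeric(pesorake)
  ) |>
  # Drop records with missing measurements or weights
  filter(!is.na(weight_kg), !is.na(height_cm), height_cm > 0, !is.na(pesorake)) |>
  mutate(
    bmi = weight_kg / (height_cm / 100)^2,
    obese = bmi >= 30
  ) |>
  group_by(cidade) |>
  summarise(
    prevalence = weighted.mean(obese, pesorake, na.rm = TRUE) * 100
  ) |>
  arrange(desc(prevalence))
Cache and performance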
Data is automatically cached in partitioned Parquet format (when arrow is installed). Subsequent calls load instantly from cache:
# First call downloads (~30 seconds)
df <- vigitel_data(year = 2024)

# Second call loads from cache (instant)
df <- vigitel_data(year = 2024)

# Check cache status
vigitel_cache_status()

# Clear cache if needed
vigitel_clear_cache()
Lazy evaluation
For large analyses, use lazy evaluation to query without loading all data into memory:
lazy_df <- vigitel_data(lazy = TRUE, backend = "arrow")
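A lazy object is typically queried with dplyr verbs and only materialized with collect(). The sketch below illustrates that pattern on a small in-memory Arrow table, which is a stand-in and not the actual return value of vigitel_data(). It assumes the lazy result accepts the same dplyr verbs, and the toy values are made up.

```r
library(dplyr)
library(arrow)

# Hypothetical stand-in for the object returned with lazy = TRUE, backend = "arrow"
toy <- data.frame(
  ano = c(2023, 2023, 2024, 2024),
  cidade = c("A", "B", "A", "B"),
  pesorake = c(1.2, 0.8, 1.0, 1.5)
)
lazy_toy <- arrow::arrow_table(toy)

# Build the query lazily; nothing is pulled into an R data frame yet
query <- lazy_toy |>
  filter(ano == 2024) |>
  select(cidade, pesorake)

# collect() runs the query and returns an in-memory tibble
result <- collect(query)
```

Filtering and selecting before collect() keeps memory use proportional to the result rather than to the full dataset.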