Downloads and returns PNADC microdata for the specified module and year(s)
from the IBGE FTP server. Data is cached locally to avoid repeated downloads.
When the arrow package is installed, data is cached in parquet format
for faster subsequent reads.
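The cache-then-download behavior described above can be illustrated with a short base R sketch. This is not healthbR's implementation: the helper cached_fetch(), its fetch callback and the file naming are hypothetical. The sketch also always stores an RDS file, whereas the package writes parquet when arrow is installed.

```r
# Hypothetical helper illustrating cache-then-download semantics.
# Not the healthbR implementation: names and file layout are assumptions,
# and RDS is used instead of parquet to avoid depending on arrow.
cached_fetch <- function(key, fetch, cache_dir = tempdir(), refresh = FALSE) {
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  path <- file.path(cache_dir, paste0(key, ".rds"))

  # Reuse the cached copy unless a refresh is requested
  if (file.exists(path) && !refresh) {
    return(readRDS(path))
  }

  # Otherwise call the (hypothetical) download function and cache the result
  data <- fetch()
  saveRDS(data, path)
  data
}

# Toy downloader that counts how often it is called
calls <- 0
fake_download <- function() {
  calls <<- calls + 1
  data.frame(x = 1:3)
}

cache <- file.path(tempdir(), "pnadc-cache-demo")
unlink(cache, recursive = TRUE)  # start from an empty cache

first  <- cached_fetch("deficiencia_2022", fake_download, cache)
second <- cached_fetch("deficiencia_2022", fake_download, cache)
third  <- cached_fetch("deficiencia_2022", fake_download, cache, refresh = TRUE)
print(calls)  # 2: the second call was served from the cache
```

Passing refresh = TRUE bypasses the cached file, which mirrors what the refresh argument is documented to do.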
Usage
pnadc_data(
  module,
  year = NULL,
  vars = NULL,
  as_survey = FALSE,
  cache_dir = NULL,
  refresh = FALSE,
  lazy = FALSE,
  backend = c("arrow", "duckdb")
)
Arguments
- module
Character. The module identifier. Use pnadc_modules to see the available modules. Required.
- year
Numeric vector. Year(s) to download. Use NULL to download all years available for the module. Default is NULL.
- vars
Character vector. Variables to select. Use NULL for all variables. Survey design variables (UPA, Estrato, V1028) and key demographic variables are always included. Default is NULL.
- as_survey
Logical. If TRUE, returns a survey design object (requires the srvyr package). Default is FALSE.
- cache_dir
Character. Directory for caching downloaded files. Default is NULL, which uses tools::R_user_dir("healthbR", "cache").
- refresh
Logical. If TRUE, re-downloads the data even if the file already exists in the cache. Default is FALSE.
- lazy
Logical. If TRUE, returns a lazy query object instead of a tibble. Requires the arrow package. The lazy object supports dplyr verbs (filter, select, mutate, etc.), which are pushed down to the query engine and evaluated before the result is collected into memory. Call dplyr::collect() to materialize the result. Default is FALSE.
- backend
Character. Backend for lazy evaluation: "arrow" (default) or "duckdb". Only used when lazy = TRUE. The DuckDB backend requires the duckdb package.
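The vars rule, under which the survey design variables are always kept, can be illustrated with a small base R sketch. resolve_vars() is a hypothetical helper, not part of healthbR. It only adds the three design variables named above, because this page does not list which key demographic variables are also kept.

```r
# Hypothetical illustration of how a `vars` selection could be expanded.
# Only the documented design variables are added; the "key demographic
# variables" the package also keeps are not listed, so they are omitted.
design_vars <- c("UPA", "Estrato", "V1028")

resolve_vars <- function(vars, available) {
  # NULL means "all variables"
  if (is.null(vars)) {
    return(available)
  }
  wanted <- union(design_vars, vars)
  # Keep the column order of the source data; unknown names are ignored
  available[available %in% wanted]
}

available <- c("UPA", "Estrato", "V1028", "S11001", "S11002")
resolve_vars("S11001", available)
# "UPA" "Estrato" "V1028" "S11001"
```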
Details
PNAD Continua (Pesquisa Nacional por Amostra de Domicilios Continua, the Continuous National Household Sample Survey) is a quarterly household survey conducted by IBGE. This function provides access to its supplementary modules with health-related content.
Data source
Data is downloaded from the IBGE FTP server:
https://ftp.ibge.gov.br/Trabalho_e_Rendimento/Pesquisa_Nacional_por_Amostra_de_Domicilios_continua/
Examples
if (FALSE) { # interactive()
  # Download the deficiencia module for 2022
  df <- pnadc_data(module = "deficiencia", year = 2022, cache_dir = tempdir())

  # Download as a survey design object
  svy <- pnadc_data(
    module = "deficiencia",
    year = 2022,
    as_survey = TRUE,
    cache_dir = tempdir()
  )

  # Select specific variables
  df_subset <- pnadc_data(
    module = "deficiencia",
    year = 2022,
    vars = c("S11001", "S11002"),
    cache_dir = tempdir()
  )
}
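The lazy and backend arguments support a query-then-collect workflow. The following is a minimal sketch, assuming the package is installed under the name healthbR and that dplyr plus arrow (or duckdb) are available. collect_pnadc_subset() is a hypothetical wrapper, not part of the package, and nothing is downloaded until it is called.

```r
# Hypothetical wrapper around pnadc_data(lazy = TRUE): dplyr verbs are
# applied to the lazy query first, and only the selected rows and columns
# are materialized by dplyr::collect().
collect_pnadc_subset <- function(module, year, cols,
                                 backend = c("arrow", "duckdb"),
                                 cache_dir = tempdir()) {
  backend <- match.arg(backend)
  lazy_tbl <- healthbR::pnadc_data(
    module = module,
    year = year,
    lazy = TRUE,
    backend = backend,
    cache_dir = cache_dir
  )
  lazy_tbl |>
    # Drop records without a sample weight (V1028)
    dplyr::filter(!is.na(V1028)) |>
    # Keep the design variables plus the requested columns
    dplyr::select(dplyr::all_of(c("UPA", "Estrato", "V1028", cols))) |>
    dplyr::collect()
}

if (FALSE) { # interactive()
  df_arrow <- collect_pnadc_subset("deficiencia", 2022, c("S11001", "S11002"))
  df_duck <- collect_pnadc_subset("deficiencia", 2022, "S11001", backend = "duckdb")
}
```

Filtering and selecting before collect() lets the chosen engine skip unneeded rows and columns instead of loading the full module into memory.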