Skip to contents

Downloads and returns PNADC microdata for the specified module and year(s) from the IBGE FTP. Data is cached locally to avoid repeated downloads. When the arrow package is installed, data is cached in parquet format for faster subsequent reads.

Usage

pnadc_data(
  module,
  year = NULL,
  vars = NULL,
  as_survey = FALSE,
  cache_dir = NULL,
  refresh = FALSE,
  lazy = FALSE,
  backend = c("arrow", "duckdb")
)

Arguments

module

Character. The module identifier. Use pnadc_modules to see available modules. Required.

year

Numeric or vector. Year(s) to download. Use NULL for all available years for the module. Default is NULL.

vars

Character vector. Variables to select. Use NULL for all variables. Survey design variables (UPA, Estrato, V1028) and key demographic variables are always included. Default is NULL.

as_survey

Logical. If TRUE, returns a survey design object (requires the srvyr package). Default is FALSE.

cache_dir

Character. Directory for caching downloaded files. Default uses tools::R_user_dir("healthbR", "cache").

refresh

Logical. If TRUE, re-download even if file exists in cache. Default is FALSE.

lazy

Logical. If TRUE, returns a lazy query object instead of a tibble. Requires the arrow package. The lazy object supports dplyr verbs (filter, select, mutate, etc.) which are pushed down to the query engine before collecting into memory. Call dplyr::collect() to materialize the result. Default: FALSE.

backend

Character. Backend for lazy evaluation: "arrow" (default) or "duckdb". Only used when lazy = TRUE. DuckDB backend requires the duckdb package.

Value

A tibble with PNADC microdata, or a srvyr survey design object if as_survey = TRUE.

Details

PNAD Continua (Pesquisa Nacional por Amostra de Domicilios Continua) is a quarterly household survey conducted by IBGE. This function provides access to supplementary modules with health-related content.

Available modules

  • deficiencia: Persons with disabilities (2019, 2022, 2024)

  • habitacao: Housing characteristics (2012-2019, 2022-2024)

  • moradores: General characteristics of residents (2012-2019, 2022-2024)

  • aps: Primary health care (2022)

Survey design variables

For proper statistical analysis with complex survey design, the following variables are always included:

  • UPA: Primary sampling unit

  • Estrato: Stratum

  • V1028: Survey weight

Use as_survey = TRUE to get a properly weighted survey design object for analysis with the srvyr package.

Data source

Data is downloaded from the IBGE FTP server: https://ftp.ibge.gov.br/Trabalho_e_Rendimento/Pesquisa_Nacional_por_Amostra_de_Domicilios_continua/

Examples

if (FALSE) { # interactive()
# download deficiencia module for 2022
df <- pnadc_data(module = "deficiencia", year = 2022, cache_dir = tempdir())

# download with survey design
svy <- pnadc_data(
  module = "deficiencia",
  year = 2022,
  as_survey = TRUE,
  cache_dir = tempdir()
)

# select specific variables
df_subset <- pnadc_data(
  module = "deficiencia",
  year = 2022,
  vars = c("S11001", "S11002"),
  cache_dir = tempdir()
)
}