Downloads and returns outpatient production microdata from DATASUS FTP. Each row represents one outpatient production record. Data is organized monthly – one .dbc file per type, state (UF), and month.
Usage
sia_data(
  year,
  type = "PA",
  month = NULL,
  vars = NULL,
  uf = NULL,
  procedure = NULL,
  diagnosis = NULL,
  parse = TRUE,
  col_types = NULL,
  cache = TRUE,
  cache_dir = NULL,
  lazy = FALSE,
  backend = c("arrow", "duckdb")
)
Arguments
- year
Integer. Year(s) of the data. Required.
- type
Character. File type to download. Default: "PA" (outpatient production). See sia_info() for all 13 types.
- month
Integer. Month(s) of the data (1-12). If NULL (default), downloads all 12 months. Example: 1 (January), 1:6 (first semester).
- vars
Character vector. Variables to keep. If NULL (default), returns all available variables. Use sia_variables() to see available variables.
- uf
Character. Two-letter state abbreviation(s) to download. If NULL (default), downloads all 27 states. Example: "SP", c("SP", "RJ").
- procedure
Character. SIGTAP procedure code pattern(s) to filter by (PA_PROC_ID). Supports partial matching (prefix). If NULL (default), returns all procedures. Example: "0301" (consultations).
- diagnosis
Character. CID-10 code pattern(s) to filter by principal diagnosis (PA_CIDPRI). Supports partial matching (prefix). If NULL (default), returns all diagnoses. Example: "J" (respiratory diseases).
- parse
Logical. If TRUE (default), converts columns to appropriate types (integer, double, Date) based on the variable metadata. Use sia_variables() to see the target type for each variable. Set to FALSE for backward-compatible all-character output.
- col_types
Named list. Override the default type for specific columns. Names are column names, values are type strings: "character", "integer", "double", "date_dmy", "date_ymd", "date_ym", "date". Example: list(PA_VALAPR = "character") to keep PA_VALAPR as character.
- cache
Logical. If TRUE (default), caches downloaded data for faster future access.
- cache_dir
Character. Directory for caching. Default: tools::R_user_dir("healthbR", "cache").
- lazy
Logical. If TRUE, returns a lazy query object instead of a tibble. Requires the arrow package. The lazy object supports dplyr verbs (filter, select, mutate, etc.), which are pushed down to the query engine before the data is collected into memory. Call dplyr::collect() to materialize the result. Default: FALSE.
- backend
Character. Backend for lazy evaluation: "arrow" (default) or "duckdb". Only used when lazy = TRUE. The DuckDB backend requires the duckdb package.
Value
A tibble with outpatient production microdata, or a lazy query object
when lazy = TRUE. Includes the columns year, month, and uf_source, which
identify the source file when multiple years, months, or states are combined.
Details
Data is downloaded from DATASUS FTP as .dbc files (one per type/state/month). The .dbc format is decompressed internally using vendored C code from the blast library. No external dependencies are required.
SIA data is monthly, so downloading an entire year for all states requires
324 files (27 UFs x 12 months) per type. Use uf and month
to limit downloads.
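The file count above follows directly from the number of states and months requested. A minimal sketch of that arithmetic, assuming one file per state and month for a single file type; n_files() is a hypothetical helper written for illustration, not part of the package:

```r
# Hypothetical helper: number of .dbc files implied by a uf/month selection
# for one file type. NULL mirrors the documented defaults
# (all 27 states, all 12 months).
n_files <- function(uf = NULL, month = NULL, n_years = 1L) {
  n_uf    <- if (is.null(uf)) 27L else length(unique(uf))
  n_month <- if (is.null(month)) 12L else length(unique(month))
  n_uf * n_month * n_years
}

n_files()                                  # one full year, all states: 324
n_files(uf = c("SP", "RJ"), month = 1:6)   # two states, first semester: 12
n_files(uf = "AC", month = 1)              # one state, one month: 1
```

Restricting uf and month before calling sia_data() is therefore the most direct way to reduce download volume.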
The SIA has 13 file types. The default "PA" (outpatient production)
is the most commonly used. Use sia_info() to see all types.
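The procedure and diagnosis arguments are documented as prefix matches on PA_PROC_ID and PA_CIDPRI. The sketch below shows what prefix matching means, using base R on a toy data frame. The records are made up for illustration, matches_prefix() is a hypothetical helper, and the package's internal filtering may be implemented differently:

```r
# Toy records: column names follow the argument documentation,
# the values are illustrative only.
records <- data.frame(
  PA_PROC_ID = c("0301010072", "0301060029", "0202010503", "0401010015"),
  PA_CIDPRI  = c("J180", "I10", "J449", "E119"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE where a code starts with any of the prefixes.
matches_prefix <- function(codes, prefixes) {
  hits <- vapply(prefixes, function(p) startsWith(codes, p),
                 logical(length(codes)))
  # One row per code, one column per prefix
  rowSums(matrix(hits, nrow = length(codes))) > 0
}

# Analogous to procedure = "0301" and diagnosis = "J"
consult <- records[matches_prefix(records$PA_PROC_ID, "0301"), ]
resp    <- records[matches_prefix(records$PA_CIDPRI, "J"), ]
```

Under this reading, a shorter prefix selects a broader group of codes and a full code selects exact matches only.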
See also
sia_info() for file type descriptions,
censo_populacao() for population denominators.
Other sia:
sia_cache_status(),
sia_clear_cache(),
sia_dictionary(),
sia_info(),
sia_variables(),
sia_years()
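The default cache_dir is given as tools::R_user_dir("healthbR", "cache"). As a small sketch, that path can be resolved with base R alone to see where cached files would be stored on the current machine. This assumes R 4.0.0 or later, where tools::R_user_dir() is available:

```r
# Resolve the per-user cache directory for a package named "healthbR".
# tools::R_user_dir() only builds the path; it does not create the directory.
default_cache <- tools::R_user_dir("healthbR", "cache")

print(default_cache)
print(dir.exists(default_cache))  # whether the directory exists yet
```

The exact location depends on the operating system and on the R_USER_CACHE_DIR and XDG_CACHE_HOME environment variables.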
Examples
if (FALSE) { # interactive()
# all outpatient production in Acre, January 2022
ac_jan <- sia_data(year = 2022, month = 1, uf = "AC")

# filter by procedure code
consult <- sia_data(year = 2022, month = 1, uf = "AC",
                    procedure = "0301")

# filter by diagnosis (CID-10)
resp <- sia_data(year = 2022, month = 1, uf = "AC",
                 diagnosis = "J")

# only key variables
sia_data(year = 2022, month = 1, uf = "AC",
         vars = c("PA_PROC_ID", "PA_CIDPRI", "PA_SEXO",
                  "PA_IDADE", "PA_VALAPR"))

# different file type (APAC Medicamentos)
med <- sia_data(year = 2022, month = 1, uf = "AC", type = "AM")
}