| Title: | Analyze Data from the Truth Commission in Colombia |
|---|---|
| Description: | Facilitates use and analysis of data about the armed conflict in Colombia resulting from the joint project between La Jurisdicción Especial para la Paz (JEP), La Comisión para el Esclarecimiento de la Verdad, la Convivencia y la No repetición (CEV), and the Human Rights Data Analysis Group (HRDAG). The data are 100 replicates from a multiple imputation through chained equations as described in Van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>. With the replicates the user can examine four human rights violations that occurred in the Colombian conflict accounting for the impact of missing fields and fully missing observations. |
| Authors: | Maria Gargiulo [aut, cre], María Juliana Durán [aut], Paula Andrea Amado [aut], Patrick Ball [rev] |
| Maintainer: | Maria Gargiulo <[email protected]> |
| License: | GPL-2 |
| Version: | 1.0.2 |
| Built: | 2026-06-03 08:48:41 UTC |
| Source: | https://github.com/hrdag/verdata |
Combine MSE estimation results for a given stratum calculated using multiple replicate files created using multiple imputation. Combination is done using the standard approach that makes use of the laws of total expectation and total variance.
combine_estimates(stratum_estimates)
stratum_estimates |
A data frame of estimates for a stratum of interest
calculated using |
A data frame row with the point estimate (N_mean) and the
associated 95% uncertainty interval (lower bound is N_025, upper bound is
N_975).
Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB (2013). Bayesian Data Analysis, 3rd edition. Chapman and Hall/CRC. ISBN 978-0-429-11307-9. doi:10.1201/b16018.
set.seed(19481210)

library(dplyr)
library(purrr)
library(glue)

simulate_estimates <- function(stratum_data, replicate_num) {

    # simulate an imputed stratification variable to determine whether a record
    # should be considered part of the stratum for estimation
    stratification_var <- sample(c(0, 1), size = 100, replace = TRUE, prob = c(0.1, 0.9))

    # use the data passed in as an argument rather than the global object
    my_stratum <- bind_cols(stratum_data, tibble::tibble(stratification_var)) %>%
        filter(stratification_var == 1)

    results <- mse(my_stratum, "my_stratum", K = 4) %>%
        mutate(replicate = replicate_num)

    return(results)

}

in_A <- sample(c(0, 1), size = 100, replace = TRUE, prob = c(0.45, 0.65))
in_B <- sample(c(0, 1), size = 100, replace = TRUE, prob = c(0.5, 0.5))
in_C <- sample(c(0, 1), size = 100, replace = TRUE, prob = c(0.75, 0.25))

my_stratum <- tibble::tibble(in_A, in_B, in_C)
replicate_nums <- glue("R{1:20}")

# one set of estimates per simulated replicate, stacked into one data frame
estimates <- map_dfr(.x = replicate_nums,
                     .f = ~simulate_estimates(stratum_data = my_stratum,
                                              replicate_num = .x))

combine_estimates(estimates)
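The combination step described above can be sketched in base R. This is a minimal illustration of the standard multiple-imputation combining rules built on the laws of total expectation and total variance. It is not the package's source code: the helper name `combine_sketch`, the normal-approximation interval, and the toy draws are all assumptions made for the illustration.

```r
# Hypothetical helper (not part of verdata): combine posterior draws of N
# from several imputed replicates into one point estimate and interval.
combine_sketch <- function(draws, replicate) {
    m <- length(unique(replicate))

    # per-replicate posterior mean and variance
    rep_means <- as.numeric(tapply(draws, replicate, mean))
    rep_vars <- as.numeric(tapply(draws, replicate, var))

    # law of total expectation: average the per-replicate means
    point <- mean(rep_means)

    # law of total variance: within-replicate plus between-replicate variance,
    # with the usual (1 + 1/m) correction for a finite number of replicates
    within <- mean(rep_vars)
    between <- var(rep_means)
    total_var <- within + (1 + 1 / m) * between

    # normal-approximation 95% interval (an assumption of this sketch)
    se <- sqrt(total_var)
    data.frame(N_mean = point,
               N_025 = point - qnorm(0.975) * se,
               N_975 = point + qnorm(0.975) * se)
}

# toy draws for three replicates (simulated, not real estimates)
set.seed(1)
toy <- data.frame(replicate = rep(c("R1", "R2", "R3"), each = 50),
                  N = c(rnorm(50, 100, 5), rnorm(50, 104, 5), rnorm(50, 98, 5)))

combined <- combine_sketch(toy$N, toy$replicate)
print(combined)
```

The between-replicate term is what carries the uncertainty due to imputation: if every replicate produced the same posterior mean, the interval would reflect only the within-replicate variance.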
Combine imputed replicates to calculate totals. Combination is done using the standard approach that makes use of the laws of total expectation and total variance.
combine_replicates(
  violation,
  replicates_obs_data,
  replicates_data,
  strata_vars = NULL,
  conflict_filter = TRUE,
  forced_dis_filter = FALSE,
  edad_minors_filter = FALSE,
  include_props = FALSE,
  digits = 2
)
violation |
Violation to be analyzed. Options are "homicidio", "secuestro", "reclutamiento" and "desaparicion". |
replicates_obs_data |
The data frame that results from applying |
replicates_data |
A data frame containing replicates data. |
strata_vars |
Variable with all observations (without missing values). |
conflict_filter |
Filter that indicates if the data is filtered using the "is_conflict" rule. |
forced_dis_filter |
Filter that indicates if the data is filtered using the "is_forced_dis" rule. |
edad_minors_filter |
Optional filter by age ( |
include_props |
A logical value indicating whether or not to include the proportions from the calculations before merging with summary_observed's output. |
digits |
Number of decimal places to round the results to. Default value is 2. |
A data frame with 5 or more columns: name of variable(s); observed,
the number of observations in each category for every variable; imp_lo, the
lower bound of the 95% confidence interval; imp_hi, the upper bound of the
95% confidence interval; and imp_mean, the point estimate of the mean value.
local_dir <- system.file("extdata", "right", package = "verdata")

replicates_data <- read_replicates(local_dir, "reclutamiento", c(1, 2), version = "v1")

replicates_obs_data <- summary_observed("reclutamiento", replicates_data,
                                        strata_vars = "sexo",
                                        conflict_filter = FALSE,
                                        forced_dis_filter = FALSE,
                                        edad_minors_filter = FALSE,
                                        include_props = FALSE,
                                        digits = 2)

tab_combine <- combine_replicates("reclutamiento", replicates_obs_data, replicates_data,
                                  strata_vars = 'sexo',
                                  conflict_filter = TRUE,
                                  forced_dis_filter = FALSE,
                                  edad_minors_filter = FALSE,
                                  include_props = FALSE,
                                  digits = 2)
Confirm files are identical to the ones published.
confirm_files(replicates_dir, violation, replicate_nums, version)
replicates_dir |
Directory containing the replicates. The name of the files must include the violation in Spanish and lower case letters (homicidio, secuestro, reclutamiento, desaparicion). |
violation |
Violation being analyzed. Options are "homicidio", "secuestro", "reclutamiento", and "desaparicion". |
replicate_nums |
A numeric vector containing the replicates to be analyzed. Values in the vector should be between 1 and 100 inclusive. |
version |
Version of the data being read in. Options are "v1" or "v2". "v1" is appropriate for replicating the results of the joint JEP-CEV-HRDAG project. "v2" is appropriate for conducting new analyses of the conflict in Colombia. |
A data frame with one row per replicate in replicate_nums and two columns:
replicate_path, a string indicating the path to the replicate checked, and
confirmed, a logical value indicating whether the replicate contents match
the published version.
local_dir <- system.file("extdata", "right", package = "verdata")

confirm_files(local_dir, "reclutamiento", c(1, 2), version = "v1")
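One common way to confirm that a file matches a published version is to compare checksums. How `confirm_files` performs its comparison is not stated here, so the following is only a sketch under the assumption of an MD5 comparison with `tools::md5sum`. The helper `check_file`, the temporary file, and its contents are made up for the illustration.

```r
# Hypothetical illustration: compare a file's MD5 checksum with a reference
# value. The temporary file stands in for a replicate file.
replicate_path <- tempfile(fileext = ".csv")
writeLines(c("id,sexo", "1,HOMBRE", "2,MUJER"), replicate_path)

# in practice the reference checksum would come from the data publisher;
# here it is computed from the pristine file to keep the sketch self-contained
published_md5 <- unname(tools::md5sum(replicate_path))

check_file <- function(path, expected_md5) {
    data.frame(replicate_path = path,
               confirmed = unname(tools::md5sum(path)) == expected_md5)
}

unchanged <- check_file(replicate_path, published_md5)

# any modification to the file contents changes the checksum
cat("3,HOMBRE\n", file = replicate_path, append = TRUE)
modified <- check_file(replicate_path, published_md5)

print(rbind(unchanged, modified))
```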
Data dictionary for the variables that appear in the replicate files.
data(diccionario_replicas)
A data frame with 55 rows and 4 variables.
variable name
variable type: character, numeric, or logical
detailed explanation of the variable
possible values of the variable
Joint JEP-CEV-HRDAG project.
Additional variables that may be useful for analyzing the data.
data(diccionario_vars_adicional)
A data frame with 11 rows and 4 variables.
variable name
variable type: character, numeric, or logical
detailed explanation of the variable
possible values of the variable
Joint JEP-CEV-HRDAG project.
Check whether stratum estimates already exist in pre-calculated files.
estimates_exist(stratum_data_prepped, estimates_dir)
stratum_data_prepped |
A data frame including all records in a stratum of
interest. The data frame should only include the source columns prefixed with
|
estimates_dir |
Directory containing pre-calculated estimates, if you would like to use pre-calculated results. |
A list with two entries, estimates_exist and estimates_path.
estimates_exist is a logical value indicating whether calculations for the
stratum of interest are available in the directory containing the pre-calculated
estimates. If estimates_exist is TRUE, estimates_path will contain the
full file path to the JSON file containing the estimates, otherwise it will
be NA.
# attach dplyr so that the %>% pipe is available
library(dplyr)

in_A <- sample(c(0, 1), size = 100, replace = TRUE, prob = c(0.45, 0.65))
in_B <- sample(c(0, 1), size = 100, replace = TRUE, prob = c(0.5, 0.5))
in_C <- sample(c(0, 1), size = 100, replace = TRUE, prob = c(0.75, 0.25))
in_D <- sample(c(0, 1), size = 100, replace = TRUE, prob = c(1, 0))

# keep only records documented by at least one source
my_stratum <- tibble::tibble(in_A, in_B, in_C, in_D) %>%
    dplyr::mutate(rs = rowSums(.)) %>%
    dplyr::filter(rs >= 1) %>%
    dplyr::select(-rs)

estimates_exist(stratum_data_prepped = my_stratum, estimates_dir = "path_to_estimates")
Data documenting the stratifications needed to replicate the results of the methodological report of the joint CEV-HRDAG-JEP project (Spanish version).
data(estratificacion)
A data frame with 31 rows and 4 variables.
the violation being analyzed
the type of analysis that uses the stratification (e.g., patterns of violence by year, sex, etc.)
the variables used to stratify the estimates
additional notes about the stratification; NA if there are no notes
Joint JEP-CEV-HRDAG project.
Filter records to replicate results presented in the CEV methodology report.
filter_standard_cev(replicates_data, violation, perp_change = TRUE)
replicates_data |
A data frame with data from all replicates to be filtered. |
violation |
Violation to be analyzed. Options are "homicidio", "secuestro", "reclutamiento", and "desaparicion". |
perp_change |
A logical value indicating whether victims in years after
2016 with perpetrator values (indicated by |
A filtered data frame.
local_dir <- system.file("extdata", "right", package = "verdata")

replicates_data <- read_replicates(local_dir, "reclutamiento", c(1, 2), version = "v1")

filter_standard_cev(replicates_data, "reclutamiento", perp_change = TRUE)
Determine valid sources for estimation of a stratum of interest.
get_valid_sources(stratum_data_prepped, min_n = 1)
stratum_data_prepped |
A data frame with all records in a stratum of interest.
Columns indicating sources should be prefixed with |
min_n |
The minimum number of records that must appear in a source to be
considered valid for estimation. |
A character vector containing the names of the valid sources.
set.seed(19481210)

in_A <- sample(c(0, 1), size = 100, replace = TRUE, prob = c(0.45, 0.65))
in_B <- sample(c(0, 1), size = 100, replace = TRUE, prob = c(0.5, 0.5))
in_C <- sample(c(0, 1), size = 100, replace = TRUE, prob = c(0.75, 0.25))
in_D <- sample(c(0, 1), size = 100, replace = TRUE, prob = c(1, 0))

my_stratum <- tibble::tibble(in_A, in_B, in_C, in_D)

get_valid_sources(my_stratum)
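The idea of a valid source can be illustrated with a few lines of base R. This sketch is not the package's implementation. It assumes that source columns start with `in_`, as the column names in the examples do, and that a source is valid when it documents at least `min_n` records. The helper name `valid_sources_sketch` and the toy data are hypothetical.

```r
# Hypothetical re-implementation for illustration only: keep the source
# columns (assumed to start with "in_") that document at least min_n records.
valid_sources_sketch <- function(stratum_data, min_n = 1) {
    source_cols <- grep("^in_", names(stratum_data), value = TRUE)
    counts <- colSums(stratum_data[source_cols])
    names(counts)[counts >= min_n]
}

# toy stratum: in_A documents 3 records, in_B documents 2, in_C documents none
toy_stratum <- data.frame(in_A = c(1, 0, 1, 1),
                          in_B = c(0, 1, 1, 0),
                          in_C = c(0, 0, 0, 0),
                          other = c("a", "b", "c", "d"))

valid_sources_sketch(toy_stratum)             # "in_A" "in_B"
valid_sources_sketch(toy_stratum, min_n = 3)  # "in_A"
```

Raising `min_n` drops sparsely populated sources, which can matter because sources with very few records contribute little information to the estimation.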
Look up and read in existing estimates from pre-calculated files.
lookup_estimates(stratum_data_prepped, estimates_dir)
stratum_data_prepped |
A data frame including all records in a stratum of interest.
The data frame should only include the source columns prefixed with |
estimates_dir |
Directory containing pre-calculated estimates, if you would like to use pre-calculated results. Note that setting this option forces the model specification parameters to be identical to those used to calculate the pre-calculated estimates. Do not specify a file path if you would like to use a custom model specification. |
A data frame with one column, N, indicating the results. If the
stratum was not found in the pre-calculated files, N will be NA and the
data frame will have one row. If the stratum was found in the pre-calculated
files, N will contain draws from the posterior distribution of the model
and the data frame will contain 1,000 rows.
# attach dplyr so that the %>% pipe is available
library(dplyr)

in_A <- sample(c(0, 1), size = 100, replace = TRUE, prob = c(0.45, 0.65))
in_B <- sample(c(0, 1), size = 100, replace = TRUE, prob = c(0.5, 0.5))
in_C <- sample(c(0, 1), size = 100, replace = TRUE, prob = c(0.75, 0.25))
in_D <- sample(c(0, 1), size = 100, replace = TRUE, prob = c(1, 0))

# keep only records documented by at least one source
my_stratum <- tibble::tibble(in_A, in_B, in_C, in_D) %>%
    dplyr::mutate(rs = rowSums(.)) %>%
    dplyr::filter(rs >= 1) %>%
    dplyr::select(-rs)

lookup_estimates(stratum_data_prepped = my_stratum, estimates_dir = "path_to_estimates")
Prepare data for estimation and calculate estimates using run_lcmcr.
mse(
  stratum_data,
  stratum_name,
  estimates_dir = NULL,
  min_n = 1,
  K = NULL,
  buffer_size = 10000,
  sampler_thinning = 1000,
  seed = 19481210,
  burnin = 10000,
  n_samples = 10000,
  posterior_thinning = 500
)
stratum_data |
A data frame including all records in a stratum of interest.
Columns indicating sources should be prefixed with |
stratum_name |
An identifier for the stratum. |
estimates_dir |
File path for the folder containing pre-calculated estimates, if you would like to use pre-calculated results. Note that setting this option forces the model specification parameters to be identical to those used to calculate the pre-calculated estimates. Do not specify a file path if you would like to use a custom model specification. |
min_n |
The minimum number of records that must appear in a source to be
considered valid for estimation. |
K |
The maximum number of latent classes to fit. By default the function
will calculate |
buffer_size |
Size of the tracing buffer. Default value is 10,000. |
sampler_thinning |
Thinning interval for the tracing buffer. Default value is 1,000. |
seed |
Integer seed for the internal random number generator. Default value is 19481210. |
burnin |
Number of burn in iterations. Default value is 10,000. |
n_samples |
Number of samples to be generated. Samples are taken one
every |
posterior_thinning |
Thinning interval for the sampler. Default value is 500. |
A data frame with five columns. validated is a logical value
indicating whether the stratum is estimable, N is the draws from the
posterior distribution (NA if the stratum is not estimable), valid_sources
is a string indicating which sources were used in the estimation, n_obs is
the number of observations on valid lists in the stratum of interest (NA if
the stratum is not estimable), and stratum_name is a stratum identifier.
If the stratum is estimable the return will consist of n_samples divided by
1,000 rows.
set.seed(19481210)

in_A <- sample(c(0, 1), size = 100, replace = TRUE, prob = c(0.45, 0.65))
in_B <- sample(c(0, 1), size = 100, replace = TRUE, prob = c(0.5, 0.5))
in_C <- sample(c(0, 1), size = 100, replace = TRUE, prob = c(0.75, 0.25))
in_D <- sample(c(0, 1), size = 100, replace = TRUE, prob = c(1, 0))

my_stratum <- tibble::tibble(in_A, in_B, in_C, in_D)

mse(stratum_data = my_stratum, stratum_name = "my_stratum")
Calculate the proportions of each level of a variable after
applying combine_replicates to completed data (includes imputed values).
proportions_imputed(complete_data, strata_vars, digits = 2)
complete_data |
A data frame containing the output from |
strata_vars |
A vector of column names identifying the variables to be used for stratification. |
digits |
Number of decimal places to round the results to. Default value is 2. |
A data frame that contains the proportions after applying
combine_replicates.
local_dir <- system.file("extdata", "right", package = "verdata")

replicates_data <- read_replicates(replicates_dir = local_dir,
                                   violation = "reclutamiento",
                                   replicate_nums = c(1, 2),
                                   version = "v1",
                                   crash = TRUE)

replicates_obs_data <- summary_observed("reclutamiento", replicates_data,
                                        strata_vars = "sexo",
                                        conflict_filter = FALSE,
                                        forced_dis_filter = FALSE,
                                        edad_minors_filter = FALSE,
                                        include_props = FALSE)

tab_combine <- combine_replicates("reclutamiento", replicates_obs_data, replicates_data,
                                  strata_vars = 'sexo',
                                  conflict_filter = TRUE,
                                  forced_dis_filter = FALSE,
                                  edad_minors_filter = FALSE,
                                  include_props = FALSE)

prop_data_complete <- proportions_imputed(tab_combine, strata_vars = "sexo", digits = 2)
Calculate the proportions of each level of a variable after applying
summary_observed to observed values.
proportions_observed(obs_data, strata_vars, digits = 2)
obs_data |
A data frame containing the output from |
strata_vars |
A vector of column names identifying the variables to be used for stratification. |
digits |
Number of decimal places to round the results to. Default is 2. |
A data frame that contains the proportions after applying
summary_observed.
local_dir <- system.file("extdata", "right", package = "verdata")

replicates_data <- read_replicates(local_dir, "reclutamiento", c(1, 2), version = "v1")

tab_observed <- summary_observed("reclutamiento", replicates_data,
                                 strata_vars = "sexo",
                                 conflict_filter = TRUE,
                                 forced_dis_filter = FALSE,
                                 edad_minors_filter = TRUE,
                                 include_props = TRUE)

prop_data <- proportions_observed(tab_observed, strata_vars = "sexo", digits = 2)
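The arithmetic behind a proportions table is straightforward and can be shown with toy numbers. The sketch below uses made-up counts and a hypothetical output column name, `obs_prop`. It does not reproduce the exact columns returned by `proportions_observed`.

```r
# Hypothetical counts per category, similar in spirit to a summary table
toy_counts <- data.frame(sexo = c("HOMBRE", "MUJER"),
                         observed = c(750, 250))

# proportion of each level, rounded to `digits` decimal places
digits <- 2
toy_counts$obs_prop <- round(toy_counts$observed / sum(toy_counts$observed), digits)

print(toy_counts)
```

Because each proportion is rounded separately, the rounded values may not always sum to exactly 1 when a variable has many levels.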
Read replicates in a directory and verify they are identical to the ones published.
read_replicates(
  replicates_dir,
  violation,
  replicate_nums,
  version,
  crash = TRUE
)
replicates_dir |
A path to the directory containing the replicates. The file name of each replicate must contain at least the name of the violation in Spanish and lower case letters (homicidio, secuestro, reclutamiento, desaparicion), and the replicate number preceded by "R" (e.g., "R1" for replicate 1). |
violation |
A string indicating the violation being analyzed. Options are "homicidio", "secuestro", "reclutamiento", and "desaparicion". |
replicate_nums |
A numeric vector containing the replicates to be analyzed. Values in the vector should be between 1 and 100 inclusive. |
version |
Version of the data being read in. Options are "v1" or "v2". "v1" is appropriate for replicating the results of the joint JEP-CEV-HRDAG project. "v2" is appropriate for conducting new analyses of the conflict in Colombia. |
crash |
A logical value defining whether the function should stop with an error if the content of a file is not identical to the published version. If crash = TRUE (default), the function returns an error and does not read the data; if crash = FALSE, the function returns a warning but still reads the data. |
A data frame with the data from all indicated replicates.
local_dir <- system.file("extdata", "right", package = "verdata")

# replicate_nums takes a single numeric vector, so replicates 1 and 2 are
# passed as c(1, 2)
read_replicates(local_dir, "reclutamiento", c(1, 2), version = "v1")
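The file naming rule for replicates (violation name plus "R" followed by the replicate number) can be illustrated with a small pattern-matching sketch. The file names, the `.parquet` extension, and the helper `match_replicates` are hypothetical. Only the naming rule itself comes from the argument description, and the package may locate files differently.

```r
# Hypothetical file names; only the naming rule described above is assumed.
files <- c("verdata-reclutamiento-R1.parquet",
           "verdata-reclutamiento-R2.parquet",
           "verdata-reclutamiento-R10.parquet",
           "verdata-homicidio-R1.parquet")

match_replicates <- function(files, violation, replicate_nums) {
    # "R<number>" must not be followed by another digit, so R1 does not match R10
    pattern <- paste0("R(", paste(replicate_nums, collapse = "|"), ")([^0-9]|$)")
    files[grepl(violation, files, fixed = TRUE) & grepl(pattern, files)]
}

# selects the R1 and R2 files for "reclutamiento" only
match_replicates(files, "reclutamiento", c(1, 2))
```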
Calculate multiple systems estimation estimates using the Bayesian Non-Parametric Latent-Class Capture-Recapture model developed by Daniel Manrique-Vallier (2016).
run_lcmcr(
  stratum_data_prepped,
  stratum_name,
  min_n = 1,
  K,
  buffer_size,
  sampler_thinning,
  seed,
  burnin,
  n_samples,
  posterior_thinning
)
stratum_data_prepped |
A data frame with all records in the stratum of interest
documented by sources considered valid for estimation (i.e., there should be
no rows with all 0's). Columns indicating sources should be prefixed with
|
stratum_name |
An identifier for the stratum. |
min_n |
The minimum number of records that must appear in a source to be
considered valid for estimation. |
K |
The maximum number of latent classes to fit. |
buffer_size |
Size of the tracing buffer. |
sampler_thinning |
Thinning interval for the tracing buffer. |
seed |
Integer seed for the internal random number generator. |
burnin |
Number of burn in iterations. |
n_samples |
Number of samples to be generated. Samples are taken one
every |
posterior_thinning |
Thinning interval for the sampler. |
A data frame with four columns and n_samples divided by 1,000 rows.
N is the draws from the posterior distribution, valid_sources is a string
indicating which sources were used in the estimation, n_obs is the number of
observations in the stratum of interest, and stratum_name is the stratum
identifier.
Manrique‐Vallier D (2016). “Bayesian population size estimation using Dirichlet process mixtures.” Biometrics, 72(4), 1246–1254. doi:10.1111/biom.12502.
set.seed(19481210)

library(dplyr)

in_A <- sample(c(0, 1), size = 100, replace = TRUE, prob = c(0.45, 0.65))
in_B <- sample(c(0, 1), size = 100, replace = TRUE, prob = c(0.5, 0.5))
in_C <- sample(c(0, 1), size = 100, replace = TRUE, prob = c(0.75, 0.25))
in_D <- sample(c(0, 1), size = 100, replace = TRUE, prob = c(1, 0))

# keep only records documented by at least one source
my_stratum <- tibble::tibble(in_A, in_B, in_C, in_D) %>%
    dplyr::mutate(rs = rowSums(.)) %>%
    dplyr::filter(rs >= 1) %>%
    dplyr::select(-rs)

run_lcmcr(stratum_data_prepped = my_stratum, stratum_name = "my_stratum",
          K = 4, buffer_size = 10000, sampler_thinning = 1000, seed = 19481210,
          burnin = 10000, n_samples = 10000, posterior_thinning = 500)
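Posterior draws such as those in the N column are usually summarized by a mean and an equal-tailed interval. The sketch below does this for a single set of draws using simulated values, not real output from `run_lcmcr`. The summary column names are chosen for the illustration. For draws from several imputed replicates, `combine_estimates` is the appropriate tool.

```r
# Simulated stand-in for the N column (not real output from run_lcmcr)
set.seed(2)
toy_draws <- data.frame(N = round(rnorm(1000, mean = 500, sd = 40)),
                        stratum_name = "my_stratum")

# posterior mean and equal-tailed 95% credible interval from the draws
draw_summary <- data.frame(
    stratum_name = toy_draws$stratum_name[1],
    N_mean = mean(toy_draws$N),
    N_025 = unname(quantile(toy_draws$N, 0.025)),
    N_975 = unname(quantile(toy_draws$N, 0.975))
)

print(draw_summary)
```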
Data documenting the stratifications used to replicate the results of the methodological report of the joint JEP-CEV-HRDAG project (version in English).
data(stratification)
A data frame with 31 rows and 4 variables.
the human rights violation being analyzed
the type of analysis the stratification was used for (e.g., patterns of violence by year, sex, etc.)
the variables used to stratify the estimates
additional notes about the stratification; NA if no notes
Joint JEP-CEV-HRDAG project.
Summary statistics for observed data.
summary_observed(
  violation,
  replicates_data,
  strata_vars = NULL,
  conflict_filter = FALSE,
  forced_dis_filter = FALSE,
  edad_minors_filter = FALSE,
  include_props = FALSE,
  digits = 2
)
violation |
Violation to be analyzed. Options are "homicidio", "secuestro", "reclutamiento", and "desaparicion". |
replicates_data |
Data frame containing replicate data. |
strata_vars |
Variable to be analyzed. Before imputation this variable may have missing values. |
conflict_filter |
Filter that indicates if the data is filtered by the rule "is_conflict" or not. |
forced_dis_filter |
Filter that indicates if the data is filtered by the rule "is_forced_dis" or not. |
edad_minors_filter |
Optional filter by age ("edad") < 18. |
include_props |
A logical value indicating whether or not to include the proportions from the calculations. |
digits |
Number of decimal places to round the results to. Default is 2. |
A data frame with two or more columns, (1) name of variable(s) and (2) the number of observations in each of the variable's categories.
local_dir <- system.file("extdata", "right", package = "verdata")

replicates_data <- read_replicates(local_dir, "reclutamiento", c(1, 2), version = "v1")

tab_observed <- summary_observed("reclutamiento", replicates_data,
                                 strata_vars = "sexo",
                                 conflict_filter = FALSE,
                                 forced_dis_filter = FALSE,
                                 edad_minors_filter = FALSE,
                                 include_props = FALSE,
                                 digits = 2)