Skip to contents

Takes a creel_estimates object produced with variance = "taylor" and re-estimates using replicate weights (bootstrap or jackknife) to produce a side-by-side comparison of standard errors. A cli_warn() is issued for any row where the two SEs diverge by more than divergence_threshold.

Usage

compare_variance(
  x,
  replicate_method = c("bootstrap", "jackknife"),
  conf_level = 0.95,
  divergence_threshold = 0.1,
  ...
)

Arguments

x

A creel_estimates object with variance_method = "taylor". Must have been created with a design stored in x$design.

replicate_method

Character. Replicate variance method to use for comparison. One of "bootstrap" (default) or "jackknife".

conf_level

Numeric confidence level (default: 0.95). Passed to the replicate estimation call.

divergence_threshold

Numeric. Fraction by which replicate SE may differ from Taylor SE before a warning is issued (default: 0.10 = 10\ A warning fires for any group where |se_replicate / se_taylor - 1| > divergence_threshold.

...

Additional arguments passed to the underlying estimator.

Value

A creel_variance_comparison S3 object (a tibble subclass) with columns:

se_taylor

Taylor linearization SE from the original estimate.

se_replicate

Replicate-weight SE from the re-estimation.

divergence_ratio

Ratio se_replicate / se_taylor. NA when se_taylor == 0.

diverges_flag

Logical. TRUE when |divergence_ratio - 1| > divergence_threshold.

Group columns (if any) are preserved. The full tibble is returned invisibly via print(). Use as.data.frame() or standard tibble methods for further processing.

Method

The function extracts the Taylor SE from x$estimates$se, then calls the same estimator that produced x (resolved via x$method) with variance = replicate_method. The re-estimation uses x$design and the grouping variables from x$by_vars.

Divergence is computed as: $$ratio = se_{replicate} / se_{taylor}$$ $$diverges = |ratio - 1| > threshold$$

A ratio substantially different from 1 indicates that the Taylor approximation may be unreliable for this design (e.g., sparse strata, non-linear estimator). Replication-based variance is generally more robust but slower to compute.

References

Wolter, K.M. 2007. Introduction to Variance Estimation, 2nd ed. Springer.

Lumley, T. 2010. Complex Surveys: A Guide to Analysis Using R. Wiley.

Examples

data("example_counts", package = "tidycreel")
data("example_interviews", package = "tidycreel")
cal <- unique(example_counts[, c("date", "day_type")])
design <- creel_design(cal, date = date, strata = day_type)
design <- suppressWarnings(add_counts(design, example_counts))
design <- suppressWarnings(add_interviews(
  design, example_interviews,
  catch = catch_total, effort = hours_fished,
  trip_status = trip_status, trip_duration = trip_duration
))
#>  No `n_anglers` provided — assuming 1 angler per interview.
#>  Pass `n_anglers = <column>` to use actual party sizes for angler-hour
#>   normalization.
#>  Added 22 interviews: 17 complete (77%), 5 incomplete (23%)
taylor_est <- suppressWarnings(estimate_catch_rate(design))
#>  Using complete trips for CPUE estimation
#>   (n=17, 77.3% of 22 interviews) [default]
cmp <- suppressWarnings(compare_variance(taylor_est))
#>  Using complete trips for CPUE estimation
#>   (n=17, 100% of 17 interviews) [default]
print(cmp)
#> 
#> ── Variance Comparison: Taylor vs. bootstrap ───────────────────────────────────
#> Divergence threshold: 10%
#>  All rows within threshold.
#> 
#> # A tibble: 1 × 4
#>   se_taylor se_replicate divergence_ratio diverges_flag
#>       <dbl>        <dbl>            <dbl> <lgl>        
#> 1     0.114        0.115             1.01 FALSE