
Compare Taylor linearization vs. replicate variance for creel estimates
Source: R/compare-variance.R
Takes a creel_estimates object produced with
variance = "taylor" and re-estimates using replicate weights
(bootstrap or jackknife) to produce a side-by-side comparison of standard
errors. A cli_warn() is issued for any row where the two SEs diverge
by more than divergence_threshold.
Usage
compare_variance(
  x,
  replicate_method = c("bootstrap", "jackknife"),
  conf_level = 0.95,
  divergence_threshold = 0.1,
  ...
)
Arguments
- x
A creel_estimates object with variance_method = "taylor". Must have been created with a design stored in x$design.
- replicate_method
Character. Replicate variance method to use for comparison. One of "bootstrap" (default) or "jackknife".
- conf_level
Numeric confidence level (default: 0.95). Passed to the replicate estimation call.
- divergence_threshold
Numeric. Fraction by which the replicate SE may differ from the Taylor SE before a warning is issued (default: 0.10 = 10%). A warning fires for any group where |se_replicate / se_taylor - 1| > divergence_threshold.
- ...
Additional arguments passed to the underlying estimator.
Value
A creel_variance_comparison S3 object (a tibble subclass)
with columns:
- se_taylor
Taylor linearization SE from the original estimate.
- se_replicate
Replicate-weight SE from the re-estimation.
- divergence_ratio
Ratio se_replicate / se_taylor. NA when se_taylor == 0.
- diverges_flag
Logical. TRUE when |divergence_ratio - 1| > divergence_threshold.
Group columns (if any) are preserved. print() returns the full tibble
invisibly. Use as.data.frame() or standard tibble methods for further
processing.
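As a rough illustration of that further processing, the sketch below pulls out the rows flagged as divergent. It uses a hand-built stand-in data frame rather than a real creel_variance_comparison object: the day_type group column and every number in it are made up for illustration, and only the four documented column names are taken from this page.

```r
# Stand-in for as.data.frame(cmp); values are invented for illustration only.
cmp_df <- data.frame(
  day_type         = c("weekday", "weekend"),  # hypothetical group column
  se_taylor        = c(0.114, 0.200),
  se_replicate     = c(0.115, 0.250),
  divergence_ratio = c(1.01, 1.25),
  diverges_flag    = c(FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Keep only the rows whose SEs diverge beyond the threshold.
# which() drops NA flags instead of returning NA-filled rows.
flagged <- cmp_df[which(cmp_df$diverges_flag), , drop = FALSE]

# Largest relative disagreement first.
flagged <- flagged[order(-abs(flagged$divergence_ratio - 1)), , drop = FALSE]
print(flagged)
```

With a real result, the same subsetting should apply after as.data.frame(cmp), assuming the columns are named as documented above.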
Method
The function extracts the Taylor SE from x$estimates$se, then calls
the same estimator that produced x (resolved via x$method)
with variance = replicate_method. The re-estimation uses
x$design and the grouping variables from x$by_vars.
Divergence is computed as:
$$\text{ratio} = se_{\text{replicate}} / se_{\text{taylor}}$$
$$\text{diverges} = |\text{ratio} - 1| > \text{threshold}$$
A ratio substantially different from 1 indicates that the Taylor approximation may be unreliable for this design (e.g., sparse strata, non-linear estimator). Replication-based variance is generally more robust but slower to compute.
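The divergence rule can be sketched in a few lines of base R. This is not the package's internal code: divergence_check() is a hypothetical helper written for this illustration, and it assumes the two SE vectors are already aligned row by row. Letting the flag stay NA when the ratio is NA is also an assumption of the sketch.

```r
# Hypothetical helper (not part of tidycreel): applies the ratio and
# threshold rule described above to two aligned SE vectors.
divergence_check <- function(se_taylor, se_replicate, threshold = 0.1) {
  stopifnot(length(se_taylor) == length(se_replicate))
  # The ratio is undefined when the Taylor SE is zero, so return NA there.
  ratio <- ifelse(se_taylor == 0, NA_real_, se_replicate / se_taylor)
  data.frame(
    se_taylor        = se_taylor,
    se_replicate     = se_replicate,
    divergence_ratio = ratio,
    # NA ratios give NA flags in this sketch.
    diverges_flag    = abs(ratio - 1) > threshold
  )
}

# First pair matches the SEs printed in the Examples; the others are made up.
res <- divergence_check(
  se_taylor    = c(0.114, 0.20, 0),
  se_replicate = c(0.115, 0.25, 0.1)
)
print(res)
```

The first row stays within the default 10% threshold, while the second (a 25% difference) is flagged.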
References
Wolter, K.M. 2007. Introduction to Variance Estimation, 2nd ed. Springer.
Lumley, T. 2010. Complex Surveys: A Guide to Analysis Using R. Wiley.
See also
Other "Reporting & Diagnostics":
adjust_nonresponse(),
check_completeness(),
flag_outliers(),
season_summary(),
standardize_species(),
summarize_by_angler_type(),
summarize_by_day_type(),
summarize_by_method(),
summarize_by_species_sought(),
summarize_by_trip_length(),
summarize_cws_rates(),
summarize_hws_rates(),
summarize_length_freq(),
summarize_refusals(),
summarize_successful_parties(),
summarize_trips(),
summary.creel_estimates(),
validate_creel_data(),
validate_design(),
validate_incomplete_trips(),
validation_report(),
write_estimates()
Examples
data("example_counts", package = "tidycreel")
data("example_interviews", package = "tidycreel")
cal <- unique(example_counts[, c("date", "day_type")])
design <- creel_design(cal, date = date, strata = day_type)
design <- suppressWarnings(add_counts(design, example_counts))
design <- suppressWarnings(add_interviews(
  design, example_interviews,
  catch = catch_total, effort = hours_fished,
  trip_status = trip_status, trip_duration = trip_duration
))
#> ℹ No `n_anglers` provided — assuming 1 angler per interview.
#> ℹ Pass `n_anglers = <column>` to use actual party sizes for angler-hour
#> normalization.
#> ℹ Added 22 interviews: 17 complete (77%), 5 incomplete (23%)
taylor_est <- suppressWarnings(estimate_catch_rate(design))
#> ℹ Using complete trips for CPUE estimation
#> (n=17, 77.3% of 22 interviews) [default]
cmp <- suppressWarnings(compare_variance(taylor_est))
#> ℹ Using complete trips for CPUE estimation
#> (n=17, 100% of 17 interviews) [default]
print(cmp)
#>
#> ── Variance Comparison: Taylor vs. bootstrap ───────────────────────────────────
#> Divergence threshold: 10%
#> ✔ All rows within threshold.
#>
#> # A tibble: 1 × 4
#>   se_taylor se_replicate divergence_ratio diverges_flag
#>       <dbl>        <dbl>            <dbl> <lgl>
#> 1     0.114        0.115             1.01 FALSE