
Spatially Stratified Estimation with Sections
Introduction
A spatially stratified creel survey divides the lake into discrete geographic sections — for example, North, Central, and South — each managed as a separate reporting unit. Each section has its own observed count data and its own set of angler interviews, so effort levels and catch rates can differ materially between sections.
add_sections() is the right tool when the survey design
explicitly stratifies by section: when different field crews
patrol different areas, when section-level estimates are required in the
final report, or when the lake is large enough that assuming uniform
effort across the full water body would be misleading. If you are simply
curious about spatial patterns but did not design the survey with
sections in mind, use the standard creel_design() workflow
without add_sections().
Building a Sectioned Design
library(tidycreel)
data(example_sections_calendar)
data(example_sections_counts)
data(example_sections_interviews)
# One row per section to register with the design
sections_df <- data.frame(
  section = c("North", "Central", "South"),
  stringsAsFactors = FALSE
)
design <- creel_design(example_sections_calendar, date = date, strata = day_type)
design <- add_sections(design, sections_df, section_col = section)
design <- add_counts(design, example_sections_counts)
design <- add_interviews(
design, example_sections_interviews,
catch = catch_total,
effort = hours_fished,
harvest = catch_kept,
trip_status = trip_status,
trip_duration = trip_duration
)
#> ℹ No `n_anglers` provided — assuming 1 angler per interview.
#> ℹ Pass `n_anglers = <column>` to use actual party sizes for angler-hour
#> normalization.
#> ℹ Added 27 interviews: 27 complete (100%), 0 incomplete (0%)
print(design)
#>
#> ── Creel Survey Design ─────────────────────────────────────────────────────────
#> Type: "instantaneous"
#> Date column: date
#> Strata: day_type
#> Calendar: 12 days (2024-06-03 to 2024-06-21)
#> day_type: 2 levels
#> Counts: 36 observations
#> PSU column: date
#> Count type: "instantaneous"
#> Survey: <survey.design2> (constructed)
#> Interviews: 27 observations
#> Type: "access"
#> Catch: catch_total
#> Effort: hours_fished
#> Harvest: catch_kept
#> Trip status: 27 complete, 0 incomplete
#> Survey: <survey.design2> (constructed)
#> Sections: 3 registered
#> North
#> Central
#> South
The design now shows three registered sections (North, Central, South). All downstream estimators automatically detect sections and return per-section output.
Per-Section Effort Estimation
estimate_effort() detects the registered sections and
returns one row per section plus a .lake_total row that
aggregates across sections. The lake-total standard error accounts for
cross-section covariance (described in the Variance Aggregation section
below).
effort_est <- estimate_effort(design)
print(effort_est$estimates)
#> # A tibble: 4 × 10
#> section estimate se se_between se_within ci_lower ci_upper n
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 North 269 12.3 12.3 0 242. 296. 12
#> 2 Central 472 19.0 19.0 0 430. 514. 12
#> 3 South 105 9.18 9.18 0 84.6 125. 12
#> 4 .lake_total 846 39.4 NA NA 758. 934. 36
#> # ℹ 2 more variables: prop_of_lake_total <dbl>, data_available <lgl>
Central has the highest fishing effort of the three sections (472
angler-hours), while South has the lowest (105). The prop_of_lake_total
column shows each section's share of total angler-hours for the survey
period.
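As a quick check on how the .lake_total row and the prop_of_lake_total column relate to the section rows, the base-R sketch below re-derives them from the printed estimates. It assumes, without confirmation from the package documentation, that the lake-total effort is the plain sum of the section estimates and that prop_of_lake_total is each section's estimate divided by that sum. The numbers are copied from the output above.

```r
# Section effort estimates (angler-hours) copied from the estimate_effort() output
effort   <- c(North = 269, Central = 472, South = 105)
n_counts <- c(North = 12L, Central = 12L, South = 12L)

# Assumption: the lake total is the sum of the section estimates
lake_total <- sum(effort)
lake_total                     # 846, matching the .lake_total estimate
sum(n_counts)                  # 36, matching the .lake_total n

# Assumption: prop_of_lake_total = section estimate / lake total
prop_of_lake_total <- effort / lake_total
round(prop_of_lake_total, 3)   # North 0.318, Central 0.558, South 0.124
```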
Per-Section Catch Rate
Catch rate (CPUE, fish per angler-hour) is a ratio estimator. Ratios
are not additive across sections: you cannot average North’s rate and
South’s rate to produce a valid lake-wide CPUE. For this reason,
estimate_catch_rate() on a sectioned design returns one row
per section with no .lake_total row.
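A small hypothetical example shows why averaging section rates fails. The catch and hour totals below are invented for illustration and are not taken from the example data. A ratio estimate for the whole lake divides total catch by total effort, which equals an hours-weighted mean of the section rates, not their simple mean.

```r
# Hypothetical interview totals for two sections (illustration only)
catch <- c(North = 10, South = 30)   # fish caught
hours <- c(North = 20, South = 10)   # angler-hours fished

section_cpue <- catch / hours        # North 0.5, South 3.0 fish per angler-hour

naive_mean  <- mean(section_cpue)       # 1.75: weights both sections equally
pooled_cpue <- sum(catch) / sum(hours)  # 40 / 30 = 1.333: ratio of totals

# The pooled ratio equals the hours-weighted mean of the section rates
weighted_mean <- weighted.mean(section_cpue, w = hours)
all.equal(pooled_cpue, weighted_mean)   # TRUE
```

The simple mean overstates the lake-wide rate here because the high-rate section contributed only a third of the fishing hours.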
cpue_est <- estimate_catch_rate(design)
#> ℹ Using complete trips for CPUE estimation
#> (n=27, 100% of 27 interviews) [default]
print(cpue_est$estimates)
#> # A tibble: 3 × 7
#> section estimate se ci_lower ci_upper n data_available
#> <chr> <dbl> <dbl> <dbl> <dbl> <int> <lgl>
#> 1 North 1.06 0.0710 0.922 1.20 9 TRUE
#> 2 Central 1.51 0.0179 1.47 1.54 9 TRUE
#> 3 South 2.45 0.0287 2.39 2.51 9 TRUE
Notice there is no .lake_total row. To estimate the lake-wide catch rate, call estimate_catch_rate() on a design without section registration.
# Lake-wide catch rate: build an unsectioned design using the same data
design_nosections <- creel_design(example_sections_calendar, date = date, strata = day_type)
design_nosections <- add_counts(design_nosections, example_sections_counts)
design_nosections <- add_interviews(
design_nosections, example_sections_interviews,
catch = catch_total, effort = hours_fished,
harvest = catch_kept, trip_status = trip_status, trip_duration = trip_duration
)
#> ℹ No `n_anglers` provided — assuming 1 angler per interview.
#> ℹ Pass `n_anglers = <column>` to use actual party sizes for angler-hour
#> normalization.
#> ℹ Added 27 interviews: 27 complete (100%), 0 incomplete (0%)
estimate_catch_rate(design_nosections)$estimates
#> ℹ Using complete trips for CPUE estimation
#> (n=27, 100% of 27 interviews) [default]
#> # A tibble: 1 × 5
#> estimate se ci_lower ci_upper n
#> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 1.77 0.118 1.54 2.00 27
Per-Section and Lake-Wide Total Catch
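Total catch combines effort and catch rate within each section:
TC_i = E_i × CPUE_i. The lake-wide total is the sum of the section
totals, sum(TC_i), rather than E_total × CPUE_pooled. A pooled catch
rate weights each section by its share of the interview data, not by
its share of estimated fishing effort, so multiplying it by total effort
misstates the total whenever effort and catch rates differ across
sections. With aggregate_sections = TRUE (the default), a
.lake_total row is appended to the output.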
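Using the rounded section estimates printed earlier in this vignette, the arithmetic below contrasts the two approaches. Because the inputs are rounded, the products differ slightly from the estimate_total_catch() output; the point is the size of the gap between sum(TC_i) and E_total × CPUE_pooled.

```r
# Rounded estimates copied from the estimate_effort() and estimate_catch_rate() output
effort      <- c(North = 269, Central = 472, South = 105)    # E_i, angler-hours
cpue        <- c(North = 1.06, Central = 1.51, South = 2.45) # CPUE_i, fish per angler-hour
cpue_pooled <- 1.77                                          # from the unsectioned design

# Correct: sum of per-section totals, TC_i = E_i * CPUE_i
tc <- effort * cpue
sum(tc)                       # about 1255 fish

# Incorrect: total effort times the pooled interview rate
sum(effort) * cpue_pooled     # about 1497 fish

# Effort-weighted catch rate implied by the section totals
sum(tc) / sum(effort)         # about 1.48 fish per angler-hour, not 1.77
```

In this example the pooled shortcut would overstate lake-wide catch by roughly 240 fish, because the interview pool gives more weight to the high-rate South section than its small share of effort justifies.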
catch_est <- estimate_total_catch(design, aggregate_sections = TRUE)
print(catch_est$estimates)
#> # A tibble: 4 × 8
#> section estimate se ci_lower ci_upper n prop_of_lake_total
#> <chr> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
#> 1 North 285. 23.1 240. 331. 9 0.228
#> 2 Central 711. 29.9 653. 770. 9 0.567
#> 3 South 257. 22.7 213. 302. 9 0.205
#> 4 .lake_total 1254. 44.1 1156. 1352. 3 1
#> # ℹ 1 more variable: data_available <lgl>
South has the highest catch rate but the lowest effort, which limits its
contribution to the lake total. Central has a moderate catch rate
combined with the highest effort, making it the largest contributor. The
prop_of_lake_total column gives each section's proportional contribution
(a fraction between 0 and 1) to the lake-wide total catch. Total harvest
follows the same pattern with estimate_total_harvest(), using the harvest
column (catch_kept) supplied to add_interviews().
harvest_est <- estimate_total_harvest(design, aggregate_sections = TRUE)
print(harvest_est$estimates)
#> # A tibble: 4 × 8
#> section estimate se ci_lower ci_upper n prop_of_lake_total
#> <chr> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
#> 1 North 165. 15.8 134. 196. 9 0.204
#> 2 Central 466. 23.8 419. 512. 9 0.576
#> 3 South 178. 15.9 147. 210. 9 0.221
#> 4 .lake_total 809. 32.7 736. 882. 3 1
#> # ℹ 1 more variable: data_available <lgl>
Variance Aggregation: Why method = "correlated" Is the Default
In a standard shared-calendar creel design, the field crew works the entire lake on each survey day. Every section is counted and interviewed on the same calendar days: if a day is sampled, all three sections are observed; if a day is not sampled, none are.
Sharing survey days creates cross-section covariance in the sampling errors. In practice this covariance tends to be positive: on high-effort days all sections run high together, and on low-effort days they run low together. The shared-calendar variance formula accounts for this covariance and produces a wider lake-total standard error than simply adding the section standard errors in quadrature (39.4 versus 24.4 angler-hours in the output below). Ignoring the covariance would overstate the precision of the lake total.
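How covariance changes the lake-total SE follows from the variance of a sum: Var(sum) = sum of section variances + 2 × sum of pairwise covariances. The sketch below assumes, purely for illustration, a single common correlation rho between every pair of sections; it is not a description of how tidycreel's correlated method computes the SE. With rho = 1 the lake-total SE reaches its upper bound (the plain sum of the section SEs), and solving for the rho that reproduces the printed correlated SE of 39.4 shows how strongly the sections move together in this example.

```r
# Section-level effort SEs copied from the estimate_effort() output
se <- c(North = 12.3, Central = 19.0, South = 9.18)

# SE of the sum of section estimates when every pair of sections shares one
# (hypothetical) common correlation rho. Var(sum) is the sum of all entries
# of the covariance matrix V, where V[i, j] = rho * se[i] * se[j] off the diagonal.
lake_se <- function(se, rho) {
  R <- matrix(rho, nrow = length(se), ncol = length(se))
  diag(R) <- 1
  V <- outer(se, se) * R
  sqrt(sum(V))
}

# Upper bound: perfectly correlated sections, SE equals sum(se)
se_max <- lake_se(se, rho = 1)          # 40.48

# Common correlation that reproduces the printed correlated SE of 39.4
rho_implied <- uniroot(function(r) lake_se(se, r) - 39.4, interval = c(0, 1))$root
round(rho_implied, 2)                   # about 0.92
```

Under this simplifying assumption the implied correlation is close to 1, which is consistent with a shared calendar where busy days are busy in every section.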
method = "independent" is appropriate only when sections
are surveyed by separate, uncoordinated crews, for example when Crew A
works the South section and Crew B works the North section on different
days. In that case the sections have truly independent sampling
errors and Cochran (1977, §5.2) additivity applies:
SE_total = sqrt(sum(SE_h^2)).
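The independent-crew formula can be checked against the method = "independent" output. The sketch below squares and sums the section SEs, then builds 95% intervals for both methods. The interval step is an assumption: a t quantile with 10 degrees of freedom (12 sampled days minus 2 day_type strata) reproduces the printed bounds, but that degrees-of-freedom rule is inferred from the output rather than taken from the tidycreel documentation.

```r
se <- c(North = 12.3, Central = 19.0, South = 9.18)  # section SEs from the output
lake_estimate <- 846                                 # .lake_total effort estimate

# Cochran-style additivity for independent strata: SE_total = sqrt(sum(SE_h^2))
se_independent <- sqrt(sum(se^2))                    # about 24.4

# Assumed t-based 95% intervals with 10 degrees of freedom
t_crit <- qt(0.975, df = 10)
ci_independent <- lake_estimate + c(-1, 1) * t_crit * se_independent
ci_correlated  <- lake_estimate + c(-1, 1) * t_crit * 39.4  # printed correlated SE

round(ci_independent)   # 792 900
round(ci_correlated)    # 758 934
```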
Rule of thumb: If your crew drives the entire lake
on each survey day, use the default method = "correlated".
If different sections have separate, non-overlapping calendars, use
method = "independent".
# Default: accounts for cross-section covariance (shared-calendar designs)
estimate_effort(design, method = "correlated")$estimates
#> # A tibble: 4 × 10
#> section estimate se se_between se_within ci_lower ci_upper n
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 North 269 12.3 12.3 0 242. 296. 12
#> 2 Central 472 19.0 19.0 0 430. 514. 12
#> 3 South 105 9.18 9.18 0 84.6 125. 12
#> 4 .lake_total 846 39.4 NA NA 758. 934. 36
#> # ℹ 2 more variables: prop_of_lake_total <dbl>, data_available <lgl>
# Alternative: Cochran (1977, §5.2) additivity (independent crews, non-overlapping calendars)
estimate_effort(design, method = "independent")$estimates
#> # A tibble: 4 × 10
#> section estimate se se_between se_within ci_lower ci_upper n
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 North 269 12.3 12.3 0 242. 296. 12
#> 2 Central 472 19.0 19.0 0 430. 514. 12
#> 3 South 105 9.18 9.18 0 84.6 125. 12
#> 4 .lake_total 846 24.4 NA NA 792. 900. 36
#> # ℹ 2 more variables: prop_of_lake_total <dbl>, data_available <lgl>
Missing Section Warning
If a registered section has no count data on any survey day,
estimate_effort() produces an NA row with
data_available = FALSE and emits a warning. This prevents
silent omission of sections that should have been observed.
# Simulate a section with no count data by dropping every South row
north_central_counts <- example_sections_counts[
  example_sections_counts$section != "South",
]
design_missing <- creel_design(example_sections_calendar, date = date, strata = day_type)
design_missing <- add_sections(design_missing, sections_df, section_col = section)
# Silence any warning from add_counts() so only the estimator's warning is shown
design_missing <- suppressWarnings(add_counts(design_missing, north_central_counts))
estimate_effort(design_missing, missing_sections = "warn")$estimates
#> Warning: 1 missing section(s) in count data.
#> ! Section(s) not found: "South"
#> ℹ Inserting NA row(s) with data_available = FALSE.
#> # A tibble: 4 × 10
#> section estimate se se_between se_within ci_lower ci_upper n
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 North 269 12.3 12.3 0 242. 296. 12
#> 2 Central 472 19.0 19.0 0 430. 514. 12
#> 3 South NA NA NA NA NA NA 0
#> 4 .lake_total 741 31.1 NA NA 672. 810. 24
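#> # ℹ 2 more variables: prop_of_lake_total <dbl>, data_available <lgl>
The .lake_total row now sums only the sections that have data
(741 = 269 + 472 angler-hours from North and Central, with n = 24 counts),
so treat it as a partial total rather than a lake-wide estimate. To turn
the warning into an error (for example, in automated pipelines), use
missing_sections = "error".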
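Before running the estimators, it can help to check which registered sections are absent from the count data. The function below is a hypothetical base-R helper, not part of tidycreel; it mimics the "warn" and "error" choices described above using setdiff() on toy data frames.

```r
# Hypothetical helper (not a tidycreel function): report registered sections
# that never appear in the count data
check_sections <- function(registered, counts, action = c("warn", "error")) {
  action <- match.arg(action)
  missing <- setdiff(registered, unique(counts$section))
  if (length(missing) > 0) {
    msg <- paste("Section(s) not found in count data:",
                 paste(missing, collapse = ", "))
    if (action == "error") stop(msg, call. = FALSE)
    warning(msg, call. = FALSE)
  }
  invisible(missing)
}

# Toy count data with no South rows
counts <- data.frame(
  section = c("North", "Central", "North", "Central"),
  count   = c(14, 22, 11, 25),
  stringsAsFactors = FALSE
)
registered <- c("North", "Central", "South")

check_sections(registered, counts)                # warns about South
try(check_sections(registered, counts, "error"))  # stops instead
```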