flag_outliers() identifies extreme values in a numeric column of a
data frame using Tukey's IQR fence method. Flagged rows are annotated
with is_outlier, outlier_reason, fence_low, and fence_high
columns, and a cli message summarizing how many rows were flagged is emitted.
Arguments
- data
A data.frame containing the column to check.
- col
Bare column name (unquoted) to check for outliers.
- k
Numeric IQR multiplier (default: 1.5). Larger values produce wider fences and fewer flags. Tukey's standard values are 1.5 (mild outliers) and 3.0 (extreme outliers).
- na.rm
Logical. Remove NA values before computing quantiles (default: TRUE).
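To make the effect of k concrete, the base R sketch below computes the fences at both of Tukey's standard multipliers for the effort values used in the Examples section. It relies on stats::quantile() with its default algorithm (type 7); that choice is an assumption here, although it does reproduce the fences shown in the Examples output.

```r
# Effort values taken from the Examples section of this page.
effort <- c(1.0, 1.5, 2.0, 1.8, 1.2, 1.9, 2.1, 15.0)

# Quartiles from base R's default quantile algorithm (type 7).
# Assumption: the package's internal quantile type is not stated on this page.
q <- stats::quantile(effort, probs = c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]

# One column per multiplier; the fences widen linearly as k grows.
fences <- sapply(c(mild = 1.5, extreme = 3.0), function(k) {
  c(fence_low = q[1] - k * iqr, fence_high = q[2] + k * iqr)
})
print(fences)
```

For this vector, k = 1.5 gives fences of 0.525 and 2.925, and k = 3 widens them to roughly -0.375 and 3.825. The value 15.0 lies outside both, so it would be flagged either way.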
Value
The input data with four additional columns appended:
- is_outlier
Logical: TRUE if the row is outside the fence.
- outlier_reason
Character: brief description of why it is flagged (e.g. "above fence_high (23.5)"), or "" if not flagged.
- fence_low
Numeric: lower fence value (same for all rows). NA when n < 4.
- fence_high
Numeric: upper fence value (same for all rows). NA when n < 4.
Details
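Method: Tukey's IQR fence, where Q_1 and Q_3 are the first and third quartiles of col and IQR is the interquartile range.
$$IQR = Q_3 - Q_1$$
$$\text{fence\_low} = Q_1 - k \times IQR$$
$$\text{fence\_high} = Q_3 + k \times IQR$$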
Values below fence_low or above fence_high are flagged. When
n < 4, there is insufficient data to estimate the IQR reliably;
fences are set to NA and no rows are flagged.
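The rule described above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: tukey_fences() and flag_values() are hypothetical helper names, the default quantile type is assumed, n is assumed to count non-missing values, and returning NA fences when NAs remain under na.rm = FALSE is also an assumption.

```r
# Hypothetical helper (not part of the package): Tukey fences for a numeric vector.
tukey_fences <- function(x, k = 1.5, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  # Fewer than four usable values, or NAs still present (assumed behaviour):
  # the IQR cannot be estimated reliably, so both fences are NA.
  if (length(x) < 4 || anyNA(x)) {
    return(c(fence_low = NA_real_, fence_high = NA_real_))
  }
  q <- stats::quantile(x, probs = c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  c(fence_low = q[1] - k * iqr, fence_high = q[2] + k * iqr)
}

# Hypothetical helper: TRUE where a value lies strictly outside the fences.
flag_values <- function(x, k = 1.5, na.rm = TRUE) {
  f <- tukey_fences(x, k = k, na.rm = na.rm)
  # NA fences mean nothing is flagged.
  if (anyNA(f)) return(rep(FALSE, length(x)))
  !is.na(x) & (x < f[["fence_low"]] | x > f[["fence_high"]])
}

effort <- c(1.0, 1.5, 2.0, 1.8, 1.2, 1.9, 2.1, 15.0)
print(tukey_fences(effort))
print(which(flag_values(effort)))
print(flag_values(c(1, 2, 50)))  # n < 4, so no value is flagged
```

Keeping the fence calculation separate from the flagging step mirrors how the fences are reported alongside the flags in the returned columns.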
See also
add_interviews(), estimate_catch_rate()
Other "Reporting & Diagnostics":
adjust_nonresponse(),
check_completeness(),
compare_variance(),
season_summary(),
standardize_species(),
summarize_by_angler_type(),
summarize_by_day_type(),
summarize_by_method(),
summarize_by_species_sought(),
summarize_by_trip_length(),
summarize_cws_rates(),
summarize_hws_rates(),
summarize_length_freq(),
summarize_refusals(),
summarize_successful_parties(),
summarize_trips(),
summary.creel_estimates(),
validate_creel_data(),
validate_design(),
validate_incomplete_trips(),
validation_report(),
write_estimates()
Examples
df <- data.frame(
  interview_id = 1:8,
  effort = c(1.0, 1.5, 2.0, 1.8, 1.2, 1.9, 2.1, 15.0)
)
flag_outliers(df, col = effort)
#> ℹ 1 of 8 values flagged as outliers in effort (k = 1.5, fence: [0.525, 2.925]).
#>   interview_id effort is_outlier           outlier_reason fence_low fence_high
#> 1            1    1.0      FALSE                              0.525      2.925
#> 2            2    1.5      FALSE                              0.525      2.925
#> 3            3    2.0      FALSE                              0.525      2.925
#> 4            4    1.8      FALSE                              0.525      2.925
#> 5            5    1.2      FALSE                              0.525      2.925
#> 6            6    1.9      FALSE                              0.525      2.925
#> 7            7    2.1      FALSE                              0.525      2.925
#> 8            8   15.0       TRUE above fence_high (2.925)     0.525      2.925
