Generated and minified files

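scc can detect files that were auto-generated or minified. Pass gen = TRUE to identify generated files, min = TRUE to identify minified files, or min_gen = TRUE to identify both. The resulting tibble still contains every file; only the generated and minified flags change (flagged rows also get a "(gen)" or "(min)" suffix on the language label, as the table below shows).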

You can filter to the flagged rows to see what was caught:

flagged <- scc_by_file(rlang_path, min_gen = TRUE)
flagged <- flagged[flagged$generated | flagged$minified, ]
flagged[, c("language", "filename", "generated", "minified")] |>
  gt::gt() |>
  gt::tab_header(
    title    = "Files flagged by scc",
    subtitle = "generated and/or minified",
    preheader = "rlang package")
Files flagged by scc
generated and/or minified
language filename generated minified
R (gen) import-standalone-defer.R TRUE FALSE
SVG (min) lifecycle-stable.svg FALSE TRUE
SVG (min) lifecycle-archived.svg FALSE TRUE
SVG (min) lifecycle-deprecated.svg FALSE TRUE
SVG (min) lifecycle-experimental.svg FALSE TRUE
SVG (min) lifecycle-retired.svg FALSE TRUE
SVG (min) lifecycle-superseded.svg FALSE TRUE
SVG (min) lifecycle-defunct.svg FALSE TRUE
SVG (min) lifecycle-maturing.svg FALSE TRUE
SVG (min) lifecycle-soft-deprecated.svg FALSE TRUE
SVG (min) lifecycle-questioning.svg FALSE TRUE

In rlang, scc finds one generated source file (a standalone helper marked "do not edit") plus the SVG lifecycle badges, which scc considers minified because each one is a single very long line.

Excluding generated or minified files

no_gen = TRUE (implies gen = TRUE) drops detected generated files from the totals, no_min = TRUE does the same for minified files, and no_min_gen = TRUE removes both in one pass.

To see the effect, compare the per-language file counts before and after:

baseline <- scc(rlang_path, auto_print_scc = FALSE)
dropped  <- scc(rlang_path, no_min_gen = TRUE, auto_print_scc = FALSE)
merge(
  baseline[, c("language", "files")],
  dropped[,  c("language", "files")],
  by = "language", suffixes = c("_before", "_after"), all = TRUE
) |>
  gt::gt() |>
  gt::tab_header(
    title    = "File counts before vs after no_min_gen",
    subtitle = "any difference = files excluded by gen/min detection",
    preheader = "rlang package")
File counts before vs after no_min_gen
any difference = files excluded by gen/min detection
language files_before files_after
C 69 69
C Header 70 70
C++ 2 2
C++ Header 1 1
License 1 1
Makefile 1 1
Markdown 52 52
R 156 155
SVG 10 NA
TOML 1 1
YAML 8 8

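The SVG count drops from 10 to none (the lifecycle badges); it shows as NA because SVG no longer appears in the filtered result and merge(all = TRUE) fills the gap. R drops by 1 (the standalone helper). The other languages are unchanged because nothing else triggered detection.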
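The comparison can be made explicit by turning the missing count into 0 and adding a difference column. This is a minimal base-R sketch that uses three rows copied from the table above instead of calling scc(); the excluded column is a name introduced here for illustration.

```r
# Three of the eleven rows from the table above, typed in by hand.
baseline <- data.frame(language = c("R", "SVG", "YAML"),
                       files    = c(156L, 10L, 8L))
dropped  <- data.frame(language = c("R", "YAML"),
                       files    = c(155L, 8L))

cmp <- merge(baseline, dropped, by = "language",
             suffixes = c("_before", "_after"), all = TRUE)

# A language absent from `dropped` had every file excluded: treat NA as 0.
cmp$files_after[is.na(cmp$files_after)] <- 0L

# `excluded` is an illustrative column, not something scc() returns.
cmp$excluded <- cmp$files_before - cmp$files_after
cmp
```

With the full tibbles the same three lines after merge() apply unchanged, and filtering on cmp$excluded > 0 lists only the affected languages.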

Custom generation markers

By default scc looks for "do not edit" and "<auto-generated />" in the first lines of a file. In an R package the most common machine-written files are the .Rd files in man/ (every one starts with % Generated by roxygen2: do not edit by hand).

scc doesn’t recognize the .Rd extension out of the box. We can use the count_as argument to teach it that .Rd files should be counted as Markdown and then point generated_markers at roxygen2’s signature line:

roxygen <- scc_by_file(rlang_path,
                       count_as          = "rd:Markdown",
                       gen               = TRUE,
                       generated_markers = "Generated by roxygen2")
roxygen <- roxygen[roxygen$generated, c("language", "filename", "generated")]
cat("rlang .Rd files flagged as generated:", nrow(roxygen), "\n")
#  rlang .Rd files flagged as generated: 217
head(roxygen, 5) |>
  gt::gt() |>
  gt::tab_header(
    title    = "Custom marker: roxygen2-generated docs",
    subtitle = '`generated_markers = "Generated by roxygen2"`',
    preheader = "rlang package")
Custom marker: roxygen2-generated docs
`generated_markers = "Generated by roxygen2"`
language filename generated
Markdown (gen) set_names.Rd TRUE
Markdown (gen) cnd_muffle.Rd TRUE
Markdown (gen) arg_match.Rd TRUE
Markdown (gen) env_print.Rd TRUE
Markdown (gen) englue.Rd TRUE
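To make the idea of a generation marker concrete, here is a rough base-R sketch of the check: look for a marker string near the top of a file. The helper has_marker() and its n = 5 line window are assumptions made for illustration; how far into a file scc actually looks, and whether its match is case-sensitive, is not stated here.

```r
# Hypothetical helper: TRUE if any marker occurs in the first `n` lines.
# The window of 5 lines is an assumption, not scc's documented behaviour.
has_marker <- function(path, markers, n = 5L) {
  head_lines <- readLines(path, n = n, warn = FALSE)
  any(vapply(markers,
             function(m) any(grepl(m, head_lines, fixed = TRUE)),
             logical(1)))
}

# A roxygen2-style .Rd file and a hand-written R file, both temporary.
rd <- tempfile(fileext = ".Rd")
writeLines(c("% Generated by roxygen2: do not edit by hand",
             "\\name{set_names}"), rd)
hand <- tempfile(fileext = ".R")
writeLines(c("# hand-written helper", "f <- function(x) x"), hand)

has_marker(rd,   "Generated by roxygen2")  # TRUE
has_marker(hand, "Generated by roxygen2")  # FALSE
```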

Minified-line threshold

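A file is considered minified when its average number of bytes per line exceeds a threshold (default 255). Lower the threshold with min_gen_line_length to catch less aggressively minified files. The example uses a deliberately extreme value of 10, at which almost every file in rlang counts as minified and only the empty TOML file remains: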

scc(rlang_path, no_min = TRUE, min_gen_line_length = 10L,
    auto_print_scc = FALSE) |>
   gt::gt() |>
    gt::tab_header(
    title = "Minified-line threshold",
    subtitle = "threshold lowered to 10 bytes/line",
    preheader = "rlang package")
Minified-line threshold
threshold lowered to 10 bytes/line
language files lines code comments blanks complexity weighted_complexity bytes uloc
TOML 1 0 0 0 0 0 0 0 1
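The bytes-per-line rule is easy to reproduce by hand. The sketch below is an approximation written in base R, with avg_bytes_per_line() as a hypothetical helper; scc's own byte and line counting may differ in detail (for example around trailing newlines).

```r
# Hypothetical helper: file size divided by number of lines.
avg_bytes_per_line <- function(path) {
  n_lines <- length(readLines(path, warn = FALSE))
  if (n_lines == 0L) return(0)
  file.size(path) / n_lines
}

minified <- tempfile(fileext = ".svg")
writeLines(strrep("x", 1000), minified)              # one 1000-character line

regular <- tempfile(fileext = ".R")
writeLines(rep("x <- some_function(1)", 20), regular)  # twenty short lines

avg_bytes_per_line(minified) > 255  # TRUE: flagged at the default threshold
avg_bytes_per_line(regular)  > 255  # FALSE: ordinary code passes
avg_bytes_per_line(regular)  > 10   # TRUE: a threshold of 10 flags it too
```

The last line shows why a threshold of 10 removes nearly everything: ordinary source lines are usually longer than 10 bytes.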

Duplicate detection

no_duplicates = TRUE removes files whose content is identical to another file already counted, keeping only one copy in the totals.

scc(rlang_path, no_duplicates = TRUE, auto_print_scc = FALSE) |>
  gt::gt() |>
  gt::tab_header(
    title    = "Duplicate-file detection",
    subtitle = "no_duplicates = TRUE",
    preheader = "rlang package")
Duplicate-file detection
no_duplicates = TRUE
language files lines code comments blanks complexity weighted_complexity bytes uloc
R 156 43171 27043 10926 5202 2239 8.279407 1152271 23250
Markdown 51 13411 11320 0 2091 0 0.000000 425800 4751
C 69 13173 10016 864 2293 1827 18.240815 348962 6679
C Header 70 8272 5322 1750 1200 677 12.720782 265714 4689
YAML 8 635 505 26 104 0 0.000000 15354 399
C++ 2 25 21 0 4 0 0.000000 492 17
C++ Header 1 26 21 0 5 0 0.000000 429 21
SVG 10 10 10 0 0 0 0.000000 9687 10
Makefile 1 11 7 0 4 2 28.571429 184 8
License 1 2 2 0 0 0 0.000000 43 2
TOML 1 0 0 0 0 0 0.000000 0 1
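In the table above, Markdown falls from 52 files to 51, so one Markdown file is a byte-for-byte copy of another. Content-based de-duplication can be sketched in base R with tools::md5sum(); MD5 is used here only because it ships with R, and it is not a claim about which hash scc uses.

```r
# Three temporary files, two of them with identical content.
dir <- tempfile("dupes")
dir.create(dir)
writeLines("same content",  file.path(dir, "a.md"))
writeLines("same content",  file.path(dir, "b.md"))
writeLines("other content", file.path(dir, "c.md"))

files  <- list.files(dir, full.names = TRUE)
hashes <- tools::md5sum(files)

# duplicated() marks the second and later copies, so the first one is kept.
kept <- basename(files[!duplicated(hashes)])
kept
# "a.md" "c.md"
```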

Excluding files and directories

By directory name

exclude_dir takes a character vector of directory names (not full paths) to skip entirely. The scc defaults (.git, .hg, .svn) are always applied; this extends them.

scc(rlang_path, exclude_dir = c("tests", "src"), auto_print_scc = FALSE) |>
  gt::gt() |>
  gt::tab_header(
    title    = "Exclude directories",
    subtitle = 'exclude_dir = c("tests", "src")',
    preheader = "rlang package")
Exclude directories
exclude_dir = c("tests", "src")
language files lines code comments blanks complexity weighted_complexity bytes uloc
R 72 25574 12552 10678 2344 1980 15.77438 676971 14079
Markdown 12 5089 3628 0 1461 0 0.00000 181846 2596
YAML 8 635 505 26 104 0 0.00000 15354 399
SVG 10 10 10 0 0 0 0.00000 9687 10
License 1 2 2 0 0 0 0.00000 43 2
TOML 1 0 0 0 0 0 0.00000 0 1

By file name

exclude_file skips files whose name (not full path) matches any element:

scc(rlang_path, exclude_file = c("LICENSE", "testthat.R"),
    auto_print_scc = FALSE) |>
  gt::gt() |>
  gt::tab_header(
    title    = "Exclude file names",
    subtitle = 'exclude_file = c("LICENSE", "testthat.R")',
    preheader = "rlang package")
Exclude file names
exclude_file = c("LICENSE", "testthat.R")
language files lines code comments blanks complexity weighted_complexity bytes uloc
R 154 43160 27036 10925 5199 2239 8.281551 1152064 23243
Markdown 51 13416 11327 0 2089 0 0.000000 425484 4738
C 69 13173 10016 864 2293 1827 18.240815 348962 6679
C Header 70 8272 5322 1750 1200 677 12.720782 265714 4689
YAML 8 635 505 26 104 0 0.000000 15354 399
C++ 2 25 21 0 4 0 0.000000 492 17
C++ Header 1 26 21 0 5 0 0.000000 429 21
SVG 10 10 10 0 0 0 0.000000 9687 10
Makefile 1 11 7 0 4 2 28.571429 184 8
TOML 1 0 0 0 0 0 0.000000 0 1

By path pattern

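not_match accepts a character vector of regex patterns. Any file whose full path matches is excluded. Each element becomes a separate --not-match flag. In rlang neither pattern below matches any path, so the table is identical to the unfiltered totals.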

scc(rlang_path, not_match = c("test_", "_generated"),
    auto_print_scc = FALSE) |>
  gt::gt() |>
  gt::tab_header(
    title    = "Exclude by path pattern",
    subtitle = 'not_match = c("test_", "_generated")',
    preheader = "rlang package")
Exclude by path pattern
not_match = c("test_", "_generated")
language files lines code comments blanks complexity weighted_complexity bytes uloc
R 156 43171 27043 10926 5202 2239 8.279407 1152271 23250
Markdown 52 13437 11344 0 2093 0 0.000000 426556 4755
C 69 13173 10016 864 2293 1827 18.240815 348962 6679
C Header 70 8272 5322 1750 1200 677 12.720782 265714 4689
YAML 8 635 505 26 104 0 0.000000 15354 399
C++ 2 25 21 0 4 0 0.000000 492 17
C++ Header 1 26 21 0 5 0 0.000000 429 21
SVG 10 10 10 0 0 0 0.000000 9687 10
Makefile 1 11 7 0 4 2 28.571429 184 8
License 1 2 2 0 0 0 0.000000 43 2
TOML 1 0 0 0 0 0 0.000000 0 1
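The three exclusion styles differ in which part of the path they test. The base-R sketch below applies each rule to a handful of made-up paths (they are not rlang's real layout); in_dir() is a hypothetical helper, and R's regex dialect is only an approximation of the Go regex syntax that scc uses.

```r
# Made-up relative paths for illustration only.
paths <- c("R/attr.R", "tests/testthat.R", "tests/testthat/test_attr.R",
           "src/rlang.c", "LICENSE")

# exclude_dir: drop a path when any directory component equals a given name.
in_dir <- function(p, dirs) {
  vapply(strsplit(dirname(p), "/", fixed = TRUE),
         function(parts) any(parts %in% dirs), logical(1))
}
by_dir <- paths[!in_dir(paths, c("tests", "src"))]

# exclude_file: compare the file name only, never the directory part.
by_file <- paths[!basename(paths) %in% c("LICENSE", "testthat.R")]

# not_match: drop a path when any regex matches the full path.
patterns <- c("test_", "_generated")
hit <- Reduce(`|`, lapply(patterns, grepl, x = paths))
by_pattern <- paths[!hit]

by_dir      # "R/attr.R" "LICENSE"
by_file     # "R/attr.R" "tests/testthat/test_attr.R" "src/rlang.c"
by_pattern  # everything except "tests/testthat/test_attr.R"
```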

By default scc skips symbolic links. Set include_symlinks = TRUE to count them (no effect on Windows).

scc(rlang_path, include_symlinks = TRUE, auto_print_scc = FALSE) |>
  gt::gt() |>
  gt::tab_header(
    title    = "Counting symlinked files",
    subtitle = "include_symlinks = TRUE",
    preheader = "rlang package")
Counting symlinked files
include_symlinks = TRUE
language files lines code comments blanks complexity weighted_complexity bytes uloc
R 156 43171 27043 10926 5202 2239 8.279407 1152271 23250
Markdown 52 13437 11344 0 2093 0 0.000000 426556 4755
C 69 13173 10016 864 2293 1827 18.240815 348962 6679
C Header 70 8272 5322 1750 1200 677 12.720782 265714 4689
YAML 8 635 505 26 104 0 0.000000 15354 399
C++ 2 25 21 0 4 0 0.000000 492 17
C++ Header 1 26 21 0 5 0 0.000000 429 21
SVG 10 10 10 0 0 0 0.000000 9687 10
Makefile 1 11 7 0 4 2 28.571429 184 8
License 1 2 2 0 0 0 0.000000 43 2
TOML 1 0 0 0 0 0 0.000000 0 1

Ignore files

scc respects .gitignore, .ignore, .gitmodules, and .sccignore by default. Use the corresponding no_* flags to disable each:

scc(rlang_path, no_gitignore = TRUE, auto_print_scc = FALSE) |>
  gt::gt() |>
  gt::tab_header(
    title    = "Disabling .gitignore rules",
    subtitle = "no_gitignore = TRUE",
    preheader = "rlang package")
Disabling .gitignore rules
no_gitignore = TRUE
language files lines code comments blanks complexity weighted_complexity bytes uloc
R 156 43171 27043 10926 5202 2239 8.279407 1152271 23250
Markdown 52 13437 11344 0 2093 0 0.000000 426556 4755
C 69 13173 10016 864 2293 1827 18.240815 348962 6679
C Header 70 8272 5322 1750 1200 677 12.720782 265714 4689
YAML 8 635 505 26 104 0 0.000000 15354 399
C++ 2 25 21 0 4 0 0.000000 492 17
C++ Header 1 26 21 0 5 0 0.000000 429 21
SVG 10 10 10 0 0 0 0.000000 9687 10
Makefile 1 11 7 0 4 2 28.571429 184 8
License 1 2 2 0 0 0 0.000000 43 2
TOML 1 0 0 0 0 0 0.000000 0 1
scc(rlang_path, no_ignore = TRUE)      # do not apply .ignore rules
scc(rlang_path, no_gitmodule = TRUE)   # do not apply .gitmodules rules
scc(rlang_path, no_scc_ignore = TRUE)  # do not apply .sccignore rules

count_ignore = TRUE also counts the ignore files themselves as source:

scc(rlang_path, count_ignore = TRUE, auto_print_scc = FALSE) |>
  gt::gt() |>
  gt::tab_header(
    title    = "Counting ignore files as source",
    subtitle = "count_ignore = TRUE",
    preheader = "rlang package")
Counting ignore files as source
count_ignore = TRUE
language files lines code comments blanks complexity weighted_complexity bytes uloc
R 156 43171 27043 10926 5202 2239 8.279407 1152271 23250
Markdown 52 13437 11344 0 2093 0 0.000000 426556 4755
C 69 13173 10016 864 2293 1827 18.240815 348962 6679
C Header 70 8272 5322 1750 1200 677 12.720782 265714 4689
YAML 8 635 505 26 104 0 0.000000 15354 399
C++ 2 25 21 0 4 0 0.000000 492 17
C++ Header 1 26 21 0 5 0 0.000000 429 21
gitignore 4 20 20 0 0 0 0.000000 200 20
SVG 10 10 10 0 0 0 0.000000 9687 10
Makefile 1 11 7 0 4 2 28.571429 184 8
License 1 2 2 0 0 0 0.000000 43 2
TOML 1 0 0 0 0 0 0.000000 0 1

Large-file thresholds

no_large = TRUE drops files that exceed a byte or line-count threshold. The defaults are 1 000 000 bytes and 40 000 lines; override with large_byte_count and large_line_count.

# skip files larger than 50 000 bytes or 1 000 lines
scc(rlang_path, no_large = TRUE,
    large_byte_count = 50000L, large_line_count = 1000L,
    auto_print_scc   = FALSE) |>
  gt::gt() |>
  gt::tab_header(
    title    = "Large-file thresholds",
    subtitle = "large_byte_count = 50,000; large_line_count = 1,000",
    preheader = "rlang package")
Large-file thresholds
large_byte_count = 50,000; large_line_count = 1,000
language files lines code comments blanks complexity weighted_complexity bytes uloc
R 149 34624 21753 8670 4201 1667 7.663311 926757 18683
C 67 10934 8200 781 1953 1586 19.341463 289756 5568
Markdown 48 6176 5418 0 758 0 0.000000 167584 1902
C Header 69 3506 2664 162 680 203 7.620120 80905 1812
YAML 8 635 505 26 104 0 0.000000 15354 399
C++ 2 25 21 0 4 0 0.000000 492 17
C++ Header 1 26 21 0 5 0 0.000000 429 21
SVG 10 10 10 0 0 0 0.000000 9687 10
Makefile 1 11 7 0 4 2 28.571429 184 8
License 1 2 2 0 0 0 0.000000 43 2
TOML 1 0 0 0 0 0 0.000000 0 1
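The large-file rule combines two independent limits, and exceeding either one is enough. A minimal base-R sketch, with is_large() as a hypothetical helper whose defaults mirror the values quoted above:

```r
# Hypothetical helper: TRUE when a file exceeds the byte OR the line limit.
is_large <- function(path, max_bytes = 1000000, max_lines = 40000) {
  n_lines <- length(readLines(path, warn = FALSE))
  file.size(path) > max_bytes || n_lines > max_lines
}

small <- tempfile()
writeLines(rep("x", 10), small)     # 10 lines
long <- tempfile()
writeLines(rep("x", 1500), long)    # 1500 very short lines

is_large(long)                                        # FALSE with the defaults
is_large(long,  max_bytes = 50000, max_lines = 1000)  # TRUE: 1500 lines > 1000
is_large(small, max_bytes = 50000, max_lines = 1000)  # FALSE
```

The second call shows that a file can be tiny in bytes and still be dropped because of its line count.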

Analysing multiple paths

Both scc() and scc_by_file() accept a character vector of paths, so you can compare directories side by side:

scc(c(file.path(rlang_path, "R"), file.path(rlang_path, "tests")),
    auto_print_scc = FALSE) |>
  gt::gt() |>
  gt::tab_header(
    title    = "Analysing multiple paths",
    subtitle = "rlang/R + rlang/tests combined",
    preheader = "rlang package")
Analysing multiple paths
rlang/R + rlang/tests combined
language files lines code comments blanks complexity weighted_complexity bytes uloc
R 155 42899 26993 10745 5161 2239 8.294743 1144171 23115
Markdown 40 8348 7716 0 632 0 0.000000 244710 2162
C 2 41 33 0 8 0 0.000000 1259 28
Makefile 1 11 7 0 4 2 28.571429 184 8

Output display flags

These flags exist on the scc CLI for shaping the tabular console output. glockr always requests JSON, and scc’s JSON formatter ignores them. They are accepted purely as passthrough and never change the tibble’s shape or values. They are listed here for completeness and so that the function signatures stay symmetric with scc.

Flag               Default      Effect on scc’s tabular display
percent = TRUE     FALSE        Adds percentage columns
size_unit          NULL ("si")  Size unit: "si", "binary", "mixed", or "xkcd-*"
no_hborder = TRUE  FALSE        Removes horizontal borders
no_size = TRUE     FALSE        Hides the size-calculation summary line
character = TRUE   FALSE        Adds max/mean characters-per-line columns
verbose = TRUE     FALSE        Verbose processing log (stderr)
debug = TRUE       FALSE        Full debug log (stdout; parse_scc_json() strips DEBUG/VERBOSE lines before parsing)

Passing any combination still returns the standard tibble:

ident <- identical(
  scc(rlang_path, auto_print_scc = FALSE),
  scc(rlang_path,
      percent    = TRUE,
      size_unit  = "binary",
      no_hborder = TRUE,
      no_size    = TRUE,
      character  = TRUE,
      auto_print_scc = FALSE)
)
cat("display-flag combination yields the same tibble:", ident, "\n")
#  display-flag combination yields the same tibble: TRUE

Performance tuning

On very large repositories you can control the number of worker goroutines and queue sizes used by scc. The defaults work well in most cases; adjust only when profiling shows a bottleneck.

Parameter                     scc default  Role
directory_walker_job_workers  8            Workers scanning directories
file_process_job_workers      12           Workers collecting file stats
file_gc_count                 10000        Files parsed before GC runs
file_list_queue_size          12           Queue for discovered files
file_summary_job_queue_size   12           Queue for processed-file stats

scc(rlang_path,
    directory_walker_job_workers = 4L,
    file_process_job_workers     = 4L,
    file_gc_count                = 5000L,
    auto_print_scc               = FALSE) |>
  gt::gt() |>
  gt::tab_header(
    title    = "Performance-tuning passthrough",
    subtitle = "workers = 4; file_gc_count = 5,000",
    preheader = "rlang package")
Performance-tuning passthrough
workers = 4; file_gc_count = 5,000
language files lines code comments blanks complexity weighted_complexity bytes uloc
R 156 43171 27043 10926 5202 2239 8.279407 1152271 23250
Markdown 52 13437 11344 0 2093 0 0.000000 426556 4755
C 69 13173 10016 864 2293 1827 18.240815 348962 6679
C Header 70 8272 5322 1750 1200 677 12.720782 265714 4689
YAML 8 635 505 26 104 0 0.000000 15354 399
C++ 2 25 21 0 4 0 0.000000 492 17
C++ Header 1 26 21 0 5 0 0.000000 429 21
SVG 10 10 10 0 0 0 0.000000 9687 10
Makefile 1 11 7 0 4 2 28.571429 184 8
License 1 2 2 0 0 0 0.000000 43 2
TOML 1 0 0 0 0 0 0.000000 0 1

Supported languages

scc_languages() returns every language scc recognizes, along with the file extensions it maps to that language.

langs <- scc_languages()
head(langs, 10) |>
  gt::gt() |>
  gt::tab_header(
    title    = "Languages recognized by scc",
    subtitle = "first 10 rows of scc_languages()",
    preheader = "scc_languages()")
Languages recognized by scc
first 10 rows of scc_languages()
language extensions
ABAP abap
ABNF abnf
ActionScript as
Ada ada,adb,ads,pad
Agda agda
Alchemist crn
Alex x
Algol 68 a68
Alloy als
Amber ab

Search for a specific language:

langs[grepl("^R$", langs$language), ] |>
  gt::gt() |>
  gt::tab_header(
    title    = "Lookup: R",
    subtitle = 'grepl("^R$", langs$language)',
    preheader = "scc_languages()")
Lookup: R
grepl("^R$", langs$language)
language extensions
R r
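The reverse lookup, from a file extension to a language, needs the comma-separated extensions split apart first. The sketch below assumes the extensions column is a single comma-separated string per row, as it appears in the output above, and uses a small hand-typed data frame in place of scc_languages(); lang_for_ext() is a hypothetical helper.

```r
# Rows copied from the scc_languages() output shown above.
langs <- data.frame(
  language   = c("Ada", "ActionScript", "R"),
  extensions = c("ada,adb,ads,pad", "as", "r"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: which language(s) list a given extension?
lang_for_ext <- function(langs, ext) {
  ext_list <- strsplit(langs$extensions, ",", fixed = TRUE)
  hit <- vapply(ext_list, function(e) tolower(ext) %in% e, logical(1))
  langs$language[hit]
}

lang_for_ext(langs, "adb")  # "Ada"
lang_for_ext(langs, "R")    # "R" (the lookup lower-cases its input)
```

Splitting before matching avoids the false positives a plain grepl("as", ...) would give, since "as" is also a substring of other extension strings.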