Complexity metrics
scc’s complexity column is an
approximation of cyclomatic
complexity — not a true measurement. It’s computed without parsing
the code into an AST: while scc is already scanning each
file for line classification, it also increments a per-file counter
every time it sees a token that typically introduces a branch. The exact
token set depends on the language; in Java, for example, each of
for, if, switch,
while, else, ||,
&&, !=, and == adds
1.
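The counting step described above can be sketched in a few lines of base R. This is a simplified illustration, not scc's implementation: count_branch_tokens() is a hypothetical helper, the token set is the Java list quoted above, and unlike scc the sketch makes no attempt to skip comments or string literals.

```r
# Hypothetical helper: a simplified branch-token counter (not scc's code).
# It does not skip comments or string literals.
count_branch_tokens <- function(lines,
                                keywords = c("for", "if", "switch", "while", "else"),
                                operators = c("||", "&&", "!=", "==")) {
  text <- paste(lines, collapse = "\n")

  # Count non-overlapping matches of one pattern in the whole text.
  count_matches <- function(pattern, fixed) {
    hits <- gregexpr(pattern, text, fixed = fixed, perl = !fixed)[[1]]
    if (hits[1] == -1L) 0L else length(hits)
  }

  # Keywords must match as whole words; operators are matched literally.
  keyword_hits <- vapply(paste0("\\b", keywords, "\\b"), count_matches,
                         integer(1), fixed = FALSE)
  operator_hits <- vapply(operators, count_matches, integer(1), fixed = TRUE)

  sum(keyword_hits) + sum(operator_hits)
}

java_snippet <- c(
  "if (a == b && c != d) {",
  "  for (int i = 0; i < n; i++) { total += i; }",
  "} else {",
  "  while (x) { x = next(x); }",
  "}"
)
count_branch_tokens(java_snippet)  # 7: if, ==, &&, !=, for, else, while
```

Because the count is a plain token tally, an else if contributes 2 and a recursive call contributes nothing, which is why the number is best read as a ranking signal rather than a measurement.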
A few caveats from scc’s own README worth carrying into any analysis:
- Same-language comparisons only. The counter is calculated by looking for the branch tokens of a particular language; languages with fewer such tokens (or that rely on recursion instead of loops) score lower without being intrinsically simpler. Use complexity to rank files or projects within the same language, not to compare R against C.
- No AST, by design. The trade-off is speed: the count is essentially free during the regular pass, but recursive methods or other non-branching constructs are not picked up.
- Practical use. The most common pattern is finding the most complex files in a project. With glockr that’s:
r_files <- scc_by_file(rlang_path, include_ext = "r")
r_files[order(-r_files$complexity),
        c("filename", "code", "complexity")] |>
  head(10) |>
  gt::gt() |>
  gt::tab_header(
    title = "Top 10 most complex .R files",
    subtitle = "raw branch-token count",
    preheader = "rlang package")

Top 10 most complex .R files
raw branch-token count
| filename | code | complexity |
|---|---|---|
| cnd-abort.R | 691 | 152 |
| trace.R | 807 | 134 |
| standalone-cli.R | 351 | 131 |
| deparse.R | 819 | 111 |
| standalone-vctrs.R | 505 | 107 |
| call.R | 436 | 92 |
| standalone-types-check.R | 483 | 72 |
| standalone-obj-type.R | 242 | 67 |
| cnd-entrace.R | 213 | 61 |
| utils.R | 302 | 59 |
The complexity column is the raw branch-token count per
record.
Weighted complexity
weighted_complexity reports cyclomatic complexity
per 100 lines of code, using the same formula scc
applies in its wide tabular output:
weighted_complexity = (complexity / code) * 100 # 0 when code == 0
Note that scc’s JSON formatter does not
populate WeightedComplexity (the field is emitted as
0 regardless of input), so glockr computes the
value itself from complexity and code. For
scc_by_file() this matches scc’s per-file
value exactly; for scc() the same formula is applied to
each language’s aggregate totals.
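As a quick check of the formula, here is a minimal base-R sketch. The helper name weighted_complexity() and the empty.R row are illustrative assumptions; the other three rows are copied from the rlang tables in this article.

```r
# Hypothetical helper mirroring the formula above; returns 0 when code == 0.
weighted_complexity <- function(complexity, code) {
  ifelse(code == 0, 0, complexity / code * 100)
}

# Three rows taken from the rlang tables in this article, plus a made-up
# empty file to exercise the zero-code guard.
files <- data.frame(
  filename   = c("cnd-abort.R", "operators.R", "raw.R", "empty.R"),
  code       = c(691, 29, 9, 0),
  complexity = c(152, 9, 2, 0)
)
files$weighted_complexity <- weighted_complexity(files$complexity, files$code)
files
```

The first three values come out as roughly 21.997, 31.034 and 22.222, matching the per-file figures reported for those files, and the empty file yields 0 instead of NaN.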
Sorting the same files by weighted_complexity instead
surfaces small-but-dense files that the raw count under-emphasises (a
30-line file with complexity = 11 will rank higher than a 300-line file
with complexity = 15):
r_files[order(-r_files$weighted_complexity),
        c("filename", "code", "complexity", "weighted_complexity")] |>
  head(10) |>
  gt::gt() |>
  gt::tab_header(
    title = "Top 10 by weighted complexity",
    subtitle = "complexity per 100 lines of code",
    preheader = "rlang package (.R files)")

Top 10 by weighted complexity
complexity per 100 lines of code
| filename | code | complexity | weighted_complexity |
|---|---|---|---|
| standalone-cli.R | 351 | 131 | 37.32194 |
| utils-cli-tree.R | 111 | 35 | 31.53153 |
| operators.R | 29 | 9 | 31.03448 |
| utils-encoding.R | 37 | 11 | 29.72973 |
| cnd-entrace.R | 213 | 61 | 28.63850 |
| expr.R | 155 | 43 | 27.74194 |
| standalone-obj-type.R | 242 | 67 | 27.68595 |
| error-backtrace-empty.R | 16 | 4 | 25.00000 |
| raw.R | 9 | 2 | 22.22222 |
| cnd-abort.R | 691 | 152 | 21.99711 |
Unique lines of code
uloc (Unique Lines Of Code) is enabled by default — pass
uloc = FALSE to skip the calculation. Despite the name,
scc’s per-record ULOC is the count of unique
non-blank lines, which means comments count
toward ULOC. SLOC (the code column) is just the
raw count of non-blank, non-comment lines and includes duplicates.
The clearest way to see the difference is two minimal fixtures bundled with the package:
# inst/test/file/code.R — pure code, 5 lines, two duplicate pairs
x <- 1
y <- 2
x <- 1
y <- 2
z <- 3
# inst/test/file/code_w_comments.R — same code, plus three unique comments
# step 1
x <- 1
y <- 2
# step 2
x <- 1
y <- 2
# step 3
z <- 3

Running scc_by_file() on both:
fixtures <- c(
  system.file("test/file/code.R", package = "glockr"),
  system.file("test/file/code_w_comments.R", package = "glockr")
)
scc_by_file(fixtures, dryness = TRUE)[, c("filename", "lines", "code",
                                          "comments", "uloc", "dryness")] |>
  gt::gt() |>
  gt::tab_header(
    title = "SLOC vs ULOC on minimal fixtures",
    subtitle = "both files contain the same 5 code lines",
    preheader = "inst/test/file/")

SLOC vs ULOC on minimal fixtures
both files contain the same 5 code lines
| filename | lines | code | comments | uloc | dryness |
|---|---|---|---|---|---|
| /tmp/Rtmpq62Dlh/temp_libpath21737312bbbd/glockr/test/file/code.R | 5 | 5 | 0 | 3 | 0.60 |
| /tmp/Rtmpq62Dlh/temp_libpath21737312bbbd/glockr/test/file/code_w_comments.R | 8 | 5 | 3 | 6 | 0.75 |
Two takeaways:
- code.R: SLOC = 5 but ULOC = 3. The lines x <- 1 and y <- 2 each appear twice; the unique-line set is {x <- 1, y <- 2, z <- 3}.
- code_w_comments.R: Same SLOC = 5, but ULOC = 6. The unique non-blank set is {# step 1, # step 2, # step 3, x <- 1, y <- 2, z <- 3}; three distinct comments expand ULOC above SLOC.
So ULOC can be either smaller or larger than SLOC depending on the mix of duplication and unique comments.
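The two counts can be reproduced by hand with a short base-R sketch. count_sloc() and count_uloc() are hypothetical helpers, not glockr or scc functions; they assume R-style # comments and compare lines as exact strings, which may differ from how scc normalises lines internally.

```r
# Hypothetical helpers, written for R-style "#" comments only.
# SLOC: non-blank lines that are not comments (duplicates included).
count_sloc <- function(lines) {
  stripped <- trimws(lines)
  sum(nzchar(stripped) & !startsWith(stripped, "#"))
}

# ULOC: distinct non-blank lines, comments included.
count_uloc <- function(lines) {
  non_blank <- lines[nzchar(trimws(lines))]
  length(unique(non_blank))
}

# Contents of the two fixtures shown above.
code <- c("x <- 1", "y <- 2", "x <- 1", "y <- 2", "z <- 3")
code_w_comments <- c("# step 1", "x <- 1", "y <- 2",
                     "# step 2", "x <- 1", "y <- 2",
                     "# step 3", "z <- 3")

c(sloc = count_sloc(code), uloc = count_uloc(code))
c(sloc = count_sloc(code_w_comments), uloc = count_uloc(code_w_comments))
```

The first call gives SLOC 5 and ULOC 3, the second SLOC 5 and ULOC 6, in line with the table above.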
ULOC at scale
The same calculation aggregates across an entire codebase:
scc(rlang_path) |>
  gt::gt() |>
  gt::tab_header(
    title = "Succinct Code Counter",
    subtitle = "per-language ULOC",
    preheader = "rlang package")

Succinct Code Counter
per-language ULOC
| language | files | lines | code | comments | blanks | complexity | weighted_complexity | bytes | uloc |
|---|---|---|---|---|---|---|---|---|---|
| R | 156 | 43171 | 27043 | 10926 | 5202 | 2239 | 8.279407 | 1152271 | 23250 |
| Markdown | 52 | 13437 | 11344 | 0 | 2093 | 0 | 0.000000 | 426556 | 4755 |
| C | 69 | 13173 | 10016 | 864 | 2293 | 1827 | 18.240815 | 348962 | 6679 |
| C Header | 70 | 8272 | 5322 | 1750 | 1200 | 677 | 12.720782 | 265714 | 4689 |
| YAML | 8 | 635 | 505 | 26 | 104 | 0 | 0.000000 | 15354 | 399 |
| C++ | 2 | 25 | 21 | 0 | 4 | 0 | 0.000000 | 492 | 17 |
| C++ Header | 1 | 26 | 21 | 0 | 5 | 0 | 0.000000 | 429 | 21 |
| SVG | 10 | 10 | 10 | 0 | 0 | 0 | 0.000000 | 9687 | 10 |
| Makefile | 1 | 11 | 7 | 0 | 4 | 2 | 28.571429 | 184 | 8 |
| License | 1 | 2 | 2 | 0 | 0 | 0 | 0.000000 | 43 | 2 |
| TOML | 1 | 0 | 0 | 0 | 0 | 0 | 0.000000 | 0 | 1 |
DRYness
The dryness column is opt-in: pass
dryness = TRUE to either scc() or
scc_by_file() and a dryness column appears in
the returned tibble, computed locally as uloc / lines
(matching scc’s DRYness % formula in its
tabular output, applied per record instead of project-wide). Values
close to 1 mean a file or language is mostly unique non-blank content;
values closer to 0 mean heavy duplication (including repeated comment lines) or many blank lines. Since
the formula needs uloc, passing dryness = TRUE
also auto-promotes uloc to TRUE even if you
explicitly set it to FALSE.
scc(rlang_path, dryness = TRUE)[, c("language", "lines", "code",
                                    "uloc", "dryness")] |>
  gt::gt() |>
  gt::tab_header(
    title = "Per-language DRYness",
    subtitle = "dryness = uloc / lines",
    preheader = "rlang package")

Per-language DRYness
dryness = uloc / lines
| language | lines | code | uloc | dryness |
|---|---|---|---|---|
| R | 43171 | 27043 | 23250 | 0.5385560 |
| Markdown | 13437 | 11344 | 4755 | 0.3538736 |
| C | 13173 | 10016 | 6679 | 0.5070219 |
| C Header | 8272 | 5322 | 4689 | 0.5668520 |
| YAML | 635 | 505 | 399 | 0.6283465 |
| C++ | 25 | 21 | 17 | 0.6800000 |
| C++ Header | 26 | 21 | 21 | 0.8076923 |
| SVG | 10 | 10 | 10 | 1.0000000 |
| Makefile | 11 | 7 | 8 | 0.7272727 |
| License | 2 | 2 | 2 | 1.0000000 |
| TOML | 0 | 0 | 1 | 0.0000000 |
For rlang, the R language scores about 0.54 — roughly
half of all physical R lines are unique non-blank content, with the
remainder being either duplicates, blank lines, or repeated comment
lines.
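The ratio itself is simple enough to verify by hand. The sketch below uses a hypothetical dryness() helper and four rows copied from the table above; returning 0 for a record with zero lines is an assumption chosen to agree with the TOML row, not a documented rule.

```r
# Hypothetical helper: dryness = uloc / lines, with 0 for zero-line records
# (an assumption that matches the TOML row in the table above).
dryness <- function(uloc, lines) {
  ifelse(lines == 0, 0, uloc / lines)
}

# Rows copied from the per-language table above.
langs <- data.frame(
  language = c("R", "Markdown", "C", "TOML"),
  lines    = c(43171, 13437, 13173, 0),
  uloc     = c(23250, 4755, 6679, 1)
)
langs$dryness <- dryness(langs$uloc, langs$lines)
langs
```

The R row comes out at about 0.5386 and Markdown at about 0.3539, the same values reported in the table.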
Language remapping
The arguments below control how files with unrecognized extensions or content are assigned to a language.
Unknown-extension files
count_as maps file extensions to known scc
languages using the format "ext:language" (or several
mappings separated by commas). The example below counts Quarto (.qmd) and
R Markdown (.Rmd) files as Markdown.
scc(rlang_path, count_as = "qmd:Markdown,rmd:Markdown") |>
  gt::gt() |>
  gt::tab_header(
    title = "Succinct Code Counter",
    subtitle = "including .Qmd/.Rmd",
    preheader = "rlang package")

Succinct Code Counter
including .Qmd/.Rmd
| language | files | lines | code | comments | blanks | complexity | weighted_complexity | bytes | uloc |
|---|---|---|---|---|---|---|---|---|---|
| R | 156 | 43171 | 27043 | 10926 | 5202 | 2239 | 8.279407 | 1152271 | 23250 |
| Markdown | 74 | 16210 | 13213 | 0 | 2997 | 0 | 0.000000 | 525576 | 5840 |
| C | 69 | 13173 | 10016 | 864 | 2293 | 1827 | 18.240815 | 348962 | 6679 |
| C Header | 70 | 8272 | 5322 | 1750 | 1200 | 677 | 12.720782 | 265714 | 4689 |
| YAML | 8 | 635 | 505 | 26 | 104 | 0 | 0.000000 | 15354 | 399 |
| C++ | 2 | 25 | 21 | 0 | 4 | 0 | 0.000000 | 492 | 17 |
| C++ Header | 1 | 26 | 21 | 0 | 5 | 0 | 0.000000 | 429 | 21 |
| SVG | 10 | 10 | 10 | 0 | 0 | 0 | 0.000000 | 9687 | 10 |
| Makefile | 1 | 11 | 7 | 0 | 4 | 2 | 28.571429 | 184 | 8 |
| License | 1 | 2 | 2 | 0 | 0 | 0 | 0.000000 | 43 | 2 |
| TOML | 1 | 0 | 0 | 0 | 0 | 0 | 0.000000 | 0 | 1 |
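When several extensions need mapping, it can be less error-prone to build the "ext:language" string from a named vector than to type it by hand. build_count_as() below is a hypothetical convenience helper, not part of glockr; it only assembles the comma-separated format described above.

```r
# Hypothetical helper: turn a named character vector (names = extensions,
# values = scc language names) into the "ext:language,ext:language" format.
build_count_as <- function(mapping) {
  stopifnot(is.character(mapping), !is.null(names(mapping)))
  paste(paste0(names(mapping), ":", unname(mapping)), collapse = ",")
}

mapping <- c(qmd = "Markdown", rmd = "Markdown")
build_count_as(mapping)  # "qmd:Markdown,rmd:Markdown"
```

The resulting string is the same value passed to count_as in the call above.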
Remap by content
remap_unknown inspects only unrecognized files and
assigns a language when a header string matches, using the format "string:language".
scc(rlang_path, remap_unknown = "# R script:R") |>
  gt::gt() |>
  gt::tab_header(
    title = "Succinct Code Counter",
    subtitle = "remap unknown # R script:R",
    preheader = "rlang package")

Succinct Code Counter
remap unknown # R script:R
| language | files | lines | code | comments | blanks | complexity | weighted_complexity | bytes | uloc |
|---|---|---|---|---|---|---|---|---|---|
| R | 156 | 43171 | 27043 | 10926 | 5202 | 2239 | 8.279407 | 1152271 | 23250 |
| Markdown | 52 | 13437 | 11344 | 0 | 2093 | 0 | 0.000000 | 426556 | 4755 |
| C | 69 | 13173 | 10016 | 864 | 2293 | 1827 | 18.240815 | 348962 | 6679 |
| C Header | 70 | 8272 | 5322 | 1750 | 1200 | 677 | 12.720782 | 265714 | 4689 |
| YAML | 8 | 635 | 505 | 26 | 104 | 0 | 0.000000 | 15354 | 399 |
| C++ | 2 | 25 | 21 | 0 | 4 | 0 | 0.000000 | 492 | 17 |
| C++ Header | 1 | 26 | 21 | 0 | 5 | 0 | 0.000000 | 429 | 21 |
| SVG | 10 | 10 | 10 | 0 | 0 | 0 | 0.000000 | 9687 | 10 |
| Makefile | 1 | 11 | 7 | 0 | 4 | 2 | 28.571429 | 184 | 8 |
| License | 1 | 2 | 2 | 0 | 0 | 0 | 0.000000 | 43 | 2 |
| TOML | 1 | 0 | 0 | 0 | 0 | 0 | 0.000000 | 0 | 1 |
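remap_all does the same for every file, overriding the
extension-based detection. Both remap tables in this section match the
unmapped totals, so no file in rlang was reclassified by this rule.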
scc(rlang_path, remap_all = "# R script:R") |>
  gt::gt() |>
  gt::tab_header(
    title = "Succinct Code Counter",
    subtitle = "remap all # R script:R",
    preheader = "rlang package")

Succinct Code Counter
remap all # R script:R
| language | files | lines | code | comments | blanks | complexity | weighted_complexity | bytes | uloc |
|---|---|---|---|---|---|---|---|---|---|
| R | 156 | 43171 | 27043 | 10926 | 5202 | 2239 | 8.279407 | 1152271 | 23250 |
| Markdown | 52 | 13437 | 11344 | 0 | 2093 | 0 | 0.000000 | 426556 | 4755 |
| C | 69 | 13173 | 10016 | 864 | 2293 | 1827 | 18.240815 | 348962 | 6679 |
| C Header | 70 | 8272 | 5322 | 1750 | 1200 | 677 | 12.720782 | 265714 | 4689 |
| YAML | 8 | 635 | 505 | 26 | 104 | 0 | 0.000000 | 15354 | 399 |
| C++ | 2 | 25 | 21 | 0 | 4 | 0 | 0.000000 | 492 | 17 |
| C++ Header | 1 | 26 | 21 | 0 | 5 | 0 | 0.000000 | 429 | 21 |
| SVG | 10 | 10 | 10 | 0 | 0 | 0 | 0.000000 | 9687 | 10 |
| Makefile | 1 | 11 | 7 | 0 | 4 | 2 | 28.571429 | 184 | 8 |
| License | 1 | 2 | 2 | 0 | 0 | 0 | 0.000000 | 43 | 2 |
| TOML | 1 | 0 | 0 | 0 | 0 | 0 | 0.000000 | 0 | 1 |
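The difference between the two remap modes can be illustrated with a toy model in base R. Everything here is an assumption made for illustration: remap_language() is a hypothetical function, the "detected" and "header" columns are made up, and matching is modelled as a plain substring search, which may not reflect how scc inspects file contents.

```r
# Hypothetical illustration of the two remap modes; not scc's implementation.
# `detected` holds the extension-based language (NA when unrecognized),
# `header` holds the start of each file, `rule` uses "string:language".
remap_language <- function(detected, header, rule, mode = c("unknown", "all")) {
  mode <- match.arg(mode)
  parts <- strsplit(rule, ":", fixed = TRUE)[[1]]  # assumes no ":" in the marker
  marker <- parts[1]
  target <- parts[2]

  # "unknown" only considers files with no detected language;
  # "all" considers every file and can override the detected language.
  eligible <- if (mode == "all") rep(TRUE, length(detected)) else is.na(detected)
  matched <- eligible & grepl(marker, header, fixed = TRUE)
  ifelse(matched, target, detected)
}

# Made-up files for the example.
files <- data.frame(
  detected = c(NA, "Markdown", "R"),
  header   = c("# R script: analysis", "# R script embedded in docs", "x <- 1")
)

remap_language(files$detected, files$header, "# R script:R", "unknown")
remap_language(files$detected, files$header, "# R script:R", "all")
```

In this toy data the "unknown" mode only reclassifies the first file, while the "all" mode also overrides the Markdown file whose header contains the marker.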