
Complexity metrics

scc’s complexity column is an approximation of cyclomatic complexity — not a true measurement. It’s computed without parsing the code into an AST: while scc is already scanning each file for line classification, it also increments a per-file counter every time it sees a token that typically introduces a branch. The exact token set depends on the language; in Java, for example, each of for, if, switch, while, else, ||, &&, !=, and == adds 1.
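To make the token-counting idea concrete, here is a minimal sketch in base R. It is not scc's implementation: `count_branch_tokens()` is a hypothetical helper, the token list is the Java example quoted above, and the matching rules (word boundaries for keywords, literal matches for operators, no skipping of strings or comments) are simplifying assumptions.

```r
# Hypothetical helper, not part of scc or glockr.
# Counts branch-like tokens in a character vector of source lines.
count_branch_tokens <- function(lines) {
  keywords  <- c("for", "if", "switch", "while", "else")
  operators <- c("||", "&&", "!=", "==")

  # Total number of matches of one pattern across all lines
  count_matches <- function(pattern, fixed = FALSE) {
    hits <- gregexpr(pattern, lines, fixed = fixed, perl = !fixed)
    sum(vapply(hits, function(h) sum(h > 0), integer(1)))
  }

  # Keywords are matched on word boundaries, operators as literal strings
  kw <- sum(vapply(paste0("\\b", keywords, "\\b"), count_matches, integer(1)))
  op <- sum(vapply(operators, count_matches, integer(1), fixed = TRUE))
  kw + op
}

java <- c(
  "for (int i = 0; i < n; i++) {",
  "  if (a != null && a.size() == n) {",
  "    count++;",
  "  } else {",
  "    break;",
  "  }",
  "}"
)
count_branch_tokens(java)  # for, if, else, !=, &&, == : 6 tokens
```

Because nothing is parsed, a keyword inside a string literal or comment would also be counted in this sketch, which is one reason the result should be read as an estimate.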

A few caveats from scc’s own README worth carrying into any analysis:

  • Same-language comparisons only. The counter is calculated by looking for the branch tokens of a particular language; languages with fewer such tokens (or that rely on recursion instead of loops) score lower without being intrinsically simpler. Use complexity to rank files / projects within the same language, not to compare R against C.
  • No AST, by design. The trade-off is speed: the count is essentially free during the regular pass, but recursive methods or other non-branching constructs are not picked up.
  • Practical use. The most common pattern is finding the most complex files in a project. With glockr that’s:
r_files <- scc_by_file(rlang_path, include_ext = "r")
r_files[order(-r_files$complexity),
        c("filename", "code", "complexity")] |>
  head(10) |>
  gt::gt() |>
  gt::tab_header(
    title    = "Top 10 most complex .R files",
    subtitle = "raw branch-token count",
    preheader = "rlang package")
Top 10 most complex .R files
raw branch-token count
filename                  code  complexity
cnd-abort.R                691         152
trace.R                    807         134
standalone-cli.R           351         131
deparse.R                  819         111
standalone-vctrs.R         505         107
call.R                     436          92
standalone-types-check.R   483          72
standalone-obj-type.R      242          67
cnd-entrace.R              213          61
utils.R                    302          59

The complexity column is the raw branch-token count per record.

Weighted complexity

weighted_complexity reports cyclomatic complexity per 100 lines of code, using the same formula scc applies in its wide tabular output:

weighted_complexity = (complexity / code) * 100      # 0 when code == 0

Note that scc’s JSON formatter does not populate WeightedComplexity (the field is emitted as 0 regardless of input), so glockr computes the value itself from complexity and code. For scc_by_file() this matches scc’s per-file value exactly; for scc() the same formula is applied to each language’s aggregate totals.
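The formula is easy to check by hand. The sketch below uses a hypothetical helper, `weighted_complexity_sketch()` (not a glockr function), and reapplies it to two rows of the table above plus a made-up empty file to exercise the `code == 0` case.

```r
# Hypothetical helper mirroring the formula above; not part of glockr's API.
weighted_complexity_sketch <- function(complexity, code) {
  # Guard against division by zero: files with no code lines score 0
  ifelse(code == 0, 0, complexity / code * 100)
}

files <- data.frame(
  filename   = c("standalone-cli.R", "cnd-abort.R", "empty.R"),  # empty.R is invented
  code       = c(351, 691, 0),
  complexity = c(131, 152, 0)
)
files$weighted_complexity <- weighted_complexity_sketch(files$complexity, files$code)
files
```

For `standalone-cli.R` this gives 131 / 351 * 100, about 37.32, and for `cnd-abort.R` about 22.00.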

Sorting the same files by weighted_complexity instead surfaces small-but-dense files that the raw count under-emphasises (a 30-line file with complexity = 11 will rank higher than a 300-line file with complexity = 15):

r_files[order(-r_files$weighted_complexity),
        c("filename", "code", "complexity", "weighted_complexity")] |>
  head(10) |>
  gt::gt() |>
  gt::tab_header(
    title    = "Top 10 by weighted complexity",
    subtitle = "complexity per 100 lines of code",
    preheader = "rlang package (.R files)")
Top 10 by weighted complexity
complexity per 100 lines of code
filename                 code  complexity  weighted_complexity
standalone-cli.R          351         131             37.32194
utils-cli-tree.R          111          35             31.53153
operators.R                29           9             31.03448
utils-encoding.R           37          11             29.72973
cnd-entrace.R             213          61             28.63850
expr.R                    155          43             27.74194
standalone-obj-type.R     242          67             27.68595
error-backtrace-empty.R    16           4             25.00000
raw.R                       9           2             22.22222
cnd-abort.R               691         152             21.99711

Unique lines of code

uloc (Unique Lines Of Code) is enabled by default — pass uloc = FALSE to skip the calculation. Despite the name, scc’s per-record ULOC is the count of unique non-blank lines, which means comments count toward ULOC. SLOC (the code column) is just the raw count of non-blank, non-comment lines and includes duplicates.

The clearest way to see the difference is two minimal fixtures bundled with the package:

# inst/test/file/code.R: pure code, 5 lines, two duplicate pairs
x <- 1
y <- 2
x <- 1
y <- 2
z <- 3

# inst/test/file/code_w_comments.R: same code, plus three unique comments
# step 1
x <- 1
y <- 2
# step 2
x <- 1
y <- 2
# step 3
z <- 3

Running scc_by_file() on both:

fixtures <- c(
  system.file("test/file/code.R",            package = "glockr"),
  system.file("test/file/code_w_comments.R", package = "glockr")
)
scc_by_file(fixtures, dryness = TRUE)[, c("filename", "lines", "code",
                                          "comments", "uloc", "dryness")] |>
  gt::gt() |>
  gt::tab_header(
    title    = "SLOC vs ULOC on minimal fixtures",
    subtitle = "both files contain the same 5 code lines",
    preheader = "inst/test/file/")
SLOC vs ULOC on minimal fixtures
both files contain the same 5 code lines
filename                                                                      lines  code  comments  uloc  dryness
/tmp/Rtmpq62Dlh/temp_libpath21737312bbbd/glockr/test/file/code.R                  5     5         0     3     0.60
/tmp/Rtmpq62Dlh/temp_libpath21737312bbbd/glockr/test/file/code_w_comments.R       8     5         3     6     0.75

Two takeaways:

  1. code.R: SLOC = 5 but ULOC = 3. The lines x <- 1 and y <- 2 each appear twice; the unique-line set is {x <- 1, y <- 2, z <- 3}.
  2. code_w_comments.R: Same SLOC = 5, but ULOC = 6. The unique-non-blank set is {# step 1, # step 2, # step 3, x <- 1, y <- 2, z <- 3} — three distinct comments expand ULOC above SLOC.

So ULOC can be either smaller or larger than SLOC depending on the mix of duplication and unique comments.
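The two counts for these fixtures can be reproduced in a few lines of base R. This is only a sketch: `uloc_sketch()` and `sloc_sketch()` are hypothetical helpers, the comparison after trimming whitespace is an assumption about how lines are deduplicated, and treating every line that starts with `#` as a comment is a shortcut that only works for simple R files like these.

```r
# Hypothetical helpers; neither is a glockr or scc function.

# ULOC: number of distinct non-blank lines (comments included)
uloc_sketch <- function(lines) {
  trimmed <- trimws(lines)               # assumption: compare trimmed lines
  length(unique(trimmed[nzchar(trimmed)]))
}

# SLOC: number of non-blank, non-comment lines (duplicates included)
sloc_sketch <- function(lines) {
  trimmed <- trimws(lines)
  sum(nzchar(trimmed) & !startsWith(trimmed, "#"))
}

code_r <- c("x <- 1", "y <- 2", "x <- 1", "y <- 2", "z <- 3")
code_w_comments_r <- c("# step 1", "x <- 1", "y <- 2",
                       "# step 2", "x <- 1", "y <- 2",
                       "# step 3", "z <- 3")

c(sloc = sloc_sketch(code_r), uloc = uloc_sketch(code_r))
c(sloc = sloc_sketch(code_w_comments_r), uloc = uloc_sketch(code_w_comments_r))
```

The first fixture yields SLOC 5 and ULOC 3, the second SLOC 5 and ULOC 6, matching the table above.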

ULOC at scale

The same calculation aggregates across an entire codebase:

scc(rlang_path) |>
  gt::gt() |>
  gt::tab_header(
    title = "Succinct Code Counter",
    subtitle = "per-language ULOC",
    preheader = "rlang package")
Succinct Code Counter
per-language ULOC
language    files  lines   code  comments  blanks  complexity  weighted_complexity    bytes   uloc
R             156  43171  27043     10926    5202        2239             8.279407  1152271  23250
Markdown       52  13437  11344         0    2093           0             0.000000   426556   4755
C              69  13173  10016       864    2293        1827            18.240815   348962   6679
C Header       70   8272   5322      1750    1200         677            12.720782   265714   4689
YAML            8    635    505        26     104           0             0.000000    15354    399
C++             2     25     21         0       4           0             0.000000      492     17
C++ Header      1     26     21         0       5           0             0.000000      429     21
SVG            10     10     10         0       0           0             0.000000     9687     10
Makefile        1     11      7         0       4           2            28.571429      184      8
License         1      2      2         0       0           0             0.000000       43      2
TOML            1      0      0         0       0           0             0.000000        0      1

DRYness

The dryness column is opt-in: pass dryness = TRUE to either scc() or scc_by_file() and a dryness column appears in the returned tibble, computed locally as uloc / lines (matching scc’s DRYness % formula in its tabular output, applied per record instead of project-wide). Values close to 1 mean a file or language is mostly unique non-blank content; values closer to 0 mean heavy duplication (of code or comment lines) or many blank lines. Unique comments do not lower the score, because they count toward uloc. Since the formula needs uloc, passing dryness = TRUE also forces uloc to TRUE, even if you explicitly set it to FALSE.

scc(rlang_path, dryness = TRUE)[, c("language", "lines", "code",
                                    "uloc", "dryness")] |>
  gt::gt() |>
  gt::tab_header(
    title = "Per-language DRYness",
    subtitle = "dryness = uloc / lines",
    preheader = "rlang package")
Per-language DRYness
dryness = uloc / lines
language    lines   code   uloc    dryness
R           43171  27043  23250  0.5385560
Markdown    13437  11344   4755  0.3538736
C           13173  10016   6679  0.5070219
C Header     8272   5322   4689  0.5668520
YAML          635    505    399  0.6283465
C++            25     21     17  0.6800000
C++ Header     26     21     21  0.8076923
SVG            10     10     10  1.0000000
Makefile       11      7      8  0.7272727
License         2      2      2  1.0000000
TOML            0      0      1  0.0000000

For rlang, the R language scores about 0.54: roughly half of all physical R lines are unique non-blank content, and the remainder is made up of duplicated lines (code or comments) and blank lines.
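As a quick check of the uloc / lines formula, the sketch below recomputes dryness for three rows of the table. `dryness_sketch()` is a hypothetical helper, not glockr's code; returning 0 when lines is 0 is an assumption inferred from the TOML row, which reports 0 lines and a dryness of 0.

```r
# Hypothetical helper; the zero guard is an assumption based on the TOML row.
dryness_sketch <- function(uloc, lines) {
  ifelse(lines == 0, 0, uloc / lines)
}

langs <- data.frame(
  language = c("R", "Markdown", "TOML"),
  lines    = c(43171, 13437, 0),
  uloc     = c(23250, 4755, 1)
)
langs$dryness <- dryness_sketch(langs$uloc, langs$lines)
langs
```

The R row comes out at about 0.5386 and the Markdown row at about 0.3539, in line with the values reported above.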

Language remapping

The arguments below control how files that scc does not recognize by extension are assigned to a language, so that they are included in the counts.

Unknown-extension files

count_as maps file extensions to known scc languages using the format "ext:language" (or several mappings separated by commas). Below we include Quarto and RMarkdown as Markdown.

scc(rlang_path, count_as = "qmd:Markdown,rmd:Markdown") |>
  gt::gt() |> 
  gt::tab_header(
    title = "Succinct Code Counter", 
    subtitle = "including .Qmd/.Rmd", 
    preheader = "rlang package")
Succinct Code Counter
including .Qmd/.Rmd
language    files  lines   code  comments  blanks  complexity  weighted_complexity    bytes   uloc
R             156  43171  27043     10926    5202        2239             8.279407  1152271  23250
Markdown       74  16210  13213         0    2997           0             0.000000   525576   5840
C              69  13173  10016       864    2293        1827            18.240815   348962   6679
C Header       70   8272   5322      1750    1200         677            12.720782   265714   4689
YAML            8    635    505        26     104           0             0.000000    15354    399
C++             2     25     21         0       4           0             0.000000      492     17
C++ Header      1     26     21         0       5           0             0.000000      429     21
SVG            10     10     10         0       0           0             0.000000     9687     10
Makefile        1     11      7         0       4           2            28.571429      184      8
License         1      2      2         0       0           0             0.000000       43      2
TOML            1      0      0         0       0           0             0.000000        0      1
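When several extensions need remapping, it can be less error-prone to build the "ext:language" string from a named vector than to type it by hand. The helper below, `make_count_as()`, is a hypothetical convenience function and not part of glockr; it only assembles the comma-separated format described above.

```r
# Hypothetical helper: turn c(ext = "Language", ...) into "ext:Language,..."
make_count_as <- function(mapping) {
  stopifnot(is.character(mapping), !is.null(names(mapping)))
  paste(names(mapping), mapping, sep = ":", collapse = ",")
}

count_as_arg <- make_count_as(c(qmd = "Markdown", rmd = "Markdown"))
count_as_arg
```

The resulting string, "qmd:Markdown,rmd:Markdown", is the same value passed to count_as in the call above.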

Remap by content

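remap_unknown inspects only unrecognized files and assigns a language when a header string matches; the argument takes the form "string:language". In rlang no unrecognized file matches # R script, so the totals below are the same as for the default scc(rlang_path) call.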

scc(rlang_path, remap_unknown = "# R script:R") |>
  gt::gt() |> 
  gt::tab_header(
    title = "Succinct Code Counter", 
    subtitle = "remap unknown # R script:R", 
    preheader = "rlang package")
Succinct Code Counter
remap unknown # R script:R
language    files  lines   code  comments  blanks  complexity  weighted_complexity    bytes   uloc
R             156  43171  27043     10926    5202        2239             8.279407  1152271  23250
Markdown       52  13437  11344         0    2093           0             0.000000   426556   4755
C              69  13173  10016       864    2293        1827            18.240815   348962   6679
C Header       70   8272   5322      1750    1200         677            12.720782   265714   4689
YAML            8    635    505        26     104           0             0.000000    15354    399
C++             2     25     21         0       4           0             0.000000      492     17
C++ Header      1     26     21         0       5           0             0.000000      429     21
SVG            10     10     10         0       0           0             0.000000     9687     10
Makefile        1     11      7         0       4           2            28.571429      184      8
License         1      2      2         0       0           0             0.000000       43      2
TOML            1      0      0         0       0           0             0.000000        0      1

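remap_all applies the same header-string check to every file, overriding the extension-based detection when the string matches. Here too, no file in rlang matches # R script, so the output is unchanged.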

scc(rlang_path, remap_all = "# R script:R") |>
  gt::gt() |> 
  gt::tab_header(
    title = "Succinct Code Counter", 
    subtitle = "remap all # R script:R", 
    preheader = "rlang package")
Succinct Code Counter
remap all # R script:R
language    files  lines   code  comments  blanks  complexity  weighted_complexity    bytes   uloc
R             156  43171  27043     10926    5202        2239             8.279407  1152271  23250
Markdown       52  13437  11344         0    2093           0             0.000000   426556   4755
C              69  13173  10016       864    2293        1827            18.240815   348962   6679
C Header       70   8272   5322      1750    1200         677            12.720782   265714   4689
YAML            8    635    505        26     104           0             0.000000    15354    399
C++             2     25     21         0       4           0             0.000000      492     17
C++ Header      1     26     21         0       5           0             0.000000      429     21
SVG            10     10     10         0       0           0             0.000000     9687     10
Makefile        1     11      7         0       4           2            28.571429      184      8
License         1      2      2         0       0           0             0.000000       43      2
TOML            1      0      0         0       0           0             0.000000        0      1