Code Metrics • glockr

Complexity metrics

scc’s complexity column is an approximation of cyclomatic complexity — not a true measurement. It’s computed without parsing the code into an AST: while scc is already scanning each file for line classification, it also increments a per-file counter every time it sees a token that typically introduces a branch. The exact token set depends on the language; in Java, for example, each of for, if, switch, while, else, ||, &&, !=, and == adds 1.
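
As a rough illustration of the token-counting idea, the base-R sketch below counts the Java tokens listed above in a small snippet. It is not scc’s implementation: count_branch_tokens() and java_src are hypothetical names, and this version makes no attempt to skip string literals or comments.

```r
# Java branch tokens, as listed in the paragraph above.
java_tokens <- c("for", "if", "switch", "while", "else", "||", "&&", "!=", "==")

# Hypothetical helper (not part of scc or glockr): count how often each
# token occurs in a character vector of source lines.
count_branch_tokens <- function(lines, tokens) {
  src <- paste(lines, collapse = "\n")
  vapply(tokens, function(tok) {
    is_word <- grepl("^[A-Za-z]+$", tok)
    if (is_word) {
      # Keywords must match as whole words, so "for" does not hit "format".
      hits <- gregexpr(paste0("\\b", tok, "\\b"), src, perl = TRUE)[[1]]
    } else {
      # Operators are matched literally.
      hits <- gregexpr(tok, src, fixed = TRUE)[[1]]
    }
    sum(hits > 0)  # gregexpr() returns -1 when there is no match
  }, integer(1))
}

# Made-up Java fragment used only for this illustration.
java_src <- c(
  "for (int i = 0; i < n; i++) {",
  "  if (x != null && x.size() == 0) {",
  "    continue;",
  "  } else {",
  "    total += x.size();",
  "  }",
  "}"
)

counts <- count_branch_tokens(java_src, java_tokens)
counts       # per-token counts
sum(counts)  # complexity-style total for this snippet: 6
```

The total is simply the sum of the per-token hits (for, if, else, &&, != and == each occur once here), which is why the number reflects how branch-heavy the text looks rather than the true number of execution paths.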

A few caveats from scc’s own README worth carrying into any analysis:

Same-language comparisons only. The counter looks for the branch tokens of one particular language, so languages with fewer such tokens (or that rely on recursion instead of loops) score lower without being intrinsically simpler. Use complexity to rank files or projects within the same language, not to compare R against C.
No AST, by design. The trade-off is speed: the count is essentially free during the regular pass, but recursive methods or other non-branching constructs are not picked up.
Practical use. The most common pattern is finding the most complex files in a project. With glockr that’s:

r_files <- scc_by_file(rlang_path, include_ext = "r")
r_files[order(-r_files$complexity),
        c("filename", "code", "complexity")] |>
  head(10) |>
  gt::gt() |>
  gt::tab_header(
    title    = "Top 10 most complex .R files",
    subtitle = "raw branch-token count",
    preheader = "rlang package")

Top 10 most complex .R files
raw branch-token count
filename	code	complexity
cnd-abort.R	691	152
trace.R	807	134
standalone-cli.R	351	131
deparse.R	819	111
standalone-vctrs.R	505	107
call.R	436	92
standalone-types-check.R	483	72
standalone-obj-type.R	242	67
cnd-entrace.R	213	61
utils.R	302	59

The complexity column is the raw branch-token count per record.

Weighted complexity

weighted_complexity reports the complexity count per 100 lines of code, using the same formula scc applies in its wide tabular output:

weighted_complexity = (complexity / code) * 100      # 0 when code == 0

Note that scc’s JSON formatter does not populate WeightedComplexity (the field is emitted as 0 regardless of input), so glockr computes the value itself from complexity and code. For scc_by_file() this matches scc’s per-file value exactly; for scc() the same formula is applied to each language’s aggregate totals.
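
A minimal sketch of that calculation in base R, including the zero-code guard, is shown here. The helper name is hypothetical and is not glockr’s internal function; the first three rows use code and complexity values from the rlang results in this article, and empty.R is an invented file with no code lines.

```r
# Hypothetical helper mirroring the formula above; returns 0 when code == 0.
weighted_complexity <- function(complexity, code) {
  ifelse(code == 0, 0, complexity / code * 100)
}

files <- data.frame(
  filename   = c("cnd-abort.R", "standalone-cli.R", "raw.R", "empty.R"),
  code       = c(691, 351, 9, 0),
  complexity = c(152, 131, 2, 0)
)

files$weighted_complexity <- weighted_complexity(files$complexity, files$code)

# Densest files first; empty.R sorts last with a value of 0.
files[order(-files$weighted_complexity), ]
```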

Sorting the same files by weighted_complexity instead surfaces small-but-dense files that the raw count under-emphasises (a 30-line file with complexity = 11 will rank higher than a 300-line file with complexity = 15):

r_files[order(-r_files$weighted_complexity),
        c("filename", "code", "complexity", "weighted_complexity")] |>
  head(10) |>
  gt::gt() |>
  gt::tab_header(
    title    = "Top 10 by weighted complexity",
    subtitle = "complexity per 100 lines of code",
    preheader = "rlang package (.R files)")

Top 10 by weighted complexity
complexity per 100 lines of code
filename	code	complexity	weighted_complexity
standalone-cli.R	351	131	37.32194
utils-cli-tree.R	111	35	31.53153
operators.R	29	9	31.03448
utils-encoding.R	37	11	29.72973
cnd-entrace.R	213	61	28.63850
expr.R	155	43	27.74194
standalone-obj-type.R	242	67	27.68595
error-backtrace-empty.R	16	4	25.00000
raw.R	9	2	22.22222
cnd-abort.R	691	152	21.99711

Unique lines of code

uloc (Unique Lines Of Code) is enabled by default — pass uloc = FALSE to skip the calculation. Despite the name, scc’s per-record ULOC is the count of unique non-blank lines, which means comments count toward ULOC. SLOC (the code column) is just the raw count of non-blank, non-comment lines and includes duplicates.

The clearest way to see the difference is two minimal fixtures bundled with the package:

# inst/test/file/code.R           — pure code, 5 lines, two duplicate pairs
x <- 1
y <- 2
x <- 1
y <- 2
z <- 3

# inst/test/file/code_w_comments.R — same code, plus three unique comments
# step 1
x <- 1
y <- 2
# step 2
x <- 1
y <- 2
# step 3
z <- 3

Running scc_by_file() on both:

fixtures <- c(
  system.file("test/file/code.R",            package = "glockr"),
  system.file("test/file/code_w_comments.R", package = "glockr")
)
scc_by_file(fixtures, dryness = TRUE)[, c("filename", "lines", "code",
                                          "comments", "uloc", "dryness")] |>
  gt::gt() |>
  gt::tab_header(
    title    = "SLOC vs ULOC on minimal fixtures",
    subtitle = "both files contain the same 5 code lines",
    preheader = "inst/test/file/")

SLOC vs ULOC on minimal fixtures
both files contain the same 5 code lines
filename	lines	code	comments	uloc	dryness
/tmp/Rtmpq62Dlh/temp_libpath21737312bbbd/glockr/test/file/code.R	5	5	0	3	0.60
/tmp/Rtmpq62Dlh/temp_libpath21737312bbbd/glockr/test/file/code_w_comments.R	8	5	3	6	0.75

Two takeaways:

code.R: SLOC = 5 but ULOC = 3. The lines x <- 1 and y <- 2 each appear twice; the unique-line set is {x <- 1, y <- 2, z <- 3}.
code_w_comments.R: Same SLOC = 5, but ULOC = 6. The unique-non-blank set is {# step 1, # step 2, # step 3, x <- 1, y <- 2, z <- 3} — three distinct comments expand ULOC above SLOC.

So ULOC can be either smaller or larger than SLOC depending on the mix of duplication and unique comments.
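
The numbers in the table can be reproduced by hand. The sketch below is a simplified stand-in for scc’s counting, written under two assumptions: a comment is any line starting with #, and lines are compared exactly as written. line_metrics() is a hypothetical helper, and the two vectors retype the fixture contents shown above.

```r
# Fixture contents, retyped as character vectors.
code_only <- c("x <- 1", "y <- 2", "x <- 1", "y <- 2", "z <- 3")
with_comments <- c("# step 1", "x <- 1", "y <- 2",
                   "# step 2", "x <- 1", "y <- 2",
                   "# step 3", "z <- 3")

# Hypothetical helper: physical lines, code lines, comment lines and
# unique non-blank lines for one file.
line_metrics <- function(lines) {
  non_blank  <- lines[nzchar(trimws(lines))]
  is_comment <- startsWith(trimws(non_blank), "#")
  c(
    lines    = length(lines),
    code     = sum(!is_comment),
    comments = sum(is_comment),
    uloc     = length(unique(non_blank))  # comments count toward ULOC
  )
}

rbind(
  code.R            = line_metrics(code_only),
  code_w_comments.R = line_metrics(with_comments)
)
```

Both rows report code = 5, while uloc is 3 for the first file and 6 for the second, matching the scc_by_file() output above.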

ULOC at scale

The same calculation aggregates across an entire codebase:

scc(rlang_path) |>
  gt::gt() |>
  gt::tab_header(
    title = "Succinct Code Counter",
    subtitle = "per-language ULOC",
    preheader = "rlang package")

Succinct Code Counter
per-language ULOC
language	files	lines	code	comments	blanks	complexity	weighted_complexity	bytes	uloc
R	156	43171	27043	10926	5202	2239	8.279407	1152271	23250
Markdown	52	13437	11344	0	2093	0	0.000000	426556	4755
C	69	13173	10016	864	2293	1827	18.240815	348962	6679
C Header	70	8272	5322	1750	1200	677	12.720782	265714	4689
YAML	8	635	505	26	104	0	0.000000	15354	399
C++	2	25	21	0	4	0	0.000000	492	17
C++ Header	1	26	21	0	5	0	0.000000	429	21
SVG	10	10	10	0	0	0	0.000000	9687	10
Makefile	1	11	7	0	4	2	28.571429	184	8
License	1	2	2	0	0	0	0.000000	43	2
TOML	1	0	0	0	0	0	0.000000	0	1

DRYness

The dryness column is opt-in: pass dryness = TRUE to either scc() or scc_by_file() and a dryness column appears in the returned tibble, computed locally as uloc / lines (matching scc’s DRYness % formula in its tabular output, applied per record instead of project-wide). Values close to 1 mean a file or language is mostly unique non-blank content; values closer to 0 mean heavy duplication (of code or comment lines) or many blank lines. Unique comment lines do not lower the score, because they count toward uloc. Since the formula needs uloc, passing dryness = TRUE also forces uloc to TRUE, even if you explicitly set it to FALSE.

scc(rlang_path, dryness = TRUE)[, c("language", "lines", "code",
                                    "uloc", "dryness")] |>
  gt::gt() |>
  gt::tab_header(
    title = "Per-language DRYness",
    subtitle = "dryness = uloc / lines",
    preheader = "rlang package")

Per-language DRYness
dryness = uloc / lines
language	lines	code	uloc	dryness
R	43171	27043	23250	0.5385560
Markdown	13437	11344	4755	0.3538736
C	13173	10016	6679	0.5070219
C Header	8272	5322	4689	0.5668520
YAML	635	505	399	0.6283465
C++	25	21	17	0.6800000
C++ Header	26	21	21	0.8076923
SVG	10	10	10	1.0000000
Makefile	11	7	8	0.7272727
License	2	2	2	1.0000000
TOML	0	0	1	0.0000000

For rlang, the R language scores about 0.54 — roughly half of all physical R lines are unique non-blank content, with the remainder being either duplicates, blank lines, or repeated comment lines.
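
The ratio can be checked directly from the lines and uloc columns. The sketch below uses four rows from the table above; the dryness() helper is hypothetical, and its zero-lines guard is an assumption chosen to reproduce the 0 shown for the TOML row, not a statement about how glockr handles that case internally.

```r
# Hypothetical helper: uloc / lines, with 0 when a record has no lines.
dryness <- function(uloc, lines) {
  ifelse(lines == 0, 0, uloc / lines)
}

# lines and uloc values copied from the per-language table above.
langs <- data.frame(
  language = c("R", "Markdown", "C", "TOML"),
  lines    = c(43171, 13437, 13173, 0),
  uloc     = c(23250, 4755, 6679, 1)
)

langs$dryness <- dryness(langs$uloc, langs$lines)
langs
```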

Language remapping

The arguments below control how files are counted when scc does not detect their extension or language by default.

Unknown-extension files

count_as maps file extensions to known scc languages using the format "ext:language" (or several mappings separated by commas). Below we include Quarto and RMarkdown as Markdown.

scc(rlang_path, count_as = "qmd:Markdown,rmd:Markdown") |>
  gt::gt() |> 
  gt::tab_header(
    title = "Succinct Code Counter", 
    subtitle = "including .Qmd/.Rmd", 
    preheader = "rlang package")

Succinct Code Counter
including .Qmd/.Rmd
language	files	lines	code	comments	blanks	complexity	weighted_complexity	bytes	uloc
R	156	43171	27043	10926	5202	2239	8.279407	1152271	23250
Markdown	74	16210	13213	0	2997	0	0.000000	525576	5840
C	69	13173	10016	864	2293	1827	18.240815	348962	6679
C Header	70	8272	5322	1750	1200	677	12.720782	265714	4689
YAML	8	635	505	26	104	0	0.000000	15354	399
C++	2	25	21	0	4	0	0.000000	492	17
C++ Header	1	26	21	0	5	0	0.000000	429	21
SVG	10	10	10	0	0	0	0.000000	9687	10
Makefile	1	11	7	0	4	2	28.571429	184	8
License	1	2	2	0	0	0	0.000000	43	2
TOML	1	0	0	0	0	0	0.000000	0	1
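
When several extensions need remapping, the "ext:language" string can be assembled from a named vector instead of typed by hand. The helper below is a hypothetical convenience, not part of glockr; it only builds the string in the format described above.

```r
# Hypothetical helper: turn c(ext = "Language", ...) into
# "ext:Language,ext:Language".
make_count_as <- function(mapping) {
  stopifnot(is.character(mapping), !is.null(names(mapping)))
  paste0(names(mapping), ":", mapping, collapse = ",")
}

mapping <- c(qmd = "Markdown", rmd = "Markdown")
make_count_as(mapping)
# "qmd:Markdown,rmd:Markdown"
```

The resulting string is the same value passed as count_as in the call above.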

Remap by content

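remap_unknown inspects only files whose language scc could not detect and assigns a language when a header string matches, using the format "string:language". In the run below the totals are identical to the default scc(rlang_path) output, which suggests that no undetected file in rlang contains the string # R script.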

scc(rlang_path, remap_unknown = "# R script:R") |>
  gt::gt() |> 
  gt::tab_header(
    title = "Succinct Code Counter", 
    subtitle = "remap unknown # R script:R", 
    preheader = "rlang package")

Succinct Code Counter
remap unknown # R script:R
language	files	lines	code	comments	blanks	complexity	weighted_complexity	bytes	uloc
R	156	43171	27043	10926	5202	2239	8.279407	1152271	23250
Markdown	52	13437	11344	0	2093	0	0.000000	426556	4755
C	69	13173	10016	864	2293	1827	18.240815	348962	6679
C Header	70	8272	5322	1750	1200	677	12.720782	265714	4689
YAML	8	635	505	26	104	0	0.000000	15354	399
C++	2	25	21	0	4	0	0.000000	492	17
C++ Header	1	26	21	0	5	0	0.000000	429	21
SVG	10	10	10	0	0	0	0.000000	9687	10
Makefile	1	11	7	0	4	2	28.571429	184	8
License	1	2	2	0	0	0	0.000000	43	2
TOML	1	0	0	0	0	0	0.000000	0	1

remap_all does the same for every file, overriding the extension-based detection. The totals below also match the default run, so no file in rlang appears to contain the string.

scc(rlang_path, remap_all = "# R script:R") |>
  gt::gt() |> 
  gt::tab_header(
    title = "Succinct Code Counter", 
    subtitle = "remap all # R script:R", 
    preheader = "rlang package")

Succinct Code Counter
remap all # R script:R
language	files	lines	code	comments	blanks	complexity	weighted_complexity	bytes	uloc
R	156	43171	27043	10926	5202	2239	8.279407	1152271	23250
Markdown	52	13437	11344	0	2093	0	0.000000	426556	4755
C	69	13173	10016	864	2293	1827	18.240815	348962	6679
C Header	70	8272	5322	1750	1200	677	12.720782	265714	4689
YAML	8	635	505	26	104	0	0.000000	15354	399
C++	2	25	21	0	4	0	0.000000	492	17
C++ Header	1	26	21	0	5	0	0.000000	429	21
SVG	10	10	10	0	0	0	0.000000	9687	10
Makefile	1	11	7	0	4	2	28.571429	184	8
License	1	2	2	0	0	0	0.000000	43	2
TOML	1	0	0	0	0	0	0.000000	0	1
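
The difference between the two arguments can be sketched with a toy model in base R. Everything here is an assumption made for illustration: remap() and parse_rule() are hypothetical helpers, the rule is split on its last colon, and each file is represented only by its detected language (NA when unknown) and the text at the top of the file. scc’s real matching logic may differ.

```r
# Hypothetical helper: split "string:language" on the last colon.
parse_rule <- function(rule) {
  pos <- max(gregexpr(":", rule, fixed = TRUE)[[1]])
  list(
    needle   = substr(rule, 1, pos - 1),
    language = substr(rule, pos + 1, nchar(rule))
  )
}

# Hypothetical helper: reassign the language of files whose text contains
# the needle. With all = FALSE only undetected (NA) files are eligible.
remap <- function(detected, head_text, rule, all = FALSE) {
  r        <- parse_rule(rule)
  eligible <- all | is.na(detected)
  hit      <- grepl(r$needle, head_text, fixed = TRUE)
  ifelse(eligible & hit, r$language, detected)
}

# Three made-up files: a detected R file, an undetected file, and a
# detected Markdown file.
detected  <- c("R", NA, "Markdown")
head_text <- c("x <- 1", "# R script\nx <- 1", "# R script notes")

remap(detected, head_text, "# R script:R")              # remap_unknown-like
remap(detected, head_text, "# R script:R", all = TRUE)  # remap_all-like
```

In this toy model the first call changes only the undetected file, while the second also overrides the Markdown file because its text contains the string.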