glockr: An R Wrapper around Succinct Code Counter

Written in Go

R
Packages
Author

Martin Frigaard

Published

May 24, 2026

Last week I built glockr, an R package that wraps scc, a “very fast accurate code counter with complexity calculations.” Code metrics are a fascination of mine, and now that code is increasingly written by both machines and humans, I wanted a set of tools I could use for comparison.

Background

Adam Tornhill wrote two books on code metrics: Software Design X-Rays and Your Code as a Crime Scene, 2ed. Both are excellent reads, and they were my first exposure to measurements of code complexity.1 The approach to metrics and data visualization Tornhill outlines has definitely improved the way I think about writing (and maintaining) code. He covers topics like essential and accidental complexity, and shows that how often code is changed (measured via version control) is usually as important as any other metric.

The R ecosystem has some great packages for assessing code quality, too:2

  1. cloc (‘Count Lines of Code’) developed by Bob Rudis, which wraps the ‘Perl’ command-line utility.3
  2. pkgstats is a ‘static code analysis tool’ from ROpenSci for R packages.
  3. Luke Tierney’s codetools package also has excellent code analysis tools for R.

What does glockr do?

glockr wraps the scc command-line tool in an R package (much like cloc). scc is written in pure Go, so it’s much faster than the alternatives.4

Install glockr from GitHub:

install.packages('remotes')
remotes::install_github("mjfrigaard/glockr")

We’ll also install gt and dplyr for data manipulation and pretty tables.

install.packages(c('gt', 'dplyr'))
library(glockr)
library(gt)
library(dplyr)

glockr computes many of the same values as the cloc package, but also includes weighted complexity, unique lines of code (uloc), and Constructive Cost Model (cocomo) estimates.

Verify that scc is installed with:

scc_version()
## [1] "scc version 3.7.0"
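
If scc_version() fails, one quick thing to check is whether the scc binary is visible to R at all. The sketch below uses base R's Sys.which() for this. It assumes glockr looks for scc on the system PATH, which is an assumption here rather than documented behavior.

```r
# Look up the scc executable on the PATH (returns "" when it is not found)
scc_path <- Sys.which("scc")

if (nzchar(scc_path)) {
  message("scc found at: ", scc_path)
} else {
  message("scc was not found on the PATH; install it before calling glockr.")
}
```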

The primary function for glockr is scc(). We’ll use the popular rlang package as an example:5

pkg_path <- file.path("rlang/")
rlang_scc <- scc(pkg_path) 
dplyr::glimpse(rlang_scc)
## Rows: 11
## Columns: 10
## $ language            <chr> "R", "Markdown", "C"…
## $ files               <int> 163, 53, 70, 72, 9, …
## $ lines               <int> 44571, 16935, 14014,…
## $ code                <int> 28038, 13866, 10960,…
## $ comments            <int> 11152, 0, 872, 1777,…
## $ blanks              <int> 5381, 3069, 2182, 11…
## $ complexity          <int> 2323, 0, 1851, 691, …
## $ weighted_complexity <dbl> 8.285184, 0.000000, …
## $ bytes               <int> 1188357, 496932, 376…
## $ uloc                <int> 23837, 5358, 6931, 4…

The raw counts in the standard output from scc() include the following columns:

  1. language: Language of the file
  2. files: Number of files
  3. lines: Total physical line count across all files (code + comments + blanks)
  4. code: Lines of source code (excludes blanks and comments)
  5. comments: Number of comment lines
  6. blanks: Number of total blank lines
  7. bytes: total file size in bytes across all files in that language

Complexity & Weighted Complexity

The complexity calculation in scc is an approximation of the standard cyclomatic complexity:6

The reason it’s an approximation is that it’s calculated almost for free from a CPU point of view (since its a cheap lookup when counting), whereas a real cyclomatic complexity count would need to parse the code. It gives a reasonable guess in practice though even if it fails to identify recursive methods. The goal was never for it to be exact.

In short when scc is looking through what it has identified as code if it notices what are usually branch conditions it will increment a counter.

This approximation is an important limitation if your goal is to compare projects written in different languages. However, if your questions on code quality center around 1) “what are the most complex files in this project?” and – by extension – 2) “what code should be refactored?”, these metrics work well.7
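
To make the "increment a counter" idea concrete, here is a rough base R sketch of a branch-token counter. It is not scc's implementation: count_branch_tokens() is a hypothetical helper, and the token list (if, for, while, &&, ||) is an assumption chosen for illustration.

```r
# Hypothetical helper: a rough branch-token counter, NOT scc's algorithm.
count_branch_tokens <- function(lines) {
  # drop blank lines and full-line comments before counting
  trimmed <- trimws(lines)
  code <- trimmed[nzchar(trimmed) & !startsWith(trimmed, "#")]

  # keywords need word boundaries; operators are matched as-is
  keyword_hits  <- gregexpr("\\b(if|for|while)\\b", code, perl = TRUE)
  operator_hits <- gregexpr("&&|\\|\\|", code, perl = TRUE)

  # gregexpr() returns -1 for "no match", so only count positive positions
  count_hits <- function(hits) {
    sum(vapply(hits, function(h) sum(h > 0), integer(1)))
  }
  count_hits(keyword_hits) + count_hits(operator_hits)
}

example_lines <- c(
  "f <- function(x) {",
  "  # if this comment were counted, the total would be off",
  "  if (is.null(x) || length(x) == 0) return(NA)",
  "  for (i in seq_along(x)) {",
  "    while (x[i] > 10 && i < 5) x[i] <- x[i] - 1",
  "  }",
  "  x",
  "}"
)

count_branch_tokens(example_lines)
## [1] 5
```

No parsing happens here, which is why this kind of count is cheap and also why it is only an approximation of cyclomatic complexity.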

Below is the output from scc() for the rlang source files (sorted by complexity).

rlang_scc |> 
    dplyr::arrange(desc(complexity)) |> 
    gt::gt() |> 
    gt::tab_header(
      title = "Succinct Code Count (rlang package)", 
      subtitle = "Sorted by complexity")
Succinct Code Count (rlang package)
Sorted by complexity

language    files  lines  code   comments  blanks  complexity  weighted_complexity  bytes    uloc
R           163    44571  28038  11152     5381    2323        8.285184             1188357  23837
C           70     14014  10960  872       2182    1851        16.888686            376444   6931
C Header    72     8210   5315   1777      1118    691         13.000941            270950   4809
Makefile    1      11     7      0         4       2           28.571429            184      8
Markdown    53     16935  13866  0         3069    0           0.000000             496932   5358
YAML        9      735    587    35        113     0           0.000000             17940    449
C++         2      25     21     0         4       0           0.000000             492      17
C++ Header  1      26     21     0         5       0           0.000000             429      21
SVG         10     10     10     0         0       0           0.000000             9687     10
License     1      2      2      0         0       0           0.000000             43       2
TOML        1      2      2      0         0       0           0.000000             34       2

This tells us, not surprisingly, that R files account for the most total complexity in this project.

A more useful view is the most complex individual files in the rlang package, which we can get with scc_by_file():

r_files <- scc_by_file(pkg_path, include_ext = "r")
r_files |> 
  dplyr::arrange(desc(complexity)) |> 
  dplyr::select(
    c(filename, code, complexity)
  ) |> 
  head(10) |>
  gt::gt() |>
  gt::tab_header(
    title    = "Top 10 most complex .R files",
    subtitle = "raw branch-token count",
    preheader = "rlang package")
Top 10 most complex .R files
raw branch-token count

filename               code  complexity
cnd-abort.R            691   152
standalone-cli.R       432   135
trace.R                807   134
deparse.R              819   111
standalone-vctrs.R     505   107
call.R                 451   92
standalone-obj-type.R  206   64
cnd-entrace.R          213   61
utils.R                302   59
expr.R                 157   52

If we compare this to the weighted complexity ((complexity / code) * 100), we can see a few of the top files change:

r_files <- scc_by_file(pkg_path, include_ext = "r")
r_files |> 
  dplyr::arrange(desc(weighted_complexity)) |> 
  dplyr::select(
    c(filename, code, complexity, weighted_complexity)
  ) |> 
  head(10) |>
  gt::gt() |>
  gt::tab_header(
    title    = "Top 10 most complex (weighted) .R files",
    subtitle = "Weighted Complexity",
    preheader = "rlang package")
Top 10 most complex (weighted) .R files
Weighted Complexity

filename                 code  complexity  weighted_complexity
operators.R              32    11          34.37500
expr.R                   157   52          33.12102
standalone-cli.R         432   135         31.25000
standalone-obj-type.R    206   64          31.06796
utils-cli-tree.R         113   35          30.97345
utils-encoding.R         37    11          29.72973
cnd-entrace.R            213   61          28.63850
error-backtrace-empty.R  16    4           25.00000
raw.R                    9     2           22.22222
cnd-abort.R              691   152         21.99711
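
The weighted values can be reproduced by hand from the code and complexity columns. The sketch below applies (complexity / code) * 100 to three rows copied from the tables above, which shows why a small file like operators.R outranks cnd-abort.R once size is taken into account.

```r
# Three rows copied from the tables above
top_files <- data.frame(
  filename   = c("cnd-abort.R", "operators.R", "expr.R"),
  code       = c(691, 32, 157),
  complexity = c(152, 11, 52)
)

# weighted complexity = branch tokens per 100 lines of code
top_files$weighted_complexity <- (top_files$complexity / top_files$code) * 100

# sort from most to least branching per line
top_files[order(-top_files$weighted_complexity), ]
```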

We won’t spend too much time on these measures, because as you’ll see below, scc has quite a few code metrics to choose from.

Unique lines of code (uloc)

The unique lines of code (ULOC) metric counts unique non-blank lines, including comments. The scc author argues (with help8) that this is better than relying on SLOC (source lines of code):

Compared to SLOC, not only are blank lines discounted, but so are close-brace lines and other repetitive code such as common includes. On the other hand, ULOC counts comments, which require just as much maintenance as the code around them does, while avoiding inflating the result with license headers which appear in every file, for example.

For comparison, SLOC (the value in the code column) is just the raw count of non-blank, non-comment lines and includes duplicates. uloc is automatically included with both scc() and scc_by_file().

The best way to see the difference is with two minimal code files:

# pure code, 5 lines, two duplicate pairs
# save as code.R
x <- 1
y <- 2
x <- 1
y <- 2
z <- 3

# same code, plus three unique comments
# save as code_w_comments.R 
# step 1
x <- 1
y <- 2
# step 2
x <- 1
y <- 2
# step 3
z <- 3

Now we can compare these with scc_by_file() and view the lines, code, comments and uloc columns.

fixtures <- c(
  file.path("code.R"),
  file.path("code_w_comments.R")
)
scc_by_file(fixtures) |>
  dplyr::select(c(filename, lines, 
  code, comments, uloc)) |> 
  gt::gt() |>
  gt::tab_header(
    title    = "SLOC vs ULOC",
    subtitle = "Both files contain the same 5 code lines",
    preheader = "Local ULOC comparison")
SLOC vs ULOC
Both files contain the same 5 code lines

filename           lines  code  comments  uloc
code.R             5      5     0         3
code_w_comments.R  9      5     4         7
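
The uloc value for code.R can be approximated in a few lines of base R. This is a sketch of the idea (count distinct non-blank lines), not scc's implementation: count_uloc() is a hypothetical helper, and trimming whitespace before comparing lines is an assumption.

```r
# The five lines of code.R from above
code_lines <- c("x <- 1", "y <- 2", "x <- 1", "y <- 2", "z <- 3")

# Hypothetical helper: distinct non-blank lines (comments would count too)
count_uloc <- function(lines) {
  non_blank <- trimws(lines)
  non_blank <- non_blank[nzchar(non_blank)]
  length(unique(non_blank))
}

count_uloc(code_lines)
## [1] 3
```

The two duplicate pairs collapse to one line each, so five physical lines become three unique ones, matching the uloc column for code.R.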

DRYness percentage

The DRYness % can be added with dryness = TRUE. This is computed locally as uloc / lines, which matches scc’s DRYness % formula, applied per record instead of project-wide.

scc_by_file(fixtures, dryness = TRUE) |>
  dplyr::select(c(filename, lines, 
  code, comments, uloc, dryness)) |> 
  gt::gt() |>
  gt::tab_header(
    title    = "SLOC vs ULOC + DRYness",
    subtitle = "Both files contain the same 5 code lines",
    preheader = "Local SLOC, ULOC & DRYness comparison")
SLOC vs ULOC + DRYness
Both files contain the same 5 code lines

filename           lines  code  comments  uloc  dryness
code.R             5      5     0         3     0.6000000
code_w_comments.R  9      5     4         7     0.7777778

DRYness values close to 1 mean a file or language is mostly unique non-blank content, and values closer to 0 mean heavy duplication (including repeated comments) or many blank lines.
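
The dryness column is simple enough to check by hand. The sketch below recomputes uloc / lines in base R from the lines and uloc values in the table above.

```r
# lines and uloc copied from the fixture table
fixture_counts <- data.frame(
  filename = c("code.R", "code_w_comments.R"),
  lines    = c(5, 9),
  uloc     = c(3, 7)
)

# DRYness per file: share of physical lines that are unique and non-blank
fixture_counts$dryness <- fixture_counts$uloc / fixture_counts$lines
fixture_counts
```

Note that the commented file scores higher: its unique comments add to uloc, so they raise DRYness rather than lower it.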

COCOMO cost estimates

scc includes a COCOMO 81 model that estimates project effort and cost from SLOC. When cocomo = TRUE, the scc() output becomes a list and the COCOMO output is in the cocomo tibble.

rlang_cocomo <- scc(pkg_path, cocomo = TRUE)
## # A tibble: 11 × 10
##    language   files lines  code comments blanks
##    <chr>      <int> <int> <int>    <int>  <int>
##  1 R            163 44571 28038    11152   5381
##  2 Markdown      53 16935 13866        0   3069
##  3 C             70 14014 10960      872   2182
##  4 C Header      72  8210  5315     1777   1118
##  5 YAML           9   735   587       35    113
##  6 C++            2    25    21        0      4
##  7 C++ Header     1    26    21        0      5
##  8 SVG           10    10    10        0      0
##  9 Makefile       1    11     7        0      4
## 10 License        1     2     2        0      0
## 11 TOML           1     2     2        0      0
## # ℹ 4 more variables: complexity <int>,
## #   weighted_complexity <dbl>, bytes <int>,
## #   uloc <int>
## # A tibble: 3 × 3
##   metric                    project_type value    
##   <chr>                     <chr>        <chr>    
## 1 Estimated Cost to Develop organic      $1,948,3…
## 2 Estimated Schedule Effort organic      17.72 mo…
## 3 Estimated People Required organic      9.77

The scc() output is in the scc tibble:

names(rlang_cocomo)
## [1] "scc"    "cocomo"

We can customize the COCOMO model with the following arguments:

  1. avg_wage: annual salary in local currency
  2. cocomo_project_type: “organic”, “semi-detached”, or “embedded”
  3. eaf: effort adjustment factor
  4. overhead: corporate overhead multiplier
  5. currency_symbol: symbol shown in the value column
  6. auto_print_scc: whether to auto-print the language tibble (set to FALSE to skip it)

NOTE: These arguments only work when cocomo = TRUE

embedded <- scc(pkg_path,
    cocomo = TRUE,         
    avg_wage = 120000L,        
    cocomo_project_type = "embedded",     
    eaf = 1.1,            
    overhead = 2.0,            
    currency_symbol = "$",            
    auto_print_scc = FALSE)  
## # A tibble: 3 × 3
##   metric                    project_type value    
##   <chr>                     <chr>        <chr>    
## 1 Estimated Cost to Develop embedded     $10,525,…
## 2 Estimated Schedule Effort embedded     18.57 mo…
## 3 Estimated People Required embedded     28.35
embedded$cocomo |> 
  gt::gt() |> 
  gt::tab_header(
    title = "COCOMO", 
    subtitle = "embedded project", 
    preheader = "rlang package")
COCOMO
embedded project

metric                     project_type  value
Estimated Cost to Develop  embedded      $10,525,311
Estimated Schedule Effort  embedded      18.57 months
Estimated People Required  embedded      28.35
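
To see where these numbers come from, the basic COCOMO 81 equations can be sketched in base R. The coefficients below are the standard published ones for each project type. The cost formula (effort times monthly wage times overhead) and the cocomo_sketch() helper are assumptions for illustration, not glockr or scc code.

```r
# Basic COCOMO 81 coefficients (a, b, c, d) by project type
cocomo_coefs <- list(
  "organic"       = c(a = 2.4, b = 1.05, c = 2.5, d = 0.38),
  "semi-detached" = c(a = 3.0, b = 1.12, c = 2.5, d = 0.35),
  "embedded"      = c(a = 3.6, b = 1.20, c = 2.5, d = 0.32)
)

# Hypothetical helper, not part of glockr
cocomo_sketch <- function(sloc, project_type, eaf, avg_wage, overhead) {
  k <- cocomo_coefs[[project_type]]
  effort   <- k[["a"]] * (sloc / 1000)^k[["b"]] * eaf  # person-months
  schedule <- k[["c"]] * effort^k[["d"]]                # months
  list(
    cost     = effort * (avg_wage / 12) * overhead,     # assumed cost formula
    schedule = schedule,
    people   = effort / schedule
  )
}

# Sum of the code column across all 11 languages in the scc() output
total_sloc <- sum(c(28038, 13866, 10960, 5315, 587, 21, 21, 10, 7, 2, 2))

embedded_sketch <- cocomo_sketch(
  sloc = total_sloc, project_type = "embedded",
  eaf = 1.1, avg_wage = 120000, overhead = 2.0
)
embedded_sketch
```

With the same inputs as the embedded call, the results should land close to the cost, schedule, and people values in the table above. Everything is driven by the single SLOC total, which is why the estimate is best read as a rough order of magnitude.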

Why not use COCOMO II?

scc() only gives us raw structural numbers (lines, code, complexity, bytes, etc.), but COCOMO II needs 22 judgement inputs we can’t derive from source code:

  • The current COCOMO project types ("organic" / "semi-detached" / "embedded") don’t translate to COCOMO II.
  • COCOMO II uses 5 scale factors to estimate organizational/process attributes.
  • COCOMO II also uses 17 effort multipliers for product, platform, personnel, and project context.
    • Every one of these is a six-level human rating about the team, requirements stability, reuse strategy, tool maturity, schedule pressure, etc.
  • The only one that might map would be CPLX (product complexity), but COCOMO II’s CPLX is architectural/algorithmic complexity, not McCabe’s cyclomatic complexity.

Visualizing code complexity

In Your Code as a Crime Scene, Tornhill covers a collection of methods for visualizing code, but my favorite is his use of circle packing for identifying “hotspots.” These are described below:

“Each circle represents a part of the system. The more complex a module, as measured by lines of code, the larger the circle. We then use color to serve as an attention magnet for the most critical property: change.” - Adam Tornhill (Your Code as a Crime Scene, 2ed)

Tornhill’s Hotspots from Your Code as a Crime Scene, 2ed

The glockr package doesn’t have data on code changes (yet), but we can build circle packer graphs using weighted complexity and DRYness.

We’ll start by tying each circle’s area to the file’s weighted_complexity, so the files that ‘pack more branching per line’ are larger.

Next, the color fill aesthetic will be the file’s DRYness % (uloc / lines), with darker hues meaning more duplication relative to line counts, and lighter hues meaning more unique, non-blank content.

We’ll start with a fresh dataset using scc_by_file(), setting dryness to TRUE.

r_files <- scc_by_file(pkg_path, include_ext = "r", dryness = TRUE)

We can exclude files with zero weighted_complexity because there is nothing to size the circle by:

r_files <- dplyr::filter(r_files, weighted_complexity > 0)
dplyr::glimpse(r_files)
## Rows: 114
## Columns: 14
## $ language            <chr> "R", "R", "R", "R", …
## $ filename            <chr> "c-lib.R", "aaa-topi…
## $ location            <chr> "/Users/mjfrigaard/p…
## $ lines               <int> 413, 93, 355, 551, 2…
## $ code                <int> 312, 65, 280, 63, 23…
## $ comments            <int> 26, 10, 12, 472, 3, …
## $ blanks              <int> 75, 18, 63, 16, 40, …
## $ complexity          <int> 21, 4, 49, 6, 5, 14,…
## $ weighted_complexity <dbl> 6.7307692, 6.1538462…
## $ bytes               <int> 9381, 2708, 10127, 1…
## $ uloc                <int> 244, 63, 227, 333, 1…
## $ dryness             <dbl> 0.5907990, 0.6774194…
## $ generated           <lgl> FALSE, FALSE, FALSE,…
## $ minified            <lgl> FALSE, FALSE, FALSE,…

packcircles::circleProgressiveLayout() computes the position and area-proportional size of each circle:

library(packcircles)
packing <- packcircles::circleProgressiveLayout(
  r_files$weighted_complexity, sizetype = "area"
)

This creates a dataset with x, y, and radius columns (with the same number of observations as our previous r_files dataset).

dplyr::glimpse(packing)
## Rows: 114
## Columns: 3
## $ x      <dbl> -1.46371800, 1.39958211, 0.052866…
## $ y      <dbl> 0.0000000, 0.0000000, -3.5102885,…
## $ radius <dbl> 1.4637180, 1.3995821, 2.3601744, …
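
With sizetype = "area", each value is treated as a circle area, so the radius follows from area = pi * radius^2. The base R sketch below checks this against the first two rows shown above (weighted_complexity values copied from the earlier glimpse() output).

```r
# First two weighted_complexity values from r_files
weighted_complexity <- c(6.7307692, 6.1538462)

# area = pi * r^2, so r = sqrt(area / pi)
radius <- sqrt(weighted_complexity / pi)
round(radius, 4)
## [1] 1.4637 1.3996
```

These match the radius column in packing, which confirms that a file with twice the weighted complexity gets twice the area, not twice the radius.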

The number of rows in packing is important, because it allows us to bind the filename, weighted_complexity, and dryness columns back to the packing data:

packing <- dplyr::bind_cols(
  packing, # circle size data 
  dplyr::select(r_files, # original columns
    filename, weighted_complexity, dryness)
  )
dplyr::glimpse(packing)
## Rows: 114
## Columns: 6
## $ x                   <dbl> -1.46371800, 1.39958…
## $ y                   <dbl> 0.0000000, 0.0000000…
## $ radius              <dbl> 1.4637180, 1.3995821…
## $ filename            <chr> "c-lib.R", "aaa-topi…
## $ weighted_complexity <dbl> 6.7307692, 6.1538462…
## $ dryness             <dbl> 0.5907990, 0.6774194…

The circleLayoutVertices() function from packcircles creates “a data frame of circle vertices for plotting”:

polygons <- packcircles::circleLayoutVertices(packing, npoints = 60)
dplyr::glimpse(polygons)
## Rows: 6,954
## Columns: 3
## $ x  <dbl> 0.00000000, -0.00801840, -0.03198575,…
## $ y  <dbl> 0.0000000, 0.1530002, 0.3043241, 0.45…
## $ id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
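
The row count makes sense once you see how a circle is turned into a polygon. The sketch below is a base R approximation of the idea, not the packcircles source: circle_vertices() is a hypothetical helper that walks once around the circle and repeats the starting point to close the shape.

```r
# Hypothetical helper: vertices for one circle, closed by repeating the start
circle_vertices <- function(x, y, radius, id, npoints = 60) {
  angle <- seq(0, 2 * pi, length.out = npoints + 1)
  data.frame(
    x  = x + radius * cos(angle),
    y  = y + radius * sin(angle),
    id = id
  )
}

# First circle from packing: centre (-1.4637180, 0), radius 1.4637180
first_circle <- circle_vertices(-1.4637180, 0, 1.4637180, id = 1L)

nrow(first_circle)        # 61 vertices per circle
114 * nrow(first_circle)  # 6954 rows for all 114 files
```

The id column is what ties each block of 61 vertices back to a single file.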

Now we’re going to perform a positional join using the id from polygons and a new id we’ll create in r_files using dplyr::row_number() (and limit the columns to only filename, weighted_complexity, and dryness).

polygons <- polygons |>
  dplyr::left_join(
    r_files |>
      dplyr::mutate(id = dplyr::row_number()) |>
      dplyr::select(
        id, 
        filename, 
        weighted_complexity, 
        dryness),
    by = "id"
  )
dplyr::glimpse(polygons)
## Rows: 6,954
## Columns: 6
## $ x                   <dbl> 0.00000000, -0.00801…
## $ y                   <dbl> 0.0000000, 0.1530002…
## $ id                  <int> 1, 1, 1, 1, 1, 1, 1,…
## $ filename            <chr> "c-lib.R", "c-lib.R"…
## $ weighted_complexity <dbl> 6.730769, 6.730769, …
## $ dryness             <dbl> 0.590799, 0.590799, …
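
This join only works because the ids are positional: circle 1 is row 1 of r_files, circle 2 is row 2, and so on. The toy base R sketch below (made-up file names and values, with merge() standing in for left_join()) shows the same pattern on two files.

```r
# Toy data: two files, in the same row order used to build the circles
files <- data.frame(filename = c("a.R", "b.R"), dryness = c(0.6, 0.8))
files$id <- seq_len(nrow(files))  # positional id, like dplyr::row_number()

# Two vertices per circle, tagged with the circle's id
vertices <- data.frame(id = c(1L, 1L, 2L, 2L), x = c(0, 1, 2, 3))

# Every vertex picks up the columns of the file it belongs to
joined <- merge(vertices, files, by = "id")
joined
```

If r_files were re-sorted or filtered after the layout was computed, the ids would no longer line up, so the id column should be created from the same data frame that was passed to the layout step.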

Now we can build the plot:

  1. The x, y, and group aesthetics draw the circles, and fill is the dryness percentage value.
  2. geom_polygon() connects the start and end points of each circle (and fills the area inside).
  3. scale_fill_viridis_c() maps the continuous dryness values to fill colors, and coord_equal() ensures equal unit length on the x-axis and y-axis.

ggplot2::ggplot(polygons,
                ggplot2::aes(
                  x = x, 
                  y = y, 
                  group = id, 
                  fill = dryness)) +
  ggplot2::geom_polygon(
    colour = "white", 
    linewidth = 0.2) + 
  ggplot2::scale_fill_viridis_c(
    name    = "DRYness\n(uloc / lines)",
    option  = "viridis",
    limits  = c(0, 1)
  ) +
  ggplot2::coord_equal() 

We can’t label every file in the circle packer graph, but we can make sure the files with the highest levels of complexity have text labels. We’ll create a top_labels dataset for the top 10 weighted_complexity values.

top_labels <- packing |>
  dplyr::slice_max(weighted_complexity, n = 10)

Given the color palette we’re using, we’ll give our labels a white outline (behind dark text) with the shadowtext package.

remotes::install_github("GuangchuangYu/shadowtext")

Now we can add the labels and theme_void():

# for the labels
library(shadowtext)
ggplot2::ggplot(polygons,
                ggplot2::aes(
                  x = x, 
                  y = y, 
                  group = id, 
                  fill = dryness)) +
  ggplot2::geom_polygon(
    colour = "white", 
    linewidth = 0.2) +
  # add text labels
  shadowtext::geom_shadowtext(
    data    = top_labels,
    mapping = ggplot2::aes(
      x = x, 
      y = y, 
      label = filename),
    inherit.aes = FALSE,
    size      = 3,
    colour    = "#2B2B2B",     # dark text
    bg.colour = "white",       # white halo so dark text stays legible
    bg.r      = 0.15,
    fontface  = "bold"
  ) +
  ggplot2::scale_fill_viridis_c(
    name    = "DRYness\n(uloc / lines)",
    option  = "viridis",
    limits  = c(0, 1)
  ) +
  ggplot2::coord_equal() +
  ggplot2::theme_void(base_size = 11)

This looks better. I’ll add some finishing touches with labs() and theme():

# for the labels
library(shadowtext)
ggplot2::ggplot(polygons,
                ggplot2::aes(
                  x = x, 
                  y = y, 
                  group = id, 
                  fill = dryness)) +
  ggplot2::geom_polygon(
    colour = "white", 
    linewidth = 0.2) +
  # add text labels
  shadowtext::geom_shadowtext(
    data    = top_labels,
    mapping = ggplot2::aes(x = x, y = y, label = filename),
    inherit.aes = FALSE,
    size      = 3,
    colour    = "#2B2B2B",     # dark text
    bg.colour = "white",       # white halo so dark text stays legible
    bg.r      = 0.15,
    fontface  = "bold"
  ) +
  ggplot2::scale_fill_viridis_c(
    name    = "DRYness\n(uloc / lines)",
    option  = "viridis",
    limits  = c(0, 1)
  ) +
  ggplot2::coord_equal() +
  ggplot2::labs(
    title    = "rlang R files: complexity vs. DRYness",
    subtitle = "circle area = weighted complexity; fill = uloc / lines; top 10 labelled",
    x = "",
    y = "",
    caption  = "scc_by_file(..., dryness = TRUE)"
  ) +
  ggplot2::theme_void(base_size = 11) +
  ggplot2::theme(
    plot.title = ggplot2::element_text(face = "bold"),
    legend.position = "right"
  )

In the Visualization vignette I cover how to convert this into an interactive graph. In future versions, I’ll add the change (Git) data for more interesting hotspot visualizations.

Footnotes

  1. Codacy has a great overview of code metrics in this blog post.↩︎

  2. Other packages worth noting are sloop and lobstr. These don’t calculate metrics, but were developed to improve users’ understanding of R and object-oriented programming, and are covered in Advanced R, 2nd ed.↩︎

  3. See the cloc command-line utility documentation here.↩︎

  4. Check the performance comparison of scc here.↩︎

  5. This is a local copy of the source files, downloaded on 2026-05-26.↩︎

  6. How scc counts lines of code is outlined in the ‘Features’ section of the README.md.↩︎

  7. Read the full description of Complexity Estimates.↩︎

  8. Read this discussion here.↩︎