Anatomy of a source code

To bit or not to bit?

Source code is present in pretty much everything we interact with today: phone apps, vehicle GPS, aircraft multimedia or flight controls, technological gizmos, services, video games. It’s everywhere.

And deep down, at machine level, it is indeed represented in binary form, as “0”s and “1”s, just like in every techno-thriller movie.

But we developers do not spit out binary code (although I remember typing bytes long ago, to “draw” video game sprites 😊).

No, source code is written using a language, with a vocabulary to learn, and a grammar to follow. And just like the human languages we learn at school, we can end up being (more or less) fluent with a programming language.

A Rosetta Stone for developers

There are many programming languages, from the hundreds tracked by TIOBE, to the thousands listed by The Language List.

Obviously, you can’t learn them all. But many can be grouped into categories, and together they still represent a great variety of ways to make a computer do your bidding.

So, do we actually have to be fluent in each language to grasp the meaning of any piece of code? Could there be some kind of Rosetta Stone, a representation that helps us understand code without knowing the language used, and even compare code written in different languages?

Enter Al-Khwarizmi

Al-Khwarizmi not only pioneered algebra, he also gave his name to our beloved algorithms. And an algorithm, a sequence of instructions, is what most, if not all, programming languages are meant to express.

If we look at an algorithm, regardless of the language it is written in, it can be broken down into abstract elements such as statements, conditions, control flow changes, and exception handling.

Granted, this representation does not convey the whole code content, but it can produce a visualization of its inner logic, and assist advanced analysis functions.
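To make the idea of abstract elements a little more concrete, here is a minimal sketch in Python. It is not the analyzer behind the graphs in this post. It relies on Python's standard `ast` module, and the `CATEGORIES` mapping, the `classify` and `abstract_elements` names, and the choice to file loops under conditions are all assumptions made for illustration.

```python
import ast

# Hypothetical mapping from Python AST node types to abstract elements.
# The grouping is an assumption for this sketch, not a standard.
CATEGORIES = {
    "condition": (ast.If, ast.While, ast.For),
    "control_flow": (ast.Break, ast.Continue, ast.Return, ast.Raise),
    "exception": (ast.Try,),
}


def classify(node):
    """Return the abstract element name for one statement node."""
    for category, node_types in CATEGORIES.items():
        if isinstance(node, node_types):
            return category
    return "statement"  # anything else is a plain sequential statement


def abstract_elements(node):
    """Yield the abstract element of every statement, in source order."""
    for child in ast.iter_child_nodes(node):
        if isinstance(child, ast.stmt):
            yield classify(child)
        yield from abstract_elements(child)


# Sample code to analyze; it is only parsed, never executed.
SAMPLE = """
total = 0
for v in values:
    if v < 0:
        break
    try:
        total += int(v)
    except ValueError:
        continue
print(total)
"""

if __name__ == "__main__":
    print(list(abstract_elements(ast.parse(SAMPLE))))
```

The sample is reduced to a flat list of element names with no variable names or literals left, which is the kind of language-neutral view described above. A parser for another language could feed the same categories.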

Go with the flow

But enough exposition, here are some examples.

This is just one statement.
Pretty boring, right?
This one is more interesting.
And there’s more color!

Now that you get the idea, let’s finish with real applications, showing you source code’s inner beauty.

Pyramids, cities out of a sci-fi movie

And last, a glimpse at your two favorite Operating Systems:

Flow graph of a File Manager function from Windows (3.0)

Flow graph of the ‘AppCommandProc’ function. It is a 1300-line “switch..case” that handles all editing commands

A GPU Driver function from the Linux Kernel (5.6-rc6)

Flow graph of the ‘calculate_bandwidth’ function. It is a 2000-line, extensive graphical algorithm

Is a picture worth a thousand bytes?

These visual representations of algorithms are compelling, even mesmerizing. Some of them could be displayed in a ‘binary gallery’ (hey, that’s an idea!).

But appearances aside, there is real value in these control flow graphs:

  • Numerical data, in the form of metrics related to the algorithm structure (depth, vocabulary frequency, nesting level, etc.)
  • The structure of the nodes itself, which can be viewed as patterns to analyze and categorize
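As a rough illustration of the first point, the sketch below computes two such metrics for a small piece of Python code using the standard `ast` module. The function names `max_nesting` and `vocabulary`, the `NESTING` tuple, and the exact definitions of the metrics are assumptions for this example. They are not the metrics computed by any particular tool.

```python
import ast
from collections import Counter

# Statement types treated as opening a new nesting level (an assumption).
NESTING = (ast.If, ast.For, ast.While, ast.Try, ast.With)


def max_nesting(node, depth=0):
    """Return the deepest level of nested block statements under node."""
    deepest = depth
    for child in ast.iter_child_nodes(node):
        child_depth = depth + 1 if isinstance(child, NESTING) else depth
        deepest = max(deepest, max_nesting(child, child_depth))
    return deepest


def vocabulary(tree):
    """Count how often each kind of statement appears in the tree."""
    return Counter(
        type(n).__name__ for n in ast.walk(tree) if isinstance(n, ast.stmt)
    )


# Sample code to measure; it is only parsed, never executed.
SAMPLE = """
for row in rows:
    for cell in row:
        if cell is None:
            continue
        print(cell)
"""

if __name__ == "__main__":
    tree = ast.parse(SAMPLE)
    print("max nesting:", max_nesting(tree))
    print("vocabulary:", dict(vocabulary(tree)))
```

Here the nesting metric is 3 (two loops and one condition), and the vocabulary counts two `For` statements plus one each of `If`, `Continue` and `Expr`. Numbers like these can be compared across functions, whatever they look like on screen.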

We will come back to these concepts soon, with interesting use cases and applications!


5 thoughts on “Anatomy of a source code”

    • Thank you for your comment.

      To generate a graph, the algorithm is first analyzed and broken down into elementary components.
      This ‘algorithmic signature’ can then be translated into visual tokens.

      Such tokens represent sequential statements, conditions (If, Switch, While …), control flow change (Break, Continue, Throw …), exceptions (Try, Finally, Catch …)

      • Thank you – I was curious whether you know of any tools for generating such flow graphs please?

        (I hope that nobody had to build the Linux graph by hand!)

    • No, none of the presented graphs were drawn by hand 🙂
      The graphs were produced by the Squore tool, which among other things embeds a code analyzer and the display features.

