
Not all code clones are equal

Like clones from a famous galactic franchise, source code clones can be dangerous if you let them multiply:

  • They carry bugs, which then also multiply.
  • Clones artificially inflate code volume.
  • Maintenance becomes frustrating and time-consuming.

But unlike in the movies, there is no magic command to control them all. Clones come in two main classes, and each is tracked and dealt with differently.

The two main classes of code clones

Textual clones

These are the “dumb” ones, often introduced by an unfortunate copy-paste. They can be detected with simple text search functions.

Here’s how! To find the interesting ones, you should:

  1. Ignore blank lines and comments
  2. Set a size threshold to avoid finding thousands of small, insignificant clones
  3. Decide what percentage of duplicated code qualifies as a clone

With all that done, you can easily find textual clones. Here’s a quick example:

Example of a textual code clone.
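The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production clone detector: it assumes `#` full-line comments, compares whole stripped lines, and the thresholds `MIN_LINES` and `SIMILARITY` are arbitrary values chosen for the example. Exact duplicates are found by grouping every window of `min_lines` consecutive normalized lines; near duplicates are scored with the standard library's `difflib.SequenceMatcher`.

```python
from collections import defaultdict
from difflib import SequenceMatcher

MIN_LINES = 4      # step 2: hypothetical size threshold, in normalized lines
SIMILARITY = 0.9   # step 3: hypothetical share of matching lines that counts as a clone


def normalize(source: str, comment_prefix: str = "#") -> list[tuple[int, str]]:
    """Step 1: drop blank lines and full-line comments, keeping original line numbers."""
    kept = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        text = line.strip()
        if text and not text.startswith(comment_prefix):
            kept.append((lineno, text))
    return kept


def exact_clones(source: str, min_lines: int = MIN_LINES) -> list[list[int]]:
    """Group starting line numbers whose next `min_lines` normalized lines are identical."""
    lines = normalize(source)
    seen = defaultdict(list)
    for i in range(len(lines) - min_lines + 1):
        window = tuple(text for _, text in lines[i:i + min_lines])
        seen[window].append(lines[i][0])
    return [starts for starts in seen.values() if len(starts) > 1]


def similarity(block_a: str, block_b: str) -> float:
    """Fraction of matching normalized lines between two blocks (0.0 to 1.0)."""
    a = [text for _, text in normalize(block_a)]
    b = [text for _, text in normalize(block_b)]
    return SequenceMatcher(None, a, b).ratio()


def is_textual_clone(block_a: str, block_b: str, threshold: float = SIMILARITY) -> bool:
    """Step 3: apply the percentage threshold to the similarity score."""
    return similarity(block_a, block_b) >= threshold


code = """
def area(w, h):
    # rectangle
    result = w * h
    print(result)
    return result

def area2(w, h):

    result = w * h
    print(result)
    return result
"""

print(exact_clones(code, min_lines=3))  # [[4, 10]]
```

Because the windows overlap, a long clone is reported once per window; a real tool would merge those hits into a single span before showing them.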

Algorithmic clones

These are the “smart” ones. They reflect architecture mishaps or insufficient knowledge of existing code.

You can’t find them by looking for similar text, because they hide, either by changing function and attribute names or by handling different data types. The smart way to track them is to analyze a symbolic representation of the source code and look for algorithmic duplicates.

Below is a typical algorithmic clone with textual cloning in white and algorithmic cloning marked in yellow.

Example of an algorithmic code clone (marked in yellow).
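One common symbolic representation is the abstract syntax tree (AST). The sketch below uses only Python's standard `ast` module and is an assumption about how such a check could work, not a description of any particular tool: it renames identifiers consistently (the first name seen becomes `v0`, the next `v1`, and so on), erases literal values and type annotations, and then compares the resulting tree dumps. Two functions that differ only in names and data types end up with the same fingerprint, even though a text search would find almost nothing in common.

```python
import ast


class _Normalizer(ast.NodeTransformer):
    """Rename identifiers consistently and erase literal values and type annotations."""

    def __init__(self):
        self.names: dict[str, str] = {}

    def _canon(self, name: str) -> str:
        # First distinct name seen becomes v0, the next v1, and so on.
        return self.names.setdefault(name, f"v{len(self.names)}")

    def visit_FunctionDef(self, node):
        node.name = "f"
        node.returns = None         # ignore the return type annotation
        node.decorator_list = []
        self.generic_visit(node)    # then normalize arguments and body
        return node

    def visit_arg(self, node):
        node.arg = self._canon(node.arg)
        node.annotation = None      # ignore parameter types
        return node

    def visit_Name(self, node):
        node.id = self._canon(node.id)
        return node

    def visit_Attribute(self, node):
        node.attr = "attr"          # attribute names are not significant
        self.generic_visit(node)
        return node

    def visit_Constant(self, node):
        node.value = None           # 0, 0.0, "" and so on all look alike
        return node


def fingerprint(source: str) -> str:
    """Structural fingerprint: the dump of the normalized AST."""
    return ast.dump(_Normalizer().visit(ast.parse(source)))


def is_algorithmic_clone(src_a: str, src_b: str) -> bool:
    return fingerprint(src_a) == fingerprint(src_b)


a = '''
def total_price(items: list[float]) -> float:
    total = 0.0
    for item in items:
        total += item.price
    return total
'''

b = '''
def count_weight(parcels: list[int]) -> int:
    acc = 0
    for p in parcels:
        acc += p.weight
    return acc
'''

print(is_algorithmic_clone(a, b))  # True
```

This only flags whole functions with identical structure. A practical detector would compare smaller subtrees (statements, loops, blocks) and tolerate partial matches, but the principle of comparing normalized structure instead of text is the same.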

But wait, that’s not all!

Finding all these clones (dumb and smart) doesn’t mean you have to handle them equally.

Depending on your objective, you might want to focus on:

  • Clones with the fewest bugs, minimizing your effort
  • Or, on the contrary, the buggiest clones, resulting in sanitized code
  • Or even clones in heavily tested code, improving reliability and test optimization

Finally, throw into the mix the fact that not all code needs the same level of attention and you’ll understand that battling clones requires not just finding them. Battling clones requires a strategy.
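At its simplest, a strategy can mean sorting the clone groups you found by the metric that matches your objective, after setting aside code that does not deserve much attention. Everything in this sketch is hypothetical: `CloneGroup`, its `bug_count`, `test_coverage` and `criticality` fields, the objective names and the example data are invented for illustration. In practice the numbers would come from your own bug tracker, coverage reports and judgment about which code matters most.

```python
from dataclasses import dataclass


@dataclass
class CloneGroup:
    """A set of duplicated fragments plus hypothetical metrics about them."""
    name: str
    bug_count: int        # e.g. bugs reported against the cloned code
    test_coverage: float  # share of the cloned code exercised by tests, 0.0 to 1.0
    criticality: float    # how much attention this part of the code deserves, 0.0 to 1.0


# One sort key per objective (all names hypothetical).
SORT_KEYS = {
    "least_effort": lambda g: g.bug_count,      # fewest bugs first
    "sanitize": lambda g: -g.bug_count,         # buggiest first
    "reliability": lambda g: -g.test_coverage,  # most heavily tested first
}


def prioritize(groups: list[CloneGroup], objective: str,
               min_criticality: float = 0.0) -> list[str]:
    """Skip code below the criticality bar, then order the rest by the chosen objective."""
    relevant = [g for g in groups if g.criticality >= min_criticality]
    return [g.name for g in sorted(relevant, key=SORT_KEYS[objective])]


groups = [  # made-up example data
    CloneGroup("parser", bug_count=7, test_coverage=0.4, criticality=0.9),
    CloneGroup("report", bug_count=1, test_coverage=0.8, criticality=0.3),
    CloneGroup("billing", bug_count=3, test_coverage=0.95, criticality=1.0),
]

print(prioritize(groups, "sanitize"))                       # ['parser', 'billing', 'report']
print(prioritize(groups, "sanitize", min_criticality=0.5))  # ['parser', 'billing']
```

Filtering on criticality before sorting keeps low-value code out of the queue entirely, rather than merely pushing it down the list.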

Now what?

We now know that clones are out there, carrying bugs, inflating the code, and potentially making development harder and longer.

We also know what to look for (textual or algorithmic clones), and that we should apply some kind of strategy.

But how?

Fortunately, there are solutions to do just that. We’ll get to this in coming posts – to help you win the clones war. 🤖
