SOURCE CODE FINGERPRINTING

Our Window Finger Print (WFP) algorithm obtains Open Source fingerprints from both full files and source code snippets. SCANOSS chose this algorithm, which is based on a widely adopted scanning algorithm, to compare code against known open source code and identify it.

(Figure: original source code alongside its normalized WFP code)

Step 1 – Code normalization

Converting source code into fingerprints starts with normalization. During this process, all non-alphanumeric characters in the input are removed.
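
The normalization step can be sketched in a few lines of Python. This is a minimal illustration, not SCANOSS's implementation: the function name `normalize` is hypothetical, and we assume "alphanumeric" means ASCII letters and digits, so non-ASCII characters are dropped too.

```python
def normalize(source: str) -> str:
    """Return `source` with every character that is not an ASCII letter or digit removed."""
    # str.isascii() guards against non-ASCII letters/digits (e.g. 'é'), which isalnum() alone would keep.
    return "".join(ch for ch in source if ch.isascii() and ch.isalnum())


if __name__ == "__main__":
    snippet = "int main() {\n  return 0;\n}"
    print(normalize(snippet))  # intmainreturn0
```

Because the .wfp output records original line numbers, a complete implementation would also need to remember which input line each kept character came from; this sketch leaves that out.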

Step 2 – Gram fingerprinting

A series of data samples is taken from the normalized code, and each sample is fingerprinted.

‘Gram’ is the number of bytes in each sample. We use a CRC32C checksum to fingerprint the samples because it is supported in hardware on most Intel chipsets.

(Figures: ‘Gram’ fingerprints from the previously normalized code, Open Source Fingerprinting gram examples, and the sorted list of ‘Gram’ fingerprints)
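
Gram fingerprinting can be illustrated with the Python sketch below. Two assumptions are ours, not the text's: the samples are overlapping substrings of `gram` bytes that advance one byte at a time (as in classic winnowing), and CRC32C is computed in pure Python with the reflected Castagnoli constant 0x82F63B78, because the standard library's `zlib.crc32` uses the different IEEE polynomial. The names `crc32c` and `gram_fingerprints` are hypothetical.

```python
from __future__ import annotations

# CRC32C (Castagnoli) in reflected form; zlib.crc32 uses the IEEE polynomial instead.
_POLY = 0x82F63B78


def _build_table() -> list[int]:
    """Precompute the 256-entry lookup table for byte-at-a-time CRC32C."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ _POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _build_table()


def crc32c(data: bytes) -> int:
    """Table-driven CRC32C of `data`, returned as an unsigned 32-bit integer."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def gram_fingerprints(normalized: str, gram: int) -> list[int]:
    """CRC32C of every `gram`-byte substring, sliding one byte at a time (assumed)."""
    data = normalized.encode("utf-8")
    return [crc32c(data[i:i + gram]) for i in range(len(data) - gram + 1)]


if __name__ == "__main__":
    hashes = gram_fingerprints("intmainreturn0", gram=5)
    print([f"{h:08x}" for h in hashes])
    print(sorted(hashes))  # sorted view of the same fingerprints
```

A table-driven CRC in pure Python is slow; a production scanner would use a native CRC32C routine, ideally one backed by the hardware instruction.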

Step 3 – Window fingerprinting

From the ‘Gram’ fingerprints, a series of samples is selected. ‘Window’ is the number of ‘Gram’ fingerprints each sample is selected from.

A CRC32C checksum is then applied to create the first window hash for the file.
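
The text does not say how a sample is chosen from each window. The sketch below assumes the winnowing rule: keep the smallest ‘Gram’ fingerprint in each window, skip it if it repeats the previous selection, and apply CRC32C to its four bytes (little-endian order is also our assumption) to obtain the window hash. `window_hashes` is a hypothetical name, and it accepts ‘Gram’ fingerprints as plain integers so it can be tried on small inputs.

```python
from __future__ import annotations

_POLY = 0x82F63B78  # CRC32C (Castagnoli), reflected form


def crc32c(data: bytes) -> int:
    """Bitwise CRC32C; slow but short, fine for hashing 4-byte values."""
    crc = 0xFFFFFFFF
    for b in data:
        crc ^= b
        for _ in range(8):
            crc = (crc >> 1) ^ _POLY if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def window_hashes(gram_hashes: list[int], window: int) -> list[int]:
    """Pick the minimum ‘Gram’ fingerprint per window (assumed winnowing rule) and hash it."""
    selected: list[int] = []
    last_min = None
    for start in range(len(gram_hashes) - window + 1):
        current_min = min(gram_hashes[start:start + window])
        # Adjacent windows often share the same minimum; skip repeats of the previous selection.
        if current_min != last_min:
            # Assumed: hash the selected 32-bit value as 4 little-endian bytes.
            selected.append(crc32c(current_min.to_bytes(4, "little")))
            last_min = current_min
    return selected


if __name__ == "__main__":
    grams = [5, 3, 8, 3, 9, 1]
    print([f"{h:08x}" for h in window_hashes(grams, window=3)])
```

Choosing the minimum is deterministic, so two files that share a long enough stretch of normalized code select at least some of the same window hashes. A full implementation would also record the original line number at which each window hash is produced.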

Step 4 – Output formatting

WFP fingerprints should be presented in a simple format that both humans and machines can read.

This is why we created the .wfp format. A .wfp file contains a series of declarations followed by the code fingerprints and their original line numbers.

(Figure: example .wfp file, gram=15, window=10)
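
A .wfp-style record can be produced as in the Python sketch below. The layout is an assumption for illustration only: a `file=<md5>,<size>,<path>` declaration, then one `<line>=<hash>,<hash>` entry per original line number, with each window hash written as eight hex digits. Check the WFP repository on GitHub for the authoritative format. `format_wfp` is a hypothetical name.

```python
from __future__ import annotations

import hashlib


def format_wfp(path: str, content: bytes, hashes: list[tuple[int, int]]) -> str:
    """Render (line_number, window_hash) pairs as a .wfp-style record (assumed layout)."""
    md5 = hashlib.md5(content).hexdigest()
    out = [f"file={md5},{len(content)},{path}"]  # assumed declaration line
    by_line: dict[int, list[str]] = {}
    for line_no, value in hashes:
        # Group hashes by original line; each hash as 8 lowercase hex digits.
        by_line.setdefault(line_no, []).append(f"{value:08x}")
    for line_no in sorted(by_line):
        out.append(f"{line_no}=" + ",".join(by_line[line_no]))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    source = b"int main() {\n  return 0;\n}\n"
    # Hypothetical window hashes; in practice they come from the window step.
    pairs = [(2, 0x1A2B3C4D), (2, 0x000000FF), (3, 0xDEADBEEF)]
    print(format_wfp("src/main.c", source, pairs), end="")
```

Grouping by line keeps the record compact when several window hashes land on the same source line, while still letting a reader trace each fingerprint back to its location.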

Blog: Open Source Fingerprinting, a study on WFP reliability

Because the ‘Gram’ and ‘Window’ values are so important to the WFP algorithm’s output and footprint, we took it upon ourselves to find the most accurate values!

Find our Open Source Fingerprinting algorithm on GitHub, and be sure to give it a try.

Looking for more information? Download our OSS whitepaper.