Compute hamming distance of byte arrays in Scala
Hamming distance is the number of positions in which two different strings are different.
So the distance between both strings would be:
- “abcde” and “abede” - 1
- “momomo” and “nonono” - 3
Implementing this would not be difficult:
def hammingDistance(s1: String, s2: String): Int = s1.zip(s2).count(c => c._1 != c._2)
Basically you zip ("abc".zip("bcd")
produces scala.collection.immutable.IndexedSeq[(Char, Char)] = Vector((a,b), (b,d), (c,e))
) strings together and compare corresponding characters. Simple, right?
However, the task I was working on required comparing individual bits, which are slightly more tricky to get to. I couldn’t find quickly any public snippets that do this, so I wrote my own:
To get the number of different bits of two bytes we only need to XOR them and compute how many bits are set:
For example: 0b10010101 XOR 0b01011101 = 0b11001000
.
To calculate the number of bits set we just shift the XORed byte i
number of times and AND 1
to get rid of other bits.
This method has a performance flaw as it shifts i times for each bit as opposed to shifting once per bit and relying on previous results, but oh well - it’s suitable when high performance is not necessary.
Subscribe via RSS