Track2 Character Encoding
Track2 is an American Bankers Association (ABA) format for storing information on the magnetic stripe on your credit card. Track2 is also included in EMV chip data under tag 57 (Track 2 Equivalent Data).
Example Track2 from a dead card
The semicolon is a header (the start sentinel). The digits that follow, up to 19 of them, are the card number. The equal sign is a field separator. The next 4 digits are the expiration date in YYMM format. The question mark is the trailer (the end sentinel). And so it goes.
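For the happy path, where you already have the track as an ASCII string, pulling those fields out takes very little code. Here’s a minimal sketch in C++ using std::regex. The struct and function names are mine, and the sample string in the usage note is what the encoded bytes shown later in this post decode to.

```cpp
#include <optional>
#include <regex>
#include <string>

// Illustrative container for the fields described above.
struct Track2Fields {
    std::string pan;     // card number, up to 19 digits
    std::string expiry;  // expiration date, YYMM
    std::string rest;    // remaining digits before the '?' trailer
};

// Returns the fields if `track` looks like ";<PAN>=<YYMM><rest>?".
// Anything after the '?' (such as a check character) is ignored.
std::optional<Track2Fields> parse_track2(const std::string& track) {
    static const std::regex pattern(R"(^;(\d{1,19})=(\d{4})(\d*)\?)");
    std::smatch m;
    if (!std::regex_search(track, m, pattern)) {
        return std::nullopt;
    }
    return Track2Fields{m[1].str(), m[2].str(), m[3].str()};
}
```

Calling parse_track2(";4088490010739415=24082260007000?3") yields a card number of 4088490010739415 and an expiration of 2408.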
Looks easy to pull apart with a regular expression or a simple parser, right?
Well, yes. But also no.
This is what you’ll actually receive from a card reader, if you’re lucky enough to receive it with the odd parity bits removed.
Track2 encoded bytes in decimal (hint: you can’t print this, much less regex it)
180 8 132 144 1 7 57 65 93 36 8 34 96 0 112 0 243
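Track2 doesn’t use plain ASCII. It has its own numeric character encoding (Track 3 shares it, while Track 1 uses a 6-bit alphanumeric one). If you get it with parity bits included, you’re looking at the ISO/IEC 7811 5-bit encoding: 4 data bits plus 1 odd parity bit per character.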
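Here’s a rough sketch of what checking and stripping an odd parity bit could look like in C++. I’m assuming each 5-bit character arrives in its own byte, with the data in bits 0 to 3 and the parity bit in bit 4. Your reader may pack things differently, and the function name is made up.

```cpp
#include <cstdint>
#include <optional>

// Assumed layout: bits 0-3 hold the data, bit 4 holds the parity bit.
// Odd parity means the five bits together contain an odd number of 1s.
// Returns the 4 data bits, or nothing if the symbol fails the check.
std::optional<std::uint8_t> strip_odd_parity(std::uint8_t symbol) {
    if (symbol > 0x1F) {
        return std::nullopt;  // more than 5 bits in use
    }
    int ones = 0;
    for (int bit = 0; bit < 5; ++bit) {
        ones += (symbol >> bit) & 1;
    }
    if (ones % 2 == 0) {
        return std::nullopt;  // even number of 1s: parity error
    }
    return static_cast<std::uint8_t>(symbol & 0x0F);
}
```

Under that layout the semicolon arrives as 0b01011 (three 1s in the data, so the parity bit is 0) and a zero arrives as 0b10000 (no 1s in the data, so the parity bit is 1).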
Validate and remove the parity bits and you get 4-bit binary coded decimal (BCD).
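Written out as a lookup table in C++, the sixteen 4-bit values and the characters they stand for look like this. The names are mine; the mapping is the standard Track2 character set.

```cpp
#include <array>

// Index = 4-bit value, element = the character it represents.
constexpr std::array<char, 16> kTrack2Chars = {
    '0', '1', '2', '3', '4', '5', '6', '7',  // 0b0000 .. 0b0111
    '8', '9',                                // 0b1000 .. 0b1001
    ':', ';', '<', '=', '>', '?'             // 0b1010 .. 0b1111
};

// Looks up the character for the low 4 bits of `nibble`.
constexpr char track2_char(unsigned nibble) {
    return kTrack2Chars[nibble & 0x0F];
}
```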
Anything look familiar here? Maybe a base16 alphabet? But that isn’t even the interesting part.
According to Wikipedia, the six punctuation characters : ; < = > ? were chosen because they fall into the range 0x30 through 0x3f alongside the numerals 0–9 in the ASCII table.
So you can add 0x30 to any of these 4-bit values and get the correct ASCII code.
Let’s try a few in a REPL just to be sure.
iex(12)> 0b0001 + 0x30
49
iex(14)> ?1 == 0b0001 + 0x30
true
iex(15)> 0b1110 + 0x30
62
iex(17)> ?> == 0b1110 + 0x30
true
Seems legit. Now let’s try it out with a full string and some sloppy C++.
First we will try encoding our example track from ASCII to 4-bit BCD by subtracting 0x30.
Just like hex string encoding, we’ve halved the size of our string. The semicolon was represented by an entire 8-bit byte in ASCII, but we cut it down to 4 bits. Then we took the 4 from the card number, cut it down to 4 bits as well, and shoved it into the same 8-bit byte alongside the semicolon. Rinse & repeat until every character has been converted.
I’ll try to break this down step by step for those unfamiliar with bitwise operations.
- The first time through the loop we get the semicolon, which has a binary value of 0b111011.
- We subtract 0x30 from the semicolon, which gives us 0b1011. You should recognize this from our 4-bit BCD chart. But since this is stored in an 8-bit integer, it’s actually 0b00001011.
- We left shift by 4 (<<) to make space for the next character and get 0b10110000.
- Now we’re on the second trip through the loop and we receive the 4, which is 0b110100. The accumulator still contains 0b10110000 because we haven’t reset it yet.
- Subtract 0x30 from 0b110100 to get 0b0100. Once again this should match 4 on our 4-bit BCD chart. But like before it’s stored in an 8-bit integer, so it’s actually 0b00000100.
- Use bitwise or (|) to “append” those 4 bits to the accumulator resulting in a byte that represents both the semicolon and the 4: 0b10110100.
- Keep looping until every character has been encoded.
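Put together, the steps above can be sketched like this in C++. This is my own minimal version, not a definitive implementation: the function name is made up, and I’m assuming the input has an even number of characters, all in the 0x30 to 0x3f range.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Packs an ASCII Track2 string into bytes, two characters per byte.
std::vector<std::uint8_t> encode_track2(const std::string& ascii) {
    if (ascii.size() % 2 != 0) {
        throw std::invalid_argument("expected an even number of characters");
    }
    std::vector<std::uint8_t> out;
    out.reserve(ascii.size() / 2);
    std::uint8_t acc = 0;  // the accumulator from the walkthrough
    for (std::size_t i = 0; i < ascii.size(); ++i) {
        const char c = ascii[i];
        if (c < 0x30 || c > 0x3F) {
            throw std::invalid_argument("not a Track2 character");
        }
        const std::uint8_t nibble = static_cast<std::uint8_t>(c - 0x30);
        if (i % 2 == 0) {
            // First character of the pair: move it into the left 4 bits.
            acc = static_cast<std::uint8_t>(nibble << 4);
        } else {
            // Second character: "append" it to the right 4 bits and emit.
            acc |= nibble;
            out.push_back(acc);
        }
    }
    return out;
}
```

Feeding it ";4" produces the single byte 0b10110100, which is 180 in decimal and matches the first encoded byte shown earlier.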
Now let’s try decoding our encoded string.
I’ll step through this one too.
- Take the left 4 bits, which happen to represent the semicolon, using a right shift (>>) by 4 to dump the right 4 bits. 0b10110100 becomes 0b00001011.
- Take the right 4 bits by masking out the left 4 bits with a bitwise and (&) against 0b00001111. 0b10110100 becomes 0b00000100.
- Add 0x30 to each one to get a valid ASCII code and append them to the output string.
- Rinse and repeat for every byte.
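Those decode steps can be sketched in C++ as follows. Again this is a minimal version with a made-up function name, and it assumes the parity bits are already gone and every byte holds two 4-bit characters.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Unpacks 4-bit BCD bytes into an ASCII Track2 string, two characters per byte.
std::string decode_track2(const std::vector<std::uint8_t>& packed) {
    std::string ascii;
    ascii.reserve(packed.size() * 2);
    for (const std::uint8_t byte : packed) {
        // Left 4 bits: shift right by 4, then add 0x30.
        ascii.push_back(static_cast<char>((byte >> 4) + 0x30));
        // Right 4 bits: mask off the left 4 bits, then add 0x30.
        ascii.push_back(static_cast<char>((byte & 0x0F) + 0x30));
    }
    return ascii;
}
```

Running it over the seventeen encoded bytes shown earlier gives back a printable string that starts with ;4 and has the = separator and ? trailer where you’d expect them.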
Incidentally this decode method is more elegant in Elixir.
for(<<c::4 <- encoded>>, do: c + 0x30) |> to_string()
One line. I’m not saying Elixir is better. It’s just neat.