A check digit is a form of redundancy check used for error detection on identification numbers, such as bank account numbers, which are used in an application where they will at least sometimes be input manually. It is analogous to a binary parity bit used to check for errors in computer-generated data. It consists of one or more digits (or letters) computed by an algorithm from the other digits (or letters) in the sequence input.
With a check digit, one can detect simple errors in the input of a series of characters (usually digits) such as a single mistyped digit or some permutations of two successive digits.
In choosing a system, a high probability of catching errors is traded off against implementation difficulty; simple check digit systems are easily understood and implemented by humans but do not catch as many errors as complex ones, which require sophisticated programs to implement.
A desirable feature is that left-padding with zeros should not change the check digit. This allows variable length numbers to be used and the length to be changed. If there is a single check digit added to the original number, the system will not always capture multiple errors, such as two replacement errors (12 → 34) though, typically, double errors will be caught 90% of the time (both changes would need to change the output by offsetting amounts).
A very simple check digit method would be to take the sum of all digits (digital sum) modulo 10. This would catch any single-digit error, as such an error would always change the sum, but does not catch any transposition errors (switching two digits) as re-ordering does not change the sum.
A slightly more complex method is to take the weighted sum of the digits, modulo 10, with different weights for each number position.
To illustrate this, for example if the weights for a four digit number were 5, 3, 2, 7 and the number to be coded was 4871, then one would take 5×4 + 3×8 + 2×7 + 7×1 = 65, i.e. 65 modulo 10, and the check digit would be 5, giving 48715.
Systems with weights of 1, 3, 7, or 9, with the weights on neighboring numbers being different, are widely used: for example, 31 31 weights in UPC codes, 13 13 weights in EAN numbers (GS1 algorithm), and the 371 371 371 weights used in United States bank routing transit numbers. This system detects all single-digit errors and around 90% of transposition errors. 1, 3, 7, and 9 are used because they are coprime with 10, so changing any digit changes the check digit; using a coefficient that is divisible by 2 or 5 would lose information (because 5×0 = 5×2 = 5×4 = 5×6 = 5×8 = 0 modulo 10) and thus not catch some single-digit errors. Using different weights on neighboring numbers means that most transpositions change the check digit; however, because all weights differ by an even number, this does not catch transpositions of two digits that differ by 5, (0 and 5, 1 and 6, 2 and 7, 3 and 8, 4 and 9), since the 2 and 5 multiply to yield 10.
The ISBN-10 code instead uses modulo 11, which is prime, and all the number positions have different weights 1, 2, ... 10. This system thus detects all single digit substitution and transposition errors (including jump transpositions), but at the cost of the check digit possibly being 10, represented by "X". (An alternative is simply to avoid using the serial numbers which result in an "X" check digit.) ISBN-13 instead uses the GS1 algorithm used in EAN numbers.
More complicated algorithms include the Luhn algorithm (1954), which captures 98% of single digit transposition errors (it does not detect 90 ↔ 09) and the still more sophisticated Verhoeff algorithm (1969), which catches all single digit substitution and transposition errors, and many (but not all) more complex errors. Similar is another abstract algebra-based method, the Damm algorithm (2004), that too detects all single-digit errors and all adjacent transposition errors. These three methods use a single check digit and will therefore fail to capture around 10% of more complex errors. To reduce this failure rate, it is necessary to use more than one check digit (for example, the modulo 97 check referred to below, which uses two check digits - for the algorithm, see International Bank Account Number) and/or to use a wider range of characters in the check digit, for example letters plus numbers.
For instance, the UPC-A barcode for a box of tissues is "036000241457". The last digit is the check digit "7", and if the other numbers are correct then the check digit calculation must produce 7.
Another example: to calculate the check digit for the following food item "01010101010x".
The final character of a ten-digit International Standard Book Number is a check digit computed so that multiplying each digit by its position in the number (counting from the right) and taking the sum of these products modulo 11 is 0. The digit the farthest to the right (which is multiplied by 1) is the check digit, chosen to make the sum correct. It may need to have the value 10, which is represented as the letter X. For example, take the ISBN 0-201-53082-1: The sum of products is 0×10 + 2×9 + 0×8 + 1×7 + 5×6 + 3×5 + 0×4 + 8×3 + 2×2 + 1×1 = 99 ≡ 0 (mod 11). So the ISBN is valid. Note that positions can also be counted from left, in which case the check digit is multiplied by 10, to check validity: 0×1 + 2×2 + 0×3 + 1×4 + 5×5 + 3×6 + 0×7 + 8×8 + 2×9 + 1×10 = 143 ≡ 0 (mod 11).
ISBN 13 (in use January 2007) is equal to the EAN-13 code found underneath a book's barcode. Its check digit is generated the same way as the UPC except that the even digits are multiplied by 3 instead of the odd digits.
EAN (European Article Number) check digits (administered by GS1) are calculated by summing each of the odd position numbers multiplied by 3 and then by adding the sum of the even position numbers. Numbers are examined going from right to left, so the first odd position is the last digit in the code. The final digit of the result is subtracted from 10 to calculate the check digit (or left as-is if already zero). A GS1 check digit calculator and detailed documentation is online at GS1's website. Another official calculator page shows that the mechanism for GTIN-13 is the same for Global Location Number/GLN.
The NOID Check Digit Algorithm (NCDA), in use since 2004, is designed for application in persistent identifiers and works with variable length strings of letters and digits, called extended digits. It is widely used with the ARK identifier scheme and somewhat used with schemes, such as the Handle System and DOI. An extended digit is constrained to betanumeric characters, which are alphanumerics minus vowels and the letter 'l' (ell). This restriction helps when generating opaque strings that are unlikely to form words by accident and will not contain both O and 0, or l and 1. Having a prime radix of R=29, the betanumeric repertoire permits the algorithm to guarantee detection of single-character and transposition errors for strings less than R=29 characters in length (beyond which it provides a slightly weaker check). The algorithm generalizes to any character repertoire with a prime radix R and strings less than R characters in length.
Notable algorithms include: