Big Numbers

The first library I'll make is something to handle big numbers. This library won't have any immediate consumers (it is the first library, after all), but it's a nicely contained bit of functionality that I know I'll need later. Common encryption and authentication mechanisms are built atop RSA cryptography, Diffie-Hellman key exchange, and Elliptic curve cryptography, all of which use numbers much larger than CPU registers. I know I'll want to perform key exchanges in the near future, so I'll start by satisfying the requirements for Diffie-Hellman.

Digits

Math is commonly performed by loading numbers into CPU/FPU registers and then invoking the desired operation (ADD, DIV, FCOS, FSQRT, etc.), leaving the result in a register. In this case, the size of the number is limited by the size of the register. On modern machines, this is usually in the range of 32-128 bits. In RSA/DH/ECC, the keys are often in the range of 1024-4096 bits. To perform math on these values, the software must use many small (32-128 bit) operations to carry out the larger (1024-4096 bit) operation in steps.

Thankfully, this problem is quite easy to solve. Consider decimal notation: even though it only contains the symbols 0-9, you can denote a number of any size simply by using multiple digits. You can also perform math on these multi-digit numbers to produce multi digit results. For example, addition is performed by adding the rightmost digits of each input number and writing the result as the rightmost digit of the answer. This is repeated for each column to the left, carrying over a 1 if the previous column's sum was greater than the largest possible digit.

This technique is straightforward to implement in software with a simple loop over each “column” of digits. It could be implemented using decimal digits, but the machines I'll be using have at least 32-bit registers, so I can efficiently deal with “digits” of 32 bits at a time (i.e. each digit ranges from 0-2,147,483,647 instead of 0-9). Thus, my BigInteger type has a single field: digits: Vec<u32>.

Operations

Diffie-hellman actually needs just a single operation: modular exponentiation (xy % m, where all three of x, y, and m are large integers). The naive way to evaluate this is to perform xy, then reduce the result % m. Even though the BigInteger type handles integers of any size, there are still memory and time constraints. Typical key exchanges would have values for x and y in the range of 2048 and 1024 bits, respectively. The result of xy (i.e. (22048)21024) would produce a result occupying ~3.5×10311 bits. There's no way to work with (or even store) a number that large.

Thankfully, there are well-known techniques for determining the final value (after the % m) without ever actually computing xy. I'll be using Montgomery modular multiplication, as explanations and pseudocode are readily available. Unfortunately, implementing this algorithm requires many more mathematical operators, so I have to implement all of the common operations anyway: addition, subtraction, multiplication, division, and bi-directional shifting. All of these are simple implementations of the normal pen-and-paper techniques, except for division. For division I use Knuth's “Algorithm D”.

Behavior

Because large integers have variable number of digits, they typically store their content on the heap. For example, performing let difference = something - something_else reads two heap-allocated vectors of digits (something and something_else) and returns a newly-allocated vector of digits containing the difference. It would be advantageous to re-use one of the existing buffers if something or something_else are no longer needed, but doing this in a general-purpose way is quite cumbersome. The API would be messier, as users need to specify which of the inputs can have their buffers reused. Additionally, the implementation would be more complex, as the library needs to consider which of the input buffers are marked as reusable, which of those are large enough to hold the result, and when during the computation each input is no longer needed and can be overwritten.

Thankfully, I don't need any of that flexibility and can tailor each mathematical operation to the type of buffer reuse I need to perform modular exponentiation. Specifically, almost every step of the computation uses an intermediate value only once; nearly all mathematical operations can be written to reuse a specific input to store their result. Out of curiosity, I wanted to try a real-world example to see how the simple implementation compared to a general-purpose one.

To negotiate an SSH session key, both parties select a secret value and then “mix” it with a public value received from their peer. The math works out in such a way that both parties end up with the same result — a shared secret — but an eavesdropper who only sees the publicly shared parts cannot infer the secret.

Mathematical details aside, the important part is that negotiating a shared secret requires two modular exponentiations: one to compute the public value to send, and one to mix in the received public value and produce the shared secret. The following code does this on some real values captured from an SSH negotiation, once using the Rust num-bigint crate and once using the newly-created number library:

fn main() {
    println!(
        "num_bigint   correct: {}, allocations: {}",
        run::<num_bigint::BigUint>(),
        unsafe { num_bigint::INSTANTIATION_COUNT },
    );

    println!(
        "number       correct: {}, allocations: {}",
        run::<number::BigInteger>(),
        unsafe { number::INSTANTIATION_COUNT },
    );
}

fn run<T>() -> bool
where
    TFromBytesBe + ModPow + ToBytesBe,
{
    let base = T::from_bytes_be(&[2]);
    let modulus = T::from_bytes_be(&[
        255255255255255255255255201152181623310419452
        1961989813912822028209412788138103204116211
        1901665919155348174812114252422123914925242
        179205586727484310109952055792255310910981
        194692281331811189894126198244766623316655237
        107112559218224461832372385610725190137159165
        1741593617124753123073401028123622891611940
        124184161991915152218725428852111541052263
        16825336207951311019335220163173150289824386
        321338218715821341711215015010910312537874
        1881524241116108820224331245014494704654
        206592271581194424141343155391311622367162
        14318119793240111768220122243203246149882324
        571497312423414910622921210382415225051621
        11414290138172170104255255255255255255255255
    ]);

    let local_secret = T::from_bytes_be(&[
        11423215621240179145218184161187772321152189
        19310415011617263163114221032414572149232196
        14210123175489172152151022236316282145111
        931482284113822885106122010783904618439
        17213115916168172141482102743423325381118
        78197149246238820619272137249392621423253
        1772201972252091005771243107521871642422049
        10213213217923219225580221141631742785167
    ]);
    let local_contribution = base.modpow(&local_secret, &modulus).to_bytes_be();

    // send local contribution to remote
    // receive remote contribution from remote

    let remote_contribution = T::from_bytes_be(&[
        02402161586170184144193423415817154226117
        528010511720736441022231922024880252152
        162215721811725520517114618013217523615872218
        12112521861124154921954623520123817909758
        7418472113232042203164175302181683923149
        8130219341121802148521221181164267110114
        2324014222422061351161201271958214182191
        12417211525216022114414321549179523318101
        241706413817321922017425915324612218976123
        131120472066989879219518610615610020997109
        58212216602001671278017742211782141199185
        145189162120203025463631862021005187132111
        1264313538744033212924910641141629
        133341182061018257648528622614661952
        4175249102501791062081141531162312113287206
        15200722151527217675208196197241170204113233
    ]);
    let shared_secret = remote_contribution
        .modpow(&local_secret, &modulus)
        .to_bytes_be();

    // ensure both libraries produce correct output
    local_contribution[0] == 161
        && local_contribution[local_contribution.len() - 1] == 111
        && shared_secret[0] == 51 && shared_secret[shared_secret.len() - 1] == 114
}

Both libraries were modified to increment a counter each time a new variable-length integer is instantiated. Running the program above displays the number of allocations from the Rust num-bigint library compared to the new number library:

			⌨780 c ⌨74 a ⌨126 r ⌨145 g ⌨115 o ⌨114   ⌨150 r ⌨95 u ⌨118 n ⌨98   ⌨152 - ⌨130 - ⌨126 r ⌨175 e ⌨80 l ⌨98 e ⌨129 a ⌨94 s ⌨52 e ⌨193 ⏎
			␤ ⎙172   ⎙0   Compiling num-bigint v0.2.1 (C:\compare\num-bigint)⏎
			␤ ⎙1418   ⎙0    Finished release [optimized] target(s) in 1.54s⏎
			␤ ⎙6   ⎙0     Running `target\release\num-bigint.exe`⏎
			␤⏎
			␤ ⎙66 num_bigint   correct: true, allocations: 17379⏎
			␤ ⎙30 number       correct: true, allocations: 26⏎
			␤ ␃
		

As you can see, even though only two operations are invoked (two modular exponentiations), the Montgomery reduction algorithm performs many big-integer operations internally to avoid computing xy. This leaves plenty of room for buffer reuse, without resorting to a cumbersome API or hidden buffer pool that would be required to reuse buffers in a general-purpose library.

Despite the significant difference in allocations, there's not a noticeable difference in execution time when running this isolated sample. Even if a performance difference is present, the key exchange is usually a small part of the session; its primary purpose is to negotiate a key to be used with a less expensive cipher. Overall, this is probably more of a feel-good victory than any big performance gain.

The real benefit of the new number library is in simplicity, making it easy to modify and fast to build; the following comparison shows the number of lines in the big-int-based solution compared to this new number library:

			⌨591 c ⌨123 a ⌨90 r ⌨108 g ⌨78 o ⌨171   ⌨115 i ⌨156 n ⌨58 i ⌨108 t ⌨148 ⏎
			␤ ⎙53 Created binary (application) package ⏎
			␤ ⌨690 p ⌨123 r ⌨109 i ⌨118 n ⌨77 t ⌨236 f ⌨219   ⌨270 " ⌨497 n ⌨72 u ⌨156 m ⌨169 - ⌨95 b ⌨97 i ⌨109 g ⌨92 i ⌨106 n ⌨99 t ⌨431   ⌨60 = ⌨129   ⌨258 \ ⌨370 " ⌨288 0 ⌨180 . ⌨84 2 ⌨106 \ ⌨316 " ⌨180 " ⌨220   ⌨165 > ⌨152 > ⌨193   ⌨189 C ⌨77 a ⌨111 r ⌨74 g ⌨69 o ⌨199 . ⌨108 t ⌨73 o ⌨191 m ⌨178 l ⌨159 ⏎
			␤ ⎙10 ⏎
			␤ ⌨744 c ⌨108 a ⌨104 r ⌨114 g ⌨97 o ⌨153   ⌨134 v ⌨134 e ⌨70 n ⌨113 d ⌨77 o ⌨108 r ⌨124   ⌨129 . ⌨100 / ⌨142 v ⌨141 e ⌨73 n ⌨139 d ⌨62 o ⌨116 r ⌨240 ⏎
			␤ ⎙59   ⎙0    Updating crates.io index⏎
			␤ ⎙583   ⎙0   Vendoring autocfg v1.0.0 (~\.cargo\registry\src\github.com-⏎
			␤       1ecc6299db9ec823\autocfg-1.0.0) to ./vendor\autocfg⏎
			␤ ⎙56   ⎙0   Vendoring num-bigint v0.2.6 (~\.cargo\registry\src\github.com-⏎
			␤       1ecc6299db9ec823\num-bigint-0.2.6) to ./vendor\num-bigint⏎
			␤ ⎙114   ⎙0   Vendoring num-integer v0.1.42 (~\.cargo\registry\src\github.com-⏎
			␤        1ecc6299db9ec823\num-integer-0.1.42) to ./vendor\num-integer⏎
			␤ ⎙37   ⎙0   Vendoring num-traits v0.2.11 (~\.cargo\registry\src\github.com-⏎
			␤        1ecc6299db9ec823\num-traits-0.2.11) to ./vendor\num-traits⏎
			␤ ⎙108 To use vendored sources, add this to your .cargo/config for this project:⏎
			␤⏎
			␤ ⎙0 [source.crates-io]⏎
			␤replace-with = "vendored-sources"⏎
			␤⏎
			␤[source.vendored-sources]⏎
			␤directory = "./vendor"⏎
			␤ ⌨1974 f ⌨67 i ⌨159 n ⌨106 d ⌨376   ⌨135 . ⌨110 / ⌨95 v ⌨132 e ⌨86 n ⌨99 d ⌨83 o ⌨98 r ⌨838   ⌨209 - ⌨230 t ⌨155 y ⌨82 p ⌨120 e ⌨120   ⌨339 f ⌨425   ⌨394 - ⌨100 e ⌨196 x ⌨146 e ⌨155 c ⌨104   ⌨117 w ⌨230 c ⌨516   ⌨94 - ⌨129 l ⌨183   ⌨354 { ⌨62 } ⌨269   ⌨216 + ⌨359   ⌨244 | ⌨189   ⌨102 g ⌨89 r ⌨77 e ⌨52 p ⌨159   ⌨218 t ⌨117 o ⌨72 t ⌨169 a ⌨98 l ⌨630 ⏎
			␤ ⎙14  25554 total⏎
			␤ ⌨669 f ⌨75 i ⌨143 n ⌨114 d ⌨139   ⌨486 ~ ⌨267 / ⌨354 p ⌨105 r ⌨86 o ⌨179 j ⌨65 e ⌨208 c ⌨85 t ⌨244 s ⌨158 / ⌨298 n ⌨69 u ⌨189 m ⌨84 b ⌨79 e ⌨56 r ⌨585   ⌨101 - ⌨104 t ⌨181 y ⌨49 p ⌨132 e ⌨145   ⌨106 f ⌨253   ⌨98 - ⌨89 e ⌨188 x ⌨163 e ⌨166 c ⌨108   ⌨139 w ⌨175 c ⌨443   ⌨184 - ⌨142 l ⌨187   ⌨354 { ⌨97 } ⌨562   ⌨214 + ⌨195   ⌨244 | ⌨112   ⌨106 g ⌨78 r ⌨96 e ⌨43 p ⌨96   ⌨138 t ⌨113 o ⌨68 t ⌨209 a ⌨82 l ⌨191 ⏎
			␤ ⎙15  316 total⏎
			␤ ␃		
< Find more posts at mattblagden.com