Big Numbers

The first library I'll make is something to handle big numbers. This library won't have any immediate consumers (it is the first library, after all), but it's a nicely contained bit of functionality that I know I'll need later. Common encryption and authentication mechanisms are built atop RSA cryptography, Diffie-Hellman key exchange, and Elliptic curve cryptography, all of which use numbers much larger than CPU registers. I know I'll want to perform key exchanges in the near future, so I'll start by satisfying the requirements for Diffie-Hellman.

Digits

Math is commonly performed by loading numbers into CPU/FPU registers and then invoking the desired operation (ADD, DIV, FCOS, FSQRT, etc.), leaving the result in a register. In this case, the size of the number is limited by the size of the register. On modern machines, this is usually in the range of 32-128 bits. In RSA/DH/ECC, the keys are often in the range of 1024-4096 bits. To perform math on these values, the software must use many small (32-128 bit) operations to carry out the larger (1024-4096 bit) operation in steps.

Thankfully, this problem is quite easy to solve. Consider decimal notation: even though it only contains the symbols 0-9, you can denote a number of any size simply by using multiple digits. You can also perform math on these multi-digit numbers to produce multi digit results. For example, addition is performed by adding the rightmost digits of each input number and writing the result as the rightmost digit of the answer. This is repeated for each column to the left, carrying over a 1 if the previous column's sum was greater than the largest possible digit.

This technique is straightforward to implement in software with a simple loop over each “column” of digits. It could be implemented using decimal digits, but the machines I'll be using have at least 32-bit registers, so I can efficiently deal with “digits” of 32 bits at a time (i.e. each digit ranges from 0-2,147,483,647 instead of 0-9). Thus, my BigInteger type has a single field: digits: Vec<u32>.

Operations

Diffie-hellman actually needs just a single operation: modular exponentiation (xy % m, where all three of x, y, and m are large integers). The naive way to evaluate this is to perform xy, then reduce the result % m. Even though the BigInteger type handles integers of any size, there are still memory and time constraints. Typical key exchanges would have values for x and y in the range of 2048 and 1024 bits, respectively. The result of xy (i.e. (22048)21024) would produce a result occupying ~3.5×10311 bits. There's no way to work with (or even store) a number that large.

Thankfully, there are well-known techniques for determining the final value (after the % m) without ever actually computing xy. I'll be using Montgomery modular multiplication, as explanations and pseudocode are readily available. Unfortunately, implementing this algorithm requires many more mathematical operators, so I have to implement all of the common operations anyway: addition, subtraction, multiplication, division, and bi-directional shifting. All of these are simple implementations of the normal pen-and-paper techniques, except for division. For division I use Knuth's “Algorithm D”.

Behavior

Because large integers have variable number of digits, they typically store their content on the heap. For example, performing let difference = something - something_else reads two heap-allocated vectors of digits (something and something_else) and returns a newly-allocated vector of digits containing the difference. It would be advantageous to re-use one of the existing buffers if something or something_else are no longer needed, but doing this in a general-purpose way is quite cumbersome. The API would be messier, as users need to specify which of the inputs can have their buffers reused. Additionally, the implementation would be more complex, as the library needs to consider which of the input buffers are marked as reusable, which of those are large enough to hold the result, and when during the computation each input is no longer needed and can be overwritten.

Thankfully, I don't need any of that flexibility and can tailor each mathematical operation to the type of buffer reuse I need to perform modular exponentiation. Specifically, almost every step of the computation uses an intermediate value only once; nearly all mathematical operations can be written to reuse a specific input to store their result. Out of curiosity, I wanted to try a real-world example to see how the simple implementation compared to a general-purpose one.

To negotiate an SSH session key, both parties select a secret value and then “mix” it with a public value received from their peer. The math works out in such a way that both parties end up with the same result — a shared secret — but an eavesdropper who only sees the publicly shared parts cannot infer the secret.

Mathematical details aside, the important part is that negotiating a shared secret requires two modular exponentiations: one to compute the public value to send, and one to mix in the received public value and produce the shared secret. The following code does this on some real values captured from an SSH negotiation, once using the Rust num-bigint crate and once using the newly-created number library:

fn main() {
    println!(
        "num_bigint   correct: {}, allocations: {}",
        run::<num_bigint::BigUint>(),
        unsafe { num_bigint::INSTANTIATION_COUNT },
    );

    println!(
        "number       correct: {}, allocations: {}",
        run::<number::BigInteger>(),
        unsafe { number::INSTANTIATION_COUNT },
    );
}

fn run<T>() -> bool
where
    TFromBytesBe + ModPow + ToBytesBe,
{
    let base = T::from_bytes_be(&[2]);
    let modulus = T::from_bytes_be(&[
        25525525525525525525525520115218162331041945219619898,
        13912822028209412788138103204116211190166591915534,
        8174812114252422123914925179205586727484310109242,
        952055792255310910981194692281331811189894126198244,
        766623316655237107112559218224461832372385610725190,
        137159165174159361712475312307340102812362289161194,
        0124184161991915152218725428852111541052263168253,
        362079513110193352201631731502898243863213382187158,
        2134171121501501091031253787418815242411161088202,
        2433124501449470465420659227158119442414134315539,
        131162236716214318119793240111768220122243203246149,
        882324571497312423414910622921210382415225051621,
        11414290138172170104255255255255255255255255,
    ]);

    let local_secret = T::from_bytes_be(&[
        11423215621240179145218184161187772321152189193104150,
        1161726316311422103241457214923219614210123175489172,
        1521510222363162821451119314822841138228851061220107,
        8390461843917213115916168172141482102743423325381,
        11878197149246238820619272137249392621423253177220,
        1972252091005771243107521871642422049102132132179232,
        19225580221141631742785167,
    ]);
    let local_contribution = base.modpow(&local_secret, &modulus).to_bytes_be();

    // send local contribution to remote
    // receive remote contribution from remote

    let remote_contribution = T::from_bytes_be(&[
        024021615861701841441934234158171542261175280105,
        117207364410222319220248802521521622157218117255205,
        17114618013217523615872218121125218611241549219546235,
        2012381790975874184721132320422031641753021816839,
        23149813021934112180214852122118116426711011423240,
        14222422061351161201271958214182191124172115252160221,
        144143215491795233181012417064138173219220174259,
        153246122189761231311204720669898792195186106156100,
        2099710958212216602001671278017742211782141199185145,
        189162120203025463631862021005187132111126431353874,
        40332129249106411416291333411820610182576485286,
        226146619524175249102501791062081141531162312113287,
        20615200722151527217675208196197241170204113233,
    ]);
    let shared_secret = remote_contribution
        .modpow(&local_secret, &modulus)
        .to_bytes_be();

    // ensure both libraries produce correct output
    (local_contribution[0] == 161 && local_contribution[local_contribution.len() - 1] == 111)
        && (shared_secret[0] == 51 && shared_secret[shared_secret.len() - 1] == 114)
}

Both libraries were modified to increment a counter each time a new variable-length integer is instantiated. Running the program above displays the number of allocations from the Rust num-bigint library compared to the new number library:

			⌨780 c ⌨74 a ⌨126 r ⌨145 g ⌨115 o ⌨114   ⌨150 r ⌨95 u ⌨118 n ⌨98   ⌨152 - ⌨130 - ⌨126 r ⌨175 e ⌨80 l ⌨98 e ⌨129 a ⌨94 s ⌨52 e ⌨193 ⏎
			␤ ⎙172   ⎙0   Compiling num-bigint v0.2.1 (C:\compare\num-bigint)⏎
			␤ ⎙1418   ⎙0    Finished release [optimized] target(s) in 1.54s⏎
			␤ ⎙6   ⎙0     Running `target\release\num-bigint.exe`⏎
			␤⏎
			␤ ⎙66 num_bigint   correct: true, allocations: 17379⏎
			␤ ⎙30 number       correct: true, allocations: 26⏎
			␤ ␃
		

As you can see, even though only two operations are invoked (two modular exponentiations), the Montgomery reduction algorithm performs many big-integer operations internally to avoid computing xy. This leaves plenty of room for buffer reuse, without resorting to a cumbersome API or hidden buffer pool that would be required to reuse buffers in a general-purpose library.

Despite the significant difference in allocations, there's not a noticeable difference in execution time when running this isolated sample. Even if a performance difference is present, the key exchange is usually a small part of the session; its primary purpose is to negotiate a key to be used with a less expensive cipher. Overall, this is probably more of a feel-good victory than any big performance gain. The real benefit of the new number library is in simplicity, making it easy to modify and fast to build; the following comparison shows the number of lines in each library with their dependencies:

			⌨591 c ⌨123 a ⌨90 r ⌨108 g ⌨78 o ⌨171   ⌨115 i ⌨156 n ⌨58 i ⌨108 t ⌨148 ⏎
			␤ ⎙53 Created binary (application) package ⏎
			␤ ⌨690 p ⌨123 r ⌨109 i ⌨118 n ⌨77 t ⌨236 f ⌨219   ⌨270 " ⌨497 n ⌨72 u ⌨156 m ⌨169 - ⌨95 b ⌨97 i ⌨109 g ⌨92 i ⌨106 n ⌨99 t ⌨431   ⌨60 = ⌨129   ⌨258 \ ⌨370 " ⌨288 0 ⌨180 . ⌨84 2 ⌨106 \ ⌨316 " ⌨180 " ⌨220   ⌨165 > ⌨152 > ⌨193   ⌨189 C ⌨77 a ⌨111 r ⌨74 g ⌨69 o ⌨199 . ⌨108 t ⌨73 o ⌨191 m ⌨178 l ⌨159 ⏎
			␤ ⎙10 ⏎
			␤ ⌨744 c ⌨108 a ⌨104 r ⌨114 g ⌨97 o ⌨153   ⌨134 v ⌨134 e ⌨70 n ⌨113 d ⌨77 o ⌨108 r ⌨124   ⌨129 . ⌨100 / ⌨142 v ⌨141 e ⌨73 n ⌨139 d ⌨62 o ⌨116 r ⌨240 ⏎
			␤ ⎙59   ⎙0    Updating crates.io index⏎
			␤ ⎙583   ⎙0   Vendoring autocfg v1.0.0 (~\.cargo\registry\src\github.com-⏎
			␤       1ecc6299db9ec823\autocfg-1.0.0) to ./vendor\autocfg⏎
			␤ ⎙56   ⎙0   Vendoring num-bigint v0.2.6 (~\.cargo\registry\src\github.com-⏎
			␤       1ecc6299db9ec823\num-bigint-0.2.6) to ./vendor\num-bigint⏎
			␤ ⎙114   ⎙0   Vendoring num-integer v0.1.42 (~\.cargo\registry\src\github.com-⏎
			␤        1ecc6299db9ec823\num-integer-0.1.42) to ./vendor\num-integer⏎
			␤ ⎙37   ⎙0   Vendoring num-traits v0.2.11 (~\.cargo\registry\src\github.com-⏎
			␤        1ecc6299db9ec823\num-traits-0.2.11) to ./vendor\num-traits⏎
			␤ ⎙108 To use vendored sources, add this to your .cargo/config for this project:⏎
			␤⏎
			␤ ⎙0 [source.crates-io]⏎
			␤replace-with = "vendored-sources"⏎
			␤⏎
			␤[source.vendored-sources]⏎
			␤directory = "./vendor"⏎
			␤ ⌨1974 f ⌨67 i ⌨159 n ⌨106 d ⌨376   ⌨135 . ⌨110 / ⌨95 v ⌨132 e ⌨86 n ⌨99 d ⌨83 o ⌨98 r ⌨838   ⌨209 - ⌨230 t ⌨155 y ⌨82 p ⌨120 e ⌨120   ⌨339 f ⌨425   ⌨394 - ⌨100 e ⌨196 x ⌨146 e ⌨155 c ⌨104   ⌨117 w ⌨230 c ⌨516   ⌨94 - ⌨129 l ⌨183   ⌨354 { ⌨62 } ⌨269   ⌨216 + ⌨359   ⌨244 | ⌨189   ⌨102 g ⌨89 r ⌨77 e ⌨52 p ⌨159   ⌨218 t ⌨117 o ⌨72 t ⌨169 a ⌨98 l ⌨630 ⏎
			␤ ⎙14  25554 total⏎
			␤ ⌨669 f ⌨75 i ⌨143 n ⌨114 d ⌨139   ⌨486 ~ ⌨267 / ⌨354 p ⌨105 r ⌨86 o ⌨179 j ⌨65 e ⌨208 c ⌨85 t ⌨244 s ⌨158 / ⌨298 n ⌨69 u ⌨189 m ⌨84 b ⌨79 e ⌨56 r ⌨585   ⌨101 - ⌨104 t ⌨181 y ⌨49 p ⌨132 e ⌨145   ⌨106 f ⌨253   ⌨98 - ⌨89 e ⌨188 x ⌨163 e ⌨166 c ⌨108   ⌨139 w ⌨175 c ⌨443   ⌨184 - ⌨142 l ⌨187   ⌨354 { ⌨97 } ⌨562   ⌨214 + ⌨195   ⌨244 | ⌨112   ⌨106 g ⌨78 r ⌨96 e ⌨43 p ⌨96   ⌨138 t ⌨113 o ⌨68 t ⌨209 a ⌨82 l ⌨191 ⏎
			␤ ⎙15  316 total⏎
			␤ ␃		
< Find more posts at mattblagden.com