The first library I'll make is something to handle big numbers. This library won't have any immediate consumers (it is the first library, after all), but it's a nicely contained bit of functionality that I know I'll need later. Common encryption and authentication mechanisms are built atop RSA cryptography, Diffie-Hellman key exchange, and Elliptic curve cryptography, all of which use numbers much larger than CPU registers. I know I'll want to perform key exchanges in the near future, so I'll start by satisfying the requirements for Diffie-Hellman.
Math is commonly performed by loading numbers into CPU/FPU registers and then invoking the desired operation (ADD, DIV, FCOS, FSQRT, etc.), leaving the result in a register. In this case, the size of the number is limited by the size of the register. On modern machines, this is usually in the range of 32-128 bits. In RSA/DH/ECC, the keys are often in the range of 1024-4096 bits. To perform math on these values, the software must use many small (32-128 bit) operations to carry out the larger (1024-4096 bit) operation in steps.
Thankfully, this problem is quite easy to solve. Consider decimal notation: even though it only contains the symbols 0-9, you can denote a number of any size simply by using multiple digits. You can also perform math on these multi-digit numbers to produce multi digit results. For example, addition is performed by adding the rightmost digits of each input number and writing the result as the rightmost digit of the answer. This is repeated for each column to the left, carrying over a 1 if the previous column's sum was greater than the largest possible digit.
This technique is straightforward to implement in software with a simple loop over each “column”
of digits. It could be implemented using decimal digits, but the machines I'll be using have at least 32-bit
registers, so I can efficiently deal with “digits” of 32 bits at a time (i.e. each digit ranges
from 0-2,147,483,647 instead of 0-9). Thus, my BigInteger
type has a single field:
digits: Vec<u32>
.
Diffie-hellman actually needs just a single operation: modular exponentiation
(xy % m
,
where
all three of x
, y
, and m
are large integers). The naive way to
evaluate this is to perform xy
, then reduce the result % m
. Even
though the BigInteger
type handles integers of any size, there are still memory and time
constraints. Typical key exchanges would have values for x
and y
in the range
of 2048 and 1024 bits, respectively. The result of xy
(i.e.
(22048)21024
) would produce a result occupying
~3.5×10311 bits. There's no way to work with (or even store) a number that large.
Thankfully, there are well-known techniques for determining the final value (after the % m
)
without ever actually computing xy
. I'll be using Montgomery modular
multiplication, as explanations and pseudocode are readily available. Unfortunately, implementing
this algorithm requires many more mathematical operators, so I have to implement all of the common
operations anyway: addition, subtraction, multiplication, division, and bi-directional shifting. All of
these are simple implementations of the normal pen-and-paper techniques, except for division. For division I
use Knuth's “Algorithm D”.
Because large integers have variable number of digits, they typically store their content on the heap. For
example, performing let difference = something - something_else
reads two heap-allocated
vectors of digits (something
and something_else
) and returns a newly-allocated
vector of digits containing the difference. It would be advantageous to re-use one of the existing buffers
if something
or something_else
are no longer needed, but doing this in a
general-purpose way is quite cumbersome. The API would be messier, as users need to specify which
of the inputs can have their buffers reused. Additionally, the implementation would be more complex,
as the library needs to consider which of the input buffers are marked as reusable, which of those are
large enough to hold the result, and when during the computation each input is no longer needed and
can be overwritten.
Thankfully, I don't need any of that flexibility and can tailor each mathematical operation to the type of buffer reuse I need to perform modular exponentiation. Specifically, almost every step of the computation uses an intermediate value only once; nearly all mathematical operations can be written to reuse a specific input to store their result. Out of curiosity, I wanted to try a real-world example to see how the simple implementation compared to a general-purpose one.
To negotiate an SSH session key, both parties select a secret value and then “mix” it with a public value received from their peer. The math works out in such a way that both parties end up with the same result — a shared secret — but an eavesdropper who only sees the publicly shared parts cannot infer the secret.
Mathematical details aside, the important part is that negotiating a shared secret requires
two modular exponentiations: one to compute the public value to send, and one to mix in the
received public value and produce the shared secret. The following code does this on
some real values captured from an SSH negotiation, once using the Rust num-bigint
crate and once using the newly-created number
library:
Both libraries were modified to increment a counter each time a new variable-length integer is instantiated.
Running the program above displays the number of allocations from the Rust num-bigint
library
compared to the new number
library:
⌨780 c ⌨74 a ⌨126 r ⌨145 g ⌨115 o ⌨114 ⌨150 r ⌨95 u ⌨118 n ⌨98 ⌨152 - ⌨130 - ⌨126 r ⌨175 e ⌨80 l ⌨98 e ⌨129 a ⌨94 s ⌨52 e ⌨193 ⏎  ⎙172 ⎙0 Compiling num-bigint v0.2.1 (C:\compare\num-bigint)⏎  ⎙1418 ⎙0 Finished release [optimized] target(s) in 1.54s⏎  ⎙6 ⎙0 Running `target\release\num-bigint.exe`⏎ ⏎  ⎙66 num_bigint correct: true, allocations: 17379⏎  ⎙30 number correct: true, allocations: 26⏎  ␃
As you can see, even though only two operations are invoked (two modular exponentiations), the
Montgomery reduction algorithm performs many big-integer operations internally to avoid computing
xy
. This leaves plenty of room for buffer reuse, without resorting to
a cumbersome API or hidden buffer pool that would be required to reuse buffers in a general-purpose
library.
Despite the significant difference in allocations, there's not a noticeable difference in execution time when running this isolated sample. Even if a performance difference is present, the key exchange is usually a small part of the session; its primary purpose is to negotiate a key to be used with a less expensive cipher. Overall, this is probably more of a feel-good victory than any big performance gain.
The real benefit of the new number
library is in simplicity, making it easy to modify and
fast to build; the following comparison shows the number of lines in the big-int
-based solution
compared to this new number
library:
⌨591 c ⌨123 a ⌨90 r ⌨108 g ⌨78 o ⌨171 ⌨115 i ⌨156 n ⌨58 i ⌨108 t ⌨148 ⏎  ⎙53 Created binary (application) package ⏎  ⌨690 p ⌨123 r ⌨109 i ⌨118 n ⌨77 t ⌨236 f ⌨219 ⌨270 " ⌨497 n ⌨72 u ⌨156 m ⌨169 - ⌨95 b ⌨97 i ⌨109 g ⌨92 i ⌨106 n ⌨99 t ⌨431 ⌨60 = ⌨129 ⌨258 \ ⌨370 " ⌨288 0 ⌨180 . ⌨84 2 ⌨106 \ ⌨316 " ⌨180 " ⌨220 ⌨165 > ⌨152 > ⌨193 ⌨189 C ⌨77 a ⌨111 r ⌨74 g ⌨69 o ⌨199 . ⌨108 t ⌨73 o ⌨191 m ⌨178 l ⌨159 ⏎  ⎙10 ⏎  ⌨744 c ⌨108 a ⌨104 r ⌨114 g ⌨97 o ⌨153 ⌨134 v ⌨134 e ⌨70 n ⌨113 d ⌨77 o ⌨108 r ⌨124 ⌨129 . ⌨100 / ⌨142 v ⌨141 e ⌨73 n ⌨139 d ⌨62 o ⌨116 r ⌨240 ⏎  ⎙59 ⎙0 Updating crates.io index⏎  ⎙583 ⎙0 Vendoring autocfg v1.0.0 (~\.cargo\registry\src\github.com-⏎  1ecc6299db9ec823\autocfg-1.0.0) to ./vendor\autocfg⏎  ⎙56 ⎙0 Vendoring num-bigint v0.2.6 (~\.cargo\registry\src\github.com-⏎  1ecc6299db9ec823\num-bigint-0.2.6) to ./vendor\num-bigint⏎  ⎙114 ⎙0 Vendoring num-integer v0.1.42 (~\.cargo\registry\src\github.com-⏎  1ecc6299db9ec823\num-integer-0.1.42) to ./vendor\num-integer⏎  ⎙37 ⎙0 Vendoring num-traits v0.2.11 (~\.cargo\registry\src\github.com-⏎  1ecc6299db9ec823\num-traits-0.2.11) to ./vendor\num-traits⏎  ⎙108 To use vendored sources, add this to your .cargo/config for this project:⏎ ⏎  ⎙0 [source.crates-io]⏎ replace-with = "vendored-sources"⏎ ⏎ [source.vendored-sources]⏎ directory = "./vendor"⏎  ⌨1974 f ⌨67 i ⌨159 n ⌨106 d ⌨376 ⌨135 . ⌨110 / ⌨95 v ⌨132 e ⌨86 n ⌨99 d ⌨83 o ⌨98 r ⌨838 ⌨209 - ⌨230 t ⌨155 y ⌨82 p ⌨120 e ⌨120 ⌨339 f ⌨425 ⌨394 - ⌨100 e ⌨196 x ⌨146 e ⌨155 c ⌨104 ⌨117 w ⌨230 c ⌨516 ⌨94 - ⌨129 l ⌨183 ⌨354 { ⌨62 } ⌨269 ⌨216 + ⌨359 ⌨244 | ⌨189 ⌨102 g ⌨89 r ⌨77 e ⌨52 p ⌨159 ⌨218 t ⌨117 o ⌨72 t ⌨169 a ⌨98 l ⌨630 ⏎  ⎙14 25554 total⏎  ⌨669 f ⌨75 i ⌨143 n ⌨114 d ⌨139 ⌨486 ~ ⌨267 / ⌨354 p ⌨105 r ⌨86 o ⌨179 j ⌨65 e ⌨208 c ⌨85 t ⌨244 s ⌨158 / ⌨298 n ⌨69 u ⌨189 m ⌨84 b ⌨79 e ⌨56 r ⌨585 ⌨101 - ⌨104 t ⌨181 y ⌨49 p ⌨132 e ⌨145 ⌨106 f ⌨253 ⌨98 - ⌨89 e ⌨188 x ⌨163 e ⌨166 c ⌨108 ⌨139 w ⌨175 c ⌨443 ⌨184 - ⌨142 l ⌨187 ⌨354 { ⌨97 } ⌨562 ⌨214 + ⌨195 ⌨244 | ⌨112 ⌨106 g ⌨78 r ⌨96 e ⌨43 p ⌨96 ⌨138 t ⌨113 o ⌨68 t ⌨209 a ⌨82 l ⌨191 ⏎  ⎙15 316 total⏎  ␃