SSH is the primary way I move files between machines, run remote commands, and deploy server instances. Paired with a basic scripting tool, it keeps remote administration simple. Beyond system administration, various programming projects benefit from a small, dependency-free library that can open secure channels between machines.
Implementing SSH poses some interesting design challenges. The SSH protocol consists of several nested sub-protocols, each serving a different purpose.
The application data is an application-specific stream of bytes. If the application is a shell, the stream content is quite familiar (e.g. a command to invoke cat file.txt is sent, then the contents of file.txt and the process exit code are received). Although the application stream is easy to recognize, it's buried beneath several less-familiar layers of the SSH protocol.
First, the stream is broken into chunks according to the available window space. The windowing mechanism lets the host and client each specify how much application data they are able to receive before the receiver has to grant more window space.
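To make the windowing idea concrete, here is a minimal sketch of the sender's view of one channel's window. The `SendWindow` class and its method names are hypothetical. It assumes, loosely following RFC 4254, that the peer advertises an initial window and a maximum packet size, and later grants more space with a window-adjust message. Data that doesn't fit is handed back to the caller rather than buffered.

```python
from dataclasses import dataclass


@dataclass
class SendWindow:
    """Sender-side view of one channel's flow-control window (hypothetical sketch)."""
    remaining: int   # bytes the peer is still willing to accept
    max_packet: int  # peer's maximum data size per packet on this channel

    def take(self, data: bytes) -> tuple[list[bytes], bytes]:
        """Split data into sendable chunks; return (chunks, unsent leftover)."""
        chunks = []
        offset = 0
        while offset < len(data) and self.remaining > 0:
            # Each chunk is limited by the data left, the window, and the packet cap.
            size = min(len(data) - offset, self.remaining, self.max_packet)
            chunks.append(data[offset:offset + size])
            offset += size
            self.remaining -= size
        return chunks, data[offset:]

    def adjust(self, bytes_to_add: int) -> None:
        """Apply a window adjustment received from the peer."""
        self.remaining += bytes_to_add
```

For example, with a 10-byte window and a 4-byte packet cap, `take(b"hello world!")` yields `b"hell"`, `b"o wo"`, and `b"rl"`. It leaves `b"d!"` waiting until `adjust()` grants more room.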
The windowed segments are tagged with a channel identifier so that multiple simultaneous streams can share a single SSH session (e.g. a shell session, an X11 session, and a file transfer all carried over a single socket). I rarely make use of this SSH capability, but it seems like a nice feature to have.
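Here is a rough sketch of how the channel identifier might wrap a windowed segment. It assumes the SSH_MSG_CHANNEL_DATA layout from RFC 4254:

- a message-type byte of 94,
- a uint32 recipient channel,
- the data as a uint32-length-prefixed string.

The function names are illustrative and not taken from any particular library.

```python
import struct

SSH_MSG_CHANNEL_DATA = 94  # message number from RFC 4254


def encode_channel_data(recipient_channel: int, data: bytes) -> bytes:
    """Wrap one windowed segment: byte type, uint32 channel, string data."""
    # ">BII" = big-endian: 1-byte type, 4-byte channel, 4-byte data length.
    return struct.pack(">BII", SSH_MSG_CHANNEL_DATA, recipient_channel, len(data)) + data


def decode_channel_data(payload: bytes) -> tuple[int, bytes]:
    """Return (channel, data) from a channel-data message payload."""
    msg_type, channel, length = struct.unpack_from(">BII", payload)
    if msg_type != SSH_MSG_CHANNEL_DATA:
        raise ValueError(f"not a channel-data message: {msg_type}")
    data = payload[9:9 + length]  # 9 = size of the fixed header above
    if len(data) != length:
        raise ValueError("truncated channel data")
    return channel, data
```

The receiving side would use the decoded channel number to route `data` to the right stream (shell, X11, file transfer) on the shared connection.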
Next, the multiplexed data is wrapped in a transport packet. This adds a small header (the packet and padding lengths) and random padding, bringing the packet to a multiple of the cipher block size. That makes the content suitable for the block-based (rather than stream-based) layers above.
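Below is a sketch of the transport packet framing, based on the binary packet layout in RFC 4253 section 6: uint32 packet_length, byte padding_length, the payload, then random padding. It assumes no compression and leaves the MAC to a later step. `build_packet` is a hypothetical helper name.

```python
import os
import struct


def build_packet(payload: bytes, block_size: int = 8) -> bytes:
    """Frame a payload as packet_length || padding_length || payload || padding."""
    # RFC 4253: the framed length must be a multiple of max(cipher block size, 8).
    block_size = max(block_size, 8)
    unpadded = 4 + 1 + len(payload)  # packet_length field + padding_length byte + payload
    padding_len = block_size - (unpadded % block_size)
    if padding_len < 4:              # at least 4 bytes of padding are required
        padding_len += block_size    # stays well under 255 for 8- or 16-byte blocks
    padding = os.urandom(padding_len)
    # packet_length counts everything after itself (MAC excluded).
    packet_length = 1 + len(payload) + padding_len
    return struct.pack(">IB", packet_length, padding_len) + payload + padding
```

With the default 8-byte block, a 10-byte payload gets 9 bytes of padding, for a 24-byte packet.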
The padded packets, now a multiple of the cipher block size, are then encrypted using the negotiated cipher, so eavesdroppers cannot see the payloads in transit.
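Finally, a MAC (a hash keyed with the negotiated key) is computed for each packet. It proves the data was sent by someone holding that key and was not modified by another party in transit. In the original scheme from RFC 4253, the MAC covers the packet's sequence number and its unencrypted contents; newer encrypt-then-MAC modes hash the encrypted blocks instead.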
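Here is a minimal sketch of the MAC step. It assumes the RFC 4253 ordering, where the MAC is taken over a uint32 sequence number followed by the unencrypted packet, and uses the hmac-sha2-256 algorithm (RFC 6668). `compute_mac` and `verify_mac` are hypothetical names.

```python
import hashlib
import hmac
import struct


def compute_mac(key: bytes, sequence_number: int, unencrypted_packet: bytes) -> bytes:
    """hmac-sha2-256 over uint32 sequence_number || unencrypted_packet."""
    # The sequence number wraps around modulo 2**32.
    seq = struct.pack(">I", sequence_number & 0xFFFFFFFF)
    return hmac.new(key, seq + unencrypted_packet, hashlib.sha256).digest()


def verify_mac(key: bytes, sequence_number: int, unencrypted_packet: bytes,
               received_mac: bytes) -> bool:
    """Check a received MAC using a constant-time comparison."""
    expected = compute_mac(key, sequence_number, unencrypted_packet)
    return hmac.compare_digest(expected, received_mac)
```

Because the sequence number is never sent on the wire, a replayed or reordered packet fails verification even if its bytes are otherwise untouched.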
None of these layers are particularly tricky. The main challenges arise from their composition and interactions; layers are added and removed mid-session, and messages on a deeply-nested layer can alter the behavior of outer layers.
The implementation will be made a bit more complex by my desire not to buffer entire packets as they pass through the layers. The naive way to implement the layering is to have each layer receive a whole packet, perform its transformation (hashing, encrypting, windowing, etc.) and pass the result to the next layer. Unfortunately, SSH packets can be quite big; RFC 4253 (section 6.1) states:
All implementations MUST be able to process packets with an uncompressed payload length of 32768 bytes or less and a total packet size of 35000 bytes or less (including 'packet_length', 'padding_length', 'payload', 'random padding', and 'mac'). The maximum of 35000 bytes is an arbitrarily chosen value that is larger than the uncompressed length noted above. Implementations SHOULD support longer packets, where they might be needed.
Some transformations aren't trivial to do in place, so the naive implementation would likely need multiple buffers per connection, each larger than 35 KB. That's a bit much.
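One way to avoid those large buffers is to let each layer process a packet incrementally, passing each chunk along as soon as it has been handled. As a sketch, assume an RFC 4253-style hmac-sha2-256 MAC over the sequence number and unencrypted packet. The MAC layer can then feed chunks into a running hash and forward them unchanged, holding only the current chunk plus the hash state. `mac_passthrough` is a hypothetical name, and a real outbound path would also hand each chunk to the cipher.

```python
import hashlib
import hmac
import struct
from typing import Iterable, Iterator


def mac_passthrough(chunks: Iterable[bytes], key: bytes,
                    sequence_number: int) -> Iterator[bytes]:
    """Forward packet chunks unchanged, then emit the MAC as a final chunk."""
    # Start the running HMAC with the uint32 sequence number.
    mac = hmac.new(key, struct.pack(">I", sequence_number & 0xFFFFFFFF), hashlib.sha256)
    for chunk in chunks:
        mac.update(chunk)  # hash the chunk without keeping a copy
        yield chunk        # pass it straight on to the next layer
    yield mac.digest()     # the MAC trails the packet, as on the wire
```

HMAC can be updated piece by piece, so the result matches a one-shot MAC over the whole packet. The layer never needs a 35 KB buffer of its own.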
The diagram above shows the layers according to their structural layout rather than the order in which they come into use. In practice, some of the middle layers are used first, with outer and inner layers added as the various handshakes progress. Starting with the next post, I'll implement each layer as an active connection requires it, so I can test as I go.