FreeS/WAN -- KLIPS HARDWARE ACCELERATION NOTES (draft 1)
=============================================================================
Wed Feb 28 22:47:28 EST 2001

I have once again started working on hardware acceleration of KLIPS.  This
message is a small subset of my thoughts on the topic.

I have been too busy at work to get back with more detail to a few people on
this list who were interested in hardware acceleration in KLIPS.  I hope to
rectify that.  I am no less busy at work, but I will put in some effort
outside of work to discuss ways of doing this.

While writing this message I tried very hard not to introduce any coding
specifics.  Consider this slightly more formal than a brainstorm.  I would
love to hear some feedback.

Before I get to the dirty stuff I need to make some of my requirements known.
I am working on "Network Security Processors" (NSPs) at Chrysalis-ITS.
Luna340 is our first packet processor/encryption engine.  I want to alter
KLIPS so that hardware encryption is either a simple run-time plug-in
requiring no code change (i.e. loadable crypto modules), or at least so that
it is very simple to alter KLIPS for hardware acceleration.  I find the first
alternative more beneficial -- rebuttals are welcome.

There are many ways to make KLIPS access encryption routines.  Let's just
say, for now, that these routines live in a crypto engine that provides
transparent access to whatever hardware or software can handle the task.  I
think that RBG was going in this direction anyway, although he probably did
not envision these routines being strictly separated from KLIPS.

And while on this topic, I would like to make a case for just that: splitting
KLIPS into three distinct parts.  Each part would have a well-defined and
hopefully static interface, so that changing one part would not necessarily
influence the others.
These parts I propose would be:

  1) tunnel processing    (tunnel database and pfkey interface)
  2) protocol processing  (ESP/AH packet mangling)
  3) crypto processing    (crypto functionality)

Now, the reason why I want to make the crypto procedures separate is easy to
see: so that they can be replaced by hardware equivalents.  This would
include a unified interface to encrypt, decrypt, sign, and authenticate.
Each encryption and authentication algorithm would supply its own pair of
functions.  If nothing else it would make ipsec_rcv, ipsec_tunnel, and
tdb_init a bit easier to read.

The ESP/AH part is a bit more involved: the Luna340 is capable of doing all
packet mangling and demangling given an IPSec packet.  This means that in one
PCI operation you can do 3DES and SHA1, as opposed to calling the chip twice
to do these operations separately.  When a hardware accelerator is capable of
packet processing, KLIPS would only maintain the database of SAs, select the
appropriate SA when a new packet arrives for processing, and act as a PFKey
interface -- all other jobs would be done in hardware.

[ Aside: This is a bit controversial as it lessens the work done in KLIPS and
  hence takes away from the importance of SWAN.  I am attempting to utilize
  the chip to its fullest potential... this is how I would do that. ]

Having established this, I will drop the distinction between hardware and
software crypto elements unless it is important to make it.

Question: what happens when a packet arrives?  Well, the tunnel processor
locates the appropriate tunnel entry in the tunnel database.  It then calls
the packet processor to process the packet.  The packet processor selects the
appropriate crypto function and executes it.  On completion of the crypto
function the packet is returned for further packet processing.  Once the
packet processing is finished the packet is returned to the tunnel processor,
which may decide that another tunnel is appropriate for this packet, and the
cycle continues.

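
To make the three-way split and the packet walk above a little more concrete,
here is a minimal user-space sketch in C.  It is an illustration under stated
assumptions, not KLIPS code: every name in it (enc_ctx, auth_ctx, pp_ctx,
tunnel, toy_rcv, demo_receive) is hypothetical, and the XOR "cipher" and
byte-sum "MAC" are toy stand-ins for real algorithms such as 3DES and SHA1.
The only point is that each layer reaches the next through a function pointer
and knows nothing else about it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* --- crypto processing: one context and one function per operation --- */
struct enc_ctx {
    int (*fn)(struct enc_ctx *ctx, const uint8_t *src, size_t src_len,
              uint8_t *dst, size_t dst_len);
    uint8_t key;                        /* toy key, NOT a real cipher key */
};

struct auth_ctx {
    int (*fn)(struct auth_ctx *ctx, const uint8_t *src, size_t src_len,
              uint8_t *dst, size_t dst_len);
    uint8_t key;
};

/* Toy "decrypt": XOR every byte with the key (stand-in for e.g. 3DES). */
static int toy_dec_xor(struct enc_ctx *ctx, const uint8_t *src,
                       size_t src_len, uint8_t *dst, size_t dst_len)
{
    if (dst_len < src_len)
        return -1;
    for (size_t i = 0; i < src_len; i++)
        dst[i] = (uint8_t)(src[i] ^ ctx->key);
    return 0;
}

/* Toy "MAC": keyed byte sum written to dst[0] (stand-in for e.g. SHA1). */
static int toy_mac_sum(struct auth_ctx *ctx, const uint8_t *src,
                       size_t src_len, uint8_t *dst, size_t dst_len)
{
    uint8_t acc = ctx->key;

    if (dst_len < 1)
        return -1;
    for (size_t i = 0; i < src_len; i++)
        acc = (uint8_t)(acc + src[i]);
    dst[0] = acc;
    return 0;
}

/* --- protocol processing: knows the packet layout, not the algorithms --- */
struct pp_ctx {
    int (*fn)(struct pp_ctx *pp, uint8_t *pkt, size_t *len);
    struct enc_ctx enc;
    struct auth_ctx auth;
};

/* Toy layout: [ciphertext ...][1-byte tag computed over the ciphertext].
 * Verify first, then decrypt in place (an ordering chosen for this sketch). */
static int process_toy_esp(struct pp_ctx *pp, uint8_t *pkt, size_t *len)
{
    uint8_t tag;
    size_t body;

    if (*len < 2)
        return -1;
    body = *len - 1;
    if (pp->auth.fn(&pp->auth, pkt, body, &tag, sizeof tag) != 0)
        return -1;
    if (tag != pkt[body])
        return -1;                      /* authentication failed */
    if (pp->enc.fn(&pp->enc, pkt, body, pkt, body) != 0)
        return -1;
    *len = body;                        /* strip the tag */
    return 0;
}

/* --- tunnel processing: owns the database, picks the entry, hands off --- */
struct tunnel {
    uint32_t spi;
    struct pp_ctx pp;
};

static struct tunnel tunnels[] = {
    { 0x1001, { process_toy_esp, { toy_dec_xor, 0x5a }, { toy_mac_sum, 0x11 } } },
};

static struct tunnel *tunnel_lookup(uint32_t spi)
{
    for (size_t i = 0; i < sizeof tunnels / sizeof tunnels[0]; i++)
        if (tunnels[i].spi == spi)
            return &tunnels[i];
    return NULL;
}

static int toy_rcv(uint32_t spi, uint8_t *pkt, size_t *len)
{
    struct tunnel *t = tunnel_lookup(spi);

    assert(pkt != NULL && len != NULL);
    return t ? t->pp.fn(&t->pp, pkt, len) : -1;
}

/* Build a toy packet, optionally corrupt it, and run it through all layers.
 * Returns 0 if the original message comes out, -1 otherwise. */
int demo_receive(int tamper)
{
    const char msg[] = "hello";
    uint8_t pkt[6];
    size_t len = sizeof pkt;
    uint8_t sum = 0x11;

    for (size_t i = 0; i < 5; i++) {
        pkt[i] = (uint8_t)(msg[i] ^ 0x5a);
        sum = (uint8_t)(sum + pkt[i]);
    }
    pkt[5] = sum;
    if (tamper)
        pkt[0] ^= 0x01;
    if (toy_rcv(0x1001, pkt, &len) != 0)
        return -1;
    return (len == 5 && memcmp(pkt, msg, 5) == 0) ? 0 : -1;
}
```

In this sketch, pointing enc.fn at a different routine would not require
touching process_toy_esp or toy_rcv, which is the property the split is
after.
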
Here is the flow described above in a diagram (for the software case):

  [tunnel processing]       [protocol processing]  [crypto processing]

  PLUTO
  |
  | tdb_init(tdb, ...)
  |------------>|
  |             | (pp_t*)pp_init(spi,
  |             |         e_alg,e_key,e_len,
  |             |         a_alg,a_key,a_len)
  |             |-------------------->|
  |             |                     | (enc_t*)enc_init(dir,key,len)
  |             |                     |-------------------->|
  |             |                     |pp->enc <------------|
  |             |                     |
  |             |                     | (auth_t*)auth_init(dir,key,len)
  |             |                     |-------------------->|
  |             |                     |pp->auth <-----------|
  |             |tdb->pp <------------|
  |rc <---------|

  NIC
  | ipsec_rcv(skb)
  |
  | tdb_lookup(spi,...)
  |------------>|
  |tdb <--------|
  | tdb->pp.fn(tdb->pp,skb)
  |---------------------------------->|
  |                                   |
  |                                   | pp->enc.fn(pp->enc,
  |                                   |         skb+src_ofs,
  |                                   |         src_len,
  |                                   |         skb+dst_ofs,
  |                                   |         dst_len)
  |                                   |-------------------->|
  |                                   |rc <-----------------|
  |                                   |
  |                                   | pp->auth.fn(pp->auth,
  |                                   |         skb+src_ofs,
  |                                   |         src_len,
  |                                   |         skb+dst_ofs,
  |                                   |         dst_len)
  |                                   |-------------------->|
  |                                   |rc <-----------------|
  |rc <-------------------------------|
  |
  | (loop if unmangled packet is an ESP/AH)
  |
  | netif_rx(skb)

Anyway, you get the picture.  Here are some finer points:

  * ipsec_rcv will loop for nested tunnels/SAs
  * ipsec_tunnel will be similar to ipsec_rcv
  * tdb->pp.fn points to process_esp or process_ah depending on the SA.
  * pp->enc.fn points to dec_3des.
  * pp->auth.fn points to verify_md5 or verify_sha1.

The point is that each layer makes no guesses about how the next one works.
For example, we know that DES encrypt and DES decrypt are the same operation
with a different key schedule.  However, we do not exploit this and have two
DES functions.  The advantage is that if a new algorithm comes along that
cannot use this trick, we will still be able to use the layer preceding it,
i.e. the protocol processing layer.

The major issue with hardware acceleration is this: since most of KLIPS runs
in interrupt time (the ISR of the NIC driver), it cannot sleep and wait for a
device to complete its computations.
Even if it could, we would not want to do this since we have better things to
do, like servicing other user tasks.  What we must do when we dispatch the
job to the crypto device is inform the IP stack that the packet was stolen,
and then detach.  This is quite important: if we just returned from
ipsec_rcv, the skb would be freed while we were still working on it.

Here is my interpretation of what will happen if you add hardware to the
above picture.

  [tunnel processing]       [protocol processing]  [crypto processing]

  NIC
  | ipsec_rcv(skb)
  |
  | tdb_lookup(spi,...)
  |------------>|
  |tdb <--------|
  |
  |job = alloc_skb(tdb->pp.jobsize)
  |job->skb = skb;
  |job->tdb = tdb;
  | tdb->pp.fn(tdb->pp,job)
  |---------------------------------->|
  |                                   | pp->enc.fn(pp->enc,
  |                                   |         job->skb+src_ofs,
  |                                   |         src_len,
  |                                   |         job->skb+dst_ofs,
  |                                   |         dst_len)
  |                                   |-------------------->|
  |                                   |                     |----> dispatch H/W
  |                                   |rc <-----------------|
  |rc <-------------------------------|

  [ millions of nanoseconds later in code nearby... ]

  H/W interrupt
  | ipsec_callback(job)
  |
  | job->tdb->pp.fn(job->tdb->pp,job)
  |---------------------------------->|
  |                                   | pp->auth.fn(pp->auth,
  |                                   |         job->skb+src_ofs,
  |                                   |         src_len,
  |                                   |         job->skb+dst_ofs,
  |                                   |         dst_len)
  |                                   |-------------------->|
  |                                   |                     |----> dispatch H/W
  |                                   |rc <-----------------|
  |rc <-------------------------------|

  [ millions of nanoseconds later in code nearby... ]

  H/W interrupt
  | ipsec_callback(job)
  |
  | job->tdb->pp.fn(job->tdb->pp,job)
  |---------------------------------->|
  |rc <-------------------------------|
  |
  | (may need to repeat ipsec_rcv if unmangled packet is an ESP/AH)
  |
  | netif_rx(job->skb)
  | free_skb(job)

A bit more involved, but pretty much the same idea.  The difference is that
the continuity is broken twice: once to do encryption and once to do
authentication.

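
The stolen-packet flow in the hardware diagram can be sketched as a small
state machine.  This is a user-space simulation under loud assumptions: the
"hardware" is just a FIFO that hw_interrupt() drains, calloc/free stand in for
whatever allocator the job buffer would really use, no actual crypto is
performed, and all names (job_stage, process_hw, job_step, toy_rcv_async,
hw_interrupt, demo_two_packets) are hypothetical.  It shows the protocol
function being re-entered from the completion callback, with the per-packet
job carrying the state across each break in continuity.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum { PP_ERROR = -1, PP_DONE = 0, PP_PENDING = 1 };
enum job_stage { STAGE_NEW, STAGE_ENC_SENT, STAGE_AUTH_SENT };

struct job;

struct pp_ctx {
    int (*fn)(struct pp_ctx *pp, struct job *job);
};

struct tunnel {
    uint32_t spi;
    struct pp_ctx pp;
};

/* Everything that must survive between a dispatch and its interrupt. */
struct job {
    uint8_t *pkt;                /* the "stolen" packet */
    size_t len;
    struct tunnel *t;
    enum job_stage stage;        /* where to resume in the protocol layer */
    struct job *next;            /* link in the fake hardware queue */
};

/* Fake hardware: a FIFO of operations that have been submitted. */
static struct job *hw_head, *hw_tail;
static int delivered;            /* counts packets handed up the stack */

static void hw_dispatch(struct job *job)
{
    job->next = NULL;
    if (hw_tail)
        hw_tail->next = job;
    else
        hw_head = job;
    hw_tail = job;
}

/* Protocol layer: called once per stage, never blocks. */
static int process_hw(struct pp_ctx *pp, struct job *job)
{
    (void)pp;
    switch (job->stage) {
    case STAGE_NEW:              /* first entry: start the cipher operation */
        job->stage = STAGE_ENC_SENT;
        hw_dispatch(job);
        return PP_PENDING;
    case STAGE_ENC_SENT:         /* cipher finished: start authentication */
        job->stage = STAGE_AUTH_SENT;
        hw_dispatch(job);
        return PP_PENDING;
    case STAGE_AUTH_SENT:        /* both finished */
        return PP_DONE;
    }
    return PP_ERROR;
}

/* Shared by the receive path and the callback. */
static void job_step(struct job *job)
{
    int rc = job->t->pp.fn(&job->t->pp, job);

    if (rc == PP_PENDING)
        return;                  /* the hardware owns the job for now */
    if (rc == PP_DONE)
        delivered++;             /* stand-in for netif_rx(job->skb) */
    free(job);
}

/* Receive path: allocate a job, start processing, return immediately. */
int toy_rcv_async(struct tunnel *t, uint8_t *pkt, size_t len)
{
    struct job *job = calloc(1, sizeof *job);

    if (!job)
        return -1;
    job->pkt = pkt;
    job->len = len;
    job->t = t;
    job->stage = STAGE_NEW;
    job_step(job);
    return 0;
}

/* Stand-in for the hardware interrupt: complete the oldest operation.
 * Returns 1 if an operation was completed, 0 if the queue was empty. */
int hw_interrupt(void)
{
    struct job *job = hw_head;

    if (!job)
        return 0;
    hw_head = job->next;
    if (!hw_head)
        hw_tail = NULL;
    job_step(job);
    return 1;
}

/* Two packets in flight on one tunnel: expect 4 interrupts, 2 deliveries. */
int demo_two_packets(void)
{
    static struct tunnel t = { 0x1001, { process_hw } };
    uint8_t a[4] = { 0 }, b[4] = { 0 };
    int irqs = 0;

    delivered = 0;
    if (toy_rcv_async(&t, a, sizeof a) || toy_rcv_async(&t, b, sizeof b))
        return -1;
    while (hw_interrupt())
        irqs++;
    return (irqs == 4 && delivered == 2) ? 0 : -1;
}
```

Because the resume point lives in the job rather than in the tunnel entry,
the two packets in demo_two_packets can interleave on the same tunnel without
disturbing each other.
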
Here is a list of notes for the hardware diagram above:

  * job is a buffer that stores the information about one transaction; it is
    used internally to store all local variables that need to be kept around
    between the dispatch of the operation and the matching interrupt.

  * jobsize is calculated ahead of time during SA creation and holds the
    number of bytes needed by the protocol processor and crypto processor.

It is worth mentioning that there may be multiple jobs per tdb.  This will
happen most frequently on the receiver if the sender is much faster and can
swamp its counterpart.  For this reason we keep a separate 'job' structure
for each packet that comes into KLIPS.

OK, it's late.  If I continue to write I will probably start babbling. ;)

Comments and critique are most appreciated.  I would like to spend a few
weeks discussing this before I start any code.  I will grab some people at
work tomorrow and bribe them with beer to spend some time with me looking
over this. :)

Regards,
Bart Trojanowski

=============================================================================
Ideas mentioned above are a copyright of Bart Trojanowski.

If available, an updated version of this document can be found at:
    http://www.jukie.net/~bart/linux-ipsec/