In this exercise we will implement a layer 3 forwarding switch that is able to load balance traffic
towards a destination across equal cost paths. To load balance traffic across multiple ports we will implement ECMP (Equal-Cost
Multi-Path) routing. When a packet with multiple candidate paths arrives, our switch should assign the next-hop by hashing some fields from the
header and compute this hash value modulo the number of possible equal paths. For example in the topology below, when s1
has to send
a packet to h2
, the switch should determine the output port by computing: hash(some-header-fields) mod 4
. To prevent out of order packets, ECMP hashing is done on a per-flow basis,
which means that all packets with the same source and destination IP addresses and the same source and destination
ports always hash to the same next hop.
To read more about ECMP see the following page
As usual, we provide you with the following files:
p4app.json
: describes the topology we want to create with the help of mininet and p4-utils package.network.py
: a Python scripts that initializes the topology using Mininet and P4-Utils. One can use indifferentlynetwork.py
orp4app.json
to start the network.p4src/ecmp.p4
: p4 program skeleton to use as a starting point.p4src/includes
: In today's exercise we will split our p4 code in multiple files for the first time. In the includes directory you will findheaders.p4
andparsers.p4
(which also have to be completed).send.py
: a small python script to generate multiple packets with different tcp port.
For this exercise (and next one) we will use a new IP assignment strategy. If you have a look at p4app.json
you will see that
the option is set to mixed
. Therefore, only hosts connected to the same switch will be assigned to the same subnet. Hosts connected
to a different switch will belong to a different /24
subnet. If you use the namings hX
and sX
(e.g h1, h2, s1...), the IP assignment
goes as follows: 10.x.x.y
. Where x
is the switch id (upper and lower bytes), and y
is the host id. For example, in the topology above,
h1
gets 10.0.1.1
and h2
gets 10.0.6.2
.
You can find all the documentation about p4app.json
in the p4-utils
documentation. Also, you can find information about assignment strategies here.
To solve this exercise we have to program our switch such that is able to forward L3 packets when there is one possible next hop or more. For that we will use two tables: in the first table we match the destination IP and depending on whether ECMP has to be applied (for that destination) we set the output port or a ecmp_group. For the later we will apply a second table that maps (ecmp_group, hash_output) to egress ports.
This time you will have to fill the gaps in several files: p4src/ecmp.p4
, p4src/include/headers.p4
and p4src/include/parsers.p4
. Additionally, you will have to create a cli
command file for each switch and name them sX-commands.txt
(see inside the p4app.json
). To keep the workspace clean, we placed all the command files inside sw-commands
directory.
To successfully complete the exercise you have to do the following:
-
Use the header definitions that are already provided.
-
Define the parser that is able to parse packets up to
tcp
. Note that for simplicity we do not considerudp
packets in this exercise. This time you must define the parser in:p4src/include/parsers.p4
. -
Define the deparser. Just emit all the headers in the right order.
-
Define a match-action table that matches the IP destination address of every packet and has three actions:
set_nhop
,ecmp_group
,drop
. Set the drop action as default. -
Define the action
set_nhop
. This action takes 2 parameters: destination mac and egress port. Use the parameters to set the destination mac andegress_spec
. Set the source mac as the previous destination mac (this is not what a real L3 switch would do, we just do it for simplicity). In a more realistic implementation we would create a table that maps egress_ports to each switch interface mac address, however since the source mac address is not very important for this exercise just do this swap). When sending packets from a switch to another switch, the destination address is not very important, and thus you can use a random one. However, keep in mind that when the packet is sent to a host needs to have the right destination MAC address. Finally, decrease the packet's TTL by 1. Note: since we are in a L3 network, when you send packets froms1
tos2
you have to use the dst mac of the switch interface not the mac address of the receiving host, that instead is done in the very last hop. Finally, decrease the packet's TTL by 1. -
Define the action
ecmp_group
. This action takes two parameters, the ecmp group id (14 bits), and the number of next hops (16 bits). This action is one of the key parts of the ECMP algorithm. You have to do several things:- In this action we will compute a hash function. To store the output you need to define a metadata field. Define
ecmp_hash
(14 bits) inside the metadata struct inheaders.p4
. Use thehash
extern function to compute the hash of packets 5-tuple (src ip, dst ip, src port, dst port, protocol). The signature of a hash function is:hash(output_field, (crc16 or crc32), (bit<1>)0, {fields to hash}, (bit<16>)modulo)
. You can find all the available hash functions here. For example, if you want to usecrc32
, your algorithm will be:HashAlgorithm.crc32
. - Define another metadata field and call it
ecmp_group_id
(14 bits). - Finally copy the value of the second action parameter ecmp group in the metadata field you just defined (
ecmp_group_id
) this will be used to match in the second table.
- In this action we will compute a hash function. To store the output you need to define a metadata field. Define
Note: a lot of people asked why theecmp_group_id
is needed. In few words, it allows you to map from one ip address to a set of ports, which does not have to be
the 4 ports we use in this exercise. For example, you could have that for IP1
you use only the upper 2 ports and for IP2
you loadbalance using the two lower ports. Thus, by
creating two ecmp groups you can easily map any destination address to any set of ports.
-
Define the second match-action table used to set
ecmp_groups
to real next hops. The table should haveexact
matches to the metadata fields your defined in the previous step. Thus, it should match to themeta.ecmp_group_id
and then to the output of the hash functionmeta.ecmp_hash
(which will be a value ranging from 0 toNUM_NEXT_HOPS-1
). A match in this table should call theset_nhop
action that you already defined above, a miss should mark the packet to be dropped (setdrop
as default action). This enables us to use any subset of interfaces. For example imagine that in the topology above we haveh2
andh3
( h3 does not exist, but just for the sake of the example) we could define two differentecmp
groups (in the previous table), one that maps to port 2 and 4, and one that maps to port 3 and 5. And then in this table we could add two rules per group, to make the outputs[0,1]
from the hash function match [2,4] and[3,5]
respectively. -
Define the ingress control logic:
- Check if the ipv4 header was parsed (use
isValid
). - Apply the first table.
- If the action
ecmp_group
was called during the first table apply. Call the second table. Note: to know which action was called during an apply you can use a switch statement andaction_run
, to see more information about how to check which action was used, check out the P4 16 specification
- Check if the ipv4 header was parsed (use
-
In this exercise we modify a packet's field for the first time (remember we have to subtract 1 to the ip.ttl field). When doing so, the
ipv4
checksum field need to be updated otherwise other network devices (or receiving hosts) might drop the packet. To do that, thev1model
provides anextern
function that can be called inside theMyComputeChecksum
control to update checksum fields. In this exercise, you do not have to do anything, however just go to theecmp.p4
file and check how theupdate_checksum
is used. -
This time you have to write six
sX-commands.txt
files, one per switch. Note that onlys1
ands6
need to haveecmp
groups installed. For all the other switches setting rules for the first table (using actionset_nhop
) will suffice. Fors1
you have to set a direct next hop towardsh1
, and a ecmp group towardsh2
. Set the ecmp group withid = 1
andnum_hops = 4
. Then define 4 rules that map from 0 to 3 to one of the 4 switch output ports (using the second table).
Once you have the ecmp.p4
program finished (and all the commands.txt
files) you can test its behaviour:
-
Start the topology (this will also compile and load the program).
sudo p4run
or
sudo python network.py
-
Check that you can ping:
mininet> pingall
-
Monitor the 4 links from
s1
that will be used duringecmp
(froms1-eth2
tos1-eth5
). Doing this you will be able to check which path is each flow taking.sudo tcpdump -enn -i s1-ethX
-
Ping between two hosts:
You should see traffic in only 1 or 2 interfaces (due to the return path). Since all the ping packets have the same 5-tuple.
-
Do iperf between two hosts:
You should also see traffic in 1 or 2 interfaces (due to the return path). Since all the packets belonging to the same flow have the same 5-tuple, and thus the hash always returns the same index.
-
Get a terminal in
h1
. Use thesend.py
script.python send.py 10.0.6.2 1000
This will send
tcp syn
packets with random ports. Now you should see packets going to all the interfaces, since each packet will have a different hash.
We have added a small guideline in the documentation section. Use it as a reference when things do not work as expected.