On 06-11-2021 the Scintilla website was unreachable for several hours. Three days later, the same thing happened again, and again a few days later. What happened?
Due to several well-targeted DDoS-attacks, spread over several days, the Scintilla website was down for many hours in a row. During these attacks, Scintilla’s members were not able to sign up for activities, use their webmail, access the VPN or access files.
In this edition of the Vonk, you can read all about what a DDoS-attack is, how it could cripple Scintilla’s systems and what is being done to prevent these problems in the future. Additionally, we give you a special overview of the network and servers that Scintilla uses for its cloud services.
What is a DDoS-attack?
A DDoS-attack, or distributed denial-of-service attack, is in essence nothing more than flooding a machine or network with many superfluous requests in an attempt to overload the targeted system¹. The idea of such an attack is to send so many “fake” requests to a machine that legitimate requests can’t get through. Often, this is done by sending these requests from many different systems, which the attacker has usually gained access to illegally.
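To get a feeling for the numbers, here is a minimal sketch of the idea in Python. It assumes a server that can answer a fixed number of requests per second and cannot tell real requests from fake ones, so every request has the same chance of being served. The function name and all numbers are made up for illustration.

```python
def legit_fraction_served(capacity: int, legit: int, attack: int) -> float:
    """Fraction of legitimate requests that get an answer.

    Assumption: the server treats all requests alike, so when it is
    overloaded each request is served with probability capacity / total.
    """
    total = legit + attack
    if total <= capacity:
        return 1.0  # no overload: everybody is served
    return capacity / total


# A server that can handle 1000 requests per second (hypothetical numbers).
print(legit_fraction_served(1000, 100, 0))       # 1.0
print(legit_fraction_served(1000, 100, 99_900))  # 0.01
```

In the second case only about one in a hundred real visitors still gets through, even though the server itself keeps running at full speed.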
Such an attack often targets one specific server. However, before reaching the targeted machine, the individual requests need to travel over the network the machine is connected to. If the machine is connected to the internet through a switch that cannot handle the number of requests either, all other devices connected to that switch might lose their connection to the internet as well. If this is the case, all connected servers are unreachable.
What was the target of the attacks?
According to Willem Mulder, member of the Scintilla Operator Team (SOT) and Studenten Net Twente (SNT), the target was the IRC (Internet Relay Chat) server of SNT. IRC is a chat protocol that was widely used for a long time for chatting over the internet, but is now mainly used by hobbyists. The server relays chat messages between clients all over the internet².
IRC servers are a popular target for DDoS-attacks. “If a network is attacked by a DDoS-attack, you can be almost sure they attack the IRC server,” said Willem Mulder, “I don’t know why, but it seems to be an unwritten rule of the internet!”
Remarkably, the university has a very high-bandwidth (40 Gbps) link to the outside world, too fast for simple DDoS-attacks to cause any problems. So why did this particular attack cause so many problems? That’s what I asked Jeroen van Ingen Schenau, Senior Network Engineer at the university.
According to Jeroen, the university’s network is hit by a DDoS-attack several times a week. This is never a real problem, because our routers already provide enough protection against ‘stupid’ DDoS-attacks. By ‘stupid’ attacks he means DDoS-attacks that flood a server with a lot of traffic that it is clearly not asking for. In this case, our routers can easily detect these packets³ and filter them out, or drop them, as he calls it.
The IRC server, Jeroen continued, expects packets containing messages, so a ‘smart’ DDoS-attack would send a lot of packets that imitate normal messages to the server. In this case, it is much harder for the router to detect these malicious packets, and they will be relayed further into the network.
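The difference between a ‘stupid’ and a ‘smart’ attack can be illustrated with a small Python sketch. It is not how the university’s routers actually work; it only assumes a very simple rule, namely that the router forwards a packet when it is addressed to a port the server is expected to listen on. The address, the port table and all names are hypothetical (6667 is a port commonly used for IRC).

```python
from dataclasses import dataclass

# Hypothetical table: the ports each server is expected to receive traffic on.
# 192.0.2.10 is a documentation address, 6667 a commonly used IRC port.
EXPECTED_PORTS = {"192.0.2.10": {6667}}


@dataclass(frozen=True)
class Packet:
    dst_ip: str
    dst_port: int
    malicious: bool  # ground truth, which the router cannot see


def router_forwards(packet: Packet) -> bool:
    """Forward only traffic the destination server is expected to receive."""
    return packet.dst_port in EXPECTED_PORTS.get(packet.dst_ip, set())


def count_forwarded(packets: list[Packet]) -> int:
    return sum(router_forwards(p) for p in packets)


# 'Stupid' attack: traffic to a port the IRC server never asked for.
stupid_attack = [Packet("192.0.2.10", 53, True) for _ in range(1000)]
# 'Smart' attack: traffic that looks like ordinary IRC messages.
smart_attack = [Packet("192.0.2.10", 6667, True) for _ in range(1000)]
# A handful of real chat messages.
legit = [Packet("192.0.2.10", 6667, False) for _ in range(10)]

print(count_forwarded(stupid_attack))  # 0
print(count_forwarded(smart_attack))   # 1000
print(count_forwarded(legit))          # 10
```

With this rule the router cannot separate the smart attack from the real messages, because both look exactly like the traffic the server expects.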
But again, this should not have been a problem: the next node in the network is a switch⁴ that is easily able to handle a large DDoS-attack. Even though the switch is not able to identify and drop malicious packets, it is able to manage higher data flows than the IRC server can handle. It does this by dropping packets at the output buffer. The input port of the switch can handle a much higher flow rate: it processes all incoming packets and sends them to the correct output port. Once an output buffer receives more than it can handle, the superfluous packets are dropped. This way, only the packets that are sent to an overloaded port (in this case, the port that connects to the IRC server) are dropped, and other traffic is able to continue.
However, due to a bug in the system, packets destined for the IRC server port were dropped at the input buffer of the switch instead of at the output buffer. This meant that instead of letting 10 Gbps worth of traffic flow through the switch and only dropping packets at an output port whose flow exceeded 1 Gbps, the switch dropped packets at random from the input buffer whenever the output buffer exceeded its capacity. As a result, it was difficult to reach the servers connected to the other ports on the switch, and many services went down.
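The effect of dropping at the wrong buffer can be made concrete with a rough Python model. It is a strong simplification and not a description of the real switch: it assumes that random dropping at the shared input buffer makes every flow lose the same fraction of its packets, just enough to bring the overloaded port back to its capacity. The 1 Gbps port capacity and the 10 Gbps total come from the story above; the split of traffic over the ports is made up.

```python
PORT_CAPACITY_GBPS = 1.0  # capacity of each output port


def output_drop(offered: dict[str, float]) -> dict[str, float]:
    """Intended behaviour: each output port drops only its own excess."""
    return {port: min(rate, PORT_CAPACITY_GBPS) for port, rate in offered.items()}


def input_drop(offered: dict[str, float]) -> dict[str, float]:
    """Assumed model of the bug: the excess is dropped at random from the
    shared input buffer, so every flow keeps the same fraction of packets."""
    worst = max(offered.values())
    if worst <= PORT_CAPACITY_GBPS:
        return dict(offered)  # no port is overloaded, nothing is dropped
    keep = PORT_CAPACITY_GBPS / worst
    return {port: rate * keep for port, rate in offered.items()}


# Hypothetical traffic in Gbps, 10 Gbps in total, most of it aimed at IRC.
offered = {"irc": 9.0, "alexia": 0.5, "snt_cloud": 0.5}

for name, delivered in (("output drop", output_drop(offered)),
                        ("input drop", input_drop(offered))):
    print(name, {port: round(rate, 3) for port, rate in delivered.items()})
```

In this model the intended behaviour still delivers the full 0.5 Gbps to Alexia, while the buggy behaviour lets only about one ninth of her traffic through, although her own port is nowhere near full.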
Why was Scintilla a victim of the attacks?
The switch that had the problems described above is called sw2-sein-snt.civ, i.e. switch 2 in the “Seinhuis”⁵ of SNT. As a result, Scintilla was impacted in two ways:
- On the one hand, one of the most important servers of Scintilla is connected to the sw2-sein-snt.civ switch. This server is called “Alexia” and runs, among others, Linscin, SMART3 (database), the email accounts for active members, the LDAP server and the Vonk website. When the switch was overrun by traffic, Alexia lost its uplink and its services were unavailable.
- On the other hand, the SNT servers were down, as these connect to the same switch. Therefore, SNT Cloud was not available. Scintilla’s website, the STORES website, the wiki, webmail, SATIS, SMART3 (front-end) and the sk_tv_app are among the things that run on SNT Cloud. These services were therefore also not available.
According to Willem, SNT Cloud consists of three servers that work together to provide redundancy. However, two of those servers are connected behind sw2-sein-snt.civ, leaving only one server with an uplink. This should still not have been a problem, since the remaining server should have been able to run on its own. Sadly, this server checks whether it still has an uplink by pinging the other two servers. Because the other two were unavailable, the remaining server concluded that it was offline itself and stopped functioning.
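The weakness of such a check can be sketched in a few lines of Python. This is not SNT’s actual code or fix. The first function is an assumed reading of the behaviour described above (a server believes it is online only if at least one of the other servers answers); the second shows one possible alternative that also asks an independent “witness”, such as the default gateway. All host names and the `ping` function are hypothetical.

```python
from typing import Callable, Iterable


def flawed_has_uplink(peers: Iterable[str], ping: Callable[[str], bool]) -> bool:
    """Assumed current check: online only if at least one peer answers."""
    return any(ping(peer) for peer in peers)


def safer_has_uplink(peers: Iterable[str], witnesses: Iterable[str],
                     ping: Callable[[str], bool]) -> bool:
    """Possible alternative: an answer from an independent witness also counts."""
    return any(ping(host) for host in [*peers, *witnesses])


# Situation during the attack: both peers are unreachable behind the switch,
# but the (hypothetical) gateway of the remaining server still answers.
reachable = {"gateway"}


def fake_ping(host: str) -> bool:
    return host in reachable


peers = ["cloud2", "cloud3"]
print(flawed_has_uplink(peers, fake_ping))              # False
print(safer_has_uplink(peers, ["gateway"], fake_ping))  # True
```

A witness check like this has a price: a server that keeps running alone has to be sure that the others are really down, otherwise two halves of the cluster could continue independently and drift apart.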
What is being done to prevent these problems in the future?
On the one hand, the bugs on the switch are being fixed. What exactly caused the problems was unclear. What is known is that it has something to do with application-specific integrated circuits (ASICs), i.e. specialized hardware, that prioritize packets based on certain TCP flags. This functionality is not considered to be very important, so it has been (manually) disabled. This means the bug will not show its symptoms any more and packets won’t be dropped at the input buffer.
On the other hand, SNT is working to solve their problem with the remaining server thinking it lost connection to the internet. Hopefully, with both problems solved, Scintilla’s servers will remain operational for the foreseeable future.