Researchers at the University of Cambridge have analyzed how the PRC’s Great Firewall (GFW) handles the “blocking” or interruption of web page loading midstream when it detects sensitive keywords related to the day after June 3 and certain religious groups. What they discovered is quite surprising, because it indicates that the mechanism is simple and clever but, at the same time, quite straightforward to circumvent. Read on for a layman’s explanation of the technical paper.
For the non-techie, the simple explanation is that the GFW sends a “TCP reset” packet to both the web server supplying the suspicious page and to the client (i.e., your computer) loading it. It’s the equivalent of an “emergency stop” packet, usually reserved for situations of bad connectivity, so that both sides know to disconnect abruptly.
It appears the GFW in PRC cleverly uses this technique so that it can stymie the loading of pages, and so it does not have to actively make subsequent decisions to drop packets by correlating them to previous ones. In techie terms, having to store the history of what has been sent and received is called “state information” as in the technical state of affairs the router must accumulate. (This is not to be confused with State information as with “state secrets” or “enemies of the state”!)
I say it is clever, because this means you need far fewer computers and far less processing power and memory to implement effective blocking. In fact, GFW operators could use off-the-shelf Cisco (or whatever) routers with no modified firmware whatsoever, and just have a set of machines sit on the side detecting keywords and sending out “TCP resets.” Simple, effective, and with a low impact on network engineering.
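To make the "machines on the side" idea concrete, here is a minimal sketch in Python of a detector that looks at one packet at a time and, on a keyword match, forges a reset for each end of the connection. It is a toy model and not the GFW's actual code: the Packet class, the inspect function and the placeholder keyword are all made up for illustration, and it ignores details such as TCP sequence numbers that a real forged reset would have to get right.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical placeholder; the real keyword list is not public.
KEYWORDS = [b"forbidden-topic"]


@dataclass(frozen=True)
class Packet:
    """A drastically simplified TCP segment (illustrative only)."""
    src: str
    dst: str
    payload: bytes = b""
    rst: bool = False


def inspect(packet: Packet) -> List[Packet]:
    """Return the forged resets triggered by a single observed packet.

    The decision uses only the packet in hand: nothing about earlier
    packets is stored, and the packet itself is never dropped.
    """
    if packet.rst or not any(k in packet.payload for k in KEYWORDS):
        return []
    return [
        # Reset sent to the receiver, pretending to come from the sender.
        Packet(src=packet.src, dst=packet.dst, rst=True),
        # Reset sent to the sender, pretending to come from the receiver.
        Packet(src=packet.dst, dst=packet.src, rst=True),
    ]
```

Calling inspect on a packet whose payload contains the placeholder keyword yields two reset packets, one addressed to each endpoint; any other packet yields an empty list. The main router never has to be involved in the decision.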
Well, the researchers realized that because this “TCP reset” was the sole mechanism for cutting off loading the content, the page information (sensitive information and all) was still being sent through all the way to your client computer in the PRC! But because of the “TCP reset,” the client was simply shutting down reception of such packets, so the Web browser never got the content. That is, the packets were actually travelling down the cable (or over Wi-Fi) to your locale in the PRC, but the computer was ignoring them.
So in their tests, they said – what if we simply instructed the computer to ignore the “TCP reset” and keep loading. Would it work? The answer is: yes. From their blog:
…the keyword detection is not actually being done in large routers on the borders of the Chinese networks, but in nearby subsidiary machines. When these machines detect the keyword, they do not actually prevent the packet containing the keyword from passing through the main router (this would be horribly complicated to achieve and still allow the router to run at the necessary speed). Instead, these subsidiary machines generate a series of TCP reset packets, which are sent to each end of the connection. When the resets arrive, the end-points assume they are genuine requests from the other end to close the connection — and obey. Hence the censorship occurs.
However, because the original packets are passed through the firewall unscathed, if both of the endpoints were to completely ignore the firewall’s reset packets, then the connection will proceed unhindered! We’ve done some real experiments on this — and it works just fine!! Think of it as the Harry Potter approach to the Great Firewall — just shut your eyes and walk onto Platform 9¾.
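The effect the researchers describe can be illustrated with a small Python simulation. This is only a sketch under simplifying assumptions: the receive function and the sample stream are invented for this post, a "segment" is just a labelled chunk of bytes, and only one endpoint is modeled. It compares an endpoint that obeys a forged reset with one that discards it.

```python
from typing import Iterable, Tuple

DATA, RST = "data", "rst"


def receive(segments: Iterable[Tuple[str, bytes]], ignore_resets: bool) -> bytes:
    """Collect the payload an endpoint ends up with.

    A normal endpoint stops at the first reset; an endpoint that
    ignores resets simply skips it and keeps reading.
    """
    received = b""
    for kind, payload in segments:
        if kind == RST:
            if ignore_resets:
                continue  # shut your eyes and keep walking
            break         # obey the reset: connection torn down
        received += payload
    return received


# The keyword-bearing data still arrives; the forged reset follows it.
stream = [
    (DATA, b"<html>"),
    (DATA, b"sensitive keyword"),
    (RST, b""),
    (DATA, b"</html>"),
]

obedient = receive(stream, ignore_resets=False)
stubborn = receive(stream, ignore_resets=True)
```

Here the obedient endpoint ends up with a truncated page, while the stubborn one receives everything up to the closing tag. In real life the server receives a forged reset too, so the same change has to be made at both ends for the rest of the page to keep flowing.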
Cool results. One problem – you need both the Web server and the client to ignore “TCP reset” packets to make this workaround effective. The researchers have suggested that making this behavior modification to the “TCP/IP stack” of networking code in routers and operating systems was desirable anyway, and they’re probably right. But that’s quite a tall order to get Microsoft, Apple, Palm, Symbian, and all the other folks with IP networking in their OSes to change. (But interestingly, with open source software like Linux, a patch and recompile of the kernel to do this is quite simple.)
Nevertheless, this does provide some insight into how the GFW manages to be effective in keyword blocking given how much traffic the PRC Internet chokepoints have to handle. It’s the network filtering equivalent of Occam’s Razor – the simplest and most straightforward (and low impact) implementation is the most likely.
…the key point is that changing the TCP/IP stacks to ignore the firewall is almost a no-brainer for the vendor. There are excellent technical reasons for discarding the firewall’s resets as a matter of course. If stack builders did this as standard, then an entire Great Firewall of China mechanism entirely fails to work. That can only, in my view, be a good result.
[Hat tip to: Bruce Schneier]