When you think about a wireless network and multicast together, what is your first thought? "Painful" might be a good first answer. You might even get a couple of chuckles from your colleagues in the wireless space. While multicast has been around for decades and there are clear solutions for wired networks, multicast over a wireless infrastructure can be less straightforward, and more painful, without the right hardware and proper configuration. There are many documents out there, like the popular and often cited Validated Reference Design (VRD) for Very High-Density 802.11ac Networks. However, at HubSpot, we’re a simple, medium-sized enterprise, certainly not a large-scale wireless network like a sports stadium. How difficult could multicast really be?
At HubSpot, the network team and I were faced with a requirement to scale our company meeting video so that anyone could watch from their desk, anywhere on campus, or anywhere on the globe. We would need to do so over the Aruba wireless infrastructure we had built, particularly on our Cambridge, MA campus. After much research, we decided to use an in-house multicast application to stream the company meeting to thousands of employees in Cambridge and across the globe. After reading the Aruba VRD, we began to review our design, examine our configuration, and gather facts about the multicast stream.
The requirement was that the stream be at minimum 720p, since many employees have large 4K monitors and high-resolution laptops. The stream had to be free from buffering and glitching and had to perform smoothly, all without overtaxing the wireless LAN infrastructure. Our wireless infrastructure consisted of an array of Aruba 7210 controllers and 114 AP-315 access points, all running in tunnel mode. Our Cambridge, MA campus, with around 1,900 clients, was served by a pair of 7210 controllers in a master/local configuration.
We determined, based on past baselines, that both the controllers and the APs had more than enough horsepower for our needs. We believed we were in great shape and had covered all the usual suspects, drawing on research from Aruba, the VRD, and peers in the wireless space. In general, multicast on a small scale had always worked well at HubSpot: "Tech Talks" with 20-90 viewers were no problem.
The main features we enabled, based on the reference documents and our research, were IGMP snooping and DMO (Dynamic Multicast Optimization), among others. A full explanation of DMO can be found in the referenced Aruba VRD, but at its core, DMO converts a multicast stream into individual unicast streams sent to each subscribed client. Because unicast frames are acknowledged and transmitted at higher data rates than multicast, this improves the quality and reliability of streaming video.
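As a rough sketch, the relevant knobs on an ArubaOS 6.x controller look something like the following. The VLAN ID, profile name, and threshold value are placeholders for illustration, and exact syntax can vary by release, so check your version's CLI reference:

```
! Enable IGMP snooping on the client VLAN (VLAN 100 is a placeholder)
interface vlan 100
  ip igmp snooping
!
! Enable DMO on the virtual AP profile serving the stream
! ("corp-vap" is a placeholder profile name). The threshold caps
! how many clients per radio get multicast-to-unicast conversion.
wlan virtual-ap "corp-vap"
  dynamic-mcast-optimization
  dynamic-mcast-optimization-thresh 90
```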
We believed we were ready for our first test of the company meeting over our streaming setup. As the meeting started, we quickly realized that things were not going well. Almost immediately, over 200 multicast clients were streaming the company meeting, and we noticed that our data-path CPU was pinned solidly at 100%.
We could feel the effects of the traffic even though all monitoring showed no over-utilization of channel space anywhere on the wireless network. The aggregate bandwidth peaked at approximately 220 Mbps, and the user count peaked at about 201 users, after which new clients were either unable to join at all or saw very laggy video. Latency grew and seemed to slowly drag down the wireless network across our Cambridge campus.
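Back-of-the-envelope, those peaks work out to roughly a megabit per second of unicast-converted traffic per viewer, which is plausible for a 720p stream:

$$\frac{220\ \text{Mbps}}{201\ \text{clients}} \approx 1.1\ \text{Mbps per client}$$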
After the company meeting, we quickly gathered data and engaged Aruba TAC to figure out what went wrong. We could see that the controllers’ data-path CPUs were over-utilized: because the stream came from a single source address, all of the hundreds of multicast-to-unicast conversions were forced onto a single datapath CPU.
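For anyone chasing a similar problem, the per-CPU breakdown that exposed this bottleneck is visible from the controller CLI in ArubaOS 6.x. With a single multicast source, you would expect to see one datapath CPU pinned near 100% while the others stay comparatively idle:

```
! Show per-CPU datapath utilization on the controller
(host) # show datapath utilization
```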
After poring over the data, our first thought was that we should have used the D-DMO feature (Distributed Dynamic Multicast Optimization), which performs the multicast-to-unicast conversion at the AP rather than at the controller. This was a strong suspect from the start, and we shared our findings with Aruba, who lent some of their brightest people to help us scale not only to our present needs but also to future growth as our campus expands.
At first, I believed that D-DMO was available only for APs running in Instant (IAP) mode. After some research, we found there is another case in which D-DMO kicks in when DMO is selected: changing the forwarding mode to decrypt-tunnel. With decrypt-tunnel, traffic is decrypted at the access point rather than at the controller, which also moves the multicast-to-unicast conversion to the AP. One caveat is that you must turn on control-plane security (CPSec) to use decrypt-tunnel.
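A minimal sketch of both changes on an ArubaOS 6.x controller follows. "corp-vap" is again a placeholder, and the CPSec details that matter in production (certificate provisioning, AP whitelisting, staged rollout) are omitted here:

```
! Enable control plane security (a prerequisite for decrypt-tunnel)
control-plane-security
  cpsec-enable
!
! Switch the virtual AP's forwarding mode so traffic is decrypted,
! and the multicast-to-unicast conversion performed, at the AP
wlan virtual-ap "corp-vap"
  forward-mode decrypt-tunnel
```

Be aware that enabling CPSec typically causes APs to re-bootstrap and pull certificates, so plan for a maintenance window rather than flipping it on mid-day.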
After making the necessary changes to our infrastructure, we were ready for some tests. We ran several tests with multicast video streams and saw initial success, although we could only muster about 100 clients on our own. We did notice, however, that datapath CPU utilization dropped the moment the changes went in: the previous baseline of 10-15% fell to a peak of roughly 1-1.5%. Aruba’s engineers then reproduced the setup in their lab and scaled it to 200 clients. Extrapolating from the per-client datapath cost, the math indicated we could scale to 2,000-2,500 clients before running into data-path CPU limits.
Finally, the day of the company meeting arrived, and we had all our bases covered to gather as much data as possible. We were optimistic we were going to be successful. The company meeting started, and we watched the user count climb: first 100, then 200, then 300, up to a maximum of 310 clients. At 310 clients, the controller data-path CPUs peaked at approximately 13%; extrapolated linearly, that puts the ceiling near 2,400 clients, right in line with the lab estimate. There were no quality issues reported by HubSpot employees, and the company meeting was a success. In the end, with all the information out there about high-density designs, the resolution came down to a lesser-known configuration change that moved the processing from the controller to the APs.