Testing Siklu Terragraph hardware
In July of this year, we were selected to field test Siklu’s entrance into the new 60 GHz 802.11ay standard, also referred to as Facebook Terragraph. This is a multi-point technology that has both the benefits of 60 GHz spectrum speed and GPS synchronization to help with self interference. The Terragraph “magic” allows meshing of nodes (called DNs or distribution nodes). I am not going to go into the details of Terragraph or the technology. There are far smarter people than me out there to better describe it. I am going to describe the hardware and our deployment.
We have been eyeing this technology for a few months now. We have a couple locations in our downtown area where we have enough 802.11ad gear deployed that we are seeing self interference. We have gear deployed from Mikrotik, IgniteNet and Kwikbit in both point to point and point to multi-point configurations. Our experience with 802.11ad multi-point has been less than impressive - with Kwikbit being the best of the bunch. We fully understand the limits of the spectrum. We have a hard stop on installations beyond 200 meters in multi-point. We typically run high gain dishes as clients. We don’t have rain fade issues. We have beam forming issues, firmware issues, interference issues (self and otherwise) and generally poor performance. Your mileage will vary and I only speak for our deployments. Our luck in point to point is much better with all vendors. It’s the multi-point that is not great.
We have good fiber and 10G licensed backhauls in our core downtown. From here, we were looking to put up 802.11ay gear to serve as multi-point backhauls to our customers around this core area. These customers are small businesses, condo buildings and apartment buildings. We try to provide at least 700 Mbps service to these properties but prefer 1 Gbps or more.
Our hope was that one 360 degree 802.11ay radio could replace 4 to 6 existing 802.11ad radios. This would clean up our spectrum and simplify future deployment. It may also allow for future meshing of additional distribution nodes.
Enter the Siklu MultiHaul TG N366 distribution node (DN) and the T265 client node (CN). The N366 is a 360 degree radio with four 90 degree sector antennas. Each sector antenna can be on a different channel. The equipment supports channels 1-4 but does not support channel bonding as of the writing of this post. The T265 is a 90 degree client radio with a beam forming antenna.
What we liked about the Siklu product is the 360 degree DN radio. Where we were looking to deploy, we already had clients in all directions. Second, the Siklu product did not rely on IPv6 and also supported layer 2 bridging right out of the box. This was in line with our current deployment method. We also have Siklu products in our portfolio and are familiar with their build quality.
For an in depth video review of the unboxing and build quality, see our review here:
Build quality is in line with Siklu’s other products. Firmware is still early. We are running 1.1.0 and will focus on that version for this review.
After some channel planning, we decided to run with the suggested A-B-A-B channel plan for the 4 sectors. We are using channels 2 and 4 in our configuration. We have seven client sites selected with the furthest site being 180 meters from the DN. All within the specifications of the equipment and the technology.
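The A-B-A-B plan above can be sketched in a few lines of Python. This is only an illustration of the alternating pattern, not anything from Siklu's tooling. The function name and the sector numbering of 1 through 4 are assumptions made for the example.

```python
def alternating_channel_plan(channel_a, channel_b, sectors=4):
    """Assign two channels to sectors in an A-B-A-B pattern.

    Sector numbers start at 1 and are assumed to go around the radio in
    order. With an even sector count, no two neighbouring sectors
    (including the last and the first) share a channel.
    """
    if channel_a == channel_b:
        raise ValueError("an A-B-A-B plan needs two different channels")
    if sectors % 2 != 0:
        raise ValueError("the pattern only wraps cleanly on an even sector count")
    return {
        sector: channel_a if sector % 2 == 1 else channel_b
        for sector in range(1, sectors + 1)
    }


# Channels 2 and 4, as used in our deployment.
plan = alternating_channel_plan(2, 4)
for sector, channel in plan.items():
    print(f"sector {sector}: channel {channel}")
```

The point of the pattern is that two sectors pointing in neighbouring directions never transmit on the same channel.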
Take a look at our installation video for both the DN and a client site.
Configuration of the system is pretty straightforward. Out of the box, our hardware was running an early release that did not have a mature web interface. So, we logged in via CLI and issued a command to load new firmware via FTP to the radio and then rebooted.
Once 1.1.0 was loaded, all further configuration could be done via the web interface. Very little configuration needs to be done for the system to work. We started with the DN. We first named the DN with a unique 8 character name (can be less) and then rebooted it. Next, we added our management IPs. Like other Siklu gear, the MultiHaul TG product line can have multiple IPs and VLANs on every interface. This allows us to program it using the default IP and not have to reboot and reconfigure our computer to move IPs.
Next, we assigned radio channels to each of the 4 sector antennas. There is one final step on the DN - entering your links but more on that later.
Next, take a T265 out of the box. Again, due to old firmware, we logged in via CLI and upgraded firmware. Now, log in to the web interface. Only one step is required - give the CN a unique 8 character name. That is the name that the DN will use to connect and authorize the radio. You can get into changing the SSID, encryption keys, etc but we did not and I would not recommend it. You can give the CN an IP as well but it is not required to connect it.
Once you have the 8 character (or less) unique name of the CN, you now build a link in your DN. It is as simple as adding that unique CN name and telling the DN what sector antenna it should see. You can select 1 to 4 of them so if you are not sure, select the antennas you think it will connect to and then edit this once it connects.
That’s basically it. Repeat this for every CN you want to deploy. Just like other Siklu gear, you can get very granular with IPs, VLANs, access ports, etc. You build bridges, add interfaces, untag VLANs, etc. All very Siklu friendly.
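The link setup described above boils down to a small table on the DN: one entry per CN name, each listing the sector antennas that CN may connect to. Here is a minimal Python sketch of that idea. It models the data only and is not Siklu's configuration format. The names `Link` and `make_link` and the example CN names are hypothetical.

```python
from dataclasses import dataclass

MAX_NAME_LENGTH = 8           # CN names are 8 characters or less
VALID_SECTORS = {1, 2, 3, 4}  # the N366 has four sector antennas


@dataclass(frozen=True)
class Link:
    cn_name: str
    sectors: frozenset


def make_link(cn_name, sectors, existing=()):
    """Validate one DN-side link entry and return it."""
    sectors = frozenset(sectors)
    if not 1 <= len(cn_name) <= MAX_NAME_LENGTH:
        raise ValueError(f"CN name must be 1 to {MAX_NAME_LENGTH} characters: {cn_name!r}")
    if any(link.cn_name == cn_name for link in existing):
        raise ValueError(f"CN name is not unique: {cn_name!r}")
    if not sectors or not sectors <= VALID_SECTORS:
        raise ValueError(f"pick 1 to 4 sectors out of {sorted(VALID_SECTORS)}")
    return Link(cn_name, sectors)


links = []
# Not sure which sector this site will land on, so allow two for now.
links.append(make_link("CONDO-01", {1, 2}, links))
links.append(make_link("SHOP-07", {3}, links))
```

Once a CN connects, its entry can be narrowed to the single sector it actually uses.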
Performance
What have we seen for throughput and performance with the system? Well, that has been a bit of a mixed bag. Let me start by saying Siklu engineers and support staff have been nothing short of wonderful to work with. They are responding to support emails at 11:00pm on a Saturday night, weekends, you name it. This is also a product early in its life and one where the manufacturers (Siklu, Cambium, IgniteNet) are taking a radio standard built by Facebook and engineering that into their product with their firmware. Not a simple task.
So, with that in mind, we have had some issues. Our first DN would disconnect clients every few days for no reason that we could discover. That DN was replaced under RMA and the new DN has not had those issues with over 2 months of uptime. Potentially a GPS related hardware issue that impacted a small number of devices.
The second issue we have seen is throughput related and this is currently being blamed on the Terragraph standard itself - but we are told a firmware fix is pending. What we see is if you have a single CN on a sector, you get ~1 Gbps of actual TCP throughput. As soon as you add a second radio to that same sector, your throughput is cut by about 50%. It does not matter if that second radio is actually installed. As soon as it is administratively added to the DN as a link and activated, the throughput is cut.
Here is a TCP speed test from a laptop plugged directly into the T265 CN with a speedtest.net test back to our own server on our network. This is using the speedtest.net app, not the web interface. We are not seeing full ~950 Mbps of the laptop port due to some network congestion on this link.
Now, we administratively turn on a second CN to that same sector on the DN and this is the next speed test:
We can duplicate this result on every sector with every channel, using either speedtest.net tests or TCP bandwidth testing in our Mikrotik routers just going through the link itself. It shows a very consistent 50% drop in throughput as soon as the 2nd (or third) link is activated in the DN on that sector. Just have one link on a sector? That sector will see full speed performance. It does not matter what the other sectors are doing.
If / when this is fixed with a firmware upgrade, I will post that information.
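For anyone repeating this comparison on their own gear, the drop between two speed tests is simple arithmetic. The sketch below uses made-up readings purely to show the calculation. They are not our measured results, and the function name is hypothetical.

```python
def percent_drop(before_mbps, after_mbps):
    """Return how much throughput fell, as a percentage of the first reading."""
    if before_mbps <= 0:
        raise ValueError("the first reading must be positive")
    return (before_mbps - after_mbps) / before_mbps * 100.0


# Hypothetical readings for illustration only, not measured values.
single_link_mbps = 900.0  # one CN configured on the sector
two_links_mbps = 450.0    # a second link activated on the same sector

drop = percent_drop(single_link_mbps, two_links_mbps)
print(f"throughput drop: {drop:.0f}%")
```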
Conclusion
I love the hardware, I love the software and I love the concept of what Terragraph brings. But, from a performance standpoint, we are not there yet. I think that will change and we are leaving this gear up since it fixes our self interference issues, but I look forward to the speed boosts that are promised.
Update November 30, 2021
We have continued to work with Siklu on the speeds and the GPS issues.
On the speeds, firmware 1.1.4 was released and it supports flexible bandwidth control. Running some speed tests with 1.1.4, we are seeing the speed issues improve greatly. We can now just about max out the 1G Ethernet port on our testing laptop running TCP bandwidth tests back to our core test server. I would say the flexible bandwidth is working well.
On the GPS issue, we continue to have DN reboots and CN disconnects up through firmware 1.1.4. Siklu believes we may have faulty hardware on the DN. We have been sent our 4th and 5th DNs for testing. We have already RMA’d one DN and bought a third as a spare, and both the second (the RMA replacement) and the third will reboot randomly. Logs are showing it is related to GPS reception. There is a watchdog in the radio that will eventually reboot the DN if it loses GPS sync for a set period of time. We know there are no obstructions blocking GPS signals to this radio. We went so far as to put anti-bird spikes on the top of the radio to keep birds from landing on it and potentially blocking GPS signals. So, they sent us two more DNs so we can RMA #2 and #3 and send them in for support investigation.
However, we have been working on this test since July. We have yet to go more than a week without some piece of hardware rebooting. With snow coming, we are discussing what our next steps will be. Some of these radios will not be physically accessible once there is snow on roofs. We need to decide if we are going to leave these radios up through the winter…
I know at least one other operator with many more radios up than we have and they are not seeing issues like ours. We might just be in a weird spot for this test where some sort of RF issue is impacting these. I can’t explain why we have ongoing troubles and others do not. Siklu engineers are at least publicly baffled as well. I don’t think there is a system wide issue with Siklu Terragraph but we have not had a great experience at our test location.
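The GPS watchdog mentioned in this update can be pictured with a small Python sketch. This is not Siklu's firmware logic. All we know from the logs is that the DN reboots after GPS sync has been lost for a set period, so the class name, the method name and the 300 second timeout are all assumptions chosen for illustration.

```python
class GpsWatchdog:
    """Signal a reboot once GPS sync has been lost for timeout_s seconds."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.lost_since = None  # time sync was first lost, None while in sync

    def update(self, now_s, gps_in_sync):
        """Feed one status sample; return True when a reboot should fire."""
        if gps_in_sync:
            self.lost_since = None  # any good sample resets the timer
            return False
        if self.lost_since is None:
            self.lost_since = now_s
        return now_s - self.lost_since >= self.timeout_s


# Hypothetical timeline of (seconds, gps_in_sync) samples.
samples = [(0, True), (60, False), (200, False), (240, True), (300, False), (600, False)]

watchdog = GpsWatchdog(timeout_s=300)  # hypothetical timeout
for t, in_sync in samples:
    if watchdog.update(t, in_sync):
        print(f"t={t}s: GPS sync lost too long, reboot")
```

In a design like this, a short GPS dropout that recovers resets the timer and only a sustained loss triggers the reboot, which would be consistent with reboots that show up every few days rather than constantly.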
January 4, 2022 update
At the end of 2021, we made the decision to remove the Siklu Terragraph equipment from our network. We continued to have issues with both DNs and CNs rebooting on us. Siklu was incredibly responsive to this issue and even sent people onsite to troubleshoot. In the end, we needed to stabilize this part of our network and move on to other projects. We initially thought this test would be a month or so and we would move on. Six months later, we were still climbing on roofs and replacing hardware to try to narrow down what was going on. As a small shop, we had to move on and divert our resources to new projects. We continue to be very happy with the EtherHaul and MultiHaul systems we have in production but we just couldn’t get the bugs worked out of the Terragraph gear at this location. I know other operators that have not had a single issue like ours.