Sonar thinks radio is down, but it is not


#1

Explain this scenario… On one of our towers, Sonar Poller believes one of the SMs is down (it thinks the rest are up). It says both ICMP and SNMP are down. However, it is reporting real-time SNMP metrics for the very radio it says is down. I can ping the radio from multiple places and it responds just fine.

What is the poller doing with this one radio? Here is a screenshot of Sonar and a ping alongside:

And a shot from Sonar of the very radio it says has no SNMP or ICMP data:

The radio is an ePMP Force 180. It is the only radio in our entire system that the poller thinks is down (and no radios are actually down). We’ve upgraded the radio’s firmware and rebooted it, with no effect on the poller.


#2

Is your manual ping coming from the poller server? Can you ping the poller server from the CPE? It’s possible there is a firewall issue between the poller and the CPE, and pinging from a workstation wouldn’t necessarily show that. You might look for fail2ban-type issues with the firewall on the poller.

That’s my best WAG in the dark.


#3

The graph you posted only shows SNMP as intermittently down, not permanently down. I would guess that is just overly short timeouts or too few retries.
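To illustrate how an overly short timeout can produce intermittent "down" readings on a link that is actually up, here is a minimal Python simulation. It assumes a poller that marks a device down for a polling cycle only when every attempt (the first try plus `retries` retries) misses `timeout_ms`. The function names and RTT figures are hypothetical, and this is not a description of how Sonar's poller is actually implemented.

```python
def cycle_is_down(rtts_ms, timeout_ms, retries):
    """Return True if one polling cycle would mark the device down.

    rtts_ms holds the response time of each successive attempt in the
    cycle (None means no reply). Only the first 1 + retries are used.
    """
    attempts = rtts_ms[: 1 + retries]
    return all(rtt is None or rtt > timeout_ms for rtt in attempts)


def down_fraction(cycles, timeout_ms, retries):
    """Fraction of polling cycles that would be reported as down."""
    down = sum(cycle_is_down(c, timeout_ms, retries) for c in cycles)
    return down / len(cycles)


# Hypothetical RTTs (ms) from a radio that is up but occasionally slow.
cycles = [
    [40, 45],
    [900, 1200],
    [35, 30],
    [1500, 80],
    [50, 60],
]

print(down_fraction(cycles, timeout_ms=500, retries=0))   # 0.4
print(down_fraction(cycles, timeout_ms=500, retries=1))   # 0.2
print(down_fraction(cycles, timeout_ms=2000, retries=0))  # 0.0
```

The same healthy link shows up as "down" 40% of the time with a tight timeout and no retries, and never with a longer timeout, which is the kind of intermittent pattern the graph suggests.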

Scott is correct: you need to try a ping from the poller server. I would bet that it isn’t able to reach that radio’s IP for whatever reason.


#4

The radio can ping the poller just fine. The poller cannot ping the radio, but it will ping the radios on either side of this radio’s IP. There is no firewall on the radio, and devices other than the poller can ping it fine.
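The comparison described here (probe the problem address and its immediate neighbors from the poller itself) can be sketched in Python. The probe is passed in as a function so the logic can be tried without a network; in practice it might wrap the system `ping` command. The addresses and the `fake_probe` helper are hypothetical.

```python
import ipaddress


def neighbors(ip, network, span=1):
    """Return usable host addresses within `span` of `ip` inside `network`."""
    net = ipaddress.ip_network(network)
    addr = ipaddress.ip_address(ip)
    result = []
    for offset in range(-span, span + 1):
        if offset == 0:
            continue
        candidate = addr + offset
        # Skip anything outside the subnet or on its network/broadcast address.
        if candidate in net and candidate not in (net.network_address, net.broadcast_address):
            result.append(candidate)
    return result


def compare_reachability(target, network, probe, span=1):
    """Probe the target and its neighbors; return {address: reachable}."""
    addresses = [ipaddress.ip_address(target)] + neighbors(target, network, span)
    return {str(a): probe(str(a)) for a in addresses}


# Hypothetical probe standing in for a real ping run on the poller.
def fake_probe(host):
    return host != "192.168.10.58"


print(compare_reachability("192.168.10.58", "192.168.10.0/23", fake_probe))
# {'192.168.10.58': False, '192.168.10.57': True, '192.168.10.59': True}
```

If only the target fails from the poller while its neighbors succeed, the fault is specific to that one address as seen from the poller (for example, a stale or wrong ARP entry) rather than to the subnet as a whole.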

What I did discover is that, at some point, the poller was upgraded to Ubuntu 18.04 (from 16.x). But it’s working fine with over a hundred devices; this is the only device it is having an issue with. Our PRTG install has no problem gathering ICMP and SNMP data from that radio, so I don’t think it’s an issue with the radio.


#5

Can the poller traceroute to the radio? You haven’t shown any troubleshooting to narrow down the problem.

It could be something as nutty as a netmask issue on the poller. See if traceroute even goes the same way as when pointed at the radios on either side. What is the last octet of the radio you cannot reach? What is the last octet of the radios you can reach?

Are there any NAT rules or more specific routes on the poller, or on other devices in between, that could be messing with the traffic flow?
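Why a more specific route matters: a host uses the route with the longest matching prefix, so a stray /32 (or any narrower route) for one address overrides the connected /23 for that address alone, while its neighbors are unaffected. A minimal Python sketch of longest-prefix matching, using a hypothetical routing table:

```python
import ipaddress


def lookup(routes, destination):
    """Return the next hop of the longest-prefix route matching destination.

    routes is a list of (prefix, next_hop) pairs; a next_hop of "on-link"
    means the destination is reached directly on the local subnet.
    """
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in routes
        if dest in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    _, hop = max(matches, key=lambda m: m[0].prefixlen)
    return hop


# Hypothetical table: default route, the connected /23, one stray host route.
routes = [
    ("0.0.0.0/0", "192.168.10.1"),
    ("192.168.10.0/23", "on-link"),
    ("192.168.10.58/32", "10.99.0.1"),
]

for host in ("192.168.10.57", "192.168.10.58", "192.168.10.59"):
    print(host, "->", lookup(routes, host))
# 192.168.10.57 -> on-link
# 192.168.10.58 -> 10.99.0.1
# 192.168.10.59 -> on-link
```

On a Linux poller, `ip route get <address>` reports which route the kernel would actually use for a given destination, which makes it easy to compare the problem radio against its neighbors.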


#6

You’re right, not super helpful… The poller is connected directly to our router and is on the same /23 subnet as the radios’ management interfaces, so a traceroute fails instantly, since the next hop would be the radio itself.

I rebooted the poller and it returned valid SNMP and ICMP data for about 90 seconds and then stopped again. It continues to poll all other devices on this /23.

There is no firewall on this /23 in our router, and the firewall on the poller only blocks SSH access from sources outside our /23. No NAT rules are in effect, since the devices are on the same subnet. The poller gets its address via DHCP, just as the radios do.
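The poller firewall described here can be modeled as a single rule: drop TCP port 22 unless the source is inside the /23. A minimal Python sketch, assuming a hypothetical 192.168.10.0/23 management subnet and deliberately ignoring the connection tracking that real iptables/nftables/ufw setups use, confirms that such a rule never matches ICMP or SNMP (UDP) traffic:

```python
import ipaddress

MGMT_NET = ipaddress.ip_network("192.168.10.0/23")  # hypothetical management /23


def dropped(protocol, src, dst_port=None):
    """Return True if the described poller firewall would drop this packet.

    Models only the one rule from the post: block SSH (TCP/22) from any
    source outside the management /23. Everything else is allowed.
    """
    is_ssh = protocol == "tcp" and dst_port == 22
    return is_ssh and ipaddress.ip_address(src) not in MGMT_NET


print(dropped("tcp", "203.0.113.5", 22))    # True: SSH from off-net
print(dropped("tcp", "192.168.11.10", 22))  # False: SSH from inside the /23
print(dropped("udp", "192.168.10.58", 161)) # False: UDP/161 (SNMP), not TCP/22
print(dropped("icmp", "192.168.10.58"))     # False: ping
```

If the real rule set is this narrow, the poller's firewall can be ruled out; on Ubuntu, listing the active rules (for example with `sudo ufw status verbose` or `sudo iptables -S`) would confirm it.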


#7

Last octets? In the past, some TCP/IP stacks had issues with addresses that would have been a network or broadcast address under the old classful addressing system. A current TCP/IP stack should not have those issues. We ask for specifics because they matter, often when we didn’t think they would. There was once an issue, before my time, that broke e-mail if the recipient was more than 500 miles away.
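To make the "last octet" question concrete: in a /23 such as the hypothetical 192.168.10.0/23, addresses like 192.168.10.255 and 192.168.11.0 are ordinary hosts, but under classful addressing (where 192.168.x.x is Class C, i.e. a /24) they would have been a broadcast and a network address. A minimal Python sketch (IPv4 only) that flags both the real subnet boundaries and these legacy classful ones:

```python
import ipaddress


def classful_prefix(addr):
    """Prefix length an IPv4 address would have had under classful addressing."""
    first_octet = int(addr) >> 24
    if first_octet < 128:
        return 8    # Class A
    if first_octet < 192:
        return 16   # Class B
    if first_octet < 224:
        return 24   # Class C
    return None     # Class D/E: multicast or reserved


def boundary_flags(ip, subnet):
    """List the ways ip is a network/broadcast address, now or classfully."""
    addr = ipaddress.ip_address(ip)
    net = ipaddress.ip_network(subnet)
    flags = []
    if addr == net.network_address:
        flags.append("network address of " + str(net))
    if addr == net.broadcast_address:
        flags.append("broadcast address of " + str(net))
    prefix = classful_prefix(addr)
    if prefix is not None:
        old = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
        if addr == old.network_address:
            flags.append("classful network address " + str(old))
        if addr == old.broadcast_address:
            flags.append("classful broadcast address " + str(old))
    return flags


for ip in ("192.168.10.58", "192.168.10.255", "192.168.11.0", "192.168.11.255"):
    print(ip, boundary_flags(ip, "192.168.10.0/23"))
# 192.168.10.58 []
# 192.168.10.255 ['classful broadcast address 192.168.10.0/24']
# 192.168.11.0 ['classful network address 192.168.11.0/24']
# 192.168.11.255 ['broadcast address of 192.168.10.0/23', 'classful broadcast address 192.168.11.0/24']
```

Only the last address is a real boundary of the /23; the middle two are valid hosts that an old stack might have mishandled.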

Unless you have enough ports for the radios to also be directly connected to the router, there is another piece of equipment involved.

Three possibilities off the top of my head are:

  1. Another device is using the same MAC address.
  2. Another device is using the same IP address.
  3. A switch’s MAC forwarding table may be blowing up.

Does the radio’s MAC in the router’s ARP table stay the same both when polling works and when it doesn’t? You’ll also have to look at the ARP tables on the poller and the switch. Does the poller’s MAC address stay the same in the radio’s ARP table? What is the router, and what firmware is it running? How is the bridge inside the router configured to talk to the poller and the rest of the /23 subnet? Is there a DSTNAT rule in the router to allow you to access the poller from off-net?
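One way to check the first two possibilities is to copy the ARP tables (router, poller, switch) while polling works and again while it fails, then compare them. A minimal Python sketch with hypothetical addresses and MACs: an IP whose MAC changes between snapshots hints at a duplicate IP, and a MAC seen on several IPs hints at a duplicate MAC (though proxy ARP or DHCP reassignments can also cause it, so it is not necessarily a fault). On the Linux poller, `ip neigh show` prints its ARP (neighbor) table.

```python
from collections import defaultdict


def arp_anomalies(snapshots):
    """Compare ARP table snapshots taken at different times.

    snapshots is a list of dicts mapping IP address -> MAC address.
    Returns (ips_with_changing_macs, macs_on_multiple_ips).
    """
    macs_per_ip = defaultdict(set)
    ips_per_mac = defaultdict(set)
    for table in snapshots:
        for ip, mac in table.items():
            mac = mac.lower()  # normalize case so AA:.. and aa:.. match
            macs_per_ip[ip].add(mac)
            ips_per_mac[mac].add(ip)
    flapping = {ip: sorted(m) for ip, m in macs_per_ip.items() if len(m) > 1}
    shared = {mac: sorted(i) for mac, i in ips_per_mac.items() if len(i) > 1}
    return flapping, shared


# Hypothetical snapshots of the router's ARP table.
working = {"192.168.10.58": "aa:bb:cc:00:00:01", "192.168.10.59": "aa:bb:cc:00:00:02"}
failing = {"192.168.10.58": "aa:bb:cc:00:00:99", "192.168.10.59": "aa:bb:cc:00:00:02"}

print(arp_anomalies([working, failing]))
# ({'192.168.10.58': ['aa:bb:cc:00:00:01', 'aa:bb:cc:00:00:99']}, {})
```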

None of this is specific to Sonar, though, so the thread here should probably just fade away.

Have you tried changing the radio’s IP for a while?


#8

I’ve had specific issues with ePMP radios and the polling system. In my case, it was always with 5 GHz integrated units. [Based on my limited digging there, I suspected a poller-related bug affecting the integrated units… but without official confirmation.] I have not had any issues with Force 180 radios, though. Are you using Option 82, or do you have multiple addresses associated with the account?


#9

I moved the radio to a different IP address in the same subnet (I removed its MAC from the router’s table and had it renew its DHCP lease), and it’s polling again. I still think this is a poller issue, since that was the only device affected, but it’s working again…