MikroTik, IPv6, Queue trees + PCQs


#1

Trying to implement IPv6 in Sonar, but can’t seem to get it to work within our system. I have IPv6 address lists and mangle rules set up, but a customer can still get double speed by using IPv4 + IPv6 at the same time. Anyone have any suggestions on how to get IPv4 + IPv6 to work well through PCQs on MikroTik?


#2

This issue will always exist with PCQ on MikroTik, since there is no way for the queue to know that the IPv6 customer is the same as the IPv4 customer. The queue may not be on a router that is directly connected to the customers, and if that is the case, there is no way for it to inspect the MAC for the IPv4 and IPv6 packets and work out that they belong to the same user. It is simply not possible for MikroTik to ‘fix this’ somehow. Eventually it will stop being an issue, when customers are on IPv6-only (maybe with NAT64) instead of dual-stack.
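To make the double-speed effect concrete, here is a toy model in Python, not RouterOS code. It assumes PCQ keys its sub-streams on the address alone and gives each sub-stream the same rate; the function name, customer names and the 50 Mbps rate are made up for illustration.

```python
from collections import defaultdict

def pcq_ceiling_per_customer(flows, pcq_rate_mbps):
    """Toy model: one PCQ sub-stream per distinct address, each limited to pcq_rate_mbps.

    flows is a list of (customer, address) pairs. The customer label is only
    used for reporting; the classifier itself never sees it.
    """
    substreams = {}
    for customer, address in flows:
        # The classifier key is the address, so IPv4 and IPv6 never merge.
        substreams[address] = customer

    totals = defaultdict(int)
    for customer in substreams.values():
        totals[customer] += pcq_rate_mbps
    return dict(totals)

flows = [
    ("alice", "203.0.113.10"),    # alice over IPv4
    ("alice", "2001:db8:1::10"),  # alice over IPv6
    ("bob", "203.0.113.11"),      # bob is IPv4-only
]
ceilings = pcq_ceiling_per_customer(flows, 50)
```

In this model the dual-stack customer ends up with two sub-streams and therefore twice the intended ceiling, while the IPv4-only customer gets exactly one.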

Beginning in RouterOS 6.43, there is a new dual-stack simple queue option via DHCP, which is probably the best way of handling this situation short of deploying PPPoE. The dual-stack simple queue option is handled through RADIUS-enabled DHCP and DHCPv6, where the customer does not “authenticate” but instead there is a RADIUS ‘user’ that is the customer’s MAC address, which has their rate limit assigned. Going forward, this will be the only way to have a single rate limit apply to both IPv4 and IPv6 traffic for a single customer without PPPoE.


#3

Thanks for the help! I would prefer not to use a simple queue for each customer, for scalability reasons. I also don’t run a central PPPoE for scalability reasons, and we don’t limit at the CPE simply because Sonar doesn’t integrate very well in that fashion. We will instead need to use a different router and eventually run everything through NAT64. More and more I am coming to believe that NAT64 is really the only way forward for us. The issue is that MikroTik doesn’t support it, so we’ll need a different NAT router (which will also do the rate limiting).
The complexity comes from having full redundancy throughout our network, with different vendors for our edge, NAT and core routers, as now there’s another router in between that can fail.


#4

Simple queues are perfectly scalable - we have hundreds of them (600 to be precise) on a single router and the CPU usage is fine. I’m sure we could have double the number of simple queues and the router CPU would only go up 1 or 2%. How many customers do you have per router?

The challenge with something like NAT64 at the moment is that so much Internet traffic is still IPv4-only that there is going to be an enormous load placed on a NAT64 box. I think you are simply not going to be able to have a single box doing both your routing and your NAT64; the load could easily overwhelm it.


#5

In fact, simple queues per customer should actually be more scalable than any kind of queue tree/PCQ setup.


#6

I’ve heard that simple queues hit a brick wall at around 1000 or more. I was under the impression (not through experience, but from others’ input) that queue trees and PCQs are the most scalable form. Partly because a queue tree could also just be a set of simple queues, if you don’t care about priorities, but more so because the PCQ queue type is as low-level as you can get for bandwidth shaping on a MikroTik? As I understand it, a simple queue is an object that is treated separately, so you have an array of objects (each of which can have its own speed, queue type and other settings) to iterate over. PCQs, on the other hand, are inherently an object of arrays (where a set of connections of a similar type can be dealt with in the same way), and optimizations are made at the OS level to deal with that?
Does Sonar have the ability to automatically add a MAC address into RADIUS, as it does for address lists? We are okay with the current flow of assigning an inventory item, then assigning an IP to the inventory item. Now we would need to copy the MAC address into a RADIUS account and then assign an IP to the RADIUS account. How would adding IPv6 affect this flow, considering DHCPv6 isn’t really MAC-based (at least not in RouterOS) and Sonar cannot assign more than one IP address per MAC? Or perhaps having a RADIUS-authenticated MAC is the only way to assign an IPv6 prefix?


#7

Theoretically, if everything were perfect, yes, PCQ is more scalable than simple queues, because as you say, it works at a very low level.

The problem comes in when you factor in the way RouterOS threads individual queues. Whenever you have a queue tree or a simple queue, the top level parent and all children all share a single thread, and therefore on a CCR will only utilize a single CPU core (ex. all queue trees with GLOBAL parent will be processed by a single thread). Because each CPU core is relatively weak on a CCR, you will end up hitting a brick wall on the bandwidth much too early.

When you use simple queues, certainly the overall CPU usage will be higher than using a PCQ queue tree, but because it is a flat list and each is a top-level parent, the CPU usage can be spread across all of the cores instead of only placing the load on one core. In addition, you no longer need mangle rules, which saves you a different source of CPU utilization.
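As a rough illustration of the threading point, here is a toy Python model (not how RouterOS is implemented internally). It only encodes the rule described above, that a top-level parent and all of its children share one thread, and counts how many cores a given layout could keep busy. The queue names and the 36-core figure are illustrative assumptions.

```python
def usable_cores(queues, core_count):
    """Toy model: every top-level parent (and its whole subtree) runs on one thread.

    queues maps a queue name to its parent's name, or None for a top-level queue.
    Returns the maximum number of cores the layout could use.
    """
    def root_of(name):
        # Walk up the parent chain until a top-level queue is reached.
        while queues[name] is not None:
            name = queues[name]
        return name

    roots = {root_of(name) for name in queues}
    return min(len(roots), core_count)

# One tree: 600 customers hanging off a single top-level parent.
tree = {"global-root": None}
tree.update({f"cust{i}": "global-root" for i in range(600)})

# Flat list: 600 simple queues, each its own top-level parent.
flat = {f"cust{i}": None for i in range(600)}

tree_cores = usable_cores(tree, 36)
flat_cores = usable_cores(flat, 36)
```

Under this model the single tree is stuck on one core no matter how many customers it holds, while the flat list can spread across every core the router has.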

It used to be with RouterOS 5 that adding simple queues would degrade performance greatly, because it used to iterate through the list of simple queues the way you describe, and this iteration was very processor intensive. However it changed in RouterOS 6 so that, instead of iterating through the list of simple queues, the router builds a hash-table for very rapid lookup, such that the number of simple queues that a single router supports is no longer so limited.
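The difference between the two lookup strategies can be sketched in a few lines of Python. This is only an analogy for the behaviour described above, assuming one queue per exact customer address; the real matching logic in RouterOS is not shown here, and all names are hypothetical.

```python
def find_queue_linear(queues, address):
    """Walk the list until the target matches; cost grows with the list length."""
    for steps, (target, name) in enumerate(queues, start=1):
        if target == address:
            return name, steps
    return None, len(queues)

def build_index(queues):
    """Build a hash table once, so each later lookup is a single dictionary access."""
    return {target: name for target, name in queues}

# 1000 hypothetical customers, 10.0.0.0 through 10.0.3.231.
queues = [(f"10.0.{i // 256}.{i % 256}", f"cust{i}") for i in range(1000)]
index = build_index(queues)

name, steps = find_queue_linear(queues, "10.0.3.231")  # last entry: worst case
hashed_name = index["10.0.3.231"]
```

The linear version has to compare against every entry to reach the last customer, whereas the indexed version finds the same queue in one step regardless of how many queues exist.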

To my knowledge, Sonar does not currently have the ability to automatically add a MAC address into RADIUS. You would do it manually by creating a RADIUS account with the MAC as the username. This feature is very new in MikroTik, so I have not tested it yet myself (I have only read the wiki pages).
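If you end up scripting the manual step, something like the following Python sketch could generate the account entries. Several things here are assumptions and not confirmed by anything above: that the RADIUS server is FreeRADIUS with a `users`-style file, that the router sends the MAC as upper-case colon-separated hex, that an empty-password account is modelled with `Auth-Type := Accept`, and that the rate limit goes in the `Mikrotik-Rate-Limit` reply attribute. The function name and the example rate string are hypothetical.

```python
import re

def radius_user_entry(mac, rate_limit):
    """Return a FreeRADIUS users-file style entry keyed on the customer's MAC.

    Assumptions: colon-separated upper-case MAC as the username, and the
    rate limit carried in the Mikrotik-Rate-Limit reply attribute.
    """
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac)
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    username = ":".join(digits[i:i + 2] for i in range(0, 12, 2)).upper()
    return (
        f"{username} Auth-Type := Accept\n"
        f'    Mikrotik-Rate-Limit = "{rate_limit}"\n'
    )

entry = radius_user_entry("aa-bb-cc-dd-ee-ff", "10M/50M")
```

Check the exact username format and the direction order of the rate string against what your router actually sends before relying on this.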

Some routers derive the DUID from the MAC address. If this is the case, the MikroTik can link the MAC with the DUID. Alternatively, the client router may support the RFC where it passes the DUID for v6 via a v4 DHCP option, and the MikroTik can also use this to properly associate the customer for a dual-stack queue.
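For the first case, the two MAC-derived DUID layouts defined by the DHCPv6 specification are DUID-LLT (type 1: hardware type, a 4-byte timestamp, then the link-layer address) and DUID-LL (type 3: hardware type, then the link-layer address). A minimal Python sketch of recovering the MAC from such a DUID follows; it illustrates the idea only, and is not a description of how RouterOS itself does the association. The function name is hypothetical.

```python
def mac_from_duid(duid_hex):
    """Return the Ethernet MAC embedded in a DUID-LLT or DUID-LL, else None."""
    raw = bytes.fromhex(duid_hex.replace(":", ""))
    if len(raw) < 4:
        return None

    duid_type = int.from_bytes(raw[0:2], "big")
    if duid_type == 1:       # DUID-LLT: type, hw type, 4-byte time, address
        link_layer = raw[8:]
    elif duid_type == 3:     # DUID-LL: type, hw type, address
        link_layer = raw[4:]
    else:                    # other types are not built from a MAC
        return None

    hardware_type = int.from_bytes(raw[2:4], "big")
    if hardware_type != 1 or len(link_layer) != 6:  # 1 = Ethernet
        return None
    return ":".join(f"{b:02X}" for b in link_layer)

mac_ll = mac_from_duid("00:03:00:01:aa:bb:cc:dd:ee:ff")
mac_llt = mac_from_duid("00:01:00:01:2a:3b:4c:5d:aa:bb:cc:dd:ee:ff")
mac_en = mac_from_duid("00:02:00:00:00:09:01:02")
```

Note that the embedded address is whatever interface the client used when it generated the DUID, so it is not guaranteed to equal the MAC seen on the WAN side.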


#8

You could also do something like divide your customer IP space into 36 approximately equal size ‘subnets’, and create 36 simple queues, each of which does PCQ for that ‘subnet’. Then you would have PCQ working on all cores on a 36-core router. That would probably be the least CPU intensive option when it comes to queueing, but would still require that you mark the packets with mangle, unless you place the customers into subnets based on their package. That solution wouldn’t give you dual stack, though.
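The split itself is easy to compute. Here is a minimal Python sketch that cuts a customer range into N contiguous, nearly equal groups and expresses each group as CIDR blocks; each group would then be the target of one PCQ-backed simple queue. The 10.0.0.0/16 range and the function name are illustrative assumptions, and the router-side configuration is deliberately left out.

```python
import ipaddress

def split_into_groups(network, group_count):
    """Split a network into group_count contiguous groups of near-equal size.

    Returns a list with one entry per group; each entry is a list of CIDR
    blocks that together cover that group's address range.
    """
    net = ipaddress.ip_network(network)
    size = net.num_addresses
    if group_count < 1 or group_count > size:
        raise ValueError("group_count must be between 1 and the network size")

    groups = []
    for g in range(group_count):
        # Integer arithmetic keeps group sizes within one address of each other.
        first = net.network_address + (g * size) // group_count
        last = net.network_address + ((g + 1) * size) // group_count - 1
        groups.append(list(ipaddress.summarize_address_range(first, last)))
    return groups

groups = split_into_groups("10.0.0.0/16", 36)
group_sizes = [sum(block.num_addresses for block in blocks) for blocks in groups]
```

Because 36 is not a power of two, most groups come out as several CIDR blocks instead of one clean subnet, which is why the ‘subnets’ above are only approximately equal.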


#9

Thanks! In all my reading, nowhere did I ever read that the HTB is single-threaded. Normally when we were hitting issues, it was because of 100% CPU usage (from mangle rules, actually). Now I completely understand why using global as the master is not recommended.


#10

If you use the per interface queues (like our generator creates) then you’ll avoid the all queues on one core issue. That’s one reason I mentioned earlier making sure you’re not on ‘global’.


#11

Yes, but I forgot to mention that we only have one interface, going to a 10 Gbps switch. I will split it into VLANs to spread the load.


#12

Yes, that helps somewhat, but if your customers are concentrated in just one or two interfaces, and the router is high traffic, it doesn’t necessarily solve the issue. For instance, if your customers are all connected to ether2, your queue tree structure attached to interface ether2 will still be stuck on a single core, and you can still hit a brick wall.


#13

Yeah, if you’re centralizing it all, it’s not a good fix, but it’s generally still better than everything being global.


#14

Yes, I agree completely that it is better than global. But if most customer traffic is going through just one or two interfaces, using PCQ via a series of simple queues can be a better solution. It allows you to divide customers up into groups (based on the IP range that they fall into) that will each be assigned to a different thread (and core) without actually having to split them across different interfaces, so you get a more flexible setup that better distributes the load across the cores of a CCR router.


#15

tbh, they (MikroTik) just need to update their Linux kernel and use fq_codel… That’s if they’re going through the effort to actually make their TILE processors more scalable, which I doubt. They’re probably just going single-threaded going forward, and instead investing in better single-thread processors (like their ARM line on the 4011).


#16

They are going to update their kernel and support fq_codel in RouterOS 7, but this is taking a while to come out, mainly because they want to avoid having too many changes at once (and too many new bugs at once), so they are trying to backport some of the biggest changes (ex. the new bridge code) to ROS 6.

They are also working on new high end service provider routers that will be more powerful than the CCR1072 using some newer architecture (no longer TILE). I expect that will probably have fewer cores that are each much more powerful than a single core on a CCR.


#17

So, Soon™