Cisco Catalyst 9K Output Drops Quick Fix

In this article, we look at a golden command that can help you fix output drops in certain scenarios on Cisco Catalyst 9K switches. It lets interfaces borrow more from the shared buffer pool by raising the softmax multiplier parameter.

Introduction

Seeing output drops on the interfaces of your Cisco Catalyst switches and routers is never a good time, and tracking down the root cause can be tricky. However, there is one command that could mitigate or even solve these issues in one go on the Cisco Catalyst 9K family of devices, including the Catalyst 9200, 9300, and 9500 models that I have had access to in my lab environment.

The Problem

After reviewing several Cisco documents and going through thread after thread on the various networking forums regarding output drops on various platforms, the consensus seems to be that the default amount of buffer memory that interfaces can “borrow” from the shared buffer pool is way too small.

Depending on your switch/router model, this value can be tweaked to allow interfaces to use way more of the shared buffer memory pool to help deal with microbursts, which are one of the major causes of output drops. Microbursts can cause output drops even in situations where the traffic being sent between two interfaces is way less than the interfaces’ line-rate.

As long as you don’t plan on maxing out every interface of your switch at the same time, borrowing buffer memory temporarily from the global pool is a great way for the switch to run efficiently.

You can also run into output drops when traffic is coming into an interface with a higher line-rate and then going out an interface with a lower line-rate.

The image below describes these two scenarios.

Scenario #1

The first image shows a scenario where two clients are trying to send a lot of data to the same server at the same time, causing the interface towards the server to drop packets due to congestion. Since the server-facing port cannot forward 1 gigabit per second from each client at the same time, some frames will queue up in the buffer and, once it is full, be discarded by the switch port.
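To get a feel for how quickly this kind of sustained overload exhausts a queue, here is a rough back-of-the-envelope sketch in Python. It treats traffic as a continuous flow and assumes every link runs at 1 Gbit/s. The 1 MB buffer figure is a made-up illustrative value, not the real buffer size of any Catalyst model.

```python
def seconds_until_overflow(buffer_bytes, offered_bps, egress_bps):
    """Time until a queue of buffer_bytes overflows under sustained overload.

    Fluid approximation: the backlog grows at (offered - egress) bits per
    second. Returns None when the egress port can keep up with the load.
    """
    excess_bps = offered_bps - egress_bps
    if excess_bps <= 0:
        return None
    return buffer_bytes * 8 / excess_bps


# Two clients at 1 Gbit/s each toward one 1 Gbit/s server-facing port.
# The 1 MB buffer is a hypothetical value for illustration only.
t = seconds_until_overflow(1_000_000, 2 * 1_000_000_000, 1_000_000_000)
print(f"Queue overflows after about {t * 1000:.0f} ms")
```

Under these assumptions the queue fills within milliseconds, which is why a larger buffer only helps with short bursts and not with a port that is permanently oversubscribed.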

Scenario #2

The second scenario shown below demonstrates that traffic can be dropped when going out from an interface (1 Gbit/s) if the originating traffic came in on an interface that has a higher line-rate (10 Gbit/s).

Even if the data stream itself between the two hosts is way less than 1 Gbit/s, you can still see output drops because of the difference in frame pacing, meaning that frames at the 10 Gbit/s interface arrive 10 times faster than they can go out the 1 Gbit/s interface.
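The pacing mismatch can be put into numbers with a small sketch. This is a simplified model, assuming equally sized frames that arrive back-to-back at the faster interface. The burst length of 100 frames is a hypothetical figure chosen for illustration.

```python
def peak_backlog_frames(burst_frames, ingress_bps, egress_bps):
    """Approximate peak egress queue depth (in frames) for a single burst.

    While the burst arrives at ingress speed, the egress port can only send
    egress_bps / ingress_bps of it; the rest has to wait in the buffer.
    """
    if egress_bps >= ingress_bps:
        return 0.0
    return burst_frames * (1 - egress_bps / ingress_bps)


# A hypothetical burst of 100 back-to-back frames, 10 Gbit/s in, 1 Gbit/s out.
print(peak_backlog_frames(100, 10_000_000_000, 1_000_000_000))
```

With these inputs roughly 90 of the 100 frames have to be buffered at the peak, even though the average rate of the flow may be far below 1 Gbit/s.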

A (Possible) Solution

Changing how much of the shared buffer pool an interface can borrow is done easily using the command “qos queue-softmax-multiplier <X>”, where X is a percentage applied to each queue’s default softmax (soft maximum) buffer limit. This command is configured globally and changes the settings on all interfaces.

The default value is 100 (meaning 100%) and can be increased depending on your specific 9K model. For example, on the C9500-48Y4Q switch running version 17.3.5 in my lab, you can set the softmax multiplier anywhere from 100 up to 4800, while a C9300-48P and a C9200L-48P-4X running version 17.3.4 only accept values between 100 and 1200.

Changing the softmax multiplier did not require a reload on the 9500 models of switches, but a reload could be required on other, older models or software versions.

SW-9500(config)# qos queue-softmax-multiplier <100-4800>
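If you push this setting from a script or a configuration template, a small guard against out-of-range values can be handy. The sketch below is only an illustration: the range table contains just the three models and values observed in my lab above, and the function name is made up. Other models and software versions may accept different ranges.

```python
# Ranges observed in the lab described above; other models or releases may differ.
SOFTMAX_RANGES = {
    "C9500-48Y4Q": (100, 4800),
    "C9300-48P": (100, 1200),
    "C9200L-48P-4X": (100, 1200),
}


def softmax_command(model, multiplier):
    """Return the global config line, or raise if the value is out of range."""
    low, high = SOFTMAX_RANGES[model]
    if not low <= multiplier <= high:
        raise ValueError(f"{model} accepts {low}-{high}, got {multiplier}")
    return f"qos queue-softmax-multiplier {multiplier}"


print(softmax_command("C9300-48P", 400))  # prints: qos queue-softmax-multiplier 400
```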

So, what value should you increase the softmax multiplier to?

This one is hard to answer, as there isn’t any specific best practice recommendation from Cisco, from what I can see on their website. However, there are some good discussions in their support forums with people that have made it a habit to enable the softmax multiplier by putting it in their standard configuration templates and have seen good results.

In earlier IOS-XE versions the maximum softmax multiplier value for 9500 High-Performance models of switches was 1200, so maybe that’s a good value to start with. For the 9200, 9300, and “regular” 9500 models of switches I would not use 1200 (which is their maximum) right off the bat. Instead, see if you can sort out your output drops using a lower value like 300 or 400.

In limited testing in my lab, I combined the two scenarios above by connecting two clients to the left switch in scenario #2 and trying to max out iPerf TCP sessions from the two hosts on the left to the client on the right side.

Of course, this led to a lot of output drops on the 1-gigabit interface facing the client on the right. Adjusting the softmax multiplier to just “300” got rid of almost all of them. With more buffer available, the data streams could adapt to the congestion naturally through the TCP window size control mechanism, instead of the switch dropping a lot of packets to squeeze the two data streams from the left clients out the interface towards the right client.

Verification

One tricky part about the softmax multiplier command is that it does not show up in the running configuration, not even if you use the “show run all” command.

EDIT: seems that the “qos queue-softmax-multiplier” command does show up in some 9K models and software. If it does not show up for you, continue reading down below.

Instead, you need to use a show command to display the current interface buffers to decipher which multiplier percentage was configured. To see the current interface buffers, use the commands below and look for the first two values in the Softmax column.

You may need to use a modified version of the commands below depending on your switch setup (StackWise Virtual, standard StackWise, or standalone switches). Use the “?” after every parameter to see what the next possible keyword is and I’m sure you’ll figure it out.

In my lab, I mostly used 9500 High-Performance switches in different configurations, so I had to use the commands below.

For non-StackWise Virtual switches

show platform hardware fed active qos queue config interface <interface>

For StackWise Virtual switches

show platform hardware fed switch <active/standby> qos queue config interface <interface>

The default value here is 448 for queue 0 (control packet queue) and 672 for queue 1 (data packet queue) for interface Twe1/0/1, as seen in the image below.

The values are 448/672 by default, that is, when the softmax multiplier is set to “100”, so they should double if we add another 100% to the multiplier using the “qos queue-softmax-multiplier 200” command.

Re-running the show command, we can now see that the softmax interface buffer size has increased (doubled) to 896 for queue 0 and 1344 for queue 1 for interface Twe1/0/1.
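The doubling above suggests the softmax value scales linearly with the multiplier. Here is a minimal sketch of that arithmetic, which you can use to check whether the values in the show output match the multiplier you expect. Linear scaling is an assumption based on the single 100-to-200 data point above, and the hardware may cap values at higher multipliers.

```python
def scaled_softmax(default_buffers, multiplier):
    """Scale default softmax values by a multiplier given in percent."""
    return [value * multiplier // 100 for value in default_buffers]


defaults = [448, 672]  # queue 0 and queue 1 at multiplier 100, as seen above
print(scaled_softmax(defaults, 200))  # prints [896, 1344]
```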

If you are currently suffering from output drops visible using the “show interface <interface> counters errors” command, reset the counters using the “clear counters <interface>” command, apply the softmax multiplier change, and then monitor the interface for output drops to see if the configuration change has helped.

References

Cisco has a couple of great articles going into the depths of traffic queues, QoS, and their relevance to the “qos queue-softmax-multiplier” command.

Troubleshoot Output Drops on Catalyst 9000

Catalyst 3850: Troubleshooting Output drops

Cisco Catalyst 9000 Switching Platforms: QoS and Queuing White Paper