Cisco Prime - Application not starting - /var/log is full

Introduction

As far as I can remember, Cisco Prime Infrastructure has always had odd issues with its file system and storage management. In particular, the Prime software often fails to start properly if its hard drive, or one specific partition, is completely full.

The Problem

One of the most common issues I have run into is the /var/log partition filling up. When that happens, crucial services and processes can crash, leaving you unable to access Prime’s administrative web UI. The /var/log partition mostly contains log files that grow over time, and the Prime software seems to do a poor job of clearing them out to keep their size in check. Eventually the files grow large enough that the partition runs out of space completely, and your Prime server starts misbehaving.

Even a full restart of your Prime server will not fix this, regardless of whether you run Prime as a virtual machine or on a hardware appliance.

If your Prime server has filled up its /var/log partition, you will often see a warning about it in the console output if you reboot the server and watch the boot process. Even though a Prime server usually has plenty of disk space for the data it collects from your wired and wireless networks, the /var/log partition is relatively small, only about 3 to 4 gigabytes (GB) in allocated size.

!! Do note that you should only attempt the solution below if your Prime server is not currently covered by a support contract from Cisco TAC. If you are paying for support from Cisco TAC, you should of course contact them for help !!

The Solution

If your Prime server’s /var/log partition is completely full, or very close to it, the best thing you can do is to clear out the log files manually.

!! Before continuing with the steps below, you must confirm you have proper backups of your Prime-server’s application data or a VMware snapshot of the Prime-server !!

Enable Linux Shell Access

To inspect the Prime server’s file system and fix a full partition, we need to enable Linux shell access.

Log in to your Prime server over SSH or the console using an administrator account.

If you have not enabled root access on your Prime server, you may do so with the command:

prime-inf-01# shell

This command will make you set a root/shell password if you have not done so before. Make sure to document this password, as you may need it in the future!

Check your /var/log partition status

Before we start deleting files to clear up space, let’s confirm that there actually is a problem in the first place.

Head into the Linux shell if you are not already in it:

prime-inf-01# shell

Run the command below to check out the current size of the files in /var/log:

ade # du -sh /var/log/*
585M    /var/log/messages
550M    /var/log/secure
2.5G    /var/log/wtmp

In the example above, the files messages, secure, and wtmp together take up over 3.5 GB of /var/log, which is evidently more than Prime can cope with. I have seen Prime servers start misbehaving when these files together reach between 3 and 4 GB.
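If /var/log holds more than a handful of files, it helps to sort the du output and add up the sizes rather than doing it by eye. The following is a minimal sketch, not something that ships with Prime: the function names largest_entries and total_kb are my own, and it assumes a standard shell with du, sort, head, and awk available.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Prime) for sizing up a log directory.

# Print the N largest entries directly under a directory, biggest first.
# Sizes are in kilobytes (du -k) so that a plain numeric sort works.
largest_entries() {
    dir=${1:-/var/log}
    count=${2:-10}
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}

# Sum the sizes (in kilobytes) of everything directly under a directory.
total_kb() {
    dir=${1:-/var/log}
    du -sk "$dir"/* 2>/dev/null | awk '{ sum += $1 } END { print sum + 0 }'
}
```

After loading the functions, running largest_entries /var/log 5 lists the five biggest entries, and total_kb /var/log prints their combined size in kilobytes.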

These log files contain different kinds of logs. The secure file records authentication events, including failed login attempts to the Prime server itself, while wtmp is a binary record of logins and logouts. The messages log contains various system error messages and other critical information that can be useful when troubleshooting other server problems.

Clear up log files

Before we start clearing log files directly inside the Linux operating system, head back to the Prime application’s command line and stop the Prime services.

Type exit to get out of the Linux shell.

ade # exit

Stop the Prime Infrastructure services (this may take a few minutes):

prime-inf-01# ncs stop

Head back into the Linux shell:

prime-inf-01# shell

To clear out these log files, use the commands below. First get root-level privileges with the sudo command, then change to the correct directory and write “nothing” into each log file, which resets its size to zero.

ade # sudo -i
ade # cd /var/log
ade # cat /dev/null > /var/log/messages
ade # cat /dev/null > /var/log/secure
ade # cat /dev/null > /var/log/wtmp
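The reason for emptying the files instead of deleting them is that a file deleted while a process still has it open does not give its space back until that process closes it, and the file's owner and permissions are lost when it gets recreated. The same truncation can be wrapped in a small function. This is only a sketch under my own assumptions: truncate_files is a made-up name, and ": > file" is simply the shell's no-op command redirected into the file, which has the same effect as the cat /dev/null lines above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Prime): empty files without removing them.

# Truncate each named file to zero bytes, keeping the file itself in place.
# Anything that is not a regular file is skipped with a message on stderr.
truncate_files() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            : > "$f"
        else
            echo "skipping $f: not a regular file" >&2
        fi
    done
}
```

With the function loaded in a root shell, truncate_files /var/log/messages /var/log/secure /var/log/wtmp would do the same job as the three commands above.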

To confirm that the log files have been cleared and the space on the partition has been freed, run the command below:

ade # df -kh

In the image above, it has been about a month since the files in the /var/log partition were last cleared, so they have grown somewhat again. Directly after you run the commands to clear the log files, your /var/log partition will probably use far less space than the image shows.

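Now that we are done cleaning up the log files, type exit to get out of the Linux shell. If you ran sudo -i, you may need to type exit a second time, because the first one only leaves the root session.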

ade # exit

Start the Prime Infrastructure services (this may take 20+ minutes):

prime-inf-01# ncs start

And you are done! Hopefully your Prime server will now start up as normal and the web UI will be accessible once again.

Final notes

Check your /var/log partition’s usage from time to time so that you do not end up with a partition that has run out of space.
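One way to make that check less of a chore is a small function that reads the usage percentage from df and warns once it passes a limit. This is a rough sketch rather than a supported Prime feature: the names usage_pct and check_usage are mine, and the default limit of 80 percent is an arbitrary assumption you should adjust to taste.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Prime) for watching partition usage.

# Print the usage percentage (digits only) of the filesystem holding a path.
# df -P forces the POSIX output format with one line per filesystem.
usage_pct() {
    df -P "${1:-/var/log}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print a warning and return non-zero when usage reaches the limit.
check_usage() {
    target=${1:-/var/log}
    limit=${2:-80}    # assumed threshold, in percent
    pct=$(usage_pct "$target")
    if [ "$pct" -ge "$limit" ]; then
        echo "WARNING: $target is ${pct}% full (limit ${limit}%)"
        return 1
    fi
    echo "OK: $target is ${pct}% full"
}
```

Running check_usage /var/log 80 from the Linux shell prints either an OK or a WARNING line, and the non-zero return value on a warning makes it easy to call from another script.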

There are commands, such as show disks, that you can run directly in the Prime CLI (so you do not need to drop into the Linux shell), but the information they display is rather limited. The show disks command is supposed to tell you whether any internal filesystem is running out of space, as you can see in the image below.