In today’s complex network of routers, switches, and servers, it can seem like a daunting task to manage all the devices on your network and make sure they’re not only up and running but performing optimally. This is where the Simple Network Management Protocol (SNMP) can help. SNMP was introduced in 1988 to meet the growing need for a standard for managing Internet Protocol (IP) devices. SNMP provides its users with a “simple” set of operations that allows these devices to be managed remotely.
This book is aimed at system administrators who would like to begin using SNMP to manage their servers or routers, but who lack the knowledge or understanding to do so. We try to give you a basic understanding of what SNMP is and how it works; beyond that, we show you how to put SNMP into practice, using a number of widely available tools. Above all, we want this to be a practical book -- a book that helps you keep track of what your network is doing.
The core of SNMP is a simple set of operations (and the information these operations gather) that gives administrators the ability to inspect or change the state of an SNMP-based device. For example, you can use SNMP to shut down an interface on your router or check the speed at which your Ethernet interface is operating. SNMP can even monitor the temperature on your switch and warn you when it is too high.
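As a rough illustration of the two kinds of operation just mentioned (reading a value and changing one), the sketch below models a device as an in-memory table keyed by object identifier. The `ToyAgent` class and its `get` and `set` methods are hypothetical stand-ins invented for this example, not a real SNMP library, and nothing here touches the network. The two identifiers are the standard IF-MIB objects `ifSpeed` and `ifAdminStatus` for interface index 1; the starting values are made up.

```python
# Standard IF-MIB column OIDs; the trailing ".1" selects interface index 1.
IF_SPEED = "1.3.6.1.2.1.2.2.1.5.1"         # ifSpeed, in bits per second
IF_ADMIN_STATUS = "1.3.6.1.2.1.2.2.1.7.1"  # ifAdminStatus: up(1), down(2)

UP, DOWN = 1, 2


class ToyAgent:
    """Hypothetical in-memory stand-in for the SNMP agent on a router."""

    def __init__(self):
        # Made-up starting state: a 100 Mbps interface that is enabled.
        self._values = {IF_SPEED: 100_000_000, IF_ADMIN_STATUS: UP}
        # Only some objects may be changed; the speed is read-only here.
        self._writable = {IF_ADMIN_STATUS}

    def get(self, oid):
        """Read the current value of one managed object."""
        return self._values[oid]

    def set(self, oid, value):
        """Change one managed object, refusing read-only ones."""
        if oid not in self._writable:
            raise PermissionError(f"{oid} is read-only")
        self._values[oid] = value


agent = ToyAgent()
speed_mbps = agent.get(IF_SPEED) // 1_000_000  # check the interface speed
agent.set(IF_ADMIN_STATUS, DOWN)               # shut the interface down
print(speed_mbps, agent.get(IF_ADMIN_STATUS))
```

A real management station would send these requests over the network to the agent running on the device, but the division into read and write operations is the same.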
SNMP is usually associated with managing routers, but it’s important to understand that it can be used to manage many types of devices. While SNMP’s predecessor, the Simple Gateway Management Protocol (SGMP), was developed to manage Internet routers, SNMP can be used to manage Unix systems, Windows systems, printers, modem racks, power supplies, and more. Any device running software that allows the retrieval of SNMP information can be managed. This includes not only physical devices but also software, such as web servers and databases.
Another aspect of network management is network monitoring; that is, monitoring an entire network as opposed to individual routers, hosts, and other devices. Remote Network Monitoring (RMON) was developed to help us understand how the network itself is functioning, as well as how individual devices on the network are affecting the network as a whole. It can be used to monitor not only LAN traffic, but WAN interfaces as well. We discuss RMON in more detail later in this chapter and in Chapter 2.
Before going any further, let’s look at a before-and-after scenario that shows how SNMP can make a difference in an organization.
Let’s say that you have a network of 100 machines running various operating systems. Several machines are file servers, a few others are print servers, another is running software that verifies credit card transactions (presumably from a web-based ordering system), and the rest are personal workstations. In addition, there are various switches and routers that help keep the actual network going. A T1 circuit connects the company to the global Internet, and there is a private connection to the credit card verification system.
What happens when one of the file servers crashes? If it happens in the middle of the workweek, it is likely that the people using it will notice and the appropriate administrator will be called to fix it. But what if it happens after everyone has gone home, including the administrators, or over the weekend?
What if the private connection to the credit card verification system goes down at 10 p.m. on Friday and isn’t restored until Monday morning? If the problem was faulty hardware and could have been fixed by swapping out a card or replacing a router, thousands of dollars in web site sales could have been lost for no reason. Likewise, if the T1 circuit to the Internet goes down, it could reduce the volume of sales generated by individuals accessing your web site and placing orders.
These are obviously serious problems -- problems that can conceivably affect the survival of your business. This is where SNMP comes in. Instead of waiting for someone to notice that something is wrong and locate the person responsible for fixing the problem (which may not happen until Monday morning, if the problem occurs over the weekend), SNMP allows you to monitor your network constantly, even when you’re not there. For example, it will notice if the number of bad packets coming through one of your router’s interfaces is gradually increasing, suggesting that the router is about to fail. You can arrange to be notified automatically when failure seems imminent, so you can fix the router before it actually breaks. You can also arrange to be notified if the credit card processor appears to get hung -- you may even be able to fix it from home. And if nothing goes wrong, you can return to the office on Monday morning knowing there won’t be any surprises.
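The bad-packet example above can be sketched as a simple trend check. The code below assumes a management station has already polled an interface's cumulative input-error counter at regular intervals; the sample readings, the function names, and the three-interval threshold are all hypothetical choices made for illustration, and counter wraparound is ignored for brevity.

```python
def error_deltas(samples):
    """Turn cumulative counter readings into per-interval increases."""
    return [later - earlier for earlier, later in zip(samples, samples[1:])]


def is_degrading(samples, min_intervals=3):
    """Return True if the errors added per interval grew strictly over
    each of the last `min_intervals` polling intervals."""
    recent = error_deltas(samples)[-min_intervals:]
    if len(recent) < min_intervals:
        return False  # not enough history to judge a trend
    return all(a < b for a, b in zip(recent, recent[1:]))


# Hypothetical readings of one interface's input-error counter.
healthy = [10, 12, 14, 16, 18]   # steady background error rate
failing = [10, 12, 17, 30, 62]   # errors arriving faster every interval

for name, readings in (("healthy", healthy), ("failing", failing)):
    if is_degrading(readings):
        print(f"{name}: error rate is climbing, notify the administrator")
    else:
        print(f"{name}: no upward trend")
```

Comparing increases rather than raw readings matters because the counter is cumulative: it rises even on a healthy interface, so only an accelerating rise suggests trouble.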
There might not be quite as much glory in fixing problems before they occur, but you and your management will rest more easily. We can’t tell you how to translate that into a higher salary -- sometimes it’s better to be the guy who rushes in and fixes things in the middle of a crisis, rather than the guy who makes sure the crisis never occurs. But SNMP does enable you to keep logs that prove your network is running reliably and show when you took action to avert an impending crisis.
Implementing a network-management system can mean adding more staff to handle the increased load of maintaining and operating such an environment. At the same time, adding this type of monitoring should, in most cases, reduce the workload of your system-administration staff. You will need:
Staff to maintain the management station. This includes ensuring the management station is configured to properly handle events from SNMP-capable devices.
Staff to maintain the SNMP-capable devices. This includes making sure that workstations and servers can communicate with the management station.
Staff to watch and fix the network. This group is usually called a Network Operations Center (NOC) and is staffed 24/7. An alternative to 24/7 staffing is to implement rotating pager duty, where one person is on call at all times, but not necessarily present in the office. Pager duty works only in smaller networked environments, in which a network outage can wait for someone to drive into the office and fix the problem.
There is no way to predetermine how many staff members you will need to maintain a management system. The size of the staff will vary depending on the size and complexity of the network you’re managing. Some of the larger Internet backbone providers have 70 or more people in their NOCs, while others have only one.