Figure 1-11 shows the core components of MOM 2005 that are involved with producing the end product of MOM—an alert—and the tools you can use to view and act on the alert. To introduce these components, let’s work backward through the system, tracing the path of the alert through the core components to its origin.
MOM 2005 gives you access to all the information it collects through the Operator console (point 1 in Figure 1-11). The Operator console is also where you will manage alerts and perform troubleshooting steps. You can also use the Web console (point 1a in Figure 1-11) for accessing the same information remotely, although you don’t get the same level of functionality.
The Operator console is based on the console that Microsoft’s internal IT group, the Operations and Technology Group (OTG), developed for its own use in working with MOM 2000 SP1.
In Figure 1-12, there are four panes, three of which you can display or hide at your discretion. On the lefthand side is the Alert Views pane. What you select here controls what you see in the middle two panes. In our example, All: Alert Views is selected and the resulting Alerts and Alert Details are shown in the middle two panes. On the far right side is the Tasks pane. When you select an object in the Tasks pane, you can execute that operation against the computer that has the focus in the middle two panes. For example, if you select the Ping object in the Tasks pane, the Ping command will execute against the computer named homemomserver3. You can think of the objects in the Tasks pane as buttons that cause an action to occur.
The Alerts and Alert Details panes display all the details that MOM 2005 embeds in an alert. Initially, you’ll make the most use of the information contained on the Properties, Events, Product Knowledge, History, and Company Knowledge tabs.
The Properties tab contains detailed information about the alert, including the description, the name of the rule that created the alert, and the time it first and most recently occurred (in the case of a repeating alert).
The Events tab contains links to all the Windows Event log events (or events from other sources such as scripts or logfiles) that caused the alert to be triggered. This is invaluable when analyzing the causes of an alert.
The Product Knowledge tab contains a summary of the issue that caused the alert, possible technical causes of the issue, and proposed resolution steps. The information on this tab represents what the product team identified as the most likely causes of the issue and the most likely resolutions. If you saw an alert about an Exchange server, the Product Knowledge tab would have the summary, causes, and resolution steps developed by the Microsoft Exchange team itself.
The History tab keeps track of the life history of an alert from its creation in the Operator console to its resolution.
The Company Knowledge tab for each alert type starts out blank. Every time you resolve an alert, you will be prompted to record information about that alert. This is a useful way to capture the specific steps that you took to resolve the issue that caused the alert in the context of your environment. Start using this tab right away. Don’t record historical information here because when this alert comes up again, you will want to know the steps for resolutions, such as “stop IIS and reboot server when this problem occurs,” not “this happened because a contractor knocked over the equipment rack on the server floor that holds the Internet-facing router.”
The Web console (point 1a in Figure 1-11 and shown in Figure 1-13) displays only the information essential to managing your environment: alerts, computers in the managed environment, and Windows events. You will use this console when the Operator console is not available due to firewall issues or when you are working remotely. It gives you enough information to determine whether further action is necessary.
In this console, there are three panes: a views pane, a summary pane, and a details pane. The Web console communicates with the Data Access Service through the MOM 2005 Application Programming Interface (API) (point 2a in Figure 1-11). The MOM 2005 API is fully documented in the MOM 2005 SDK, available at http://www.microsoft.com/mom/downloads/sdk/default.mspx. The components of the API include:
MOM Connector Framework Version 2
MOM Managed Code Library
MOM Runtime Library
The DAS (point 2 in Figure 1-11) is the next component in the path back to the origin of an alert. The DAS is a COM+ (Component Object Model) application that manages all read/write access to the operations database. The actions of the DAS enable the Operator console and the Web console to render information to the user. Because the DAS manages all communications between the user interfaces, the database, and the MOM service on the management server, it is responsible for enforcing permissions inside MOM. This is done via COM+ roles and impersonation. When you install MOM 2005, you will have to specify a Windows account that the DAS will use to access the operations database. This account must be assigned the db_owner role in the OnePoint database in SQL Server and must be permitted server access as a SQL Server security login. Through the DAS and the DAS account, MOM 2005 manages the database by executing the stored procedure used for grooming the database tables, as well as data insert functions. The DAS communicates with the MOM 2005 service (MOMService.exe) over OLEDB.
MOM 2005 keeps two types of information in the operations database:
All configuration data for a management group is stored in the operations database. Because the data is stored centrally, it can be accessed by all the management servers in the management group. Information stored here includes management pack rules and their thresholds, agent configuration settings, and the global settings for the management group, such as which email server to route SMTP traffic through, and security settings.
Operational data is the live data gathered from the agents and includes all events, performance monitor data, and alerts.
Before an alert appears in the Operator or Web consoles, it is written to the operations database. If you look in SQL Enterprise Manager, you will not find a MOM database. The actual name of the operations database is OnePoint, which is a leftover from the original versions of MOM when it was developed and marketed by NetIQ.
The operations database is a SQL Server 2000 database (point 3 in Figure 1-11) that is best kept small for performance reasons. The largest operations database supported by MOM 2005 is 30 GB, of which you should keep 40 percent as free space. This free space is required for successful execution of the stored procedures, in particular the reindexing job.
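To make the sizing guidance concrete, here is a small Python sketch of the arithmetic. The 30 GB ceiling and the 40 percent free-space figure come from the text; the constant and function names are invented for illustration and are not part of MOM.

```python
# Sizing figures quoted in the text; all names here are illustrative only.
MAX_DB_SIZE_GB = 30.0       # largest supported operations database
MIN_FREE_FRACTION = 0.40    # share of the database to keep as free space


def max_data_size_gb(db_size_gb: float = MAX_DB_SIZE_GB) -> float:
    """Return how much data fits while still leaving 40 percent free."""
    return db_size_gb * (1.0 - MIN_FREE_FRACTION)


def has_enough_free_space(db_size_gb: float, used_gb: float) -> bool:
    """Return True if at least 40 percent of the database is still free."""
    free_fraction = (db_size_gb - used_gb) / db_size_gb
    return free_fraction >= MIN_FREE_FRACTION


if __name__ == "__main__":
    print(f"Data ceiling for a 30 GB database: {max_data_size_gb():.1f} GB")
    print("20 GB used leaves enough room:", has_enough_free_space(30.0, 20.0))
```

In other words, at the 30 GB maximum you should plan for roughly 18 GB of actual data, with the remaining 12 GB held back as working space.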
The DAS communicates with the operations database over TCP/UDP port 1433. This is where the data that composes an alert actually lives. All modifications to that data, either through the user interfaces or through the MOM service, are persisted here until they are groomed out.
On the management server, the MOM service plays two roles: it is the MOM server (point 4 in Figure 1-11) and the MOM agent (point 5 in Figure 1-11). Both of these processes run under the MOMService.exe service on the management server. On Windows Server 2003, the service runs under the security context of either NT Authority\Network Service or Local System; on Windows 2000 servers, it runs under Local System only. Running the MOMService.exe process under any other security context is not supported. When the agent on the management server needs to take some action, it spawns an instance of MOMHost.exe that runs under the management server action account. In MOM 2005, the agent can launch multiple instances of MOMHost.exe, which is responsible for executing scripts and running managed code responses. If one of the MOMHost.exe instances hangs, the others and the MOMService.exe are unaffected.
The MOM 2005 server sits between the agents on the managed nodes, the agent on the management server, and the DAS. The MOM server communicates with the DAS using OLEDB calls, and with the agents on the managed nodes over TCP port 1270, RPC 135, and TCP/UDP 445.
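When agents fail to report in, a firewall between the management server and the managed node is a common suspect. The following is a minimal Python sketch for probing the TCP ports named above. The port numbers come from the text; the labels, function names, and timeout are assumptions made for illustration. It tests TCP only, so the UDP side of port 445 is not covered, and a successful connection shows only that the port is reachable, not that MOM itself is healthy.

```python
import socket
from typing import Dict

# TCP ports named in the text for traffic between the MOM server and agents.
# The labels are descriptive only.
AGENT_TCP_PORTS: Dict[str, int] = {
    "MOM agent channel": 1270,
    "RPC endpoint mapper": 135,
    "SMB": 445,
}


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_agent_ports(host: str, timeout: float = 2.0) -> Dict[str, bool]:
    """Probe each TCP port in AGENT_TCP_PORTS and report reachability."""
    return {
        label: tcp_port_open(host, port, timeout)
        for label, port in AGENT_TCP_PORTS.items()
    }
```

Calling check_agent_ports with the name of a managed node returns a dictionary of label-to-result pairs that you can scan for any False entries.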
The primary responsibility of the MOM server is to manage communications with the agents on the managed nodes. It sends configuration information down to the agents and receives operational data from them. Working with the management server agent, it consolidates the data from the agents on managed nodes and passes it to the DAS for insertion into the operations database. It is also responsible for computer discovery and pushing agents to computers. Alerts travel through the MOM server and can be modified there based on actions taken by the management server agent.
The management server is itself a managed node, and the management server agent is responsible for collecting all the event and performance data from the MOM management server. It then compares this data (event log, performance monitor, WMI data, etc.) to a set of criteria to determine if there is a match. If there is a match, the agent can execute a response. When an agent needs to execute a response, it spawns an instance of MOMHost.exe running under the management server’s action account credentials. Responses include generating an alert, or running a script against the management server itself or against a managed node. The agent can generate an email, transfer a file, or execute managed code as well.
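The collect, compare, and respond cycle described above can be pictured with a short Python sketch. This is a conceptual model only, not MOM's actual implementation: the Rule class, the event dictionary layout, and the sample rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Event = Dict[str, object]  # one item of raw data from a data provider


@dataclass
class Rule:
    name: str
    provider: str                      # data provider the rule reads from
    criteria: Callable[[Event], bool]  # returns True when the data matches
    response: Callable[[Event], str]   # action to take on a match


def evaluate(event: Event, rules: List[Rule]) -> List[str]:
    """Apply every rule bound to the event's provider; collect the responses."""
    responses = []
    for rule in rules:
        if rule.provider != event.get("provider"):
            continue  # rule reads from a different provider
        if rule.criteria(event):
            responses.append(rule.response(event))
    return responses


# Hypothetical rule: raise an alert for any error-level Application log event.
app_error_rule = Rule(
    name="Application error (example)",
    provider="Application event log",
    criteria=lambda e: e.get("level") == "Error",
    response=lambda e: f"ALERT on {e.get('computer')}: event {e.get('event_id')}",
)

if __name__ == "__main__":
    sample = {
        "provider": "Application event log",
        "level": "Error",
        "computer": "homemomserver3",
        "event_id": 1000,
    }
    print(evaluate(sample, [app_error_rule]))
```

Here the response simply returns a string. In MOM the response could equally be a script, an email, a file transfer, or managed code, each run in a MOMHost.exe instance.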
What is unique about this agent is that it also uses the data stream coming from the managed nodes as one of its data providers. This gives it the ability to correlate alerts and events coming from multiple managed nodes and generate new alerts that reflect a significant event across a wider set of machines. This is precisely the case for the example “MOM Agent heartbeat failure” alert. The management server agent expects a heartbeat message, sent over UDP, from every remote agent at 15-second intervals. When it does not receive one, it generates the example alert.
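The heartbeat check amounts to remembering when each agent last reported and flagging any agent that has been silent for longer than the interval. Here is a minimal Python sketch of that idea. The 15-second interval and the alert name come from the text; the class, its methods, and the use of plain numeric timestamps are assumptions for illustration, and the real service may tolerate missed heartbeats differently.

```python
from typing import Dict, List

HEARTBEAT_INTERVAL = 15.0  # seconds, the interval quoted in the text


class HeartbeatMonitor:
    """Tracks the last heartbeat per agent and reports agents that are late."""

    def __init__(self, interval: float = HEARTBEAT_INTERVAL) -> None:
        self.interval = interval
        self.last_seen: Dict[str, float] = {}

    def record(self, agent: str, now: float) -> None:
        """Note that a heartbeat arrived from the agent at time `now`."""
        self.last_seen[agent] = now

    def missed(self, now: float) -> List[str]:
        """Return agents whose last heartbeat is older than the interval."""
        return sorted(
            agent
            for agent, seen in self.last_seen.items()
            if now - seen > self.interval
        )

    def alerts(self, now: float) -> List[str]:
        """Build one alert message per agent that missed its heartbeat."""
        return [f"MOM Agent heartbeat failure: {agent}" for agent in self.missed(now)]


if __name__ == "__main__":
    monitor = HeartbeatMonitor()
    monitor.record("homemembersrvr", now=0.0)
    monitor.record("homemomserver3", now=10.0)
    print(monitor.alerts(now=20.0))
```

Because the check runs on the management server against the incoming data stream, the alert is raised on behalf of a machine that, by definition, is no longer able to report its own problem.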
The other responsibility that the management server agent has is the administration of remote agents and the monitoring of agentless-managed computers. In this capacity, the management server agent uses a MOMHost.exe instance to install agents remotely on discovered computers, run discovery tasks on remote computers, and update settings on remote agents. In the case of agentless-managed computers, this agent performs discovery of the computer’s role, remotely collects information from the computer, applies a set of criteria against the incoming data, and initiates responses based on matches to the criteria. Because it is performing these tasks remotely, there is a performance hit on the management server that needs to be planned for. Microsoft does not recommend performing agentless management against more than 60 machines per management group.
The MOM agent (point 6 in Figure 1-11) represents a remote agent. This is an agent that resides on a computer other than the management server. This agent receives all its configuration information from the MOM 2005 management server and sends its processed and filtered data to the MOM server on the management server. Just like the MOM agent on the management server, it creates MOMHost.exe processes for collecting data from the data providers. It then applies the criteria to this data and executes the appropriate response if a match is found, such as generating an alert. Below this are the data providers that you would manually examine for clues to the cause of an issue, if you don’t have an operations management system in place. Agents perform the bulk of the work in MOM 2005 and hopefully send only actionable information up to the management server.
So, at this point you are asking yourself, how does an agent know what criteria to apply to a set of data and what response to take? Rules applied by agents are imported into the management group in files called management packs.
Every server-based application that Microsoft releases has a management pack. There are management packs for operating systems, Exchange 2000 and 2003, MOM 2005, and SMS 2.0 and 2003, to name a few. At the time of this writing, Microsoft is shipping 55 different management packs, with more being added.
In each management pack are application definitions that the agents use to identify the role of a computer, such as it being a domain controller or a MOM server or an Internet Information Server (IIS). The agents determine this based on the discovery process that they execute. The discovery process looks at a number of computer attributes such as registry values and the existence of certain files, directories, and registered services. Once a computer role has been identified, that computer is placed in a computer group inside of MOM.
A computer group in MOM is not a computer security group in Active Directory and is used only inside of MOM. Membership in MOM computer groups is dynamic, based on the discovered role of a computer.
Computers always belong to more than one MOM computer group—this is normal. For example, homemembersrvr in Figure 1-14 belongs to six computer groups.
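Dynamic group membership can be modeled as a set of tests run against the attributes that discovery collects. The Python sketch below illustrates the idea. The group names, attribute layout, and matching criteria are all hypothetical; real management packs define their own discovery attributes and group formulas.

```python
from typing import Callable, Dict, List, Set

# Discovered attributes, grouped by kind, for example registered services.
Attributes = Dict[str, Set[str]]

# Hypothetical membership criteria: group name -> test on the attributes.
GROUP_CRITERIA: Dict[str, Callable[[Attributes], bool]] = {
    "IIS servers": lambda a: "W3SVC" in a.get("services", set()),
    "Domain controllers": lambda a: "NTDS" in a.get("services", set()),
    "Windows servers": lambda a: "server-os" in a.get("markers", set()),
}


def discover_groups(attributes: Attributes) -> List[str]:
    """Return every computer group whose criteria the attributes satisfy."""
    return sorted(
        name for name, matches in GROUP_CRITERIA.items() if matches(attributes)
    )


if __name__ == "__main__":
    member_server = {"services": {"W3SVC"}, "markers": {"server-os"}}
    print(discover_groups(member_server))
```

Because each group applies its own test independently, a single computer naturally lands in several groups at once, and its membership changes on the next discovery pass if a role is added or removed.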
In each management pack there are also three groupings of rules:
Event rules tell an agent to collect information from various event logs, to filter out certain types of alerts, to look for missing events, to consolidate multiple alerts with similar characteristics, and to suppress duplicate alerts. Figure 1-15 shows an event rule.
Performance rules instruct an agent to either sample and report specific performance monitor data or to generate an alert when the performance monitor data cross over a threshold defined in the rule.
Alert rules let a single rule instruct an agent to generate a response for an entire group of rules, rather than having each event or performance rule generate its own alert.
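Duplicate suppression is worth a closer look, because it explains why a repeating alert carries both a first and a most recent occurrence time. The Python sketch below shows one simple way to model it: repeats of the same rule on the same computer update the existing alert instead of creating a new one. The class names, the choice of key, and the repeat counter are assumptions for illustration, not MOM's actual schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Alert:
    computer: str
    rule: str
    first_seen: float
    last_seen: float
    repeat_count: int = 0  # number of suppressed duplicates


class AlertStore:
    """Keeps one alert per (computer, rule) pair and folds duplicates into it."""

    def __init__(self) -> None:
        self._alerts: Dict[Tuple[str, str], Alert] = {}

    def raise_alert(self, computer: str, rule: str, when: float) -> Alert:
        key = (computer, rule)
        existing = self._alerts.get(key)
        if existing is None:
            # First occurrence: create a new alert.
            alert = Alert(computer, rule, first_seen=when, last_seen=when)
            self._alerts[key] = alert
            return alert
        # Duplicate: update the existing alert rather than adding another.
        existing.last_seen = when
        existing.repeat_count += 1
        return existing

    def open_alerts(self) -> List[Alert]:
        return list(self._alerts.values())


if __name__ == "__main__":
    store = AlertStore()
    store.raise_alert("homemembersrvr", "Disk space low (example)", when=100.0)
    store.raise_alert("homemembersrvr", "Disk space low (example)", when=160.0)
    print(store.open_alerts())
```

The operator sees a single alert with an updated timestamp and a repeat count, rather than a console flooded with identical entries.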
Each rule identifies the provider from which data will be examined, the criteria to look for, the response to take when a match between the raw data and the criteria is found, and the vendor and company knowledge. This is where the text on the Product Knowledge tab of an alert in the Operator console comes from. The rules are defined by the product teams themselves and as such represent their definition of health for the application. Figure 1-15 shows the General tab of the “Agent heartbeat failure” rule.
Each computer group can have one or more rule groups associated with it. It is the responsibility of the MOM server to assign the correct set of rules to each agent it manages, based on the role of the computer. Only those rules that are associated with a computer group to which a computer belongs are sent to the agent on that computer. This is how MOM ensures that an agent is only processing the necessary rules for the computer it monitors.
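The rule-targeting logic is essentially a lookup through two mappings: computer groups to rule groups, and rule groups to rules. The following Python sketch illustrates it. The function, the dictionary layout, and the sample names are hypothetical and exist only to show the shape of the calculation.

```python
from typing import Dict, Iterable, Set


def rules_for_computer(
    member_of: Iterable[str],
    rule_groups_by_computer_group: Dict[str, Set[str]],
    rules_by_rule_group: Dict[str, Set[str]],
) -> Set[str]:
    """Return the union of rules from every rule group that is associated
    with a computer group the computer belongs to."""
    rules: Set[str] = set()
    for computer_group in member_of:
        for rule_group in rule_groups_by_computer_group.get(computer_group, set()):
            rules |= rules_by_rule_group.get(rule_group, set())
    return rules


if __name__ == "__main__":
    # Hypothetical associations for illustration.
    associations = {
        "Windows servers": {"OS rules"},
        "IIS servers": {"IIS rules"},
        "Domain controllers": {"AD rules"},
    }
    rule_sets = {
        "OS rules": {"Disk space low", "Service stopped"},
        "IIS rules": {"Web site unavailable"},
        "AD rules": {"Replication failure"},
    }
    groups = ["Windows servers", "IIS servers"]
    print(sorted(rules_for_computer(groups, associations, rule_sets)))
```

In this example, a web server that is not a domain controller never receives the Active Directory rules, which is exactly the filtering the MOM server performs before sending configuration to an agent.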
The last items included in a management pack are definitions for the views in the Operator console and predefined reports that will be viewed in the Reporting console.
You are already familiar with the data providers on a Windows server (point 7 in Figure 1-11). These are the tools you examine manually now. They include the event logs and performance monitor counters and data collected from WMI and application logfiles. This is the raw data used to generate an alert, but you won’t find alerts here.
Data provider objects are defined in management packs. An individual provider can be used by many rules. For example, event logs are defined as individual providers and are used as the data source or provider by many event rules.