I’ve spent the last few weeks investigating the alerts that the new v2 product will offer, in particular the new alerts that are not currently available in v1.
SQL Response v2 will provide a number of Overviews for SQL Server Instances summarising activity on those servers. For example, a number of Perfmon counters (Buffer Cache Hit Ratio etc) will be shown, allowing the user to track changes in these values over time.
However, that doesn’t mean a DBA wants to be alerted every time any one of these counters goes above or below a given threshold. Many are interesting pieces of information that help an investigation (and hence should be shown on an Overview), but should not be triggers for alerts.
Also, of course, a number of alerts should be generated by binary events, rather than by continuous values crossing thresholds. The v1 alert “SQL Server unreachable” is exactly this kind of alert.
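To make the distinction concrete, here is a minimal sketch of the two kinds of alert. It is purely illustrative and is not how SQL Response implements alerts; the names `ThresholdRule` and `check_reachable`, and the threshold of 90, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThresholdRule:
    """Hypothetical rule: alert when a continuous counter crosses a limit."""
    counter: str
    limit: float
    alert_when_below: bool = False  # some counters are bad when low, not high

    def check(self, value: float) -> Optional[str]:
        # Compare in the direction that counts as "bad" for this counter.
        if self.alert_when_below:
            breached = value < self.limit
        else:
            breached = value > self.limit
        if breached:
            return f"{self.counter} = {value} crossed threshold {self.limit}"
        return None


def check_reachable(is_reachable: bool) -> Optional[str]:
    """Hypothetical binary-event alert: there is no threshold to tune."""
    return None if is_reachable else "SQL Server unreachable"


# Illustrative limit only; 90 is an assumption, not a recommended value.
rule = ThresholdRule("Buffer Cache Hit Ratio", 90.0, alert_when_below=True)
```

With this sketch, `rule.check(85.0)` returns an alert message and `rule.check(95.0)` returns `None`, while `check_reachable(False)` raises the alert as soon as the event happens, whatever any counter says.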
So we’re very interested in getting feedback on what you would like to be alerted on. To make this slightly easier, I’m splitting this post into three parts:
- Perfmon counters (machine and SQL Server),
- Non-perfmon SQL Server alerts (deadlocks, problem queries etc),
- The rest! (Here, I’ll cover areas such as replication, mirroring, job management, auditing and so on).
First of all: Perfmon counters. Of course, there are a myriad of these, and alerting on all of them would overwhelm a DBA. In particular, many counters overlap heavily, with the same underlying problem causing several Perfmon counters to spike at once.
Which values do you look at regularly? I’m interested in anything that relates either to SQL Server or to the underlying OS/hardware. There are the standard four categories, if it helps:
- Disk I/O,
but any feedback would be most welcome!
What I’ve found particularly interesting in this area is that many DBAs consider certain counters to be either less informative than many believe or, worse, misleading. Are there any counters that you think are particularly misleading? What are the alternatives that really capture the problem on the server?
Thank you again for your help.