Alert Notification

When SQL Response detects a problem on a server, it raises an alert. The alert contains a summary of the problem and information about the state of the system when the alert was generated. The alert will always appear in the main user interface, but you can also choose, on an alert-by-alert basis, to be notified by email.

In any monitoring tool there’s a fine line between alerting too often and not alerting enough. In v1, to avoid bombarding you with alerts, we came up with the concept of ‘occurrences’. Once SQL Response v1 raises an alert, that alert is not re-raised until it has been cleared, even if the same problem is detected again. Instead, the number of occurrences of that alert is incremented by one. The main problem with this approach is that, until you clear the original alert, you cannot receive another notification about the same problem.
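
To make the v1 behaviour concrete, here is a minimal Python sketch of the occurrences idea. It is an illustration only, not SQL Response code: the `Alert` and `OccurrenceTracker` names, the alert key format and the `notifications` list are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    key: str              # hypothetical identifier, e.g. "server/problem type"
    occurrences: int = 1


class OccurrenceTracker:
    """Toy model of the v1 'occurrences' behaviour (names are hypothetical)."""

    def __init__(self):
        self.active = {}          # key -> Alert that has not been cleared yet
        self.notifications = []   # keys we would have sent a notification for

    def detect(self, key):
        alert = self.active.get(key)
        if alert is not None:
            # Same problem and the alert is still uncleared:
            # bump the counter, but do not notify again.
            alert.occurrences += 1
            return alert
        # First detection since the last clear: raise and notify.
        alert = Alert(key)
        self.active[key] = alert
        self.notifications.append(key)
        return alert

    def clear(self, key):
        # Clearing removes the alert, so the next detection raises a new one.
        self.active.pop(key, None)
```

In this model, three detections of the same problem produce one notification and an occurrence count of three. A second notification is only possible after `clear` has been called, which is exactly the limitation described above.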

Reminder emails
In v2 we are considering abandoning the concept of occurrences; instead, SQL Response would create a new alert each time a problem occurs. We could potentially go one step further and allow multiple reminder emails to be sent, say every 10 minutes, until the problem has been acknowledged or resolved.
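
As a rough sketch of how the reminder rule might work, the Python function below decides whether another reminder is due. Nothing here reflects a committed design: the function name, its parameters and the default 10-minute interval (taken from the example above) are assumptions.

```python
from datetime import datetime, timedelta


def reminder_due(last_sent, now, acknowledged=False, resolved=False,
                 interval=timedelta(minutes=10)):
    """Return True if another reminder email should go out (hypothetical rule)."""
    if acknowledged or resolved:
        # Reminders stop as soon as someone has dealt with the alert.
        return False
    # Otherwise remind once the interval has elapsed since the last email.
    return now - last_sent >= interval
```

For example, with `last_sent` at 16:00, a check at 16:05 returns `False` and a check at 16:10 returns `True`, unless the alert has been acknowledged or resolved in the meantime.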

Email digests
We are also considering alert digest emails. These would send you hourly, daily or weekly reports of all the alerts raised during that period. For example, you could opt to receive a single email once an hour summarizing all the minor alerts raised during that hour, rather than a separate email every time an alert is raised. Digest emails could be used instead of, or alongside, regular email notification.
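
The hourly case could be sketched roughly as follows in Python. This is only an illustration of the grouping idea: the `hourly_digests` function, the `(raised_at, severity, summary)` tuple layout and the severity labels are assumptions, not part of SQL Response.

```python
from collections import defaultdict
from datetime import datetime


def hourly_digests(alerts, severity="minor"):
    """Group alert summaries of one severity into hourly buckets.

    `alerts` is an iterable of (raised_at, severity, summary) tuples;
    this layout is a hypothetical stand-in for real alert records.
    """
    digests = defaultdict(list)
    for raised_at, alert_severity, summary in alerts:
        if alert_severity != severity:
            continue  # other severities would still be emailed individually
        # Truncate the timestamp to the start of its hour to get the bucket.
        bucket = raised_at.replace(minute=0, second=0, microsecond=0)
        digests[bucket].append(summary)
    return dict(digests)
```

Each key in the result is the start of an hour and each value is the list of summaries that would go into that hour's single digest email. Daily or weekly digests would follow the same pattern with a coarser bucket.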

Your ideas
How do you prefer to be notified about the alerts raised by your monitoring tool? Do you prefer using the main interface of the monitoring application itself, or do you rely on emails? What notification process do you have set up for out-of-hours alerts?

We’re very keen to hear about how you’re currently notified about problems, and also, in an ideal world, what your notification process would be (if the two are different!). The more you tell us, the more able we’ll be to develop something that caters for your needs.


8 Comments

  1. Jonathan Allen
    Posted December 4, 2009 at 4:07 pm | Permalink

    Hi guys,

    Sorry for the delay in getting anything by way of reply to this one. It’s a big question. I think different sites will need different options, so the standard DBA answer to any question applies – “it depends”! Personally I don’t have a 24/7 responsibility for the majority of the year, so being on a pager/SMS isn’t needed, but I can see that a server dropping off the network could need a rapid response for some places. In some cases you would then of course need a ‘stand down’ message if the situation resolves itself.

    Aggregating the alerts into an hourly emailed report would be good, per server. Certainly one of the things with v1 is that when I open the UI and see an alert, it’s the most recent alert, and once I clear that, all of those alerts are gone (OK, I could choose to view cleared alerts). But I didn’t actually clear the 3 previous long-running SQLs; I just wanted to clear this (most recent) one.

    Getting an alert summary emailed when there have been X (10, 20, 50) alerts of the same type would also help; it would bring the issue to the inbox sooner than the next hourly digest.

    • Adam
      Posted December 10, 2009 at 12:29 pm | Permalink

      We like the ‘standard DBA answer’ – “It depends”! As we’re discovering, that’s actually a useful answer, as key to the user experience here will be the configurability of the alert notification to cover scenarios as relatively simple as your own, through to those with the complexity described by qcjims below.

      Alert notification in v2 will certainly be different to v1, and as soon as we have some designs to show you, we’ll post them here and look forward to your feedback.

      Regards,
      Adam

  2. qcjims
    Posted December 7, 2009 at 10:05 pm | Permalink

    We have hundreds of physical servers that we monitor, and a majority of these hosts have multiple instances running. Since we have a fairly large environment, as well as a landscape that includes dev, test, sandbox, uat, prod, etc. environments, our alert notification requirements can be complicated.

    Having a system that only sends out an email regarding alerts won’t cut it for us. We need the ability for our point monitoring systems (Microsoft MOM, concord, netiq, red-gate, etc.) to execute an external command that we then pass our notification parameters to. We have created this external tool and it is what sends the required information to our alert escalation system. This alert escalation system is the destination for many other point monitoring systems. It holds the definitions for who gets notified of a particular alert, what to do if an alert is not acked, de-duplication of the same alert, etc.

    -qcjims

    • Adam
      Posted December 10, 2009 at 1:17 pm | Permalink

      This does sound like a challenging environment to monitor, and it’s interesting that you’ve created your own tool to handle these complexities. We’ve seen similar environments that have also necessitated home-grown solutions. We’d really like to understand this more, and determine how we can best support these kinds of requirements. I’ve sent you an email about this and hope to hear more from you.

      Thanks again,
      Adam

      • qcjims
        Posted December 22, 2009 at 3:17 am | Permalink

        Adam,

        Please re-send your email. I am not sure if I used a valid email address in that previous posting. Thanks.

        • Adam
          Posted January 4, 2010 at 5:01 pm | Permalink

          Hi James, thanks for getting back in touch. I’ve resent the email; let me know if you don’t receive it soon!

          Thanks again,
          Adam

  3. Posted March 23, 2010 at 4:52 pm | Permalink

    Just started using SQL Response today. So far, pretty cool stuff. One thing that I would kinda like is an escalation system for certain events. Long-running queries are a perfect example. I might like a low priority alert at a certain time, a medium priority alert if it passes the next threshold and a high priority alert with an email if it reaches a critical point. Locking / table contention is another one where this would be very handy. The breakpoints would let me spot smaller problems when I have time to go through and troubleshoot some things, and still be notified immediately of the larger issues that need immediate attention.

    Taking this one step further, allowing certain queries to be excluded from this list or have their thresholds changed (for example a monthly report that might take hours to run) would also be really handy.

    • Adam
      Posted April 1, 2010 at 3:07 pm | Permalink

      Thanks for your comment, Seth.

      We do intend to have the granularity you suggest for the alert configuration – the ability to set multiple thresholds and criticality.

      We’re also developing ‘maintenance window’ functionality, which would allow you to exclude an instance from monitoring while you run a monthly report, for example.

      Hopefully these improvements will help with your particular requirements.

      Thanks again,
      Adam
