Mirroring: What do you want to monitor?

There are many technologies available at the software and hardware level to help achieve high availability on mission-critical servers. We’d like your views on what you’d expect a monitoring tool to tell you about each of them. To kick things off, we’d first like your requirements for monitoring mirrored environments.

When would you expect a monitoring tool to alert you?

Here are some ideas on when SQL Response might trigger a mirroring-based alert:

  • When the mirroring state of the principal or mirror database changes to either DISCONNECTED or SUSPENDED
  • When roles change, e.g. when the mirror becomes principal or vice versa
  • When the mirroring witness is not connected (if witness is configured)
  • When, in the event of a failover, the estimated time a mirror database will take to finish a redo and become available is longer than x minutes, or when the redo queue is larger than x KB
  • When the estimated catch-up time* is longer than x minutes, or when the Log Send Queue is larger than x KB

*Catch-up time is the time it will take for the mirror to catch up with the principal.
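
To make the conditions above concrete, here is a minimal sketch of how they might be evaluated against two consecutive polls of a mirrored database. It is an illustration only, not how SQL Response works: the `MirrorSnapshot` and `Thresholds` types, all of their field names and the alert labels are hypothetical, and the "x" values are whatever limits you choose.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MirrorSnapshot:
    """One poll of a mirrored database. All field names are hypothetical."""
    state: str                  # e.g. "SYNCHRONIZED", "DISCONNECTED", "SUSPENDED"
    role: str                   # "PRINCIPAL" or "MIRROR"
    witness_configured: bool
    witness_connected: bool
    redo_queue_kb: float
    log_send_queue_kb: float
    est_redo_minutes: float
    est_catchup_minutes: float


@dataclass
class Thresholds:
    """The "x" limits from the list above (hypothetical names)."""
    max_redo_minutes: float
    max_redo_queue_kb: float
    max_catchup_minutes: float
    max_log_send_queue_kb: float


def mirroring_alerts(prev: Optional[MirrorSnapshot],
                     cur: MirrorSnapshot,
                     t: Thresholds) -> List[str]:
    """Return a label for each condition that the current poll triggers."""
    alerts = []
    # State alert fires only when the state has changed since the last poll.
    state_changed = prev is None or prev.state != cur.state
    if state_changed and cur.state in ("DISCONNECTED", "SUSPENDED"):
        alerts.append("state:" + cur.state)
    # Role change: the mirror became principal or vice versa.
    if prev is not None and prev.role != cur.role:
        alerts.append("role_change")
    # The witness only matters if one is configured.
    if cur.witness_configured and not cur.witness_connected:
        alerts.append("witness_disconnected")
    if (cur.est_redo_minutes > t.max_redo_minutes
            or cur.redo_queue_kb > t.max_redo_queue_kb):
        alerts.append("redo_backlog")
    if (cur.est_catchup_minutes > t.max_catchup_minutes
            or cur.log_send_queue_kb > t.max_log_send_queue_kb):
        alerts.append("send_backlog")
    return alerts
```

Comparing against the previous poll means the state and role alerts fire once, on the change, rather than on every poll while the condition persists. The queue and time checks are evaluated on every poll in this sketch.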

What information would you like in the alert details?

As well as the cause of the alert and relevant metrics, what basic information about the mirror would you like to see in a raised alert? For example:

  • Names of the principal, mirror and witness servers and their current status
  • Operating mode (High Safety, High Performance or High Safety with automatic failover)

Which performance counters are useful for you?

  • On the principal: Log Bytes Sent/sec
  • On the principal: Log Send Queue KB
  • On the principal: Transaction Delay
  • On the mirror: Log Bytes Received/sec
  • On the mirror: Redo Bytes/sec
  • On the mirror: Redo Queue KB
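
As a rough illustration of how these counters could feed the time-based alerts, the sketch below divides a queue size by the matching throughput counter to estimate how long the queue would take to drain. This is our own back-of-the-envelope arithmetic, not necessarily how SQL Server or SQL Response calculates it. The function name `estimated_minutes` is hypothetical, and it assumes 1 KB = 1024 bytes, a constant rate and no new log arriving in the meantime.

```python
from typing import Optional


def estimated_minutes(queue_kb: float, rate_bytes_per_sec: float) -> Optional[float]:
    """Estimate the minutes needed to drain a queue at the given rate.

    Hypothetical helper. Pair "Log Send Queue KB" with "Log Bytes Sent/sec"
    for catch-up time, or "Redo Queue KB" with "Redo Bytes/sec" for redo time.
    Returns None when the queue is non-empty but nothing is moving, because
    no meaningful estimate exists in that case.
    """
    if queue_kb <= 0:
        return 0.0
    if rate_bytes_per_sec <= 0:
        return None
    seconds = (queue_kb * 1024.0) / rate_bytes_per_sec  # assumes 1 KB = 1024 bytes
    return seconds / 60.0


# A 60,000 KB send queue draining at 1,024,000 bytes/sec takes about a minute.
print(estimated_minutes(60000, 1024000))  # 1.0
```

A `None` result (a backlog with zero throughput) is arguably worth alerting on in its own right, since the mirror is not catching up at all.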

Is the information mentioned above enough to monitor mirroring? Or are there things you’d like to monitor which we have not mentioned here? We’d love to hear your views.


2 Comments

  1. Jonathan Allen
    Posted September 22, 2009 at 8:58 am | Permalink

    OK, so I’ve been off to the HDD metrics part 2 entry and grabbed the comments I made about mirroring there. The 4 key stats that I watch currently are: Send Queue, Redo Queue, Average Delay and Time Behind. Yes, knowing whether it’s connected or not would be good too.

    Would the monitoring switch to the mirror in the event of a failover?

    Knowing the bandwidth being taken up by the mirroring would be most useful, we keep getting it in the neck from the network team that we are causing slowdowns etc. Especially if it can always show that we are only using 10% of the whole line ;)

    In our environment we suffer with peaks and troughs of activity – there are nightly data updates that take place at 5am that maybe update 10k records, and that throws mirroring behind for 20 – 30 minutes. I would love to see a way of not having these events fire an alert – can we have multiple schedules for the alerts to allow for different cycles of activity? Let me guess, the next blog post will be about scheduling and trending jobs and recommendations!!!!?

  2. Priya.Sinha
    Posted September 22, 2009 at 9:21 am | Permalink

    Jonathan,

    Thanks a lot for your comment. It is very useful.

    Regarding switching the monitoring to mirror in the event of failure — yes that would be our aim.

    I must confess that I hadn’t thought about bandwidth usage, so thanks for pointing this out.

    Regarding the multiple schedules for alerts and recommendations, we had some discussion about it internally and we are thinking of introducing something similar. So it is really nice to get your views on this. Hopefully nearer the time one of us will blog about it.

    About the next blog … we don’t know yet … it’s a surprise :)

    Priya
