What metrics do you use? Part one: Memory

One of the key features of SQL Response v2 will be the ability to monitor a whole set of metrics in real time across your enterprise: physical servers, SQL Server instances, disks, networks and so on.

There’s a huge amount of information that we can and will display in SQL Response v2, but we’d like to know what data matters most to you. Key metrics will be displayed graphically on the Overview pages, which show selected real-time information about your physical server or SQL Server; they will also be used to trigger relevant alerts.

Over the next few weeks, we’ll post lists of the counters that we think are most useful in a number of categories. To get the ball rolling, we’re going to start with memory counters; there’s a short query sketch after the list if you want to pull these figures out of SQL Server yourself.

Suggested Counters

  • Memory available for physical server

    Amount of physical memory currently available to processes running on the computer (in MB)

  • Memory available for SQL Server

    Amount of physical memory currently available to SQL Server (in MB)
    If memory is allocated to SQL Server, this equals Memory allocated to SQL Server minus Total SQL Server Memory Used. If memory is not allocated to SQL Server, this value is the same as the memory available for the physical server.

  • Memory Paging / Sec

    Rate at which pages are read from or written to disk (number of pages per second).
    This counter is the sum of Pages Input/sec and Pages Output/sec.

  • Total SQL Server Memory Used

    Amount of RAM the SQL Server service is currently using.
    Includes the total of the buffers committed to the SQL Server BPool and OS buffers of the type “OS in Use”.

  • Target SQL Server Memory

    Amount of RAM SQL Server needs to operate efficiently, based on the number of buffers reserved by SQL Server when it is first started.

  • Buffer Cache Hit Ratio

    The percentage of page requests that SQL Server satisfies from the buffer cache rather than by reading from disk.

  • Buffer Manager Free Pages

    The total number of free buffer pages in the free buffer list.

  • Buffer Manager Page Life Expectancy

    The number of seconds a page is expected to stay in the buffer pool without being referenced.
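
For reference, most of the SQL Server-side counters above can be read straight from the sys.dm_os_performance_counters DMV (SQL Server 2005 and later); the operating-system figures (available memory and Pages/sec) come from the Windows Memory performance object rather than from SQL Server. A rough sketch, just to show where the numbers live (note that the buffer cache hit ratio has to be divided by its “base” counter to get a percentage):

    -- Current values of several of the memory counters above
    -- (on a named instance the object_name prefix is MSSQL$<instance> rather than SQLServer)
    SELECT  RTRIM(object_name)  AS [object],
            RTRIM(counter_name) AS [counter],
            cntr_value          AS [value]
    FROM    sys.dm_os_performance_counters
    WHERE  (object_name LIKE '%Buffer Manager%'
            AND counter_name IN (N'Buffer cache hit ratio',
                                 N'Buffer cache hit ratio base',
                                 N'Free pages',
                                 N'Page life expectancy'))
       OR  (object_name LIKE '%Memory Manager%'
            AND counter_name IN (N'Total Server Memory (KB)',
                                 N'Target Server Memory (KB)'));

    -- Buffer cache hit ratio as a percentage: the ratio counter divided by its base counter
    SELECT  100.0 * r.cntr_value / NULLIF(b.cntr_value, 0) AS buffer_cache_hit_ratio_pct
    FROM    sys.dm_os_performance_counters r
    JOIN    sys.dm_os_performance_counters b
              ON  b.object_name  = r.object_name
              AND b.counter_name = N'Buffer cache hit ratio base'
    WHERE   r.object_name  LIKE '%Buffer Manager%'
      AND   r.counter_name = N'Buffer cache hit ratio';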

Are these counters useful to you? Are there any other memory counters that you look at that we’ve not mentioned here? Let us know!

In Part two we’ll ask you to comment on the disk-based counters we’re looking to implement.

9 Comments

  1. Tom
    Posted September 1, 2009 at 3:59 pm | Permalink

    You can read more about Memory counters here

    http://msdn.microsoft.com/en-us/library/aa905152(SQL.80).aspx

  2. PDinCA
    Posted September 3, 2009 at 8:00 pm | Permalink

    I have a large number of tables PINned in memory. Can you include a stat for memory consumed by them, if that’s even available… (Running SS2008 SP1 Enterprise and NOT a DBA!)

  3. Pavle Subarevic
    Posted September 4, 2009 at 9:29 am | Permalink

    I mainly use available memory, buffer cache hit ratio and mem pages/s, which you cover here. However, I often get people asking me what the “correct” values are (such dashboards are attractive not only for DBAs), and I think it would be great to have an indicator showing the best-practice or standard value, the current measured values and the variation from the past x months, days or hours. The latter would help detect unusual trends without having to manually compare with past stats. This implies archiving stats over time, which might be a problem for a temporary dashboard (I don’t know how you planned your monitor)…
    Regards, Pavle

    • Jonathan Allen
      Posted September 4, 2009 at 10:49 am | Permalink

      I certainly agree that the suggestion of a correct/preferred level for any counter is most useful. Knowing that something is at 90% on your server is less use than not knowing at all if you don’t know whether it should be at 4% or 100% …

  4. Brian Harris
    Posted September 4, 2009 at 11:33 am | Permalink

    Thanks for the great comments, guys.

    Re: setting the ‘correct’ or preferred level for a counter. Our current thinking is that for each counter, you will be able to set various thresholds, above which that metric is flagged as being a currently ‘active’ alert – at low, medium or high. This status will be displayed on the dashboard, and can be configured per server. So, if one SQL Server shouldn’t be using more than 10% of the physical memory, and another is not a problem at 100%, you can set the alerts for each, and the dashboard will reflect that.
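
    To give a flavour of what we mean (this is purely illustrative, not the actual SQL Response schema or implementation), a hypothetical per-server thresholds table and a check of the current counter values against it might look something like this:

    -- Purely illustrative: a hypothetical per-server thresholds table and a check
    -- that flags a counter as a low, medium or high alert once it rises above each threshold.
    CREATE TABLE #AlertThresholds
    (
        server_name      sysname       NOT NULL,
        counter_name     nvarchar(128) NOT NULL,
        low_threshold    bigint        NOT NULL,
        medium_threshold bigint        NOT NULL,
        high_threshold   bigint        NOT NULL
    );

    -- Example values only: flag this server once it uses more than 2/3/4 GB (counter is in KB)
    INSERT INTO #AlertThresholds
    VALUES (@@SERVERNAME, N'Total Server Memory (KB)', 2097152, 3145728, 4194304);

    SELECT  t.server_name,
            RTRIM(pc.counter_name) AS counter_name,
            pc.cntr_value          AS current_value,
            CASE WHEN pc.cntr_value > t.high_threshold   THEN 'High'
                 WHEN pc.cntr_value > t.medium_threshold THEN 'Medium'
                 WHEN pc.cntr_value > t.low_threshold    THEN 'Low'
                 ELSE 'OK'
            END AS alert_level
    FROM    sys.dm_os_performance_counters pc
    JOIN    #AlertThresholds t
              ON pc.counter_name = t.counter_name
    WHERE   pc.object_name LIKE '%Memory Manager%';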

    We’re also thinking about showing the last x mins of any metric as a sparkline (a mini line graph) on the dashboard and possibly, allowing you to configure this timescale. So you could see the last 24hrs for a counter, or just the last 5 mins. (I think this answers your point, Pavle, about identifying variation – we are intending to store these statistics for as long as possible to allow you to view historical data and therefore look at trends over time).
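
    Again, purely as an illustration of the underlying idea rather than how SQL Response will actually store its data, sampling the counters into a simple history table on a schedule is enough to drive sparklines and longer-term trend views, e.g.:

    -- Hypothetical history table; a scheduled job could run the INSERT every minute or so.
    CREATE TABLE dbo.CounterHistory
    (
        collected_at datetime      NOT NULL DEFAULT (GETUTCDATE()),
        object_name  nvarchar(128) NOT NULL,
        counter_name nvarchar(128) NOT NULL,
        cntr_value   bigint        NOT NULL
    );

    INSERT INTO dbo.CounterHistory (object_name, counter_name, cntr_value)
    SELECT  RTRIM(object_name), RTRIM(counter_name), cntr_value
    FROM    sys.dm_os_performance_counters
    WHERE  (object_name LIKE '%Buffer Manager%'
            AND counter_name = N'Page life expectancy')
       OR  (object_name LIKE '%Memory Manager%'
            AND counter_name IN (N'Total Server Memory (KB)', N'Target Server Memory (KB)'));

    -- Last 24 hours of Page life expectancy, e.g. for a sparkline or trend graph
    SELECT  collected_at, cntr_value
    FROM    dbo.CounterHistory
    WHERE   counter_name = N'Page life expectancy'
      AND   collected_at >= DATEADD(HOUR, -24, GETUTCDATE())
    ORDER BY collected_at;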

    Clicking on the memory counters on the dashboard will take you to a dedicated memory page for the selected server or SQL Server, where you will have more options on what to view and be able to see more graphs, and see trends in more detail etc. At least, that’s our thinking for now.

    • Mark Allison
      Posted September 4, 2009 at 3:00 pm | Permalink

      Having the ability to view trends further back than 24 hours would also be useful right within the dashboard. Also, a reporting tool that trends over months or years would be great.

      I also like to look at page life expectancy and the memtoleave setting when troubleshooting memory issues. Memory problems on 32-bit servers can be alleviated by playing with the memtoleave setting. Don’t forget the procedure cache hit ratio too.

    • Iain
      Posted September 8, 2009 at 3:50 pm | Permalink

      I like the idea of setting active alerts on a per-server basis, as really you need to make decisions based on your baseline or trend. It would be great to alert on the deltas – say my page life expectancy reduced by 30% from the same time the day before.
      Also, I think it’s important to retain good baseline data, perhaps up to a year but at least 3 months. The counters would need to be averaged out, but it’s useful for predicting scalability.
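
      Something along these lines, say (a rough sketch that reuses the hypothetical CounterHistory table from the comment above; the counter names are real, but the table and the 30% figure are just examples):

      -- Rough sketch: compare current Page life expectancy with the value recorded
      -- around the same time yesterday, and flag a drop of more than 30%.
      DECLARE @ple_now bigint, @ple_yesterday bigint;

      SELECT  @ple_now = cntr_value
      FROM    sys.dm_os_performance_counters
      WHERE   object_name  LIKE '%Buffer Manager%'
        AND   counter_name = N'Page life expectancy';

      SELECT TOP (1) @ple_yesterday = cntr_value
      FROM    dbo.CounterHistory
      WHERE   counter_name = N'Page life expectancy'
        AND   collected_at <= DATEADD(DAY, -1, GETUTCDATE())
      ORDER BY collected_at DESC;

      IF @ple_yesterday > 0 AND @ple_now < 0.7 * @ple_yesterday
          PRINT 'Page life expectancy is more than 30% below this time yesterday';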

  5. Merrill Aldrich
    Posted September 18, 2009 at 11:39 pm | Permalink

    Another vital counter on memory-stressed servers is Memory Grants Pending: it’s a count of queries waiting for execution memory. It usually happens with bigger BI/warehouse-type workloads, but when it happens it’s nasty, and acts just like blocking.

    Other than that I like page life expectancy and paging in a dashboard scenario; the rest of the counters are interesting but not needed at the “top level” view.
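
    For anyone who wants to keep an eye on it, the counter lives under the Memory Manager object; a quick sketch (a value that stays above zero is the thing to watch for):

    -- Number of queries currently waiting for a workspace memory grant;
    -- anything consistently above zero suggests memory pressure.
    SELECT  cntr_value AS memory_grants_pending
    FROM    sys.dm_os_performance_counters
    WHERE   object_name  LIKE '%Memory Manager%'
      AND   counter_name = N'Memory Grants Pending';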

  6. Brian Harris
    Posted September 22, 2009 at 12:01 pm | Permalink

    Thanks for all the excellent suggestions. The issue of data retention is one we’re taking very seriously; clearly it’s constrained by limitations on storage space and the number of instances/machines you’re monitoring, but we’re very aware of the requirement to view trends over a reasonable length of time.

    Various deviation-from-baseline alerts will be available in Response v2. We haven’t finalised exactly what these will be yet, but we’ll post them here in due course when we look at alerting in more detail.

    One point to bear in mind is that if the overviews aren’t quite what you want to see, we’re planning to allow you to create completely customised dashboards. These are pages to which you can add whatever metrics you want, from pretty much anything SQL Response will monitor. So you can piece together a view of your servers that matches exactly what you care most about.
