Disk usage on the global overview

We think we’re on the right track with our ‘global overview’, the part of the UI designed to summarise the current status of your entire system, but a lot of you are asking us to show current disk usage in addition to CPU and memory.

Latest global overview

The problem is we’re not sure how best to present disk usage information at this very top level.

There isn’t really enough screen space to break disk usage down by logical volume if your instances are split across different disks. So what do we do if you’ve got more than one disk?

Could we show the total disk used across all volumes?

So if Machine 2 in the design above had the following logical disks:

Logical disk   Used by SQL Server (Instance 3), GB   Used in total, GB   Available, GB
C              900                                   900                 1000
D              200                                   500                 1000
E              0                                     100                 500

We could show the total disk space used on Machine 2 as 1500 GB, and the space used by SQL Server as 1100 GB. To find out more about usage on a specific logical disk, you would drill down from the global overview. Or would you actually be happy to have disk metrics only at a deeper level?
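
As a minimal sketch of that roll-up, the totals can be reproduced from the table. The class and function names are hypothetical, and treating the "Available" column as each disk's capacity is an assumption.

```python
from dataclasses import dataclass


@dataclass
class LogicalDisk:
    letter: str
    used_by_sql_gb: int
    used_total_gb: int
    capacity_gb: int  # assumption: the "Available" column is the disk's capacity


def rollup(disks):
    """Return (total used, used by SQL Server, total capacity), all in GB."""
    total_used = sum(d.used_total_gb for d in disks)
    sql_used = sum(d.used_by_sql_gb for d in disks)
    capacity = sum(d.capacity_gb for d in disks)
    return total_used, sql_used, capacity


# Figures for Machine 2, copied from the table.
machine_2 = [
    LogicalDisk("C", 900, 900, 1000),
    LogicalDisk("D", 200, 500, 1000),
    LogicalDisk("E", 0, 100, 500),
]

total_used, sql_used, capacity = rollup(machine_2)
print(f"Machine 2: {total_used} GB used in total, {sql_used} GB by SQL Server")
print(f"Overall: {100 * total_used / capacity:.0f}% of {capacity} GB")
```

Under that assumption the single machine-wide figure comes out at 60% used, even though disk C on its own is 90% full, which is the main risk of showing only a total.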

We’d really like to hear your opinion.

Can you modify our design to what you’d like to see?

9 Comments

  1. Jonathan Allen
    Posted October 9, 2009 at 10:17 am | Permalink

    At the very top level, I only need to know when logical disk usage is reaching a point where action/investigation is needed – having an area that indicates that this point is close or has been passed is sufficient. You could follow your examples for CPU and memory, but with an area that has a column for each logical disk, with % used above each, where the column goes red at the selected tipping point. Drilling in to this chart to see details on the disk usage history, database file growth events and top n large files would then lead on to finding and resolving the issue. Not sure if I can link to an image here, so I have also emailed it to you.

    • Tom.Randle
      Posted October 9, 2009 at 3:01 pm | Permalink

      Thanks Jonathan!

      I think mini bar graphs might work quite well. I wonder whether at this high level we could get away without showing the drive letters next to them?

      I’ll have to have a play around with the designs!

      • Jonathan Allen
        Posted October 12, 2009 at 11:19 am | Permalink

        Tom, it’s largely irrelevant which drive it is at this level. If there is a need to take action (i.e. there is a red column in the disk usage chart) then I will need to click the chart area to see the details and make a few investigations prior to implementing the fix.

        Once I click the chart the whole UI would be HDD stats, where the logical disk statistics can be displayed in greater detail with a list of the last ten SQL Server events that were disk intensive – database growths etc.

        Having seen what caused the issue I can then choose to deal with the transaction log that has grown because the mirroring broke or the failed backup job that caused this and so on …

        Knowing that the logical drive X is the log files drive or drive Y is the Accounts team data drive doesn’t really matter on the high level dashboard. If there is a disk problem then it needs my attention; whether that is before or after the deadlocks, CPU peaks or failed jobs alerts is my choice. Whichever of these I think is more important needs a click to get more details before I can resolve any of them. In the same way, in your image the Machine 4 Instance 5 deadlock doesn’t tell me which database is involved, and that is fine.

        I look forward to seeing the new design option.

        Jonathan

        • Adam.Walker
          Posted October 19, 2009 at 11:24 am | Permalink

          Thanks Jonathan. So at this level, it’s enough to show there is a disk problem, as that in itself will necessitate drilling down to see what and where the problem is.

          We look forward to showing you designs for the drilled down SQL Server overview later this week…

          Thanks,
          Adam

  2. Merrill Aldrich
    Posted October 9, 2009 at 9:54 pm | Permalink

    The disk space issue is tricky. You can have failure if:

    a. A disk fills as the result of auto-grow
    b. A disk fills for some other reason, like backup files
    c. A file fills because it is set with a fixed maximum size

    This is complicated by the fact that:

    a. It makes sense to alert by % full on a small disk or file (e.g. disk Q is 90% full)
    b. Alerting on % full is a lot less helpful on a huge disk (e.g. only 500 GB remain on your 5 TB disk! Alert! Alert!). On the other hand, what if your full backup directed at that disk is 502GB?

    Combine this with the idea of a big cluster that might well have 20 disks, and it gets to wanting a whole screen to display.

    Ideally you want a warning any time a database is likely to run out of space for any reason, which is very hard to represent succinctly (I’ve never seen a monitoring system do this well.) Many use the metric “file x has n growths remaining,” which is OK but not spectacular.

    The best idea I can come up with is to have two small areas in the dashboard, side by side. One area shows graphs of the three (or 5 or whatever) “fullest” disks according to logical disk space usage, and the other area shows graphs of the three (or 5 or whatever) fullest database files according to their (a) fixed size limit or (b) limit were they to autogrow to fill the disk where they reside.
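
A minimal sketch of the two "fullest" lists described above. The disk figures come from the table in the post; the file names, sizes and limits are hypothetical, as are the function names.

```python
def file_limit_gb(size_gb, max_size_gb, disk_free_gb):
    """Size a file can reach: its fixed maximum if it has one,
    otherwise its current size plus the free space on its disk."""
    if max_size_gb is not None:
        return max_size_gb
    return size_gb + disk_free_gb


def top_fullest(items, n=3):
    """items are (name, used_gb, limit_gb) tuples; return the n fullest, fullest first."""
    return sorted(items, key=lambda item: item[1] / item[2], reverse=True)[:n]


# Disks as (name, used GB, capacity GB), taken from the table in the post.
disk_items = [("C", 900, 1000), ("D", 500, 1000), ("E", 100, 500)]
disk_free = {name: capacity - used for name, used, capacity in disk_items}

# Files as (name, disk, size GB, fixed maximum GB or None). All hypothetical.
files = [
    ("sales.mdf", "C", 820, None),
    ("accounts_log.ldf", "C", 80, 100),
    ("accounts.mdf", "D", 150, None),
    ("archive.mdf", "D", 50, 60),
]
file_items = [
    (name, size, file_limit_gb(size, max_gb, disk_free[disk]))
    for name, disk, size, max_gb in files
]

for label, items in (("disk", disk_items), ("file", file_items)):
    for name, used, limit in top_fullest(items):
        print(f"{label} {name}: {100 * used / limit:.0f}% of {limit} GB")
```

Ranking by the ratio of used space to limit lets a fixed-size file and an auto-growing file share one list, at the cost of hiding absolute sizes.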

  3. Jeff Stanlick
    Posted October 15, 2009 at 8:01 pm | Permalink

    Other than bad queries slowing down my servers and impacting other processes, one of the biggest problems I get alerted on is disk space running low. On our production servers our log files are on different drives than data files and some of our servers have tempdb on its own drive as well. Much like Jonathan’s mock up, at a high level I don’t need all of the details. I merely need to know if something requires my immediate attention and then I can dig into it from there.

    It would be great if I could have some kind of minor warning threshold that changes it yellow and a critical threshold that changes it red. Better still would be to tie that into the particular configuration for that server/template. That allows me the ability to set the warning level at a different percentage or disk space per server since I know which servers have a lot of disk and which ones don’t. For example, I currently have one server in SQL Response 1.3 that sends me an alert when it gets to 80% full and another waits until it gets to 95%. One has 100GB drive and the other has a 1 TB. They each have their own template. If the dashboard could grab those levels from the template, I don’t have to pay attention to the details of which server and what the percent full is. I merely rely on the colors.
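
A minimal sketch of template-driven colours, assuming each template carries a warning and a critical percentage. The `Template` class, the warning values, and the pairing of the 80% and 95% thresholds with particular drive sizes are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Template:
    warning_pct: float   # at or above this the bar turns yellow
    critical_pct: float  # at or above this the bar turns red


def disk_colour(used_gb, capacity_gb, template):
    """Map a disk's fill level to a dashboard colour using its server's template."""
    pct_full = 100 * used_gb / capacity_gb
    if pct_full >= template.critical_pct:
        return "red"
    if pct_full >= template.warning_pct:
        return "yellow"
    return "green"


# Hypothetical templates: only the 80% and 95% critical levels come from the comment.
small_disk_template = Template(warning_pct=70, critical_pct=80)
large_disk_template = Template(warning_pct=90, critical_pct=95)

# Both drives are 85% full, but each is coloured by its own template.
print(disk_colour(85, 100, small_disk_template))
print(disk_colour(850, 1000, large_disk_template))
```

Because the colour is derived from the template rather than from a fixed percentage, the same fill level can be critical on one server and unremarkable on another.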

    Lastly (I’m not sure how feasible this is in a dashboard UI), a way to look at the server defaults for each monitored server to find where the data, log, temp and backup directories are, and to display the data by unique drive, would be slick. For example, a production server may have separate data, log, and temp drives while re-using the data drive for backups. A test server may have all of that on a single drive. In the dashboard I would see 3 mini-bars for the production server and only 1 for the test server.

  4. John Clark
    Posted October 16, 2009 at 5:51 pm | Permalink

    I want two things on disk monitoring:

    1) I don’t want to set a limit in percentage cause I don’t think in percentage.. Lets see I have a 750GB disk and I need to know when 100GB is left so that is hmmm bring up calculator…. 100 /750…. lalala you get the idea… I just want to say alert me when I get to this much left 100GB.. or 35GB or whatever on EACH Drive on EACH Server.

    2) I need two alerts the normal low priority… it takes me 2 weeks to get disk added so normally I need to know when I have 100 GB free or less 2 weeks ahead of time.. so let me know and make it a low priority so I can ack and then create request.. The other alert is high priority level o crap my log disk blew past 75GB free and is now down to 10GB make it red and flashing cause someone is doing a single transaction update on 10M rows and I have to fix NOW..

    In shops like mine I need two levels of alerts… I have an OPS group that I would give access to the system and want them to page on RED Flashin alerts 24 hours a day… but on Yellow info alerts order disks.. I don’t even want them to see them…

    Same thing with lots of stuff… Like jobs… some job fails are yellow info ones but others are big RED FLASHING call someone now..

    You gotta be able to break down to this level for some places..

    Sorry for rollin on and on …

  5. Tom
    Posted October 19, 2009 at 10:52 am | Permalink

    Thanks again for all your great feedback!

    Merrill

    Thanks for your very useful comments and great suggestions.
    It’s an interesting problem deciding what should be exposed at the very top level of the interface and what to demote to slightly deeper in the UI. We are considering having a dedicated page for disk usage per SQL Server or machine, that will be accessible when drilling down from an overview page. This would address your requirement for “a whole screen to display” multiple disk information.
    The details regarding databases that may be running out of space in the near future will be available as part of the alert that will be raised for this issue (these types of trending-based alerts are something we’ve been talking about a lot recently, and are hoping to implement for version 2.0). In general, we hope that by including the highest-priority or most recent alerts at the top level, any issue that isn’t automatically displayed as part of the overview data will still be catered for.
    We haven’t investigated file sizes yet – but we do plan to look into this issue.

    Jeff
    For v2, we are planning to implement alerts that automatically update their severity level when thresholds change.
    The dashboard colours will reflect these alert statuses, so that if you have a critical disk issue the disk column should change to red.
    We’ll have a think about how we could show information about the locations for various files for a SQL Server instance; general information about a SQL Server will be displayed on the SQL Server overview screen (per instance).

    John
    Please continue to roll – that’s really useful stuff!
    We are going to allow you to specify the limit in either percentages or GB. GB will probably be the default, and the alert will definitely be configurable per logical disk per server.
    The current plan is to have high, medium and low alerts in v2. It might be worth us having a think whether medium is really necessary. Even if we do implement a three-tiered alert system, you can of course just ignore the middle level and only set things at either “Low” or “High”.
    We totally agree with you about being able to break down to a fine level when configuring! Version 1 of SQL Response already allowed you to configure at a very granular level (per individual job, or per DB for instance) and we plan to maintain this degree of configurability at v2.
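
A minimal sketch of limits given in either GB or percent, with one limit per severity level. The `FreeSpaceLimit` class and `severity` function are hypothetical and do not describe SQL Response's actual configuration; the 750 GB disk and the 100 GB and 75 GB figures echo John's example.

```python
from dataclasses import dataclass


@dataclass
class FreeSpaceLimit:
    value: float
    unit: str = "GB"  # "GB" or "percent" of capacity; GB as the default

    def as_gb(self, capacity_gb):
        """Convert the limit to GB of free space for a disk of the given capacity."""
        if self.unit == "GB":
            return self.value
        if self.unit == "percent":
            return capacity_gb * self.value / 100
        raise ValueError(f"unknown unit: {self.unit}")


def severity(free_gb, capacity_gb, limits):
    """Return the most severe level whose free-space limit has been reached, or None.

    limits maps "Low", "Medium" or "High" to a FreeSpaceLimit; levels may be omitted.
    """
    for level in ("High", "Medium", "Low"):
        limit = limits.get(level)
        if limit is not None and free_gb <= limit.as_gb(capacity_gb):
            return level
    return None


# A 750 GB disk: Low when 100 GB is left, High when 10% (75 GB) is left, no Medium.
limits = {"Low": FreeSpaceLimit(100), "High": FreeSpaceLimit(10, "percent")}

for free_gb in (120, 90, 10):
    print(free_gb, "GB free ->", severity(free_gb, 750, limits))
```

Leaving "Medium" out of the mapping is how this sketch models ignoring the middle level.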

  6. Posted October 20, 2009 at 10:32 pm | Permalink

    G’day Tom – could some element of the UI for different connections be colored according to user-defined colors? It’s a handy feature in SSMS (improved in SSMS Tools Pack): http://blogs.msdn.com/buckwoody/archive/2009/10/19/color-me-informed.aspx

    Cheers, Thomas
