Top 5 Managed Service Provider SLA Metrics to Look For
Service level agreements (SLAs) between your organization and the managed service provider (MSP) that you’re contracting with can be crucial for managing expectations and setting the ground rules for your engagement with them.
With a clearly defined set of SLAs, you know what to expect from your MSP and have a clear list of service goals to hold them accountable for. When working with an MSP, it’s important to verify that their SLAs include relevant metrics that are clear-cut and enforceable.
In this blog, we’ll discuss why tracking SLAs is important and some of the top metrics to look for in an SLA document.
Why Do You Need an SLA with Your Managed Service Provider?
So, why are SLAs a “must-have” for any engagement with a managed service provider? As noted by CIO.com, an SLA “defines the level of service you expect from a vendor, laying out the metrics by which service is measured, as well as remedies or penalties should agreed-on service levels not be achieved.”
The SLA document in your service contract spells out the level of service your MSP guarantees to provide. If you don’t have an SLA, you have no formal agreement on what kind of service the vendor should be providing, and nothing to hold them to. If a disagreement comes up later, you would have a difficult time recovering any losses caused by the vendor’s failure to meet expectations (because no expectations were ever set).
An SLA is your insurance against potential carelessness or neglect by a managed IT services provider. If they don’t measure up to the metrics they set forth in the document, then you can impose the appropriate penalties listed for the violation. A common penalty example is a fee reduction for failing to meet certain incident response deadlines or other important metrics.
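To make the fee-reduction idea concrete, here is a minimal Python sketch of how a service credit might be calculated. The terms are hypothetical and not taken from any real contract: a 5% credit per missed response deadline, capped at 25% of the monthly fee. Your MSP’s SLA will define its own formula, so treat this only as an illustration of the kind of math to look for.

```python
def service_credit(monthly_fee: float, missed_deadlines: int,
                   credit_per_miss: float = 0.05,
                   max_credit: float = 0.25) -> float:
    """Return the fee reduction owed for missed response deadlines.

    credit_per_miss and max_credit are hypothetical contract terms
    (5% of the monthly fee per miss, capped at 25% in total).
    """
    if monthly_fee < 0 or missed_deadlines < 0:
        raise ValueError("fee and miss count must be non-negative")
    # Each miss earns a fixed percentage, but the total is capped.
    rate = min(missed_deadlines * credit_per_miss, max_credit)
    return round(monthly_fee * rate, 2)


print(service_credit(10_000, 3))
print(service_credit(10_000, 8))
```

Under these assumed terms, a $10,000 monthly fee with three missed deadlines yields a $1,500 credit, while eight misses hit the 25% cap ($2,500). The cap is worth checking in a real SLA, since it limits how much protection the penalty clause actually gives you.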
When you engage with an MSP for any kind of IT service, it is incredibly important to review their SLA documents to verify what they’re agreeing to provide to your company. If you’re expecting a specific type of service but don’t see an SLA addressing it, raise the issue before signing the service agreement!
Top 5 SLA Metrics to Look for in a Managed Service Provider
If you’re contracting with an MSP, there are a few different SLAs you’ll want to look for.
The specific SLAs you should expect may vary depending on which IT services you contract the MSP for, but some common examples include:
1. Time to Respond/Response Time
If something happens to your IT network or you send a request to your IT service provider, how quickly will they respond? Long response times create delays in your managed IT services that can be harmful if they cause excessive downtime.
For example, say that a critical IT asset, such as your main database, suffers a major failure that takes it offline. If your MSP’s response time metric for detecting and responding to downtime is under 5 minutes, you can minimize the impact of the downtime. Note that this is a separate metric from time to remediation. Here, the MSP is only guaranteeing how quickly they’ll start responding to a situation, not how quickly the situation will be resolved.
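If you want to check a response time SLA against your own incident records, the comparison is simple. The Python sketch below assumes a hypothetical 5-minute target and that you have timestamps for when an incident was detected and when the MSP began responding; the function names and example times are illustrative, not part of any MSP tooling.

```python
from datetime import datetime, timedelta

# Hypothetical SLA target: the MSP must begin responding within 5 minutes.
RESPONSE_TARGET = timedelta(minutes=5)


def response_time(detected_at: datetime, responded_at: datetime) -> timedelta:
    """Elapsed time between detecting an incident and starting a response."""
    if responded_at < detected_at:
        raise ValueError("response cannot precede detection")
    return responded_at - detected_at


def meets_response_sla(detected_at: datetime, responded_at: datetime,
                       target: timedelta = RESPONSE_TARGET) -> bool:
    """True if the response started within the target window.

    This measures only when work began, not when the issue was resolved.
    """
    return response_time(detected_at, responded_at) <= target


# Example: a database outage detected at 09:00.
detected = datetime(2024, 1, 15, 9, 0)
print(meets_response_sla(detected, datetime(2024, 1, 15, 9, 4)))  # True
print(meets_response_sla(detected, datetime(2024, 1, 15, 9, 7)))  # False
```

Note that the check says nothing about how long the fix took; that is what a separate time-to-restore metric is for.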
2. Time to Notification
How quickly will the service provider notify you of an IT incident and of what they’re doing to resolve it (or whether they’ve already resolved it)? With automated notification solutions, the time to notification for IT incidents should be extremely short: an hour or less. MSPs that assemble incident reports manually might take longer to put together an analysis of the threat and provide details about their remediation efforts.
MSPs might only send immediate alerts for what they consider “major” IT incidents, leaving minor items like routine maintenance and software updates for a regular weekly report with notes such as “updated XYZ firewall tool” or “fixed a programming error in Y database causing added latency.” This way, you aren’t flooded with non-critical notifications every time a minor IT issue crops up.
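The split between immediate alerts and a weekly digest can be pictured as a simple routing rule. This Python sketch is a hypothetical model, not any MSP’s actual system: it assumes just two severity levels, “major” and “minor,” and keeps alerts in lists rather than actually sending them.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    summary: str
    severity: str  # hypothetical severity levels: "major" or "minor"


@dataclass
class NotificationRouter:
    """Sends major incidents immediately and batches minor ones weekly."""
    immediate: list[str] = field(default_factory=list)
    weekly_digest: list[str] = field(default_factory=list)

    def route(self, incident: Incident) -> None:
        if incident.severity == "major":
            # In practice this would page or email the customer right away.
            self.immediate.append(incident.summary)
        else:
            # Held for the regular weekly report.
            self.weekly_digest.append(incident.summary)

    def weekly_report(self) -> str:
        """Render the batched minor items as a simple bulleted report."""
        return "\n".join(f"- {item}" for item in self.weekly_digest)


router = NotificationRouter()
router.route(Incident("Main database offline", "major"))
router.route(Incident("Updated XYZ firewall tool", "minor"))
router.route(Incident("Fixed a programming error in Y database causing added latency", "minor"))
print(router.immediate)
print(router.weekly_report())
```

When reviewing an SLA, the useful question is where the MSP draws the line between the two categories, since that threshold decides which incidents reach you right away.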
3. Time to Restore
After identifying a major IT issue or outright failure, how long does it take the MSP to restore your IT infrastructure to normal operation? This SLA can also vary depending on the specific issue being fixed and which MSP services you’re contracted for.
For example, say your business’s primary data center experiences a sudden corruption of all its files, possibly because of malware or a malfunction in the operating system’s file system. This would make your data unavailable, creating massive productivity problems and compromising your ability to get work done. Depending on the IT services you’ve contracted for, your MSP could:
- Format the Affected Storage Media and Redownload from a Backup. If you have a remote data backup service from your MSP, they can simply reformat the affected storage media to clear it (or remove the compromised drives and replace them entirely) and download the backup of your data. The time to restore can vary depending on how much data needs to be restored and the speed of your internet connection. The more data you need to redownload and the slower your internet, the longer it will take to restore normal operation. For example, if you need to download 47.81 TB of data (the average amount of data an SMB has according to HubSpot) over a 300 Mbps connection (the average needed by a 15-20 person company according to Verizon), it would take about 354 hours, or nearly 15 days, to complete the download (because one terabyte is equal to 8 million megabits). MSPs often shave this time down by only backing up “mission-critical” data: the bare minimum data necessary to keep business operations running.
- Spin Up a Duplicate Production Environment. Some business continuity and disaster recovery (BCDR) solutions involve creating a separate production environment that is a complete copy of your primary data center/server and can be brought online in an emergency. This solution can be activated at the push of a button once the primary data center is rendered unusable, giving it the fastest time-to-restore SLA of any of these recovery options (typically measured in minutes instead of hours or days).
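The transfer-time estimate for restoring from a remote backup can be reproduced with a short Python function. It uses decimal units (1 TB = 8,000,000 megabits) and assumes the connection runs at its full rated speed with no protocol overhead or throttling, so real-world restores will usually take longer than this figure.

```python
def restore_hours(data_tb: float, link_mbps: float) -> float:
    """Hours to download data_tb terabytes over a link_mbps connection.

    Assumes decimal units (1 TB = 8,000,000 megabits) and a link that
    sustains its full rated speed with no protocol overhead.
    """
    if link_mbps <= 0:
        raise ValueError("link speed must be positive")
    megabits = data_tb * 8_000_000
    seconds = megabits / link_mbps
    return seconds / 3600


# Full restore of 47.81 TB over 300 Mbps.
print(round(restore_hours(47.81, 300)))  # 354
# A hypothetical 5 TB "mission-critical" subset over the same link.
print(round(restore_hours(5, 300), 1))  # 37.0
```

Because the time scales linearly with data volume, cutting the restore set down to mission-critical data shrinks the wait proportionally; the 5 TB figure above is an illustrative assumption, not a recommended size.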
4. Scope of Services
What IT assets and services is the MSP providing, exactly? It’s important to get in writing precisely which IT assets and services the MSP is providing, as any assets or services not specifically called out in the SLA document may not be covered by the MSP.
For example, is the MSP providing maintenance services for your company’s servers? If so, what kind of maintenance and which servers? Does the SLA document say “all servers owned by [your company]” or does it list each server individually? If it lists individual servers, would a server added in the future be covered or would you need to renegotiate your managed services agreement? A scope of services SLA helps answer all of these questions so that you don’t have to worry about it.
In many cases, adding more hardware and software to manage would require an adjustment to the scope of services document and possibly incur extra costs for the added labor needed to maintain the new asset.
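One practical way to apply a scope of services SLA is to compare your own asset inventory against what the SLA actually names. The Python sketch below is a hypothetical illustration: the server names are made up, and the `covers_all` flag stands in for a blanket clause such as “all servers owned by [your company].”

```python
def uncovered_assets(inventory: set[str], listed_in_sla: set[str],
                     covers_all: bool = False) -> set[str]:
    """Return assets you own that the SLA does not explicitly cover.

    covers_all models a blanket clause such as "all servers owned by
    [your company]"; otherwise only individually listed assets count.
    """
    if covers_all:
        return set()
    return inventory - listed_in_sla


# Hypothetical server names for illustration.
inventory = {"db-01", "web-01", "web-02"}
listed = {"db-01", "web-01"}
print(uncovered_assets(inventory, listed))  # {'web-02'}
```

Any asset that shows up in the result is one you would need to raise with the MSP, either by adding it to the scope document or by confirming that a blanket clause applies.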
5. System Uptime
How much of the time will your IT assets be available while they’re being managed by the MSP? This SLA often requires a thorough examination of your existing IT assets before the MSP can provide a solid guarantee, since existing wear and tear, presence or lack of backup systems, and other factors may affect the stability (and thus, the availability) of the systems in question. For cloud-based services provided by the MSP, the uptime SLA may be standardized, but for your own internal IT assets, the uptime/availability SLA may vary.
Here, it’s important to watch how many digits follow the decimal point in the uptime percentage, since there’s a world of difference between an uptime SLA of 99% (3.65 days of downtime per year), 99.9% (8.76 hours of downtime/year), and 99.99% (about 52.6 minutes of downtime/year).
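The downtime allowed by an uptime percentage is straightforward arithmetic: multiply the hours in a year by the fraction of time the system may be down. This Python sketch assumes a 365-day year (8,760 hours) and ignores leap years; some SLAs measure availability monthly instead, which changes the per-period budget.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def allowed_downtime_hours(uptime_percent: float) -> float:
    """Maximum yearly downtime permitted by an uptime percentage."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime must be between 0 and 100 percent")
    return HOURS_PER_YEAR * (100 - uptime_percent) / 100


for sla in (99.0, 99.9, 99.99):
    hours = allowed_downtime_hours(sla)
    print(f"{sla}%: {hours:.2f} hours = {hours * 60:.1f} minutes per year")
```

Each extra “nine” cuts the downtime budget by a factor of ten, which is why the difference between 99.9% and 99.99% matters far more than it looks on paper.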
As the recovery options described under Time to Restore show, the specific managed services you have can significantly affect your time-to-restore SLA. In many cases, having a spare production environment is the best way to guarantee that you can return your operations to normal after a major security breach. However, it isn’t proof against all kinds of incidents.
Do you need help protecting your business from cyber threats and setting up a robust IT infrastructure that can weather the storm? Reach out to Converged Technology Group today to get started!