An SLA (Service Level Agreement) is a contract between a service provider and a customer. The contract makes commitments about the quality of the service as perceived by the customer.
There are two main types of SLAs: Customer and Operational. Customer SLAs are the ones sold to customers. Operational SLAs in turn come in two types: OLAs (Operational Level Agreements) and underpinning SLAs. OLAs are internal to the provider; for example, they are the commitments agreed between two departments. Underpinning SLAs, on the other hand, are signed with the provider's suppliers and partners.
SLA Management starts with business commitment. Because SLAs need the full support of the organization and involve multiple functional groups, there should be a common understanding of the vocabulary, methodologies and consequences.
A typical SLA Management process includes the following 5 steps:
1- ) Create the SLA Template
SLA templates define the SLAs that will be negotiated with the customers. They include the service level indicators (SLIs) that will be monitored, such as availability (downtime), reliability (e.g. MTBF, Mean Time Between Failures) and maintainability (e.g. MTTR, Mean Time to Repair).
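As a rough illustration of how these three SLIs relate, the sketch below derives them from a list of incident windows. The function name `sli_summary`, the sample incidents, and the convention of computing MTBF as uptime divided by the number of failures are assumptions made for this example, not something a particular SLA tool prescribes.

```python
from datetime import datetime, timedelta


def sli_summary(incidents, period_start, period_end):
    """Compute availability, MTTR and MTBF for one reporting period.

    incidents: list of (start, end) datetimes, assumed non-overlapping
    and fully inside the period (an assumption of this sketch).
    """
    period = period_end - period_start
    downtime = sum((end - start for start, end in incidents), timedelta())
    uptime = period - downtime
    n = len(incidents)
    return {
        # Share of the period during which the service was up.
        "availability_pct": 100.0 * (uptime / period),
        # Average repair time per incident, in hours.
        "mttr_hours": (downtime / n).total_seconds() / 3600 if n else 0.0,
        # Average uptime between failures, in hours (None if no failures).
        "mtbf_hours": (uptime / n).total_seconds() / 3600 if n else None,
    }


# Hypothetical 30-day period with a 1-hour and a 3-hour incident.
start = datetime(2024, 1, 1)
end = start + timedelta(days=30)
sample_incidents = [
    (datetime(2024, 1, 5, 10), datetime(2024, 1, 5, 11)),
    (datetime(2024, 1, 20, 2), datetime(2024, 1, 20, 5)),
]
summary = sli_summary(sample_incidents, start, end)
```

With these sample incidents the MTTR is 2 hours, the MTBF is 358 hours, and the availability is a little above 99.4%.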
Typically, SLA templates also include service levels such as gold, silver and bronze that indicate the acceptable, offered SLI values. A gold service level may promise 99.99% availability and a 1 hour MTTR, while a silver one may commit to 99.98% availability with a 2 hour MTTR. Service levels SHOULD be decided jointly by the marketing and the operational teams. I have seen cases in which the service levels were decided by the marketing team (most probably using the values that competitors were committing to) and then mandated to the operational teams. The operational teams, however, complained that those values were almost impossible to maintain.
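To see why operations should have a say, it helps to translate an availability percentage into the downtime it actually allows. The snippet below is a minimal sketch; the function name and the 30-day reporting period are assumptions for illustration.

```python
def allowed_downtime_minutes(availability_pct, period_days=30):
    """Downtime budget implied by an availability target over one period."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (100.0 - availability_pct) / 100.0


# Gold and silver targets from the example above, over an assumed 30-day month.
gold_budget = allowed_downtime_minutes(99.99)
silver_budget = allowed_downtime_minutes(99.98)
```

Over 30 days, 99.99% leaves roughly 4.3 minutes of downtime and 99.98% roughly 8.6 minutes. Note that the availability budget and the MTTR target are separate commitments: a single incident repaired within the 1 hour MTTR would still use up far more than the monthly gold availability budget.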
SLAs should be designed with care, as they are directly visible to the customers and have financial impact. Service levels also limit “expectation creep” and set the acceptable targets.
SLA templates have other parameters as well: terms and conditions, penalties, roles and responsibilities, and the calendar, to name a few.
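One possible way to picture a template is as a small data structure that bundles the service levels with the other parameters. The class and field names below are hypothetical, and the penalty, role and calendar values are placeholders rather than recommendations; only the gold and silver figures come from the example above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServiceLevel:
    name: str
    availability_pct: float
    mttr_hours: float


@dataclass
class SlaTemplate:
    service_levels: list
    terms_and_conditions: str = ""
    penalties: dict = field(default_factory=dict)
    roles: dict = field(default_factory=dict)
    calendar: str = "24x7"  # placeholder for the hours the SLA applies to


template = SlaTemplate(
    service_levels=[
        ServiceLevel("gold", 99.99, 1.0),
        ServiceLevel("silver", 99.98, 2.0),
    ],
    # Placeholder entries; real values come from the contract.
    penalties={"availability_breach": "credit defined in the contract"},
    roles={"provider": "operations team", "customer": "service owner"},
)
```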
2- ) Negotiate the SLA
This step mainly belongs to the sales area. Here the provider and the customer work on the SLA templates to construct the “customized” SLA that aligns with the customer’s business. Hopefully, the customer selects a service level that suits its needs. However, customers may (and generally do) want some SLI commitments that do not match any service level in the template. There can be several reasons. For example, the customer may be running a business-critical service for which the offered SLI values are not sufficient. Another example is aligning OLAs and underpinning SLAs with customer SLAs. (I will explain this in a later post.)
Sales may be tempted to agree to any values just to win the customer. We should avoid this situation. All custom service levels should be negotiated with operations before the customer signs the contract.
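The rule above can be sketched as a simple gate: values that match a template level can be signed directly, while anything custom needs explicit sign-off from operations. The function names, the `operations_approved` flag and the level table are assumptions for illustration.

```python
# Template levels as (availability %, MTTR hours), taken from the earlier example.
TEMPLATE_LEVELS = {"gold": (99.99, 1.0), "silver": (99.98, 2.0)}


def matching_level(availability_pct, mttr_hours, levels=TEMPLATE_LEVELS):
    """Return the name of the template level the request matches, if any."""
    for name, (avail, mttr) in levels.items():
        if availability_pct == avail and mttr_hours == mttr:
            return name
    return None


def can_sign(availability_pct, mttr_hours, operations_approved=False):
    """Custom values may only be signed once operations has agreed to them."""
    if matching_level(availability_pct, mttr_hours) is not None:
        return True
    return operations_approved


standard_ok = can_sign(99.99, 1.0)
custom_blocked = can_sign(99.999, 0.5)
custom_ok = can_sign(99.999, 0.5, operations_approved=True)
```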
3- ) Monitor the SLAs
SLAs should be monitored, and the responsible teams should be notified of violations and degradations. After the contract is signed, an SLA instance is created within the SLA Management tool (and the SQM tool, if it is separate). This is the service quality monitoring step, and it is mainly aimed at the provider's operational teams. There may be a customer interface that shows the current accumulated downtime and incident records, but this data is real time, and most providers choose not to expose it to the customer.
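A minimal sketch of the monitoring check might compare the downtime accumulated so far in the period against the downtime budget and raise a degradation warning before the SLA is actually violated. The function name and the 80% warning threshold are assumptions, not settings of any specific SLA or SQM tool.

```python
def sla_status(accumulated_downtime_min, budget_min, warn_ratio=0.8):
    """Classify the current period as ok, degradation, or violation."""
    if accumulated_downtime_min > budget_min:
        return "violation"      # budget exceeded, the SLA is breached
    if accumulated_downtime_min >= warn_ratio * budget_min:
        return "degradation"    # close to the budget, notify operations
    return "ok"


# Example with a budget of 4.32 minutes (99.99% over an assumed 30 days).
for downtime in (1.0, 4.0, 5.0):
    print(downtime, sla_status(downtime, budget_min=4.32))
```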
4- ) Report the SLAs
SLA reports should be generated at the end of each reporting period. Because they may have financial impact, the reports should not be sent to the customer directly from the tool. There should be a control mechanism on the provider side before they are “published”. If the tool supports it, the customer should be given an interface to the SLA tool to see previous SLA reports.
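The control mechanism can be pictured as an approval gate between generating a report and publishing it. The sketch below is one hypothetical way to model that; the class, its fields and the sample values are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class SlaReport:
    customer: str
    period: str
    availability_pct: float
    approved: bool = False
    published: bool = False

    def approve(self):
        """Called after someone on the provider side has reviewed the report."""
        self.approved = True

    def publish(self):
        """Make the report visible to the customer, only if it was reviewed."""
        if not self.approved:
            raise PermissionError("report must be reviewed before publishing")
        self.published = True


report = SlaReport(customer="Example Corp", period="2024-01", availability_pct=99.95)
try:
    report.publish()            # blocked: nobody has reviewed it yet
    blocked = False
except PermissionError:
    blocked = True

report.approve()
report.publish()                # allowed after the review
```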
5- ) Review the SLAs
SLAs and their parameters should be reviewed occasionally and service levels should be fine-tuned.
SLA Management is a complex process that involves multiple tools, organizational units and the customers. There is a lot more to say about it. I will continue writing about SLA Management to explore specific areas in more detail.
Hi Balkan, I would appreciate it if you could define the acronyms like OLA, MTTR, etc.
Hi,
MTTR is Mean Time to Repair. It is a metric that shows your average incident repair time.
MTBF is Mean Time Between Failures. It is a metric that shows how reliable your network is.
OLA is Operational Level Agreement.