This reactive, primitive approach no longer satisfies today's corporate customers. They expect richer reporting capabilities that also cover outages they are not yet aware of.
If the CSP already has an SQM process in place, it can be an enabler for an effective customer SLA management system. Whenever a process name includes the word "Customer", you should be careful: if you show an outage to the customer in a report, you give them grounds to claim a rebate. Therefore, we as CSPs need to be able to adjust outage records after the outage has occurred.
Most SQM systems will not allow you to edit outage events that have already occurred. These events should therefore be exported to another system (NW DWH, EDWH) for further correlation with force majeure outage information.
If the CSP has not invested in an SQM solution before, it could use a Network Analytics (NW DWH) solution instead, but this will not provide all the benefits that an SQM system brings. The best combination is always SQM and Customer SLA Management solutions working together in a fully managed reporting environment.
OK, but how can we correlate a force majeure outage with service outage information coming, for example, from a network probe? Or rather, how can we identify force majeure in the first place? The key component here could be the service management platform where the problem management processes are implemented. SQM systems auto-populate resource-facing incidents for proactive recovery. These incidents can afterwards be inspected for a common root cause, and if the root cause of the outage is not an operator fault, the outage can be added to the force majeure list and excluded from SLA penalty calculations.
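The correlation step can be sketched as follows: once outage events are exported to the warehouse and tagged with a root cause from problem management, downtime counted against the SLA simply excludes the force majeure causes. The record fields and the cause list below are illustrative assumptions, not the schema of any specific SQM or DWH product.

```python
from dataclasses import dataclass

# Hypothetical outage record exported from the SQM system to a
# downstream warehouse (NW DWH / EDWH) for post-hoc correlation.
@dataclass
class Outage:
    incident_id: str
    duration_min: int
    root_cause: str  # filled in by the problem management process

# Root causes outside the operator's control; an assumed list for illustration.
FORCE_MAJEURE_CAUSES = {"power_grid", "natural_disaster", "third_party_cable_cut"}

def adjusted_downtime(outages):
    """Total downtime counted against the SLA, excluding force majeure events."""
    return sum(o.duration_min for o in outages
               if o.root_cause not in FORCE_MAJEURE_CAUSES)

outages = [
    Outage("INC-001", 30, "operator_fault"),
    Outage("INC-002", 120, "natural_disaster"),  # excluded from SLA penalties
]
print(adjusted_downtime(outages))  # 30
```

The key design point is that the raw SQM events stay immutable; the adjustment happens only in the downstream reporting layer.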
As operators mature, they focus more on improving the quality of the services they deliver. As I explained in my previous posts, SQM (sometimes called BSM, Business Service Management) is the key to measuring the quality of the overall service. In SQM, we model the service, and the service is composed of multiple service components. These components can be HLR, Charging, Core Network, Packet Network, Applications, Databases, etc., and each of them is managed by a functional unit in the organization.
In an SQM-less scenario, with a top-down approach, when a service problem is identified at the very top level, the source of the problem has to be tracked down. Identifying the root source of the problem can be time consuming (the departments will most probably blame each other). SQM can pinpoint the source and trigger the necessary actions, which increases effectiveness. But who runs the SQM?
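The pinpointing idea can be illustrated with a minimal service model: the service aggregates the health of its components, and the SQM view surfaces the one that is degrading the service. The component names echo the list above; the statuses and structure are illustrative, not from a real tool.

```python
# Minimal service model sketch: a service is composed of components,
# each owned by a functional unit, and the SQM view pinpoints which
# component is responsible for a service-level problem.
service_model = {
    "Mobile Data Service": {
        "HLR": "OK",
        "Charging": "OK",
        "Core Network": "DEGRADED",
        "Packet Network": "OK",
        "Applications": "OK",
    }
}

def pinpoint(service):
    """Return the components responsible for a service problem."""
    components = service_model[service]
    return [name for name, status in components.items() if status != "OK"]

print(pinpoint("Mobile Data Service"))  # ['Core Network']
```

With this model there is no room for departments to blame each other: the degraded component, and therefore the owning unit, is identified directly.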
According to eTOM, SQM belongs to the Service Management & Operations functional grouping, so as a best practice it is advisable to assign a department to this process. (eTOM does not mandate that it be separate; it could be a role assigned to an existing department. But as we will see, consolidating functions is not effective.)
A new term, Service Operation Center, arises with the introduction of the service quality management concept. A Service Operation Center, or SOC, is an organizational "department" that monitors the quality of the overall service and takes the necessary actions in the case of service degradations and outages. The main data source for the SOC screens will be the SQM. The operators in the SOC continuously monitor the SQM and coordinate with other departments to decrease the MTTR of service outages.
A typical operator has a Network Operation Center, or NOC, inside its organization. The NOC manages the NMS systems, monitors faults and events, tracks the performance of the network and troubleshoots problems at first hand (L1 support). However, as the name implies, the main purpose of the NOC is to manage the network.
The network is only one part of the service. There are other components from IT, and there may even be components from outside the organization, such as content. The NOC's primary responsibility is to deal with complex network-specific problems; it should not be the one communicating with a content provider to resolve a problem.
Because a network service provider's main product is the "network", NOCs have so far been sufficient for overall assurance activities. But as services become more diverse and complex, the SOC concept becomes much more logical.
As the SOC deals with cross-functional teams, it should be sponsored by an upper-level organizational entity to be effective. The SOC should also have the necessary interfaces to the other units (most likely the trouble ticketing system) where strict OLAs are applied. The people in the SOC should include experts in networking, IT and other relevant topics to streamline troubleshooting activities.
An SLA (Service Level Agreement) is a contract between the service provider and the customer. This contract makes commitments about the service quality as perceived by the customer.
There are two main types of SLAs: customer and operational. Customer SLAs are the ones sold to customers. Operational SLAs come in two forms: OLAs and underpinning SLAs. OLAs are internal to the provider; for example, they are commitments agreed between two departments. Underpinning SLAs, on the other hand, are signed with the provider's suppliers and partners.
SLA Management starts with business commitment. As SLAs need the full support of an organization involving multiple functional groups, there should be a common understanding of the vocabulary, methodologies and consequences.
A typical SLA Management process includes the following 5 steps:
1- ) Create the SLA Template
SLA templates define the SLAs that will be negotiated with the customers. They include the service level indicators (SLIs) to be monitored, such as availability (downtime), reliability (e.g., MTBF) and maintainability (e.g., MTTR).
Typically, SLA templates also include service levels, such as gold, silver and bronze, that indicate the acceptable, offered SLI values. A gold service level may promise 99.99% availability and a 1-hour MTTR, while a silver one may commit to 99.98% availability with a 2-hour MTTR. Service levels SHOULD be decided in cooperation between the marketing and operational teams. I have seen examples in which the service levels were decided by the marketing teams (most probably with values the competitors were committing to) and mandated to the operational teams. The operational teams, however, complained that those values were almost impossible to maintain.
SLAs should be designed with care, as they directly interface with the customers and have financial impacts. Service levels also limit "expectation creep" and set the acceptable targets.
SLA templates have other parameters as well: terms and conditions, penalties, roles and responsibilities, and the calendar, to name a few.
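A template with its service levels can be represented as simple data, with a check of measured SLI values against the committed targets. The numbers echo the gold/silver example above and are illustrative, not real commitments.

```python
# Illustrative SLA template: named service levels with committed SLI
# targets (availability in %, MTTR in hours). Values mirror the
# gold/silver example in the text.
SLA_TEMPLATE = {
    "gold":   {"availability": 99.99, "mttr_hours": 1},
    "silver": {"availability": 99.98, "mttr_hours": 2},
}

def meets_level(level, measured_availability, measured_mttr_hours):
    """Check measured SLI values against a service level's commitments."""
    target = SLA_TEMPLATE[level]
    return (measured_availability >= target["availability"]
            and measured_mttr_hours <= target["mttr_hours"])

print(meets_level("gold", 99.995, 0.5))   # True
print(meets_level("silver", 99.97, 1.5))  # False: availability below target
```

Keeping the levels in one shared structure is exactly what forces marketing and operations to agree on a single set of numbers.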
2- ) Negotiate the SLA
This step mainly belongs to the sales area. Here, the provider and the customer work on the SLA templates to construct the "customized" SLA that aligns with the customer's business. In this step the customer, hopefully, selects a service level that suits their needs. However, customers may (and generally do) want SLI commitments that do not match any service level in the template. The reasons can be several. For example, the customer may be running a business-critical service for which the committed SLI values are not sufficient. Another example is the case of aligning OLAs/underpinning SLAs with customer SLAs (I will explain this in a later post).
Sales may agree on any values with the customer in order to win the deal. We should avoid this situation: all custom service levels should be negotiated with operations before the contract is signed by the customer.
3- ) Monitor the SLAs
SLAs should be monitored, and violations and degradations should be notified. After the contract is signed, an SLA instance is created in the SLA Management tool (and in the SQM tool, if it is separate). This is the service quality monitoring step, and it mainly targets the provider's operational teams. There may be a customer interface for viewing the current accumulated downtime and incident records, but this data is real time, and most providers choose not to expose it to the customer.
4- ) Report the SLAs
SLA reports should be generated at the end of each reporting period. The reports should not be sent to the customer directly from the tool, as they may have financial impacts; there should be a control mechanism on the provider side before they are "published". The customer should be given an interface to the SLA tool to view their previous SLA reports (if the tool supports this).
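The control mechanism amounts to a small state machine: reports are generated in a draft state and must be explicitly approved on the provider side before they become visible to the customer. The states and fields here are assumptions for illustration, not a specific SLA tool's API.

```python
# Sketch of the review gate in step 4: reports start as drafts and
# require provider approval before being published to the customer.
def generate_report(period, downtime_min):
    return {"period": period, "downtime_min": downtime_min, "state": "draft"}

def approve(report):
    """Provider-side control step; only drafts can be published."""
    if report["state"] != "draft":
        raise ValueError("only draft reports can be approved")
    report["state"] = "published"
    return report

report = generate_report("2023-10", 12)
approve(report)
print(report["state"])  # 'published'
```

Only "published" reports should ever reach the customer-facing interface; drafts stay internal until the figures (including any force majeure adjustments) are verified.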
5- ) Review the SLAs
SLAs and their parameters should be reviewed periodically, and service levels should be fine-tuned.
SLA Management is a complex process that involves multiple tools, organizational units and the customers themselves. There is a lot to say about SLA Management, and I will continue writing about it to explore specific areas in more detail.