Implementing Service Quality Management
As customers have come to depend more on service providers, they have started demanding commitments on the quality of the services they receive. The SQM (Service Quality Management) concept came into practice to meet this demand. SQM is a set of practices that ensure customers receive the service quality they need. The goal is to manage (negotiate, monitor, report, predict/review, take action on) the quality/level of the services provided to customers.
Service Quality Management is implemented in several steps. I try to apply the following 8-step methodology in the SQM projects that I am involved in.
1-) Decide on the services to be monitored
In this step, we decide which services will be monitored by the SQM process. These are mainly customer-facing services, that is, the services that customers perceive directly. We may also want to monitor the services that we receive from 3PPs.
2-) Design the service model
The services have components, and the components have parameters. Service components also have dependencies between them. Together, these aspects make up the “model” of the service. A service model is a hierarchical structure of elements, which is why it is sometimes called the service tree. Service models are the backbone of SQM. They use the concept of status propagation to determine the impact of a lower-level service component on the service at the topmost level. This analysis through status propagation is also called service impact analysis.
Designing the service model is the most important step of an SQM implementation. It requires domain knowledge and should therefore involve all the functional units that are responsible for managing the different components of a service. Before designing the service model, we should decide on the granularity of the service. This is a very important part of the service design. The granularity of a service determines the number of service components in the service model. As this number grows, the service model becomes more complex. Service models with a large number of service components become unmanageable and may lead to scalability problems in the SQM tools.
After deciding on the level of granularity, we can start “drawing” the model. Most SQM tools provide drag-and-drop service designers for creating new models.
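To make the idea of status propagation more concrete, here is a minimal sketch in Python. It is not how any particular SQM tool works. The class names, the three status levels and the "worst child wins" rule are my own assumptions for illustration; real tools may offer other propagation rules.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import IntEnum


class Status(IntEnum):
    # Illustrative status levels; a higher value means a worse condition.
    OK = 0
    DEGRADED = 1
    VIOLATED = 2


@dataclass
class ServiceNode:
    # Hypothetical element of a service tree (service or service component).
    name: str
    own_status: Status = Status.OK
    children: list[ServiceNode] = field(default_factory=list)

    def propagated_status(self) -> Status:
        """Assumed rule: a node is as bad as its worst descendant."""
        worst = self.own_status
        for child in self.children:
            worst = max(worst, child.propagated_status())
        return worst

    def impact_path(self) -> list[str]:
        """Names from this node down to one component that causes its status."""
        target = self.propagated_status()
        if self.own_status == target:
            return [self.name]
        for child in self.children:
            if child.propagated_status() == target:
                return [self.name] + child.impact_path()
        return [self.name]


# A small, made-up service tree with one degraded low-level component.
dns = ServiceNode("DNS")
radius = ServiceNode("RADIUS", own_status=Status.DEGRADED)
access = ServiceNode("Access network", children=[radius])
broadband = ServiceNode("Broadband Internet", children=[dns, access])

print(broadband.propagated_status().name)  # DEGRADED
print(broadband.impact_path())  # ['Broadband Internet', 'Access network', 'RADIUS']
```

Every extra component adds another node to walk and another status to keep current, which is one way to see why a very fine granularity makes the model harder to manage.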
3-) Decide on the monitoring points and data sources
After designing the service model, the next step is to “feed” it with data collected from several points in the infrastructure. The data sources for SQM will typically be other OSS tools, mainly Performance Management, Fault Management and Trouble Ticketing systems. Active or passive probes that measure end-to-end network and application performance should also be used where available.
The data sources will provide us with raw data, which should be mapped to KPIs. From the KPIs, we may choose to derive secondary parameters (KQIs). Those KPIs/KQIs should then be mapped to the services/service components in the service model.
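The chain from raw data to KPIs to KQIs to service components can be sketched as follows. The counter names, the values and the way the KQI is combined from two KPIs are all made up for illustration; the actual formulas depend on the service and on what the data sources expose.

```python
from typing import Iterable, Optional


def ratio_pct(part: int, whole: int) -> Optional[float]:
    """Percentage KPI from two raw counters; None when there is nothing to measure."""
    if whole == 0:
        return None
    return 100.0 * part / whole


def combined_kqi(kpi_values: Iterable[Optional[float]]) -> Optional[float]:
    """Assumed KQI: share of calls that are both set up and completed."""
    result = 100.0
    for value in kpi_values:
        if value is None:
            return None
        result = result * value / 100.0
    return result


# Raw counters as a Performance Management system might export them (made-up values).
raw = {"call_attempts": 2000, "call_setups": 1960, "calls_dropped": 49}

# Raw data -> KPIs
kpis = {
    "call_setup_success_pct": ratio_pct(raw["call_setups"], raw["call_attempts"]),
    "call_completion_pct": ratio_pct(
        raw["call_setups"] - raw["calls_dropped"], raw["call_setups"]
    ),
}

# KPIs -> KQI
kqi = combined_kqi(kpis.values())

# KPIs/KQIs -> services and service components in the (hypothetical) service model
service_model_feed = {
    "Radio access": {"call_setup_success_pct": kpis["call_setup_success_pct"]},
    "Voice service": {"voice_quality_kqi_pct": kqi},
}

print(kpis)
print(service_model_feed)
```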
4-) Design the data collection
After we define the mappings from the raw data to the model, the next step is to define the rules for polling, aggregation and the no-data policy (what to do when a poll returns no data). Polling intervals determine the granularity of the downtime data, so the intervals should be as short as practical. However, polling too frequently will lead to performance problems, so the interval is always a trade-off.
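The effect of the no-data policy on aggregation is easy to underestimate, so here is a small sketch. The three policy names ("ignore", "zero", "carry_forward") are my own labels, not terms from a specific product, and the sample values are invented.

```python
from statistics import mean
from typing import Optional, Sequence


def aggregate(
    samples: Sequence[Optional[float]],
    no_data_policy: str = "ignore",
    last_known: Optional[float] = None,
) -> Optional[float]:
    """Average one aggregation interval of polled samples.

    A sample of None marks a poll that returned no data.
    """
    if no_data_policy == "ignore":
        # Missing polls are left out of the average.
        values = [s for s in samples if s is not None]
    elif no_data_policy == "zero":
        # Missing polls are treated as the worst case.
        values = [0.0 if s is None else s for s in samples]
    elif no_data_policy == "carry_forward":
        # Missing polls repeat the most recent known value, if there is one.
        values = []
        previous = last_known
        for s in samples:
            if s is None:
                s = previous
            if s is not None:
                values.append(s)
                previous = s
    else:
        raise ValueError(f"unknown no-data policy: {no_data_policy}")
    return mean(values) if values else None


# Four 15-minute polls of an availability KPI (percent), two of them missing.
polls = [100.0, None, None, 70.0]
for policy in ("ignore", "zero", "carry_forward"):
    print(policy, aggregate(polls, policy))
```

With these samples the same hour aggregates to 85.0, 42.5 or 92.5 depending on the policy, which is why the rule should be agreed on before the data is used for thresholds or SLAs.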
5-) Design the thresholds and the actions that will be applied on those thresholds
In order to play a proactive role, the provider should be notified of service level degradations and violations. Thresholds are the tools that enable proactive monitoring. Setting the correct thresholds requires domain and application knowledge. The thresholds should also support the SLAs that will be given to the customers. In SQM, a specific service component parameter may be assigned several levels of thresholds for the same KPI. Typically, there will be one violation threshold and multiple degradation thresholds. The thresholds change the status of the service components, which in turn propagates up to the service level.
Whenever a threshold is breached, an action should be triggered. This action could be as simple as sending a notification via email or as complex as triggering a traffic engineering script. It is good practice to open trouble tickets (performance degradation reports in eTOM terms) in the TT systems to trigger a process that takes corrective action on service quality degradations.
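A minimal sketch of multi-level thresholds with actions attached could look like this. The threshold levels, the limits and the two action functions are hypothetical; here the actions only write to a log list, where a real deployment would call a mail gateway or the TT system.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Threshold:
    level: str      # illustrative names: "degradation-minor", "violation", ...
    limit: float    # breached when the KPI value is at or above this limit
    severity: int   # a higher number means a more severe breach


def evaluate(value: float, thresholds: Sequence[Threshold]) -> Optional[Threshold]:
    """Return the most severe breached threshold, or None (higher value = worse KPI)."""
    breached = [t for t in thresholds if value >= t.limit]
    return max(breached, key=lambda t: t.severity, default=None)


actions_log: list = []


def notify_by_email(component: str, kpi: str, value: float, level: str) -> None:
    # Stand-in for a real notification.
    actions_log.append(f"email: {component} {kpi}={value} ({level})")


def open_trouble_ticket(component: str, kpi: str, value: float, level: str) -> None:
    # Stand-in for opening a performance degradation report in the TT system.
    actions_log.append(f"ticket: {component} {kpi}={value} ({level})")


# Several degradation thresholds and one violation threshold for the same KPI.
LATENCY_THRESHOLDS = [
    Threshold("degradation-minor", 100.0, 1),
    Threshold("degradation-major", 150.0, 2),
    Threshold("violation", 200.0, 3),
]

ACTIONS = {
    "degradation-minor": [notify_by_email],
    "degradation-major": [notify_by_email, open_trouble_ticket],
    "violation": [notify_by_email, open_trouble_ticket],
}


def handle_sample(component: str, kpi: str, value: float) -> str:
    """Evaluate one KPI sample, run the configured actions and return the new status."""
    breached = evaluate(value, LATENCY_THRESHOLDS)
    if breached is None:
        return "ok"
    for action in ACTIONS[breached.level]:
        action(component, kpi, value, breached.level)
    return breached.level


print(handle_sample("Access network", "latency_ms", 160.0))  # degradation-major
print(actions_log)
```

The returned status is what would be set on the service component and then propagated up the service tree.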
6-) Create Customer SLAs and/or OLAs
SLAs are the drivers of SQM, and for a complete solution we should introduce customer SLAs and/or OLAs. SLAs and OLAs are negotiated with the customer, supplier, partner or internal departments. SLA violations will cause penalties. In order to prevent SLA violations, proactive thresholds should also be put on the SLA parameters. Customer SLAs should be supported by OLAs. Therefore, assigning the right thresholds to OLAs is very important.
Thresholds may be assigned to both SLA and OLA parameters. These parameters are sometimes called service level indicators, and they are negotiated and listed in the SLA/OLA contracts.
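As an illustration of a proactive threshold on an SLA parameter, the sketch below tracks monthly availability against a contractual target and a stricter internal warning level. The class, the 99.9% target, the 99.95% warning level and the 30-day period are assumptions chosen for the example, not values from any real contract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvailabilitySla:
    target_pct: float     # contractual commitment; falling below it is a violation
    warning_pct: float    # proactive threshold, stricter than the target
    period_minutes: int   # length of the reporting period

    def availability(self, downtime_minutes: float) -> float:
        """Availability in percent for the period, given the downtime so far."""
        return 100.0 * (self.period_minutes - downtime_minutes) / self.period_minutes

    def status(self, downtime_minutes: float) -> str:
        value = self.availability(downtime_minutes)
        if value < self.target_pct:
            return "violated"
        if value < self.warning_pct:
            return "at risk"
        return "compliant"

    def remaining_downtime_budget(self, downtime_minutes: float) -> float:
        """Minutes of downtime still allowed before the target is breached."""
        allowed = self.period_minutes * (100.0 - self.target_pct) / 100.0
        return allowed - downtime_minutes


# A 30-day period with a 99.9% target and a 99.95% proactive threshold.
sla = AvailabilitySla(target_pct=99.9, warning_pct=99.95, period_minutes=30 * 24 * 60)

for downtime in (10, 30, 50):
    print(downtime, sla.status(downtime), round(sla.remaining_downtime_budget(downtime), 1))
```

The "at risk" state is the point where the provider can still act before a penalty applies. An OLA on a supporting component could be modelled the same way, with thresholds tight enough that meeting the OLAs keeps the customer SLA safe.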
7-) Design the Service Quality Reporting
The service quality data created by the SQM systems is distributed through service quality reports. Different reports should be designed for users with different perspectives. For example, an executive summary report may include statistics on the total number of SLA violations and degradations. While summary reports may provide enough detail for upper management, other departments in the organization may be interested in more detailed, technical reports.
The reports should be distributed automatically on a daily/weekly/monthly basis. Automatic distribution should be internal only; SQM reports should not be sent to the customer directly without a control mechanism.
8-) Get the commitment from the organization for the SQM
SQM should be run by separate functional groups, namely service operating centers (SOCs). The SOCs have end-to-end visibility of the services and are customer aware. The SOC constantly monitors the services, just as the NOC monitors the resources. Whenever a service problem or degradation is detected, the SOC should take responsibility and coordinate the necessary actions. This may involve communicating with the NOC (or IT) to resolve the resource problems.
SQM is beneficial, but it also brings more stress and extra work for the operational staff, which will lead to resistance to change. That is why SQM requires commitment from the upper levels of the organization. SQM also brings additional vocabulary, so the necessary training should be provided to the operational teams to avoid confusion.
There are several products on the market that do SQM. Each of them has strengths and weaknesses. As we can see from the 8-step methodology, rolling out an SQM implementation can be a very time-consuming process. Therefore, the tools that support the SQM process should be flexible, scalable and easy to implement. For example, one tool may require you to “compile” the service model before you start monitoring it. This will require vendor involvement, or you will need to maintain more skilled staff. From the scalability perspective, a tool may provide a very rich service quality monitoring dashboard but lack scalability in the total number of managed objects in the service tree or in the number of service instances. Selecting the right product is the key to a successful implementation.