“Smarter” Mediation Platforms to Achieve Customer Experience Management

Oct 17, 2015
 

Today, we will revisit an important function in the telecommunications OSS/BSS space: Mediation.

As we know, the primary purpose of legacy mediation has been to collect usage-related files from the network, extract data from them, perform transaction-based enrichments and output the results to other BSS platforms (primarily billing and charging systems, but more recently business intelligence and other applications). This limited usage of mediation technology is changing.

The mediation layer has been with us since the beginning of the telecommunications industry. From the very start, its primary technological requirement was to maintain scalability, availability and integrity so that the provider would not lose any money. To cope with the increasing number of subscribers, evolving network technologies and more diversified network services, mediation vendors have continually had to challenge themselves to improve their hardware and software processing capabilities.

Some mediation vendors stopped at this point, once getting CDRs from the network elements and passing them to the northbound platforms was achieved as efficiently as possible. However, the data exchanged between these platforms can provide Communications Service Providers (CSPs) with invaluable insights beyond just billing and charging. That is because the data relates to the end customer and, with some additional correlation and enrichment, it can provide premium input to any Customer Experience Management and network intelligence initiative too. So, if you have a very fast, highly configurable and scalable mediation platform, why not utilize it for this purpose and more?

The mediation vendors who have seen this opportunity have re-oriented their product cores to accommodate functionality that is increasingly directed at adding value at the customer layer. (More limited vendors have integrated their software with third parties to try to come up with solutions that achieve the same goal, but honestly that is not the same thing.) What the more innovative vendors have done is essentially to add advanced programmatic and configurable correlation capabilities on top of the performance-proven data collection layer. They have also improved their northbound interfaces, as their "clients" will no longer just be billing and charging systems; rather, they will be feeding other IT systems that expect to communicate in an IT-like way. These include CRM platforms (through APIs), CEM platforms, Performance Management platforms and customer self-care portals. All of these, and others, will require a new generation of database feeds (such as Big Data ones), APIs and Web Service integrations.

The smartest mediation companies have also expanded their reach by supporting real-time data flows. We know that traditional mediation utilizes a store-and-forward mechanism (batch processing) which introduces delays. These delays are acceptable for postpaid billing purposes, but if the CSP wants real-time information about its customers (and, moreover, to act on the findings that information reveals), store-and-forward is ineffective. If you want to execute real-time actions on data, you need to be able to process data-in-motion. Injecting dynamic processing logic into the collection layer is one way to achieve this, and mediation has become an important answer to the question of "how?"
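To make the contrast concrete, here is a minimal Python sketch of the two modes. All names (notify_cem, the event fields) are hypothetical illustrations, not any vendor's API:

```python
import time
from typing import Dict, Iterator, List

def notify_cem(event: Dict) -> None:
    """Stand-in for a real-time action, e.g. alerting a CEM platform."""
    print(f"real-time action for subscriber {event['subscriber']}: {event['issue']}")

def store_and_forward(batch: List[Dict]) -> None:
    """Batch mediation: records accumulate and are handed over later.
    Fine for postpaid billing, too slow for acting on the customer experience."""
    time.sleep(1)  # stands in for the collection window / file rotation delay
    print(f"forwarding {len(batch)} records to billing")

def process_in_motion(stream: Iterator[Dict]) -> None:
    """Streaming mediation: each event is evaluated the moment it arrives."""
    for event in stream:
        if event.get("issue"):  # dynamic logic injected at the collection layer
            notify_cem(event)

store_and_forward([{"subscriber": "A"}, {"subscriber": "B"}])
process_in_motion(iter([
    {"subscriber": "A", "issue": None},
    {"subscriber": "B", "issue": "dropped_call"},
]))
```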

To elaborate more on what’s going on in this area, I will talk about a vendor whose software supports all the “smart” features that I mentioned above.

DigitalRoute, a Swedish company with over 350 telco customers worldwide, positions itself as an OSS/BSS mediation and policy software provider/ISV, and its newest "smart mediation" product is called OSS Mediation. The product is built on a technology with a relatively long history: the company has 16-plus years' experience under its belt, and the product has now reached version 7.2.

Its technology conforms to TMForum standards, and DigitalRoute frequently participates in Catalyst projects to contribute to standardization studies. Apart from its standard mediation functionality, its MediationZone base platform technology has some interesting features that make it stand out. Here are some of the unique, "smart" features that are worth elaborating on.

Workflow Capabilities

OSS Mediation is built around a workflow mechanism that is primarily used for designing (through configuration) the required data exchange paths. It provides a graphical workflow environment where the system user can drag & drop the flow elements and associate them in sequence with each other. The flow elements themselves are called "agents", and different kinds of agents exist to handle different functions such as collection, analysis, or distribution. An example: Aggregation Agents can aggregate, correlate and consolidate different flavors of data sources. If you want to combine two sources to output an aggregate result (such as one addressing a KPI), you insert an Aggregation Agent, or even the company's own proprietary KPI Management agent, into the workflow. If you want that KPI written to a database, you add an agent of type Forwarding Agent to the workflow sequence. Agents can be easily configured by the system user to reflect the required business rules. This kind of flow design definitely delivers faster time-to-market.
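To illustrate the idea (DigitalRoute's actual configuration is graphical; the class and field names below are hypothetical), a workflow can be thought of as a chain of agents, each consuming the previous agent's output:

```python
from statistics import mean

class CollectionAgent:
    """Collects raw records from a source (here: an in-memory list)."""
    def __init__(self, records):
        self.records = records
    def run(self, _=None):
        return self.records

class AggregationAgent:
    """Aggregates raw records into a KPI, e.g. average session throughput."""
    def run(self, records):
        return {"avg_throughput_mbps": mean(r["throughput_mbps"] for r in records)}

class ForwardingAgent:
    """Sends the result northbound (here: prints; in practice a DB or an API)."""
    def run(self, kpi):
        print("forwarding KPI:", kpi)
        return kpi

workflow = [
    CollectionAgent([{"throughput_mbps": 4.0}, {"throughput_mbps": 6.0}]),
    AggregationAgent(),
    ForwardingAgent(),
]

data = None
for agent in workflow:  # agents execute in the configured sequence
    data = agent.run(data)
```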

Real Time Support

OSS Mediation handles real-time data sources. As noted, the product is designed to cope with both batch data (CDRs, XDRs, logs, etc.) and real-time data flows (active/passive network probes, AAA systems; basically, socket-based performance data from any network node). The union of batch and real-time workflows is managed, by proprietary DigitalRoute technology, in two different ways according to the user's preference. The alternatives are:

  • Inter Workflow Agents: different data streams are combined by an agent which is configured to integrate a real-time data feed into a batched output format. Handling convergence this way can probably be described as the "traditional" approach.
  • Workflow Bridge: This proprietary DigitalRoute technology works the other way around (in a manner of speaking), incorporating batched feeds into real-time output flows. It enables high-speed connectivity between different workflows and allows large volumes of data to be processed simultaneously, i.e. it can be used for scaling out across several CPUs. The advantages of this approach include high availability due to using multiple batch receivers, which avoids other, less reliable endpoints. A batched workflow can also be seen as a service that is available to other workflows, making it easier to move data to its final destination despite multiple processing services. (A conceptual sketch of this batch/real-time convergence follows below.)
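As a rough conceptual sketch of the second approach (nothing here reflects DigitalRoute internals), batched records can be injected into the same output flow that real-time events travel through:

```python
import queue
import threading
import time

out = queue.Queue()  # the shared real-time output flow

def realtime_workflow():
    """Pushes events as they arrive, e.g. from a probe socket."""
    for i in range(3):
        out.put({"src": "probe", "seq": i})
        time.sleep(0.1)

def batch_workflow():
    """Drains a batch source and feeds it into the same flow."""
    for i in range(2):
        out.put({"src": "cdr_file", "seq": i})  # batched data joins the stream

threading.Thread(target=realtime_workflow).start()
threading.Thread(target=batch_workflow).start()

for _ in range(5):
    print(out.get())  # the downstream consumer sees one converged flow
```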

It is also worth mentioning that OSS Mediation can be configured to integrate with all well-known network probe vendors, which reduces its deployment and development times dramatically. Combined with the workflow capabilities described above, data-in-motion can easily be converted into valuable real-time actions.

Data Persistence and Summarization

With its advanced data summarization capabilities, OSS Mediation is a good candidate to be an alternative to a probe analytics platform. It also has a KPI Management layer that allows the user to define individual KPIs that are then collected into tree-like structures. This means the user can create what can be thought of as "smart" or proactive (rather than reactive) KPIs. The support for multi-dimensional structures enables OSS Mediation to be utilized as an SQM or Customer Experience Management platform that is able to build KQIs all the way up from raw data.
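One way to picture tree-structured KPIs, sketched below under the (assumed) convention that a parent KQI is dominated by its weakest child KPI:

```python
class KPINode:
    """A node in a KPI tree: leaves carry measured values, inner nodes
    derive theirs from their children (here: the minimum, the weakest link)."""
    def __init__(self, name, value=None, children=()):
        self.name, self._value, self.children = name, value, list(children)

    @property
    def value(self):
        if self.children:
            return min(child.value for child in self.children)
        return self._value

# A KQI for a service, built up from lower-level KPIs (all figures invented)
kqi = KPINode("mobile_data_experience", children=[
    KPINode("radio_quality", value=0.92),
    KPINode("core_latency", value=0.88),
    KPINode("dns_success", value=0.99),
])
print(kqi.name, "=", kqi.value)  # 0.88, dominated by the weakest KPI
```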

Big Data & Cloud Integrations

Since network data volumes are not only huge but also fast-growing, it is no longer really feasible to store data on expensive disks such as SANs. That is why most CSPs are investing in Big Data technologies to store their network-related data. OSS Mediation has built-in integration with these technologies, for example Hadoop. It also has pre-integration with cloud platforms such as Amazon Web Services. With DigitalRoute pre-processing in front of Big Data, Big Data solutions can easily become "Smart Data" enabled.
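As a hedged illustration of what such a feed might look like (the bucket, key and record fields are invented; this is not DigitalRoute's integration), pre-processed records could be pushed to Amazon S3 as newline-delimited JSON for a Hadoop/EMR cluster to pick up:

```python
import json
import boto3  # pip install boto3; AWS credentials assumed to be configured

def forward_to_s3(records, bucket, key):
    """Push pre-processed ("smart") records to S3 for downstream Big Data use."""
    body = "\n".join(json.dumps(r) for r in records)  # newline-delimited JSON
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=body.encode("utf-8"))

forward_to_s3(
    [{"cell_id": "C-1042", "kpi": "avg_throughput_mbps", "value": 5.2}],
    bucket="csp-network-data",             # hypothetical bucket
    key="mediation/2015/10/17/kpis.json",  # hypothetical key layout
)
```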

Integration & Alerting Support

Since OSS Mediation is also a mediation product, it comes with lots of off-the-shelf integrations with network devices, probes, EMS systems, billing and charging systems, etc. On top of these, OSS Mediation also provides a RESTful Web Services layer which enables it to integrate with other platforms, such as those that handle Customer Experience Management. It also comes standard with some industry-approved alerting options such as SNMP.
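For a flavor of what a RESTful integration could look like from the consuming side (the endpoint path and payload schema below are invented, not a documented OSS Mediation API):

```python
import requests  # pip install requests

def push_alert(alert: dict, base_url: str) -> None:
    """POST a mediation-detected event to a downstream platform's REST API."""
    resp = requests.post(f"{base_url}/api/v1/alerts", json=alert, timeout=5)
    resp.raise_for_status()  # surface HTTP errors instead of failing silently

push_alert(
    {"severity": "major", "kqi": "video_streaming", "value": 0.71},
    base_url="https://cem.example-operator.com",  # hypothetical CEM endpoint
)
```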

OSS Mediation from DigitalRoute represents a very good example of what the next generation of mediation platforms will look like. Today's "customer aware" Communication Service Providers will definitely want to utilize products like OSS Mediation to understand their customers' usage behaviors. Armed with end-to-end experience insight, service providers will eventually enhance the quality of the experience they deliver, and that will lead to increased profits and reduced churn.

More detailed information about OSS Mediation can be found at the links below.

Learn more about DigitalRoute’s OSS Mediation by clicking here

Download an analyst report by Stratecast at no charge, “The Platform of the Future,” here

View a deployed OSS Mediation for Customer Experience Use Case here

SIEM

Nov 30, 2013
 

SIEM stands for "Security Information and Event Management" and it is a well-known OSS system in the security world. It is not very visible to other domains because it has been used mainly for internal purposes.

Today, I am going to talk about what SIEM is and elaborate on its possible uses.

Every system produces logs: servers (VMs, hypervisors, physical hosts), routers, switches, access points, firewalls, IPSs. These logs can be huge. To process all of them we would have to talk in big data terms, but that is not today's topic. So let's narrow the scope to the access-log level.

The access logs on a system (login/logout requests, password change requests, firewall-dropped access) should be collected for security purposes. Who connected where? Who hit an exception in the login process? Who is sweeping the IP addresses in the network? Who was dropped at the firewall/IPS? (The "who" portion can be resolved to the real identity of the user via integration with AD/LDAP.)

The SIEM system’s, first goal is the store these logs in the most effective way. Since the log data can be high in terms of volume and velocity,the archiving system should be a specialized one and utilize fast disks and/or in-memory databases.

After the log collection, the SIEM’s second goal can be achieved: Log correlation.

In the log correlation phase, the SIEM system correlates logs from multiple sources and combines them under a single incident. The correlation can be rule-based or topology-based. For example, the SIEM system can take a connect event and look up the destination IP address in a blacklist database (a C&C control center DB, etc.). If there is a match, an alarm is created in the system, which can be forwarded to trouble ticketing (TT) or fault management (FM) systems. Or it can directly generate an alarm in the case of a login failure on a critical system. These are good candidates for rule-based correlation.
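A minimal sketch of such stateless, rule-based checks (field names and IPs are illustrative):

```python
BLACKLIST = {"203.0.113.7", "198.51.100.23"}  # e.g. known C&C addresses

def rule_based_check(event: dict):
    """Apply simple stateless rules to one event; return an alarm or None."""
    if event["type"] == "connect" and event["dst_ip"] in BLACKLIST:
        return {"severity": "critical", "reason": "connect to blacklisted host", **event}
    if event["type"] == "login_failure" and event.get("critical_asset"):
        return {"severity": "major", "reason": "login failure on critical system", **event}
    return None

alarm = rule_based_check({"type": "connect", "src": "10.0.0.5", "dst_ip": "203.0.113.7"})
if alarm:
    print("forward to TT/FM:", alarm)
```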

A topology-based example could be: if three failed-login alarms are received from the same system, assign a score to that server. If the same system also had a firewall block on the same day, increase the score. If more than two of the servers in a specific server farm have dropped to low scores, generate an alarm. This is a simple example of a stateful pattern that can be identified by SIEM.
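One way to read that scoring logic as code (the thresholds and weights are invented for illustration):

```python
from collections import defaultdict

scores = defaultdict(int)  # per-server risk score, kept across events (stateful)

def on_event(event: dict) -> None:
    """Accumulate per-server scores from individual security events."""
    if event["type"] == "login_failure":
        scores[event["host"]] += 1
    elif event["type"] == "fw_block":
        scores[event["host"]] += 2  # a same-day firewall block enriches the score

def farm_alarm(farm_hosts, threshold=3, min_hosts=2) -> bool:
    """Alarm if more than `min_hosts` servers in the farm crossed the threshold."""
    return sum(scores[h] >= threshold for h in farm_hosts) > min_hosts

for e in [{"host": "db1", "type": "login_failure"}] * 3 + [{"host": "db1", "type": "fw_block"}]:
    on_event(e)
print("db1 score:", scores["db1"], "| farm alarm:", farm_alarm(["db1", "db2", "db3"]))
```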

(By the way, some operators have no desire to generate security-related alarms. The main driver for a SIEM could also be reducing the logs to a humanly manageable level for further review by the security staff.)

The possibilities are limitless, but managing and maintaining this mechanism could be a burden for the security department.

I see two problems with SIEM investments. The first is maintenance. Security personnel are generally not familiar with OSS platforms and their integrations. They can provide the rules but will not be able to (or will not have time to) implement them on the SIEM. So they will rely on the OSS department (if any), which will know nothing about security. The resulting miscommunication may lead to problems and underutilization of the investment. A solution to this problem could be outsourcing the implementation to the vendor. However, vendor solutions are generally "empty" in terms of predefined rules, so each rule must be built from scratch, and the cost of the implementation can grow dramatically.

The second problem is overlapping functions. For example, say your goal is to be notified when a system tries to connect to a well-known bot center. This requirement can be met by a SIEM, but also by other, "cheaper" security components. Or, if you already have an SQM and your topology-based correlation requirements are modest, why not consider using that?

When investing in a SIEM, you should consider whether you will be able to fully utilize the system, as this OSS component is generally not a cheap one.

Mobile Device Agents

Mar 22, 2012
 

Today, I want to talk about a new trend that seems to have popped up in the SQM/CEM field: Mobile Device Agents.

Mobile device agents are software components that reside on user devices and collect statistics about the quality of the user experience, enabling the operator to act upon service degradation. The operator can also correlate the same data with service quality data to plan future service improvements.

The term "device agent" is fairly new to the mobile industry. However, this is not the case for fixed line. In fixed-line networks, operators have been collecting metrics about the delivered end-to-end service for years. These metrics are collected from CPE (Customer Premises Equipment) devices, mostly routers and L2 switches, that reside on the customer premises. Ideally, but not necessarily, these devices are also managed by the operator, taking the name Managed CPE. By utilizing data coming from these CPEs, operators are able to measure not only core network health but also the access side.

In order to increase user-perceived quality, service providers continuously seek new data sources that will give clues about the customer's service perception. Customer usage data can be collected in several places:

– Probe systems
– DPI systems
– Device Agent systems

Probe and DPI systems can provide statistics such as the most-visited URLs and throughput/speed figures that give clues about service usage. Probe systems can additionally provide call drop statistics and catch device configuration errors.

Device agents can do both. But they also provide device-related information such as signal strength or battery status. They can even tell which software, along with its version, is installed on the phone.

If we collect all this data (usage + device + signalling) and correlate it successfully, we can do lots of customer-experience-related analysis with it. We can detect a usage drop for a specific service from the DPI system and correlate this with mobile phone configuration errors. Dropped calls can be correlated with device battery information to see if the drop occurred because of a device problem. In cases where the operator has not invested in DPI and probe systems, the device agent system alone can provide all that data.
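A tiny sketch of that last correlation (all field names and the 5% battery threshold are invented):

```python
def correlate_dropped_calls(call_events, device_stats):
    """Join dropped-call events with agent-reported battery levels to
    separate suspected device-side problems from network-side ones."""
    battery = {d["device_id"]: d["battery_pct"] for d in device_stats}
    for call in call_events:
        if call["dropped"]:
            pct = battery.get(call["device_id"])
            cause = "device (battery?)" if pct is not None and pct < 5 else "network"
            yield {**call, "suspected_cause": cause}

drops = correlate_dropped_calls(
    [{"device_id": "D1", "dropped": True}, {"device_id": "D2", "dropped": True}],
    [{"device_id": "D1", "battery_pct": 3}, {"device_id": "D2", "battery_pct": 80}],
)
for d in drops:
    print(d)
```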

But why are device agents not so popular? The first answer is privacy. Most people will not want agents on their phones sending their usage patterns elsewhere. There are not many regulations around this yet, but we should expect to see them soon.

The second answer is more technical. The agents consume processing power and drain the battery quickly. To avoid this, agents should not always be online, and collected statistics should be uploaded at relatively long intervals (a couple of hours). Data that arrives that late cannot be utilized by SQM systems, so it can only be used for after-the-fact correlation and planning purposes.

Device agents use a push mechanism and upload their statistics to a central server where further correlation and reporting functions can be executed. However, for the reasons I have given, they cannot be the real-time data sources that most SQM/CEM systems require.

Who are the Service Owners?

Dec 28, 2011
 

As we enter the challenging environment of service management, one question arises naturally: we are managing the services, but who owns them within the organization? The answer should be obvious, but surprisingly it is not that simple for most operators.

In a typical SQM / service management project, we have to interface with the service owners. Service owners are the people who are accountable for the service and know every detail of it. They are accountable for the technical performance of the service as well. Service owners appear in IT CAB meetings, new product design processes, and so on. Compared to product managers, they are more technically involved in the service.

Readers who are familiar with ITIL terminology will recognize the abbreviation "RACI". RACI stands for Responsible, Accountable, Consulted, Informed, and it is used to describe the roles of the stakeholders within an IT organization. RACI is mostly applied to IT processes; however, it can be applied to products and services as well. According to ITIL:

– Responsible is the person who “does” the service. He/she is the person who executes the process/service.
– Accountable is the person who “owns” the service. This person is also called the service owner in different contexts.
– Consulted is the person who "knows" about the service. The consulted person provides feedback about the service execution and is consulted in certain cases by the service responsible or accountable. The communication is bi-directional.
– Informed is the person who is “kept in the loop”. Informed people do not provide feedback to the other parties in the RACI.

Now, let's look at who could take the roles defined by RACI.

Service performance impacts product performance; therefore, the product managers should not be accountable for it. Instead, they fit the consulted role: product management should be consulted about changes that will be applied to the service, as these directly impact product performance.

The responsible party would be operations, who run the service. They are the people who fulfill, assure and bill it. As you can see, there are multiple groups that are responsible for the service.

But who is the person accountable for the service? It seems there is a role missing from the chain. There need to be "service managers" who are the real owners of the service in context. The service managers could be a separate functional group, or they may carry these duties as additional responsibilities. This layer can be seen as a functional group that overlays the current functional organizational structure.
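To make the resulting assignment tangible, here is one way the RACI for a single service could be captured; the service and group names are purely illustrative:

```python
# Illustrative RACI assignment for one service, per the argument above
raci = {
    "mobile_broadband": {
        "responsible": ["fulfillment_ops", "assurance_ops", "billing_ops"],
        "accountable": "service_manager",  # the role this post argues is missing
        "consulted": ["product_management"],
        "informed": ["noc", "soc"],
    }
}
print(raci["mobile_broadband"]["accountable"])
```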

Each service component in the service tree can also be a sub-service, either customer-facing or resource-facing. These services should also have owners; for the technical, resource-facing ones, these would typically be individual departments within operations. But for the customer-facing services, the "service manager" would again be required as the service authority.

It is important to differentiate the SOC (Service Operation Center) from the service owners. Service Operation Centers are a sub-function of the monitoring function: they watch service performance and orchestrate the necessary actions for service continuity. They watch OLAs and can notify the responsible departments about violating conditions (but they do not push or override those departments' managers in any way).

In today's operator environment, it seems that the assurance departments (NOCs or SOCs) have come to own the services naturally. This is not the right place for ownership, as these departments do not have the authority to make cross-organizational decisions. The "Service Manager" role will become more and more important as operators become more and more customer-centric.

From SQM to Customer SLA Management

Nov 10, 2011
 
Customers will expect you to deliver the quality of service you committed to in the presales phase. Most operators, however, fall short of delivering on this expectation, and service outages occur all the time. Most operators that do not trust their current network do not implement any customer SLA management process. Lacking an end-to-end view, these operators just accept trouble tickets coming from the customer side. And since most telecommunications regulatory bodies enforce customer SLA management processes, these CSPs calculate SLAs based on customer-facing trouble ticket information.

This reactive, primitive approach is no longer welcomed by today's corporate customers. These customers expect richer reporting capabilities that also cover outages they are not aware of.

If the CSP has an existing SQM process, it can be an enabler for an effective customer SLA management system. When a process name includes "customer" in it, you should be careful: if you show an outage to the customer in a report, you give them the option to claim a rebate. Therefore, we, as CSPs, need to be able to adjust outage records after the outage has already occurred.

Most SQM systems will not allow you to edit outage events that have already occurred. Therefore, these events should be exported to another system (a network DWH or EDWH) for further correlation with force majeure outage information.

If the CSP has not invested in an SQM solution before, it could utilize a Network Analytics (NW DWH) solution, but this will not provide all the benefits that an SQM system brings. The best combination will always be SQM and customer SLA management solutions working together in a fully managed reporting environment.

OK, but how can we correlate a force majeure outage with service outage information coming from, say, a network probe? Or rather, how can we identify the force majeure in the first place? The key component here could be the service management platform where the problem management processes have been implemented. SQM systems will auto-populate resource-facing incidents for proactive recovery. These incidents can afterwards be inspected for a common root cause, and if the root cause of the outage is not an operator fault, it can be placed on the force majeure exclusion list.
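A small sketch of that exclusion step (the systems, field names and cause labels are invented):

```python
def split_sla_outages(outage_events, incident_root_causes, force_majeure_causes):
    """Separate force majeure outages from those counted toward customer SLAs."""
    counted, excluded = [], []
    for outage in outage_events:
        cause = incident_root_causes.get(outage["incident_id"], "unknown")
        (excluded if cause in force_majeure_causes else counted).append(outage)
    return counted, excluded

counted, excluded = split_sla_outages(
    [{"incident_id": "INC-1", "minutes": 42}, {"incident_id": "INC-2", "minutes": 7}],
    {"INC-1": "third_party_fiber_cut", "INC-2": "router_misconfiguration"},
    force_majeure_causes={"third_party_fiber_cut", "flood", "earthquake"},
)
print("counted toward SLA:", counted)
print("force majeure excluded:", excluded)
```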