“Smarter” Mediation Platforms to Achieve Customer Experience Management

Oct 17, 2015
 

Today, we will revisit an important function in the telecommunications OSS/BSS space: Mediation.

As we know, the primary purpose of legacy mediation has been to collect usage-related files from the network, extract the data in them, perform transaction-based enrichments and pass the results on to other BSS platforms (primarily billing and charging systems, though more recently business intelligence and other applications as well). This limited use of mediation technology is changing.

The mediation layer has been with us since the beginning of the telecommunications industry. From the very start, its primary technological requirement has been to maintain scalability, availability and integrity so that the provider does not lose any money. To cope with growing subscriber numbers, evolving network technologies and increasingly diverse network services, mediation vendors have continually had to challenge themselves to improve their hardware and software processing capabilities.

Some mediation vendors stopped at this point: getting CDRs from the network elements and passing them to the northbound platforms as efficiently as possible. However, the data exchanged between these platforms can give Communications Service Providers (CSPs) invaluable insights beyond billing and charging. That is because the data relates to the end customer, and with some additional correlation and enrichment it can provide premium input to any Customer Experience Management or network intelligence initiative too. So, if you have a very fast, highly configurable and scalable mediation platform, why not use it for this purpose and more?

The mediation vendors who have seen this opportunity have re-oriented their product cores to accommodate functionality increasingly directed at adding value at the customer layer. (More limited vendors have integrated their software with third parties to try to come up with solutions that achieve the same goal, but honestly that is not the same thing.) What the more innovative vendors have done, essentially, is add advanced programmatic and configurable correlation capabilities on top of the performance-proven data collection layer. They have also improved their northbound interfaces, as their “clients” will no longer be just billing and charging systems but other IT systems that expect to communicate in an IT-like way: CRM platforms (through APIs), CEM platforms, Performance Management platforms and customer self-care portals. All these, and others, will require a new generation of database feeds (such as Big Data ones), APIs and Web Service integrations.

The smartest mediation companies have also expanded their reach by supporting real-time data flows. Traditional mediation uses a store-and-forward mechanism (batch processing), which introduces delays. These delays are acceptable for postpaid billing, but if the CSP wants real-time information about its customers (and, moreover, wants to act on what that information reveals), store-and-forward is ineffective. If you want to execute real-time actions on data, you need to be able to process data-in-motion. Injecting dynamic processing logic into the collection layer is one way to achieve this, and mediation has become an important answer to the question of “how?”

To elaborate more on what’s going on in this area, I will talk about a vendor whose software supports all the “smart” features that I mentioned above.

DigitalRoute, a Swedish company with over 350 telco customers worldwide, positions itself as an OSS/BSS mediation and policy software provider/ISV, and its newest “smart mediation” product is called OSS Mediation. The product is built on technology with a relatively long history: the company has 16-plus years of experience under its belt, and the product version has now reached 7.2.

Its technology conforms to TM Forum standards, and DigitalRoute frequently participates in Catalyst projects to contribute to standardization studies. Apart from its standard mediation functionality, its MediationZone base platform has some interesting features that make it stand out. Here are some of the unique, “smart” features worth elaborating on.

Workflow Capabilities

OSS Mediation is built around a workflow mechanism that is primarily used for designing (through configuration) the required data exchange paths. It provides a graphical workflow environment where the system user can drag & drop flow elements and chain them in sequence. The flow elements themselves are called “agents,” and different kinds of agents exist to handle different functions such as collection, analysis or distribution. An example: Aggregation Agents can aggregate, correlate and consolidate different flavors of data sources. If you want to combine two sources into an aggregate result (such as one addressing a KPI), you insert an Aggregation Agent, or even the company’s own proprietary KPI Management agent, into the workflow. If you want that KPI written to a database, you add an agent of type Forwarding Agent to the workflow sequence. Agents can easily be configured by the system user to reflect the required business rules. This kind of flow design definitely delivers faster time-to-market.
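
To make the agent/workflow idea concrete, here is a minimal, hypothetical Python sketch. The class names mirror the agent types described above but are purely illustrative; they are not DigitalRoute APIs.

```python
# Hypothetical sketch of the agent/workflow pattern. These classes are
# illustrative only and are NOT DigitalRoute APIs.

class CollectionAgent:
    """Pulls raw records from a source (here, a static list)."""
    def __init__(self, records):
        self.records = records

    def run(self):
        yield from self.records


class AggregationAgent:
    """Correlates/consolidates records into a KPI, here an average."""
    def __init__(self, field):
        self.field = field

    def run(self, records):
        values = [r[self.field] for r in records]
        yield {"kpi": f"avg_{self.field}", "value": sum(values) / len(values)}


class ForwardingAgent:
    """Writes results to a destination (stdout here, instead of a DB)."""
    def run(self, records):
        for r in records:
            print("forwarding:", r)


# Wire the agents into a sequential workflow, mirroring the drag & drop design.
source = CollectionAgent([{"latency_ms": 42}, {"latency_ms": 58}])
aggregator = AggregationAgent("latency_ms")
sink = ForwardingAgent()

sink.run(aggregator.run(source.run()))
```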

Real Time Support

OSS Mediation handles real-time data sources. As noted, the product is designed to cope with both batch data (CDRs, XDRs, logs, etc.) and real-time data flows (active/passive network probes, AAA systems; basically, socket-based performance data from any network node). The union of batch and real-time workflows is managed by proprietary DigitalRoute technology in one of two ways, according to the user’s preference. The alternatives are:

  • Inter Workflow Agents: different data streams are combined by an agent that is configured to integrate a real-time data feed into a batched output format. Handling convergence this way can probably be described as the “traditional” approach.
  • Workflow Bridge: this proprietary DigitalRoute technology works the other way around (in a manner of speaking), incorporating batched feeds into real-time output flows. It enables high-speed connectivity between different workflows and allows large volumes of data to be processed simultaneously, i.e. it can be used for scaling out across several CPUs. The advantages of this approach include high availability, thanks to multiple batch receivers that avoid other, less reliable endpoints. A batched workflow can also be seen as a service available to other workflows, making it easier to move data to its final destination despite multiple processing stages. (A generic sketch of this batch-into-real-time idea appears below.)
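
For illustration only, here is a generic Python sketch of the second pattern: a batched feed is replayed into the same queue a real-time feed writes to, so one downstream workflow sees a single unified stream. This is a conceptual stand-in, not DigitalRoute's Workflow Bridge implementation.

```python
# Generic sketch (not DigitalRoute's implementation) of folding a batched
# feed into a real-time flow: a batch reader replays file records into the
# same queue that live events arrive on, so one consumer handles both.
import queue
import threading

events = queue.Queue()

def batch_reader(lines):
    """Replay batched records (e.g. CDR file lines) into the live queue."""
    for line in lines:
        events.put({"origin": "batch", "payload": line})

def realtime_feed(samples):
    """Stand-in for a socket-based probe feed."""
    for s in samples:
        events.put({"origin": "realtime", "payload": s})

threading.Thread(target=batch_reader, args=(["cdr-1", "cdr-2"],)).start()
threading.Thread(target=realtime_feed, args=(["probe-1"],)).start()

for _ in range(3):
    print(events.get())  # a single downstream workflow sees a unified stream
```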

It is also worth mentioning that OSS Mediation can be configured to integrate with all the well-known network probe vendors, which reduces deployment and development times dramatically. Combined with the workflow capabilities described above, data-in-motion can easily be converted into valuable real-time actions.

Data Persistence and Summarizations

With its advanced data summarization capabilities, OSS Mediation is a good candidate as an alternative to a probe analytics platform. It also has a KPI Management layer that allows the user to define individual KPIs which are then collected into tree-like structures. This means the user can create what can be thought of as “smart,” proactive (rather than reactive) KPIs. The support for multi-dimensional structures enables OSS Mediation to be used as an SQM or Customer Experience Management platform that can build KQIs all the way up from raw data.
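
As a rough illustration of the tree idea, the sketch below rolls three leaf KPIs up into a single customer-facing KQI using weights. All names, weights and values are invented for the example.

```python
# Hypothetical sketch of rolling leaf KPIs up a tree into a KQI.
# The KPI names and weights are invented for illustration.

KPI_TREE = {
    "web_browsing_kqi": {          # customer-facing KQI at the root
        "dns_resolution_ok": 0.3,  # leaf KPIs with weights
        "page_load_ok": 0.5,
        "throughput_ok": 0.2,
    }
}

kpi_values = {"dns_resolution_ok": 0.99, "page_load_ok": 0.95, "throughput_ok": 0.90}

def kqi(name):
    children = KPI_TREE[name]
    return sum(kpi_values[k] * w for k, w in children.items())

print(f"web_browsing_kqi = {kqi('web_browsing_kqi'):.3f}")  # 0.952
```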

Big Data & Cloud Integrations

Since network data volumes are not only huge but also fast-growing, it is no longer really feasible to store the data on expensive disks such as SANs. That is why most CSPs are investing in Big Data technologies to store their network-related data. OSS Mediation has built-in integration with these technologies, for example Hadoop, and pre-integration with cloud platforms such as Amazon Web Services. With DigitalRoute pre-processing in front of Big Data, Big Data solutions can easily become “Smart Data” enabled.
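
The "Smart Data" point is easy to demonstrate in miniature: if raw records are filtered and pre-aggregated before they reach the data lake, the stored volume shrinks and the data arrives already meaningful. The record layout in this Python sketch is invented for illustration.

```python
# Sketch of "pre-processing in front of Big Data": aggregate and filter raw
# usage records before storage. The record layout is invented.
from collections import defaultdict

raw_records = [
    {"subscriber": "A", "bytes": 1200},
    {"subscriber": "A", "bytes": 800},
    {"subscriber": "B", "bytes": 0},      # empty record: filter it out
    {"subscriber": "B", "bytes": 5000},
]

totals = defaultdict(int)
for rec in raw_records:
    if rec["bytes"] > 0:                  # drop noise before storage
        totals[rec["subscriber"]] += rec["bytes"]

# In a real deployment these reduced rows would be written to Hadoop/HDFS;
# here we just print them.
for subscriber, total in totals.items():
    print(subscriber, total)
```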

Integration & Alerting Support

Since OSS Mediation is, at its core, a mediation product, it comes with many off-the-shelf integrations with network devices, probes, EMS systems, billing and charging systems, etc. On top of these, OSS Mediation also provides a RESTful Web Services layer that enables it to integrate with other platforms, such as those handling Customer Experience Management. It also ships with industry-approved alerting options such as SNMP.

OSS Mediation from DigitalRoute is a very good example of what next-generation mediation platforms will look like. Today’s “customer aware” Communication Service Providers will definitely want to use products like OSS Mediation to understand their customers’ usage behavior. Armed with end-to-end experience insight, service providers will eventually enhance the quality of the experience they deliver, and that will lead to increased profits and reduced churn.

More detailed information about OSS Mediation can be found via the links below.

Learn more about DigitalRoute’s OSS Mediation by clicking here

Download an analyst report by Stratecast at no charge, “The Platform of the Future,” here

View a deployed OSS Mediation Customer Experience use case here

Importance of Parallel and Local Measurements in Web Monitoring

Apr 17, 2015
 

Every company that relies on web business should invest in web monitoring platforms. These platforms connect to a web site and measure KPIs such as DNS resolution time, page download time and page consistency (checking for a specific header or content value). The synthetic transactions run by the probes help to identify server reachability.
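
For a rough picture of what such a synthetic transaction does, here is a minimal probe using only the Python standard library; the URL and expected content marker are placeholders.

```python
# Minimal synthetic-transaction probe: it times DNS resolution and page
# download separately, then checks the content for an expected string.
# The URL and marker are placeholders.
import socket
import time
import urllib.request

host, url, marker = "example.com", "http://example.com/", "Example"

t0 = time.perf_counter()
socket.gethostbyname(host)                    # DNS resolution time
dns_ms = (time.perf_counter() - t0) * 1000

t0 = time.perf_counter()
with urllib.request.urlopen(url, timeout=10) as resp:
    body = resp.read()                        # page download time
download_ms = (time.perf_counter() - t0) * 1000

consistent = marker in body.decode("utf-8", errors="replace")
print(f"dns={dns_ms:.1f} ms  download={download_ms:.1f} ms  consistent={consistent}")
```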

The term “reachability” used above is important, as it is not the same as “availability.” There may be cases where your web application and its dependent infrastructure (web server, DB server, application server, etc.) seem to be running smoothly from your side but not from the customer’s. This is usually due to routing problems on the network or problems with the remote DNS server.

It is important to recognize these downtime scenarios when supporting your customers. In some situations you may even take corrective actions, such as guiding users to change their DNS server settings or opening a ticket with the remote ISP for investigation.

There are two important selection criteria to consider when investing in a web monitoring service.

First, the service should have local probes. If your business resides in Istanbul, Turkey but your probe resides in Philadelphia, US, the response times or availability calculations may not reflect reality. Suppose the whole country has a problem reaching the Internet: your probe will notify you of a downtime, yet most of your local users will still be able to reach you.

Second, the service should make parallel measurements. This covers load balancer scenarios. Load balancers typically work in round-robin fashion to distribute load across a web server farm. So, if you measure only once and the web server currently in the rotation has no problems, you will record the service as “up.”

However, the next server in the pool may be suffering from performance problems or even downtime. If you make at least three measurements at roughly the same time, you will catch individual server problems within a pool. This is a very important feature to look for when deciding on a web monitoring tool.
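
A minimal sketch of the parallel-measurement idea, again using only the Python standard library; the URL is a placeholder.

```python
# Three simultaneous requests are likely to hit different servers behind a
# round-robin load balancer, so a single unhealthy pool member becomes
# visible where one measurement would miss it.
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "http://example.com/"        # the service under test

def probe(attempt):
    try:
        with urllib.request.urlopen(URL, timeout=5) as resp:
            return attempt, f"up (HTTP {resp.status})"
    except Exception as exc:
        return attempt, f"DOWN ({exc})"

# Fire at least three probes at roughly the same time.
with ThreadPoolExecutor(max_workers=3) as pool:
    for attempt, result in pool.map(probe, range(1, 4)):
        print(f"probe {attempt}: {result}")
```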

Local and parallel measurements will help you identify web server problems and troubleshoot them more quickly.

Raw Data Sizing

Jan 01, 2014
 

I have been asked multiple times how much disk space a performance management system will consume, given a set of business requirements.

Well, this depends first on the implementation of the data schema. One system might persist just the name-value pairs from the raw data file; another might introduce extra columns such as last poll time, unit, etc., and so require more space.

Most PM systems keep the raw records for a period of time in order to summarize the data. If, for example, the first summarization is at the hourly level, one hour of raw data must be kept; after the summarization, the raw data can be purged. Each higher-level summarization then uses the summarization one step below it.
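
A tiny Python sketch of this rollup chain, with invented sample data: raw per-minute samples are summarized to hourly averages, and the daily figure is computed from the hourly level, one step below it.

```python
# Raw samples roll up to hourly averages; the daily level is computed from
# the hourly one, never from raw data again. Sample values are invented.
from statistics import mean

raw = [("10:00", 41), ("10:01", 43), ("11:00", 60), ("11:05", 62)]  # (time, cpu%)

hourly = {}
for ts, value in raw:
    hourly.setdefault(ts[:2], []).append(value)   # group by hour
hourly_avg = {hour: mean(vals) for hour, vals in hourly.items()}

daily_avg = mean(hourly_avg.values())             # built from hourly, not raw
print(hourly_avg, daily_avg)
```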

For reasons such as late data arrival or manual data insertion, PM implementations keep the raw data for at least one day, because it will be needed to recalculate the summarizations. Regulatory requirements may also dictate the retention period of the raw data.

But how much disk space will be occupied if we retain one day of raw data? The math is simple, and I will try to explain it below. However, you have to take some side factors into account. For example, the solution may use file compression, which can shrink the space required to as little as 20% of the original.

A very rough sizing, excluding the compression factor, follows:

The customer’s requirements are: 100 devices on the network, each device with at least 3 interfaces, and 1 month of raw data retention. (A 1-minute polling interval is assumed throughout.)

So let’s begin:

1 KPI occupies at least 2 columns in the database/flat file, excluding metadata:
KPI Id column: integer: 4 bytes
KPI Value column: double: 8 bytes
Total: 12 bytes per KPI

Device-based KPI count: 10 (CPU utilization, memory utilization, uptime, etc.)
Interface count: 3
Interface-based KPI count: 10 (throughput, utilization, speed, packet loss, delay, queue, etc.)
Device count: 100

10 + 3 * 10 = 40 KPIs per device.

40 KPIs at 12 bytes each = 480 bytes per device poll.
For 100 devices: 48,000 bytes per network-wide poll = 48 KB.

48 KB * 60 polls/hour * 24 hours = 69,120 KB ≈ 70 MB per day ≈ 2.1 GB per month (raw data).
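
The same back-of-the-envelope math, wrapped in a small Python function so you can plug in your own numbers; the 1-minute polling interval is an explicit assumption, exposed as a parameter.

```python
# Back-of-the-envelope raw data sizing, matching the worked example above.
# The 1-minute polling interval (60 * 24 polls/day) is an assumption.

def raw_data_size(devices, dev_kpis, ifaces, iface_kpis,
                  bytes_per_kpi=12, polls_per_day=60 * 24, days=30):
    kpis_per_device = dev_kpis + ifaces * iface_kpis
    per_poll = devices * kpis_per_device * bytes_per_kpi      # bytes per poll
    return per_poll * polls_per_day * days                    # bytes per period

size = raw_data_size(devices=100, dev_kpis=10, ifaces=3, iface_kpis=10)
print(f"{size / 1024**3:.2f} GiB per month")   # ~1.93 GiB (~2.1 GB decimal)
```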

Please note that this is the minimum. A PM system may need much more space to maintain its retention mechanism. You should stay in close contact with your vendor for correct sizing, but I advise you to do the quick math yourself and compare the results with the vendor’s before sending the purchase order for the disk arrays.

Justifying Your OSS Investment

Nov 30, 2013
 

OSS has been seen as a cost center since it was born. That is why the business has been reluctant to invest in it.
We, as OSS professionals, have struggled to justify these investments in terms of operational excellence, improved quality and increased productivity. These arguments have been used many times for justification, but I assure you they no longer interest the sponsors.

To justify our OSS, we need to turn it into a product. We have to convert OSS functions into revenue-generating functions and start selling them, either as standalone products or as add-ons.

Here are some areas where you can generate revenue from your OSS investment.

– Advanced Reporting Platform (OSS)
– Pay as You Grow Services (BSS)
– Advanced Notification Services (OSS)
– Customer-based correlation (business) rules (OSS & BSS)
– Customer SLA Management (OSS & BSS)

Other ideas? Please reply to this topic or join the LinkedIn discussion. Your valuable comments are always appreciated.

SIEM

Nov 30, 2013
 

SIEM stands for “Security Information and Event Management,” and it is a well-known OSS system in the security world. It is not very visible to other domains because it has been used mainly for internal purposes.

Today, I am going to talk about what SIEM is and elaborate on its possible uses.

Every system produces logs: servers (VM, hypervisor, physical), routers, switches, access points, firewalls, IPSs. These logs can be huge, and to process all of them we should be talking in Big Data terms, but that is not today’s topic. So let’s narrow the scope to the access-log level.

The access logs on a system (login/logout requests, password change requests, firewall-dropped access attempts) should be collected for security purposes. Who connected where? Who failed during the login process? Who is sweeping IP addresses on the network? Who was dropped at the firewall/IPS? (The “who” part can be resolved to the real identity of the user via integration with AD/LDAP.)

The SIEM system’s first goal is to store these logs in the most efficient way. Since log data can be high in both volume and velocity, the archiving system should be a specialized one, utilizing fast disks and/or in-memory databases.

After log collection, the SIEM’s second goal can be achieved: log correlation.

In the log correlation phase, the SIEM correlates logs from multiple sources and combines them under a single incident. The correlation can be rule-based or topology-based. A SIEM may, for example, take a connect event and look up the destination IP address in a blacklist database (a C&C center list, etc.). If there is a match, an alarm is created in the system, and it can be forwarded to trouble ticketing (TT) or fault management (FM) systems. Or the SIEM can directly generate an alarm in the case of a login failure on a critical system. These are good candidates for rule-based correlation.
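
A minimal Python sketch of such a rule-based correlation, with invented events and documentation-range IP addresses:

```python
# Rule-based correlation in miniature: look up the destination of each
# connect event in a blacklist and raise an alarm on a match.
BLACKLIST = {"203.0.113.7", "198.51.100.9"}   # e.g. known C&C addresses

events = [
    {"type": "connect", "src": "10.0.0.5", "dst": "203.0.113.7"},
    {"type": "connect", "src": "10.0.0.6", "dst": "192.0.2.10"},
]

for event in events:
    if event["type"] == "connect" and event["dst"] in BLACKLIST:
        # In a real SIEM this alarm would be forwarded to the TT/FM system.
        print(f"ALARM: {event['src']} contacted blacklisted host {event['dst']}")
```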

A topology-based example could be: if 3 failed-login alarms are received from the same system, assign a score to that server. If the same system also had a firewall block on the same day, worsen the score. If more than 2 of the servers in a specific server farm have dropped to low scores, generate an alarm. This is a simple example of a stateful pattern that a SIEM can identify.
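
Here is a hedged Python sketch of that stateful pattern; the scores, penalties and thresholds are invented for illustration.

```python
# Stateful scoring sketch: failed logins and firewall blocks reduce a
# per-server score, and an alarm fires when more than two servers in the
# same farm drop below a threshold. All numbers are invented.
from collections import defaultdict

scores = defaultdict(lambda: 100)             # every server starts healthy

PENALTIES = {"login_failed": 10, "fw_block": 25}

def ingest(event):
    scores[event["host"]] -= PENALTIES.get(event["type"], 0)

events = (
    [{"type": "login_failed", "host": "srv1"}] * 3 +   # 3 failed logins
    [{"type": "fw_block", "host": "srv1"},             # plus a block
     {"type": "fw_block", "host": "srv2"},
     {"type": "fw_block", "host": "srv2"},
     {"type": "fw_block", "host": "srv2"},
     {"type": "login_failed", "host": "srv3"}] +
    [{"type": "fw_block", "host": "srv3"}] * 2
)
for e in events:
    ingest(e)

farm = ["srv1", "srv2", "srv3", "srv4"]
low = [h for h in farm if scores[h] < 50]
if len(low) > 2:                              # more than two degraded servers
    print("ALARM: degraded servers in farm:", low)
```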

(By the way, some operators have no desire to generate security-related alarms. The main driver for a SIEM can also be reducing the logs to a humanly manageable level for further review by the security staff.)

The alternatives are limitless, but managing and maintaining this mechanism can become a burden for the security department.

I see two problems with SIEM investments. The first is maintenance. Security personnel are generally not familiar with OSS and its integrations; they can provide the rules but will not be able to (or have time to) implement them on the SIEM.
So they will rely on the OSS department (if any), which will know nothing about security. The resulting miscommunication may lead to problems and under-utilization of the investment. A solution could be to outsource the implementation to the vendor; however, vendor solutions are generally “empty” in terms of pre-defined rules, so each rule must be built from scratch, and the cost of implementation can grow dramatically.

The second problem is overlapping functions. For example, suppose your goal is to be notified when a system tries to connect to a well-known bot center. This requirement can be met by a SIEM, but also by other, “cheaper” security components. Or, if you already have an SQM and your topology-based correlation requirements are modest, why not consider using that instead?

When investing in a SIEM, you should evaluate whether you will be able to fully utilize the system, as this OSS component is generally not a cheap one.