Customer Facing Services and the Granularity

 Product Management
Jun 22, 2010

According to TM Forum, a Customer Facing Service is a service that is represented in product offerings. That is a rather terse definition, so another could be: a Customer Facing Service (CFS) is any service that resides in the product/service catalog and interacts with the customer. Voice mail, site-to-site VPN, GSM voice, SMS etc. are all kinds of CFSs. The design of Customer Facing Services can be challenging. Inefficient designs may lead to further problems in the business processes that manage these CFSs. Since our main goal is to reach a more manageable, more automated and more flexible environment, our CFSs should be aligned with this strategy.

In this article, I will focus on the granularity (level of detail) of CFSs. How granular should our CFSs be? It depends, but if you want to implement the “everything as a service” concept in your product/service design, you should have very granular services. When creating the most granular service, we should ask ourselves: “Is this the least detailed sellable unit?” and “Can I bind this, as a standalone service, to a product and bill it separately?” Please note that CFSs can be combined with other CFSs under one Product; there is no one-to-one relationship.

Ok. Let’s work on some examples now. The granular services concept can be applied to datacenters. I have been involved in the design of such a datacenter, and two of the services we worked on were Bandwidth and Power. (Of course there were dozens of others, but these two will be enough to explain the concept.)
Are these sellable and perceived by the customer? Do they, individually, represent the least detailed sellable unit? Yes. So, they are granular CFS candidates. What we can do next is bundle these two CFSs under a Product Specification and offer it to the market under different Product Offerings. Note that I still have the option to put one of these services under a different Product Specification and sell it standalone. This is where the power of granularity comes into play: your Product Management and Marketing groups will be able to create many combinations of these services to offer, which will increase your competitiveness.
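The many-to-many relationship between CFSs and Product Specifications can be sketched in a few lines of code. This is a minimal illustration, not a TM Forum SID model; the class and attribute names are my own.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class CFS:
    """A granular Customer Facing Service: the least detailed sellable unit."""
    name: str

@dataclass
class ProductSpecification:
    """Bundles one or more CFSs; a CFS may appear in many specifications."""
    name: str
    services: list = field(default_factory=list)

# The two granular CFSs from the datacenter example.
bandwidth = CFS("Bandwidth")
power = CFS("Power")

# Bundle both under one Product Specification...
colocation = ProductSpecification("Co-Location", [bandwidth, power])
# ...while keeping the option to sell one of them standalone.
standalone_bw = ProductSpecification("Dedicated Bandwidth", [bandwidth])

print([s.name for s in colocation.services])  # ['Bandwidth', 'Power']
```

The same `bandwidth` object participates in both specifications, which is exactly the flexibility the granular approach buys you.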

A granular CFS should be sellable standalone, but this does not mean that you must sell it that way. For example, Power could be sold separately, but does it make sense for a datacenter? No. In theory it could be sold alone, but in real life you would bundle it with other granular services such as bandwidth, security, maintenance etc., and the best place to bundle related services is the Product level. (You can also do this at the product offering level, but that is not today’s topic.)

Suppose our service provider has no intention of selling these services separately and launches a single service named Co-Location Service. This co-location service has a monthly rate; bandwidth and power usage beyond the committed levels is rated and applied to the same service. The Co-Location service is sold under the Co-Location Product. This is also a valid scenario, but it increases complexity and decreases flexibility. Rather than going this way, we could have two CFSs under the Co-Location Product.

Another example could be a GSM voice call. Voice call is a granular CFS. I have seen some examples on the web claiming that “receiving a voice call” is a CFS. I don’t think so. Voice call is a granular CFS which has attributes (is call receiving active, suspended, roaming activated etc.). These attributes are not sellable standalone. If they were, if you could subscribe just to receive calls, that would be a CFS, but I have not seen any service like this. In roaming scenarios, for example, received calls are also billed. But this is not a service that the customer buys! It is a separate rating scheme that is applied to the service under this specific condition. As I said before, the service attributes would be bound to one or more update flows (service activation flows). These flows (or operations, you could say) could be exposed as different web services, but not as different CFSs.

Choosing granular services will also help you decrease your troubleshooting time. We said that CFSs are the services that are “perceived” by the customer. But this perception is heavily impacted by your service design. Let me explain. Suppose I am selling a product called Co-Location, and in my product specification I say that this product is composed of one service, which is Co-Location. Whenever the customer perceives any problem, he/she will blame the co-location service, since this is the only service he/she is receiving. He/she will create a trouble ticket for the outage with the detail “I am having a problem with my co-location service”. This is not very informative. Most of the time, it will be the service provider’s job to localize the problem. However, if the service was designed as two separate CFSs (bandwidth and power) rather than one, the customer may open a ticket on the bandwidth service with the detail “I am having a packet loss problem on my line”. Your MTTRs will definitely fall.

Choosing the granular approach in your service design will bring flexibility. The service activation flows will be more manageable and troubleshooting will be easier. Choosing the CFS details is not the only task, but it is the starting point before we move on to the Product design area.

Fault Management

 Fault Management
Jun 14, 2010

Fault Managers (FM) collect alarm and event information from the network elements. There are several interface types that fault managers use to collect this data. These interfaces are called the northbound interfaces (NBI) of the given alarm source. Alarm sources can be network elements or, more often, element management systems (EMS). A Fault Manager can also be an alarm source for another OSS system such as SQM, SLA Management or another Fault Manager (in a manager-of-managers context).

The most popular NBIs are SNMP based. These NBIs use SNMP traps to deliver the fault/event information to the target NMS. TL1 and CORBA interfaces are also popular, but they are starting to be considered legacy. JMS is gaining popularity among the NBIs on the market.

Fault manager implementations are rather straightforward.

First, you need to identify the type of NBI you will connect to. You can learn this from the product vendor. Second, you need to collect the necessary connection parameters such as security credentials, port numbers (CORBA/RMI may use dynamic ports), IP addresses etc. Most EMS systems will allow you to select the types of alarms/events that will be forwarded to the NBI. If you do not have an EMS and you are interfacing directly with the devices, you will have to configure the devices for alarm forwarding.

Fault Management products collect the alarms in their mediation layers, where they have modules that know how to collect alarms from a specific source (device/interface type). These modules are also responsible for the resynchronization of alarm information.

Resynchronization is an important concept in the Fault Management area. Devices/EMS systems forward their alarms to the fault management systems and do not care whether they have been received or not. (This is especially the case for UDP-based SNMP traps.) Thus, if the network connectivity between the FM and the EMS is lost, all new alarms and all updates to previous alarms (clear alarms) will be lost too. Resynchronization is the process of recovering the alarms after a connectivity issue. How does this happen? Simple. The EMS should maintain an active alarm list. It should also be able to provide this list to an OSS system through its NBI (most probably via a method call, or by setting an SNMP OID value). The OSS system that receives the active alarm list then applies a diff algorithm to find the deltas and apply them to its own repository. If the EMS does not have an “active alarm list” feature, then you are out of luck!
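The diff step of resynchronization can be sketched as a set comparison. This is an illustrative sketch, not a vendor API; alarms are modeled as a dict keyed by an alarm id, and the field names are assumptions.

```python
def resynchronize(local_alarms, ems_active_alarms):
    """Compute and apply the deltas between our alarm repository and the
    EMS active alarm list after a connectivity outage.
    Both arguments map alarm id -> alarm description."""
    local_ids = set(local_alarms)
    ems_ids = set(ems_active_alarms)
    to_clear = local_ids - ems_ids   # cleared while we were disconnected
    to_raise = ems_ids - local_ids   # raised while we were disconnected
    for alarm_id in to_clear:
        del local_alarms[alarm_id]
    for alarm_id in to_raise:
        local_alarms[alarm_id] = ems_active_alarms[alarm_id]
    return to_raise, to_clear

# Our repository vs. the active alarm list fetched from the EMS NBI.
repo = {"a1": "link down", "a2": "fan failure"}
ems = {"a2": "fan failure", "a3": "high temperature"}
raised, cleared = resynchronize(repo, ems)
print(sorted(raised), sorted(cleared))  # ['a3'] ['a1']
```

After the call, `repo` matches the EMS view again: `a1` was cleared during the outage and `a3` was raised.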

After the raw alarms arrive at the FM platform, the filtering phase starts. There may be thousands of active alarms even in small-scale networks. It is impossible for a NOC to track and manage them all, so filtering becomes an essential step. Based on the customer requirements, we put filters into the alarm flow to pass only the alarms we need. Aside from the simple pass/no-pass filters, there can be other types of filters that handle specific fault scenarios such as link flaps (the link goes up and down). Let me explain. When the link goes down, the EMS sends an alarm. A moment later, the link comes back up and the EMS emits a clear alarm for the previous one. These conditions should be filtered out, as there is no need to take action on the upper layers (though a notification could be sent if the flapping continues). A filter could “wait” for a clear event for a specific period of time before sending the alarm to the upper layer. This prevents flapping from generating an alarm flood in the platform.
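The "wait for a clear" idea can be sketched as a hold-down filter. This is illustrative only; real FM mediation layers implement it with timers, while here timestamps are passed in explicitly so the sketch stays deterministic.

```python
class FlapFilter:
    """Hold a raise alarm for `hold_seconds`; if the matching clear arrives
    within that window, suppress both events (a link flap)."""

    def __init__(self, hold_seconds=5):
        self.hold = hold_seconds
        self.pending = {}  # alarm key -> timestamp of the held raise

    def on_raise(self, key, ts):
        # Hold the alarm instead of forwarding it upward immediately.
        self.pending[key] = ts

    def on_clear(self, key, ts):
        raised = self.pending.pop(key, None)
        if raised is not None and ts - raised <= self.hold:
            return None          # flap: drop both the raise and the clear
        return ("clear", key)    # real clear: forward it upward

    def expire(self, key, now):
        """Called when the hold timer fires: forward a still-pending raise."""
        raised = self.pending.pop(key, None)
        if raised is not None and now - raised > self.hold:
            return ("raise", key)
        return None

f = FlapFilter(hold_seconds=5)
f.on_raise("linkA", ts=0)
suppressed = f.on_clear("linkA", ts=1)   # clear within the window: flap
f.on_raise("linkB", ts=0)
forwarded = f.expire("linkB", now=6)     # no clear arrived: real outage
```

`linkA` flapped and generates nothing upstream; `linkB` stayed down past the hold window, so its raise is forwarded.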
The next phase after filtering is enrichment. In this phase we enrich the alarm information using external data sources. Most of the time, raw alarm values are meaningless to the NOC operator. For the operator to start corrective actions on an alarm instance, he/she needs quick and usable information from the alarm. For example, if the alarm has a field named Device and its value is an IP address, the NOC operator would have to follow a manual procedure to find the host name and region of that device before dispatching it to the correct back office. These time-consuming manual processes should be automated in the Fault Manager. Enrichments are generally applied via custom scripts that use the API of the FM platform.
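An enrichment step can be as simple as a lookup against an external data source. In this sketch the inventory table, host names and regions are invented for illustration; in practice the lookup would go through the FM platform's API to a real inventory system.

```python
# Hypothetical inventory data; stands in for an external data source.
INVENTORY = {
    "10.0.0.1": {"hostname": "ist-core-01", "region": "Istanbul"},
    "10.0.0.2": {"hostname": "ank-edge-07", "region": "Ankara"},
}

def enrich(alarm):
    """Add hostname/region to an alarm whose Device field is an IP address."""
    details = INVENTORY.get(alarm.get("Device"), {})
    alarm.update(details)
    return alarm

alarm = enrich({"Device": "10.0.0.1", "Severity": "Major"})
```

The NOC operator now sees `ist-core-01` in Istanbul instead of a bare IP address, and can dispatch without a manual lookup.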

All the filtered/enriched alarms are now ready for correlation. Correlation is the process of grouping similar alarms together to increase the efficiency of the NOC and the assurance process. You may have a look at my previous post on this topic.

The last important concept to mention is expert rules. Expert rules are automatic actions that run in specific cases. An expert rule could be triggered whenever a severe alarm is received by the system, or whenever a specific text is detected in the AdditionalText attribute. The actions could be sending an e-mail or SMS, creating trouble tickets, or just manipulating the alarm fields (such as changing the state to Acknowledged).
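An expert rule engine boils down to predicate/action pairs evaluated against each alarm. The rules, field names and actions below are illustrative assumptions, not any product's rule syntax.

```python
actions = []  # collected side effects; a real system would call out instead

RULES = [
    # Severe alarm received -> open a trouble ticket.
    (lambda a: a.get("Severity") == "Critical",
     lambda a: actions.append(("open_ticket", a["id"]))),
    # Specific text detected in AdditionalText -> send an SMS.
    (lambda a: "POWER" in a.get("AdditionalText", ""),
     lambda a: actions.append(("send_sms", a["id"]))),
]

def run_expert_rules(alarm):
    """Fire the action of every rule whose predicate matches the alarm."""
    for predicate, action in RULES:
        if predicate(alarm):
            action(alarm)

run_expert_rules({"id": 1, "Severity": "Critical",
                  "AdditionalText": "POWER LOST"})
```

Both rules match this alarm, so a ticket is opened and an SMS is queued for alarm 1.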

All Fault Management systems have similar alarm interfaces, where you will see a data grid with the alarms inside. They also employ fancy network maps, which are not usable at all.

Fault Managers have several interactions with other OSS systems such as Trouble Ticket, Workforce Management, SQM, Performance Managers, Inventory Managers etc.

The most important integration, and usually the first one implemented, is trouble ticketing. Faults should be tracked and solved quickly, and trouble tickets are the instruments for that. TTs can be opened manually by the NOC operator or automatically by an expert rule.

Fault Managers are must-have OSS systems. Their basic functionality is not very hard to implement; however, advanced features such as correlation can lead to time- and resource-consuming implementations.

Application Performance Managers

 Performance Management
Jun 11, 2010

In the performance management area, I have talked about network and device performance management. I should also mention application performance management to complete the picture.

Application performance managers (APMs) track how much of the time an application is available and how well it meets the expected functionality. APMs come in different types: Transactional Monitors, User Monitors, Application Server Monitors and Database Monitors.

Transactional Monitors:

Typically, if you want to monitor an application, you should first determine the use cases that will be implemented by the monitor. Use cases are sets of activities or business processes that the monitor should perform. A use case example could be:

– Open the application URL
– Enter username and password and press submit
– After a successful login, click on report 1 and wait for the report to be displayed.
– Log off

APMs run these activities (transactions) one by one and record the response times. They also check whether any errors occurred during the process.
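The transaction-running loop above can be sketched as follows. This is an illustrative sketch: the step callables here are stand-ins (simple sleeps) for the real browser or protocol actions a commercial monitor would drive.

```python
import time

def run_use_case(steps):
    """Execute a monitored use case step by step, recording the response
    time of each step and any error that occurred.
    `steps` is a list of (name, callable) pairs."""
    results = []
    for name, action in steps:
        start = time.perf_counter()
        error = None
        try:
            action()
        except Exception as exc:
            error = str(exc)
        results.append({"step": name,
                        "seconds": time.perf_counter() - start,
                        "error": error})
        if error:  # abort the transaction on the first failing step
            break
    return results

# The login-and-report use case from above, with simulated actions.
results = run_use_case([
    ("open_url", lambda: time.sleep(0.01)),
    ("login", lambda: time.sleep(0.01)),
    ("open_report_1", lambda: time.sleep(0.02)),
    ("logoff", lambda: None),
])
```

Each entry in `results` carries the per-step response time, which is exactly the KPI a transactional monitor reports.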

User Monitors:

Most APMs simulate user behavior, but some of them also “sniff” real user actions. An agent program installed on the user's machine tracks the user's actions in the form of KPIs. One of the most important KPIs is the “think time”, which represents the end user's thinking period (between one action's result and the next action).
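Think time can be computed from the agent's event timeline. The event model below, (timestamp, kind) pairs where kind is "result" (screen rendered) or "action" (user clicked), is my own simplification, not a product format.

```python
def think_times(events):
    """Compute think-time KPIs from a chronological event stream.
    Think time = gap between a result being shown and the next user action."""
    times, last_result = [], None
    for ts, kind in events:
        if kind == "result":
            last_result = ts
        elif kind == "action" and last_result is not None:
            times.append(ts - last_result)
            last_result = None
    return times

kpis = think_times([(0.0, "action"), (1.0, "result"),
                    (4.0, "action"), (4.5, "result"), (6.5, "action")])
print(kpis)  # [3.0, 2.0]
```

The user took 3 seconds to react to the first screen and 2 seconds to the second, the kind of per-session KPI an agent would report upstream.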

Application Server Monitors:

APMs are also able to track the performance of application servers such as WebSphere, Tomcat etc. These APMs report the application server's performance and monitor the process stacks to find the most time-consuming method calls. These statistics are extremely important if you are dealing with an in-house application and trying to pinpoint performance degradation points.

Database Monitors:

Database-specific APMs monitor the well-known databases (Oracle, DB2, MS SQL Server etc.) and their performance.

APM statistics should be correlated with server (OS-level) statistics. An end-to-end view will also require network-related statistics and customer experience management statistics (from active probes). At the end of each monitoring run, a set of KPIs is exposed. These KPIs are fed to performance management and SQM systems to be analyzed further.

Discovering the Network Cloud

 Inventory Management
Jun 11, 2010

Service providers started moving to IP-based technologies in their infrastructures because of the flexibility these technologies offer. We have seen all-IP projects all around the world, where traditional voice, data and transmission infrastructures are transformed into IP-based infrastructures. All-IP also means sending the traditional assets (NEs and management systems) to the trash. Thus, big operators use a transitional approach to move to all-IP: they still use traditional technologies (such as SDH/SONET) in the backbone and put IP on top of them. However, this reduces throughput, as these technologies add extra overhead. Operators that have vast amounts of free capacity may not care about this throughput issue now, but it will definitely be a problem in the future.

Traditionally, the transmission environment depended on circuit-based technologies. SDH, PDH, SONET, ATM and Frame Relay are technologies in which you construct logical circuits on top of a physical infrastructure. Looking from the OSS perspective, modeling a circuit-based technology is easy, because it is predictable. For example, we know for certain that the VC-12 low-order circuit that starts at point A will end at point B. On its way, it will be transported over several high-order VC-4s that run on physical STM-1, 4 or 16 circuits.

In our fault managers it is “easy” to load this hierarchy into a correlation engine and run root-cause analysis algorithms to find the root cause of a network problem. Or, if we take the ATM case, the PVCs are deployed on predefined paths, which are also good candidates for topology-based root cause analysis. Frame Relay likewise…

When we move to IP, things become more complicated from the logical inventory perspective. The route from point A to point B is no longer predictable. If one of the links in that direction fails, the router will, hopefully, find another way around to send the packet. This dynamic behavior puts a barrier in front of traditional topology-based root cause analysis.

To do root-cause analysis in packet-based dynamic networks, we need another approach: Real-Time Topology Discovery. Real-Time Topology Discovery uses the same techniques as any auto-discovery process; the difference is that it runs much more frequently. There are two approaches I have seen so far: Virtual Routers and Routing Protocol Listening.

Routing Protocol Listening utilizes the “topology table” feature of link-state routing protocols (such as OSPF and IS-IS). Devices (routers) that implement a link-state routing protocol maintain two tables in memory: the routing table and the topology table. Devices first generate their topology tables by listening to routing protocol updates. After the topology table is generated, they apply a best-path algorithm to determine the best paths to the network destinations. These best paths are inserted into the routing table to be used in routing decisions.

Routing Protocol Listeners “sniff” these topology-related conversations between the devices to construct a real-time topology of the network. This topology can then be used in root cause analysis. Some network optimization tools use this technique to populate the network topology, on top of which they can later run what-if scenarios. The topology can also be enriched via SNMP queries with information that is not exchanged by the routing protocols (such as link utilization).
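The listening approach can be sketched in miniature: build an adjacency map from sniffed link-state advertisements, then run the same best-path (Dijkstra) computation each router performs. The (router, neighbor, cost) tuple format is a deliberate simplification of real OSPF/IS-IS LSAs.

```python
import heapq

def build_topology(lsas):
    """Build an adjacency map from link-state advertisements,
    given as (router, neighbor, cost) tuples."""
    topo = {}
    for router, neighbor, cost in lsas:
        topo.setdefault(router, {})[neighbor] = cost
    return topo

def best_paths(topo, source):
    """Dijkstra's best-path algorithm over the topology table,
    returning the cost of the best path to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, cost in topo.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist

topo = build_topology([("A", "B", 1), ("B", "A", 1),
                       ("B", "C", 1), ("C", "B", 1),
                       ("A", "C", 5), ("C", "A", 5)])
routes = best_paths(topo, "A")
print(routes)  # {'A': 0, 'B': 1, 'C': 2}
```

When the listener sees an LSA withdrawing the A-B link, it simply rebuilds the map and recomputes, which is what keeps the discovered topology "real-time".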

The second approach is similar to the first one in some sense. In this approach you create an in-memory clone of the device on your management server: for each device/VRF, you create another instance. You copy the initial configuration of the real device onto the virtual one and then start listening to alarms/events from the real device. This way, you have a near-real-time topological view of your network on which to base your analysis.

The first approach is more real-time, but it applies to link-state protocols only. If you are using a distance-vector protocol, for example, you have to pick the second option.

In today’s world, where we work with ever more intelligent NEs and network protocols, Real-Time Topology Discovery solutions will definitely find their place in the next-generation OSS.

Prepaid Billing

Jun 09, 2010

Prepaid billing systems (which are also called charging systems in day-to-day talk) run the billing process for prepaid customer accounts.

They heavily use the online charging mechanism rather than the offline charging we typically see in postpaid billing systems. Online charging systems rate the usage on the fly and charge the user account immediately.

The prepaid billing environment is composed of several different systems, and the most important one is the charging system. Prepaid charging systems are integrated with the switches (MSC, PSTN), SMSC, MMSC, GGSN, interconnect and other VAS applications. These pieces of service infrastructure equipment are configured to request authorization from the charging manager before delivering a service.

Charging systems communicate with (or employ) several other systems to charge for customer usage. One of them is the rating system, which calculates the cost of the usage. Rating systems maintain the tariffs, discounts and campaign-related information.

Charging systems ask the rating system for the cost (in dollars, credits etc.) of the usage (volume, time or event based) and then try to apply it to the customer account balance. They do this by communicating with a balance manager. Balance manager systems manage the customer account balance.

There are two mechanisms that charging managers use to debit the account balance:

The first mechanism is “direct debit”. In this approach, the balance manager directly debits a predefined amount, for example a debit applied every 30 seconds (covering those 30 seconds). If you terminate the session at the 25th second, the remaining 5 seconds are rated and refunded back to the account.

The second mechanism is “unit reservation”. Here, rather than directly debiting the account, you reserve some of the balance. When the call is complete (or the SMS/MMS has reached its destination), the reserved amount is committed.
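The two mechanisms can be sketched as operations on a balance manager. This is a minimal sketch under my own assumptions (amounts in credits, no concurrency, no per-session bookkeeping), not a real charging API.

```python
class BalanceManager:
    """Toy balance manager supporting both charging mechanisms:
    direct debit (with refund of unused units) and unit reservation
    (reserve, then commit what was actually used)."""

    def __init__(self, balance):
        self.balance = balance
        self.reserved = 0.0

    # --- direct debit mechanism ---
    def debit(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient balance")
        self.balance -= amount

    def refund(self, amount):
        # Session ended early: rate and refund the unused portion.
        self.balance += amount

    # --- unit reservation mechanism ---
    def reserve(self, amount):
        if amount > self.balance - self.reserved:
            raise ValueError("insufficient balance")
        self.reserved += amount

    def commit(self, reserved, used):
        # Session complete: charge what was used, release the rest.
        self.reserved -= reserved
        self.balance -= used

acct = BalanceManager(100.0)
acct.debit(30)       # direct debit for a 30-second block
acct.refund(5)       # caller hung up at the 25th second
acct.reserve(20)     # reserve units for an SMS/MMS session
acct.commit(20, 12)  # delivery confirmed: charge 12, release the rest
print(acct.balance)  # 63.0
```

Reserving against `balance - reserved` is what stops two concurrent sessions from both spending the same credits, which is the point of the reservation mechanism.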

Voucher management is also an important function that needs to be mentioned. Voucher managers manage the lifecycle of the vouchers that are used to credit customer account balances. Therefore, voucher managers have a direct interface with the balance managers (and possibly with other systems such as CRM, fraud managers etc.).

Charging platforms also employ IVR functionality. This is useful for informing users about their current balance, crediting the balance, and notifying users before a call is dropped due to insufficient balance.

(If the charging system cannot perform the charging for some reason, it outputs a CDR file including the session details. This CDR is post-processed and applied to the account balance.)

The charging manager, balance manager, rating manager, voucher manager and IVR can be sold separately or combined into a single “prepaid billing” platform.

Operators around the world have started consolidating their prepaid and postpaid billing platforms. This is called convergent billing. It reduces complexity, license/hardware costs and operational expenses. I’ll try to comment on convergent billing in another post.