In-Flight Order Management

Sep 09, 2016

In Order Management, an in-flight order is an order that is still being fulfilled, and in-flight order management means amending or cancelling such a running order. The reasons differ: after placing the order, your customer may call back and ask for a new order item, or may want to cancel the order altogether.

Both operations (revise and cancel) need some checks before the request is accepted. The first and most important check is the PoNR, or 'Point of No Return', check. In Order Management, PoNR is an order state: once it is set, the order can no longer be cancelled or amended. In most cases, the PoNR is network activation or shipment, and it exists to protect the provider from extra costs. For example, you may have a network activation order in which 10 branch locations should be provisioned and activated. Two of these locations have no connectivity at all, so a fiber cable must be laid to the customer location. This work is typically done by third-party suppliers/partners who are contracted on an hourly basis, and the digging work may require expensive permits from regulatory bodies. In a much simpler scenario, a retailer may have shipped the goods and they may be on their way; there may be no way to cancel the shipment at that point.
The PoNR is communicated to the customer at order capture time, so the customer is aware of it.

After the PoNR check, the cancel or revise operation can be validated. For cancel, no further validation is necessary. Typically, the customer contacts the contact center and asks for cancellation; the order id acquired at order submission time is used to cancel the in-flight order. After the 'Cancel Order' request is received by the OM platform, the running workflow is signaled and, where implemented, the rollback scripts are run before the order state is set to 'Cancelled'.
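
A minimal sketch of this cancel handling, with the PoNR check up front (all names here are illustrative, not any specific OM vendor's API):

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical, simplified model of an in-flight order; names are illustrative only.

@dataclass
class Step:
    name: str
    rollback: Optional[Callable[[], None]] = None  # rollback action, if implemented

@dataclass
class Order:
    order_id: str
    state: str = "IN_FLIGHT"
    revision: int = 1
    items: list = field(default_factory=list)
    completed_steps: list = field(default_factory=list)

PONR_STATES = {"PONR_REACHED", "COMPLETED"}

def cancel_order(order: Order) -> Order:
    # PoNR check: past this state the order can no longer be touched.
    if order.state in PONR_STATES:
        raise ValueError(f"Order {order.order_id} is past the Point of No Return")
    # Run rollback scripts, where implemented, in reverse order of execution.
    for step in reversed(order.completed_steps):
        if step.rollback:
            step.rollback()
    order.state = "CANCELLED"
    return order
```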

The revise operation is a little trickier, as it requires a catalog validation step. The amendment order may break some catalog rules, so before it is submitted from the order capture platform (or after submission, by the order manager), the validation rules should be executed. After that point, the in-flight order is cancelled and a new order is created with the same order id as the previous one. What changes is the revision number of the order: the revision is incremented by 1, indicating that the order has been changed. It should be noted that different OM vendors may implement different strategies here. The preferable way is to keep the order id the same for a consistent customer experience; however, some platforms may not allow this. It is better to check the implementation details with the OM vendors before designing your end-to-end flows.
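
Continuing the hypothetical sketch above, a revise could be modeled as cancel-plus-recreate, keeping the order id and bumping the revision:

```python
import copy

def revise_order(order: Order, new_items: list) -> Order:
    # Catalog validation of new_items would happen here, on the catalog side.
    # The PoNR check applies to revisions as well; cancel_order() enforces it.
    cancel_order(order)

    revised = copy.deepcopy(order)          # same order id, for consistency
    revised.state = "IN_FLIGHT"
    revised.completed_steps = []
    revised.items = new_items
    revised.revision = order.revision + 1   # marks that the order was amended
    return revised
```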

Distributing Polling Load

Aug 30, 2016

In OSS, we often use polling to pull statistics and configuration data from devices. If the devices we are dealing with implement pull-based protocols such as SNMP or FTP, we cannot avoid it.

Every polling process comes with a polling period. If I have 100 routers and a polling period of 5 minutes, then every 5 minutes I have to connect to each device and pull the necessary KPIs to be injected into my DataMart.

If you look at the CPU and memory utilization of a performance management server (poller) during this process, you will see high peaks at the start of the polling periods. Following the 5-minute example above, the peaks appear at minutes 0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50 and 55. If your polling period is 5 minutes, you have 5 minutes to finish the job; if collection exceeds that period, you will run into data consistency issues. As the node and KPI counts increase, you have to throw more hardware at the problem to finish in time. (For each device connection, we will most probably want to open a separate thread, until we hit the point of diminishing returns.)

If the collection process does not occupy the whole 5-minute period, the server spends the remaining time idle, waiting. Since the hardware configuration was designed for the peaks, our server remains "expensive".

Assigning a polling time to each specific node is the key to this problem. In this approach, we divide the polling period into sub-periods. So, if the polling period is 5 minutes, we can divide it like:

  • 10 nodes at the zeroth second of the first minute
  • 10 nodes at the thirtieth second of the first minute
  • 10 nodes at the zeroth second of the second minute
  • 10 nodes at the thirtieth second of the second minute
  • …

Here we put 10 nodes into each 30-second timeframe, finalizing the polling of all 100 nodes in 5 minutes.

We also need to consider the speed of these nodes. Some nodes will suffer performance problems due to weak hardware or high load, and their response times may exceed the 30-second timeframe.

To cope with this problem, we should put the slowest-responding nodes into the earliest sub-frames. This way, a node's polling can "extend" into the next subframe and still be finalized within the given 5 minutes. This, of course, requires you to maintain a continuous baseline of node response times on the server side.
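
A minimal sketch of this slot assignment, assuming you already maintain a dict of baseline response times per node (all names are illustrative):

```python
def assign_polling_slots(baselines: dict, period_sec: int = 300,
                         subframe_sec: int = 30) -> dict:
    """Spread nodes over sub-periods, slowest responders first.

    baselines maps node name -> baseline response time in seconds.
    Returns node name -> polling offset in seconds from the period start.
    """
    slots = period_sec // subframe_sec                 # e.g. 300 / 30 = 10 subframes
    # Slowest nodes go into the earliest subframes, so an overrun can spill
    # into the next subframe and still finish within the period.
    ordered = sorted(baselines, key=baselines.get, reverse=True)
    per_slot = -(-len(ordered) // slots)               # ceiling division
    return {node: (i // per_slot) * subframe_sec for i, node in enumerate(ordered)}

# Example: 100 nodes -> 10 per 30-second subframe over a 5-minute period.
nodes = {f"router{i}": 0.1 + (i % 7) * 0.05 for i in range(100)}
offsets = assign_polling_slots(nodes)
```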

Splitting the polling period and distributing the nodes wisely across the sub-periods will help you reduce your hardware costs.

Optimizing Network Sweeping

Aug 15, 2016

Today's topic is Network Sweeping and how it can be optimized. As you may know from previous posts, sweeping means searching a subnet by attempting to connect to each and every possible IP address it has. Usually, the initial protocol is ICMP due to its low overhead (in that case, the sweep is called a Ping Sweep). SNMP and even HTTP interfaces are also used as sweep protocols.

Sweeping is used in different domains, such as:

  • Security
  • Inventory Management
  • Performance Management
  • Configuration Management

Sweeping can be time- and resource-consuming (for both the sender and the receiver side). That's why, for most enterprise customers, it is normally done daily.

For large networks, a sweep can take hours to complete. Consider sweeping a class C IP subnet, which has 254 usable IP addresses, and suppose that only 10 devices exist in that subnet. Assume ICMP is used for discovery: that is a simple ping request, and at least 2 ICMP packets are needed to be reasonably sure a device is there (50% packet loss still means the remote side is up).

For the reachable devices, the round-trip ping time should not exceed 5 ms. With 2 ICMP packets, that is 10 ms per check. For 10 devices it takes around 100 ms, well below 1 second. That is great performance if you only consider pinging the "up" devices. But what about the remaining 244 addresses?

ICMP timeout kicks in when dealing with dead devices or vacant IP addresses. The ICMP timeout is the duration the ping software will wait for an ICMP echo reply packet to arrive; if the packet does not arrive within that period, the target is reported as "down". The default ICMP timeout in Cisco routers is 2 seconds. Using that default, with 2 packets per test, you have to wait 4 seconds per test. If we do the math, the total wait time for the 244 empty addresses in our class C subnet would be 976 seconds, roughly 16 minutes. Organizations that rely on sweeping normally have much bigger subnets with thousands of possible IP addresses, where the sweeping process would take hours.

Luckily, we can tweak this process so it will take less time.

1: Use Parallel Measurements

This is the first thing we need to do: open multiple threads of ICMP operations at the same time. How about opening 1000 threads? The sweep would finish in 4 seconds. Isn't that great? Not really; it has consequences:

  • Increased LAN traffic: Sending 1000 ICMP packets in the same second will generate lots of traffic on your LAN/WAN: around 70 bytes per packet × 1000 threads = 70,000 bytes/sec = 560,000 bits/sec = 560 Kbps of one-way traffic. Considering there will be replies to these requests, the total bandwidth consumption can easily reach 1 Mbps.
  • CPU cycles: Each thread consumes CPU and memory resources, and the source machine should be able to cope with this.

And this is just the sweeping part. In real-world scenarios, no inventory or security tool will stop after it discovers a live IP address; it will go ahead and try to fetch more information. So both of these costs can balloon if you open too many threads.
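
A sketch of a bounded parallel ping sweep; the worker count is the knob that trades sweep speed against LAN traffic and CPU. This assumes a Linux `ping` with the `-c` (count) and `-W` (per-reply timeout, seconds) flags:

```python
import ipaddress
import subprocess
from concurrent.futures import ThreadPoolExecutor

def is_alive(ip: str, count: int = 2, timeout_sec: int = 1) -> bool:
    # Exit code 0 means at least one reply arrived
    # (50% packet loss still counts as "up").
    result = subprocess.run(
        ["ping", "-c", str(count), "-W", str(timeout_sec), ip],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def ping_sweep(subnet: str, workers: int = 50) -> list:
    hosts = [str(ip) for ip in ipaddress.ip_network(subnet).hosts()]
    # Bounded pool: more workers finish sooner but burn more bandwidth and CPU.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(is_alive, hosts)
    return [ip for ip, alive in zip(hosts, results) if alive]

live_devices = ping_sweep("192.168.1.0/24")
```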

2: Optimize Your ICMP Packet Timeout

I mentioned that the default ICMP timeout is 2 seconds. Luckily, this is configurable. Go ahead and send some pings to those destination IP addresses. For the "live" ones, capture the round-trip time: this is the network delay (plus the processing delay of the remote NIC). That delay will not change much on LAN links and may change slightly on WAN links. Baseline it. If the baseline is 100 ms, you can easily set a timeout of 300 ms; that is 3 times the baseline but still well below the 2-second default.
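
A tiny sketch of deriving such a timeout from baselined round-trip times (the safety factor and floor are assumptions, tune them to your network):

```python
def adaptive_timeout_ms(rtt_samples_ms: list, factor: float = 3.0,
                        floor_ms: float = 100.0) -> float:
    """Derive a per-target ICMP timeout from baselined round-trip times.

    Uses the worst observed RTT as the baseline and applies a safety factor,
    with a floor so jittery LAN samples don't yield absurdly low timeouts.
    """
    baseline = max(rtt_samples_ms)
    return max(baseline * factor, floor_ms)

# Example from the text: a 100 ms baseline yields a 300 ms timeout,
# well below the 2-second default.
print(adaptive_timeout_ms([80.0, 95.0, 100.0]))   # 300.0
```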

Keep in mind that ICMP is one of the protocols with the lowest overhead. Layer 7 protocols like SNMP and HTTP carry much more overhead, so the suggestions above may bring even greater value there.

Long sweep times can also cause inconsistencies between sweep periods. Suppose you start sweeping a /24 and find that a given address is vacant. You continue your sweep, and 10 seconds later a device comes up at that address. If you sweep once a day, your inventory (and other dependent OSS systems) will not know about this device until the next day (unless you have a change process in place for it). That is why there should be a mechanism to listen for new IP address activity between sweeps. DHCP logs can be a good source on networks that use DHCP for IP addressing; a costlier alternative is listening to Syslog events or switch SPAN ports.

Order Capturing

Oct 22, 2015

Today we are going to talk about Order Capturing from the Order Management domain.

What is an Order?

If you ask this question to a customer, he or she will reply that it is the collection of goods and services demanded from the provider.

However, from the Provider's (and also Order Management's) perspective, it is more than that. An order is a complex data structure that includes the demanded goods/services plus any additional information, such as shipping, payment and service location, that helps with the fulfillment of the order.

At the end of the day, the primary responsibility of Order Management is to run the necessary orchestration and fulfill an order. It does so by interfacing with the infrastructure (network, IT, OSS/BSS), functional groups in the organization, the customer and other enabling organizations (shipment providers, payment providers, etc.).

Most of the time, an order's primary components are the goods/services. These goods and services should be presented in Product Catalogs so that customers can browse and pick the ones they need. At this "selection" stage, we tend not to call the selected items an Order yet; rather, we call them a Cart. (The terms Shopping Cart and Basket refer to the same concept.)

So, the customer-selected products are placed into a Cart object. After the customer is done with the selection, this cart should be frozen. This is called the "Checkout" phase; most of us know it from our online shopping experiences, where at some moment we are asked to hit the "Checkout" button. After checkout, we cannot (or are not supposed to) change what is inside the cart. Why is that? Because the Cart belongs to another domain: Catalog Management. The items, the rules between the items and the checks that are run against these items all reside on the catalog side, and we use its interfaces to construct our Cart object. We cannot change anything in this goods/services list without consulting the Catalog Manager; if we do, we break the integrity. We check out to leave the Catalog Management domain and enter another domain: Order Management.

Can this checked-out cart object be sent to order management right away? No, because we do not know how to fulfill it yet. We can instantiate the products for that customer, but how will their charges be collected? If there are physical goods in the list, where will they be shipped? Without this additional information, the order cannot be fulfilled. That is why the Order Capture platform will also collect these details. Some basic validations can be done at this stage too (other than the catalog validations, which I will cover in another blog post).

The order prepared at the Order Capture platform (Cart information + additional information + customer information) can now be sent to Order Management for fulfillment. The OM platform returns an identifier to the Order Capture platform to be used for future queries regarding that specific order. Most of the time, the Order Capture platform is also notified asynchronously about key status changes of the order on the OM side (Pending, Error, Cancelled, Success, etc.).
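
A hypothetical sketch of what the captured order might look like when handed over to the OM platform; the field names and the stub client are illustrative, not any specific vendor's schema:

```python
class OmClient:
    """Stub standing in for the OM platform's order submission interface."""
    def submit(self, order: dict) -> str:
        return "ORD-0001"                     # id used for future status queries

# Illustrative captured-order payload: cart + additional info + customer info.
order = {
    "cart": {                                 # frozen at checkout, catalog-validated
        "items": [
            {"product_id": "FIBER-100M", "qty": 1},
            {"product_id": "WIFI-ROUTER", "qty": 1},
        ],
    },
    "customer": {"customer_id": "CUST-42"},
    "fulfillment": {                          # the "additional information"
        "payment": {"method": "credit_card", "token": "tok-123"},
        "shipping": {"address": "10 High St, Springfield"},
        "service_location": "SITE-0031",
    },
}

order_id = OmClient().submit(order)
# The OM side would later notify the capture platform asynchronously:
# Pending -> ... -> Success / Error / Cancelled
```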

Order capturing is mostly achieved by CRM systems and enterprise portals that allow customization. In the absence of such a system, add-on tools offered by most OM providers can be used. Regardless of the tool type, be ready for heavy customization, as each provider's way of handling orders differs from the others'.

“Smarter” Mediation Platforms to Achieve Customer Experience Management

Oct 17, 2015

Today, we will revisit an important function in the telecommunications OSS/BSS space: Mediation.

As we know, the primary purpose of legacy mediation has been to collect usage-related files from the network, extract data from them, do transaction-based enrichments and output them to other BSS platforms (which have been primarily billing and charging systems; but more recently business intelligence and other applications). This limited usage of mediation technology is changing.

The mediation layer has been with us since the beginning of the telecommunications industry. From the very start, its primary technological requirement was to maintain scalability, availability and integrity so that the provider would not lose any money. To cope with the increasing number of subscribers, evolving network technologies and more diversified network services, mediation vendors have increasingly had to focus and challenge themselves to improve their hardware and software processing capabilities.

Some mediation vendors stopped at this point, once getting CDRs from the network elements and passing them to the northbound platforms was achieved as efficiently as possible. However, the data exchanged between these platforms can provide invaluable insights to Communications Service Providers (CSPs) beyond just billing and charging. That is because the data relates to the end customer, and with some additional correlation and enrichment it can provide premium input to any Customer Experience Management and network intelligence initiative too. So, if you have a very fast, highly configurable and scalable mediation platform, why not utilize it for this purpose and more?

The mediation vendors who have seen this opportunity have re-oriented their product cores to accommodate functionality that is increasingly directed at adding value at the customer layer. (More limited vendors have integrated their software with third parties to try to come up with solutions that achieve the same goal, but honestly that is not the same thing.) What the more innovative vendors have done is essentially to add advanced programmatic and configurable correlation capabilities on top of the performance-proven data collection layer. They have also improved their northbound interfaces, as their "clients" will no longer be just billing and charging systems but other IT systems that expect to communicate in an IT-like way. These include CRM platforms (through APIs), CEM platforms, Performance Management platforms and customer self-care portals. All of these, and others, will require a new generation of database feeds (such as Big Data ones), APIs and Web Service integrations.

The smartest mediation companies have also expanded their reach by supporting real-time data flows. We know that traditional mediation utilizes a store-and-forward mechanism (batch processing), which introduces delays. These delays are acceptable for postpaid billing purposes, but if the CSP wants real-time information about its customers (and, moreover, to act on the findings that information reveals), store-and-forward is ineffective. If you want to execute real-time actions on data, you need to be able to process data-in-motion. Injecting dynamic processing logic into the collection layer is one way to achieve this, and mediation has become an important answer to the question of "how?"

To elaborate more on what’s going on in this area, I will talk about a vendor whose software supports all the “smart” features that I mentioned above.

DigitalRoute, a Swedish company with over 350 telco customers worldwide, positions itself as an OSS/BSS mediation and policy software provider/ISV, and its newest "smart mediation" product is called OSS Mediation. The product is built on technology with a relatively long history: the company has 16-plus years of experience under its belt, and its product versioning has now reached 7.2.

Its technology conforms to TM Forum standards, and DigitalRoute frequently participates in Catalyst projects to contribute to standardization work. Apart from its standard mediation functionality, its MediationZone base platform has some interesting features that make it stand out. Here are some of the unique "smart" features that are worth elaborating on.

Workflow Capabilities

OSS Mediation is built around a workflow mechanism that is primarily used for designing (through configuration) the required data exchange paths. It provides a graphical workflow environment where the system user can drag & drop flow elements and connect them in sequence. The flow elements themselves are called "agents", and different kinds of agents exist to handle different functions such as collection, analysis or distribution. An example: Aggregation Agents can aggregate, correlate and consolidate different flavors of data sources. If you want to combine two sources to output an aggregate result (such as one addressing a KPI), you insert an Aggregation Agent, or even the company's own proprietary KPI Management agent, into the workflow. If you want that KPI written to a database, you add an agent type called a Forwarding Agent to the workflow sequence. Agents can easily be configured by the system user to reflect the required business rules, and this kind of flow design definitely delivers faster time-to-market.
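
To illustrate the agent concept only, here is a generic pipeline analogy in plain Python; this is a conceptual sketch, not MediationZone's actual API or configuration model:

```python
# Conceptual illustration of chained "agents"; not DigitalRoute's actual API.

def collection_agent():
    """Collection: yields raw records (here, canned samples)."""
    yield {"node": "bts01", "kpi": "drops", "value": 3}
    yield {"node": "bts01", "kpi": "drops", "value": 5}

def aggregation_agent(records):
    """Aggregation: consolidates records into per-node KPI totals."""
    totals = {}
    for rec in records:
        key = (rec["node"], rec["kpi"])
        totals[key] = totals.get(key, 0) + rec["value"]
    for (node, kpi), value in totals.items():
        yield {"node": node, "kpi": kpi, "value": value}

def forwarding_agent(records):
    """Forwarding: writes results to a destination (here, stdout)."""
    for rec in records:
        print(rec)

# Wiring the agents in sequence mirrors the drag & drop workflow design.
forwarding_agent(aggregation_agent(collection_agent()))
```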

Real Time Support

OSS Mediation handles real-time data sources. As noted, the product is designed to cope with both batch data (CDRs, XDRs, logs, etc.) and real-time data flows (active/passive network probes, AAA systems; basically, socket-based performance data from any network node). The union of batch and real-time workflows is managed in two different ways, according to the user's preference, by proprietary DigitalRoute technology. The alternatives are:

  • Inter-Workflow Agents: different data streams are combined by an agent which is configured to integrate a real-time data feed into a batched output format. Handling convergence this way can probably be described as the "traditional" approach.
  • Workflow Bridge: this proprietary DigitalRoute technology works the other way around (in a manner of speaking), incorporating batched feeds into real-time output flows. It enables high-speed connectivity between different workflows and allows large volumes of data to be processed simultaneously, i.e. it can be used for scaling out across several CPUs. The advantages of this approach include high availability, due to the use of multiple batch receivers that avoid other, less reliable end points. A batched workflow can also be seen as a service available to other workflows, making it easier to move data to its final destination across multiple processing services.

It is also worth mentioning that OSS Mediation can be configured to integrate with all well-known network probe vendors, and this reduces deployment and development times dramatically. Combined with the workflow capabilities described above, data-in-motion can easily be converted into valuable real-time actions.

Data Persistence and Summarizations

With its advanced data summarization capabilities, OSS Mediation is a good candidate to serve as an alternative to a probe analytics platform. It also has a KPI Management layer that allows the user to define individual KPIs that are then collected into tree-like structures, which means the user can create what can be thought of as "smart", proactive (rather than reactive) KPIs. The support for multi-dimensional structures enables OSS Mediation to be used as an SQM or Customer Experience Management platform that can build KQIs all the way up from raw data.

Big Data & Cloud Integrations

Since network data volumes are not only huge but also fast-growing, it is no longer really feasible to store the data on expensive disks such as SANs. That's why most CSPs are investing in Big Data technologies to store their network-related data. OSS Mediation has built-in integration with these outlets, for example Hadoop. It also has pre-integration with cloud platforms such as Amazon Web Services. With DigitalRoute pre-processing in front of Big Data, Big Data solutions can easily become "Smart Data" enabled.

Integration & Alerting Support

Since OSS Mediation is also a mediation product, it comes with lots of off-the-shelf integrations with network devices, probes, EMS systems, billing and charging systems, etc. On top of these, OSS Mediation also provides a RESTful Web Services layer, which enables it to integrate with other platforms such as those handling Customer Experience Management. It also ships with industry-approved alerting options such as SNMP.

OSS Mediation from DigitalRoute is a very good example of what next-generation mediation platforms will look like. Today's "customer-aware" Communication Service Providers will definitely want to utilize products like OSS Mediation to understand their customers' usage behaviors. Armed with end-to-end experience insight, service providers can enhance the quality of the experience they deliver, leading to increased profits and reduced churn.
