Mar 31 2010

The Order Management process (Order Handling in eTOM terms) belongs to the Fulfillment vertical in eTOM. It also sits in the CRM functional grouping and is considered part of the BSS domain. Order Managers automate the order handling process and track the whole lifecycle of an order, from creation to provisioning.

All Order Management (sometimes called Customer Order Management) products come with a workflow engine, either an external one or a proprietary, bundled one. Typically more than one workflow is deployed, based on product type, customer type etc. The workflows automate the tasks that must be done to fulfill an end-to-end customer order.

Order Management starts with Order Capturing. The Order Capturing stage is where we collect the order-specific parameters. These may be customer choices, provisioning-related parameters and some price-related parameters (reduction rate, promotions etc.). During order capturing, the order is also validated against some basic validation rules. Typically, a CRM system captures the order parameters and triggers a workflow on the order manager side over a service bus.

The second step is Order Decomposition. Generally, Customer Orders represent Product Orders (i.e. they are composed of one or more product provisioning requests). Products may include multiple Customer Facing Services and Physical Resources. The Order Manager should know this hierarchy for a particular product and decompose the Product Order into Service Order(s). Service Orders are then handled by Service Activators, which initiate the Resource Orders that do the resource provisioning (either automatically or via manual work orders).
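The decomposition step can be illustrated with a small Python sketch. Everything in it is a hypothetical illustration: the catalog content, the product and service names and the class names are assumptions made for the example, not taken from any real product catalog or Order Manager.

```python
from dataclasses import dataclass, field

# Hypothetical product catalog: product -> customer facing services -> resources.
CATALOG = {
    "Business Internet": {
        "Internet Access": ["Access Port", "IP Address"],
        "Managed Router": ["CPE Router"],
    },
}


@dataclass
class ResourceOrder:
    service: str
    resource: str


@dataclass
class ServiceOrder:
    product: str
    service: str
    resource_orders: list = field(default_factory=list)


def decompose(product: str, catalog: dict) -> list:
    """Split one product order into service orders, each with its resource orders."""
    services = catalog[product]  # raises KeyError if the product is not in the catalog
    return [
        ServiceOrder(product, service, [ResourceOrder(service, r) for r in resources])
        for service, resources in services.items()
    ]


if __name__ == "__main__":
    for order in decompose("Business Internet", CATALOG):
        print(order.service, [r.resource for r in order.resource_orders])
```

In a real deployment the service orders would be handed to a service activator rather than printed, and the catalog would come from the product management platform.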

There is an important detail in the case where you separate your Order Capturing and Order Management platforms, which is typically the case in large implementations. The CRM (which is at the same time your Product Manager) has the Product Catalog, so it knows the product-to-service decompositions. If we want the Order Manager to do the decomposition, it should have access to the Product Catalog. This is possible in three ways. One way is to take a full dump of the product catalog from your product management platform (or CRM) and import it into the Order Manager, replicating the information. The second way is to use the Product Manager’s NBIs to reach its inventory data. This is a better way. However, if we deploy a separate Inventory Management application that also employs the Product Catalog, this would be the best solution.

The workflow that the Order Manager runs is sometimes called the business workflow. It interfaces with the customer, suppliers/partners, the billing system etc. It coordinates the service activation workflows, which activate/deactivate services in the infrastructure. Depending on the business processes, it may include availability checks, feasibility checks and credibility checks. It follows a specific business process.

Order managers allow us to show our customers the progress of their orders, which increases customer satisfaction. They also reduce operational expenses and order lifecycle times by increasing automation.

Mar 30 2010

OSS/J defines a common interface structure and a SID-based data model for the messages that are exchanged between different OSS components (FM, PM, TT etc.). It aims to deliver interoperability between those components with minimum effort.

The OSS/J initiative (previously run by JCP, now by TMF) is composed of specifications. Specifications are issued for OSS domains such as trouble ticket (TT), fault management (FM) etc., and they describe the attributes, naming conventions (templates) and mandatory/optional operations for that domain.

These specifications are fully implemented by the OSS vendors who claim that they are OSS/J compatible. The clients (other OSS components in the infrastructure) connect to the server by using these specifications.

OSS/J also provides 3 integration profiles: EJB (Java), JMS (XML) and web services. Profiles are the possible ways that you may connect to the OSS/J servers. A server may implement one of these, or all of these. Some vendors even add other non-standard profiles that enable the legacy clients to connect. The important part here is the nature of the message flow. For example, fault management is asynchronous in nature. So, for me, the best protocol to use would be JMS. For trouble ticket operations, which are synchronous in nature, we may use EJB or web service profiles.

How should we migrate our interfaces to OSS/J? Well, it is better to explain the process with an example.

Suppose we have a fault management platform and a trouble ticket platform. We have an expert rule deployed in the fault manager that creates a trouble ticket when it receives a specific string in the alarm’s AdditionalText attribute. FM reaches TT via its “legacy” CORBA interface.

Our TT vendor says that their latest product version supports the OSS/J TT specification (which means they wrote a server that exposes the OSS/J interface).

Since we want to migrate to that interface, we ask the vendor which version of the specification they have implemented on the server. They reply with the specification version: v1.2. We also learn that we can use the EJB (Java) profile to reach their server.

A quick note: most probably, the vendor did not rewrite the whole NBI stack but wrote an intermediary OSS/J server which, at the backend, communicates with the old NBI stack over CORBA. (There will be data mapping and conversions between the specifications.) This way, they will be able to support both CORBA clients and OSS/J clients and reduce time to market.

On the FM side, we need to write an OSS/J TT client that follows the v1.2 specification. (Generally we ask the FM vendor to write it for us). Within the client’s configuration, we will have to specify the connection details to the remote server. After the client is successfully connected to the server, it can issue calls to operations that are implemented on the server.

The important part here is that if the vendor says “I am OSS/J TT spec v1.2 compliant”, the server must implement all the mandatory operations that appear in the specification. For the clients, however, this is not mandatory. For example, if I only need to create a trouble ticket within my Fault Management application, and I do not care about closing it, I can implement only the create TT operation from the spec.
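The asymmetry between server and client can be sketched in Python. This is only an illustration of the idea: the class and method names below are hypothetical and are not the operation names of the OSS/J TT specification, and the in-memory server stands in for a real vendor server reached over one of the profiles.

```python
class InMemoryTicketServer:
    """Hypothetical stand-in for a TT server.

    A compliant server has to offer every mandatory operation, so it
    exposes both create and close here.
    """

    def __init__(self):
        self.tickets = {}
        self._next_id = 1

    def create_ticket(self, description: str) -> int:
        ticket_id = self._next_id
        self._next_id += 1
        self.tickets[ticket_id] = {"description": description, "status": "open"}
        return ticket_id

    def close_ticket(self, ticket_id: int) -> None:
        self.tickets[ticket_id]["status"] = "closed"


class CreateOnlyTicketClient:
    """Client used by the fault manager: it wraps only the operation it needs."""

    def __init__(self, server):
        self._server = server

    def create_ticket(self, description: str) -> int:
        return self._server.create_ticket(description)


def handle_alarm(alarm: dict, client: CreateOnlyTicketClient, trigger: str):
    """Expert rule: open a ticket when the trigger string is in AdditionalText."""
    text = alarm.get("AdditionalText", "")
    if trigger in text:
        return client.create_ticket(text)
    return None  # no ticket for alarms that do not match the rule
```

The fault manager only ever calls `handle_alarm`, so swapping the server for another vendor's implementation would, in principle, touch only the client's connection details.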

OSS/J brings portability. The client would work, in theory, with any TT OSS/J server (vendor) that conforms to the same specification. I say in theory because most of the time you are required to make minor modifications to your client.

For those who are interested in OSS/J, you may have a look at the TMF OSS/J site at:

Mar 26 2010

Performance Management is about polling data, aggregating it, running thresholds on it and reporting performance parameters. In this article, I will concentrate on the polling and data retention side of it.

Performance managers deal with lots of data from many resources. This massive amount of data directly impacts the disk space requirements. Since disk space is not limitless, we should tune some parameters to limit it based on the customer’s requirements.

One of the most important parameters to ask the customer about is the retention period of the data. This is the time the system should wait before it purges the data. I have used several different retention periods in the projects I have been involved in. This parameter highly depends on the customer requirements. Some customers may want to see the daily KPIs for a year, while others may require only a month.

The second important parameter is the polling period. We always tend to poll less frequently, i.e. to set longer polling periods. However, long polling periods may lead to problems. Here are some examples:

Suppose you are polling (via SNMP GET) a device interface to get the Inbound Octets KPI. The SNMP object you are polling is a 32-bit counter. In order to get the octets passed, you subtract the previous poll’s counter value from the current one. This is OK. The problem arises when you “wait” too long. If your polling period is 15 minutes, for example, and this is a highly utilized interface, the counter wraps back to zero once it reaches 2^32. In some cases it even wraps multiple times within these 15 minutes. The result is wrong, misleading information on the reports. To cope with this situation there are some formulas I have been using. But solving this via formulas is not the best way, as they are not reliable at all times. The best way to deal with this particular case is to use 64-bit counters (if available in the MIB) or to reduce the polling period.
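To make the wrap problem concrete, here is a minimal Python sketch. It is not necessarily the formula referred to above: it shows one common correction (modular subtraction, which is valid only when the counter wrapped at most once between two polls) and how quickly a 32-bit octet counter can wrap at full line rate. The function names and link speeds are illustrative assumptions.

```python
COUNTER32_MODULUS = 2 ** 32


def counter_delta(previous: int, current: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Octets counted between two polls, assuming at most one wrap in between.

    If the counter wrapped more than once, the result is silently too small,
    which is why this correction cannot be trusted with long polling periods.
    """
    return (current - previous) % modulus


def seconds_until_wrap(link_speed_bps: float, modulus: int = COUNTER32_MODULUS) -> float:
    """Shortest time in which an octet counter can wrap at full line rate."""
    octets_per_second = link_speed_bps / 8
    return modulus / octets_per_second


if __name__ == "__main__":
    # A wrap between two polls: the raw difference would be negative.
    print(counter_delta(4294967290, 10))
    # Compare wrap times for a 100 Mbit/s and a 1 Gbit/s link with a 900 s polling period.
    for speed in (100e6, 1e9):
        print(speed, round(seconds_until_wrap(speed), 1))
```

If the wrap time at line rate is shorter than the polling period, multiple wraps are possible and no subtraction formula can recover the real traffic volume.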

Another example comes from the SQM domain. Suppose you are polling every 5 minutes and forwarding the KPIs to an SQM system. The SQM system collects the data and runs some thresholds on it to do the service impact analysis. In your first poll, the SQM system finds that the data received violates the threshold limits. The system then marks the status of the service as down and starts calculating the downtime. In order to detect the next service status, it has to wait for at least 5 minutes. This causes the service downtimes to appear as multiples of 5 in the reports: 5, 10, 15… minutes. If you commit to 99.9999% availability to a customer, this granularity is simply not enough. What should you do? Reduce the polling period again.
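A quick calculation shows how far apart the two numbers are. The sketch below is a minimal illustration; the 30-day reporting window and the function names are assumptions made for the example.

```python
def allowed_downtime_seconds(availability_percent: float, window_seconds: float) -> float:
    """Downtime budget implied by an availability commitment over a time window."""
    return window_seconds * (1 - availability_percent / 100)


def reported_downtime_seconds(violating_polls: int, polling_period_seconds: float) -> float:
    """Downtime as the reports show it: always whole polling periods."""
    return violating_polls * polling_period_seconds


if __name__ == "__main__":
    window = 30 * 24 * 3600  # assumed 30-day reporting window, in seconds
    budget = allowed_downtime_seconds(99.9999, window)
    smallest_outage = reported_downtime_seconds(1, 5 * 60)
    print(f"budget: {budget:.3f} s, smallest reportable outage: {smallest_outage:.0f} s")
```

With a 5-minute polling period, a single violating poll already reports far more downtime than the whole budget for the window, whatever the real outage length was.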

Reducing the polling period is not very straightforward. When you reduce the polling period, you should ensure that all the polls and their processing finish within that period. Suppose you have a poller which polls 1000 resources every hour. If we assume that each poll takes 2 seconds (including processing delays, propagation delays etc.), this makes 2000 seconds. In 1 hour we have 3600 seconds, so there is no problem here. But this is the sunny day scenario. What happens if 10% of my resources cannot respond in 2 seconds, or are not available at all? Obviously there is a risk of consuming the 3600 seconds before finishing the polling. What can we do? Well, if we cannot increase the polling period, we can increase the poller count. Instead of using 1 poller, we can use 5 pollers and poll in parallel. Additional pollers bring additional license costs and introduce additional machines to the infrastructure.
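The capacity check above can be sketched in a few lines of Python. The model is deliberately simple and rests on assumptions: each poller works through its resources one after another, slow or unreachable resources cost a fixed timeout, and the function names and the 10-second timeout are hypothetical.

```python
import math


def poll_cycle_seconds(resources: int, seconds_per_poll: float,
                       slow_fraction: float = 0.0, timeout_seconds: float = 0.0) -> float:
    """Time one poller needs to poll every resource sequentially."""
    slow = round(resources * slow_fraction)  # resources that run into the timeout
    healthy = resources - slow
    return healthy * seconds_per_poll + slow * timeout_seconds


def pollers_needed(cycle_seconds: float, period_seconds: float) -> int:
    """Number of parallel pollers needed to finish one cycle within the period."""
    return max(1, math.ceil(cycle_seconds / period_seconds))


if __name__ == "__main__":
    sunny = poll_cycle_seconds(1000, 2)               # every resource answers in 2 s
    rainy = poll_cycle_seconds(1000, 2, 0.10, 10)     # 10% hit an assumed 10 s timeout
    for period in (3600, 60):
        print(period, pollers_needed(sunny, period), pollers_needed(rainy, period))
```

The same 1000 resources that fit comfortably into a 3600-second period need dozens of parallel pollers once the period drops to 60 seconds, which is where the license and hardware costs come from.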

Polling periods and retention periods heavily impact the disk size requirements for the performance management solutions. They should be studied carefully before the roll out of any PM implementation.
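A rough sizing estimate shows how the two parameters multiply. This is a back-of-the-envelope sketch only: it ignores aggregation tables, indexes and database overhead, and the 16 bytes per stored sample is an assumed figure, not a value from any particular product.

```python
def samples_retained(resources: int, kpis_per_resource: int,
                     polling_period_seconds: int, retention_days: int) -> int:
    """Number of raw samples on disk just before the purge runs."""
    polls_in_retention = (retention_days * 86400) // polling_period_seconds
    return resources * kpis_per_resource * polls_in_retention


def storage_bytes(samples: int, bytes_per_sample: int = 16) -> int:
    """Raw storage estimate; bytes_per_sample is an assumption for illustration."""
    return samples * bytes_per_sample


if __name__ == "__main__":
    for period in (300, 60):
        samples = samples_retained(1000, 10, period, retention_days=30)
        print(period, samples, round(storage_bytes(samples) / 1e9, 2), "GB")
```

Going from a 5-minute to a 1-minute polling period multiplies the raw data volume by five, and extending the retention from a month to a year multiplies it again by roughly twelve.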

Introduction to eTOM

Mar 25 2010

eTOM is a map which categorizes and classifies the business processes of a service provider in a hierarchical structure. It gives us a common vocabulary of processes, which brings huge benefits in defining business interactions with other entities such as suppliers, partners and customers. It also acts as a marketing tool, which product vendors use to claim that their products comply with eTOM and specific processes within it.

The best way to express a large volume of information is to present it in a hierarchical structure. eTOM uses decomposition between its elements to expose more details. Each process element in the hierarchy (the boxes) decomposes into more detailed process elements in the next level. Decomposition steps are called “levels”. The leveling starts at level-0 and continues. In theory the maximum decomposition level is limitless; in practice, however, we do not see decompositions beyond level 7. The current version of eTOM (v8) decomposes the process elements down to level-3. There are some level-4 decompositions, but level-4 is not common yet.

eTOM gives you the process elements to be used when constructing your organizational business flows. The important detail is that it does not mandate how those process elements should interact with each other or how you should order them. According to eTOM, these flows are organization specific and it is impossible to cover every different flow in a generic framework. However, as a guideline, TM Forum provides some common flows as an addendum to the framework documentation.

eTOM can be used to construct business flows for new products/services/policies etc. Its more common use is to guide re-engineering efforts. Service providers assess themselves to see whether their current processes comply with the best practices, and whether they have duplicate or missing processes that may lead to organizational inefficiencies. I will comment on this topic in another article.

We talked about the levels in eTOM. Understanding the levels is important, as they define the scope of the process elements you will see in that particular view.

Level 0 does not give us much detail. It is the place where we see the domain areas that we may encounter in a service provider: SIP (Strategy, Infrastructure and Product), OPS (Operations) and Enterprise. Within Level 0, we also see the horizontal groupings. It is important that these groupings align with the SID. Level 0 defines the business activities of a service provider organization.

Level 1 introduces new horizontal and vertical groupings. I will not name them all here. The important detail is that the level-1 vertical groupings are overlays on the framework decomposition hierarchy. In a correct decomposition hierarchy, each element should appear only once. That is why the horizontal groupings are chosen for the decomposition hierarchy and the vertical ones are left as overlays. The overlays denote the elements’ nature and help us locate the elements that most probably appear in end-to-end process flows such as fulfillment. Level 1 is sometimes called the CxO view, as it focuses on the horizontal and vertical groupings that should be under the responsibility of CIOs and CEOs respectively.

Level 2 is what we can call the core business process view. Process engineering starts here because this is where the process elements (boxes) appear. We can start building the highest level process flows in this context.

Level 3 is the business process flow view that enables us to draw more detailed flow diagrams.

Level 4 and below belong to the operational processes and are highly specific to organizations. We will see product or service specific processes and procedures in those lower levels.

IP Probes

Mar 24 2010

In order to get end-to-end network performance data, we should install probes in the infrastructure. Probes are the most effective way to measure the end-to-end performance that is perceived by the end user. An alternative approach is to correlate multiple node-to-node performance data to derive the end-to-end performance metrics. This is very hard to maintain and may give misleading results, so we should always use probes where possible.

There are two types of probes: active and passive.

Passive probes passively monitor the packets that pass through them. Basically, at the entrance point, they inspect the packet header to identify the source address, the destination address and some other information. The remote probe at the exit point does the same. Both probes send this information along with the timestamps to the management system, where the records are correlated and converted to performance metrics. Passive probing is a costly solution. Its main benefit is that passive probes do not generate any traffic, so maximum throughput is maintained.
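The correlation done on the management system side can be sketched as follows. This is a simplified illustration under stated assumptions: both probes are assumed to have synchronized clocks, each packet is assumed to be identifiable by a key derived from its header, and the function and variable names are hypothetical.

```python
def correlate(entry_records: dict, exit_records: dict):
    """Correlate passive probe records into average delay and loss ratio.

    Both arguments map a packet key to the timestamp (in seconds) at which
    the probe saw the packet. Packets seen at the entry point but never at
    the exit point are counted as lost.
    """
    delays = [
        exit_records[key] - seen_at
        for key, seen_at in entry_records.items()
        if key in exit_records
    ]
    lost = len(entry_records) - len(delays)
    average_delay = sum(delays) / len(delays) if delays else None
    loss_ratio = lost / len(entry_records) if entry_records else 0.0
    return average_delay, loss_ratio


if __name__ == "__main__":
    entry = {"pkt1": 0.000, "pkt2": 0.010, "pkt3": 0.020, "pkt4": 0.030}
    exit_ = {"pkt1": 0.005, "pkt2": 0.017, "pkt4": 0.036}  # pkt3 never arrived
    print(correlate(entry, exit_))
```

The need to ship and match per-packet records from two points is part of what makes the passive approach costly compared to active probing.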

Active probes, on the other hand, generate special traffic between each other to measure the end-to-end performance. Some probe vendors use ICMP PDUs for this purpose. Other vendors, such as Cisco, prefer to send special PDUs that have additional parameters. Active probes are cost effective when compared to the passive ones. They should be the preferred approach for measuring IP based traffic.

Probes can be external (hardware based) or internal (software based). Software based probes are easier to deploy and maintain. The most popular software based internal probes are Cisco SAA probes (IP SLA).

Probes provide granular data. Typically this data is collected and further aggregated by the performance management systems and forwarded to other systems such as SQM.