Sep 18 2012
 

Network inventory management (NIM) systems are a must for most of the root cause analysis and service impact analysis that we rely on.

One of the other primary benefits of a NIM is that it gives you a holistic view of your infrastructure. This way, you can use the infrastructure more effectively and reduce your CAPEX. You can pinpoint your idle resources and assign the next work package to them rather than procuring new ones. Seeing the processing capacity, the planning department can assign more processes to less utilized devices.

The problem that may arise here is the static nature of the NIM. The data you enter manually or import into the system is static. That is to say, the system does not go and fetch data by itself. You define the devices, you define the locations, you define the IP addresses. All static. There’s nothing wrong with this, as a NIM can live like this without a problem. A healthy process architecture can keep the NIM data accurate and up to date.

But wouldn’t it be nice if the NIM became more active? For example, I need a virtual machine to be installed on my infrastructure. I have a look at my devices and see that they are currently at capacity.
So do I need to procure a new device? Not necessarily. Looking at the performance management data, I see that device X is CPU loaded only around midnight, for a 2 hour period. At other times its utilization is only 1%. This VM can surely be installed on that machine, as long as the application on the VM does not need much CPU during that period.

If my NIM can somehow fetch this load data and feed it into the provisioning process, the admin or the expert system that assigns the VM to a machine can choose to install it on this device rather than starting a new device procurement process.
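The placement decision described above could look roughly like the sketch below. It is only an illustration and not a feature of any particular NIM product: the `Host` class, the hourly CPU profiles and the 80% ceiling are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    # Hypothetical hourly CPU utilization in percent (index 0 = 00:00-01:00),
    # as it might be imported from a performance management system.
    cpu_by_hour: List[float]


def fits(host: Host, vm_cpu_by_hour: List[float], ceiling: float = 80.0) -> bool:
    """True if host load plus VM load stays at or under the ceiling in every hour."""
    return all(h + v <= ceiling for h, v in zip(host.cpu_by_hour, vm_cpu_by_hour))


def pick_host(hosts: List[Host], vm_cpu_by_hour: List[float]) -> Optional[Host]:
    """Return the first existing host that can take the VM, or None."""
    for host in hosts:
        if fits(host, vm_cpu_by_hour):
            return host
    return None  # nothing fits: only now start a procurement process


# Device X: loaded for two hours from midnight, about 1% for the rest of the day.
device_x = Host("X", [70.0, 70.0] + [1.0] * 22)
# A VM that needs CPU during office hours only (assumed profile).
vm_profile = [0.0] * 8 + [40.0] * 10 + [0.0] * 6

chosen = pick_host([device_x], vm_profile)
print(chosen.name if chosen else "procure a new device")
```

A VM that is busy at midnight would not fit on device X, and only in that case would the procurement process need to start.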

Of course, this VM example can be extended to virtual routers or TDM resources. The approach will save resources and promote reusability while reducing CAPEX.

Are current NIM vendors ready for such a change, or at least willing to make it?

Dec 11 2011
 

In an operational telecom environment, each fault or quality degradation is handled by the NOC engineers and repaired by following the necessary steps. These steps are written down in knowledge base management systems or live in people’s heads, based on past experience.

Each time a new problem occurs on the network, it is detected by the network management platforms and, if implemented, automatic trouble ticket generation is initiated for the root cause alarms. NOC engineers handle each trouble ticket separately. During the troubleshooting process, a separate knowledge base system may also be consulted. However, because of the operational cost of keeping them up to date and consulting them, knowledge base systems are often not very efficient.

A self-healing method could be used to automate these knowledge base systems. In this approach, each reconfiguration activity performed over the configuration management platform is logged for further reference. In the meantime, alarm information is also logged in the trouble management platform. The alarms, along with the configuration management logs, are fed into a database platform where they can be further correlated. The node id (IP address, MAC address etc.) field, along with other inventory related configuration information (such as card id, slot id), can be used as the primary key for this correlation.
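As a rough sketch of this correlation step, the Python below pairs each alarm with the configuration changes made on the same node and slot. The record layouts, the field names and the one hour time window are assumptions made for the example; a real platform would have its own schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Alarm:
    node_id: str
    slot_id: str
    rca_type: str
    raised_at: datetime


@dataclass(frozen=True)
class ConfigChange:
    node_id: str
    slot_id: str
    commands: str
    applied_at: datetime


def correlate(
    alarms: List[Alarm],
    changes: List[ConfigChange],
    window: timedelta = timedelta(hours=1),  # assumed correlation window
) -> List[Tuple[Alarm, ConfigChange]]:
    """Pair each alarm with changes on the same node/slot applied shortly after it."""
    by_key: Dict[Tuple[str, str], List[ConfigChange]] = {}
    for change in changes:
        by_key.setdefault((change.node_id, change.slot_id), []).append(change)

    pairs = []
    for alarm in alarms:
        for change in by_key.get((alarm.node_id, alarm.slot_id), []):
            if timedelta(0) <= change.applied_at - alarm.raised_at <= window:
                pairs.append((alarm, change))
    return pairs


t0 = datetime(2011, 12, 11, 10, 0)
alarms = [Alarm("10.0.0.1", "slot-3", "CARD_FAILURE", t0)]
changes = [
    ConfigChange("10.0.0.1", "slot-3", "reset card", t0 + timedelta(minutes=20)),
    ConfigChange("10.0.0.2", "slot-1", "unrelated change", t0 + timedelta(minutes=5)),
]
pairs = correlate(alarms, changes)
print(len(pairs))
```

Each resulting pair is a candidate "this alarm was fixed by this change" profile for the knowledge base.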

During day-to-day operation, when a new root cause alarm occurs on the network, the RCA type is looked up in the knowledge base for a best match to a configuration template. If a match is found, the configuration template can be populated to create the self-healing reconfiguration to be applied to the faulty device.
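A minimal sketch of this lookup-and-populate step is shown below. It uses a plain exact match on the RCA type rather than a real best-match algorithm, and the knowledge base contents, the RCA type names and the pseudo-commands in the templates are all hypothetical.

```python
from string import Template
from typing import Dict, Optional

# Hypothetical knowledge base: RCA type -> configuration template.
# The template bodies are placeholders, not real vendor commands.
KNOWLEDGE_BASE: Dict[str, Template] = {
    "LINK_DOWN": Template("node $node_id: restart port $port_id"),
    "CARD_FAILURE": Template("node $node_id: reset card in slot $slot_id"),
}


def build_fix(rca_type: str, params: Dict[str, str]) -> Optional[str]:
    """Return the populated configuration, or None if no template matches."""
    template = KNOWLEDGE_BASE.get(rca_type)
    if template is None:
        return None  # no match: let the normal incident flow continue
    return template.substitute(params)


fix = build_fix("LINK_DOWN", {"node_id": "10.0.0.1", "port_id": "3/1"})
print(fix)
```

When `build_fix` returns `None`, the alarm is not a candidate for self-healing and a human has to take over.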

This way, a fully automated self-healing process could be run without running an end-to-end incident management process. An incident process can and should still be triggered, as these configuration activities will not be finalized in a second and customers may already have experienced service degradation or outages. However, the first task in the incident flow could be to check whether the alarm is applicable to the automated self-healing process. If the self-healing process does not apply to the scenario at hand, the incident flow can continue on its way. Again, each configuration task that is done over the configuration management platform will continue to feed the self-healing system with new profiles. The more data there is in the system, the better the results of the template matching algorithm will be.

Oct 04 2011
 

Here are 3 articles that I have sent to a Turkish telecommunications magazine: Tele.com.tr.
Unfortunately there is no English translation, but I can say that the content is in line with my previous blog articles.

Tele.com.tr – December 2011 Issue
Mobile Devices and Web Experience – Page: 62,63
http://www.scribd.com/doc/74883534/Tele-com-tr-Aral%C4%B1k-2011

Tele.com.tr – September 2011 Issue
Network Inventory Management Systems – Page: 54,55
http://www.scribd.com/doc/65277706/Tele-com-tr-Eylul-2011

Tele.com.tr – August 2011 Issue
Service Quality Management – Page: 54,55
http://www.scribd.com/doc/64840536/Tele-com-tr-Temmuz-A%C4%9Fustos-2011

Oct 14 2010
 

CMDB is an IT acronym for Configuration Management Database. Basically, it is a database that holds the IT “assets”, called CIs (Configuration Items). Every resource that is managed by the IT department can be a CI candidate. These CIs can point to logical or physical infrastructure items. For example, an application is a typical logical CI, while a PC is a physical one.

A CMDB is structured in a hierarchical manner. CIs include other CIs, and so on. Applications run on machines, customers use applications. In CMDB terminology, CIs can be connected to each other via several relationship types such as “depends” and “includes”.

The main reason why I started this post is to explore whether CMDBs can be “re-used” as a NIMS (Network Inventory Management System).

All telecom operators have IT departments, and most of those have invested in CMDB systems. These CMDBs hold the data for the users, PCs, routers, switches, all the CIs that the IT department needs. Can this investment be reused to include the network data?

Well, there will be some administrative problems, as the CMDB is owned by IT and the IT manager will not want to share it. Suppose that we somehow overcome this issue. We now have to think about the technical feasibility of the solution.

Looking from the scalability perspective, I can surely say that CMDBs can handle thousands of CIs and their relationships without any problem. Most CMDB vendors (IBM, HP etc.) will give you clues about the scalability of their products.

Passing this first concern, we come to the second one: can we model the network inside the CMDB? For me, yes! Let me give you an example of how it can be achieved.

Suppose we are selling an SDH VC-12 data service. This VC-12 low-order circuit that starts from SDH node A will end at node B. On its way it will be transported over several high-order VC-4s that are running on physical STM-1, 4, 16 circuits. Can this be modelled inside a CMDB? Definitely yes. The same hierarchy can be structured in the CMDB. On top of this SDH link I can bind router ports, and again bind those to routers. I can even model cities as CIs and put those routers under them.
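To make the idea concrete, here is a small Python sketch of such a hierarchy expressed as CIs connected by "depends" relationships. The `Cmdb` class and the CI names are made up for the illustration and do not correspond to the API of any CMDB product.

```python
from collections import defaultdict
from typing import Dict, Set


class Cmdb:
    """Toy CI store that only knows 'depends on' relationships."""

    def __init__(self) -> None:
        # supporting CI -> CIs that depend directly on it
        self._dependants: Dict[str, Set[str]] = defaultdict(set)

    def depends(self, ci: str, on: str) -> None:
        """Record that `ci` depends on `on`."""
        self._dependants[on].add(ci)

    def impacted_by(self, ci: str) -> Set[str]:
        """All CIs that directly or indirectly depend on `ci`."""
        seen: Set[str] = set()
        stack = [ci]
        while stack:
            current = stack.pop()
            for dependant in self._dependants.get(current, ()):
                if dependant not in seen:
                    seen.add(dependant)
                    stack.append(dependant)
        return seen


cmdb = Cmdb()
cmdb.depends("VC-4 #1", on="STM-16 A-B")          # high-order path on the physical circuit
cmdb.depends("VC-12 A-B", on="VC-4 #1")           # low-order circuit on the high-order path
cmdb.depends("Router port R1/1", on="VC-12 A-B")  # router port bound on top of the SDH link

impacted = cmdb.impacted_by("STM-16 A-B")
print(sorted(impacted))
```

The same traversal is what a service impact analysis would run when the physical circuit fails.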

This approach will, for sure, bring a lot of system integration and development effort. However, if your infrastructure is not very complex, this can be a more cost-effective solution than going with a COTS NIMS product.

Jun 11 2010
 

Service providers started moving to IP based technologies in their infrastructures because of the flexibility offered by these technologies. We have seen all-IP projects all around the world where traditional voice, data and transmission infrastructures are transformed into IP based infrastructures. All-IP also means sending the traditional assets (NEs and management systems) to the trash. Thus, big operators try to use a transition approach to move to all-IP. In this approach, they still use traditional technologies (such as SDH/SONET) in the backbone and put IP on top of them. However, this reduces throughput, as these technologies add extra overhead. Operators that have vast amounts of free capacity may not care about this throughput issue now, but it will definitely be a problem in the future.

Traditionally, the transmission environment was dependent on circuit based technologies. SDH, PDH, SONET, ATM and Frame Relay are technologies of this type, where you construct logical circuits on top of the physical infrastructure. Looking from the OSS perspective, modeling a circuit based technology is easy, because it is predictable. For example, we definitely know that the VC-12 low-order circuit that starts from point A will end at point B. On its way it will be transported over several high-order VC-4s that are running on physical STM-1, 4, 16 circuits.

In our fault managers it is “easy” to load this hierarchy into a correlation engine and run root-cause analysis algorithms to find the root cause of a network problem. Or, if we take the ATM case, the PVCs are deployed on predefined paths, which also makes them good candidates for topology based root cause analysis. Frame Relay likewise…
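A very small sketch of such a topology based root cause analysis is given below. It assumes the hierarchy is available as a mapping from each circuit to the circuits that carry it, and it treats an alarmed element as a root cause only when nothing underneath it is alarmed. Real correlation engines are considerably more elaborate; the element names are made up.

```python
from typing import Dict, List, Set


def root_causes(carried_by: Dict[str, List[str]], alarmed: Set[str]) -> Set[str]:
    """Alarmed elements with no alarmed element anywhere below them."""

    def has_alarmed_support(element: str, seen: Set[str]) -> bool:
        for support in carried_by.get(element, ()):
            if support in seen:
                continue
            seen.add(support)
            if support in alarmed or has_alarmed_support(support, seen):
                return True
        return False

    return {e for e in alarmed if not has_alarmed_support(e, set())}


# Static hierarchy: the VC-12 rides on a VC-4, which rides on an STM-16.
carried_by = {
    "VC-12 A-B": ["VC-4 #1"],
    "VC-4 #1": ["STM-16 A-B"],
}
alarmed = {"VC-12 A-B", "VC-4 #1", "STM-16 A-B"}

causes = root_causes(carried_by, alarmed)
print(causes)
```

Because the hierarchy is static, the mapping can be loaded once from the inventory and reused for every alarm burst.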

When we move to IP, things become more complicated from the logical inventory perspective. The route from point A to point B is no longer predictable. If one of the links in that direction fails, the router will, hopefully, find another way around to send the packet. This dynamic behavior puts a barrier in front of traditional topology based root cause analysis.

In order to do root-cause analysis in packet-based dynamic networks, we need another approach: Real-Time Topology Discovery. Real-Time Topology Discovery uses the same techniques as any auto discovery process. The difference is that it runs them more frequently. I have seen two approaches so far: Routing Protocol Listening and Virtual Routers.

Routing Protocol Listening utilizes the “topology table” feature of link-state routing protocols (such as OSPF and IS-IS). Devices (routers) that implement a link-state routing protocol maintain two tables in their memory: the Routing Table and the Topology Table. Devices first build their topology tables by listening to routing protocol updates. After the topology table is built, they apply a best-path algorithm to determine the best paths to the network destinations. These best paths are inserted into the routing table to be used in routing decisions.
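The step from topology table to routing table can be illustrated with Dijkstra's shortest path algorithm, which is the kind of best-path computation link-state protocols rely on. The sketch below is a simplification: the topology is a plain dictionary of link costs with made-up node names, and none of the protocol details (areas, LSA types, equal-cost paths) are modelled.

```python
import heapq
from typing import Dict, Tuple

Topology = Dict[str, Dict[str, int]]  # node -> {neighbour: link cost}


def best_paths(topology: Topology, source: str) -> Dict[str, Tuple[int, str]]:
    """Build a routing table {destination: (cost, next hop)} from a topology table."""
    dist = {source: 0}
    next_hop: Dict[str, str] = {}
    done = set()
    heap = [(0, source, None)]
    while heap:
        cost, node, hop = heapq.heappop(heap)
        if node in done:
            continue  # stale entry, a cheaper path was already settled
        done.add(node)
        if hop is not None:
            next_hop[node] = hop
        for neighbour, link_cost in topology.get(node, {}).items():
            new_cost = cost + link_cost
            if neighbour not in done and new_cost < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_cost
                # The first hop is the neighbour itself when leaving the source.
                first = neighbour if node == source else hop
                heapq.heappush(heap, (new_cost, neighbour, first))
    return {dest: (dist[dest], next_hop[dest]) for dest in next_hop}


topology = {
    "A": {"B": 1, "C": 5},
    "B": {"A": 1, "C": 1},
    "C": {"A": 5, "B": 1},
}
routes = best_paths(topology, "A")

# The same network after the A-B link fails: traffic to B now goes via C.
after_failure = {
    "A": {"C": 5},
    "B": {"C": 1},
    "C": {"A": 5, "B": 1},
}
rerouted = best_paths(after_failure, "A")
print(routes, rerouted)
```

Comparing `routes` with `rerouted` shows why a static inventory cannot tell where a packet actually travels: the path changes as soon as the topology table does.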

Routing Protocol Listeners “sniff” these topology related conversations between the devices to construct a real-time topology of the network. This topology can then be used in root cause analysis. Some network optimization tools use this technique to populate the network topology, on top of which they can later run what-if scenarios. This topology can also be enriched via SNMP queries with information that is not exchanged by routing protocols (such as link utilization).

The second approach, Virtual Routers, is similar to the first one in some sense. In this approach you create an in-memory clone of the device on your management server. For each device/VRF, you create another instance. You copy the initial configuration of the real device onto the virtual one and then start listening to alarms/events from the real device.
This way, you have a near real-time topological view of your network on which you can base your analysis.

The first approach is more real-time, but it applies to link-state protocols only. If you are using a distance vector protocol, for example, you have to pick the second option.

In today’s world where we started to work more with intelligent NEs and network protocols, Real-Time Topology Discovery solutions will definitely find their places in the next generation OSS.