Dec 11, 2011

In an operational telecom environment, each fault or quality degradation is handled by NOC engineers, who repair it by following the necessary steps. These steps are either documented in knowledge base management systems or exist only in people's heads, based on past experience.

Each time a new problem occurs on the network, it is detected by the network management platforms and, if implemented, automatic trouble ticket generation is initiated for the root cause alarms. NOC engineers handle each trouble ticket separately. During the troubleshooting process, a separate knowledge base system may also be consulted. However, because of the added operational costs, knowledge base systems often prove inefficient in practice.

A self-healing method could be used to automate these knowledge base systems. In this approach, each reconfiguration activity performed over the configuration management platform is logged for further reference. Meanwhile, alarm information is logged in the trouble management platform. The alarms and the configuration management logs are then fed into a database platform where they can be correlated. The node ID (IP address, MAC address, etc.), together with other inventory-related configuration information (such as card ID and slot ID), can be used as the key for this correlation.
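The correlation step can be sketched as follows. This is only an illustration under stated assumptions: the record fields (`node_id`, `slot_id`, `rca_type`, timestamps) and the one-hour time window are hypothetical choices, not something the platforms above prescribe.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    node_id: str     # e.g. IP or MAC address of the faulty node
    slot_id: str     # inventory detail narrowing down the location
    rca_type: str    # root cause type reported by the management platform
    raised_at: int   # timestamp in seconds (assumed field)


@dataclass(frozen=True)
class ConfigChange:
    node_id: str
    slot_id: str
    commands: tuple  # commands the engineer applied
    applied_at: int  # timestamp in seconds (assumed field)


def correlate(alarms, changes, window=3600):
    """Pair each alarm with the config changes made on the same node and
    slot within `window` seconds after the alarm was raised.

    The time window is an assumption of this sketch."""
    by_key = defaultdict(list)
    for change in changes:
        by_key[(change.node_id, change.slot_id)].append(change)

    pairs = []
    for alarm in alarms:
        for change in by_key.get((alarm.node_id, alarm.slot_id), []):
            if 0 <= change.applied_at - alarm.raised_at <= window:
                pairs.append((alarm, change))
    return pairs
```

Each resulting pair links a root cause type to the commands that were applied afterwards, which is the raw material for a configuration template.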

During day-to-day operation, when a new root cause alarm occurs on the network, its root cause analysis (RCA) type is looked up in the knowledge base for the best match to a configuration template. If a match is found, the template can be populated to create the self-healing reconfiguration to be applied to the faulty device.
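A minimal sketch of the lookup and population step might look like this. It simplifies "best match" to an exact match on the RCA type, and the knowledge base contents, command syntax and field names are made up for illustration.

```python
from string import Template

# Hypothetical knowledge base: RCA type -> configuration template.
# The command syntax is invented and not tied to any vendor.
KNOWLEDGE_BASE = {
    "LINK_DOWN": Template("restart-port $port"),
    "CARD_FAILURE": Template("reset-card $card_id slot $slot_id"),
}


def build_reconfiguration(rca_type, inventory):
    """Return the populated configuration text for an alarm, or None
    when the knowledge base holds no template for this RCA type."""
    template = KNOWLEDGE_BASE.get(rca_type)
    if template is None:
        return None
    # substitute() raises KeyError if the inventory data lacks a field
    # the template needs, so an incomplete command is never produced.
    return template.substitute(inventory)
```

For example, `build_reconfiguration("LINK_DOWN", {"port": "1/3"})` yields the text `restart-port 1/3`, while an unknown RCA type yields `None`.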

This way, a fully automated repair could be run without going through an end-to-end incident management process. An incident process can and should still be triggered, because these configuration activities will not complete instantly and customers may already have experienced service degradation or outages. However, the first task in the incident flow could be to check whether the alarm is applicable for the automated self-healing process. If the self-healing process does not apply to the scenario at hand, the incident flow can continue on its way. Again, each configuration task performed over the configuration management platform will continue to feed the self-healing system with new profiles. The more data the system holds, the better the results of the template matching algorithm should be.
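The flow described above could be wired together roughly as below. This is a sketch, not a definitive design: the function names, the dictionary-based alarm format and the callbacks `apply_config` and `manual_flow` are all assumptions introduced for illustration.

```python
def handle_incident(alarm, templates, apply_config, manual_flow):
    """First task of the incident flow: try self-healing, and fall
    through to the normal incident flow when no template applies."""
    template = templates.get(alarm["rca_type"])
    if template is None:
        manual_flow(alarm)
        return "manual"
    # Populate the template from the alarm's own fields and apply it.
    apply_config(alarm["node_id"], template.format(**alarm))
    return "self-healed"


def learn_from_manual_fix(templates, rca_type, template):
    """Feed a manually applied fix back into the system as a new
    profile, keeping any template already stored for that RCA type."""
    templates.setdefault(rca_type, template)
```

An alarm that falls through to the manual flow once can be handled automatically the next time, provided the engineer's fix has been recorded with `learn_from_manual_fix`.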