Aug 15 2016
 

Today’s topic is network sweeping and how it can be optimized. As you may know from the previous topics, sweeping means searching a subnet by attempting to connect to every possible IP address in it. Usually, the initial protocol is ICMP due to its low overhead (in that case, the sweep is called a ping sweep). SNMP and even HTTP interfaces are also used as sweep protocols.

Sweeping is used in different domains, such as:

  • Security
  • Inventory Management
  • Performance Management
  • Configuration Management

Sweeping can be time and resource consuming (on both the sender and the receiver side). That’s why, for most enterprise customers, it is normally done only once a day.

For large networks, a sweep may take hours to complete. Consider the scenario of sweeping a class C IP subnet (a /24, with up to 254 usable host addresses). Also, suppose that only 10 devices exist in that subnet. I will assume ICMP is used for discovery, that is, the simple ping request, and that I need to send at least 2 ICMP packets to be sure whether there is a device there. (50% packet loss still means the remote side is up.)

For the reachable devices, the round-trip ping time should not exceed 5 ms. Considering we have 2 ICMP packets, it would be 10 ms per check. We have 10 devices, so it would take around 100 ms, which is well below 1 second. That’s great performance if you just consider pinging the “up” devices. But what about the remaining 244 down ones?

The ICMP timeout kicks in when dealing with dead devices or vacant IP addresses. It is the duration, in milliseconds, that the ping software will wait for an ICMP echo reply packet to arrive. If the packet does not arrive within that period, the target is reported as “down”. The default timeout for ICMP in Cisco routers is 2 seconds. So, using the defaults, with a 2-second timeout and 2 packets per test, you will have to wait 4 seconds per test. If we do the math, the total wait time for the 244 vacant addresses in the class C subnet at hand would be 976 seconds, roughly 16 minutes. Organizations that rely on sweeping normally have much bigger subnets with thousands of possible IP addresses. The sweeping process would take hours in such networks.
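The arithmetic above can be written down as a small estimator. This is only a sketch of the math in this post; the function name and its parameters are my own, and the defaults are the figures used above (2 packets, 5 ms round trip, 2-second timeout).

```python
def sequential_sweep_seconds(total_hosts, live_hosts, packets_per_host=2,
                             rtt_s=0.005, timeout_s=2.0):
    """Estimate how long a one-address-at-a-time ping sweep takes."""
    dead_hosts = total_hosts - live_hosts
    # Live hosts answer within one round trip per packet.
    live_time = live_hosts * packets_per_host * rtt_s
    # Vacant addresses cost a full timeout per packet.
    dead_time = dead_hosts * packets_per_host * timeout_s
    return live_time + dead_time


# The class C example: 254 addresses, 10 live devices, default 2 s timeout.
seconds = sequential_sweep_seconds(total_hosts=254, live_hosts=10)
print(f"{seconds:.0f} s, about {seconds / 60:.0f} minutes")
```

Almost all of the total comes from the vacant addresses, which is why the tips that follow focus on the timeout side.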

Luckily, we can tweak this process so it will take less time.

1: Use of Parallel Measurements

This is the first thing we need to do: open multiple threads of ICMP operation at the same time. How about opening up 1000 threads? The sweep will be finished in 4 seconds. Isn’t it great? Not really, it has some consequences.

  • Increased LAN traffic: Sending 1000 ICMP packets in the same second will generate lots of traffic in your LAN/WAN (around 70 bytes per packet * 1000 threads = 70,000 bytes/sec = 560,000 bits/sec = 560 Kbps of one-way traffic). Considering there would be replies to these requests, the total bandwidth consumption can easily reach 1 Mbps.
  • CPU Cycles: Each thread will consume CPU and memory resources. The source machine should be able to cope with this.
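The traffic estimate in the first bullet is easy to redo for other thread counts. A minimal sketch, assuming the same rough 70 bytes per packet; the function name is mine.

```python
def burst_bandwidth_bps(packets_per_second, packet_bytes=70):
    """One-way bandwidth, in bits per second, of a burst of probe packets."""
    return packets_per_second * packet_bytes * 8


one_way = burst_bandwidth_bps(1000)   # 1000 threads, one packet each per second
# Upper bound if every single probe were answered with a same-sized reply.
both_ways = 2 * one_way
print(one_way, both_ways)
```

In a mostly vacant subnet few probes are answered, so the real figure sits between the two numbers.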

This is just the sweeping part of it. In real-world scenarios, no inventory or security tool will stop after it has discovered a live IP address. It will go ahead and try to fetch more information. So both of these costs can grow quickly if you open up too many threads.
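One way to get the benefit of parallel probing while keeping traffic and CPU under control is a bounded worker pool. The sketch below is an assumption of mine, not how any particular tool does it. The `probe` argument is a placeholder for whatever check you use (ICMP, SNMP, HTTP); the stand-in used here sends no packets at all.

```python
from concurrent.futures import ThreadPoolExecutor
import ipaddress


def sweep(network, probe, max_workers=64):
    """Probe every host address in `network` using at most `max_workers` threads.

    `probe` is any callable that takes an IP string and returns True
    when the host answered.
    """
    hosts = [str(ip) for ip in ipaddress.ip_network(network).hosts()]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(probe, hosts))
    return [ip for ip, up in zip(hosts, results) if up]


# Stand-in probe for illustration only: pretend two addresses are alive.
LIVE = {"10.1.1.1", "10.1.1.20"}
print(sweep("10.1.1.0/24", lambda ip: ip in LIVE))
```

The `max_workers` value is the knob to tune: a few dozen workers already cut the sweep time dramatically without the burst of 1000 simultaneous packets.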

2: Optimize your ICMP Packet Timeout

I mentioned that the default ICMP timeout is 2 seconds. Luckily, this is configurable. Go ahead and send some pings to those destination IP addresses. For the “live” ones, capture the round-trip time. This is the network delay (plus the processing delay of the remote NIC). That delay will not change much on LAN links and may change slightly on WAN links. Baseline this. So if it is 100 msec, you can easily set a timeout of 300 msec. This is 3 times the baseline but still well below the 2-second default.
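The baseline rule can be sketched as follows. Using the median of the samples as the baseline is my own assumption (the post only says to baseline the round-trip time); the multiplier of 3 is the one from the example above.

```python
import statistics


def tuned_timeout_ms(rtt_samples_ms, multiplier=3):
    """Derive a ping timeout from round-trip times measured on live hosts."""
    baseline = statistics.median(rtt_samples_ms)
    return baseline * multiplier


# Hypothetical samples, in milliseconds, around the 100 msec example.
print(tuned_timeout_ms([95, 100, 110]))
```

With a 300 msec timeout, the 244 vacant addresses of the class C example cost 244 × 2 × 0.3 s, about 146 seconds, instead of 976 seconds.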

Keep in mind that ICMP is one of the protocols with the lowest overhead. Layer 7 protocols like SNMP and HTTP have much more overhead, so the suggestions above may bring even greater value for them.

Long sweep times can also result in inconsistencies between sweep periods. Suppose you started sweeping a /24 at 10.1.1.1 and found that 10.1.1.1 is vacant. You continue your sweep and 10 seconds later 10.1.1.1 comes up. If you sweep every day, your inventory (and other dependent OSS systems) will not know about this device until the next day (if you don’t have a change process in place for this device). That’s why there should be a mechanism to listen for new IP address activity during the sweep. DHCP logs could be a good alternative for networks that use DHCP for IP addressing. A costlier solution could be listening for Syslog events or switch SPAN ports.
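To illustrate the idea, here is a minimal sketch that compares what the sweep recorded with activity seen afterwards. The data shapes are hypothetical: I assume the sweep keeps a probe time and result per address, and that some other source (DHCP logs, Syslog) has already been parsed into `(timestamp, ip)` pairs.

```python
def missed_hosts(probe_results, activity_events):
    """Return addresses that showed activity after being probed as vacant.

    probe_results:   dict of ip -> (probe_timestamp, was_up)
    activity_events: iterable of (timestamp, ip) from another source
    """
    late = set()
    for seen_at, ip in activity_events:
        probed = probe_results.get(ip)
        if probed is None:
            continue  # address was not part of this sweep
        probed_at, was_up = probed
        if not was_up and seen_at > probed_at:
            late.add(ip)
    return sorted(late)


# 10.1.1.1 was probed as vacant at t=0 and became active 10 seconds later.
probes = {"10.1.1.1": (0.0, False), "10.1.1.2": (1.0, True)}
events = [(10.0, "10.1.1.1"), (12.0, "10.1.1.2")]
print(missed_hosts(probes, events))
```

The addresses returned are the ones worth re-probing right away instead of waiting for the next daily sweep.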

SIEM

Nov 30 2013
 

SIEM stands for “Security Information and Event Management” and it is a well-known OSS system in the security world. It is not very visible to other domains because it has been used mainly for internal purposes.

Today, I am going to talk about what SIEM is and elaborate on its possible uses.

Every system (servers, whether VM, hypervisor or physical, routers, switches, access points, firewalls, IPSs) produces logs. These logs can be huge. To process all these logs we should talk in big data terms, but this is not today’s topic. So let’s narrow the scope to the access log level.

The access logs on a system (login/logout requests, password change requests, firewall-dropped access attempts) should be collected for security purposes. Who connected where? Who got an exception in the login process? Who sweeps the IP addresses in the network? Who was dropped on the firewall/IPS? (The “who” portion can be the real identity of the user, via integration with AD/LDAP.)

The SIEM system’s first goal is to store these logs in the most effective way. Since the log data can be high in terms of volume and velocity, the archiving system should be a specialized one and utilize fast disks and/or in-memory databases.

After the log collection, the SIEM’s second goal can be achieved: Log correlation.

In the log correlation phase, the SIEM system correlates logs from multiple sources and combines them under a single incident. The correlation can be rule based or topology based. For example, the SIEM system can take a connect event and look up the destination IP address in a blacklist database (a C&C control center DB, etc.). If there is a match, an alarm is created in the system that can be forwarded to TT or FM systems. Or it can directly generate an alarm in the case of a login failure on a critical system. These are good candidates for rule-based correlation.
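The two rule-based examples can be sketched in a few lines. This is not how any specific SIEM product expresses rules; the event fields, the blacklist entry and the critical system name are all hypothetical and only there to show the shape of a stateless rule.

```python
BLACKLIST = {"203.0.113.7"}          # hypothetical C&C addresses
CRITICAL_SYSTEMS = {"billing-db"}    # hypothetical critical hosts


def correlate(event):
    """Return an alarm dict for a matching event, or None."""
    # Rule 1: connection towards a blacklisted destination.
    if event.get("type") == "connect" and event.get("dst_ip") in BLACKLIST:
        return {"severity": "major", "reason": "blacklisted destination",
                "source": event.get("src_ip")}
    # Rule 2: any login failure on a critical system.
    if (event.get("type") == "login_failure"
            and event.get("host") in CRITICAL_SYSTEMS):
        return {"severity": "critical",
                "reason": "login failure on critical system",
                "source": event.get("host")}
    return None


print(correlate({"type": "connect", "src_ip": "10.1.1.5",
                 "dst_ip": "203.0.113.7"}))
```

Each rule looks at a single event in isolation, which is what makes it rule based rather than stateful.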

A topology-based example could be: if 3 login-failed alarms are received from the same system, assign a score to this server. If the same system had a firewall block on the same day, adjust the score further. If more than 2 of the servers in a specific server farm have dropped to low scores, then generate an alarm. This is a simple example of a stateful pattern that can be identified by SIEM.
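A minimal sketch of this stateful pattern is below. The numbers are assumptions of mine, since the example gives no values: every server starts at 100, loses 40 points on its third login failure and 30 on a firewall block, and counts as “low” below 50. The state is meant to cover a single day and be reset afterwards.

```python
from collections import defaultdict


class FarmScorer:
    """Keeps per-server state for one day of events (reset it daily)."""

    def __init__(self, farm_of, low_threshold=50):
        self.farm_of = farm_of                    # server name -> farm name
        self.low_threshold = low_threshold        # assumed cut-off for "low"
        self.login_failures = defaultdict(int)
        self.score = {server: 100 for server in farm_of}  # assumed start value

    def on_event(self, server, kind):
        """Apply one event; return an alarm string or None."""
        if kind == "login_failure":
            self.login_failures[server] += 1
            if self.login_failures[server] == 3:
                self.score[server] -= 40          # assumed penalty
        elif kind == "firewall_block":
            self.score[server] -= 30              # assumed penalty
        # Topology part: look at all servers in the same farm.
        farm = self.farm_of[server]
        low = [s for s, f in self.farm_of.items()
               if f == farm and self.score[s] < self.low_threshold]
        if len(low) > 2:
            return f"ALARM: {len(low)} low-score servers in farm {farm}"
        return None
```

Feeding it three login failures and one firewall block for each of three servers in the same farm raises the alarm on the last event only, because that is when the third server drops below the threshold.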

(By the way, some operators have no desire to generate security-related alarms. The main driver for a SIEM could also be reducing the logs to a humanly manageable level for further review by the security staff.)

The possibilities are limitless, but managing and maintaining this mechanism could be a burden for the security department.

I see 2 problems with SIEM investments. The first one is maintenance. Security personnel are generally not familiar with OSS systems and their integrations. They can provide the rules but will not be able (or have time) to implement them on the SIEM. So, they will rely on the OSS department (if any), which will not know anything about security. The miscommunication may lead to problems and underutilization of the investment. A solution to this problem could be outsourcing the implementation to the vendor. However, vendor solutions are generally “empty” in terms of pre-defined rules, so each rule has to be built from scratch. The cost of the implementation could grow dramatically.

The second problem is overlapping functions. For example, say your goal is to be notified when a system tries to connect to a well-known bot center. This requirement can be met by a SIEM but also by other, “cheaper” security components. Or, if you have an SQM, why not consider using it if your topology-based correlation requirements are modest?

When investing in a SIEM, you should consider whether you would be able to fully utilize the system, as this OSS component is generally not a cheap one.

OSS and Security

May 06 2013
 

Today, I’d like to talk a little bit about security and its implications for our OSS systems. As OSS are seen as mostly “internal” to our organization, most of the time an OSS system is not security hardened before going into production. We open up the SNMP ports 161 and 162 in our firewalls (if any) between the devices and the management systems, and we open up HTTP (80) or HTTPS (443) between our OSS systems for different kinds of API access.

In the OSS/J article, I mentioned the different kinds of information that can be fetched from an OSS/J-enabled platform. Think of an inventory system, for example. If I have the correct credentials, I can get the whole network inventory from this system, including the customer information. Or I can trigger a service activation within the service activation platform without notifying the order management platform, bypassing the billing system.

As you can see, access to OSS platforms and their APIs carries the risk of exposing your intellectual assets to the outside world and also allows internal fraud to occur.

If malware running on an admin PC has some kind of access to these sensitive APIs, it can easily transfer this information to the Internet (to the bot manager) to be used in further attacks or information trading.

The communication between the device and the EMS is also important. Most of the time, this communication is done within the management network, which is also reused by administrator PCs for telnet/ssh to the devices. The only protection on the end-device side is password protection. The passwords should be complex enough, combining letters, numbers and special characters. Password selection is usually done based on best practices for the telnet/ssh side; however, this is not the case for SNMP. Most of the time we will face “easy to guess” SNMP community strings that can be cracked in a brute-force attack. We also face SNMP SET enabled on devices where there is no reason for it, creating another serious vulnerability.
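A quick audit of community strings (or passwords) against these criteria could look like the sketch below. The minimum length of 12 is my own assumption, and the list of defaults only contains the two well-known ones, “public” and “private”; a real audit would use a much longer dictionary.

```python
import string

DEFAULT_COMMUNITIES = {"public", "private"}  # well-known SNMP defaults


def is_weak_secret(secret, min_length=12):
    """Return True when a password or community string is easy to guess."""
    if secret.lower() in DEFAULT_COMMUNITIES:
        return True
    if len(secret) < min_length:        # min_length is an assumed policy value
        return True
    has_letter = any(c.isalpha() for c in secret)
    has_digit = any(c.isdigit() for c in secret)
    has_special = any(c in string.punctuation for c in secret)
    return not (has_letter and has_digit and has_special)


for candidate in ["public", "n3t-Mgmt!r0-2016"]:
    print(candidate, is_weak_secret(candidate))
```

Running a check like this over the device configurations you already collect is a cheap way to find the “easy to guess” strings before someone else does.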

Another thing to consider is the management networks. Especially in big organizations, management activities are outsourced to different 3PP entities. The admin PCs in these entities connect via VPN to the management network to set up and manage end devices. Since these PCs are not subject to the company’s end-user security policy rules, this could be a possible backdoor for bots or hackers to the internal information.

We should always keep in mind the security implications of changes in our OSS infrastructure. We should apply security policies for accessing these systems and run scheduled security scans for new or possible vulnerabilities. We should protect our management networks, especially if they are shared by multiple companies. The management networks should be logically segmented by Service Activation/Resource Provisioning tools. The access logs should always be collected and correlated in a central location and reviewed by security personnel.

OSS security is becoming more important as we utilize more “open” interfaces for management and reporting. Since they are “open”, everybody, including hackers, will know how to reach the information. As long as we apply the necessary security controls, we can continue enjoying the interoperability and flexibility delivered by standard OSS interfaces.