Monday, March 21, 2005

Honeypots

Definitions and Value of Honeypots
Lance Spitzner
http://www.tracking-hackers.com
Last Modified: 29 May, 2003

Honeypots are an exciting new technology with enormous potential for the security community. The concepts were first introduced by several icons in computer security, specifically Cliff Stoll in the book "The Cuckoo's Egg" and Bill Cheswick in the paper "An Evening with Berferd." Since then, honeypots have continued to evolve, developing into the powerful security tools they are today. The purpose of this paper is to explain exactly what honeypots are, their advantages and disadvantages, and their value to the security community.

Definitions
The first step to understanding honeypots is defining what a honeypot is. This can be harder than it sounds. Unlike firewalls or Intrusion Detection Systems, honeypots do not solve a specific problem. Instead, they are a highly flexible tool that comes in many shapes and sizes. They can do everything from detecting encrypted attacks in IPv6 networks to capturing the latest in on-line credit card fraud. It is this flexibility that gives honeypots their true power. It is also this flexibility that can make them challenging to define and understand. As such, I use the following definition of a honeypot.

A honeypot is an information system resource whose value lies in unauthorized or illicit use of that resource. This is a general definition covering all the different manifestations of honeypots. In this paper we will discuss different examples of honeypots and their value to security. All of them fall under the definition above: their value lies in the bad guys interacting with them. Conceptually, almost all honeypots work the same way. They are a resource that has no authorized activity and no production value. Theoretically, a honeypot should see no traffic because it has no legitimate activity. This means any interaction with a honeypot is most likely unauthorized or malicious activity. Any connection attempt to a honeypot is most likely a probe, attack, or compromise. While this concept sounds very simple (and it is), it is this very simplicity that gives honeypots their tremendous advantages (and disadvantages). I highlight these below.
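Before turning to those trade-offs, the definition can be made concrete with a short Python sketch. It is only an illustration, not any particular product: the names `record_connection` and `serve`, the log format, and the choice of a plain TCP listener are all assumptions made for this example. The point it shows is that the listener offers no real service, so every connection it records is suspect by construction.

```python
import datetime
import socket


def record_connection(conn, peer, log):
    """Log one connection to the honeypot and capture what the peer sends first.

    `conn` is a connected socket, `peer` identifies the remote side, and `log`
    is a plain list standing in for whatever storage a real deployment uses.
    """
    conn.settimeout(2.0)
    try:
        data = conn.recv(1024)      # capture the first bytes; never reply
    except OSError:                 # timeout or reset: still worth logging
        data = b""
    finally:
        conn.close()
    entry = {
        "time": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "peer": peer,
        "data": data,
    }
    log.append(entry)
    return entry


def serve(port, log):
    """Accept connections on `port` forever; no legitimate client should exist."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        listener.bind(("0.0.0.0", port))
        listener.listen()
        while True:
            conn, peer = listener.accept()
            record_connection(conn, peer, log)
```

Calling `serve(2121, [])` on an otherwise unused host would block and record each connection attempt to that (arbitrarily chosen) port. There is no filtering step, because there is no authorized traffic to filter out.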

Advantages: Honeypots are a tremendously simple concept, which gives them some very powerful strengths.

Small data sets of high value: Honeypots collect small amounts of information. Instead of logging one GB of data a day, they can log only one MB of data a day. Instead of generating 10,000 alerts a day, they can generate only 10 alerts a day. Remember, honeypots only capture bad activity; any interaction with a honeypot is most likely unauthorized or malicious activity. As such, honeypots reduce 'noise' by collecting only small data sets, but information of high value, as it comes only from the bad guys. This means it is much easier (and cheaper) to analyze the data a honeypot collects and derive value from it.
New tools and tactics: Honeypots are designed to capture anything thrown at them, including tools or tactics never seen before.
Minimal resources: Honeypots require minimal resources because they only capture bad activity. This means an old Pentium computer with 128MB of RAM can easily handle an entire class B network sitting off an OC-12 network.
Encryption or IPv6: Unlike most security technologies (such as IDS sensors), honeypots work fine in encrypted or IPv6 environments. It does not matter what the bad guys throw at a honeypot; the honeypot will detect and capture it.
Information: Honeypots can collect in-depth information that few, if any, other technologies can match.
Simplicity: Finally, honeypots are conceptually very simple. There are no fancy algorithms to develop, state tables to maintain, or signatures to update. The simpler a technology, the less likely there will be mistakes or misconfigurations.
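The "small data sets" advantage can be illustrated with a minimal Python sketch. The `Flow` record, the function names, and the sample addresses (taken from the documentation address ranges) are invented for this example and do not come from any real sensor. The sketch assumes we already know which addresses belong to honeypots, and shows that the honeypot's view is simply the subset of traffic destined for them.

```python
from collections import namedtuple

# Hypothetical flow record: source, destination, destination port, byte count.
Flow = namedtuple("Flow", "src dst dport nbytes")


def honeypot_hits(flows, honeypot_addrs):
    """Keep only the flows whose destination is a honeypot address."""
    return [f for f in flows if f.dst in honeypot_addrs]


def summarize(flows):
    """Report how much data a set of flows represents."""
    return {"flows": len(flows), "bytes": sum(f.nbytes for f in flows)}


# Made-up sample traffic: mostly production, two probes of the honeypot.
HONEYPOTS = {"192.0.2.200"}
SAMPLE = [
    Flow("198.51.100.10", "192.0.2.10", 80, 52000),
    Flow("198.51.100.11", "192.0.2.10", 443, 130000),
    Flow("203.0.113.66", "192.0.2.200", 21, 340),
    Flow("198.51.100.12", "192.0.2.25", 25, 8000),
    Flow("203.0.113.66", "192.0.2.200", 445, 120),
]

everything = summarize(SAMPLE)
suspicious = summarize(honeypot_hits(SAMPLE, HONEYPOTS))
```

Here `suspicious` covers two flows out of five and a tiny fraction of the bytes, and both flows are worth an analyst's attention. The same sketch also hints at the limited view discussed next: an attack aimed only at `192.0.2.10` never appears in the honeypot's data.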

Disadvantages: Like any technology, honeypots also have their weaknesses. Because of this, they do not replace any current technology, but work with existing technologies.

Limited view: Honeypots can only track and capture activity that directly interacts with them. Honeypots will not capture attacks against other systems, unless the attacker or threat interacts with the honeypots also.
Risk: All security technologies have risk. Firewalls have the risk of being penetrated, encryption has the risk of being broken, and IDS sensors have the risk of failing to detect attacks. Honeypots are no different; they have risk also. Specifically, honeypots have the risk of being taken over by the bad guy and being used to harm other systems. This risk varies for different honeypots. Depending on the type of honeypot, it can have no more risk than an IDS sensor, while some honeypots have a great deal of risk. We identify which honeypots have what levels of risk later in the paper.

It is how you leverage these advantages and disadvantages that defines the value of your honeypot (which we discuss later).

Types of Honeypots
Honeypots come in many shapes and sizes, making them difficult to get a grasp of. To help us better understand honeypots and all the different types, we break them down into two general categories, low-interaction and high-interaction honeypots. These categories help us understand what type of honeypot you are dealing with, its strengths, and its weaknesses. Interaction defines the level of activity a honeypot allows an attacker. Low-interaction honeypots have limited interaction; they normally work by emulating services and operating systems. Attacker activity is limited to the level of emulation by the honeypot. For example, an emulated FTP service listening on port 21 may just emulate an FTP login, or it may support a variety of additional FTP commands. The advantage of low-interaction honeypots is their simplicity. These honeypots tend to be easier to deploy and maintain, with minimal risk. Usually they involve installing software, selecting the operating systems and services you want to emulate and monitor, and letting the honeypot go from there. This plug and play approach makes deploying them very easy for most organizations. Also, the emulated services mitigate risk by containing the attacker's activity; the attacker never has access to an operating system to attack or harm others. The main disadvantages of low-interaction honeypots are that they log only limited information and are designed to capture known activity. The emulated services can only do so much. Also, it is easier for an attacker to detect a low-interaction honeypot; no matter how good the emulation is, a skilled attacker can eventually detect its presence. Examples of low-interaction honeypots include Specter, Honeyd, and KFSensor.

High-interaction honeypots are different; they are usually complex solutions as they involve real operating systems and applications. Nothing is emulated; we give attackers the real thing. If you want a Linux honeypot running an FTP server, you build a real Linux system running a real FTP server. The advantages of such a solution are twofold. First, you can capture extensive amounts of information. By giving attackers real systems to interact with, you can learn the full extent of their behavior, everything from new rootkits to international IRC sessions. The second advantage is that high-interaction honeypots make no assumptions about how an attacker will behave. Instead, they provide an open environment that captures all activity. This allows high-interaction solutions to learn behavior we would not expect. An excellent example of this is how a Honeynet captured encoded back door commands on a non-standard IP protocol (specifically IP protocol 11, Network Voice Protocol). However, this also increases the risk of the honeypot, as attackers can use these real operating systems to attack non-honeypot systems. As a result, additional technologies have to be implemented that prevent the attacker from harming other non-honeypot systems. In general, high-interaction honeypots can do everything low-interaction honeypots can do and much more. However, they can be more complex to deploy and maintain. Examples of high-interaction honeypots include Symantec Decoy Server and Honeynets. You can find a complete listing of both low and high interaction honeypots at the Honeypot Solutions page. To better understand both low and high interaction honeypots, let's look at two examples. We will start with the low-interaction honeypot Honeyd.

Honeyd: Low-interaction honeypot
Honeyd is a low-interaction honeypot. Developed by Niels Provos, Honeyd is open source and designed to run primarily on Unix systems (though it has been ported to Windows). Honeyd works on the concept of monitoring unused IP space. Any time it sees a connection attempt to an unused IP, it intercepts the connection and then interacts with the attacker, pretending to be the victim. By default, Honeyd detects and logs any connection to any UDP or TCP port. In addition, you can configure emulated services to monitor specific ports, such as an emulated FTP server monitoring TCP port 21. When an attacker connects to the emulated service, not only does the honeypot detect and log the activity, but it captures all of the attacker's interaction with the emulated service. In the case of the emulated FTP server, we can potentially capture the attacker's login and password, the commands they issue, and perhaps even learn what they are looking for or their identity. It all depends on the level of emulation by the honeypot. Most emulated services work the same way. They expect a specific type of behavior, and then are programmed to react in a predetermined way. If attack A does this, then react this way. If attack B does that, then respond that way. The limitation is that if the attacker does something the emulation does not expect, then it does not know how to respond. Most low-interaction honeypots, including Honeyd, simply generate an error message. You can see what commands the emulated FTP server for Honeyd supports by reviewing the source code.
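The "expect a behavior, react in a predetermined way" pattern can be sketched in a few lines of Python. This is not Honeyd's FTP emulation script; the class name `EmulatedFtp`, the reply texts, and the set of supported commands are assumptions chosen for illustration. Only the three-digit reply codes follow the FTP standard. The sketch captures every line the client sends, never grants a login, and falls back to a generic error for anything it does not recognize.

```python
class EmulatedFtp:
    """A scripted FTP dialogue that logs everything and grants nothing."""

    BANNER = "220 FTP server ready."

    def __init__(self):
        self.captured = []   # every raw command line the client sent
        self.user = None     # last user name offered by the client

    def respond(self, line):
        """Record one client line and return the canned reply for it."""
        self.captured.append(line)
        verb, _, arg = line.strip().partition(" ")
        verb = verb.upper()
        if verb == "USER":
            self.user = arg
            return "331 Password required for {}.".format(arg)
        if verb == "PASS":
            # The password is already in self.captured; always refuse it.
            return "530 Login incorrect."
        if verb == "QUIT":
            return "221 Goodbye."
        # Anything the emulation does not expect gets a generic error.
        return "500 '{}': command not understood.".format(verb)
```

A session object would be created per connection, `BANNER` sent first, and each received line passed to `respond`. The last branch is the limitation described above: an unexpected command yields only an error, so whatever the attacker intended to do with it is never observed.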

Some honeypots, such as Honeyd, can not only emulate services, but emulate actual operating systems. In other words, Honeyd can appear to the attacker to be a Cisco router, WinXP webserver, or Linux DNS server. There are several advantages to emulating different operating systems. First, the honeypot can better blend in with existing networks if the honeypot has the same appearance and behavior as production systems. Second, you can target specific attackers by providing systems and services they often target, or systems and services you want to learn about. There are two elements to emulating operating systems. The first is with the emulated services. When an attacker connects to an emulated service, you can have that service behave like and appear to be a specific OS. For example, if you have a service emulating a webserver, and you want your honeypot to appear to be a Win2000 server, then you would emulate the behavior of an IIS webserver. For Linux, you would emulate the behavior of an Apache webserver. Most honeypots emulate operating systems in this manner. Some sophisticated honeypots take this emulation one step further (as Honeyd does). Not only do they emulate at the service level, but at the IP stack level. If someone uses active fingerprinting measures to determine the OS type of your honeypot, most honeypots respond with the IP stack of whatever OS the honeypot is installed on. Honeyd spoofs the replies, making not only the emulated services but also the emulated IP stacks behave as the operating systems would. The level of emulation and sophistication depends on what honeypot technology you choose to use.
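Service-level OS emulation, the first of the two elements above, can be sketched as a lookup from a chosen personality to the banner a web service presents. The personality names, the `PERSONALITIES` table, and the exact `Server` strings below are illustrative assumptions, not values taken from any honeypot product. IP stack emulation is not shown, since it requires crafting packets below the socket API.

```python
# Hypothetical personalities mapped to the Server header each would present.
PERSONALITIES = {
    "win2000": {"Server": "Microsoft-IIS/5.0"},
    "linux": {"Server": "Apache/1.3.27 (Unix)"},
}


def http_response(personality, body=b""):
    """Build a minimal HTTP response whose headers match the chosen OS."""
    headers = PERSONALITIES[personality]
    lines = ["HTTP/1.1 200 OK"]
    lines += ["{}: {}".format(name, value) for name, value in headers.items()]
    lines.append("Content-Length: {}".format(len(body)))
    lines.append("Connection: close")
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body
```

An attacker who only reads the banner sees an IIS or Apache host depending on the personality selected. An attacker who fingerprints the TCP/IP stack would still see the host operating system unless the honeypot also spoofs at that level, which is the second element described above.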

Honeynets: High-interaction honeypot
Honeynets are a prime example of a high-interaction honeypot. Honeynets are not a product; they are not a software solution that you install on a computer. Instead, Honeynets are an architecture, an entire network of computers designed to be attacked. The idea is to have an architecture that creates a highly controlled network, one where all activity is controlled and captured. Within this network we place our intended victims, real computers running real applications. The bad guys find, attack, and break into these systems on their own initiative. When they do, they do not realize they are within a Honeynet. All of their activity, from encrypted SSH sessions to emails and file uploads, is captured without them knowing it. This is done by inserting kernel modules on the victim systems that capture all of the attacker's actions. At the same time, the Honeynet controls the attacker's activity. Honeynets do this using a Honeywall gateway. This gateway allows inbound traffic to the victim systems, but controls the outbound traffic using intrusion prevention technologies. This gives the attacker the flexibility to interact with the victim systems, but prevents the attacker from harming other non-Honeynet computers. An example of such a deployment can be seen in Figure 1.
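The asymmetry of the gateway (inbound allowed, outbound controlled) can be sketched as a small policy object. The paper says only that outbound traffic is controlled "using intrusion prevention technologies"; the connection-counting rule below is an assumption picked as one simple form of such control, and the class name `OutboundLimiter` and its parameters are hypothetical.

```python
import time


class OutboundLimiter:
    """Allow all inbound traffic; cap outbound connections per victim host."""

    def __init__(self, max_connections, window_seconds, clock=time.monotonic):
        self.max_connections = max_connections
        self.window_seconds = window_seconds
        self.clock = clock      # injectable so the policy can be tested
        self.history = {}       # victim address -> times of allowed connections

    def allow_inbound(self, src, dst):
        """Attackers must be able to reach the victims, so always allow."""
        return True

    def allow_outbound(self, victim):
        """Allow a new outbound connection only while under the cap."""
        now = self.clock()
        recent = [t for t in self.history.get(victim, [])
                  if now - t < self.window_seconds]
        allowed = len(recent) < self.max_connections
        if allowed:
            recent.append(now)
        self.history[victim] = recent
        return allowed
```

With, say, `OutboundLimiter(max_connections=10, window_seconds=3600)`, a compromised victim could still fetch a toolkit or join an IRC channel, which keeps the attacker engaged, but could not launch a scan of thousands of outside hosts.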

Value of Honeypots
Now that we have an understanding of the two general categories of honeypots, we can focus on their value, specifically how we can use honeypots. Once again, we have two general categories: honeypots can be used for production purposes or research. When used for production purposes, honeypots are protecting an organization. This would include preventing, detecting, or helping organizations respond to an attack. When used for research purposes, honeypots are being used to collect information. This information has different value to different organizations. Some may want to study trends in attacker activity, while others are interested in early warning and prediction, or law enforcement. In general, low-interaction honeypots are often used for production purposes, while high-interaction honeypots are used for research purposes. However, either type of honeypot can be used for either purpose. When used for production purposes, honeypots can protect organizations in one of three ways: prevention, detection, and response. We will take a more in-depth look at how a honeypot can work in all three.

Honeypots can help prevent attacks in several ways. The first is against automated attacks, such as worms or auto-rooters. These attacks are based on tools that randomly scan entire networks looking for vulnerable systems. If vulnerable systems are found, these automated tools will then attack and take over the system (with worms self-replicating, copying themselves to the victim). One way that honeypots can help defend against such attacks is by slowing their scanning down, potentially even stopping them. Called sticky honeypots, these solutions monitor unused IP space. When probed by such scanning activity, these honeypots interact with and slow the attacker down. They do this using a variety of TCP tricks, such as a TCP window size of zero, putting the attacker into a holding pattern. This is excellent for slowing down or preventing the spread of a worm that has penetrated your internal organization. One such example of a sticky honeypot is LaBrea Tarpit. Sticky honeypots are most often low-interaction solutions (you can almost call them 'no-interaction solutions', as they slow the attacker down to a crawl :). Honeypots can also protect your organization from human attackers. The concept is deception or deterrence. The idea is to confuse an attacker, to make him waste his time and resources interacting with honeypots. Meanwhile, your organization has detected the attacker's activity and has the time to respond and stop the attacker. This can even be taken one step further. If an attacker knows your organization is using honeypots, but does not know which systems are honeypots and which systems are legitimate computers, they may be concerned about being caught by honeypots and decide not to attack your organization. Thus the honeypot deters the attacker. An example of a honeypot designed to do this is Deception Toolkit, a low-interaction honeypot.
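To show what "a window size of zero" means on the wire, the following Python sketch builds the 20-byte TCP header of a SYN/ACK that advertises a zero receive window. It is an illustration of the header field only, not LaBrea's implementation: the function name is hypothetical, the checksum is left as a placeholder, and whether a real tarpit advertises the zero window in the SYN/ACK or in a later ACK is an implementation detail not covered here.

```python
import struct

SYN = 0x02
ACK = 0x10


def zero_window_synack(src_port, dst_port, client_seq, server_seq):
    """Build a TCP header that accepts a SYN but advertises a window of 0.

    The checksum field is left at zero; a real sender must compute it over
    the TCP pseudo-header before the segment goes on the wire.
    """
    data_offset = 5 << 4                 # header is 5 32-bit words, no options
    return struct.pack(
        "!HHIIBBHHH",
        src_port,                        # the honeypot's port
        dst_port,                        # the scanner's port
        server_seq,                      # our initial sequence number
        (client_seq + 1) & 0xFFFFFFFF,   # acknowledge the scanner's SYN
        data_offset,
        SYN | ACK,
        0,                               # window: peer may send no data
        0,                               # checksum placeholder
        0,                               # urgent pointer
    )
```

A peer that receives such a segment considers the connection established but is not permitted to send any data, so its TCP stack waits and periodically probes the window. The scanning tool is held in exactly the "holding pattern" described above for as long as the honeypot keeps the window closed.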

The second way honeypots can help protect an organization is through detection. Detection is critical; its purpose is to identify a failure or breakdown in prevention. Regardless of how secure an organization is, there will always be failures, if for no other reason than that humans are involved in the process. By detecting an attacker, you can quickly react to them, stopping or mitigating the damage they do. Traditionally, detection has proven extremely difficult to do. Technologies such as IDS sensors and system logs have proven ineffective for several reasons. They generate far too much data and a large percentage of false positives, cannot detect new attacks, and cannot work in encrypted or IPv6 environments. Honeypots excel at detection, addressing many of these problems of traditional detection. Honeypots reduce false positives by capturing small data sets of high value, capture unknown attacks such as new exploits or polymorphic shellcode, and work in encrypted and IPv6 environments. You can learn more about this in the paper Honeypots: Simple, Cost Effective Detection. In general, low-interaction honeypots make the best solutions for detection. They are easier to deploy and maintain than high-interaction honeypots and have reduced risk.

The third and final way a honeypot can help protect an organization is in response. Once an organization has detected a failure, how do they respond? This can often be one of the greatest challenges an organization faces. There is often little information on who the attacker is, how they got in, or how much damage they have done. In these situations detailed information on the attacker's activity is critical. There are two problems compounding incident response. First, often the very systems compromised cannot be taken offline to analyze. Production systems, such as an organization's mail server, are so critical that even though the system has been hacked, security professionals may not be able to take it down and do a proper forensic analysis. Instead, they are limited to analyzing the live system while it is still providing production services. This cripples the ability to analyze what happened, how much damage the attacker has done, and even whether the attacker has broken into other systems. The other problem is that even if the system is pulled offline, there is so much data pollution it can be very difficult to determine what the bad guy did. By data pollution, I mean there has been so much activity (users logging in, mail accounts read, files written to databases, etc.) that it can be difficult to determine what is normal day-to-day activity and what is the attacker. Honeypots can help address both problems. Honeypots make an excellent incident response tool, as they can quickly and easily be taken offline for a full forensic analysis, without impacting day-to-day business operations. Also, the only activity a honeypot captures is unauthorized or malicious activity. This makes hacked honeypots much easier to analyze than hacked production systems, as any data you retrieve from a honeypot is most likely related to the attacker. The value honeypots provide here is quickly giving organizations the in-depth information they need to rapidly and effectively respond to an incident.
In general, high-interaction honeypots make the best solution for response. To respond to an intruder, you need in-depth knowledge on what they did, how they broke in, and the tools they used. For that type of data you most likely need the capabilities of a high-interaction honeypot.

Up to this point we have been talking about how honeypots can be used to protect an organization. We will now talk about a different use for honeypots, research. Honeypots are extremely powerful; not only can they be used to protect your organization, but they can be used to gain extensive information on threats, information few other technologies are capable of gathering. One of the greatest problems security professionals face is a lack of information or intelligence on cyber threats. How can we defend against an enemy when we don't even know who that enemy is? For centuries military organizations have depended on information to better understand who their enemy is and how to defend against them. Why should information security be any different? Research honeypots address this by collecting information on threats. This information can then be used for a variety of purposes, including trend analysis, identifying new tools or methods, identifying attackers and their communities, early warning and prediction, or understanding motivations. One of the most well known examples of using honeypots for research is the work done by the Honeynet Project, an all-volunteer, non-profit security research organization. All of the data they collect comes from Honeynets distributed around the world. As threats are constantly changing, this information is proving more and more critical.
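As a small illustration of the trend analysis mentioned above, the Python sketch below counts captured connection attempts by day and destination port. The event format (dictionaries with `day` and `dport` keys), the function names, and the sample events are invented for this example and are not data from the Honeynet Project or any other source.

```python
from collections import Counter


def port_trends(events):
    """Count captured connection attempts per (day, destination port)."""
    return Counter((e["day"], e["dport"]) for e in events)


def top_ports(events, n=3):
    """Return the n most frequently probed destination ports."""
    return Counter(e["dport"] for e in events).most_common(n)


# Made-up sample of events as a research honeypot might record them.
SAMPLE_EVENTS = [
    {"day": "2003-05-01", "dport": 445},
    {"day": "2003-05-01", "dport": 445},
    {"day": "2003-05-01", "dport": 21},
    {"day": "2003-05-02", "dport": 445},
    {"day": "2003-05-02", "dport": 1434},
    {"day": "2003-05-02", "dport": 1434},
    {"day": "2003-05-02", "dport": 445},
]

trends = port_trends(SAMPLE_EVENTS)
ranking = top_ports(SAMPLE_EVENTS, n=2)
```

Because every event from a honeypot is already suspect, a sudden rise in one port's count from one day to the next is a usable early-warning signal without any further filtering.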

Getting Started
If you have never worked with honeypots before and want to learn more, I recommend starting with simple low-interaction honeypots, such as KFSensor or Specter for Windows users, or Honeyd for Unix users. There is even a Honeyd Linux Toolkit for easy deployment of Honeyd on Linux computers. Low-interaction honeypots have the advantage of being easier to deploy and carrying little risk, as they contain the activity of the attacker. Once you have had an opportunity to work with low-interaction solutions, you can take the skills and understanding you have developed and work with high-interaction solutions. To help you better understand honeypots, below is a chart summarizing what we just covered.

Low-interaction
Solution emulates operating systems and services.

Easy to install and deploy. Usually requires simply installing and configuring software on a computer.

Minimal risk, as the emulated services control what attackers can and cannot do.

Captures limited amounts of information, mainly transactional data and some limited interaction.

High-interaction
No emulation, real operating systems and services are provided.

Can capture far more information, including new tools, communications, or attacker keystrokes.

Can be complex to install or deploy (commercial versions tend to be much simpler).

Increased risk, as attackers are provided real operating systems to interact with.

Finally, no paper on honeypots would be complete without a discussion about legal issues. There are many misconceptions about the legal issues of honeypots. Instead of briefly covering the legal issues in this paper, I will be releasing a new paper at the end of May, 2003 dedicated to the legal issues of honeypot technologies.

Conclusion
The purpose of this paper was to define what honeypots are and their value to the security community. We identified two different types of honeypots, low-interaction and high-interaction honeypots. Interaction defines how much activity a honeypot allows an attacker. The value of these solutions lies in both production and research purposes. Honeypots can be used for production purposes by preventing, detecting, or responding to attacks. Honeypots can also be used for research, gathering information on threats so we can better understand and defend against them. If you are interested in learning more about honeypots, you may want to consider the book Honeypots: Tracking Hackers, the first and only book dedicated to honeypot technologies.
