Finding Probes and Scans in Packet Streams
Wireshark Lab 2
In this assignment your code will identify probes and scans in a stream of packets. Intuitively, a probe is when an agent makes repeated attempts to access or discover a service on a port. A scan is when an agent tries to map large parts of the IP address/port space to see whether any services are running on those ports. Your code will take a packet trace (a pcap file) and a target IP address as input, and output a list of the probes and scans found against that IP address, along with the originating IP addresses of each probe and scan.
Formal Definitions:
Figure 1 below shows how the definitions of probes and scans are formalized. The figure plots time on the X-axis and port number on the Y-axis: if a packet arrives at time t on port p (for a given IP address), we plot a dot at the point (t, p). Probes and scans are defined as clusters of points along one axis of this time vs. port space. A probe is a group of points with the same port number that are clustered in time. A scan is a group of points clustered in port space (along the Y-axis). For example, Figure 1 shows two probes in two separate time periods, and a single scan over a portion of the port space.
There are many algorithms that group points into logical clusters; this assignment uses a simple one. Clusters are defined by two parameters: (1) the width (Wp for probes and Ws for scans), and (2) the number (Np for probes and Ns for scans). The number is the minimum number of points needed for a group to count: there must be at least Np or Ns points (i.e., packets) in a group to report a probe or scan.
A cluster on a given axis is made up of points that are close together; if a point is "too far away," it belongs to another cluster. The width is the maximum distance between consecutive points for those points to be considered part of the same group. That is, the invariant for a point to be in a group (probe or scan) is that it must be within Wp or Ws units (seconds or port IDs, respectively) of at least one other point in the same group.
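As a minimal sketch of this rule, assuming the values on one axis (times or port IDs) have already been extracted into a Python list, the clustering can be done by sorting the values and starting a new cluster whenever the gap to the previous value exceeds the width. On a single axis this yields the same groups as the "within the width of at least one other point" invariant. The names `find_clusters` and `significant` are hypothetical, not part of the assignment.

```python
def find_clusters(values, width):
    """Group numbers so that each value is within `width` of the previous one.

    A gap strictly larger than `width` starts a new cluster.
    """
    clusters = []
    for v in sorted(values):
        if clusters and v - clusters[-1][-1] <= width:
            clusters[-1].append(v)   # close enough: same cluster
        else:
            clusters.append([v])     # too far away: start a new cluster
    return clusters


def significant(clusters, minimum):
    """Keep only clusters with at least `minimum` points (Np or Ns)."""
    return [c for c in clusters if len(c) >= minimum]
```

For example, `find_clusters([1, 2, 3, 10, 11, 30], 2)` returns `[[1, 2, 3], [10, 11], [30]]`, and applying `significant(..., 2)` to that result keeps only the first two clusters.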
Figure 1: Ports vs. Time in a Packet Stream Define Probes and Scans
Your code will take six parameters as input, using the following options (in parentheses):
- The filename of the pcap file. (-f)
- A target IP address. (-t)
- The width for probes, in seconds, Wp. (-l)
- The minimum number of packets in a probe, Np. (-m)
- The width for scans, in port ID, Ws. (-n)
- The minimum number of packets in a scan, Ns. (-p)
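One way to read these six options, sketched with Python's standard `argparse` module. The destination names (`pcap_file`, `target`, `wp`, `np`, `ws`, `ns`) are hypothetical choices, and treating Wp as a float while Np, Ws, and Ns are integers is an assumption.

```python
import argparse


def parse_args(argv=None):
    """Parse the six command-line options described above."""
    parser = argparse.ArgumentParser(description="Find probes and scans in a pcap file.")
    parser.add_argument("-f", dest="pcap_file", required=True, help="pcap file to read")
    parser.add_argument("-t", dest="target", required=True, help="target IP address")
    parser.add_argument("-l", dest="wp", type=float, required=True,
                        help="probe width Wp, in seconds")
    parser.add_argument("-m", dest="np", type=int, required=True,
                        help="minimum packets per probe, Np")
    parser.add_argument("-n", dest="ws", type=int, required=True,
                        help="scan width Ws, in port IDs")
    parser.add_argument("-p", dest="ns", type=int, required=True,
                        help="minimum packets per scan, Ns")
    # argv=None makes argparse read sys.argv[1:]
    return parser.parse_args(argv)
```

With illustrative values, an invocation would look like `python wireshark2.py -f probe_001.pcap -t 192.168.0.1 -l 5 -m 3 -n 5 -p 10`.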
Your code will output lists of the identified probes and scans, along with the source IPs of each probe and scan. One set of reports will be for TCP and the other for UDP.
Recommended Strategy:
As you read the packets in a pcap file, build two lists: one with the packets sorted by time, and the other with the packets sorted by port number. You will need to do this for TCP and UDP packets separately. After reading in all the packets, find the clusters in each list; the resulting clusters become the probes and scans. Because a probe consists of packets to a single port, cluster the time-sorted packets of each port separately. Keep each identified probe or scan on a list of probes or a list of scans.
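A minimal sketch of reading the trace and building these lists using only the Python standard library, which avoids depending on third-party packages that may not be installed on the ilab machines. It assumes a classic libpcap file (not pcapng) captured on Ethernet, considers only untagged IPv4 frames, and treats the target as the destination address and the port as the destination port; these are assumptions, and the `Packet` record and the `read_pcap`, `parse_frame`, and `build_lists` names are hypothetical.

```python
import socket
import struct
from collections import namedtuple

# Hypothetical record for one parsed packet.
Packet = namedtuple("Packet", "time proto src dst dport")

PCAP_MAGIC_US = 0xA1B2C3D4   # classic pcap, microsecond timestamps
PCAP_MAGIC_NS = 0xA1B23C4D   # classic pcap, nanosecond timestamps
LINKTYPE_ETHERNET = 1
ETH_HEADER_LEN = 14
ETHERTYPE_IPV4 = 0x0800
PROTOCOLS = {6: "TCP", 17: "UDP"}


def read_pcap(path):
    """Yield Packet records for the IPv4 TCP/UDP packets in a classic pcap file."""
    with open(path, "rb") as f:
        header = f.read(24)
        if len(header) < 24:
            raise ValueError("truncated pcap global header")
        # The magic number tells us the byte order the file was written in.
        for endian in ("<", ">"):
            magic = struct.unpack(endian + "I", header[:4])[0]
            if magic in (PCAP_MAGIC_US, PCAP_MAGIC_NS):
                break
        else:
            raise ValueError("not a classic pcap file (pcapng is not handled)")
        frac = 1e-9 if magic == PCAP_MAGIC_NS else 1e-6
        linktype = struct.unpack(endian + "I", header[20:24])[0]
        if linktype != LINKTYPE_ETHERNET:
            raise ValueError(f"unsupported link type {linktype}")
        while True:
            rec = f.read(16)
            if len(rec) < 16:
                break  # end of file
            ts_sec, ts_frac, incl_len, _orig_len = struct.unpack(endian + "IIII", rec)
            pkt = parse_frame(f.read(incl_len), ts_sec + ts_frac * frac)
            if pkt is not None:
                yield pkt


def parse_frame(frame, ts):
    """Return a Packet for an Ethernet/IPv4/TCP-or-UDP frame, else None."""
    if len(frame) < ETH_HEADER_LEN + 20:
        return None
    if struct.unpack("!H", frame[12:14])[0] != ETHERTYPE_IPV4:
        return None  # skips IPv6, ARP, and VLAN-tagged frames
    ip = frame[ETH_HEADER_LEN:]
    ihl = (ip[0] & 0x0F) * 4                       # IP header length in bytes
    frag_offset = struct.unpack("!H", ip[6:8])[0] & 0x1FFF
    if ip[9] not in PROTOCOLS or frag_offset or len(ip) < ihl + 4:
        return None  # other protocol, or no transport header present
    dport = struct.unpack("!H", ip[ihl + 2:ihl + 4])[0]
    return Packet(ts, PROTOCOLS[ip[9]], socket.inet_ntoa(ip[12:16]),
                  socket.inet_ntoa(ip[16:20]), dport)


def build_lists(packets, target):
    """Per protocol, return (sorted by time, sorted by port) for packets sent to target."""
    lists = {}
    for proto in ("TCP", "UDP"):
        mine = [p for p in packets if p.proto == proto and p.dst == target]
        lists[proto] = (sorted(mine, key=lambda p: p.time),
                        sorted(mine, key=lambda p: p.dport))
    return lists
```

Since `read_pcap` is a generator, materialize it once (for example `packets = list(read_pcap(path))`) before passing it to `build_lists`.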
To find the clusters, start at the minimum port ID or time and keep adding points that fall within the width; that is, keep traversing the packets until you find a packet separated from the previous one by more than the width. At that point, label the clustered packets with a probe number or scan number, and add them to the list of probes or scans. Next, for each probe and scan, if the number of packets is at least Np or Ns, output the probe/scan header and then the source IP addresses for that cluster.
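Putting the strategy together, here is a sketch of probe and scan detection for one protocol's packets. It assumes the packets have already been filtered to the target IP and to a single protocol (TCP or UDP), and that each carries a timestamp, source IP, and destination port; the `Packet` record and all function names are hypothetical. Probes are found by grouping packets by port and clustering each group in time with width Wp; scans are clusters along the port axis with width Ws. The printed layout in `report` is a placeholder and should be changed to match the provided output-*.txt sample files.

```python
from collections import defaultdict, namedtuple

# Hypothetical packet record: time in seconds, source IP, destination port.
Packet = namedtuple("Packet", "time src dport")


def cluster_by(packets, key, width):
    """Sort packets by `key`; split wherever consecutive keys differ by more than `width`."""
    clusters = []
    for pkt in sorted(packets, key=key):
        if clusters and key(pkt) - key(clusters[-1][-1]) <= width:
            clusters[-1].append(pkt)
        else:
            clusters.append([pkt])
    return clusters


def find_probes(packets, wp, np_min):
    """Probes: packets to the same port, clustered in time, with at least np_min packets."""
    by_port = defaultdict(list)
    for pkt in packets:
        by_port[pkt.dport].append(pkt)
    probes = []
    for port in sorted(by_port):
        for c in cluster_by(by_port[port], key=lambda p: p.time, width=wp):
            if len(c) >= np_min:
                probes.append(c)
    return probes


def find_scans(packets, ws, ns_min):
    """Scans: packets clustered in port space, with at least ns_min packets."""
    return [c for c in cluster_by(packets, key=lambda p: p.dport, width=ws)
            if len(c) >= ns_min]


def report(kind, clusters):
    """Print a header per cluster, then its distinct source IPs (placeholder format)."""
    print(f"Found {len(clusters)} {kind}s")
    for i, c in enumerate(clusters, 1):
        print(f"{kind} {i}: {len(c)} packets")
        for src in sorted({p.src for p in c}):
            print(f"  {src}")
```

Running both detectors once on the TCP packets and once on the UDP packets produces the two sets of reports the assignment asks for.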
Grading:
You will be given 4 test pcap files and their sample output files (scan_001.pcap, probe_001.pcap, scan_002.pcap, probe_002.pcap, output-scan-001.txt, output-probe-001.txt ...), and we will also run your code on 2 additional pcap files. Your code must also run on the ilab machines; if you use libraries that are not installed on the ilab machines, your code will be considered incorrect.
Upload a single file only. The main file to execute must be called wireshark2.py, and it should print its report to standard output (i.e., just use regular print statements).