Auditing Rejected Connections to DSE | DataStax Academy: Free Cassandra Tutorials and Training

Ever had messages like

INFO  [epollEventLoopGroup-6-2] 2018-05-10 19:10:49,142  Message.java:666 -Unexpected exception during request; channel = [id: 0xa5fa6b7c, L:/10.200.179.113:9042 ! R:/10.200.179.114:53533]
io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record

in your DSE logs and can’t figure out what process is trying to connect? In this blog post, we are going to walk through one method to determine what process is failing to connect to DSE by leveraging the auditd framework.

If you would like to step through the procedure along with this article, then first things first, set up a cluster with client encryption enabled, non optional. I am not going to cover that here, but rather point you towards the DataStax documentation for setting up DSE with SSL.

To demonstrate a connection being rejected by DSE, we are going to create a script to connect to DSE without using SSL. For simplicity’s sake, we shall just be using cqlsh to perform the connection.

#!/bin/bash
while true
do
        cqlsh 10.200.179.113 -e "DESCRIBE FULL SCHEMA"
        sleep 30
done

I shall be running this script from 10.200.179.114 and attempting to connect to 10.200.179.113 without using SSL on a node which has been configured to only allow encrypted traffic. If we kick off that script we shall begin to see messages such as below appearing in the log every 30 seconds.

Unexpected exception during request; channel = [id: 0xa5fa6b7c, L:/10.200.179.113:9042 ! R:/10.200.179.114:53533]
io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record

Now let’s put ourselves in a black box and assume we had no knowledge of this script that we just started. All we know is what we can see in the system.log, which would be the above message, so let’s analyze that. We can see that the request to our DSE node, 10.200.179.113 on port 9042 is coming from 10.200.179.114 via L:/10.200.179.113:9042 ! R:/10.200.179.114:53533, so we shall need to go to that server to see what is trying to connect.

Now many, will automatically think to use netstat to see what process is using the port. However, that will not work for this scenario. The problem is that we know DSE is rejecting the connection, so in essence, we are dealing with an extremely short-lived connection, rather than a persisting one. Even if we loop netstat, odds are we will not see the connection in the output.

On 10.200.179.114, let’s install the auditd framework if it is not already present:

Debian:

sudo apt-get install auditd

Red Hat:

sudo yum install audit

The goal is to identify all sys_connect system calls, syscall 42, on the server to locate the process trying to connect on 9042. Essentially, we want to know every time a socket connection is attempted. I am going to assign a “key” of AUDITCONNECT so that we can more easily search these. In addition to the connect syscall, we are also going to monitor the execve syscall, which I shall elaborate on later, tagged as AUDITEXEC. Execve is called every time we execute a file. A complete list of syscalls can be found at https://filippo.io/linux-syscall-table/.

Before enabling auditing, try and identify a pattern to the connections so that we only have auditing enabled for a small window. For example, if the connection is tried at the top of every hour, we can just enable auditing just around that window. Of course, this is not always possible to identify.

To enable the auditing, which logs to /var/log/audit/audit.log, run the following command:

sudo auditctl -a exit,always -F arch=b64 -S connect -k AUDITCONNECT
sudo auditctl -a exit,always -F arch=b64 -S execve -k AUDITEXEC

We need to leave this running long enough to capture the connect calls. We can of course verify if a request was rejected via the nodes’ system.log files. To stop auditing the connect system calls, we need to delete the rule we created, so in this case run:

sudo auditctl -d exit,always -F arch=b64 -S connect -k AUDITCONNECT
sudo auditctl -d exit,always -F arch=b64 -S execve -k AUDITEXEC

If you just crack open /var/log/audit/audit.log and search inside, you will find this quite difficult to decipher.

type=SOCKADDR msg=audit(1525979767.785:18): saddr=020023520AC8B3710000000000000000
type=SYSCALL msg=audit(1525979798.312:19): arch=c000003e syscall=42 success=no exit=-115 a0=5 a1=7ffd5f
2e3470 a2=10 a3=7ffd5f2e31c0 items=0 ppid=14913 pid=14926 auid=1002 uid=1002 gid=1002 euid=1002 suid=10
02 fsuid=1002 egid=1002 sgid=1002 fsgid=1002 tty=pts0 ses=70 comm="python" exe="/usr/bin/python2.7" key
="AUDITCONNECT"

The saddr which is what we will use to identify the port is in hex. Luckily we can use ausearch -i to interpret these values to allow us to easily search the contents.

sudo ausearch -k AUDITCONNECT -i | grep -A 1 9042
type=SOCKADDR msg=audit(05/10/2018 19:16:07.785:18) : saddr=inet host:10.200.179.113 serv:9042
type=SYSCALL msg=audit(05/10/2018 19:16:07.785:18) : arch=x86_64 syscall=connect success=no exit=-115(Operation now in progress) a0=0x5 a1=0x7fff1f591780 a2=0x10 a3=0x7fff1f5914d0 items=0 ppid=14913 pid=14914 auid=automaton uid=automaton gid=automaton euid=automaton suid=automaton fsuid=automaton egid=automaton sgid=automaton fsgid=automaton tty=pts0 ses=70 comm=python exe=/usr/bin/python2.7 key=AUDITCONNECT
--
type=SOCKADDR msg=audit(05/10/2018 19:16:38.312:19) : saddr=inet host:10.200.179.113 serv:9042
type=SYSCALL msg=audit(05/10/2018 19:16:38.312:19) : arch=x86_64 syscall=connect success=no exit=-115(Operation now in progress) a0=0x5 a1=0x7ffd5f2e3470 a2=0x10 a3=0x7ffd5f2e31c0 items=0 ppid=14913 pid=14926 auid=automaton uid=automaton gid=automaton euid=automaton suid=automaton fsuid=automaton egid=automaton sgid=automaton fsgid=automaton tty=pts0 ses=70 comm=python exe=/usr/bin/python2.7 key=AUDITCONNECT

This shows that the process making the request (pid) was 14914, and it has a parent process of 14913 (ppid). You can check if either of those processes exist

ps -ef | grep 14913
automat+ 14913 14394  0 19:15 pts/0    00:00:00 /bin/bash ./connect_script.sh

So we are all done right...? Wrong! Unfortunately, life is rarely that simple. Many may be wondering, well what if that process call is nested in many other process calls… We cannot just use the pid or ppid to track down what made the call. The simplest way to demonstrate this is to create another script, call it script.sh again:

#!/bin/bash
cqlsh 10.200.179.113 -e "DESCRIBE FULL SCHEMA"

Add that script to your cron jobs, and set it to go off every minute:

crontab -e
* * * * * /path/to/script.sh

Now both the ppid and the pid found from the AUDITCONNECT key search on port 9042 would not give us any currently running processes.

This is why we also created a monitor for the EXECVE syscalls using the auditd framework under the AUDITEXEC key. As mentioned, execve is called every time we execute a file. Since we set up the AUDITEXEC key already, exceve log entries will contain the executed call along with the associated arguments. Determining the offending program connecting to DSE will depend on how the program is executing the connection, but in general terms, we are going to want to look at the arguments being passed for the processes determined to be connecting on port 9042 from our AUDITCONNECT logs, as denoted by the a0,a1,a2,etc...

sudo ausearch -k AUDITEXEC -i | grep -B 1 15562
type=EXECVE msg=audit(05/25/2018 21:09:01.260:708) : argc=2 a0=/bin/bash a1=/etc/cron.hourly/script.sh
type=SYSCALL msg=audit(05/25/2018 21:09:01.260:708) : arch=x86_64 syscall=execve success=yes exit=0 a0=0x7fa78c34cac8 a1=0x7fa78c34cb00 a2=0x7fa78c34cb10 a3=0x7fa78c33fa10 items=3 ppid=15560 pid=15562 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=script.sh exe=/bin/bash key=AUDITEXEC
--
type=EXECVE msg=audit(05/25/2018 21:09:01.260:711) : argc=5 a0=/bin/sh a1=/usr/bin/cqlsh a2=10.200.175.109 a3=-e a4=DESCRIBE FULL SCHEMA
type=SYSCALL msg=audit(05/25/2018 21:09:01.260:711) : arch=x86_64 syscall=execve success=yes exit=0 a0=0x1731e88 a1=0x1731948 a2=0x1732c08 a3=0x7ffc1093e880 items=3 ppid=15562 pid=15563 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=cqlsh exe=/bin/dash key=AUDITEXEC
--
type=EXECVE msg=audit(05/25/2018 21:09:01.288:714) : argc=5 a0=python a1=/usr/bin/dsecqlsh.py a2=10.200.175.109 a3=-e a4=DESCRIBE FULL SCHEMA
type=SYSCALL msg=audit(05/25/2018 21:09:01.288:714) : arch=x86_64 syscall=execve success=yes exit=0 a0=0x7f96ed3055f0 a1=0x7f96ed305570 a2=0x7f96ed3055a0 a3=0x2 items=2 ppid=15562 pid=15563 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=python exe=/usr/bin/python2.7 key=AUDITEXEC

As seen in the last block above, a Python client would be calling python as the exe, but would still pass in the py file being used as an argument, in this case, a1=/usr/bin/dsecqlsh.py. A java application would be running java as the executable, but passing in the jar to run as an argument.

Hopefully, leveraging the argument information, along with the already known pid and ppid information, should help give enough information to identify what is/was running, and then steps can be taken to either remedy the connection, or stop the application from attempting to connect.