Don't be PASVe with Docker

Today I had the interesting challenge of helping Ester figure out why a SpecFlow test wasn't behaving as it should. Of course the test ran fine in the CI pipeline, but locally: no joy.

Some background

The application that this test is for generates a file from data in a database, which it then uploads to an FTP server. Easy as.

The test we were looking at basically does the following:

  1. Clear the files from the FTP server
  2. Set up test data in the database
  3. Run the application
  4. Verify that the file has been uploaded to the FTP server

Because this test uses an FTP server, Ester had already set up a local FileZilla server and configured the test and the application to point to that server. So far, so good.

However, the test failed at step 1...

Time to investigate!

The start of our journey

Okay, so what failed exactly? The method that clears the files from the FTP server is nothing special: it just lists the files in the directory (which we get from settings) and then deletes them one by one.

The error we saw was that the file we attempted to delete didn't exist, which is odd as we had just got that file from a list (NLST) operation. The path supplied to the delete command looked like this:
ftp://localhost/workdir/workdir/somefile.dat
Apparently the directory workdir was included twice. How did that happen? We certainly didn't put it in the format string twice.

Apparently FileZilla includes the directory name in the result of an NLST operation. So if you do NLST workdir you get workdir/file1, workdir/file2, etc. instead of file1, file2. According to djb, NLST should return an abbreviated list of names, whereas LIST returns a more detailed entry for every file.
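To make the cleanup step robust against both behaviors, one option is to throw away whatever directory prefix the server hands back and rebuild the path yourself. Here's a minimal sketch of that idea, in Python for brevity rather than our actual test code. The names `entry_to_path` and `clear_directory` are made up for illustration, `ftp` is assumed to be an `ftplib.FTP`-style object with `nlst` and `delete`, and the directory is assumed to contain only plain files.

```python
import posixpath


def entry_to_path(directory: str, entry: str) -> str:
    """Build the path of a listed file, whether or not NLST prefixed the directory."""
    # Keep only the last path component, so "workdir/file1" and "file1"
    # both end up as "file1" before the directory is added back once.
    name = posixpath.basename(entry.rstrip("/"))
    return posixpath.join(directory, name)


def clear_directory(ftp, directory: str) -> list[str]:
    """Delete every entry NLST reports for `directory`; return the deleted paths."""
    deleted = []
    for entry in ftp.nlst(directory):
        path = entry_to_path(directory, entry)
        ftp.delete(path)
        deleted.append(path)
    return deleted
```

With this, a server that answers `workdir/file1` and one that answers `file1` both lead to a delete of `workdir/file1`, never `workdir/workdir/file1`.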

So our first problem is an FTP server with weird behavior... nice!

Enter Docker!

Okay, so FileZilla doesn't work for us; let's pick another FTP server. Luckily Nathan had already poked around and created a Docker container with pure-ftpd, ready to use. After copying the necessary docker-compose.yml and Dockerfile we were good to go:

C:\src\application> docker-compose up  
Traceback (most recent call last):  
...
...
  File "site-packages\docker\transport\npipesocket.py", line 49, in connect
pywintypes.error: (2, 'WaitNamedPipe', 'The system cannot find the file specified.')  
docker-compose returned -1  

Not very encouraging... what the bleep is WaitNamedPipe anyway?

Some Googling only turned up posts about this issue that were at least a year old. Updating Docker probably couldn't hurt.

Oh how wrong I was...

After downloading the latest Docker version and running the installer, it told me: "Hahaha! I only work on Windows 10!" Great. Okay so I should've read the site more carefully and downloaded the Docker Toolbox instead because that supports Windows 7.

One reboot later...

Error: Unable to connect to system D-Bus (2/3): D-Bus not installed  

Shit.

Then I saw another message saying that the VirtualBox service could not be found. Interesting... I opened up VirtualBox to see if I could start the Docker VM, but that didn't work. It also told me there was an update available. So on a hunch I updated VirtualBox, thinking that re-installing would probably correct the situation.

Another reboot later...

DEVBOX ~  
$ 

Yay! It works, Docker is up!

DEVBOX ~  
$ docker-compose up
...
...
Attaching to pureftpd-test...  

Awww yis!
By now, everything was working and I was able to connect to the FTP server successfully via IP address 192.168.155.165 (<-- remember this).

Fixing them tests

Okay, so now we had a working environment, with the test suite and the application both configured to point to the new FTP server. Time to run the test again and see if it works.

Hoo-rah!! Test cleanup worked, setup worked, running the application worked. Only the assertion failed... bummer!

Let's see if the file got created on the FTP server... nope. Weird. I thought we had everything working, right? Double-checking the configuration, it certainly looked that way. Time to debug: launch the application from Visual Studio directly and see if it is misbehaving.

Stepping through the file upload I noticed something odd:

PASV  
227 Entering Passive Mode (127,0,0,1,216,3004)  

127.0.0.1? But wait a minute, we're connecting to 192.168.155.165 aren't we?

No wonder the upload fails. The FTP server is telling the application to connect to localhost but nothing is listening there. What's going on?

It turns out that in response to PASV, the FTP server tells the client both the IP address and the port to connect to for the data connection. But because the FTP server thinks it's running on localhost, that is the address it hands to the client, causing a mismatch. Luckily there is RFC 2428, which introduces the EPSV verb. Its reply contains only the port number, so the client connects to the same address it used for the control connection.
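To make the difference concrete, here is a small sketch of what a client has to pull out of each reply. It's Python, not the FTP library our application uses, the function names `parse_pasv` and `parse_epsv` are my own, and the sample replies in the usage note are made up. A 227 reply carries four address bytes plus two port bytes (port = p1 * 256 + p2), while a 229 reply carries just the port between delimiter characters.

```python
import re


def parse_pasv(reply: str) -> tuple[str, int]:
    """Parse a 227 reply such as '227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)'."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if not reply.startswith("227") or match is None:
        raise ValueError(f"not a PASV reply: {reply!r}")
    numbers = [int(group) for group in match.groups()]
    # Every field is a single byte, so anything above 255 is malformed.
    if any(n > 255 for n in numbers):
        raise ValueError(f"field out of range in: {reply!r}")
    host = ".".join(str(n) for n in numbers[:4])
    port = numbers[4] * 256 + numbers[5]
    return host, port


def parse_epsv(reply: str) -> int:
    """Parse a 229 reply such as '229 Entering Extended Passive Mode (|||port|)'."""
    # The delimiter is usually '|', but any repeated character is accepted here.
    match = re.search(r"\((.)\1\1(\d+)\1\)", reply)
    if not reply.startswith("229") or match is None:
        raise ValueError(f"not an EPSV reply: {reply!r}")
    return int(match.group(2))
```

For example, `parse_pasv("227 Entering Passive Mode (127,0,0,1,216,4)")` gives `("127.0.0.1", 55300)`: an address the server chose for us. `parse_epsv("229 Entering Extended Passive Mode (|||6446|)")` gives just `6446`, leaving the address decision with the client.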

So what causes this address confusion in the first place? Docker maps ports to localhost, right? Yes, it does. However, because of how Docker works on Windows 7, this doesn't quite work all the way.
Normally a Docker container runs directly on the host OS, which makes mapping the ports and IPs straightforward: 127.0.0.1 in the container is mapped to 127.0.0.1 on the host.
However, because Docker on Windows 7 runs inside a VirtualBox Linux VM, the Docker host is not Windows but the Linux VM! And because VirtualBox exposes that VM on a real IP address rather than on 127.0.0.1, the mismatch occurs.
This image shows the difference between Docker on Windows 7 vs Windows 10:
[Image: Docker nested in VirtualBox nested in host OS]

Luckily the PASV vs EPSV issue is easily fixable. We just changed:
ftpClient.DataConnectionType = FtpDataConnectionType.PASV;
to:
ftpClient.DataConnectionType = FtpDataConnectionType.EPSV;

FTP upload: solved!
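If switching to EPSV isn't an option, some FTP clients take a different route: they distrust the address in the PASV reply and reuse the control connection's host instead. Below is a rough Python sketch of that check, not something we actually shipped. The helper name `choose_data_host` is hypothetical, and it only handles the loopback case we ran into.

```python
import ipaddress


def choose_data_host(control_host: str, advertised_host: str) -> str:
    """Pick the host for the data connection given what PASV advertised."""
    advertised = ipaddress.ip_address(advertised_host)
    try:
        control_is_loopback = ipaddress.ip_address(control_host).is_loopback
    except ValueError:
        # control_host is a hostname rather than an IP literal.
        control_is_loopback = control_host == "localhost"
    # A loopback address is only believable if we connected via loopback too.
    if advertised.is_loopback and not control_is_loopback:
        return control_host
    return advertised_host
```

In our situation, `choose_data_host("192.168.155.165", "127.0.0.1")` falls back to `"192.168.155.165"`, which is where the server is really listening.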

Fixing them tests (2)

Unfortunately, the test still didn't pass. The file we expected still didn't exist on the FTP server... or does it?

Looking at the code for the assertion, it was using a FileHelper class. Wait a minute, FileHelper? We're using an FTP server, right?

So it turns out that the test was set up to expect the FTP location to be mapped to a local folder which then gets inspected. Major assumption there! But "no worries" (as our Aussie colleagues would say), we can just map the appropriate volume in the Docker container!

Yes, well, no.

Because Docker on Windows 7 is something of a hacktastic solution, mapping volumes actually doesn't quite work. We've specified the folder in the docker-compose.yml but it kept turning up as not mapped in the Docker container. Apparently we are out of luck here until we switch to Windows 10 (not something I'll do to satisfy a test though).
Update: Although VirtualBox isn't used anymore, the IP address jumble hasn't gone away. (Thanks Pat)

Changing to a different tack: why are we looking at the local file system anyway? Aren't we interested in whether the file exists on the FTP server?

The easy solution was just to use the FtpHelper instead of the FileHelper and hey presto, test passes!
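The idea behind that change, asserting against the FTP server itself rather than a folder that may or may not be mapped, fits in a few lines. This is a Python sketch and not our real FtpHelper. The name `ftp_file_exists` is invented, and `ftp` is assumed to be an `ftplib.FTP`-style object that is already connected and logged in.

```python
import posixpath


def ftp_file_exists(ftp, directory: str, filename: str) -> bool:
    """Return True if `filename` shows up in the server's NLST of `directory`."""
    # Compare on the last path component only, so a server that prefixes
    # the directory name and one that doesn't are treated the same.
    names = {posixpath.basename(entry) for entry in ftp.nlst(directory)}
    return filename in names
```

The assertion then becomes something like `assert ftp_file_exists(ftp, "workdir", "somefile.dat")`, with no local file system involved.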

Lessons learned

I think the most important thing to learn from investigating this test failure is to first try and understand what the test attempts to do and, for me personally, first to draw a picture of what is going on. Something like this:
[Image: Schematic overview of test case]
A picture like that would have helped a lot in figuring out which part(s) of the test were failing, and in identifying the assertion that was checking the local directory instead of the FTP server.

Another important point is that you cannot depend on the behavior of 3rd party systems. Something that became painfully clear with FileZilla vs pure-ftpd for example. I think we need to have a look at the behavior of our 3rd party and their FTP server to ensure we can reliably replicate it in our test environments.

Also, where Docker was involved, there is some (arcane) knowledge you need to have when attempting to troubleshoot issues. I feel that the "just use Docker, your problems will be solved" attitude doesn't always hold up. Docker is a piece of software you'll need to grok in order to use it effectively. I'll definitely be looking at organizing some training/workshop around this in the next few weeks.

Conclusion

Test that shit. Really, do it.