
Byron Wall

Avid programmer and chemical engineer in Indianapolis, IN


500 errors while trying to use Tabula

I was recently working with a PDF in Tabula. I have used Tabula before to extract tables from PDFs, and it has worked great. This time, however, I could not get Tabula to start processing the PDF; the upload failed with a 500 error. Debugging the network request pointed to a temporary directory issue.

Tabula upload error

The error message was (ENOENT) No such file or directory - /tmp, which is a bit cryptic on Windows, where a /tmp directory is unlikely to exist. The message did not include the full path of that directory, nor did it give a file or line number in the source where the error was raised.

Since the failing request was the AJAX call to upload.json, I searched the code for that string. It turns up in two files: tabula_web.rb and index.html. The latter is the front-end UI, so I went to the former to find the issue.

I had already dug into the AJAX request, so I knew it was POSTing a parameter called files with the relevant information. That detail would not have mattered anyway, because both code paths call is_valid_pdf with file[:tempfile].path, which is supposed to be the path of the uploaded file. In my case, execution did not seem to reach is_valid_pdf at all: a line I added at the top of that function produced no output.
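The following sketch illustrates why file[:tempfile].path depends on a working temporary directory. It is not Tabula's code. It assumes the usual Rack multipart convention, in which an uploaded file reaches the app as a hash whose :tempfile entry is a Tempfile that has already been written to disk. If that Tempfile cannot be created, the request fails while the upload is being parsed, before is_valid_pdf is ever called, which would be consistent with the debug line never printing. The helpers fake_upload and pdf_header? are hypothetical, and the warn call stands in for the debugging line added at the top of is_valid_pdf.

```ruby
require 'tempfile'

# Hypothetical stand-in for a check like is_valid_pdf: it receives a
# filesystem path and looks for the "%PDF-" magic bytes at the start.
def pdf_header?(path)
  warn "pdf_header? called with #{path}" # debug line: proves the method was reached
  File.open(path, 'rb') { |f| f.read(5) == '%PDF-' }
end

# Hypothetical upload hash in the usual Rack multipart shape. Building it
# requires Tempfile to find a usable temporary directory; if it cannot,
# this raises before pdf_header? is ever called.
def fake_upload(content, filename)
  tf = Tempfile.new(['upload', File.extname(filename)])
  tf.binmode
  tf.write(content)
  tf.flush
  { filename: filename, tempfile: tf }
end

if __FILE__ == $PROGRAM_NAME
  file = fake_upload("%PDF-1.4\n...", 'report.pdf')
  puts pdf_header?(file[:tempfile].path)
  file[:tempfile].close! # close and delete the temporary file
end
```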

Given that, it looked like the request was failing before is_valid_pdf ran, and that Tempfile did not have a valid directory to write the uploaded file to.
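One way to test that hypothesis is to ask Ruby which temporary directory it would use. In standard (MRI) Ruby, Dir.tmpdir checks TMPDIR, TMP and TEMP in that order, skipping candidates that are missing, not writable, or otherwise unsuitable, before falling back to system defaults such as /tmp; that fallback is one plausible way /tmp could end up in the error message. The helper tmpdir_candidates below is hypothetical, and the Ruby runtime bundled with Tabula may behave slightly differently from a standalone Ruby.

```ruby
require 'tmpdir'

# Hypothetical helper: report each temp-directory environment variable
# and whether it points at an existing directory.
def tmpdir_candidates(env = ENV)
  %w[TMPDIR TMP TEMP].map do |name|
    path = env[name]
    exists = !path.nil? && !path.empty? && File.directory?(path)
    { name: name, path: path, exists: exists }
  end
end

if __FILE__ == $PROGRAM_NAME
  tmpdir_candidates.each do |c|
    status = c[:exists] ? 'exists' : 'missing'
    puts format('%-6s %-50s %s', c[:name], c[:path].inspect, status)
  end
  # Dir.tmpdir is the directory Tempfile uses by default.
  puts "Dir.tmpdir  => #{Dir.tmpdir}"
  puts "/tmp exists => #{File.directory?('/tmp')}"
end
```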

The Tempfile documentation shows that it uses Dir.tmpdir as its output directory, which can be controlled with the TMPDIR environment variable. When I checked my environment variables, I had nothing set for TMPDIR, so I set it to %USERPROFILE%/AppData/Local/Temp/tmp, a directory I had just created.
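To confirm from Ruby itself that the override takes effect, one can point TMPDIR at an existing directory and check that both Dir.tmpdir and Tempfile pick it up. This is a minimal sketch that changes the environment only inside the running process (standard Ruby re-reads ENV on each Dir.tmpdir call); the helper with_tmpdir is hypothetical, and a freshly created directory stands in for %USERPROFILE%/AppData/Local/Temp/tmp. Making the setting permanent on Windows is still done through the system's environment-variable settings, as described above.

```ruby
require 'tmpdir'
require 'tempfile'

# Hypothetical helper: temporarily set TMPDIR for the current process,
# run the block, then restore the previous value.
def with_tmpdir(dir)
  previous = ENV['TMPDIR']
  ENV['TMPDIR'] = dir
  yield
ensure
  ENV['TMPDIR'] = previous # assigning nil removes the variable again
end

if __FILE__ == $PROGRAM_NAME
  # Stand-in for %USERPROFILE%/AppData/Local/Temp/tmp: it must already
  # exist and be writable, since Dir.tmpdir skips unusable candidates.
  Dir.mktmpdir do |custom|
    with_tmpdir(custom) do
      puts Dir.tmpdir                 # the custom directory
      Tempfile.create('upload') do |f|
        puts File.dirname(f.path)     # also the custom directory
      end
    end
  end
end
```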

With this variable set, and Tabula restarted so it picked up the new environment, the files uploaded correctly.