500 errors while trying to use Tabula
I was recently working with a PDF in Tabula. I have used Tabula before to extract tables from PDFs, and it has worked great. This time, I could not get Tabula to start processing the PDF. Debugging the network request pointed to a problem with a temporary directory.
The error message was
(ENOENT) No such file or directory - /tmp, which is a bit cryptic on Windows since it refers to a directory that is unlikely to exist. The error message did not include the full path for this directory, nor did it give a file or line number in the source where the error was raised.
Since the issue is the AJAX request to
upload.json, I did a search for that in the code. You can see the two files that are returned are
index.html. The latter is the front-end UI, so I went to the former file to spot the issue.
From digging into the AJAX request earlier, I knew that it was POSTing a parameter called
files with the relevant info. It would not have mattered either way, because both code paths call
is_valid_pdf with a parameter of
file[:tempfile].path. This is supposed to refer to the path of the uploaded file. In my case, it seems execution was never reaching the
is_valid_pdf function. I added an output line to the top of that function, and nothing was printed.
Given that, it looked like the problem was
Tempfile not having a valid directory to write to.
Checking the docs for Tempfile, you can see that it uses
Dir.tmpdir as the output directory. This can be controlled by an environment variable,
$TMPDIR. Checking my environment variables, I had nothing set for this. I set this variable to the directory
%USERPROFILE%/AppData/Local/Temp/tmp, which I had just created.
With this addition and a restart of Tabula, the files uploaded correctly.
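If you want to check this kind of problem before touching Tabula, a small Ruby script can show which directory Dir.tmpdir resolves to and whether a temp file can actually be created there. This is only a rough sketch: the helper names tmpdir_report and tempfile_writable? are made up for illustration and are not part of Tabula, and the list of variables (TMPDIR, TMP, TEMP) is what I understand Ruby's standard tmpdir library to consult, so it may differ between Ruby versions and implementations.

```ruby
require 'tmpdir'
require 'tempfile'

# Hypothetical helper (not part of Tabula): show the environment variables
# that Dir.tmpdir is assumed to consult, plus the directory it resolves to.
def tmpdir_report
  {
    'TMPDIR'   => ENV['TMPDIR'],
    'TMP'      => ENV['TMP'],
    'TEMP'     => ENV['TEMP'],
    'resolved' => Dir.tmpdir
  }
end

# Hypothetical helper: try to create a temp file in `dir`.
# Returns true on success and false if the OS refuses (for example ENOENT).
def tempfile_writable?(dir = Dir.tmpdir)
  Tempfile.create('tmpdir-check', dir) do |f|
    f.write('ok')
    f.flush
    File.exist?(f.path)
  end
rescue SystemCallError
  false
end

puts tmpdir_report
puts "temp files can be created: #{tempfile_writable?}"
```

If the last line prints false, or the resolved directory is not the one you expect, setting TMPDIR to an existing, writable directory (as I did above) is the first thing I would try.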