So you’ve coded up your new webapp. It does some cool stuff and then produces files in a binary format (like .pdf, .doc, .mp3) and you need to deliver these files to the browser. So how do you do that?
Random Junk
If you just ‘include’ the file, or you read the file into your script and then print it out to the browser, you just get a jumble of weird characters. The problem is that the file content has been encoded to be read by a particular application and when the browser tries to render it as text, it applies the wrong encoding – so you get the random characters.
Luckily, it’s pretty easy to get a browser to understand that the content you’re sending isn’t supposed to be rendered like HTML. You just have to tell the browser what kind of data you’re sending before you send it. Setting just one HTTP header, the ‘Content-type’ header, is all that’s needed. Most languages include support for setting a header. In PHP, for example:
header("Content-type: application/pdf");
Setting this header will make most browsers throw up a dialog with options to run the file with an application, or save the file.
PHP Example
Here’s an example you can run, if you have PHP to hand. You’ll need a pdf file to send, and you’ll need to change the filename and path if it’s not in the same location as the PHP file.
<?php
### set a header to tell the browser what kind of file I'm about to send
header("Content-type: application/pdf");

### then read in the .pdf file
$pdfLocation = "my_pdf_file.pdf";
$pdf = file_get_contents($pdfLocation);

### and send the file out to the browser
echo $pdf;
All good so far, but you might be wondering about that ‘application/pdf’ bit. It’s a MIME (Multipurpose Internet Mail Extensions) type, a label that tells the receiving end what kind of data it’s getting, so that it can pick an application able to handle it. IANA, the organisation that manages the types, provides a reference for all registered types.
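If your webapp produces more than one kind of file, you'll want to pick the MIME type from the filename rather than hard-coding it. Here's a minimal sketch of that idea in Java. The `MimeTypes` class and its `forFilename` method are made up for illustration, and the table only covers the three formats mentioned at the top of this post; anything it doesn't recognise falls back to `application/octet-stream`, the generic type for arbitrary binary data.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not part of any library): extension -> MIME type lookup.
class MimeTypes {
    // Only the formats mentioned in this post; extend the table as needed.
    private static final Map<String, String> BY_EXTENSION = Map.of(
            "pdf", "application/pdf",
            "doc", "application/msword",
            "mp3", "audio/mpeg");

    // Generic type for "some bytes, no idea what they are".
    static final String FALLBACK = "application/octet-stream";

    static String forFilename(String filename) {
        int dot = filename.lastIndexOf('.');
        if (dot < 0 || dot == filename.length() - 1) {
            return FALLBACK; // no extension to go on
        }
        String extension = filename.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.getOrDefault(extension, FALLBACK);
    }

    public static void main(String[] args) {
        System.out.println(forFilename("report.PDF"));  // application/pdf
        System.out.println(forFilename("song.mp3"));    // audio/mpeg
        System.out.println(forFilename("mystery.xyz")); // application/octet-stream
    }
}
```

The result is what you'd pass to the Content-type header. Guessing from the extension is only as good as the filename, so treat it as a convenience rather than a guarantee about the content.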
A Word on Encoding
When sending the file content, you’ll need to make sure that there’s no character encoding applied, because the data you’re dealing with isn’t a set of characters. Swapping stuff that just happens to look like a Windows-style newline for a UNIX-style newline is going to mess things up too. In the examples the file reading and writing is going on in binary mode, or using input/output streams instead of the usually more convenient character based stuff – to preserve the data from the file exactly as it was in the file.
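To see why this matters, here's a small Java sketch that pushes the same bytes through a character round trip and through a plain byte copy. The class name, method names and the sample bytes are all invented for the demonstration: the sample is "%PDF" followed by four bytes that aren't valid UTF-8, which is the sort of thing you find near the start of a real binary file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: compares character-based and byte-based handling.
class BinaryVersusText {
    // Pass the bytes through a String, the way character-based I/O would.
    // Byte sequences that are not valid UTF-8 get replaced while decoding.
    static byte[] viaCharacters(byte[] data) {
        String asText = new String(data, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    // Copy the bytes through streams without ever decoding them.
    static byte[] viaByteStreams(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4];
        int bytesRead;
        while ((bytesRead = in.read(buffer, 0, buffer.length)) != -1) {
            out.write(buffer, 0, bytesRead);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Made-up sample: "%PDF" plus four bytes that are not valid UTF-8.
        byte[] sample = {0x25, 0x50, 0x44, 0x46,
                (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3};

        System.out.println("character round trip intact: "
                + Arrays.equals(sample, viaCharacters(sample)));  // false
        System.out.println("byte stream copy intact: "
                + Arrays.equals(sample, viaByteStreams(sample))); // true
    }
}
```

The character round trip quietly swaps every byte it can't decode for a replacement character, so what comes out is no longer the file you read in. The byte copy never interprets the data, so nothing changes.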
When the dialog appears to run or save the file, the suggested filename is the name of the PHP script doing the work, not the name of the file. Another HTTP header that comes in useful here is Content-Disposition. Amongst other things, this header lets you specify a filename for the file to be saved under. I’ve not had any issues with it myself, but I did see a blog post on IE compatibility problems.
header("Content-Disposition: inline; filename=filename.pdf");
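A bare filename like the one above is fine until the name contains a space or a quote, at which point it's safer to wrap it in double quotes. Here's a rough Java sketch of building the header value that way. The `ContentDisposition` class and `headerValue` method are invented for this post, and it only deals with plain ASCII names; names with other characters need more care than this.

```java
// Hypothetical helper: builds a value such as  inline; filename="report.pdf"
class ContentDisposition {
    static String headerValue(String dispositionType, String filename) {
        // Inside a quoted string, a backslash or a double quote needs a
        // backslash in front of it. Backslashes are escaped first so the
        // ones added for quotes are not doubled up.
        String escaped = filename.replace("\\", "\\\\").replace("\"", "\\\"");
        return dispositionType + "; filename=\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        // "inline" asks the browser to display the file if it can
        System.out.println(headerValue("inline", "my report.pdf"));
        // "attachment" asks the browser to offer it as a download
        System.out.println(headerValue("attachment", "my report.pdf"));
    }
}
```

In a servlet, the returned string is what would go in as the second argument to `response.setHeader("Content-Disposition", ...)`.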
Well, hope that’s of some use to you. I’ll finish up by providing a couple of simple implementations in other languages. The examples should run in an appropriate web environment without any special support.
Perl Example
No helper method to set the header here, so we have to manually print the header in the right format, including a following blank line.
#!/path/to/perl
# set HTTP header to tell browser what type of data to expect
# in this case, a PDF file
print "Content-type: application/pdf\n\n";

# then open the pdf file
$pdfLocation = "my_pdf_file.pdf";
open(PDF, "<", $pdfLocation) or die "Cannot open $pdfLocation: $!";

# set both the file and the output to work in binary mode
binmode(PDF);
binmode(STDOUT);

# read 64k chunks of the source file and write
# them to the output stream
while (read(PDF, $pdfBuffer, 65536)) {
    print $pdfBuffer;
}
close(PDF);
Java Servlet Example
It’s long-winded and untidy to do this in one Servlet class but hey.
package com.crossedstreams.blog.post6;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.OutputStream;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

/**
 * Servlet implementation class PdfServlet
 * reads a PDF file on the local filesystem and pushes it out to the browser
 */
public class PdfServlet extends HttpServlet {

    /**
     * Sends a PDF file in response to HTTP GET request
     */
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {

        // set HTTP header to tell browser what type of data to expect
        // in this case, a PDF file
        response.setContentType("application/pdf");

        // and give the browser a filename to save as
        response.setHeader("Content-disposition", "inline; filename=filename.pdf");

        // then open the pdf file
        String pdfLocation = "/path/to/my_pdf_file.pdf";
        FileInputStream pdfFile = new FileInputStream(pdfLocation);

        // get the output stream back to the client
        OutputStream out = response.getOutputStream();
        byte[] pdfBytes = new byte[65536];

        try {
            // read chunks of up to 64k from the source file and write
            // only the bytes actually read to the output stream
            int bytesRead;
            while ((bytesRead = pdfFile.read(pdfBytes)) != -1) {
                out.write(pdfBytes, 0, bytesRead);
            }

            // all done, make sure the data has all been sent
            out.flush();
        } finally {
            // tidy up, even if there were errors
            try {
                pdfFile.close();
                out.close();
            } catch (IOException e) {
                // I have to catch this exception
                // ...not a lot I can do with it though
            }
        }
    }
}
while (pdfFile.read(pdfBytes) != -1) {
out.write(pdfBytes);
}
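Careful with a loop like that: read() returns the number of bytes it actually put in the buffer, and if that is less than 64k (as it nearly always is for the last chunk of a file) then writing the whole buffer sends leftover garbage down the stream. Keep the count that read() returns and write only that many bytes, with out.write(pdfBytes, 0, bytesRead).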
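The short-read problem is easy to reproduce away from a servlet container. The sketch below uses in-memory streams and a deliberately tiny buffer so the effect shows up with just ten bytes of input. The `ChunkedCopy` class and both method names are made up for illustration; `copyWholeBuffer` is the problematic shape and `copy` is the version that respects the count returned by `read()`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical demo class: compares two ways of copying a stream in chunks.
class ChunkedCopy {
    // Writes only the bytes that each read() actually delivered.
    static void copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        int bytesRead;
        while ((bytesRead = in.read(buffer)) != -1) {
            out.write(buffer, 0, bytesRead);
        }
        out.flush();
    }

    // The problematic shape: ignores the count and writes the whole buffer,
    // so a short final read drags stale bytes from the previous chunk along.
    static void copyWholeBuffer(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        while (in.read(buffer) != -1) {
            out.write(buffer);
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        byte[] source = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}; // 10 bytes, buffer of 4

        ByteArrayOutputStream good = new ByteArrayOutputStream();
        copy(new ByteArrayInputStream(source), good, 4);
        System.out.println("counted copy wrote " + good.size() + " bytes");     // 10

        ByteArrayOutputStream bad = new ByteArrayOutputStream();
        copyWholeBuffer(new ByteArrayInputStream(source), bad, 4);
        System.out.println("whole-buffer copy wrote " + bad.size() + " bytes"); // 12
    }
}
```

The two extra bytes in the second case are left over from the previous chunk. In a PDF that kind of tail can be enough to make a viewer complain that the file is damaged.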