HttpClient and FileUpload

August 2004

Adapted from:
Pro Jakarta Commons, by Harshad Oak
Publisher: Apress
ISBN: 1590592832

All communication over the Internet happens using a standard set of protocols, such as File Transfer Protocol (FTP), Simple Mail Transfer Protocol (SMTP), Post Office Protocol (POP), Hypertext Transfer Protocol (HTTP), and so on. HTTP is one of the most popular of these protocols and is integral to the World Wide Web. It is also a protocol that many of today’s applications have to be aware of and use wisely. If a Java application has to interact using HTTP, the Commons HttpClient component can make things a bit easier. Using this component, you do not have to worry about all the technicalities of the protocol but just concern yourself with the various classes and methods provided by the HttpClient component. In this article you will have a look at the capabilities of the HttpClient component and also some hands-on examples.

You will also have a quick look at the FileUpload component, which simplifies file-upload tasks on the server side. Finally, you will work through an example where you use HttpClient and FileUpload together.

NOTE For all server-based examples in this article, I have used Tomcat version 4.0.6; however, you should not have any problems if you use another server that supports servlets and JavaServer Pages (JSP) technology.

Table 9-1 shows the details for the components covered in this article.

Table 9-1. Component Details

Name          Version    Package
HttpClient    2.0-rc1    org.apache.commons.httpclient
FileUpload    1.0        org.apache.commons.fileupload


Introducing HttpClient

HttpClient is an attempt to provide a simple Application Programming Interface (API) that Java developers can use to create code that communicates over HTTP. If you are developing a Web browser or just an application that occasionally needs some data from the Web, HttpClient can help you develop the client code that will communicate over HTTP. As the name suggests, HttpClient is meant only for HTTP client code and cannot be used to, say, develop a server that processes HTTP requests.

I recommend you use HttpClient instead of using the java.net classes because HttpClient is easier to work with, it supports some HTTP features that java.net does not, and it has a vibrant community backing it. Visit http://www.nogoop.com/product_16.html#compare to compare HttpClient, java.net, and a couple of other APIs.

With a number of Commons components, the Javadocs are the only real documentation that exists. However, with HttpClient some good documentation exists beyond the Javadocs. A short tutorial at http://jakarta.apache.org/commons/httpclient/tutorial.html can get you started with HttpClient.

These are some of the important features of HttpClient:

You will now see the various elements of the HttpClient component and how they fall into place to get you talking over HTTP.

Using HttpClient

HttpClient uses the Commons Logging component, so the only dependency for HttpClient to work properly is that the commons-logging component Java Archive (JAR) file be present. Using HttpClient to handle most requirements is fairly simple. You just need to understand a few key classes and interfaces. The following sections present a simple example of sending a GET request and then explain the classes that play a role in how the example works.

Using the GET Method

The GET method is the most common method used to send an HTTP request. Every time you click a hyperlink, you send an HTTP request using the GET method. You will now see an example of sending a GET request using HttpClient-based Java code. The code in Listing 9-1 sends a GET request to the URL http://localhost:8080/validatorStrutsApp/userInfo.do with three request parameters: firstName, lastName, and email.

Listing 9-1. SubmitHttpForm.java
package com.commonsbook.chap9;
import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.commons.httpclient.*;
import org.apache.commons.httpclient.methods.GetMethod;

public class SubmitHttpForm {

    private static String url =
         "http://localhost:8080/validatorStrutsApp/userInfo.do";

    public static void main(String[] args) {

        //Instantiate an HttpClient
        HttpClient client = new HttpClient();

        //Instantiate a GET HTTP method
        HttpMethod method = new GetMethod(url);

        //Define name-value pairs to set into the QueryString
        NameValuePair nvp1= new NameValuePair("firstName","fname");
        NameValuePair nvp2= new NameValuePair("lastName","lname");
        NameValuePair nvp3= new NameValuePair("email","email@email.com");

        method.setQueryString(new NameValuePair[]{nvp1,nvp2, nvp3});

        try{
            int statusCode = client.executeMethod(method);

            System.out.println("QueryString>>> "+method.getQueryString());
            System.out.println("Status Text>>>"
                  +HttpStatus.getStatusText(statusCode));

            //Get data as a String
            System.out.println(method.getResponseBodyAsString());

            //OR as a byte array
            byte [] res  = method.getResponseBody();

            //write to file, closing the stream even if the write fails
            FileOutputStream fos= new FileOutputStream("donepage.html");
            try {
                fos.write(res);
            }
            finally {
                fos.close();
            }
        }
        catch(IOException e) {
            e.printStackTrace();
        }
        finally {
            //release connection whether or not the request succeeded
            method.releaseConnection();
        }
    }
}

The output on executing this piece of code will depend on the response you get to your GET request.

The following steps take place in the class SubmitHttpForm to invoke the URL specified, including passing the three parameters as part of the query string, displaying the response, and writing the response to a file:

  1. You first need to instantiate the HttpClient, and because you have specified no parameters to the constructor, by default the org.apache.commons.httpclient.SimpleHttpConnectionManager class is used to create a new HttpClient. To use a different ConnectionManager, you can specify any class implementing the interface org.apache.commons.httpclient.HttpConnectionManager as a parameter to the constructor. You can use the MultiThreadedHttpConnectionManager connection manager if more than one thread is likely to use the HttpClient. The code would then be new HttpClient(new MultiThreadedHttpConnectionManager()).
  2. Next you create an instance of HttpMethod. Because HttpClient provides implementations for all HTTP methods, you could very well have chosen an instance of PostMethod instead of GetMethod. Because you hold the instance through an HttpMethod reference rather than a reference of an implementation class such as GetMethod or PostMethod, you can call only the methods declared by the HttpMethod interface, which is all this example needs.
  3. You define name/value pairs and then set an array of those name/value pairs into the query string.
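  4. Once the groundwork is complete, you execute the method using the HttpClient instance you created in step 1. The value returned is the HTTP status code of the response, such as 200 for success or 404 for a missing resource. If the request cannot be completed at all, executeMethod throws an exception instead.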
  5. You get the response body both as a string and as a byte array. The response is printed to the console and also written to a file named donepage.html.

    NOTE The class org.apache.commons.httpclient.HttpStatus defines static int variables that map to HTTP status codes.
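To make step 3 more concrete, the following is a minimal sketch, using only JDK classes, of how name/value pairs can be turned into a query string. It is not HttpClient's own implementation; the class QueryStringSketch and its build method are hypothetical names, and the use of UTF-8 for encoding is an assumption.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Illustrative helper only; it is not part of HttpClient.
final class QueryStringSketch {

    // Joins name/value pairs as name=value&name=value, URL-encoding each part.
    static String build(String[][] pairs) {
        StringBuilder query = new StringBuilder();
        try {
            for (int i = 0; i < pairs.length; i++) {
                if (i > 0) {
                    query.append('&');
                }
                query.append(URLEncoder.encode(pairs[i][0], "UTF-8"));
                query.append('=');
                query.append(URLEncoder.encode(pairs[i][1], "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            // Every Java runtime is required to support UTF-8.
            throw new IllegalStateException(e);
        }
        return query.toString();
    }

    public static void main(String[] args) {
        String[][] pairs = {
            {"firstName", "fname"},
            {"lastName", "lname"},
            {"email", "email@email.com"}
        };
        // Characters such as '@' are percent-encoded in the result.
        System.out.println(build(pairs));
    }
}
```

The point of the sketch is that reserved characters in names and values must be encoded before they go into the URL, which is one of the details setQueryString takes off your hands.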

In this example, you can see how easily you can fire a request and get a response over HTTP using the HttpClient component. You might have noted that code like this makes it possible to test Web applications quickly and even to load test them. This has led to HttpClient being used in popular testing frameworks such as Jakarta Cactus and HtmlUnit. You can find in the documentation a list of popular applications that use HttpClient.

You used the GET method to send name/value pairs as part of a request. However, the GET method cannot always serve your purpose, and in some cases using the POST method is a better option.

Using the POST Method

Listing 9-2 shows an example where you enclose an Extensible Markup Language (XML) file within a request and send it using the POST method to a JSP named GetRequest.jsp. The JSP will just print the request headers it receives. These headers will show if the request got across properly.

Listing 9-2. Sending an XML File Using the POST Method
package com.commonsbook.chap9;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.PostMethod;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class PostAFile {
    private static String url =
         "http://localhost:8080/HttpServerSideApp/GetRequest.jsp";

    public static void main(String[] args) throws IOException {
        HttpClient client = new HttpClient();
        PostMethod postMethod = new PostMethod(url);

        client.setConnectionTimeout(8000);

        // Send any XML file as the body of the POST request
        File f = new File("students.xml");
        System.out.println("File Length = " + f.length());

        postMethod.setRequestBody(new FileInputStream(f));
        postMethod.setRequestHeader("Content-type",
            "text/xml; charset=ISO-8859-1");

        int statusCode1 = client.executeMethod(postMethod);

        System.out.println("statusLine>>>" + postMethod.getStatusLine());
        postMethod.releaseConnection();
    }
}

In Listing 9-2, I have stated the URL for GetRequest.jsp using a server I am running locally on port 8080. This URL will vary based on the server where the JSP is being maintained. In this example, you create an instance of the classes HttpClient and PostMethod. You set the connection timeout for the HTTP connection to 8,000 milliseconds and then set an XML file into the request body. I am using a file named students.xml; however, the contents of the file are not relevant to the example, and you could very well use any other file. Because you are sending an XML file, you also set the Content-Type header to state the format and the character set. GetRequest.jsp contains only a scriptlet that prints the request headers. The contents of the JSP are as follows:

<%
        java.util.Enumeration e= request.getHeaderNames();
        while (e.hasMoreElements()) {
          String headerName=(String)e.nextElement();
          System.out.println(headerName +" = "+request.getHeader(headerName));
        }
%>

Upon executing the class PostAFile, the JSP gets invoked, and the output displayed on the server console is as follows:

content-type = text/xml; charset=ISO-8859-1
user-agent = Jakarta Commons-HttpClient/2.0rc1
host = localhost:8080
content-length = 279

The output shown on the console where the PostAFile class was executed is as follows:

File Length = 279
statusLine>>>HTTP/1.1 200 OK

Note that the output on the server shows the content length as 279 (bytes), the same as the length of the file students.xml that is shown on the application console. Because you are not invoking the JSP using any browser, the User-Agent header that normally states the browser specifics shows the HttpClient version being used instead.
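The Content-Length header counts bytes, not characters. As a small sketch of why the character set matters, the following JDK-only code encodes the same text in two character sets and compares the byte counts. The class name ContentLengthSketch and the sample XML are made up for illustration and have nothing to do with the actual contents of students.xml.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only; shows how many bytes a piece of text occupies on the wire.
final class ContentLengthSketch {

    // Number of bytes the text occupies once encoded with the given character set.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // \u00e9 is an accented e: one byte in ISO-8859-1, two bytes in UTF-8.
        String xml = "<student name=\"Ren\u00e9\"/>";
        System.out.println("characters = " + xml.length());
        System.out.println("ISO-8859-1 = "
            + encodedLength(xml, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8      = "
            + encodedLength(xml, StandardCharsets.UTF_8));
    }
}
```

In Listing 9-2 the file is handed over as a stream of bytes, so the length reported by the server matches the length of the file on disk. The charset you put in the Content-Type header should therefore match the encoding the file was actually saved in.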

NOTE In this example, you sent a single file over HTTP. To upload multiple files, the MultipartPostMethod class is a better alternative. You will look at it later in the “Introducing FileUpload” section.

Managing Cookies

HttpClient provides cookie management features that can be particularly useful to test the way an application handles cookies. Listing 9-3 shows an example where you use HttpClient to add a cookie to a request and also to list details of cookies set by the JSP you invoke using the HttpClient code.

The HttpState class plays an important role while working with cookies. The HttpState class works as a container for HTTP attributes such as cookies that can persist from one request to another. When you normally surf the Web, the Web browser is what stores the HTTP attributes.
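Before looking at the listing, it may help to see the idea of a state container in isolation. The following is a deliberately simplified sketch in plain Java: a hypothetical CookieJarSketch class that remembers cookies between requests and produces the Cookie header for the next one. It is not the HttpState API, and it ignores domains, paths, and expiry, all of which a real client has to check.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a client-side state container; not the HttpState API.
final class CookieJarSketch {

    // Insertion-ordered so the header lists cookies in the order they were stored.
    private final Map<String, String> cookies = new LinkedHashMap<String, String>();

    // Called when the client adds a cookie itself or a response sets one.
    void store(String name, String value) {
        cookies.put(name, value);
    }

    // Value of the Cookie request header for the next request, or "" if the jar is empty.
    String cookieHeader() {
        StringBuilder header = new StringBuilder();
        for (Map.Entry<String, String> entry : cookies.entrySet()) {
            if (header.length() > 0) {
                header.append("; ");
            }
            header.append(entry.getKey()).append('=').append(entry.getValue());
        }
        return header.toString();
    }

    public static void main(String[] args) {
        CookieJarSketch jar = new CookieJarSketch();
        jar.store("ABCD", "00000");   // added by the client before the first request
        System.out.println("request 1 sends: " + jar.cookieHeader());
        jar.store("XYZ", "12345");    // set by the server in its response
        System.out.println("request 2 sends: " + jar.cookieHeader());
    }
}
```

The container outlives any single request, which is why a cookie set by one response is available to the requests that follow it.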

Listing 9-3. CookiesTrial.java
package com.commonsbook.chap9;
import org.apache.commons.httpclient.Cookie;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpState;
import org.apache.commons.httpclient.cookie.CookiePolicy;
import org.apache.commons.httpclient.methods.GetMethod;

public class CookiesTrial {

    private static String url =
         "http://127.0.0.1:8080/HttpServerSideApp/CookieMgt.jsp";

    public static void main(String[] args) throws Exception {

        //A new cookie for the domain 127.0.0.1
        //Cookie Name= ABCD   Value=00000   Path=/  MaxAge=-1   Secure=False
        Cookie mycookie = new Cookie("127.0.0.1", "ABCD", "00000", "/", -1, false);

        //Create a new HttpState container
        HttpState initialState = new HttpState();
        initialState.addCookie(mycookie);

        //Set to COMPATIBILITY for it to work in as many cases as possible
        initialState.setCookiePolicy(CookiePolicy.COMPATIBILITY);
        //create new client
        HttpClient httpclient = new HttpClient();
        //set the HttpState for the client
        httpclient.setState(initialState);

        GetMethod getMethod = new GetMethod(url);
        //Execute a GET method
        int result = httpclient.executeMethod(getMethod);

        System.out.println("statusLine>>>"+getMethod.getStatusLine());

        //Get cookies stored in the HttpState for this instance of HttpClient
        Cookie[] cookies = httpclient.getState().getCookies();

        for (int i = 0; i < cookies.length; i++) {
            System.out.println("\nCookieName="+cookies[i].getName());
            System.out.println("Value="+cookies[i].getValue());
            System.out.println("Domain="+cookies[i].getDomain());
        }

        getMethod.releaseConnection();
    }
}

In Listing 9-3, you use the HttpState instance to store a new cookie and then associate this instance with the HttpClient instance. You then invoke CookieMgt.jsp. This JSP is meant to print the cookies it finds in the request and then add a cookie of its own. The JSP code is as follows:

<%
        Cookie[] cookies= request.getCookies();

        //getCookies returns null when the request carries no cookies
        if (cookies != null) {
          for (int i = 0; i < cookies.length; i++) {
            System.out.println(cookies[i].getName() +" = "+cookies[i].getValue());
          }
        }

        //Add a new cookie
        response.addCookie(new Cookie("XYZ","12345"));
%>

CAUTION HttpClient code uses the class org.apache.commons.httpclient.Cookie, and JSP and servlet code uses the class javax.servlet.http.Cookie.

The output on the application console upon executing the CookiesTrial class and invoking CookieMgt.jsp is as follows:

statusLine>>>HTTP/1.1 200 OK

CookieName=ABCD
Value=00000
Domain=127.0.0.1

CookieName=XYZ
Value=12345
Domain=127.0.0.1

CookieName=JSESSIONID
Value=C46581331881A84483F0004390F94508
Domain=127.0.0.1

In this output, note that although the cookie named ABCD has been created from CookiesTrial, the other cookie named XYZ is the one inserted by the JSP code. The cookie named JSESSIONID is meant for session tracking and gets created upon invoking the JSP. The output as displayed on the console of the server when the JSP is executed is as follows:

ABCD = 00000

This shows that when CookieMgt.jsp receives the request from the CookiesTrial class, the cookie named ABCD was the only cookie that existed. The sidebar “HTTPS and Proxy Servers” shows how you should handle requests over HTTPS and configure your client to go through a proxy.

HTTPS and Proxy Servers

Using HttpClient to try out URLs that involve HTTPS is the same as with ordinary URLs. Just state https://… as your URL, and it should work fine. You only need to have Java Secure Socket Extension (JSSE) running properly on your machine. JSSE ships as a part of Java Software Development Kit (JSDK) 1.4 and higher and does not require any separate download and installation.
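If you are unsure whether JSSE is usable in your runtime, a quick check along the following lines can help. This is only a sketch using the standard javax.net.ssl API; the class name JsseCheckSketch is made up, and a successful result does not guarantee that a particular server certificate will be trusted.

```java
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.SSLContext;

// Illustrative check only; it does not open any network connection.
final class JsseCheckSketch {

    // Returns true when the runtime can supply a TLS context, that is, JSSE is usable.
    static boolean tlsAvailable() {
        try {
            SSLContext.getInstance("TLS");
            return true;
        } catch (NoSuchAlgorithmException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("TLS available: " + tlsAvailable());
    }
}
```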

If you have to go through a proxy server, introduce the following piece of code. Replace PROXYHOST with the host name and replace 9999 with the port number for your proxy server:

HttpClient client = new HttpClient();
HostConfiguration hConf = client.getHostConfiguration();
hConf.setProxy("PROXYHOST", 9999);

If you also need to specify a username and password for the proxy, you can do this using the setProxyCredentials method of the class HttpState. This method takes a Credentials object as a parameter. Credentials is a marker interface that has no methods and has a single implementation, UsernamePasswordCredentials. You can use this class to create a Credentials object that holds the username and password required for Basic authentication.
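To see what Basic authentication amounts to on the wire, the following JDK-only sketch builds the header value from a username and password. The class BasicCredentialsSketch is a made-up name, "user" and "pass" are placeholder values, and the choice of ISO-8859-1 for the bytes is an assumption. HttpClient builds this header for you once the credentials are set, so you would not normally write this code yourself.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative only; HttpClient generates this header itself from the Credentials object.
final class BasicCredentialsSketch {

    // Basic authentication sends "username:password", Base64-encoded, after the word Basic.
    static String headerValue(String user, String password) {
        String pair = user + ":" + password;
        byte[] bytes = pair.getBytes(StandardCharsets.ISO_8859_1);
        return "Basic " + Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) {
        // A proxy expects the value in the Proxy-Authorization request header.
        System.out.println("Proxy-Authorization: " + headerValue("user", "pass"));
    }
}
```

Base64 is an encoding, not encryption, so anyone who can read the request can recover the password. That is worth keeping in mind when the connection to the proxy is not itself protected.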

You will now see the HttpClient component’s capability to use MultipartPostMethod to upload multiple files. You will look at this in tandem with the Commons FileUpload component. This Commons component is specifically meant to handle the server-side tasks associated with file uploads.

Introducing FileUpload

The FileUpload component has the capability of simplifying the handling of files uploaded to a server. Note that the FileUpload component is meant for use on the server side; in other words, it handles where the files are being uploaded to—not the client side where the files are uploaded from. Uploading files from an HTML form is pretty simple; however, handling these files when they get to the server is not that simple. If you want to apply any rules and store these files based on those rules, things get more difficult.

The FileUpload component remedies this situation, and in very few lines of code you can easily manage the files uploaded and store them in appropriate locations. You will now see an example where you upload some files first using a standard HTML form and then using HttpClient code.

Using HTML File Upload

The commonly used methodology to upload files is to have an HTML form where you define the files you want to upload. A common example of this HTML interface is the Web page you encounter when you want to attach files to an email while using any of the popular Web mail services.

In this example, you will create a simple HTML page where you provide for three files to be uploaded. Listing 9-4 shows the HTML for this page. Note that the enctype attribute for the form has the value multipart/form-data, and the input tag used is of type file. Based on the value of the action attribute, on form submission, the data is sent to ProcessFileUpload.jsp.

Listing 9-4. UploadFiles.html
<HTML>
  <HEAD>
    <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1252"/>
    <TITLE>File Upload Page</TITLE>
  </HEAD>
  <BODY>Upload Files
    <FORM name="filesForm" action="ProcessFileUpload.jsp"
    method="post" enctype="multipart/form-data">
        File 1:<input type="file" name="file1"/><br/>
        File 2:<input type="file" name="file2"/><br/>
        File 3:<input type="file" name="file3"/><br/>
        <input type="submit" name="Submit" value="Upload Files"/>
    </FORM>
  </BODY>
</HTML>
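For orientation, the following sketch shows roughly what a multipart/form-data request body looks like when a form like this one is submitted. It builds the text by hand using only the JDK. The class MultipartBodySketch, the boundary string, and the sample content are all made up for illustration; real browsers generate their own boundaries and may add further headers to each part.

```java
// Illustrative only; browsers and HttpClient produce this format for you.
final class MultipartBodySketch {

    private static final String CRLF = "\r\n";

    // A file part carries a filename and a content type in its part headers.
    static String filePart(String boundary, String field, String fileName, String content) {
        return "--" + boundary + CRLF
            + "Content-Disposition: form-data; name=\"" + field
            + "\"; filename=\"" + fileName + "\"" + CRLF
            + "Content-Type: text/xml" + CRLF
            + CRLF
            + content + CRLF;
    }

    // An ordinary form field, such as the submit button, has a name but no filename.
    static String fieldPart(String boundary, String field, String value) {
        return "--" + boundary + CRLF
            + "Content-Disposition: form-data; name=\"" + field + "\"" + CRLF
            + CRLF
            + value + CRLF;
    }

    // The body ends with the boundary followed by two extra hyphens.
    static String closing(String boundary) {
        return "--" + boundary + "--" + CRLF;
    }

    public static void main(String[] args) {
        String boundary = "sketch-boundary"; // hypothetical value
        String body = filePart(boundary, "file1", "students.xml", "<students/>")
            + fieldPart(boundary, "Submit", "Upload Files")
            + closing(boundary);
        System.out.print(body);
    }
}
```

Splitting a body like this back into its parts, and telling file parts from ordinary fields, is the work the server-side code has to do for every upload.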

You can use a servlet to handle the file upload. I have used JSP to minimize the code you need to write. The task that the JSP has to accomplish is to pick up the files that are sent as part of the request and store these files on the server. In the JSP, instead of displaying the result of the upload in the Web browser, I have chosen to print messages on the server console so that you can use this same JSP when it is not invoked through an HTML form but by using HttpClient-based code.

Listing 9-5 shows the JSP code. Note the code that checks whether the item is a form field. This check is required because the Submit button contents are also sent as part of the request, and you want to distinguish between this data and the files that are part of the request. You have set the maximum file size to 1,000,000 bytes using the setSizeMax method.

Listing 9-5. ProcessFileUpload.jsp
<%@ page contentType="text/html;charset=windows-1252"%>
<%@ page import="org.apache.commons.fileupload.DiskFileUpload"%>
<%@ page import="org.apache.commons.fileupload.FileItem"%>
<%@ page import="java.util.List"%>
<%@ page import="java.util.Iterator"%>
<%@ page import="java.io.File"%>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<title>Process File Upload</title>
</head>
<%
        System.out.println("Content Type ="+request.getContentType());

        DiskFileUpload fu = new DiskFileUpload();
        // If file size exceeds, a FileUploadException will be thrown
        fu.setSizeMax(1000000);

        List fileItems = fu.parseRequest(request);
        Iterator itr = fileItems.iterator();

        while(itr.hasNext()) {
          FileItem fi = (FileItem)itr.next();

          //Check if not form field so as to only handle the file inputs
          //else condition handles the submit button input
          if(!fi.isFormField()) {
            System.out.println("\nNAME: "+fi.getName());
            System.out.println("SIZE: "+fi.getSize());
            //System.out.println(fi.getOutputStream().toString());
            File fNew= new File(application.getRealPath("/"), fi.getName());

            System.out.println(fNew.getAbsolutePath());
            fi.write(fNew);
          }
          else {
            System.out.println("Field ="+fi.getFieldName());
          }
        }
%>
<body>
Upload Successful!!
</body>
</html>

CAUTION With FileUpload 1.0, I found that when the form was submitted using Opera version 7.11, the getName method of the class FileItem returned just the name of the file. However, when the form was submitted using Internet Explorer 5.5, the same method returned the filename along with its entire path. This difference breaks code, such as Listing 9-5, that uses the returned name directly as a filename on the server.

To run this example, you can use any three files, as the contents of the files are not important. Upon submitting the form using Opera and uploading three random XML files, the output I got on the Tomcat server console was as follows:

Content Type =multipart/form-data; boundary=----------rz7ZNYDVpN1To8L73sZ6OE

NAME: academy.xml
SIZE: 951
D:\javaGizmos\jakarta-tomcat-4.0.1\webapps\HttpServerSideApp\academy.xml

NAME: academyRules.xml
SIZE: 1211
D:\javaGizmos\jakarta-tomcat-4.0.1\webapps\HttpServerSideApp\academyRules.xml

NAME: students.xml
SIZE: 279
D:\javaGizmos\jakarta-tomcat-4.0.1\webapps\HttpServerSideApp\students.xml
Field =Submit

However, when submitting this same form using Internet Explorer 5.5, the output on the server console was as follows:

Content Type =multipart/form-data; boundary=---------------------------7d3bb1de02e4

NAME: D:\temp\academy.xml
SIZE: 951
D:\javaGizmos\jakarta-tomcat-4.0.1\webapps\HttpServerSideApp\D:\temp\academy.xml

The browser displayed the following message: “The requested resource (D:\javaGizmos\jakarta-tomcat-4.0.1\webapps\HttpServerSideApp\D:\temp\academy.xml (The filename, directory name, or volume label syntax is incorrect)) is not available.”

This contrasting behavior on different browsers can cause problems. One workaround that I found in an article at http://www.onjava.com/pub/a/onjava/2003/06/25/commons.html is to first create a file reference with whatever is supplied by the getName method and then create a new file reference using the name returned by the earlier file reference. Therefore, you can insert the following code to have your code work with both browsers (I wonder who the guilty party is; blaming Microsoft is always the easy way out):

File tempFileRef  = new File(fi.getName());
File fNew = new File(application.getRealPath("/"),tempFileRef.getName());
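One caveat: java.io.File splits a path only on the separator of the platform the server runs on, so the two lines above strip a Windows-style path only when the server itself runs on Windows. If your server might run elsewhere, a string-based approach such as the following sketch avoids that dependency. The class UploadNameSketch and its baseName method are hypothetical helpers, not part of FileUpload.

```java
// Illustrative helper only; not part of the FileUpload component.
final class UploadNameSketch {

    // Returns the text after the last '/' or '\', whichever occurs later in the name.
    static String baseName(String submittedName) {
        int lastSeparator = Math.max(submittedName.lastIndexOf('/'),
                                     submittedName.lastIndexOf('\\'));
        return submittedName.substring(lastSeparator + 1);
    }

    public static void main(String[] args) {
        System.out.println(baseName("D:\\temp\\academy.xml")); // full path, as sent by Internet Explorer 5.5
        System.out.println(baseName("academy.xml"));           // bare name, as sent by Opera 7.11
        System.out.println(baseName("/tmp/academy.xml"));      // Unix-style path
    }
}
```

You would then pass the result to the File constructor in place of tempFileRef.getName().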

In this section, you uploaded files using a standard HTML form mechanism. However, often a need arises to be able to upload files from within your Java code, without any browser or form coming into the picture. In the next section, you will look at HttpClient-based file upload.

Using HttpClient-Based FileUpload

Earlier in the article you saw some of the capabilities of the HttpClient component. One capability I did not cover was its ability to send multipart requests. In this section, you will use this capability to upload a few files to the same JSP that you used for uploads using HTML.

The class org.apache.commons.httpclient.methods.MultipartPostMethod provides the multipart method capability to send multipart-encoded forms, and the package org.apache.commons.httpclient.methods.multipart has the support classes required. Sending a multipart form using HttpClient is quite simple. In the code in Listing 9-6, you send three files to ProcessFileUpload.jsp.

Listing 9-6. HttpMultiPartFileUpload.java
package com.commonsbook.chap9;
import java.io.File;
import java.io.IOException;

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.MultipartPostMethod;

public class HttpMultiPartFileUpload {
    private static String url =
      "http://localhost:8080/HttpServerSideApp/ProcessFileUpload.jsp";

    public static void main(String[] args) throws IOException {
        HttpClient client = new HttpClient();
        MultipartPostMethod mPost = new MultipartPostMethod(url);
        client.setConnectionTimeout(8000);

        // Send any three files as parts of the multipart POST request
        File f1 = new File("students.xml");
        File f2 = new File("academy.xml");
        File f3 = new File("academyRules.xml");

        System.out.println("File1 Length = " + f1.length());
        System.out.println("File2 Length = " + f2.length());
        System.out.println("File3 Length = " + f3.length());

        mPost.addParameter(f1.getName(), f1);
        mPost.addParameter(f2.getName(), f2);
        mPost.addParameter(f3.getName(), f3);

        int statusCode1 = client.executeMethod(mPost);

        System.out.println("statusLine>>>" + mPost.getStatusLine());
        mPost.releaseConnection();
    }
}

In this code, you just add the files as parameters and execute the method. The ProcessFileUpload.jsp file gets invoked, and the output is as follows:

Content Type =multipart/form-data; boundary=----------------314159265358979323846

NAME: students.xml
SIZE: 279
D:\javaGizmos\jakarta-tomcat-4.0.1\webapps\HttpServerSideApp\students.xml

NAME: academy.xml
SIZE: 951
D:\javaGizmos\jakarta-tomcat-4.0.1\webapps\HttpServerSideApp\academy.xml

NAME: academyRules.xml
SIZE: 1211
D:\javaGizmos\jakarta-tomcat-4.0.1\webapps\HttpServerSideApp\academyRules.xml

Thus, file uploads on the server side become quite a simple task if you are using the Commons FileUpload component.

Summary

In this article, you saw the HttpClient and FileUpload components. Although HttpClient can be useful in any kind of application that uses HTTP for communication, the FileUpload component has a much more specific scope. One important plus for HttpClient is the existence of a decent user guide and tutorial. The FileUpload component can be just what you are looking for if you need a way to manage files uploaded through your application.
