XML Validation with the Java API

Java and XML are both widespread and heavily used. This post sheds some light on the possibilities for validating XML files with the Java API. In all code listings, exception handling as well as imports are omitted. The classes used are from the javax.xml, javax.xml.parsers and javax.xml.validation packages. Moreover, this post focuses on simple validation code snippets.

First, we can check if the XML file is well-formed. This can be done by parsing the XML file into a DOM document.
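Here is a minimal, self-contained sketch of this check, with imports and exception handling included. The class and method names are hypothetical, and the XML is passed as a String for brevity. For a file, you could call builder.parse(new File(...)) instead. Any SAXException thrown during parsing means that the document is not well-formed.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class WellFormednessCheck {

    /** Returns true if the given XML text can be parsed into a DOM document. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing "[Fatal Error]" to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // any well-formedness violation ends up here
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```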

Next, it is possible to validate the XML file against an XML schema. Before doing so, however, we validate the XSD file itself to ensure that it is a valid schema.
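A sketch of the XSD check, assuming that compiling the schema is sufficient: SchemaFactory.newSchema throws a SAXException when the XSD is malformed or violates the XML Schema rules, for example when it references an undefined type. The names SchemaCheck and isValidSchema are hypothetical.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class SchemaCheck {

    /** Returns true if the XSD text compiles into a Schema, i.e. it is a valid XML Schema. */
    static boolean isValidSchema(String xsd) {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try {
            // without an ErrorHandler, schema errors are thrown as SAXParseException
            factory.newSchema(new StreamSource(new StringReader(xsd)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }
}
```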

With the valid XSD file we can validate the XML file against this schema.
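A sketch of the validation step, assuming that the schema and the document are available as Strings (StreamSource also accepts a File). The custom ErrorHandler collects all validation errors instead of stopping at the first one. XmlValidation and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class XmlValidation {

    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("invalid schema", e);
        }
    }

    /** Validates the XML text and returns all reported problems; an empty list means valid. */
    static List<String> validate(Schema schema, String xml) {
        final List<String> problems = new ArrayList<>();
        Validator validator = schema.newValidator();
        validator.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { /* warnings do not invalidate */ }
            @Override public void error(SAXParseException e) { problems.add(e.getMessage()); }
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException e) {
            // fatal errors, e.g. a document that is not well-formed, abort validation
            problems.add(e.getMessage());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return problems;
    }
}
```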

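Quite a few XML files use multiple XML schemas, for example when they combine elements from several namespaces. Validating such a file against a single schema fails, because the validator finds no declarations for the elements defined in the other schemas. Therefore, we need to create a Schema that consists of multiple XSD files.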
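A sketch of a combined schema using SchemaFactory.newSchema(Source[]). The two schemas and the namespaces urn:example:a and urn:example:b are made up for illustration: the root element of schema A allows one element from another namespace, and strict wildcard processing requires a declaration for it, which only schema B provides.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class MultiSchemaValidation {

    // hypothetical schema A: root accepts any element from another namespace, strictly validated
    static final String SCHEMA_A =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' targetNamespace='urn:example:a'>"
      + "<xs:element name='root'><xs:complexType><xs:sequence>"
      + "<xs:any namespace='##other' processContents='strict'/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // hypothetical schema B: declares the element used inside A's root
    static final String SCHEMA_B =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' targetNamespace='urn:example:b'>"
      + "<xs:element name='item' type='xs:int'/></xs:schema>";

    static final String DOCUMENT =
        "<a:root xmlns:a='urn:example:a' xmlns:b='urn:example:b'><b:item>5</b:item></a:root>";

    /** Compiles all given XSD texts into one Schema object. */
    static Schema compile(String... xsds) {
        Source[] sources = new Source[xsds.length];
        for (int i = 0; i < xsds.length; i++) {
            sources[i] = new StreamSource(new StringReader(xsds[i]));
        }
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI).newSchema(sources);
        } catch (SAXException e) {
            throw new IllegalArgumentException("invalid schema", e);
        }
    }

    /** Returns true if the XML text is valid against the schema. */
    static boolean isValid(Schema schema, String xml) {
        try {
            // without an ErrorHandler, the validator throws on the first error
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(compile(SCHEMA_A), DOCUMENT));           // false: no declaration for b:item
        System.out.println(isValid(compile(SCHEMA_A, SCHEMA_B), DOCUMENT)); // true
    }
}
```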

When the XSD files are shipped within the jar, they cannot be read as regular files; instead, they have to be referenced as class-path resources.
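A sketch of loading the schemas from the class path, assuming the XSD files are stored under a hypothetical logic/ folder inside the jar. Passing the resource URL as the system ID (rather than an InputStream) lets the parser resolve relative xs:include or xs:import locations next to the schema. Note that ClassLoader.getResource expects names without a leading slash. The class ClasspathSchemas is hypothetical.

```java
import java.net.URL;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class ClasspathSchemas {

    /**
     * Compiles one Schema from XSD files found on the class path, e.g. inside the jar.
     * Resource names are relative to the class-path root, without a leading slash.
     */
    static Schema load(ClassLoader loader, String... resourceNames) throws SAXException {
        Source[] sources = new Source[resourceNames.length];
        for (int i = 0; i < resourceNames.length; i++) {
            URL url = loader.getResource(resourceNames[i]);
            if (url == null) {
                throw new IllegalArgumentException("XSD not found on class path: " + resourceNames[i]);
            }
            // the URL (a file: URL, or a jar: URL inside a jar) becomes the system ID of the source
            sources[i] = new StreamSource(url.toExternalForm());
        }
        return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI).newSchema(sources);
    }
}
```

A call could look like ClasspathSchemas.load(SomeClass.class.getClassLoader(), "logic/a.xsd", "logic/b.xsd"), where both file names are placeholders.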

Converting VSD Drawings to PNGs – VBScript

This VBScript converts each drawing (page) of a specific Microsoft Visio file to a PNG file named after the page and saves it to a target directory.

Option Explicit

' constants required for opening files in Visio
Const visOpenRO = 2
Const visOpenMinimized = 16
Const visOpenHidden = 64
Const visOpenMacrosDisabled = 128
Const visOpenNoWorkspace = 256

' constants required for setting ExportSize in Visio
Const visRasterFitToCustomSize = 3
Const visRasterPixel = 0


Sub export(filePath, exportDirectory, widthInPixels, heightInPixels)

    ' start Visio
    Dim visioApplication : Set visioApplication = CreateObject("Visio.Application")

    ' set export size
    visioApplication.Settings.SetRasterExportSize visRasterFitToCustomSize, widthInPixels, heightInPixels, visRasterPixel

    ' open document in Visio without showing it to the user and keep a reference to it
    Dim visioDocument : Set visioDocument = visioApplication.Documents.OpenEx(filePath, visOpenRO + visOpenMinimized + visOpenHidden + visOpenMacrosDisabled + visOpenNoWorkspace)

    ' iterate over all pages and export each one
    Dim currentItemIndex
    For currentItemIndex = 1 To visioDocument.Pages.Count

        Dim currentItem : Set currentItem = visioDocument.Pages.Item(currentItemIndex)

        ' use the lowercase name for the file
        Dim exportPath : exportPath = exportDirectory & "\" & LCase(currentItem.Name) & ".png"

        ' export happens here!
        currentItem.Export exportPath
    Next

    ' Quit Visio
    visioApplication.Quit
End Sub

' current directory
Dim currentDirectory : currentDirectory = CreateObject("Scripting.FileSystemObject").GetAbsolutePathName(".")

' file to open
Dim filePath : filePath = currentDirectory & "\AI - stundenplan.vsd"

' set export directory
Dim exportDirectory : exportDirectory = currentDirectory

export filePath, exportDirectory, 3557, 4114

Some notes on VBScript in general that help to understand what is going on in this script:

  • The colon (:) is the statement separator. It can be used to declare and assign a variable in one line.
  • Use Dim NAME : NAME = VALUE for variables that do not reference objects.
  • Use Dim NAME : Set NAME = OBJECT for variables that reference objects; the Set keyword is mandatory when assigning an object.
  • Line comments start with a single quote (').
  • Do not put parentheses around the arguments when calling a Sub (procedure) as a statement, as in export filePath, exportDirectory, 3557, 4114. Parentheses are required when you use the return value of a Function or call a Sub with the Call keyword.
  • Stating Option Explicit as the first statement requires each variable to be declared before it can be used.
  • To determine which parameters to set, you can use the macro recorder in Visio. The Record Macro button is not directly visible in Visio 2010; refer to this guide on how to make it visible.
  • The Object Browser in the Visual Basic editor of Visio 2010 is very helpful for finding the correct methods and properties.
  • Use the ampersand (&) to concatenate strings

The Java Initializers

In Java, classes and instances are the core concepts. Both a class and each of its instances have variables and methods. Variables and methods belonging to the class have to be marked static; without static, they belong to the instance. In the following, I focus only on the variables.

The following example shows the declaration for class and instance variables:

public class Variables {
  static String classVariable; // initialized with default value null

  String instanceVariable; // initialized with default value null
}

Class Variables

Class variables are initialized in the order in which they appear in the source file. I can directly assign a value to a class variable (direct initialization) or call a static method that initializes the variable through its return value. The latter, however, can lead to problems, as shown in the next listing:

public class Variables {
  static String directlyInitialized = "class variable";
  static String directlyInitializedWithMethod = init();
  static String anotherDirectlyInitialized = "test";

  private static String init() {
    return anotherDirectlyInitialized;
  }

  public static void main(String[] args) {
    System.out.println(directlyInitializedWithMethod); // prints null
  }
}

This prints null because the variable anotherDirectlyInitialized is initialized after directlyInitializedWithMethod: when init() runs, anotherDirectlyInitialized still holds its default value null. The compiler does not detect this (it only rejects forward references by simple name, not through a method call), so it is the responsibility of the programmer to avoid such situations. We could solve this issue by reordering the statements; however, this is an area where mistakes are easily made.

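There is another alternative, namely class initializers (static initializer blocks). They run in textual order together with the static variable initializers, so a block placed after the variable declarations sees all of their assigned values.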

public class Variables {
  static String directlyInitialized = "class variable";
  static String directlyInitializedWithMethod;
  static String anotherDirectlyInitialized = "test";

  static {
    directlyInitializedWithMethod = anotherDirectlyInitialized;
  }

  public static void main(String[] args) {
    System.out.println(directlyInitializedWithMethod); // prints test
  }
}

This static block can contain arbitrarily complex setup logic. It is executed exactly once, when the class is initialized, which happens before any static method of the class is called, any of its static variables (other than compile-time constants) is accessed, or any instance is created. Good use cases are the computation of constants or the preinitialization of other instances that require more than a simple constructor or method call.
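As an example of computing a constant, the following sketch builds an unmodifiable lookup map once, in a static block. The class CountryCodes is hypothetical; its entries mirror the HashMap example further below.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class CountryCodes {
    // blank final constant, assigned exactly once in the static block
    static final Map<String, String> LANGUAGES;

    static {
        Map<String, String> languages = new HashMap<>();
        languages.put("DE", "German");
        languages.put("EN", "English");
        // wrap the map so that callers cannot modify the shared constant
        LANGUAGES = Collections.unmodifiableMap(languages);
    }
}
```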

Instance Variables

Instance variables can be initialized in the same ways as static variables. I can directly assign a value to an instance variable (direct initialization) or call a method that initializes the variable through its return value. This, however, can lead to the same ordering problem:

public class Variables {
  String directlyInitialized = "instance variable";
  String directlyInitializedWithMethod = init();
  String anotherDirectlyInitialized = "test";

  private String init() {
    return anotherDirectlyInitialized; // still null at this point
  }

  public static void main(String[] args) {
    Variables object = new Variables();
    System.out.println(object.directlyInitializedWithMethod); // prints null
  }
}

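However, there is also an initializer construct for instance variables: the instance initializer block. It runs as part of every constructor call, after the superclass constructor and together with the instance variable initializers in textual order, but before the rest of the constructor body. Placed after the variable declarations, it sees all of their assigned values.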

public class Variables {
  String directlyInitialized = "instance variable";
  String directlyInitializedWithMethod;
  String anotherDirectlyInitialized = "test";

  {
    directlyInitializedWithMethod = anotherDirectlyInitialized;
  }

  public static void main(String[] args) {
    Variables object = new Variables();
    System.out.println(object.directlyInitializedWithMethod); // prints test
  }
}
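To see the complete order in practice, the following sketch records each initialization step in a list. The class InitOrder is hypothetical. Creating two instances shows that the static parts run only once, while the instance parts run for every object, after the implicit superclass constructor call and before the constructor body.

```java
import java.util.ArrayList;
import java.util.List;

class InitOrder {
    // declared first, so it is already assigned when the following initializers use it
    static final List<String> LOG = new ArrayList<>();

    static String staticField = log("static field");

    static {
        log("static block");
    }

    String instanceField = log("instance field");

    {
        log("instance block");
    }

    InitOrder() {
        log("constructor body");
    }

    static String log(String step) {
        LOG.add(step);
        return step;
    }

    public static void main(String[] args) {
        new InitOrder();
        new InitOrder();
        // [static field, static block, instance field, instance block, constructor body,
        //  instance field, instance block, constructor body]
        System.out.println(LOG);
    }
}
```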

Another practical use case for instance initializers is adding values to a HashMap right at its creation (often called double-brace initialization), as seen in the next code snippet:

// regular approach
Map<String, String> myMap = new HashMap<>();
myMap.put("DE", "German");
myMap.put("EN", "English");

// anonymous subclass of HashMap
Map<String, String> otherMap = new HashMap<String, String>() {
    {
        // instance initializer setting specific values
        this.put("DE", "German");
        this.put("EN", "English");
    }
};

The advantage is that the latter can be used to initialize a field of a class directly with specific values. Keep in mind, though, that every such construct creates an additional anonymous class.
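A sketch of the field case, with a hypothetical Translator class. The map is an instance of an anonymous subclass of HashMap; because it is created in an instance context, it also keeps a reference to the enclosing Translator object, which is worth knowing when the map is serialized or outlives its owner.

```java
import java.util.HashMap;
import java.util.Map;

class Translator {
    // field initialized directly: anonymous HashMap subclass whose instance initializer calls put
    private final Map<String, String> languages = new HashMap<String, String>() {{
        put("DE", "German");
        put("EN", "English");
    }};

    String languageFor(String code) {
        return languages.get(code);
    }

    /** Exposes the runtime class to show that it is not HashMap itself. */
    Class<?> mapClass() {
        return languages.getClass();
    }
}
```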

Line Separators

Line separators are a funny problem in themselves. Each operating system uses a different sequence of control characters to mark a new line:

  • Windows: \r\n
  • Unix/Linux/Mac OS X: \n
  • pre-OSX Mac: \r

Originally, the write head of a typewriter could perform only a few tasks. The most important ones are listed below:

  • Write a symbol and move the cursor one column to the right.
  • \r | carriage return | CR | Move the cursor to the first column of the line.
  • \n | line feed       | LF | Move the cursor to the next row/line at the same column.
  • \b | backspace       | BS | Move the cursor one column back.
  • \t | horizontal tab  | HT | Move the cursor to the next tab stop.

In order to get the cursor to the beginning of a new line, you needed a carriage return followed by a line feed. This behaviour resulted in the \r\n sequence used by Windows. Other systems simplified it by using either \n or \r alone.

Java is platform independent, but you can obtain the platform-dependent line separator from the system property named line.separator.

String lineSeparator = System.getProperty("line.separator");
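Since Java 7, System.lineSeparator() returns the same value directly. When reading text of unknown origin, a tolerant approach is to split on all three conventions. A sketch with hypothetical helper names:

```java
import java.util.Arrays;
import java.util.List;

final class LineSeparators {

    /** Splits text on \r\n (Windows), \n (Unix/Linux/Mac OS X) and \r (pre-OSX Mac). */
    static List<String> splitLines(String text) {
        // \r\n comes first in the alternation so that it is consumed as one separator
        return Arrays.asList(text.split("\\r\\n|\\r|\\n"));
    }

    /** Joins lines with the separator of the platform the program runs on. */
    static String joinWithPlatformSeparator(String... lines) {
        return String.join(System.lineSeparator(), lines);
    }
}
```

In format strings, %n also produces the platform line separator. Java 8 and later additionally offer the \R regex construct, which matches any Unicode line break sequence.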

Other platform-specific system properties, such as file.separator (slash or backslash) and path.separator (colon or semicolon), can be found in this guide.

Objects – A Not Very Well Known Utility Class In Java

The Java API has some quite useful classes that are not well known. One of them is the Objects class (java.util.Objects) with several static helper methods. Most of them help with handling null values (which is quite a pain if you have to do it yourself). The Objects class has been available since Java 7.

I really like the requireNonNull methods, as you often have to write code that checks for null values. As can be seen in the code snippet below, the static method requireNonNull is available in two versions. I prefer the one with the additional message, as a NullPointerException is not very expressive in itself.

public void solve(String problem) {
    // throws NullPointerException if problem is null
    Objects.requireNonNull(problem);

    // throws NullPointerException with passed text if problem is null
    Objects.requireNonNull(problem,"problem is null");

    // solve problem
}
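requireNonNull also returns its argument, so the check can be combined with a field assignment, for example in a constructor. A sketch with a hypothetical Problem class:

```java
import java.util.Objects;

final class Problem {
    private final String description;

    Problem(String description) {
        // fails fast with a descriptive message; otherwise returns description unchanged
        this.description = Objects.requireNonNull(description, "description is null");
    }

    String description() {
        return description;
    }
}
```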

Null can also be problematic when you need a String representation of an object: calling toString() on a null reference throws a NullPointerException. The Objects class provides a toString method that handles null; it is also available in two versions, as can be seen in the code listing below.

public void printProblem(String problem) {
     // converts null to "null", otherwise calls toString()
     String output = Objects.toString(problem);

     // converts null to "no problem given", otherwise calls toString()
     String anotherOutput = Objects.toString(problem,"no problem given");

     // print problem
}

There are also two methods regarding the generation of hash codes. Objects.hashCode(Object o) calls the hashCode method of o and returns 0 for a null reference. To compute a hash value for several objects, you can resort to Objects.hash(Object... values), which delegates the hash generation to Arrays.hashCode. These methods are not that useful to me, as I almost always let my IDE generate the hashCode method of a class from its fields.

Additionally, the methods equals and deepEquals compare objects for equality while explicitly taking null into account (deepEquals also compares the contents of arrays), whereas compare delegates to a Comparator unless both arguments are the same reference. I cannot see any immediate use for these methods either, but perhaps an occasion for them may arise.
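One place where these methods do fit is a hand-written equals/hashCode of a value class with a nullable field and an array field. The Language class below is a hypothetical sketch. Note that Objects.hash would hash an array argument by identity, so the array is hashed with Arrays.hashCode first.

```java
import java.util.Arrays;
import java.util.Objects;

final class Language {
    private final String code;      // may be null
    private final String[] aliases; // compared by content

    Language(String code, String... aliases) {
        this.code = code;
        this.aliases = aliases.clone(); // defensive copy
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Language)) return false;
        Language other = (Language) o;
        // Objects.equals is null-safe; Objects.deepEquals compares array contents
        return Objects.equals(code, other.code) && Objects.deepEquals(aliases, other.aliases);
    }

    @Override
    public int hashCode() {
        // hash the array contents first, otherwise Objects.hash would use its identity hash
        return Objects.hash(code, Arrays.hashCode(aliases));
    }
}
```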

I recommend reading through the source code of the Objects class. It is quite simple and easy to understand. You could write a class with the same functionality yourself; however, you get this behaviour fully tested and already implemented for free! So use it!

A subtle difference …

I am currently developing with IntelliJ IDEA Community Edition (which is open source and free as in beer). IntelliJ supports Java 7 very well and, in my subjective opinion, is a little richer than Eclipse in features regarding refactoring and intelligent code suggestions.

Aim

My aim was to load some XML Schema files for validation purposes. Such files use the file extension .xsd, which stands for XML Schema Definition. Now to the fun part:

You can load such files directly using the File class. This is simple, but it no longer works once you package your application as a jar and the xsd file is inside that jar. Then you have to reference the xsd file within the jar file, which can be done by leveraging the class path and its package structure through the class loader.

If you have an xsd file named test.xsd in a package called logic along with a Java class Solver, you can do the following:

// names without a leading slash are resolved relative to the package of the class (logic)
// inside instance methods of a Solver instance
InputStream stream = this.getClass().getResourceAsStream("test.xsd");
// inside static methods of Solver
InputStream stream = Solver.class.getResourceAsStream("test.xsd");

// names with a leading slash are resolved from the root of the class path
// inside instance methods of any instance
InputStream stream = this.getClass().getResourceAsStream("/logic/test.xsd");
// inside static methods of any class
InputStream stream = AnyClass.class.getResourceAsStream("/logic/test.xsd");

More information on this can be found here.

Problem

And here is the very important information: IntelliJ IDEA does NOT automatically copy your xsd files to the output folder. Consequently, getResource and getResourceAsStream will NOT find them. In Eclipse, everything worked fine.
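Because getResourceAsStream returns null instead of throwing when the resource is missing, this problem typically surfaces later as a NullPointerException somewhere else. A small guard makes the cause obvious; the class ClasspathResources and its message are hypothetical:

```java
import java.io.InputStream;

final class ClasspathResources {

    /** Opens a class-path resource or fails immediately with a descriptive message. */
    static InputStream open(Class<?> anchor, String name) {
        InputStream stream = anchor.getResourceAsStream(name);
        if (stream == null) {
            // e.g. the build did not copy the resource to the output folder
            throw new IllegalStateException("Resource '" + name + "' not found relative to "
                    + anchor.getName() + "; check that the build copies it to the output folder");
        }
        return stream;
    }
}
```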

Why?

IntelliJ IDEA uses a semicolon-separated list of file name patterns to decide which resource files are copied, which is sold as a feature. This is a whitelist approach, whereas Eclipse uses a blacklist (everything that is not excluded is copied).

How to solve?

Open [Settings] -> [Compiler] -> [Resource Patterns] and append ;?*.xsd to the resource pattern input field.

Initializing Arrays in Java

Arrays are used in almost every programming language. In Java, you have to specify the length of an array when you create it.

Here is a short example of how to initialize an array containing the numbers 1, 2 and 3:

int[] numbers = new int[3];
numbers[0] = 1;
numbers[1] = 2;
numbers[2] = 3;

To abbreviate this, there is a short form that does the same work in one line. This works because Java can determine the length of the array from the number of elements within the curly brackets. At compile time, the short form is translated into essentially the same bytecode as the form above.

int[] numbers2 = {1,2,3};
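The short form is not limited to int: it works for any element type and can be nested for multi-dimensional arrays, where each row may have its own length. A sketch (the class name is hypothetical):

```java
import java.util.Arrays;

final class ArrayInitializers {
    // short form with String elements
    static final String[] LANGUAGES = {"German", "English"};

    // nested short form: each inner {...} becomes its own int[]
    static final int[][] TRIANGLE = {
        {1},
        {1, 1},
        {1, 2, 1}
    };

    public static void main(String[] args) {
        System.out.println(LANGUAGES.length);              // 2
        System.out.println(Arrays.toString(TRIANGLE[2]));  // [1, 2, 1]
        System.out.println(Arrays.deepToString(TRIANGLE)); // [[1], [1, 1], [1, 2, 1]]
    }
}
```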

Now, there are some quirks to be aware of regarding the short form. You can use it only in the declaration of the array variable. The following results in a compile error:

int[] numbers2 = {1,2,3};
numbers2 = {4,5,6}; // compile error!

Why does this happen? The error message says: “Array constants can only be used in initializers”. The meaning behind it: the Java Language Specification allows the bare form {4,5,6}, called an array initializer, only in a variable declaration or as part of an array creation expression. In a plain assignment, it is not a valid expression, so the compiler rejects it. 😉

However, today, I found a trick to circumvent this issue. You can find the solution in the Oracle Java Magazine, Section Fix This.

int[] numbers3 = {1,2,3};
numbers3 = new int[]{4,5,6};
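The same full form is needed wherever an array is created outside a declaration, for example when an array is passed directly to a method. A sketch with a hypothetical sum method:

```java
final class ArrayArguments {

    static int sum(int[] numbers) {
        int total = 0;
        for (int n : numbers) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        // sum({1, 2, 3}) would not compile: bare braces are only allowed in a declaration
        System.out.println(sum(new int[] {1, 2, 3})); // 6

        int[] numbers = {1, 2, 3};
        numbers = new int[] {4, 5, 6}; // reassignment also needs the array creation expression
        System.out.println(sum(numbers)); // 15
    }
}
```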

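Why does this work? new int[]{4,5,6} is an array creation expression, which, unlike the bare array initializer, is a valid expression anywhere an array is expected, for example on the right-hand side of an assignment or as a method argument.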