Serialization



Serialization Issues

  • One of the main issues with saving geWorkbench workspaces on one platform (Windows) and opening them on another (Linux) was that many of the classes did not specify a SUID, that is, a serialVersionUID field (over 600 classes lacked one). If the SUID is not specified, it is computed automatically as a hash of various class elements: class name, implemented interfaces, declared fields (except private static and private transient ones), declared nonprivate methods and constructors, and so on. The computed value is sensitive to class details that can vary between compiler implementations, which is one way it can differ across platforms. It also means that even if you make a compatible change to a class Foo, a new SUID will be computed that differs from the one stored in the stream, so deserializing Foo will most likely fail with an InvalidClassException reporting the mismatch (the SUID of Foo in the data stream versus the SUID of class Foo in the virtual machine). Most IDEs will tell you if the field is missing (in Eclipse, this shows up as a warning at the class level). Also note that if you create inner classes that implement Serializable, each inner class needs its own SUID.
  • Serialization can slow garbage collection. Every time an object is written to an output stream, the stream keeps a reference to that object. This means that the program holds live references to the objects it has written until the stream is either reset or closed, which prevents those objects from being garbage collected. Solution: only save the entire state when the entire state is available, then close the stream immediately. Since this isn't possible for us, we can call the reset() method, which clears the ObjectOutputStream's internal table of already-written objects so they can be garbage collected.
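
The reset() approach from the second point can be sketched as follows. This is a minimal illustration, not geWorkbench code: the class ResetDemo and its methods are hypothetical, and an in-memory byte array stands in for a workspace file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Hypothetical demo class; not part of geWorkbench.
final class ResetDemo {

    // Writes `count` int arrays to one stream, calling reset() after each
    // write so the stream drops its references to objects already written.
    static byte[] writeChunks(int count) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (int i = 0; i < count; i++) {
                int[] chunk = new int[1024];
                chunk[0] = i;
                out.writeObject(chunk);
                out.reset(); // forget previously written objects
            }
        }
        return bytes.toByteArray();
    }

    // Reads the chunks back and returns the sum of their first elements.
    static int sumFirstElements(byte[] data, int count)
            throws IOException, ClassNotFoundException {
        int sum = 0;
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            for (int i = 0; i < count; i++) {
                int[] chunk = (int[]) in.readObject();
                sum += chunk[0];
            }
        }
        return sum;
    }

    public static void main(String[] args) throws Exception {
        byte[] data = writeChunks(4);
        System.out.println(sumFirstElements(data, 4)); // prints 6
    }
}
```

One trade-off to keep in mind: after reset(), an object that is written again is stored as a fresh copy, so references shared across a reset are not preserved as shared on deserialization.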

Compatible Changes

  • Changes to constructors and methods. Serialization doesn't touch the methods of a class.
  • Changes to static fields. Serialization ignores static fields.
  • Changes to transient fields. Serialization ignores transient fields.
  • Adding new fields to a class after serializing an older version of the class. When the older instance is deserialized, each new field takes its default value (0 for numeric, false for boolean, null for object).
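  • Adding or removing an interface. Interfaces contribute no data to the stream. (Removing Serializable itself is the exception: that is an incompatible change.)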
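
The claims about static and transient fields can be checked with a small round trip. This is a sketch under assumptions: Settings and IgnoredFieldsDemo are hypothetical names, and the stream is an in-memory byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class used only for illustration.
final class Settings implements Serializable {
    private static final long serialVersionUID = 1L;

    static int defaultSize = 10;      // static: never written to the stream
    transient String cache = "warm";  // transient: skipped when writing
    int size;                         // ordinary field: written and restored

    Settings(int size) { this.size = size; }
}

final class IgnoredFieldsDemo {

    static byte[] toBytes(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = toBytes(new Settings(42));
        Settings.defaultSize = 99; // changed after writing
        Settings copy = (Settings) fromBytes(data);
        System.out.println(copy.size);            // 42: restored from the stream
        System.out.println(copy.cache);           // null: transient, and no initializer runs
        System.out.println(Settings.defaultSize); // 99: the stream did not touch it
    }
}
```

Note that the transient field comes back as null rather than "warm": field initializers and constructors of a Serializable class do not run during deserialization.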

Incompatible Changes

  • Change name of class.
  • Change type or name of an existing field.
  • Change field from non-static to static. Deserialization will expect to deserialize the field, but it cannot because it has been made static.
  • Change superclass.

To help distinguish compatible from incompatible changes, each class can declare a version id called the stream unique identifier (SUID), in a field named serialVersionUID. Every time you release a new version of the class that makes an incompatible change, this id should be changed; for compatible changes it should stay the same.
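
A minimal sketch of declaring the SUID explicitly, including a nested class with its own value. Foo, Bar, and SuidDemo are placeholder names, and the values 1L and 2L are arbitrary; ObjectStreamClass.lookup() is the standard way to read back the SUID the runtime will use.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Placeholder class; the SUID is fixed instead of being computed from a hash.
final class Foo implements Serializable {
    private static final long serialVersionUID = 1L;

    private String name;

    // A nested serializable class declares its own serialVersionUID.
    static final class Bar implements Serializable {
        private static final long serialVersionUID = 2L;

        private int count;
    }
}

final class SuidDemo {

    // Returns the SUID the serialization runtime associates with `type`.
    // Assumes `type` is serializable (lookup() returns null otherwise).
    static long suidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(suidOf(Foo.class));     // prints 1
        System.out.println(suidOf(Foo.Bar.class)); // prints 2
    }
}
```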

Classes That Implement Serializable but Are Not Serializable

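  • If a class implements Serializable but holds a non-transient reference to an object that is not serializable, then instances of the class cannot be serialized either: writing one throws a NotSerializableException.
  • If a superclass of a serializable class is not itself serializable, the superclass must still have a no-arg constructor that the subclass can access, because deserialization calls it to initialize the superclass's fields. If it does not, deserialization fails with an InvalidClassException ("no valid constructor").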
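
Both cases can be reproduced with a few small classes. This is a sketch with hypothetical names (Holder, Base, Derived, NoDefaultBase, Broken), using an in-memory stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Declares Serializable but holds a non-transient reference to a plain Object.
final class Holder implements Serializable {
    private static final long serialVersionUID = 1L;
    Object payload = new Object();
}

// Non-serializable superclass with a no-arg constructor.
class Base {
    String label;
    Base() { label = "set by Base()"; }
}

final class Derived extends Base implements Serializable {
    private static final long serialVersionUID = 1L;
    int value;
    Derived(int value) { this.value = value; label = "set by Derived"; }
}

// Non-serializable superclass without a no-arg constructor.
class NoDefaultBase {
    NoDefaultBase(int ignored) { }
}

final class Broken extends NoDefaultBase implements Serializable {
    private static final long serialVersionUID = 1L;
    Broken() { super(0); }
}

final class NotReallySerializableDemo {

    static byte[] write(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    static Object read(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        try {
            write(new Holder());
        } catch (NotSerializableException e) {
            System.out.println("write failed: " + e.getMessage());
        }

        // Base() runs during deserialization, so label is reinitialized
        // rather than restored; value comes from the stream.
        Derived copy = (Derived) read(write(new Derived(7)));
        System.out.println(copy.value + ", " + copy.label); // 7, set by Base()

        try {
            read(write(new Broken()));
        } catch (InvalidClassException e) {
            System.out.println("read failed: " + e.getMessage());
        }
    }
}
```

The Derived case also shows that fields of the non-serializable superclass are never written: whatever the no-arg constructor sets is what the deserialized object ends up with.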

Controlling Serialization

  • The simplest way to control whether a field is serialized is to mark it transient. This does not, however, help you change the format in which the data is stored. To do that, you can define the methods writeObject() and readObject() in the class. Data is written starting with the highest serializable superclass of the object and continuing down the hierarchy. Before the data for each class is written or read, the VM checks whether the class in question has methods with these exact signatures:
    • private void writeObject(ObjectOutputStream out) throws IOException
    • private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException
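
A minimal sketch of these two hooks, assuming a hypothetical class Samples that keeps an int array transient and stores it by hand as a length followed by the elements. The on-stream layout chosen here is only an illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class with a hand-written format for one of its fields.
final class Samples implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String name;
    private transient int[] values; // skipped by default serialization

    Samples(String name, int[] values) {
        this.name = name;
        this.values = values.clone();
    }

    String name() { return name; }

    int sum() {
        int total = 0;
        for (int v : values) total += v;
        return total;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();      // writes the non-transient fields (name)
        out.writeInt(values.length);   // custom part: length, then each element
        for (int v : values) out.writeInt(v);
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();        // restores name
        int n = in.readInt();          // must mirror writeObject exactly
        values = new int[n];
        for (int i = 0; i < n; i++) values[i] = in.readInt();
    }
}

final class CustomFormatDemo {

    static Samples roundTrip(Samples s) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(s);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Samples) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Samples copy = roundTrip(new Samples("run-1", new int[] {1, 2, 3}));
        System.out.println(copy.name() + " " + copy.sum()); // prints "run-1 6"
    }
}
```

Both methods must be private and must read and write the custom data in the same order; the runtime finds them by reflection, so a signature that differs in any way is silently ignored.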