Monday, November 27, 2006

db4o for Web Applications and Managing Object Identity

Time and time again, people ask how best to manage disconnected object identity when using db4o. This is primarily a problem in web applications and web services, since objects tend to live much longer than the connection to the database. db4o only "knows" about an object if it's in the same virtual machine and the connection to the database remains open. In fact, this was probably the number one problem I had when I first started using db4o, since web apps and web services were what I did. So I'm going to present some tips and one way to deal with this problem.

1: Keep Your Object Model Very Simple

Keep It Simple, Stupid. Words to live by. Whoever coined K.I.S.S. deserves a big pat on the back. In any case, the point here is to keep everything simple. Here are a few tips:

  • Don't build huge, complex object graphs just because you can. It doesn't make you smarter.
  • Keep business logic out of your objects and stick to POJOs (I don't know what the .NET camp calls them, PO.NOs?).
  • Don't overuse inheritance or interfaces (trust me, when it comes to web services, simple concrete classes are the easiest way to go)
  • Your object model does not have to be fully connected.

These tips apply to your data objects that will be persisted and passed around, not your application logic.

2: Consider Performance

Use common sense for this one. Ask yourself questions like: What will the performance be like if I send an object across the line that has a collection of 10,000 other objects? Do I really need all those objects at the same time or would it make more sense to query for subsets when I really need them? And what about memory consumption??

3: Referencing Other Objects by Identifier Sometimes DOES Make Sense

For instance, let's say you are a large online retailer with 100 product categories, and each category has 10,000 products, which in turn have options like colors, sizes, reviews, ratings, etc. You may be tempted to just load up your Store object, which has a Collection of Categories, each of which has a Collection of Products, each of which has a Collection of Reviews, and so on and so forth. Very bad idea.

Instead, try one-way references from the lower item in the hierarchy to its parent. For instance, Product might have a Category field, or even just a categoryId field:

class Product { int categoryId; }

Then, when you want the Products in a category, you simply query for them based on the categoryId. You also gain the ability to fetch a smaller subset of the original Collection by adding more constraints to the query, which you couldn't do with a Collection:

Query query = db.query(); // db is an open ObjectContainer
query.constrain(Product.class);
query.descend("categoryId").constrain(categoryId);
query.descend("color").constrain("red"); // finer control
ObjectSet products = query.execute();

(I can already feel the flames from the OODB purists for this one).
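The effect of referencing the category by ID can also be illustrated without a database. The following is only an in-memory sketch: the extra Product fields and the ProductLookup class are hypothetical, and a real application would run the equivalent db4o query instead of scanning a list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical flat data object: it references its category by ID only.
class Product {
    final int categoryId;
    final String name;
    final String color;

    Product(int categoryId, String name, String color) {
        this.categoryId = categoryId;
        this.name = name;
        this.color = color;
    }
}

// Stand-in for a database query: selects by category, optionally narrowed by color.
class ProductLookup {
    // Pass null for color to get every product in the category.
    static List<Product> find(List<Product> all, int categoryId, String color) {
        List<Product> matches = new ArrayList<Product>();
        for (Product p : all) {
            boolean sameCategory = p.categoryId == categoryId;
            boolean colorOk = color == null || color.equals(p.color);
            if (sameCategory && colorOk) {
                matches.add(p);
            }
        }
        return matches;
    }
}
```

Because the category is just a value on each Product, narrowing the result is a matter of adding one more condition rather than loading a whole Category and walking its collection.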

Web services are another prime example of why this is useful. Let's say there is a web service method like this:

List getCategoryStats(Category category, Date startDate, Date endDate);

The method getCategoryStats returns a list of statistics such as category views, product views, purchases, etc. for each day in the date range. Now, if Category has a collection of 10,000 products that it has to send with every call to this service, that would be very inefficient! With just a simple Category object, it would be much faster. Or better yet, how about just sending the Category id:

List getCategoryStats(int categoryId, Date startDate, Date endDate);

4: Assigning IDs

I generally use a wrapper class around ObjectContainer that allows me to do some extra processing before or after an ObjectContainer call. This wrapper implements an interface that is pretty much the same as ObjectContainer:

public interface Db {
    void set(Object ob);
    void close();
    Query query();
    void delete(Object ob);
    boolean isClosed();
    Object refresh(Object ob);
    void commit();
    boolean isAttached(Object obj);
}

The wrapper class itself looks like the following:


import java.util.HashMap;
import java.util.Map;

import com.db4o.ObjectContainer;
import com.db4o.ObjectSet;
import com.db4o.query.Query;

public class DbWrapper implements Db {

    private ObjectContainer db;
    // Last ID handed out per class name, shared by every wrapper in this JVM.
    private static Map<String, Integer> idMap = new HashMap<String, Integer>();

    public DbWrapper(ObjectContainer db) {
        this.db = db;
    }

    public void set(Object ob) {
        // Give Idable objects an ID before they are stored.
        if (ob instanceof Idable) {
            Idable idOb = (Idable) ob;
            assignId(db, idOb);
        }
        db.set(ob);
    }

    public void close() {
        db.close();
    }

    public void commit() {
        db.commit();
    }

    public boolean isAttached(Object obj) {
        return db.ext().isActive(obj);
    }

    private static synchronized void assignId(ObjectContainer db, Idable idOb) {
        if (idOb.getId() != null) return; // don't set if it already has been
        Integer counter = idMap.get(idOb.getClass().getName());
        if (counter == null) {
            // First ID for this class since startup: continue from the stored maximum.
            counter = getMax(db, idOb);
        }
        counter++;
        idMap.put(idOb.getClass().getName(), counter);
        idOb.setId(counter);
    }

    // Returns the highest ID stored for the object's class, or 0 if there is none.
    private static Integer getMax(ObjectContainer db, Idable idOb) {
        Query q = db.query();
        q.constrain(idOb.getClass());
        q.descend("id").orderDescending();
        ObjectSet result = q.execute();
        if (result.hasNext()) {
            // Sorted descending, so the first result carries the maximum.
            Idable ob = (Idable) result.next();
            Integer ret = ob.getId();
            if (ret == null) ret = 0;
            return ret;
        }
        return 0;
    }

    public Query query() {
        return db.query();
    }

    public void delete(Object ob) {
        db.delete(ob);
    }

    public boolean isClosed() {
        return db.ext().isClosed();
    }

    // Looks up the stored object with the same class and ID; returns null if none is found.
    public Object refresh(Object ob) {
        if (ob instanceof Idable) {
            Idable idable = (Idable) ob;
            Object ret = null;
            Query q = db.query();
            q.constrain(idable.getClass());
            q.descend("id").constrain(idable.getId());
            ObjectSet result = q.execute();
            if (result.hasNext()) {
                ret = result.next();
            }
            return ret;
        } else {
            throw new RuntimeException("Object " + ob + " does not implement Idable, cannot refresh.");
        }
    }

    public ObjectContainer getObjectContainer() {
        return db;
    }
}

As you can see in the set(Object) method, it will assign an ID to any object that implements Idable:

public interface Idable {
    public Integer getId();
    public void setId(Integer id);
}

There are some issues with this. For instance, it will not assign IDs down the object graph, and it only works if a single application connects to the database, because the counters live in that application's memory. You could change it to traverse the graph, but what I usually do is call set() on each object immediately after I create it so that it gets an ID assigned.
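If you would rather traverse the graph, the walk could look something like the sketch below. It is an illustration under assumptions, not part of the wrapper above: GraphIdAssigner, Order and OrderLine are hypothetical names, Idable is repeated only to keep the example self-contained, the counters start from zero in memory (the real wrapper would seed them from the database the way getMax() does), and only plain fields and java.util.Collection members are followed (no arrays or maps).

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

interface Idable {
    Integer getId();
    void setId(Integer id);
}

class GraphIdAssigner {
    // Last ID handed out per class name; in memory only for this sketch.
    private final Map<String, Integer> counters = new HashMap<String, Integer>();

    public synchronized void assignIds(Object root) {
        walk(root, new IdentityHashMap<Object, Boolean>());
    }

    private void walk(Object node, Map<Object, Boolean> seen) {
        if (node == null || seen.put(node, Boolean.TRUE) != null) {
            return; // nothing to do, or already visited (guards against cycles)
        }
        if (node instanceof Collection) {
            for (Object element : (Collection<?>) node) {
                walk(element, seen);
            }
            return;
        }
        if (!(node instanceof Idable)) {
            return; // only descend into our own data objects
        }
        Idable idable = (Idable) node;
        if (idable.getId() == null) {
            idable.setId(nextId(node.getClass().getName()));
        }
        // Follow every non-static reference field, including inherited ones.
        for (Class<?> c = node.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers()) || f.getType().isPrimitive()) {
                    continue;
                }
                try {
                    f.setAccessible(true);
                    walk(f.get(node), seen);
                } catch (IllegalAccessException e) {
                    throw new RuntimeException(e);
                }
            }
        }
    }

    private Integer nextId(String className) {
        Integer last = counters.get(className);
        int next = (last == null) ? 1 : last + 1;
        counters.put(className, next);
        return next;
    }
}

// Hypothetical data objects used only to exercise the sketch.
class OrderLine implements Idable {
    private Integer id;
    public Integer getId() { return id; }
    public void setId(Integer id) { this.id = id; }
}

class Order implements Idable {
    private Integer id;
    final List<OrderLine> lines = new ArrayList<OrderLine>();
    public Integer getId() { return id; }
    public void setId(Integer id) { this.id = id; }
}
```

Calling assignIds(order) on an Order with two lines gives the order and each line an ID in a single pass. Whether the reflection overhead is worth it compared with calling set() right after construction depends on how deep your graphs are.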

Also, notice the refresh(Object ob) method. It fetches a fresh copy of your object from the db, looked up by ID, and the returned instance is attached to the database session, so it is also in db4o's reference system. If you are using the connection-per-request pattern, it may be a good idea to do a refresh at the start of each request before changes are applied.
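In a connection-per-request setup, a handler might use refresh() roughly as sketched below. Everything here is an assumption made for illustration: MiniDb is a cut-down, hypothetical version of the Db interface, Customer and UpdateCustomerHandler are made-up names, and InMemoryDb is a stand-in keyed by ID alone so that the example runs without db4o.

```java
import java.util.HashMap;
import java.util.Map;

interface Idable {
    Integer getId();
    void setId(Integer id);
}

// Cut-down, hypothetical version of the Db interface: just what the handler needs.
interface MiniDb {
    Object refresh(Object ob); // returns the attached copy with the same ID, or null
    void set(Object ob);
    void commit();
}

// Hypothetical data object.
class Customer implements Idable {
    private Integer id;
    String email;
    public Integer getId() { return id; }
    public void setId(Integer id) { this.id = id; }
}

// In-memory stand-in for the db4o-backed wrapper (ignores the object's class).
class InMemoryDb implements MiniDb {
    private final Map<Integer, Object> stored = new HashMap<Integer, Object>();
    int commits = 0;

    public Object refresh(Object ob) {
        return stored.get(((Idable) ob).getId());
    }

    public void set(Object ob) {
        stored.put(((Idable) ob).getId(), ob);
    }

    public void commit() {
        commits++;
    }
}

class UpdateCustomerHandler {
    // 'detached' came from the client (form post, web service call); the
    // session opened for this request has never seen that instance.
    static Customer handle(MiniDb db, Customer detached) {
        Customer attached = (Customer) db.refresh(detached);
        if (attached == null) {
            throw new IllegalStateException("No stored Customer with id " + detached.getId());
        }
        attached.email = detached.email; // apply the changes to the attached copy
        db.set(attached);                // store the instance the session knows about
        db.commit();
        return attached;
    }
}
```

The point of the pattern is that changes are copied onto the instance returned by refresh() and that instance is the one stored, rather than the detached object the database session knows nothing about.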

An easy way to start using the code above is to modify db4oUtil from here and change getObjectContainer() to getDb() and have it return a Db instance instead of ObjectContainer.

This is far from the best solution, but it works for a lot of cases. There are other solutions too; for instance, you can use external callbacks. You can also vote for COR-161 in JIRA, which is for putting ID support natively into db4o.