Thursday 2 September 2010

SQLite: Is there anything you CAN'T do?

One of my key areas of focus in preparation for the next beta release of FDO Toolbox 1.0, was to bring the Join implementation up to par. Since all my efforts thus bar has been improving the Bulk Copy functionality, joins have been somewhat left out in the cold in terms of functionality and my attention.

As it stands, the primary problem with the current join implementation is one of performance. Without going into detail the current implementation is one that uses nested loops. This, being a O(m*n) operation will get slower as the size of the data you're working with increases.

In thinking about the different join algorithms available to tackle this problem, I came to the conclusion that implementing what is essentially a core DBMS component is just too much effort, so being the lazy programmer that I am, I've updated the join implementation to delegating the join functionality a DBMS which day-by-day continues to impress me with its amazing performance and capabilities: SQLite

So now, the updated implementation can be described as follows:
  • Create a temporary SQLite database with table structure identical to our "left" and "right" sources.
  • Pump the data from our "left" source into this temp SQLite database
  • Pump the data from our "right" source into this temp SQLite database
  • Create a view in our temp SQLite that encapsulates a join between these "left" and "right" tables
  • Do some metadata hacks to make this view be recognised as a FDO feature class.
  • Pump the data from this view out to our target data store.
How much faster is this approach? I've done some tests by joining the Sheboygan Parcel SHP file (only the bare ID/Geometry attributes, approx. 17600 records) with the corresponding Parcel data in an Access Database (approx. 16100 records):
  • Old approach: 16 minutes
  • SQLite approach: 21 seconds!
That's a 45x performance improvement! Now it is true that this is only a single data point :-), but if a single data point can indicate such a monumental improvement, then I can comfortably say that you will get similar results with different data sources.

I truly <3 SQLite!

2 comments:

Unknown said...

Which are the "metadata hacks to make this view be recognised as a FDO feature class" that you wrote ?

Jackie Ng said...

Sqlite databases created by fdo have a special metadata table called geometry_columns. This table has a row for each table or view that fdo will see as a feature class. To make a view as a feature class we just register this view inside geometry_columns. That's what I mean by metadata hack.