Time for an explanation. This screenshot was the result of an experimental mg-desktop addition to make feature joins perform much faster (around 100 times faster as evidenced by the screenshot), by taking advantage of new FDO Join APIs that were recently introduced with the 3.6 release. Why does the FDO Join APIs perform so much faster? Time for a little backstory...
Why Feature Joins suck (performance wise)
Feature Joins are performed by the GWS Query Engine component of MapGuide. From a high-level overview, this component takes two input iterators (with interfaces similar to a FDO/MG Feature Reader), and depending on the capabilities of the left and right sides of the join a "merged" feature reader is returned that will loop the two iterators in the following fashion:
- Nested Loops (both sides do not support ordering)
- Nested block join and sorted block (left side supports ordering, right side doesn't)
- Sort-Merge (both sides support ordering)
- Batch Sorted Block (left side does not support ordering, right side does)
MapGuide in its current form is not aware of join factors which impact performance:
- The size of both sides of the join (Nested loops is obviously bad, but the inability to determine size means this can't be avoided if this is the designated algorithm)
- The spread of join key values (Sorted blocks work best on lookup-value type spreads not foreign-key type spreads)
The FDO Join API (or: Let the data store figure it out)
So it's clear that MapGuide does a terrible and/or erratic job as a query optimizer, so assuming we want respectable performance of joined data, you would've probably avoided Feature Joins altogether and have the bits you want to join in question residing in the same centralised RDBMS, do the joins at the database level and wrap the result in a database view.
This approach is a good solution in terms of performance because figuring out the best way to join data is now a data store responsibility, and you can apply a whole assortment of indexes and other DBMS performance improvements that MapGuide is not aware of. From MapGuide's perspective all it sees is a table/view and it just queries that. But this approach presents its own problems:
- The need for metadata hacks for views to be properly recognised as a feature class (eg. SDO_GEOM_METADATA in Oracle)
- The type of view affecting selectability (eg. SQL Server, which requires schema overrides to work around)
Taking advantage of FDO Joins in MapGuide (with RFC123)
So as already explained, work on bringing support for this into MapGuide started with mg-desktop. Being essentially a self-contained version of MapGuide meant that mg-desktop was a nice sandbox for playing and experimenting with new features. So once support was built into mg-desktop, bringing it over to MapGuide proper was quite an easy affair.
So how does mg-desktop and MapGuide take advantage of FDO Joins?
We use the existing Extended Feature Class mechanism to construct an equivalent FDO Join select query, applying the proper property/class aliasing to ensure that the reader that is returned presents the same property list as a default feature join query.
When querying an Extended Feature Class, MapGuide does several checks to see if the FDO Join optimization path can be taken. These checks include:
- Does the underlying FDO provider for the Feature Source support FDO Joins? (obvious one!)
- Does the primary and joined feature classes of this Extended Feature Class originate from the same feature source? (this is important, because FDO Joins work within the context of the same connection, thus the classes we are joining must come from the same source)
- A whole bunch of other minor conditions which your Extended Feature Class will most likely satisfy (detailed in the RFC)
Additionally, Extended Feature Classes also double up as the metadata. You don't need to do SDO_GEOM_METADATA hacks or FDO Schema Overrides for these classes to be recognised. Because MapGuide already understands these elements out of the box!
Also, in case you're wondering, the performance numbers in MapGuide are even better than mg-desktop (up to 1s faster than the numbers in the above screenshot) due to connection pooling and object caching mechanisms present in the MapGuide Server code.
Conditions Apply
Now as you can see, only certain Extended Feature Classes can take advantage of this optimization. The other limitation is FDO provider support. As of writing this post, the following providers implement the FDO Join APIs:
- SQL Server
- SQLite
The beauty of FDO is that as soon as an FDO provider implements the FDO Join APIs (like King.Oracle wink, wink, nudge, nudge), such configured Extended Feature Classes in MapGuide can automatically take advantage of this optimization path.
So there you have it. 100x faster feature joins if your data ticks all the boxes.
^ There is a 5th type of join algorithm that is (suprisingly) not implemented by the GWS Query Engine: Hash Joins. Though probably the most memory intensive join algorithm (due to needing to store one side of the join in a hash-table), this should always perform better than the default nested loops. Anyway, it's 2011. Memory is cheap! Having this join type implemented should improve the Feature Join user story even further. Guess that's another item on the TODO list.
No comments:
Post a Comment