Thursday, 11 June 2020

Announcing: FDO Toolbox 1.4.1

This release should dramatically improve stability of FDO Toolbox and the new FdoCmd CLI when working with SQLite data files.

In fixing the stability problem, it also finally gave me the ultimate insight into the proper usage patterns around the FDO .net API. I had made my concerns known over a decade earlier about the flakiness of the FDO .net wrapper and the various answers given didn't quite give me a solid set of "rules of thumb" to avoid.

What lit the light bulb was looking the FDO SQLite provider codebase once more. I noticed that the main connection class not only implements the main connection interface, it also implements several other interfaces for convenience. The other insight was that there are several places in the FDO Toolbox code that it would be possible that a reference to a capability or a property dictionary (off of the capability, or some other top-level connection property) would out-live the underlying connection which meant that when it came time for the .net GC to start cleaning up references, it would call the finalizers on these references which would then proceed to subtract the ref count on the underlying native pointer.

But because in the case of SQLite the connection implements several FDO interfaces, we may be subtracting a ref count of a pointer to a non-connection interface, but the underlying implementation is actually the connection itself, causing an access violation from tearing down the same connection more than once or tearing down something that is connected to the torn down connection.

I didn't have full evidence to confirm that the above was indeed the case, but it was a solid enough theory that was backed by my observations in running isolated snippets of C# code using the FDO API targeting the SQLite provider.

Here's an example of a crashing snippet:


 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
using OSGeo.FDO.ClientServices;
using OSGeo.FDO.Commands;
using OSGeo.FDO.Commands.DataStore;
using OSGeo.FDO.Connections;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace FdoCrash
{
    class Program
    {
        static void Main(string[] args)
        {
            var conn = FeatureAccessManager.GetConnectionManager().CreateConnection("OSGeo.SQLite");
            using (conn)
            {
                if (HasCommand(conn, CommandType.CommandType_CreateDataStore, "Creating data stores", out var _))
                {
                    using (var cmd = (ICreateDataStore)conn.CreateCommand(CommandType.CommandType_CreateDataStore))
                    {
                        var dict = cmd.DataStoreProperties;
                        foreach (string name in dict.PropertyNames)
                        {
                            Console.WriteLine("{0}", name);
                        }
                    }
                }
            }
        }

        static bool HasCommand(IConnection conn, CommandType cmd, string capDesc, out int? retCode)
        {
            retCode = null;
            if (Array.IndexOf<int>(conn.CommandCapabilities.Commands, (int)cmd) < 0)
            {
                //WriteError("This provider does not support " + capDesc);
                //retCode = (int)CommandStatus.E_FAIL_UNSUPPORTED_CAPABILITY;
                return false;
            }
            return true;
        }
    }
}


And here's the same snippet, refactored to the point it does not crash with System.AccessViolationException:



 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
using OSGeo.FDO.ClientServices;
using OSGeo.FDO.Commands;
using OSGeo.FDO.Commands.DataStore;
using OSGeo.FDO.Connections;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace FdoCrash
{
    class Program
    {
        static void Main(string[] args)
        {
            var conn = FeatureAccessManager.GetConnectionManager().CreateConnection("OSGeo.SQLite");
            using (conn)
            {
                if (HasCommand(conn, CommandType.CommandType_CreateDataStore, "Creating data stores", out var _))
                {
                    using (var cmd = (ICreateDataStore)conn.CreateCommand(CommandType.CommandType_CreateDataStore))
                    {
                        using (var dict = cmd.DataStoreProperties) //Dispose the property dictionary asap
                        {
                            foreach (string name in dict.PropertyNames)
                            {
                                Console.WriteLine("{0}", name);
                            }
                        }
                    }
                }
            }
        }

        static bool HasCommand(IConnection conn, CommandType cmd, string capDesc, out int? retCode)
        {
            retCode = null;
            using (var cmdCaps = conn.CommandCapabilities) //Dispose the command capabilities asap
            {
                if (Array.IndexOf<int>(cmdCaps.Commands, (int)cmd) < 0)
                {
                    //WriteError("This provider does not support " + capDesc);
                    //retCode = (int)CommandStatus.E_FAIL_UNSUPPORTED_CAPABILITY;
                    return false;
                }
            }
            return true;
        }
    }
}


And with this snippet, I have come to what I confidently feel should be the "best practices" to using the FDO .net API.

Firstly, dispose of any reference to a top-level connection property (or a sub-property that hangs off of that property) as soon as you are done with it. You must do whatever you can to eliminate the possibility of such references out-living the main connection if/when the .net GC goes to cleanup. The C# using keyword helps streamlining this a lot. Other FDO objects do not need such aggressive disposal (the key point is that they don't directly hang off of the main connection or one of its top-level properties), but you should probably do it anyways out of habit. A thing that may result in confusion (it's confused me for about a decade!) is that disposing a .net FDO object is not the same as disposing a FDO object in C++. In C++, disposing is actual deleting/deallocation of memory, in .net disposing is just subtracting the refcount from the underlying C++ pointer (a way to tell it we're done with the object on the .net side). And because nearly every FDO class/interface exposed to .net implements the IDisposable interface, you are encouraged to dispose early and often once you're done with them.

Secondly, avoid compound statements (some.property.PropertyOrMethod) on anything involving classes/interfaces from the FDO .net API. These are the rules for working with its C++ API (to prevent memory leaks from not housing intermediate parts of a compound statement in smart pointers) and we should probably follow the same rules on the .net side just to be safe.

With these 2 rules established, I did a sweep of the FDO Toolbox codebase to make sure these 2 rules were being adhered to and the end result was that our powershell test harness no longer crashes when involving the SQLite provider, which objectively gave me confidence that this issue was finally addressed.

And that's the little (and hopefully insightful) side-story to pad out this blog post :)

Download

No comments: