Asynchronous XBRL Document Loading To Be Supported In Next Release

Now that the Sep 2013 CTP release of Gepsio has been made available at Codeplex and NuGet, focus has returned to getting the codebase to work in .NET 4.5, Windows Store (and its underlying Windows RT infrastructure) and Windows Phone 8. One of the new features in C# 5, the version of the language that targets all of these platforms, is support for asynchronous programming through the async and await keywords. In this blog post, Microsoft announced that many of the WinRT APIs have embraced the asynchronous programming model:

To achieve those goals, we made many potentially I/O-bound APIs asynchronous in the Windows Runtime. These are the most likely candidates to visibly degrade performance if written synchronously (e.g. could likely take longer than 50 milliseconds to execute). This asynchronous approach to APIs sets you up to write code that is fast and fluid by default and promotes the importance of app responsiveness in Metro style app development.

In keeping with this design principle, and to ensure that Gepsio works well with Windows Store applications, a new method for XBRL document loading will be available in Gepsio’s next release. In addition to the already-available synchronous XbrlDocument.Load() method, the next release of Gepsio will include the asynchronous analogue of the XBRL document loading method:

public async Task XbrlDocument.LoadAsync(string Filename);

This method supports asynchronous loading of XBRL documents and is fully compatible with the async and await asynchronous programming model, as in the following example:

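// Runs inside an async method; 'file' is assumed to be a Windows Store StorageFile (for example, one returned by a FileOpenPicker)
var xbrlDocument = new XbrlDocument();
await xbrlDocument.LoadAsync(file.Path);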

The synchronous version of the method, XbrlDocument.Load(), will continue to be available, as it always has been.
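For readers curious about what a synchronous/asynchronous load pair looks like in general, here is a minimal sketch of the async and await pattern. It is not Gepsio's implementation: the InstanceLoader class is hypothetical, and the sketch assumes desktop .NET 4.5 or later, where System.IO file APIs are available (Windows Store apps would typically read files through the WinRT StorageFile APIs instead).

```csharp
using System.IO;
using System.Threading.Tasks;
using System.Xml.Linq;

// Hypothetical class illustrating a Load/LoadAsync pair;
// this is NOT Gepsio's XbrlDocument implementation.
public class InstanceLoader
{
    public XDocument Document { get; private set; }

    // Synchronous version: blocks the calling thread until the file is read and parsed.
    public void Load(string filename)
    {
        Document = XDocument.Load(filename);
    }

    // Asynchronous version: the file read is awaited, so the calling thread
    // (for example, a UI thread) is free while the I/O is in progress.
    public async Task LoadAsync(string filename)
    {
        // useAsync: true requests asynchronous (overlapped) file I/O where the OS supports it.
        using (var stream = new FileStream(filename, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, 4096, true))
        using (var reader = new StreamReader(stream))
        {
            string text = await reader.ReadToEndAsync();
            // Parsing is CPU-bound and happens after the read completes.
            Document = XDocument.Parse(text);
        }
    }
}
```

A caller inside an async method would write `await new InstanceLoader().LoadAsync(path);`, mirroring the XbrlDocument example above.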

Sep 2013 CTP Released: Better Load Performance, Better Validation Reporting

A new CTP of Gepsio, the Sep 2013 CTP, has been released! Download the latest .NET 3.5 assembly and documentation here.

The Sep 2013 CTP offers two high-level improvements from the previous CTP:

  1. Dramatically Improved Load Times: Calls to XbrlDocument.Load() for large, real-world XBRL instances will enjoy a dramatic decrease in execution times, thanks to a series of patches that convert some of the internal lists, and their associated linear lookups, into dictionaries that support much faster lookups during document validation. XBRL instances that were taking more than 20 seconds to load now load in about eight seconds, and more load time improvements are planned!
  2. New Design For Reporting Document Validity: The XbrlException-based approach used in previous CTPs to report document validation errors has been replaced with a new XbrlDocument property called IsValid, as well as an XbrlDocument collection property called ValidationErrors that reports all of the validation errors found in an XBRL instance. The blog post at https://gepsio.wordpress.com/2013/06/07/new-document-validation-design-shipping-in-next-ctp/ contains detailed information on this new design.
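The shift from throwing an exception on the first problem to collecting every problem can be sketched in a few lines. This is a hypothetical illustration of the general pattern, not Gepsio's code: the ValidationError type, the ValidatedDocument class, the Validate method and its sample "undefined unit" rule are all invented for the example; only the names IsValid and ValidationErrors mirror the properties described above.

```csharp
using System.Collections.Generic;

// Hypothetical error record; Gepsio's actual error type may differ.
public class ValidationError
{
    public ValidationError(string message) { Message = message; }
    public string Message { get; private set; }
}

// Hypothetical document that records every validation failure
// instead of throwing on the first one.
public class ValidatedDocument
{
    private readonly List<ValidationError> errors = new List<ValidationError>();

    public IReadOnlyList<ValidationError> ValidationErrors { get { return errors; } }

    // The document is valid only if no check reported a problem.
    public bool IsValid { get { return errors.Count == 0; } }

    // Runs every check and keeps going after a failure, so callers see all problems at once.
    public void Validate(IEnumerable<string> definedUnitIds, IEnumerable<string> referencedUnitIds)
    {
        errors.Clear();
        var defined = new HashSet<string>(definedUnitIds);
        foreach (string id in referencedUnitIds)
        {
            if (!defined.Contains(id))
                errors.Add(new ValidationError("Fact references undefined unit '" + id + "'."));
        }
    }
}
```

A caller checks IsValid and, when it is false, iterates ValidationErrors to report every problem, rather than catching a single exception that describes only the first one.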

Enjoy this latest release!

Gepsio Performance Enhancements Available In About a Week

As mentioned in previous blog posts, two development efforts have been taking place recently:

  1. Adding support for new platforms to Gepsio:
      1. .NET 4.5
      2. Windows Store (also known as “Windows RT”) applications
      3. Windows Phone 8 devices
  2. Gepsio user who8877 has been submitting patches that dramatically improve Gepsio’s load-time performance

During an email exchange discussing the timing of the application of the patches, who8877 had this to say:

As well as helping me I think it would increase Gespio adoption. Perf seems to be one of the major barriers people are mentioning. Once people learn a different library they aren’t going to go back. If it weren’t for my absolute hatred of Java I’d probably be using a java library instead due to the perf issues.

That feedback is absolutely fair, and, while support for Windows 8 and Windows Phone is important to me, performance is even more important to Gepsio’s users. With that in mind, I am going to shelve the Windows 8 changes and apply the patches. I’m going to take the following approach:

  1. Shelve the in-progress support for .NET 4.5, Windows RT and Windows Phone 8
  2. Get back to the Sep 2012 CTP code base (wow – it’s been a long time since a release)
  3. Write performance-based unit tests to load the large XBRL instances that who8877 referenced in the patch notes
  4. Run the performance-based unit tests to get load timings from the Sep 2012 CTP code base
  5. Apply the patches submitted by who8877 (which will, presumably, leave me with a more performant Sep 2013 CTP code base)
  6. Run the performance-based unit tests to get load timings from the Sep 2013 CTP code base
  7. Document the performance improvements
  8. Release a new Gepsio Sep 2013 CTP binary to Codeplex and NuGet
  9. Unshelve the in-progress support for .NET 4.5, Windows RT and Windows Phone 8 and merge changes with the Sep 2013 CTP code base
  10. Continue the in-progress support for .NET 4.5, Windows RT and Windows Phone 8
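Steps 3, 4 and 6 involve timing document loads before and after the patches are applied. A minimal sketch of such a measurement using System.Diagnostics.Stopwatch follows; the LoadTimer helper and the decision to keep the best of several runs are assumptions for illustration, not the actual Gepsio unit tests.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for timing a load operation in a performance test.
public static class LoadTimer
{
    // Runs the load action several times and returns the fastest elapsed time,
    // which reduces noise from JIT compilation and disk caching on the first run.
    public static TimeSpan BestOf(int runs, Action load)
    {
        if (runs < 1) throw new ArgumentOutOfRangeException("runs");
        TimeSpan best = TimeSpan.MaxValue;
        for (int i = 0; i < runs; i++)
        {
            var stopwatch = Stopwatch.StartNew();
            load();
            stopwatch.Stop();
            if (stopwatch.Elapsed < best) best = stopwatch.Elapsed;
        }
        return best;
    }
}
```

Assuming XbrlDocument.Load() takes a file path, as its LoadAsync analogue does, a test might record `LoadTimer.BestOf(3, () => new XbrlDocument().Load(path))` against the Sep 2012 code base and again after the patches, then compare the two timings.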

I’m going to be starting this activity this evening, so look for a new code base announcement soon!

Three More Performance Patches Submitted

Gepsio user who8877 has submitted three more great performance patches for the Gepsio codebase:

Check out all of the patches at the Gepsio Patch page.

As has been mentioned previously, the Gepsio code base is being modified to add support for Windows 8 and Windows Phone 8; however, given the nature of these patches, that work may be shelved while these patches are folded into a release. A decision will be made in the next few days. A lot of work has gone into the Windows 8 and Windows Phone 8 support, but these patches are significant as well.

Many thanks to who8877 for the work on these patches! It’s folks like you who make Gepsio better and better.

Performance Enhancement Patch Source Available

Gepsio user who8877 has submitted a patch to the Gepsio source:

This patch moves the element lookup from a linear search to an O(log n) search by using a Linq ILookup instead of just a List<>. The ILookup is destroyed whenever the underlying List<> changes, and recreated on demand.

The patch applies to the Item.cs and XbrlSchema.cs files and is reported to give a nice performance boost to Gepsio. The XBRL document at http://www.sec.gov/Archives/edgar/data/1562476/000119312513333269/tmhc-20130630.xml was loaded by a privately-patched-and-built version of Gepsio with a massive 67% reduction in load time as a result of this patch. Well done!
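The lookup-caching technique described in the patch notes can be sketched as follows. This is a hypothetical illustration, not who8877's patch: the SchemaElement and ElementCatalog types are invented, and elements are keyed by name for the example. Note that Enumerable.ToLookup builds a hash-based Lookup<TKey,TElement>, so once the lookup exists, each query is close to constant time on average; the cost of rebuilding it is paid only after the underlying list has changed.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical schema element type used only for this illustration.
public class SchemaElement
{
    public SchemaElement(string name) { Name = name; }
    public string Name { get; private set; }
}

// Hypothetical collection that keeps a List<> as the source of truth and
// caches an ILookup built from it for fast name-based searches.
public class ElementCatalog
{
    private readonly List<SchemaElement> elements = new List<SchemaElement>();
    private ILookup<string, SchemaElement> byName; // null means "needs rebuilding"

    public void Add(SchemaElement element)
    {
        elements.Add(element);
        byName = null; // the list changed, so discard the cached lookup
    }

    public bool Remove(SchemaElement element)
    {
        bool removed = elements.Remove(element);
        if (removed) byName = null;
        return removed;
    }

    // Rebuilds the lookup on demand, then answers from it instead of scanning the list.
    public SchemaElement FindByName(string name)
    {
        if (byName == null)
            byName = elements.ToLookup(e => e.Name);
        // Indexing an ILookup with a missing key yields an empty sequence, not an exception.
        return byName[name].FirstOrDefault();
    }
}
```

Because Add and Remove only clear the cached lookup, a burst of additions during loading costs nothing extra; the lookup is rebuilt once, on the next FindByName call.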

If you’re feeling adventurous and would like to apply this patch yourself, you can download the latest changeset from the Changeset list, download the patch file from the Patch list, and apply the patch using your favorite patch tool.

This patch will end up in the next release of Gepsio, so, if you’re patient, you won’t need to apply the patch manually. Work is currently progressing in getting Gepsio to work for .NET 4.5, Windows Store/Windows RT and Windows Phone 8, and this patch will be applied to the source code after the platform work has been stabilized.

This patch is reminiscent of another performance enhancement that should probably go into the next release of Gepsio. Gepsio stores its internal collections of facts, units and the like in List<T> collections. A List<T> created with the default constructor starts with a capacity of zero; the first addition allocates room for four items, and whenever the capacity is exceeded, the list doubles it by allocating a larger internal buffer, copying the old buffer’s contents into the new one, and discarding the old buffer. You can read more about this design here.

This is an important performance consideration, because a large XBRL document contains hundreds of facts, and the repeated resizing of the lists used to maintain Gepsio’s object collections may well be a significant bottleneck (though that needs to be confirmed through some detailed code profiling). A better idea might be to set each list’s capacity at the outset to the number of nodes in the loaded XBRL document, since there can be no more items and units than there are nodes in the document, and then trim off the extra memory through a call to List<T>.TrimExcess() after all of the loading is complete.
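The preallocation idea might look like the following sketch. It assumes the instance is available as a LINQ to XML XDocument and uses the element count as the upper bound (every fact and unit is an element, so this bound is at least as tight as the node count). The FactList helper, and its treatment of any element carrying a contextRef attribute as a fact, are simplifications for illustration, not Gepsio's code.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: collects fact-like elements into a list whose capacity is
// reserved up front, so the list never has to grow and copy during loading.
public static class FactList
{
    public static List<XElement> Collect(XDocument document)
    {
        // Every fact is an element, so the element count is a safe upper bound.
        int upperBound = document.Descendants().Count();
        var facts = new List<XElement>(upperBound);

        // Simplification for the sketch: treat any element carrying a
        // contextRef attribute as a fact.
        foreach (XElement element in document.Descendants())
        {
            if (element.Attribute("contextRef") != null)
                facts.Add(element);
        }

        // Release the unused slots once loading is complete. TrimExcess only
        // reallocates when the list uses less than about 90% of its capacity.
        facts.TrimExcess();
        return facts;
    }
}
```

The trade-off is one extra pass to count elements and a temporarily oversized buffer, in exchange for never copying the list's contents while facts are being added.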