I finally found some time to finish my SATR tiling algorithm for CloudAE. Unlike the previous STAR algorithms, which were built around segmented tiling, this approach uses more efficient spatial indexing to eliminate temporary files and to minimize both read and write operations. Another difference with this approach is that increasing the allowed memory usage for the tiling process will substantially improve the performance by lowering the duplicate read multiplier and taking advantage of more sequential reads. For instance, increasing the indexing segment size from 256 MB to 1 GB will generally reduce the tiling time by 50% on magnetic media, and 30% on an SSD.
SATR Development Finalized
Snail XML Parser
Years ago, in the days when I developed my PHP CMS, I decided to start a new project to give me more practice with new PHP language features. In the end, I decided to build a parser for malformed HTML.
Lets all just take a moment to let that sink in...I built an HTML parser in PHP, in spite of the availability of modules such as libxml. Now, it's not quite as bad as it sounds. I did first look into the current options for markup parsing and recognized that my parser was not going to improve on any of them. I was really just doing it for fun.
So, I jumped right in. HTML parsing is fairly straightforward, and in the end I had a robust DOM parser that handled all manner of different malformed conditions. Once that was complete, I continued adding functionality until eventually I had a full validating parser. Throughout this development, I didn't spend much time thinking about performance. So, as much as I liked it, it would never have been suitable for a production environment.
Recently, I ran across the old source code for that application and decided to port it to .NET, just out of curiosity. I knew that I had not considered performance much during that phase of my programming career, and I wondered just how slow it was. After converting it to C#, I benchmarked it against some current parsers, and discovered that it was abysmal.
So now it was time for a new project. I looked into modern DOM, SAX, and Pull parsers and did not find any .NET implementations that could rival the speed of parsers such as pugixml, AsmXml, and RapidXml (although in the latter case some of that speed comes from being a bit loose with the spec). Having done a lot of .NET benchmarking for my CloudAE project, I was curious how fast of a conforming HTML/XML parser I could write in C#.
Welcome the Snail XML Parser.
PropertyManager
When I started programming as a lad, I initially used INI files to manage configuration settings. Lightweight and easy to parse, they were a simple way to get started, whether I was rolling my own parsing or, later, using existing parsing utilities. Soon, however, the allure of the Windows Registry drew me in, and I began using it almost exclusively for configuration settings. I found the registry convenient for most purposes, and only resorted to INI files for portable applications that I would run from a disk. This state of affairs lasted almost a decade, until I started to encounter registry permission problems on newer operating systems with improved user security controls. I finally started adopting some different configuration mechanisms.
The Application Settings mechanism is the default way to persist and manage application configuration settings in .NET. For those who prefer to adjust the behavior, it supports custom persistence implementations. This feature allows design-time development of application and user settings, and is improved by adding ConfigurationValidatorAttribute
constraints.
And now, here I go, flying in the face of convention. I dislike managing settings outside the scope of the code that will use them, so I have written a PropertyManager
which uses generics, reflection and delegates to provide a good alternative to the built-in Application Settings. It allows the declaration of properties at a more reasonable scope, automated discovery, and simple run-time management.
private
static
readonly
IPropertyState<ByteSizesSmall> PROPERTY_SEGMENT_SIZE;
private
static
readonly
IPropertyState<bool
> PROPERTY_REUSE_TILING;
static
ProcessingSet()
{
PROPERTY_SEGMENT_SIZE = Context.RegisterOption(Context.OptionCategory.Tiling, "MaxSegmentSize"
, ByteSizesSmall.MB_256);
PROPERTY_REUSE_TILING = Context.RegisterOption(Context.OptionCategory.Tiling, "UseCache"
, true
);
}
Once the properties have been defined, they can be used easily through the Value property, similar to the usage of a Nullable<T>
.
if
(PROPERTY_REUSE_TILING.Value)
{
// do something exciting
}
So, what makes this all happen? Other than some validation, it boils down to the call to PropertyManager.Create()
.
public
static
IPropertyState<T> RegisterOption<T>(OptionCategory category, string
name, T defaultValue)
{
if
(!Enum.IsDefined(typeof
(OptionCategory), category))
throw
new
ArgumentException("Invalid category."
);
if
(string
.IsNullOrWhiteSpace(name))
throw
new
ArgumentException("Option registration is empty."
, "name"
);
if
(name.IndexOfAny(new
[] { '.'
, '\'
, ' '
, ':'
}) > 0)
throw
new
ArgumentException("Option registration contains invalid characters."
, "name"
);
string
categoryName = Enum.GetName(typeof
(OptionCategory), category);
string
optionName = String.Format("{0}.{1}"
, categoryName, name);
IPropertyState<T> state = null
;
var
propertyName = PropertyManager.CreatePropertyName(optionName);
if
(c_registeredProperties.ContainsKey(propertyName))
{
state = c_registeredProperties[propertyName] as
IPropertyState<T>;
if
(state == null
)
throw
new
Exception("Duplicate option registration with a different type for {0}."
);
WriteLine("Duplicate option registration: "
, propertyName);
}
else
{
state = PropertyManager.Create(propertyName, defaultValue);
c_registeredProperties.Add(propertyName, state);
c_registeredPropertiesList.Add(state);
}
return
state;
}
The static property manager contains the necessary methods to create, update and retrieve properties. It wraps an IPropertyManager
instance which knows the details about persistence and conversion for the storage mode that it represents. I have standard implementations for the Registry, XML, and a web service.
public
interface
IPropertyManager
{
PropertyName CreatePropertyName(string
name);
PropertyName CreatePropertyName(string
prefix, string
name);
IPropertyState<T> Create<T>(PropertyName name, T defaultValue);
bool
SetProperty(PropertyName name, ISerializeStateBinary value
);
bool
GetProperty(PropertyName name, ISerializeStateBinary value
);
bool
SetProperty(IPropertyState state);
bool
GetProperty(IPropertyState state);
}
As for data binding, just create a DataGrid
with a TwoWay
binding on Value
, and we have ourselves a property editor.
dataGrid.ItemsSource = Context.RegisteredProperties;
The main downside with this approach to application and user settings is that configuration validators cannot be used as attributes on the Value property of the IPropertyState<T>
. The workaround for this is validation delegates which work just as well, but are not quite as nice visually.
Point Enumeration in CloudAE
In my last post, I referenced the block-based nature of point cloud handling in CloudAE. The following example shows the basic format for enumerating over point clouds using the framework. At this level, the point source could be a text file, an LAS or LAZ file, or a composite of many individual files of the supported types. The enumeration hides all such details from the consumer.
using
(var
process = progressManager.StartProcess("ChunkProcess"
))
{
foreach
(var
chunk in
source.GetBlockEnumerator(process))
{
byte
* pb = chunk.PointDataPtr;
while
(pb < chunk.PointDataEndPtr)
{
SQuantizedPoint3D* p = (SQuantizedPoint3D*)pb;
// evaluate point
pb += chunk.PointSizeBytes;
}
}
}
This can be simplified even more by factoring the chunk handling into IChunkProcess instances, which can encapsulate analysis, conversion, or filtering operations.
var
chunkProcesses = new
ChunkProcessSet(
quantizationConverter,
tileFilter,
segmentBuffer
);
using
(var
process = progressManager.StartProcess("ChunkProcessSet"
))
{
foreach
(var
chunk in
source.GetBlockEnumerator(process))
chunkProcesses.Process(chunk);
}
The chunk enumerators handle progress reporting and checking for cancellation messages. In addition, they hide any source implementation details, transparently reading from whatever IStreamReader is implemented for the underlying sequential sources.
Using LASzip from C#
Compiling LASzip is simple, but what what does performance look like when using LASzip in a managed environment? The first thing to realize is that accessing points individually is very expensive across a managed boundary. That means that using an equivalent of P/Invoke individually for each point will add a substantial amount of overhead in a C# context. To reduce the number of interop thunks which need to occur, the most important step is to write an intermediate class in native C++ which can retrieve the individual points and return them in blocks, transforming calls from this:
bool LASunzipper::read(unsigned char * const * point);
...to something like this:
int LAZBlockReader::Read(unsigned char* buffer, int offset, int count);
The next, less important, performance consideration is to create a C++/CLI interop layer to interface with the block reader/writer. This allows us to hide details like marshaling and pinning, and uses the C++ Interop, which provides optimal performance compared to P/Invoke.
For my situation, this is exactly what I want, since CloudAE is built around chunk processing anyway. For other situations, both the "block" transformation and the interop layer can be an annoying sort of overhead, so it should definitely be benchmarked to determine whether the thunk reduction cost is worth it.
The final factor determining the performance of LASzip is the file I/O. In LAStools, Martin Isenburg uses a default io_buffer_size
parameter that is currently 64KB. Using a similarly appropriate buffer size is the easiest way to get reasonable performance. Choosing an ideal buffer size is a complex topic that has no single answer, but anything from 64KB to 1MB is generally acceptable. For those not familiar with the LASzip API, LASunzipper
can use either a FILE
handle or an iostream
instance, and either of these types can use a custom buffer size.
One caveat that I mentioned in my last post is that when compiling a C++/CLI project in VS 2010, the behavior of customizing iostream buffer sizes is buggy. As a result, I ended up using a FILE
handle and setvbuf()
. The downside of this approach is that LAZ support in my application cannot currently use all my optimized I/O options, such as using FILE_FLAG_NO_BUFFERING
when appropriate.
For an example of using the LASzip API from C++, check out the libLAS source.
The Cost of Double.TryParse
After extensive testing, I have optimized the text to binary conversion for ASCII XYZ files. These files are a simple delimited format with XYZ values along with any number of additional attributes. My original naive approach used StreamReader.ReadLine()
and Double.TryParse()
. I quickly discovered that when converting billions of points from XYZ to binary, the parse time was far slower than the output time, making it the key bottleneck for the process.
Although the TryParse methods are convenient for normal use, they are far too slow for my purposes, since the points are in a simple floating point representation. I implemented a reader to optimize the parse for that case, ignoring culture rules, scientific notation, and many other cases that are normally handled within the TryParse. In addition, the parse performance of atof()
-style operations varies considerably between implementations. The best performance I could come up with was a simple variation of a common idea with the addition of a lookup table. The main cost of the parse is still the conditional branches.
In the end, I used custom parsing to identify lines directly in bytes without the overhead of memory allocation/copying and converting to unicode strings. From there, I parse out the digits using the method I described. I also made a variation that used only incremented pointers instead of array indices, but in the current .NET version, the performance was practically identical, so I reverted to the index version for ease of debugging.
The following test code provides reasonable performance for parsing three double-precision values per line.
bool
ParseXYZ(byte
* p, int
start, int
end, double
* xyz)
{
for
(int
i = 0; i < 3; i++)
{
long
digits = 0;
// find start
while
(start < end && (p[start] < '0'
|| p[start] > '9'
))
++start;
// accumulate digits (before decimal separator)
int
currentstart = start;
while
(start < end && (p[start] >= '0'
&& p[start] <= '9'
))
{
digits = 10 * digits + (p[start] - '0'
);
++start;
}
// check for decimal separator
if
(start > currentstart && start < end && p[start] == '.'
)
{
int
decimalPos = start;
++start;
// accumulate digits (after decimal separator)
while
(start < end && (p[start] >= '0'
&& p[start] <= '9'
))
{
digits = 10 * digits + (p[start] - '0'
);
++start;
}
xyz[i] = digits * c_reciprocal[start - decimalPos - 1];
}
else
xyz[i] = digits;
if
(start == currentstart || digits < 0)
return
false
; // no digits or too many (overflow)
}
return
true
;
}