Strtod in C# – Part 2: The implementation

The way we’re going to parse the number is from the string is the same way as in “What Every Computer Scientist Should Know About Floating-Point Arithmetic” I linked to last time (note this is for a single precision number, hence nine digits – we’ll need to use more):

First read in the 9 decimal digits as an integer N, ignoring the decimal point. [..] Next find the appropriate power 10P necessary to scale N. This will be a combination of the exponent of the decimal number, together with the position of the (up until now) ignored decimal point. [..] Finally multiply (or divide if P < 0) N and 10|P|.

The style I’ve used to parse the string is a series of static functions that take the string, the current index and return the index of the end of their data, passing any extracted data through an out parameter. In code that looks like this:

private static int SkipWhiteSpace(string str, int index);
private static int ParseSign(string str, int index, out int value);

I could have made the string and index member variables, but if the parsing failed I would need to roll back the index. I also toyed with the idea of using an iterator, but I couldn’t see an advantage. Also, when parsing the significand two values need to be returned to take into account the decimal place. Sure, I could have used a Tuple to accomplish this, put I went for the above to keep everything consistent.

I’m not going to go through each method here (you can download the updated code here) but I’ll go through the main sequence of events.

  • First we validate the arguments. This will be one less worry for the private methods knowing the passed in values are sane.
  • Skip any whitespace, remembering the position of the first non-whitespace character for later.
  • Try to parse a number. This means searching for a sign, skip any leading zeros and then parse the significand. Parsing the significand will give us a number and an exponent, in case the number has a decimal point. If we’ve parsed any digits (including the leading zeros) then we can check for an exponent and adjust the exponent of the significand accordingly. Finally, we can create a double from the sign, significand (stored in a 64-bit integer) and exponent.
  • If we didn’t parse any digits then we need to search for +/- infinity or NaN, after skipping the whitespace.
  • Failing that then we couldn’t get any number so set the end to the starting index and return 0.

It’s actually quite difficult to explain in words, so here’s a code snippet:

public static double Parse(string input, int start, out int end)
{
    if (input == null)
    {
        throw new ArgumentNullException("input");
    }
    if (start < 0)
    {
        throw new ArgumentOutOfRangeException("start", "Value cannot be negative.");
    }
    if (start > input.Length)
    {
        throw new ArgumentOutOfRangeException("start", "Value must be less then input.Length");
    }

    int endOfWhitespace = SkipWhiteSpace(input, start);

    long significand;
    int significandsExponent, sign;
    int startOfDigits = ParseSign(input, endOfWhitespace, out sign);
    int index = SkipLeadingZeros(input, startOfDigits);
    index = ParseSignificand(input, index, out significand, out significandsExponent);

    // Have we parsed a number?
    if ((index - startOfDigits) > 0)
    {
        int exponent;
        end = ParseExponent(input, index, out exponent);
        return MakeDouble(significand * sign, exponent - significandsExponent);
    }

    // Not a number, is it a constant?
    double value;
    end = ParseNamedConstant(input, endOfWhitespace, out value);
    if (end != endOfWhitespace)
    {
        return value;
    }

    // If we're here then we couldn't parse anything.
    end = start;
    return default(double);
}

One thing that caught me out (but picked up in the unit tests) was creating the double from the significand and the exponent when the exponent is less than -308 – there exists a category of floating point numbers that are denormalized so I needed to add a check in the MakeDouble function to handle these, as it was rounding them down to zero.

Strtod in C# – Part 1: The specification

As mentioned, parsing the points in an SVG polygon would be a lot easier (and quicker?) if we had the strtod function in C#. Well, let’s give it a go! Now, dealing with floating point numbers is tricky so I’m probably going to mess it up on some corner cases, so to limit the damage we’re going to create some tests that our function must be able to handle. These tests will also help create our specification, hopefully!

Precision

First of all, how accurate do we have to be? Well, §4.3 states:

a number has the capacity for at least a single-precision floating point number

It also goes on to mention that it’s preferable that double-precision be used for calculations, so we may as well parse to double to aid in calculations, knowing that any value stored in the SVG should fall within a valid range.

So, how many digits are we going to parse? What Every Computer Scientist Should Know About Floating-Point Arithmetic by David Goldberg is by far the best work on trying to understand floating point numbers. I don’t claim to understand it all, however, in the precision section he mentions that 17 digits are enough to recover a double precision binary number. However, that’s not quite correct – 17 significant digits are required, as 12345678901234567 and 000000000012345678901234567 are the same number.

Here’s what we’ll test (also making sure that the whole string is read):

Input Expected
12345678901234567 12345678901234567
000000000012345678901234567 12345678901234567
12345678901234567890 12345678901234567000
1.2345678901234567 1.2345678901234567
0.00000000012345678901234567 0.00000000012345678901234567
1.00000000012345678901234567 1.0000000001234567
1234567890.00000000001234567 1234567890

Underflow/Overflow

What happens when the value is too small (as in very close to zero, not as in a negative number is outside of the valid range) or too large to represent?

System.Double.Parse makes numbers that are too small to be represented by double silently underflow into zero. This also happens when converting a very small double to float. However, if the number is outside the range of double then System.Double.Parse throws a System.OverflowException. This doesn’t make sense to me, especially since casting a big double to float will convert it to infinity. In this situation, strtod returns HUGE_VAL and this is the route I’ll take – numbers that are too big to fit inside the range of a double will be returned as +/- infinity and numbers that are very close to zero that double cannot represent will be truncated to zero.

Therefore, our tests will make sure the following happen (again making sure all input is read):

Input Expected
+4.9406564584124654E-324 4.9406564584124654E-324
+1.7976931348623157E+308 1.7976931348623157E+308
-4.9406564584124654E-324 -4.9406564584124654E-324
-1.7976931348623157E+308 -1.7976931348623157E+308
+1E-325 0
+1E+309 Infinity
-1E-325 0
-1E+309 -Infinity

Infinity/Not a Number

There’s an interesting point to take into consideration when parsing – the SVG specification seems to only allows numbers, yet in XML, INF, -INF and NaN are valid.

When parsing we’ll try to be as flexible as possible, so will allow "Inf", "Infinity" and "NaN" (all case-insensitive).

Valid Formats

The number section of the standard gives the following EBNF grammar for a valid number:

integer ::= [+-]? [0-9]+
number  ::= integer ([Ee] integer)?
            | [+-]? [0-9]* "." [0-9]+ ([Ee] integer)?

What’s interesting about this is that numbers with a trailing decimal point are invalid (i.e 0. doesn’t match the grammar). We’ll assume that’s an oversight and allow it (System.Double.Parse and strtod have no problems with it.) However, we need to be careful that a single decimal point is not parsed.

Testing this is a bit more involved, as strtod will parse as much of the input as it can, so while 0e++0 looks invalid, our function should be able to parse the first zero and then stop when it gets to the e. To test this we therefore need to make sure that our function does not consume the whole string, just the first few characters.

The Code

Think that’s all for now. Here’s the test cases; the actual class will follow later.

Eventually I’d like to allow for different culture settings, but for now I’m concentrating on the SVG spec.

SVG browser resizing

The advantage of using SVG is in its name – Scalable Vector Graphics. You can create your image at whatever size you want and later you can resize it without it getting blocky/blurry etc.

That’s great in theory. I’ve had problems in the past where I would set the logo of a website as a semi-transparent background but the page would centre it instead of scaling it. This quickly led me to use the viewBox property and not the width + height properties.

This led to a problem in webkit based browsers. Turns out you need to provide both – the viewBox and the width + height. See this post for an excellent comparison of SVG content in the different browsers.