Archive

Archive for April, 2012

A Sudoku solver in T-SQL

Writing a Sudoku solver has become one of my favorite exercise puzzles. It is a small task but still complex enough to be challenging when tested in a new domain or context. So far I have written Sudoku solvers in C++, C#, F# and Powershell. This time my eyes fell upon T-SQL, which is the SQL dialect used by SQL Server. Why? (Do I need a reason? Smile) Well, at least, it seemed like a fun way to gain some more experience in that area.  When thinking back, I must admit that it was more on the terrible, frustrating side of the scale than anything else. Still very interesting, though. On the top of my head, some of the things that buggered me:

    I do not say that T-SQL sucks (well, maybe a little bit… Smile) but these are things that we have become accustomed to and expect in other contexts and other, more normal, programming languages. There are probably good reasons for every bump on the road.
    Anyway, as I said, it was still interesting so lets have a look at the solution. (Note: I used SQL Server 2012 but it might be possible to run the code in earlier versions.)
    I started out by writing code for creating the Sudoku board. I felt it was natural to represent the data using a table having to following definition:

CREATE TYPE BoardType AS TABLE (R INT, C INT, V INT)

Here is the function that generates such a table. Unfortunately, due to the restrictions of function return types I have to repeat the definition of BoardType in the function definition.

CREATE FUNCTION CreateBoard() RETURNS @board TABLE(R INT, C INT, V INT) AS

BEGIN

— Insert data

DECLARE @sudoku TABLE (id INT IDENTITY, V INT)

 

— ref: http://www.dctech.com/sudoku/samples/

INSERT INTO @sudoku (V) VALUES

  (0),(0),(9), (6),(0),(7), (4),(3),(1),

  (8),(0),(0), (0),(5),(3), (0),(0),(9),

  (0),(6),(0), (2),(0),(0), (5),(0),(0),                                                                    

 

  (0),(0),(8), (9),(0),(0), (0),(0),(6),

  (0),(0),(2), (0),(4),(0), (7),(0),(5),

  (0),(0),(0), (0),(0),(1), (0),(0),(0),                                                                    

 

  (0),(0),(0), (5),(9),(4), (3),(0),(2),

  (0),(2),(7), (0),(3),(0), (0),(1),(0),

  (4),(0),(0), (1),(0),(2), (6),(5),(0);

 

DECLARE @valueMap TABLE (VS INT, VT INT)

 

INSERT INTO @valueMap (VS, VT) VALUES

(0,1),(0,2),(0,3),(0,4),(0,5),(0,6),(0,7),(0,8),(0,9),

(1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,8),(9,9)

 

INSERT INTO @board

SELECT (ID1)/9 AS R, (ID1)%9 AS C, VT AS V FROM @sudoku

INNER JOIN @valueMap ON V = VS   

 

RETURN

END

 

As you may have noticed, I use a temporary table and create the final one from a join, in order to avoid writing too much.

 

Extracting rows, columns and blocks are naturally written using select statements.

 

CREATE FUNCTION GetRow(@board BoardType READONLY, @row INT, @col INT) RETURNS TABLE AS

RETURN SELECT A.R, A.C, A.V, A.D FROM (SELECT R, C, V, dbo.GetCellDimension(@board, R,C) AS D FROM @board) AS A

WHERE A.R = @row AND A.C <> @col AND A.D = 1

 

CREATE FUNCTION GetColumn(@board BoardType READONLY, @row INT, @col INT) RETURNS TABLE AS

RETURN SELECT A.R, A.C, A.V, A.D FROM (SELECT R, C, V, dbo.GetCellDimension(@board,R,C) AS D FROM @board) AS A

WHERE A.C = @col AND A.R <> @row AND A.D = 1

 

CREATE FUNCTION GetBlock(@board BoardType READONLY, @row INT, @col INT) RETURNS TABLE AS

RETURN SELECT A.R, A.C, A.V, A.D, A.BR, A.BC

FROM (SELECT R, C, V, dbo.GetCellDimension(@board,R,C) AS D, R/3 AS BR, C/3 AS BC FROM @board) AS A

WHERE A.BR = @row/3 AND A.BC = @col/3 AND NOT (A.C = @col AND A.R = @row) AND A.D = 1

 

Due to the the fact that execution order is different from the statements order, I use an inner select statement, which complicates matter slightly. This article on stackoverflow suggests this work around.

 

The next thing was to add constraint propagation. Instead of propagating constraints for one cell at the time, I found a way that was more in the flavor of set theory and thus simpler to express in T-SQL:

 

CREATE FUNCTION PropagateConstraints(@board BoardType READONLY) RETURNS @result TABLE(R INT, C INT, V INT) AS

BEGIN

INSERT INTO @result

SELECT R, C, V FROM @board

 

DELETE FROM @result

WHERE dbo.GetCellDimension(@board, R, C) > 1 AND V IN (

          SELECT V FROM dbo.GetRow(@board, R, C) UNION

          SELECT V FROM dbo.GetColumn(@board, R, C) UNION

          SELECT V FROM dbo.GetBlock(@board, R, C))

 

RETURN

END

 

The downside of this method is that it may not propagate constraints entirely, leaving cells having one value left that is still illegal. Detecting illegal boards like this is done by the GetErrorCount function which, along with the solver method, was the hardest to write.

 

CREATE FUNCTION GetErrorCount(@board BoardType READONLY) RETURNS INT AS

BEGIN

— Are all cells there?

DECLARE @cellCount INT

SELECT @cellCount = COUNT(DISTINCT R*9+C) FROM @board

IF @cellCount <> 81

          RETURN 1000

 

— Create a board where ambiguous cells are removed

DECLARE @distincts TABLE(IND INT, CNT INT)

 

INSERT INTO @distincts

SELECT R*9+C AS I, COUNT(*) AS CNT FROM @board GROUP BY (R*9+C)

 

DELETE FROM @distincts

WHERE CNT <> 1

 

DECLARE @distinctBoard AS TABLE(R INT, C INT, V INT)

 

INSERT INTO @distinctBoard

SELECT R, C, V

FROM @board

WHERE R*9+C IN (SELECT IND FROM @distincts)

 

— Figure out if the board is valid

DECLARE @errorCount INT

 

; WITH VerificationSet(A, B) AS

(

          SELECT COUNT(DISTINCT V), COUNT(V) FROM @distinctBoard GROUP BY R

          UNION ALL

          SELECT COUNT(DISTINCT V), COUNT(V) FROM @distinctBoard GROUP BY C

          UNION ALL

          SELECT COUNT(DISTINCT V), COUNT(V) FROM @distinctBoard GROUP BY R/3*3+C/3

)

SELECT @errorCount = SUM(CASE WHEN A = B THEN 0 ELSE 1 END)

FROM VerificationSet

 

RETURN @errorCount

END

 

The idea is to compare the distinct values to the total value count for each row, column and block. If the number of distinct values are less, we have an error.

 

Finally, we got the solver which is just to large to fit in here. Feel free to have a look in the final code at bitucket.

 

Having all the pieces in place we can now solve a puzzle using:

 

— Create board

DECLARE @board BoardType

INSERT INTO @board

SELECT R, C, V FROM dbo.CreateBoard();

 

— Solve puzzle

SELECT * FROM dbo.Solve(@board)

 

Ok, that was all. Thanks for reading.

Categories: Uncategorized

Base64 encoding

Base64 encoding is a way to convert, or encode, binary data using text characters only. I stumbled upon a question on stackoverflow concering how to use the boost library for such a task. Indeed, it turns out to be possible but the template type definition turns out to be, well, not for the faint hearted.

 typedef 
     insert_linebreaks<         // insert line breaks every 72 characters
         base64_from_binary<    // convert binary values of base64 characters
             transform_width<   // retrieve 6 bit integers from a sequence of 8 bit bytes
                 const char *,
                 6,
                 8
             >
         >
         ,72
     >
     base64_text; // compose all the above operations in to a new iterator

 

Wow. No matter how impressive this is, it is also hard to read and use, unfortunately. Just for curiosity, I wondered how much code that a plain, no tricks at all, implementation would require. So… I wrote one:

#include "stdafx.h"

#include <string.h>

 

unsigned char* base64_encode(const unsigned char* in, size_t length)

{

    const char * lookup_table =

        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

        "abcdefghijklmnopqrstuvwxyz"

        "0123456789"

        "+/";

 

    const size_t rlength = length%3==0 ? length/3*4+1 : length/3*4+5;

    unsigned char* out = new unsigned char[rlength];

    size_t i=0, j=0;

    for(; i<length-2; i+=3, j+=4)

    {

        out[j] = lookup_table[in[i] >> 2];

        out[j+1] = lookup_table[((in[i] & 0x03) << 4) + (in[i+1] >> 4)];

        out[j+2] = lookup_table[((in[i+1] & 0x0F) << 2) + (in[i+2] >> 6)];

        out[j+3] = lookup_table[in[i+2] & 0x3F];

    }

 

    if (i < length)

    {

        out[j] = lookup_table[in[i] >> 2];

        out[j+1] = lookup_table[((in[i] & 0x03) << 4) + (i+1 < length ? (in[i+1] >> 4) : 0)];

        out[j+2] = i+1 < length ? lookup_table[(in[i+1] & 0x0F) << 2] : ‘=’;   

        out[j+3] = ‘=’;

        j+=4;

    }

 

    out[j] = 0;

    return out;

}

 

char* base64_encode_str(const char* in, size_t length)

{

    return reinterpret_cast<char*>(base64_encode( reinterpret_cast<const unsigned char*>(in), length));

}

 

int _tmain(int argc, _TCHAR* argv[])

{

    const char* in = "this is sample data that could have been binary as well";

    char* out = base64_encode_str(in, ::strlen(in));

    printf("%s -> %s\n", in, out);

    delete[] out;

}

 

To be fair, this one does not handle line breaks which the boost version did. On the other hand it handles data sizes that are not multiples of three. So, this was my version. If you need a library you might want to have a look at this one instead. A gpu version (which is incredibly cool of course) exists on codeplex. In .NET it is more convenient to turn to Convert.ToBase64.

Categories: Uncategorized