PostgreSQL enhancements in Norm.net

Recently, I've made a couple of enhancements to my Norm data access library, specifically to simplify working with PostgreSQL and improve performance.

If you don't know it already, Norm is a data-access library for .NET Core that falls into the same category as Dapper, but with a couple of unique features such as tuples and async streams.

You can check it out on the GitHub repo https://github.com/vbilopav/NoOrm.Net, and you can also read my earlier article on the subject: https://dev.to/vbilopav/net-identity-with-custom-postgresql-store-migrations-unit-tests-and-norm-net-57p8

Prepared statements

Most databases support creating a prepared (or compiled) version of the command sent to a data source. But not all of them will benefit from explicitly preparing statements.

SQL Server automatically caches and reuses execution plans for statements sent to the server. PostgreSQL doesn't.

It has to be told explicitly to do so, usually by calling Prepare. And a prepared statement is scoped to a single connection.
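
For illustration, here is roughly what explicit preparation looks like with plain Npgsql (a minimal sketch - it assumes an open NpgsqlConnection named connection, and the table and parameter names are made up for the example):

using var cmd = new NpgsqlCommand("select * from test_table where id = @id", connection);
// the parameter type has to be known up front for a prepared statement
cmd.Parameters.Add("id", NpgsqlTypes.NpgsqlDbType.Integer);
cmd.Prepare(); // the statement stays prepared on this physical connection
cmd.Parameters["id"].Value = 1;
using var reader = cmd.ExecuteReader();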

According to this article from an Npgsql developer - http://www.roji.org/prepared-statements-in-npgsql-3-2 - PostgreSQL can see approximately a 17% performance gain from using prepared statements.

And since Npgsql implements internal connection pooling, there is no worry that prepared statements will be discarded when a new connection is served from the pool.

However, as that article states, popular ORM libraries like Dapper and Entity Framework Core do not provide statement preparation (since SQL Server does that automatically, I guess they didn't even bother).

To work around that problem, the Npgsql team has implemented a couple of additional connection string parameters that enable automatic preparation.

From the documentation: https://www.npgsql.org/doc/prepare.html#automatic-preparation:

Npgsql 3.2 introduces automatic preparation. When turned on, this will make Npgsql track the statements you execute and automatically prepare them when you reach a certain threshold. When you reach that threshold, the statement is automatically prepared, and from that point on will be executed as prepared, yielding all the performance benefits discussed above. To turn on this feature, you simply need to set the Max Auto Prepare connection string parameter, which determines how many statements can be automatically prepared on the connection at any given time (this parameter defaults to 0, disabling the feature). A second parameter, Auto Prepare Min Usages, determines how many times a statement needs to be executed before it is auto-prepared (defaults to 5). Since no code changes are required, you can simply try setting Max Auto Prepare and running your application to see an immediate speed increase. Note also that, like explicitly-prepared statements, auto-prepared statements are persistent, allowing you to reap the performance benefits in short-lived connection applications.

However, the documentation recommends preparing your statements explicitly rather than automatically since:

Automatic preparation does incur a slight performance cost compared to explicit preparation, because of the internal LRU cache and various book-keeping data structures. Explicitly preparing also allows you to better control exactly which statements are prepared and which aren't, and ensures your statements will always stay prepared, and never get ejected because of the LRU mechanism.
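
If you just want to try automatic preparation, it is only a connection string change. A minimal sketch (host, database and credentials are placeholders):

var connectionString =
    "Host=localhost;Database=test_db;Username=test_user;Password=test_pass;" +
    "Max Auto Prepare=20;Auto Prepare Min Usages=2";

using var connection = new NpgsqlConnection(connectionString);
connection.Open();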

The new Norm feature lets you do exactly that - explicitly prepare a statement on the server:

  • Simply add a Prepared() call before sending a statement to PostgreSQL, like this:
connection.Prepared().Execute("...");

The Prepared() call applies only to the very next statement. Any other statements in the call chain will be executed normally (unprepared). For example:

connection.Prepared().Execute("statement 1").Execute("statement 2");

In this example, only statement 1 will be executed as prepared.

You can check the full list of prepared statements on your connection at any time by running select * from pg_prepared_statements.

To avoid overloading the server with too many prepared statements, parameters are sent separately and the statement itself stays parametrized, so there is only one prepared version regardless of the parameter values.
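
For example, these two calls (a sketch - test_table and the values are placeholders) should end up reusing one and the same prepared statement on the server, since only the parameter values differ:

connection.Prepared().Execute("insert into test_table values (@value)", ("value", "first"));
connection.Prepared().Execute("insert into test_table values (@value)", ("value", "second"));
// pg_prepared_statements should show a single parametrized entry for this statement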

Documentation also states:

You must also set the DbType or NpgsqlDbType on your parameters to unambiguously specify the data type (setting the value isn't supported).

This is a bit ambiguous; my tests have shown that statements are prepared successfully regardless of how parameters are supplied (positional, named, with or without an explicit type).

So, with Norm data access you can fine-tune prepared statements on your PostgreSQL server.

Postgres format parameters mode

Imagine a typical scenario - you need to send some updates to the database under a transaction and, of course, roll them back if anything goes wrong. And, of course, log a detailed error description.

How would you do it?

Perhaps the classical approach would work?

var tran = connection.BeginTransaction();
try
{
    connection
        .Execute("insert into test_table values ('test');")
        .Execute("insert into test_table values (null);");
    tran.Commit();
}
catch (Exception e)
{
    tran.Rollback();
    _logger.Error(e);                
}

But, wait a second!

With this approach, we are actually sending 4 statements to the server (begin, the two inserts, and commit). That means network latency will probably have the biggest impact on performance, especially in a cloud environment.

We are, of course, modern cloud developers, so we'll rewrite this to reduce latency a bit:

try
{
    connection.Execute(@"

        begin;
            insert into test_table values ('test');
            insert into test_table values (null);
        end;
    ");
}
catch (Exception e)
{
    connection.Execute("rollback");
    _logger.Error(e);                
}

That's a bit better - we are sending our 4 statements in a single call. And we can parametrize it normally, like any other statement, if we want.

OK, but what's the real issue here?

When this call breaks - and it will, on insert into test_table values (null); because that field is not nullable - the transaction will not actually be rolled back; it will go into the aborted state.

That's why the first statement in the catch block is rollback.

That wouldn't be such an issue if it weren't for the fact that any new statement on the same connection, issued after the error, will cause a new exception:

25P02: current transaction is aborted, commands ignored until end of transaction block

And you may notice that, after the transaction goes into the aborted state, we must wait for the server response and then send another statement to roll it back - which again has to travel over the network, so we wait once more on latency and bandwidth and all that.

In that time, another thread may want to reuse the same connection (remember, Npgsql has connection pooling by default) - which will then cause the dreaded current transaction is aborted, commands ignored until end of transaction block.

If you think that is very unlikely to happen, all I can say is that I've seen log files absolutely flooded with current transaction is aborted, commands ignored until end of transaction block, in many cases obscuring the original error message.

So, what can we do to avoid that?

The solution I've found is to send an anonymous procedural script block as a statement like this:

try
{
    connection.Execute(@"

        do $$
        begin
            insert into test_table values ('test');
            insert into test_table values (null);
        end
        $$;

    ");
}
catch (Exception e)
{
    _logger.Error(e);                
}

This is a PostgreSQL procedural language (PL/pgSQL) anonymous code block. PL/pgSQL is a procedural extension to normal SQL implemented in PostgreSQL that, besides regular SQL, offers procedural features such as control flow, variables, exception handling, etc. It is the same as writing a PostgreSQL function or procedure, only anonymous.
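
To illustrate those procedural features, here is a small sketch of an anonymous block that declares a variable and reports how many rows were inserted (test_table and the notice text are just placeholders):

connection.Execute(@"

    do $$
    declare
        inserted_count int;
    begin
        insert into test_table values ('test');
        get diagnostics inserted_count = row_count;
        raise notice 'inserted % row(s)', inserted_count;
    end
    $$;

");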

What is important here is that, in case of an exception, this will roll back your transaction automatically.

So we don't have to call rollback explicitly, and more importantly - we will not see that 25P02: current transaction is aborted, commands ignored until end of transaction block in our logs anymore.

But wait, that is just an example; in the real world we would send a bunch of parameters to that script - would that work?

Actually no.

Npgsql does not send any parameters to PL/pgSQL anonymous blocks.

This:

connection
    .Execute(@"

        do $$
        begin
            insert into test_table values (@foo1);
            insert into test_table values (@foo2);
            insert into test_table values (@foo1);
            insert into test_table values (@foo2);
        end
        $$;

", ("foo1", "foo1"), ("foo2", "foo2"));

will not parse parameters at all, and @foo1 will not be recognized.

That is, unless you use the new Norm feature UsingPostgresFormatParamsMode:

connection
    .UsingPostgresFormatParamsMode()
    .Execute(@"

        do $$
        begin
            insert into test_table values (@foo1);
            insert into test_table values (@foo2);
            insert into test_table values (@foo1);
            insert into test_table values (@foo2);
        end
        $$;

", ("foo1", "foo1"), ("foo2", "foo2"));

This will instruct Norm to replace the parameters by using PostgreSQL's built-in format function, which substitutes the corresponding parameter values into the statement. The format function safely quotes the supplied values, protecting against SQL injection.
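
To illustrate the idea (this is a sketch of what the format function does, not necessarily Norm's exact internal rewrite), format with the %L specifier quotes the supplied value as an SQL literal:

select format('insert into test_table values (%L);', 'test');
-- returns the text: insert into test_table values ('test');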

Neat. So, with this new Norm functionality UsingPostgresFormatParamsMode, you can safely parametrize your PostgreSQL procedural anonymous code blocks, avoid those pesky current transaction is aborted exceptions, and in most cases even gain performance by minimizing network round trips to the database. In the cloud era, that could be significant.

Also, to be noted - you can't use UsingPostgresFormatParamsMode with prepared statements. I've prevented that internally by throwing an exception :)

If you have any questions, do comment.

About me

I'm an independent software developer and consultant with more than 20 years of experience in the development of database-backed business applications.

If you're interested in this line of work or related consulting, send me a message or an email (details are on my GitHub account, or you can find me on LinkedIn) to see if I'm available at the moment.

You can also follow me on LinkedIn, where I post daily.