libpqxx  7.8.1
Supporting additional data types

Communication with the database mostly happens in a text format. When you include an integer value in a query, either you use to_string to convert it to that text format, or under the bonnet, libpqxx does it for you. When you get a query result field "as a float," libpqxx converts from the text format to a floating-point type. These conversions are everywhere in libpqxx.

The conversion system supports many built-in types, but it is also extensible. You can "teach" libpqxx (in the scope of your own application) to convert additional types of values to and from PostgreSQL's string format.

This is massively useful, but it's not for the faint of heart. You'll need to specialise some templates. And, the API for doing this can change with any major libpqxx release.

If that happens, your code may fail to compile with the newer libpqxx version, and you'll have to go through the NEWS file to find the API changes. Usually it'll be a small change, like an additional function you need to implement, or a constant you need to define.

Converting types

In your application, a conversion is driven entirely by a C++ type you specify. The value's SQL type on the database side has nothing to do with it. Nor is there anything in the string that would identify its type. Your code says "convert to this type" and libpqxx does it.

So, if you've SELECTed a 64-bit integer from the database, and you try to convert it to a C++ short, one of two things will happen: either the number is small enough to fit in your short and it just works, or else it throws a conversion exception. Similarly, if you try to read a 32-bit SQL int as a C++ 32-bit unsigned int, that'll work fine, unless the value happens to be negative. In such cases the conversion will throw a conversion_error.

Or, your database table might have a text column, but a given field may contain a string that looks just like a number. You can convert that value to an integer type just fine. Or to a floating-point type. All that matters to the conversion is the actual value, and the type your code specifies.

In some cases the templates for these conversions can tell the type from the arguments you pass them:

auto x = to_string(99);

In other cases you may need to instantiate template explicitly:

auto y = from_string<int>("99");

Supporting a new type

Let's say you have some other SQL type which you want to be able to store in, or retrieve from, the database. What would it take to support that?

Sometimes you do not need complete support. You might need a conversion to a string but not from a string, for example. You write out the conversion at compile time, so don't be too afraid to be incomplete. If you leave out one of these steps, it's not going to crash at run time or mess up your data. The worst that can happen is that your code won't build.

So what do you need for a complete conversion?

First off, of course, you need a C++ type. It may be your own, but it doesn't have to be. It could be a type from a third-party library, or even one from the standard library that libpqxx does not yet support.

First thing to do is specialise the pqxx::type_name variable to give the type a human-readable name. That allows libpqxx error messages and such to talk about the type. If you don't define a name, libpqxx will try to figure one out with some help from the compiler, but it may not always be easy to read.

Then, does your type have a built-in null value? For example, a char * can be null on the C++ side. Or some types are always null, such as nullptr. You specialise the pqxx::nullness template to specify the details.

Finally, you specialise the pqxx::string_traits template. This is where you define the actual conversions.

Let's go through these steps one by one.

Your type

You'll need a type for which the conversions are not yet defined, because the C++ type is what determines the right conversion. One type, one set of conversions.

The type doesn't have to be one that you create. The conversion logic was designed such that you can build it around any type. So you can just as easily build a conversion for a type that's defined somewhere else. There's no need to include any special methods or other members inside the type itself. That's also why libpqxx can convert built-in types like int.

By the way, if the type is an enum, you don't need to do any of this. Just invoke the preprocessor macro PQXX_DECLARE_ENUM_CONVERSION, from the global namespace near the top of your translation unit, and pass the type as an argument.

The library also provides specialisations for std::optional<T>, std::shared_ptr<T>, and std::unique_ptr<T>. If you have conversions for T, you'll also automatically have conversions for those.

Specialise type_name

When errors happen during conversion, libpqxx will compose error messages for the user. Sometimes these will include the name of the type that's being converted.

To tell libpqxx the name of each type, there's a template variable called pqxx::type_name. For any given type T, it should have a specialisation that provides that T's human-readable name:

// T is your type.
namespace pqxx
{
template<> std::string const type_name<T>{"My T type's name"};
}

(Yes, this means that you need to define something inside the pqxx namespace. Future versions of libpqxx may move this into a separate namespace.)

Define this early on in your translation unit, before any code that might cause libpqxx to need the name. That way, the libpqxx code which needs to know the type's name can see your definition.

Specialise nullness

A struct template pqxx::nullness defines whether your type has a natural "null value" built in. If so, it also provides member functions for producing and recognising null values.

The simplest scenario is also the most common: most types don't have a null value built in. There is no "null `int`" in C++. In that kind of case, just derive your nullness traits from pqxx::no_null as a shorthand:

// T is your type.
namespace pqxx
{
template<> struct nullness<T> : pqxx::no_null<T> {};
}

(Here again you're defining this in the pqxx namespace.)

If your type does have a natural null value, the definition gets a little more complex:

namespace pqxx
{
// T is your type.
template<> struct nullness<T>
{
// Does T have a value that should translate to an SQL null?
static constexpr bool has_null{true};
// Does this C++ type always denote an SQL null, like with nullptr_t?
static constexpr bool always_null{false};
static bool is_null(T const &value)
{
// Return whether "value" is null.
return ...;
}
[[nodiscard]] static T null()
{
// Return a null value.
return ...;
}
};
}

You may be wondering why there's a function to produce a null value, but also a function to check whether a value is null. Why not just compare the value to the result of null()? Because two null values may not be equal (like in SQL, where NULL <> NULL). Or T may have multiple different null values. Or T may override the comparison operator to behave in some unusual way.

As a third case, your type may be one that always represents a null value. This is the case for std::nullptr_t and std::nullopt_t. In that case, you set nullness<TYPE>::always_null to true (as well as has_null of course), and you won't need to define any actual conversions.

Specialise string_traits

This part is the most work. You can skip it for types that are always null, but those will be rare.

The APIs for doing this are designed so that you don't need to allocate memory on the free store, also known as "the heap": new/delete. Memory allocation can be hidden inside std::string, std::vector, etc. The conversion API allows you to use std::string for convenience, or memory buffers for speed.

Start by specialising the pqxx::string_traits template. You don't absolutely have to implement all parts of this API. Generally, if it compilers, you're OK for the time being. Just bear in mind that future libpqxx versions may change the API — or how it uses the API internally.

namespace pqxx
{
// T is your type.
template<> struct string_traits<T>
{
// Do you support converting T to PostgreSQL string format?
static constexpr converts_to_string{true};
// Do you support converting PostgreSQL string format to T?
static constexpr converts_from_string{true};
// If converts_to_string is true:
// Write string version into buffer, or return a constant string.
static zview to_buf(char *begin, char *end, T const &value);
// Write string version into buffer.
static char *into_buf(char *begin, char *end, T const &value);
// Converting value to string may require this much buffer space at most.
static std::size_t size_buffer(T const &value) noexcept;
// If converts_from_string is true:
// Parse text as a T value.
static T from_string(std::string_view text);
};
}

You'll also need to write those member functions, or as many of them as needed to get your code to build.

from_string

We start off simple: from_string parses a string as a value of T, and returns that value.

The string may or may not be zero-terminated; it's just the string_view from beginning to end (with end being exclusive). In your tests, be sure to cover cases where the string does not end in a zero byte!

It's perfectly possible that the string doesn't actually represent a T value. Mistakes happen. There can be corner cases. When you run into this, throw a pqxx::conversion_error.

(Of course it's also possible that you run into some other error, so it's fine to throw different exceptions as well. But when it's definitely "this is not the right format for a `T`," throw conversion_error.)

to_buf

In this function, you convert a value of T into a string that the postgres server will understand.

The caller will provide you with a buffer where you can write the string, if you need it: from begin to end exclusive. It's a half-open interval, so don't access *end.

If the buffer is insufficient for you to do the conversion, throw a pqxx::conversion_overrun. It doesn't have to be exact: you can be a little pessimistic and demand a bit more space than you need. Just be sure to throw the exception if there's any risk of overrunning the buffer.

You don't have to use the buffer for this function though. For example, pqxx::string_traits<bool>::to_buf returns a compile-time constant string and completely ignores the buffer.

Even if you do use the buffer, your string does not have to start at the beginning of the buffer. For example, the integer conversions may work from right to left, if that's easier: they can start by writing the least significant digit to the end of the buffer, divide the remainder by 10, and repeat for the next digit.

Return a pqxx::zview. This is basically a std::string_view, but with one difference: when you create a zview you guarantee that there is a valid zero byte right after the string_view. The zero byte does not count as part of its size, but it has to be there.

Expressed in code, this rule must hold:

void invariant(zview z)
{
assert(z[std::size(z)] == 0);
}

The trailing zero should not go inside the zview, but if you convert into the buffer, do make sure you that trailing stays inside the buffer, i.e. before the end. (If there's no room for that zero inside the buffer, throw pqxx::conversion_error).

Beware of locales when converting. If you use standard library features like sprintf, they may obey whatever locale is currently set on the system where the code runs. That means that a simple integer like 1000000 may come out as "1000000" on your system, but as "1,000,000" on mine, or as "1.000.000" for somebody else, and on an Indian system it may be "1,00,000". Don't let that happen, or it will confuse things. Use only non-locale-sensitive library functions. Values coming from or going to the database should be in fixed, non-localised formats.

If your conversions need to deal with fields in types that libpqxx already supports, you can use the conversion functions for those: pqxx::from_string, pqxx::to_string, pqxx::to_buf. They in turn will call the string_traits specialisations for those types. Or, you can call their string_traits directly.

into_buf

This is a stricter version of to_buf. All the same requirements apply, but in addition you must write your string into the given buffer, starting exactly at begin.

That's why this function returns just a simple pointer: the address right behind the trailing zero. If the caller wants to use the string, they can find it at begin. If they want to write another value into the rest of the buffer, they can continue writing at the location you returned.

size_buffer

Here you estimate how much buffer space you need for converting a T to a string. Be precise if you can, but pessimistic if you must. It's usually better to waste a few bytes of space than to spend a lot of time computing the exact buffer space you need. And failing the conversion because you under-budgeted the buffer is worst of all.

Include the trailing zero in the buffer size. If your to_buf takes more space than just what's needed to store the result, include that too.

Make size_buffer a constexpr function if you can. It can allow the caller to allocate the buffer on the stack, with a size known at compile time.

Optional: Specialise is_unquoted_safe

When converting arrays or composite values to strings, libpqxx may need to quote values and escape any special characters. This takes time.

Some types though, such as integral or floating-point types, can never have any special characters such as quotes, commas, or backslashes in their string representations. In such cases, there's no need to quote or escape such values in SQL arrays or composite types.

If your type is like that, you can tell libpqxx about this by defining:

namespace pqxx
{
// T is your type.
template<> inline constexpr bool is_unquoted_safe<T>{true};
}

The code that converts this type of field to strings in an array or a composite type can then use a simpler, more efficient variant of the code. It's always safe to leave this out; it's just an optimisation for when you're completely sure that it's safe.

Do not do this if a string representation of your type may contain a comma; semicolon; parenthesis; brace; quote; backslash; newline; or any other character that might need escaping.

Optional: Specialise param_format

This one you don't generally need to worry about. Read on if you're writing a type which represents raw binary data, or if you're writing a template where some specialisations may contain raw binary data.

When you call parameterised statements, or prepared statements with parameters, libpqxx needs to pass your parameters on to libpq, the underlying C-level PostgreSQL client library.

There are two formats for doing that: text and binary. In the first, we represent all values as strings in the PostgreSQL text format, and the server then converts them into its own internal binary representation. That's what those string conversions above are all about, and it's what we do for almost all types of parameters.

But we do it differently when the parameter is a contiguous series of raw bytes and the corresponding SQL type is BYTEA. There is a text format for those, but we bypass it for efficiency. The server can use the binary data in the exact same form, without any conversion or extra processing. The binary data is also twice as compact during transport.

(People sometimes ask why we can't just treat all types as binary. However the general case isn't so clear-cut. The binary formats are not documented, there are no guarantees that they will be platform-independent or that they will remain stable across postgres releases, and there's no really solid way to detect when we might get the format wrong. On top of all that, the conversions aren't necessarily as straightforward and efficient as they sound. So, for the general case, libpqxx sticks with the text formats. Raw binary data alone stands out as a clear win.)

Long story short, the machinery for passing parameters needs to know: is this parameter a binary string, or not? In the normal case it can assume "no," and that's what it does. The text format is always a safe choice; we just try to use the binary format where it's faster.

The param_format function template is what makes the decision. We specialise it for types which may be binary strings, and use the default for all other types.

"Types which _may_ be binary"? You might think we know whether a type is a binary type or not. But there are some complications with generic types.

Templates like std::shared_ptr, std::optional, and so on act like "wrappers" for another type. A std::optional<T> is binary if T is binary. Otherwise, it's not. If you're building support for a template of this nature, you'll probably want to implement param_format for it.

The decision to use binary format is made based on a given object, not necessarily based on the type in general. Look at std::variant. If you have a std::variant type which can hold an int or a binary string, is that a binary parameter? We can't decide without knowing the individual object.

Containers are another hard case. Should we pass std::vector<T> in binary? Even when T is a binary type, we don't currently have any way to pass an array in binary format, so we always pass it as text.