Denodo Delegated Queries

A tale of virtual data, delegated queries and truncated text

— Binary Adventures

Virtual what?

Denodo, or any data virtualisation technology, has many interesting use cases, ranging from unified authorisation, single access point for all data, uniform data presentation, ETL prototyping, … There’s too much to sum up in just a single paragraph, but I’ll finish with this: data virtualisation is at its best when you don’t notice it’s there. If you’re interested in finding out more, have a look here.

When looking at performance issues, which are often experienced at the reporting layer, it’s necessary to do a deep dive into what’s happening underneath the bonnet. There’s a myriad of potential causes: rendering issues, resource issues on the application server, network latency, … And sometimes, it’s the (generated) queries which are suboptimal and need further tuning. As Denodo is a virtual layer, the queries you execute on it (it presents itself as an ordinary relational database) are translated into one or more queries, sent downstream to the various data sources. This process is called query delegation.

While an execution trace (explain plan) might look fine when generated in Denodo, the explain plan of one of the delegated queries could be absolutely horrendous performance-wise. So how do we find out if that’s the case? That’s why you’re here, isn’t it?

Going Down

Denodo has a stored procedure, GET_DELEGATED_SQLSENTENCE() which translates the query from the virtual layer into the query executed on the data source. Alas, one of the big drawbacks here is that it only works if the query can be pushed down to a single data source, whereas one of the benefits of having virtualisation technology, is to transparently query multiple data sources simultaneously.

First things first, we need to capture the queries being executed in our virtual layer. No worries, there’s something called Denodo Monitor which does just that (and more). You can even configure it to log those queries directly into a database of your choice (anything that has a JDBC driver for it), making it even easier to find the culprits causing performance woes.

Note: As an alternative, you could opt to perform a trace on database level.

We’ve identified the queries coming from our reporting layer, but now we need to find out how they’re translated on the data source. We could just as well come full circle and define the logging table as a view in Denodo, query it, and run the SQL statements through GET_DELEGATED_SQLSENTENCE() in a single go. Theoretically yes, but remember how the stored procedure only works for queries that target a single data source? When a query doesn’t, GET_DELEGATED_SQLSENTENCE() returns an error instead of silently failing, thereby interrupting execution. Thus, you need to execute it for each query individually, and capture any errors as you go.

And that’s where the below script comes in. Simply said, you feed it the WHERE clause used to retrieve the relevant log records, and point it to the corresponding Denodo instance.

Close, but no cigar

So, we’re done here? Not quite. There’s a rather peculiar issue with the .NET DataTable class and ODBC, which will trigger the error Failed to enable constraints. One or more rows contain values violating non-null, unique, or foreign-key constraints." when the query contains a column defined as VARCHAR(MAX). The error seems to be pointing at a schema issue, but after further investigation, it has nothing to do with it. There’s an open question on Stack Overflow, but so far no magic bullet.

Workarounds are available, such as casting the datatype to TEXT or to a VARCHAR of predefined size, and configuring the ODBC DSN correspondingly. This is why you’re might notice the CAST(query as TEXT) bit in the script, or the CAST(query as VARCHAR(…)). For Microsoft SQL Server, the TEXT datatype offers the same size as VARCHAR(MAX), so there’s no potential data truncation.

Alas, when we query Denodo (whose ODBC driver is derived from PostgreSQL), the same solution can’t be applied and we have to convert to a fixed size VARCHAR, which is further limited to 65K bytes (limitation imposed by the driver). The DSN also needs to pass the same value for the MaxLongVarcharSize parameter.

In the example script, the logging database is SQL Server, which is why you’ll see the CAST(query as TEXT):

$Query = @"
SELECT autoincrementid , cast(query as TEXT) as query, duration, numrows
  FROM denodo_logs.dbo.request_notification
 WHERE notificationtype = 'endRequest'
   AND requesttype  = 'SELECT VIEW'
   AND state = 'OK'
   --
   AND $WhereClause
"@

The script checks the length of the delegated query returned, and if it matches the maximum size, it will issue a warning.

if ($SourceQuery.Length -eq $STRING_MAX_SIZE) {
    Write-Warning "Source query might have been truncated for ID $($Row.autoincrementid) (max. of $STRING_MAX_SIZE characters)"
}

Are we there yet?

Another issue (more? sheesh!) is the export to Excel. Whether you love or loathe it, most of our data ends up in Excel sooner or later. As it’s such a convenient method to share data, I used the ImportExcel module to export the delegated queries to Excel. However, depending on the complexity of your queries, you might end up with monstrosities of more than 32.767 characters, which is the maximum number of characters Excel supports in a cell. That’s the reason for the Select-Object cmdlet with the calculated property to truncate the text to 32.767 characters, in order to prevent file corruption.

Select-Object `
    autoincrementid, duration, numrows, `
    @{name='query'; expression = {$_.query.substring(0, [System.Math]::Min($_.query.length, $EXCEL_CELL_LIMIT)) } } `
    @{name='query_source'; expression = {$_.query_source.substring(0, [System.Math]::Min($_.query_source.length, $EXCEL_CELL_LIMIT)) } }

Remarks

  • The script depends on odbc_utils.ps1 to generate the connection strings (only for Denodo and MSSQL currently) and execute the query through ODBC.
  • You’ll have to modify the calls to the connection string functions to correspond to your own environment.

Code

References