Wednesday, December 21, 2016

SQL Relay 1.0.0 Benchmarks

Introduction

There are a lot of reasons to use SQL Relay, but one of the most popular is just to improve throughput, usually measured in queries-per-second.

Logging in to most databases takes a noticable amount of time...

So much time, in fact, that transient apps (such as web-based applications) which start up, connect, query, disconnect, and shut down to deliver each bit of content, often take longer to log in than to run the queries. As a result, overall throughput (queries-per-second) is generally low unless many queries are run per connection.

SQL Relay maintains a pool of already-logged-in database connections, and logging in to a connection pool is substantially faster than logging into the database directly.

As a result, overall throughput (queries-per-second) should be maximized, and shouldn't depend nearly as much on the number of queries that are run per connection.

In theory.

Anecdotally this is the case and informal testing has always confirmed it. But, I wanted hard data and pretty graphs to prove it. Also, it had been a long time since I'd done any formal performance testing. For all I knew, I'd done something recently to make everything run horribly.


sqlr-bench

Of course there are popular tools for benchmarking apps and databases. JBench, for example. But they all use intermediate layers - JDBC, ODBC, web pages... I wanted raw comparisons: the database's native API vs. SQL Relay's native API. I also wanted it to be easy to run from the command-line, and automatable.

Months ago, I'd begun writing a program called "sqlr-bench" which aimed to run a battery of tests against the database directly, run the same tests through SQL Relay, collect stats, and produce a graph. It had to support every database that SQL Relay supported. It needed a dozen or more tunable paramters... It was a fairly ambitious project. I started and stopped work on it several times but I eventually got it working well enough.

It creates a table, populates it, and runs a series of selects. First it runs 1 select per connection, then 2 per connection, then 3, etc. It times all of this and calculates queries-per-second as a function of queries-per-connection. The whole process is done first against SQL Relay, and then directly against the database. The data is then correlated and gnuplot is used to produce a graph.

Various parameters are configurable, but the defaults are:

  • rows: 256
  • colums: 16
  • column size: 32
  • rows to fetch at once: 10
  • samples: 10
  • queries: 30

Initial Results

I ran the tests against each database, and against instances of SQL Relay running on the local machine (over a unix socket).

Theoretically I figured I should see...

  • ...against the database directly: The number of queries-per-second should increase steadily as the number of queries-per-connection increases. It should start off to be pretty bad, then get better and better, and eventually level off so that running more queries-per-connection doesn't make any more difference.
  • ...against SQL Relay: The number of queries-per-second should be independent of the number of queries-per-connection, and should be roughly the same as the maximum number of queries-per-second that can be run against the database directly.

Performance against the database directly was exactly as I expected, and it generally seemed to level off at about 20 queries-per-connection.

SQL Relay actually does have a small ramp-up period (2 or 3 queries per-connection is slower than 4 or more), which I didn't expect, but the number of queries-per-second were otherwise uncorrelated with the number of queries-per-connection, which I did.

Overall though, SQL Relay did not immediately perform as well as I expected. In fact, for most databases, there was a "break-even" between 5 and 10 queries per-connection.

While apps that use SQL Relay would still likely perform better than if run directly against the database (how many web pages run more than 5 queries that return 256 rows each), I really expected it to perform better.


Tweaks

Thus began days of profiling and analyzation. It appeared that, indeed, over the years new features had steadily introduced inefficiencies.

In most cases, things just needed to be reused instead of reallocated. Everything from database cursors to byte arrays. I replaced a few regular expressions with simple string compares, and made a few case-insensitive string compares case-sensitive. There were also some alignment and padding issues that prevented optimized strcpy/memcpy from being used. My socket buffer sizes were sub-optimal...

Lots of little changes.


Results

In the end, I was pleased with the results. Most of them, anyway.

Informix:

Oh yeah, that looks great.

MySQL:

Still looking good.

Oracle:

Not bad, but why the falloff?

DB2:

It's good where it counts, but why does DB2 eventually outperform SQL Relay?

SAP/Sybase:

Again, it's good where it counts, but throughput is generally low. And again, why the falloff?

PostgreSQL:

My goodness, PostgreSQL is fast!

I checked the code over and over to make sure my test program was legitimately running queries and not just erroring out or something. It was. Why is it so much faster than everything else? Strace showed that the socket is in non-blocking mode and that they do a poll() then read(), basically blocking manually rather than doing a blocking read. My preliminary tests with a similar strategy didn't show much improvement with that over a blocking read though. Non-blocking poll()/read() is a good way to multiplex reads through a single thread, but if you're only reading from one socket at a time, I wouldn't expect it to help at all.

Very strange.

SQLite:

Well, I guess I shouldn't really be disappointed that SQL Relay is slower than SQLite :)

FreeTDS:

82 queries-per-second??? Does SQL Server limit throughput?


What now?

The falloffs are strange. It's odd that it's so inconsistent. There doesn't appear to be a memory leak. Heap fragmentation?

Performance against DB2 and SAP is odd. It shouldn't break even so quickly.

Performance against PostgreSQL is the most strange. Whatever they're doing, SQL Relay needs to be doing it too.

And what's the deal with MS SQL Server?

These things deserve investigation.

I ran tests with SQL Relay on a server between the client and database too. The results were similar, but there was more of a ramp-up. I should publish those results too.


Test Environment

To perform these tests, I configured 2 Fedora Linux VMware VM's. One for the databases and one to run SQL Relay. Both VM's had 1 CPU. The database VM had 2Gb of RAM. The SQL Relay VM had 512MB.

The DB VM ran modern versions of all databases, in their default configurations, without any performance tweaks.

I built Rudiments and SQL Relay without debug and with -O3 optimizations like: CPPFLAGS="-O3" ./configure

The SQL Relay configurations were all pretty standard too. For example, the Oracle configuration was something like:

 <instance id="oracletest" port="9000" socket="/tmp/test.socket" dbase="oracle">
  <users>
   <user user="test" password="test"/>
  </users>
  <connections>
   <connection string="user=...;password=...;oracle_sid=..."/>
  </connections>
 </instance>

For the PostgreSQL tests I disabled typemangling.

The sqlr-bench program can be found in the test/bench directory of the SQL Relay distribution. It must be run from that directory. I ran it with default parameters. There's a benchall.sh script there too, which starts and stops the appropriate SQL Relay instances and runs sqlr-bench. I actually used that script rather than running the tests individually.