Tuesday, October 23, 2018

Wrangling mydumper and Auto-Increment Columns

One of my client's apps has a large MySQL database that I regularly have to get a dump of and reload a local database from the dump for dev and testing. To expedite this process, I recently started using the excellent mydumper/myloader in place of the venerable mysqldump. It does shave considerable time off of the reload process, but I got strange results the first few times I tried to use it.

Basically, there were a few tables that look like this in production:

col1  |  col2
------+----------------
  0   |  some value
  1   |  some other value
  2   |  some third value
 ...  |  ...and so on...

When I'd use mysqldump to dump them, and source the dump to load my local db, it would reliably dump/load the same data, in the same order.

When I'd use mydumper/myloader, I'd get something like:

col1  |  col2
------+----------------
  1   |  some other value
  2   |  some third value
 ...  |  ...and so on...
  20  |  some value

The first row (with col1 = 0) would end up at the end of the table and col1 would have the wrong value. The app really cares too, so I had to figure out what was happening, and how to fix it.

This took a while, but the problem ultimately came down to:

  • col1 is an auto-increment column
  • mydumper and mysqldump both output instructions to:
    • create the table
    • reset the next-auto-increment value to whatever it was in the database that is being dumped
    • insert rows, using the exact values from the database that is being dumped
  • by default, if you insert a 0 into an auto-increment column, mysql substitutes the next auto-increment value for the 0
  • mysqldump outputs one big .sql file to create tables and insert rows, and it includes instructions to disable this behavior
  • mydumper outputs lots of individual files (for each table - a script to create the table, and a binary full of data to load into it) but none of them contain instructions to disable this behavior

So, basically if you source a script created by mysqldump, if col1 = 0, then you get a 0 for col1, but if you use myloader to load the output of mydumper, if col1 = 0, then you get whatever the next auto-increment value is. For all other values of col1, both work as expected.

Working around the problem was tricky. There's no obvious way to add a SET SESSION sql_mode='...,NO_AUTO_VALUE_ON_ZERO' to the mydumper binary data file. I thought about adding it to each of the table-create scripts, but I wasn't sure if the data-load would necessarily be run in the same session as the create, so the SET SESSION might not even be in effect when the data was being loaded.

I could have run a SET GLOBAL sql_mode='...,NO_AUTO_VALUE_ON_ZERO' prior to the dump, but I reboot the system all the time and I'd certainly forget to re-run it the next time, and even if I remembered, I'd forget exactly what it was that I needed to run. The only sure-fire solution was to alter the sql_mode of the server itself.

This involved adding a line to /etc/mysql/my.cnf, in the [mysqld] section, like this:

[mysqld]
...
sql-mode = NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION,NO_AUTO_VALUE_ON_ZERO

...and bouncing the server. Actually, there's probably some way to get the server to reload the options, but I could easily bounce mine, so I did.

The NO_AUTO_VALUE_ON_ZERO part was the operative part, but I had to add the other options because they were apparently already in effect, according to:

select @@sql_mode

The sketchy part of this solution is that I'm not 100% sure that somewhere deep in that app (or in another app that uses the same DB), there isn't some query that relies on NO_AUTO_VALUE_ON_ZERO being disabled. So, arguably, this isn't the best solution. I don't know of a better one though, so for now I'm going with this.

Ideally, it would be great if mydumper handled this, per-session, itself. I think I'll submit a feature request...

(Update: looks like there's already an open issue for this: https://github.com/maxbube/mydumper/issues/142)