Tuesday, October 30, 2018

Accessing Teradata from SQL Relay

Intro

Teradata, the company, has been around, in one form or another, since 1979. Their EDW-focused database has been around since the mid-80's. That's years before "data-warehouse" was even a term. I always love when something has been around longer than the word for what the thing is.

In the late 90's I remember reading that Wal-Mart was data-mining more than a terabyte database of customer info. At the time, I couldn't even get my mind around how much data that was. I couldn't even imagine the storage array. Turns out, that was a Teradata database, and it had apparently been running for years (since 1992) when I finally heard about it. In the early 90's, I'm sure the term "terabyte" technically existed, but I doubt many people had actually spoken the word.

Pretty amazing!

I recently needed to get SQL Relay talking to a Teradata database. A quick search revealed an Express version and ODBC drivers for every conceivable platform, so I figured it would be straightforward. It turned out to be reasonably straightforward, but I did hit a few snags worth mentioning to anyone else pursuing the same goal.

Details follow...


Teradata Express

Teradata Express is available as a VMware image. You have to create an account to download it, but once you do, you're presented with a 7z file that you can extract on linux as follows:

7za x TDExpress16.20.12.01_Sles11_20180620112938.7z

It expands into a directory named TDE16.20.12.01 which contains a vmx file and a couple of vmdk files. Running it is as simple as aiming VMware Player or Workstation at the vmx and clicking Start. It requires 4G of memory to run though, so make sure to shut down or suspend whatever else you might have to to free that up before you actually click Start or your system will be paging for the next 10 minutes.

When you start it up, VMware will ask you if you moved or copied the VM. In this case, it's safe to click either "I Moved It" or "I Copied It". "I Copied It" will just assign the NIC a new virtual MAC address.

The OS is apparently SuSE Enterprise 11, and VMware tools is already installed. It boots up to a super primitive-looking X login, but after you log in, the desktop appears to be Gnome 2 with a few SuSE customizations.

The root password is "root". Poking around, there appear to be other logins too, but I haven't tried any of them.

The database starts by default. If you manage to stop it, there's an icon on the desktop to restart it.

The database also comes configured with a user named "dbc" with password "dbc".

There's an icon for the Teradata Studio Express on the desktop. It's a full-featured graphical database shell, like Toad, or RazorSQL, or other similar tools.

It appears to be based on Eclipse too, which I thought was neat. I'd long heard that Eclipse isn't just an IDE, but rather a framework for building generic IDEs, and Eclipse-proper is more like a flagship-example of the technology. I'd never actually seen (or noticed) another Eclipse-framework-based tool though, until this one.

You can create a new database connection to the local server using:

  • Connection Profile Type: Teradata
  • Database Server Name: localhost
  • User Name[Domain]: dbc
  • Password: dbc

After connecting, you can run queries and play around, in general.

If you want to create another user, you can log in as dbc and run a command like:

create user testuser as password=testpassword perm=524288000 spool=524288000;

The perm and spool arguments are required. The perm parameter specifies the maximum amount of permanent storage allocated to the user. The spool parameter specifies the amount of "spool space" allocated to the user. It's not immediately clear what either of these parameters mean, but they're both required. I don't remember where I even got those sizes from, probably from some example online, but I haven't run into any problems with them yet, in my Express instance.

One quirk though... After running the command, the UI replaces the password with *'s, and then displays a red X to the right of the query. Mouse-overing the red X pops up an error. This may lead you to think that the query failed. It didn't. Rather, the error is just complaining about the *'s being invalid SQL. It kind-of makes sense, but it's confusing.

To drop a user, log in as dbc and run a command like:

drop user testuser;

Various Teradata command line tools are also installed, like bteq (the command line db shell), fastload, and tdload. If you want to write some programs, ODBC and JDBC drivers are also installed, and the VM comes with gcc 4.3 and Oracle Java 8. ODBC doesn't appear to be configured though.

I'm sure that the database has various limits imposed on it. I don't know what they are offhand, or whether I've even run up against them or not. If you're just interested in getting familiar with the technology though, it's a pretty good environment.

The virtual NIC is configured to grab an address from DHCP by default, but you'll probably want to give it a static IP if you plan on hitting the database from another machine. Just navigate to Computer -> Control Center -> Network Settings to access the network configuration tool. The only quirky bit is that the VM is apparently configured without a hostname, and the tool is rather adamant that you give it one.

In case you want to access the VM remotely, the ssh server is running, and root logins aren't disabled.


Installing Teradata ODBC

I wanted to access the database from SQL Relay on a remote machine - another VM, running Fedora 26. The most straightforward way to do this seemed to be to install and configure the Teradata ODBC Driver for Linux and then configure SQL Relay to use that.

The driver seems semi-straightforward to install. BUT! It tends to inadvertently sabotage your existing ODBC infrastructure, if you have one. So, it actually requires a bit of special handling.

The problem is that it contains its own copies of:

  • /lib/libodbc.so
  • /lib/libodbcinst.so
  • /lib64/libodbc.so
  • /lib64/libodbcinst.so

I guess this is so that it doesn't have to depend on an existing unixODBC installation. I don't know. All I know is, that if you install the software, it will overwrite those files, and the new files tend to break previously-working ODBC configurations.

So, if you have an existing unixODBC installation, then to safely install the Teradata driver alongside of it, you have to move those files out of the way, do the installation, them move them back. The Teradata driver appears to work fine with the libraries provided by unixODBC, at least in Fedora 26.

If you have an existing unixODBC installation, here's what you have to do to install the Teradata Driver:

If you have 32-bit unixODBC installed:

cd /lib
sudo mv libodbc.so libodbc.so.save
sudo mv libodbcinst.so libodbcinst.so.save

If you have 64-bit unixODBC installed:

cd /lib64
sudo mv libodbc.so libodbc.so.save
sudo mv libodbcinst.so libodbcinst.so.save

If you're running some version of Linux other than Fedora, then you'll have to find and move the appropriate libraries for your platform.

To actually install the driver:

tar xfz tdodbc1620__linux_indep.16.20.00.36-1.tar.gz
cd tdodbc1620
sudo ./setup_wrapper.sh

Hit return to allow it to install in /opt

Afterwards, it will run unattended and install tdodbc1620-16.20.00.36-1.noarch.rpm. Oddly, this rpm is marked "noarch" but it actually installs binaries for both x86 and x64 platforms.

Post-install, you have to move the newly installed libodbc.so and libiodbc.so links out of the way, as follows:

cd /lib
sudo mv libodbc.so libodbc.so.teradata
sudo mv libodbcinst.so libodbcinst.so.teradata
cd /lib64
sudo mv libodbc.so libodbc.so.teradata
sudo mv libodbcinst.so libodbcinst.so.teradata

Then, if you have 32-bit unixODBC installed:

cd /lib
sudo mv libodbc.so.save libodbc.so
sudo mv libodbcinst.so.save libodbcinst.so

Or, if you have 64-bit unixODBC installed:

cd /lib64
sudo mv libodbc.so.save libodbc.so
sudo mv libodbcinst.so.save libodbcinst.so

And that is it. The ODBC driver for Teradata is now installed.


Configuring Teradata ODBC

Configuring the Teradata ODBC Driver is a lot simpler than installing it.

You don't have to add anything to /etc/odbcinst.ini, just append a DSN like the following to /etc/odbc.ini

[teradata]
# This key is not necessary and is only to give a description of the data source.
Description=Teradata Database ODBC Driver 16.20

# Driver: The location where the ODBC driver is installed to.
Driver=/opt/teradata/client/16.20/lib64/tdataodbc_sb64.so

# Required: These values can also be specified in the connection string.
DBCName=192.168.123.101
UID=testuser
PWD=testpassword

# Optional
AccountString=
CharacterSet=ASCII
DatasourceDNSEntries=
DateTimeFormat=IAA
DefaultDatabase=
DontUseHelpDatabase=0
DontUseTitles=1
EnableExtendedStmtInfo=1
EnableReadAhead=1
IgnoreODBCSearchPattern=0
LogErrorEvents=0
LoginTimeout=20
MaxRespSize=65536
MaxSingleLOBBytes=0
MaxTotalLOBBytesPerRow=0
MechanismName=
NoScan=0
PrintOption=N
retryOnEINTR=1
ReturnGeneratedKeys=N
SessionMode=System Default
SplOption=Y
TABLEQUALIFIER=0
TCPNoDelay=1
TdmstPortNumber=1025
UPTMode=Not set
USE2XAPPCUSTOMCATALOGMODE=0
UseDataEncryption=0
UseDateDataForTimeStampParams=0

The most important parameters are:

  • DBCName - the hostname or IP of the database (192.168.123.101 in my case, but maybe different in your environment)
  • UID - the username to log in to the database with (testuser in my case, but could also be dbc or another user)
  • PWD - the password corresponding to the UID

The rest of the parameters do various things which can be researched online, but aren't critical to change for general operation.

Once configured, you can connect to the database using isql, provided by unixODBC.

$ isql teradata
+---------------------------------------+
| Connected!                            |
|                                       |
| sql-statement                         |
| help [tablename]                      |
| quit                                  |
|                                       |
+---------------------------------------+
SQL> select 1
+-----+
| 1   |
+-----+
| 1   |
+-----+
SQLRowCount returns 1
1 rows fetched
SQL>

If that works, then the Teradata ODBC Driver has been configured successfully.


Configuring SQL Relay

The ultimate goal here is to access Teradata through SQL Relay. After getting everything else working, that last bit is pretty simple. Assuming SQL Relay is already installed, all you have to do is update the configuration file (either sqlrelay.conf or a file in sqlrelay.conf.d) with a teradata instance, as follows:

<?xml version="1.0"?>
<instances>

    <instance id="teradataexample" dbase="odbc">
        <users>
            <user user="exampleuser" password="examplepassword"/>
        </users>
        <connections>
            <connection string="dsn=teradata;user=testuser;password=testpassword;autocommit=yes"/>
        </connections>
    </instance>

</instances>

In the connection tag, the dsn option must match the DSN defined in /etc/odbc.ini, and the user/password options must match the UID/PID definied in that DSN.

The user/password defined in the user tag are the user/password that you'll use to log into SQL Relay itself.

To start it up:

sqlr-start -id teradataexample

To access the database:

$ sqlrsh -host localhost -user exampleuser -password examplepassword
sqlrsh - Version 1.4.0
 Connected to: localhost:9000 as exampleuser

 type help; for help.

0> create table test (col1 int, col2 varchar(200));
 Rows Returned   : 0
 Fields Returned : 0
 Elapsed Time    : 0.063005 sec

0> insert into test values (1,'hello');
 Rows Returned   : 0
 Fields Returned : 0
 Elapsed Time    : 0.029303 sec

0> select * from test;
col1 col2 
==========
1    hello

 Rows Returned   : 1
 Fields Returned : 2
 Elapsed Time    : 0.030262 sec

0> drop table test;
 Rows Returned   : 0
 Fields Returned : 0
 Elapsed Time    : 0.077361 sec

0> quit;

To shut it down:

sqlr-stop -id teradataexample

If all of that worked, then you can now access Teradata from SQL Relay.


Quirks

SQL Relay -> ODBC -> Teradata is generally usable, but ODBC drivers are quirky, as a rule, so there are probably things that don't work correctly. I've already discovered a few esoteric ones and I'll be updating the ODBC connection module with Teradata-specific workarounds as appropriate.

If you run into anything weird, please report it to support@firstworks.com.

Thanks!

Tuesday, October 23, 2018

Wrangling mydumper and Auto-Increment Columns

One of my client's apps has a large MySQL database that I regularly have to get a dump of and reload a local database from the dump for dev and testing. To expedite this process, I recently started using the excellent mydumper/myloader in place of the venerable mysqldump. It does shave considerable time off of the reload process, but I got strange results the first few times I tried to use it.

Basically, there were a few tables that look like this in production:

col1  |  col2
------+----------------
  0   |  some value
  1   |  some other value
  2   |  some third value
 ...  |  ...and so on...

When I'd use mysqldump to dump them, and source the dump to load my local db, it would reliably dump/load the same data, in the same order.

When I'd use mydumper/myloader, I'd get something like:

col1  |  col2
------+----------------
  1   |  some other value
  2   |  some third value
 ...  |  ...and so on...
  20  |  some value

The first row (with col1 = 0) would end up at the end of the table and col1 would have the wrong value. The app really cares too, so I had to figure out what was happening, and how to fix it.

This took a while, but the problem ultimately came down to:

  • col1 is an auto-increment column
  • mydumper and mysqldump both output instructions to:
    • create the table
    • reset the next-auto-increment value to whatever it was in the database that is being dumped
    • insert rows, using the exact values from the database that is being dumped
  • by default, if you insert a 0 into an auto-increment column, mysql substitutes the next auto-increment value for the 0
  • mysqldump outputs one big .sql file to create tables and insert rows, and it includes instructions to disable this behavior
  • mydumper outputs lots of individual files (for each table - a script to create the table, and a binary full of data to load into it) but none of them contain instructions to disable this behavior

So, basically if you source a script created by mysqldump, if col1 = 0, then you get a 0 for col1, but if you use myloader to load the output of mydumper, if col1 = 0, then you get whatever the next auto-increment value is. For all other values of col1, both work as expected.

Working around the problem was tricky. There's no obvious way to add a SET SESSION sql_mode='...,NO_AUTO_VALUE_ON_ZERO' to the mydumper binary data file. I thought about adding it to each of the table-create scripts, but I wasn't sure if the data-load would necessarily be run in the same session as the create, so the SET SESSION might not even be in effect when the data was being loaded.

I could have run a SET GLOBAL sql_mode='...,NO_AUTO_VALUE_ON_ZERO' prior to the dump, but I reboot the system all the time and I'd certainly forget to re-run it the next time, and even if I remembered, I'd forget exactly what it was that I needed to run. The only sure-fire solution was to alter the sql_mode of the server itself.

This involved adding a line to /etc/mysql/my.cnf, in the [mysqld] section, like this:

[mysqld]
...
sql-mode = NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION,NO_AUTO_VALUE_ON_ZERO

...and bouncing the server. Actually, there's probably some way to get the server to reload the options, but I could easily bounce mine, so I did.

The NO_AUTO_VALUE_ON_ZERO part was the operative part, but I had to add the other options because they were apparently already in effect, according to:

select @@sql_mode

The sketchy part of this solution is that I'm not 100% sure that somewhere deep in that app (or in another app that uses the same DB), there isn't some query that relies on NO_AUTO_VALUE_ON_ZERO being disabled. So, arguably, this isn't the best solution. I don't know of a better one though, so for now I'm going with this.

Ideally, it would be great if mydumper handled this, per-session, itself. I think I'll submit a feature request...

(Update: looks like there's already an open issue for this: https://github.com/maxbube/mydumper/issues/142)

PPTP on CentOS 7

I recently needed to get a PPTP VPN connection working from a CentOS 7 machine to one of my clients. I'd gotten the same working from MINT Linux a few weeks earlier, and it was as simple as clicking the network icon on the toolbar, selecting VPN Connections -> Configure VPN, and adding a PPTP connection. I figured it would be equally simple on CentOS 7, but I was surprised to find that the list of VPN types didn't include PPTP.

This was confusing because NetworkManager-pptp and NetworkManager-pptp-gnome packages were both installed. Long story short... As of 10/23/18, CentOS 7 uses NetworkManager 1.10 and compatible versions of the PPTP-related RPMs aren't available in the CentoOS repos. If you want to get it working, you have to build and install new enough RPMs from source.

Steps to accomplish this follow...

Make sure NetworkManager-pptp and NetworkManager-pptp-gnome are installed, just to be sure that you have various dependencies installed as well:

sudo yum install NetworkManager-pptp NetworkManager-pptp-gnome

Set up an RPM-build environment (if you don't already have one set up). This is most easily done by installing the fedora-packager meta-package and running a handy script that it provides:

sudo yum install fedora-packager @development-tools
rpmdev-setuptree

Note that rpmdev-setuptree wasn't run using sudo. This script should be run as you. It basically creates a .rpmmacro file and rpmbuild tree under your home directory.

Download a new enough source RPM for NetworkManager-pptp from somewhere. A new enough version is included in Fedora Core 28, for example:

wget https://dl.fedoraproject.org/pub/fedora/linux/updates/28/Everything/SRPMS/Packages/n/NetworkManager-pptp-1.2.8-1.fc28.src.rpm

Install packages that you'll need to build the RPMs:

sudo yum install glib2-devel gtk3-devel NetworkManager-libnm-devel libnma-devel ppp-devel libtool intltool libsecret-devel NetworkManager-devel NetworkManager-glib-evel libnm-gtk-devel

Build the RPMs:

rpmbuild --rebuild NetworkManager-pptp-1.2.8-1.fc28.src.rpm

And, finally, install the RPMs that you just built:

cd ~/rpmbuild/RPMS/x86_64
sudo yum install NetworkManager-pptp-1.2.8-1.el7.x86_64.rpm NetworkManager-pptp-gnome-1.2.8-1.el7.x86_64.rpm

(Of course, this assumes that you're building on an x86_64 host. Adjust the path and file names accordingly if you are not.)

At this point, if you try to create a new VPN connection, you should see "Point-to-Point Tunneling Protocol (PPTP)" in the list of available VPN types.

You can also safely remove NetworkManager-pptp-1.2.8-1.fc28.src.rpm at this point. If you don't think you'll be building any more RPMs any time soon, then you can also remove ~/.rpmmacros and the ~/rpmbuild tree. If you find that you do need to build more RPMs, then you can just rpmdev-setuptree again to re-setup the RPM-build environment.

Good luck!