Tuesday, October 30, 2018

Accessing Teradata from SQL Relay

Intro

Teradata, the company, has been around, in one form or another, since 1979. Their EDW-focused database has been around since the mid-80's. That's years before "data-warehouse" was even a term. I always love when something has been around longer than the word for what the thing is.

In the late 90's I remember reading that Wal-Mart was data-mining more than a terabyte database of customer info. At the time, I couldn't even get my mind around how much data that was. I couldn't even imagine the storage array. Turns out, that was a Teradata database, and it had apparently been running for years (since 1992) when I finally heard about it. In the early 90's, I'm sure the term "terabyte" technically existed, but I doubt many people had actually spoken the word.

Pretty amazing!

I recently needed to get SQL Relay talking to a Teradata database. A quick search revealed an Express version and ODBC drivers for every conceivable platform, so I figured it would be straightforward. It turned out to be reasonably straightforward, but I did hit a few snags worth mentioning to anyone else pursuing the same goal.

Details follow...


Teradata Express

Teradata Express is available as a VMware image. You have to create an account to download it, but once you do, you're presented with a 7z file that you can extract on linux as follows:

7za x TDExpress16.20.12.01_Sles11_20180620112938.7z

It expands into a directory named TDE16.20.12.01 which contains a vmx file and a couple of vmdk files. Running it is as simple as aiming VMware Player or Workstation at the vmx and clicking Start. It requires 4G of memory to run though, so make sure to shut down or suspend whatever else you might have to to free that up before you actually click Start or your system will be paging for the next 10 minutes.

When you start it up, VMware will ask you if you moved or copied the VM. In this case, it's safe to click either "I Moved It" or "I Copied It". "I Copied It" will just assign the NIC a new virtual MAC address.

The OS is apparently SuSE Enterprise 11, and VMware tools is already installed. It boots up to a super primitive-looking X login, but after you log in, the desktop appears to be Gnome 2 with a few SuSE customizations.

The root password is "root". Poking around, there appear to be other logins too, but I haven't tried any of them.

The database starts by default. If you manage to stop it, there's an icon on the desktop to restart it.

The database also comes configured with a user named "dbc" with password "dbc".

There's an icon for the Teradata Studio Express on the desktop. It's a full-featured graphical database shell, like Toad, or RazorSQL, or other similar tools.

It appears to be based on Eclipse too, which I thought was neat. I'd long heard that Eclipse isn't just an IDE, but rather a framework for building generic IDEs, and Eclipse-proper is more like a flagship-example of the technology. I'd never actually seen (or noticed) another Eclipse-framework-based tool though, until this one.

You can create a new database connection to the local server using:

  • Connection Profile Type: Teradata
  • Database Server Name: localhost
  • User Name[Domain]: dbc
  • Password: dbc

After connecting, you can run queries and play around, in general.

If you want to create another user, you can log in as dbc and run a command like:

create user testuser as password=testpassword perm=524288000 spool=524288000;

The perm and spool arguments are required. The perm parameter specifies the maximum amount of permanent storage allocated to the user. The spool parameter specifies the amount of "spool space" allocated to the user. It's not immediately clear what either of these parameters mean, but they're both required. I don't remember where I even got those sizes from, probably from some example online, but I haven't run into any problems with them yet, in my Express instance.

One quirk though... After running the command, the UI replaces the password with *'s, and then displays a red X to the right of the query. Mouse-overing the red X pops up an error. This may lead you to think that the query failed. It didn't. Rather, the error is just complaining about the *'s being invalid SQL. It kind-of makes sense, but it's confusing.

To drop a user, log in as dbc and run a command like:

drop user testuser;

Various Teradata command line tools are also installed, like bteq (the command line db shell), fastload, and tdload. If you want to write some programs, ODBC and JDBC drivers are also installed, and the VM comes with gcc 4.3 and Oracle Java 8. ODBC doesn't appear to be configured though.

I'm sure that the database has various limits imposed on it. I don't know what they are offhand, or whether I've even run up against them or not. If you're just interested in getting familiar with the technology though, it's a pretty good environment.

The virtual NIC is configured to grab an address from DHCP by default, but you'll probably want to give it a static IP if you plan on hitting the database from another machine. Just navigate to Computer -> Control Center -> Network Settings to access the network configuration tool. The only quirky bit is that the VM is apparently configured without a hostname, and the tool is rather adamant that you give it one.

In case you want to access the VM remotely, the ssh server is running, and root logins aren't disabled.


Installing Teradata ODBC

I wanted to access the database from SQL Relay on a remote machine - another VM, running Fedora 26. The most straightforward way to do this seemed to be to install and configure the Teradata ODBC Driver for Linux and then configure SQL Relay to use that.

The driver seems semi-straightforward to install. BUT! It tends to inadvertently sabotage your existing ODBC infrastructure, if you have one. So, it actually requires a bit of special handling.

The problem is that it contains its own copies of:

  • /lib/libodbc.so
  • /lib/libodbcinst.so
  • /lib64/libodbc.so
  • /lib64/libodbcinst.so

I guess this is so that it doesn't have to depend on an existing unixODBC installation. I don't know. All I know is, that if you install the software, it will overwrite those files, and the new files tend to break previously-working ODBC configurations.

So, if you have an existing unixODBC installation, then to safely install the Teradata driver alongside of it, you have to move those files out of the way, do the installation, them move them back. The Teradata driver appears to work fine with the libraries provided by unixODBC, at least in Fedora 26.

If you have an existing unixODBC installation, here's what you have to do to install the Teradata Driver:

If you have 32-bit unixODBC installed:

cd /lib
sudo mv libodbc.so libodbc.so.save
sudo mv libodbcinst.so libodbcinst.so.save

If you have 64-bit unixODBC installed:

cd /lib64
sudo mv libodbc.so libodbc.so.save
sudo mv libodbcinst.so libodbcinst.so.save

If you're running some version of Linux other than Fedora, then you'll have to find and move the appropriate libraries for your platform.

To actually install the driver:

tar xfz tdodbc1620__linux_indep.16.20.00.36-1.tar.gz
cd tdodbc1620
sudo ./setup_wrapper.sh

Hit return to allow it to install in /opt

Afterwards, it will run unattended and install tdodbc1620-16.20.00.36-1.noarch.rpm. Oddly, this rpm is marked "noarch" but it actually installs binaries for both x86 and x64 platforms.

Post-install, you have to move the newly installed libodbc.so and libiodbc.so links out of the way, as follows:

cd /lib
sudo mv libodbc.so libodbc.so.teradata
sudo mv libodbcinst.so libodbcinst.so.teradata
cd /lib64
sudo mv libodbc.so libodbc.so.teradata
sudo mv libodbcinst.so libodbcinst.so.teradata

Then, if you have 32-bit unixODBC installed:

cd /lib
sudo mv libodbc.so.save libodbc.so
sudo mv libodbcinst.so.save libodbcinst.so

Or, if you have 64-bit unixODBC installed:

cd /lib64
sudo mv libodbc.so.save libodbc.so
sudo mv libodbcinst.so.save libodbcinst.so

And that is it. The ODBC driver for Teradata is now installed.


Configuring Teradata ODBC

Configuring the Teradata ODBC Driver is a lot simpler than installing it.

You don't have to add anything to /etc/odbcinst.ini, just append a DSN like the following to /etc/odbc.ini

[teradata]
# This key is not necessary and is only to give a description of the data source.
Description=Teradata Database ODBC Driver 16.20

# Driver: The location where the ODBC driver is installed to.
Driver=/opt/teradata/client/16.20/lib64/tdataodbc_sb64.so

# Required: These values can also be specified in the connection string.
DBCName=192.168.123.101
UID=testuser
PWD=testpassword

# Optional
AccountString=
CharacterSet=ASCII
DatasourceDNSEntries=
DateTimeFormat=IAA
DefaultDatabase=
DontUseHelpDatabase=0
DontUseTitles=1
EnableExtendedStmtInfo=1
EnableReadAhead=1
IgnoreODBCSearchPattern=0
LogErrorEvents=0
LoginTimeout=20
MaxRespSize=65536
MaxSingleLOBBytes=0
MaxTotalLOBBytesPerRow=0
MechanismName=
NoScan=0
PrintOption=N
retryOnEINTR=1
ReturnGeneratedKeys=N
SessionMode=System Default
SplOption=Y
TABLEQUALIFIER=0
TCPNoDelay=1
TdmstPortNumber=1025
UPTMode=Not set
USE2XAPPCUSTOMCATALOGMODE=0
UseDataEncryption=0
UseDateDataForTimeStampParams=0

The most important parameters are:

  • DBCName - the hostname or IP of the database (192.168.123.101 in my case, but maybe different in your environment)
  • UID - the username to log in to the database with (testuser in my case, but could also be dbc or another user)
  • PWD - the password corresponding to the UID

The rest of the parameters do various things which can be researched online, but aren't critical to change for general operation.

Once configured, you can connect to the database using isql, provided by unixODBC.

$ isql teradata
+---------------------------------------+
| Connected!                            |
|                                       |
| sql-statement                         |
| help [tablename]                      |
| quit                                  |
|                                       |
+---------------------------------------+
SQL> select 1
+-----+
| 1   |
+-----+
| 1   |
+-----+
SQLRowCount returns 1
1 rows fetched
SQL>

If that works, then the Teradata ODBC Driver has been configured successfully.


Configuring SQL Relay

The ultimate goal here is to access Teradata through SQL Relay. After getting everything else working, that last bit is pretty simple. Assuming SQL Relay is already installed, all you have to do is update the configuration file (either sqlrelay.conf or a file in sqlrelay.conf.d) with a teradata instance, as follows:

<?xml version="1.0"?>
<instances>

    <instance id="teradataexample" dbase="odbc">
        <users>
            <user user="exampleuser" password="examplepassword"/>
        </users>
        <connections>
            <connection string="dsn=teradata;user=testuser;password=testpassword;autocommit=yes"/>
        </connections>
    </instance>

</instances>

In the connection tag, the dsn option must match the DSN defined in /etc/odbc.ini, and the user/password options must match the UID/PID definied in that DSN.

The user/password defined in the user tag are the user/password that you'll use to log into SQL Relay itself.

To start it up:

sqlr-start -id teradataexample

To access the database:

$ sqlrsh -host localhost -user exampleuser -password examplepassword
sqlrsh - Version 1.4.0
 Connected to: localhost:9000 as exampleuser

 type help; for help.

0> create table test (col1 int, col2 varchar(200));
 Rows Returned   : 0
 Fields Returned : 0
 Elapsed Time    : 0.063005 sec

0> insert into test values (1,'hello');
 Rows Returned   : 0
 Fields Returned : 0
 Elapsed Time    : 0.029303 sec

0> select * from test;
col1 col2 
==========
1    hello

 Rows Returned   : 1
 Fields Returned : 2
 Elapsed Time    : 0.030262 sec

0> drop table test;
 Rows Returned   : 0
 Fields Returned : 0
 Elapsed Time    : 0.077361 sec

0> quit;

To shut it down:

sqlr-stop -id teradataexample

If all of that worked, then you can now access Teradata from SQL Relay.


Quirks

SQL Relay -> ODBC -> Teradata is generally usable, but ODBC drivers are quirky, as a rule, so there are probably things that don't work correctly. I've already discovered a few esoteric ones and I'll be updating the ODBC connection module with Teradata-specific workarounds as appropriate.

If you run into anything weird, please report it to support@firstworks.com.

Thanks!