How to connect to SQL Server from UNIX

Introduction

There are multiple ways to connect to SQL Server from UNIX but the most common are via Java/JDBC and C/ODBC; I'm concentrating on methods involving C (or languages that can interface to C e.g., Perl) here. If you want to connect to Microsoft SQL Server from Java I can only recommend the Easysoft JDBC-ODBC Bridge.

The ODBC API was created as a Pascal API although it is better known as a C API. In ODBC, you basically have 3 components; your application written in C, Perl etc, the ODBC driver manager (usually unixODBC although on some platforms e.g. OSX, iODBC is more popular) and an ODBC driver. For a decent explanation of UNIX/Linux/OSX and ODBC see Linux/Unix ODBC (which I authored but cannot reproduce here).

No matter what language you are using whether it be Perl, Ruby, Python, PHP etc you will need an ODBC driver Manager and an ODBC driver. By far the most popular ODBC driver manager is unixODBC which is an open source project distributed with most of the main UNIX/Linux distributions. In Linux, this is usually available as a package you can download via your package manager in a binary version but make sure you download the unixodbc-dev development package as well as you will need the C header files that are packaged separately. In other UNIXes it is fairly simple to build yourself from sources so long as you have a C compiler.

After you have installed an ODBC driver manager you need to pick an ODBC driver. The main ones are free drivers from freeTDS and Open Client Sybase drivers (which work with most older MS SQL Servers) or non-free drivers from the likes of Easysoft, Openlink and DataDirect. Before I continue I should tell you I have an affiliation with Easysoft. In addition to ODBC drivers that communicate directly with MS SQL Server there are ODBC bridges which connect an application on UNIX/Linux with the manufacturers ODBC driver on a remote machine e.g., the Easysoft ODBC-ODBC bridge. For a description of the different types of ODBC drivers (or access to ODBC drivers) see ODBC Driver Types.

So where should you start? This largely depends on the level of support you want/need, how confident you are at building and installing software and the level of ODBC support you need. If you need a support contract guaranteeing levels of support, you don't have a C compiler, you are not confident building open source software yourself or you cannot depend on bug fixes in a open source project not having any guarantees then you need a commercial solution which is pre-built for your operating system and often includes both the ODBC driver and ODBC driver manager. If you've got no money to spend on software, have a C compiler and all the configure tools and are confident building and deploying software yourself then some of the open source solutions might be the right path for you. If you are going to build an ODBC Driver manager and ODBC driver yourself there are a number of things you should know as it is not as straight forward as you might think (see ODBC Driver Manager and ODBC Driver compatibility).

ODBC Driver Types

ODBC-ODBC Bridges

ODBC-ODBC Bridges provide an ODBC driver for your client operating system which talks to a process/service on a server machine which uses the native ODBC Driver for the database (which you will obviously need to install). Usually, the bridge client uses a proprietary protocol when communicating with the server but a) as far as the application is concerned it is using a normal ODBC driver and b) the service/service translates the network protocol back into ODBC calls in the real ODBC driver.

ODBC-ODBC Bridges are worth considering if:

  • cannot get an ODBC driver for the database you need to access for the operating system your application runs on.
  • you want/need to go through a firewall
  • you need functionality which is only provided by the ODBC driver for your DBMS e.g., encryption or proprietary connection/login methods.
  • you can install software on the remote server machine.
  • you need the security of using the database manufacturers ODBC or that is the only allowed access to the database.
  • you need to be immune from protocol changes that can occur in the network protocol the ODBC driver uses to talk to the database e.g., some third party ODBC driver manufacturers communicate directly with the database across the network (commonly referred to as wire protocol drivers) but the protocol is not always open; if the database manufacturer produces an update that changes that protocol your ODBC driver may stop working. If you use a bridge and the database manufacturers ODBC driver you should be immune to this problem as when the database system is upgraded new ODBC drivers should be available.
  • you need access to multiple remote ODBC drivers and want to access them through a single point with a single client-end ODBC driver.
  • the native ODBC driver does not support any kind of encryption, with a bridge you can run traffic though an IP tunnel e.g., ssh.

There are other advantages depending on the bridge:

  • additional access control by user, machine or network and/or per DSN (Data Source Name).
  • additional logging facilities e.g., logging ODBC calls that fail or SQL that does not parse/execute.
  • auditing of connections.
  • caching of ODBC API results to avoid round trips to the server etc.
  • massaging inefficient ODBC API call patterns at the client end into more efficient calls at the server end e.g., binding columns for applications that use SQLGetData.
  • network packet compression.
  • support for fallback servers
  • can be used to protect thread-unsafe ODBC drivers at the server end.
  • throttling by limiting connections per client or total connections.

There are a number of different types of bridges. Some provide an ODBC driver at the client end which happily works with any ODBC enabled application (these are the best in my opinion and true ODBC bridges) and some provide another API (not ODBC) to access a server component which translates the client API into ODBC calls (not strictly ODBC-ODBC bridges). The best mirror the ODBC calls at the client end with identical calls to the same ODBC API at the server end and are totally transparent as far as the client application is concerned (almost as if they were talking directly to the ODBC driver on the remote machine).

"Wire-Protocol" Drivers

The so called "wire-protocol" drivers for SQL Server talk directly to SQL Server over the network.

Open Source SQL Server ODBC Drivers:

Commercial SQL Server ODBC Drivers:

The main advantage of "wire-protocol" drivers is that you only have to install something at the client-end. Of course, this is part of the main disadvantage too as if the database manufacturer changes the protocol your driver may stop working or be unable to use some new feature.

Perl

Perl happily interfaces with C via XS and there are a number of possibilities for accessing Microsoft SQL Server from Perl. The main Perl modules to consider are:

  • DBD::ODBC with any of the free or commercial ODBC Drivers or ODBC-ODBC Bridges. This has the advantages that DBD::ODBC is a mature driver which supports virtually all of the DBI specification (with a few minor exceptions), some of the DBI specification is actually copied from ODBC (typeinfo, column_info, get_info etc), it is maintained by me (the author of the Easysoft ODBC-ODBC Bridge so I know a thing or two about ODBC), has full UNICODE support and in principle (in principle because your SQL syntax can vary across databases) you can write Perl code that will work with any ODBC driver. Disadvantages - I can't see any really but let me know if you find any.
  • DBD::Sybase and either the Sybase Open Client or FreeTDS. Disadvantages are that the Open Client is not available for that many platforms (although FreeTDS is available in source so you could in theory build it for any platform you like), DBD::Sybase via the Open Client libraries does not support ? placeholders (here again later FreeTDS libraries gets around this limitation), there are possible minor issues with LongReadLen/TEXTSIZE and neither the Open Client of FreeTDS supports SQLDescribeParam which DBD::ODBC likes to use.
  • DBI::ProxyServer. Disadvantages are that a) you need to install Perl on a Windows server machine b) you will need to install DBI::ProxyServer and run it as a daemon on the same machine and b) it is proxying the DBI API and not the ODBC API so you are restricted to DBI method calls (e.g., no support for more_results). Advantages are that you can use any Microsoft SQL Server driver you install on the Windows server and use all the facilities that supports.
  • DBD::Gofer. A stateless proxy DBI driver and as a result it does not support transactions.
  • UnixODBC provides a Perl binding to the ODBC API (excluding descriptor support). I've not tried this myself but looking at it if you really needed low-level ODBC support for which you cannot find an equivalent in DBI this may help. You will obviously need the unixODBC driver manager installed and an ODBC Driver to use this module.

For a decent tutorial on installing DBD::ODBC with an ODBC driver manager and an ODBC driver see Enabling ODBC support in Perl with Perl DBI and DBD::ODBC.

Note: If you want to use unicode on unix with DBD::ODBC you need to add the -u switch when running the Makefile.PL e.g., "perl Makefile.PL -u". On Windows, DBD::ODBC currently defaults to using unicode.

Be careful choosing an ODBC Driver to use with DBD::ODBC as many don't pass the DBD::ODBC test suite. However, just because the ODBC driver you are looking at does not pass the test suite it does not mean it is useless. Some of the tests in the DBD::ODBC test suite are rather tasking and few ODBC drivers pass all the tests. Having said that if an Easysoft ODBC drivers fail any of the tests please notify them immediately as they are specifically tested with DBD::ODBC and should pass all tests. FreeTDS 0.82 (current as of the writing of this document) fails a number of the tests and can hang in the Multiple Active Statement tests - this latter issue may not be of any consequence to you if you don't want to use MAS).

Perl/DBI/ODBC Tutorials

Enabling ODBC support in Perl with Perl DBI and DBD::ODBC

Drivers, Data Sources and Connection - Perl DBI/DBD::ODBC Tutorial Part 1

Introduction to retrieving data from your database - Perl DBI/DBD::ODBC Tutorial Part 2

Connecting Perl on UNIX or Linux to Microsoft SQL Server - Perl DBI/DBD::ODBC Tutorial Part 3

Perl DBI - Put Your Data On The Web

Debugging Perl DBI

Python

The situation in Python is much the same as Perl. There are principally two ways to access ODBC drivers via Python using the Python Database API Specification v2.0 a) mxODBC a) and b) pyodbc.

mxODBC (from eGenix) is a commerical product and was first on the scene having a long history of good database coverage and commercial support. The Python Database API Specification v2.0 was written by Marc-André Lemburg, CEO and founder of eGenix.com. In addition to mxODBC eGenix have mxODBC Connect which is a client-server product avoiding the requirement to have the ODBC driver on your client (python end) operating system (this works a little like an ODBC-ODBC Bridge but in this case you run a server process which uses mxODBC to connect to the native ODBC Driver and a client your Python uses at the Python end which talks to the server).

pyodbc is the newcomer (relative to mxODBC) but extends the Python Database API Specification v2.0 in the following ways. pyodbc uses the MIT license, so it is free for commercial and personal use.

PHP

Coming soon.

mssql

ODBC (Unified)

Ruby

Coming soon.

SQL Server ODBC Driver Comparisons

Coming soon.

ODBC Driver Manager and ODBC Driver compatibility

The unixODBC ODBC Driver manager aims to be completely compatible with the ODBC specification but there are a few areas where the specification has either a) changed over time or b) is not too clear. There are three things in particular you need to be aware of:

  1. Thread Safety

    The ODBC specification suggests ODBC drivers should be thread-safe i.e., multiple threads should be ok using the same ODBC Driver. Although that is mostly true for Windows ODBC Drivers (with an exception for some older ODBC Drivers or older versions of Windows ODBC Drivers) the actual implementation of threads on UNIX is not a UNIX-universal specification. In particular, on Linux and Solaris, there are multiple threading models you can choose from (e.g., sun threads, POSIX threads, GNU threads and Linux threads) and on most UNIXes the driver needs to be built with a) thread-safe APIs (e.g., gethostbyname_r) and b) with thread specific libraries. As a result, on UNIX, ODBC Drivers are usually supplied with multiple versions (a thread-safe one and a thread-unsafe one linked with a specific thread library).

    In addition unixODBC may be built to support thread-unsafe drivers and attempt to protect them via the "Threading" attribute in the odbcinst.ini which can be used to stop multiple threads accessing the environment, connection or statement handles in ODBC.

    You are already probably thinking this sounds like a PITA (and it is to some extent) but on most platforms there is only one choice of threading libraries so the choice comes down to whether your application uses threads or not. e.g., on Linux, Linux threads is the standard way to go and on Solaris POSIX threads are implemented on top of the sun threads library.

    If you have a threaded application and a thread-unsafe ODBC Driver then you need to read up about the unixODBC "Threading" attribute which can be used to stop multiple threads accessing the environment, connection or statement handles simultaneously.

  2. SQLLEN/SQLULEN and 32/64 bit platforms

    See 64-bit ODBC.

    In the dim distant past when the ODBC specification was put together by Microsoft there was only WIN32 systems which had long integers in C at 4 bytes and pointers at 4 bytes. Now that we have 64 bit systems a long integer can be 8 bytes in length and a pointer is 8 bytes in length but the original ODBC specification declared some arguments to ODBC APIs in terms of a "32bit value". When 64bit systems came along and more memory could be addressed Microsoft changed the specification and introduced a SQLLEN/SQLULEN type for some arguments to ODBC APIs. On 32bit systems an SQLLEN/SQLULEN stayed at 4 bytes but on 64bit systems it became 8 bytes.

    This introduced a big conundrum for some ODBC Driver writers as in some cases they had been distributing ODBC Drivers for 64 bit platforms for years before Microsoft changed the specification (e.g., Linux/alpha and DEC/Compaq tru64 then later Sparc 64, PowerPC 64 and Itanium). Some ODBC driver writers stuck to the original specification as long as they could and kept the ODBC APIs described as taking "32bit values" at 32 bit values. Later, when Microsoft introduced the SQLLEN/SQLULEN types it was obvious Microsoft intended those same arguments to be 64-bit values. All of a sudden this meant ODBC Drivers and ODBC Driver Managers for 64-bit platforms could be built in one of 2 ways i.e., with the original specification where SQLLEN/SQLULEN did not exist and all API arguments were at most 32 bits or with the new specification which changed some APIs to take SQLLEN/SQLULEN which were defined as 32 bits on 32-bit systems and 64 bits on 64-bit systems. Bare in mind this is ignoring the problems of writing an ODBC-ODBC Bridge where the client could be running on a 64-bit alpha and the database/ODBC Driver could be running on a 32-bit system.

    Now add to this the fact that on most UNIXes you can happily build an application 32-bit or 64-bit on a 64-bit processor. Usually the OS changes the dynamic linker and ELF format so that 32-bit applications only find 32-bit built libraries and 64-bit built applications only find 64-bit built libraries. Not all UNIXes do this the same way. Some define LD_LIBRARY_PATH_32 and LD_LIBRARY_PATH_64 environment variables and some place shared objects built for 32-bit in /usr/lib32 and shared objects built for 64-bit in /usr/lib64 (or variations of this). The one fact to come out of this is that if your application is built 32-bit (on UNIX) it can only link with libraries built for 32-bit (and similarly for 64-bit). This means you cannot take an application like hsodbc which currently is built 32-bit and link it with 64-bit ODBC drivers - you'll get wrong ELF class if you do i.e., the application and shared library are built with a different ELF type or size of integer/pointer). On Windows the problem is equally as troublesome as Microsoft provide a 64-bit ODBC Driver manager and a 32-bit ODBC Driver Manager and if you define a data source in one you cannot access it in the other.

    As a result, this whole area is a bag or worms. unixODBC was built with ODBC API args defaulting to 32-bit values for years and had a switch to override this (seldom used) and now defaults to SQLLEN/SQLULEN definitions on 64-bit platforms which are 64 bits instead of 32-bits now. Even if you know all this you still need to know if your application was built 64-bit for a 64-bit operating system or 32-bit in a 64-bit operating system.

    What does this all boil down to:

    • You must link an ODBC application built for 32-bit with an ODBC Driver manager and ODBC Driver also built 32-bit or (on UNIX) you'll get wrong ELF class errors or worse, cannot find the unixODBC Driver manger or ODBC Driver.
    • If you link an ODBC application (or ODBC Driver Manager) with SQLLEN/SQLULEN defined differently you'll get spurious segmentation violations when one component writes 64 bits into an address expecting 32-bits.

    Bare in mind that just because you've got the latest 64-bit processor that does not mean that all your applications and drivers are 64-bit - some could be built 32-bit.

  3. Unicode

    Unicode is simple - isn't it? No!

    Here again you must remember the ODBC specification was written by Microsoft and they have changed it over the years. Originally (and still today), the ODBC specification has a number of inconsistencies which make programming ODBC applications for unicode difficult. Add to that, unicode on UNIX is very different from Windows. Originally, the ODBC specification on Windows defined SQLWxxx functions which were Wide equivalents of the ANSI SQLxxx functions; they took UCS-2 characters. On UNIX we had wchar_t types which could be 2 bytes or 4 bytes. Then there is UTF-8 encoding (which is a great idea) but does not fit into the ODBC specification because some of the ODBC APIs are declared in terms of characters and some in terms of bytes (e.g. SQLGetData). Later ODBC on Windows seemed to change to UTF-16 encoded characters which almost fits into the SQLWxxx APIs but does not match well with UNIX. As a result, some ODBC Drivers for UNIX proclaim to support UTF-8 encoding via ANSI APIs but that flies in the face of the fact that ODBC has contradictions in the specification in terms of characters and bytes (e.g., SQLGetData which returns in StrLen_or_IndPtr the length or indicator value which is not defined as bytes or characters so if too small a buffer is passed to SQLGetData for the column, does the StrLen_orIndPtr contain the number of characters required in the buffer to retrieve the whole column or the number of bytes. What if the buffer passed to SQLGetData can only accept n and 1/2 UTF-8 encoded characters? - does it part fill the buffer or split a UTF-8 encoded sequence in two).

    What we know today is that the only reliable way to use unicode on UNIX is to use the SQLWxxx functions and pass UTF-16 encoded characters to it. For an example of how this can work see Perl's DBD::ODBC with unixODBC which happily converts UTF-8 encoded Perl strings into UTF-16 encoded strings passed to the SQLWxxx functions and decodes UTF-16 encoded result-sets into UTF-8 encoded strings which are accepted by Perl as unicode strings. It does this for all the connection functions so you can connect to data sources containing unicode strings and supports unicode in result-sets and meta-data.