Saturday 12 January 2008

Python As an Integration Tool

Python can integrate a variety of disparate systems; you may hear it referred to as a glue language, because it's a powerful way to glue systems together. We have broken the basic integration technologies available on Windows into five groups: files, DLLs, COM, networking, and distributed objects. We'll take a quick look at the Python features that support each one.

Working with Files

The most fundamental technique for making systems talk is working with files. They are at the foundation of every operating system, and huge and reliable systems can be built and maintained by batch-processing files. Every programming language can work with files, but some make it easier than others. Here are some key features:

• Python can read a file into a string (or read a multiline text file into a list of strings) in one line. Strings have no limitations on what they can hold: null bytes and non-ASCII encodings are fine.

• Python can capture and redirect its own standard input and output; subroutines that print to standard output can thus be diverted to different destinations.

• It provides a platform-independent API for working with filenames and paths, selecting multiple files, and even recursing through directory trees.

• For binary files, Python can read and write arrays of uniform types.

• A variety of text-parsing tools are available, ranging from string splitting and joining operations and a pattern-matching language, up to complete data-driven parsers. The key parts of these are written in C, allowing Python text-processing programs to run as fast as fully compiled languages.

• When generating output, Python allows you to create multiline templates with formatting codes and perform text substitutions to them from a set of keys and values. In essence, you can do a mailmerge in one line at incredibly high speeds.
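
The features listed above can be sketched in a few lines each. This is a minimal illustration, not code from the book: it assumes a current Python 3 interpreter (so it uses names such as contextlib.redirect_stdout that postdate the text), and the file names, sample data, and the file_features function are all hypothetical.

```python
import array
import contextlib
import io
import os
import re
import tempfile


def file_features(workdir):
    """Exercise the file-handling features from the list in a scratch directory."""
    # Write a small text file, then read it back whole and as a list of lines.
    text_path = os.path.join(workdir, "orders.txt")  # hypothetical file name
    with open(text_path, "w", encoding="utf-8") as f:
        f.write("widget 3\ngadget 12\n")
    with open(text_path, encoding="utf-8") as f:
        text = f.read()
    with open(text_path, encoding="utf-8") as f:
        lines = f.readlines()

    # Divert anything printed to standard output into an in-memory buffer.
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        print("captured")

    # Recurse through the directory tree with a platform-independent API.
    found = [name for _root, _dirs, files in os.walk(workdir) for name in files]

    # Write and read back a binary file holding an array of uniform type.
    bin_path = os.path.join(workdir, "numbers.bin")  # hypothetical file name
    with open(bin_path, "wb") as f:
        array.array("i", [1, 2, 3]).tofile(f)
    loaded = array.array("i")
    with open(bin_path, "rb") as f:
        loaded.fromfile(f, 3)

    # Pull the quantities out of the text with the pattern-matching module.
    quantities = [int(n) for n in re.findall(r"\d+", text)]

    # Fill a multiline template from a set of keys and values.
    template = "Dear %(name)s,\nyou ordered %(count)d items.\n"
    letter = template % {"name": "Ada", "count": sum(quantities)}

    return {
        "text": text,
        "lines": lines,
        "captured": buffer.getvalue(),
        "found": found,
        "loaded": loaded.tolist(),
        "letter": letter,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        for key, value in file_features(scratch).items():
            print(key, "=", repr(value))
```

Each step stays close to one line of real work; the rest is setup so that the sketch runs on its own in a temporary directory.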

Chapter 17, Processes and Files, provides a comprehensive introduction to these features.

Working with DLLs and C Programs

Windows uses dynamic link libraries extensively. DLLs allow collections of functions, usually written in C or C++, to be stored in one file and loaded dynamically by many different programs. DLLs influence everything that happens on Windows; indeed, the Windows API is a collection of such DLLs.
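
To make the idea of loading a library at runtime concrete, here is a small sketch that calls the C function strlen from Python. It relies on the ctypes module from the current standard library, which the text does not mention, so treat that choice as an assumption. The helper names load_c_runtime and c_strlen are hypothetical. On Windows the sketch loads msvcrt.dll; elsewhere it uses the C library already linked into the running process.

```python
import ctypes
import sys


def load_c_runtime():
    """Return a handle to the C runtime library for the current platform."""
    if sys.platform == "win32":
        # msvcrt.dll is the Microsoft C runtime DLL.
        return ctypes.cdll.msvcrt
    # On Unix-like systems, None means "the libraries already loaded
    # into this process", which include the C library.
    return ctypes.CDLL(None)


def c_strlen(data):
    """Call the C strlen function on a bytes object and return its result."""
    libc = load_c_runtime()
    # Declare the C signature so ctypes converts arguments correctly.
    libc.strlen.argtypes = [ctypes.c_char_p]
    libc.strlen.restype = ctypes.c_size_t
    return libc.strlen(data)


if __name__ == "__main__":
    print(c_strlen(b"hello"))
```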

Python is written in ANSI C, and one of its original design goals was to be easy to extend and embed at the C level. Most of its functionality lives in a DLL, so that other programs can import Python at runtime and start using it to execute and evaluate expressions. Python extension modules can also be written in C, C++, or Delphi to add new capabilities to the language that can be imported at runtime.

The Win32 extensions for Python, which we cover throughout this book, are a collection of such libraries that expose much of the Windows API to Python.

The basic Python distribution includes a manual called Extending and Embedding the Python Interpreter, which describes the process in detail. Chapter 22, Extending and Embedding with Visual C++ and Delphi, shows you how to work with Python at this level on Windows.

COM

The Component Object Model (COM) is Microsoft's newest integration technology and pervades Windows 95, 98, NT, and 2000. A DLL lets you call functions someone else has written; COM lets you talk to objects someone else has written. They don't even have to be on the same computer!

Windows provides a host of API calls to get things done, but using the calls generally requires C programming expertise, and they have a tortuous syntax. COM provides alternative, easier-to-use interfaces to a wide range of operating-system services, and it lets applications expose and share their functionality as well. COM is now mature, stable, and as fast as using DLLs, but much easier to use, and so opens up many new possibilities. Want a spreadsheet and chart within your application? Borrow the ones in Excel. To a programmer with a COM-enabled language (and most of them are by now), Windows feels like a sea of objects, each with its own capabilities, standing by and waiting to help you get your job done.

Python's support for COM is superb and is the focus of a large portion of this book.

Networking

The fourth integration technology we'll talk about is the network. Most of the world's networks now run on TCP/IP, the Internet protocol. There is a standard programming API to TCP/IP, the sockets interface, which is available at the C level on Windows and almost every other operating system. Python exposes the sockets API and allows you to directly write network applications and protocols. We cover sockets in Chapter 19, Communications.
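
As a rough illustration of writing directly to the sockets API, the sketch below runs a one-shot server and a client in the same process over the loopback address. The tiny "protocol" (send bytes, get them back in upper case) and the function names are invented for the example; binding to port 0 asks the operating system for any free port.

```python
import socket
import threading


def serve_one_client(server_sock):
    """Accept a single connection and reply with the data in upper case."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


def round_trip(message):
    """Send message to a local one-shot server and return its reply."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
    server.listen(1)
    port = server.getsockname()[1]

    worker = threading.Thread(target=serve_one_client, args=(server,))
    worker.start()

    # The client side: connect, send, and read the reply.
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        client.sendall(message)
        reply = client.recv(1024)

    worker.join()
    server.close()
    return reply


if __name__ == "__main__":
    print(round_trip(b"hello"))
```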

You may not want to work with sockets directly, but you will certainly have use for the higher-level protocols built on top of them, such as Telnet, FTP, and HTTP. Python's standard library provides modules that implement these protocols, allowing you to automate FTP sessions or the retrieval of data from email servers and the Web. It even includes ready-made web servers for you to customize. Chapter 14, Working with Email, and Chapter 15, Using the Basic Internet Protocols, cover these standard library features.
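
The following sketch shows both sides of that claim with HTTP: a customized ready-made web server and a client that retrieves a page from it. It uses the module names from current Python 3 (http.server and http.client), which differ from those available when the text was written. The HelloHandler class, the fetch_once function, and the response body are made up for illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Customize the stock server by answering every GET with fixed text."""

    def do_GET(self):
        body = b"hello from Python"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence the default per-request logging to stderr.
        pass


def fetch_once():
    """Serve one request on a free local port and fetch it over HTTP."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    port = server.server_address[1]

    # handle_request() serves exactly one request and then returns.
    worker = threading.Thread(target=server.handle_request)
    worker.start()

    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request("GET", "/")
    response = conn.getresponse()
    status, body = response.status, response.read()
    conn.close()

    worker.join()
    server.server_close()
    return status, body


if __name__ == "__main__":
    print(fetch_once())
```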

Distributed Objects

The most sophisticated level of integration yet seen in computing is the field of distributed objects: essentially, letting objects on different machines (and written in different languages) talk to each other. Many large corporations are moving from two-tier applications with databases and GUIs to three-tier applications that have a layer of business objects in the middle. These objects offer a higher level of abstraction than the database row and can represent tangible things in the business such as a customer or an invoice. The two main contenders in this arena are COM, which is a Windows-only solution, and the Common Object Request Broker Architecture (CORBA), which is multiplatform. Python is used extensively with both. Our focus is on COM, and we show how to build a distributed Python application in Chapter 11, Distributing Our Application. Building a distributed application is absurdly easy; COM does all the work, and it's a matter of configuring the machine correctly.
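
The general shape of a remote business object can be sketched without a Windows machine. Note the substitution: the example below uses neither COM nor CORBA but the XML-RPC modules from the current Python standard library, purely as a stand-in that runs anywhere. The InvoiceService class, its sample data, and the remote_total function are hypothetical, and both "machines" here are the same process talking over the loopback address.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


class InvoiceService:
    """Hypothetical middle-tier business object exposing invoice totals."""

    def __init__(self):
        # Invented sample data: (item name, quantity, unit price) per line.
        self._lines = {"INV-1": [("widget", 2, 9.5), ("gadget", 1, 20.0)]}

    def total(self, invoice_id):
        """Return the total value of the invoice with the given id."""
        return sum(qty * price for _name, qty, price in self._lines[invoice_id])


def remote_total(invoice_id):
    """Publish InvoiceService on a local port and call it through a proxy."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_instance(InvoiceService())
    port = server.server_address[1]

    # Serve exactly one remote call in a background thread.
    worker = threading.Thread(target=server.handle_request)
    worker.start()

    # The client sees an ordinary object; the method call crosses the network.
    with ServerProxy("http://127.0.0.1:%d/" % port) as proxy:
        result = proxy.total(invoice_id)

    worker.join()
    server.server_close()
    return result


if __name__ == "__main__":
    print(remote_total("INV-1"))
```

The caller works with an invoice total rather than database rows, which is the higher level of abstraction the middle tier is meant to provide.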

Python's support for all five technologies and the fact that it runs on many different operating systems are what make it a superb integration tool. We believe that Python can be used to acquire data easily from anything, anywhere.
