Open Source Migration Guide
This document sets out to describe the various routes that organisations can take when migrating towards the use of Open Source Software (OSS). We believe that there are many benefits to be gained from increasing the use of OSS within the typical organisation's IT portfolio, including reduced total cost of ownership, higher stability, increased security and greater overall control.
If you spot glaring errors or inconsistencies, or can add useful further information, we will try to incorporate your comments where possible. Please send your comments to migrationguide@gbdirect.co.uk, preferably with a detailed suggestion for corrections or enhancements and a note of which paragraph or subsection of the document you are referring to.
The planned contents and overall thrust are as follows:
- The business case. Descriptions of the benefits of using Open Source software, high-level commercial and strategic issues.
- Case studies drawn from a range of industries cross-linked with Open Source component descriptions in part 3.
- A framework for the migration process.
- Background and capability descriptions of specific Open Source components such as Samba, Apache, Linux etc. cross-linked with case studies.
- Supporting appendices providing more detailed information on licensing issues, development models, statistics, market share studies and other suitable information. Links and references to other important resources.
- Examples of software (not necessarily Open Source) which are widely used and supported by vendors on or in relation to Open Source software components. Instances are likely to include Oracle, WebSphere and numerous others.
The Document
Introduction
- Scope and purpose of the Guide
- How to use the Guide
Management Briefing
Two-page summary of key issues intended to persuade decision-makers to read and investigate further.
The Business Case for Open Source
- This is a benefit-oriented description of what Open Source provides, including:
- Stability
- Reliability
- Auditability
- Cost
- Flexibility and Freedom
- Support and Accountability
See here for the draft of the benefits document.
Background and Capability of Specific Open Source Components
There are numerous Open Source Software projects, ranging from fundamental infrastructure tools through to specific niche products. The entire Internet is supported by Open Source tools such as BIND and Sendmail, and the vast majority of Internet data originates in and is routed by software which is, or is derived from, Open Source. Not all Open Source projects choose the same licences, but those listed below all use licences that we consider to be open enough. Some also have commercial counterparts available.
This section will expand to provide more detailed descriptions of each component; at present, see the Apache description as an example. It has been divided into three sections based on editorial opinion as to whether each component is of leading, significant, or other importance. Clearly it is impossible to cover all of the thousands of Open Source projects here, so our selection is based on general commercial impact. We welcome comments about glaring omissions, but the final choice reflects our opinion of what is important to business migration to Open Source use.
This listing forms a representative selection of some of the most relevant Open Source projects; it is not intended to be comprehensive. Our goal is to provide information about the scope and range of what is available in the Open Source world rather than to enumerate every single project or package, which would require a huge directory and massive maintenance.
Leading Open Source Projects
Apache Webserver
As businesses move their IT infrastructure to a web services model, the need for powerful and reliable web server software is becoming ever more crucial. Apache is the world's leading web server. Surveys conducted by Netcraft indicate that for a number of years, Apache has been the server software chosen by a majority of users. At the time of writing it runs on over 55% of all web servers – about 10 million at present. Moreover, according to Netcraft's latest figures, its usage levels are growing nearly twice as fast as those of its nearest competitor. This link should show current figures from Netcraft.
Why do so many people rely on Apache? Apache has all the advantages that serious users have come to expect from open-source software: reliability, security through auditability, flexibility, efficiency, standards compliance, and low cost.
Reliability
Apache has long proven to be among the most reliable of web servers. Netcraft measure web site uptimes, and list a league table of the top fifty longest running sites. Apache drives all but four of them. Many high-profile sites (The Register, Amazon, Verio, Hewlett-Packard, IBM, Deutsche Bank, European Central Bank, Bank Italia, Abbey National) choose Apache because its uptime is usually limited only by the reliability of the underlying operating system. Moreover, many of these sites must handle many millions of HTTP clients each day.
Security
It is extremely hard (if not impossible) to guarantee that any complex piece of software is free of security vulnerabilities. However, high-quality software is carefully written to minimise both the likelihood and the severity of security flaws. Apache falls into this category. Though it has contained vulnerabilities, they have tended to be relatively minor, easy to fix, and few in number.
The fact that Apache is open-source software constitutes a significant advantage in this respect. As with all open-source software, Apache has large numbers of people using the software, discovering bugs in it, auditing it, and ultimately correcting it – and note that availability of source code is crucial in this respect.
It is instructive to compare this situation with that for Microsoft's IIS, Apache's nearest competitor in terms of market share. IIS has had a number of bugs which permit remote attackers to execute any program on the server, and these bugs have been widely exploited. One such exploit was the so-called ‘Code Red’ worm, which defaces pages on infected machines. Once Code Red has infected a susceptible IIS server, it aggressively tries to search out other machines to infect. This leads to an explosive growth in both the number of machines infected and the amount of network bandwidth devoted to this worm's self-propagation. Later, more virulent strains of Code Red also enabled attackers to acquire system-level access to compromised machines.
The effects of the Code Red worm were serious. Many high-profile websites — including some machines running Microsoft's own Hotmail service — were compromised. Some analysts estimated the costs of the damage caused world-wide to be in the billions of dollars, and while this may be an over-estimate, it is undeniable that the costs were significant. In the wake of these events, the respected analysis firm Gartner advised that “enterprises hit by both Code Red and Nimda [another IIS-targeting worm] immediately investigate alternatives to IIS, including moving Web applications to Web server software from other vendors, such as iPlanet and Apache.” (Gartner Group ‘ditch IIS’ report.)
Flexibility
Extendable. Cross-platform.
Performance
Apache is not designed specifically as a high-performance web server, although the current release has recently been re-engineered to provide specific performance enhancements. Performance of the server software is rarely an issue in practice, and we would advise against treating it as an important or even relevant question. The fact that so many high-profile sites run Apache is probably evidence enough of the adequacy of its performance for all normal tasks.
Standards compliance
Full HTTP/1.1 implementation. Commitment to track future web standards. Earliest HTTP/1.1 server used in the wild; exposed client implementation bugs in IE, JDK, Navigator, AOL, etc.
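To give a feel for what HTTP/1.1 compliance means at the wire level, here is a minimal sketch in Python of a well-formed HTTP/1.1 request and of reading the status line of a reply. One concrete difference from HTTP/1.0 is that the `Host` header is mandatory. The helper names `build_request` and `parse_status_line` and the host `www.example.org` are illustrative assumptions and have nothing to do with Apache's own code.

```python
from typing import Tuple


def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request (hypothetical helper)."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",        # required by HTTP/1.1, optional in HTTP/1.0
        "Connection: close",    # ask the server to close after one response
        "",                     # blank line terminates the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response: bytes) -> Tuple[str, int, str]:
    """Split the first line of a response into version, code and reason."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


if __name__ == "__main__":
    print(build_request("www.example.org"))
    print(parse_status_line(b"HTTP/1.1 200 OK\r\nServer: Apache\r\n\r\n"))
```

A compliant server must cope with every valid variation of such messages, which is why an early and complete implementation tends to expose bugs in clients that took shortcuts.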
Low cost
As an open-source application, Apache may be freely downloaded from the Internet for the cost of the download. Most serious users are well aware that initial purchase cost is only a small part of the total cost of ownership of a piece of software. Here too Apache compares well: the inherent insecurity of many of its competitors, including IIS, means that their system-administration staff must spend significant amounts of time tracking and installing security patches. Apache's superior security record means that both its initial purchase cost and its total cost of ownership are low.
- BIND is the name of the Domain Name System (DNS) server software which underpins the entire Internet. The same software that runs on the Internet root name servers can also be found in any Linux or other Open Source system distribution. The importance of this software in the world's network infrastructure cannot be overstated.
- Sendmail continues to carry an estimated 80% of the entire world's email traffic. Although some other projects are starting to compete with Sendmail, it remains a cornerstone of the international infrastructure. Sendmail's principal role is that of a mail transfer agent, handling the interchange and queueing of email messages on outbound and intermediate servers. Most free software distributions continue to use it as their email engine of choice. Sendmail should not be confused with the ‘user agent’, the software used by a particular individual to compose and read mail. There are numerous user agents, such as Elm, Mutt, Pine, Eudora and Microsoft Outlook, all of which are often used in conjunction with Sendmail to build a complete email solution.
- BSD – URL to follow. These are a family of Unix reimplementations, based more or less on the original Berkeley Unix distributions. Although not as well known in some sectors as Linux, they have a strong following and are argued in some quarters to be more robust and reliable than Linux.
- Gnome is a serious attempt to provide a fully-networked desktop environment for the various Unix-like platforms. It is shipped as standard with all the major Linux distributions and is now considered stable and effective. Various add-on projects exist to extend Gnome and build a suite of office applications, for example Gnumeric, a spreadsheet.
- KDE is a similar project to Gnome, though possibly more polished. It is another desktop project with many followers and, once blessed with suitable applications, a serious threat to the established monopoly.
- GNU. A multitude of Open Source projects live under the GNU banner; indeed, Gnome is just one of them. The founding father of GNU, Richard Stallman, can take the credit for much of what we now see as the Open Source Movement. In particular, effectively all the other Unix look-alikes are deeply indebted to GNU for the compilers and huge range of software tools that stem from the GNU work. By rights, what is commonly called ‘Linux’ should really be known as ‘GNU/Linux’ since the bulk of what constitutes ‘Linux’ is in fact the GNU infrastructure. Huge, excellent and enormously influential.
- Samba provides interworking between practically any operating system and the Microsoft world of file and printer sharing (including domain controller services). Samba is in widespread use in many large organisations, replacing expensive servers and their proprietary licences with licence-free, low cost, commodity solutions. Most people who are used to using Samba find it incomprehensible that other organisations haven't realised how effective, stable and reliable this solution is.
- Linux kernel. The Linux kernel is a robust and stable implementation of the core standards which can loosely be described as ‘Unix’ together with a wide range of device drivers, a wealth of network protocols, support for various filesystems and running on a wide range of hardware platforms. Coupled with the GNU tools and most of the other mainstream Open Source software described here, it is bundled and shipped as ‘distributions’ by a number of commercial and non-commercial organisations. It is most common on PC platforms, where its devotees see it as increasingly a replacement for proprietary operating systems. There are thousands of informational web sites for Linux around the world.
- Open Office is the Open Source version of the Star Office product from Sun Microsystems. Both are considered competitive with the market-leading integrated desktop software.
- MySQL is a hugely popular relational database server, used in thousands of websites and commercial applications all over the world. The Windows version is now nipping at the heels of the Microsoft product. MySQL is very fast, extremely reliable and an excellent lightweight RDBMS solution, though not positioned as a competitor to the industry heavyweights. It works very well in applications suited to its strengths and continues to grow in capability. Packaged via Foxserv it is proving highly popular in Windows™ environments.
- Perl is an established programming language with a strong following amongst thinking programmers. It is perhaps best known as a website development tool — but only to those who don't know its true capabilities. It has also spun off (see CPAN below) a huge army of ‘module’ developers and is increasingly one of the main programming languages of choice amongst the more talented software developers.
- XFree86 – "The XFree86 Project, Inc is the organisation which produces XFree86, a freely redistributable open-source implementation of the X Window System which runs on UNIX(R) and UNIX-like operating systems such as Linux, all of the BSD variants, Sun Solaris x86, Mac OS X (via Darwin), as well as other platforms like OS/2 and Cygwin." This is the de-facto standard graphical display system for the entire Open Source community, forming the platform for developments such as Gnome and KDE.
- CPAN, the Comprehensive Perl Archive Network, could be overlooked by those not in the know. This is a repository for thousands of Perl modules (pluggable extensions, or libraries), many of which are significant software projects in their own right. A Perl developer who needed a templating language for a website, or an XML parser (for example), would first check CPAN to see if it contains what is needed. There are excellent modules for a wide range of tasks: those who don't know Perl are usually staggered by the range and quality of what is available. The CPAN archive far outstrips the range of class libraries available for Java.
- PHP is a scripting language for websites which are backed by databases. Not only is it widely used on Unix systems, it is now starting to supplant Active Server Pages on Microsoft platforms too, because of its power and portability. Usage of PHP is growing rapidly. Packaged via Foxserv, it is proving highly popular in Windows™ environments.
- Gimp, more properly ‘The GIMP’ (GNU Image Manipulation Program), is considered to be a strong competitor to Adobe Photoshop as a tool for manipulating raster images. It is particularly interesting as one of the first domain-specific end-user applications to emerge amongst what had until recently been mostly infrastructure or horizontally aimed Open Source developments.
Significant
- Jakarta / Tomcat A spin-off from the Apache project (with many subprojects), "The Jakarta Project creates and maintains open source solutions on the Java platform for distribution to the public at no charge." Probably the best-known part of it is Tomcat. Tomcat 4 is the official Reference Implementation of the Servlet 2.3 and JavaServer Pages 1.2 technologies. The umbrella Jakarta name covers much more than that and has become something of a juggernaut by itself.
- Squid is widely used as an application-level web proxy and cache. Non-caching proxies and firewalls require pages to be fetched from the server on every request; caching proxies help to reduce bandwidth demands and improve responsiveness in many cases. Squid is a significant project with numerous features such as peer-to-peer querying and hooks for extensions.
- Postgres "is a sophisticated Object-Relational DBMS, supporting almost all SQL constructs, including subselects, transactions, and user-defined types and functions. It is the most advanced open-source database available anywhere." So say its authors, with considerable justification. At present it is probably not as widely deployed as MySQL but is considered to be a more heavyweight database project with closer adherence to SQL standards and features typically found in ‘industrial’ database systems. MySQL by comparison is usually characterised as fast and robust but relatively light on features.
- SAX
- Gnome ORB (get right name)
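The bandwidth saving described in the Squid entry above can be illustrated with a toy model. The following Python sketch assumes a hypothetical `CachingProxy` class and a stand-in `fake_origin` function in place of a real web server; it is not Squid's implementation and ignores expiry, validation and everything else a real cache must handle. It only shows why repeated requests for the same page need not reach the origin server.

```python
from typing import Callable, Dict


class CachingProxy:
    """Toy caching proxy: contacts the origin server only on a cache miss."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin      # function that really fetches a page
        self._cache: Dict[str, bytes] = {}   # URL -> previously fetched body
        self.origin_fetches = 0              # how often we had to go upstream

    def get(self, url: str) -> bytes:
        if url not in self._cache:           # miss: fetch once and remember
            self.origin_fetches += 1
            self._cache[url] = self._fetch(url)
        return self._cache[url]              # hit: served locally


def fake_origin(url: str) -> bytes:
    """Stand-in for a real web server (assumption for this example)."""
    return f"<html>content of {url}</html>".encode("utf-8")


proxy = CachingProxy(fake_origin)
for _ in range(3):
    body = proxy.get("http://www.example.org/index.html")
print(proxy.origin_fetches)  # 1: three requests, one upstream fetch
```

A non-caching proxy would have gone to the origin server three times here; the difference grows with the number of users sharing the proxy.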
Other
- Koffice — The KDE spin-off creating office applications which integrate with the KDE desktop; parts of which are now considered usable
- Evolution (Outlook Clone)
- Gnome Office – the Gnome office meta-project embracing a wealth of desktop packages. Some are very well developed (cf. Gimp), others much less so.
- GnuCash
- Ximian
- Mono
- Open .net
Appendices
News Sources
- Linux Weekly News Extensive reporting of Open Source issues. From a commercial perspective, the signal-to-noise ratio is not good because technical developments inevitably outnumber serious commercial stories.
- The Register Mainstream IT and technical news site, slightly UK-centric but worldwide overall. Irreverent tone and unpompous. Focus more towards technical than business issues.
- Linux Today general news site covering Open Source matters, attempting to give commercial as well as technical coverage.
- ZDNet Linux Forum — ?? is this worthwhile, does it still exist?
- Slashdot Somewhat geeky, technically oriented and slightly scurrilous or tongue-in-cheek news site with extensive coverage of Open Source and vaguely related topics; not focused on business issues.
Software Directories and Information Sources
- Freshmeat — monstrous repository of open-source software projects
- SourceForge — the place where developers congregate for the majority of open-source projects. Supports the CVS repositories and a wealth of other resources
- Tucows — well-respected Linux-oriented source of applications and games for (mostly free) download
- GNU.org
Other Open-Source software
Closed Source But Runs On Open Source Platforms
- Databases
- Oracle
- Informix
- DB2
- Development Tools
- Java (general)
- Borland
- Kylix
- JBuilder
- C++ Builder
- WebSphere
- Office
- Bynari
- Domino
Links
References
The references section is now in a separate document.