Last updated on February 23rd, 2010


Contents

Introduction
Setting Up the Linux OS
Networking Setup and Issues
Setting Up the Apache Server
Testing Your Configuration
Putting Up Your First Content
Where to Go from Here
Conclusion

Introduction

In this article, you'll learn how to set up a basic Web server on your Linux box using the Apache Web server. I'll cover everything from the initial setup of your networking options to configuring Apache and, finally, putting up your own content.

Setting Up the Linux OS

Setting up your Linux system varies from distribution to distribution (Red Hat, Debian, Caldera and Mandrake, among others), so it's hard to give a generic introduction to the process. And why reinvent the wheel? Your distribution should come with plenty of documentation, whether online or printed.

For the sake of this article, I'll assume that you have a basic Linux system up and running and are, hopefully, already using it to read this. In the next section, I'll go over some networking options, such as connectivity and how you're going to use Apache.

Networking Setup and Issues

Your initial choices depend on the kind of connection you have to a network or the Internet, if any. You needn't be connected at all if you just wish to test a Website you're developing on a laptop, for example.

If you have a permanent connection to the Internet that's up 24 hours a day, seven days a week, you have the most useful Web server environment, because other people can reach your site at any time. You can get this kind of connection through work (make sure it's cool with the boss first), through an ISP as a co-located (CoLo) machine, or at home via DSL or cable modem. Two things usually distinguish this type of access: it's generally faster than a modem, and your IP address doesn't change, or changes very seldom.

A temporary connection, like a regular modem (56K or slower) or ISDN, presents problems if you wish to share your resources with the world. People can only access your material while you're online, and only if they know your current IP address.

Your IP address usually changes with each connection to the Internet through your ISP, and you're available only as long as you're online. This makes serving Web pages highly impractical, but it can be done. For example, if you wish to show someone a Website in a demo or need to share some information, you can agree with the other party or parties on a time when you'll be online and give them your IP address.

If you have no connection at all, or only a temporary one as described above, you can still run a Web server, if only to work on a "real" or "live" server. You need this if you're developing CGI scripts and programs, or if you're authoring content and want to preview it through an actual server rather than as local files; opening local files directly only works for typical HTML content, not for CGI development.

Whether you're using one of the dynamic DNS services out there or just wish to create a name for yourself, you'll need to register a domain name. Examples are "ibm.com" and "linux.org"; note that "www" is not part of the domain name. The "www" prefix is slowly fading from the norm; it used to signify that you were connecting to a Web server and not a gopher or FTP server. You could really just make up a name, or use your host name. For example, if your machine is called "picasso" and you registered the domain "painter.org", your URL would be something like:

     http://picasso.painter.org

However, you want a name that doesn't change often, and "www" does make sense for that; use it or not, as you prefer. With a dynamic DNS service, you can use your domain and run a script on your machine that updates the service with your new IP address whenever you go online. This way, people need to know only your URL, not your specific IP address each time.
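A minimal sketch of such an update script in Python, assuming the machine itself holds the public address (as on a dial-up PPP link; behind a NAT router it would only see its private address). The cache file ~/.last_ip and the notify_dns_service function are hypothetical placeholders: every dynamic DNS service has its own update protocol, so you would replace that function with whatever your provider documents.

```python
import socket
from pathlib import Path
from typing import Optional


def current_ip(probe_host: str = "192.0.2.1") -> str:
    """Return the local IPv4 address the OS would use to reach probe_host.

    connect() on a UDP socket sends no packets; it only makes the kernel pick
    a route, whose source address is then read back. 192.0.2.1 is a reserved
    documentation address, used here purely for route selection.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((probe_host, 9))
        return s.getsockname()[0]


def ip_changed(new_ip: str, cache: Path) -> bool:
    """Return True (and remember new_ip) if it differs from the cached address."""
    old_ip: Optional[str] = cache.read_text().strip() if cache.exists() else None
    if new_ip == old_ip:
        return False
    cache.write_text(new_ip + "\n")
    return True


def notify_dns_service(hostname: str, ip: str) -> None:
    # HYPOTHETICAL placeholder: substitute your dynamic DNS provider's update call.
    print(f"would update {hostname} -> {ip}")


if __name__ == "__main__":
    ip = current_ip()
    if ip_changed(ip, Path.home() / ".last_ip"):
        notify_dns_service("picasso.painter.org", ip)
```

You would run this each time the link comes up, for example from the script your PPP setup executes after connecting. Caching the last address means the service is contacted only when the address has actually changed.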

Consult your distribution's documentation if you need help getting connected to the Internet by modem or another method, or with any specific network configuration and commands.

Setting Up the Apache Server

Depending on your distribution, or on your installation choices, Apache may already be installed on your system or be only a package away. For almost all distributions, Apache is available as a package that's simple to install and get running, and this is the easiest and fastest way to proceed. If you want to add options to Apache or tune it, you may want to compile your own server instead.

Compiling your own server is beyond the scope of this document, and instructions exist in great detail elsewhere. Some links to help get you started are:

In the case of Red Hat, for example, Apache is well integrated into the layout of the system:

Other distributions and even other operating systems, as well as self-compiled Apache servers, place Apache in its own self-contained directory under /usr/local/apache. Some people prefer this centrally located approach, while others like Apache to become "part of the operating system." Unlike with Microsoft's offerings, you can easily install and remove it either way.

I prefer a combination approach: keep Apache, its binaries and associated material in one directory (/usr/local/apache), and have a directory such as /home/httpd/html or even /www (which could be a symlink) contain all the content, log files and CGI scripts and programs. The benefit of this approach becomes clear if you have multiple "virtual hosts" running: with one copy of Apache, you can run multiple individual Websites quite easily. I do this by using a fully qualified domain name as the directory name, for example "/www/picasso.painter.org", with further directories below that for "cgi-bin", "html" and "logs". It would look like this:

     /www
     /www/picasso.painter.org
     /www/picasso.painter.org/cgi-bin
     /www/picasso.painter.org/html
     /www/picasso.painter.org/logs
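Creating this layout for each new site is easy to script. The sketch below assumes the /www root and the per-site cgi-bin, html and logs directories described above; make_site and vhost_stanza are hypothetical names. The generated stanza uses standard Apache directives (ServerName, DocumentRoot, ScriptAlias, ErrorLog, CustomLog, with the "common" log format defined in the stock httpd.conf), but check it against your Apache version: releases before 2.4 also need a "NameVirtualHost *:80" line for name-based virtual hosts.

```python
from pathlib import Path

# Per-site subdirectories, following the layout described above.
SUBDIRS = ("cgi-bin", "html", "logs")


def make_site(root: Path, fqdn: str) -> Path:
    """Create <root>/<fqdn>/{cgi-bin,html,logs} and return the site directory."""
    site = root / fqdn
    for sub in SUBDIRS:
        (site / sub).mkdir(parents=True, exist_ok=True)
    return site


def vhost_stanza(site: Path, fqdn: str) -> str:
    """Return a name-based <VirtualHost> block pointing at the site's directories."""
    return "\n".join([
        "<VirtualHost *:80>",
        f"    ServerName {fqdn}",
        f"    DocumentRoot {site / 'html'}",
        f"    ScriptAlias /cgi-bin/ {site / 'cgi-bin'}/",
        f"    ErrorLog {site / 'logs' / 'error_log'}",
        f"    CustomLog {site / 'logs' / 'access_log'} common",
        "</VirtualHost>",
    ])


if __name__ == "__main__":
    site = make_site(Path("/www"), "picasso.painter.org")
    print(vhost_stanza(site, "picasso.painter.org"))
```

The printed stanza would go into httpd.conf, and Apache must be restarted before it takes effect.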

The other benefit of arranging content like this is that you can have a separate hard drive or partition mounted as /www. This is often done for high-performance or production Web servers. I use this on a particular Sun box I maintain, where /www is actually a RAID array mounted with the "noatime" option. This option stops Solaris from updating each file's access time every time the file is read, which further speeds up your server. Note that this example comes from Solaris, and I won't get into much more detail about it here.
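Linux file systems accept the same noatime mount option. On a Linux box you can confirm how /www is mounted by reading /proc/mounts, whose fourth field holds the comma-separated mount options. A small Linux-only sketch; has_noatime is a hypothetical name, and the optional mounts_text argument exists only so the parsing can be tried on sample text.

```python
from typing import Optional


def has_noatime(mount_point: str = "/www", mounts_text: Optional[str] = None) -> bool:
    """Return True if mount_point is mounted with the noatime option (Linux only).

    Each /proc/mounts line reads: device, mount point, fs type, options, dump, pass.
    """
    if mounts_text is None:
        with open("/proc/mounts") as f:
            mounts_text = f.read()
    result = False
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            # If something is mounted over the same point twice, the last entry wins.
            result = "noatime" in fields[3].split(",")
    return result


if __name__ == "__main__":
    print("/www noatime:", has_noatime("/www"))
```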

You can start your server manually or have it start automatically. In the case of Red Hat, and depending on your needs, it starts automatically every time you boot. In a production environment, you certainly want this. If you're only using the server for part-time development and want to keep system resource use light, you'll want to turn it on and off yourself as needed.

Under Red Hat, there's a script in /etc/rc.d/init.d called "httpd", which you run as "/etc/rc.d/init.d/httpd start" to start the server or "/etc/rc.d/init.d/httpd stop" to stop it. You must be root to do this; ordinarily a regular user cannot, although it is possible to set things up that way. If you keep your Apache installation in one place as described above, look in either the "../apache/bin/" or "../apache/sbin/" directory for a program called "apachectl". This script works just like the one above: you pass it a start or stop command, as in "apachectl start".
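If you start and stop the server by hand often, a tiny wrapper can guard against typos. This sketch assumes apachectl is at /usr/local/apache/bin/apachectl (use the sbin directory, or your distribution's init script, if that's where yours lives); build_command and apache_control are hypothetical names, and the wrapper still has to be run as root.

```python
import subprocess
import sys
from typing import List

# Assumed location for a self-contained Apache install; adjust for your system.
APACHECTL = "/usr/local/apache/bin/apachectl"
ACTIONS = ("start", "stop", "restart")


def build_command(action: str, apachectl: str = APACHECTL) -> List[str]:
    """Validate the action and return the apachectl command line to run."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action {action!r}; use one of {ACTIONS}")
    return [apachectl, action]


def apache_control(action: str, apachectl: str = APACHECTL) -> int:
    """Run apachectl with the given action and return its exit status."""
    return subprocess.run(build_command(action, apachectl)).returncode


if __name__ == "__main__":
    sys.exit(apache_control(sys.argv[1] if len(sys.argv) > 1 else "start"))
```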

Testing Your Configuration

In many distributions, once the Apache package is installed, it's more or less ready to go out of the box. All that's needed is to start the server and begin populating it with content. You may want to do a little tweaking in the configuration file, but it's generally not necessary for a first introduction.

If you look at the configuration file, usually called "httpd.conf", you will see a huge number of configuration options. They're all documented very well on Apache's site, and the documentation may also already exist on your system (usually under a directory called "../manual").

One of the more important options is "DocumentRoot", which tells Apache where to look for your Website files, including all your HTML and images. Another important one is "ScriptAlias" for "cgi-bin", which specifies where your CGI scripts are located; they are then accessed with URLs such as:

     http://picasso.painter.org/cgi-bin/myscript.cgi
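A first CGI program only has to print a header block, a blank line and then the document. A minimal sketch in Python, assuming a Python 3 interpreter at /usr/bin/python3 and that you save the file as myscript.cgi in the directory ScriptAlias points to and make it executable (for example with chmod 755):

```python
#!/usr/bin/python3
# myscript.cgi: a minimal CGI program. Apache runs it for each request to
# /cgi-bin/myscript.cgi and sends whatever it prints back to the browser.
import os


def render() -> str:
    """Build the complete CGI response: header lines, a blank line, then the body."""
    # REMOTE_ADDR is one of the environment variables the server sets for CGI programs.
    client = os.environ.get("REMOTE_ADDR", "unknown")
    header = "Content-type: text/html\n\n"
    body = (
        "<html><head><title>CGI test</title></head>\n"
        f"<body><p>Hello from CGI. Your address is {client}.</p></body></html>\n"
    )
    return header + body


if __name__ == "__main__":
    print(render(), end="")
```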

One important thing to remember is to always make a backup copy of any configuration file you're about to change. That way, if something goes wrong, you have a fresh copy to start from again. Another is that changes to the configuration files take effect only after you restart your Apache server. Use the same commands as above with "restart" instead of "start" or "stop", or simply stop the server and then start it up again.
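The backup habit is easy to automate. A sketch, assuming you call it just before each edit; backup_config is a hypothetical name, and it copies the file alongside the original with a timestamp suffix (a naming scheme of my own choosing), using shutil.copy2 so the copy keeps the original's permissions and modification time. Two backups made within the same second would share a name, so the second would overwrite the first.

```python
import shutil
import time
from pathlib import Path


def backup_config(path: Path) -> Path:
    """Copy path to <name>.YYYYMMDD-HHMMSS beside it and return the copy's path."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = path.with_name(f"{path.name}.{stamp}")
    shutil.copy2(path, backup)  # copy2 also preserves permissions and mtime
    return backup


if __name__ == "__main__":
    # Assumed location for a self-contained install under /usr/local/apache.
    print(backup_config(Path("/usr/local/apache/conf/httpd.conf")))
```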

Another line you might notice in the config file reads "Port 80", which specifies the TCP/IP "port number" Apache responds on. The default, and the standard, is port 80. (Apache 2.0 and later use the "Listen" directive for this instead.) You could use port 8080, a common alternative, for testing or any other reason. Doing so means that someone who connects to your machine won't get a response from your Web server by default, unless you tell them the port. They could just run a port scan on your machine and find which port it's running on, so this is not a real security measure. It is, however, a way to run multiple servers on one machine besides the virtual host method, or simply to keep casual prying eyes away.
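Finding out whether something is listening on a port takes nothing more than a TCP connection attempt, which is essentially what a port scanner does for each port it tries. A sketch using Python's standard socket module; port_open is a hypothetical name.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


if __name__ == "__main__":
    for port in (80, 8080):
        print(port, "open" if port_open("127.0.0.1", port) else "closed")
```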

To make editing your configuration files a little easier, you may wish to look at a program I wrote for X in Perl/Tk. It's called TkApache and provides a GUI for the most common configuration file parameters.

To test whether your server is up, you can use a browser or, even simpler, telnet to port 80 (or whichever other port you chose above) on your machine:

     $ telnet localhost 80
     Trying 127.0.0.1...
     Connected to localhost.
     Escape character is '^]'.
     GET / HTTP/1.0

     HTTP/1.0 200 OK
     Date: Sun, 16 Jul 2000 19:09:37 GMT
     Server: Apache/1.1.3
     Content-type: text/html

Note that YOU have to type the line "GET / HTTP/1.0" yourself and then press Enter twice; the blank line tells the server that your request is complete.

What you'll get back is the response header "HTTP/1.0 200 OK", which includes the response code "200", meaning that your server responded as it should. You'll also notice the line "Server: Apache/1.1.3" or similar, depending on your version of Apache. This identifies the software you're using to serve your material. Nearly all servers send this header, and it's how Netcraft conducts its Web server software survey.
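The telnet test can also be scripted. This sketch sends the same request line followed by a blank line and returns the status line and the Server header; probe is a hypothetical name, and the host and port defaults assume Apache on port 80 of the local machine.

```python
import socket
from typing import Optional, Tuple


def probe(host: str = "127.0.0.1", port: int = 80,
          timeout: float = 5.0) -> Tuple[str, Optional[str]]:
    """Send 'GET / HTTP/1.0' and return (status line, Server header or None)."""
    with socket.create_connection((host, port), timeout=timeout) as s:
        # The request line, then an empty line that ends the request headers.
        s.sendall(b"GET / HTTP/1.0\r\n\r\n")
        data = b""
        while b"\r\n\r\n" not in data:  # read until the end of the response headers
            chunk = s.recv(4096)
            if not chunk:
                break
            data += chunk
    head = data.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    server = None
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "server":
            server = value.strip()
    return lines[0], server


if __name__ == "__main__":
    status, server = probe()
    print(status)
    print("Server:", server)
```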

You can also test your Apache server with "lynx", a text-based browser, or with Netscape, by going to either of these URLs:

     http://localhost (may not work)
     http://127.0.0.1 (should work!)

At this point it is assumed you have your server up and running on your machine. The next section will discuss how and where to place your HTML files and graphics and start seeing some Web pages.

Putting Up Your First Content

By looking at the "DocumentRoot" directive in the configuration file, you can determine where you should place your HTML content. Recall the locations discussed above, such as "/home/httpd/html" in the case of Red Hat, or "/www" if you set your system up that way.
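If you'd rather not hunt through a long httpd.conf by eye, a few lines of Python can find the directive. This sketch assumes a simple configuration: it skips comments, matches the directive name case-insensitively (as Apache does) and returns the first DocumentRoot it finds. It does not follow Include files or tell the main server's DocumentRoot apart from one set inside a <VirtualHost> block. document_root is a hypothetical name, and the path in the usage line is Red Hat's usual location, which may differ on your system.

```python
import shlex
from pathlib import Path
from typing import Optional


def document_root(conf_text: str) -> Optional[str]:
    """Return the value of the first DocumentRoot directive in conf_text, or None."""
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split(None, 1)
        # Apache directive names are case-insensitive.
        if len(fields) == 2 and fields[0].lower() == "documentroot":
            return shlex.split(fields[1])[0]  # strips quotes around the path
    return None


if __name__ == "__main__":
    # Red Hat's usual location; adjust for your system.
    print(document_root(Path("/etc/httpd/conf/httpd.conf").read_text()))
```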

By default, an Apache installation comes with a few files just to let you know things are working: a file called "index.html" (the default filename for most Web servers) and "apache_pb.gif" (the Apache logo, a feather). If you view your local machine in a browser such as Netscape, you'll be presented with a white page bearing the Apache logo and a line similar to "It worked!"

You might want to save these files to a backup directory somewhere and "wipe the slate clean" before populating your server with your own content. One thing that is very often overlooked is the hierarchy in which you lay out your directories and their content. For this first introduction, you can safely put everything in one directory: HTML files, graphics, audio or anything else you wish to present.
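Moving the default files aside, rather than deleting them, takes only a few lines. A sketch; clear_docroot is a hypothetical name, and you supply both the document root and a backup directory of your choosing, which must lie outside the document root so the function never tries to move the backup into itself.

```python
import shutil
from pathlib import Path
from typing import List


def clear_docroot(docroot: Path, backup_dir: Path) -> List[str]:
    """Move every entry in docroot into backup_dir and return the moved names."""
    root, backup = docroot.resolve(), backup_dir.resolve()
    if backup == root or root in backup.parents:
        raise ValueError("backup_dir must be outside docroot")
    backup.mkdir(parents=True, exist_ok=True)
    moved = []
    for entry in sorted(root.iterdir()):
        shutil.move(str(entry), str(backup / entry.name))
        moved.append(entry.name)
    return moved
```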

As your site grows, you'll quickly find the "everything in one directory" approach very limiting. For example, you can have only one "index.html" file per directory, and it loads by default when someone goes to that directory's URL. Your document root's "index.html" file, for instance, is the first page people get when they go to the URL:

     http://picasso.painter.org, or
     http://picasso.painter.org/index.html

As you create directories under that top level, you can break your site down into logical chunks. This is much more efficient: you can keep related content together and make navigation and site maps more logical. One method is:

     /documentroot/html         HTML files
     /documentroot/graphics     graphical images
     /documentroot/sounds       audio files
     /documentroot/...          etc.

While this works, it's often hard to see which files belong to a given document or set of documents, and it still doesn't let two files share the same name. Strictly speaking, your site needs only one "index.html" file, but it helps to break your content into directories and place an "index.html" file in each one. This way, the URL can refer to just the directory:

     http://picasso.painter.org/paintings

or with a file as in:

     http://picasso.painter.org/paintings/index.html

You could also name your files whatever you wish and just refer to them this way:

     http://picasso.painter.org/paintings/paintings.html
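How a directory URL ends up at its index.html can be modeled with a small function. This is a simplified sketch, not Apache's own logic: map_url is a hypothetical name, it assumes a single index file named index.html, and it ignores aliases and the redirect Apache normally sends when a directory is requested without a trailing slash.

```python
from pathlib import Path
from typing import Optional


def map_url(docroot: Path, url_path: str, index: str = "index.html") -> Optional[Path]:
    """Map a URL path such as /paintings or /paintings/index.html to a file.

    Returns None if the path escapes docroot or no matching file exists.
    """
    root = docroot.resolve()
    target = (root / url_path.lstrip("/")).resolve()
    if target != root and root not in target.parents:
        return None  # e.g. /../etc/passwd must not leave the document root
    if target.is_dir():
        target = target / index  # a directory URL serves its index file
    return target if target.is_file() else None
```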

It's all a matter of taste and how you wish to build the site. Using the multiple-directory approach, you can keep all files associated with that logical block together. This makes organization easy and your Website design uncluttered and neat. As your site grows, you'll thank yourself later.

There are numerous tutorials and books, both in print and online, that show how to write your own HTML. You could also use a GUI program such as Netscape Composer to create your pages. Which you use is up to you and depends on your style; often you might create a page with a GUI tool and then hand-tweak it in a text editor.

To create graphics, you could use any number of tools and sources. One great program for this is the GIMP, a Photoshop-like program. Text effects, navigation buttons and multimedia images such as vivid JPEG files or animated GIF files are all possible.

Conclusion

By now you should have a pretty good idea of what's involved in getting your first Web server up and running. While it may sound complicated, in practice it's pretty easy. After you set up two or three, it'll become second nature.

As you get more comfortable with Apache, you'll be editing the configuration files to make Apache operate more the way you want it to. You may start writing your own CGI programs or Apache modules. You're limited only by the time you have to spend learning and doing: all the tools are available, and they're all free and based on standards. You can go a long way with the Open Source tools available to build a full-on, professional site.

With Apache as your Web server, you can count on long and reliable service, as it's a solid, configurable and capable Web server.


All images are (C) 1994-2005 by Michael Holve