Quick HOWTO : Ch32 : Controlling Web Access with Squid

From Linux Home Networking
Revision as of 23:40, 8 January 2011 by Admin (Talk | contribs)

Introduction

In a vote, a proxy is a single person who represents the interests of many others and votes on their behalf. For example, in the United States, a single vote from a senator represents all the voters in his or her state.

Squid is a proxy in a different sense. It aggregates the requests of many web surfers that use it into a single stream of requests. When the Squid server aggregates multiple outbound connections, it is called a proxy. When it aggregates multiple inbound connections it is called a reverse proxy. This is also called “accelerator mode”. There are many reasons to configure Squid to function in either role. Some of them will be discussed next.

Reasons to Create a Squid Proxy

Two important goals of many small businesses are to:

  • Reduce Internet bandwidth charges
  • Limit access to the Web to only authorized users.

The Squid web caching proxy server can achieve these fairly easily.

Users configure their web browsers to use the Squid proxy server instead of going to the web directly. The Squid server then checks its web cache for the web information requested by the user. It will return any matching information that it finds in its cache; if there is none, it will go to the web to find it on behalf of the user. Once it finds the information, it will populate its cache with it and also forward it to the user's web browser.
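
The hit-or-miss flow described above can be sketched in a few lines of Python. This is only a toy model of the idea, not how Squid is implemented: the CachingProxy class and the fetch_from_origin callable are hypothetical names, and real caches also honor expiry rules that are left out here.

```python
class CachingProxy:
    """Toy model of a caching proxy: serve from cache, else fetch and store."""

    def __init__(self, fetch_from_origin):
        # fetch_from_origin is a hypothetical callable: url -> content
        self.fetch_from_origin = fetch_from_origin
        self.cache = {}
        self.origin_requests = 0

    def get(self, url):
        # Cache hit: answer without touching the web.
        if url in self.cache:
            return self.cache[url]
        # Cache miss: fetch on the user's behalf, store, then reply.
        self.origin_requests += 1
        content = self.fetch_from_origin(url)
        self.cache[url] = content
        return content


proxy = CachingProxy(lambda url: "content of " + url)
first = proxy.get("http://www.example.com/")
second = proxy.get("http://www.example.com/")
```

The second request for the same URL is answered from the cache, so only one request reaches the origin. That is where the bandwidth saving comes from.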

As you can see, this reduces the amount of data accessed from the web. Another advantage is that you can configure your firewall to only accept HTTP web traffic from the Squid server and no one else. Squid can then be configured to request usernames and passwords for each user that uses its services. This provides simple access control to the Internet.

Reasons to Create a Squid Reverse Proxy

The Apache web server distributes its load across multiple worker threads or processes. When the number of queries from web surfers gets too high, more short-lived httpd workers are created to handle the increased connections. Creating them is usually CPU intensive and can make your server sluggish in extreme cases.

Squid can reduce the need to create these workers by aggregating incoming requests from multiple web surfers and converting them into a single stream encapsulated in a single connection.

In the reverse proxy configuration, Squid caches the Apache httpd responses in memory. It will respond with this stored data instead of querying Apache whenever possible.

The combination of caching and reverse proxying makes Squid an asset in reducing load, and increasing the responsiveness of your Apache server.

Download and Install The Squid Package

Most RedHat and Fedora Linux software product packages are available in the RPM format, whereas Debian and Ubuntu Linux use DEB format installation files. When searching for these packages remember that the filename usually starts with the software package name and is followed by a version number, as in squid-3.1.9-3.fc14.i686.rpm. (For help on downloading and installing the package, see Chapter 6, "Installing Linux Software").
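
As a rough illustration of that naming convention, the following Python sketch splits an RPM filename into its parts. It assumes the common name-version-release.architecture.rpm layout; the function name parse_rpm_filename is made up for this example, and filenames that do not follow the layout will not parse.

```python
def parse_rpm_filename(filename):
    """Split 'name-version-release.arch.rpm' into its four fields."""
    stem = filename[:-len(".rpm")] if filename.endswith(".rpm") else filename
    # The architecture is the last dot-separated field.
    rest, arch = stem.rsplit(".", 1)
    # Version and release are the last two dash-separated fields,
    # which lets the package name itself contain dashes.
    name, version, release = rest.rsplit("-", 2)
    return {"name": name, "version": version, "release": release, "arch": arch}


info = parse_rpm_filename("squid-3.1.9-3.fc14.i686.rpm")
```

For the filename used in the text, this yields the package name squid, version 3.1.9, release 3.fc14 and architecture i686.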

Starting Squid

The methodologies vary depending on the variant of Linux you are using as you’ll see next.

Fedora / CentOS / RedHat

With these flavors of Linux you can use the chkconfig command to get squid configured to start at boot:

[root@bigboy tmp]# chkconfig squid on

To start, stop, and restart squid after booting use the service command:

[root@bigboy tmp]# service squid start
[root@bigboy tmp]# service squid stop
[root@bigboy tmp]# service squid restart

To determine whether squid is running you can issue either of these two commands. The first will give a status message. The second will return the process ID numbers of the squid daemons.

[root@bigboy tmp]# service squid status
[root@bigboy tmp]# pgrep squid

Note: Remember to run the chkconfig command at least once to ensure squid starts automatically on your next reboot.

Ubuntu / Debian

With these flavors of Linux the commands are different. Try installing the sysv-rc-conf and sysvinit-utils DEB packages as they provide commands that simplify the process. (For help on downloading and installing the packages, see Chapter 6, "Installing Linux Software".) You can use the sysv-rc-conf command to get squid configured to start at boot:

user@ubuntu:~$ sudo sysv-rc-conf squid on

To start, stop, and restart squid after booting the service command is the same:

user@ubuntu:~$ sudo service squid start
user@ubuntu:~$ sudo service squid stop
user@ubuntu:~$ sudo service squid restart

To determine whether squid is running you can issue either of these two commands. The first will give a status message. The second will return the process ID numbers of the squid daemons.

user@ubuntu:~$ sudo service squid status
user@ubuntu:~$ pgrep squid

Note: Remember to run the sysv-rc-conf command at least once to ensure squid starts automatically on your next reboot.

Squid Configuration Files

You can define most of Squid’s configuration parameters in the squid.conf file which may be located in either the /etc or /etc/squid directory depending on your version of Linux.

Remember to restart Squid after you make any changes to your configuration files so that the new settings are activated. Squid can also reread its configuration without a full restart when you run the squid -k reconfigure command.

General Squid Configuration Guidelines

Each Squid server in your administrative zone has to be uniquely identifiable by either its hostname listed in the /etc/hosts file or the value set in the visible_hostname directive in squid.conf. This is especially important in more complex configurations where clusters of Squid servers pool their resources in order to achieve some common caching goal.

Your /etc/hosts file should be configured with your server’s hostname at the end of the localhost line. In this example the server name “bigboy” has been correctly added.

# File: /etc/hosts
127.0.0.1   localhost localhost.localdomain bigboy

If you want to give your Squid process a name that is different from your hostname, then add the visible_hostname directive to your squid.conf file. In this example, we give the server the hostname “cache-001”.

# File: squid.conf
visible_hostname cache-001

Misconfigured Squid instances will give an error like this when the hostname isn’t correctly defined:

WARNING: Could not determine this machines public hostname. Please configure one or set 'visible_hostname'.

Now it’s time to configure proxies and reverse proxies.

Configuring Squid Proxies

Squid offers many options to manage access to the web for security, legal, and resource utilization reasons. We’ll cover a few of these in the sections that follow.

Access Control Lists

You can limit users' ability to browse the Internet with access control lists (ACLs). Each ACL line defines a particular type of activity, such as an access time or source network. ACLs are then linked to an http_access statement that tells Squid whether to deny or allow traffic that matches the ACL.

Squid matches each Web access request it receives by checking the http_access list from top to bottom. If it finds a match, it enforces the allow or deny statement and stops reading further. You have to be careful not to place a deny statement in the list that blocks a similar allow statement below it. The final http_access statement denies everything, so it is best to place new http_access statements above it.

Note: The very last http_access statement in the squid.conf file denies all access. You therefore have to add your specific permit statements above this line. In the chapter's examples, I've suggested that you place your statements at the top of the http_access list for the sake of manageability, but you can put them anywhere in the section above that last line.
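
The top-to-bottom, first-match behavior can be modeled with a short Python sketch. It is a simplification for illustration only: the function evaluate_http_access is hypothetical, ACL names are borrowed from the examples later in this chapter, and every ACL named on a single http_access line is assumed to have to match for the line to apply.

```python
def evaluate_http_access(rules, matching_acls):
    """Return 'allow' or 'deny' for a request, first matching rule wins.

    rules: ordered list of (action, [acl names]) pairs.
    matching_acls: names of the ACLs this particular request matches.
    """
    matching = set(matching_acls) | {"all"}  # the 'all' ACL matches everything
    for action, acl_names in rules:
        if all(name in matching for name in acl_names):
            return action  # stop reading further
    # Simplifying assumption: not reached when the list ends with 'deny all'.
    return "deny"


rules = [
    ("deny", ["RestrictedHost"]),
    ("allow", ["home_network", "business_hours"]),
    ("deny", ["all"]),  # the final catch-all line
]

office_hours_user = evaluate_http_access(rules, {"home_network", "business_hours"})
restricted_user = evaluate_http_access(
    rules, {"home_network", "business_hours", "RestrictedHost"})
after_hours_user = evaluate_http_access(rules, {"home_network"})
```

The restricted host is denied even during business hours because its deny line comes first. Swapping the first two rules would let it through, which is why statement order matters.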

Squid has a minimum required set of ACL statements in the ACCESS_CONTROL section of the squid.conf file. It is best to put new customized entries right after this list to make the file easier to read.

Restricting Web Access By Time

You can create access control lists with time parameters. For example, you can allow only business hour access from the home network, while always restricting access to host 192.168.1.23.

#
# Add this to the bottom of the ACL section of squid.conf
#
acl home_network src 192.168.1.0/24
acl business_hours time M T W H F 9:00-17:00
acl RestrictedHost src 192.168.1.23

#
# Add this at the top of the http_access section of squid.conf
#
http_access deny RestrictedHost
http_access allow home_network business_hours

Or, you can allow morning access only:

#
# Add this to the bottom of the ACL section of squid.conf
#
acl mornings time 08:00-12:00
 
#
# Add this at the top of the http_access section of squid.conf
#
http_access allow mornings
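
The day letters in a time ACL are single-character codes: S for Sunday, M for Monday, T for Tuesday, W for Wednesday, H for Thursday, F for Friday and A for Saturday. The Python sketch below shows how such a rule could be evaluated against a timestamp. The function in_time_acl is a hypothetical helper, and treating both ends of the time range as inclusive is an assumption of this sketch.

```python
from datetime import datetime, time

# Squid day codes mapped to datetime.weekday() values (Monday is 0).
SQUID_DAYS = {"M": 0, "T": 1, "W": 2, "H": 3, "F": 4, "A": 5, "S": 6}


def in_time_acl(when, days, start, end):
    """True if 'when' falls on one of 'days' and between start and end."""
    allowed_weekdays = {SQUID_DAYS[code] for code in days}
    return when.weekday() in allowed_weekdays and start <= when.time() <= end


business_start, business_end = time(9, 0), time(17, 0)

monday_morning = in_time_acl(datetime(2011, 1, 10, 10, 0), "MTWHF",
                             business_start, business_end)
monday_evening = in_time_acl(datetime(2011, 1, 10, 18, 0), "MTWHF",
                             business_start, business_end)
saturday_morning = in_time_acl(datetime(2011, 1, 8, 10, 0), "MTWHF",
                               business_start, business_end)
```

Only the Monday 10:00 request matches the business_hours definition. An ACL with no day letters, such as the mornings example, would correspond to passing all seven codes.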

Restricting Access to specific Web sites

Squid is also capable of reading files containing lists of web sites and/or domains for use in ACLs. In this example we create two lists in files named /usr/local/etc/allowed-sites.squid and /usr/local/etc/restricted-sites.squid.

# File: /usr/local/etc/allowed-sites.squid
www.openfree.org
linuxhomenetworking.com

# File: /usr/local/etc/restricted-sites.squid
www.porn.com
illegal.com

These can then be used to always block the restricted sites and permit the allowed sites during working hours. This can be illustrated by expanding our previous example slightly.

#
# Add this to the bottom of the ACL section of squid.conf
#
acl home_network src 192.168.1.0/24
acl business_hours time M T W H F 9:00-17:00
acl GoodSites dstdomain "/usr/local/etc/allowed-sites.squid"
acl BadSites  dstdomain "/usr/local/etc/restricted-sites.squid"

#
# Add this at the top of the http_access section of squid.conf
#
http_access deny BadSites
http_access allow home_network business_hours GoodSites
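
When writing these site lists, keep in mind how dstdomain entries are compared with the requested host name: an entry with a leading dot (such as .linuxhomenetworking.com) matches the domain and all of its subdomains, whereas an entry without the dot matches only that exact host name. The Python sketch below models the rule. The function dstdomain_matches is a hypothetical helper, not part of Squid.

```python
def dstdomain_matches(host, entry):
    """Model of dstdomain matching for one list entry."""
    host = host.lower().rstrip(".")
    entry = entry.lower()
    if entry.startswith("."):
        # Leading dot: the bare domain and every subdomain match.
        return host == entry[1:] or host.endswith(entry)
    # No leading dot: only the exact host name matches.
    return host == entry


exact_only = dstdomain_matches("www.linuxhomenetworking.com",
                               "linuxhomenetworking.com")
with_dot = dstdomain_matches("www.linuxhomenetworking.com",
                             ".linuxhomenetworking.com")
```

With the list files shown above, a request for www.linuxhomenetworking.com would not match the linuxhomenetworking.com entry. Adding a leading dot to the entry is one way to cover the whole domain.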

Restricting Web Access By IP Address

You can create an access control list that restricts Web access to users on certain networks. In this case, it's an ACL that defines a home network of 192.168.1.0.

#
# Add this to the bottom of the ACL section of squid.conf
#
acl home_network src 192.168.1.0/255.255.255.0

You also have to add a corresponding http_access statement that allows traffic that matches the ACL:

#
# Add this at the top of the http_access section of squid.conf
#
http_access allow home_network
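
The src ACL above writes the network with a dotted netmask, while earlier examples used the equivalent /24 prefix length. The following Python sketch, using the standard ipaddress module, shows that both describe the same range and checks whether a client address falls inside it. The helper in_home_network and the sample addresses are for illustration only.

```python
import ipaddress

# Netmask notation and prefix notation describe the same network.
home_network = ipaddress.ip_network("192.168.1.0/255.255.255.0")
same_network = ipaddress.ip_network("192.168.1.0/24")


def in_home_network(client_ip):
    """True if the client address would match the home_network ACL."""
    return ipaddress.ip_address(client_ip) in home_network


inside = in_home_network("192.168.1.23")
outside = in_home_network("192.168.2.5")
```

A client at 192.168.1.23 matches the ACL and is allowed, while 192.168.2.5 falls outside the range and is left to the later http_access lines.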

Password Authentication Using NCSA

You can configure Squid to prompt users for a username and password. Squid comes with a program called ncsa_auth that reads any NCSA-compliant encrypted password file. You can use the htpasswd program that comes installed with Apache to create your passwords. Here is how it's done:

1) Create the password file. The name of the password file should be /etc/squid/squid_passwd, and you need to make sure that it's universally readable.

[root@bigboy tmp]# touch /etc/squid/squid_passwd
[root@bigboy tmp]# chmod o+r /etc/squid/squid_passwd

2) Use the htpasswd program to add users to the password file. You can add users at any time without having to restart Squid. In this case, you add a user called www:

[root@bigboy tmp]# htpasswd /etc/squid/squid_passwd www
New password:
Re-type new password:
Adding password for user www
[root@bigboy tmp]#

3) Find your ncsa_auth file using the locate command.

[root@bigboy tmp]# locate ncsa_auth
/usr/lib/squid/ncsa_auth
[root@bigboy tmp]#

4) Edit squid.conf; specifically, you need to define the authentication program in squid.conf, which is in this case ncsa_auth. Next, create an ACL named ncsa_users with the REQUIRED keyword that forces Squid to use the NCSA auth_param method you defined previously. Finally, create an http_access entry that allows traffic that matches the ncsa_users ACL entry. Here's a simple user authentication example; the order of the statements is important:

#
# Add this to the auth_param section of squid.conf
#
auth_param basic program /usr/lib/squid/ncsa_auth /etc/squid/squid_passwd
 
#
# Add this to the bottom of the ACL section of squid.conf
#
acl ncsa_users proxy_auth REQUIRED
 
#
# Add this at the top of the http_access section of squid.conf
#
http_access allow ncsa_users

5) As a variation, this example requires password authentication and allows access only during business hours. Once again, the order of the statements is important:

#
# Add this to the auth_param section of squid.conf
#
auth_param basic program /usr/lib/squid/ncsa_auth /etc/squid/squid_passwd
 
#
# Add this to the bottom of the ACL section of squid.conf
#
acl ncsa_users proxy_auth REQUIRED
acl business_hours time M T W H F 9:00-17:00

#
# Add this at the top of the http_access section of squid.conf
#
http_access allow ncsa_users business_hours

Remember to restart Squid for the changes to take effect.

Enforcing The Use of Your Squid Forward Proxy Server

If you are using access controls on Squid, you may also want to configure your firewall to allow HTTP Internet access from the Squid server only. This forces your users to browse the Web through the Squid proxy.

Making Your Squid Server Transparent To Users

It is possible to limit HTTP Internet access to only the Squid server without having to modify the browser settings on your client PCs. This is called a transparent proxy configuration. It is usually achieved by configuring a firewall between the client PCs and the Internet to redirect all HTTP (TCP port 80) traffic to the Squid server on TCP port 3128, which is the Squid server's default TCP port.

Squid Transparent Proxy Configuration

Your first step will be to modify your squid.conf to create a transparent proxy. The procedure is different depending on your version of Squid.

Prior to version 2.6: In older versions of Squid, transparent proxy was achieved through the use of the httpd_accel options which were originally developed for http acceleration. In these cases, the configuration syntax would be as follows:

httpd_accel_host virtual
httpd_accel_port 80
httpd_accel_with_proxy on
httpd_accel_uses_host_header on

Version 2.6 to 3.0: These versions of Squid simply require you to add the word "transparent" to the default "http_port 3128" statement. In this example, Squid not only listens on TCP port 3128 for proxy connections, but will also do so in transparent mode.

http_port 3128 transparent

Version 3.1+: Newer versions of Squid also add the “intercept” keyword to the "http_port 3128" statement when transparent proxying uses an HTTP redirect. If redirection isn’t being used the “transparent” keyword is still used. Here is an example:

http_port 3128 intercept

Or

http_port 3128 transparent

Note: Remember to restart Squid for the changes to take effect.

Configuring iptables to Support the Squid Transparent Proxy

The examples below are based on the discussion of Linux iptables in Chapter 14, "Linux Firewalls Using iptables". Additional commands may be necessary for your particular network topology.

In both cases below, the firewall is connected to the Internet on interface eth0 and to the home network on interface eth1. The firewall is also the default gateway for the home network and handles network address translation on all the network's traffic to the Internet.

Only the Squid server has access to the Internet on port 80 (HTTP), because all HTTP traffic, except that coming from the Squid server, is redirected.

Squid Server and Firewall – Same Server (HTTP Redirect)

If the Squid server and firewall are the same server, all HTTP traffic from the home network is redirected to the firewall itself on the Squid port of 3128 and then only the firewall itself is allowed to access the Internet on port 80.

iptables -t nat -A PREROUTING -i eth1 -p tcp --dport 80 \
        -j REDIRECT --to-port 3128
iptables -A INPUT -j ACCEPT -m state \
        --state NEW,ESTABLISHED,RELATED -i eth1 -p tcp \
        --dport 3128
iptables -A OUTPUT -j ACCEPT -m state \
        --state NEW,ESTABLISHED,RELATED -o eth0 -p tcp \
        --dport 80
iptables -A INPUT -j ACCEPT -m state \
        --state ESTABLISHED,RELATED -i eth0 -p tcp \
        --sport 80
iptables -A OUTPUT -j ACCEPT -m state \
        --state ESTABLISHED,RELATED -o eth1 -p tcp \
        --sport 80

Note: This example is specific to HTTP traffic. You won't be able to adapt this example to support HTTPS web browsing on TCP port 443, as that protocol specifically doesn't allow the insertion of a "man in the middle" server for security purposes. One solution is to add IP masquerading statements for port 443, or any other important traffic, immediately after the code snippet. This will allow non HTTP traffic to access the Internet without being cached by Squid.

Squid Server and Firewall – Different Servers

If the Squid server and firewall are different servers, the statements are different. You need to set up iptables so that all connections to the Web not originating from the Squid server are actually converted into three connections: one from the Web browser client to the firewall, another from the firewall to the Squid server, and a third that the Squid server makes to the Web to service the request. The Squid server then gets the data and replies to the firewall, which then relays this information to the Web browser client. The iptables program does all this using these NAT statements:

iptables -t nat -A PREROUTING -i eth1 -s ! 192.168.1.100 \
        -p tcp --dport 80 -j DNAT --to 192.168.1.100:3128
iptables -t nat -A POSTROUTING -o eth1 -s 192.168.1.0/24 \
        -d 192.168.1.100 -j SNAT --to 192.168.1.1
iptables -A FORWARD -s 192.168.1.0/24 -d 192.168.1.100 \
        -i eth1 -o eth1 -m state \
        --state NEW,ESTABLISHED,RELATED \
        -p tcp --dport 3128 -j ACCEPT
iptables -A FORWARD -d 192.168.1.0/24 -s 192.168.1.100 \
        -i eth1 -o eth1 -m state --state ESTABLISHED,RELATED \
        -p tcp --sport 3128 -j ACCEPT

In the first statement all HTTP traffic from the home network except from the Squid server at IP address 192.168.1.100 is redirected to the Squid server on port 3128 using destination NAT. The second statement makes this redirected traffic also undergo source NAT to make it appear as if it is coming from the firewall itself. The FORWARD statements are used to ensure the traffic is allowed to flow to the Squid server after the NAT process is complete. The unusual feature is that the NAT all takes place on one interface; that of the home network (eth1).
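
To make the two address rewrites easier to follow, here is a small Python simulation of what happens to a packet's addresses. It is only a sketch of the logic in the two NAT rules: the rewrite function and the packet dictionaries are hypothetical, interface matching is ignored, and 203.0.113.10 is a made-up stand-in for a web server on the Internet.

```python
import ipaddress

HOME_NET = ipaddress.ip_network("192.168.1.0/24")
SQUID_IP = "192.168.1.100"
FIREWALL_IP = "192.168.1.1"


def rewrite(packet):
    """Apply the PREROUTING DNAT rule, then the POSTROUTING SNAT rule."""
    pkt = dict(packet)
    # DNAT: port 80 traffic not sent by Squid itself is pointed at Squid.
    if pkt["src"] != SQUID_IP and pkt["dport"] == 80:
        pkt["dst"], pkt["dport"] = SQUID_IP, 3128
    # SNAT: home network traffic headed for Squid appears to come
    # from the firewall, so Squid's replies return through the firewall.
    if ipaddress.ip_address(pkt["src"]) in HOME_NET and pkt["dst"] == SQUID_IP:
        pkt["src"] = FIREWALL_IP
    return pkt


from_client = rewrite({"src": "192.168.1.50", "dst": "203.0.113.10", "dport": 80})
from_squid = rewrite({"src": "192.168.1.100", "dst": "203.0.113.10", "dport": 80})
```

The client's packet ends up addressed from the firewall to the Squid server on port 3128, while the Squid server's own request to the web server passes through unchanged.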

You will additionally have to make sure your firewall has rules to allow your Squid server to access the Internet on HTTP TCP port 80 as covered in Chapter 14, "Linux Firewalls Using iptables".

Manually Configuring Web Browsers To Use Your Squid Server

If you don't have a firewall that supports redirection, then you need to configure your firewall to only accept HTTP Internet access from the Squid server. You will also need to configure your browser's proxy server settings to use the Squid server. The method you use depends on your browser and the process will vary.

Make sure the proxy server setting is the Squid server's IP address or fully qualified domain name (for example, proxy.my-site.com) and that the TCP port is 3128, the Squid default.

Your Squid configuration should now be complete.

Configuring Squid Reverse Proxies

This requires configuring both Squid and Apache. Here are the steps you need to follow.

Squid Configuration

Setting up the Squid reverse proxy is not difficult, but it is not entirely straightforward either. There are a lot of things to take into consideration, and we’ll discuss them next.

1. The first step is to use the http_port directive to define some key elements of the configuration.

 http_port 99.184.206.67:80 accel ignore-cc defaultsite=www.my-site.com vhost

First we define the IP address (99.184.206.67) and port (80) on which Squid should listen. The ignore-cc option tells Squid to ignore the Cache-Control headers sent in client requests, so that a browser cannot force Squid to bypass its cache and fetch a fresh copy from the origin server.

If a web browser visits the server using an IP address, or the website isn’t defined later on in the our_sites ACL, then the site defined in the defaultsite option will be displayed. In this case it’s www.my-site.com. As the server is also acting as a virtual host, running multiple websites, we have to mention this too with the vhost option.

Finally, accelerator, or reverse proxy mode is defined with the accel option.

2. Define an ACL for all the sites hosted on your server with the dstdomain option, then allow users to access them using the http_access directive.

acl our_sites dstdomain www.my-site.com my-site.com www.my-other-site.com my-other-site.com
http_access allow our_sites

3. Use the cache_peer directive to define the server that is going to cache the content. The syntax is as follows:

# !!! NOTE !!! 
#
# Do not use this line in your configuration

cache_peer hostname type http-port icp-port [options]

We will now review each of the options that will be necessary to configure the reverse proxy on your server. Here is the real example.

# Use this line in your configuration
cache_peer 127.0.0.1 parent 80 0 no-query originserver name=myAccel

In this case the cache_peer is the Apache origin server running on localhost.

Caches can have parents and children. When a child doesn’t have content to serve, it will refer to the parent as a caching source of last resort using the ICP protocol. In this example, there is only one cache, so we have to define localhost as the parent with the parent keyword. ICP is also disabled with the no-query keyword.

The TCP port on which the localhost Apache origin server will be listening is port 80. ICP is further disabled by defining it as listening on port 0.

When a parent cache cannot find requested content, it replenishes its cache with newer content from an origin server. Here that will be localhost too.

The myAccel name will be used later to define the sites that will be cached on the server.
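
As a reading aid, the following Python sketch splits the cache_peer line into the fields named in the syntax summary (hostname, type, http-port, icp-port, options). The parse_cache_peer function is a hypothetical helper written for this illustration and is not how Squid parses its configuration.

```python
def parse_cache_peer(line):
    """Split a cache_peer line into its positional fields and options."""
    tokens = line.split()
    if len(tokens) < 5 or tokens[0] != "cache_peer":
        raise ValueError("not a cache_peer line: " + line)
    hostname, peer_type, http_port, icp_port = tokens[1:5]
    options = {}
    for token in tokens[5:]:
        # Options are either bare flags or key=value pairs.
        key, _, value = token.partition("=")
        options[key] = value if value else True
    return {
        "hostname": hostname,
        "type": peer_type,
        "http_port": int(http_port),
        "icp_port": int(icp_port),
        "options": options,
    }


peer = parse_cache_peer(
    "cache_peer 127.0.0.1 parent 80 0 no-query originserver name=myAccel")
```

Read this way, the example line says: the peer at 127.0.0.1 is a parent, reached on HTTP port 80, with ICP port 0, the no-query and originserver flags set, and the name myAccel.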

4. The cache_peer_access directive is now used to allow all the sites listed in the our_sites ACL to be cached and deny all others.

cache_peer_access myAccel allow our_sites
cache_peer_access myAccel deny all

5. When an HTTP request passes through a proxy server, the proxy normally records the client’s IP address in an X-Forwarded-For header, appending it to any list of addresses already placed there by earlier proxies. This command discards any list supplied by such upstream proxies so that only a single IP address is in the header: that of the client that connected to Squid, which may itself be the last proxy server in the chain.

forwarded_for truncate

Squid being a proxy server will then add its own IP address (127.0.0.1) to the X-Forwarded-For list. The Apache logging will have to be modified to truncate this IP address. This will be covered later.

Note: If your Apache logs show two IP addresses or more for the clients when using Squid, you probably have left out this forwarded_for step.
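
The difference between the default behavior and the truncate setting can be sketched in Python. The function build_x_forwarded_for is a hypothetical model of the header value passed onward, and the addresses are made-up documentation examples.

```python
def build_x_forwarded_for(incoming_header, client_ip, mode):
    """Model the X-Forwarded-For value a proxy passes onward.

    incoming_header: list already present in the request ('' if none).
    client_ip: address of the host that connected to the proxy.
    mode: 'on' appends to the list, 'truncate' replaces it.
    """
    if mode == "truncate":
        # Drop whatever upstream proxies supplied; keep one address only.
        return client_ip
    if mode == "on":
        if incoming_header:
            return incoming_header + ", " + client_ip
        return client_ip
    raise ValueError("unsupported mode: " + mode)


upstream_list = "10.1.1.1, 10.2.2.2"
appended = build_x_forwarded_for(upstream_list, "198.51.100.7", "on")
truncated = build_x_forwarded_for(upstream_list, "198.51.100.7", "truncate")
```

With appending, the header grows to three addresses. With truncate it carries only 198.51.100.7, which keeps the Apache log entries to one client address per request.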

6. Though we have previously disabled ICP, Squid will still send cache related HTTP headers to web clients. This will need to be disabled.

acl localnet src 99.184.206.64/27
via off
reply_header_access X-Cache-Lookup deny !localnet
reply_header_access X-Squid-Error deny !localnet
reply_header_access X-Cache deny !localnet

Here we define the local network as 99.184.206.64/27 and then we disable the headers to the remote users.

7. We don’t want to cache the results. This can cause issues with some static content. For example, if the first access to your website occurs when it is under maintenance there will be a “Closed” page. If this is cached, this page will always be displayed till the cache expires. Disabling caching eliminates this problem.

# Do not allow caching memory structures to be created
memory_pools off

# Turn off caching
cache deny all

8. The very last configuration step is to search the remainder of your squid.conf file and comment out any http_port directive that may have been configured previously. Remember, Squid is normally configured to listen on its default TCP port of 3128, but step 1 set it to listen on TCP port 80 instead.

# Squid normally listens to port 3128
#http_port 3128

9. Finally, shut down Apache and restart Squid. Also make sure Squid will start automatically when you reboot.

The next phase will require configuring Apache to work with the newly activated Squid daemon.

Apache Configuration

You will now have to configure Apache to not only work with Squid, but also to listen on localhost instead of the network interface. Here are the steps to follow.

1. In your Apache conf.d directory add a file with the following commands.

The first section makes sure Apache listens on the localhost address and not that of the network interface. It also makes sure your virtual hosts will do the same.

The next section defines how logging will be handled when X-Forwarded-For entries are present. Notice it will only be used if the combined_squid condition is met. This condition will be defined later.

The final section tells Apache how to handle Squid cache headers. If there is no X-Forwarded-For header (represented by the regular expression ^$) then the logging is assigned the normal_request variable. If there is an X-Forwarded-For header (represented by the regular expression .+) then the logging is assigned the squid_request variable.

#
# Define listening IP addresses
#

Listen 127.0.0.1:80
NameVirtualHost 127.0.0.1:80

#
# Squid Logging format
#

LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined_squid

#
# Set up how you handle Squid cache headers
#

SetEnvIf X-Forwarded-For    "^$"         normal_request
SetEnvIf X-Forwarded-For    ".+"         squid_request
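
The two regular expressions used by the SetEnvIf lines can be tried out in Python, whose re module handles these simple patterns the same way. The sketch follows the text's reading that a missing X-Forwarded-For header is compared as an empty string; classify_request is a hypothetical helper for illustration.

```python
import re


def classify_request(x_forwarded_for):
    """Return the variables the two SetEnvIf lines would set."""
    value = x_forwarded_for or ""  # a missing header is treated as empty
    variables = set()
    if re.search(r"^$", value):
        variables.add("normal_request")  # no forwarding information
    if re.search(r".+", value):
        variables.add("squid_request")   # the request came through a proxy
    return variables


direct = classify_request(None)
via_squid = classify_request("198.51.100.7")
```

Exactly one of the two variables is set for any request, so each request is written to the log once, in one format or the other.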

2. The final steps are to first change your VirtualHost entries to refer to 127.0.0.1:80, then we tell Apache to use the combined_squid format if Squid passes the logging data to Apache with the X-Forwarded-For header as flagged when the squid_request variable is set.

<VirtualHost 127.0.0.1:80>
 
    CustomLog   logs/access_log combined_squid env=squid_request
    CustomLog   logs/access_log combined env=normal_request

</VirtualHost>

Note: Remember to restart Apache for the changes to take effect.

With this last step your server will be ready to operate your website. It should be noticeably faster and make the user experience more pleasant.

Squid Disk Usage

Squid uses the /var/spool/squid directory to store its cache files. High usage squid servers need a large amount of disk space in the /var partition to get optimum performance.

Every webpage and image accessed via the Squid server is logged in the /var/log/squid/access.log file. This can get quite large on high usage servers. Fortunately, the logrotate program automatically purges this file.

Troubleshooting Squid

Squid logs both informational and error messages to files in the /var/log/squid/ directory. It is best to review these files first whenever you have difficulties. The squid.out file can be especially useful as it contains Squid's system errors.

Another source of problems could be unintended statements in the squid.conf file that are syntactically valid and so cause no error messages; mistakes in the configured hours of access and permitted networks that you forgot to add are just two possibilities.

By default, Squid operates on port 3128, so if you are having connectivity problems, you'll need to follow the troubleshooting steps in Chapter 4, "Simple Network Troubleshooting", to help rectify them.

Note: Some of Squid's capabilities go beyond the scope of this book, but you should be aware of them. For example, for performance reasons, you can configure child Squid servers on which certain types of content are exclusively cached. Also, you can restrict the amount of disk space and bandwidth Squid uses.

Conclusion

Tools such as Squid are popular with many company managers. By caching images and files on a server shared by all, Internet bandwidth charges can be reduced.

Squid's password authentication feature is well liked because it allows only authorized users to access the Internet as a means of reducing usage fees and distractions in the office. Unfortunately, an Internet access password is usually not viewed as a major security concern by most users, who are often willing to share it with their colleagues. Although it is beyond the scope of this book, you should consider automatically tying the Squid password to the user's regular login password. This will make them think twice about giving their passwords away. Internet access is one thing; letting your friends have full access to your e-mail and computer files is quite another.