|
About This Book
I started the Squid project eight years ago while working
at the National Laboratory for Applied Network Research and
the University of California. Back then I certainly enjoyed
writing code and fixing bugs, but always felt bad about the
lack of decent documentation. This book is my attempt to
rectify that situation. It's been a long time coming, and
almost didn't happen. Like they say, "better late than
never!"
This book is written for those of us tasked with setting
up and maintaining one or more Squid caches. If you're new
to Squid, I'll show you how to download, compile, and install
the code. Those of you who have been using Squid for a while
will be more interested in the later chapters, where I'll
talk about disk cache performance, modifying requests, surrogate mode,
caching hierarchies, monitoring Squid, and more.
In order to use this book, you should have basic knowledge
of Unix systems. Many of the examples that I present are
based on free operating systems, such as Linux, FreeBSD,
NetBSD, and OpenBSD. I also have some tips for Solaris
users. If you're more comfortable with Windows systems,
you can use Squid under a Unix emulator, or give the
native NT port a try.
Here's an overview of the book's contents:
-
Chapter 1, Introduction
-
This chapter introduces you to Squid and web caching.
I give a brief history of the project, and a few notes
on our future work. I explain how you can find additional
support and information, including a FAQ, on the Squid
web site.
-
Chapter 2, Getting Squid
-
In this chapter, I explain how and why you should download
Squid's source code. You may prefer to install a pre-compiled
binary or use a pre-configured package. I also talk about
staying up-to-date with Squid by using the anonymous CVS
server.
-
Chapter 3, Compiling and Installing
-
Assuming you've downloaded the source code, this chapter
explains how to configure and compile Squid. In some cases
you may need to tune your system before compiling Squid.
For example, your kernel may have relatively low
file descriptor limits that affect Squid's performance.
-
Chapter 4, Configuration Guide for the Eager
-
Here, I give you a brief introduction to Squid's
configuration file. If you are the impatient type
and can't wait to start using Squid, this chapter will
leave you with a minimal configuration file that
you can start playing with.
-
Chapter 5, Running Squid
-
In this chapter, I explain how to run Squid for the
first time, and how to test Squid in a terminal window.
Following that, I suggest a number of ways to configure your
system so that Squid starts each time it boots. I also
explain how to reconfigure Squid while it is running, and
how to safely shut it down.
-
Chapter 6, All About Access Controls
-
I talk extensively about access controls in this
chapter. Squid has a powerful collection of access
control features and a number of different rule sets that
determine how requests and responses are treated. This is
an important chapter because a mistake in your access controls
may leave your cache, or even internal systems, vulnerable to
abuse from outsiders.
-
Chapter 7, Disk Cache Basics
-
This chapter is about Squid's primary function: storing cached
responses on disk. I explain how to configure the disk
cache, including replacement policies and freshness controls.
I also show you how to manually remove unwanted objects from
the cache.
-
Chapter 8, Advanced Disk Cache Topics
-
In this chapter, I explain how to improve the performance
of Squid's disk cache. I'll talk about Squid's
different storage schemes and a number of filesystem
tuning options that may help. If your Squid cache
handles a relatively light load, then you probably don't
need to worry about disk performance.
-
Chapter 9, Interception Caching
-
Here, I explain how to configure Squid for HTTP
interception, sometimes also called transparent caching.
Actually, configuring Squid is the easy part. The
difficulty comes from setting up a router or switch
on your network and the host where Squid is running.
I explain how to configure networking equipment from
Cisco, Alteon, Foundry, and Extreme. I show you how
to configure your operating system (Linux, FreeBSD,
NetBSD, OpenBSD, and Solaris) for HTTP interception.
I also talk about WCCP.
-
Chapter 10, Talking to Other Squids
-
In this chapter, I cover the ins and outs of cache
cooperation, including meshes, arrays, and hierarchies.
You may also find it useful if you simply need to
forward requests from Squid to another proxy or intermediary.
I'll talk about the various inter-cache protocols supported
by Squid (ICP, HTCP, Cache Digests, and CARP), and how
Squid chooses the next-hop location for a given
cache miss.
-
Chapter 11, Redirectors
-
Redirectors are the best way to make Squid rewrite HTTP requests
before forwarding them. I describe the interface between
Squid and a redirector program so that you can write your
own. I also present a few of the more popular third-party
redirectors available.
-
Chapter 12, Authentication Helpers
-
In this chapter, I explain how Squid interfaces with external
authentication databases such as LDAP, NT domain controllers,
and password files. Squid comes with a number of authentication
helpers and understands Basic, Digest, and NTLM authentication
credentials. I also document the API for each, in case you
want to develop your own helper.
-
Chapter 13, Logfiles
-
I cover Squid's various logfiles in this chapter, including
access.log, store.log, cache.log, and others. I explain
what each logfile contains, and how you should periodically
maintain them.
-
Chapter 14, Monitoring Squid
-
This chapter, the longest in the book, provides a lot
of information on monitoring Squid's operation. I cover both
SNMP and Squid's own cache manager interface. You'll
find it useful for both long-term monitoring and
short-term problem diagnosis.
-
Chapter 15, Server Accelerator Mode
-
Squid's server accelerator mode is useful in a number of
situations. You can use it to boost your origin server's
poor performance, as a "firewall" to protect the server,
or even to build your own content delivery network. I show you
how to set up Squid and make sure that outsiders cannot
abuse your service.
-
Chapter 16, Debugging and Troubleshooting
-
The book's final chapter explains how to debug and
troubleshoot problems with Squid. You may find that
some sites, or some user-agents, don't work properly
with Squid. I show you how to isolate and reproduce
the problem, and how to present the information to
Squid developers for assistance.
-
Appendix A, Config File Reference
-
This appendix is a reference guide for each of Squid's
200 configuration file directives. For each one I provide
a description, syntax, defaults, and examples.
-
Appendix B, The Memory Cache
-
This brief appendix explains a little about Squid's
memory cache.
-
Appendix C, Delay Pools
-
You can use Squid's delay pools feature to limit bandwidth
consumed by web surfers. I explain how the delay pools work
and provide a number of example configurations.
-
Appendix D, Filesystem Performance Benchmarks
-
In this appendix, I present the results of numerous filesystem
benchmarks. These may help you make informed decisions regarding
particular operating systems, filesystem features, and Squid's
storage techniques.
-
Appendix E, Squid on Windows
-
Have a look at this appendix if you'd like to run Squid
on your Windows box. I talk about using Cygwin and about
a native port of Squid, called SquidNT.
-
Appendix F, Configuring Squid Clients
-
This appendix contains information on how to configure various
user-agents to use Squid. I talk about manual configuration,
environment variables, Proxy Auto-Configuration functions, and
the Web Proxy Auto Discovery protocol.
As I'm finishing up this book, the latest stable version is
Squid-2.5.STABLE4, and the development version is Squid-3.0.
Perhaps the most important difference between the two is that
Squid-3 is being rewritten in C++. You should find that most
things are backwards compatible, although a few new configuration
directives have been created. Please read the release notes
carefully if you are using Squid-3.0 or later.
I have created a web site for the book, located at
http://squidbook.org/.
There, you will find errata, supplemental information,
and links to online resources.
Topics Not Covered
Due to a lack of time and space, there are some topics which
I was unable to cover in this book. For example:
Non-HTTP protocols
You'll find that I mostly talk about HTTP, even
though Squid also supports FTP, Gopher, and some
other relatively obscure protocols.
Customizing error messages
Squid's error messages can be customized and the
source distribution includes versions of the error
messages in a number of different languages. You
can probably figure out how to customize the error
messages by modifying the default pages or by reading
Squid's source code.
Load-balancing Squids
Load-balancing is a popular way to increase the
capacity of a caching service. Refer to one of the
load balancing books mentioned in the following
section if necessary.
What is cachable
HTTP has a number of somewhat complicated rules for
determining what may or may not be cached, and
for how long. Refer to Web Caching
or HTTP: The Definitive
Guide.
Copyright
A number of non-technical issues surround web
caching. These include copyrights and privacy.
Refer to Web Caching for
more information.
Modifying the source
I'll have very little to say about Squid's source
code in this book. The Squid project hosts a
programmer's guide, which is generally incomplete
and out-of-date. If you have questions about the
source code, please join the squid-dev
mailing list.
SOCKS
Squid does not support the SOCKS protocol at this
time.
Recommended Reading
While reading this book, you may want to consult some of
these other resources for more information.
-
Web Caching by Duane Wessels
(O'Reilly and Associates)
-
HTTP: The Definitive Guide by
David Gourley and Brian Totty
(O'Reilly and Associates)
-
DNS and BIND, 4th Edition by
Paul Albitz and Cricket Liu (O'Reilly and Associates)
-
Mastering Regular Expressions, 2nd Edition
by Jeffrey E. F. Friedl (O'Reilly and Associates)
-
Unix System Administration Handbook
and
Linux System Administration Handbook
by Evi Nemeth, Garth Snyder, Scott Seebass, and Trent R. Hein
(Prentice Hall)
-
Server Load Balancing
by Tony Bourke (O'Reilly and Associates)
-
Load Balancing Servers, Firewalls, and
Caches by Chandra Kopparapu (John Wiley
& Sons)
-
RFC 1413: Identification Protocol
-
RFC 1738: Uniform Resource Locators (URL)
-
RFC 2186: Internet Cache Protocol (ICP), version
2
-
RFC 2187: Application of Internet Cache Protocol
(ICP), version 2
-
RFC 2396: Uniform Resource Identifiers (URI): Generic
Syntax
-
RFC 2616: Hypertext Transfer Protocol -- HTTP/1.1
-
RFC 2617: HTTP Authentication: Basic and Digest
Access Authentication
-
RFC 2756: Hypertext Caching Protocol
-
RFC 3040: Internet Web Replication and Caching
Taxonomy
-
RFC 3143: Known HTTP Proxy/Caching Problems
-
Caching-related web sites, such as
http://www.caching.com/
and
http://www.web-cache.com/.
Acknowledgments
Looking back at the events and people that allowed me to
write this book makes me feel extremely humble and grateful.
I'm so happy to have been a part of the Harvest project
with Mike Schwartz, Peter Danzig, and the others. That led
directly to my work with kc claffy and Hans-Werner Braun
at NLANR/UCSD. The Squid project would never have existed
without their support, and the grant from the National
Science Foundation.
I'm also very thankful for all the hard work put in by the
small crew of Squid developers: Henrik Nordström,
Robert Collins, Adrian Chadd, and everyone else who has
contributed time and code to the project. And I'm sorry
that you ever had to read and/or fix any of the ugly code
that I wrote.
To all the reviewers who read the drafts—Joe Cooper,
Scott Pepple, Robert Collins, and Adrian Chadd—thanks
for finding my mistakes and suggesting ways to make the
book better. I also owe so much to the people at O'Reilly
for making the book possible, and for making it all come
together. Thanks to my editors Tatiana Diaz and Nat Torkington, the
production editor Mary Anne Mayo, the graphic designer Melanie Wang,
the illustrator Rob Romano, the XML mungers Andrew Savikas and
Joe Wizda, and
the countless other folks working behind the scenes for
me.
To my good friend, and business partner, Alex Rousskov:
thanks for giving me the time and freedom to see this little
project through. Finally, to the members of my new family,
Annie and Blooey, thanks for putting up with the late nights.
Can I make it up to you with extra back scratches?
|