# Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Sharing caches among Web proxies is an important technique to reduce Web traffic and alleviate network bottlenecks. Nevertheless, it is not widely deployed due to the overhead of existing protocols. In this paper we demonstrate the benefits of cache sharing, measure the overhead of the existing protocols, and propose a new protocol called "summary cache." In this new protocol, each proxy keeps a summary of the cache directory of every other proxy, and checks these summaries for potential hits before sending any queries. Two factors contribute to our protocol's low overhead: the summaries are updated only periodically, and the directory representations are very economical, as low as 8 bits per entry. Using trace-driven simulations and a prototype implementation, we show that, compared to existing protocols such as the Internet Cache Protocol (ICP), summary cache reduces the number of inter-cache protocol messages by a factor of 25 to 60, reduces the bandwidth consumption by over 50%, and eliminates 75% to 95% of the protocol CPU overhead, all while maintaining almost the same cache hit ratio as ICP. Hence summary cache scales to a large number of proxies.

## Introduction

The idea of sharing Web caches was first proposed in the Harvest project [26, 12]. The Harvest group designed the Internet Cache Protocol (ICP) [18], which supports discovery and retrieval of documents from neighboring caches. Today, many institutions and many countries have established hierarchies of proxy caches that cooperate via ICP to reduce Internet traffic [25, 30, 41, 5, 14].

## Traces and Simulations

| Traces | DEC | UCB | UPisa | Questnet | NLANR |
|---|---|---|---|---|---|
| Time | 8/29-9/4, 1996 | 11/14-11/19, 1996 | Jan-March, 1997 | 1/15-1/21, 1998 | 12/22, 1997 |
| Requests | 3543968 | 1907762 | 2833624 | 2885285 | 1766409 |
| Infinite Cache Size (bytes) | 2.88e+10 | 1.80e+10 | 2.07e+10 | 2.33e+10 | 1.37e+10 |
| Maximum Hit Ratio | 0.49 | 0.30 | 0.40 | 0.30 | 0.36 |
| Maximum Byte Hit Ratio | 0.36 | 0.14 | 0.27 | 0.15 | 0.27 |
| Client Population | 10089 | 5780 | 2203 | 12 | 4 |
| Client Groups | 16 | 8 | 8 | 12 | 4 |

• DEC trace [32]: a trace of the Digital Equipment Corporation Web proxy server, which serves about 17,000 workstations. The trace lasts 25 days (August 29 to September 21, 1996). We divided it into three one-week sub-traces and one half-week sub-trace; due to swap space limits, our simulator can only simulate the sub-traces. In this paper we present results for the week of August 29 to September 4, 1996. The results for the other sub-traces are very similar.

• UCB trace [24]: a trace of HTTP requests collected from the Home IP service that UC Berkeley offers to its students, faculty and staff. The full trace covers 18 days from November 1 to November 19, 1996, and is divided into four sub-traces of four or five days each. We show results for the November 14 to November 19 sub-trace. Although that trace originally recorded 2,468,890 requests, many of them have a response data size of 0 or 1, so we decided to ignore those requests. Again, we ran simulations on the other traces in the UCB set, with results similar to those presented here.

• UPisa trace [43]: a trace of HTTP requests made by users in the Computer Science department of the University of Pisa, Italy, over the three months from January to March 1997. From the trace we simulate only the GET requests whose URLs contain no query strings, since most proxies do not cache query requests.

• Questnet trace [47]: a seven-day log (January 15 to January 21, 1998) of the HTTP requests seen by the parent proxies in Questnet, a regional network in Australia. The parent proxies serve roughly twelve child proxies in the regional network. We extract the successful GET requests seen by the parent proxies. The trace is thus only a subset of the user requests going to the child proxies; unfortunately, the full set of user requests to those proxies is not available.

• NLANR trace [40]: a one-day log (December 22, 1997) of the HTTP requests sent to the four major parent proxy caches in the national cache hierarchy operated by NLANR (National Laboratory for Applied Network Research). There are about eight proxies in the national cache hierarchy, but only four of them ("bo", "pb", "sd" and "uc") handle the requests from the .com, .net, .edu and other major domains, so we decided to simulate only the requests to those four proxies.

## Benefits of Cache Sharing

• No cache sharing: proxies serve their cache misses without cooperating with one another;
• Simple cache sharing: proxies serve each other's cache misses. After fetching a document from another proxy, a proxy caches the document locally. The proxies perform cache replacement without coordination. This is the form of sharing implemented by the ICP protocol;
• Single-copy cache sharing: proxies serve each other's cache misses, but a proxy does not cache a document fetched from another proxy. Instead, the other proxy marks the document as recently accessed, increasing its caching priority. Compared with simple cache sharing, this scheme eliminates the storage of duplicate copies and improves the utilization of the available cache space;
• Global cache: proxies share cache contents and coordinate replacement so that, to the users, they appear as one unified cache with global LRU replacement. This is the fully coordinated form of cooperative caching. We simulate this scheme by assuming that all requests go to a single cache whose size is the sum of all the proxies' cache sizes.
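
The four schemes differ only in where a fetched document ends up cached and how replacement is coordinated. The following is a minimal sketch in Python, not the paper's trace-driven simulator: the toy trace, cache sizes, and the per-client proxy assignment are made up for illustration, and simple sharing skips updating the remote proxy's LRU state for brevity.

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-capacity LRU cache of URLs."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def access(self, url):
        """Return True on a hit; on a miss, cache url, evicting the LRU entry."""
        if url in self.store:
            self.store.move_to_end(url)
            return True
        if len(self.store) >= self.capacity:
            self.store.popitem(last=False)
        self.store[url] = True
        return False

def no_sharing(requests, n_proxies, cap):
    """Each client group uses only its own proxy."""
    caches = [LRUCache(cap) for _ in range(n_proxies)]
    return sum(caches[c % n_proxies].access(url) for c, url in requests)

def simple_sharing(requests, n_proxies, cap):
    """ICP-style sharing: serve siblings' misses, cache a duplicate locally,
    and replace without coordination."""
    caches = [LRUCache(cap) for _ in range(n_proxies)]
    hits = 0
    for c, url in requests:
        local = caches[c % n_proxies]
        if url in local.store:
            hits += 1
            local.access(url)
        elif any(url in p.store for p in caches):
            hits += 1          # remote cache hit at a sibling
            local.access(url)  # a duplicate copy is cached locally
        else:
            local.access(url)  # miss everywhere: fetch from origin server
    return hits

def global_cache(requests, n_proxies, cap):
    """Fully coordinated: one unified LRU cache of the combined size."""
    unified = LRUCache(cap * n_proxies)
    return sum(unified.access(url) for _, url in requests)

# Toy trace of (client id, URL): clients 0 and 1 request the same documents.
trace = [(0, "a"), (0, "b"), (1, "a"), (1, "b"), (0, "a"), (1, "b")]
```

On this toy trace, sharing turns client 1's compulsory misses at its own proxy into remote hits, which is the effect the simulations quantify.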

## Overhead of ICP

Table 2: Performance of ICP and Summary Cache (SC-ICP) in Experiments 1 and 2.

| Exp 1 | Hit Ratio | Client Latency | User CPU | System CPU | UDP Msgs | TCP Msgs | Total Packets |
|---|---|---|---|---|---|---|---|
| no ICP | 25% | 2.75 (5%) | 94.42 (5%) | 133.65 (6%) | 615 (28%) | 334K (8%) | 355K (7%) |
| ICP | 25% | 3.07 (0.7%) | 116.87 (5%) | 146.50 (5%) | 54774 (0%) | 328K (4%) | 402K (3%) |
| Overhead | | 12% | 24% | 10% | 9000% | 2% | 13% |
| SC-ICP | 25% | 2.85 (1%) | 95.07 (6%) | 134.61 (6%) | 1079 (0%) | 330K (5%) | 351K (5%) |
| Overhead | | 4% | 0.7% | 0.7% | 75% | -1% | -1% |

| Exp 2 | Hit Ratio | Client Latency | User CPU | System CPU | UDP Msgs | TCP Msgs | Total Packets |
|---|---|---|---|---|---|---|---|
| no ICP | 45% | 2.21 (1%) | 80.83 (2%) | 111.10 (2%) | 540 (3%) | 272K (3%) | 290K (3%) |
| ICP | 45% | 2.39 (1%) | 97.36 (1%) | 118.59 (1%) | 39968 (0%) | 257K (2%) | 314K (1%) |
| Overhead | | 8% | 20% | 7% | 7300% | -1% | 8% |
| SC-ICP | 45% | 2.25 (1%) | 82.03 (3%) | 111.87 (3%) | 799 (5%) | 269K (5%) | 287K (5%) |
| Overhead | | 2% | 1% | 1% | 48% | -1% | -1% |

The Internet Cache Protocol (ICP) [18] has been very successful at encouraging the practice of Web cache sharing around the world. It requires only loose coordination among proxies, and is built on UDP for efficiency. It was designed by the Harvest research group [26], and is supported by the public-domain Squid [19] proxy software as well as several commercial products today. With the worldwide deployment of Squid proxies, ICP is widely used in many countries to reduce traffic on the trans-Atlantic and trans-Pacific links.

## Summary Cache

• False misses: the requested document is cached at some other proxy, but its summary does not reflect that fact. In this case a remote cache hit is lost, and the total hit ratio of the collection of caches is reduced.

• False hits: the requested document is not cached at some other proxy, but its summary indicates that it is. The proxy sends a query message to the other proxy, only to be notified that the document is not cached there. In this case a query message is wasted.
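
Both error modes come purely from the delay between summary updates. This small sketch (ours, not the paper's code) uses a plain snapshot set as the "summary", so staleness is the only source of inaccuracy:

```python
class Proxy:
    """Sketch: the 'summary' is a plain snapshot set, so the only source of
    inaccuracy is the delay between summary updates."""
    def __init__(self):
        self.cache = set()
        self.neighbor_summary = set()

    def publish_summary_to(self, other):
        other.neighbor_summary = set(self.cache)  # periodic summary update

a, b = Proxy(), Proxy()
b.cache.add("http://x/1")
b.publish_summary_to(a)        # a's view of b is exact at this instant

b.cache.add("http://x/2")      # cached after the update: summary is stale
b.cache.discard("http://x/1")  # evicted after the update: summary is stale

# False miss: /2 IS at b, but a's summary does not say so -> lost remote hit.
false_miss = "http://x/2" in b.cache and "http://x/2" not in a.neighbor_summary

# False hit: /1 is NOT at b, but a's summary says it is -> wasted query.
false_hit = "http://x/1" not in b.cache and "http://x/1" in a.neighbor_summary

assert false_miss and false_hit
```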

### Bloom Filters: the Math

Predicted false positive ratio for various load factors m/n and numbers of hash functions k. The second column (k_opt) gives the optimal number of hash functions, (m/n) ln 2.

m/n k_opt k=1 k=2 k=3 k=4 k=5 k=6 k=7 k=8
2 1.39 0.393 0.400
3 2.08 0.283 0.237 0.253
4 2.77 0.221 0.155 0.147 0.160
5 3.46 0.181 0.109 0.092 0.092 0.101
6 4.16 0.154 0.0804 0.0609 0.0561 0.0578 0.0638
7 4.85 0.133 0.0618 0.0423 0.0359 0.0347 0.0364
8 5.55 0.118 0.0489 0.0306 0.024 0.0217 0.0216 0.0229
9 6.24 0.105 0.0397 0.0228 0.0166 0.0141 0.0133 0.0135 0.0145
10 6.93 0.0952 0.0329 0.0174 0.0118 0.00943 0.00844 0.00819 0.00846
11 7.62 0.0869 0.0276 0.0136 0.00864 0.0065 0.00552 0.00513 0.00509
12 8.32 0.08 0.0236 0.0108 0.00646 0.00459 0.00371 0.00329 0.00314
13 9.01 0.074 0.0203 0.00875 0.00492 0.00332 0.00255 0.00217 0.00199
14 9.7 0.0689 0.0177 0.00718 0.00381 0.00244 0.00179 0.00146 0.00129
15 10.4 0.0645 0.0156 0.00596 0.003 0.00183 0.00128 0.001 0.000852
16 11.1 0.0606 0.0138 0.005 0.00239 0.00139 0.000935 0.000702 0.000574
17 11.8 0.0571 0.0123 0.00423 0.00193 0.00107 0.000692 0.000499 0.000394
18 12.5 0.054 0.0111 0.00362 0.00158 0.000839 0.000519 0.00036 0.000275
19 13.2 0.0513 0.00998 0.00312 0.0013 0.000663 0.000394 0.000264 0.000194
20 13.9 0.0488 0.00906 0.0027 0.00108 0.00053 0.000303 0.000196 0.00014
21 14.6 0.0465 0.00825 0.00236 0.000905 0.000427 0.000236 0.000147 0.000101
22 15.2 0.0444 0.00755 0.00207 0.000764 0.000347 0.000185 0.000112 7.46e-05
23 15.9 0.0425 0.00694 0.00183 0.000649 0.000285 0.000147 8.56e-05 5.55e-05
24 16.6 0.0408 0.00639 0.00162 0.000555 0.000235 0.000117 6.63e-05 4.17e-05
25 17.3 0.0392 0.00591 0.00145 0.000478 0.000196 9.44e-05 5.18e-05 3.16e-05
26 18 0.0377 0.00548 0.00129 0.000413 0.000164 7.66e-05 4.08e-05 2.42e-05
27 18.7 0.0364 0.0051 0.00116 0.000359 0.000138 6.26e-05 3.24e-05 1.87e-05
28 19.4 0.0351 0.00475 0.00105 0.000314 0.000117 5.15e-05 2.59e-05 1.46e-05
29 20.1 0.0339 0.00444 0.000949 0.000276 9.96e-05 4.26e-05 2.09e-05 1.14e-05
30 20.8 0.0328 0.00416 0.000862 0.000243 8.53e-05 3.55e-05 1.69e-05 9.01e-06
31 21.5 0.0317 0.0039 0.000785 0.000215 7.33e-05 2.97e-05 1.38e-05 7.16e-06
32 22.2 0.0308 0.00367 0.000717 0.000191 6.33e-05 2.5e-05 1.13e-05 5.73e-06

m/n k_opt k=9 k=10 k=11 k=12 k=13 k=14 k=15 k=16
11 7.62 0.00531
12 8.32 0.00317 0.00334
13 9.01 0.00194 0.00198 0.0021
14 9.7 0.00121 0.0012 0.00124
15 10.4 0.000775 0.000744 0.000747 0.000778
16 11.1 0.000505 0.00047 0.000459 0.000466 0.000488
17 11.8 0.000335 0.000302 0.000287 0.000284 0.000291
18 12.5 0.000226 0.000198 0.000183 0.000176 0.000176 0.000182
19 13.2 0.000155 0.000132 0.000118 0.000111 0.000109 0.00011 0.000114
20 13.9 0.000108 8.89e-05 7.77e-05 7.12e-05 6.79e-05 6.71e-05 6.84e-05
21 14.6 7.59e-05 6.09e-05 5.18e-05 4.63e-05 4.31e-05 4.17e-05 4.16e-05 4.27e-05
22 15.2 5.42e-05 4.23e-05 3.5e-05 3.05e-05 2.78e-05 2.63e-05 2.57e-05 2.59e-05
23 15.9 3.92e-05 2.97e-05 2.4e-05 2.04e-05 1.81e-05 1.68e-05 1.61e-05 1.59e-05
24 16.6 2.86e-05 2.11e-05 1.66e-05 1.38e-05 1.2e-05 1.08e-05 1.02e-05 9.87e-06
25 17.3 2.11e-05 1.52e-05 1.16e-05 9.42e-06 8.01e-06 7.1e-06 6.54e-06 6.22e-06
26 18 1.57e-05 1.1e-05 8.23e-06 6.52e-06 5.42e-06 4.7e-06 4.24e-06 3.96e-06
27 18.7 1.18e-05 8.07e-06 5.89e-06 4.56e-06 3.7e-06 3.15e-06 2.79e-06 2.55e-06
28 19.4 8.96e-06 5.97e-06 4.25e-06 3.22e-06 2.56e-06 2.13e-06 1.85e-06 1.66e-06
29 20.1 6.85e-06 4.45e-06 3.1e-06 2.29e-06 1.79e-06 1.46e-06 1.24e-06 1.09e-06
30 20.8 5.28e-06 3.35e-06 2.28e-06 1.65e-06 1.26e-06 1.01e-06 8.39e-07 7.26e-07
31 21.5 4.1e-06 2.54e-06 1.69e-06 1.2e-06 8.93e-07 7e-07 5.73e-07 4.87e-07
32 22.2 3.2e-06 1.94e-06 1.26e-06 8.74e-07 6.4e-07 4.92e-07 3.95e-07 3.3e-07

m/n k_opt k=17 k=18 k=19 k=20 k=21 k=22 k=23 k=24
22 15.2 2.67e-05
23 15.9 1.61e-05
24 16.6 9.84e-06 1e-05
25 17.3 6.08e-06 6.11e-06 6.27e-06
26 18 3.81e-06 3.76e-06 3.8e-06 3.92e-06
27 18.7 2.41e-06 2.34e-06 2.33e-06 2.37e-06
28 19.4 1.54e-06 1.47e-06 1.44e-06 1.44e-06 1.48e-06
29 20.1 9.96e-07 9.35e-07 9.01e-07 8.89e-07 8.96e-07 9.21e-07
30 20.8 6.5e-07 6e-07 5.69e-07 5.54e-07 5.5e-07 5.58e-07
31 21.5 4.29e-07 3.89e-07 3.63e-07 3.48e-07 3.41e-07 3.41e-07 3.48e-07
32 22.2 2.85e-07 2.55e-07 2.34e-07 2.21e-07 2.13e-07 2.1e-07 2.12e-07 2.17e-07
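
The tables above tabulate the standard Bloom filter false-positive estimate $(1 - e^{-kn/m})^k$, and the second numeric column is the $k$ that minimizes it, $(m/n)\ln 2$. A few entries can be spot-checked directly (function names are ours):

```python
import math

def false_positive_ratio(m_over_n, k):
    """Predicted Bloom filter false positive ratio: (1 - e^{-kn/m})^k."""
    return (1.0 - math.exp(-k / m_over_n)) ** k

def optimal_k(m_over_n):
    """The number of hash functions minimizing the ratio: (m/n) * ln 2."""
    return m_over_n * math.log(2)

# Spot-checks against the table entries above.
assert abs(optimal_k(8) - 5.55) < 0.01
assert abs(false_positive_ratio(8, 4) - 0.024) < 5e-4
assert abs(false_positive_ratio(16, 8) - 0.000574) < 5e-6
assert abs(false_positive_ratio(16, 10) - 0.00047) < 5e-6
```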

The expected maximum counter value is bounded by a classical balls-and-bins result (here $\Gamma$ denotes the Gamma function):

$\Gamma^{-1}(m)\left(1 + \displaystyle \frac{\ln(kn/m)}{\ln \Gamma^{-1}(m)} + O\left(\frac{1}{\ln^2 \Gamma^{-1}(m)}\right)\right)$

Taking a union bound over the $m$ counters,

$\Pr(\max(c) \geq i) \leq m \displaystyle \binom{nk}{i} \frac{1}{m^i} \leq m \left(\frac{enk}{im}\right)^i$

Since the optimal number of hash functions satisfies $k \leq (m/n)\ln 2$, we have $nk/m \leq \ln 2$, so

$\Pr(\max(c) \geq i) \leq m \left(\displaystyle \frac{e \ln 2}{i}\right)^i$

In particular, for $i = 16$ (the overflow point of a 4-bit counter),

$\Pr(\max(c) \geq 16) \leq 1.37 \times 10^{-15} \times m$
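
The constant in the last bound above can be checked numerically:

```python
import math

# Pr(max(c) >= 16) <= m * (e * ln 2 / 16)^16; check the quoted constant.
per_slot = (math.e * math.log(2) / 16) ** 16
assert per_slot < 1.37e-15
assert per_slot > 1.3e-15   # the quoted constant is essentially tight
```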

### Bloom Filters as Summaries

Bloom filters provide a straightforward mechanism to build summaries. A proxy builds a Bloom filter from the list of URLs of cached documents, and sends the bit array plus the specification of the hash functions to other proxies. When updating the summary, the proxy can either specify which bits in the bit array are flipped, or send the whole array, whichever is smaller.


Each proxy maintains a local copy of the Bloom filter and updates it as documents are added to and evicted from the cache. As explained, to update the local filter a proxy maintains an array of counters, where each counter records the number of times the corresponding bit has been set to 1. When a document is added to the cache, the counters for the corresponding bits are incremented; when it is deleted from the cache, the counters are decremented. When a counter increases from 0 to 1 or drops from 1 to 0, the corresponding bit is set to 1 or 0, and a record is added to a list that remembers the updates.
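
A sketch of this counter-maintenance scheme (ours, not the Squid code; the hash is abbreviated to Python's built-in `hash`, whereas the paper's construction uses MD5 slices, and the sizes are illustrative):

```python
class CountingBloomFilter:
    """Sketch of the counter-backed local filter described above."""
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.counters = [0] * m            # one counter per filter bit
        self.bits = [0] * m                # the Bloom filter itself
        self.pending_updates = []          # (bit index, new value) records

    def _indexes(self, url):
        # Placeholder hash for brevity; the paper derives k indexes from
        # disjoint 32-bit slices of the URL's MD5 signature.
        return [hash((url, i)) % self.m for i in range(self.k)]

    def add(self, url):
        for i in self._indexes(url):
            self.counters[i] += 1
            if self.counters[i] == 1:      # 0 -> 1: bit turns on, record it
                self.bits[i] = 1
                self.pending_updates.append((i, 1))

    def delete(self, url):
        for i in self._indexes(url):
            self.counters[i] -= 1
            if self.counters[i] == 0:      # 1 -> 0: bit turns off, record it
                self.bits[i] = 0
                self.pending_updates.append((i, 0))

    def maybe_contains(self, url):
        """True means 'possibly cached' (false positives are possible)."""
        return all(self.bits[i] for i in self._indexes(url))
```

The `pending_updates` list is what a proxy would drain into its next summary update message.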


The advantage of Bloom filters is that they provide a tradeoff between the memory requirement and the false positive ratio (which induces false hits). Thus, if proxies want to devote less memory to the summaries, they can do so at a slight increase of inter-proxy traffic.


We experimented with three configurations for Bloom filter based summaries: the number of bits being 8, 16, and 32 times the average number of documents in the cache (this ratio is also called the "load factor"). The average number of documents is calculated by dividing the cache size by 8K (the average document size). All three configurations use four hash functions. This number of hash functions is not the optimal choice for each configuration, but it suffices to demonstrate the performance of Bloom filters. The hash functions are built by first calculating the MD5 signature [38] of the URL, which yields 128 bits, then dividing the 128 bits into four 32-bit words, and finally taking each 32-bit word modulo the table size m. MD5 is a cryptographic message digest algorithm that hashes arbitrary-length strings to 128 bits [38]. We selected it because of its well-known properties and relatively fast implementation.
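
The MD5 slicing just described can be written compactly (the function name is ours):

```python
import hashlib

def bloom_indexes(url, m, k=4):
    """Derive k <= 4 hash values from one MD5 signature: split the 128-bit
    digest into 32-bit words and reduce each modulo the table size m."""
    assert k <= 4, "a single MD5 signature yields four 32-bit words"
    digest = hashlib.md5(url.encode()).digest()   # 16 bytes = 128 bits
    return [int.from_bytes(digest[4*i:4*i+4], "big") % m for i in range(k)]
```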

The performance of these summary representations, the exact-directory approach, and the server-name approach is shown in Figures 5 through 9. Figure 5 shows the total cache hit ratios and Figure 6 shows the false hit ratios; note that the y-axis in Figure 6 is in log scale. The Bloom filter based summaries have virtually the same cache hit ratio as the exact-directory approach, and a slightly higher false hit ratio when the bit array is small. Server-name has a much higher false hit ratio. It also has a higher cache hit ratio, probably because its many false hits help to avoid false misses.

Figure 7 shows the total number of inter-proxy network messages, including summary updates and query messages (the latter comprising remote cache hits, false hits and remote stale hits). Note that the y-axis in Figure 7 is in log scale. For comparison we also list the number of messages incurred by ICP in each trace. All messages are assumed to be unicast. The figure normalizes the number of messages by the number of HTTP requests in each trace. Both exact-directory and Bloom filter based summaries perform well, while server-name and ICP generate many more messages. For Bloom filters there is a tradeoff between bit array size and the number of messages, as expected. However, once the false hit ratio is small enough, false hits are no longer a dominant contributor to inter-proxy messages; remote cache hits and remote stale hits dominate instead. Thus, the difference in network messages between load factor 16 and load factor 32 is small. Compared to ICP, Bloom filter based summaries reduce the number of messages by a factor of 25 to 60.

Figure 8 shows the estimated total size of inter-proxy network messages in bytes. We estimate the size because update messages tend to be larger than query messages. The average query message in both ICP and the other approaches is assumed to be 20 bytes of header plus 50 bytes of average URL. A summary update in exact-directory and server-name is assumed to be 20 bytes of header plus 16 bytes per change. A summary update in Bloom filter based summaries is estimated at 32 bytes of header (see Section 6) plus 4 bytes per bit flip. The results show that, in terms of message bytes, Bloom filter based summaries improve over ICP by 55% to 64%. In other words, summary cache uses occasional bursts of large messages to avoid a continuous stream of small messages. Looking at the CPU overhead and network interface packets in Tables 2, 6 and 7 (in which SC-ICP stands for the summary cache approach), we can see that this is a good tradeoff.
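
The per-message size assumptions behind this estimate, restated as code (these are the stated modeling assumptions, not measurements; the function names are ours):

```python
# Per-message byte estimates used for Figure 8.
QUERY_BYTES = 20 + 50              # query: 20-byte header + 50-byte avg URL

def directory_update_bytes(changes):
    return 20 + 16 * changes       # exact-directory / server-name update

def bloom_update_bytes(bit_flips):
    return 32 + 4 * bit_flips      # Bloom summary update (Section 6 header)

# One batched update amortizes well: 1000 bit flips cost 4032 bytes,
# while querying on every request costs 70 bytes per request.
assert QUERY_BYTES == 70
assert bloom_update_bytes(1000) == 4032
assert directory_update_bytes(1000) == 16020
```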

Finally, Figure 9 shows the memory required per proxy by the summary cache approaches, as a percentage of cache size. The three Bloom filter configurations consume much less memory than exact-directory, yet perform similarly in all other respects. The Bloom filter summary at a load factor of 8 has a memory requirement similar to or smaller than that of the server-name approach, with far fewer false hits and network messages. Since the DEC, UCB, UPisa, NLANR and Questnet traces have 16, 8, 8, 4, and 12 proxies respectively, the figure shows that the memory requirement grows linearly with the number of proxies.

Considering all the results, we see that Bloom filter summaries provide the best performance in terms of low network overhead and low memory requirements. The approach is simple and easy to implement. In addition to MD5, other faster hashing methods are available: hash functions can be based on polynomial arithmetic as in Rabin's fingerprinting method (see [42, 7]), or a simple hash function (e.g. [22, p. 48]) can be used to generate, say, 32 bits, with further bits obtained by taking random linear transformations of these 32 bits viewed as an integer. A disadvantage is that these faster functions are efficiently invertible (that is, one can easily construct a URL that hashes to a particular location), a fact that might be exploited by malicious users for nefarious purposes.

### Recommended Configuration

Combining the above results, we recommend the following configuration for the summary cache approach. The update threshold should be between 1% and 10% to avoid significant reduction of the total cache hit ratio. If a time-based update approach is chosen, the time interval should be chosen such that the percentage of new documents is between 1% and 10%. The proxy can either broadcast the changes (or the entire bit array if it is smaller), or let other proxies fetch the updates from it. The summary should be in the form of a Bloom filter. A load factor between 8 and 16 works well, though proxies can lower or raise it depending on their memory and network traffic concerns. Based on the load factor, four or more hash functions should be used. The data provided here and in [16] can be used as references in making these decisions. For hash functions, we recommend taking disjoint groups of bits from the 128-bit MD5 signature of the URL. If more bits are needed, one can calculate the MD5 signature of the URL concatenated with itself. In practice, the computational overhead of MD5 is negligible compared with the user and system CPU overhead incurred by caching documents (see Section 7).

### Scalability

Although our simulations cover only 4 to 16 proxies, we can easily extrapolate the results. For example, assume that 100 proxies, each with 8 GB of cache, would like to cooperate. Each proxy stores on average about 1M Web pages. The Bloom filter memory needed to represent 1M pages is 2 MB at load factor 16. Each proxy needs about 200 MB to store all the summaries, plus another 1 MB for its own counters. The inter-proxy messages consist of update messages, false hits, remote cache hits and remote stale hits. A threshold of 1% corresponds to 10K requests between updates; each update consists of 99 messages, so the number of update messages per request is less than 0.01. The false hit ratio is around 4.7% for a load factor of 16 with 10 hash functions (the probability of a false positive is less than 0.00047 for each summary, but there are 100 of them). Thus, not counting the messages introduced by remote cache hits and remote stale hits (which are relatively stable across the number of proxies), the overhead introduced by the protocol is under 0.06 messages per request for 100 proxies. Of these messages, only the update messages are large, on the order of several hundred KB. Fortunately, update messages can be transferred via an unreliable multicast scheme. Our simulations predict that, while keeping the overhead low, this scheme reduces the total hit ratio by less than 2% compared to the theoretical hit ratio of ICP.
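
The back-of-the-envelope numbers in this paragraph can be reproduced directly:

```python
# Reproducing the 100-proxy extrapolation from the text.
proxies = 100
cache_bytes = 8 * 2**30                 # 8 GB of cache per proxy
avg_doc_bytes = 8 * 2**10               # 8 KB average document
docs = cache_bytes // avg_doc_bytes     # ~1M documents per proxy
assert docs == 2**20

load_factor = 16                        # filter bits per document
summary_bytes = docs * load_factor // 8
assert summary_bytes == 2 * 2**20       # 2 MB per summary; ~200 MB for 99 peers

# 1% update threshold: ~10K requests between updates, 99 messages per update.
updates_per_request = (proxies - 1) / (0.01 * docs)
assert updates_per_request < 0.01

# Load factor 16 with 10 hash functions: < 0.00047 false positives per summary,
# and 99 peer summaries are consulted per request.
false_hits_per_request = 0.00047 * (proxies - 1)
assert updates_per_request + false_hits_per_request < 0.06
```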


Though none of the traces is large enough to enable meaningful simulation of 100 proxies, we have performed simulations with larger numbers of proxies, and the results verify these "back-of-the-envelope" calculations. Thus, we are confident that summary cache scales well.


## Implementation of Summary-Cache Enhanced ICP

Based on the simulation results, we propose the following Summary-Cache Enhanced Internet Cache Protocol as an optimization of ICP. The protocol has been implemented in a prototype built on top of Squid 1.1.14, and the prototype is publicly available [15]. A variant of our approach called Cache Digest is also implemented in Squid 1.2b20 [44].

### The Protocol

The design of our protocol is geared toward small delay thresholds. Thus, it assumes that summaries are updated by sending the differences. If the delay threshold is large, it is more economical to send the entire bit array; this approach is adopted in the Cache Digest prototype in Squid 1.2b20 [44].

We added a new opcode to ICP version 2 [48], ICP_OP_DIRUPDATE (= 20), which stands for directory update messages. In an update message, an additional header follows the regular ICP header and consists of: 16 bits of Function_Num, 16 bits of Function_Bits, 32 bits of BitArray_Size_InBits, and 32 bits of Number_of_Updates. The header completely specifies the hash functions used to probe the filter. There are Function_Num hash functions. The functions are calculated by first taking bits 0 to M-1, M to 2M-1, 2M to 3M-1, etc. out of the MD5 signature [38, 22] of the URL, where M is Function_Bits, and then reducing each group of bits modulo BitArray_Size_InBits. If 128 bits are not enough, more bits are generated by computing the MD5 signature of the URL concatenated with itself.


The header is followed by a list of 32-bit integers. The most significant bit of each integer specifies whether the bit should be set to 0 or 1, and the remaining bits specify the index of the bit to be changed. This design stems from the concern that if the message specified only which bits to flip, the loss of a previous update message would have cascading effects. Carrying absolute values enables the messages to be sent via an unreliable multicast protocol. Furthermore, every update message carries the header specifying the hash functions, so that receivers can verify the information. The design limits the bit array size to less than 2 billion bits, which for the time being is large enough.
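
A sketch of this wire format (the field packing follows the text; the function names are ours, and network byte order is assumed, as is conventional for ICP):

```python
import struct

def encode_header(function_num, function_bits, bitarray_size_bits, n_updates):
    """16-bit Function_Num, 16-bit Function_Bits,
    32-bit BitArray_Size_InBits, 32-bit Number_of_Updates."""
    return struct.pack("!HHII", function_num, function_bits,
                       bitarray_size_bits, n_updates)

def encode_update(index, bit_value):
    """One 32-bit record: MSB carries the new bit value, the low 31 bits
    carry the bit index. Absolute values (rather than flips) keep a lost
    message from cascading into later ones."""
    assert 0 <= index < 2**31
    return struct.pack("!I", (bit_value << 31) | index)

def decode_update(record):
    (word,) = struct.unpack("!I", record)
    return word & 0x7FFFFFFF, word >> 31   # (bit index, new bit value)
```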


### Prototype Implementation

We modified the Squid 1.1.14 software to implement the above protocol. An additional bit array is added to the data structure for each neighbor. The structure is initialized when the first summary update message is received from the neighbor. The proxy also allocates an array of byte counters for maintaining the local copy of the Bloom filter, and an integer array to remember the filter changes.


The current prototype sends the update messages via UDP, since ICP is built on top of UDP. A variant of the design would be to send the messages via TCP or multicast. Due to the size of these messages, it is perhaps better to send them via TCP or multicast. Furthermore, since the collection of cooperating proxies is relatively static, the proxies can just maintain a permanent TCP connection with each other to exchange update messages. Unfortunately, the implementation of ICP in Squid is on top of UDP only. Thus, the prototype deviates from the recommendation in Section 5.5 and sends updates whenever there are enough changes to fill an IP packet. The implementation further leverages Squid’s built-in support to detect failure and recovery of neighbor proxies, and reinitializes a failed neighbor’s bit array when it recovers.

## Experiments

Table 6: Performance of ICP and Summary-Cache for the UPisa trace in Experiment 2. Numbers in parentheses show the variance of the measurement among three runs.

| Exp | Hit Ratio | Client Latency | User CPU | System CPU | UDP Traffic | TCP Traffic | Total Packets |
|---|---|---|---|---|---|---|---|
| no ICP | 16.94 | 6.22 (0.4%) | 81.72 (0.1%) | 115.63 (0.1%) | 4718 (1%) | 242K (0.1%) | 259K (0.1%) |
| ICP | 19.3 | 6.31 (0.5%) | 116.81 (0.1%) | 137.12 (0.1%) | 72761 (0%) | 245K (0.1%) | 325K (0.2%) |
| Overhead | | 1.42% | 43% | 19% | 1400% | 1% | 25% |
| SC-ICP | 19.0 | 6.07 (0.1%) | 91.53 (0.4%) | 121.75 (0.5%) | 5765 (2%) | 244K (0.1%) | 262K (0.1%) |
| Overhead | | -2.4% | 12% | 5% | 22% | 1% | 1% |

Table 7: Performance of ICP and Summary-Cache for the UPisa trace in Experiment 3.

| Exp | Hit Ratio | Client Latency | User CPU | System CPU | UDP Traffic | TCP Traffic | Total Packets |
|---|---|---|---|---|---|---|---|
| no ICP | 9.94 | 7.11 | 81.75 | 119.7 | 1608 | 248K | 265K |
| ICP | 17.9 | 7.22 | 121.5 | 146.4 | 75226 | 257K | 343K |
| Overhead | | 1.6% | 49% | 22% | 4577% | 3.7% | 29% |
| SC-ICP | 16.2 | 6.80 | 90.4 | 126.5 | 4144 | 254K | 274K |
| Overhead | | -4.3% | 11% | 5.7% | 158% | 2.4% | 3.2% |

We run three experiments with the prototype. The first experiment repeats the test in Section 4; the results are included in Table 2 in Section 4, under the heading "SC-ICP." The improved protocol reduces the UDP traffic by a factor of 50, and has network traffic, CPU times and client latencies similar to those of no-ICP.

Our second and third experiments replay the first 24,000 requests from the UPisa trace. We use a collection of 80 client processes running on 4 workstations; client processes on the same workstation connect to the same proxy server. In the second experiment, we replay the trace by having each client process emulate a set of real-life clients, issuing their Web requests. In the third experiment, we replay the trace by having the client processes issue requests round-robin from the trace file, regardless of which real-life client each request comes from. The second experiment preserves the binding between a client and its requests, and a client's requests all go to the same proxy; however, it does not preserve the order among requests from different clients. The third experiment does not preserve the binding between requests and clients, but does preserve the timing order among the requests. The proxies are more load-balanced in the third experiment than in the second.


In both experiments, each request's URL carries the size recorded in the trace file, and the server replies with the specified number of bytes. The rest of the configuration is similar to the experiments in Section 4. Unlike the synthetic benchmark, the trace contains a noticeable number of remote hits. The results from Experiment 2 are listed in Table 6, and those from Experiment 3 in Table 7.

The results show that the enhanced ICP protocol reduces the network traffic and CPU overhead significantly, while only slightly decreasing the total hit ratio. The enhanced ICP protocol lowers the client latency slightly compared to the No-ICP case, even though it increases the CPU time by about 12%. The reduction in client latency is due to the remote cache hits. Separate experiments show that most of the CPU time increase is due to servicing remote hits, and the CPU time increase due to MD5 calculation is less than 5%. Though the experiments do not replay the trace faithfully, they do illustrate the performance of summary cache in practice.


Our results indicate that the summary-cache enhanced ICP solves the overhead problem of ICP, requires minimal changes, and enables scalable Web cache sharing over a wide-area network.


## Related Work

Web caching is an active research area. There are many studies on Web client access characteristics [10, 3, 14, 33, 23], Web caching algorithms [49, 35, 8], and Web cache consistency [28, 31, 34, 13]. Our study does not address caching algorithms or cache consistency maintenance, but it overlaps some of the client traffic studies in our investigation of the benefits of Web cache sharing.


Recently, a number of new cache sharing approaches have been proposed in the literature. The Cache Array Routing Protocol [46] divides the URL space among an array of loosely coupled proxy servers, and lets each proxy cache only the documents whose URLs are hashed to it. An advantage of the approach is that it eliminates duplicate copies of documents. However, it is not clear how well the approach performs for wide-area cache sharing, where proxies are distributed over a regional network. The Relais project [27] also proposes using local directories to find documents in other caches, and updating the directories asynchronously. The idea is similar to summary cache. However, the project does not seem to address the issue of memory demands. From the publications on Relais that we could find and read [4], it is also not clear to us whether the project addresses the issue of directory update frequencies. Proxies built out of tightly coupled clustered workstations also use various hashing and partitioning approaches to utilize the memory and disks in the cluster [20], but those approaches are not appropriate in wide-area networks.

Our study is partially motivated by an existing proposal called directory server [21]. The approach uses a central server to keep track of the cache directories of all proxies, and all proxies query the server for cache hits in other proxies. The drawback of the approach is that the central server can easily become a bottleneck. The advantage is that little communication is needed between sibling proxies except for remote hits.

There have also been many studies on Web cache hierarchies and cache sharing. Hierarchical Web caching was first proposed in the Harvest project [26, 12], which also introduced the ICP protocol. Currently, the Squid proxy server implements version 2 of the ICP protocol [48], upon which our Summary-Cache enhanced ICP is based. Adaptive Web caching [50] proposes a multicast-based adaptive caching infrastructure for document dissemination in the Web; in particular, the scheme seeks to position documents at the right caches along the routes to the servers. Our study does not address these positioning issues. Rather, our work is complementary in the sense that the summary cache approach can be used as a mechanism for communicating caches' contents.

Summary-Cache enhanced ICP can also be used between parent and child proxies, though we did not simulate this scenario. Hierarchical Web caching involves cooperation not only among neighboring (sibling) proxies, but also between parent and child proxies. The difference between a sibling proxy and a parent proxy is that a proxy cannot ask a sibling to fetch a document from the origin server, but can ask a parent to do so. Though our simulations only involve cooperation among sibling proxies, the summary-cache approach can be used to propagate information about a parent cache's contents to its child proxies, eliminating the ICP queries from the children to the parent. Our inspection of the Questnet traces shows that child-to-parent ICP queries can be a significant portion (over 2/3) of the messages the parent has to process.


In the operating system context, there have been many studies on cooperative file caching [11, 2] and the global memory system (GMS) [17]. The underlying assumption in these systems is that high-speed local area networks are faster than disks, so workstations should use each other's idle memory to cache file pages or virtual memory pages and avoid traffic to disks. In this respect, the problem is quite different from Web cache sharing. On the other hand, both contexts raise the issue of how tightly coordinated the caches should be. Most cooperative file caching and GMS systems try to emulate the global LRU replacement algorithm, sometimes using hints to do so [45]. It is interesting to note that we arrive at quite different conclusions on whether a global replacement algorithm is necessary [17]. The reason is that in the OS context, the global replacement algorithm is used for stealing memory from idle workstations (i.e. load-balancing the caches), while in Web cache sharing, every proxy is busy all the time. Thus, while simple cache sharing performs poorly in the OS context, it suffices for Web proxy cache sharing as long as each proxy's resource configuration is appropriate for its load. Finally, note that the technique of Bloom filter based summary cache is not restricted to the Web proxy caching context; it can be used wherever knowledge of other caches' contents is beneficial, for example in caching and load balancing in clustered servers.

## Conclusions and Future Work

We propose the Summary-Cache enhanced ICP, a scalable wide-area Web cache sharing protocol. Using trace-driven simulations and measurements, we demonstrate the benefits of Web proxy cache sharing, illustrate the overhead of current cache sharing protocols, and show that the summary cache approach substantially reduces that overhead. We study two key aspects of this approach: the effects of delayed updates, and the succinct representation of summaries. Our solution, Bloom filter based summaries with update delay thresholds, has low demands on memory and bandwidth, and yet achieves a hit ratio similar to that of the original ICP protocol. In particular, trace-driven simulations show that, compared to ICP, the new protocol reduces the number of inter-proxy protocol messages by a factor of 25 to 60 and reduces the bandwidth consumption by over 50%, while incurring almost no degradation in the cache hit ratios. Simulation and analysis further demonstrate the scalability of the protocol.

We have built a prototype implementation in Squid 1.1.14. Synthetic and trace-replay experiments show that, in addition to the network traffic reduction, the new protocol reduces the CPU overhead by 75% to 95% and improves the client latency. The prototype implementation is publicly available [15].

Much future work remains. We plan to investigate the impact of the protocol on parent-child proxy cooperation, and the optimal hierarchy configuration for a given workload. We also plan to study the application of summary cache to various Web cache consistency protocols. Finally, summary cache can be used within an individual proxy implementation to speed up cache lookup, and we will quantify the effect by modifying a proxy implementation.


## References

[1] J. Almeida and P. Cao. (1997) Wisconsin proxy benchmark 1.0. [Online]. Available: http://www.cs.wisc.edu/~cao/wpbl.0.html

[2] T. E. Anderson, M. D. Dahlin, J. M. Neefe, D. A. Patterson, D. S. Roselli, and R. Y. Wang, "Serverless network file systems," in Proc. 15th ACM Symp. Operating Syst. Principles, Dec. 1995.

[3] M. Arlitt, R. Friedrich, and T. Jin, “Performance evaluation of Web proxy cache replacement policies,” in Proc. Performance Tools’98, Lecture Notes in Computer Science, 1998, vol. 1469, pp. 193-206.

[4] M. Arlitt and C. Williamson, “Web server workload characterization,” in Proc. 1996 ACM SIGMETRICS Int. Conf. Measurement and Modeling of Computer Systems, May 1996.

[5] A. Baggio and G. Pierre. Oleron: Supporting information sharing in large-scale mobile environments, presented at ERSADS Workshop, Mar. [Online]. Available: http://www-sor.inria.fr/projects/relais/

[6] K. Beck. Tennessee cache box project, presented at 2nd Web Caching Workshop, Boulder, CO, June 1997. [Online]. Available: http://ircache.nlanr.net/Cache/Workshop97/

[7] B. Bloom, “Space/time trade-offs in hash coding with allowable errors,” Commun. ACM, vol. 13, no. 7, pp. 422-426, July 1970.

[8] L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker, “Web caching and zipf-like distributions: Evidence and implications,” in Proc. IEEE INFOCOM, 1999.

[9] A. Z. Broder, “Some applications of Rabin’s fingerprinting method,” in Sequences II: Methods in Communications, Security, and Computer Science, R. Capocelli, A. De Santis, and U. Vaccaro, Eds. New York, NY: Springer-Verlag, 1993, pp. 143-152.

[10] P. Cao and S. Irani, “Cost-aware WWW proxy caching algorithms,” in Proc. 1997 USENIX Symp. Internet Technology and Systems, Dec. 1997, http://www.cs.wisc.edu/~cao/papers/gd-size.html, pp. 193-206.

[11] M. Crovella and A. Bestavros, “Self-similiarity in world wide web traffic: Evidence and possible causes,” in Proc. 1996 Sigmetrics Conf. Measurement and Modeling of Computer Systems, Philadelphia, PA, May 1996.

[12] C. R. Cunha, A. Bestavros, and M. E. Crovella, “Characteristics of WWW client-based traces,” Boston University, Boston, MA, Tech. Rep. BU-CS-96-010, Oct. 1995.

[13] M. D. Dahlin, R. Y. Wang, T. E. Anderson, and D. A. Patterson, “Cooperative caching: Using remote client memory to improve file system performance,” in Proc. 1st USENIX Symp. Operating Systems Design and Implementation, Nov. 1994, pp. 267-280.

[14] P. B. Danzig, R. S. Hall, and M. F. Schwartz, “A case for caching file objects inside internetworks,” in Proc. SIGCOMM, 1993, pp. 239-248.

[15] F. Douglis, A. Feldmann, B. Krishnamurthy, and J. Mogul, “Rate of change and other metrics: A live study of the world wide web,” in Proc. USENIX Symp. Internet Technology and Systems, Dec. 1997.

[16] B. M. Duska, D. Marwood, and M. J. Feeley, “The measured access characteristics of world-wide-web client proxy caches,” in Proc. USENIX Symp. Internet Technology and Systems, Dec. 1997.

[17] L. Fan, P. Cao, and J. Almeida. (1998, Feb.) A prototype implementation of summary-cache enhanced icp in Squid 1.1.14. [Online]. Available: http://www.cs.wisc.edu/~cao/sc-icp.html

[18] L. Fan, P. Cao, J. Almeida, and A. Z. Broder, “Summary cache: A scalable wide-area web cache sharing protocol,” in Proc. ACM SIGCOMM, 1998.

[19] --, (1998, Feb.) Summary cache: A scalable wide-area web cache sharing protocol. Tech. Rep. 1361, Computer Science Department, University of Wisconsin-Madison. [Online]. Available: http://www.cs.wisc.edu/~cao/papers/summarycache.html

[20] M. J. Feeley, W. E. Morgan, F. H. Pighin, A. R. Karlin, H. M. Levy, and C. A. Thekkath, “Implementing global memory management in a workstation cluster,” in Proc. 15th ACM Symp. Operating Systems Principles, Dec. 1995.

[21] ICP working group. (1998). National Lab for Applied Network Research. [Online]. Available: http://ircache.nlanr.net/Cache/ICP/

[22] A. Fox, S. D. Gribble, Y. Chawathe, E. A. Brewer, and P. Gauthier, “Cluster-based scalable network services,” in Proc. 16th ACM Symp. Operating Systems Principles, Oct. 1997.

[23] S. Gadde, M. Rabinovich, and J. Chase. Reduce, reuse, recycle: An approach to building large internet caches, presented at 6th Workshop Hot Topics in Operating Systems (HotOS VI), May 1997. [Online]. Available: http://www.research.att.com/~misha/

[24] G. Gonnet and R. Baeza-Yates, Handbook of Algorithms and Data Structures. Reading, MA: Addison-Wesley, 1991.

[25] S. Gribble and E. Brewer, “System design issues for internet middleware services: Deductions from a large client trace,” in Proc. USENIX Symp. Internet Technology and Systems, Dec. 1997.

[26] --, (1997, June) UCB home IP HTTP traces. [Online]. Available: http://www.cs.berkeley.edu/~gribble/traces/index.html

[27] C. Grimm. The dfn cache service in B-WiN. presented at 2nd Web Caching Workshop, Boulder, CO, June 1997. [Online]. Available: http://www-cache.dfn.de/CacheEN/

[28] The Harvest Group. (1994) Harvest Information Discovery and Access System. [Online]. Available: http://excalibur.usc.edu/

[29] The Relais Group. (1998) Relais: Cooperative caches for the world-wide web. [Online]. Available: http://www-sor.inria.fr/projects/relais/

[30] J. Gwertzman and M. Seltzer, “World-wide web cache consistency,” in Proc. 1996 USENIX Tech. Conf., San Diego, CA, Jan. 1996.

[31] IRCACHE. (1999, Mar.) Benchmarking Proxy Caches with Web Polygraph. [Online]. Available: http://www.polygraph.ircache.net/slides/

[32] V. Jacobson. How to kill the internet, presented at SIGCOMM’95 Middleware Workshop, Aug. 1995. [Online]. Available: ftp://ftp.ee.lbl.gov/talks/vj-webflame.ps.Z

[33] J. Jung. Nation-wide caching project in korea, presented at 2nd Web Caching Workshop, Boulder, CO, June 1997. [Online]. Available: http://ircache.nlanr.net/Cache/Workshop97/

[34] B. Krishnamurthy and C. E. Wills, “Study of piggyback cache validation for proxy caches in the world wide web,” in Proc. USENIX Symp. Internet Technology and Systems, Dec. 1997.

[35] T. M. Kroeger, J. Mogul, and C. Maltzahn. (1996, Aug.) Digital’s web proxy traces. [Online]. Available: ftp://ftp.digital.com/pub/DEC/traces/proxy/webtraces.html

[36] T. M. Kroeger, D. D. E. Long, and J. C. Mogul, “Exploring the bounds of web latency reduction from caching and prefetching,” in Proc. USENIX Symp. Internet Technology and Systems, Dec. 1997.

[37] C. Liu and P. Cao, “Maintaining strong cache consistency for the world-wide web,” presented at the 17th Int. Conf. Distributed Computing Systems, May 1997.

[38] P. Lorenzetti, L. Rizzo, and L. Vicisano. (1996, Oct.) Replacement policies for a proxy cache. Universita di Pisa, Italy. [Online]. Available: http://www.iet.unipi.it/~luigi/caching.ps.gz

[39] C. Maltzahn, K. Richardson, and D. Grunwald, “Performance issues of enterprise level web proxies,” in Proc. 1997 ACM SIGMETRICS Int. Conf. Measurement and Modeling of Computer Systems, June 1997, pp. 13-23.

[40] J. Marais and K. Bharat. Supporting cooperative and personal surfing with a desktop assistant, presented at ACM UIST’97. [Online]. Available: ftp://ftp.digital.com/pub/DEC/SRC/publications/marais/uist97paper.pdf .

[41] A. J. Menezes, P. C. van Oorschot, and S. A. Vanstone, Handbook of Applied Cryptography: CRC Press, 1997.

[42] J. C. Mogul, F. Douglis, A. Feldmann, and B. Krishnamurthy. Potential benefits of delta encoding and data compression for HTTP, presented at ACM SIGCOMM’97. [Online]. Available: http://www.research.att.com/~douglis/

[43] National Lab of Applied Network Research. (1997, July) Sanitized Access Log. [Online]. Available: ftp://ircache.nlanr.net/Traces/

[44] J. Pietsch. Caching in the Washington State k-20 network, presented at 2nd Web Caching Workshop, Boulder, CO, June 1997. [Online]. Available: http://ircache.nlanr.net/Cache/Workshop97/

[45] M. O. Rabin, “Fingerprinting by random polynomials,” Center for Research in Computing Technology, Harvard Univ., Tech. Rep. TR-15-81, 1981.

[46] A. Rousskov. (1998, Apr.) Cache digest. [Online]. Available: http://squid.nlanr.net/Squid/CacheDigest/

[47] P. Sarkar and J. Hartman, “Efficient cooperative caching using hints,” in Proc. USENIX Conf. Operating System Design and Implementations, Oct. 1996.

[48] V. Valloppillil and K. W. Ross. (1997) Cache array routing protocol v1.0. [Online]. Available: http://ircache.nlanr.net/Cache/ICP/draft-vinod-carp-v1-02.txt

[49] D. Wessels and K. Claffy. (1998) Internet cache protocol (ICP) v.2. [Online]. Available: http://ds.internic.net/rfc/rfc2186.txt

[50] S. Williams, M. Abrams, C. R. Stanbridge, G. Abdulla, and E. A. Fox. Removal policies in network caches for world-wide web documents, presented at ACM SIGCOMM’96. [Online]. Available: http://ei.cs.vt.edu/~succeed/96sigcomm/

[51] L. Zhang, S. Floyd, and V. Jacobson. Adaptive web caching, presented at 2nd Web Caching Workshop, Boulder, CO, June 1997. [Online]. Available: http://ircache.nlanr.net/Cache/Workshop97/Papers/Floyd/floyd