httpcache
provides the HTTP verb functions
GET()
, PUT()
, PATCH()
,
POST()
, and DELETE()
, which are drop-in
replacements for those in the httr
package. GET()
responses are added to the local query
cache; PUT()
, PATCH()
, POST()
,
and DELETE()
requests trigger cache invalidation on the
associated resources. For APIs where a POST
requests is
used to send a command that returns content and doesn’t modify state
(and hence is semantically more of a GET
), you can use
cachedPOST()
, which writes to the cache and doesn’t
invalidate.
To take advantage of these cache-aware functions, all you need to do
is load httpcache
instead of httr
, or in
package development, import from httpcache
.
system.time(a <- GET("https://httpbin.org/get"))
## user system elapsed
## 0.031 0.014 3.993
system.time(b <- GET("https://httpbin.org/get"))
## user system elapsed
## 0 0 0
Notice how the second request returns instantly. It’s reading from cache—there’s no communication with a remote server, so no network latency and no server processing time. Remember: the fastest API request is the one you don’t have to make.
Reading from the query cache yields exactly the same response as if we had contacted the server. We can confirm:
identical(a, b)
## [1] TRUE
Query logging
How do we know, other than by the faster response, that we’re hitting cache and not making server requests?
When designing API clients in R, logging is an invaluable tool for
understanding and improving request patterns. As you build layers of
abstraction on top of the direct HTTP requests, it can be easy to make
inefficient or repetitive requests that degrade performance for your
users and impose unnecessary load on your servers. You can’t improve
what you can’t measure, and the logging tools included in
httpcache
can help you measure.
Let’s clear the cache and repeat that exercise, this time with
logging enabled. Use startLog()
to enable the request log.
startLog
takes a file or connection argument, which it
passes to cat
for log writing. The default, same as for
cat
, prints to the standard output—your display, in an
interactive session. (See ?cat
for details.)
clearCache()
startLog()
a <- GET("http://httpbin.org/get")
## 2023-04-07T09:49:42.647 HTTP GET http://httpbin.org/get 200 368 0 0.001 0.158 0.158 1.953 1.953
## 2023-04-07T09:49:42.648 CACHE SET http://httpbin.org/get
b <- GET("http://httpbin.org/get")
## 2023-04-07T09:49:42.649 CACHE HIT http://httpbin.org/get
Notice how the first request results in an “HTTP GET” and a “CACHE SET”, while the second one gets a “CACHE HIT” and does not touch “HTTP”. From this log output, we can conclude that the query cache is working.
Log analysis
You can also pass a file name to startLog
. This makes it
easier to read the log output back in as a data.frame
and
analyze it quantitatively.
Custom logging
You may want to send other events to the log, interspersed with your
HTTP requests, whether for their own sake or to see how work done in R
outside of the HTTP layer maps onto your server traffic. The function
logMessage()
writes to the connection specified by
startLog
, and it is available for general use. For error
logging, the halt()
function wraps stop
and
sends a message to the log (it also makes the awkwardly named
call.
argument to stop
default to
FALSE
for cleaner error messaging).
Cache invalidation
As the saying (or joke, depending on the version), cache invalidation is one of the two hard problems in computer science. The trouble with caching what the server serves is that the server is the source of truth, and if the state of data on the server changes, our local copy of the data is stale. In some applications and with some APIs, we have no idea when the server state changes, but in many cases, the source of change on the server is actions that we initiate ourselves. In these cases, a local query cache is more feasible, and cache invalidation more tractable.
httpcache
provides some functions to direct cache
invalidation. We’ve seen one already, clearCache()
, which
wipes the entire cache. Other functions give more surgical control.
dropOnly()
invalidates cache only for the specified URL.
dropPattern()
uses regular expression matching to
invalidate cache. dropCache()
is a convenience wrapper
around dropPattern
that invalidates cache for any resources
that start with the given URL.
Depending on the API with which you’re communicating, you may not
need to use those cache-invalidation functions directly, or you may need
them only infrequently. httpcache
was designed with RESTful
APIs in mind, particularly those that expose resources that contain
collections of entities that can be created, replaced, updated, and
deleted (“CRUD”)
with POST, PUT, PATCH, and DELETE, respectively. Consequently, these
four HTTP verb functions are built with default cache invalidation
actions: POST
invalidates cache only for the request URL
(dropOnly
), for the case where POST creates a new entity
appearing as a subresource; while PUT
, PATCH
,
and DELETE
drop cache for the request URL and everything
“below” it (dropCache
).
For example, if GET http://api.example/projects/
returns
a catalog of project entities, and POST to
http://api.example/projects/
creates a resource at
http://api.example/projects/new_id/
, we need to bust cache
for the project catalog on POST, but our cached responses for resources
such as http://api.example/projects/old_id/
should still be
valid. But, if we modify
http://api.example/projects/new_id/
, we should invalidate
cache for that resource and for other resources appearing as
subresources of it, such as
http://api.example/projects/new_id/users/
.
These verb functions in httpcache
(POST()
,
PUT()
, PATCH()
, and DELETE()
) all
take a drop
argument, which defaults as described above. To
override them, you can specify a different call other than
dropCache(url)
or dropOnly(url)
, or you can
pass drop = NULL
and call the cache-invalidation functions
directly outside of the request functions. Depending on your API and
your usage of it, however, httpcache
’s cache management may
just work for you with no additional effort.
Caching across sessions
The query cache you build up in one R session doesn’t have to end
with it. Use saveCache()
to write out the contents of your
cache to a .rds
file. Restore it later with
loadCache()
.