I Think Spolsky Missed a Detail About Starbucks Queueing

This week saw the publication of another Inc.com article by the inimitable Joel Spolsky, and as usual it's a fun, geeky read. His ranting analysis of the queueing and order-taking procedures at Starbucks supplements a section about an instance of unfriendliness on the part of the staff, which I'll ignore in favour of concentrating on the (more interesting and less Godin-esque) part of the article on queueing procedures and order taking.

Read the original article [http://www.inc.com/magazine/20080801/how-hard-could-it-be-good-system-bad-system.html]; I'm only reproducing a small, relevant portion here:

"Her main job was to go down the line of people waiting to order and ask them what they wanted in advance of their arriving at the cash register. There, they would be asked to repeat their order before paying and finally joining the line of customers waiting for their drinks to appear.

This premature order taking did not appear to improve the store's productivity. The cashiers still had to take the same number of orders, wait for the customers to fiddle with their purses for the correct change, and so forth. The coffee producers -- known theatrically in the trade as baristas -- still had to make the same number of drinks. The biggest benefit of the procedure, I thought, was that the barista got started on a drink a few seconds earlier, so people got their orders filled a little bit faster, even though the overall rate of output for the store was the same.

A network engineer would say this was a situation of 'same bandwidth, lower latency' [...]"

I disagree!

Armchair Psychology

Firstly, I take a bit of an issue with the claim of lowering perceived latency by reducing the gap between paying at the register and receiving your drink, because I believe I'd start measuring latency when first giving the order, not when paying. Let's ignore that, though; Spolsky correctly assumes the early order-taking is useful in preventing customers from giving up and leaving when faced with a long line. This should be no surprise to anyone who's read Robert Cialdini's Influence: The Psychology of Persuasion [http://www.amazon.com/Influence-Psychology-Persuasion-Robert-Cialdini/dp/0688128165]. Once we've expressed a choice, painted a mental picture of ourselves as buying a cup of Starbucks coffee this morning, our internal need for self-consistency will force us to rationalize staying, even in the face of long lines. Seriously, read this book, it's an eye opener.

The Benefit of Longer Queues

Ahem, sorry for the digression - back to queueing. Joel states:

"[...] while not even increasing the total number of Frappuccino Blended Coffees that could be produced per unit of time?"

Aha, that's the thing! What's missing here is that the goal isn't to increase the Frappuccino throughput, it's to increase the total throughput across all drinks, and it's absolutely crucial to realize that the drinks are different, and have different preparation times. I think the point of the pre-order taking is to increase the job queue length, and that increasing total throughput by doing this is actually an achievable goal.

Your typical Starbucks counter layout looks something like this in Canada (simplified):


Section A: 2 Espresso machines, steam wands for frothing milk, grinders.
Section B: 3 Thermos canisters of brewed drip coffee: light, medium, dark roast.
Section C: Cash registers, in front of which customers line up.
Job Queue: {DripDarkRoast, DripDarkRoast, Cappuccino, Latte, Cappuccino}

There is no need to view the job queue as FIFO; in fact, it's intuitively obvious that reordering jobs depending on what's available at any moment (out of steamed milk - need to make more; out of ground coffee - need to grind more; etc.) should improve throughput somewhat. Now, assuming you have enough baristas, you can make 2 espresso-based drinks and 1 brewed coffee simultaneously. Maximum throughput will be achieved when all 3 execution units are kept fully busy, which means your drink pipeline should have at least two espresso drinks and one brewed coffee in it to guarantee full utilization after popping the next job off the queue, AND your staff must be allowed to reorder jobs as they see fit. Practically speaking, since pouring a cup of drip coffee takes less time than paying for it, you should have much more than a single drip-coffee job in the queue.
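To make the non-FIFO idea concrete, here's a toy sketch of such a job queue (all names and the station counts are mine, invented for illustration, not any actual Starbucks procedure): dispatch the first queued drink whose station is free, rather than blocking the whole line behind the head job.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

public class BaristaScheduler {
    public enum Station { ESPRESSO, DRIP }

    // Free capacity per execution unit type: 2 espresso machines, 1 drip pour.
    private int freeEspresso = 2;
    private int freeDrip = 1;
    private final Deque<Station> jobs = new ArrayDeque<>();

    /** The expediter pushes an order onto the pipeline. */
    public void order(Station drinkType) {
        jobs.add(drinkType);
    }

    /**
     * Non-FIFO dispatch: scan the queue for the first job whose station is
     * free. Returns null when nothing is dispatchable right now.
     */
    public Station dispatchNext() {
        for (Iterator<Station> it = jobs.iterator(); it.hasNext(); ) {
            Station s = it.next();
            if (s == Station.ESPRESSO && freeEspresso > 0) {
                freeEspresso--;
                it.remove();
                return s;
            }
            if (s == Station.DRIP && freeDrip > 0) {
                freeDrip--;
                it.remove();
                return s;
            }
        }
        return null;
    }

    /** A barista finished a drink, freeing up its station. */
    public void finish(Station s) {
        if (s == Station.ESPRESSO) freeEspresso++; else freeDrip++;
    }
}
```

With three espresso orders ahead of a drip order, the third espresso has to wait for a free machine, but the drip job behind it jumps the queue and keeps the drip station busy - exactly the utilization argument above.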

So the expediter can indeed cause the throughput to rise: it's clear to me that the job of the expediter is to increase the pipeline length to maximize the chance that all execution units are kept busy. One might argue that this could be done without an expediter, by having the cashiers simply take more orders and queue them up, but that seems like too much of a cognitive load: in addition to payment processing, they'd be forced to stay perfectly aware of what's in the queue, who's busy, which machines are free, etc. The constant context-switching between smiling at customers, counting change, and checking the state of the queue would slow them down, which is why I think it makes sense to offload all of this decision-making to the expediter, who's then free to apply whatever algorithm she chooses in deciding whether to take more orders or pause.

The Smugness Corner

I frequently buy my morning coffee from Bridgehead Coffee in Ottawa, where the barista often sees me standing in line and starts making my usual drink before I get to the cashier to place my order, which results in incredibly low perceived latency. Go Bridgehead.


Performance investigation of Java's select() in Windows

Java has had select()-based I/O since (I believe) 1.4, through java.nio.channels.Selector and the supporting API. While network I/O over non-blocking SocketChannels has been working fine in one of our [Solace Systems'] messaging software platforms for a long time, and at a more than acceptable throughput, I had never really attempted to precisely measure typical timings of Selector.select() and SocketChannel.write(). That is, until this week, when a coworker coding against Winsock2 wanted to compare his timings against what we got doing similar work in Java. What I found was quite surprising...

Test Design

To get an idea of timings, I quickly bashed out your basic "Hello World" of non-blocking-SocketChannel-using applications (which we'll call the client side), which simply streams data as fast as possible to a netcat (nc) instance in listen mode (the server side). In the Java program, we connect to a remote listening port that was created like so:

nc -kl $PORT > /dev/null

Then we register an interest set of OP_WRITE on that channel, log the time offset (System.nanoTime), and select(). Whenever this returns, we log the time and attempt to write an X kilobyte buffer to the socket, then select again, etc. We also log timings for time spent in write() and the number of bytes written in each call to write(). We'll retry this test for several values of X, getting a sense of how much data is copied from our input buffer to the socket's output buffer on each call to write(), and how long it takes select() to return, indicating space in the socket's send buffer.
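For reference, here's a self-contained sketch of that kind of test harness. To keep it runnable anywhere, it substitutes a local draining server thread for the remote nc instance; all class and variable names are mine, and this is a simplification of the test described above, not the original code.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class SelectTimingDemo {
    /** Streams 'rounds' buffers of bufKB kilobytes, printing average timings. */
    public static long run(int bufKB, int rounds) throws IOException {
        // Local stand-in for "nc -kl $PORT > /dev/null": accept and drain bytes.
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));
        Thread drain = new Thread(() -> {
            try (SocketChannel peer = server.accept()) {
                ByteBuffer sink = ByteBuffer.allocate(64 * 1024);
                while (peer.read(sink) >= 0) sink.clear(); // discard everything
            } catch (IOException ignored) { }
        });
        drain.setDaemon(true);
        drain.start();

        SocketChannel ch = SocketChannel.open(
                new InetSocketAddress("127.0.0.1", server.socket().getLocalPort()));
        ch.configureBlocking(false);
        Selector sel = Selector.open();
        ch.register(sel, SelectionKey.OP_WRITE);

        ByteBuffer buf = ByteBuffer.allocate(bufKB * 1024);
        long selectNanos = 0, writeNanos = 0, totalBytes = 0;
        for (int i = 0; i < rounds; i++) {
            long t0 = System.nanoTime();
            sel.select();                 // wait for space in the send buffer
            long t1 = System.nanoTime();
            buf.clear();
            totalBytes += ch.write(buf);  // may write fewer than bufKB kilobytes
            long t2 = System.nanoTime();
            selectNanos += t1 - t0;
            writeNanos += t2 - t1;
            sel.selectedKeys().clear();   // must clear, or select() goes stale
        }
        ch.close();
        sel.close();
        server.close();
        System.out.printf("avg select() = %d us, avg write() = %d us, total = %d bytes%n",
                selectNanos / rounds / 1000, writeNanos / rounds / 1000, totalBytes);
        return totalBytes;
    }

    public static void main(String[] args) throws IOException {
        run(10, 200); // X = 10 KB, ~200 writes, as in the test above
    }
}
```

The write() return value is the interesting bit to log: on a well-behaved non-blocking socket it should top out around the free space in the send buffer.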


First, here's the average time (over ~200 or so writes) spent in select(), by write-buffer size (times in microseconds):

Buffer size    Avg. time in select() (us)
1 KB                396,011
2 KB                    654
5 KB                    846
10 KB                 1,271
100 KB                9,332
First surprising result: the distribution is pretty much what you'd expect for a 100 mbit network, except for the 1K datapoint, which should just have made you spray coffee on your monitor. The 1K writes start off very fast for the first few samples (~500us), then hit a wall and only get woken up every 500000us, yielding a very, very slow transfer rate (2 KB/s). I initially thought this was due to Nagle's algorithm preventing a small buffer from being sent before a timeout expired, but setting TCP_NODELAY on the socket had no effect on this behaviour. I can confirm using a packet dump that the server end immediately acks every packet we send to it, so it's not a question of the local TCP send window getting full.

The second weird result is that on Windows, whenever you call SocketChannel.write(ByteBuffer), THE ENTIRE BUFFER GETS COPIED OFF AND REPORTED AS WRITTEN. You'd expect it to write only as many bytes as it can until filling up the local TCP send buffer (sized at SO_SNDBUF, which defaults to 8 KB, as we all know), then return that number, leaving the rest of your input buffer to be copied out on the next call to write(). In fact, that's my understanding of the Sun documentation (emphasis mine):

Writes a sequence of bytes to this channel from the given buffer.

An attempt is made to write up to r bytes to the channel, where r is the number of bytes remaining in the buffer, that is, src.remaining(), at the moment this method is invoked.

Suppose that a byte sequence of length n is written, where 0 <= n <= r. This byte sequence will be transferred from the buffer starting at index p, where p is the buffer's position at the moment this method is invoked; the index of the last byte written will be p + n - 1. Upon return the buffer's position will be equal to p + n; its limit will not have changed.

Unless otherwise specified, a write operation will return only after writing all of the r requested bytes. Some types of channels, depending upon their state, may write only some of the bytes or possibly none at all. A socket channel in non-blocking mode, for example, cannot write any more bytes than are free in the socket's output buffer.

Open Questions (Mystery!)

So, I'm left with two big questions:

1. What's going on with the 1K writes? I tried TCP_NODELAY on that socket (Nagle's algorithm being the obvious culprit when small writes have huge latency), with no change: select() only wakes up once per 500ms. Also, it happens consistently on every single select. Since the local SO_SNDBUF is 8K, even if there was something fishy going on around that 500ms pause in select(), shouldn't you only get blocked for the full 500ms once per 8 writes? I've never seen this happen in a real-world production app though, so I'm willing to chalk it up to a quirk in my simplistic test code.

2. Isn't it a bit strange that write() returns immediately and always reports writing the full buffer under Windows, even if you pass in a 100MB ByteBuffer to be output with an SO_SNDBUF of only 8K? On the 2 UNIX systems I tried, it still wrote much, much more than the value of SO_SNDBUF, but the results were all over the place; they didn't always match the size of the input buffer (which is what I'd expect).
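If you want to poke at question 2 yourself, here's a rough probe (my own sketch, not from the original test) that compares SO_SNDBUF against what a single large non-blocking write() reports. It uses a loopback server that accepts the connection but never reads, so a "strict" implementation could only accept roughly SO_SNDBUF (plus the peer's receive buffer) before returning.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class SndBufProbe {
    /** Makes one big non-blocking write() over loopback; returns bytes reported written. */
    public static long probe(int megabytes) throws IOException {
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));

        SocketChannel ch = SocketChannel.open(
                new InetSocketAddress("127.0.0.1", server.socket().getLocalPort()));
        ch.configureBlocking(false);
        SocketChannel peer = server.accept(); // accept, but never read

        int sndbuf = ch.socket().getSendBufferSize();
        ByteBuffer buf = ByteBuffer.allocate(megabytes * 1024 * 1024);
        long written = ch.write(buf);
        System.out.printf("SO_SNDBUF = %d bytes, write() reported %d bytes%n",
                sndbuf, written);

        peer.close();
        ch.close();
        server.close();
        return written;
    }
}
```

On the systems described above, the number printed should tell you immediately which camp your platform falls into: a count near SO_SNDBUF, a larger but inconsistent count (the UNIX behaviour), or the full buffer size (the Windows behaviour).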


Pimpin' thread dump utility class

When you're looking at your source and wondering "how the hell did I get here? Am I on the Swing worker thread, or...", there are a few things you can do to make finding the answer easier.

You've already looked at the call hierarchy with Ctrl+Alt+H, and you can't figure it out. One approach is to just slap down a breakpoint and restart under the Eclipse debugger, but if you put the breakpoint in a method that's called frequently, and you're instead looking for the uncommon hit, it gets really annoying to have the debugger pop up every three seconds while you look at the call stack and click "nope, nope, next", and so on.


So here's the second option: good old printf debugging. Now, there's no clean one-liner (that I know of) in Java to just dump the current stack (you could instantiate a Throwable and tell it to print its stack trace in like 2 lines, but how ugly is that?), so here's a little utility class (ThreadUtil.java) you can grab:

import java.io.PrintStream;

/**
 * Debugging utility class for printing current thread's stack trace.
 * To print to STDOUT, just call {@link #printMyStackTrace()}. If using a
 * logging framework, instead call {@link #getMyStackTrace()} and log the
 * result.
 * @author Jean-Philippe Daigle
 */
public class ThreadUtil {

    private ThreadUtil() {
    }

    /** Prints current stack to System.out. */
    public static void printMyStackTrace() {
        printMyStackTrace(System.out);
    }

    /** Prints current stack to the specified PrintStream. */
    public static void printMyStackTrace(final PrintStream out) {
        out.println(getHeader());
        out.print(getMyStackTrace());
    }

    /** Gets current stack trace as a String. */
    public static String getMyStackTrace() {
        final StackTraceElement[] ste_arr = dumpFilteredStack();
        final StringBuilder sb = new StringBuilder();
        for (final StackTraceElement stackTraceElement : ste_arr) {
            sb.append("\t").append(stackTraceElement).append("\n");
        }
        return sb.toString();
    }

    private static StackTraceElement[] dumpFilteredStack() {
        final StackTraceElement[] ste = Thread.currentThread().getStackTrace();

        /*
         * The first few elements in the stack will be in Thread.getStackTrace
         * and in the current class, so we need to skip that noise.
         */
        int i = 0;
        for (i = 0; i < ste.length; i++) {
            final String cc = ste[i].getClassName();
            if (!(cc.equals("java.lang.Thread")
                    || cc.equals(ThreadUtil.class.getCanonicalName()))) {
                break;
            }
        }

        final StackTraceElement[] ste2 = new StackTraceElement[ste.length - i];
        System.arraycopy(ste, i, ste2, 0, ste2.length);
        return ste2;
    }

    private static String getHeader() {
        final Thread ct = Thread.currentThread();
        return String.format("Thread: \"%s\" %sid=%s, prio=%s:",
                ct.getName(),
                ct.isDaemon() ? "daemon " : "",
                ct.getId(),
                ct.getPriority());
    }
}

How to steal a car

I dropped off my car at the Honda dealership this morning to have them investigate the mysterious Check Engine light that came on yesterday. This post was inspired by what happened tonight when I swung by after work to pick up the car - I'm not chastising them for the security flaw, I mostly just find it charming and cute.

I was there about two minutes, and this is the conversation as I remember it, see if you can spot the tiny flaw:

[I enter the service area of the dealership and walk over to the clerk, then sit down in front of him. A car key is lying on the desk, sporting a little tag with my last name scrawled on it in big block letters: DAIGLE.]

- Clerk: Hi!

- JP: Hi, picking up my car...

- Name?

- Daigle

- Oh yeah, just closing that file now, here's your key.

- So what was wrong with it?

- Coolant temperature sensor failed.

- Huh.

- Yeah, it was under warranty, so you're good to go!

- Thanks, good night!

[I exit the dealership, key in hand, and drive off.]

Human beings are usually the weakest link in any security system :)