One of the challenges of understanding large amounts of data is to characterize them using a few numbers that somehow reflect the whole. Statistics such as the minimum, maximum and the various kinds of averages tell you global properties of your data set. Sometimes they are enough to reveal information about individuals. This is why even databases that contain only statistical information about people are a privacy issue: enough statistical questions can reveal personal data.

Consider a simple game between a questioner, Quentin, and a responder, Rosalba. Quentin can ask only about global properties of a group of numbers (for instance, "Are they all whole numbers?", "Are they distinct?" and "What are the mean, median, minimum and maximum?"). Rosalba may refuse to answer, but then she must give the reason. Rosalba always tells the truth. Sometimes she will volunteer information just for the fun of it.

Warm-Up:
Rosalba: "I have five integers, all distinct."
Quentin: "What is the minimum?"
Rosalba: "15."
Quentin: "What is the maximum?"
Rosalba: "I won't tell, because you would know everything."

What are the numbers?

Solution to Warm-Up:
Because the numbers are all distinct, the maximum could reveal everything only if it were 19. Then the collection consists of 15, 16, 17, 18 and 19. Okay, this one was easy, but the deductions will get more interesting.
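
The same deduction can be checked by brute force. The sketch below is only an illustration in Python; the helper name `candidates` is hypothetical and not part of the puzzle. It lists every collection of five distinct integers with a given minimum and maximum, so you can see that a maximum of 19 leaves exactly one possibility while any larger maximum leaves several.

```python
from itertools import combinations

def candidates(minimum, maximum, count=5):
    """Return all sorted tuples of `count` distinct integers
    whose smallest value is `minimum` and largest is `maximum`."""
    # The values strictly between the minimum and the maximum.
    inner = range(minimum + 1, maximum)
    # Choose the remaining count - 2 values from the interior.
    return [(minimum, *middle, maximum)
            for middle in combinations(inner, count - 2)]

# How many collections are consistent with each possible maximum?
for maximum in range(19, 23):
    print(maximum, len(candidates(15, maximum)))
# 19 1
# 20 4
# 21 10
# 22 20
```

Only a maximum of 19 pins the collection down, which is why Rosalba refuses to reveal it.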

Before we go on, though, let me remind you of the definitions of the mean and the median. The mean of a collection of numbers is their sum divided by how many numbers are in the collection. For example, the mean of 20, 22, 22, 40 and 101 is 205/5 = 41. The median is the middle value when the numbers are sorted, so 22 in this example (our examples will always have an odd number of values).
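
If you would like to check such figures yourself, here is a small sketch using Python's standard `statistics` module on the example values. The variable names are just for illustration.

```python
from statistics import mean, median

values = [20, 22, 22, 40, 101]

# Mean: the sum of the values divided by how many there are.
print(sum(values), len(values), mean(values))  # 205 5 41

# Median: the middle element of the sorted list
# (an odd number of values is assumed, as in the examples here).
ordered = sorted(values)
print(ordered[len(ordered) // 2], median(values))  # 22 22
```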

Rosalba: "I have five integers that may or may not be distinct."
Quentin: "What is the minimum?"
Rosalba: "20."
Quentin: "Which of these would not allow me to infer all their values -- the number of distinct values, the mean, the maximum or the median?"
Rosalba: "Only the median."
Quentin: "Great. I know the numbers."

What are they?

Rosalba: "I have seven integers that may or may not be distinct."
Quentin: "What is the minimum?"
Rosalba: "20."
Quentin: "Which of these are you willing to tell me (that is, which would not allow me to infer all their values): mean, median and maximum?"
Rosalba: "All of them."
Quentin: "Okay, what is the maximum?"
Rosalba: "21."
Quentin: "I know which of the mean and median you're willing to tell me now."

Which? Why?

Rosalba: "Can you find some situation in which I would be happier to tell you the mean rather than the median?"
Quentin: "Could you give me a hint?"
Rosalba: "In an example I can think of, there are three numbers, two of which are distinct."

Rosalba: "Can you find some situation in which all of the minimum, maximum, mean and median are necessary and sufficient to find the identities of five numbers that are all integers?"

Rosalba: "So far we've been playing games with just a few numbers. I've given you hints and you've been able to infer them all. But five numbers are not interesting. Let's try for more.

"Before we do that, let me define one new global property: the total distance to a point. Let us say we have the five numbers 10, 15, 20, 30 and 60. The total distance to a point-let's call the point the number 22 in this case--is the sum of (22-10), (22-15), (22-20), (30-22) and (60-22). Mathematically, the total distance to x is the sum of the absolute values of the differences between each number and x.

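As an aside, the total distance is easy to compute mechanically. Here is a minimal sketch in Python; the function name `total_distance` is my own label for illustration, not something Rosalba uses.

```python
def total_distance(values, x):
    """Sum of the absolute differences between each value and the point x."""
    return sum(abs(v - x) for v in values)

# The five numbers from the example, measured against the point 22:
# 12 + 7 + 2 + 8 + 38
print(total_distance([10, 15, 20, 30, 60], 22))  # 67
```
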
"Now we are ready. There are 17 numbers that are not all distinct. Their minimum is 30, their mean is 34 and their median is 35."