The Power of Stats

One of the challenges of understanding large amounts of data is to characterize them with a few numbers that somehow reflect the whole. Statistics such as the minimum, the maximum and the various kinds of averages describe global properties of a data set. Sometimes they are enough to reveal information about individuals. This is why even databases that contain only statistical information about people raise privacy issues: enough statistical questions can reveal personal data.

Consider a simple game between a questioner, Quentin, and a responder, Rosalba. Quentin may ask only about global properties of a collection of numbers (for instance, "Are they all whole numbers?", "Are they distinct?" and "What are the mean, median, minimum and maximum?"). Rosalba may refuse to answer, but she must give her reason. Rosalba always tells the truth, and sometimes she volunteers information just for the fun of it.

Warm-Up:

Rosalba: "I have five integers, all distinct."
Quentin: "What is the minimum?"
Rosalba: "15."
Quentin: "What is the maximum?"
Rosalba: "I won't tell, because you would know everything."

What are the numbers?

Solution to Warm-Up:

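Five distinct integers whose minimum is 15 must have a maximum of at least 19. If the maximum were 20 or more, the numbers between the minimum and the maximum could be chosen in more than one way, so the maximum could reveal everything only if it were 19. Then the collection consists of 15, 16, 17, 18 and 19. Okay, this one was easy, but the deductions will get more interesting.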
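One way to convince yourself of the warm-up answer is brute force. The Python sketch below enumerates every set of five distinct integers with minimum 15, groups the sets by their maximum and keeps only the maxima that single out exactly one set. The function names are hypothetical, and the search bound `upper=30` is an assumption made to keep the search finite. Raising it only adds larger maxima, each of which is shared by even more candidate sets.

```python
from collections import defaultdict
from itertools import combinations

def candidates(minimum, count, upper):
    """Yield every sorted tuple of `count` distinct integers whose smallest
    element is `minimum` and whose largest is at most `upper` (assumed bound)."""
    for rest in combinations(range(minimum + 1, upper + 1), count - 1):
        yield (minimum,) + rest

def revealing_maxima(minimum=15, count=5, upper=30):
    """Map each maximum that pins down exactly one candidate set to that set."""
    by_max = defaultdict(list)
    for c in candidates(minimum, count, upper):
        by_max[max(c)].append(c)
    # A maximum "reveals everything" only if exactly one set has it.
    return {m: sets[0] for m, sets in by_max.items() if len(sets) == 1}

print(revealing_maxima())  # {19: (15, 16, 17, 18, 19)}
```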

Before we go on, though, let me remind you of the definitions of the mean and the median. The mean of a collection of numbers is their sum divided by how many numbers there are. For example, the mean of 20, 22, 22, 40 and 101 is 205/5 = 41. The median is the middle value when the numbers are sorted, so it is 22 in this example. (Our examples will always have an odd number of values, so there is always a single middle value.)
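Both definitions are short enough to write out directly. The minimal Python sketch below checks them against the example above and against the standard library's `statistics` module. The helper names `mean` and `median` are only illustrative, and the hand-written `median` assumes an odd number of values, as in all of this article's examples.

```python
import statistics

def mean(values):
    # Sum divided by how many values there are.
    return sum(values) / len(values)

def median(values):
    # Middle value after sorting; assumes an odd number of values.
    ordered = sorted(values)
    return ordered[len(ordered) // 2]

values = [20, 22, 22, 40, 101]
print(mean(values), statistics.mean(values))      # 41.0 41
print(median(values), statistics.median(values))  # 22 22
```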

Problems:

1.
Rosalba: "I have five integers that may or may not be distinct."
Quentin: "What is the minimum?"
Rosalba: "20."
Quentin: "Which of these would not allow me to infer all their values: the number of distinct values, the mean, the maximum or the median?"
Rosalba: "Only the median."
Quentin: "Great. I know the numbers."

What are they?

2.
Rosalba: "I have seven integers that may or may not be distinct."
Quentin: "What is the minimum?"
Rosalba: "20."
Quentin: "Which of these are you willing to tell me (that is, which would not allow me to infer all their values): mean, median and maximum?"
Rosalba: "All of them."
Quentin: "Okay, what is the maximum?"
Rosalba: "21."
Quentin: "I know which of the mean and median you're willing to tell me now." 

Which? Why?

3. Rosalba: "Can you find a situation in which I would be happier to tell you the mean than the median?"
Quentin: "Could you give me a hint?"
Rosalba: "In one example I can think of, there are three numbers with exactly two distinct values."

4. Rosalba: "Can you find a situation in which the minimum, maximum, mean and median are each necessary, and together sufficient, to identify five integers?"

5.
Rosalba: "So far we've been playing games with just a few numbers. I've given you hints and you've been able to infer them all. But five numbers are not interesting. Let's try for more.

"Before we do that, let me define one new global property: the total distance to a point. Say we have the five numbers 10, 15, 20, 30 and 60. The total distance to a point (let's take the point 22 in this case) is the sum of (22 - 10), (22 - 15), (22 - 20), (30 - 22) and (60 - 22), which comes to 67. Mathematically, the total distance to x is the sum, over all the numbers, of the absolute value of the difference between each number and x.

"Now we are ready. There are 17 numbers that are not all distinct. Their minimum is 30, their mean is 34 and their median is 35." 

Quentin: "What is their total distance to 35?"
Rosalba: "I won't tell you, but the total distance to 35 is five less than the total distance to 38. Whoops, I shouldn't have told you that."
Quentin, laughing: "You're right. Now I know all the numbers."

What are they?

6. How would your answer change if there were 1,701 numbers but otherwise the same information as in problem 5?
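Rosalba's total distance is easy to compute, and a short checker lets you test any guess for problems 5 and 6 against every clue at once. The Python sketch below only verifies a guess. It does not solve the puzzle. The function names `total_distance` and `matches_clues` are hypothetical, and the `count` parameter (17 by default) is an assumption added so that the same check covers the 1,701-number variant.

```python
import statistics

def total_distance(values, x):
    """Sum of the absolute differences |v - x| over all values."""
    return sum(abs(v - x) for v in values)

def matches_clues(values, count=17):
    """Return True if a candidate list satisfies every clue in problem 5.

    `count` is 17 for problem 5 and 1,701 for problem 6.
    """
    return (
        len(values) == count
        and len(set(values)) < len(values)    # not all distinct
        and min(values) == 30                 # minimum is 30
        and sum(values) == 34 * len(values)   # mean is 34 (integer check avoids float rounding)
        and statistics.median(values) == 35   # median is 35 (odd count, so the middle value)
        and total_distance(values, 38) - total_distance(values, 35) == 5
    )

# Rosalba's worked example: total distance from 10, 15, 20, 30, 60 to 22.
print(total_distance([10, 15, 20, 30, 60], 22))  # 67
```

To test a guess, build it as a list and call `matches_clues(guess)`, or `matches_clues(guess, count=1701)` for problem 6. For example, `matches_clues([30] * 17)` returns `False` because the mean of that list is 30, not 34.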
