rstats Archives - Mark Niemann-Ross

Party Buzz Kill: modifying data

So Steve (SQL), Marsha (C), Bob (Python), and I (R) are at this party. We have TOTALLY cleared the room, especially now that Steve and I are deep into a debate about saving native data objects to disk versus storing data in a database.

*Monica is a real person! She does consulting in Health Data Science*. I don’t know if she serves punch.

I see Monica enter from the kitchen, carrying a bowl full of punch. It’s an awkward task and the fruity, sticky liquid is sloshing on the floor. Monica does data science, so I’m hoping she’ll come to my aid. Sure enough, she places the punch bowl on the table and joins us. She’s about to say something when the front door swings open.

Guenter walks in; he just got off a plane from Germany, so he looks a bit jet-lagged. Since the room is filled with a bunch of people talking SQL, he assumes database debates are the theme of the party.

“I think I have already written an article in this context,” Guenter begins.

*Look – It’s Helen Wall. She’s real too!*

Before he can say anything more, Helen speaks up. “Perhaps talking about programming is an attempt to get everyone to leave the house at the end of the party so you can go to bed?” Where Helen appeared from is a mystery.

Monica listens for a minute, then interrupts the pointless debate between Steve and me. “People who are math aficionados,” she says, “are a lot more comfortable generating datasets on-the-fly. People like me enjoy relying on the safety and reliability of importing a structured dataset we checked earlier!”

Steve is happy to hear someone is on his side. Steve thinks I’m a knucklehead. There are many people who agree.

“Sure, but there are advantages to not messing around with unnecessary overhead,” I say. “Let’s play with an example.”

I get out a new napkin and sketch out some R code…

Rain – Evapotranspiration = mm Water

“Eeee-VAP-oooo-TRANS-PURR-ation,” I savor the word as I release it into our conversation. I’m still at the party with Marsha and Bob. We’re trying to determine why anyone (such as me) would want to use R on their Raspberry Pi.

“Big word,” says Bob. “What’s it mean?”

“Water evaporation from the earth and transpiration from plants,” I respond. “It’s a sum of the water escaping from my irrigation system. Look it up on Wikipedia.”

Marsha interrupts grumpy Bob. “So – that means, um… desired amount of water minus rainfall plus evapotranspiration equals the amount of water your irrigation system needs to supply.”

“Precisely,” I agree. “Until I found out about evapotranspiration, I was unsure how to account for temperature. I knew hot days would require more water because of increased evaporation, but I was stumped on how to translate temperature into additional inches of necessary water.”

“Never heard of it,” says Bob.

“Me neither,” I agree. “Evapotranspiration is handy, but doesn’t show up in all weather forecasts. Open-Meteo makes it available.”

“Say you’ve got seven days’ worth of this miracle number,” says Bob. “What does the R code look like?”
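
Here is a rough sketch of Marsha's arithmetic in R. It is not the code from the party, just one way the formula could be written. The seven daily values, the `desired_mm` target, and all variable names are made-up placeholders, and clamping negative results to zero with `pmax()` is my own assumption (a rainy day shouldn't produce negative irrigation).

```r
# Hypothetical seven-day values in millimeters; placeholders, not real forecast data.
rain_mm <- c(0, 2.5, 0, 0, 10, 1, 0)
evapotranspiration_mm <- c(4, 3.5, 5, 5.5, 2, 3, 4.5)

# Assumed target: how much water the garden should receive each day.
desired_mm <- 5

# Marsha's formula: desired water - rainfall + evapotranspiration.
# R recycles the single desired_mm value across all seven days.
# pmax() clamps negative results to zero (an assumption, not part of her formula).
irrigation_mm <- pmax(desired_mm - rain_mm + evapotranspiration_mm, 0)

# Daily amounts the irrigation system would need to supply, and the weekly total.
irrigation_mm
sum(irrigation_mm)
```

Because R works on whole vectors at once, the formula reads almost exactly the way Marsha said it, with no loop over the days.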

Party Buzz Kill: Data Storage

I’m at this party where Bob and Marsha and I are discussing the best languages for programming a Raspberry Pi. Bob advocates for Python; Marsha is a devout student of C. I’m defending my use of R. After all, Raspberry Pi starts with R. We have chased all the other guests out of the room with our conversation.

“With R, I have all sorts of built-in data management,” I say. “Manipulating matrices is in R’s basic DNA.”

Steve wanders in from the other room and joins our conversation. “Matrices aren’t a proper data strategy. You should be using a database. You can run SQLite on a Raspberry Pi with hardly any effort.”

Bob and Marsha simultaneously turn to stare me down. They are curious about how I’m going to get around this supposition.

“Sure. SQL with R, in particular SQLite, would have been easy to implement,” I pontificate. “Just call up RSQLite, push a few buttons, and Bob’s Your Uncle.”

“And that’s not what you did?” Steve is incredulous.

“I store the R object on disk and pull it into memory when I need it.”

“What kind of knucklehead stores data as a file on disk?”
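
For anyone wondering what the knucklehead approach might look like, here is a minimal sketch using base R's `saveRDS()` and `readRDS()`. The excerpt doesn't say which functions were actually used, so treat this as one possible way to do it. The `readings` data frame and the temporary file path are hypothetical.

```r
# Hypothetical object: a small data frame of daily readings.
readings <- data.frame(
  day = 1:3,
  rain_mm = c(0, 2.5, 10)
)

# Hypothetical location; tempfile() just gives the sketch somewhere harmless to write.
path <- tempfile(fileext = ".rds")

# Store the R object on disk in R's native serialized format.
saveRDS(readings, path)

# Later, pull it back into memory when it's needed.
restored <- readRDS(path)

# Check that the round trip preserved the object.
identical(readings, restored)
```

The appeal is that the object comes back with its column types and structure intact, with no schema to define and no database connection to manage.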

The Imperfection of Language

Human languages are notoriously ambiguous. Computer languages are notoriously unambiguous. Humans (mostly) are comfortable with uncertainty. Computers don’t even believe uncertainty is possible. It’s what led us to create unambiguous languages specifically for computers.

“One morning, I shot an elephant in my pajamas. How he got in my pajamas I don’t know.”
Groucho Marx