Party Buzz Kill: modifying data

So Steve (SQL), Marsha (C), Bob (Python), and I (R) are at this party. We have TOTALLY cleared the room, especially now that Steve and I are deep into a debate about saving native data objects to disk versus storing data in a database.

Monica is a real person! She does consulting in Health Data Science. I don’t know if she serves punch.

I see Monica enter from the kitchen, carrying a bowl full of punch. It’s an awkward task and the fruity, sticky liquid is sloshing on the floor. Monica does data science, so I’m hoping she’ll come to my assist. Sure enough, she places the punch bowl on the table and joins us. She’s about to say something when the front door swings open.

Guenter walks in; he just got off a plane from Germany, so he looks a bit jet-lagged. Since the room is filled with a bunch of people talking SQL, he assumes database debates are the theme of the party.

Guenter is a real person too.

“I think I have already written an article in this context,” Guenter begins.

Look – It’s Helen Wall. She’s real too!

Before he can say anything more, Helen speaks up. “Perhaps talking about programming is an attempt to get everyone to leave the house at the end of the party so you can go to bed?” Where Helen appeared from is a mystery.

Monica listens for a minute, then interrupts the pointless debate between Steve and I. “People who are math aficionados” she says, “are a lot more comfortable generating datasets on-the-fly. People like me enjoy relying on the safety and reliability of importing a structured dataset we checked earlier!”

Steve is happy to hear someone is on his side. Steve thinks I’m a knucklehead. There are many people who agree.

“Sure, but there are advantages to not messing around with unnecessary overhead,” I say. “Let’s play with an example.”

I get out a new napkin and sketch out some R code…

recentRainfall <- sum(waterByZone["rainfall", (yearDay - 1):(yearDay + 1)])

If I assume ROWID can be used in a similar fashion to yearDay, the equivalent SQL code might look something like this…

CREATE TABLE "waterByZone" (
	"recentRainfall"	INTEGER,
	"neededInFront"		INTEGER,
	"neededInRear"		INTEGER,
	"wateredInFront"	INTEGER,
	"wateredInRear"		INTEGER,
	"secondsWateredInFront"	INTEGER,
	"secondsWateredInRear"	INTEGER,
	"evapotranspiration"	INTEGER
);

SELECT SUM(rainfall)
FROM waterByZone
WHERE ROWID BETWEEN :start_day AND :end_day;

Steve gags at my SQL. I’m sloppy, but he needs to give me some leeway. After all, I’m working on a napkin with a dull pencil.

“So I’m not sure where there is much of an advantage between SQL and a data object, at least in this context.” I point out.

“You are correct,” Monica says. “The question is technically two sides of the same coin.”

Guenter picks up the napkin. “When it comes to manageable data sizes, I even save them as a CSV file for later use (I know, I know, …). Guenter shrugs his shoulders and continues. “When I “collect” data, the database system SQLite is actually my first choice on the Pi!”

Helen laughs, “I’m trying to imagine you adjusting the water levels for optimal tomato growth.”

It’s my chance to show off more code, and I take it, “I’m pretty happy with how the data object allows me to access fields. The R code to calculate how much water to put on zone one and zone two looks like this,” I say, and begin work on a new napkin…

howMuchToWater <- function(waterByZone, yearDay) {
  # get the sum rainfall for yesterday, today, and tomorrow
  recentRainfall <- sum(waterByZone["rainfall", (yearDay - 1):(yearDay + 1)])     
  recentEVOTRP <- sum(waterByZone["evapotranspiration", (yearDay - 1):(yearDay + 1)])

  # calculate needed water for front zone
  neededRain <- waterByZone["neededInFront", yearDay] - recentRainfall + recentEVOTRP
  waterByZone["wateredInFront", yearDay] <- ifelse(neededRain <= 0, 0, neededRain)

  # calculate needed water for rear zone
  neededRain <- waterByZone["neededInRear", yearDay] - recentRainfall + recentEVOTRP
  waterByZone["wateredInRear", yearDay] <- ifelse(neededRain <= 0, 0, neededRain)  

 return(waterByZone)
}
Bob isn’t a real person. Photo by Tima Miroshnichenko

Guenter looks at the napkin with all of our code scratchings, “I collect data on the Raspberry Pico W using MicroPython, send this data to the Raspberry Pi 4B via MQTT and analyze it on the Pi 4B using an R function (which runs in a loop) and save it in a SQLite database.” he says. “Then I access the database from my laptop via SSH and perform a post analysis with R. The whole thing is a mixture of MicroPython and R.”

At the mention of Python, Bob rejoins the conversation.

How about you?

How about some Tips & Tricks on Programming R?

Sign up to receive content in your inbox every month.

We don’t spam! Read our privacy policy for more info.

One thought on “Party Buzz Kill: modifying data”

  1. Look at that gorgeous function, Mark! It’s so easy peasy for you! I’m glad you included some comments so I could follow along. The way you throw around functions, who needs a database? Some of us are not so agile! I think I need another glass of that punch…πŸ₯ƒπŸ‰πŸ“πŸ‹

Leave a Reply

Your email address will not be published. Required fields are marked *