Stargazing Pictures

Last night I went stargazing with a few friends, and while the weather did conspire against us, we did get a few good chances to see the stars through a friend's telescope. The photos through the telescope are nowhere near as good as the actual sights, but that's largely because we didn't bring any camera equipment. With time, practice, and proper gear, we'd likely have done a lot better. That said, we also took some pictures just with our phones and the results were a lot more impressive than any of us expected.

A shot of stars through the lens of a telescope

This is the best shot I could get through the telescope eyepiece with my phone. The actual view was great, but the pictures: not so much.

A shot of a starfield and clouds through my iPhone camera
A shot of the night sky with just my iPhone on a 10-second exposure.
Another shot of a starfield and clouds through my iPhone camera
Turns out that the cameras on our phones are very, very good.

Step Counts And Goal-Setting

In yet more at-home-data-science, I recently decided to take a look at my step count history and see what insights I could gain from visualizing the data. I've been using Pedometer++ to track my daily step count since 2014 and the app conveniently provides a CSV export of all of your daily step totals as well as other related data like distance and floors climbed. I think the results of my investigation are pretty interesting and yield some important insights.

First off, I simply plotted the entire data set to see what it showed. There were three things I noticed immediately.

A chart generated by an R Script that plots my step count since 2014

  1. Lossy Data:
    You may notice that there are several holes in the data. This is because originally, I used Pedometer++ very infrequently. I didn't set a daily goal and I didn't open the app very often. This meant that — after a while — iOS stopped launching the app in the background to collect step data and at one point offloaded the app from my phone to free space.

  2. Two Very Tall Spikes:
    I love seeing immediately identifiable trends in data, and those two spikes intrigued me. The first is my hike along a section of the PCT and the other is my trip to Barcelona and Paris in 2019.

  3. A Sudden Section of Stability:
    This third section is probably the most interesting for our purposes here. During 2021 I decided to start taking my daily walks more seriously. I'd been going on at least one walk per day, but I finally decided to start setting a goal and trying to meet it every single day.

You may also notice the red lines at the bottom of the chart: these mark any streak of days during which I exceeded my goal (more on that soon). The brown line is the daily goal value itself.

Setting Attainable Goals

I've talked before about how I set arbitrary deadlines and goals for myself. Usually this is done for client projects or my own apps, but importantly it's a thing I do for personal goals too. In the post I wrote this:

I create artificial deadlines and goals, sometimes completely arbitrarily. Often times, I'll just pick a date on the calendar based on nothing but gut intuition, and then I change it later if necessary.

Back in 2021 I wanted to get more rigorous with my daily walks. I wanted to set a goal for myself and make sure I was meeting it as often as possible. This is important. Originally I had considered trying to meet the default in-app goal of 10,000 steps per day. This is a very common step goal for apps and services to suggest because it's a nice, big, round number. But I didn't do that. I went with a much more modest 6,000 steps as my goal.

The reason for this is simple. Like I said, I wanted to meet this goal as often as possible — preferably every day. 10,000 steps is a lot to cram into a single day. I routinely surpass 10,000, but I am nowhere close to hitting it every day.

This goal-setting is important. Meeting a goal can be inspiring, but missing it is the opposite: it's disincentivizing. With a goal set too high, it can feel preferable to give up early rather than to try and fail. That runs counter to my real goal: walk more. Giving up is bad, so I wanted to minimize the number of times I do it. 6,000 steps has been a good number for me because I routinely find myself around 1,000 steps short in the evening, which incentivizes me to go for a walk, which is exactly what I want. And in a positive development, I ended up raising the goal in 2022 to 6,500 steps!

Additional Insights

What's perhaps most interesting to me is what happens when we zoom in on the 2021-2022 section of the data (after I set a daily step goal). Not only does the data become much more consistent because I checked the app daily, but it becomes incredibly regular. The 30-day rolling average (plotted in green) is very close to the daily goal, which implies that if we ignore the extreme days I'm pretty good at meeting my goals. The red streak line at the bottom also helps illustrate this.
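
The charts themselves come from an R script, but the rolling-average idea is simple enough to sketch out. Here's roughly what that computation looks like in Python with pandas instead of R, assuming the Pedometer++ export has "Date" and "Steps" columns (the real column names may differ):

# A rough sketch of the rolling-average chart. The "Date" and "Steps"
# column names are assumptions about the CSV export.
import pandas as pd
import matplotlib.pyplot as plt

steps = pd.read_csv("pedometer-export.csv", parse_dates=["Date"])
steps = steps.sort_values("Date").set_index("Date")

# 30-day rolling average of the daily totals.
steps["RollingAvg"] = steps["Steps"].rolling(window=30).mean()

DAILY_GOAL = 6000  # raised to 6,500 in 2022

ax = steps["Steps"].plot(label="Daily steps", alpha=0.5)
steps["RollingAvg"].plot(ax=ax, color="green", label="30-day average")
ax.axhline(DAILY_GOAL, color="brown", label="Daily goal")
ax.legend()
plt.show()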

A chart generated by an R Script that plots my step count since mid-2021

I also notice that my rolling average is down this year which is something I wasn't expecting! If it continues like this I will be missing my goal regularly, which is bad. Insights like this have helped motivate me to improve and stay on top of my walking schedule.

If there's one thing I can share from this little experiment, it's this: set arbitrary goals, then celebrate once you've hit them.

Fun With Math: Calculating Multiplicative Persistence

I mentioned recently that one of my Raspberry Pis is mounted under my desk and tracking the weather, and I've written before about the other tasks this helpful little assistant handles for me. Today I'd like to discuss yet another thing that little RPi does all day — something I've been pretty excited about for some time.

But first we need to talk about a thing called Multiplicative Persistence.

What is that?

Put very simply, Multiplicative Persistence is a little number trick you can play with integers. Here's how it works:

  1. Take any number
  2. Multiply the digits of the number together
  3. If the result is greater than 9, repeat the process, counting the number of times you do it.
    467 -> 4 x 6 x 7 = 168 (1)
    168 -> 1 x 6 x 8 = 48  (2)
    48  -> 4 x 8     = 32  (3)
    32  -> 3 x 2     = 6   (4)
    

Result: 467 has a multiplicative persistence of 4
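
In code, the whole procedure is only a few lines. Here's a minimal Python sketch (not my actual script, just the idea):

def persistence(n: int) -> int:
    """Count how many times the digits of n must be multiplied
    together before the result is a single digit."""
    count = 0
    while n > 9:
        product = 1
        for digit in str(n):
            product *= int(digit)
        n = product
        count += 1
    return count

assert persistence(467) == 4
assert persistence(277777788888899) == 11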

The goal is to find the smallest number with the greatest persistence value. The largest persistence ever found is 11, and the smallest number with that persistence is 277777788888899.

What's special about 277777788888899? - Numberphile

You can learn more about Multiplicative Persistence by watching this great video from Numberphile.

What does this have to do with my Raspberry Pi?

I thought it would be fun to write a little script to find the smallest numbers with each known persistence value. There are lots of little tricks you can use to speed up the search (so that you don't have to check every single value), and it made for a fun little evening project. Then in June I swapped out the script for a new version that uses more of the RPi's cores to speed things up a bit.
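
My actual script isn't shown here, but one common family of tricks (a sketch of the idea, not my exact approach) relies on two observations: reordering digits doesn't change the digit product, and the digits 0 and 1 never help for the big persistence values we care about, so it's enough to test candidates whose digits are non-decreasing and drawn from 2-9. Something like this, with multiprocessing spreading the work across the Pi's cores:

# A rough sketch of a pruned, multi-core search; not the script from the post.
from itertools import combinations_with_replacement
from multiprocessing import Pool

def persistence(n: int) -> int:
    count = 0
    while n > 9:
        product = 1
        for digit in str(n):
            product *= int(digit)
        n = product
        count += 1
    return count

def best_for_length(length: int) -> tuple[int, int]:
    """Return (max persistence, smallest candidate achieving it) among
    numbers with `length` non-decreasing digits drawn from 2-9."""
    best = (0, 0)
    for digits in combinations_with_replacement("23456789", length):
        candidate = int("".join(digits))
        p = persistence(candidate)
        if p > best[0]:
            best = (p, candidate)
    return best

if __name__ == "__main__":
    with Pool() as pool:
        for p, candidate in pool.map(best_for_length, range(2, 16)):
            print(f"{candidate} has persistence {p}")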

The script has been running (nearly) continuously since February, and by my estimates it will reach the current largest persistence value by the end of the year.

The script sends me a push notification (via Pushover) whenever it reaches a new persistence value — which is very exciting. I can't wait to get the next notification because that will mean the script has fulfilled its purpose: finding the largest known value.

Next Steps

For no reason whatsoever, after the script reaches the current largest known value I'm going to turn it loose on some truly enormous numbers to see if it can find the next largest value.

Currently it is theorized that 11 is the largest persistence possible (among all base 10 integers), and according to Matt Parker (in the video) mathematicians have already checked every value with fewer than 233 digits1, but there are still a lot more numbers to check.

Why? Because it's fun.2

Check out the code here →

1 That's a number that looks like this:
17,944,722,797,467,451,413,885,553,670,907,289,754,932,820,893,589,746,032,750,117,948,680,440,041,708,054,016,996,924,802,613,696,647,178,385,842,833,715,379,727,704,254,519,961,954,721,643,715,078,484,056,283,131,636,661,502,157,729,434,338,946,533,866,675,014,605,168,434,933,154,729,236,329,910,303,053,227

2 Because of some local changes to where San Diego's power comes from, all this computation runs on 100% clean and renewable energy, so I don't feel bad wasting this (admittedly minute) amount of power on nonsense like this.

Whether To Monitor The Weather And More

Things have been awfully quiet here for the past few months. Lots of other side projects have been taking up my time lately and I haven't felt the need to blog much. I have been doing a lot of reading and writing though and I will hopefully write something about all that stuff soon. For now though, I thought I'd talk about a little side project I've had running for the last half-year.

I have a Raspberry Pi bolted to the bottom of my desk. Its name is Demin1 and it does all kinds of useful things for me. One of those things is collecting the current ambient temperature and atmospheric pressure in the room. Originally I intended to chart this data and see if it correlates with me having random headaches. I'd always heard that pressure changes can cause headaches, but I'd never tested the theory myself. Luckily (and unluckily) my headache sample set this year is far too small to reasonably answer that question, but that doesn't mean that Demin's work is all for naught!

My Personal Assistant
Demin at work. That's the sensor hanging off the top

Using a mixture of R, Python, bash, and a SQLite database, I've been recording, tracking, and publishing this data for the past several months on a personal website on my local network. But since the data looks very cool, I figured I'd post the full chart here. I've also uploaded a gist of the code I use to collect the sensor measurements. A small bash wrapper runs the script via a cron job and collects new measurements every five minutes. I've also set up a CGI script (yeah, really) that allows me to see the most recent measurements from any device on my local network.
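
The full collection code is in the gist linked above, but the general shape is simple: read the sensor, timestamp the reading, and append it to a SQLite table. A minimal sketch of that idea (the read_sensor() helper, table name, and column names here are made up for illustration; the real script depends on the attached sensor's library):

# A minimal sketch of the collection step run by cron every five minutes.
# read_sensor() is a hypothetical stand-in for the real sensor library,
# and the table/column names are made up.
import sqlite3
from datetime import datetime, timezone

DB_PATH = "/home/pi/weather.db"

def read_sensor() -> tuple[float, float]:
    # Placeholder values; the real script reads the attached sensor here.
    return 21.5, 1013.25  # (temperature in C, pressure in hPa)

def record_measurement(db_path: str = DB_PATH) -> None:
    temperature, pressure = read_sensor()
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            """CREATE TABLE IF NOT EXISTS measurements (
                   recorded_at TEXT, temperature REAL, pressure REAL)"""
        )
        conn.execute(
            "INSERT INTO measurements VALUES (?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(), temperature, pressure),
        )

if __name__ == "__main__":
    record_measurement()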

My Weather History
The last several months of data. Notice the unseasonably cool July temperatures. Very interesting.

As I've said before, I love doing the odd bit of scripting to automate menial work. I could have just bought a small indoor weather station, but that wouldn't provide me with programmatic access to either real-time or historical data. Plus, putting this whole thing together was half the fun anyway.

1. Demin is named after the ship's first mate in one of my D&D games. He was a very good first mate and he's a very helpful little computer too.

The Simple Joy Of Learning To Play Piano

Back in January, I started to teach myself how to play piano. I'd played before when I was a kid — like many people do — but I was never very good and it didn't stick once I'd stopped taking lessons. I'd had a keyboard for years, stowed away at my parent's place, but I'd rarely ever used it. I had occasionally tried to pick up piano over the years, but just like when I was a kid: it didn't stick.

My piano setup

This time with the keyboard visible, and in the center of the room, I hoped things would be different. It's been almost four months now, and I am happy to report that I am still practicing and importantly: I'm getting better!

Luckily, I have some music theory under my belt, and I've played guitar, both solo and in bands, for years, so I know my way around musically. That said, neither of those skills prepares you to actually play the piano. It helps to know where middle C is and how to make a chord, but neither helps you contort your hands to actually play the notes and chords you want.

That said, I do have two tips that I'd like to share.

Set Simple Goals, Then Iterate

Back in January, I tried to start by simply practicing scales. You know? The thing everyone hates? Well, it turns out they're super useful.

I started with a simple 5-note scale. I played it over and over again: up the scale and back down as smoothly as possible, one hand at a time. Then I would move it up a key and repeat the process. Not all keys are beginner friendly, so I would often skip complicated keys and stick to the easy ones. Once I had that down, I started adding in a few little flairs. Instead of just playing the scales, I would keep the time and tempo but work in a few extra notes, or I would start at the root and work up but end on the 5th instead of the root on the way back down. Little changes.

When something is difficult it can be pretty demoralizing if you can't see yourself making progress. That's where small goals come in. Each time I'd sit down to practice (and after a short warm up) I would set a small goal for myself. Sometimes the goal would be so minor that it would hardly seem worthwhile, but I always tried to explicitly set a tangible goal.

It's a slow process, but I am getting measurably better and I can see the results each and every time I sit down to play.

Don't Put It Away

I learned how to play guitar in high school and while my teacher gave me lots of advice, one thing he said always stuck with me. It was advice about how to make sure you keep practicing.

Never put your guitar away, and keep it within reach. That way you can play it whenever you have even a little bit of downtime. Play it while your computer is loading, while you're waiting for a text message, or even while you're watching a video. You don't need to play a song. Even just strumming a chord, or picking a melody can help.

Those little moments of practice add up.

To this day my guitar is within arm's reach of my desk, and I play it when I need a break, when I need to think, or even just when I'm bored.

I feel the same idea has helped me learn piano. The keyboard is in the middle of my living room. I have to walk past it to get water, and whenever I do I think about playing. Oftentimes I will play before or after eating, even just for a few minutes.

The little moments really do add up.

Using Pushover For Super Simple Sysadmin Alerts

For those who don't know, Pushover is a really great tool that allows users to easily set up and send push notifications to a smartphone. The setup is super simple, and all you need is their app and a little scripting know-how.

I've used Pushover for years to help me monitor my apps and services, and over time my uses for it have evolved and grown more and more integral to my ability to keep track of the various apps I run. To me, Pushover has gone from a nice-to-have integration to an absolute necessity.

I use Pushover to alert me of all kinds of things. Just to give you an idea, here are a few examples of some of the things I currently use Pushover for:

  • Potential queue backups in Pine.blog
  • Reporting daily user signups for Nine9s
  • Alerts when critical background jobs fail
  • Alerts when nightly builds fail to deploy
  • Alerts when a manually-run, long-running job completes

Because Pushover is so easy to integrate with basically any codebase (and even one-off shell scripts) I use it all the time for everything from simple alerts to complex and critical reports.

One particular use I'd like to call out from that list above is the nightly build alerts. Adventurer's Codex has a test environment that we use to sanity check our code before a full deploy. We used to have the test environment redeploy after every single merged pull request, but that system proved incredibly fickle and error prone, so we switched to a simple nightly build. The issue with any automatic build system is that unless you have a detailed live dashboard of deployment statuses (which we do not) it's hard to know if/when a given build has finished deploying or if it encountered an error. That's where Pushover comes in.

Nightly Build and Deploy Script

This script runs as a cron job every night. It attempts to deploy the latest version of the application and if that fails it sends a notification to Pushover.

PUSHOVER_USER="xxxx"
PUSHOVER_KEY="xxx"
PUSHOVER_URL="https://api.pushover.net/1/messages.json"

TITLE="AC Nightly: Build Failed to Deploy"
MESSAGE="The latest build on Nightly has failed."

log() {
  echo "[$(date)] $@";
}

alert_admins() {
  curl -X POST $PUSHOVER_URL \
    -H "Content-Type: application/json" \
    -d "{\"title\": \"$TITLE\", \"message\": \"$MESSAGE\", \
        \"user\": \"$PUSHOVER_USER\", \"token\": \"$PUSHOVER_KEY\"}"
}

# Attempt the nightly deploy and capture its exit status.
./docker-bootstrap.sh upgrade --env nightly
STATUS=$?

if [ $STATUS -eq 0 ]; then
  log "🚀 Build completed successfully!"
else
  log "Uh oh. There was an issue. Alert the admins!"
  alert_admins
fi

My nightly build script for Adventurer's Codex includes a section, after the deployment has completed, that checks the status code of the deploy command; if it is not 0 (i.e. it failed), the script sends me a notification. Bam! Now, every morning that I don't get a notification, I know things are working as intended. If I ever wake up to a notification, then I know I have work to do.

What Happens in the Background is Ignored in the Background

Crucially, I use Pushover to alert me about problems with background tasks. Modern web apps include lots of always-running or periodic asynchronous behavior, and because failures there don't directly result in user feedback or a big, loud error page, mistakes, bottlenecks, and bugs often go unnoticed or unaccounted for.

Pushover solves those issues. It's trivial to write code that checks for bad behavior, or that catches difficult-to-reach-but-critical bugs, and just sends off a notification.
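
To give a sense of how small that code can be, here's a minimal Python sketch that posts to the same Pushover endpoint as the shell script above (the token and user values are placeholders):

# A minimal sketch of a Pushover alert from Python, e.g. from a background
# job's error handler. Token and user values are placeholders.
import requests

PUSHOVER_URL = "https://api.pushover.net/1/messages.json"
PUSHOVER_TOKEN = "xxx"   # application API token
PUSHOVER_USER = "xxxx"   # user key

def alert(title: str, message: str) -> None:
    response = requests.post(PUSHOVER_URL, data={
        "token": PUSHOVER_TOKEN,
        "user": PUSHOVER_USER,
        "title": title,
        "message": message,
    })
    response.raise_for_status()

if __name__ == "__main__":
    alert("Queue backup", "The import queue is growing faster than it's draining.")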

I used to use email for this sort of thing, and while email is still a good solution, the setup is more involved. Most VPSs aren't allowed to send email directly anymore (due to concerns over spam), and configuring an email provider is at least as much work as using Pushover, if not more. In some cases email is more flexible and might be better for larger teams, but these days I almost always reach for Pushover instead.

It's just that good.

Crossing The Wording Threshold

A few years ago, I wrote a post about the number of words this blog contained. Well, that was then; this is now, and those counts have changed pretty drastically in the intervening time.

Using the same method as before, I can now report that this blog contains just over 100,000 words spread across 270 posts! A pretty significant achievement.

$ find archive/ -name "*.md"|xargs -I {} cat {} | wc -w
  101042

I also remade the previous graph for comparison.

A histogram of the binned words per post

It seems like I've written more 300-400 word posts than before which explains the smoothing out in the middle section of the chart.
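
The one-liner above only gives the grand total; getting the per-post counts for the histogram takes a few more lines. Here's a rough Python sketch of that binning, assuming the same archive/ layout of Markdown files (the actual chart was made separately, so this is just the idea):

# A rough sketch of binning per-post word counts, assuming one Markdown
# file per post under archive/.
from collections import Counter
from pathlib import Path

BIN_SIZE = 100  # words per bin

counts = [len(path.read_text().split()) for path in Path("archive").rglob("*.md")]
bins = Counter((count // BIN_SIZE) * BIN_SIZE for count in counts)

print(f"{len(counts)} posts, {sum(counts):,} total words")
for start in sorted(bins):
    print(f"{start:>5}-{start + BIN_SIZE - 1:<5} {'#' * bins[start]}")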

For those keeping score at home, the new longest post is this one from 2017.

A Terrifying Realization

It occurred to me that my last post about this topic was back in 2017—5 years ago!—and that means I'm very quickly approaching my 10 year blogging anniversary.1 It's hard to believe it's been so long, but I guess it has.

The first post on this site went up in December of 2012, which means the anniversary is just months away. I'll save the reminiscing for the retrospective post, but for now I'll just admit that it's a lovely coincidence that I crossed 100,000 words right around my 10-year anniversary.

1 If you count my first two blogs (which are now gone from the web) it has already been ten years, but let's not do that.

That Time I Lost Control Of A Server

Good security hygiene is essential for software developers. The thing is: we tell ourselves that, but we rarely experience the effects of bad security hygiene firsthand. While prevention is the point of good hygiene, it's helpful to walk through the real-world consequences of bad hygiene and not just talk about the theoretical side of things.

So let's talk about the time one of my servers was hijacked.

Disclaimer
Right off the bat I need to note that the server in this story was not connected to any product or service that my company, SkyRocket Software, runs. This was a personal toy server that had no connection with anything I make or sell.

The Story

I've run servers for lots of projects over the years. I manage servers for Pine.blog, Nine9.cloud, d20.photos, and Adventurer's Codex. I also run several servers for client projects, this blog, and for toy projects. One of those toy projects is a server for an annual holiday Minecraft extravaganza with some friends of mine.

Once, long ago, I was going through the process of setting up a new Minecraft server (this was before Linode provided easy-to-deploy Minecraft servers). I wanted to set up the server before going out that evening, and so I was in a bit of a rush. Being in a rush, I didn't bother to set up the server according to Linode's excellent guide on Securing Your Server. Instead, I just set a short, trivially guessable root password, logged in as root, and got to installing Minecraft.

About halfway through the process I needed to leave, so I disconnected from the server and went to dinner.

When I returned home that night, I found an email in my inbox from Linode telling me that my server had been forcibly shut down because they'd determined it was being used to send spam and help DDoS another site.

I had been gone only about three hours, but it had taken less than one hour from when my server had been instantiated to when it was compromised. It had happened shortly after I'd logged out.

Immediately I felt terrible for falling victim to such a simple, brute-force attack, and the experience is one of those that makes me appreciate the often-painful security hoops we need to jump through as developers.

Lessons Learned

Having fallen prey to what was likely a simple brute-force attack on my root account, I promised myself that I would never again fall prey to such an attack. Now, whenever I set up a server I always set aside plenty of time to do so, use long and complex passwords, disable root logins over SSH, and follow that Linode guide I mentioned earlier. Other experiences, like those with spambots, have made me more cautious and careful about the functionality my sites expose and how they expose it (Pine.blog doesn't offer free blogging & image uploads for a reason).

Keychain can generate long passwords easily

Keychain can generate long passwords easily, though I wish it could make even longer ones.

All in all, I count myself unlucky that I had to learn my lesson this way, but I count myself very lucky that I learned it by losing control of an unimportant and trivially replaceable server.

The internet is a very hostile place to those who aren't prepared for it. This is true on a societal level, and on a technical one. If you ever need a reminder of just how dangerous it is, try having Fail2Ban email you whenever it blocks a hostile IP address, or watch your access logs for bots trying to break into your site (usually using maliciously crafted WordPress/Drupal URLs). Things like that happen all day, every day; we just don't usually see them.

Anyway, that's the story. Hopefully everyone reading this takes it to heart so that this story remains but a cautionary tale and nothing more.

Hacks Can Be Good Code Too

Writing code is, like everything in life, all about making tradeoffs. Code can be quick to write but unreadable; it can be fast but hard to maintain; and it can be flexible but overly complex. Each of these factors is worth considering when writing Good Code. Complicating this is the fact that what constitutes Good Code in one situation may not be ideal in another.

Good Code is not universally so.

It is incredibly difficult to explain why one set of tradeoffs is worth pursuing in one case but not in another, and oftentimes reasonable people will disagree on the value of certain tradeoffs over others. Perhaps a snippet of hacky string parsing is fine in one place, but not in another. Often, the most significant cost of solving a problem "The Right Way" is time.

When deciding whether to do something The Right Way or to cheat and simply hack something together, I often try to consider the exposure the given code will have. Consider these questions:

  • Do other systems touch this code?
  • How many developers will need to interact with it over time?
  • How much work would be involved in building out the correct approach?
  • How much work would be involved in building out the bad approach?
  • How valuable is the intended feature?
  • How much additional maintenance does the bad solution require?

Each of these answers helps me decide what kind of code I should write. These questions neglect multiple other factors (e.g. performance, readability), but they are a good starting point.

In a recent example, I needed to modify the blog engine that powers this site as well as a few others. I wanted a simple feature that would count the number of articles on the site as well as the total number of words across every blog post, and display those values on the home page. As I've said before, the blog engine for this site is very old and has been rewritten several times. It's well beyond needing a massive rewrite, but that's not something I really want to do right now.

The blog engine is written in Python, provides a command-line interface, and uses Git Hooks both client and server-side to build and deploy itself.

I originally considered writing this feature in Python: counting the number of words in each article, adding a new context variable to the template rendering process, and then rendering the pages as normal. But that would require touching substantial pieces of the codebase (some of which I no longer understand). It would probably take me all evening to dive into the code, understand it, make the change, and test it. To be honest, this feature was not worth wasting an evening on. So I decided to just hack something.

As I said, I use Git to deploy the site. So I just added a new line to the HTML template:


<p>
    This site contains {+ARTICLE_COUNT+}
    different writings and {+WORD_COUNT+}
    total words. That's about {+PAGE_COUNT+}
    pages!
</p>

And then I added a new step to the pre-commit hook that runs after the template rendering process, but before the changes are committed and the site is deployed.


# Compute the article, word, and estimated page counts (at ~320 words per page).
WPP=320
WORDS_N="$(find archive/ -name "*.md"|xargs -I {} cat {} | wc -w)"
WORDS=`printf "%'d" $WORDS_N`
ARTICLES=`printf "%'d" $(find archive/ -name "*.md" | wc -l)`
PAGES="$(( WORDS_N / WPP ))"

TMP_HOME=`mktemp`
cp ./index.html $TMP_HOME
cat $TMP_HOME |
    sed "s/{+ARTICLE_COUNT+}/$ARTICLES/" |
    sed "s/{+PAGE_COUNT+}/$PAGES/" |
    sed "s/{+WORD_COUNT+}/$WORDS/" > ./index.html

Let's check in and see how this hack fits my criteria above:

  • Do other systems touch this code? No
  • # of Developers? 1
  • Time for Correct Approach? 2-3 hours
  • Time for Bad Approach? 10 minutes
  • How Valuable is the Feature? Very
  • Additional Maintenance Burden? Not much

Is this elegant? Absolutely not. Did it take basically zero time? Yes. Have I thought about it since? Not until writing this post. Would I have done this on a team project or a commercial product? Absolutely not. It's a feature for my personal blog engine, specific to one particular low-value site that I run.

In this case, a hack is an example of Good Code. That's because Good Code is a relative construct.

At Vs. On: A Story Of Semantic Data Modeling

As most good software developers eventually learn: time is hard. Time-based bugs are incredibly common and are sometimes difficult to solve. There are a ton of misconceptions about how time and dates work in the real world and the simple solution is rarely correct for any significant length of time. Performing calculations based on times and dates can get messy, but so can simply storing them. There's a lot to be wary of when building out a data model with timestamps involved, and as always, a lack of consistent naming can cause a ton of problems.

Over the years I've come to use a specific terminology for dates and times in my data models. In general, I prefer not to use data types in variable names, and I prefer my code to read as passable English where possible. This means I tend to avoid names like date_created or published_ts, which contain the data type in the name, and I avoid names like created, which gives me absolutely no indication of the type or what it's used for.

Instead, I prefer to take cues from the English language. For timestamps or any data type that represents a precise moment in time, I use the suffix at. For dates or times that represent more abstract things like wall time or calendar dates, I use the suffix on.

As an example let's say I have the following data model:

class BlogPost:

    # ... other fields ...

    created_at = TimestampField()
    updated_at = TimestampField()

    posted_on = DateField()

This convention tells me that I should expect the posted_on field to contain a date or time but not both, and that it represents an abstract notion of time, whereas both the created_at and updated_at fields represent a specific moment.

I arrived at this convention through asking myself questions about the data in plain English. Consider the following questions:

  1. Q: When was this post published?
    A: It was published on the 25th of January.
  2. Q: When was the post record created?
    A: It was created at 12:00 PM on January 24th.

Disclaimer
This convention doesn't always work because usually people would use at to describe any time (e.g. "I arrived at noon"). But once I settled on the convention, it wasn't confusing. It just doesn't always read nicely.

Knowing when to use a timestamp vs. a calendar date or wall-clock time is another issue (and a complicated one), but at least with this convention, I know which one I'm dealing with.
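
To make the distinction concrete in Python terms: an at field holds a precise, ideally timezone-aware moment, while an on field holds a plain calendar date. A small illustration, not tied to any particular ORM:

# A small illustration of the convention, not tied to any particular ORM.
from datetime import date, datetime, timezone

# "_at" fields: a precise moment in time, stored with a timezone.
created_at = datetime(2022, 1, 24, 12, 0, tzinfo=timezone.utc)

# "_on" fields: an abstract calendar date with no particular moment attached.
posted_on = date(2022, 1, 25)

print(f"Created at {created_at.isoformat()}")      # 2022-01-24T12:00:00+00:00
print(f"Posted on {posted_on.strftime('%B %d')}")  # January 25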

Now that I think about it, it might make sense to name timestamps with an aton suffix since question #2 technically uses both at and on.