The Internet's Original Sin

A piece I wrote was published in the San Diego Union-Tribune today. It's a short dive into a possible alternate history of the web: the internet as it could have been.

In a bucolic way, the piece traces the development of the internet and the inevitable consequences of a few seemingly insignificant policies that ISPs adopted in its early days, drawing a through-line from those policies straight to the emergence of the centralized platforms we know today.

In short: ISPs killed the internet and made platforms like Facebook and Amazon inevitable.

The piece compares America's idealized conception of westward expansion on the frontier to the development of the internet over the last few decades.

There are no homesteads on the internet frontier. This is true on both the large and the small scales. Whether you start a website yourself, or use Facebook, you’re still effectively renting space from someone else. On the internet, we are all simply users of other people’s computers, tenants renting homes we can never own.

The title of the print version is much more attention-grabbing.

However, the development of the internet was fundamentally different from westward expansion in two key ways. First, there was no one already living on the internet frontier. That fact isn't directly relevant to the piece, but it's an important distinction: the expansion onto the internet frontier was bloodless.

But second, ISPs made the internet a renter's frontier. Their decisions not to assign static IP addresses to homes or user accounts, to provide non-symmetrical connections, and to block ports 80 and 443 made the internet fundamentally different from what was intended. On the internet today, truly owning your online presence is possible only for a very small subset of individuals and companies: the landlords of the internet.

No one would accept a world in which you could never own the car you drive, or the house you live in. Some people prefer to rent or lease, but cars and houses can still actually be bought. On the internet there is no path to ownership. Each of these stumbling blocks ensures that ordinary people can never truly own a piece of the internet frontier.

If computers are the land of the internet, then our personal information is the crop. Instead of being given our own small farm to start our online lives anew, we’re forced to be digital sharecroppers. In exchange for a portion of our privacy, we are allowed the chance to cultivate lands we do not own. On the internet, we are not citizens. We’re just users. Is this freedom?

Check out the full piece for yourself →

Last Year I Started Reading A Physical Newspaper

I subscribe to a lot of local and national news outlets, and like most people these days, I read the news on a screen, because, well, I'm a 30-year-old millennial and it's 2021, not 1980. But last September, out of a mix of idle curiosity, anemoia, and a strong urge to put an end to pandemic-induced doom-scrolling, I signed up to receive a printed copy of the Sunday paper.

I've been happily reading a Sunday paper ever since.

Most Sundays, a paper appears before I wake up. I stumble out, half asleep, to get it, make coffee, and then read the news. It's a quaint experience, and it's a habit I've really come to enjoy and look forward to. The whole process takes anywhere from 30 to 90 minutes; then it's done, and I go about my day. On Sundays, I do my absolute best not to read the news on my phone the way I do every other day. Once I've read the paper, I'm done reading the news.

Newspapers on a surface

A stack of papers I had lying around. Photo credit: me.

Part of the reason I originally signed up for a physical paper was to force myself to read it. As I said, I subscribe to a lot of news sites, and I find a lot of articles that I want to read, but I almost never actually read them all.

With so many stories to read, I found myself naturally gravitating towards the most exciting or most outrageous news. National news is almost always much more polarizing, engaging, and exciting than state and local news, but the paradox is that state and local news is often far more important and more likely to directly influence you in your day-to-day life.

What's exciting, urgent, and engaging is rarely what's important. I noted this in a previous post about my policy blog, Democracy & Progress.

A few years back, while listening to the Ezra Klein Show, Ezra lamented that we as a society don't spend more time focusing on local and state politics, where our time and energy are often better spent. Collectively, we don't focus on state and local politics, and yet that's often the only place where a lot of policy solutions can actually be implemented. That conversation stuck with me...
California, Democracy, and Progress

I found that having a physical paper show up at my door made me more likely to read it. Simply throwing it away felt wasteful.1 That slight guilt motivates me to keep to my own goals.

In the last year, my knowledge of local and state politics has grown immensely, and I've really enjoyed the process of learning more about San Diego and California and what progress is and isn't being made.

Lots of people both inside and outside of the media lambast the news industry for being overly negative, and while it certainly is, national outlets are catering to a much more diverse and varied reader base. On top of that, almost any political news story in a country as large and sufficiently polarized as the U.S. is bound to upset around 50% of readers. A political news story out of Michigan, Utah, or Texas (to pick three states at random) will likely upset Californians, and vice versa. State and local news is more likely to comport with your views—assuming you somewhat agree with your community at large.

I often discuss the news with friends and I usually get asked some version of the question, "How can you read the news so much? Isn't it just depressing?"

It can be, sure. But California has done quite a few things lately that I really like, and there's a lot of potential here for us to tackle big problems. If you're constantly focused on national news, it can feel like nothing ever gets done (because nationally nothing ever gets done), but locally that's just not true. Huge things are happening in California, and it feels good to know what they are.

I've also learned a lot about things I didn't think I would be interested in. The internet has trained us to seek filter bubbles and echo chambers, and to retreat into our known interests. A newspaper, because of its physical constraints, cannot be all things to all people. There are often pages of articles about topics I wouldn't say I care about, and yet I find myself reading them. On Sundays I find myself learning about up-and-coming bands, concerts, local events, and investing tips whether I wanted to or not, and it's been a really positive experience; not universally positive, but still very positive. Echo chambers aren't great, and physical papers are a good escape from them.

I'm not sure I had much of a high-minded conclusion to this retrospective, but I can say that I've enjoyed the anachronistic ritual of reading a physical paper, and I expect to continue doing so for a while.

1. They all get recycled. Don't at me.

Automated Podcasts With Automator & Overcast

I've mentioned before that I use Siri as an editing tool. I write a piece, lightly edit it, and then have Siri read it back to me. This helps me catch unintended grammatical errors and clumsy sentences. Building on that principle, my apps, including Hewell, ship with a feature that uses iOS's AVSpeechSynthesizer API to read articles or location information aloud.

That said, I often find articles that I want to read, but after a long day staring at a computer screen, I don't want to actually read them. Lots of sites these days provide spoken audio for their articles—which is great—but the vast majority don't.

That's where Automator comes in.

Save Spoken Text to File

This Automator service simply runs a bash script that takes the selected text as input, feeds it to the built-in macOS say command, and outputs it to a file on the Desktop named using the contents of my clipboard.

Check out the full script
cd ~/Desktop

# A hack to get stdin into say through Automator. For some
# reason simply passing -f didn't work for me.
while read -r line; do echo "$line"; done < "${1:-/dev/stdin}" |
    say -o .spoken_text.aiff -f -

# Name the output file after the clipboard contents, with a fallback.
TITLE="$(pbpaste -Prefer txt)"
if [ -z "$TITLE" ]; then
    TITLE="Spoken Text"
fi

# Sanitize the article title. Writers love colons, which macOS hates.
TITLE="$(echo "$TITLE" | sed -e 's/[^A-Za-z0-9._-]/_/g')"

# Convert the audio and be quiet about it.
/usr/local/bin/ffmpeg -i .spoken_text.aiff -loglevel -8 -y "$TITLE.aac"
rm .spoken_text.aiff

The script also uses FFmpeg to convert the audio to an AAC file so that I can then upload it to Overcast, my preferred podcast player.

By default, macOS will include Automator services in the right-click menu, but I've also bound the script to Cmd+Ctl+Shift+S (which is similar to my existing Cmd+Ctl+S shortcut for reading the selected text aloud).

The macOS Services Menu

Now, I can discover new articles to read, perform a quick set of keystrokes, upload the audio to Overcast, and then go for a walk while I catch up on the day's interesting news!1

I've provided the Automator service as a zip archive below if anyone wants to play with it.

⬇️ Save Spoken Text to File.workflow

1. There are a few quirks to this workflow still. Websites are filled with non-article content, so to avoid selecting it, I typically follow these steps:

  1. Turn on reader mode (Cmd+Shift+R)
  2. Copy the title of the article to the clipboard (Cmd+C)
  3. Select the article text (Cmd+A)
  4. Run my Automator service (Cmd+Ctl+Shift+S)
  5. Upload the new AAC file to Overcast

I admit, it's a little cumbersome, but it does work really well.

Retrospective On A Year Spent Writing

Around this time last year, I started writing a lot. Since then, I've published a book, started a policy blog, and started writing for a local paper. All in all, I think it's safe to say that I've written about as many words in the past 12-15 months as over the preceding 10 years.

A crude calculation of the word-count of this blog shows that I've written approximately 93,719 words. That's a lot, but considering that Going Indie is over 62,000 words and my published articles total almost 7,000, that 93,000 looks a lot less impressive.

$ find archive/ -name "*.md" | xargs -I {} cat {} | wc -w

I've learned a lot about myself and my writing in the past year. I've learned how to pitch articles, how to build up the courage to submit them, and how to research and write about complex topics. I've also gotten better at defining my assumed audience.

I'd always intended for this blog to be a place for me to write about whatever I wanted, and while I have written about a number of topics here, over time I've gravitated towards discussions of software, the tech industry, and personal matters. Last year, during the depths of the pandemic, I wanted to expand and write about public policy, but this blog never felt like the right place to do that. That's why I started Democracy & Progress, and why I continue to write for the local paper.

Writing for both D&P and the paper has helped me focus my energy on working to better inform people and convince them to take an interest in a given subject. It's also helped me better understand the in-depth nuances of topics I previously thought I knew something about. Nowadays, I write not only as a way to inform and convince others, but as an exercise to educate myself. They say you don't fully understand something until you try to teach it, and that truism has held strong for me this past year.

Writing this much has also been a catalyst that pushes me to write even more.

I've really enjoyed this deeper commitment to writing, and while it remains just a hobby, it's an incredibly fulfilling and enjoyable one that I hope to continue as long as I can.

Grove, A New Tree-Planting Wellness Game 🎉

Today marks the release of my newest app: Grove! Here's a brief description of the app from the product page:

Grove Logo

Grove is an augmented-reality game where you plant virtual trees in the real world! Collect the various kinds of trees, learn about them, and earn achievements all while getting outdoors and enjoying your virtual garden.

Grove is part game, part educational app, and part wellness app! In Grove you care for your trees and tend to your garden, and in turn, you stay fit and healthy.

Grove Product Page

If you're at all interested in the app, please do give it a try, and let me know what you think. I can't wait to hear your feedback (and see some of your trees)!

What is Grove?

At its core, Grove is part game, part AR wellness app with some educational bits sprinkled in. The player plants virtual trees in real-life locations and builds out their virtual grove. Each tree is unique and randomly generated. Trees come in several collectable types, each with its own unique artwork and animations. Players tend their grove by regularly watering, fertilizing, and harvesting from their trees, and well-tended trees grow big and strong.

Each tree has a unique name, fun facts, and secret stats that determine the bonuses it gives when harvested. Harvested resources can be sold at the market for coins, which in turn can be used to expand the player's grove and help tend their trees.

Grove is also a social app. Players can invite their friends to play with them and visit each other's trees. Lonely trees drop fewer seeds, but trees with friendly visitors are happier and more productive.

The app also includes some optional in-app purchases that can provide additional boosts, or unlock a secret Developer Diary and custom avatars to show off to friends.

As players tend and grow their grove, they earn achievements for their progress and rewards that help them advance further.

Where's the Wellness?

In Grove, as in real life, trees need space to grow; they can't be too close together. Trees must be spaced apart in order to be planted, and they can only be watered, tended, and harvested when the player is nearby. In essence, think of Grove as an app that encourages users to go on a daily walk to tend their grove. Trees need to be spaced at least 30 meters (~100 ft) apart, so there's plenty of walking to do once you've built up a full-size grove. The app also awards bonuses and rewards for completing daily step goals.
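The spacing rule is easy to picture in code. Here's an illustrative sketch (not Grove's actual implementation) that checks two hypothetical coordinates against the 30-meter minimum using the haversine formula via awk:

```shell
# Sketch only: the coordinates and the distance_m helper are made up for
# illustration; Grove's real implementation lives in the iOS app.
min_spacing_m=30

# Great-circle distance in whole meters between two lat/lon pairs.
distance_m() {
  awk -v lat1="$1" -v lon1="$2" -v lat2="$3" -v lon2="$4" 'BEGIN {
    pi = atan2(0, -1); r = 6371000                # Earth radius, meters
    dlat = (lat2 - lat1) * pi / 180
    dlon = (lon2 - lon1) * pi / 180
    a = sin(dlat/2)^2 + cos(lat1*pi/180) * cos(lat2*pi/180) * sin(dlon/2)^2
    printf "%d", 2 * r * atan2(sqrt(a), sqrt(1 - a))
  }'
}

d=$(distance_m 32.7157 -117.1611 32.7159 -117.1611)   # two nearby spots
if [ "$d" -lt "$min_spacing_m" ]; then
  echo "Too close to plant here ($d m); walk a bit farther."
else
  echo "Plant away! ($d m)"
fi
```

The same check gates watering and harvesting: the action is only allowed when the player's distance to the tree is small enough.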

And then there's Climate Change

Yup, you read that right. Climate Change is a gameplay mechanic.

Trees in the real world naturally absorb carbon dioxide from the air and turn it into wood, leaves, and branches. One technical name for processes like this is Carbon Capture and Sequestration, and tree planting is one technique that can be used to mitigate the effects of Climate Change in our world today.

In Grove, your trees capture carbon too (virtual carbon, that is)! As your trees grow, they capture carbon at the rate of real trees, using data collected by the European Environment Agency. This helps players get familiar with this crucial emerging technology and get a feel for just how much tree planting can do to help the environment. There are also achievements for capturing lots of carbon.
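To get a feel for the numbers, here's a back-of-the-envelope version of the mechanic. The sequestration rate below is a round placeholder figure of the order often cited for a mature tree, not Grove's actual EEA-derived data:

```shell
# Rough illustration only: assume a tree absorbs ~20 kg of CO2 per year
# (a placeholder rate) and see what that adds up to over its life.
rate_kg_per_year=20
years=15
awk -v r="$rate_kg_per_year" -v y="$years" \
    'BEGIN { printf "~%d kg of CO2 captured over %d years\n", r * y, y }'
```

A single tree doesn't capture much, which is exactly the intuition the mechanic tries to build: meaningful sequestration takes a whole grove.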

A New Challenger Appears!

I've been working on Grove for the past six months and it's been a blast to build. I've never built a game before and while Grove is technically more of a wellness and education app than a game, there are certainly game-like components.

Grove is also the first iOS app I've built that heavily relies on custom assets. Usually I try to stick to drawing simple things in code or simply structuring the app to focus more on textual content, but for Grove that approach simply would not do. It needed to be cute, and it needed to be beautiful. I'd like to thank Grove's designer Victor Teles for everything he's done to give Grove a unique and adorable feel.

If you'd like to learn more about how (and why) I built Grove, give the app a try and unlock the Developer Diary. I've written a deep-dive there that goes into exactly why and how Grove came to be.

I'm sure I'll be going into more detail on various aspects of Grove in due time, and especially on my podcast: Indie Dev Life, so be sure to stay tuned for updates.

As always, thanks to all of my beta testers and to everyone who contributed to Grove. This launch would not be possible without you.

Check out Grove →

Unbounded Possibility Is Bad For Productivity

Being productive is hard; especially if you're working by yourself or working remotely. When you're working alone you have a lot of freedom, but that also means you have a lot of slack. No one is holding you to a schedule or deadline, and nothing is stopping you from procrastinating or getting distracted.

Even when you're focused, it can be hard to decide what to focus on, since there's often no required order in which things must be done. From an objective perspective, whether I choose to call my bank today or tomorrow makes absolutely no difference. The same is true of which features I choose to implement on any given day. As long as the features get done, the order and the exact dates they're completed aren't really important. Some features must be done before others for technical reasons, but others are completely unrelated and can be developed in any order. But this ambiguity is precisely the problem.

If you could work on anything at any time, what should you work on right now?

I'm going to generalize here: I don't think humans deal with unbounded possibility very well. We long for some sort of structure—or at least I do. When presented with the choice of doing any feature I want, I'm left unfocused and forced to decide—moment by moment—what features to build, which not only wastes time, but increases decision fatigue.

Lights in the Infinite Dark

Speaking with a friend earlier over the weekend, we stumbled on a maxim that I think sums up the solution pretty well:

Planning is the art of bringing order to chaos.

I've found that arbitrary deadlines, like arbitrary goals, keep me motivated and focused. Without some sort of deadline or goal, I feel adrift, and it's difficult to force myself to work on anything for a significant period of time. So I create artificial deadlines and goals, sometimes completely arbitrarily. Oftentimes, I'll just pick a date on the calendar based on nothing but gut intuition and change it later if necessary.

Without goals and deadlines, infinite possibility gives no guidance.
With goals and deadlines, adding goals gives you a direction.

By setting completely arbitrary deadlines and goals, I'm able to narrow down the unbounded, infinite possibility that is creating software into a simple series of steps. This isn't a new idea; tons of people do this. I just find it interesting to think of deadlines this way.

Whether your planning process involves ultra-precise scheduling, or just a notes file with some rough deadlines in it, having any sort of plan at all gives focus to your efforts and it guides you through the haze of infinite possibility.

Even if your deadlines are completely arbitrary and can be changed at will, having them is the most important thing.

Imports Are Endorsements

When you import someone's code, are you endorsing them?

At first glance, the answer might seem simple: of course not! And while it's pretty obvious that imports are not universal endorsements of the code's author, they aren't entirely void of meaning either. An endorsement isn't an indivisible quantum—some fundamental particle that cannot be divided—endorsements exist on a spectrum.

Supply chains are tricky things

Importing code written by someone else is always a risky endeavor. Most often external dependencies work and work well, but they also expose your software to additional risk. The fact that you are willing to depend on someone else's code implies some kind of inherent bond of trust. It implies a relationship between the developer (or organization) and the code author. Importantly, it also implies that the developer finds the author's code valuable in some way.

Dependencies are part of a software's digital supply chain—along with any other provider we use to power our software. And in today's world, where alternative dependencies abound, many people understand that the various links in the supply chain aren't simply bound together out of mutual necessity. They choose to depend on each other, and so there are shared values and responsibilities that are common to all in the chain.

To use an example from the news: Apple doesn't manufacture many of the components in its devices, yet when its partner Foxconn is found to be abusing workers, we place some of that blame on Apple for choosing to work with Foxconn given its past behavior. Similarly, Google and Microsoft do not generate their own power, yet they've made efforts to rid their supply chains of fossil fuels, and the public has—rightly—heaped praise on them for these actions. From fashion to technology, we understand that companies are somewhat responsible for choosing ethical and responsible supply chain partners. Why should developers be any different?

Our decisions matter

I think most people would agree with the decision not to use software written by an outspoken white-supremacist, but even that extreme example implies that there is some threshold where the author's views would impact the technical decision to use a given toolset. The literature, music, and film worlds are well-accustomed to this debate. Authors leave a mark on their work. How big that mark is remains a subject of debate, but there's no debate that the author has at least some impact.

Obviously, big tech companies and organizations don't suffer because one company decides not to use their products; ideas like this require collective, industry-wide action to produce results.

The point is that our decisions to use Facebook's frameworks, Google's toolsets, Apple's platforms, or Amazon's services must be informed by their creators' behaviors and policies. Sometimes these decisions will be good for business, and sometimes not. Other times they might be incredibly beneficial or utterly unremarkable. Regardless of their effects, these decisions matter.

Some readers might bemoan this idea, claiming that I'm making software political, but everything is political in some form. Software doesn't exist in a vacuum and there are real consequences to our choices that echo beyond the apps and websites we build.

Whether we like it or not, the role of engineers is to manipulate the real world to achieve some end, and how we do that work has just as much import as what end we achieve.

I, for one, am driven to do what I can to mitigate the effects of Climate Change, so I host all of my new services in data centers powered by renewable energy and I'm working on migrating my existing services there as well. My hosting platform is a part of my digital supply chain, and I bear some responsibility for the emissions my services produce. The downside is that those servers are in Europe now, so my ping times suffer a bit, but to me that tradeoff was worth making.

Destinations matter, but the road to the destination matters too. Developers achieve our ends through importing other people's code, and those imports matter. Choose yours well.

Easy And Ethical Traffic Monitoring With Goaccess

Traffic monitoring is a staple of web businesses, but for some reason, we've outsourced a pretty simple problem to mischievous third parties. While there are well-behaved traffic monitoring platforms, I've developed a few homegrown solutions that have worked really well for me and my business. If you're looking for an easy traffic monitoring solution, and you're conscious of your users' and visitors' privacy, you should try one of these. I promise, they're pretty simple.

Option 1: Just Don't

You always have the option to simply not do traffic monitoring. Oftentimes we convince ourselves that the data we collect is precious or useful when it fulfills no real business or personal need.

If you're a blogger, then traffic might matter to you, but it probably shouldn't. Back when I used to use Google Analytics I also had very few visitors to this site. Was it useful to know that 13 people had seen my article? Not really, but it felt useful. In the end it was just another stat for me to endlessly refresh. Progress bars are fun to watch, but you'd probably be better off writing another post, or just going for a walk.

If you own a business that sells a product, then remember this: how many hits your website gets isn't actually relevant; how many products you sell is. At one point, Going Indie was featured on Product Hunt, which was awesome, but that feature resulted in very few actual sales. Was it worth my time to endlessly refresh the PH dashboard? No, and I kinda wish I hadn't had the option.

Real-time dashboards are addictive dopamine factories. Sometimes it's better to just avoid them.

Option 2: Use GoAccess

If you need to have some sort of traffic monitoring, then give GoAccess a try. GoAccess aggregates webserver access logs and provides reports either live in the shell, or as really elegant and self-contained HTML files.

I've used GoAccess for years, and it's become my default solution for traffic monitoring. I've automated my reporting using my new helper RPi. Every week, the RPi generates and aggregates the reports for my various websites and emails them to me.
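That weekly job can be as small as a single cron entry. Here's a sketch of what it might look like; the script path, report location, and email address are all placeholders, not my actual setup:

```shell
# Hypothetical crontab entry: every Sunday at 06:00, regenerate the
# GoAccess reports and email a summary. All paths/addresses are placeholders.
0 6 * * 0 /home/pi/bin/build-goaccess-reports.sh && mail -s "Weekly traffic reports" me@example.com < /home/pi/reports/summary.txt
```

The helper script itself is just a loop over each site's access logs, running GoAccess once per site.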

Sample GoAccess Report

A sample GoAccess HTML report

There are downsides to GoAccess though. Since it works from access logs, the numbers are inflated by bots and browser prefetching. GoAccess has ways to filter some of those out, but in most cases, I've just gotten used to the numbers being bigger than they should be.
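For reference, that filtering is mostly a matter of a couple of CLI flags. This is a sketch; the excluded IP below is a documentation-range placeholder (you'd use your own address):

```shell
# Build a self-contained HTML report, skipping known crawlers and
# ignoring traffic from one (placeholder) IP address.
goaccess /var/log/nginx/access.log \
    --log-format=COMBINED \
    --ignore-crawlers \
    --exclude-ip=203.0.113.7 \
    -o report.html
```

Prefetch traffic is harder to exclude, since it's mostly indistinguishable from a real page load in the logs.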

One upside to using server-side traffic monitoring is that your stats are unaffected by people who are using ad-blockers or who refuse to enable JavaScript (are there still people doing that?)

Option 3: Roll Your Own

For some projects, I've needed more reliable and accurate traffic stats, so I decided it would be best to roll my own. As I said earlier, traffic monitoring is a pretty simple problem domain—as long as you're willing to live with some margin of error. My California policy blog uses a homegrown traffic monitoring solution that is so maddeningly simple, I've included it below in its entirety—formatted for readability.

(function() {
    if (window.fetch) setTimeout(function() {
        fetch('/pageview?u=' + window.location.pathname)
    }, 2000)
})()

This snippet sets a timer for two seconds and then fires a request off to /pageview which simply returns a 200 response. The site is statically generated—just like this one—so it can't do any processing or custom request handling, and there's an empty file called pageview in the webroot directory. I join all of my access logs together, remove anything that doesn't contain a request to /pageview and voila!

zcat /var/log/nginx/access*.gz | grep pageview > "$STATSFILE"
cat /var/log/nginx/access.log | grep pageview >> "$STATSFILE"

/usr/local/bin/goaccess \
    -f "$STATSFILE" \
    --ignore-crawlers \
    -p /etc/goaccess.conf
These reports won't include any requests made by searchbots, any request that didn't execute the JavaScript, or any request made by a user that didn't keep the page open for at least two seconds. This solution gives me simple and effective traffic stats that leverage the data my servers were already collecting, with no additional or accidental data collection required!

What Really Matters

Traffic monitoring is a useful but addictive tool, and it's easy to get caught up in the data these tools collect and convince yourself that it's more useful than it really is. At the end of the day, I just need to know, roughly, how many people read one of my articles or how many visited the homepage of a service I run. I don't need to know who they were or anything else about them, and I don't want more data than I need.

Due to the limitations of server-side monitoring—even with my JS snippet—GoAccess can't provide you with exact traffic numbers; nothing can. But like I said, you probably don't need exact numbers. You probably only really need the order of magnitude, which server-side monitoring can easily provide.

How I Use Docker (For Now)

In a recent episode of Indie Dev Life I went into some detail about how I use Docker to host my software. I discussed my experiences with and guidelines for using Docker in production. This post is a continuation of that discussion.

I've been using Docker to run my software in production ever since the launches of Adventurer's Codex and MyGeneRank back in 2017. In my technical discussion blog posts for both projects, I talked a little bit about Docker and its place in the stack. I also briefly discuss Docker's role as a deployment tool in Going Indie.

Over the years I’ve managed to tune my services to be incredibly easy to upgrade. For example, since Nine9s is written in Python and uses Docker, a deploy is simply a git pull and docker-compose up. Nowadays, even those steps are automated by a bash script. Having such a simple process means that I can deploy quickly, and it lessens the cognitive burden associated with upgrading a service, even when that service has gone without changes for months.
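That deploy script is nothing fancy. A minimal sketch of it, assuming a docker-compose project at a placeholder path, looks something like this:

```shell
#!/bin/sh
# Minimal deploy sketch: stop on the first error, pull the latest code,
# rebuild the images, and restart the containers. The project path is a
# placeholder, and my real script adds a few more niceties.
set -e
cd /srv/myapp
git pull
docker-compose build --pull
docker-compose up -d
```

Because `docker-compose up -d` only recreates containers whose images or configuration changed, re-running the script when nothing has changed is a harmless no-op.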

Over time, Docker's role in my software has morphed and evolved. During the initial launch of Adventurer's Codex, I depended heavily on community-built Dockerfiles for large portions of the architecture. But over time, Docker has actually shrunk to fill a much more limited role.

The Problem Docker Solves (for Me)


I use Linode for my server hosting, so I'm already operating within a VM, and depending on the software, I might have multiple virtual servers powering a given service. Docker simply provides isolation for processes on the same VM. I do not use Docker Swarm, and I've always just used the community edition of Docker.

To me, Docker has become a tool that makes it easy to upgrade and manage my own code and other supporting services. All of my code runs in Docker containers, but so do other systems that my code depends on. For example, Nine9s (among my other services) uses memcache for template caching, since support for it is built into Django—my preferred web framework. Each web server runs Nginx on the host, which reverse-proxies to Docker containers running my Django apps.

These services also perform asynchronous processing via worker nodes, and those workers run inside Docker too. One service's workers are spread across various machines and pass requests through their own custom forward-caching proxy containers, backed by a shared Redis instance that's also in Docker.

This setup ensures that I can easily upgrade my own code, and it ensures that exploitable services like memcache aren't exposed to the outside world.
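Keeping memcache off the public internet comes down to how the container port is published. A sketch (the container name is arbitrary, and the image/port are the memcached defaults):

```shell
# Run memcached in Docker, reachable only from the host itself: the port
# is published on 127.0.0.1, so outside machines can't connect, but
# Nginx and the Django containers on this host still can.
docker run -d --name template-cache \
    --publish 127.0.0.1:11211:11211 \
    memcached:alpine
```

Publishing on the loopback interface like this sidesteps the firewall issues discussed below, since the port is never exposed on a public interface in the first place.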

In short, I've found that Docker works great for parts of the stack that are either upgraded frequently or for parts of the stack that are largely extraneous and that only need to communicate with other parts on the same machine.

I've largely stopped using Docker in cases where there are external tools that rely on things being installed on the host machine, or where the software requires more nuanced control. Nginx is a great example. All of my new projects have Nginx installed on the host, not in Docker. This is because so many tools from log monitoring to certbot are designed to run on a version of Nginx installed globally. I use Nginx as both a webserver for static content and a reverse-proxy to my Django apps. If you want to use Nginx in Docker, I'd suggest only using it for the former case. The latter is better installed on the host.

I'm still torn about running my databases and task brokers in Docker. Docker (without Swarm) really annoys me when I'm configuring services that need to be reached from the outside. Docker writes its own iptables rules, so published ports punch straight through CentOS's firewall configuration, which renders most of my usual tactics for securing things moot. I've also started to question the usefulness of Docker when I'm configuring a machine that serves only one purpose. Docker is great at isolating multiple pieces of a stack from each other, but on a single-purpose VM it can feel like just another useless layer that's only there for consistency.

Docker on CentOS is particularly irritating because the devicemapper storage driver doesn't seem to release disk space it no longer needs. This means your server slowly loses usable disk space every time you update and rebuild your containers. After about three years of upgrades, one of my main servers has lost about 20GB of storage to this bug. Needless to say, I'm investigating a move to Ubuntu in the near future.

What about Docker in Development?

As with Docker in production, I have mixed feelings about the role Docker plays in my development environment. I develop on a MacBook Pro, and my Django apps run in a plain old virtual environment; no Docker there. That said, I do use Docker to run extraneous services, like Redis, memcache, or that forward caching proxy.
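A dev-only compose file for those supporting services ends up being tiny. The sketch below assumes this split (app in a virtualenv on the host, services in containers); the image tags and ports are placeholders, and the ports are bound to localhost so the services are reachable from the host but not the network.

```yaml
version: "3"
services:
  redis:
    image: redis:alpine
    ports:
      - "127.0.0.1:6379:6379"
  memcached:
    image: memcached:alpine
    ports:
      - "127.0.0.1:11211:11211"
```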

I stopped running my Django apps in Docker a while back for much the same reason that I no longer run Nginx in Docker. Even with Docker's recommended fixes, Django's management CLI is frustrating to use through Docker, and I've had more than one issue with Docker's buffering of log output during development.

Docker: Four Years In

Overall, I really like Docker. It makes deployments super simple: just git pull and docker-compose up (or use my fancy shell script that does zero-downtime deploys). That said, I'm certainly not a Docker purist. I use Docker in a way that reduces the friction of my deploys, and I'm starting to use it less and less when it's just another layer that serves little purpose.

Like every tool, Docker has its role to play, but in my experience it's not the silver bullet that many people make it out to be. I haven't used Docker on AWS via ECS, so I can't comment on that; perhaps that's where Docker really shines. I still prefer a more traditional hosting strategy. Either way, Docker will remain an important tool in my toolbelt for the foreseeable future.

Lessons On Variable Naming From Breakfast Burritos

This morning I ordered a breakfast burrito from a local taco shop. Normally this would not be news and obviously would not warrant a blog post or any in-depth analysis, but it was early and I hadn't yet had coffee, so my mind was loose and my thoughts wandering. As I looked over the menu, I pondered the two vegetarian breakfast burrito options:

  • Mushroom burrito filled with mushrooms, potatoes, eggs, and cheese
  • Potato burrito filled with potatoes, eggs, beans, and cheese

I intended to order the latter of the two, but as I asked for the potato breakfast burrito at the counter, it occurred to me that both burritos contain potatoes, and therefore my order was ambiguous. What, after all, makes a burrito with potatoes, eggs, cheese, and mushrooms deserve a different name than a burrito with potatoes, beans, eggs, and cheese? Why isn't the latter a bean breakfast burrito, since the beans are the ingredient unique to it, whereas potatoes are common to both? Are potatoes a more significant ingredient? If so, why?
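The ambiguity is easy to make concrete. Here's a quick sketch (the menu is reconstructed from the two items above) that asks which ingredients are actually unique to each burrito:

```python
MENU = {
    "mushroom breakfast burrito": {"mushrooms", "potatoes", "eggs", "cheese"},
    "potato breakfast burrito": {"potatoes", "eggs", "beans", "cheese"},
}

def unique_ingredients(name):
    """Return the ingredients found only in this menu item."""
    others = set().union(*(v for k, v in MENU.items() if k != name))
    return MENU[name] - others

# Potatoes appear in both burritos, so they can't disambiguate an order.
# The "potato" burrito's only unique ingredient is, in fact, beans.
print(unique_ingredients("potato breakfast burrito"))    # {'beans'}
print(unique_ingredients("mushroom breakfast burrito"))  # {'mushrooms'}
```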

I received my order (which was correct, by the way) and went home, but as I walked I wondered: how did the cashier and I understand each other? There was so much ambiguity in the names of those menu items, yet we both made sense of it without a second thought.

Naming is Really Hard

If you haven't seen the connection by now, let me drop the pretense: these same questions apply to how we name our variables and functions in code. Naming, after all, is hard, and I think my burrito example helps explain why.

It is often said that the two hardest problems in computer science are cache invalidation, naming things, and off-by-one errors.

In a more rigorous naming system, I assume most people would conclude that the second burrito is misnamed. It should be called the "bean breakfast burrito" since, as I mentioned, the beans are the distinct ingredient that keep the latter burrito from being strictly a subset of the former.

That said, beans are not normally considered a main ingredient in a burrito. In the conventional burrito naming scheme, more appealing or distinct ingredients, or ingredients not considered to be condiments, take precedence. This naming scheme is the reason why a burrito with carne asada, pico de gallo, and guacamole would be simply called a carne asada burrito and not a guacamole burrito.

These same conventions exist when we name variables and functions. Imagine a scenario where we have a list of users and need to find which of them are active and, among those, which have active subscriptions to our service.

def get_active_subscribed_users():
    all_users = get_all_users()
    active_users = (user for user in all_users if user.is_active)
    <variable> = (user for user in active_users if user.has_active_subscription)

The first two variable names are fairly obvious; the question is what to name the third variable so that it isn't ambiguous. We could of course call it active_users_with_active_subscriptions, but to many that would be too long, and to my eyes it reads as though the variable contains a list of (user, subscription) pairs.

We could name the value active_users, actively_subscribed_users, or even just relevant_users if what counts as "relevant" is clear enough in context. Some developers prefer to simply call the value users, which I find incredibly confusing. Others define the variable users and then rebind it as they filter the list down to suit their needs, which I find even more confusing and unclear.
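To make that last style concrete, here's a sketch of the rebinding approach, with a hypothetical User type and data source standing in for real code:

```python
from dataclasses import dataclass

@dataclass
class User:
    name: str
    is_active: bool
    has_active_subscription: bool

def get_all_users():
    # Hypothetical stand-in for a real data source.
    return [
        User("ada", True, True),
        User("brian", True, False),
        User("carol", False, True),
    ]

# The rebinding style: `users` means something different on every line,
# so by the end the name no longer tells you what the list contains.
users = get_all_users()
users = [u for u in users if u.is_active]
users = [u for u in users if u.has_active_subscription]
print([u.name for u in users])  # ['ada']
```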

In practice I tend to prefer the third option, along with a comment explaining what I mean by "relevant". This only pushes the problem around, though. If two groups of "relevant" users ever meet in a new context, their names clash and we have to find new names for both.

Context is key here. If we instead fetched the same list from another function, we could drop the qualifier entirely.

def get_active_subscribed_users():
    users = get_active_users()
    # We can avoid the question entirely if we simply return the list here.
    return (user for user in users if user.has_active_subscription)

Names are a Leaky Abstraction

As with our breakfast burritos, we could simply name things by listing their components, but that becomes burdensome very quickly. Our potato burrito would be unceremoniously called the "potato, egg, bean, and cheese breakfast burrito", which is unambiguous but also cumbersome. It can also cause problems: forgetting a single component could lead the reader to believe that a "potato, egg, and bean burrito" is not the same as your "potato, egg, bean, and cheese burrito", even when you're both referring to the same thing.

As programmers we aren't taxed by the character; we can afford longer variable names, but those names should still be descriptive, succinct, and distinct. Issues arise because names, by their nature, don't convey the whole story. A name is almost always a summary of its true meaning. It can't effectively convey the context in which it was given or the inherent value of the named thing. Out of context a name might be confusing, but that confusion may vanish when the name is used in the appropriate context.

Likewise, in some contexts a potato breakfast burrito is the same thing as a mushroom burrito, but today it wasn't.