BiteofanApple
by Brian Schrader

Real Life Science Fiction

Posted on Tue, 06 Feb 2018 at 05:59 PM

Earlier today SpaceX launched its first Falcon Heavy rocket with Elon's personal Tesla onboard. All in all, the launch went really well, and congrats to everyone at SpaceX for making it possible 🎉. A lot of amazing technological development and hard-core engineering had to happen to pull it off, and it was amazing to watch, but one part in particular felt surreal.

Falcon Heavy is an absolutely huge rocket. According to SpaceX:

Only the Saturn V moon rocket, last flown in 1973, delivered more payload to orbit.

For me though, the real crowning moment, the minute everything stopped being real and started being Science Fiction, was when the first stage boosters came back down and landed in perfect sync; it felt like complete magic.

Most days we're trapped in the real world, but things like today's landing are so awesome that, for a second, it can feel like we're living in a Fantasy or Sci-Fi universe. While I don't really approve of SpaceX's corporate practices, they do truly amazing and fantastic work, and the people there are top notch. I can't wait for the next launch.

The Web as a Social Network

Posted on Fri, 02 Feb 2018 at 12:12 PM

Brent Simmons:

Twitter and Facebook are convenient, sure, but so are fossil fuels, and the cost was similarly unknown for a long time. But now we have some idea just how bad these things are for the world....

[With Micro.Blog] your posts are just a normal, everyday part of the open web. At this writing, mine appear on — but it’s on my to-do list to have those appear on my main blog (this blog) instead. (Probably won’t happen until after I ship the app I’m currently working on.)

And this is how it used to be, and how it never should have stopped being: my blog is me on the web. I own my blog: I own me.

Here's where I'd normally write something like this:

Brent is right. Blogs and microblogs can be, and are, the future of social networks. They let you keep control of your content, and they protect you from the whims of huge, often nefarious corporations...

But any reader of this site will already know my feelings on the Open Web and Social Networking. So this time I've decided to make this post a retrospective on what I've said before, and on how my opinions, and the web, have changed over the last few years.

The Open Web Series Collection

My first post about the Open Web was from July 2014, when this blog was still in its infancy. Since then I've basically written about this subject whenever it's come up in the circles I travel in. At first I was just concerned with preserving my site and my thoughts for the future, but over time I morphed into the Open Web loving nerd you see before you today.

How much history is lost because those who live it don't write it down? With the internet, we have the ability (assuming storage is cheap) to preserve everything we ever say, think, write, or post. That's really powerful, and possible accountability and privacy issues aside, it's probably the most important use of an invention in the history of man.

I've retweeted this tweet multiple times, mostly because it's hilarious, but every time I want to do so, even though I have it saved in Instapaper and favorited on Twitter, it's easier to google the text of the tweet (as close as I can remember it) and let Google do the work.

After a while I got it into my head that RSS could be used for a new (old?) kind of social network, and I've been rolling with that idea ever since.

What would it mean to have a social network (like Diaspora) that is decentralized, and independent of corporate ties? Could such a thing exist?

Then I was (and still am) mad at Twitter for a while, which just stoked the Open Web flames.

Could it be that the time for new social networks is past us? I hope not. RSS succeeded as an open standard, and so did email (though email had an early start). The podcasting industry is growing, powered by RSS... RSS use floundered after Google Reader shut down, but RSS is still with us. That's important.

Create systems that are ambivalent about the open or closed web. If I create a tool that's good at posting content to Facebook and Twitter, it should also post to RSS feeds...

I tried making a standard once, but putting something on GitHub in a README doesn't make it a standard.

A few weeks ago I began drafting a new standard for an open, platform-independent communication service. You can think of it as Twitter meets RSS...

One of the critiques of RSS feeds in a world dominated by Facebook and Twitter is that RSS just isn't fast enough. You can't hope to achieve what Twitter calls "in-the-moment updates" and "watch events unfold" if your client is polling each web site's RSS feed once an hour for new microblog posts...

After a while I started noticing that a lot of creators and bloggers I followed were moving back to open, platform-less distribution methods (like CGP Grey and his RSS feeds and email lists).

[Social Media Platforms] have to do this to make money. It just sucks that how they make money is by compromising the people that made the service what it is (cough Twitter cough).

I was still mad at Twitter.

Twitter is a complementary medium to blogging, but it's not a replacement...

By knocking down a few walls and moving some furniture around, blogging is preparing for a comeback, and we'll all be better off for it.

More recently though, I've gotten more realistic and still more optimistic about the future of social networks and openness. Facebook and Google's algorithms still loom high and mighty, but I started to see a resistance forming, and then Manton Reece released

I've long heard people say that we've lost video, music, and messaging to the walled worlds of YouTube, iTunes, and WhatsApp/Facebook/Twitter respectively, and while I believe that is the current state of the web, I don't believe it's the end-state.

If we're going to allow... algorithms to be such a huge part of our lives, which I don't believe is necessarily a bad thing, then they should at least be subject to some sort of oversight.

I've written lots of stuff about services like Manton's, and the Open Web in general. It's really important that we preserve not just the way the Web used to work, but as Manton says: the way it should work.

The future looks bright for the open web and social networking, but the fight is ongoing.

Primitive Tech is Here for the Long Haul

Posted on Fri, 24 Nov 2017 at 02:46 PM

John's most recent Primitive Technology blog post has me pretty excited. The new video is great, as his videos tend to be, but the post mentions something a bit more exciting.

I bought a new property to shoot primitive technology videos on. The new area is dense tropical rainforest with a permanent creek. Starting completely from scratch, my first project was to build a simple dome hut and make a fire.

This is just me speculating here, but he wouldn't be buying a whole new plot of land to shoot videos on if he wasn't planning on making more videos long term. While I never had any reason to believe he'd be stopping anytime soon, such a big purchase is great news for him, his channel, and fans like me.

Primitive Technology Blog →

The FCC Moves to Dismantle Net Neutrality

Posted on Thu, 23 Nov 2017 at 09:18 PM

Jonathan Shieber for TechCrunch →

Federal Communications Commission Chairman Ajit Pai today made good on his long-standing pledge to tackle regulations established in the last administration designed to protect the distribution of internet content.

On Tuesday, Pai distributed to the other commissioners at the FCC a draft of his suggested rule changes under the auspices of the “Restoring Internet Freedom Order.”

The move sets up a December 14 vote at the FCC that could have broad ramifications for the entire internet. Under the rules established by the Obama administration, internet providers are required to provide open access to their networks for all digital content.

I'm really sad to see the FCC going forward with their plan to dismantle the Net Neutrality protections. What's worse is that Ajit Pai seems to know full well what this will mean for the web, and seemingly no amount of public comment against his proposal can dissuade him.

It seems like the only recourse we really have now is either to chip away at Ajit Pai's resolve with public comments before the December vote, or wait for some act of Congress to reverse the decision: something I can't even imagine them doing.

Help Protect the Internet by Contacting Your Representative →

MyGeneRank: Behind the Scenes of the Newest ResearchKit App

Posted on Wed, 25 Oct 2017 at 10:11 AM

I'm super excited to announce that MyGeneRank, an app that I've been working on at my jobby-job at the Scripps Translational Science Institute for a year and a half, is now available on the App Store, and the source code is available on GitHub!

I've wanted to talk about this project for a while, and I've written many unpublished posts about how it works; now the time is finally right. If you're looking for the scientific or research parts, I'm going to leave that to the paper we published. I want to talk more about my experiences and what I've learned in building the system.

As a quick overview: MyGeneRank is a ResearchKit-based research study app aimed at providing users with their genetic risk for diseases, and at measuring their reactions to this information; the first disease is Coronary Artery Disease. I'm definitely not a doctor, statistician, or biologist, and everyone else on the team handled all of the scientific work, but I am a Software Developer, and as the sole developer I worked on the vast majority of the API, Computation Engine, Website, and iOS App development, DevOps, and System Administration. I learned a hell of a lot during the last year and a half, and looking back, I'm not sure how I even got this far. The source code is available, at least for the API (iOS source is coming), so now you too can see my mistakes and pass judgement! 🎉

From a Technical Perspective

Vaguely, MyGeneRank's backend has three main parts: a database (Postgres), a Django REST API (which is open source), and what we've called a Computation Engine. All of it runs in-house, maintained by yours truly. The API and database are pretty self-explanatory, and the "engine" is really a Celery cluster and Redis queue which runs, among other things, a series of Python-wrapped command-line tools and custom R scripts to calculate a person's genetic risk given their 23andMe genotype data. While the computation stuff is a sort of special case (what with the CLI tools and R scripts), the API's design goal was to stay as close to industry-standard practice as possible. It's 90% covered with tests, leverages Travis-CI, uses DRF and Celery for the vast majority of its work, and everything runs in Docker containers on CentOS.
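As a rough illustration of the "Python-wrapped command-line tool" pattern, a per-chromosome step might look something like the sketch below. The script name, flags, and paths here are hypothetical, not the actual MyGeneRank pipeline:

```python
import subprocess
from pathlib import Path

def build_risk_cmd(genotype_file: str, chromosome: int, out_dir: str) -> list:
    """Assemble the argv for one hypothetical per-chromosome R step."""
    out_file = Path(out_dir) / f"chr{chromosome}.scores"
    return [
        "Rscript", "compute_risk_step.R",   # hypothetical script name
        "--genotypes", genotype_file,
        "--chromosome", str(chromosome),
        "--out", str(out_file),
    ]

def run_risk_step(genotype_file: str, chromosome: int, out_dir: str) -> None:
    """In the real system a body like this would live inside a Celery task."""
    subprocess.run(build_risk_cmd(genotype_file, chromosome, out_dir), check=True)
```

Wrapping each CLI invocation in a small function like this keeps the Celery task bodies thin, and makes the command construction easy to test without actually shelling out.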

If this stack sounds familiar to readers of this site, then you're catching on. In my post about Adventurer's Codex's stack I spelled out basically the same setup. In truth, the stack for AC was heavily influenced by MyGeneRank. I took everything I learned building MyGeneRank and ported it to Adventurer's Codex a year later. That's how developers work: we do things once, then copy-pasta it everywhere.

Scientific Computing at Scale: Performance and Throughput

MyGeneRank has very demanding computational needs. Currently, we have 178 cores and almost a terabyte of RAM powering the app and its backend. Turns out, calculating a person's genetic risk, even using genotyping data and not NGS, requires a lot of computational power. Scaling this kind of intense scientific computation for public use was one of the most challenging (and enjoyable) parts of the project. But even now I don't really have many concrete answers to the problem other than the twin suggestions: add more cores and make your work as functional and therefore parallel as possible.

Into the Weeds for a Bit

The calculations needed to return a given user's genetic risk score can be broken into roughly 110 individual tasks, and the work is mostly trivial to parallelize. What takes ~110 minutes of CPU time per user can be done in 3.5-4 minutes of wall time on our current system, but as any web developer knows, even that kind of processing time is hard to scale. The first couple of tasks are run in series, and then two chunks of tasks are run in parallel. The first chunk contains a single task which calculates the user's genetic ancestry, and the other chunk has 52 two-part tasks. This means that at any one time, 53 tasks per user are running during the bulk of the computation. The first 52 tasks take ~1.5-1.8 minutes each depending on which chromosome they're processing (some are bigger than others), and then the second batch takes about the same amount of time per chunk. The genetic ancestry calculation takes ~3.2 minutes. Once all of these tasks are complete, there's a final step that calculates the actual risk score, which is fairly instantaneous.

What this means is that the time from start to finish for a given user's score is parallelizable up to ~54 cores; beyond that it's core speed that matters, which is harder to improve. The extra cores we have allow us to calculate more scores at once, but even with our huge core count, we can only calculate ~3-4 users' scores at a time. The good news is that all of the steps are really good at keeping memory use low; CPUs are the bottleneck here.
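The fan-out/fan-in shape described above (ancestry in one branch, 52 per-chromosome tasks in the other, then a quick final reduction) can be sketched with the standard library. The real system uses Celery for this, and the task bodies and the final reduction here are just placeholders:

```python
from concurrent.futures import ThreadPoolExecutor

def ancestry_task(user_id):
    # stands in for the ~3.2 minute genetic-ancestry calculation
    return ("ancestry", user_id)

def chromosome_task(user_id, chrom):
    # stands in for one of the 52 two-part per-chromosome steps
    return ("chrom", chrom)

def compute_score(user_id, chromosomes=range(1, 53)):
    """Fan out ancestry + 52 chromosome tasks, then reduce to a final score."""
    with ThreadPoolExecutor(max_workers=53) as pool:
        ancestry_future = pool.submit(ancestry_task, user_id)
        chrom_futures = [pool.submit(chromosome_task, user_id, c)
                         for c in chromosomes]
        parts = [ancestry_future.result()] + [f.result() for f in chrom_futures]
    # the final, near-instant reduction step; a placeholder for the real formula
    return len(parts)
```

The key property is the one noted above: a single user's score only parallelizes up to the number of independent branches (~54), so extra cores buy throughput across users, not lower latency per user.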

Improving API- and website-level performance is much more straightforward than doing the same for the backend. Like most sites, MyGeneRank sits behind an Nginx reverse proxy with some out-of-the-box microcaching for popular pages.
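A minimal sketch of that kind of Nginx microcache looks something like this; the upstream name and cache sizes are made-up placeholders, not our actual config:

```nginx
# in the http block: a small shared cache zone
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=microcache:10m max_size=100m;

server {
    location / {
        proxy_cache microcache;
        proxy_cache_valid 200 1s;        # "micro" TTL: absorbs bursts, stays fresh
        proxy_cache_lock on;             # collapse concurrent misses into one upstream hit
        proxy_cache_use_stale updating;  # serve the old copy while refreshing
        proxy_pass http://app_upstream;  # hypothetical upstream name
    }
}
```

Even a one-second TTL means a popular page hits the application at most once per second, no matter how many clients are asking for it.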

At the time of writing, I'm not sure what the load will be like when we finally announce the study publicly, but I've spent a lot of time worrying about load and trying to ensure that the site can handle the numbers we hope for. There's been a lot of interesting news and blog posts over the years about what kind of download numbers an app, and especially a research app, can expect, and I wanted to build MyGeneRank with those kinds of numbers in mind. Once the project has hit its first month, I'm going to do a retrospective on how it all went, and we'll see if my performance enhancements were enough.

Lessons Learned

There are a lot of little things that I've learned in building MyGeneRank (and later Adventurer's Codex). When we started, I'd worked on a few toy iOS apps and a few corporate web projects, but MyGeneRank turned out to be of a completely different scale.

Before MyGeneRank, I'd never used Django, or Django REST. I'd heard of them, and had a friend who used them, but aside from a few toy projects in Flask, my web experience was in front-ends or Java/Spring (and I guess PHP). My work at that time was mostly in writing analysis pipelines in Python and since it's my preferred language, I wanted to use it for MyGeneRank. To this day, the structure of the API project is a little wonky and apps aren't where they should be; both are cruft from those early days. I try not to worry about it too much since I was learning as I went, and this kind of legacy cruft is impossible to avoid unless you knew everything at the start, and we most assuredly didn't. I can say that Django/Django REST has shown me just how boring building websites really is because it does most of it for you automatically and supports anything you'd ever really need right out of the box; you should definitely use it.

The modern web is really complex and there's a reason that it takes so many skilled developers to build large systems. Server setup, administration, DevOps, reporting, and application development are all sub-disciplines unto themselves (which is why they're separate jobs at most places). And I've found that jumping in and out of these different worlds can result in a sort of Programmer's Jet Lag as your body adjusts to the new environment after spending days in a completely different one.

On the native side, Apple's frameworks can be fun to use, and their OS frameworks, documentation, and user guides are world-class, but their tooling can also be frustrating and slow at times. The iOS app is written entirely in Swift and that has had some major effects on the development. Swift's tooling is still very new and the language has changed drastically since it came out. Having worked in both, I can say that, while I do enjoy Swift, nothing has made me appreciate the maturity of Python more.

Overall, my advice for building these kinds of systems is the same as when I wrote a similar post about Adventurer's Codex:

...ask people who've done it before... The internet is great, but it's actually pretty difficult to find out how to design modern web systems from scratch with just a vague notion and Google.

MyGeneRank, to me, represents my passing from a junior to a senior developer in a lot of ways. By no means have I learned all there is to know, but having now built two large web projects, and being the sole developer for one of them, I feel like a different person from the one who started the project a year and a half ago. I'd love to know what you all think of the source, and if you find a bug, file an issue please.

OAuth Over XMPP

Posted on Tue, 24 Oct 2017 at 06:24 PM

As I've said before, Adventurer's Codex uses XMPP for its real-time features. During development we ran into a couple of interesting challenges with integrating such a mature system with our new-ish web stack, one of which was user authentication.

The majority of Adventurer's Codex uses an OAuth Provider model for user authentication but Ejabberd (our XMPP server) requires that the username and password be sent at connection time. Obviously we didn't want to have two different auth schemes to support, and we didn't want our client app to store any passwords (hence OAuth). We spent a while hunting for different possible solutions, and in the end we stumbled into a really simple one.

Ejabberd allows for authentication to be handled by an external script, which lets us use our core database as the auth backend. We could, in principle, use a Django management command to make a call to our database, hash the password that we were given by the client, and compare it to the one stored in our database, but not only is that a lot of work and error-prone, it's too coupled to our database layer, and it would still require the client to store the user's password.

In the end we went with what might seem like the obvious solution: just keep using OAuth. After the client receives the initial user data at load time, it sends the same OAuth token as the password along with the user's XMPP JID to Ejabberd. Ejabberd then calls out to an external script which makes an HTTP request to our API to see if the user exists and that the token is valid. Clean and simple.

A visualization of the OAuth over XMPP process.

There are a few major advantages to using this method. First, the client no longer has to store user passwords, which not only makes our implementation simpler, but also protects our users from a whole host of attacks. Second, the user's XMPP session is now bound to the same limits as the rest of their access to the site, which greatly simplifies permissions handling. Third, and perhaps most interesting, Ejabberd is no longer tied to either the Django CLI or the database, and can be spun off as essentially a separate microservice on another machine.
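For the curious, an external-auth script along these lines can be sketched in Python. The 2-byte length-prefixed wire format is Ejabberd's extauth protocol; the token-check endpoint below is a made-up placeholder, not the actual Adventurer's Codex API:

```python
#!/usr/bin/env python3
"""Sketch of an ejabberd extauth script that validates OAuth tokens over HTTP."""
import struct
import sys
import urllib.request

TOKEN_CHECK_URL = "https://example.com/api/check-token"  # hypothetical endpoint

def token_is_valid(user, token):
    """Ask the API whether this token is valid for this user."""
    req = urllib.request.Request(TOKEN_CHECK_URL,
                                 headers={"Authorization": f"Bearer {token}"})
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return resp.status == 200
    except Exception:
        return False

def handle(packet):
    """Decide one decoded extauth request; the password field carries the token."""
    parts = packet.split(":", 3)
    if parts[0] == "auth" and len(parts) == 4:
        user, _server, token = parts[1], parts[2], parts[3]
        return token_is_valid(user, token)
    return False  # isuser/setpass/etc. aren't needed for this scheme

def main():
    # ejabberd sends a 2-byte big-endian length, then "op:user:server:password",
    # and expects a 2-byte length (2) followed by a 2-byte result (0 or 1) back
    while True:
        header = sys.stdin.buffer.read(2)
        if len(header) < 2:
            break
        (length,) = struct.unpack(">h", header)
        packet = sys.stdin.buffer.read(length).decode()
        result = 1 if handle(packet) else 0
        sys.stdout.buffer.write(struct.pack(">hh", 2, result))
        sys.stdout.buffer.flush()

if __name__ == "__main__":
    main()
```

Because the script only ever makes an HTTP call, it has no knowledge of the database schema at all, which is what makes the third advantage above possible.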

Check out the Ejabberd Auth Script →

todolist update

Posted on Tue, 10 Oct 2017 at 03:35 PM

Over the last week, I've made some changes to my todolist script: I've cleaned up the printing a bit and removed the temp file.

todolist terminal output

I had to remove the temp file because it was actually causing performance problems with BBEdit. Since the temp file came into and out of existence every few seconds, BBEdit's project view would dutifully redraw the project file list twice in quick succession, wasting quite a bit of CPU power[1], and sometimes causing my MacBook's fan to spin up. Now that the temp file is gone, that problem is too.

[1] I'm not sure why BBEdit needs so much power to redraw the project list, and I've reached out to their support. Hopefully they can resolve the issue. At the very least my script is a little better behaved, so it's no longer an issue.

todolist →

Mini-Rant About Documentation

Posted on Mon, 09 Oct 2017 at 03:31 PM

I want to talk about documentation. iOS[1], Nginx, Python, DRF, Django, Celery, and Postgres all have excellent documentation, but documentation only helps when your question is "How does this thing work and what does it do?" Code-level docs are useless when it comes to figuring out what you need in the first place. Celery can tell you how to use Celery, but it isn't as good at telling you why you might need it. I've become convinced that user guides are as important as, if not more important than, code-level documentation, and we as a community need more of them.

[1] To their credit, iOS, and really all of Apple's developer resources, have excellent user guides that explain not only how to use a thing, but why and where you might need it (thinking about it, this could be because iOS and macOS have been around long enough to develop these kinds of docs).


Posted on Thu, 28 Sep 2017 at 02:28 PM

I've talked before about how I use TODO comments in my code to lay out what I want to do before actually doing it. To help me keep track of all of these TODOs in my code, I wrote a little script yesterday, and I've put it on GitHub for anyone who's interested.

The script looks through all of the code (by default Python code) in a given destination directory, greps for the TODO comments, and prints them nicely in a constantly updated list in the terminal. The output looks like this:

Todolist Terminal Window

Writing this script, I learned a couple of new things about terminal control, like how to clear the screen without deleting the scrollback or just printing newlines (i.e. what clear does). I've put the script in my /usr/local/bin and called it todolist, so now I can invoke it from anywhere and get a nice little list of what I've put off working on.
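A stripped-down sketch of a script like this (the defaults, regex, and escape-sequence choice are illustrative, and details differ from the real todolist) might look like:

```python
#!/usr/bin/env python3
"""Minimal TODO-comment watcher: scan a tree, list TODOs, repeat."""
import pathlib
import re
import sys
import time

TODO_RE = re.compile(r"#\s*TODO[:\s](.*)")

def find_todos(root, pattern="*.py"):
    """Collect (file, line number, text) for every TODO comment under root."""
    todos = []
    for path in sorted(pathlib.Path(root).rglob(pattern)):
        for n, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            m = TODO_RE.search(line)
            if m:
                todos.append((str(path), n, m.group(1).strip()))
    return todos

def watch(root, interval=2):
    while True:
        # "\033[H\033[2J" homes the cursor and clears the visible screen
        # without wiping the terminal's scrollback (unlike what `clear`
        # does on some systems)
        sys.stdout.write("\033[H\033[2J")
        for path, n, text in find_todos(root):
            print(f"{path}:{n}  {text}")
        time.sleep(interval)

if __name__ == "__main__":
    watch(sys.argv[1] if len(sys.argv) > 1 else ".")
```

Keeping everything in memory and repainting in place is also what makes the temp file from the later update unnecessary.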

todolist on GitHub →

Accidental DevOps

Posted on Tue, 26 Sep 2017 at 12:30 PM

Since I became a developer, I've always worked on small (3-5 person) or single-person teams. Even at my current job, I'm the lead and only full-time developer. On more recent projects (including Adventurer's Codex) this means that I'm the DevOps guy and System Admin as well. I'm by no means an expert in either, but I can do both.

I started learning how to manage and administer servers when I started this site back in 2012. Back then I never thought that all of those hours spent configuring Apache and PHP would lead to anything, but those countless hours of frustration taught me the basics. Fast-forward 5 years and I'm developing three major projects (two unannounced) and I'm DevOps and SysAdmin for all three. It's crazy to think about.

I'd highly recommend that any new developer follow the same general path I did: start a project or blog and learn to deploy it yourself. I started with a cheap old-style webhost and FTP, and slowly moved to managing the whole stack on Linode. I'm using Docker on new projects, and for now I'm scripting my own deploys (though this could change soon if I migrate one project to Ansible).

As developers, it's sometimes easy to forget that we write software that actually runs on some actual hardware in some actual datacenter somewhere. Knowing how to do many of the things that DevOps engineers and SysAdmins do will not only make you a better developer, it gives you the ability to do more on your own. You often don't need tons of layers of software to deploy your own if you know how to do it from the ground up (especially if it's a smaller project). Those tools make it easier, sure, but they're not required.



Creative Commons License