BiteofanApple
by Brian Schrader

WWDC and Open Source Swift

Posted on Tue, 09 Jun 2015

I'm pretty happy with yesterday's announcements (excluding the "One more thing..." part). OS X got a very understated release, but that's because, I suspect, it's mostly bug fixes, and that's great news. I'm most excited about the bug fixes and performance enhancements, but there were some solid updates to Safari and Maps as well (transit isn't available for San Diego yet, but hopefully it will be soon).

On the iOS front, the iPad got all the love, and that's long overdue. The "new" (Surface Pro style) multitasking is nice, and the popovers look cool (Netflix and Twitter at the same time!). I'm also a fan of the new battery improvements and the Spotlight search API (Finally).

To me though, the biggest news of the whole conference was the announcement that Swift will be open source[1]. For Apple, this means that Swift will be placed alongside Go and Rust in the new, hot systems programming language category. Chris Lattner expressed his desire for Swift to be open source last year at WWDC, and I'm glad he was able to convince the higher-ups. Making Swift open source allows it to be a real alternative (and competitor) to Rust and Go, and I can't wait to see what people do with it. Swift interoperability with Python is also something I look forward to hearing about, since Python already interfaces with Rust quite nicely (examples). Using Swift, Rust, or Go as your low-level language instead of C has a lot of advantages, and it's great to see Apple keep pace with the outside world. More competition and choice in the systems programming language world is always good, and there will shortly be three great options to choose from.

[1] Yes, I'm aware that Apple said "later this year." Yes, I'm also aware that FaceTime was supposed to be an open standard. But contrast that with this: ResearchKit is on GitHub; this is a new Apple.

Software 'Engineering'

Posted on Mon, 25 May 2015

Ben Adida:

...most people have a pretty good idea of the trust they're placing in their doctor, while they have almost no idea that every time they install an app, enter some personal data, or share a private thought in a private electronic conversation, they're trusting a set of software engineers who have very little in the form of ethical guidelines.

Where's our Hippocratic Oath, our "First, Do No Harm?"

I've talked about this before. Software Engineering is unlike any other field that calls itself 'Engineering'. Unlike licensed engineers in other fields, there's no code that binds Software Engineers to a strict set of ethics, and no state certification to revoke when a Software Engineer is found to have committed malpractice. This kind of lax attitude toward development has indeed helped the industry boom the way it has, but success hides problems.

Coming from an education in Aerospace Engineering, it's really shocking how few precautions a lot of software engineers take when designing their systems. In school, from the beginning, we had concepts like "factors of safety", "margins of error", and "graceful failure" drilled into us. Included in our final project (a design for a supersonic business jet) was the requirement that it be able to complete its entire mission with One Engine Inoperative (OEI). If one of the two engines burst into flames at takeoff, the plane still had to be able to fly. It's these kinds of regulations (the FAA's, in this case) that enable commercial airlines to have the amazing safety record they do; people's lives are at stake. A professor once told my class, "These calculations are important. If your numbers aren't right, someone could die."

Personal data is extremely valuable, and precious to the person it belongs to. Although their life may not be at stake, their finances, their livelihood, and their personal affairs might be. Software has reached a level of pervasiveness (and arguably did so years ago) where an engineer's decision (or lack thereof) can affect millions of people. If any other branch of Engineering tried to design and build something that would affect that many people (their data or their wellbeing), you can bet they'd be licensed.

the responsibility we have as software engineers →

Python multiprocessing and unittest

Posted on Tue, 28 Apr 2015

I've been having an issue with unit testing Microblogger when my tests need to use Python's multiprocessing module. I've been looking at this code for days now and I can't seem to find the bug. I'm hoping that by writing down my thoughts here, I can think through the problem.

Basically, the test is trying to verify that a User object can be created with information from a remote XML feed. The test gives the User module a URL and tells it to fetch all information at that resource.

    def test_cache_user(self):
        user = User(remote_url='http://microblog.brianschrader.com/feed')
        user.cache_user()
        self.assertEqual(user._status, dl.CACHED)
        self.assertEqual(user.username, 'sonicrocketman')
 

The cache_user method starts up a crawler to go out and parse the contents of the URL provided.

    def cache_users(users):
        ...
        from crawler.crawler import OnDemandCrawler
        remote_links = [user._feed_url for user in users]
        user_dicts = OnDemandCrawler().get_user_info(remote_links)
        ...

Everything is still OK at this point. Inside that OnDemandCrawler().get_user_info() method, the OnDemandCrawler crawls the given URLs and then calls self.on_finish(). This is where things get funky.

    def on_finish(self):
        self.stop(now=True)

The stop method tells the crawler to shut down; the now keyword tells it to force-stop the crawling process rather than wait for a clean exit.

If we look at the source of microblogcrawler (v1.4.1), we see that stop does the following:

    def stop(self, now=False):
        ...
        if now:
            # Try to close the crawler and if it fails,
            # then ignore the error. This is a known issue
            # with Python multiprocessing.
            try:
                self._stop_crawling = True
                self._pool.close()
                self._pool.join()
            except:
                pass
        ...

The curious part is the self._stop_crawling = True line. In microblogcrawler's own tests, both force-stopping the crawler and stopping it normally work fine. The issue only arises when trying to stop it from inside a unit test: for some reason the crawler doesn't stop.

Here's a sample crawler and the output it produces when run as a unit test:

    class SomeCrawler(FeedCrawler):
        def on_start(self):
            print 'Starting up...' + str(self._stop_crawling)
        def on_finish(self):
            print 'Finishing up...' + str(self._stop_crawling)
            self.stop()
            print 'Should be done now...' + str(self._stop_crawling)

>>> python -m crawler_test
>>> Starting up...False        # Correct
>>> Finishing up...False       # Correct
>>> Should be done now...True  # Correct
>>> Starting up...False        # lolwut?

For some reason the crawler isn't receiving the signal to stop. Watching it in Activity Monitor, it appears to stop (the 4 workers are closed), but then the crawler creates 4 new workers and does it all over again.

The last step of this process is inside the crawler itself. The crawling process is controlled by the self._stop_crawling attribute:

    def _do_crawl(self):
        ...
        # Start crawling.
        while not self._stop_crawling:
            # Do work...
            ...
            self.on_finish()

From this code, if the _stop_crawling attribute is set to True, then the crawler should finish the round it's on and close down, but the value of the attribute doesn't seem to be sticking when it's assigned in the stop method above.
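
One thing worth keeping in mind (though I can't say for sure it's what's happening here) is that state set on an object inside a multiprocessing worker never makes it back to the parent: each worker operates on a pickled copy of the object. Here's a tiny standalone sketch of that behavior, with made-up names and nothing from the crawler:

    import multiprocessing

    class Flagged(object):
        def __init__(self):
            self.stop = False

    def set_stop(flag):
        # The worker receives a pickled copy of `flag`, so this assignment
        # only changes the copy living inside the child process.
        flag.stop = True
        return flag.stop

    if __name__ == '__main__':
        flag = Flagged()
        pool = multiprocessing.Pool(2)
        print(pool.map(set_stop, [flag, flag]))  # [True, True] -- the workers' copies
        print(flag.stop)                         # False -- the parent's copy is unchanged
        pool.close()
        pool.join()

If on_finish (or the assignment it makes) is somehow running against a worker's copy of the crawler rather than the parent's instance, that would produce exactly this kind of now-you-see-it-now-you-don't flag, but I haven't been able to confirm that's what's going on.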

If anyone has any ideas as to what the issue could be, I'd love to hear them. I'm pretty much out of ideas now. As I said before, the tests in the microblog crawler (which are not unit tests) work fine. The issue only comes up when running a test suite through unittest itself.

Microblog Crawler v1.4(.1)

Posted on Sat, 25 Apr 2015

Version 1.4.1 of my MicroblogCrawler is out on PyPI! Technically v1.4 was out last week, but it had a fairly large bug that needed fixing. 1.4.1 has patched it and it's ready for prime time.

v1.4.1 is full of enhancements, a few of which are listed here:

  • Calling stop now actually stops the crawler. This was due to a nasty bug in Python's multiprocessing module (issue 9400). The crawler now alerts you when such a problem arises by reporting it through the on_error callback.
  • Fixed a bug that would cause feeds to throw errors if no pubdate element was found. Such elements are now discarded rather than parsed, and on_error is called.
  • Fixed a major bug when attempting to stop the crawler immediately.

The full version notes are available here.

The major enhancement in this version (besides the graceful exiting) was the addition of a workaround for a bug in Python's multiprocessing module. The bug has to do with what happens to exceptions raised in child processes. When they are raised, they are pickled and sent back to the parent process. The problem arises when an exception is not pickleable: the child process hangs and never exits. The interesting thing is that the bug was first reported in 2010 and affects every version of Python since (i.e. 2.7, 3.2, 3.3, 3.4). This bug has been baffling me since I started converting the crawler to be multiprocessed, and it's nice to finally have a workaround.
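
For anyone curious what that looks like in practice, here's a rough sketch (made-up names, not the crawler's code) of the kind of exception that triggers it: one that can't be pickled, so the worker can never send it back to the parent:

    import multiprocessing
    import threading

    class CrawlError(Exception):
        # A hypothetical exception that drags along an unpicklable
        # attribute; thread locks can't be pickled.
        def __init__(self, message):
            super(CrawlError, self).__init__(message)
            self.lock = threading.Lock()

    def fetch(url):
        raise CrawlError('failed to fetch ' + url)

    if __name__ == '__main__':
        pool = multiprocessing.Pool(2)
        result = pool.apply_async(fetch, ('http://example.com/feed',))
        # On affected versions the error never makes it back across the
        # result queue, so this times out (or hangs forever, without the
        # timeout) instead of re-raising CrawlError in the parent.
        result.get(timeout=5)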

If anyone out there is using MicroblogCrawler, I'd love to hear from you, and pull requests are very welcome!

PEP 484 - Type Hints for Python

Posted on Wed, 08 Apr 2015

Guido van Rossum:

This PEP aims to provide a standard syntax for type annotations, opening up Python code to easier static analysis and refactoring, potential runtime type checking, and performance optimizations utilizing type information. Of these goals, static analysis is the most important. This includes support for off-line type checkers such as mypy, as well as providing a standard notation that can be used by IDEs for code completion and refactoring.

I'm for any improvements that will help my favorite language run smoother, with fewer errors, and maybe faster someday*.

There's been a big push for better static analysis in Python over the last few years, and there have been attempts like this before (see Cython). Having a language-level standard for Type Hints would bring the benefits to all of the various Python implementations.

# An example of the proposed type hint syntax.
def greeting(name: str) -> str:
    return 'Hello ' + name

I admit, the new syntax looks very Rust/Swift-like, and that's probably by design. One thing that worries me, and which isn't obvious from that code sample, is that Python Type Hints will (must) include generics and blocks (i.e. lambdas, closures, etc.). When those get into the mix, the Type Hint system starts to look a little messy.

from typing import Mapping, Set

def notify_by_email(employees: Set[Employee], overrides: Mapping[str, str]) -> None: ...

Even though that code isn't particularly pretty, the Type Hints can help the static analyzer find errors that could potentially be very hard to track down. As I said, I'm completely in favor of this addition to the Python syntax.
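
Callbacks aren't much prettier. Hinting one goes through typing.Callable; here's a quick sketch with made-up names (my own example, not one from the PEP):

# A hypothetical example: hinting a callback parameter with typing.Callable.
from typing import Callable, List

def apply_to_all(items: List[int], fn: Callable[[int], int]) -> List[int]:
    return [fn(item) for item in items]

print(apply_to_all([1, 2, 3], lambda x: x * 2))  # [2, 4, 6]

The lambda itself stays unannotated; the hint lives on the parameter that receives it, which seems to be about as close to typed "blocks" as the proposal gets.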

As a final note, for those of you worried that Python might be changing to a statically typed language, fear not.

It should also be emphasized that Python will remain a dynamically typed language, and the authors have no desire to ever make type hints mandatory, even by convention.

PEP 484 - Type Hints →


* According to the PEP, the goal of Type Hints will not be performance based, but they do go on to say, "Using type hints for performance optimizations is left as an exercise for the reader," which keeps me hopeful that PyPy or maybe even CPython could use them for that purpose as an added benefit.

Patreon

Posted on Tue, 31 Mar 2015

This week I finally signed up for a Patreon account and started supporting my favorite video producers. It's exciting to see them getting due payment for what they create; Patreon makes it really easy to be a patron, and the benefits are awesome.

One of my favorite series, Extra Credits, a normally video-game-oriented show, created a mini-series a year ago called Extra History, in which they told the story of the Punic Wars in their typical educational fashion. That mini-series is easily among the top 5 videos they've ever made, but they couldn't justify continuing it because their main sponsor was a gaming magazine. With Patreon, that has changed. Direct funding from their viewers means there's enough interest in Extra History to justify more and more videos, and it's great.

With direct support supplementing ad sales, creators can make judgements based on interest instead of solely on popularity. This means they can make more of the kinds of videos they want to make, instead of just what will be popular. If you haven't already, take a look at Patreon. The amounts you can sign up for are trivial, but they make a difference.

BiteofanApple is licensed under a Creative Commons Attribution 4.0 International License.