The Highly Experimental Blog of Erik Rose

An Open Letter to Subscription-Software Authors

Wed 16 August 2017

Otherwise upstanding software companies are moving toward rental models. This saddens and disturbs me. While perhaps lucrative in the short term, this business model deceives users about the nature of software and, long-term, imperils the future of quality products by throwing the incentives of maker and user out of alignment.

First, let’s cast a straw man aside: if what users are really paying for is the time-proportional maintenance cost of contracts (Netflix) or infrastructure (Dropbox), we don’t have a problem. If Dropbox’s or Netflix’s servers go away, their apps are immediately useless, and those servers are physical objects which fall apart at a rate proportionate to the passage of time. But bytes on a disk don’t need a continual infusion of money, and pretending otherwise is simply dishonest.

Now let’s examine some of the common justifications for the rental model.

It’s easier for our users. They now get the app on all platforms “for free”. In reality, you had to develop the app separately for each platform, so it makes sense for the purchase price to correlate with the amount of your effort the customer is enjoying. Proportionality makes markets efficient. Not selling enough copies on the Amiga? The Amiga version goes away, and neither you nor the non-Amiga user has to pay for it anymore. This also leaves room for someone who can more efficiently maintain an Amiga version to enter the market. Further, I’d like to meet any user who thinks paying more for software is “easier”.

It takes money to keep developing features. Of course! And people have been paying for features—in proportion to their utility—for decades. However, you now seek guaranteed revenue regardless of whether you deliver value. Is it not fairer to offer something for sale and let users decide, version by version, whether it is worth their dollar?

We want to stay in business to keep offering good software for you. That is a fine goal, and I hope you succeed. However, your inability to save money toward future development is not your users’ fault, nor can you ethically make it their problem. Why should the software business be immune from risk? It already enjoys zero marginal cost. Is it too much to expect you to invest your revenue wisely?

It takes money to keep up with fixes and OS updates. Users should rightly pay for those. We as software makers need to get over our guilt at charging for maintenance; it’s where most development effort goes, after all. If users feel they have a moral right to perpetual compatibility updates, they need to be educated out of their naivete. No one expects their faucet manufacturer to come to their house to replace O-rings forever, and software’s more abstract nature changes nothing. But, again, this is something for users to decide case by case. If they wish to remain on an old OS with old software, that is their choice: the sheer passage of time while bytes on disk remain unchanged costs no money, and we should not hoodwink people into believing that it does.

A major impetus behind this reality-denying movement is the App Store model, in which users pay once and then enjoy free updates forever. It is short-term thinking. Clearly, in a finite world, depending solely on new user acquisition for funding is infeasible, as some would-be rentiers point out. However, the most market-efficient model is the one that has worked well for half a century:

You make an update and offer it for sale.
Existing users decide whether that update is worth their money, whether for new features or for fixes.
Either it sells, and you make a profit, or it does not, and you have a clear signal from the market to change something.

The elegant coupling between money made and value delivered rewards good software and punishes bad.

But what should sellers do who are captive to an app store? The least-bad alternative is to occasionally make a clean break: publish Foo Version 2, and charge for it again. Be upfront with your users that this is your model so everyone can make informed decisions. On iOS, Apple now supports sharing storage across a single vendor’s apps, so migrating data to the new version can be transparent.

Telling is the success of agile upstarts like the makers of Affinity Photo. They are vacuuming up the users who scoff at perpetually renting incumbents like Photoshop, proclaiming “No subscription” right on their front page. I look forward to giving them my dollars, not just in this initial loss-leading phase but as they ultimately fall back to a charge-for-upgrades model. And if they instead choose to rent, I look forward to offering my dollars to the next company who is willing to honestly earn them.


Privacy: An Operational Definition

Mon 22 February 2016

I value privacy. I inconvenience myself daily to preserve mine, keeping my cell radio off, using encryption, and distrusting other people's servers. But, when I recently sat down to think about a project that had privacy implications, I realized I was missing an operational definition of the word. While I could tick off examples of violations, the generalization was missing. So here is my attempt to rectify that.

At its root, privacy is the ability to keep “secrets”. Secrets aren't necessarily nefarious ones, for our purposes here; they are usually prosaic, like those we protect by closing the bathroom or bedroom door. Here are some other kinds of potential secrets:

Your location
Your activities
Your communications
Your associations with others
Your health
Your plans for the future
Your stored data or keys to unlock it
Your opinions or beliefs
Generalizations about any of these (e.g. metadata)

Some people hold certain of these more dear than others. Most will disclose them in some situations and withhold them in others. The above list isn't comprehensive, but it shows the breadth of the possibility space.

We can then define privacy in terms of those “secrets”:

Privacy is being secure against...

The forced appropriation or sharing of “secrets”
Storage of secrets longer than is necessary to achieve an agreed-upon transaction

“Forced” goes beyond mere physical force or legal compulsion. It also includes any coercive effect achieved by an imbalance of clout, time, or expertise. For example, force includes making ordinary activities contingent on (or unnecessarily annoying apart from) the disclosure or storage of secrets: things like driving (under the watchful eyes of license plate scanners), talking on the phone (leaving logs that persist long after the bill is paid), using a computer program (which phones home with your habits), using a mapping app (which leaks your location), buying or selling, reading a document, or borrowing a book. When you visit a web page containing a Facebook Like button, not realizing that the mere presence of the button tells Facebook what page you're on, force has been employed by means of an imbalance of expertise. Force is also commonly effected through confusion, as through unclear laws, confusing privacy UIs, or promiscuous defaults.

Concretely, then, why do I want privacy? What do we lose if we give it up?

Violations of privacy facilitate criminal prosecution of ordinary behavior. Civil-liberties lawyer Harvey Silverglate estimates you commit 3 felonies a day. Having everyone be “a criminal on paper” is a great tool for oppressive regimes.
Stored secrets routinely leak to unintended recipients. In 2015, there was a smorgasbord of high-profile leaks, like Target (which leaked banking info), Premera (medical claim info), the IRS (tax records), and mSpy (chat logs). Storage and sharing are really equivalent, as time goes to infinity; computers are blabbermouths.
Leaks create chilling effects on behavior and discussion, public and private. This inhibits democracy, since it multiplies the risks of researching or debating unpopular ideas. Sometimes risky ideas are good ones: Alexander Hamilton and James Madison had to adopt pseudonyms to discuss their politics in print.
Privacy promotes peace in a diverse population, since without it you cannot selectively withhold your religious, political, or romantic preferences from those who might take offense.
Privacy is necessary to the development of the self, allowing one to experiment with different ideas and behaviors while constraining social costs to a manageable scope.

Clearly, privacy is not just a fuzzy feeling but rather a real thing with concrete consequences. Is my cautious value system the right one? Will it become less necessary as laws and social etiquette evolve to either defend the privacy of individuals—or, perhaps, just cut them some slack? Or is this going to be a drawn-out technical war with safe languages, onion routing, and endpoint security playing major parts? Those are questions for another post, but a rigorous discussion of any of them will be well-supported by the foundation of a good definition.


Why Types Are Not Documentation

Tue 16 June 2015

Type annotations have been oozing into Python and getting all over everything this week. One of my open-source libs is getting a big update, and we’re playing with Sphinx’s type hints for arguments. Elsewhere, friends on Twitter question the need for comprehensive documentation in general, proposing to hoist much of it up into type systems. And I continue to tinker away on an experimental new language, which prods me to think about types as both an implementor and a prospective user.

I keep returning to 3 distinct things that often get mixed up:

Constraints
Behavior
and Intent

With these ideas in mind, the roles and limitations of types, tests, and documentation become clear.

The Coming Naming Mess

First, how will explicit constraints, in the form of type annotations, change the Python idiom?

Any programming community is the sum of its language’s specification and its accumulated usage patterns. You can derive one from the other no more than you can reconstitute the English vulgate from a dictionary. As type annotations gradually infiltrate Python culture, I feel myself pulling back. This isn’t because it’s bad to let machines reason about constraints; they love that stuff, and it would be nice to catch some mistakes more automatically. But type information is already woven into the Python idiom, and simply slapping the type fish on top makes for an unappetizing concoction.

Specifically, Python nouns—like the function arguments that type annotations describe—idiomatically include both semantic information and structural hints.

def frob(file_path, should_flush):
    ...

In the above, file_path is clearly a string, and should_flush is obviously a boolean. Just adding type annotations does not improve clarity at all; in fact, it adds noise for the native reader:

def frob(file_path: str, should_flush: bool):
    ...

Note the redundancy—not even beneficial by dint of being separated in space—of “path” and “str”, “should” and “bool”.

To add type annotations without sacrificing readability, we must revise our idiom, lifting out the type information rather than just repeating it:

def frob(file: str, flush: bool):
    ...

This is fine in isolation. We did lose some resolution as “path” became “str”, but in this case it isn’t too bad. However, think of what happens as codebases collide: mixtures of Python 2 with Sphinx type hints, pure Python 3 with annotations, and code that targets 2 and 3. Naming conventions will necessarily be either redundant on 3, uninformatively terse on 2, or an inconsistent mixture. I don’t look forward to that transitional period. And given that we’re already 7 years into the adoption of Python 3, it may be a very long one. Indeed, we can't go around blithely renaming kwargs (lest we break callsites), so it may be permanent.
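
To make the call-site worry concrete, here is a minimal sketch. The names frob_old, frob_new, and call_with_old_keywords are hypothetical, invented only for this illustration; the point is simply that a caller who passes arguments by keyword is coupled to the old spellings.

```python
def frob_old(file_path, should_flush):
    """Original spelling: the argument names carry the type hints."""
    return (file_path, should_flush)


def frob_new(file: str, flush: bool):
    """Renamed spelling: the annotations carry the type hints."""
    return (file, flush)


def call_with_old_keywords(func):
    """Call func the way an existing keyword-style call site would."""
    try:
        func(file_path='notes.txt', should_flush=True)
    except TypeError as exc:
        # Python rejects keyword arguments the function does not declare.
        return 'broken: {}'.format(exc)
    return 'ok'


print(call_with_old_keywords(frob_old))
print(call_with_old_keywords(frob_new))
```

The first call succeeds; the second reports a TypeError about an unexpected keyword argument. Positional callers would be unaffected, which is part of why the breakage is easy to miss.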

Can We Extract Intent From Constraints?

The siren song of type systems is that a sufficiently advanced one can substitute for documentation. This, in my experience, is true of only the most mechanical statements: this arg is a string, only one worker can access this value at a time, this function performs IO. I have yet to encounter generated documentation that makes newcomers shriek with joy at its elysian clarity. Readers of C++ or Java-derived autodocs—even on the level of individual subroutines—generally shriek with other emotions entirely.

The reason is that types—at least as we have them today—are only constraints. They specify invariants that hold as we move through a program, and, from a human perspective, they tend to be fairly low-level trivialities. (In fact, “type” is an unfortunate word, a carryover from early languages where all they guarded was the difference between floats and chars. We could more usefully call them “invariants” if that weren’t already taken.) Types convey structure and allow us to mechanically enforce rules for keeping it intact. But structure alone is not enough to convey meaning.

For example, what can you know about this well-typed function?

def f(x: str, y: int) -> str:
    ...

It has some constraints in effect, but, as long as it meets those, it can do anything, even just returning "foo" all the time. It could be a string-repetition function that repeats its first argument a certain number of times, but we can’t deduce that from only the types. Let’s fill in an implementation and see if that helps:

def f(x: str, y: int) -> str:
    return '{}#{}'.format(x, y)

Now our function has behavior. It’s clear what it does, but it could still mean a great many things; we could not yet write a test for it. What is it trying to do? Is it succeeding in it? Let’s add naming and see intent begin to filter in:

def source_url_with_line_anchor(path: str, line: int) -> str:
    return '{}#{}'.format(path, line)

A lot of meaning comes flying out of the names, basically for free, just by choosing good ones rather than bad. But we’re still not quite done.

def source_url_with_line_anchor(path: str, line: int) -> str:
    """Return a URL to a source code file at a given line.

    :arg path: The checkout-relative path to the source file
    :arg line: The line number to point to

    """
    return '{}#{}'.format(path, line)

With actual documentation in place, the intent—not just the actual behavior—of the function finally becomes clear. At this point, we could write tests, whose purpose is always to map intents to (testable) behaviors. We would notice that the function is buggy: it should URL-escape any weird chars that make their way into path. Of course, we could define a Url class and lift some of the documented intent up into the type annotations, but that gets us only a baby step closer: there’s still no telling which URL we intend to return. If type systems went that far, we would have no need of any other facet of programming. Though, at that point, I suspect the required constraint declarations would look a lot like Prolog and be so incomprehensible as to send us scurrying back to handcrafted documentation. We’ve been playing with constraint solvers for over 40 years, and even the most mature ones are not shy about expressing intent in English. For that matter, it’s been shown that C++’s template system is Turing-complete, but I wouldn’t want to trade my docs for it.
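
As a rough sketch of where the documented intent leads, here is one way the escaping bug might be fixed and pinned down with tests. Using urllib.parse.quote is my assumption about a reasonable fix, not the only one, and the sample paths are made up for illustration.

```python
from urllib.parse import quote


def source_url_with_line_anchor(path: str, line: int) -> str:
    """Return a URL to a source code file at a given line.

    :arg path: The checkout-relative path to the source file
    :arg line: The line number to point to

    """
    # quote() percent-encodes characters such as spaces and "#" but
    # leaves "/" alone by default, so directory separators survive.
    return '{}#{}'.format(quote(path), line)


# Tests map the documented intent onto checkable behavior.
assert source_url_with_line_anchor('src/main.py', 12) == 'src/main.py#12'
assert source_url_with_line_anchor('my docs/a#b.py', 3) == 'my%20docs/a%23b.py#3'
```

Note that nothing in the signature changed between the buggy version and this one: the types were satisfied all along, and only the docstring told us what "correct" meant.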

It’s intent that humans need in order to understand a system. We grope for intent even when it’s not there, in all kinds of complex systems: casting viruses as wanting to replicate and gasses as wanting to bubble up out of solution. Perhaps we all make the best use of our wetware by overzealously anthropomorphizing everything. But as programmers, we should recognize that constraints are for proof machines, behaviors are for tests, and intents are for each other.


On Interviewing

Wed 17 July 2013

I got a mail the other day from a co-worker who asked, essentially, “Do you have tips for interviewing senior developer candidates?” I accidentally a whole long thing, so here it is in its entirety, with names changed because I really like the name “Zeke” and don’t get to use it much.

Hi, Zeke. It was interesting to see the other day that Google analyzed their interview practices and found exactly zero correlation between interview performance and performance on the job. So don’t do what they do. :-) Typically, they ask some really tricky questions—it often seems like the interviewers are using the interview as an opportunity to show how clever they are.

I can’t tell you if my method works, either, but I’ve had very few regrets in hiring. In short, I interview for the basics: I don’t care if you can balance a red-black tree, because nobody ever does that in our work, and you can always look it up. I just want to know if you can rub two lines of code together (fizzbuzz, which I do during initial screens), pick some sane tests (usually I make them write tests for fizzbuzz, which makes them refactor it to use generators or whatever), and understand the rough performance characteristics of a dict. Anything else is a bonus. I guess my overarching criterion is “Would I let this person loose on my project?”
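
For the curious, here is a minimal sketch of the kind of refactoring I mean, assuming the usual fizzbuzz rules. The function names are hypothetical. Once the sequence is a generator rather than a loop full of print calls, a test can simply compare values.

```python
from itertools import islice


def fizzbuzz_words():
    """Yield the fizzbuzz sequence forever, starting from 1."""
    n = 1
    while True:
        if n % 15 == 0:
            yield 'FizzBuzz'
        elif n % 3 == 0:
            yield 'Fizz'
        elif n % 5 == 0:
            yield 'Buzz'
        else:
            yield str(n)
        n += 1


def first_fizzbuzz_words(count):
    """Return the first count items of the sequence as a list."""
    return list(islice(fizzbuzz_words(), count))


# A few sane tests: one plain number and one of each special case.
words = first_fizzbuzz_words(15)
assert words[0] == '1'
assert words[2] == 'Fizz'
assert words[4] == 'Buzz'
assert words[14] == 'FizzBuzz'
```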

My goal during an interview is to be convinced that the candidate is awesome, and I act as a facilitator to help him or her convince me. The best sort of interview is the sort I had with you. I saw an interesting project on your resume (your web-based SNES emulator) and just let you expound on it for half an hour, asking probing questions here and there. In the course of that conversation, you made clear that you understood hash tables, client/server design, and even hardware architecture. That was really ideal.

The worst sort of interview is the quiz-show sort, which, in its purest form, can feel adversarial and artificial. This sort even offends some very senior candidates. (Though I’ve had some “senior” DBA candidates be offended at simple SQL questions and then utterly fail them, so don’t take offense as a proxy for expertise.) I’m sorry to say that about two thirds of my interviews do degrade into a quiz show, and that generally indicates a lack of preparation on my part. When I have time, I Google-stalk the hell out of the candidate and try to find at least one relatively complex project of theirs to pick on, like your SNES thing. The earlier in the interview process somebody thinks to ask “What’s a piece of code that you’re proud of?”, the easier this is. Ask it in a screen, and it can inform all subsequent interviews.

Here are the areas I like to cover over a series of interviews. Pass all or most of these, and I am happy to let you loose on my project:

Basic coding (fizzbuzz, refactoring). This focuses hard on the ability to put things into good categories: do they split up code into functions that represent compact concepts so they’re easy to understand and reusable? Do they pick good names?
Testing sanity. Do they write the minimal number of tests to find the likely bugs, or are they just all MOAR TESTS ARE BETTER BWAAAAHHHH!?
Ability to navigate the open-source world. Can they write so they can be understood on a mailing list, IRC, or a wiki page? Can they take feedback and give it? This is an easy “pass” if they have a history of open-source contributions, but not everyone has that luxury.
Multi-machine architecture. Can they put memcache in the right place on a block diagram? Know what the plusses and minuses of DB replication are? Have some idea when and how to index something? This doesn’t require large-volume experience—I myself just had a history of crappy servers that couldn’t keep up with load.
Have heard of the web and grok request/response
Have some passion and opinions. I don’t want a pushover who won’t add anything to decisionmaking. I sometimes try to start a fight about editors or languages or something to uncover this.

Then I like a candidate to have one or more areas of deep knowledge:

Fancy-pants algorithms. I’m into string algorithms right now. Somebody else might be into distributed data structures.
Intimate knowledge of some data store, like an RDBMS or HBase
Mad JS, DOM, or browser chops
In-depth knowledge of Django or some other framework we use

If they pass that first list and have one or more from the second, they’re a great addition to Mozilla IMHO: they can start making stuff almost immediately, and, even before they get to know their project, they can act as an advisor in their areas of expertise from the second list.

Hopefully this comes off as useful to somebody, even if it’s just the people I’ll interview in the future. At the very least, it made me get my thoughts about interviewing in order.


I Will Stab Password Management In The Face

Thu 11 April 2013

Today I made the monumental error of trying to authenticate myself to work.com. Half an hour, two email accounts, a pair of password databases, and a missing-in-action password-reset email later, I collapsed defeated. I will be emailing my quarterly feedback to my colleagues instead. I tell you, password management needs to be killed with a knife.

This is out of control.

I grant that passwords are an insufficiently strong form of authentication. In time, Persona might wipe them all—or at least the web-based ones—off the face of the earth, putting all our eggs in the baskets of our email providers (which has its own problems). But for now, I've got more password databases than a reasonable person might have passwords, and it's getting worse by the day. This moment, I have before me 3 databases: a commercial one containing 236 entries; a Mac Keychain with 458; and Firefox's, with another 270. This is only on a single computer, mind you; there are various half-synced, slowly diverging copies scattered about on other devices, and it is rare that a password pulled arbitrarily from any one of these works on the first try. The Keychain generously offers 6 or 7 duplicates for many accounts, and I have to manually scan the mod dates to have any hope of success. This is beyond insane.

How did we get into this mess?

The ideal place for password management is, of course, in the OS. There, it can…

Work across applications and protocols
Know when to evict encryption keys from RAM
Track which applications have access to which entries and guarantee the process hasn't been tampered with since access was granted

It also doesn't hurt to have the motivational nudge of the API provider behind a password storage standard.

Though Apple made a promising start with its Keychain, it has since squandered every inch of its lead in this space. Mobile Me provided Keychain syncing across machines (though of dubious accuracy and at an additional $100 per year). iCloud provides none. Older versions of 1Password stored their data in the Keychain's single-file format, which made syncing treacherous. Newer ones abandon it for a custom, multi-file format which can be synced more easily. The rats have left the ship, and my Keychain entries are noticeably moister each time I use them.

It is a shame that Apple has no apparent interest in bringing the Keychain up to snuff, as nothing else provides the smooth integration made possible by its privileged place in the OS. Firefox Sync, LastPass, and Persona are all limited to web-based passwords. And more general databases like 1Password and PasswordWallet are still cumbersome in their ability to remember and auto-fill non-web credentials: ssh, SFTP, wifi, mail, calendar, and encrypted disk images. The world is more than the web.

The cross-platform curse

Because every third-party tool is decidedly more at home on one OS than the others, we end up in the insufferable position of being unable to edit our credentials on one device or another:

1Password has been promising edit support on Android "real soon now" for years.
PasswordWallet has great iOS sync but none on Android.
KeePassX and LastPass take the diplomatic tack of looking and acting equally abhorrent on all OSes.

The way out

Since we clearly cannot rely on any single party to write a password manager to meet all needs…

Web and non-web
Syncing
First-class support on all platforms and devices

…the obvious answer is a standardized file format we can all share. The Agile Keychain is a pretty darn good swat in this direction. It's just JSON, encrypted with 128-bit AES. It stores one entry per file, so you can use simple tools like rsync to synchronize it. Nobody's claiming any patents on it, and Agile Bits has published a rather nice sketch of the spec.
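
To illustrate why one entry per file plays so well with file-level sync tools, here is a toy sketch. It is emphatically not the real Agile Keychain layout: the file names, the fields, and the save_entry and load_entries helpers are all hypothetical, and encryption is left out entirely to keep the example short.

```python
import json
import os


def save_entry(keychain_dir, entry_id, entry):
    """Write one credential entry to its own JSON file.

    Hypothetical layout for illustration; a real store would encrypt this.
    """
    path = os.path.join(keychain_dir, entry_id + '.json')
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(entry, f, sort_keys=True)
    return path


def load_entries(keychain_dir):
    """Return a dict mapping entry id to entry for every file in the folder."""
    entries = {}
    for filename in sorted(os.listdir(keychain_dir)):
        if filename.endswith('.json'):
            path = os.path.join(keychain_dir, filename)
            with open(path, encoding='utf-8') as f:
                entries[filename[:-len('.json')]] = json.load(f)
    return entries
```

Because changing one password rewrites only that entry's file, two devices that edit different entries never touch the same file, and a tool like rsync has nothing to merge.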

Why aren't we keeping all our passwords in this?


What They Don’t Tell You Before You Move To Silicon Valley

Wed 27 March 2013

When you move to the Bay Area, people tell you “Rent is high!” and “Public transit exists!” But they don’t think to mention a lot of the day-to-day differences that take newcomers by surprise. I’ve been here a few years now, but some of the more striking ones are still fresh in my mind:

You have to shove gasoline nozzles in really hard, or gas won’t dispense. This is important to figure out quickly. I visited about five different stations and nearly emptied my tank before I realized the pumps weren’t broken—merely guarding against escaping vapors.
Most local shops welcome dogs. They indicate it by putting a water bowl out front.
Surfer speak like “right on” is unsurprising, but there’s also a trademark, rapid-fire “yeahyeah”, used to indicate casual agreement. I like it.
Cars don’t require yearly inspection, but you do need a “smog check” every two years. California doesn’t care if your brakes work or your wheels stay on, as long as your emissions are clean.
There are Bizarro World versions of various food products. Hellmann’s mayonnaise doesn’t exist, but Best Foods comes in a cosmetically identical jar. Twizzlers are replaced by Red Vines, a slightly waxier alternative. And when you say “skim milk”, you get quizzical looks. Ask for “fat-free”.
Winters are mild—you’ll love it—but many trees still shed their leaves. However, they simply turn brown and fall off; there’s no pyrotechnic detour through reds or yellows for any but a few. Even so, if you ask your native coworkers, they will wax poetic about the fall colors and tell you the exact coordinates of the one red tree in town.
Turn signals don’t work. My theory is that the lingering Reality Distortion Field bleeds the charge right out of the relays.

What about my fellow transplants? Got any more?


Function Names: To Verb Or Not To Verb?

Wed 20 March 2013

I thoroughly enjoyed Brandon Rhodes’ PyCon talk, The Naming of Ducks, at which he presented and justified a panoply of Python naming best practices, far beyond the mere spelling rules of PEP 8. Among his recommendations was a common one: name your functions as if they were verbs so you know what they do. create_database, polish_silverware, televise_revolution—all make fine function names, or so goes the conventional wisdom.

Except that I’ve almost entirely dropped this practice in the last few years.

My motivation has been simple and pragmatic: verbing a function means obscuring what it returns. What does create_database return? A connection object? The name of the database? Nothing at all? So, instead, I’ve actually taken to nouning my functions, as I care much less about what they’re doing—that’s an implementation detail I’m happy to abstract away—than what they return, which is the immediate concern at the call site. This also fits nicely into my style, which tends toward the functional:

(csv_file_reader(f) for f in files_from_zip(zip_path))

Oddly enough, function-nouning forms a neat corollary to one of Brandon’s other guidelines: naming variables to make clear their types. I’m a big fan of this: name an arg connection_seq or connection_map, and callers can probably guess what kind of data structure to pass in without reading any docs. Combine this with function-nouning, and you make two things automatically clear:

The nature of the return value
The information that factors into the function’s result—care of the nicely-named args you pass in

Thus, the input is clear, and the output is clear. In fact, the only thing that’s left a mystery is the process by which the function effects the input-to-output transformation, and I contend that’s of secondary importance: most of the time, I’m content just to have the output I need and don’t bother about the implementation. Isn’t that the entire point of abstracting functions to begin with?
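
For illustration, here is one way the two nouned functions from the generator expression above might be written. These bodies are my guess at plausible implementations, assuming UTF-8 CSV members inside the zip; only the names and the call shape come from the example.

```python
import csv
import io
import zipfile


def files_from_zip(zip_path):
    """Yield a text file object for each member of the zip at zip_path."""
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            # Read each member fully so the yielded file stays usable
            # even after the generator moves on to the next member.
            text = archive.read(name).decode('utf-8')
            yield io.StringIO(text, newline='')


def csv_file_reader(f):
    """Return a csv reader over an open text file."""
    return csv.reader(f)


def csv_readers_from_zip(zip_path):
    """Return an iterable of csv readers, one per file in the zip."""
    return (csv_file_reader(f) for f in files_from_zip(zip_path))
```

At the call site, each name says what you get back, and the argument names (zip_path, f) say what goes in. How the zip is opened or the text decoded stays hidden, which is the trade I am arguing for.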

What do you think? To verb or not to verb? Perhaps a more imperative style changes the tradeoffs, or maybe there are practical advantages in verbing that I’m not thinking of.


How to Really Reduce Greenhouse Gas Emissions

Mon 11 March 2013

My friend Fred recently got me thinking about environmental stewardship programs. Like Fred, I’m skeptical of a lot of consumer-targeting “be green” campaigns. As with hotels’ pleas to wash your towels less frequently, they are often corporate profit-boosting measures in altruistic clothing. No one ever offered me a room rate discount for using my towels a bit longer.

More generally, the “ride your bicycle” and “buy cloth shopping bags” crusades that dominate public consciousness are optimizing in the wrong places. In 2009, 16 ships burning essentially tar released as much atmospheric sulfur (the stuff that causes acid rain) as all the cars in the world.

As another example, about a quarter of greenhouse gas emissions are from electricity generation—26% in 2004. The U.S. is still 42% reliant on coal, the dirtiest fuel in common use. Per kWh generated, coal releases over twice as much carbon dioxide as natural gas, the next dirtiest fuel. So, what we should be doing is running full-speed toward safe, clean nuclear. Even old generation-II reactors are 62 times less carbon-intense than coal. Modern generation-III reactors may be even lower. The entire transportation sector—including commercial interests driving big, dirty trucks—emits only 13% of greenhouse gases, and fixing that requires energy storage technology we don’t have, along with the coordination of millions of individuals and companies to purchase new equipment. This is not where to optimize.

While carting around reusable shopping bags and driving electric cars makes us feel good, it’s an awfully inefficient way to attack the problem. If you want to conserve the environment, write to your legislators instead. Campaign for cleaner shipping fuel standards. And welcome safe, modern nuclear plants into your back yard.
