Releasing code is my thing… I used[1] to ship Mac OS X updates at Apple, Firefox updates at Mozilla, the web frontend at Facebook, and many mobile apps at Facebook. Through those experiences I’ve seen a lot of what works and what doesn’t, which platforms do things better, and which tweaks could improve the lives of developers shipping software.
Lately I’ve been giving talks at conferences and startups[2] about how to ship mobile apps at scale. While many of the experiences and challenges of popular apps like Facebook are unique, it is becoming clear that much of the experience of shipping apps is universal. I decided to write down some of the current frustrations and suggested changes, starting with how I would change the App Store. It got so long that I broke it into a series of blog posts.
Note: I am sure there are other issues that can be fixed in the App Store, particularly related to discovery and monetization. I know nothing about those, as the apps I worked on had brand recognition with millions of users and were free.
I have already given many of these suggestions directly to the TestFlight / iTunes Connect team[3], though some are new. I’ve ordered my suggestions from what I think would have the highest impact to the lowest. This is the second suggestion; the first was to get rid of app review in some situations.
👥 Raise the beta tester limit
While Apple really should get rid of beta app review, there is another large beta pain point affecting virtually every successful app developer–limited testers.
Apple currently allows iOS developers to ship pre-release apps to 2,000 beta testers via TestFlight. This is not enough testers to validate apps broadly at scale.
It is impossible for 2,000 testers to cover all the combinations of OS versions, networks, devices, and usage patterns. Even if those factors do not affect your particular app, the inherent statistical properties of finding issues in a population point to the need for a larger testing audience.[4]
Facebook itself has over 11,000 employees, so adding 2,000 external testers doesn’t provide much value for an app that has millions of users. Even for smaller apps and companies, the prevalence of A/B testing and feature flags means that with n flags you often need to split your beta audience into 2ⁿ groups to test every combination. This is virtually impossible with 2,000 testers.
Google Play has no such limit and leaves it up to the app developer to determine the optimal number of testers necessary to ensure production quality. Facebook has over 2,000,000 daily beta testers on Android making sure all the various devices, networks, languages, and features are tested and the app is ready for release. This large beta population allows Facebook to use beta as a predictor of production–the whole reason for having a beta and something that is impossible with 2,000 testers.
Every major app developer I have talked to thinks shipping on iOS is scary while shipping on Android is a breeze–largely because of beta audience size. There are more frequent “oh shit” moments and mad scrambles on iOS for bugs that slipped to production due to the lack of a large beta audience. Of course, fixing those bugs quickly and pushing out a fix (after a potentially lengthy review) is scary too and can cause even more issues. No one feels good or safe pushing to the App Store, because they are shipping an app tested by at most 2,000 testers to (hopefully) millions of users–it’s always push the button and pray. The 2,000 beta tester limit is hurting users by allowing more bugs to slip through and causing unnecessary churn for app developers.
Someone at Apple told me that there is no technical reason why the limit is at 2,000. They said it is a policy that came down from on high. This policy appears to have been created by someone who has never shipped an app, and it needs to change.
Apple’s own iOS and Mac OS X beta testing programs have millions of testers, so they clearly recognize the benefits of having a larger-than-2,000 testing pool. Not allowing developers the same benefits is downright insulting.
Apple should change the TestFlight user limit to 2,000 testers or 5% of active installs, whichever is greater. This will allow app developers to test at the scale of their app without routing around app review.
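The proposed rule is simple enough to state as a one-line function. This is only a sketch of the suggestion above, not anything Apple implements; "active installs" is taken as a plain integer input, and rounding 5% down to a whole tester is an assumption.

```python
def proposed_tester_limit(active_installs: int) -> int:
    """Proposed TestFlight cap: 2,000 testers or 5% of active installs,
    whichever is greater (rounding 5% down is an assumption)."""
    return max(2000, active_installs * 5 // 100)


print(proposed_tester_limit(10_000))       # small app: stays at the 2,000 floor
print(proposed_tester_limit(100_000_000))  # very large app: 5% of installs
```

The 2,000 floor keeps small apps at today's limit, while the percentage lets the beta pool grow with the app's actual user base.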
Next I have some suggestions for how apps are deployed in the App Store.
- I am now doing my own stealth startup in this space. ↩
- If you want me to talk at yours, email me at my first name at my last name.com or contact me on Twitter @LegNeato. ↩
- If anyone at Apple needs more context or wants confidential information, people in the Program Office know how to get in touch with me. ↩
- As Philipp von Weitershausen pointed out on Facebook: “if you have an error that occurs for users, say, 5% of the time, and you want to make sure you catch it with 99% probability, you need log(1-.99)/log(1-.05) = 90 testers. Now imagine the error rate is much lower, and you have a bunch of those low-firing errors. And perhaps you want to catch more than 99% of them… You get into the 1,000s and 10,000s quickly.” ↩