Wednesday, August 26, 2015

How is testing (probably) done at Facebook?

As a social networking platform, Facebook meets and sometimes exceeds the defining characteristics of a platform: it operates at massive scale, its product morphs at will (adopting a different shape as needed, like the Transformers), it extends new features to users and customers via exposed APIs, and it embraces third-party collaboration.
Since Facebook has architecturally reached a certain level of platform maturity, it makes an interesting case study in software testing: the approaches they have used have helped build quality into the platform and have made the engineering teams productive. One thing to note is that little information is publicly available on how Facebook does testing, so the data below is largely my inference from going through the relevant content (cited below). It may or may not be accurate, as it hasn't been officially validated by anyone from Facebook, but do enjoy reading.
The data in the table below is organized from the following two sources; the quotes in the Comments field of each row are direct excerpts from them:
http://www.stickyminds.com/interview/facebook-s-no-testing-department-approach-interview-simon-stewart (an interview with Simon Stewart)
http://www.quora.com/Facebook-Engineering/How-is-software-testing-done-at-Facebook (a Quora thread on Facebook engineering)


Category: Independent testing department
Facebook's approach: No
Comments: Facebook's approach to org design is different. Even though they are heavily focused on mobile development, they don't have a separate mobile department:
We don’t have a “mobile department” since we found that hard to scale appropriately. Instead, teams working on features, such as photos or the News Feed, own that feature on every platform we support, from the mobile web, “traditional” desktop browsers, through to mobile platforms such as iOS and Android.

Category: Approach to software releases
Facebook's approach: Agile, rapid
Comments: There are a number of different models for how to do software releases, no matter whether it’s on the web or an app to mobile, but they all play with three factors: time, features, and quality. Naturally, quality should always be pegged to “as excellent as possible,” so that leaves a choice between choosing to release when a suite of features is ready, or doing time-based releases.
The feature-based releases seem appealing on the surface, but prove problematic to deliver consistently. After all, when was the last time you saw every software project at a company meet its planned ship date with everything working as expected? So releases get held up as some features are finished before others, and sometimes features need to be bumped as priorities change.
All that means that we do time-based releases. Our release cycle has the app ready to ship every four weeks, though it might take longer than that to get into people’s hands because we also need to get into the app stores.
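
For illustration, here is a minimal sketch (my own construction, not Facebook tooling) of how a fixed four-week release train turns the ship date into a pure function of the calendar rather than of feature readiness:

```python
from datetime import date, timedelta

# Hypothetical illustration: on a time-based train, the next cut date
# depends only on the calendar, never on whether a feature is "done".
CYCLE = timedelta(weeks=4)  # the four-week cadence described above

def next_cut(first_cut: date, today: date) -> date:
    """Return the next branch-cut date on a fixed four-week train."""
    cycles_done = (today - first_cut) // CYCLE + 1
    return first_cut + cycles_done * CYCLE

print(next_cut(date(2015, 1, 5), date(2015, 8, 26)))  # -> 2015-09-14
```

A feature that misses a cut simply rides the next train four weeks later, which is what keeps the schedule predictable.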

Category: Anatomy of Facebook's testing
Facebook's approach: Layered approach to testing, majorly automated
Comments:
The improvements we’re making now may be less obvious, but we have automated tests which track things like power consumption, memory and CPU usage, and how we use bandwidth, the goal being to improve (or, at least, hold steady) all of those metrics with each release.
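
To make the "improve, or at least hold steady, with each release" idea concrete, here is a hedged sketch of what such a metric regression gate can look like; the baseline numbers, metric names, and 5% tolerance are invented for illustration:

```python
# Illustrative only: compare per-build resource metrics against a recorded
# baseline and flag any metric that regresses beyond a tolerance.
BASELINE = {"cpu_ms": 120.0, "memory_mb": 85.0, "bandwidth_kb": 340.0}
TOLERANCE = 0.05  # allow 5% drift before calling it a regression

def check_regressions(measured: dict) -> list:
    """Return the metrics that got worse than baseline * (1 + tolerance)."""
    return [name for name, value in measured.items()
            if value > BASELINE[name] * (1 + TOLERANCE)]

failures = check_regressions(
    {"cpu_ms": 131.0, "memory_mb": 84.0, "bandwidth_kb": 339.0})
assert failures == ["cpu_ms"], failures  # cpu_ms exceeded its 5% budget
```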

The key to this speed is automation. There’s just no time to do a full manual run through of every feature before a release. A traditional QA department, following scripts to verify that everything worked as it should, would dwarf our development team. Instead, we’ve placed layers of automated tests to ensure that regressions are as infrequent as possible.

In order to enable fast release cycles, feedback loops need to be as tight as possible. There’s no space in this for QA to be kept at arm’s length or until the end of the process (which is madness: “Quality” isn’t something you can add as an afterthought).

Another facet of our testing matrix is site behavior testing. Michael Stockton and other engineers have put a lot of effort into making it possible to asynchronously test the site as users use it. We use WebDriver (http://seleniumhq.org/projects/w...) to run site behavior tests like being able to post a status update or like a post. These tests help us make sure that changes that affect "glue code" (see http://en.wikipedia.org/wiki/Glu...), which is pretty hard to unit test, don't cause major issues on the site.
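
The sources don't show Facebook's actual tests, but a site behavior test of the kind described, written with the Selenium WebDriver Python bindings, might look roughly like this; the URL and element selectors are placeholders, not Facebook's real markup:

```python
# Sketch of a WebDriver "site behavior" test in the spirit described above.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
try:
    driver.get("https://example.test/home")  # stand-in for the site under test
    # Type into a hypothetical status composer and submit it.
    composer = driver.find_element(By.NAME, "status")
    composer.send_keys("Automated smoke test post")
    driver.find_element(By.CSS_SELECTOR, "button[type=submit]").click()
    # Behavior check: the new post should now appear in the feed.
    feed = driver.find_element(By.ID, "feed")  # hypothetical feed container
    assert "Automated smoke test post" in feed.text
finally:
    driver.quit()
```

Notice that the test exercises the glue between composer, backend, and feed end to end, which is exactly the layer unit tests struggle to reach.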

Engineers can also use a metrics gathering framework that measures the performance impact of their changes prior to committing their changes to the code base. This framework (which is crazy bad ass btw) allows an engineer to understand what effects their changes have in terms of request latency, memcache time, processor time, render time, etc.
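
That framework itself isn't public, so the following is only a toy sketch of the general idea: a hook that accumulates per-phase timings an engineer could inspect before committing. All names here are my own invention:

```python
# Invented illustration of a pre-commit metrics hook: time each phase of a
# request and report the numbers an engineer would compare before committing.
import time
from contextlib import contextmanager

@contextmanager
def measure(metrics: dict, name: str):
    """Accumulate wall-clock time for one phase of a request."""
    start = time.perf_counter()
    try:
        yield
    finally:
        metrics[name] = metrics.get(name, 0.0) + (time.perf_counter() - start)

metrics = {}
with measure(metrics, "render_time"):
    time.sleep(0.01)   # stand-in for rendering work
with measure(metrics, "memcache_time"):
    time.sleep(0.002)  # stand-in for cache lookups
print(metrics)  # e.g. {'render_time': 0.010..., 'memcache_time': 0.002...}
```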

We're still tuning the testing process in order to maximize engineer efficiency and minimize the time spent waiting for tests to run. Overall, the priorities are speed of testing, criticality of what we test, and integrating testing into every place where test results might be affected or might guide decision making.

Category: Maintenance of automation
Facebook's approach: Disabling not-needed tests
Comments: One of the things that Facebook does is to only promote automated tests into their regular test runs once they’ve demonstrated stability. We’re ruthless about disabling flaky tests, and equally ruthless about deleting disabled tests.
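
As a rough sketch of that "promote only stable tests" policy (the run count and promotion rule are my assumptions, not Facebook's actual process):

```python
# Run a candidate test repeatedly; only admit it to the main suite
# if every run passes. One flake keeps it quarantined.
def is_stable(test_fn, runs: int = 20) -> bool:
    """A test earns promotion only if it passes all `runs` attempts."""
    for _ in range(runs):
        try:
            test_fn()
        except AssertionError:
            return False  # flaky: keep out of the regular test runs
    return True

def candidate_test():
    assert 1 + 1 == 2  # deterministic, so it will be promoted

print(is_stable(candidate_test))  # True -> promote into the regular runs
```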

Category: Focus on regression-specific automation
Facebook's approach: Big time
Comments: Ultimately, the automated tests are taking more and more of the strain out of development, because regressions are being caught sooner, and therefore being fixed faster, sometimes before the code has been committed.

Category: Test automation ROI philosophy
Facebook's approach: Cost vs. gains
Comments: In the film Fight Club, there’s a scene where one of the characters explains how the auto industry chooses whether or not to recall a vehicle. It’s an equation that goes something like: “the cost of a recall” needs to be less than “the cost of a payout if something goes wrong” multiplied by the “likelihood of a payout being needed.” Automated tests are much like that: the cost of writing and maintaining them (however you measure “cost”) needs to be lower than the cost of not writing them.
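
Restated as code, the recall equation applied to test automation looks like this; the numbers in the usage lines are illustrative only:

```python
# The Fight Club recall equation, restated for test automation:
# automate when cost_of_tests < cost_of_failure * probability_of_failure.
def worth_automating(cost_of_tests: float,
                     cost_of_failure: float,
                     probability_of_failure: float) -> bool:
    """Return True when writing and maintaining the tests is the cheaper bet."""
    return cost_of_tests < cost_of_failure * probability_of_failure

# Illustrative numbers only: 40 hours of test work vs. a 1-in-5 chance
# of a regression that costs 400 hours to detect, triage, and hotfix.
print(worth_automating(40, 400, 0.2))  # True: 40 < 80
```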

Category: Approach to manual testing
Facebook's approach: Mainly dogfooding, crowdsourcing, internal employees focused on testing
Comments: In the initial days, Facebook started with manual testing only and then slowly evolved.

During those four weeks, every day we push a new build of the app to “dogfooders” within the company (a charming phrase, coined by Netscape, which describes the process of “eating your own dogfood”; you naturally want it to be as tasty as possible). All Facebook staff are encouraged to try out the “release candidate” builds. That means that by the time our app lands on your phone, you can be sure that it’s been given a thorough test drive.

Outside of setting the expectation that the individual engineers and their teammates are going to test their particular changes, we also put huge emphasis on “dogfooding” (see http://en.wikipedia.org/wiki/Eat...) changes to the site for up to a week before the general user will see the changes. This means that testing the site falls on the employees using the site overall. We all pride ourselves on finding and filing bugs that we find as we use Facebook for our own purposes. Every FB employee uses the site differently, which leads to surprisingly rich test coverage on its own.

There is also a swath of testing done manually by groups of Facebook employees who follow test protocols. The results (or I should say issues) uncovered by this manual testing are aggregated and delivered to the teams responsible for them as part of a constant feedback/iteration loop.

Category: Culture of testing
Facebook's approach: Initially less, now built into the engineering process
Comments: We started from a position of not really having a culture of testing, but that’s changing over time as people see the value in the existing tests we have.

Category: Testers and coding skills
Facebook's approach: High coding skills
Comments: One refrain I hear occasionally is that knowing how to program will somehow “damage” a tester, because they understand how the software and machines work. My view is the exact opposite: knowing how something works gives better insight into potential flaws. Essentially, I think that understanding how to code widens the set of tools available to a tester, without diminishing what they can do in the slightest.

Category: Focus on quality
Facebook's approach: High
Comments: On the other hand, we deeply respect the people who have chosen to spend their time on Facebook. One of the mantras of our release engineers is that no release should ever leave a person worse off than they were before.

Category: Approach to defect prevention
Facebook's approach: Integrated with the development process
Comments: We also have an extremely robust lint process that runs against all the changes an engineer is making. The lint process flags anti-patterns, known performance killers, bad style, and a lot more. Every change is linted, whether it's CSS, JS, or PHP. This prevents entire classes of bugs by looking for common bug causes like type coercion issues, for instance. It also helps prevent performance issues like box-shadow use in mobile browsers, which is a pretty easy way to kill the performance of big pages.
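
A toy version of such a lint rule, scanning CSS for the box-shadow example mentioned above, could look like the following; real linters are of course far more thorough, and this rule table and file handling are simplified for illustration:

```python
# Toy lint pass: scan changed CSS files for a known mobile performance
# killer and report each hit with file and line number.
import re
import sys

ANTI_PATTERNS = {
    r"box-shadow\s*:": "box-shadow hurts scroll performance in mobile browsers",
}

def lint_css(path: str) -> int:
    """Print a warning for every anti-pattern hit; return the hit count."""
    hits = 0
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            for pattern, message in ANTI_PATTERNS.items():
                if re.search(pattern, line):
                    print(f"{path}:{lineno}: {message}")
                    hits += 1
    return hits

if __name__ == "__main__":
    # Non-zero exit blocks the change, mirroring a lint gate on every commit.
    sys.exit(1 if any(lint_css(p) for p in sys.argv[1:]) else 0)
```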
