For companies that primarily do business through their public-facing Internet site, best practices for production, staging and development are imperative. For any Web-based business, whether e-commerce or presence-based, it is essential that content and systems be updated to remain competitive, which means change and that means risk.
Web-based businesses wrestle with complexity on a daily basis. Typically, there are multiple developers involved, different development timelines, numerous components being built simultaneously, tiered systems to synchronize for updates, etc. One misstep in the development and deployment cycle has the potential to cost the business millions of dollars.
In my various capacities, I have spent years refining best practices for production, staging and development environments. Current best practices include the following:
- A serious version control system (VCS) or distributed version control system (DVCS). Depending on the architecture, I’m still using Perforce, which is expensive, but über-reliable. Many folks like Git, Subversion, and Mercurial.
- Serious issue management to track platform development and bug fixes, handle actionable team communication, etc. I recommend JIRA.
- Development environments that run on developers’ personal machines in a sandbox or VM, or if sufficiently small in terms of the number of processes, just in the OS (if they’re running desktop/laptop Linux). Development environments should be a miniature version of production.
- New development that is branched in the VCS and not let into the mainline branch until complete. Branches can be team-centered or developer-centered depending on how many developers are working on the particular piece of code. But a modern VCS makes it easy to branch early and merge back in only when development is complete.
- Staging is essential and should be gate-kept. Development should be promoted to staging through the VCS. This way, the act of promotion provides a dry-run for the move from stage to production.
- Scripted tests for Web-based platforms. I suggest looking at Sauce Labs for this, or rolling your own with Selenium.
- Staging is generally used as a dry run for a move to production and final testing. Nothing should go to production that doesn’t succeed in stage.
- Stage should have a lazy copy of the entire environment if possible (full user base, etc.), but shouldn’t directly touch anything that could cause user confusion (sending email to users, etc.). It can be quite difficult to get a reliable stage environment set up, especially if you need multiple servers to do it (DB, File, GUI, etc.). But, the more complex the real environment, the more complex the stage environment and the more important it is to get staging working correctly.
- Stage is also where you perform human testing on things that can’t be scripted. There shouldn’t be a need for a lot of this on a Web app, but there are some things that are hard to do in an automated fashion (e.g., does the site look “bad”?).
- Unit tests for internal functionality need to be baked into the development process. High-volume organizations tend to automate the running of the three major test types (integration, unit, and scripted GUI). As a safeguard, a trigger in the check-in process is important so that code that fails its tests cannot be checked into the code base at all and instead remains on the developer’s machine.
- If you can do it, Continuous Integration is a boon to keep things from getting too far afield. Code compilations, environment synchronization, and spawning automated testing when check-ins occur are all great tools and can be done very inexpensively with Jenkins.
- Work on production is considered to be a complete fail. The only reason to do it is a catastrophic failure that cannot be reproduced in the stage and development environments, which is itself a sign that stage is an inadequate representation of production. Usually, such work should be limited to changing configuration parameters.
- If production breaks, roll back to the previous version of the site. If that fails, then you have no choice but to work on production.
- Note that even in catastrophic situations, changing code directly on the production servers is just as likely to make things worse, because those changes haven’t passed the rest of the testing in stage.
- You can do a lot of debugging on the production site, where debugging is gathering information about something that isn’t working correctly. You shouldn’t, however, use the production site as a test bed or development environment.
- If your tests don’t do things that you wouldn’t want done in the production environment (adding and removing users and files, changing parameters, simulating failures, etc.), then you don’t have sufficiently detailed tests.
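To make the check-in safeguard from the list above concrete, here is a minimal sketch of a gate script in Python. It assumes your VCS can run a script before accepting a check-in and rejects the check-in when that script exits non-zero (a Git pre-commit hook behaves this way, for instance). The names `TEST_SUITES`, `failed_suites`, and `gate`, as well as the `tests/unit` and `tests/integration` paths, are hypothetical placeholders for illustration, not part of any particular tool.

```python
#!/usr/bin/env python3
"""Hypothetical check-in gate: run each test suite, reject the check-in on failure."""
import subprocess
import sys

# Assumed suite commands; replace them with whatever your project actually runs.
TEST_SUITES = {
    "unit": [sys.executable, "-m", "unittest", "discover", "-s", "tests/unit"],
    "integration": [sys.executable, "-m", "unittest", "discover", "-s", "tests/integration"],
}


def failed_suites(suites):
    """Run every suite and return the names of those that exited non-zero."""
    failures = []
    for name, command in suites.items():
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append(name)
    return failures


def gate(suites):
    """Return 0 to allow the check-in, or 1 to reject it."""
    failures = failed_suites(suites)
    for name in failures:
        print(f"check-in rejected: {name} tests failed", file=sys.stderr)
    return 1 if failures else 0


# When installed as the actual hook script, finish with:
#     sys.exit(gate(TEST_SUITES))
```

Running every suite before reporting, rather than stopping at the first failure, gives the developer the full list of what is broken in one pass. The same gate can be re-run by the CI server so that a skipped local hook does not let untested code into the mainline.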
This is the tip of the iceberg, but before anybody tells you that these methods are too overbearing for a small team, I would suggest that’s a short-sighted view.
At my company, we practice this in a small team of developers for mobile, desktop, and Web apps. The result is that, in over two years, we have not suffered data loss or corruption in the Web site, which handles our registration, e-commerce, etc. Additionally, we have caught many things that would have been problematic for users if they’d been released. Similarly, regression testing in mobile and desktop apps has kept us from releasing versions of the software that don’t work on all of the supported platforms and has saved us a lot of customer difficulties.
I hope this is helpful, and I invite you to comment. Good luck.