Introduction and Background
Although rapid iterative development has become a popular approach to building software, many development teams are unsure how to implement QA under this model. Traditional QA practices, such as manual testing and regression testing, cannot keep up with an iterative rhythm that may require several production releases in a single day. Development teams often face a dilemma: sacrifice quality for speed, or lock releases to a fixed schedule at the cost of business agility.
What are some QA best practices to minimize risks while keeping up with the pace of rapid iterative development? Let us explore the successful QA practices of the RippleTek wireless technology team from 2014 to 2016.
QA in Development
Introducing specialized testing personnel adds communication overhead and creates a bottleneck when multiple systems need to be released in parallel. Therefore, RippleTek did not designate specialized testing personnel; instead, developers conducted the functional tests themselves. The pre-launch validation process is described below:
1. Test in your personal development environment.
2. Test in the pre-release environment, asking the product designers to assist if necessary. This step mainly checks whether the developer's understanding of the business requirements is correct (see the smoke-test sketch after this list).
3. Submit a merge request to merge the changes into the release branch.
4. Review the changes line by line with another developer, explaining the reason for every change. If the review uncovers a problem in the functional implementation, fix it before launch. If the only issues are stylistic, launch first, then refactor and relaunch.
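As a rough illustration of step 2, a developer-run smoke test against the pre-release environment might look like the following sketch. The base URL and endpoint paths are hypothetical placeholders rather than RippleTek's actual services.

```python
#!/usr/bin/env python3
"""Minimal sketch of a self-service functional check a developer might run
against the pre-release environment before raising a merge request.
The base URL and endpoints below are illustrative placeholders."""
import sys
import urllib.request

PRE_RELEASE_BASE = "http://pre-release.example.internal"  # assumed address
SMOKE_ENDPOINTS = ["/health", "/api/orders?limit=1"]      # illustrative paths


def check(path: str) -> bool:
    """Hit one endpoint and report whether it responded with HTTP 200."""
    url = PRE_RELEASE_BASE + path
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            ok = resp.status == 200
    except OSError as exc:  # connection errors and HTTP errors both land here
        print(f"FAIL {url}: {exc}")
        return False
    print(("OK  " if ok else "FAIL") + f" {url}")
    return ok


if __name__ == "__main__":
    results = [check(p) for p in SMOKE_ENDPOINTS]
    sys.exit(0 if all(results) else 1)
```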
Benefits of Pre-Release Environment
- Convenient for integration and testing.
- Convenient for communication between developers and product personnel.
- Facilitates communication between developers during code review.
QA in O&M
When a merge request passes code review and is eligible for launch, developers can release the feature to production themselves. To reduce risk while keeping the process efficient and comfortable for developers, the entire launch requires only a single click in the CI system, after which the feature is deployed fully automatically. In addition to one-click deployment, the O&M infrastructure should provide the basic functions described in the following sections.
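As a rough illustration of what that single click might trigger, here is a minimal deploy sketch. It assumes a hypothetical layout in which each build lives in a versioned directory under /srv/app/releases and a symlink marks the live release; the paths and health-check URL are illustrative, not RippleTek's actual setup.

```python
#!/usr/bin/env python3
"""Minimal sketch of a one-click deploy step triggered by the CI system.
Assumes a hypothetical layout: /srv/app/releases/<version>/ holds each build
and /srv/app/current is a symlink to the live release."""
import os
import sys
import urllib.request

RELEASES_DIR = "/srv/app/releases"
CURRENT_LINK = "/srv/app/current"
PREVIOUS_FILE = "/srv/app/previous_version"  # consumed by the rollback sketch below
HEALTH_URL = "http://localhost:8080/health"  # illustrative health endpoint


def deploy(version: str) -> None:
    target = os.path.join(RELEASES_DIR, version)
    if not os.path.isdir(target):
        sys.exit(f"release {version} has not been built")

    # Remember the currently live version so a one-click rollback stays possible.
    if os.path.islink(CURRENT_LINK):
        with open(PREVIOUS_FILE, "w") as f:
            f.write(os.path.basename(os.readlink(CURRENT_LINK)))

    # Atomically switch the symlink to the new release.
    tmp_link = CURRENT_LINK + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(target, tmp_link)
    os.replace(tmp_link, CURRENT_LINK)

    # Basic post-deploy health check; a real pipeline would also restart services.
    with urllib.request.urlopen(HEALTH_URL, timeout=5) as resp:
        if resp.status != 200:
            sys.exit("health check failed after deploy")


if __name__ == "__main__":
    deploy(sys.argv[1])
```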
One-Click Rollback
In principle, it is best to avoid rollbacks unless a severe problem impacts availability and no other option is available. Despite the rarity of rollbacks, the design should support them, and any change made should be as reversible as possible. If developers discover an unexpected problem after launching a new version, they can revert to the previous version with a single click.
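Continuing the same hypothetical layout as the deploy sketch above, a one-click rollback can simply switch the symlink back to the version recorded at deploy time; again, the paths are illustrative.

```python
#!/usr/bin/env python3
"""Minimal sketch of the matching one-click rollback, continuing the
hypothetical release layout used in the deploy sketch."""
import os
import sys

RELEASES_DIR = "/srv/app/releases"
CURRENT_LINK = "/srv/app/current"
PREVIOUS_FILE = "/srv/app/previous_version"


def rollback() -> None:
    # Revert the symlink to the version recorded at deploy time.
    try:
        with open(PREVIOUS_FILE) as f:
            previous = f.read().strip()
    except FileNotFoundError:
        sys.exit("no previous version recorded; nothing to roll back to")

    target = os.path.join(RELEASES_DIR, previous)
    if not os.path.isdir(target):
        sys.exit(f"previous release {previous} is no longer on disk")

    tmp_link = CURRENT_LINK + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(target, tmp_link)
    os.replace(tmp_link, CURRENT_LINK)
    print(f"rolled back to {previous}")


if __name__ == "__main__":
    rollback()
```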
Error Monitoring and Push Alarms
RippleTek uses O&M bots to collect and analyze system logs and to monitor the trend in the number of error logs. When a service's error logs increase sharply, the system sends an alarm to the service's developers and O&M personnel. In this way, developers are alerted promptly if the new code causes problems, and within a few minutes of the launch they can decide whether to fix forward or roll back to the previous version.
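As an illustration, here is a minimal sketch of the kind of error-trend check such a bot might run. The log path, webhook URL, and the "sharp increase" threshold are assumptions for illustration; RippleTek's actual bot may work quite differently.

```python
#!/usr/bin/env python3
"""Minimal sketch of error-trend monitoring with a push alarm.
Samples the error log periodically and alarms when the per-interval
error count rises sharply above the recent baseline."""
import json
import time
import urllib.request
from collections import deque

ERROR_LOG = "/var/log/app/error.log"             # illustrative path
ALERT_WEBHOOK = "https://ops.example.com/alert"  # hypothetical push channel
CHECK_INTERVAL = 60   # seconds between samples
SPIKE_FACTOR = 3.0    # alarm when errors grow 3x over the recent average (assumed)
HISTORY_SIZE = 30     # number of past samples used as the baseline


def count_lines(path: str) -> int:
    try:
        with open(path, "rb") as f:
            return sum(1 for _ in f)
    except FileNotFoundError:
        return 0


def send_alarm(message: str) -> None:
    data = json.dumps({"text": message}).encode()
    req = urllib.request.Request(ALERT_WEBHOOK, data=data,
                                 headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req, timeout=5)


def monitor() -> None:
    history = deque(maxlen=HISTORY_SIZE)  # errors per interval in recent windows
    last_total = count_lines(ERROR_LOG)
    while True:
        time.sleep(CHECK_INTERVAL)
        total = count_lines(ERROR_LOG)
        delta = max(total - last_total, 0)
        last_total = total
        baseline = sum(history) / len(history) if history else 0.0
        # Alarm only once a baseline exists and errors rise sharply above it.
        if history and delta > max(baseline * SPIKE_FACTOR, 10):
            send_alarm(f"error logs jumped to {delta}/min (baseline {baseline:.1f}/min)")
        history.append(delta)


if __name__ == "__main__":
    monitor()
```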
Gray Release
If a change to a basic functional module affects global functionality, you can adopt a gray release to minimize risk. In a gray release, developers initially route only a small share of traffic to the new code and observe how it runs for a specified period. After confirming that the new code raises no exceptions, they apply it to the entire network.
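A minimal sketch of how traffic might be split during a gray release is shown below. The deterministic user-ID bucketing and the 5% starting share are assumptions for illustration; in practice the split often happens at a gateway or load balancer rather than in application code.

```python
#!/usr/bin/env python3
"""Minimal sketch of gray-release traffic splitting: a small, deterministic
share of users is routed to the new code path, and the percentage is raised
once the new code has been observed to run without exceptions."""
import hashlib

GRAY_PERCENT = 5  # start by sending 5% of traffic to the new code (assumed value)


def in_gray_group(user_id: str, percent: int = GRAY_PERCENT) -> bool:
    """Deterministically map a user to the new or old code path."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent


def handle_request(user_id: str) -> str:
    if in_gray_group(user_id):
        return handle_request_v2(user_id)  # new code, small share of traffic
    return handle_request_v1(user_id)      # old code, the rest of the traffic


# Placeholder handlers so the sketch runs; real ones would serve the feature.
def handle_request_v1(user_id: str) -> str:
    return f"v1 response for {user_id}"


def handle_request_v2(user_id: str) -> str:
    return f"v2 response for {user_id}"


if __name__ == "__main__":
    sample = [f"user-{i}" for i in range(1000)]
    gray = sum(in_gray_group(u) for u in sample)
    print(f"{gray / len(sample):.1%} of sampled users hit the new code")
```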
Data-driven QA
Data-driven QA is a testing method that continuously monitors the current system status to detect anomalies. In data-driven QA, the effects of the new and old versions of the code are compared through observation, analysis, and statistics.
Based on this definition, the Error Monitoring and Push Alarms function can be categorized as a data-driven QA method. Other commonly used data-driven QA methods include:
- Using the O&M bot to periodically check whether invariant constraints among key business data still hold. For example, the accounts in an accounting system must always balance; if the system detects an imbalance, it should promptly raise an alarm and identify the cause (see the sketch after this list).
- Checking whether key business indicators are stable. For example, the number of successful orders in the past hour should not deviate greatly from the historical average: a sharp decrease may indicate a drop in system availability, while a sharp increase may indicate malicious click farming.
- Statistically comparing key performance indicators across product versions to determine which version is of higher quality.
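The following sketch illustrates the first two checks. The query helpers are hypothetical stand-ins for whatever database or metrics store holds the business data, and the 30% deviation threshold is an assumed value.

```python
#!/usr/bin/env python3
"""Minimal sketch of two data-driven QA checks an O&M bot might run on a
schedule: an accounting-balance invariant and an order-volume stability check.
The query helpers are hypothetical stand-ins for the real data store."""
import statistics


def total_debits() -> float:
    # Hypothetical: in practice this would query the accounting database.
    return 105_230.00


def total_credits() -> float:
    # Hypothetical: in practice this would query the accounting database.
    return 105_230.00


def orders_last_hour() -> int:
    # Hypothetical: in practice this would query the order system.
    return 480


def orders_hourly_history() -> list[int]:
    # Hypothetical: recent hourly order counts used as a baseline.
    return [500, 470, 520, 490, 510, 485, 495]


def alarm(message: str) -> None:
    # Hypothetical push channel; see the error-monitoring sketch above.
    print("ALARM:", message)


def check_accounting_invariant() -> None:
    debits, credits = total_debits(), total_credits()
    if abs(debits - credits) > 0.01:
        alarm(f"accounts out of balance: debits={debits}, credits={credits}")


def check_order_volume(max_deviation: float = 0.3) -> None:
    current = orders_last_hour()
    baseline = statistics.mean(orders_hourly_history())
    deviation = abs(current - baseline) / baseline
    if deviation > max_deviation:
        direction = ("drop (possible availability issue)"
                     if current < baseline
                     else "spike (possible click farming)")
        alarm(f"order volume {direction}: {current} vs baseline {baseline:.0f}")


if __name__ == "__main__":
    check_accounting_invariant()
    check_order_volume()
```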
Conclusion
In rapid iterative development, QA should not be an independent stage that happens only before changes are released. The ideal QA process permeates development, O&M, and data analysis. With adaptive QA practices, your production environment can support dozens of releases a day while keeping risk under control.