
Blame Game: Dev vs. Ops

What effects does a software delivery process have on its participants, and why does it lead to conflict? To understand this problem better, we must examine the daily motivations of the key people. A developer's reputation improves as more features are completed; high throughput and good velocity are regarded as indicators of strong performance. In many situations, from the developer's viewpoint, the new features available on test machines are indistinguishable from the features deployed on production systems and available to users.

Programmers, testers, database administrators, and system administrators experience challenges every day. These problems include risky or faulty software deployments, an unnecessarily sluggish delivery process, and suboptimal collaboration and communication caused by the walls between teams. These issues often lead to an overall slowdown that leaves the company lagging behind its competitors.


Conflicts During Deployment
Conflicts between development and operations teams often originate from time pressure. Typically, a new software release must be deployed quickly. Another scenario that requires the operations team to react quickly is when the system is down and restoring it becomes the highest priority. Such situations often lead to a blame game in which each side accuses the other of causing the problem.

Common scenarios include the following:
1. Development passes a new release to operations, but the latter is unable to get it running on the production system.

2. The operations team contacts the development team to get the problem fixed, describing the errors experienced while trying to bring the release to production.

3. In response, development blocks communication and does not offer any help.

4. Development claims (correctly) that the software release ran in a test environment without any problems. Therefore, development reasons that the operations team is at fault for not bringing the release to life. Finger pointing from both sides may end in angry phone calls, rancorous e-mails, and even escalation meetings.

5. After the issue has been escalated to the boss and then to his or her boss, two engineers are once again assigned to look into the malfunction.

6. By investigating both the testing and production environments together, they discover that the two environments are different in some minor detail, such as a technical user or a clustering mode. Neither party knew about this difference.

By discovering the error together and manually aligning the two environments, the teams can successfully deploy the new software release to production. Of course, the blame game continues afterward, because each party thinks it was the other side's task to flag the difference as an issue or to adjust the systems accordingly.
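A difference such as a missing technical user or a divergent clustering mode typically hides in configuration that nobody compares systematically. As a rough illustration of the kind of check that would have surfaced the mismatch before deployment, here is a minimal sketch in Python; the file names, format, and keys are assumptions for this example, not details taken from the scenario above.

```python
# drift_check.py -- compare two flat key=value config files and report differences.
# File names and keys below are hypothetical; real environments keep such
# settings in many places (property files, cluster configs, user accounts).

def load_properties(path):
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props

def report_drift(test_path, prod_path):
    """Print every key whose value differs between the two environments."""
    test, prod = load_properties(test_path), load_properties(prod_path)
    for key in sorted(set(test) | set(prod)):
        t, p = test.get(key, "<missing>"), prod.get(key, "<missing>")
        if t != p:
            print(f"{key}: test={t!r} production={p!r}")

if __name__ == "__main__":
    # Might surface, e.g.: clustering.mode: test='off' production='active-passive'
    report_drift("test.properties", "production.properties")
```

Run as part of every deployment, such a comparison turns "neither party knew about this difference" into an explicit report that both teams can see.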


Conflicts After Deployment
The blame game also often emerges when a new release goes live. Here is one scenario:

1. Many more users are using the new features than the company expected.

2. Response times slow down until the software stops responding entirely. The users are panicking because this is the worst-case scenario.

3. Escalating the issue leads to finger pointing: development claims it is the fault of the database group, the database team blames the network group, and others speculate that the server group is responsible for the outage.

4. After much emotional turmoil, the company conducts an objective examination that finds the root cause of the issue. By this point, however, the users have already been scared away from the new application, and the company incurs heavy losses in sales and reputation.


Conflicts About Performance
The blame game can also occur in the following scenario:

1. A performance issue suddenly appears in the production system.

2. Following a blame game, the developers identify an issue in the application. They work long days before finally providing a patch that fixes the issue.

3. Nonetheless, the operations team is still annoyed by the performance issue and is reminded of similar instances in the past, where every subsequent patch further reduced the stability of the software. As a result, the operations team is leery of applying the patch.

4. Because of those past problems, the team members suggest that they could code the software themselves, which in turn heightens the emotional tension between the teams. The operations team wants the patch to be applied in a test environment first, to ensure that it does not affect the overall stability of the application and that it actually fixes the performance bottleneck.

5. Unfortunately, the test environment does not fully mirror the production environment, and there are no test scenarios or test data that can reproduce the exact same problem on the test machine (a sketch of the kind of check that was missing follows this list). Days and weeks go by, and the patch is still not applied to production. All those long days spent working on the solution were for naught.

6. Management and the business units increase the pressure to solve the problem, the development and operations teams continue their conflict, everyone is frustrated, and the performance issue on the production machine remains unsolved.
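Part of the deadlock here is that no repeatable test can demonstrate whether the patch actually helps, so every discussion stays subjective. The following is a minimal sketch of the kind of automated response-time check the teams were missing; the handler function and the 50 ms budget are placeholder assumptions standing in for the real application path and whatever limit the teams would agree on.

```python
# perf_check.py -- a minimal response-time budget check for any callable.
import statistics
import time

def handle_request():
    """Placeholder (hypothetical) for the operation whose performance regressed."""
    return sum(i * i for i in range(10_000))

def measure(fn, runs=50):
    """Return per-call latencies in milliseconds over several runs."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings

if __name__ == "__main__":
    latencies = measure(handle_request)
    p95 = statistics.quantiles(latencies, n=20)[-1]  # 95th percentile
    budget_ms = 50.0  # agreed response-time budget (an assumption)
    print(f"p95 = {p95:.2f} ms (budget {budget_ms} ms)")
    assert p95 <= budget_ms, "performance budget exceeded; do not ship"
```

With a check like this in version control, "the patch fixes the bottleneck" becomes a claim either team can verify in minutes rather than an argument that drags on for weeks.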

Unfortunately, these horror stories are common (although not necessarily the rule). Have you experienced these types of scenarios? Did you come to the conclusion that, in retrospect, it would have been much better (and much faster) to work on the solution together from the start?

Project life is hard, and many conflicts may arise between development and operations. However, it is possible to mitigate these problems. The core issue behind most of the conflicts between operations and development is that the operations department is often treated as a bottleneck.
