Chapter 1. Defining Objectives

One of the biggest failure modes for data teams in biotech organizations, particularly for the junior members of those teams, is viewing their roles and responsibilities in terms of a narrow, mostly technical scope. Once the members of your data team know what their ML model needs to predict or how their app needs to function, they want to get those things done and move on to the next technical project.

This makes sense in the context of a traditional tech company like Google, Meta, or Apple where the deliverables are primarily technical. But in a biotech organization where the overall objectives are very different, someone needs to make sure the data teams’ technical objectives actually match up with the larger scientific goals. And while it would be great if the scientists could dictate their technical requirements precisely and accurately, they rarely have the knowledge and background to do it well. So the responsibility often falls to you to make sure objectives are aligned.

This is even more important in the context of a techbio organization where the tech is expected to play an equal (or greater) role relative to the bio. For these organizations, the tech side should not only contribute to the overall objectives, but help shape them. You and your team will have the deepest understanding of the capabilities that data and software have to power more ambitious scientific objectives. So if you focus exclusively on technical objectives, that’s a missed opportunity.

The four Reciprocal Development Principles covered in this chapter address different aspects of how you and your team can begin to adopt a broader view of your scope and objectives. The final section of the chapter summarizes practical measures you can take to put these principles into practice.

Don’t Stop at Technical Objectives

Principle 1: Your highest priority is to drive progress towards your organization’s scientific objectives.

I’m going to start a bit abstractly with what you care about, what you pride yourself on accomplishing, and what you think fundamentally makes your work valuable. These are the things you prioritize because they define who you are.

If you come from a technical background, it’s natural to frame these priorities in terms of technical excellence. You want to build the most accurate models and the fastest pipelines. You want to minimize technical debt and build intuitive interfaces, all while maintaining thorough documentation. If your scientist colleagues don’t appreciate it, that’s on them.

But when your priorities stop there, you abdicate the job of defining these technical goals and aligning them with the bigger picture. Maybe your predictive model answers an interesting (and publishable) question, but it ignores the less interesting yet more existential questions that would actually drive the pipeline. Maybe your dashboard gives bench scientists the information they asked for, but doesn’t fit into their workflows in a way that they can use. If the model is accurate and the dashboard is deployed ahead of schedule, but the drug never makes it to the clinic, then the data and the bench teams have both failed.

If you narrow your scope to technical objectives, you push your data team towards a service role while allowing the bench teams to take sole responsibility for driving scientific goals. Forget about the equal footing between data and bench teams that was supposed to make your biotech company a techbio company.

Prioritizing the organization’s scientific objectives rather than just your own technical objectives means taking responsibility for aligning those technical deliverables and requirements with the science. Every single person on your team, no matter how junior, should be accountable for ensuring the work they’re doing is what the organization needs, and that it’s integrated into a coherent pipeline and platform. Every technical decision should be traceable through a deliberate and defensible sequence of “whys” to an organizational objective. You can’t rely on someone else to do it for you.

Explicitly Define Project Goals

Principle 2: Design projects around deliberate scientific objectives coordinated with the overall organization.

The first principle was about how you think about your own role and priorities. This one is about how you communicate internally and externally, to reinforce this broader conception of scope.

Many teams don’t deliberately write down project goals because they seem obvious: the goal is for the thing that you’re building to work. But that’s a technical goal, reinforcing a narrow scope of responsibility. To push yourself and your team to adopt a broader scope, and push your stakeholders to recognize this broader scope, you need to deliberately identify and communicate objectives, rather than let team members and stakeholders make assumptions.

To get from technical goals to broader scientific goals, start by asking why the technical goals were chosen in the first place. How will the information be used downstream? How are the scientists getting that information today? What would happen if you couldn’t meet the requirements? What would happen if you could do better than expected? Find out what scientific goal your project is addressing, then make that your own objective.

If you’re building a predictive model, the goal of the project isn’t to create an accurate model. The goal is to improve whatever process or larger project is going to use the model. If the model is accurate but it doesn’t improve the larger process/project for some other reason, the modeling project has failed. The same is true for building an app or a data pipeline, or anything else.

Even internal projects that don’t directly address a specific scientific goal should have indirect impacts on the science. If you make your pipeline ten times faster and nobody notices, it probably wasn’t worth the effort. But if someone does notice, and it makes it possible for them to use the predictions in a new way, then there’s your scientific impact. If your code refactor eliminates 90% of your technical debt, but you never touch the code again, then you could’ve just kept the technical debt. But if the refactor allows you to deliver your next update in half the time, enabling a new experiment, there’s your impact.

Deliberately identifying these broader goals doesn’t just help you communicate with stakeholders; it helps you prioritize your work and even identify shortcuts to meet the scientific goals faster. Can you answer the scientific question with a bar chart instead of the deep learning model that the scientists wanted? Are there easy questions the bench scientists didn’t bother asking because they thought they were too hard?

Whether you repeatedly ask “why,” map out downstream dependencies, or use any other method to identify the scientific objectives, deliberately writing down these goals forces you to think about that broader scope. This increases the chances you’ll notice the issue that would make your 100% accurate model unusable, or realize that an afternoon of work could make it unnecessary. Even if you can’t predict the exact impact of a project, you should be able to estimate it, or identify potential impacts. And if you can’t come up with a compelling scientific impact then maybe you should put that project on hold until you can.
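As a concrete illustration of that shortcut-spotting, here is a hypothetical sketch in Python: before reaching for the deep learning model the scientists asked for, check whether a simple group comparison (or the bar chart of it) already answers the scientific question. All of the names and numbers below are invented for the example.

```python
# Hypothetical sketch: sometimes a one-line summary answers the
# scientific question that a model was requested for.
# Condition labels and yield values are made up for illustration.
import pandas as pd

# Toy assay results: growth yield under two media conditions.
results = pd.DataFrame({
    "condition": ["A", "A", "A", "B", "B", "B"],
    "yield_g_per_l": [1.2, 1.4, 1.3, 2.1, 2.3, 2.2],
})

# If the decision is simply "which condition should we scale up?",
# the per-condition means may be all that's needed.
means = results.groupby("condition")["yield_g_per_l"].mean()
print(means)

# means.plot.bar() would render the bar chart itself.
```

If the gap between conditions is this large and consistent, an afternoon of summary statistics may drive the decision faster than weeks of model development would.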

Technical Milestones Aren’t Progress

Principle 3: The primary measure of progress is scientific discovery.

The third principle appears to be about metrics, but it’s really about how you coordinate with bench teams. It’s an intentional riff on the principle from the Agile Manifesto that working software is the primary measure of progress. The Agile version was a response to early software development approaches in which separate teams would follow detailed specifications to separately write software components over the course of months or years before fitting them together at the very end.

These early development practices measured progress in terms of the separate components, so teams felt like they were making progress the whole time. Measuring progress in terms of working software instead encourages you to put the components together as early as possible so you can show real progress. And while this may seem like a minor difference, it turns out to have a huge impact: it reduces both the need for detailed specifications and the risk of missing issues that don’t arise until everything comes together.

In the context of an embedded data team, it’s best to measure your progress in a way that both pushes you to more effective development practices and aligns with your new, broader scientific objectives. Since you want your objectives to be about scientific progress, your measure of progress should be too.

Deploying your ML model isn’t progress, but using it to design a better experiment is. Adding a table to the database isn’t progress, but using it to explain an unexpected outcome is. Even analyzing data from an experiment isn’t progress until it leads to understanding that drives a decision.

In the Agile context, the key insight from adopting a new metric was that you could put all the components of your system together before they’re completely functional. This seemed crazy and scary at the time, but turned out to be one of the most impactful parts of Agile methodologies. In the Reciprocal context, the key insight may be similarly scary, but it’s not that bad once you get used to it:

You can often make as much or more scientific progress with a prototype or even a proof-of-concept as with an optimized, production-grade tool/model/application.

The failure mode when you don’t realize this is to wait to try things in the lab until you’re absolutely sure of your technical work. Even if you’re getting regular feedback on your preliminary results, you may wait to release your model until you can make it a few percent more accurate or until you build a user interface for the scientists. You may wait for the bench scientists to ask for it instead of asking them to try it. You may be using Agile methods to ensure you have working software the whole time, but if you haven’t created scientific impact, you haven’t made progress.

If, on the other hand, you push yourself to make scientific progress by having the bench scientists use your tools as early as possible, you’ll start to discover a new bag of tricks. Instead of making your model a few percent more accurate, do a sanity-test experiment in the lab to see if the current accuracy makes a difference. If it doesn’t, then you’ve got some other work to do before improving the accuracy. Instead of making a user interface, manually run the analysis for the first few experiments. It’s more work but you’ll learn a whole lot more about how your users want to interact with your tools.

By trying to get to scientific impact as soon as possible, not only will you identify potential issues before they become real problems, you’ll also get a head start on building the processes and habits around the tools: the processes and habits that integrate your tools into the larger organization and drive scientific discovery.

You’ll still get to make your model more accurate (if it’s actually necessary) and build that user interface. Just not until you see that initial impact and progress.

Sometimes the Boring Solution Is the Best

Principle 4: The simplest technical solution that will reliably meet scientific objectives should be chosen over complex or novel approaches with marginal improvements.

So far, you’ve expanded your scope beyond just technical work, you’ve deliberately identified broad, scientific objectives, and you’ve started seeking scientific impact as early in the development process as possible. But if you thought that was difficult, just wait. Because to do those things well, you often need to select boring technical solutions that will get the job done sooner.

You were hired for this job because you understand industry best practices and you can build complex, advanced, sophisticated systems blindfolded. You probably have a list of tools and libraries and programming languages you’ve been wanting an excuse to learn. Not to mention all the things you want to be able to add to your resume.

Well, too bad.

Building a project in that new language you want to learn is just going to make it harder for the next person to maintain, not to mention how much longer it will take you to write it. Using the latest libraries and tools often means questionable documentation and undiscovered bugs.

Moreover, the most interesting tools and frameworks are often built for complexity and scale that are common in the tech sector but overkill in a biotech lab. Think Kubernetes or Redshift. Your application doesn’t need to scale to billions of users if it’s only for your bench team. Your pipeline doesn’t need to handle petabytes of data if your datasets are all measured in gigabytes.

If you do need to handle those sorts of technical requirements, or legitimately expect to in the future, then go ahead and design for that scale. But those tools generally come with overhead and complexity that aren’t worth it otherwise. A relational database is much easier to use than a distributed data warehouse if you only have millions of rows. A microservices framework is going to be more headache than it’s worth for a single server and a few dozen users.
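To make the "boring solution" concrete, here is a minimal sketch of the relational-database option using Python’s built-in sqlite3 module. For datasets measured in millions of rows, a single-file database like this is often sufficient, with none of the operational overhead of a distributed warehouse. The table and column names are hypothetical.

```python
# A sketch of the boring option: sqlite3 ships with Python,
# needs no server, and comfortably handles millions of rows.
# Schema and data here are invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")  # a file path works the same way
conn.execute("CREATE TABLE assay (well TEXT, signal REAL)")

# Bulk-insert 100,000 toy rows; real datasets with millions of rows
# load just as easily.
conn.executemany(
    "INSERT INTO assay VALUES (?, ?)",
    [(f"W{i}", i * 0.5) for i in range(100_000)],
)

count = conn.execute("SELECT COUNT(*) FROM assay").fetchone()[0]
top = conn.execute("SELECT MAX(signal) FROM assay").fetchone()[0]
print(count, top)
```

If the data later outgrows a single file, migrating a plain SQL schema to a larger system is far easier than unwinding a prematurely adopted distributed stack.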

At the end of the day, the goal is to get to a solution that is good enough to drive the science, and start driving it soon enough to have an impact before the bench team moves on to something else. If the cutting-edge technical solution is going to take twice as long to implement and three times as much effort to maintain, then it’s time to choose the boring solution.

Putting It All Together

Taking responsibility for your organization’s scientific objectives (Principle 1) requires you both to shift how you, your team, and your stakeholders think about the role of data teams, and to begin adopting specific development practices that may seem a bit uncomfortable at first.

Shifting how you and others view your team is a long-term process. Most importantly, you’ll begin to deliberately talk about your scope and objectives as you want them to be, with the team and with stakeholders. This includes setting explicit project goals that address scientific rather than technical objectives (Principle 2). But you’ll also push your team to think beyond technical objectives in their own work, and measure their progress accordingly (Principle 3). Plus you’ll push your stakeholders to expect and support this broader scope.

The development practices that support this broader scope may take some getting used to, but as with the new practices that came with Agile, this new development approach will pay off in the long run. First, once you begin measuring progress in terms of scientific discovery rather than technical development (Principle 3), you’ll begin integrating your models and tools into experiments and wet lab processes as early as is feasible, rather than waiting until they feel ready. And to speed things up even more, you’ll begin choosing boring solutions that match the scale you’re faced with rather than the scale you might find at a tech company (Principle 4).

These practices are fundamentally different from the traditional role of data teams in biotech and the practices you may have learned in the tech sector, but they create a solid foundation for the rest of the Reciprocal Development Principles, and for your data teams to be successful.
