From the benchtop to the field: My 4-Step Playbook for R&D Projects
My process of testing out products in the wild
Author: Salim Haniff
Jan 23, 2026
TL;DR
Four steps to take an Edge/IoT prototype from lab demo to field pilot—without getting surprised by reality.
- Step 1: Frame constraints (budget, time, environment, ownership).
- Step 2: Design observability (health, logs, metrics, change tracking).
- Step 3: Prototype honestly (no demo magic; diagnosable failures).
- Step 4: Run a scoped pilot (hypotheses, signals, evidence, decisions).
The hardest part of an Edge/IoT project is not getting a demo to run in the lab; it's getting that same system to behave once it's deployed in the real world, sitting in a noisy factory or trying to talk over a sketchy 4G connection.
I’ve been through this scenario numerous times throughout my career.
I learned the "out of the lab" game first as a research technologist at Ryerson. Once I had a strategy locked down, I replayed it in Finland, where I took university lab work into pilots with local companies, and later did similar work in industry and consulting. The pattern was the same every time: a beautiful demo in a controlled environment, then chaos as soon as it met the real world.
This post is the playbook I actually use now when someone says:
“We’ve got a prototype working. We’re scared of the field.”
No hype. Just the four steps I keep coming back to.
Step 1 – Frame the problem and the constraints
Most projects frame the idea as:
- “Smart camera for X”
- “Sensor network for Y”
- "Edge/AI analytics for Z”
Framings like these describe the intended result of a project, but they aren't enough to decompose the project into achievable pilot milestones.
If you want a field pilot that doesn’t totally fall apart, you need to frame the problem plus the constraints.
The questions I insist on before anything else
Projects should be run with an iterative workflow rather than waterfall, since capturing all the requirements upfront may not be possible. For those interested in how agile requirements gathering works, Martin Fowler has a blog post that goes deeper into agile requirements engineering. To kick off the first iteration, I push everyone involved to answer the following questions, in writing, so we have a single source of truth.
Pilot Kickoff Checklist
Use this to kick off the first iteration. If you can’t answer these, you’re not ready to “go to the field” yet.
- Confirm the business goal and what “success” means (metrics, thresholds, decision criteria).
- Identify the stakeholders and who makes the final go/no-go call.
- Define the pilot scope (sites, users, devices, duration, and what’s explicitly out of scope).
- Lock the budget and timeline (including hard milestones and review points).
- Assign a long-term owner (who runs it after you leave / after handoff).
- Document the physical environment constraints (power, network, temperature, mounting, access).
- Verify security + privacy requirements (data types, retention, access control, audit needs).
- Map legacy systems and integration points (APIs, protocols, authentication, constraints).
- Decide what telemetry you will collect (logs, metrics, traces) and how you’ll inspect it remotely.
- Plan failure handling (offline mode, retries, safe defaults, rollback strategy).
- Establish a support workflow (who gets paged, how incidents are tracked, response SLAs).
- Agree on iteration cadence (weekly/biweekly check-ins, what triggers a pivot).
Who are the stakeholders?
- Who funds this?
- Who uses it day-to-day?
- Who gets yelled at when it breaks?
This grounds the project in stakeholder expectations and avoids drifting into ‘What exactly did we fund?’ territory.
What domain realities are non-negotiable?
- Healthcare privacy?
- Union rules on shop-floor changes?
- Safety certifications?
- IT security policies?
Most projects have to work within specific regulatory and compliance environments, and surfacing these early avoids legal roadblocks down the road.
What are the hard constraints?
- Budget: What’s the actual cap, including “hidden” costs (IT time, travel, licenses)?
- Time: What’s the real runway before someone decides “this isn’t working”?
- Legacy systems: What do we have to integrate with (old SCADA, ancient SQL box, weird proprietary API)?
- Ownership: When I’m gone, who owns it — ops, IT, a vendor, “nobody”?
Making these explicit enforces limits and prevents cost overruns or an excessive time sink.
Physical environment
- Temperature range, dust, humidity, vibration, power quality, network reliability.
- “Office dev kit on the desk” is not the same as “mounted 5m up in a barn.”
In essence, the goal of an R&D project is a near-market-ready product that can feed further iteration. Keep in mind that the majority of these projects are run by the organization's customers rather than by the organization itself.
How constraints change the design (real-world pattern)
A (sanitized) pattern I’ve seen more than once:
In the lab, we had:
- Unlimited power
- Stable Wi-Fi
- Admin rights on everything
- Controlled lighting/scene/temperature
In the field, the reality was:
- Devices on shared, flaky LTE
- No direct internet from the “inside” network
- Strict change-control: IT touched everything, slowly
- Dynamic environment with changing lighting and temperature.
Because we surfaced those constraints early:
- The architecture shifted from “devices talk to the cloud directly” to “devices talk to a local gateway that handles batching, retries, and controlled outbound traffic” (a minimal sketch of this follows the list).
- We dropped features that depended on low latency and focused on “eventually correct” data.
- We designed around “how IT already works” instead of fantasizing about greenfield DevOps.
- We looked for energy optimizations on devices rather than default to a desktop/laptop deployment.
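To make the gateway pattern concrete, here is a minimal sketch of gateway-side batching in Python. The endpoint URL, batch size, and payload shape are illustrative assumptions, not the project's actual code.

```python
import json
import urllib.request

# Assumptions: batch size and endpoint are placeholders; in practice the
# endpoint is whatever single outbound channel IT has approved.
BATCH_SIZE = 50
UPSTREAM_URL = "https://example.invalid/ingest"   # hypothetical IT-approved endpoint

pending = []  # readings waiting to go upstream

def enqueue(reading: dict) -> None:
    """Queue a reading on the gateway; flush once a full batch accumulates."""
    pending.append(reading)
    if len(pending) >= BATCH_SIZE:
        flush()

def flush() -> None:
    """Push the whole batch in one request; keep it if the upstream is unreachable."""
    global pending
    if not pending:
        return
    body = json.dumps(pending).encode("utf-8")
    req = urllib.request.Request(
        UPSTREAM_URL, data=body, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(req, timeout=10):
            pending = []          # clear only after the upstream accepted the batch
    except OSError:
        pass                      # link is down; retry on the next flush
```

Batching like this keeps the chatty per-reading traffic on the local network and gives IT one outbound flow to approve and monitor.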
Takeaway: If you don’t perform constraint discovery up-front, you’re not “designing an Edge/IoT system,” you’re designing a system that will fail in real-world conditions.
Step 2 – Design for observability from day one
If you can’t see what’s happening in the field, you are flying blind. And in Edge/IoT, things will go wrong.
Here are just a few of the issues I have hit in my own field deployments:
- Devices rebooting at random.
- Network failures.
- Sensor drift.
- Certificates expiring.
- Someone unplugging “the weird box” to charge their phone.
So before we talk about “features”, I ask:
Once this is in the field, what questions do we need to answer without SSH’ing into the box?
The minimum questions I design for
I always make sure we can confidently answer the following (a small fleet-side check covering the first two is sketched after this list):
“Is the device alive?”
- Heartbeat/ping, last seen timestamp, basic health status.
- If I can’t answer this from a dashboard or simple query, the design isn’t ready.
“Does the data make sense?”
- Is the temperature within physically possible bounds?
- Did the camera suddenly go black / overexposed?
- Are we getting identical readings for 3 hours straight from a “noisy” sensor?
- Basic summary stats to spot drift or stuck sensors.
“Is there a network problem?”
- Local log of connectivity events: disconnects, reconnects, latency spikes, bandwidth drops.
- Clear separation between “the app is broken” and “the network is trash.”
“What changed recently?”
- Versioning for firmware / container images.
- Config history: who changed what, when.
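These questions translate directly into a handful of checks you can run centrally. A minimal sketch, assuming heartbeats and recent readings are already collected somewhere queryable; the thresholds and bounds are illustrative assumptions you would tune per deployment.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(minutes=15)   # assumption: tune to the device's reporting interval
STUCK_COUNT = 36                      # assumption: 36 readings at 5-minute intervals is roughly 3 hours
PLAUSIBLE_RANGE = (-40.0, 85.0)       # assumption: physically possible bounds for this sensor

def fleet_alerts(heartbeats: dict, readings: dict, now=None) -> list:
    """heartbeats: device_id -> last-seen datetime; readings: device_id -> recent values."""
    now = now or datetime.now(timezone.utc)
    alerts = []
    for device_id, last_seen in heartbeats.items():
        if now - last_seen > STALE_AFTER:
            alerts.append((device_id, "no heartbeat: is the device alive?"))
    for device_id, values in readings.items():
        recent = values[-STUCK_COUNT:]
        if len(recent) == STUCK_COUNT and len(set(recent)) == 1:
            alerts.append((device_id, "identical readings for hours: sensor may be stuck"))
        if any(not (PLAUSIBLE_RANGE[0] <= v <= PLAUSIBLE_RANGE[1]) for v in recent):
            alerts.append((device_id, "out-of-range value: data does not make sense"))
    return alerts
```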
How this shows up in the design
Concretely, this means:
- We define a tiny “health check schema”: { device_id, timestamp, app_version, battery/power info, connectivity summary, basic sensor stats } (a minimal emitter is sketched after this list).
- We can “ping” the device to confirm it’s alive.
- Run basic anomaly checks to flag drift, stuck sensors, and out-of-range values.
- Devices emit structured logs (not just printf/console.log spam).
- There’s a lightweight metrics pipeline (even if it’s just pushing JSON summaries to a central endpoint).
- Store code and config in Git so every change is attributable and reviewable.
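As a rough illustration of how small that schema can be, here is a sketch of a device-side emitter in Python. The field names, identifiers, and values are assumptions for illustration, not a fixed spec.

```python
import json
import time

DEVICE_ID = "edge-dev-01"   # hypothetical identifier
APP_VERSION = "0.3.1"       # hypothetical build/version string

def health_snapshot(sensor_values, reconnects_last_hour, on_battery=False):
    """Build one health-check record matching the tiny schema above."""
    return {
        "device_id": DEVICE_ID,
        "timestamp": time.time(),
        "app_version": APP_VERSION,
        "power": {"on_battery": on_battery},
        "connectivity": {"reconnects_last_hour": reconnects_last_hour},
        "sensor_stats": {
            "count": len(sensor_values),
            "min": min(sensor_values) if sensor_values else None,
            "max": max(sensor_values) if sensor_values else None,
        },
    }

# Emit as one structured JSON line, not free-form print statements.
print(json.dumps(health_snapshot([21.4, 21.6, 21.5], reconnects_last_hour=2)))
```

Anything that can serialize a record like this and push it somewhere central is enough observability for a first pilot.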
If observability is “phase 2” or the second iteration, you’re already in trouble.
In the field, observability is a key line item in your budget.
Further reading: If you want a practical, vendor-neutral starting point for traces/metrics/logs, OpenTelemetry is a good baseline: OpenTelemetry documentation.
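If you go the OpenTelemetry route, the metrics API is small enough to adopt early. A sketch using the Python API; the meter and counter names are illustrative, and the calls are no-ops until you configure the SDK and an exporter.

```python
# Requires the opentelemetry-api package; without the SDK configured,
# these calls are harmless no-ops, so instrumenting early costs little.
from opentelemetry import metrics

meter = metrics.get_meter("edge.device")                 # illustrative meter name
reconnects = meter.create_counter("network.reconnects")  # illustrative metric name

def on_reconnect(device_id: str) -> None:
    """Record a reconnect event tagged with the device it came from."""
    reconnects.add(1, {"device.id": device_id})

on_reconnect("edge-dev-01")
```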
Step 3 – Build “ugly but honest” prototypes
My experience working in sales taught me that customers do not trust pre-recorded, glossy demos. I have watched Red Hat run live demos and work through issues on the spot, and that openness did more to build confidence in the technology than a polished recording ever could. If a prototype looks like a polished product but we can’t see how it breaks, that’s a risk, not an asset.
I’d rather ship something visually rough that:
- Exposes every failure mode loudly.
- Lets us inspect metrics and logs.
- Makes no attempt to hide duct tape.
What “ugly but honest” looks like in practice
Typical characteristics:
- Off-the-shelf boards, dev kits, or small industrial PCs.
- Open-source components wired together:
- MQTT or NATS instead of a custom message bus (a minimal publish sketch follows this list).
- Open-source gateway agents instead of a homegrown daemon.
- Existing monitoring stacks (Prometheus/Grafana/ELK/etc.) instead of building a dashboard from scratch.
- Command-line tools (curl) and raw JSON during early tests.
- Hard-coded debug flags turned on by default.
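For example, publishing the health snapshot over MQTT during early tests can be a few lines with an off-the-shelf client. A sketch assuming the paho-mqtt package and a broker already running on the gateway; the broker address and topic layout are placeholders.

```python
import json
import time

# Assumes the paho-mqtt package and an MQTT broker reachable from the device.
from paho.mqtt import publish

BROKER_HOST = "192.168.1.10"                 # hypothetical gateway/broker address
TOPIC = "pilot/devices/edge-dev-01/health"   # hypothetical topic layout

payload = json.dumps({
    "device_id": "edge-dev-01",
    "timestamp": time.time(),
    "status": "ok",
})

# One-shot publish: connects, sends, disconnects. Crude, but easy to
# verify with command-line tools while the prototype is still "ugly".
publish.single(TOPIC, payload, hostname=BROKER_HOST, port=1883, qos=1)
```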
I’m fine with:
- Clunky UI.
- Extra cables.
- Manual provisioning steps.
I am not fine with:
- “Magic” behavior we can’t explain.
- Hidden retry logic that masks intermittent failures.
- Undocumented scripts that only one engineer knows.
Glossy vs honest: what consistently happens
Patterns I’ve seen again and again:
- Glossy demo path:
- Everything is optimized for a single, controlled walk-through.
- Edge cases are hidden.
- Telemetry is missing because “it makes the logs noisy.”
- Result: first real-world incident turns into a forensic nightmare.
- Ugly but honest path:
- Prototype looks like a science fair project.
- Logs are verbose, metrics are crude but present.
- Failure modes show up early and are diagnosable.
- Result: we find the ugly truths in a controlled setting, not during a 3 a.m. outage.
My rule: If the prototype can’t tell us why it failed, we’re not ready for a pilot—no matter how slick the demo video is.
Step 4 – Run a focused pilot and capture evidence
A pilot is not “let’s just put it out there and see what happens.” A useful pilot is a time-boxed experiment with clear stakes. Before we deploy anything, I force the team (including non-technical stakeholders) to agree on:
1. Hypotheses
Examples:
- “If we deploy sensors in these three locations, we can detect condition X at least Y hours earlier than we do now.”
- “If we move this classification to the edge, we reduce bandwidth costs by ~Z% without hurting accuracy.”
The key:
A hypothesis is falsifiable. It can be wrong.
2. Success and failure signals
We define:
Quantitative signals
- Latency thresholds
- Uptime targets
- Detection/false-positive rates
- Manual work reduced (even as rough estimates)
Qualitative signals
- “Would you be upset if we turned this off?” (for frontline users)
- “Is this making your job easier or just adding noise?”
And just as important:
- What does failure look like?
- “If we see X type of error more than Y times per week, we pause the pilot.”
- “If adoption by staff is below Z% by week N, we either change the design or kill it.”
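Rules like these are easy to encode once the numbers are agreed, which keeps the weekly review from turning into a debate. A minimal sketch with illustrative thresholds (the real numbers come out of the Step 1 negotiations):

```python
# Illustrative thresholds; adjust to whatever the team actually agreed on.
MAX_ERRORS_PER_WEEK = 25
MIN_ADOPTION_BY_WEEK = {4: 0.40}   # e.g. at least 40% of target users by week 4

def pilot_decision(week: int, errors_this_week: int, adoption_rate: float) -> str:
    """Turn the agreed failure signals into an explicit recommendation."""
    if errors_this_week > MAX_ERRORS_PER_WEEK:
        return "pause: error budget exceeded, investigate before continuing"
    threshold = MIN_ADOPTION_BY_WEEK.get(week)
    if threshold is not None and adoption_rate < threshold:
        return "pivot or stop: adoption below the agreed threshold"
    return "continue"

print(pilot_decision(week=4, errors_this_week=12, adoption_rate=0.35))
# -> "pivot or stop: adoption below the agreed threshold"
```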
Further reading: If you want a deeper framework for defining success signals (SLOs) and building monitoring that supports decisions, the Google SRE books are a strong reference.
3. Time window and environment
We decide:
- How long the pilot runs (and when we review it).
- In which locations / contexts we’re allowed to deploy.
- What’s explicitly out of scope (e.g., “No production integration with system ABC yet.”).
4. What evidence we will capture
After the pilot, I want a simple document outlining the following:
- What worked
- Where the design matched reality.
- Metrics that met or exceeded expectations.
- What failed
- Technical failures (e.g., sensors misbehaving under certain conditions).
- Human failures (e.g., staff ignoring alerts because they’re noisy).
- What decisions are now easier
- “We now know this connectivity model won’t work; we need a gateway.”
- “This vendor’s hardware is fragile in cold environments; drop them.”
- “This part of the pipeline is over-engineered and can be simplified.”
This evidence is what separates “we feel like it went okay” from “here’s what we learned, and here’s what we’re doing next.”
Example walkthrough: applying the playbook to a real pilot
The following is a composite story based on one of the projects I’ve worked on. A version of this story appeared in a publication; here I’m focusing on the project-management lessons.
A joint project was created to investigate building a comprehensive energy monitoring solution for apartment building tenants. Three teams were assembled to focus on different elements of the solution (hardware, cloud services, and the customer UI dashboard). In the lab, we had ideal conditions: stable network, clean power, easy access to the hardware, and full cellular/network access. The actual hardware deployment was in an apartment building basement, next to all the electrical meters, with very low cellular signal.
There was also a human-resourcing factor: the hardware team was fully staffed and ready to go, whereas cloud and UI team availability varied widely throughout the project, which caused milestone deliveries to drift.
Walkthrough — Step 1: Framing the problem and constraints
Once we went over the project scope we uncovered the following:
- The project was funded by the government as a way to identify new products to sell to consumers.
- The target group would be apartment building tenants trying to understand their energy usage.
- A secondary target group would be energy suppliers, to whom the solution could be offered as a turnkey product for their customers.
- The owner of this system/solution would be the corporation overseeing this project.
- Data privacy and security had to be ensured because the project was in the EU.
- Budget was pre-allocated with assumptions made before team selection was complete.
- Timeline was fixed with government defined milestone/deliverable dates.
- Gaining access to the physical site for installation would be problematic due to limited availability of onsite staff.
- The electrical meters were underground, which made cellular connectivity unreliable.
That pushed us toward:
- The site would need to be configured with a router/gateway solution.
- Raspberry Pis would have to be placed next to each electrical meter box, connecting to the router/gateway over WiFi, since network cabling couldn't be pulled.
- Arduinos with IR sensors would have to be attached to each meter to detect the consumption pulse light emitted by the meter in real time.
- The Raspberry Pis would buffer data from the Arduinos if the internet went down (a minimal sketch of this buffering follows the list).
- We would need to self-host the cloud servers to maintain physical security, and the servers would need TLS/SSL so all data was secured in transit.
- The web app customer dashboard would need to use the latest authentication (bear in mind this was 2012).
- Additional lab testing was needed to test various outage scenarios since access to the on-site location was limited.
- Because we depended on other teams, the hardware team also built a minimal cloud stub to keep testing unblocked.
- The project needed to be easily replicated to hand over to the overseeing corporation for productization.
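To give a flavour of the buffering mentioned above, here is a minimal store-and-forward sketch in Python. It is not the original 2012 code; the database path and the shape of the send function are assumptions.

```python
import json
import sqlite3
import time

DB_PATH = "/var/lib/meter-buffer.db"   # hypothetical path on the Raspberry Pi

def open_buffer(path: str = DB_PATH) -> sqlite3.Connection:
    """Open (or create) the on-disk buffer so readings survive outages and reboots."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS readings (ts REAL, payload TEXT)")
    return db

def buffer_reading(db: sqlite3.Connection, reading: dict) -> None:
    """Always write locally first; uploading is a separate concern."""
    db.execute("INSERT INTO readings VALUES (?, ?)", (time.time(), json.dumps(reading)))
    db.commit()

def flush(db: sqlite3.Connection, send) -> None:
    """Push buffered readings upstream in order; stop at the first failure and retry later."""
    rows = db.execute("SELECT rowid, payload FROM readings ORDER BY ts").fetchall()
    for rowid, payload in rows:
        try:
            send(json.loads(payload))   # e.g. POST to the cloud service via the gateway
        except OSError:
            break                       # link still down; keep the rest for the next flush
        db.execute("DELETE FROM readings WHERE rowid = ?", (rowid,))
        db.commit()
```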
Walkthrough — Step 2: Designing for observability
Instead of diving straight into implementation, we listed a set of criteria to help determine whether the solution would succeed in the field.
- Clear health endpoints (is the device alive, last sync, last error); a minimal endpoint is sketched at the end of this step.
- Basic metrics: processing times, queue sizes, error counts.
- Connectivity summaries: how often we lost contact, how quickly we recovered.
- Determining whether the data was correct.
We defined a small set of questions the corporate sponsor wanted answers to without calling an engineer:
- “Is it running?”
- “Is it roughly keeping up?”
- “Is it the network, the device, or the app?”
- "Is the data coming in correct?"
Everything else was secondary.
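A health endpoint of the kind listed above does not need a framework. Here is a sketch using only the Python standard library; the port and the fields are assumptions, not the project's actual interface.

```python
import json
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

# Updated by the main application as it runs; illustrative fields only.
STATE = {"started_at": time.time(), "last_sync": None, "last_error": None}

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_response(404)
            self.end_headers()
            return
        body = json.dumps({
            "alive": True,
            "uptime_s": round(time.time() - STATE["started_at"], 1),
            "last_sync": STATE["last_sync"],
            "last_error": STATE["last_error"],
        }).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()  # hypothetical port
```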
Walkthrough — Step 3: Ugly but honest prototyping
The first deployed units were not pretty:
- Off-the-shelf hardware.
- A mix of open-source components.
- A very plain internal dashboard for health and logs.
But:
- When a device overheated, we saw it in the metrics.
- When the network flaked out, we had history.
- When the app crashed, we saw the trace.
No one would confuse it for a product, and that was the point.
Walkthrough — Step 4: Focused pilot, clear evidence
We agreed on a narrow pilot:
- Limited number of locations.
- Fixed time window.
- Hypotheses around reliability, performance, and staff acceptance.
At the end, the evidence looked like this (simplified):
- Reliability: Better than expected in some environments, worse in others we didn’t initially think about.
- Performance: Edge processing worked, but certain workloads needed batching.
- Operations: Staff liked not having to manually intervene; IT liked the transparency of logs and metrics.
The project didn’t “magically” graduate from pilot to full deployment overnight.
But:
- The team had a concrete list of design changes.
- They could justify next-stage funding with real data.
- Future decisions were easier because they were grounded in what actually happened in the field.
That’s the whole point of a good pilot.
Common failure modes in field pilots (and what to do instead)
Field pilots rarely fail because the team is “bad at engineering.” They fail because reality introduces constraints you didn’t model, and your system can’t explain itself when it starts behaving differently. Here are the most common failure modes I see, mapped to what fixes them.
- Failure mode: You don’t have a shared definition of success, so everyone argues about outcomes afterward.
  What it looks like: “The demo worked” vs “The pilot didn’t deliver value.”
  Fix: Define success metrics and decision criteria up front (Step 1).
- Failure mode: Scope creep disguised as “small changes.”
  What it looks like: “While we’re here, can it also…” until the pilot becomes a product rewrite.
  Fix: Freeze scope for the pilot window and force new requests into a backlog (Step 1 + Step 4).
- Failure mode: The system needs hands-on debugging to answer basic questions.
  What it looks like: You can’t tell whether the device is alive, stuck, drifting, or offline without SSH access.
  Fix: Add health signals, structured logs, and a minimal telemetry plan (Step 2).
- Failure mode: “The network will be fine” optimism.
  What it looks like: intermittent connectivity, dead zones, flaky Wi-Fi, captive portals, firewall rules, weak LTE.
  Fix: Design for offline operation: buffering, retries with backoff (a minimal backoff sketch follows this list), and clear “last known good” states (Step 1 + Step 2).
- Failure mode: Sensor drift and environmental interference make the data “technically correct” but practically useless.
  What it looks like: heat, vibration, humidity, EMI, lighting variation, unexpected placement changes.
  Fix: Baselines, periodic calibration checks, anomaly flags, and environment-aware thresholds (Step 2 + Step 4).
- Failure mode: Updates are scary, so nothing gets updated, or updates break the pilot.
  What it looks like: version mismatches, config drift, “works on device A but not B.”
  Fix: Treat software/config as versioned artifacts in Git, publish releases, and track deployed versions (Step 2 + Step 4).
- Failure mode: The pilot works, but no one “owns” it operationally.
  What it looks like: the system becomes shelfware after the initial excitement because support is undefined.
  Fix: Assign a long-term owner and define the support workflow early (Step 1).
- Failure mode: The security model is bolted on late and blocks deployment timelines.
  What it looks like: surprise data retention rules, access control debates, missing audit requirements.
  Fix: Decide data types, retention, and access boundaries up front; design “minimum viable security” for the pilot (Step 1 + Step 2).
- Failure mode: The pilot collects lots of data but produces no decisions.
  What it looks like: dashboards exist, but there’s no “so what?” or next step.
  Fix: Build your review cadence and decision checkpoints into the plan (Step 4).
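For the connectivity fix in particular, "retries with backoff" can be as small as the sketch below. The attempt counts and delays are illustrative, and the send callable stands in for whatever upstream call your system actually makes.

```python
import random
import time

def send_with_backoff(send, payload, attempts=5, base_delay=1.0, max_delay=60.0):
    """Retry an upstream send with jittered exponential backoff instead of hammering a flaky link."""
    for attempt in range(attempts):
        try:
            return send(payload)
        except OSError:
            if attempt == attempts - 1:
                raise                    # give up; hand the payload back to the local buffer
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(delay * random.uniform(0.5, 1.5))   # jitter so devices don't retry in lockstep
```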
If you’re stuck between lab and field
If you’re an R&D manager, founder, or tech lead sitting on a promising prototype and dreading the “real world” step:
- Start by documenting your constraints.
- Design observability in before you worry about polish.
- Favour ugly but honest over slick and opaque implementations.
- Treat your pilot as a structured experiment, not a hopeful launch.
If this resonates and you want a second set of eyes on your own Edge/IoT project: