tuxonomy

by AdBrick
Data Engineering

From Naming Convention to Data Pipeline

Turning campaign names into warehouse dimensions

11 min read

There's a quiet irony in how most marketing data pipelines work. Platform APIs deliver rich performance metrics — impressions, clicks, conversions, spend — in clean, structured formats. But the business context that makes those metrics meaningful — which brand, which market, which campaign type, which audience — is locked inside a campaign name string that the pipeline treats as an opaque label.

The name Acme_UK_Awareness_Prospecting_Meta_Q1-2025 contains six dimensions of structured data. But in most warehouses, it sits in a single campaign_name column, and an analyst has to extract those dimensions manually every time they need to slice performance by brand, market, or objective.

This whitepaper is for data engineers, analytics engineers, and marketing data analysts who want to close that gap — turning naming conventions from passive labels into active data infrastructure.

The Analytics Gap

Advertising platform APIs are generous with performance data. Meta's Marketing API, Google Ads' reporting API, DV360's reporting service — all of them return detailed metrics at the campaign, ad group, and ad level. What they don't return, in most cases, is the business taxonomy.

Google Ads doesn't have a brand field. Meta doesn't have a market column. DV360 doesn't report by campaign_objective in the way your internal taxonomy defines it. These dimensions are encoded in the campaign name by the person who set the campaign up, and extracting them is left as an exercise for the data team.

The result is a persistent gap between what the platform knows about a campaign (its performance) and what the business needs to know (its context). Bridging that gap is the job of campaign name parsing — and most teams are doing it inefficiently, unreliably, or not at all.

Without parsing, cross-platform questions that should be trivial become multi-hour exercises. "What was our total Awareness spend across Meta and Google in Q1?" requires an analyst to manually identify which campaigns are Awareness campaigns, across both platforms, reconcile naming variants, and aggregate. With parsing, that question is a single SQL query against dimension columns that already exist in the warehouse.
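The "single SQL query" claim is easy to see concretely. A minimal sketch using an in-memory SQLite table standing in for the warehouse fact table — the rows and spend figures are invented for illustration:

```python
import sqlite3

# Hypothetical fact table with dimension columns already parsed out of names.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_campaigns (platform TEXT, objective TEXT, period TEXT, spend REAL)"
)
conn.executemany(
    "INSERT INTO fact_campaigns VALUES (?, ?, ?, ?)",
    [
        ("Meta",   "Awareness",  "Q1-2025", 12000.0),
        ("Google", "Awareness",  "Q1-2025",  8000.0),
        ("Meta",   "Conversion", "Q1-2025",  5000.0),
        ("Google", "Awareness",  "Q2-2025",  3000.0),  # outside Q1, excluded
    ],
)

# "Total Awareness spend across Meta and Google in Q1" is one aggregate.
(total,) = conn.execute(
    """
    SELECT SUM(spend) FROM fact_campaigns
    WHERE objective = 'Awareness'
      AND platform IN ('Meta', 'Google')
      AND period = 'Q1-2025'
    """
).fetchone()
print(total)  # 20000.0
```

Without the `objective` and `platform` columns, the same question requires pattern-matching on `campaign_name` per platform — exactly the manual reconciliation described above.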

Three Approaches to Parsing

Teams generally land on one of three approaches to extracting dimensions from campaign names. Each has a distinct trade-off profile.

Approach A: Regex in the Warehouse

The most common approach — and usually the first one a data team reaches for. Write a SQL function or dbt macro that splits campaign names by separator and extracts positional segments into dimension columns.

-- Typical dbt/SQL approach
SELECT
  campaign_name,
  SPLIT_PART(campaign_name, '_', 1) AS brand,
  SPLIT_PART(campaign_name, '_', 2) AS market,
  SPLIT_PART(campaign_name, '_', 3) AS objective,
  SPLIT_PART(campaign_name, '_', 4) AS audience,
  SPLIT_PART(campaign_name, '_', 5) AS platform,
  SPLIT_PART(campaign_name, '_', 6) AS period
FROM raw_campaigns

This works — until it doesn't. The approach is position-dependent: it assumes that the third segment is always the objective, and that every name has exactly six segments. The moment a convention changes, a new naming pattern appears, or a non-compliant name enters the system, the extraction silently produces wrong data. Acme_UK_Meta_Awareness_Prospecting_Q1-2025 (with platform and objective swapped) would assign Meta to the objective column with no error.

Regex parsing also has no concept of validation. It will happily extract Awarenss (a typo) into the objective column without flagging it. The incorrectly spelled value then proliferates into dashboards and models, where it either shows up as a separate category or gets silently filtered out.

For teams with a single, stable naming convention and strong upstream discipline, regex parsing can be adequate. But the maintenance burden increases linearly with the number of conventions, and there's no mechanism to detect when parsing is producing garbage.
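Both failure modes — silent misassignment and unvalidated typos — can be reproduced in a few lines. This is a Python sketch of the positional-split logic; the campaign names are illustrative:

```python
FIELDS = ["brand", "market", "objective", "audience", "platform", "period"]

def split_parse(name: str) -> dict:
    # Position-dependent: assumes segment N is always dimension N.
    return dict(zip(FIELDS, name.split("_")))

# A compliant name parses as intended.
ok = split_parse("Acme_UK_Awareness_Prospecting_Meta_Q1-2025")
assert ok["objective"] == "Awareness"

# Platform and objective swapped: no error, just wrong data.
swapped = split_parse("Acme_UK_Meta_Awareness_Prospecting_Q1-2025")
assert swapped["objective"] == "Meta"  # silently wrong

# A typo passes straight through: there is no validation step.
typo = split_parse("Acme_UK_Awarenss_Prospecting_Meta_Q1-2025")
assert typo["objective"] == "Awarenss"
```

Nothing in the extraction path ever raises; every assertion above succeeds, which is precisely the problem.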

Approach B: Lookup Tables

Instead of parsing the name structurally, maintain a lookup table that maps full campaign names to their dimension values.

-- Lookup approach
SELECT
  c.campaign_name,
  m.brand,
  m.market,
  m.objective
FROM raw_campaigns c
LEFT JOIN campaign_dimension_map m
  ON c.campaign_name = m.campaign_name

This is accurate for every campaign that's in the mapping table. The problem is that the mapping table requires manual maintenance. Every new campaign needs an entry. Every renamed campaign needs an update. And the LEFT JOIN silently drops dimensions for any campaign that hasn't been mapped yet — which is exactly the scenario where you most need the data: a campaign that just launched and needs monitoring.
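The silent-drop behaviour is easy to reproduce. A sketch with an in-memory SQLite database and invented campaign names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_campaigns (campaign_name TEXT)")
conn.execute(
    "CREATE TABLE campaign_dimension_map (campaign_name TEXT, brand TEXT, market TEXT)"
)
# Two live campaigns, but only one has been mapped so far.
conn.execute(
    "INSERT INTO raw_campaigns VALUES ('Acme_UK_Awareness'), ('Acme_DE_NewLaunch')"
)
conn.execute(
    "INSERT INTO campaign_dimension_map VALUES ('Acme_UK_Awareness', 'Acme', 'UK')"
)

rows = conn.execute(
    """
    SELECT c.campaign_name, m.brand, m.market
    FROM raw_campaigns c
    LEFT JOIN campaign_dimension_map m
      ON c.campaign_name = m.campaign_name
    """
).fetchall()

# The just-launched campaign comes back with NULL dimensions -- no error raised.
assert ("Acme_DE_NewLaunch", None, None) in rows
```

The query succeeds, dashboards render, and the newest campaign simply vanishes from any filter on `brand` or `market`.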

Some teams automate lookup table population with a regex-based first pass, then manually correct errors. This is better than pure regex, but it creates a dependency on human review that doesn't scale with campaign volume.

Approach C: Governance-Integrated Parsing

The most robust approach connects parsing directly to the system that generates the names in the first place. If a governance platform knows the convention structure — which dimensions, in which order, with which separator, and which values are valid — then parsing is deterministic rather than heuristic.

This is the approach Tuxonomy's developer API enables. The parse endpoint accepts a campaign name and matches it against the active naming rules by analysing the separator pattern and segment count, then validating each segment against the dimension's allowed values. If the name was generated by the platform, it will always parse correctly. If it wasn't — if someone manually typed a name that doesn't match any convention — the parse fails explicitly, which is a signal rather than a silent error.

The key differentiator is auto-detection. When a workspace has multiple naming rules (one for campaign-level, another for ad group-level, a third for a different platform), the parser identifies which rule matches the input string and extracts dimensions accordingly. There's no need to tell the API which convention to use — it figures it out from the structure.
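The auto-detection logic can be sketched in a few lines. Everything below — the rule name, dimensions, and allowed values — is illustrative, not Tuxonomy's actual implementation; it just shows why rule-aware parsing is deterministic where positional splitting is not:

```python
# Each rule: name, separator, ordered dimensions, allowed values per dimension.
RULES = [
    {
        "name": "meta_campaign_v2",
        "sep": "_",
        "dims": ["brand", "market", "objective", "audience", "platform", "period"],
        "allowed": {
            "brand": {"Acme"},
            "market": {"UK", "DE"},
            "objective": {"Awareness", "Conversion"},
            "audience": {"Prospecting", "Retargeting"},
            "platform": {"Meta"},
        },  # free-form dimensions (period) are not constrained here
    },
]

def parse(name: str):
    """Try each rule; return matched rule + dimensions, or None on explicit failure."""
    for rule in RULES:
        segments = name.split(rule["sep"])
        if len(segments) != len(rule["dims"]):
            continue  # wrong segment count -> cannot be this rule
        dims = dict(zip(rule["dims"], segments))
        if all(dims[d] in vals for d, vals in rule["allowed"].items()):
            return {"matched_rule": rule["name"], "dimensions": dims, "confidence": "exact"}
    return None  # no rule matched: a signal, not silent garbage

result = parse("Acme_UK_Awareness_Prospecting_Meta_Q1-2025")
assert result["matched_rule"] == "meta_campaign_v2"

# The swapped name that regex parsing silently misfiled now fails explicitly.
assert parse("Acme_UK_Meta_Awareness_Prospecting_Q1-2025") is None
```

With more rules in the list, the same loop is what "figures out" which convention a name belongs to: the first rule whose structure and vocabulary both match wins.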

The API returns structured dimension data:

{
  "matched_rule": "meta_campaign_v2",
  "dimensions": {
    "brand": "Acme",
    "market": "UK",
    "objective": "Awareness",
    "audience": "Prospecting",
    "platform": "Meta",
    "period": "Q1-2025"
  },
  "confidence": "exact"
}

This output slots directly into a warehouse staging layer — no regex, no lookups, no manual mapping.

Building the Pipeline

Regardless of which parsing approach you use, the pipeline architecture follows a consistent pattern.

Raw layer. Campaign data lands from platform APIs in its original form. Campaign names are stored as-is, alongside performance metrics and platform metadata. No transformation happens here — this is your immutable audit trail.

Parse layer. Campaign names are decomposed into dimension columns. This is where regex, lookups, or API-based parsing happens. The parse layer should also capture parse status: did the name match a convention successfully, or did it fail? Failed parses need a destination — a quarantine table or a flagging mechanism that surfaces them for review.

Dimensional model. Parsed dimensions become first-class columns in your fact and dimension tables. brand, market, objective, audience, platform, period — all available as standard columns for joins, filters, and group-by operations. This is what your BI tools and analytical models query against.

Feedback loop. Parse failures feed back to the campaign operations team. If a name can't be parsed, it means someone bypassed the naming system or the convention has evolved without updating the parse logic. This feedback loop is what turns parsing from a data engineering task into a governance mechanism.
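The layers above can be glued together in a few lines. A minimal sketch, where `parse` is a stand-in for whichever approach you use (regex, lookup, or API) and the raw names are invented:

```python
def parse(name: str):
    # Stand-in parser: returns dimensions, or None on failure.
    parts = name.split("_")
    return dict(zip(["brand", "market", "objective"], parts)) if len(parts) == 3 else None

# Raw layer: names stored as-is from the platform APIs.
raw_layer = ["Acme_UK_Awareness", "Acme_DE_Conversion", "my test campaign"]

# Parse layer: every name gets a parse status; failures are quarantined, not dropped.
parsed, quarantine = [], []
for name in raw_layer:
    dims = parse(name)
    if dims is None:
        quarantine.append({"campaign_name": name, "reason": "unexpected segment count"})
    else:
        parsed.append({"campaign_name": name, **dims, "parse_status": "ok"})

# Feedback loop: the success rate is the signal surfaced to campaign operations.
success_rate = len(parsed) / len(raw_layer)
print(f"{success_rate:.0%} parsed; {len(quarantine)} quarantined for review")
```

The `parsed` rows feed the dimensional model; the `quarantine` rows feed the review workflow described below.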

A practical consideration: parsing is idempotent. Running the same name through the parser always produces the same result. This means both incremental and full-refresh pipeline patterns work. Incremental is more efficient at scale — only parse new or changed campaign names — but full refresh is simpler to implement and guarantees consistency if conventions change retroactively.
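Idempotence is what makes the incremental pattern safe: re-running the parser over names it has already seen changes nothing. A sketch, assuming already-parsed names are tracked in a cache keyed by campaign name:

```python
def parse(name: str) -> tuple:
    # Deterministic: the same input always yields the same output.
    return tuple(name.split("_"))

def incremental_parse(names, cache):
    # Only parse names not seen before; re-runs are no-ops.
    for name in names:
        if name not in cache:
            cache[name] = parse(name)
    return cache

batch = ["Acme_UK_Awareness", "Acme_DE_Conversion"]
first = incremental_parse(batch, {})
second = incremental_parse(batch, dict(first))
assert first == second  # running twice produces identical state
```

A full refresh is the same operation with an empty cache — which is why it is the simpler pattern when a convention change invalidates previously parsed rows.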

Handling Failures

The most important design decision in a parse pipeline isn't how you handle names that parse correctly — it's how you handle names that don't.

Quarantine, don't discard. Names that fail parsing should land in a staging table with the failure reason (unknown separator pattern, unexpected segment count, unrecognised dimension value). Discarding them silently means your warehouse underreports spend — which is worse than having a few uncategorised rows.

Surface failures visibly. A parse success rate metric — tracked over time, ideally visible on a dashboard — gives the campaign operations team a real-time signal about naming compliance. If the success rate drops from 98% to 85% after onboarding a new agency, that's immediately actionable.

Provide fallback categorisation. For failed parses, you can implement a partial extraction layer that captures whatever dimensions it can identify, even if the full parse fails. A name with a recognisable brand code but an invalid objective value can still be partially categorised — brand = Acme, objective = UNKNOWN. This keeps the data usable for broad-level analysis while flagging the specific gap.

Set an SLA. Define how quickly parse failures should be resolved. In many teams, the answer is "before the next reporting cycle." This gives campaign managers a concrete window to fix non-compliant names or to get new values added to the governance platform.
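The partial-extraction fallback described above can be sketched as a vocabulary scan. The brand and objective vocabularies here are illustrative:

```python
KNOWN_BRANDS = {"Acme", "Globex"}
KNOWN_OBJECTIVES = {"Awareness", "Conversion"}

def partial_parse(name: str) -> dict:
    """Salvage whatever dimensions are recognisable when the full parse fails."""
    segments = set(name.split("_"))
    return {
        "brand": next(iter(segments & KNOWN_BRANDS), "UNKNOWN"),
        "objective": next(iter(segments & KNOWN_OBJECTIVES), "UNKNOWN"),
    }

# Recognisable brand, misspelled objective: still usable for broad-level analysis.
dims = partial_parse("Acme_UK_Awarenss_Prospecting_Meta_Q1-2025")
assert dims == {"brand": "Acme", "objective": "UNKNOWN"}
```

The row keeps its spend attributed to the right brand while the `UNKNOWN` objective flags the specific gap for the operations team.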

The Feedback Loop

The real power of governance-integrated parsing isn't the parsing itself — it's the closed loop between data quality and campaign operations.

In most organisations, the data team discovers naming problems weeks or months after the campaigns launched. By then, the data is baked into reports and models, and fixing it requires retroactive cleanup. With a parse-based feedback loop, discovery happens at ingestion time — usually within hours of campaign launch.

This changes the dynamic from reactive cleanup to proactive governance. Instead of "we found 200 incorrectly named campaigns from last quarter," the conversation becomes "three campaigns launched yesterday that don't match any naming convention — here they are."

For teams that track parse success rates over time, the metric becomes a proxy for overall naming discipline. A consistently high parse rate (95%+) indicates that the governance system is working. A declining rate indicates a process gap — a new team member who hasn't been onboarded, a new platform that wasn't added to the rules, or a convention change that wasn't communicated.

Naming as a Data Contract

The most mature organisations treat naming conventions the same way engineering teams treat API contracts. A naming convention defines a schema: these dimensions, in this order, with this separator, using values from this controlled vocabulary. When a campaign name is generated according to that schema, it's guaranteed to be parseable. When it deviates, the parse failure is a contract violation — an exception that gets handled explicitly rather than silently absorbed.

This mental model — naming as a data contract — has practical implications for how teams organise around the problem. The convention becomes a shared artefact owned jointly by marketing operations (who define and enforce it) and data engineering (who consume and parse it). Changes to the convention follow a process: update the rules, update the parse logic, validate backward compatibility, deploy.

When governance and parsing are integrated in a single platform, this process collapses to a single step. Change the rule, and both generation and parsing update simultaneously. There's no synchronisation gap, no deployment mismatch, and no period where new names are generated against rules that the parser doesn't yet understand.

That integration — where the system that creates names and the system that parses them share the same source of truth — is the foundation of Level 5 maturity. It turns naming from a manual process that data teams work around into an automated pipeline that they build on.