Working With Informatica Enterprise Data Catalog: What It Does That Your Wiki Doesn't
One of my clients this year runs on Informatica's catalog stack: Enterprise Data Catalog (EDC) for discovery and lineage, Data Quality for profiling and rules, and Axon as the business glossary layer. The expectation when I arrived was that these three tools, once integrated, would provide a complete metadata management solution. The reality was more complicated.
Here's an honest assessment of what EDC does well, what it doesn't, and what you'll need to build around it.
What EDC Actually Does
Enterprise Data Catalog is a metadata scanner. You point it at a data source — SQL Server, Oracle, HDFS, Databricks, S3, a JDBC source — and it crawls the schema, captures column lineage if the source supports it, and builds a searchable catalog of what exists. The scanner coverage is genuinely broad; the out-of-the-box connectors cover most enterprise sources without custom development.
The lineage tracking is where it earns its license cost for the right use cases. For SQL Server, EDC can parse stored procedures and views to extract column-level lineage: which source columns feed which target columns, and through how many transformations. For Azure Data Factory (ADF) and Informatica PowerCenter, it integrates directly with the tool and pulls lineage from job definitions rather than inferring it from SQL. If you're in a compliance-heavy environment and need to answer "where does this PII field come from and where does it go," EDC with good scanner coverage gives you that answer.
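To make the PII question concrete, here is a minimal Python sketch of what a column-level lineage lookup amounts to once the edges exist: a graph walk in either direction. The column names and the edge list are made up for illustration, and this is not EDC's API or internal model. In practice the edges would come from the catalog.

```python
from collections import defaultdict, deque

# Hypothetical column-level edges: (source column, target column).
EDGES = [
    ("crm.customer.email", "staging.customer.email"),
    ("staging.customer.email", "dw.dim_customer.email_address"),
    ("dw.dim_customer.email_address", "mart.campaign_list.contact_email"),
    ("erp.orders.order_total", "dw.fact_orders.order_total"),
]


def reachable(start, edges, direction="downstream"):
    """Return every column reachable from `start` by following lineage edges."""
    graph = defaultdict(set)
    for src, dst in edges:
        if direction == "downstream":
            graph[src].add(dst)
        else:
            # Reverse each edge to walk back towards the sources.
            graph[dst].add(src)
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

Calling `reachable("staging.customer.email", EDGES)` answers "where does it go", and passing `direction="upstream"` answers "where does it come from". The value of a catalog is that someone else maintains the edge list.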
What the Docs Don't Prepare You For
The scanner configuration UI is brittle. Connection profiles for some source types require specific JDBC driver versions that aren't well-documented, and the error messages when a scanner fails are not always diagnostic. Budget time for scanner setup and testing on each new source type — it's rarely plug-and-play.
Lineage quality degrades fast when pipelines use dynamic SQL or pass table names as parameters. EDC can parse static SQL; it cannot infer lineage from EXEC sp_executesql @DynamicSQL. Metadata-driven pipelines — which I build a lot of — are effectively invisible to the lineage engine. You'll need custom lineage entries for those patterns.
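# EDC REST API: create a custom lineage relationship.
# Check the endpoint path, auth scheme, and payload fields against the
# REST API reference for your EDC version before relying on this.
import requests


def add_custom_lineage(
    edc_url: str,
    token: str,
    source_column_id: str,
    target_column_id: str,
    transformation_label: str
) -> None:
    """Register one source-to-target column link the scanners cannot infer."""
    headers = {
        'Authorization': f'Bearer {token}',
        'Content-Type': 'application/json'
    }
    payload = {
        'sourceObjects': [{'id': source_column_id}],
        'targetObjects': [{'id': target_column_id}],
        'lineageType': 'TRANSFORMATION',
        'description': transformation_label
    }
    resp = requests.post(
        f'{edc_url}/access/1/catalog/data/relationships',
        headers=headers,
        json=payload,
        timeout=30  # fail fast instead of hanging on an unresponsive catalog
    )
    # Raise on 4xx/5xx so a rejected entry is not silently dropped.
    resp.raise_for_status()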
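For metadata-driven pipelines, the mappings usually already live in a control table, so the custom entries can be generated rather than typed by hand. The sketch below assumes a hypothetical control-table row shape (`ColumnMapping`) and a hypothetical `resolve_id` callable that turns a qualified column name into a catalog object ID. Neither comes from EDC. It only builds the argument tuples; submitting them is left to a function like add_custom_lineage above.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class ColumnMapping:
    """One row of a hypothetical pipeline control table."""
    pipeline: str
    source_table: str
    source_column: str
    target_table: str
    target_column: str


def lineage_entries(
    mappings: Iterable[ColumnMapping],
    resolve_id: Callable[[str], str],
) -> Iterator[Tuple[str, str, str]]:
    """Yield (source_id, target_id, label) once per distinct column link."""
    seen = set()
    for m in mappings:
        src = resolve_id(f"{m.source_table}.{m.source_column}")
        dst = resolve_id(f"{m.target_table}.{m.target_column}")
        if (src, dst) in seen:
            # Same link declared by more than one pipeline: keep the first label.
            continue
        seen.add((src, dst))
        yield src, dst, f"metadata-driven load: {m.pipeline}"
```

Each yielded tuple lines up with the last three parameters of add_custom_lineage, so the submission loop is a single call per tuple. Deduplicating on the (source, target) pair keeps a re-run from posting the same relationship twice, though whether the catalog itself rejects duplicates is something to test rather than assume.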
The Axon Integration
Axon is the business glossary — owned by business users, linked to technical assets in EDC. When the integration works, a business analyst can look at an EDC asset and see what business terms are associated with it. When it doesn't, you have two separate systems that don't know about each other.
Getting the Axon-EDC linkage working requires manual curation of term-to-asset mappings, or an API-driven bulk import. The Informatica teams are responsive about guidance, but the documentation for the linkage API is thin. This is the part I ended up patching with custom tooling — more on that in the next post.
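One way to cut down the manual curation is to generate candidate term-to-asset pairs and have a data steward review them before any bulk import. The sketch below is a naive name-matching heuristic over plain lists. The term names, column paths, and matching rule are assumptions for illustration, and it does not call the Axon or EDC APIs.

```python
import re
from typing import Dict, Iterable, List


def normalize(name: str) -> str:
    """Lowercase a name and reduce it to space-separated word tokens."""
    # Split camelCase first, then treat any non-alphanumeric run as a separator.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    return " ".join(re.findall(r"[a-z0-9]+", spaced.lower()))


def candidate_links(
    terms: Iterable[str],
    columns: Iterable[str],
) -> Dict[str, List[str]]:
    """Map each glossary term to columns whose final name segment matches it."""
    by_name: Dict[str, List[str]] = {}
    for col in columns:
        leaf = col.rsplit(".", 1)[-1]  # last segment of schema.table.column
        by_name.setdefault(normalize(leaf), []).append(col)
    # Terms with no match map to an empty list so the gaps stay visible.
    return {t: by_name.get(normalize(t), []) for t in terms}
```

Exact matching on normalized names misses abbreviations such as `birth_dt` for "Date of Birth", which is the point: the output is a review queue for a human, not a mapping to import blindly.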
The Honest Summary
EDC is the right choice if you need broad connector coverage, your lineage comes primarily from static SQL or Informatica tooling, and you have resources to spend on scanner configuration and curation. It's not the right choice if your pipeline layer is metadata-driven, heavily parameterized, or built on non-Informatica orchestration tools without documented connectors. Those gaps are real, and they require custom development to close. As always, I'm here to help.