
This pattern ensures that GenAI development begins with user intent and the data model required to serve it. GenAI systems are only as good as the data they’re trained on, but real users don’t speak in rows and columns; they express goals, frustrations, and behaviours.
If teams fail to translate user needs into structured, model-ready inputs, the resulting system or product may optimise for the wrong outcomes and, as a result, drive user churn.
How to use this pattern
Collaborate as a cross-functional team of PMs, product designers, and data scientists, and align on the user problems worth solving.
Define user needs using triangulated research: Quantitative (market reports, surveys or questionnaires) + Qualitative (user interviews, observational studies) + Emergent (product reviews, social listening, etc.). Synthesise the resulting insights with the JTBD (Jobs To Be Done) framework, an Empathy Map to visualise user emotions and perspectives, and a Value Proposition Canvas to align user gains and pains with features.
Define data needs and documentation by selecting a suitable data model, performing a gap analysis, and iteratively refining the data model as needed. Once you understand the why, translate it into the what for the model: which features, labels, examples, and contexts will your AI model need to learn this behaviour? Use structured collaboration to work this out.
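The gap analysis in the last step can be sketched in a few lines of Python. This is only an illustrative sketch: the `DataRequirement` class, the `gap_analysis` function, and every field and column name below are hypothetical, chosen to show how each required feature, label, or context can be traced back to a user need and checked against the data already available.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRequirement:
    """One field the model needs, traced back to a user need (hypothetical schema)."""
    name: str       # field the model needs
    role: str       # "feature", "label", or "context"
    user_need: str  # the user need that motivates this field


def gap_analysis(requirements, available_columns):
    """Split requirements into those covered by existing data and those missing."""
    available = set(available_columns)
    covered = [r for r in requirements if r.name in available]
    missing = [r for r in requirements if r.name not in available]
    return covered, missing


# Illustrative requirements and columns; none of these names come from a real dataset.
requirements = [
    DataRequirement("support_ticket_text", "feature", "Resolve billing issues quickly"),
    DataRequirement("resolution_label", "label", "Resolve billing issues quickly"),
    DataRequirement("customer_plan", "context", "Get answers relevant to my plan"),
]
available_columns = ["support_ticket_text", "customer_plan", "created_at"]

covered, missing = gap_analysis(requirements, available_columns)
for r in missing:
    print(f"Missing {r.role}: {r.name} (needed for: {r.user_need})")
```

Keeping the user need next to each field makes the missing items easy to prioritise as a team: a gap in a label that serves a high-value need is a stronger signal to collect or annotate data than a gap in an optional context field.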
Important Notes on Data Sourcing
As you gather or plan to collect data, carefully inspect it for quality and potential biases, and make sure your data collection methods are robust. Popular tools include pandas (Python) for checking missing values, duplicates, and inconsistencies; IBM AI Fairness 360 (AIF360) for detecting and mitigating bias in datasets; and Apache Airflow for orchestrating and monitoring data pipelines.
Follow best practices in data collection, data documentation, and data labelling as defined in the DataCard policy.
Design for labelling. Correctly labelled data is a crucial ingredient of an effective supervised ML system.
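The pandas checks mentioned above (missing values, duplicates, and inconsistencies) might look like the following minimal sketch. The `quality_report` helper, the column names, and the sample rows are assumptions made up for illustration; only the pandas calls themselves (`isna`, `duplicated`, `nunique`, and the `.str` accessors) are standard library features.

```python
import pandas as pd


def quality_report(df: pd.DataFrame, label_column: str) -> dict:
    """Summarise basic data-quality issues (hypothetical helper, not a pandas API)."""
    labels = df[label_column].dropna().astype(str)
    # Normalise whitespace and case to detect labels that differ only in formatting.
    normalised = labels.str.strip().str.lower()
    return {
        "missing_per_column": {col: int(n) for col, n in df.isna().sum().items()},
        "duplicate_rows": int(df.duplicated().sum()),
        "inconsistent_label_variants": int(labels.nunique() - normalised.nunique()),
    }


# Made-up sample data: one exact duplicate row, one missing value,
# and one label ("Negative ") that is a formatting variant of another.
df = pd.DataFrame({
    "user_id": [1, 2, 2, 3, 4],
    "feedback": ["slow checkout", "great app", "great app", "confusing menu", None],
    "sentiment": ["negative", "positive", "positive", "Negative ", "neutral"],
})

report = quality_report(df, label_column="sentiment")
print(report)
```

A report like this is a starting point only. Bias detection and pipeline monitoring need dedicated tooling such as AIF360 and Airflow, and inconsistent label variants are often a sign that the labelling guidelines need tightening rather than just a clean-up step.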