CIOReview
DECEMBER 2021

Data used to be a data science problem, but it is now becoming a compliance problem as well. With the release of the Ethics Guidelines for Trustworthy AI by the EU and the Fairness, Ethics, Accountability and Transparency (FEAT) principles by the Monetary Authority of Singapore (MAS), AI models, and consequently the data they are trained on, are expected to be fair and ethical. This raises a question: if real data is biased and its acquisition process cannot be controlled, what is the alternative? The answer is AI-powered synthetic data, which gives data owners control over the generation process itself. This means that biases present in the real data can be removed in the synthetic data. How? Let us find out.

If an ML model is trained on a dataset that contains 70 percent males and 30 percent females, it is likely to be biased towards males because of the skewed distribution. This raises AI transparency issues that are widely debated in the financial industry and in the wider research community. The problem can be rectified by giving enterprises the ability to generate, from a dataset with an imbalanced male-to-female ratio, a dataset with a balanced distribution, i.e., 50 percent males and 50 percent females. This is possible with generative AI models that operate on data types, so the scope of bias correction is much broader than this one example. The same approach can be applied across any number of data features present in the real data, including gender, underrepresented regions, new users with no credit history, salary levels, and different age bands, giving organizations the ability to be fair, ethical, and inclusive in their data practices.

Data bias is becoming one of the biggest challenges for enterprises and a barrier to mass AI adoption.
It may be introduced anywhere within AI/ML model pipelines, from data acquisition to modelling to how model outputs are used for actionable insights and decisions. Even if these issues are somehow magically fixed, the inherent challenge remains: real datasets are not fair because the real world is not fair. Therefore, the only feasible way to train AI/ML models at scale for the world we want to live in is AI-powered data synthesis, which creates synthetic datasets that are fair with respect to legally protected attributes and other important dimensions or features. AI fairness is a relatively new topic, but synthetic data is set to become the most important tool for keeping damaging real-world trends out of AI systems. Indeed, there are early indications that models trained on synthetic data can achieve AI/ML metrics that match those organizations get with real data.

Sachin Tonk

Biased data can cause inaccurate AI/ML models to be trained and deployed, which can perpetuate discrimination.
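
The 70/30 to 50/50 rebalancing described in the article can be illustrated with a minimal Python sketch. It is not the generative approach the article has in mind: plain resampling with replacement stands in here for a trained generative model, and the `rebalance` function, the `gender` field, and the toy records are all hypothetical names chosen for illustration.

```python
import random
from collections import Counter


def rebalance(records, feature, seed=0):
    """Return a dataset in which every value of `feature` is equally represented.

    Rows of under-represented groups are resampled with replacement. This is a
    simple stand-in for drawing new rows from a trained generative model.
    """
    rng = random.Random(seed)

    # Group the rows by the value of the feature we want to balance.
    groups = {}
    for row in records:
        groups.setdefault(row[feature], []).append(row)

    # Every group is topped up to the size of the largest group.
    target = max(len(rows) for rows in groups.values())

    balanced = []
    for rows in groups.values():
        balanced.extend(rows)
        extra = rng.choices(rows, k=target - len(rows))
        balanced.extend(dict(row) for row in extra)  # copy the resampled rows

    rng.shuffle(balanced)
    return balanced


# Hypothetical toy dataset: 70 male and 30 female records.
real = [{"gender": "male", "id": i} for i in range(70)]
real += [{"gender": "female", "id": i} for i in range(70, 100)]

synthetic = rebalance(real, "gender")
counts = Counter(row["gender"] for row in synthetic)
print(counts)
```

With the toy data above, both groups end up with 70 rows, i.e. a 50/50 split. Resampling only repeats existing rows, whereas a generative model would produce new, statistically similar ones, but the balancing logic over a chosen feature is the same, and it extends to any other categorical feature such as region or age band.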