Researchers from King’s College London are studying how to model the factors that predict the success of applications to innovation programmes like MediaFutures. The growing availability of administrative data from open innovation programmes is creating new opportunities for enhancing and auditing their selection practices. Previous research has explored the use of past data to build predictive models for shortlisting applicants and allocating resources more efficiently during review. Another avenue of research relates to interpreting model parameters to check that the trends detected in past data align with the intended objectives of public funders.

The whitepaper the team is working on presents a quantitative investigation of open call applications and selection decisions from three EU-funded data incubators that operated between 2016 and 2021. Using data from 725 applications received by the DMS Accelerator, DataPitch and ODINE programmes, it describes a methodological approach for quantifying unstructured aspects of application texts and team characteristics, and for measuring their explanatory power in predicting the acceptance or rejection of an applicant. It also discusses tools for obtaining demographic metrics from applicants’ names where explicit data on equality, diversity and inclusion (EDI) are unavailable.
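To make the idea of quantification concrete, the sketch below turns an application's unstructured fields into simple numeric features. The team list, the answer text, and the specific features chosen (distinct disciplines, team size, answer length) are illustrative assumptions, not the study's actual feature set:

```python
# Hypothetical application data; real programme records and criteria differ.
team = ["data science", "law", "design", "data science"]
answer = ("We build an open platform that helps newsrooms audit "
          "algorithmic recommendations using public data.")

features = {
    # Disciplinary diversity: number of distinct disciplines in the team.
    "disciplinary_diversity": len(set(team)),
    # Raw team size.
    "team_size": len(team),
    # Answer length in words, a crude proxy for how informative a response is.
    "answer_length": len(answer.split()),
}
print(features)
```

Features like these can then be fed to a statistical model to estimate how much each one contributes to acceptance decisions.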

Based on our analysis, we find that the use of past data for predictive modelling is not yet a viable option. We attribute this to the difficulty of quantifying subjective selection criteria and the shortage of data on successful applicants. Nonetheless, we were able to detect statistically significant trends in the existing data and to audit the parameters that contributed to companies’ chances of being selected. We found that disciplinary diversity and longer application answers consistently contributed to acceptance decisions. This suggests a preference both for teams with diverse expertise and for companies that make the effort to submit informative responses at the application stage.
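A minimal way to check whether a trend such as "longer answers are accepted more often" is statistically significant is a permutation test. The answer lengths below are invented for illustration and are not the programmes' data:

```python
import random

# Hypothetical answer lengths (in words) for accepted and rejected applications.
accepted = [180, 220, 205, 240, 190]
rejected = [120, 140, 95, 160, 130]

observed = sum(accepted) / len(accepted) - sum(rejected) / len(rejected)

# Permutation test: repeatedly shuffle the accept/reject labels and count
# how often a mean difference at least as large arises by chance alone.
pooled = accepted + rejected
rng = random.Random(42)
hits = 0
trials = 10_000
for _ in range(trials):
    rng.shuffle(pooled)
    diff = sum(pooled[:5]) / 5 - sum(pooled[5:]) / 5
    if diff >= observed:
        hits += 1

p_value = hits / trials
print(p_value < 0.05)  # a small p-value suggests the trend is not random noise
```

Tests of this kind only flag an association; interpreting whether it reflects the funder's intended criteria still requires a human audit.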

Other metrics such as the linguistic content of applications, team size and demographic diversity did not appear to be associated with selection decisions. Of these, demographic diversity is worthy of further attention. Although we were not able to detect any overt biases against teams that included women or people of non-European ethnicities, neither could we detect a selective preference for demographically diverse teams. We estimate that only 19% of individuals named in applications were women, and that 19% came from ethnic minorities (11% Asian, 8% African). Our findings lend weight to the European Commission’s ongoing concern with improving the representation of women and ethnic minorities in data- and AI-related industries.
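Where explicit EDI data are missing, name-based inference typically works from reference lists of names. The tiny lookup below is an illustrative stand-in: the names and labels are invented, and real tools draw on large census-derived datasets and return probabilistic estimates with known error rates rather than certainties:

```python
# Illustrative-only reference table; not a real name-inference dataset.
NAME_GENDER = {"maria": "woman", "james": "man", "fatima": "woman", "luca": "man"}

applicants = ["Maria", "James", "Fatima", "Luca", "Alex"]
inferred = [NAME_GENDER.get(name.lower(), "unknown") for name in applicants]

# Aggregate estimate of the share of women among named individuals.
share_women = inferred.count("woman") / len(inferred)
print(inferred, share_women)
```

Because such inference is uncertain at the individual level, it is best used only for aggregate estimates like the percentages reported above, and ideally replaced by self-reported EDI data when forms allow it.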

We conclude the paper with a summary of methodological implications and recommendations for other innovation programmes and researchers who may be considering the use of machine learning approaches for predictive or auditing purposes in the selection process.

In practical terms, this work has contributed to the design of MediaFutures’ second open call, and the inclusion of EDI details in application forms.