Using Data Analysis to Uncover Biases in Funding for Medical Research

About the Authors

Róisín Cullen is a Business Analyst for Providence Health Care in Vancouver, B.C., where she is working on a new state-of-the-art hospital facility. She is also a certified CAPM, with the skills required to ensure effective project management throughout the project life cycle. She holds an M.S. in Pharmaceutical Business and Technology from Griffith College Dublin.

Shoumik Goswami is a Lead Business Analyst and Data Scientist by profession. Shoumik works with clients across the globe to help them transform their ideas and vision into working products. An IIM Ahmedabad alumnus, Shoumik has more than five years of experience in the data industry, where he has worked with companies like Standard & Poor’s and Fidelity International. He holds a mentorship position with Springboard for the Data Analytics Career Track and is a member of Springboard’s Mentor Advisory Board.

How do you grab the attention of a roomful of executives? Visualize their program data effectively, and engagement will increase. While working for a stem cell research company, Róisín Cullen, a business analyst based in Canada, had the opportunity to improve many programs and processes by understanding their pain points. Senior executives are remarkably busy people and often miss the little details that impact operations. Cullen refined the process from one that was full of bias to one that was data-driven by improving reporting and creating interactive dashboards that allowed decisions to be made more quickly and remotely.

Although she’d taken a few courses in SQL and relational databases over the years, Cullen soon realized she needed to deepen her knowledge of data visualization to create more “Aha!” moments in her career. So she enrolled in Springboard’s Data Analytics Career Track in 2020, and, when it was time to select a topic for her capstone project, she went for familiar territory: medical research. 

Most medical research in Canada is funded by private and public sector grants. Medical researchers compete for grants in open competitions held by funding agencies such as the Canadian Institutes of Health Research (CIHR), Canada’s federal funding agency for health research, which collaborates with researchers to advance innovations in health R&D throughout the country. In her capstone project, Cullen wanted to explore whether CIHR’s funding approval process exhibited any biases, such as giving more funds to certain institutions or to researchers of a particular gender.

As in most competitions, entrants have limited insight into the selection criteria, who makes the decisions, and which factors make someone more or less likely to “win” funding approval. The same is true for major awards ceremonies like the Oscars, whose voting committee remains largely mysterious; for the murky and often controversial U.S. college admissions process; and even for which pop songs get the most radio airplay (and are therefore more likely to chart on Billboard). Olympic figure skating is a good example of a competitive sport plagued by a lack of transparency in its judging system: judges can vote anonymously, and a study by a Dartmouth economics professor found that most gave higher marks to skaters from their own country. Fundraising itself is fraught with discrimination: in 2017, less than 2% of venture capital dollars went to companies founded by all-women teams, while 79% went to startups with all-male founding teams.

In 2017, CIHR released some of its application data to the public after Canada’s health research community called for more transparency around the balance of funding across genders, career stages, and other factors. Cullen was curious to see whether the application data would reveal any imbalances in funding allocation, or other preferential criteria that afford one group advantages over another.

Analyzing data to uncover bias

CIHR provided an open source microdata file on a subset of competitions from 2012-2016, which contained two datasets:

The first dataset contained demographic data on the applicants, area of research proposed, amount of funding requested and so on. The second dataset revealed which applications were approved for funding.

In her analysis, Cullen wanted to explore two problem statements:

  1. Is there any bias in the way CIHR approves grant funding?

  2. Are there any preferential criteria researchers should know about in order to improve the likelihood that their grant application is funded?

While the dataset was relatively large, which allowed Cullen to test a number of variables, it was curated by CIHR and represented only a sample of all applications, potentially introducing selection bias. Certain information was withheld by the agency to protect individuals’ privacy, and the dataset was limited to just eight competitions and three grant programs over a four-year period. “It’s really up to the person using the data to have faith that the data was given as a whole without anything being changed,” said Cullen.

To map out the different hypotheses she wanted to test, Cullen created an issue tree. Also known as a “logic tree” or “hypothesis tree,” an issue tree is often used by management consultants to break a client problem down systematically. These flow diagrams map out a series of statements with “yes” or “no” answers, arriving at a conclusion or set of conclusions once each hypothesis has been validated or refuted.

In her data analysis, Cullen investigated each of the following variables to determine whether CIHR had any preferential criteria that might point to a biased selection process:

  1. Amount of funding requested

  2. Number of years for which funding is requested

  3. Gender

  4. Area of science

  5. Institution

  6. Province

  7. Application deadline
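To make the issue-tree approach concrete, here is a minimal sketch of how the hypotheses above could be encoded and walked in Python. The structure and wording are purely illustrative; Cullen’s actual issue tree was a visual flow diagram, not code.

```python
# An illustrative issue tree for the CIHR analysis: the root question
# branches into one testable hypothesis per variable. (Hypothetical
# structure; Cullen's real issue tree was a visual flow diagram.)
issue_tree = {
    "Is there bias in how CIHR approves grant funding?": [
        "Does the amount of funding requested affect approval?",
        "Does the number of years of funding requested affect approval?",
        "Does the applicant's gender affect approval?",
        "Does the area of science affect approval?",
        "Does the institution affect approval?",
        "Does the province affect approval?",
        "Does the application deadline (March vs. September) affect approval?",
    ]
}

def print_tree(tree: dict) -> None:
    """Print each root question followed by its sub-hypotheses."""
    for question, hypotheses in tree.items():
        print(question)
        for h in hypotheses:
            print("  - " + h)

print_tree(issue_tree)
```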

Rooting out discrimination in judging practices for grants, competitions, and awards is a strong use case for this type of data analysis, but given a larger dataset, a similar technique could be applied to build recommendation systems that advise applicants on the likelihood that their application will be approved based on a set of criteria.

“This kind of data gives folks insight into which types of research they should aim for, and which research areas have the highest chance of being approved [for grants],” said Shoumik Goswami, a senior business analyst at Fidelity International and Cullen’s mentor during her Springboard course. “A dashboard like this becomes very important for Ph.D. candidates who are in the initial stages of planning their research.” 

Outside of academic research, such a tool could estimate the probability that a high school senior will be accepted to a particular university by cross-referencing historical admissions data against their SAT scores, application essay topic, extracurriculars, and so on.
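One plausible way to build such a tool is to fit a simple classifier, such as a logistic regression, on historical outcomes. The sketch below is an assumption-heavy illustration, not Cullen’s method: the column names and sample data are invented, and a real model would need far more data.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Invented historical application data; the column names are
# illustrative, not actual CIHR field names.
history = pd.DataFrame({
    "amount_requested": [450_000, 1_200_000, 3_000_000, 900_000, 15_000_000],
    "years_requested":  [3, 5, 7, 4, 12],
    "approved":         [1, 1, 0, 1, 0],  # 1 = funded, 0 = rejected
})

# Scale the features, then fit a logistic regression on past outcomes.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(history[["amount_requested", "years_requested"]], history["approved"])

# Estimate the approval probability for a new, hypothetical application.
new_app = pd.DataFrame({"amount_requested": [1_100_000], "years_requested": [5]})
print(f"Estimated approval probability: {model.predict_proba(new_app)[0, 1]:.0%}")
```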

A 4-step approach to assessing trends in grant approval data

Cullen used histograms, regression plots, pie charts, and box plots to visualize her data. Box plots show the distribution of data by displaying its five-number summary: the minimum, lower quartile, median, upper quartile, and maximum.

This type of data visualization makes it easy to spot outliers or extremes. 
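As a rough sketch of what that looks like in practice, the snippet below draws one box per outcome for the amount requested. Here `df` is assumed to be the CIHR applications table, and the file and column names (`cihr_applications.csv`, `amount_requested`, `approved`) are placeholders, not the agency’s actual field names.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder file and column names; the real CIHR microdata file
# uses its own field names.
df = pd.read_csv("cihr_applications.csv")

# One box per outcome: the box spans the quartiles, the line marks the
# median, the whiskers cover the typical range, and points beyond the
# whiskers stand out as outliers.
df.boxplot(column="amount_requested", by="approved")
plt.suptitle("")  # drop pandas' automatic "Boxplot grouped by" title
plt.title("Requested funding by approval outcome")
plt.xlabel("Approved (0 = no, 1 = yes)")
plt.ylabel("Amount requested (CAD)")
plt.show()
```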

1. Application approval rate and amount of funding requested 

An initial analysis of the data showed that on average, just 12% of applications submitted each year are approved. Cullen found that only two factors significantly impacted the likelihood an application would be approved:

  • Amount of funding requested

  • Number of years for which funding is requested

The average amount of funding requested for approved applications was CAD $1.15 million. Funding requested outside the range of $400,000 to $1.9 million was considered an outlier and less likely to be approved, and no funding was approved in excess of $11 million.

Applicants may well be unaware of these parameters, as they are not explicitly stated in application materials and only emerged from the data analysis. Case in point: the maximum amount of funding requested across all applications was a whopping $21 million, but the largest approved grant was only $10.85 million.

Approved applications requested 4.5 years of funding on average, with no funding approved for projects longer than nine years. Applicants could have benefited from knowing what a “realistic” grant application entails: a maximum grant size of $11 million over a period of nine years or less.

The data appears to show that as long as an application fell within these boundaries (maximum grant size of $11 million over a period of nine years or less), there was very little correlation between these two variables and whether or not an application was approved. 

  • Requested amount vs. approval: r = -0.074

  • Requested years vs. approval: r = -0.038

As a rule of thumb, an r coefficient greater than 0.75 is considered a “strong” correlation between two variables, and the closer a correlation is to zero, the weaker it is. However, the funding boundaries are not clearly defined in the application criteria, meaning that researchers with innovative ideas could be denied a grant simply because they are unaware of what constitutes a “realistic” grant application, which underscores the usefulness of Cullen’s analysis.
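Summary statistics like these take only a few lines of pandas to reproduce. The sketch below assumes the same placeholder file and column names as the earlier snippet; the logic, computing the overall approval rate, summarizing approved requests, and correlating each variable with the approval flag, mirrors the analysis described here.

```python
import pandas as pd

# Placeholder file and column names, as in the earlier sketch.
df = pd.read_csv("cihr_applications.csv")

# Overall approval rate (the article reports roughly 12%).
print(f"Approval rate: {df['approved'].mean():.0%}")

# Summary of requested amounts for funded applications, which should
# reflect the $400K-$1.9M typical range and the ~$1.15M average.
print(df.loc[df["approved"] == 1, "amount_requested"].describe())

# Pearson correlation between each variable and the 0/1 approval flag.
# Values near zero, as Cullen found, indicate a very weak relationship.
for col in ["amount_requested", "years_requested"]:
    r = df[col].corr(df["approved"])
    print(f"{col} vs. approval: r = {r:.3f}")
```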

“If each applicant was given their score breakdown they would be in a better position to know how to improve their application for the next round,” said Alison Cosette, director of research and development at NPD Group and a mentor for Springboard’s Data Science Career Track. “We would create a feedback loop that raises the level of all submissions. This is one benefit of transparency.”

2. Gender

In her initial analysis of funding across genders, Cullen found that the majority of approved applicants (nearly 58%) were male.

Before jumping to conclusions about a gender imbalance, Cullen performed a further analysis. The split reflected the applicant pool: nearly twice as many males (14,551) submitted applications as females (7,346). A closer look at the acceptance rates for applications submitted by males (13%) versus those submitted by females (12%) revealed only a one-percentage-point difference, undercutting theories of a gender imbalance in CIHR’s selection process. Rather, Cullen’s analysis shows that the gender imbalance starts further upstream, potentially from a shortage of females in medical research to begin with.
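The distinction between raw counts and rates is easy to check in pandas: the mean of a 0/1 approval flag within each group is that group’s acceptance rate. The sketch below uses the same placeholder file and column names as the earlier snippets.

```python
import pandas as pd

# Placeholder file and column names, as in the earlier sketches.
df = pd.read_csv("cihr_applications.csv")

# Raw counts overstate the imbalance: far more men applied than women
# (14,551 vs. 7,346 in the CIHR data).
print(df["gender"].value_counts())

# Acceptance rates tell the real story: roughly 13% for men vs. 12%
# for women, a one-percentage-point difference.
print(df.groupby("gender")["approved"].mean())
```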

Cullen’s findings are an example of Simpson’s Paradox, where the data shows different trends when it is separated into groups versus when it is viewed in the aggregate. Even so, Cullen says her findings could form the basis for a further investigation into why fewer female medical researchers apply for funding in the first place. 

“It could mean a few things,” she said. “Is it that we don’t have enough females in leadership positions at the organizations that are completing these applications? Or could it be a lower-level researcher who’s filling out the application again indicating there may be less females in scientific research? It really depends on what’s going on at the organization itself.” 

3. Area of research, province, and institution

Cullen discovered a similar trend when she analyzed the amount of funding awarded across research categories and provinces. Certain categories and provinces appeared to have an advantage over others until other variables were factored in—some of which were not part of the CIHR dataset.

The provinces that attracted the most funding, Ontario (43%), Quebec (28%), and British Columbia (14%), did so because they have the largest populations and research communities. As such, the universities and research institutions within these provinces also received the lion’s share of funding: 11.4% of funds went to the University of British Columbia, 9.6% to the University of Toronto, and 6.9% to McGill University. Even if these funding disparities are due to population density rather than selection bias, Cullen says her findings can be used to investigate the underlying reasons why certain provinces have smaller research communities, what socioeconomic factors might correlate with these findings, and what kinds of policy and funding initiatives could help rectify these inequities.

Next, Cullen tested to see if certain research areas were favored over others. CIHR provides funding across four areas of science: biomedical, clinical, health system services, and population health. An initial analysis of the aggregate data showed that funding for biomedical research far outstripped the other categories, suggesting a skewed selection process, but further analysis revealed that the initial preponderance was due to a higher number of applications being submitted for biomedical research compared to other categories. The approval rate for biomedical research was still slightly higher at 13%, compared with an 11% approval rate for the other categories. In fact, the most common area of science identified by approved applications was “Unspecified,” indicating that this variable is not crucial to the decision-making process.

4. Application deadline

CIHR accepts grant applications twice a year in March and September. Cullen’s initial analysis found that more applications were submitted in March than in September and that the initial number of applications approved appeared to show a bias for March. But upon analyzing the acceptance rates—12% for applications submitted in March and 13% for applications submitted in September—she found that the differences were negligible. 

Investigating other inequities 

Overall, Cullen says the application process was fair across all analyzed variables, although the analysis was based only on a subset of competitions. 

“Although my analysis found that the application process was unbiased and there was no preferential criteria—based on the limited dataset CIHR had made publicly available—I think it did a good job of providing researchers with more information about what parameters they should stay within to make their applications more likely to be approved for funding,” said Cullen. 

Another advantage of this type of analysis is that it revealed biases outside of the CIHR selection process that could be investigated, such as the gender imbalance among the researchers who apply for funding and the fact that smaller provinces have less active research communities and consequently attract less funding.

“From that perspective, [the government] could look at creating programs in those provinces to bring more health researchers and scientists into those provinces,” said Cullen, “and offer more programs at those universities.” 

While the social impact of increasing transparency around the decision-making process for competitions, awards, grants, and even algorithms is obvious, businesses face an unknown world of liabilities if they opt to do so, which is why many organizations continue to keep these details close to the vest.

“If an organization makes their process and factors transparent they also open themselves up to scrutiny,” said Cosette. “Some organizations are reluctant not because they are consciously biased but because they fear being held accountable for an unknown bias.  So rather than this being an opportunity to have a positive feedback loop of ‘Hey, you can do better in this area,’ there is a concern about the impact of cancel culture if any flaws are revealed.”    

Looking to the future

Already knee-deep in her new business analytics job since graduating from Springboard, Cullen said she does not have any immediate plans to continue working on her capstone project, but she is enthusiastic about the prospect of using data to shine a light on things that would otherwise remain shrouded in mystery—such as the procedures by which competitions are judged or grant applications approved. “I’ve seen how data analytics and data visualization can change a stakeholder’s perspective of their program or their organization,” she said. “I want to continue to use these tools to help bring insights to others in my career.”

Executive summary

The judging process for awards and competitions is shrouded in mystery. From controversy surrounding the Oscars voting process to the scandal-mired U.S. college admissions process, there is clearly a need for more transparency around how awards, grants, and other opportunities are distributed. Róisín Cullen, a former student in Springboard’s Data Analytics Career Track, used data analysis to investigate the funding approval process for medical research in Canada. She used a publicly available dataset released by the Canadian Institutes of Health Research (CIHR) to assess whether there were any biases in CIHR’s selection process across a range of variables, including gender, institution, province, area of research, and amount of funding requested.

Cullen’s analysis revealed no bias in CIHR’s funding approval process across these variables once acceptance rates, rather than raw approval counts, were compared. That said, Cullen’s research yields two important insights:

1. Applicants could benefit from having more information about what constitutes a “realistic” grant application. Projects approved for funding fell within certain parameters for the amount of funding requested and years of funding requested, but these boundaries were not stated in the application criteria. Consequently, researchers with innovative ideas could be denied a grant simply because they are unaware of what constitutes a “realistic” grant application.

  • The majority of approved applications fell within the $400K-$1.9M range, with an average of $1.15M, indicating an implicit criterion for the amount of funding requested.

  • Funding was approved for 4.5 years on average, with no projects approved for over 9 years. This implies that there's also an implicit criterion for the time period for a research project. 

  • Recommendations: Make the CIHR application criteria for medical research grants more transparent by specifying what amount of funding and how many years requested constitute a “realistic” application. Alternatively, CIHR could offer feedback to researchers whose applications are rejected on these grounds and give them a chance to resubmit.

2. While there were no biases in terms of institution, gender, or province in CIHR’s approval process, Cullen’s analysis revealed inequities that occur further upstream. Over 66% of applications were submitted by male researchers, and 85% of the funds went to the three provinces with the largest populations. However, there were no significant differences in approval rates within these groups, suggesting the imbalances arise before applications are ever submitted, for example because certain provinces have more active research communities.

  • Recommendations: Investigate the upstream inequities surfaced by Cullen’s analysis, including what socioeconomic factors might correlate with these findings, and use the results to introduce policy and funding initiatives that rectify them.