Updated: Sep 2, 2021


Women are employed in all workplaces, but are still greatly underrepresented, particularly in STEM fields. The 2017 State of Data Science and Machine Learning survey from Kaggle found that women represented just over 16% of the total respondents. Other reports have found that only 18% of data scientist roles are occupied by women, with less than 3% of the tech field being filled by women of color!


It was in this context that, in collaboration with Women in Data Science (Stanford University) and the New Haven Public Schools, we launched “Data Days”, an event designed to spark interest in data science among middle school girls. With the surge in interest in data science, now is a better time than ever to encourage more women to enter the field. And what better time to start encouraging girls than middle school? A survey funded by Microsoft found that girls typically become interested in STEM subjects around age 11 and begin to lose interest around age 15 -- making middle school a vital time for outreach.


What was Data Days?

The inaugural Data Days was a two-day event hosted at King-Robinson Middle School in New Haven, CT. The event featured panels, keynote speakers, and hands-on workshops, creating an inviting space for 50+ young girls to learn about and explore data science, both as a career and as a widely useful toolset. Fifteen amazing data scientist volunteers joined us as group mentors for the event.


What was the feedback on the event?

Positive! We surveyed both the students and their group mentors before and after the event to gauge its impact. On average, the students rated the event 4.2/6 overall, gave it 4.4/6 for leaving them with a positive impression of data science, and rated their likelihood of recommending it 7/10.


“I loved listening to the data scientist stories about how they got to be where they are and I loved how they explained the importance of data science.”

8th Grade Student


Mentor feedback expressed that the event was “refreshing”, and being able to impart knowledge of their field was “just epic”. They felt the curriculum and presentations were well structured and designed, and that “the girls got a good sense of what data science is and the importance of visualizations”. The mentors were generally very enthusiastic about the event and its impacts. 100% of the mentors responded to the survey stating that they would be interested in Data Days again in the future!


“It was so refreshing. I love kids and imparting knowledge of my field to them was just epic.”

Data Days Mentor


What was the impact of the event on students?

We saw a statistically significant increase in both interest in and familiarity with data science among students after the event! The group of students reported a 2.9/6 initial interest in data science, which increased by 24% to a 3.6/6 interest on average after the event. Of the 26 student respondents, 96% (25 total) said their interest in data science increased and 100% said their familiarity with data science increased. Average familiarity with data science, initially 2.3/6, increased by 74% to 4/6 familiarity.
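For readers who want to check the arithmetic, here is a minimal sketch of how the percentage increases above follow from the reported averages. It assumes the averages quoted in this post are rounded values, and the function name `percent_change` is ours for illustration, not part of any survey tooling.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, expressed as a percentage."""
    return (after - before) / before * 100


# Average scores reported in this post (each on a 6-point scale).
interest_change = percent_change(2.9, 3.6)
familiarity_change = percent_change(2.3, 4.0)

# Share of the 26 student respondents whose interest increased.
share_more_interested = 25 / 26 * 100

print(f"Interest increase: {interest_change:.0f}%")
print(f"Familiarity increase: {familiarity_change:.0f}%")
print(f"Students reporting more interest: {share_more_interested:.0f}%")
```

Note that this only reproduces the relative changes. It does not test statistical significance, which would need the individual survey responses.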


“I liked how we started off really confused… but ended up coming up with a huge output.”

8th Grade Student



The students reported gaining new skills in data science, an awareness of datasets, confidence, and connections to role models/mentors after Data Days.


What exactly did the Data Days programming look like?

Data Days started off with a general session, introducing the group of middle school girls to data science on a broad level. New Haven’s instructional superintendent Keisha Redd-Hannans, whose enthusiasm and efforts were instrumental to the program, welcomed the students at the start of the event. The students then heard from Brigette Davis, Doctoral Candidate and Public Health Research Scholar at Harvard University, on her career path and work.


The girls then broke out into small groups, each with a mentor to guide them through a group data project. The small group workshops were a chance for the students to practice processing and visualizing data themselves. Each group worked on one of six projects we curated for a middle school audience. The project topics ranged from COVID vaccination rates, to popular TV shows, to Starbucks menu design.


Students had a chance to present their work after working with their groups. Survey responses indicated that both students and mentors enjoyed this group activity, particularly meeting one another. One 8th grade student remarked that she liked how they “started off really confused, but ended up coming up with a huge output” with the data project.



 

Student Group 15 analyzed shopping data from the Bureau of Economic Analysis

 

The event concluded with another general session featuring a career panel and a final keynote presentation. The panel featured four speakers: Gracie Ermi (Research Software Engineer at Vulcan Inc.), Ivanna Williams (Research Scientist at Chan Zuckerberg Initiative), Marcelle Goggins (Data Engineer at RIPL), and Paula Maouyo (Product Operations at Stripe). They described their experiences working with data science in various fields, aiming to show the girls how wide-ranging data science applications can be. Data 2 the People’s Dr. Elena Grewal gave the keynote address, recounting her journey through the world of data science, starting in New Haven.


The girls mentioned they particularly enjoyed learning about the mentors and their careers. Another 8th grader commented that she “loved listening to the data scientists’ stories and how they explained the importance of data science”, rating the event 5/6 overall.


Continuing the Program

We are so excited by the success of our first Data Days event with King-Robinson Middle School and look forward to continuing to host programs across the country! If you are interested in getting involved in future events, please fill out our interest form here.





Ameya Khanapurkar, Rebecca Grunberg, Dr. Elena Grewal


Post election day, Democratic leaders have not strengthened their majority in the House of Representatives. This result could have been the product of multiple factors, but this post focuses on Facebook spending by the individual Democratic and Republican campaigns in competitive House districts, finding that Democratic candidates in aggregate spent 1.8x more than Republican candidates on Facebook ads. We share the dataset publicly here.


This data shows that the *amount* of spending is not where Democratic candidates fell behind.


We use the August 7, 2020 Cook Political Report House Race Rating to identify competitive House races and categorize races as: Likely Democratic, Lean Democratic, Democratic Toss Up, Republican Toss Up, Lean Republican, and Likely Republican. The August 7th date was chosen because it is 3 months prior to election day, giving candidates time to raise money and allocate spend on Facebook knowing their race may be competitive. A few of the candidates moved between groups after August 7th, which is indicated in the dataset.


In each group, most Democratic candidates outspent Republicans by 2-3x on Facebook, with the exception of the “Likely Republican” group, in which spending by Republicans and Democrats was approximately even.


Facebook reports spending over the past 7 days and over the past 2.5 years. We collected the data on November 7th, so we can share each candidate's spend for November 1st-3rd and for May 5th, 2018 through November 3rd, 2020.


(We excluded 3 uncalled races when calculating the % of Democratic wins.)


There were multiple unexpected outcomes that favored Republicans over Democrats and none that favored Democrats over Republicans. Republicans won “Likely Democratic” FL-27, “Lean Democratic” TX-23, and two races initially rated “Democratic Toss Up”, FL-26 and SC-1, that were designated “Lean Democratic” in the latest Cook report. Of these four races, FL-27 was the only one in which the Republican candidate outspent the Democratic candidate, by a factor of 79x. In the other three, the Democratic candidates spent more.


The median difference in spend is even larger if we look at spend in the last three days of the election, November 1-3 (note: Facebook stopped political ads after November 3rd). Democrats spent even more, relatively, in the final days in the Republican-leaning races, which saw a 2% success rate of flips (1 seat out of 46).


When we plot spend vs vote share, while it appears there is a slight positive correlation between Facebook spend and vote share, this is driven by races which were already rated as “Likely Democratic”.


Democrats mostly outspent Republicans across the board in their individual races, yet did not see the returns that the Democratic Party looked forward to. Whether this is a matter of effective messaging or external spending should be researched next. It is also possible that the Cook’s ratings were not accurate due to polling bias.


Additional hypotheses we could check next:

  • Messaging: Democratic messaging may be more nuanced compared to the Republican messaging, perhaps due to the coalition of progressives, moderates, and “Never Trump” Republicans that Biden assembled. For example: Police Reform means many things to the Democratic Party as a whole, whereas Republicans have rallied around the “Back the Blue” message. We can categorize messages qualitatively, and compare the amount of money invested into those messages in relation to vote share for these Democratic candidates.

  • Targeting/Ad Strategy: We can look at whether Republicans tested more ads than Democratic candidates and whether there may be differences in targeting used. This is harder to determine with the data available but we can investigate what we see.

  • External Group Spending: External group spending could have made a big difference for candidates on both sides. Additional data collection could look into spend by outside groups in a given geography and also the messaging employed by those groups.

  • Other Digital Channels: It may be that the Republicans spent more on Google ads/YouTube or other digital channels. This is additional data we can look into collecting.


Understanding the details of digital outreach and its nuances is the difference between simply throwing money at a problem and prudently looking for solutions. We look forward to hearing from you about what you think of this data and what to look at next!





Paula Maouyo, Dr. Elena Grewal, Sarah McGowan


This blog post outlines our working approach to practicing antiracism [1] as a community striving to bring data and power to the people. We at Data 2 the People firmly believe that the more openly we each share our learnings with each other, the faster we progress, and the faster we create the world we want. We believe we each have much to learn, and also that we each have something to teach. And so, as we forge ahead, let’s learn together and may #eachoneteachone [2]


How we practice it concretely


Before we get started on specific steps, we first need to enumerate our principles for a fruitful practice of antiracism. These include: making a clear commitment to infusing this practice in everything we do; continually seeking to learn and iterating accordingly; being pragmatic in striving for progress, not perfection; and maintaining low ego.


We believe it’s important to apply these principles to the following three areas, which are applicable to any organization:


  1. The team: It all starts with the people building the organization or community, so start with the team, with leadership modeling these principles and explicitly leading in normalizing discussion, action, and accountability.

  2. The work: Review how your line of work can be, and has historically been, racist (i.e. how it has assumed, even if non-overtly, supremacy or normalcy of one group over others, or unjustifiably privileged attention to one group over others), and test possible solutions to countering that.

  3. The industry: Communicate what you are learning from practicing antiracism to the broader community or industry, listen to what others are learning and sharing, and invite more people into the conversation and to take action.

Below are more details (non-exhaustive) on how we’re applying the above three-pronged approach at this time (and note, it is continually evolving!).


The team


Starting with the team is important because everything flows from the people. For a healthy practice, an organization’s commitment to antiracism/equity must be both top-down and bottom-up. Top-down, the leader(s) must lead by example in action and make explicit the importance of antiracism/equity to the team’s success. Bottom-up, the broader team has to buy into and wholeheartedly recognize that antiracism/equity is critical to the team’s success, and to ensuring the most effective use of individual team members’ contributions.


Practicing antiracism/equity in building a team requires: recruiting people from a diverse range of backgrounds (racially and otherwise), fair assessment of candidates across backgrounds, equitable support of team members’ integration and growth after they join the team, and inviting the team to help us get better at all of the above. For us, that concretely looks like:



The work


Now, turning to the work we actually do. We exist as an organization precisely to help governments operate better and facilitate equal access to thriving for all. To help governments do this, we need to support and empower burgeoning leaders, organizers, and policy-makers who value and model equity. Here’s how we approach doing this:


The industry


We have no interest in being unique flowers in the industry regarding our insistence on equity, integrity, excellence, and truth-seeking. No interest whatsoever. Quite the opposite -- we want these to be industry norms. If another organization is modeling any of these better than we are, well, that will just push us to be better, and that’s a win! #eachoneteachone


Given our desired industry norms, we want to contribute to the ongoing conversation about racism and antiracism in the political data world. Hence this blog post. We also want to talk about antiracism explicitly in our conversations with campaigns, other political data science groups, and donors. Through these conversations, we can share ideas and learnings, and continue to learn, iterate, and improve.


Now that we’ve shared what we’re learning and doing, what do you think? If you’re in the political world or political data world, we’d love to hear your thoughts! If you’re in a different industry or kind of community, how does the team/work/industry framework apply? Let us know! Let’s keep the conversation going, so we can grow with and learn from each other! May #eachoneteachone



 

[1] Our practice of antiracism is a fight against all forms of supremacy, i.e. any time any group or individual pursues their interests and desires at the expense of the agency of others. It is a fight for equity. It is a fight to build stronger communities, no matter their makeup; a fight for all communities to have the resources and the space to breathe, to grow, to learn.


[2] “Each one teach one” is a phrase I learned from a Black woman at a data science meetup back in 2016. I was struggling with something, I don’t remember what, and she helped me debug the situation. I sheepishly thanked her for her help, and she shrugged off my sheepishness, saying the equivalent of, “Yeah, well I know you’ll be helping me or someone else with something before we know it. Each one teach one, you know what I mean?” It has since been verified by a personal Instagram-based survey (n=62...ish) that of all my Black and nonblack friends, only Black people have ever heard this phrase, and only from other Black people. In the spirit of operating generously, out of a place of abundance, I share this phrase with you too, so that you, too, can incorporate #eachoneteachone vibes into your way of being, and collaborating/learning with others. You’re welcome!


