How to scientifically survey all of Steam's developers

If you ever wanted to survey every Steam developer in existence, here's how you might do that.

(This is a companion post-script article to Operation Tell Valve All The Things, 3.0)

Every time I do one of my Steam developer surveys I get people telling me my results are invalid for this or that reason, and they're not wrong. The thing is, the status quo is just people loudly yelling on the internet, and assuming that everyone already agrees with them. I find that being even a little bit systematic in your approach opens your eyes to how people really feel, especially people who might be reluctant to publicly contradict the loudest voices.

That said, there's a danger in sloppy surveys, because official looking numbers and graphs can lend a false veneer of credence and authority to someone who just wants to prop up what they already believe with junk statistics.

The best guard against this is replication. So if you think my surveys stink, I encourage you to do one yourself. I spent 48 hours on this last one, which means anyone with basic academic chops could easily improve on it. But if you want to be super official about it, here's a rough sketch for how you could do something that might even pass formal peer review.

Caveat: I'm not a "real researcher" by any stretch of the imagination, so consider this a starting point and be sure to actually consult the survey design and statistics literature.

Who can even do this?

Well, Valve can (and should), for one. They have direct access to all their developers, and all their relevant statistical data (sales, country, # of games on Steam, etc). The trick with Valve doing the survey is that people are scared to tell Valve the truth. I still encourage Valve to do a systematic survey, making every assurance they can that they really want to hear feedback, even and especially negative feedback, and that there will be no retaliation. Valve has the unique ability to easily make a secure, validated feedback form that's also anonymous.

Of course, some developers may not trust it ("Sure, you say it's anonymous, but I bet you're secretly logging my username so you can round up the dissenters!"), but I think the size and comprehensiveness of the kind of sample Valve could achieve might compensate for that limitation.

Valve could also hire an independent research outfit to do the survey for them -- this is pretty typical for hospitals, doctors' offices, etc. Valve could set up the basic contact infrastructure or send out survey invitations in such a way that the third party collects and processes the actual responses, and merely presents Valve with a final report. Hopefully this makes people feel more free to tell the truth. The downside to this method is the work of complying with data privacy laws like the GDPR. But Valve has lots of smart people and lawyers, so I trust they could figure it out.

Finally, a totally independent researcher could do it themselves. If you're a grad student looking for a video game themed thesis project, this ain't a bad one. The problem is that we game developers get so many random surveys in our inboxes from random high school and college students that we just don't reply to them anymore, so for this to really gain some traction you'll need to establish credibility first (or offer an incentive to take a survey).

It might make sense to collaborate with someone with existing community trust. Some likely candidates might include journalistic outlets like Gamasutra and Ars Technica, researchers like the Game Outcomes project, or maybe even Youtubers with great investigative journalism skills like Super Bunnyhop.

Maybe a community website like SteamDB could do it. Or you could just strike out on your own and build your own stature and community trust -- it can be done.

What follows below are the steps for how a totally independent researcher might go about an actually scientific survey of as many of Steam's developers as possible.

What do you even want to know?

Figure this out before anything else. What are you trying to prove or disprove? What's the clearest, simplest, way you can test that by asking questions?

In my latest survey, this is what I wanted to know:

  • What are the most pressing issues devs have with Steam right now?
    • What are the #1 and #2 issues?
  • Is there anything everybody agrees on? How united/divided is the community?
    • (Large/rich) vs (small/poor) developers
    • (Anglosphere bubble) vs (rest of the world)
  • Have feelings changed significantly compared to the last two surveys?

This actually differed from previous years, which were more like "technical wishlists" with a few value statements mixed in. Our survey results show that Valve actually has a pretty good response rate to technical wishlists (to date we've gotten 45-50% of what we asked for in the last two surveys). Technical wishlists are much less sexy and probably less important in the grand scheme of things than "big issue" opinion surveys, but they have the advantage of being more narrowly focused and immediately actionable.

Sample ALL of Steam

The main issues with my sloppy survey's sampling:

  • There was no way to ensure developers surveyed actually were developers with games on Steam. I'm sure nearly all of them were, but I can't prove it.
  • It wasn't a random sample, and is biased towards English speakers because the survey was not translated (except into Russian). Also, I relied on my existing contact network to discover new gamedev communities through secondary and tertiary connections.
  • It's a small sample. ~200 respondents isn't awful but you can get a lot more.
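For a rough sense of what a bigger sample buys you, here is a minimal sketch of the textbook margin-of-error formula for a proportion. It assumes a simple random sample, which the sloppy survey described above was not, so treat the numbers as a best case rather than a property of those results. The function name is made up for illustration.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of an approximate 95% confidence interval for a proportion.

    n: number of respondents
    p: assumed true proportion (0.5 is the worst case)
    z: z-score for the confidence level (1.96 for ~95%)
    """
    return z * math.sqrt(p * (1 - p) / n)

# Compare a ~200-person sample with larger ones.
for n in (200, 1000, 5000):
    print(n, "respondents ->", round(margin_of_error(n) * 100, 1), "percentage points")
```

Under those assumptions, 200 respondents gives roughly plus or minus 7 percentage points, and it takes about 1,000 to get near plus or minus 3.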

So let's just sample all of Steam! How do we do that?

  1. List every developer on Steam
  2. Gather relevant info about each developer
  3. Gather contact information for each developer
  4. Contact them securely with a validated survey form

It helps that every game on Steam has a publicly identifiable developer associated with it. Now, not every developer has one of those fancy new landing pages set up, but there's probably a way to scrape a comprehensive list of developer names just from Steam's API. SteamDB might even already have such a list in their records or API. So it's probably easy to create a comprehensive list of every developer on Steam, along with how many games each has released, when they were released, and basic performance metrics (number of user reviews correlates very roughly with overall owners).
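Once you have per-game records from somewhere, rolling them up into one row per developer is the easy part. Here's a minimal sketch, assuming each record carries a developer name, a release date, and a review count. The field names and sample data are hypothetical, not the schema of Steam's or SteamDB's actual data.

```python
from collections import defaultdict

# Hypothetical records: the field names here are assumptions for illustration,
# not the schema of any real Steam or SteamDB data source.
GAMES = [
    {"title": "Game A", "developer": "Studio One", "released": "2015-03-01", "reviews": 1200},
    {"title": "Game B", "developer": "Studio One", "released": "2013-07-15", "reviews": 300},
    {"title": "Game C", "developer": "Solo Dev", "released": "2018-11-20", "reviews": 45},
]

def summarize_developers(games):
    """Group per-game records into one summary row per developer."""
    summary = defaultdict(lambda: {"games": 0, "reviews": 0, "first_release": None})
    for game in games:
        row = summary[game["developer"]]
        row["games"] += 1
        row["reviews"] += game["reviews"]
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
        if row["first_release"] is None or game["released"] < row["first_release"]:
            row["first_release"] = game["released"]
    return dict(summary)

for developer, row in summarize_developers(GAMES).items():
    print(developer, row)
```

A real list would also need to handle games that credit more than one developer, and developers who appear under slightly different names.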

SteamDB basically has all this info already so you can probably just use their API or ask them politely for it. And if that doesn't work you can figure out how to scrape it yourself. But now comes the tricky part - turning a list of developers into contact information. There's unfortunately no central database for this and I'm not sure there's any solution other than brute force sleuthing. You could probably write a web crawling script that follows links on developer pages (or googles their name hoping to find a home page), and looks for anything like an email address that a human can then go through later and clean up. Or you can do a bunch of clicking yourself, or send out a lot of friend requests on Steam to developer accounts, or a bunch of other exhausting alternatives. In any case, the ideal goal is to somehow generate a final list of at least one email address for each developer/publisher.
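The "look for anything like an email address" step might be sketched like this. It only covers pulling candidates out of page text you've already downloaded; the crawling itself is left out. The regex is deliberately loose and will produce false positives, which is why a human still has to clean up afterwards. The function name and sample text are made up for illustration.

```python
import re

# Loose pattern for email-like strings; it is not a full address validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def find_email_candidates(page_text):
    """Return unique email-like strings, lowercased, in first-seen order."""
    seen = {}
    for match in EMAIL_RE.findall(page_text):
        seen.setdefault(match.lower(), None)
    return list(seen)

sample = "Contact us at Press@Example.com or support@example.co.uk. Also press@example.com."
print(find_email_candidates(sample))
```

Note that this misses addresses that are obfuscated ("press at example dot com") or hidden behind contact forms, so it won't get you to full coverage on its own.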

Okay, you've somehow gotten reliable contact information for every developer/publisher on Steam, or at least a significant enough amount to make a proper random sample. Now we need to contact them and send them our survey.
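If you end up contacting a random subset rather than everyone, the draw itself is simple. Here's a minimal sketch, assuming the contact list is just a list of email strings. The function name and the addresses are hypothetical, and the seed is only there so that you can reproduce and document the draw later.

```python
import random

def draw_sample(contacts, sample_size, seed=None):
    """Draw a simple random sample without replacement from a contact list."""
    contacts = list(contacts)
    if sample_size >= len(contacts):
        # Asking for everyone (or more): just survey the whole list.
        return contacts
    rng = random.Random(seed)  # seeded generator makes the draw repeatable
    return rng.sample(contacts, sample_size)

# Placeholder addresses for illustration only.
everyone = [f"dev{i}@studio{i}.example" for i in range(1000)]
invited = draw_sample(everyone, 100, seed=2018)
print(len(invited), "developers invited")
```

Keep in mind that this only makes the invitations random. Who actually responds is a separate source of bias.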

Get Responses

A key thing to consider with survey design is response rate. The longer your survey is, the fewer responses you'll get. Making a question mandatory ensures you'll get an answer from everyone who finishes, eliminating the selection bias of people who choose to skip the question, but it also makes people more likely to just abandon the survey if they're forced to answer a question they don't want to. You'll also get more responses from an anonymous survey than one that collects identifying information.

Another problem is that the survey itself is inherently optional - you can't force people to take it. This introduces further selection bias in respondents -- namely, those who feel the strongest are the ones most likely to answer. A possible counter to this is to offer an incentive for filling out the survey. Not only does this boost the response rate, when done correctly it can even out the selection bias because now people who feel less passionate will take the survey instead of ignoring it.

Common survey incentives are $10 gift cards for every respondent, an entry in a drawing for a free iPad, etc. The right incentive can increase your response rate and ensure a better and more representative sample, but the wrong one can become a "bribe" and actually distort your survey, especially if people just want the prize and start trying to subvert your survey by e.g. trying to take it multiple times. Incentives can be expensive, and you must be clear about your prize terms if it's just a "chance" to win. Another complicating factor here is you'll need an incentive that's equally appealing to developers from other countries -- and I'm not even talking about exchange rates and cultural differences here; I just mean the basic fact that a $10 Amazon gift card or whatever probably has some sort of region lock that makes it useless to international developers.

Validate Responses

You want to be able to prove at least two things about your respondents:

  • They actually are the people you expect them to be
  • They can only answer the survey once each

If you have email addresses you know are used by specific developers, and you're not going for an anonymous survey, this is pretty easy to do. You just make sure your survey collects email addresses (Google Forms is set up to do this with very little added friction, which is convenient but also a little disturbing if you think about it -- there are of course many other great alternatives). This way you can simply throw out any responses that don't come from an approved email address, and you can rest assured that everyone who responded is an actual registered Steam developer. This still leaves open the possibility that someone had their non-Steam-developer brother-in-law or other random person fill out the survey for them, but that's a remote concern. More importantly, this method ensures no one can answer the survey more than once undetected. Even if they do manage to double-respond, you'll see it in the records and can just go by the earliest timestamp for each email address.
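The filtering described above can be sketched in a few lines. This assumes each response is a dict with an "email" and a "timestamp" field, and that timestamps are ISO 8601 strings in a single timezone so they compare correctly as text. Both the field names and the function name are assumptions for illustration, not the export format of any particular survey tool.

```python
def validate_responses(responses, approved_emails):
    """Keep only responses from approved addresses, earliest one per address."""
    approved = {email.strip().lower() for email in approved_emails}
    earliest = {}
    for response in responses:
        email = response["email"].strip().lower()
        if email not in approved:
            continue  # not on the developer contact list: throw it out
        kept = earliest.get(email)
        if kept is None or response["timestamp"] < kept["timestamp"]:
            earliest[email] = response
    return list(earliest.values())

# Placeholder data for illustration only.
responses = [
    {"email": "dev@studio.example", "timestamp": "2018-06-02T10:00:00", "answers": "second try"},
    {"email": "Dev@Studio.example", "timestamp": "2018-06-01T09:00:00", "answers": "first try"},
    {"email": "rando@elsewhere.example", "timestamp": "2018-06-01T08:00:00", "answers": "uninvited"},
]
print(validate_responses(responses, ["dev@studio.example"]))
```

Addresses are lowercased before comparing, so "Dev@Studio.example" and "dev@studio.example" count as the same respondent.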

If you are going for an anonymous survey, this gets a bit harder on your end. You'll need to generate a unique survey link for each developer that is either a full clone of your base survey, or somehow reliably tracks who responded to it in a way that doesn't expose their identity, such as assigning them a unique numerical id. To be secure, this method requires that no respondent can feasibly masquerade as another respondent, for example by adding one to the survey URL's numerical suffix. I'm not intimately familiar with all the professional survey software out there, but I'd be surprised if there wasn't some solution for this, even if it isn't free. If not, you could always enlist some web dev skills and just build your own.
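If you do end up building your own, one way to avoid the "add one to the URL" problem is to hand out long random tokens instead of sequential ids. Here's a minimal sketch using Python's standard secrets module. The function names and the survey URL are hypothetical, and a real system would also need to discard the token-to-email mapping after sending invitations if it wants to stay anonymous.

```python
import secrets

# Hypothetical base URL, for illustration only.
SURVEY_URL = "https://survey.example/respond/"

def issue_tokens(developer_count):
    """Generate one unguessable, single-use token per invited developer."""
    tokens = set()
    while len(tokens) < developer_count:
        tokens.add(secrets.token_urlsafe(16))  # 16 random bytes, URL-safe text
    return tokens

def redeem(token, unused_tokens):
    """Accept a response only if its token was issued and not yet used."""
    if token in unused_tokens:
        unused_tokens.remove(token)
        return True
    return False

unused = issue_tokens(3)
for token in sorted(unused):
    print(SURVEY_URL + token)
```

Because each token is random rather than sequential, knowing your own link tells you nothing about anyone else's, and removing a token once it's redeemed limits each invitation to a single response.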

Optional Disclosures

If you elect for an anonymous survey, people can still opt in to identifying themselves. You can rely on self-reporting (just stick your name and details in this field and we'll take it at face value), or you can split the survey itself in the invitation email. Anonymous users follow a link that collects no personally identifying information, and others follow a link that records their email address. The trick with this method is that there's no easy way for someone to change their mind mid-survey without abandoning the survey they're filling out entirely, which makes them more likely to just not respond at all. If you know a developer's identity, you likely don't need to ask many optional questions about their games and business - you can probably just cross reference SteamDB and other sources of public information.

But here are some things you might still ask that only a developer would know:

  • Revenue (per game and overall lifetime)
  • Number of full time employees
  • What other stores they're working with or are considering working with

Keep in mind that every extra question you ask, even optional ones, can lower your response rate. Don't make the mistake of trying to collect more data than you need, and always circle back to your original research question. For every survey question ask, "does this help me answer the research question?" If it doesn't, toss it.

Design the Survey

Survey design is a subject unto itself, and I freely admit to being a complete amateur. For instance, take my research question -- "What are the top issues on Steam?" If you were surveying that, first you'd need to decide what issues to even ask about. Will that be part of the survey? Will you have a preliminary phase to gather issues? If so, who participates in that, and how long does it last? Keep in mind that running two surveys in a row on the same population will drastically lower your response rate.

Next, how do you want people to answer that question? Rate each issue 1 to 10? Can I pick "1" for as many issues as I want or am I forced to pick only one that's the most important? Can I vote separately on whether I happen to agree with an issue, as well as whether it's also important to me? Should I?

Whatever you do, I advise you to keep it simple and straightforward and eschew any complicated rules for the respondent to follow. Also, whenever possible, make the survey form itself enforce any constraints like, "you're only allowed to mark one issue as the top concern."
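As an illustration of letting the form enforce the rules, here's a minimal sketch of a check a custom survey backend might run on a ranking question before accepting a submission. It assumes respondents give each issue a whole-number rank from 1 to 10 and may not reuse a rank. The function name and issue names are hypothetical.

```python
def validate_ranking(ranks, issues, low=1, high=10):
    """Return a list of problems with a submitted ranking (empty if valid).

    ranks: dict mapping issue name -> rank chosen by the respondent
    issues: every issue the respondent was asked to rank
    """
    errors = []
    missing = [issue for issue in issues if issue not in ranks]
    if missing:
        errors.append("unranked issues: " + ", ".join(missing))
    values = list(ranks.values())
    if any(not isinstance(v, int) or not (low <= v <= high) for v in values):
        errors.append(f"every rank must be a whole number from {low} to {high}")
    if len(set(values)) != len(values):
        errors.append("two issues share the same rank")
    return errors

# Placeholder issue names for illustration only.
issues = ["discoverability", "revenue share", "review bombing"]
print(validate_ranking({"discoverability": 1, "revenue share": 1, "review bombing": 3}, issues))
```

Returning the list of problems, rather than just rejecting the form, lets you tell the respondent exactly what to fix instead of making them guess.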

In general, if you're asking the kind of research questions I am, your survey might look something like this:

  • What's the most important issue right now?
    • Rank X issues, assigning each one a value between 1 and 10
    • You can't give two different issues the same rank value
  • Opinion polls -- Likert scale "do you agree/disagree" questions
  • What kind of developer are you?
    • Geography/location
    • Revenue
    • Number of employees

A survey like this, conducted across a wide enough sample, with a tight enough design, would paint a pretty good picture about how Steam Developers really feel, instead of just reflecting the noise of someone's local echo chamber, or allowing some random dude to pretend their own personal opinion has more widespread support than it actually does.

I do not think I will be doing another of my amateur surveys, so I really hope someone in the community (and especially Valve) gets serious about systematically listening to the developer population, and with superior methodology to mine.