Our current rating system is flawed

5-star ratings are pointless

Mar 25, 2021

Thank you for taking the time to read this. I hope you walk away with a new perspective on how we use everyday technology. If you enjoyed this post, please do share it with your network.

Having done two more intense reads on tech policy and human behaviour on social media, I thought I’d take a bit of a breather and talk about something more light-hearted, albeit something that has bothered me for a while (because clearly my brain likes to focus on the real issues out here).

App store rating systems.

Specifically, user reviews/ratings for digital products.

Why am I going to talk about something that 99% of people in the world do not care about? Because these rating systems are fundamentally flawed and don’t serve their purpose. They lead to a poorer experience for the consumer and give developers a headache. For some reason, we are ok with it all. The world is moving fast and we are going through a technology revolution. We’re breaking down the norms of working in the office and going remote - we are literally in the process of deconstructing traditional forms of working to accommodate the new technological world. But not with user experience ratings for digital products. We still rely on an old, backwards system to judge products that were created today (mostly anyway - some places have actually made good progress on this, which I will discuss).

“Calm down, it’s not that deep, no one cares” I hear you shouting(?) before you move off this post. If that were the case, why would people go to such lengths to build scams around app reviews? There wouldn’t be services selling fake app store reviews, or people asking about them. There are literal businesses built around fake reviews, and people make money from them. That must mean something.

How are they flawed?

To understand why rating systems are flawed, we need to break down the original purpose of rating something and how ratings are abused today.

1. Digital products are dynamic but reviews are static

When traditional media, like a movie or an album, is released, critics watch or listen to it and review it based on their experience and opinion. Normally the review is followed by a subjective rating on an arbitrary scale. This scale can count up to 4, 5, 10 or 100, or sometimes it's measured in words, like 'Bad', 'Good' etc. The rating is then used to judge whether the movie is worth the watch. It tells the viewer what to expect. Normally, a review website, such as IMDb, will have reviews from 'movie critics' and a section that aggregates the scores and comments from the basic peasants that watch the movie as well.

Some weirdos LOVE watching a terrible movie and actively search for a bad movie. Either way, the aggregated scores from the critics and non-critics can give a good overall idea of the movie contents. Why? Because from the point of reviewing the movie or product and the point of the viewer watching the movie, no matter how much time has passed, the movie is exactly the same.

If I watch Toy Story, the movie I watch will be exactly the same movie that people watched in 1995. Sure, time and society may have changed, but the actual movie you watch will stay the same. If I go to talk about Toy Story to someone who watched the movie in 1995, the specific consumption experience of the content will be the same, with the differentiating factor being the subjective impression it had on me.

The same cannot be said for video games and applications today. If I played a game, say No Man's Sky, on release, I will have had a wildly different experience from someone who plays the game today. The reviews you read from the release of the game will be really outdated, as the game has massively changed from what it once was. On release in 2016, the game was considered a flop - it overpromised and then felt the burn for failing to deliver on the hype. However, the development team stuck with it and has now delivered what is considered a game ‘that is good now’.

PutridPete writing the most helpful user review on release of how ‘dissapointing’ it was, giving the game a 4 out of 10

If I search for reviews, one of the top results is 'No Man's Sky Review (2020)'. I have to actively search for the original release reviews in 2016. Some websites class it as a ‘re-release’, but I would disagree. It is the same game, a moving target with constant updates like social media sites, so why are reviews static? Would we review social media sites, such as Facebook, Instagram or Twitter once when they release and never again?

Yes, I use Bing. I want them Bing points. I find what I need and I have enough for a month's worth of Xbox Game Pass. What do you have after using Google and giving your life away to them?

2. Digital stores use product reviews to determine what to promote, encouraging fake reviews and review bombing

App stores like the Google Play Store or Apple App Store, as well as digital games stores like Steam and the Epic Games Store, promote products based on their reviews (where a product is an app, game, DLC, mod for a game, etc.). These algorithms have been studied and people have tried to make sense of them. Obviously, the actual algorithms are not shared and therefore we don’t know exactly how they work, but I suspect there are marketing agencies out there who have got pretty close.

What this means is that if you mess up the launch of your game or app and get terrible reviews from users, you’re most likely doomed. You will be lower in the search rankings, reducing your visibility - if you are visible at all. It doesn’t matter if you fixed everything, as you will still be shown lower down unless people go to the effort of giving positive reviews. Getting people to review something is tough enough as it is - this study says around 1.5% of users do. The study states that for every 1000 users, 15 write a review, meaning 15 people are influencing the perception of the product experience for the other 985 customers (I would recommend reading the study if you are interested; it goes into a nice breakdown of how people review things and how reviews affect purchasing behaviour). The thing is, around 60% of people do read reviews before purchasing, even though the average person doesn’t write them.

It also appears that the average consumer knows about these algorithms. When something goes wrong, people resort to review bombing: essentially coordinating to give negative reviews on a product, usually over events that have nothing to do with the performance and experience of the product. Fallout 76 experienced this and, more recently, so did the Robinhood trading app (the negative reviews were subsequently removed by Google, only for the app to be review bombed again). This normally happens because users feel helpless and want to send a message to the creators. I don’t really know how much review bombing affects the promotion algorithms overall, but it can affect the developers, as the game or app will always have that proportion of negative reviews.
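To put those numbers side by side, here is a small back-of-the-envelope sketch. The 1.5% and 60% figures are the ones quoted above; the function name and the assumption that both rates apply to the same pool of 1000 users are mine, purely for illustration.

```python
def split_by_rate(users: int, rate: float) -> tuple[int, int]:
    """Split a user count into (people who do the thing, people who don't)."""
    doers = round(users * rate)
    return doers, users - doers


USERS = 1000          # the round number used in the study's example
REVIEW_RATE = 0.015   # ~1.5% of users write a review
READ_RATE = 0.60      # ~60% of people read reviews before purchasing

reviewers, silent = split_by_rate(USERS, REVIEW_RATE)
readers, _ = split_by_rate(USERS, READ_RATE)

print(f"{reviewers} people write reviews, {silent} do not")
print(f"{readers} people read reviews before buying")
print(f"each reviewer is read by roughly {readers / reviewers:.0f} buyers")
```

In other words, a handful of voices end up steering the decisions of a much larger crowd.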

3. Reviews are nearly always binary

When have you ever gone on a digital product review and thought ‘let me read a balanced 3-star review’? What’s even the point of a 2-star or 4-star review? Most people either leave a 1-star or 5-star. The other stars add minimal value. It becomes a binary system leaving the middle stars as an average. Are you less likely to use an app if it is 4.2 rather than 4.4? Are you more likely to use a product if it is a 2-star rather than a 1-star? Most likely no. Adding in so much choice to a generally binary decision process is not needed and adds unnecessary friction. We don’t like choice as much as we say we do.

So how should rating systems be?

Well, one digital store has done a brilliant job of modernising its rating system.

The video games store, Steam.

Let me break this down.

1. Create a binary system

Steam reviews are either positive or negative. That’s how it should be. Games (and apps) generally leave either a positive or negative impression on you. You either like it or you don’t. No need to complicate it. If you want, you can throw in a neutral option for the real diplomatic ones. Remove the unnecessary 5-star or 10-star system; the choice is not needed.
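As a rough sketch of what collapsing stars into that kind of system could look like: the cut-offs below (1-2 stars negative, 3 neutral, 4-5 positive) and the sample ratings are my own assumptions, not how any store actually does it.

```python
from collections import Counter


def to_sentiment(stars: int) -> str:
    """Collapse a 1-5 star rating into negative / neutral / positive."""
    if not 1 <= stars <= 5:
        raise ValueError("stars must be between 1 and 5")
    if stars <= 2:
        return "negative"
    if stars == 3:
        return "neutral"
    return "positive"


# Made-up ratings, mostly 1s and 5s like the real thing tends to be.
star_ratings = [5, 1, 5, 5, 1, 4, 5, 1, 3, 5]

summary = Counter(to_sentiment(stars) for stars in star_ratings)
print(dict(summary))
```

Very little is lost in the translation, which is sort of the point.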

2. Consider reviews over time

A couple of years ago, Steam added the feature to show reviews of a game over time, as a little histogram. This solves my first 2 issues with the current system.

The review distribution over time for No Man’s Sky on Steam

The picture above shows the reviews over time for No Man’s Sky, with the graph on the right showing reviews received in the last month. Although the game was negatively received at launch, it can be clearly seen that the game improved over time and that, in the last month, people viewed the game positively. If this graph didn’t exist, and someone saw the reviews as ‘mixed’, they might be less inclined to play the game.

This captures the product over time. It allows the developer to improve the game or application, with the recent reviews providing an accurate representation of the experience of using the product (ideally showing a snapshot of ‘the latest update’). If a product was a victim of review bombing, this is also captured and displayed, and the user can investigate and make their own judgement.

3. Show metadata of the reviewer

When someone reviews a game on Steam, it shows how many hours they have played the game in total, and how many hours they had played when the review was written.

Most helpful review on Steam for No Man’s Sky

I am more likely to trust this review as I can see how long they have spent on the game, allowing me to make my own judgement on whether I want to trust the reviewer. It also has other data points:

How many products the reviewer has
- If they have multiple products (i.e. games) on their account, they’re less likely to be a bot
How many reviews they have written
- If a reviewer has written hundreds of reviews but only has a few products, there’s a high chance they are a bot
Date
- The date indicates when it was written and how relevant the review is to the game
Helpfulness rating
- Other users can vote how helpful the rating was. This promotes high quality reviews to be pushed to the top
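To show how a reader (or a store) might combine these data points, here is a rough sketch. The `Reviewer` class and both thresholds are assumptions I picked for illustration; nothing here reflects how Steam actually flags accounts.

```python
from dataclasses import dataclass


@dataclass
class Reviewer:
    products_owned: int
    reviews_written: int
    hours_played_at_review: float


def warning_signs(
    reviewer: Reviewer,
    max_reviews_per_product: float = 5.0,  # assumed threshold
    min_hours_played: float = 1.0,         # assumed threshold
) -> list[str]:
    """Return the reasons (if any) to be sceptical of this reviewer."""
    signs = []
    if reviewer.reviews_written > max_reviews_per_product * reviewer.products_owned:
        signs.append("far more reviews than products owned")
    if reviewer.hours_played_at_review < min_hours_played:
        signs.append("barely played before reviewing")
    return signs


likely_bot = Reviewer(products_owned=3, reviews_written=400, hours_played_at_review=0.2)
regular_player = Reviewer(products_owned=120, reviews_written=8, hours_played_at_review=95.0)

print(warning_signs(likely_bot))
print(warning_signs(regular_player))
```

None of these signals is proof on its own, but together they give you something to go on.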

With the combination of all these factors, it’s easy to distinguish between fake and legitimate reviews, and it also gives the reader a chance to choose between reviews and have enough information to determine if that’s the review they will use to help them decide on purchasing the product or experience.

Review bombing is a weird phenomenon, and I don’t really know how you can solve it. Predicting how the masses will react to external events is tough to do, and until we find a way to prevent the effects of review bombing, I think the best way to handle it is the way Steam does.

With how applications leech off our data, it’s important we have an overview of the effectiveness of the application, especially if we are spending money on it. To do that, the consumer needs reviews; they’re part of our ingrained culture. With the changes outlined above, I really do think the consumer experience will improve greatly. Well done Steam, but it’s time others followed suit.

Although, something tells me big tech like Google and Apple have other things in mind at the moment.

If you have a better idea than I do, if I’ve missed anything out or you think I am talking absolute rubbish, feel free to reach out, whether the feedback is positive or negative, either by commenting on the post or by emailing me at tanvirtalks@substack.com

Tanvir Talks