by Evaristo Caraballo
The Emoji developers use most — based on my analysis of 3.5GB of chat logs
Emoji have drastically changed the way we communicate in social media.
There are numerous studies suggesting differences in the way people use emoji on different social media platforms. For example, the lists of the top emoji in Instagram, Twitter, or Facebook have some similarities but also very distinctive patterns. Those differences get larger when moving down the list.
The possibility that a platform's social dynamics might affect the use of emoji made me curious about how people use them on a social platform for learning to code.
In this article, I look at how new developers use emoji, specifically in freeCodeCamp’s Gitter Main Chat Room.
There are at least two ways to render emoji in Gitter:
- Using aliases (like those listed by existing online cheat sheets).
- Using the UTF-8 form by either typing the emoji directly from your keyboard or copying/pasting the character from online resources.
The two methods render differently in a message: aliases are rendered as existing Gitter images, while UTF-8 characters are rendered according to your machine's setup. The first method, using aliases, is the most popular and will be the main subject of this discussion.
To give you an idea of what I was after, I wanted to quickly explore answers to questions like:
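To make the alias form concrete, here is a minimal sketch of how aliases could be pulled out of a message with a regular expression. The pattern and the `extract_aliases` helper are assumptions for illustration, not the code used in this analysis; the pattern assumes aliases consist of lowercase letters, digits, underscores, plus signs, or hyphens between two colons.

```python
import re

# Assumed alias shape: ":name:" where name uses a-z, 0-9, "_", "+" or "-".
ALIAS_RE = re.compile(r":([a-z0-9_+\-]+):")


def extract_aliases(message: str) -> list[str]:
    """Return the emoji aliases found in a message, in order of appearance."""
    return ALIAS_RE.findall(message)


sample = "Welcome to the room :smile: :+1: see you at 10:30"
print(extract_aliases(sample))
```

A simple pattern like this can produce false positives (a timestamp such as `10:30:45` would yield `30`), so a real pipeline would likely check each candidate against a list of known aliases.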
- Is there a distinctive pattern in the use of emoji?
- Which are the most popular emoji then?
- How many people use emoji?
- How versed are users in the emoji vocabulary?
So let's get started and answer these questions.
Let's have some emoji-talk
After carrying out my analysis, I found that about 23% of engaged chatters were also emoji users. I define an engaged chatter as a person who has sent at least 10 messages. If we instead compare all emoji users, engaged or not, against all engaged chatters, that figure rises to 45%.
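The two percentages differ only in which emoji users go in the numerator. A minimal sketch of both ratios, assuming the chat log is available as hypothetical `(user, text)` pairs and reusing an assumed alias pattern:

```python
import re
from collections import Counter

# Assumed alias shape, as rendered by Gitter: ":name:".
ALIAS_RE = re.compile(r":([a-z0-9_+\-]+):")


def engagement_shares(messages, min_messages=10):
    """Return (engaged emoji users / engaged, all emoji users / engaged)."""
    message_counts = Counter(user for user, _ in messages)
    engaged = {u for u, n in message_counts.items() if n >= min_messages}
    emoji_users = {u for u, text in messages if ALIAS_RE.search(text)}
    if not engaged:
        return 0.0, 0.0
    strict = len(engaged & emoji_users) / len(engaged)
    broad = len(emoji_users) / len(engaged)
    return strict, broad


# Hypothetical data: ana and bo are engaged, cy is not; ana and cy use emoji.
messages = (
    [("ana", "hi :smile:")]
    + [("ana", "hello")] * 9
    + [("bo", "ok")] * 12
    + [("cy", "thanks :+1:")] * 2
)
strict, broad = engagement_shares(messages)
print(strict, broad)
```

Because the second ratio counts non-engaged emoji users against engaged chatters only, it can exceed the first one and is not a share of a single population.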
The number of emoji users might sound small compared to other platforms. However, it is important to note that:
- many users of the chat room were short lived
- there were users who preferred a more conservative style of communication
- some users might not know the emoji aliases
In total, our emoji users rendered at least 753,000 emoji (600,000 when emoji were counted only once per message) with an average of 32 emoji for every 100 messages.
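The two totals come from two counting rules. A minimal sketch, assuming "counted only once per message" means each distinct alias counts once per message, and assuming the rate is taken over the same set of messages that was scanned (the `emoji_counts` helper and the sample texts are hypothetical):

```python
import re

# Assumed alias shape, as rendered by Gitter: ":name:".
ALIAS_RE = re.compile(r":([a-z0-9_+\-]+):")


def emoji_counts(texts):
    """Return (every occurrence, distinct aliases per message summed)."""
    total = 0
    once_per_message = 0
    for text in texts:
        aliases = ALIAS_RE.findall(text)
        total += len(aliases)
        once_per_message += len(set(aliases))
    return total, once_per_message


texts = [
    "nice :clap: :clap: :clap:",
    "done :smile: :fire:",
    "no emoji",
    "ok",
]
total, once_per_message = emoji_counts(texts)
per_100_messages = 100 * total / len(texts)
print(total, once_per_message, per_100_messages)
```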
All in all, our emoji users showed a collective literacy of about 800 aliases, about 25% of the full list of emoji in use. I sketched a beeswarm visualization in D3.js showing that many of them were introduced for the first time in the chat room between July 2015 and July 2016, with a growth rate of 10 to 20 new emoji per week.
When taken per individual, though, our emoji users managed a vocabulary of around 3 different emoji on average. The difference was due to a few users championing the usage of emoji, with one particular emoji master showing an emoji literacy of around 500 different ones.
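The gap between collective and individual literacy is easy to see on toy data. A minimal sketch, assuming hypothetical `(user, text)` pairs and an assumed alias pattern: the collective vocabulary is the union of all aliases, while the individual figure averages each user's own set.

```python
import re
from collections import defaultdict
from statistics import mean

# Assumed alias shape, as rendered by Gitter: ":name:".
ALIAS_RE = re.compile(r":([a-z0-9_+\-]+):")


def vocabularies(messages):
    """Map each emoji user to the set of distinct aliases they posted."""
    vocab = defaultdict(set)
    for user, text in messages:
        vocab[user].update(ALIAS_RE.findall(text))
    # Keep only users who posted at least one alias.
    return {user: aliases for user, aliases in vocab.items() if aliases}


# Hypothetical data: one heavy user and two users who know a single alias.
messages = [
    ("ana", "great :smile: :+1:"),
    ("ana", "on it :fire: :smile:"),
    ("bo", "thanks :smile:"),
    ("cy", "hi :smile:"),
    ("dee", "no emoji here"),
]
vocab = vocabularies(messages)
collective = set().union(*vocab.values())
sizes = {user: len(aliases) for user, aliases in vocab.items()}
print(len(collective), sizes, mean(sizes.values()))
```

Here a single user accounts for the whole collective vocabulary, which mirrors how a handful of emoji champions can lift the group total far above the typical individual.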
“Atypical” emoji-ing in the chatroom?
To get a better idea of how people emoji-ed in the chatroom, I compared my findings against a report made by SwiftKey in 2015. There have been substantial updates to the emoji list since the release of the report, but it still appears to be the best free reference available. It was not possible to find the emoji categorizations used by SwiftKey, though, so I used the categories and subcategories given by unicode.org as an approximation instead.
I first evaluated the use of emoji at the category level, and the results were very similar to those in the SwiftKey report. Most of the emoji posted in the freeCodeCamp chat room belonged to the “Smileys & People” category, which includes faces, gestures, person-roles, body parts and hearts.
Because comparisons based on high-level categorizations are usually too shallow, I tried another comparison focusing on the 25 most used emoji from 2015 to 2017, using their subcategories instead. Together those 25 emoji accounted for around 15% of all the emoji posted during that period.
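The share held by the top emoji is a simple ratio of counts. A minimal sketch, assuming the alias counts are already tallied in a `Counter` (the `top_share` helper and the numbers are hypothetical):

```python
from collections import Counter


def top_share(counts: Counter, n: int = 25) -> float:
    """Fraction of all posted emoji accounted for by the n most used ones."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in counts.most_common(n))
    return top / total


# Hypothetical tallies; the real list has hundreds of aliases.
counts = Counter({"smile": 5, "+1": 3, "fire": 1, "gun": 1})
share = top_share(counts, n=2)
print(share)
```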
The list of emoji and subcategories suggests that our emoji users might still fit well into the typical pattern of emoji users. The extensive use in the chat room of icons within the “face-positive” subcategory coincided with the use of “happy faces” in the SwiftKey report.
The same was true of the “face-negative” subcategory, which behaved much like the “sad faces” in the SwiftKey report. A bit apart was the use of “:trollface:”, an icon available in GitHub that is usually associated with spam messages and sabotage, but was also used as a joke in the freeCodeCamp chat room, probably in the same way as 💩 (“:poop:” or “:hankey:”), also listed in the 25 top-ever.
However, it is in the extensive use of positive hand gestures, and of “body” icons in general, that this chat room might distinguish itself from other benchmarks.
The most used gesture icons in the freeCodeCamp chat room are positive, related to welcome, support, validation, and recognition of success, which are values commonly shared in the freeCodeCamp community.
Another difference is the lesser use of icons like ♥️ “hearts” or 😘 “kisses”, suggesting that “sharing affection” was not the main goal of this chat room. With a demographic that is about 70–80% male, that could prove even harder. This demographic might also explain some male-related icons in the top-ever, such as 🔫 (“:gun:”).
Even though we could spot some deviations from the general pattern, it is too soon to draw a definitive conclusion. In fact, it is likely that the most important deviations might be found in how people used the less popular emoji.
Furthermore, it might be that the most important differences are not in terms of numbers, but in meanings, or in how the iconography might be interpreted by the group according to its context. A good example of what I refer to is the swastika. A well-known example for emoji is the eggplant. I wonder whether 🔥 (“:fire:”), from our 25 top-ever list, might have a distinctive meaning for this group, as a way to express “commitment to a task”. In any case, this is more a topic for those interested in social media communication and emoji, like in this article.
And the winner is…
As a bonus, I sketched a D3.js visualization of the monthly Top 5 emoji. Being part of the most-counted-ever list doesn't mean that an emoji ever reached the monthly Top 5, or vice versa. As in the Tour de France, a rider could stay in sixth position for the whole competition without ever winning a stage and still end up near the top of the overall ranking. Similarly, a rider could win a stage and then stay last the rest of the time. This is why this list looks a bit different.
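The difference between the two rankings shows up even on a tiny example. A minimal sketch, assuming hypothetical `(month, alias)` records, one per emoji posted; the helper names and the data are illustrative only:

```python
from collections import Counter, defaultdict


def monthly_top(records, n=5):
    """Return the n most used aliases for each month."""
    by_month = defaultdict(Counter)
    for month, alias in records:
        by_month[month][alias] += 1
    return {
        month: [alias for alias, _ in counts.most_common(n)]
        for month, counts in sorted(by_month.items())
    }


def overall_top(records, n=25):
    """Return the n most used aliases over the whole period."""
    counts = Counter(alias for _, alias in records)
    return [alias for alias, _ in counts.most_common(n)]


# Hypothetical data: "+1" is always second in a month, yet first overall.
records = (
    [("2016-01", "smile")] * 10 + [("2016-01", "+1")] * 9
    + [("2016-02", "joy")] * 10 + [("2016-02", "+1")] * 9
    + [("2016-02", "smile")]
    + [("2016-03", "fire")] * 10 + [("2016-03", "+1")] * 9
)
print(monthly_top(records, n=1))
print(overall_top(records, n=1))
```

In this toy data the steady runner-up never wins a month but leads the overall count, which is the Tour de France situation described above.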
So the winner of the monthly Top 5 is…
Frankly, I didn’t expect 😄 (“:smile:”) to be the most popular emoji. I thought it would be 😂 (“:joy:”), given that Apple recently revealed it as its most popular during 2017.
The following 8 emoji also appeared in the freeCodeCamp casual chatroom. All about smiles :). Do you think you are an emoji-fan? Guess their aliases! (Observation: names/keywords can vary by platform…)
I used Python and the Gitter API to get the messages from the freeCodeCamp main chat room. Python libraries like multiprocessing and emoji were used to transform the data. Part of the transformations also required data available online, for which I wrote custom scrapers, also with Python libraries (requests, urllib, BeautifulSoup4). To analyze the data I used plain Python and some pandas. Exploratory visualizations were made using matplotlib, while the interactive ones were made in D3.js.
Versions of the code will be available on my GitHub repository together with a few final datasets. The raw datasets used for this project are now available on freeCodeCamp’s Kaggle account.
The motivation for this project is in line with the mission of freeCodeCamp’s Open Data Initiative. A big thanks to the people in the freeCodeCamp DataScience room, and especially to mstellaluna for her comments!
And remember, if you found the information in this article useful or you simply liked the content, don’t forget to leave some claps 👏 👏 before you leave! Thanks and Happy Coding!