
Post 4: The Science of Tinder 🔥


Love in the 21st century looks a little different than it did in the past. With the advent of online dating, people can now connect with potential partners from all over the world. Of course, online dating is not perfect, and plenty of people still find love the old-fashioned way. But for those who are willing to give it a try, this project might provide some insights.

This project started as part of my data analytics journey. It's no secret that the tech world can be overwhelming: new concepts and jargon seem to spring up overnight, and it can be tough to keep up. However, I knew I didn't want to be a student who just completed assignments; I wanted my projects to give me an edge in my future work. The eagerness with which I learned during those 7 months still amazes me today.

I was looking for a project where I could apply my coding skills to analyze large amounts of data, while people who aren't into tech could still enjoy it and learn something in an interesting way. That's when it hit me: why not work with Tinder data? Of course, first I had to get the data, so it was time to get down to business.

I started looking for it in forums, research, GitHub repositories and many websites, without any success. I was starting to lose hope until I found an article that really resonated with me and gave me a new perspective. I felt like there might be a light at the end of the tunnel. Excited by this new possibility, I reached out to the author, hoping to connect and learn more. However, I never received a response (it was a clear swipe left).

I knew it was going to be difficult to get the data, but I decided to continue instead of switching to another project, since I'm not one to give up easily. Through that last article I learned about Swipestats.io, an interesting website that visualizes your Tinder data, so I contacted its owner, Mr. Kristian Else Bø, by email and LinkedIn, but did not get a response. Deciding it was worth one more try through Instagram, I created an account (I didn't have one), and this time I was lucky: he answered and agreed to help me.

After the ups and downs of finding the data, it was time to set some aims. I wouldn't be able to move forward without a clear understanding of where this project was heading, so the aim is: based on the sentiment and the language of a message, predict what leads to more conversations on Tinder.

When I started the data visualization, the geographical distribution of the dataset showed that the internet truly has the power to connect people all over the planet. With just a few clicks, we can communicate with anyone, no matter where they are in the world. This ability to instantly connect with people from different cultures and backgrounds has transformed the way we do business and perhaps made it a bit easier to find a soul mate.

Project's GitHub repository

Swipe left: 31,916,803
Swipe right: 17,704,473

Across the roughly 1,200 profiles in the Tinder dataset, there are more than 17 million right swipes and, surprisingly, more than 31 million left swipes. This indicates that people are very selective about who they choose to date. Physical appearance might be by far the most important factor in swiping right. Also, the user engagement behind those numbers is incredible from a business perspective.
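To put those two totals side by side, here is a small sketch that turns them into a share of right swipes. It only uses the two numbers reported above; the variable names are my own.

```python
# Swipe totals as reported in the dataset summary above.
swipes_left = 31_916_803
swipes_right = 17_704_473

total_swipes = swipes_left + swipes_right
right_share = swipes_right / total_swipes  # fraction of all swipes that were "likes"

print(f"Total swipes: {total_swipes:,}")
print(f"Share of right swipes: {right_share:.1%}")
```

In other words, only a bit more than a third of all swipes in the dataset are right swipes.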

Since the dataset is not evenly balanced in terms of the number of men and women, let's take a look at the numbers. Out of the 1,209 users in the Tinder dataset, 134 are interested in males, 1,017 are interested in females, and almost 60 are interested in both.

Project's GitHub repository

Interestingly, our data shows that Tinder is most popular with younger adults. This could be because younger adults are more tech-savvy and therefore more comfortable using apps like Tinder. Whatever the reason, it's clear that Tinder is a hit with young people. The majority of users fall into the 25-34 category (698 users), followed by 19-24 (375 users) and 35-44 (113 users). Far fewer users fall into the 45-54 and 55+ categories (18 and 4 users respectively).

Project's GitHub repository

Those roughly 1,200 profiles contained 2.5 million messages, and extracting them was no easy feat. In fact, it was quite a challenge. But I managed to do it with Python, and the data provides a fascinating insight into the world of online dating and the interactions between people. Some of the messages are funny and some are sweet. Along the way, I learned a lot about Python and its libraries in quite a fun way.

So, what does all this mean? Beyond the impressive data we have been able to analyze, there is more to technology than meets the eye. Sure, we can use it to stay connected with friends and family, but it can also be used to create amazing things. Take this popular dating app, Tinder, for example.

By using JavaScript, AngularJS, React, NGINX and AWS, Tinder is able to connect people with potential partners in a way that was never possible before. And thanks to tools like Python, we can keep developing and analyzing new ways to improve the user experience and make products even better. In fact, as of October 2021, the market value of Match Group Inc (owner of Tinder) was a whopping $45 billion. Clearly, the internet is capable of so much more than we could have ever imagined. So what else is out there? Only time will tell.

All of the above sounded great until I found out what algorithms are. That is when it became clear to me why data scientists make good money. Thank God I have an amazing teacher and tutor who explains things in an easy-to-understand way.

I'm a big fan of finding ways to optimize processes, especially when it comes to tedious tasks like labeling data. So when I came across LangDetect, a library that uses machine learning to identify the language of a text, it saved me a ton of time compared to labeling the data manually. Plus, it's always great to find open-source tools that help us be more efficient in our work.
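As a rough idea of what this labeling step can look like, here is a minimal sketch. It is not the exact code from the project: the helper names `label_languages` and `langdetect_detector` are made up for illustration, and the `"unknown"` fallback is my assumption for messages the detector cannot handle (for example, emoji-only texts).

```python
def label_languages(messages, detect_fn):
    """Return one language code per message, or "unknown" when detection fails."""
    labels = []
    for text in messages:
        try:
            labels.append(detect_fn(text))
        except Exception:
            # Detectors typically raise on empty or symbol-only input.
            labels.append("unknown")
    return labels


def langdetect_detector():
    """Build a detector backed by langdetect (requires `pip install langdetect`)."""
    from langdetect import DetectorFactory, detect  # imported lazily on purpose

    DetectorFactory.seed = 0  # langdetect is non-deterministic unless seeded
    return detect
```

With the library installed, something like `label_languages(messages, langdetect_detector())` would return codes such as `"en"` or `"es"`, one per message.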

The output showed 48 different languages in our dataset, which may seem like a lot. But it's mind-blowing to think about how human communication has evolved over time. We've gone from sounds and symbols to structured languages that have shaped the cultures of entire regions and the way each culture expresses emotions. Looking at this data, it's clear that language is more than just a tool for communication. It's a window into our history, our culture, and perhaps the window to your soul mate 😉.

One interesting discovery was emojis, especially after using the machine learning library and coding it to separate messages by language. For example, people who write in English express feelings with emojis in different ways: some are open-minded, with emojis like 💦🔥😍😈🍆🍑, while others are thinking long term: ❤👰🎩💍🍻🗼🏡👪. I also noticed that Japanese users use different emoticons than westerners in their text messages. I found things like: ٩(◕‿◕。)۶ (´♡‿♡`) (ᗒᗣᗕ)՞ (⊃。•́‿•̀。)⊃
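For the curious, counting emojis per language can be sketched with the standard library alone. This is an illustration under assumptions, not the project's code: the function names are mine, the sample messages are made up, and the emoji test is a rough heuristic (Unicode category "So") that also matches a few non-emoji symbols and does not catch text emoticons like the Japanese ones above.

```python
import unicodedata
from collections import Counter, defaultdict


def is_emoji_like(ch):
    """Rough heuristic: most emoji fall in the Unicode category "So" (Symbol, other)."""
    return unicodedata.category(ch) == "So"


def top_emojis_by_language(labeled_messages, n=3):
    """labeled_messages is an iterable of (language_code, text) pairs."""
    counts = defaultdict(Counter)
    for lang, text in labeled_messages:
        counts[lang].update(ch for ch in text if is_emoji_like(ch))
    # Keep only the n most frequent emoji for each language.
    return {lang: [emoji for emoji, _ in counter.most_common(n)]
            for lang, counter in counts.items()}


# Made-up messages, only to show the input shape.
sample = [("en", "hey 🔥🔥"), ("en", "nice 😍"), ("es", "hola 💍")]
print(top_emojis_by_language(sample))
```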

For the sentiment analysis, I also used a pre-trained model from Hugging Face, loaded through pipeline(), a Natural Language Processing (NLP) tool that helps me identify, extract and quantify the emotional tone behind each message. Sentiment analysis is important because, let's say, you sell products across different countries and have thousands of customer reviews. In this case, you can use sentiment analysis to check whether customers have a positive or negative opinion and improve the customer experience.
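Here is a minimal sketch of how pipeline() can be used for this. The helper names are hypothetical, and I am assuming the default "sentiment-analysis" model, which labels each text as POSITIVE or NEGATIVE and is trained on English; the model actually used in the project may differ.

```python
def build_sentiment_classifier():
    """Load the default sentiment pipeline (requires `pip install transformers` plus a backend such as torch)."""
    from transformers import pipeline  # imported lazily on purpose

    return pipeline("sentiment-analysis")


def positive_ratio(messages, classifier):
    """Share of messages the classifier labels as positive (0.0 for no messages)."""
    messages = list(messages)
    if not messages:
        return 0.0
    # The pipeline returns one dict per text, e.g. {"label": "POSITIVE", "score": 0.99}.
    results = classifier(messages)
    positives = sum(1 for result in results if result["label"].upper() == "POSITIVE")
    return positives / len(results)
```

With the library installed, `positive_ratio(messages, build_sentiment_classifier())` would give the fraction of positive messages for one user.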

The above sounds very clever, but it is difficult to explain how a computer tries to identify the feelings in a human expression, which is why the following graph helps to understand what is happening inside each model at a high level, without scaring people with the complexity of the math.

www.voxco.com/blog/sentiment-analysis-helps-improve-customer-experience/

Now the most complex part is linear regression. I was like whaaat??? So, let me explain it in a very simple way and you'll see why this can be useful for you. Linear regression is a way to predict how one thing (like your income) will change depending on other factors such as age or education level.

So, for our case, we will use linear regression with the following variables:
X = number of swipes, number of messages, language
Y = total number of positive conversations / total number of messages

The number of swipes and the number of messages are numerical variables. However, language is a text-based variable that we cannot use directly in our linear model because it has no equivalent numerical representation. This made me realize that problem-solving skills are essential in the tech industry, because execution is full of unexpected problems. In this particular case, we can fix it with one-hot encoding, which is a way to turn these words into something more tangible: numbers.
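To make one-hot encoding concrete, here is a small sketch. The function names and the three sample users are made up for illustration, and fitting the model with scikit-learn's LinearRegression is my assumption about one reasonable way to do it, not necessarily what the project's repository does.

```python
def one_hot(values):
    """Map each category to a 0/1 column; returns (categories, encoded rows)."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


def build_features(swipes, messages, languages):
    """Combine the numeric columns with the one-hot language columns."""
    categories, encoded = one_hot(languages)
    names = ["swipes", "messages"] + [f"lang_{c}" for c in categories]
    rows = [[s, m] + flags for s, m, flags in zip(swipes, messages, encoded)]
    return names, rows


def fit_linear_regression(rows, targets):
    """Fit ordinary least squares (requires `pip install scikit-learn`)."""
    from sklearn.linear_model import LinearRegression  # imported lazily on purpose

    return LinearRegression().fit(rows, targets)


# Made-up users, only to show the shape of the data.
names, rows = build_features(
    swipes=[1200, 300, 5400],
    messages=[80, 10, 410],
    languages=["en", "es", "en"],
)
print(names)    # ['swipes', 'messages', 'lang_en', 'lang_es']
print(rows[1])  # [300, 10, 0, 1]
```

Each language becomes its own column of zeros and ones, so the model only ever sees numbers.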

Since we have just analyzed some hot conversations, let's keep the theme going and make a heat map this time to see the correlations between our variables. We can see some positive correlation between swipes and messages. This means that as the number of swipes increases, the total number of messages also tends to increase.
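The numbers behind a heat map like this are just pairwise correlations. Below is a minimal sketch of the Pearson correlation using only the standard library; the function names and the sample values are made up, and the project itself may have relied on a library instead.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


def correlation_matrix(columns):
    """columns maps a variable name to its list of values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up values, only to show the idea.
matrix = correlation_matrix({
    "swipes": [100, 200, 300, 400],
    "messages": [12, 18, 35, 41],
})
print(round(matrix["swipes"]["messages"], 2))
```

A value close to 1 means the two variables rise together, and a plotting library such as seaborn (its `heatmap` function) can then color each cell of the matrix.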

Project's GitHub repository

In conclusion, more swipes lead to slightly more conversations, and when a larger share of those messages is positive, users tend to have longer interactions. On the other hand, I realized that if you have an internet connection, the means and tools for learning are abundant. However, nowadays the curiosity and desire to learn are incredibly scarce.

In a world where you can talk to and leverage a computer through code, two things keep growing: the internet and data. However, the internet has not gotten faster at the rate that data storage has increased, and the growth of data is only accelerating with the internet of things (IoT). This emphasizes the importance of companies leveraging data to optimize their performance and understand users like never before.

Also, the cost of data storage is going down with the progress of computing power (Moore's Law). This means AI and machine learning are advancing really fast, and we will continue to compound knowledge to the point that the marginal cost of intelligence will be close to zero. That means we have to focus on what makes us human as this technological breakthrough continues to advance.

Beyond Tinder's technological innovations, every user is a person with dreams, qualities and multiple abilities. The capacity to make others laugh, be empathetic and express love is extremely important for humans, especially to differentiate ourselves from computers as this marginal cost gets closer to zero.

The limits are really our imagination; the desire to learn is really your own choice.

dataAnalytics/projects/tinder at master · havipr/dataAnalytics
The original project code isn't included in this article. If you'd like to check how it was developed, please refer to the project's GitHub repository.