Learning a Language? Export your Data from Duolingo.

Why learn a language? It is one area of life that I am constantly working on. If I could speak another language fluently, life would be easier (emphasis on “would be”). Whatever your reason for studying a language, Duolingo lets you keep track of your progress over time.

I want to show you how to export your own data from Duolingo and, hopefully, inspire you to do better than I have in your language-learning practice. You will see what I mean… 😉

How to Download Your Data from Duolingo

First, log into your account and go to “Settings.”

Next, find “Export my data” and click it.

A message popped up saying that it could take up to 30 days for Duolingo to send my data, but in reality I got an email within an hour saying that the data was ready to download.

We did it! Now, let’s analyze the results. The data should come as a CSV file, so Excel is one option for analysis; Tableau, R, Python, and Power BI, to name a few, are others. I am going to use R. Here is a complete version of what I did: http://rpubs.com/natester/duolingoanalysis
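If you are following along in R, loading the export might look something like the sketch below. It is a sketch under assumptions, not the code from the full analysis: the six rows are copied from the head(leaderboard) output shown later in this post, the CSV is assumed to have that same column layout, and the year column is assumed to be derived from the date.

```r
# Six rows copied from the head(leaderboard) output in this post, inlined so
# the sketch runs without the real export. With your own file, pass its path
# to read.csv() instead of `text = sample_csv`.
sample_csv <- "leaderboard,date,timestamp,tier,score
leagues,2019-05-18,20:00:35,0,20
leagues,2019-05-20,11:14:16,0,50
leagues,2019-05-27,12:41:58,0,60
leagues,2019-06-03,11:58:17,0,40
leagues,2019-06-10,22:35:45,0,50
leagues,2019-06-17,11:36:58,0,40"

leaderboard <- read.csv(text = sample_csv, stringsAsFactors = FALSE)

# Convert the date text to a real Date so it sorts and plots correctly
leaderboard$date <- as.Date(leaderboard$date)

# Assumption: year is derived from the date rather than shipped in the export
leaderboard$year <- as.integer(format(leaderboard$date, "%Y"))

str(leaderboard)
```

Converting the date up front matters for the charts: a Date column gives ggplot a continuous time axis, while plain text would be treated as unordered categories.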

Here is the summarized version of what I did. Looking at the totals for 2019 and 2020, you will see that I did better in 2019.

library(ggplot2)

# Base plot: one bar per year, bar height = total score for that year
leaderboard_barchart <- ggplot(data = leaderboard_groupedby_year, aes(x = year, y = TOTAL_SCORE))

# The two-element color vectors assume exactly two rows (2019, then 2020)
leaderboard_barchart +
  geom_col(color = c('orangered', 'blue'), fill = c('orangered', 'blue')) +
  theme_void() +  # strips axes, grid, and background, so no separate theme() call is needed
  labs(title = "Total Score for 2019 (Orange) and 2020 (Blue)") +
  geom_text(aes(label = TOTAL_SCORE),
            position = position_dodge(0.9),
            vjust = -0.5,
            size = 5,
            color = c('orangered', 'blue'))
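The chart above reads from leaderboard_groupedby_year, which this summary does not define (the full version is at the RPubs link). One way such a table could be built is sketched below with base R's aggregate(). This is an assumption about the shape of the table, not the code from the full analysis. The sample rows are the ones visible in this post (the six from head(leaderboard) plus the 2020-01-27 entry), so the totals it prints are for that sample only, not the real yearly totals.

```r
# Sample rows taken from the output shown in this post; only the columns
# needed for the yearly totals are included.
leaderboard <- data.frame(
  date  = as.Date(c("2019-05-18", "2019-05-20", "2019-05-27", "2019-06-03",
                    "2019-06-10", "2019-06-17", "2020-01-27")),
  score = c(20, 50, 60, 40, 50, 40, 109)
)
leaderboard$year <- as.integer(format(leaderboard$date, "%Y"))

# Sum the scores within each year
leaderboard_groupedby_year <- aggregate(score ~ year, data = leaderboard, FUN = sum)

# Rename the summed column to match the name the bar chart expects
names(leaderboard_groupedby_year)[names(leaderboard_groupedby_year) == "score"] <- "TOTAL_SCORE"

print(leaderboard_groupedby_year)
```

The result has one row per year, which is what the two-element color vectors in the bar chart rely on.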

Visualizing the Data by Month

head(leaderboard)
##   leaderboard       date timestamp tier score year
## 1     leagues 2019-05-18  20:00:35    0    20 2019
## 2     leagues 2019-05-20  11:14:16    0    50 2019
## 3     leagues 2019-05-27  12:41:58    0    60 2019
## 4     leagues 2019-06-03  11:58:17    0    40 2019
## 5     leagues 2019-06-10  22:35:45    0    50 2019
## 6     leagues 2019-06-17  11:36:58    0    40 2019

# Line chart of every leaderboard score over time
leaderboard_linechart <- ggplot(data = leaderboard, aes(x = date, y = score))

leaderboard_linechart +
  geom_line(color = 'cornflowerblue', linewidth = 1) +  # use size = 1 on ggplot2 older than 3.4.0
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank()) +
  labs(title = "Score Over Time for 2019 through 2020")
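The line chart plots every leaderboard entry by its date. If you would rather see one number per month, a rough sketch is to label each row with its year and month and then sum within each label. The month column and the monthly data frame are names made up for this example, and the rows are again only the ones visible in this post, so the sums describe the sample rather than the full export.

```r
# Sample rows taken from the output shown in this post
leaderboard <- data.frame(
  date  = as.Date(c("2019-05-18", "2019-05-20", "2019-05-27", "2019-06-03",
                    "2019-06-10", "2019-06-17", "2020-01-27")),
  score = c(20, 50, 60, 40, 50, 40, 109)
)

# Label each row with its year and month, e.g. "2019-05" (hypothetical column)
leaderboard$month <- format(leaderboard$date, "%Y-%m")

# Total score per month; rows come back sorted by the month label
monthly <- aggregate(score ~ month, data = leaderboard, FUN = sum)

print(monthly)
```

A monthly table like this can be passed to ggplot in place of leaderboard, with month on the x-axis, to smooth out the week-to-week swings.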

It looks like my score peaked at the beginning of 2020. What was that score?

library(sqldf)

# SQLite returns the date from the same row as MAX(score)
sqldf("SELECT date, MAX(score) AS HIGHEST_SCORE
       FROM leaderboard")
##         date HIGHEST_SCORE
## 1 2020-01-27           109
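The query above works because sqldf uses SQLite by default, and SQLite fills in date from the row that holds the maximum score. Other SQL engines may reject or handle that query differently. If you prefer to stay in base R, a minimal sketch of the same lookup uses which.max(). The sample rows below are the ones visible in this post, and the variable name best is made up for the example.

```r
# Sample rows taken from the output shown in this post
leaderboard <- data.frame(
  date  = as.Date(c("2019-05-18", "2019-05-20", "2019-05-27", "2019-06-03",
                    "2019-06-10", "2019-06-17", "2020-01-27")),
  score = c(20, 50, 60, 40, 50, 40, 109)
)

# which.max() returns the position of the first largest score;
# use it to pull that whole row
best <- leaderboard[which.max(leaderboard$score), c("date", "score")]

print(best)
```

Note that which.max() returns only the first match, so if two entries tie for the top score you would see just the earlier one.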

My highest score was 109, on 2020-01-27. However, as we saw earlier, my total score was greater in 2019.

The takeaway from my charts is that consistency over time matters more than a single learning sprint, especially when it comes to learning languages.