Creating a DaDaset for Baby Names

First Reactions to the title: You might be asking me in your mind, why Baby Names? And my response to you is, why not? The next question you probably have is, why DaDaset…that’s not how you say “data”? And my response to you is, some of our fellow Americans would beg to differ.

When picking a baby name, the first thing to do is decide on conditions for what the name must have and, if you want to go crazy, must not have. I went crazy. See the fictional conditions below.

Baby Names Conditions:

  1. Unique
  2. Cannot be the same as my name, my parents’ names, etc.
  3. Needs to have the letter “A” or “E”
  4. No “R”s and no “Y”s (the query further down also screens out “I”s)

When I am looking at a name, there is usually some inexpressible reason why I do or don’t like it. You may experience something similar when I ask: Would you name your child Mkjsodnfgonsdfououosdfonsf? This was a semi-random string of characters I typed on my keyboard as a humorous illustration, but I believe the point is clear: you either like a name or you don’t, so the first step is figuring out which names you do like. Here are a few categories of names you may want to consider:

Historical names: These could be the names of biblical, religious, celebrated, and famous individuals.

Names that carry meaning: These are any names that have a special meaning or significance attached to them, such as Emmanuel, which means “God with us.”

Current or past names that have been used: The government has been keeping track of this information and posting it to this site: http://www.ssa.gov/OACT/babynames/index.html. This site is incredibly useful if you are picking out a name! Take a look at the different years and female and male names that were picked by parents in different years.

Names that are adopted from another language: Keeping one’s native tongue alive is a great idea. There are other reasons why you might choose a name from a different language, such as the way the name sounds in that language.

Once you have gathered your dataset, you are ready to meet with your significant other to go through the list using the conditions that you set for the name. Your conditions can ABSOLUTELY be different from the fictional conditions I listed above.

Why uniqueness? What if we move to another country where everyone else’s name is different? This is a possibility, and I want to emphasize that there are many places your child could end up living. Having a kid or kids with the same name does not mean it is the end of the world! Thank goodness for nicknames, middle names, and initials! If you pick a common first name, you can always pick a slightly more unique middle name to increase the chances that the full name won’t match another person’s. That way, you can truly tell your child you tried to avoid this scenario…

[Illustration of students in a classroom. The teacher is going through the list: Nate?… The two students with the name “Nate” look at each other, confused. Questions running through their heads may be: Who is she asking for? Why didn’t we put our nicknames down on the class roster? Why can’t I be NT? Why is Nate 3 absent today?]

Check out RPubs for a better presentation of the code and…the results:

http://rpubs.com/natester/boybabynames

Baby Boy Names 2020

library(dplyr)
library(rvest)
library(prettifyAddins)
library(ggplot2)
library(stringr)
library(sqldf)
html_form_page <- 'http://www.whattoexpect.com/baby-names/list/top-baby-names-for-boys/###top-names' 
Reading the HTML code
webpage <- read_html(html_form_page)

summary(webpage)
##      Length Class  Mode       
## node 1      -none- externalptr
## doc  1      -none- externalptr
head(webpage)
## $node
## <pointer: 0x0000000017a89170>
## 
## $doc
## <pointer: 0x0000000017fa9fd0>
names <- webpage %>%
  html_nodes("li")%>%
  html_text()
Creating a visualization
head(names) 
## [1] "  Log In / Join "                                                                                                                                        
## [2] " Getting Pregnant    Fertility  Ovulation  Preparing for Pregnancy "                                                                                     
## [3] " Fertility "                                                                                                                                             
## [4] " Ovulation "                                                                                                                                             
## [5] " Preparing for Pregnancy "                                                                                                                               
## [6] " Pregnancy    Week By Week  Symptoms  Baby Names  Baby Shower  Complications  Due Date Calculator  Labor & Delivery  Screenings & Tests  Signs of Labor "
tail(names)
## [1] "What to Expect Bookstore"            "Advertising Policy"                 
## [3] "Do Not Sell My Personal Information" "Help"                               
## [5] " AdChoices "                         "Feedback"
class(names)
## [1] "character"
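The head() and tail() output above shows menu and footer items mixed in with the names, because every li element on the page is selected. As a minimal sketch of one alternative, assuming the names sit in a list of their own, a narrower CSS selector could pull only those items. The inline HTML and the class names `nav` and `names` below are hypothetical stand-ins, not the real page's markup:

```r
library(rvest)

# Hypothetical miniature page: one navigation list and one list of names.
sample_page <- read_html(
  "<html><body>
     <ul class='nav'><li> Log In / Join </li></ul>
     <ol class='names'><li>Liam</li><li>Noah</li></ol>
   </body></html>"
)

# Selecting every <li> picks up the navigation entry as well.
all_items <- sample_page %>%
  html_nodes("li") %>%
  html_text()

# Restricting the selector to the (assumed) names list keeps only the names.
name_items <- sample_page %>%
  html_nodes("ol.names li") %>%
  html_text()

name_items
```

On the real page the right selector would have to be found by inspecting the HTML; if nothing suitable exists, trimming rows by position remains the fallback.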
This dataset is not a dataframe; we need to turn it into one.
names <- data.frame(names)

head(names)
##                                                                                                                                                      names
## 1                                                                                                                                           Log In / Join 
## 2                                                                                       Getting Pregnant    Fertility  Ovulation  Preparing for Pregnancy 
## 3                                                                                                                                               Fertility 
## 4                                                                                                                                               Ovulation 
## 5                                                                                                                                 Preparing for Pregnancy 
## 6  Pregnancy    Week By Week  Symptoms  Baby Names  Baby Shower  Complications  Due Date Calculator  Labor & Delivery  Screenings & Tests  Signs of Labor
class(names)
## [1] "data.frame"
Need to get rid of rows 1-65, and potentially more
names <- names[-1:-65,]

head(names)
## [1] Liam     Noah     William  James    Oliver   Benjamin
## 1083 Levels:    Log In / Join   AdChoices  ... Zyaire
class(names)###factor?
## [1] "factor"
names <- data.frame(names)

names <- names[-1002:-1021,] ###I just wanted to make sure I didn't delete any names
2 extra rows as predicted
names <- data.frame(names)

names <- names[-1001:-1002,]

Baby_Boy_Names_2020 <- names

Ranking <- c(1:1000)

names_ranked <- cbind.data.frame(Ranking,Baby_Boy_Names_2020)

head(names_ranked)
##   Ranking Baby_Boy_Names_2020
## 1       1                Liam
## 2       2                Noah
## 3       3             William
## 4       4               James
## 5       5              Oliver
## 6       6            Benjamin
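The trimming above passes through data.frame() several times and runs into a factor along the way. A minimal sketch of the same idea, assuming the scraped result is still a character vector: slice by position first, then build the ranked data frame once with stringsAsFactors = FALSE. The vector `scraped` and the two positions are made-up stand-ins for the real 1,000-name list:

```r
# Hypothetical stand-in for the scraped character vector:
# junk entries at both ends, names in the middle.
scraped <- c("Log In / Join", "Help", "Liam", "Noah", "William", "Feedback")

# Assumed positions of the first and last real name in this toy vector.
first_name <- 3
last_name <- 5

# Slice the character vector directly, so no factor is ever created.
boy_names <- scraped[first_name:last_name]

# Build the ranked table in one step; seq_along() supplies 1, 2, 3, ...
ranked_sketch <- data.frame(
  Ranking = seq_along(boy_names),
  Baby_Boy_Names_2020 = boy_names,
  stringsAsFactors = FALSE
)

class(ranked_sketch$Baby_Boy_Names_2020)
```

Keeping the column as character also means that later summaries do not list removed entries as unused factor levels.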
Now you want to remove any names that do not meet your conditions.

1. How many boys were born in said year? I only gathered counts for 2019, and I use them for the 2020 baby names because the 2020 list appears to be built from 2019 data.
Around 37,308,668 boys and 35,730,482 girls according to: https://datacenter.kidscount.org/data/tables/102-child-population-by-gender#detailed/1/any/false/1729,37,871,870,573,869,36,868,867,133/14,15,65/421,422 Note that this source is a child population table rather than a count of births in a single year, so the numbers below are a rough illustration, not exact 2019 births.
Percentage/number of boys with top 2 names: Liam and Noah in 2019?
.010741 * 37308668
## [1] 400732.4
.009979 * 37308668
## [1] 372303.2
400,732 We won’t count the .4 of a person
372,303 We won’t count the .2 of a person
These top two names I will exclude from the data set.
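The arithmetic above can be sketched in one step with a named vector, using floor() to drop the fraction of a person. The shares and the population figure are the ones quoted above; the variable names are only illustrative:

```r
# Figure quoted above for boys (from the kidscount table).
boys_population <- 37308668

# Share of boys given each of the top two names, as quoted above.
name_share <- c(Liam = 0.010741, Noah = 0.009979)

# Multiply, then round down so we never count part of a person.
estimated_boys <- floor(name_share * boys_population)

estimated_boys
```

This gives 400732 for Liam and 372303 for Noah, matching the two hand calculations.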
Percentage/number of girls with top name: Olivia 2020?
.0010122 * 35730482
## [1] 36166.39
36,166 Once again, we won’t count the .39 of a person. Since I don’t have a dataset of girl names, I will not finish this analysis…
However, I would like to look at it in the future! Note: This post took more time than I initially anticipated, so please request more if you like it.

For boys’ names, I am going to strip out the top two: Liam and Noah
names_ranked <- names_ranked[-1:-2,]

head(names_ranked)###William is the new top name for 2020
##   Ranking Baby_Boy_Names_2020
## 3       3             William
## 4       4               James
## 5       5              Oliver
## 6       6            Benjamin
## 7       7              Elijah
## 8       8               Lucas
2. Cannot be the same as my name, my parents’ names, etc.
Taylor, cristiano, Madona, Shaquille, Jojo, Messi, Michael
Do a text string search for those names, or names like them, in SQL…
family_names <-c("Taylor", "cristiano", "Madona", "Shaquille", "Jojo", "Messi", "Michael",
                ###adding on abbreviated names
                "Tay","Chris","Shaqy","Jo","Mike")

sqldf("SELECT *
      FROM names_ranked
      where Baby_Boy_Names_2020 = 'William'")
##   Ranking Baby_Boy_Names_2020
## 1       3             William
new_boy_names <- sqldf("SELECT *
      FROM names_ranked
      WHERE Baby_Boy_Names_2020 NOT IN(
       'Taylor', 
        'cristiano', 
        'Madona', 
        'Shaquille', 
        'Jojo', 
        'Messi', 
        'Michael',
        'Tay',
        'Chris',
        'Shaqy',
        'Jo',
        'Mike',
        'Nate')
        AND Baby_Boy_Names_2020 NOT IN(SELECT Baby_Boy_Names_2020
              FROM names_ranked 
              WHERE (Baby_Boy_Names_2020 LIKE '%r%'
              OR Baby_Boy_Names_2020 LIKE '%y%'
              OR Baby_Boy_Names_2020 LIKE '%i%'))
        AND (Baby_Boy_Names_2020 LIKE '%a%'
              OR Baby_Boy_Names_2020 LIKE '%e%')
      ")

class(new_boy_names)
## [1] "data.frame"
head(new_boy_names)
##   Ranking Baby_Boy_Names_2020
## 1       4               James
## 2       8               Lucas
## 3       9               Mason
## 4      10               Logan
## 5      12               Ethan
## 6      13               Jacob
summary(new_boy_names)
##     Ranking      Baby_Boy_Names_2020
##  Min.   :  4.0   Abdullah:  1       
##  1st Qu.:216.2   Abel    :  1       
##  Median :445.5   Ace     :  1       
##  Mean   :461.6   Adam    :  1       
##  3rd Qu.:691.2   Adan    :  1       
##  Max.   :999.0   Aden    :  1       
##                  (Other) :264
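The same three conditions can also be sketched without SQL, reusing one vector of excluded names instead of typing them into the query. This is only an illustration in base R, not the query used for the results in this post; the six-row data frame is a hypothetical stand-in for names_ranked, and the letter rules mirror the query (no r, y, or i; at least one a or e):

```r
# Hypothetical six-row stand-in for names_ranked (ranks 3 to 8 from the post).
names_ranked_sketch <- data.frame(
  Ranking = 3:8,
  Baby_Boy_Names_2020 = c("William", "James", "Oliver",
                          "Benjamin", "Elijah", "Lucas"),
  stringsAsFactors = FALSE
)

# Same exclusion list as the SQL query, kept in one vector.
excluded_names <- c("Taylor", "cristiano", "Madona", "Shaquille", "Jojo",
                    "Messi", "Michael", "Tay", "Chris", "Shaqy", "Jo",
                    "Mike", "Nate")

candidate <- names_ranked_sketch$Baby_Boy_Names_2020

# Keep a name only if it is not excluded, has no r/y/i, and has an a or e.
keep <- !(candidate %in% excluded_names) &
  !grepl("[ryi]", candidate, ignore.case = TRUE) &
  grepl("[ae]", candidate, ignore.case = TRUE)

new_boy_names_sketch <- names_ranked_sketch[keep, ]
new_boy_names_sketch
```

On this toy input only James and Lucas survive, which matches the first two rows of the real result. Note that %in% compares exact spellings, so an entry like 'cristiano' only matches a name written exactly that way; wrapping both sides in tolower() would loosen that.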

knitr::kable(new_boy_names, caption = "Dataset After Meeting Specified Conditions")

Dataset After Meeting Specified Conditions
Ranking	Baby_Boy_Names_2020
4	James
8	Lucas
9	Mason
10	Logan
12	Ethan
13	Jacob
17	Jackson
20	Matthew
21	Samuel
23	Joseph
25	Owen
28	Jack
29	Luke
37	Mateo
39	Jaxon
41	Joshua
45	Caleb
48	Nathan
49	Thomas
50	Leo
61	Landon
63	Jonathan
64	Nolan
66	Easton
72	Angel
76	Jaxson
78	Adam
86	Evan
89	Jose
90	Jace
91	Jameson
94	Axel
100	Jason
101	Declan
102	Weston
106	Luca
112	Chase
114	Emmett
118	Cole
120	Bennett
128	Ashton
132	Gael
135	Maxwell
136	Max
139	Juan
140	Maddox
145	Jonah
146	Abel
148	Jesus
151	Beau
152	Camden
153	Alex
157	Jude
158	Blake
159	Emmanuel
170	August
172	Alan
173	Dean
185	Jesse
187	Joel
194	Dawson
196	Matteo
198	Steven
200	Zane
202	Judah
207	Kaleb
214	Jax
216	Holden
217	Legend
220	Kaden
221	Paxton
225	Josue
226	Kenneth
227	Beckett
228	Enzo
233	Lukas
234	Paul
237	Caden
238	Leon
243	Theo
246	Jaden
255	Ace
256	Nash
262	Jake
269	Sean
270	Chance
276	Cash
284	Stephen
287	Dallas
289	Manuel
290	Lane
291	Atlas
293	Jensen
295	Beckham
296	Daxton
304	Jett
305	Cohen
316	Dante
319	Kane
320	Luka
321	Kash
323	Desmond
324	Donovan
330	Angelo
345	Muhammad
346	Jaxton
349	Dakota
351	Keegan
355	Kade
357	Leonel
361	Wade
370	Jase
371	Lennox
372	Shane
376	Seth
379	Lawson
381	Gage
385	Cade
386	Johnathan
393	Shawn
394	Malcolm
397	Dalton
403	Kason
405	Noel
419	Leland
420	Pablo
421	Allen
427	Damon
428	Emanuel
431	Bowen
434	Kasen
437	Jonas
438	Sage
440	Esteban
442	Kashton
449	Adan
453	Dax
454	Mohamed
456	Kamden
457	Hank
460	Augustus
465	Benson
472	Alonzo
473	Landen
486	Deacon
488	Eden
495	Tate
499	Moses
506	Case
508	Asa
511	Aden
517	Apollo
526	Donald
528	Saul
531	Duke
533	Tatum
534	Ahmed
535	Moshe
538	Cannon
539	Alec
541	Keaton
547	Samson
550	Cason
551	Ahmad
552	Jalen
557	Callum
570	Callen
574	Kobe
577	Mathew
579	Johan
582	Stetson
588	Callan
589	Cullen
593	Kannon
595	Axton
603	Sam
605	Mohammad
607	Gustavo
612	Hamza
617	Kellan
619	Kase
625	Kohen
627	Mohammed
630	Lucca
632	Mack
638	Alden
642	Zeke
650	Lance
655	Amos
660	Casen
661	Colten
667	Devon
669	Boone
671	Nelson
672	Douglas
675	Lennon
679	Noe
682	Lochlan
685	Langston
686	Lachlan
688	Abdullah
689	Lee
692	Ben
695	Joe
699	Kellen
701	Jakob
708	Tomas
710	Thaddeus
711	Watson
714	Koda
716	Nathanael
732	Santana
735	Wells
741	Axl
745	Musa
747	Enoch
750	Talon
756	Dane
765	Hassan
766	Jamal
772	Kole
775	Alonso
777	Madden
778	Allan
780	Jaxen
782	Magnus
784	Dash
798	Jaxxon
809	Keanu
816	Koa
818	Coen
827	Van
829	Canaan
836	Maxton
837	Tadeo
839	Aldo
853	Blaze
855	Kace
862	Eugene
866	Nova
873	Kenzo
878	Stefan
879	Wallace
881	Kendall
885	Anson
886	Gannon
890	Dangelo
893	Bentlee
897	Chad
899	Mustafa
912	Wesson
913	Alfonso
916	Juelz
917	Duncan
918	Keagan
919	Deshawn
920	Bode
926	Keenan
928	Jaxx
936	Heath
939	Elon
943	Maddux
948	Vance
949	Boden
969	Jad
975	Zev
983	Deangelo
986	Kalel
998	Benton
999	Coleman

Please send me an email with your thoughts or comments! Thank you for reading!

The New Role: Complete Career Change

I am going to start writing more frequently, so I can hopefully encourage others to do the same in their journey to learn more, and also to hopefully get feedback about how to improve my posts or personal knowledge base. I am always open to learning something new.

So why wait till now to start writing? Well, I started my new role, and I simply love it. However, I haven’t been writing much because of how content I am with the team and the things I get to work on and learn.

Photo by Immo Wegmann on Unsplash

Hold on pause! BACK IT UP!

Okay, here is some context. I was working in the Sales department with a great team of people; however, the work was unrelated to what I wanted to do long term, which is to work with databases and, eventually, to work as a Data Scientist!

Photo by Myriam Jessier on Unsplash

So far things have been going great in my new role, but getting to where I am now was not the easiest thing for me to do. I had to put together a plan because attending classes is not enough if you are going through a complete career change like I was. Let me give you the rundown of the different roles I have held over the years.

career_journey <- c('Paraprofessional at Roberts Academy','English Teacher in Japan','Long Term English Substitute Teacher for Finneytown Highschool','Kroger IT Support Analyst','Sales Underwriting Assistant','Business Intelligence Developer')
summary(career_journey)
career_journey <- data.frame(career_journey)
The Various Jobs I have held on my Arbitrary Path Towards My Career
career_journey
Paraprofessional at Roberts Academy
English Teacher in Japan
Long Term English Substitute Teacher for Finneytown Highschool
Kroger IT Support Analyst
Sales Underwriting Assistant
Business Intelligence Developer

I graduated from school wanting to be a teacher helping students learn the English language through literature. That was fun while it lasted, but it proved not to be the right career move for me. I was scared of moving into something related to computers; however, when I jumped in and realized how fun and rewarding working with data can be, I felt like I had found my mate for life…data. Hence the blog name and image…

Maybe you find yourself in a similar situation, where your current career goals…

1. seem to not be working out the way you thought

2. have become a lie you tell yourself to get through the day

3. are being crowded out, because the thing you loved the most about your career is getting replaced or overrun by something else (this is the category I fell into).

Don’t give in! Explore your options and never doubt yourself! Your brain is a powerful tool that can be reconfigured to whatever you set your mind to. As cliché as that sounds, it’s true. The only limit is what you put into action. For me, this was a Data Analytics Program, a ton of self-study (SQL phone apps, Quizlet for vocab, R phone apps, and practice on my PC), and a determination to succeed.

If you are thinking about switching careers, you probably have a long laundry list of things you think will hold you back. One of mine was having my first kid! Can you believe that? I actually thought an incredibly innocent little creature was going to “hold me back,” but guess what: if anything, she became another motivating factor for me to work harder, because now my daughter was a part of my journey, and I couldn’t let myself, my wife, and my daughter down!!! There were quite a number of mornings at 5 am when my daughter would only fall asleep in the carrier, so I would strap it on and rock her to sleep while also typing SQL queries and SAS or R code on my computer for my class projects (rest assured, if I had put her into her crib, she would have been back up in a heartbeat 😉, so the carrier was the best option for her and for me).

So what if I am just not as amazing as you? This question assumes that I am some degree of “amazing” by myself, but the truth is that the most important ingredient in your success is the team of people behind you who love and support you.

I hope you found this piece encouraging! I want to help those who wish to move into a different career but feel trapped. You can do it! It takes a lot of hard work, but you can do it!

I am going to more regularly blog tips and tricks I have picked up that have proven to be helpful for me. I also want to post some different projects I complete.

Thank you for reading!

PART 2: GOT $CAMMED?

In the last post, you were able to see an image of my beloved dog and hear a little bit about fraud and theft in America. In this post you will also see a picture of my beloved dog, and I am hoping that with the context from the last blog post you can at least try to picture yourself in the shoes of Sophia, the person I interviewed for the substance of this blog post. If people can be duped into paying money to pseudo-government officials and phony boyfriends and girlfriends, why couldn’t they be duped into paying a company that was hiring them? And this is where Sophia’s story begins…

The Dataset… This dataset was scrubbed the old-fashioned way…manually, through a series of messages and pictures. Thank you, Sophia, for helping me construct this timeline with your data.

Sophia, whose real name will be kept secret to protect her identity, is a young blonde who really likes to look fashionable and has a cat, or at least that is what Facebook seemed to suggest. Sophia began by talking through some of what she had felt leading up to the week she was robbed. Reflecting on what she learned from the incident, Sophia said, “I learned a lot about their techniques.” When I asked what advice she had for others who find themselves in a questionably scammy scenario, she said, “[If they ask you for money] have them buy it for you…It’s not your responsibility,” and when they do send you money, “wait till the paycheck goes through.” At the beginning of our conversation, Sophia looked distraught and bewildered, but after making this statement her face reddened with anger as she thought of the people who did this to her. When we reviewed some of the correspondence, one of the last messages Sophia sent, which involved some colorful language meant to offend the scammers, seemed to match her current mood the most. The scammers left without a trace, and the whole exchange took only a week. If you are meeting someone for the first time and get even the remotest scammer vibe from them, see FTC.gov and look at their information on scams. Their website has a lot of great resources for telling whether someone is a scammer, and they have a phone number you can call to report scams. If you think you are alone, think again, and then look at all of the data the FTC provides on other individuals who were also scammed out of their money or had their identity stolen.

Stepping down from my soapbox, please look at the transaction line chart alongside the timeline of events as they unfolded between Sophia, the Scammer, and the bank. If you don’t want the annoying giant play button in the middle of the viz please go to this link http://public.tableau.com/views/viz_of_scam_timeline_with_balance_v3/TimelineFinal?:display_count=y&:origin=viz_share_link where you can see the best format of this viz.

 

Returning to a bit of advice Sophia gave earlier: “It’s not your responsibility.” It is not okay to make a transaction without a guarantee that the money is in your hand or in your bank account and will not bounce. This advice is important because I think it shows that the power should be in the hands of the potential employee, or of whoever is being asked for money in any other kind of scam, not in the hands of the employer or potential scammer.

 

References:

  1. https://www.consumer.ftc.gov/features/feature-0037-imposter-scams
  2. https://playfairdata.com/how-to-make-a-timeline-in-tableau/
  3. https://www.ftc.gov/enforcement/data-visualizations/explore-data