Tidy Data & Organized Research: Working Together for Better Data

In “Tidy Data,” published in the Journal of Statistical Software, Hadley Wickham introduces readers to tidy data and shows how consistent formatting can strengthen research findings while simplifying complex information. In the article, readers learn the three principles of tidy data: each variable forms a column, each observation forms a row, and each type of observational unit forms a table. Analysts, as well as computers, prefer information laid out as tidy data because it makes it “easier to extract needed variables…” since “…it provides a standard way of structuring a dataset” (Wickham, 2014). Wickham claims tidy datasets are particularly well suited to vectorized programming languages. Initially, I did not understand this statement because some of the language was foreign to me, so I did outside research for clarification. I found that vectorized programming languages let a person apply mathematical operations to entire lists of values as if they were single objects. Because a tidy dataset keeps every value of a variable together in one column, a whole column can be handed to such an operation at once. Although the order of variables and observations does not affect the analysis, good ordering makes the raw values easier to scan.
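To make the idea of vectorized operations concrete, here is a minimal sketch in Python. Python's built-in lists are not vectorized, so the `Column` class below is a hypothetical stand-in I wrote to mimic the behavior that vectorized languages provide out of the box. The exam scores are made up for illustration and do not come from Wickham's article.

```python
import operator


class Column:
    """Hypothetical minimal column type that mimics vectorized arithmetic."""

    def __init__(self, values):
        self.values = list(values)

    def _apply(self, other, op):
        # Column with Column: combine the values position by position.
        if isinstance(other, Column):
            if len(other.values) != len(self.values):
                raise ValueError("columns must have the same length")
            return Column(op(a, b) for a, b in zip(self.values, other.values))
        # Column with a single number: apply the number to every value.
        return Column(op(a, other) for a in self.values)

    def __add__(self, other):
        return self._apply(other, operator.add)

    def __mul__(self, other):
        return self._apply(other, operator.mul)


# Made-up tidy columns: each variable is one column,
# and each position is one observation (one student).
midterm = Column([70, 85, 90])
final = Column([80, 75, 95])

# One expression works on every observation at once, with no explicit loop.
total = midterm + final
doubled = midterm * 2

print(total.values)    # [150, 160, 185]
print(doubled.values)  # [140, 170, 180]
```

The point of the sketch is the last few lines: once each variable lives in its own column, a calculation can be written once and applied to the whole column.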

One of the best ways to organize your research is to tidy messy datasets. Wickham argues that most messy datasets can be tidied with a small set of tools: melting, string splitting, and casting. “Melting is [described] by a list of columns that are already variables, colvars for short” (Wickham, 2014). In other words, melting keeps those columns as they are and turns the remaining column headings and their values into rows. Other methods for organizing your research include taking well-organized notes, using your sources to generate new or improved ideas, and doing preliminary reading to get a sense of what your research topic is truly about.
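The melting step can be sketched in a few lines of plain Python. This is my own simplified illustration, not Wickham's implementation: the `melt` function, its parameter names other than `colvars`, and the small treatment table are all assumptions made up for the example.

```python
def melt(rows, colvars, variable_name="variable", value_name="value"):
    """Turn every column not listed in colvars into (variable, value) rows.

    Hypothetical helper for illustration: rows is a list of dicts,
    and colvars names the columns that are already variables.
    """
    molten = []
    for row in rows:
        for column, value in row.items():
            if column in colvars:
                continue  # already a variable, so it is copied, not melted
            record = {key: row[key] for key in colvars}
            record[variable_name] = column  # the old column heading
            record[value_name] = value      # the value stored under it
            molten.append(record)
    return molten


# Made-up messy table: the treatment names are stored in the column headings.
messy = [
    {"person": "A", "treatment_a": 4, "treatment_b": 7},
    {"person": "B", "treatment_a": 6, "treatment_b": 9},
]

tidy = melt(messy, colvars=["person"],
            variable_name="treatment", value_name="result")

for record in tidy:
    print(record)
```

After melting, each row holds one observation (one person under one treatment), and `person`, `treatment`, and `result` each have their own column, which matches the three principles described earlier.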

I would incorporate tidy datasets and organized research methods into my scholarly practice by simplifying the information I have at hand. To elaborate, after conducting my research and studying my readings, I would take notes on my findings. Notes are unbelievably important! You cannot study something you’ve already forgotten, nor would you want to pass on misinformation because of foggy memories. With a simplified version of my research, I would be able to create a tidy dataset, and as Wickham mentions numerous times, datasets are generally messy at first! That’s where I would apply my new organizing skills and follow the three principles to improve my dataset. I don’t expect perfection, nor should any true scholar. I’ll make mistakes, learn from them, and ultimately make new ones. Being a scholar is all about conducting research, adjusting it, and continuing to learn.

Hadley Wickham, “Tidy Data,” Journal of Statistical Software 59, no. 10 (August 2014)
