Tidy Data & Organized Research: Working Together for Better Data

In “Tidy Data,” published in the Journal of Statistical Software, Hadley Wickham introduces readers to tidy data and shows how consistent formatting can strengthen research findings while simplifying complex information. The article sets out three principles of tidy data: each variable forms a column, each observation forms a row, and each type of observational unit forms a table. Analysts, as well as computers, prefer information laid out this way because it makes it “easier to extract needed variables…” since “…it provides a standard way of structuring a dataset” (Wickham, 2014). Wickham claims tidy datasets are particularly well suited for vectorized programming languages. Initially, I did not understand this statement because some of the language was foreign to me, so I did outside research for clarification. I found that vectorized programming languages let a person apply a mathematical operation to an entire list of values as if it were a single object. Although the order of variables and observations does not affect the analysis, a good ordering makes the raw values easier to scan.
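To make the idea of vectorized programming concrete for myself, here is a minimal sketch. It assumes Python with the NumPy library as a stand-in for a vectorized language (Wickham's article works in R, not Python), and the variable names and height values are made up purely for illustration.

```python
import numpy as np

# Hypothetical measurements in centimeters (made-up values for illustration).
heights_cm = np.array([150.0, 160.0, 170.0])

# One expression converts every element at once; no explicit loop is written.
heights_m = heights_cm / 100

# A summary computed over the whole column in a single call.
mean_height_m = heights_m.mean()

print(heights_m)
print(mean_height_m)
```

Because a tidy dataset stores each variable in its own column, a whole column can be handed to an operation like this in one step.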

One of the best ways to organize your research is to tidy messy datasets. Wickham argues that most messy datasets can be tidied with a small set of three tools: melting, string splitting, and casting. “Melting is [described] by a list of columns that are already variables, colvars for short” (Wickham, 2014). In other words, melting keeps those columns as they are and turns the remaining column headings into values stored in rows. However, other methods for organizing your research are simply taking well-organized notes, using your sources to generate new or improved ideas, and incorporating preliminary readings to get a deeper sense of what your research topic is truly about.
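As a rough sketch of what melting does, the example below uses plain Python rather than the R tools Wickham describes. The melt function, its parameter names, and the small treatment table are my own assumptions for illustration and are not taken from the article.

```python
def melt(rows, colvars, var_name="variable", value_name="value"):
    """Turn every column that is not a colvar into (variable, value) rows."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column in colvars:
                continue  # colvars are already variables, so they stay as columns
            record = {key: row[key] for key in colvars}
            record[var_name] = column      # the old column heading becomes a value
            record[value_name] = value     # the old cell becomes its own row
            tidy.append(record)
    return tidy


# Hypothetical messy table: the treatment names are stored as column headings.
messy = [
    {"person": "A", "treatment_a": 1, "treatment_b": 4},
    {"person": "B", "treatment_a": 2, "treatment_b": 5},
]

tidy = melt(messy, colvars=["person"], var_name="treatment", value_name="result")
for record in tidy:
    print(record)
```

Each row of the result now holds one observation (one person receiving one treatment), which is the layout the three principles ask for.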

I would incorporate tidy datasets and organizational research methods into my scholarly practice by simplifying the information I have at hand. To elaborate, after conducting my research and studying my readings, I would take notes to record my findings. Notes are unbelievably important! You cannot study something you’ve already forgotten, nor would you want to pass down misinformation because of foggy memories. With a simplified version of my research, I would be able to create a tidy dataset, and as Wickham mentions numerous times, datasets are generally messy at first! That’s where I would apply my new organizing skills and follow the three principles to improve my dataset. I don’t expect perfection, nor should any true scholar. I’ll make mistakes, learn from my mistakes, and ultimately make new mistakes. Being a scholar is all about conducting research, adjusting said research, and continuing to learn.

Hadley Wickham, “Tidy Data,” Journal of Statistical Software 59, no. 10 (August 2014)

Let’s Get Ethical, Ethical ♫

The four literacies discussed during our class on September 7th (ethics, privacy, copyright, and licenses) all play an equal role in the research conducted by current and future scholars. These literacies affect your research and scholarship as a historian or scholar by defining what information you are legally and morally allowed to obtain and publish.

With ethics, you need to be aware of normative, alternative, and harmful potentialities. Will your research cause psychological, physical, legal, or social harm to the subjects being researched? Being aware of the ethics of care and understanding risk-benefit analysis are essential. For a project to be considered ethical, researchers must obtain consent before taking their research further.

Another literacy is privacy, which is important for preventing public disclosure of impermissible facts, defamation of character, or appropriation of logos or names. A surprising fact I learned about privacy law was that your project may proceed if the subjects of the collections are no longer living. Whether that was common knowledge or not, I had genuinely never thought about it, even though it makes complete sense. How can a dead person give consent? It leads me to question research on Egyptian history and the mummies involved. Who grants archeologists consent to unwrap the remains of people who were once living, with families, memories, and an entire history, only to be picked apart for scientific research?

Under copyright law, the content that researchers are allowed to use is defined by strict guidelines. Copyright affects your research because plagiarizing carries heavy consequences, so it is important to understand the copyright and license that apply to the project being worked on. However, you cannot legally copyright facts. This allows researchers to further their studies without copyright law preventing them from publishing their own findings and hypotheses. An interesting fact about copyright law that I learned from our meeting was that using material produced by the government does not expose a person to a copyright claim.

To avoid violating copyright law, the last literacy to follow is licensing. A license simply comes down to being a contract not to sue. It gives historians and scholars permission to use the information to further their research without fearing backlash or possible legal repercussions.

An important consideration before beginning a digital humanities project is whether the information is being obtained ethically (for example, has consent been given prior to starting the project?). Some strategies to account for ethics are consulting journal publications, imposing access controls, and developing local best practices.