Sunday, August 29, 2021

Story time: Bioinformatics research without a computer science degree

Welcome back to my blog!  

I recently finished a graduate program in clinical neurology...  where I undertook a computational data science project as a key component of my degree!  Aside from learning basic Python coding and blogging about it, I'd never had any "official" experience of using my coding skills for university or research.  Usually, I introduce Python code through small projects, but this time I'd like to talk about my experience of undertaking bioinformatics-related research without having a computer science component in my bachelor's.

What kind of research did you take part in?


Simply put, figuring out whether we could study diseases using data from different sources.  My answer was more of yeah... but be careful😅

Why did you choose a computational project?

In all honesty, I've always liked mathematics and physics, and I'm quite confident in them.  I've also always had an affinity for computers.  When I was a child I'd play computer games whenever I wasn't doing my homework, and I even joined the computer club at school, where we mostly competed against each other in touch typing.  When I found out that my program was offering a project which combined neurodegeneration (my main research interest) and computer science, I was ecstatic😲 When I first met my supervisor, they seemed really nice and assured me that they didn't expect me to know everything about coding straight away, but that I should at least have some interest in learning the skills that would be crucial for my project.  I had already started learning to code and was mostly brushing up on my loops, so I felt like it was a good fit for me.

How did you acquire the coding skills necessary for the project?

My supervisor initially recommended that I take some online courses that teach basic data science coding.  My primary resource was the freeCodeCamp.org classes on YouTube, which taught me the basics of data science coding in Python (e.g. importing datasets using pandas, and data visualization using the matplotlib and seaborn libraries).  My project started with applying for the essential resources, so there was a time gap during the application process for me to acquire these skills.  Throughout the project, my supervisor would recommend other essential tools (e.g. OpenRefine) for data cleaning and mining.  For those who don't know what data cleaning is, it's basically the process of removing unwanted data or making the data neater and more consistent, since the people who entered the data may have made some mistakes during input.  I'd also think about what I wanted to do with the data, then find libraries or functions that would help me reach those objectives.  Many of those were functions from the pandas library for transforming my data, or from the scipy.stats module for performing statistical tests.  The websites pandas.pydata.org and docs.scipy.org were particularly useful for learning new functions from the pandas and scipy libraries, as they provide the basic outline of the code, detailed explanations of its components, and some example code with the corresponding output.  I had to refresh my memory on different statistical tests to see which would be most useful for my project.  Then it's about practice, practice, and more practice!  I learned a lot about what kind of coding I would need by testing out different code on my data to see what turned up and which approaches were the most useful.  If there was anything I couldn't understand on my own, I would ask members of my research team who were more knowledgeable about statistics and coding.
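To make the cleaning and testing steps a bit more concrete, here's a minimal sketch of what they can look like with pandas and scipy.stats.  Everything in it is made up for illustration: the tiny dataset, the column names (participant, group, score) and the choice of Welch's t-test are assumptions, not the data or methods from the actual project.

```python
import io

import pandas as pd
from scipy import stats

# Hypothetical raw data with typical input mistakes: inconsistent labels,
# stray whitespace, a missing score, and an exact duplicate row.
raw_csv = io.StringIO(
    "participant,group,score\n"
    "p01,Control,12.5\n"
    "p02,control ,14.0\n"
    "p03,CONTROL,13.2\n"
    "p04,Patient,18.1\n"
    "p05,patient,\n"
    "p06, Patient,17.4\n"
    "p07,patient,19.0\n"
    "p07,patient,19.0\n"
)

# Import the dataset into a DataFrame.
df = pd.read_csv(raw_csv)

# Cleaning: make the group labels consistent (trim spaces, lower-case),
# then drop duplicate rows and rows without a score.
df["group"] = df["group"].str.strip().str.lower()
clean = df.drop_duplicates().dropna(subset=["score"])

# Split the scores by group.
control = clean.loc[clean["group"] == "control", "score"]
patient = clean.loc[clean["group"] == "patient", "score"]

# Welch's t-test: compares two group means without assuming equal variances.
result = stats.ttest_ind(control, patient, equal_var=False)

print(clean)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```

Which test is appropriate depends on the data (sample size, distribution, paired or unpaired groups), so the t-test here is only a stand-in for whichever test suits your project.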

Is there a difference in research between data science vs wet lab?


The biggest difference I've found is that the methodology for data science constantly updated itself as I learned new things about my data.  For wet lab research, there's usually a specific protocol for how to carry out certain types of investigations (e.g. western blot, PCR, DNA sequencing etc.) and it's a lot of repetition of those routines.  Most of the planning for wet lab research is focused on figuring out the equipment, solutions, cell types, and concentrations, but the overall procedure tends to be more or less the same.  That's why in many biology or chemistry research projects, the methodology section is the easiest part of a thesis to write and finalize.  For data science though, the full methods section is something that we can only write towards the end of the project, because it relies heavily on what kind of data we are able to obtain.  There are a few questions to consider when figuring out the methods for data science:
  • Where am I going to collect my data?
  • What kind of data am I going to have?
  • How complete is my data going to be?
  • How big is my data?
  • Are my data coming from one source or multiple sources?
  • What kind of statistical methods would most suit my data?
  • What programming language am I going to use?
etc. etc.  
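A few of these questions (how big, how complete, how many sources) can be checked quickly once the data is loaded.  Here's a minimal sketch using pandas; the table, its column names (source, age, diagnosis) and its values are hypothetical and only there to show the idea.

```python
import pandas as pd

# Hypothetical records pooled from two made-up sources.
records = pd.DataFrame(
    {
        "source": ["clinic_a", "clinic_a", "clinic_b", "clinic_b", "clinic_b"],
        "age": [64, 71, None, 58, 66],
        "diagnosis": ["condition_x", "condition_y", "condition_y", None, "condition_x"],
    }
)

# How big is my data?
n_rows, n_cols = records.shape

# How complete is my data? Fraction of non-missing values per column.
completeness = records.notna().mean()

# Is my data coming from one source or multiple sources?
rows_per_source = records["source"].value_counts()

# What kind of data do I have? Column types hint at which tests make sense.
column_types = records.dtypes

print(f"{n_rows} rows x {n_cols} columns")
print(completeness)
print(rows_per_source)
print(column_types)
```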

Data science involves a lot of exploration, so it takes a bit longer to figure out a specific route to follow, and there are bound to be some bumps along the way that make you choose a different path sometimes.  

I suppose another difference between data science and wet labs is the nature of the supervision.  Wet labs tend to be more about completing routine tasks on a daily basis, so your supervisor is able to check in on your progress regularly and see if you've made any errors when carrying out the protocols or if there are any unexpected results.  For data science, there's a lot more independence when carrying out tests because, as I've mentioned before, it's a lot more journey-based.  Supervisors are more interested in understanding the logic behind the tests you've decided to carry out and how they turned out.  Depending on what results you get, the supervisors will nudge you in a certain direction.  

In essence, the main difference between data science and wet labs is that data science is more about the journey while wet labs focus more on the destination.

Would you recommend learning computational skills even if you're not sure if you'd like heavily quantitative research?


Yes.  Primarily because there's a lot more emphasis on collaborative research nowadays, with interdisciplinary work becoming more popular.  More and more wet labs favor those who have computational skills, and computational scientists often work with life science researchers when undertaking projects.  There is a higher demand for bioinformatics researchers who are experts in both the biology and the computer science aspects of the research and can be the "middle guy" between the pure computer people and the pure life sciences people.  Even if you ultimately decide not to become a data scientist, it's useful to learn quantitative research skills.

Since your blog is a combination of python coding and neurodivergence, would you give any advice specific to researchers (or people who want to be researchers) who identify as neurodivergent?

I'd say the most important attributes of a grad student are being consistent, organized and willing to work with others.  I wouldn't worry too much about being the smartest person in the room, because it's likely you won't be when surrounded by a bunch of experts in their fields.  I would recommend writing regular reports of what you've done during your research, as they come in handy during the write-up of your dissertation.

Generally I'd say think of your strengths and weaknesses.  For example,
Strengths:
  • Learning independently
  • Numeracy 
  • Presentation skills
Weaknesses:
  • Planning and writing long papers
  • Managing stress levels
  • Communication skills
Once you've figured out what you're good at and not-so-good at, find resources that would be most suitable for you.  What skills do you think you could use to your advantage?  What are the skills you might need to work on?  I knew I liked routines, so I felt good when I planned ahead and stuck to my routines for consistent output.  I also knew that I needed to work on my writing and on making sure I didn't burn out easily, so I found ways to sense my limit and manage my stress levels.  The disability services at university can be a good place to start when figuring out your options for seeking help.  Even if you don't have a diagnosis, they might be able to refer you for an assessment.  They'll often ask why you'd like help, so it's a good idea to make a list of the struggles you have on a personal and academic basis.  I found that working with a 1:1 study skills tutor was useful for figuring out what kind of support I needed, writing drafts, and learning how to advocate for myself.

Would you continue to pursue computational research?

Definitely!  I greatly enjoyed my time in my research project.  It's something that I never considered before but I now realize is actually a good option for me.  Special shoutout to everyone who helped me throughout my journey💖

Final thoughts

I hope you all enjoyed reading my thoughts and advice about taking on a computational research project.  If there's anything more specific you'd like to hear about, please comment down below and I'll consider your requests!  Next month, I'll hopefully be able to share some new code😇  Check out my past posts in the archive section to see more of my work.  I'm fairly active on Twitter so if you're interested in my daily tweets, please follow me!
