Monday, September 19, 2022

UPDATE: I HAVE RETURNED!!!

 HELLO WORLD!

For those of you who are new here, welcome!  I'm glad that you decided to have a look at my blog 💕 If you've been following my blog, thank you so much for returning!  I've been away for quite a while now (so much for my New Year's Resolution to post every month LMAO) so I decided to give you an update on what I've been up to and what I plan to do with the blog from now on.

In my last post, I talked about getting a PhD offer.  Due to... circumstances (The years 2021 and 2022 have continued to be quite a wild ride 👀) and some considerable thought about my life, I decided to apply for jobs.

Specifically, I wanted to see if I could start my career as a data scientist in industry.  I've always wanted to start a career in industry but I decided to apply for a PhD because: 1) I wasn't sure if I could get a data scientist job right away after my Master's considering that I have a degree in Clinical Neurology and not Data Science/Computer Science etc. AND 2) I thought that having a PhD would make me more employable.  I eventually received an offer for a PhD, but the course was going to start a lot later than I expected, so I thought, "Why not try applying for data scientist positions and see what happens?"  If I can start my career now, I won't need to go through a PhD.

During the time I've been away from the blog, I've been going through the motions of preparing resumes, attending company events, networking, preparing for interviews etc.  I'm happy to let you know that I accepted a job offer at a company that offers a training program so that new employees can confidently grow their skills needed for the job.  I'm so excited to start my new career!!!  

For the foreseeable future of this blog, I'm thinking of sharing my experiences related to 1) applying for jobs; 2) differences between PhD and job applications; 3) challenges and tips for applying for tech positions when not having a tech or heavily quantitative educational background AND MORE!!!  Hopefully focusing on writing story times and articles will help me produce content consistently.

Thanks for reading!


Wednesday, March 30, 2022

Story time: That time an international student applied for a PhD for UK/settled status students (PhD in UK)

 Welcome back!

And... I know that in last month's post I announced that I would make a Part 2 for Project 8: Logistic Regression but due to life being really busy for me at the moment, I've decided to make a chill story time about what happened when I, an international student, applied for a PhD in the UK even though the position was advertised exclusively for people from the UK.  Hopefully y'all will like this, since my last story time was one of the most popular posts in this blog. 😌

Alright, cut to the chase.  Did you get in?

Unsurprisingly, YES I DID 😆

What was the PhD about?

The PhD was a fully-funded (Well... sort of.  More about this later) bioinformatics project that includes a placement overseas.  

What made you decide to apply for a PhD for locals when you'd be an international student?

I already had a good working relationship with the supervisor and they encouraged me to apply.  I didn't have the most traditional educational background when applying for data science-related PhD projects which put me at a disadvantage when applying for larger programmes.  Since they already knew that I had prior research experience in bioinformatics, it was easier to convince them to take me and provide advice regarding PhD applications specifically tailored to me.

Would you say that knowing the right person that can give you an "in" is important when applying for PhDs?

Absolutely yes.  If you're interested in applying for PhDs, then the first step is to look for a supervisor who 1) is capable of supervising PhD students; 2) currently conducts research related to what you're interested in; 3) is someone you like personally (or at least can work with professionally).  My advice is to treat PhD applications much like job applications.  Now that I think about it, the general flow of a PhD application probably deserves its own full article... 

How come you didn't apply for a PhD in your home country?

I already did my Bachelor's and my Master's in the UK.  I felt it was more straightforward to apply for a PhD in the UK since I was quite out of touch with how to apply for grad school in my home country.

Would you recommend international students to apply for PhDs not intended for international students?

Generally speaking, I'd say no.  I mentioned before that the PhD was fully-funded, but only for local students.  I was quite lucky that I would still receive partial funding but I was told that I had to cover the rest of the fees (mostly tuition fees because that's hella pricey 😧).  This is something that I think most people outside of grad school don't know, but funding means A LOT to university researchers.  I'm not just talking about PhD students, I'm talking about post-grads and any academic without tenure.  If you're applying for PhDs, there is IMMENSE pressure on you to get full funding that covers all of your tuition fees, living costs, travel, and anything else related to your research.  Keeping that in mind, international students are at a great disadvantage because there are larger fees to cover and fewer opportunities to get funding.  If you can get a PhD position that offers full funding at international student rates then TAKE IT!!!  (That is if you want to do a PhD as an international student in the UK, of course.)

Wow!  I guess there's quite a lot of ground to cover when it comes to PhD applications.  I thought that a PhD is like an extension of school.

😆😆😆

Final thoughts

I hope you all enjoyed reading about my experience applying for a PhD.  If there's anything you'd like to know more about, please comment down below and I'll consider your requests!  Next time, I will (hopefully) write about Project 8 Part 2.  Check out my past posts in the archive section to see more of my work.  I'm semi-active on Twitter so if you're interested in my daily tweets, please follow me.


Monday, February 28, 2022

Project 8 Part 1: Logistic Regression - Python

 Welcome

Hi again, hi again!  If you've been catching up with my blog, thanks for your continuous support 💓 If you're new here, thank you for giving my blog a chance 💕 Since I started learning R, I've thought about making code comparisons between Python and R.  Coincidentally, I've also started learning machine learning so I thought... why not try and compare machine learning code between Python and R!  So far, I've learned how to build logistic regression models using Python and R.  Project 8 is divided into Parts 1 and 2, which describe the Python code and the R code respectively.

I will be using the Iris dataset to demonstrate how the code works.  If you're someone who requires assistive software to read, I suggest downloading the PDF documents to read the code.

Python - Jupyter Notebook

For this project, I built a logistic regression model using sklearn (scikit-learn).  For starters, the packages I used were pandas, NumPy, SciPy, sklearn, and matplotlib.







Sklearn allows us to import some of the most famous datasets when learning data science.  For this project, I imported the Iris dataset and included data under the columns sepal_len, sepal_wid, petal_len, petal_wid, and class.  The NAs were dropped and empty lines were removed.  (Note:  This section of the code is based on the work of Srishti Saha from GitHub)

(Click here for the PDF version of code: import Iris dataset)
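For readers who can't open the PDF, here is a minimal sketch of what the import step could look like.  It is not the original notebook code: loading the data through load_iris and building the table this way are my assumptions, while the column names are the ones listed above.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Column names follow the post; building the frame like this is an assumption.
COLUMNS = ["sepal_len", "sepal_wid", "petal_len", "petal_wid"]

iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns=COLUMNS)
iris_df["class"] = iris.target  # 0, 1, 2 for the three species

# Drop rows with missing values (the bundled copy has none, so nothing is lost).
iris_df = iris_df.dropna().reset_index(drop=True)

print(iris_df.shape)
print(iris_df.head())
```

The bundled dataset has 150 rows, so the table should come out with 150 rows and 5 columns.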













Just type in iris_df to have a look at the dataset!

In order for the model to work, we have to make sure that the variable that we want to predict, in this case "class", is stored as an integer type.

Just type in iris_df["class"].dtype to confirm!









To build a model that predicts Y ("class") from X ("sepal_len", "sepal_wid", "petal_len", "petal_wid"), both X and Y have to be turned into arrays.
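The dtype check and the conversion to arrays can be sketched as follows.  This is a stand-in rather than the original cell: the way iris_df is rebuilt here and the use of .to_numpy() are assumptions.

```python
import pandas as pd
from sklearn.datasets import load_iris

feature_cols = ["sepal_len", "sepal_wid", "petal_len", "petal_wid"]

iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns=feature_cols)
iris_df["class"] = iris.target

# Confirm that the column we want to predict holds integers.
print(iris_df["class"].dtype)
is_int = pd.api.types.is_integer_dtype(iris_df["class"])

# Turn the features and the target into NumPy arrays.
X = iris_df[feature_cols].to_numpy()
Y = iris_df["class"].to_numpy()

print(X.shape, Y.shape)
```

X should end up with one row per flower and one column per measurement, and Y with one label per flower.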












X was rescaled so that the maximum value becomes 1.  This puts the four measurements on a comparable scale before fitting.  (The Y that the model returns is a probability, so it falls within the range of 0 to 1 either way.)
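One simple way to get a maximum of 1 is to divide every column by its own largest value.  I'm assuming that approach here for illustration; the original notebook may have used a scikit-learn scaler instead.

```python
import numpy as np
from sklearn.datasets import load_iris

X = load_iris().data  # 150 rows, 4 measurements

# Divide each column by its own maximum so every feature tops out at 1.
X_scaled = X / X.max(axis=0)

print(X_scaled.max(axis=0))
print(X_scaled.min(axis=0))
```

Because all Iris measurements are positive, the rescaled values stay between 0 and 1.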


In order to test the model, the data was split into a training set and a test set.  The training set provides information for the model to "learn" how to make classifications.  The test set makes sure that the model can actually make classifications and is useful for finding out how accurate the model is.  In this project, I split the data so that the training set contains 80% of the data and the test set the remaining 20%.



I checked that the data was actually split 80:20.  The full dataset has 150 data points.  The train set has 120 data points and the test set has 30 data points.  Since 150 x 0.8 = 120 and 150 x 0.2 = 30, the split worked as intended.  
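Here is a hedged sketch of the split and the size check using train_test_split.  The random_state value is an arbitrary choice of mine so that the split is repeatable; it is not taken from the original notebook.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, Y = load_iris(return_X_y=True)
X = X / X.max(axis=0)  # same rescaling assumption as before

# test_size=0.2 keeps 20% of the rows for testing; random_state is arbitrary.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=4
)

print("Train set:", X_train.shape, Y_train.shape)
print("Test set:", X_test.shape, Y_test.shape)
```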


The logistic regression model was made using the train set.



This is what happened when I tried to test the model on the test set.

You'll see a big array of decimal numbers ranging from 0 to 1.  Logistic regression provides an outcome of the variable class as either "Yes" or "No"... kind of.  What this model really does is provide the probability that class would be "Yes".  The closer the Y value is to 1, the higher the chance that class is "Yes."
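Fitting the model and looking at the probabilities might look like the sketch below.  The settings (max_iter, random_state) are my assumptions rather than the original ones.  One thing to note: with the three Iris classes, predict_proba returns one probability per class for each row rather than a single "Yes" probability, and each row adds up to 1.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, Y = load_iris(return_X_y=True)
X = X / X.max(axis=0)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=4
)

# Fit on the training set only; max_iter is raised to avoid convergence warnings.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, Y_train)

Yhat_prob = model.predict_proba(X_test)  # one column of probabilities per class
Yhat = model.predict(X_test)             # the most probable class per row

print(np.round(Yhat_prob[:5], 3))
print(Yhat[:5])
```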

How useful is the model?


So now I have the predictions for class based on the other variables in the Iris dataset.  How would I know how accurate the predictions are?  One way is to use the Jaccard index to produce an average percentage of how similar the actual Y values were vs the predicted Y values (called Yhat). 

(Click here for the PDF version of code:  how good is the model?)











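The Jaccard index was 0.825.  Strictly speaking, the Jaccard index measures how much the predicted and actual labels overlap (intersection over union) rather than plain accuracy, but loosely it means that the model gets things right roughly 82.5% of the time.  You could interpret that as "nearly 1/5 of all cases could be wrong" or you could say that "it's a whole lot better than a 50:50 chance!"  Personally, I think that 82.5% is a pretty solid number considering that it's a pretty small dataset!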
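For anyone who wants to try the scoring step, here is a sketch using jaccard_score from current versions of scikit-learn, with plain accuracy next to it for comparison.  The macro averaging and the split settings are my assumptions, so the numbers it prints will not necessarily match the 0.825 above.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, jaccard_score
from sklearn.model_selection import train_test_split

X, Y = load_iris(return_X_y=True)
X = X / X.max(axis=0)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=4
)

model = LogisticRegression(max_iter=1000).fit(X_train, Y_train)
Yhat = model.predict(X_test)

# Jaccard index per class (intersection over union), averaged over the classes.
jaccard = jaccard_score(Y_test, Yhat, average="macro")
# Plain accuracy: the share of test rows that were labelled correctly.
accuracy = accuracy_score(Y_test, Yhat)

print("Jaccard index:", round(jaccard, 3))
print("Accuracy:", round(accuracy, 3))
```

Printing both numbers side by side makes it easier to see that the Jaccard index and accuracy are related but not the same measurement.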

Final thoughts

Thank you so much for reading!  Making this post was actually a lot of fun and I hope you all enjoyed it ❤ I feel like knowing that you are out there reading this blog keeps me motivated to keep on coding 😀 Next time, I will be showing what making the same logistic regression model looks like when using R.  Until then, please feel free to read my other posts in this blog.  If there's anything you want to say about this post, comment down below!



Tuesday, January 18, 2022

Happy New Year!!! (How I got into R storytime...)

Welcome back to my blog as we enter 2022!

Happy New Year!  I hope you all had a lovely winter 💗 I know I've been a bit lazy with my blog in 2021, so for my New Year's Resolution, I will write one blog post per month and deliver consistent content for you to enjoy.  As promised in my previous post "Is autism a disability?", I'm going to talk about how I ended up learning R and why some of you might find it useful (hint: data scientists and data analysts).  There will be some R-related content from now on as well as Python and neurodiversity as before!

What is R?

R, like Python, is a programming language.  The main difference is that instead of being able to do a bit of everything, R is mostly used for statistical analysis.  It is a language developed by statisticians for statisticians.  Much like most Pythonistas use Jupyter notebook as an editor, R programmers use RStudio to write code and import packages.

How did I end up learning R?

I got a new job!  Yup, that's right.  After I finished my Master's I ended up working at a research lab involved in data science where I have to use R.  Necessity is a good motivator for... everything I suppose haha.  

Was it easy to learn?  How hard is it?

Personally I found it fairly straightforward to learn.  Already knowing Python, I was fairly comfortable with programming concepts and as someone who has been in the STEM field throughout my education the statistics wasn't too hard to grasp.  It also helps that I'm a massive math nerd and did a computational project for my dissertation 😅 This was the first time that I've used a book to learn how to code.  I'd say pick a book that's essentially a "Book for Dummies" that describes all the steps starting from installation of R.  Of course, I tested out the code from the book to see if it actually worked on my computer rather than just reading it.  The thing that I've found with any kind of programming is that you just have to start and make new programs and you'll get from A to B at some point.  Once I was done with the basics, I started writing new code.  Stack Overflow has been particularly useful whenever I got stuck.  There's always someone more experienced than you!  Especially if you're just getting started.

Who would find R useful?

Most likely if you're in the data science field, R is a useful programming language to learn.  R is designed for statistical calculations.

Which do I prefer:  Python or R?

Long story short:  it depends.  If I want to do some heavy statistical analysis, calculations, or data visualizations, then I prefer R.  Generally though, I prefer Python because Python codes are easier to read (especially for machine learning related codes) and it's like the "jack of all trades" kind of programming language.  R is useful for reading Excel or CSV (UTF-8) files but Python can import other more "minor" types of files as well.

Final thoughts

Thanks for reading until the end of my post!  Since the majority of you wanted to read a story time about me starting to learn R in a twitter poll, I decided to make a post about it.  (A lot of you told me to take a break in December as well so I took your advice and resumed writing in January LOL)  I haven't yet decided on next month's topic but I'll let you know via social media!  In the meantime, as always, please check out my other blog posts!!



 

Sunday, November 21, 2021

Is autism a disability?

Introduction

Welcome back to my blog!  If you thought you missed my October 2021 post... you didn't 😓😅 I took a break from writing blog posts since there was a lot going on in my life 😊 Now I've gotten settled and it's time to start writing!  This time, I'd like to introduce a discussion topic that arises on occasion which is: Is autism a disability?

From my research (and I use this term pretty loosely here since I just mean watching related YouTube videos and reading reddit/FB discussions LOL) I've seen a variety of opinions:

  1. Autism IS a disability because there are some struggles that are unique to autistic people on a regular basis (e.g. meltdowns, shutdowns, sensory overload, being misunderstood by people often etc.) 
  2. Autism IS a disability not because autism in itself is a bad thing to have but rather because society disables autistic people by putting limitations on what it means to be "normal"
  3. Autism IS NOT a disability because autism can have good and bad traits and it is society that dictates what is and what isn't a disability
  4. Autism IS NOT a disability because as much as we have other types of diversities (hair color, eye color, skin color, height, weight etc.) we also have neurodiversity
  5. Autism may or may not be a disability depending on the severity
I made a poll on Twitter to see how people on the internet respond to the question "Is autism a disability?" and here are the results below:

"Will be releasing my Nov 2021 blog post TOMORROW!  A question for those of you who are interested:  Is autism a disability?"

  • Yes, of course!  50%
  • Nope  33.3%
  • Ehhh... IDK  16.7%
  • Don't care LOL  0%

6 votes, final results


(Please follow me at @lukas_fleur382 to participate in future polls 😀 Your opinions may be reflected on a future post!) 

For further information, here are a few videos by autistic YouTubers that express their own opinions on the matter:
It seems that this topic can become quite controversial and heated discussions can arise.  Personally I think this question is rather philosophical because we would have to question what it means to have a disability (or be disabled) and what it means to be autistic.

List of definitions 

Definition of disability (US and UK)

US (Americans with Disabilities Act (ADA)): 

UK (Equality Act 2010): 

"A physical or mental impairment that has a 'substantial' and 'long-term' negative effect on your ability to do normal everyday activities"

  • substantial - e.g. takes much longer than it usually would to complete a daily task like getting dressed
  • long-term - e.g. 12 months or more 


Definition of autism spectrum disorder

The following are official diagnostic criteria that doctors/psychologists/psychiatrists use internationally when performing an autism spectrum disorder assessment.

Diagnostic and Statistical Manual of Mental Disorders 5th edition (DSM-5)

A. Persistent deficits in social communication and social interaction across multiple contexts, as manifested by the following, currently or by history:

1. Deficits in social-emotional reciprocity, ranging, for example, from abnormal social approach and failure of normal back-and-forth conversation; to reduced sharing of interests, emotions, or affect; to failure to initiate or respond to social interactions.

2. Deficits in nonverbal communicative behaviors used for social interaction, ranging, for example, from poorly integrated verbal and nonverbal communication; to abnormalities in eye contact and body language or deficits in understanding and use of gestures; to a total lack of facial expressions and nonverbal communication.

3. Deficits in developing, maintaining and understanding relationships, ranging for example, from difficulties adjusting behavior to suit various social contexts; to difficulties in sharing imaginative play or in making friends; to absence of interest in peers.

B. Restricted, repetitive patterns of behavior, interests, or activities, as manifested by at least two of the following, currently or by history:

1. Stereotyped or repetitive body movements, use of objects, or speech (e.g. simple motor stereotypies, lining up toys or flipping objects, echolalia, idiosyncratic phrases).

2. Insistence on sameness, inflexible adherence to routines, or ritualized patterns of verbal or nonverbal behavior (e.g. extreme distress at small changes, difficulties with transitions, rigid thinking patterns, greeting rituals, need to take same route or eat same food every day).

3. Highly restricted, fixated interests that are abnormal in intensity or focus (e.g. strong attachment to or preoccupation with unusual objects, excessively circumscribed or perseverative interests).

4. Hyper- or hyporeactivity to sensory input or unusual interest in sensory aspects of the environment (e.g. apparent indifference to pain/temperature, adverse response to specific sounds or textures, excessive smelling or touching of objects, visual fascination with lights or movement).

C. Symptoms must be present in the early developmental period (but may not become fully manifest until social demands exceed limited capacities, or may be masked by learned strategies in later life).

D. Symptoms cause clinically significant impairment in social, occupational, or other important areas of current functioning.

E. These disturbances are not better explained by intellectual disability (intellectual developmental disorder) or global developmental delay. Intellectual disability and autism spectrum disorder frequently co-occur; to make comorbid diagnoses of autism spectrum disorder and intellectual disability, social communication should be below that expected for general developmental level.

Reference:  Centers for Disease Control and Prevention (CDC) 


International Classification of Diseases 11th revision (ICD-11):  In effect from January 2022

Persistent deficits in the ability to initiate and to sustain reciprocal social interaction and social communication

Range of restrictive, repetitive, and inflexible patterns of behaviour, interests or activities that are clearly atypical or excessive for the individual's age and sociocultural context.

The onset of the disorder occurs during the developmental period, typically in early childhood, but symptoms may not become fully manifest until much later, when social demands exceed limited capacities.

Deficits are sufficiently severe to cause impairment in personal, family, social, educational, occupational or other important areas of functioning and are usually a pervasive feature of the individual's functioning observable in all settings, although they may vary according to social, educational, or other context.

Individuals along the spectrum exhibit a full range of intellectual functioning and language abilities.

Reference:   ICD-11 for Mortality and Morbidity Statistics (Version: 05/2021)


What do the definitions all mean?

Disability:

  • Has some kind of medical condition (can be the mind or body)
  • Makes life harder 
    • Finds it hard to get good grades
    • Struggling to get a new job, or sustain a job
    • Finds it difficult to talk/make friends/work with people
    • Finds it difficult to shop or travel on their own
    • Struggles to eat/sleep/go to the toilet/take a bath or shower on their own 
    • etc...
  • Usually long-lasting (months or years)

Autism spectrum disorder

  • Difficulties socializing
    • Understanding what is considered to be "normal"
    • Doing what is considered to be "not normal"
    • Doesn't have friends (or a lot of friends)
    • Friendships don't last long
    • Is not interested in people
    • etc...
  • Is generally considered to be "different"
    • Does the same (or similar) things over and over again
      • Says the same words
      • Lines up toys/tools etc.
      • Paces around and around
      • Hand flapping/pacing/spinning/fidgeting etc. often
    • Likes things the "same" way
      • Same foods
      • Same schedules
      • Same clothes
      • etc.
    • Different expression of interests
      • Likes things that others aren't interested in
      • Likes things a lot more than other people
      • Likes very few things compared to other people
    • Senses differently
      • Finds some lights/sounds/smells etc. a lot more stressful than other people
      • Finds some lights/sounds/smells etc. a lot duller than other people
  • Starts from when they were small kids
    • But might not necessarily struggle until later in life
  • Makes life hard(er)

Is autism a disability?

Arguments FOR autism being a disability

  • IMPAIRMENT in social interaction and communication
  • SUBSTANTIAL LIMITATIONS in life activities
  • Present from early developmental period (long-term)

Arguments AGAINST autism being a disability

  • Symptoms may not become fully manifest until later when social demands exceed limited capacities - Would the person not be autistic at a time when they may not have been "disabled?"
  • Level of functioning may vary depending on the context - If a person struggled greatly in school (e.g. had consistently bad grades), but built a successful career with very few struggles, would the person no longer be considered autistic?  Would they no longer be disabled?
  • Assessing impairment or level of functioning is up to the interpretation of the assessors and the person themselves - lack of consistency

Conclusion

From a purely literal standpoint it may seem obvious that autism is a disability, but the reality is that there is a lot left to interpretation when defining a "substantially significant impairment" for both an autism diagnosis and disability assessment.  It may be safer to assume that autism as a condition is a disability (at least as long as we have the definitions that we have presently) but whether someone identifies as a disabled autistic person is dependent on their situation and the interpretation of the person themselves and the people around them on what it means to be disabled and what it means to be autistic.  

Afterthought

Thank you for reading until the end!  I'm super grateful for my readers that check out my work.  Please share your thoughts in the comments!  Do you think autism is a disability?  What are some other autism-related topics that you would like me to write about?  If you're interested in my previous autism/neurodiversity related posts, here is a list:

Since next month is December, I'm planning on sharing a story time about learning a new programming language: R.  Hopefully, it will smooth the transition to a new era for my blog, where I talk about a wider variety of topics related to programming. 😘

Saturday, September 18, 2021

Project 7: Renaming columns

 Welcome to my blog!  

If you've been here before, welcome back!  Last month, I wrote about my experiences working on a bioinformatics project with some tips for those of you who are interested in data science.  If you haven't read it yet, check out 'Story time:  Bioinformatics research without a computer science degree'.


For this month, I'd like to write about one of the first steps (if not the first step) of working with new datasets:  Renaming columns.

Below is a sample table to illustrate columns and rows.

Number  Age  Gender       Experience  Comments
1       80   Male         Yes         Cool
2       50   Female       No          Awesome
3       67   Transgender  Yes         AFAB
4       39   Nonbinary    No          Fabulous
5       10   Genderfluid  Unknown     None









Please note that for this post, we'll be following through the steps using Jupyter.  If you are more familiar using other notebooks, feel free to use what you are comfortable with.


1)  Open your notebook

You can find the screen below by opening the command prompt.  First type in "jupyter notebook" then copy and paste one of the links generated below.

Black screen with white text.  Blue arrow shows input of 'jupyter notebook' and red arrows show links to open Jupyter.

2)  Open a new Python 3 notebook

Once you've opened Jupyter, have a look on the upper right corner.  You should be able to see a button called "New".  If you click on the New button, you should be able to see a menu of new notebooks, folders or files to open.  To select a new Python 3 notebook, click on "Python 3".

The relevant buttons are highlighted in red.

Files in Jupyter notebook.  Red circles highlight 'New' and 'Python 3'.


3)  Upload CSV UTF-8 file

First, download the dataset that you plan to analyze.  Convert the file into a CSV UTF-8 format if necessary.  I've found that CSV UTF-8 files are the easiest to upload and analyze using Python 3.  Right next to the New button in 2), there is another button called "Upload".  You can upload your new file using that button.  The file should appear in the menu.

For this post, I will be using the file Injury statistics - work related claims: 2018 - CSV from Stats NZ.  


4)  Pandas library

You can import the pandas library and rename pandas as pd, so that you can write pd when using functions from the pandas library:

import pandas as pd

Then using the function pd.read_csv(), you can open the CSV file that was uploaded into Jupyter.  It would make things easier to set a variable name for viewing the CSV file later on:

fullset_injury_df = pd.read_csv('injury-statistics-work-related-claims-2018-csv.csv')


import pandas as pd
fullset_injury_df = pd.read_csv('injury-statistics-work-related-claims-2018-csv.csv')
fullset_injury_df







5)  View dataset

Have a quick look at the dataset.  Take note of the columns and data described to have a good "feel" of the data.  It might help figure out what kind of data analysis might be ideal.  
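If you'd like a few quick ways to get that "feel", here is a small sketch.  Since the Stats NZ file isn't bundled with this post, it reads a tiny made-up stand-in from a string; only the column names come from the real dataset, and the two rows are invented for illustration.

```python
import io
import pandas as pd

# Tiny stand-in for the Stats NZ file; these rows are made up for illustration.
csv_text = """Sex,Geographic region where injury occurred,Age group (years) at date of injury
Male,Auckland,25-29
Female,Wellington,40-44
"""

fullset_injury_df = pd.read_csv(io.StringIO(csv_text))

print(fullset_injury_df.shape)             # (rows, columns)
print(fullset_injury_df.columns.tolist())  # every column name
print(fullset_injury_df.dtypes)            # data type of each column
print(fullset_injury_df.head())            # first few rows
```

With the real file, you would pass the file name to pd.read_csv() as shown in 4) instead of the string.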


6)  Identify column names with symbols, and column names with spaces

Have a look at the dataset columns in 4).  Can you find any column names with any symbols?  Spaces even?  These names can become problematic in the future because they will prevent you from being able to use dot notation to access the column.

Wait???  What is dot notation?

Right.  I guess I haven't mentioned it before in any of my past posts.  Mmmm.  I think a few images might help clear things up.

fullset_injury_df['Sex']











Here is an example of calling on a column using [].  It's useful for any kind of column name.

fullset_injury_df.Sex










Here is an example of calling on a column using the dot notation.  It's another way of calling a column name.

fullset_injury_df['Geographic region where injury occurred']

fullset_injury_df.Geographic region where injury occurred
SyntaxError: invalid syntax










When you try to read column names that have spaces or symbols using the dot notation, you get a syntax error.
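Here is a small sketch you can paste into a notebook to compare the two styles.  The little table is a made-up stand-in for the injury dataset, so only the column names match the real thing.

```python
import pandas as pd

# Made-up stand-in rows; only the column names come from the real dataset.
df = pd.DataFrame(
    {
        "Sex": ["Male", "Female"],
        "Geographic region where injury occurred": ["Auckland", "Wellington"],
    }
)

# Both styles return the same column when the name is a single plain word.
same = df["Sex"].equals(df.Sex)
print(same)

# Brackets cope with spaces just fine.
region = df["Geographic region where injury occurred"]
print(region.tolist())

# The dot version cannot even be written down as valid Python:
# df.Geographic region where injury occurred   <- SyntaxError if uncommented
print("Geographic region where injury occurred".isidentifier())
```

The last line prints False because a name with spaces is not a valid Python identifier, which is exactly why the dot version fails.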


Why not just use [] then?  Why would it be necessary to change the column names?

To be honest, in most cases it would be straightforward to use [] to access a column.  However, during my project I encountered a situation where Python 3 kept mistaking one of my columns (which had a "." inside) for a file name, which made it difficult to read the dataset properly.  I found that renaming columns so that they can be accessed using dot notation avoids such nuisances.  


There are three conditions that must be met for column names to be accessed using dot notation:

A)  The column name cannot be a number (or start with a digit)

B)  The column name cannot include spaces

C)  The column name cannot include symbols

There is an exception to C), as _ is acceptable.
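The conditions above can be sketched as a small helper.  The function name dot_accessible is hypothetical (it is not part of pandas), and I've added two extra checks that go beyond the list: the name shouldn't be a Python keyword (like class) and it shouldn't clash with something a DataFrame already has (like count).

```python
import keyword
import pandas as pd


def dot_accessible(name) -> bool:
    """Return True if a column called `name` can be reached as df.name."""
    # Numbers and other non-text column labels can never follow a dot.
    if not isinstance(name, str):
        return False
    # Must look like a Python name: letters, digits and _ only, no leading
    # digit, and not a reserved word such as "class".
    if not name.isidentifier() or keyword.iskeyword(name):
        return False
    # Existing DataFrame attributes win over columns (e.g. df.count is a method).
    return not hasattr(pd.DataFrame, name)


for column in ["Sex", "Geo_region", "Employment status",
               "Injury/illness/disease group", "2018", "count", "class"]:
    print(column, "->", dot_accessible(column))
```

Running it over the columns of a new dataset is a quick way to see which names need renaming.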


7)  Rename variable names

You can use .rename(columns = {original column name: new column name}) to change the column name.  In the example below, I changed the name "Geographic region where injury occurred" to "Geographic_region_where_injury_occurred" to replace the spaces with _.  

#add _ to variable names with spaces
fullset_injury_df = fullset_injury_df.rename(columns = {'Geographic region where injury occurred': 'Geographic_region_where_injury_occurred'})
fullset_injury_df







Let's see if I could access the new column using the dot notation.

Input 7: fullset_injury_df.Geographic_region_where_injury_occurred. Output 7.






It worked!  Now, Python no longer has a problem with accessing the new column name using the dot notation!  But personally, I find the name takes up a lot of space in the table so I decided to shorten the name to "Geo_region".

#Geographic_region_where_injury_occurred TO Geo_region
fullset_injury_df = fullset_injury_df.rename(columns = {'Geographic_region_where_injury_occurred': 'Geo_region'})
fullset_injury_df

Nice and neat!

I repeated the process for all of the other column names and this is what the dataset looks like now.

fullset_injury_df = fullset_injury_df.rename(columns = {'Age group (years) at date of injury': 'Age'})
fullset_injury_df = fullset_injury_df.rename(columns = {'Employment status': 'Employment'})
fullset_injury_df = fullset_injury_df.rename(columns = {'Injury/illness/disease group': 'Pathology'})
fullset_injury_df = fullset_injury_df.rename(columns = {'Type of injury/illness/disease': 'Pathology_type'})
fullset_injury_df = fullset_injury_df.rename(columns = {'Industry subgroup': 'Industry_subgroup'})

Now we have a dataset with short column names that can be accessed using dot notation 😌
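As a side note, .rename() accepts a dictionary with several entries, so the renames can also be done in one call.  Here is a sketch using an empty stand-in table that only has the column names (no data); the dictionary name new_names is my own.

```python
import pandas as pd

# Old name -> new name, collected in one place.
new_names = {
    "Age group (years) at date of injury": "Age",
    "Employment status": "Employment",
    "Injury/illness/disease group": "Pathology",
    "Type of injury/illness/disease": "Pathology_type",
    "Industry subgroup": "Industry_subgroup",
}

# Empty stand-in frame with the original column names plus one untouched column.
df = pd.DataFrame(columns=list(new_names) + ["Sex"])

# Columns that are not in the dictionary (here "Sex") are left as they are.
renamed = df.rename(columns=new_names)
print(renamed.columns.tolist())
```

Keeping the old and new names together in one dictionary also doubles as a record of what was changed.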

8) THE END (or a new beginning?)

Follow the above steps, and you're on your way to the nitty-gritty data analysis!  This dataset only has 13 variables, but some massive datasets can have hundreds or thousands of variables.  While renaming columns is a relatively simple process, it can admittedly become tedious when dealing with massive datasets.  However, it is an important step to avoid error messages later on.  You don't want to get constant error messages!  Believe me!!!
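For those massive datasets, one possible shortcut (not something I used in this post) is to replace the spaces in every column name at once using the string methods on df.columns.  A minimal sketch, again with a made-up empty DataFrame that only borrows a few column names:

```python
import pandas as pd

# Toy DataFrame with a few of the original column names (no real data)
toy_df = pd.DataFrame(columns=[
    "Employment status",
    "Industry subgroup",
    "Geographic region where injury occurred",
])

# Strip stray whitespace, then swap every space for an underscore in one go
toy_df.columns = toy_df.columns.str.strip().str.replace(" ", "_", regex=False)

print(toy_df.columns.tolist())
```

This only handles spaces.  Names containing other characters, such as / or brackets, would still need their own replacement or a manual rename before dot notation works.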


Final thoughts

I hope you enjoyed my new post and that it will help you get started with looking at new datasets😀 Many datasets have their own system for naming their columns, so hopefully this will help with making sense of the data that you receive from other sources.  What else would you like to know about datasets?  If you're an experienced data scientist, I would love to know your thoughts.  Please share your comments!  


Next month, I'd like to focus on the neurodiversity portion of this blog.  Let's explore a long-lasting question in the autistic community:  Is autism a disability?  


See you next month!!!

Sunday, August 29, 2021

Story time: Bioinformatics research without a computer science degree

Welcome back to my blog!  

I recently finished a graduate program in clinical neurology...  where I undertook a computational data science project as a key component of my degree!  Aside from learning basic Python coding and blogging about it, I'd never had any "official" experience of using my coding skills for university or research.  Usually, I introduce Python code through small projects, but this time I'd like to talk about my experience of undertaking bioinformatics-related research without having a computer science component in my bachelor's.

What kind of research did you take part in?


Simply put, figuring out whether we could study diseases using data from different sources.  My answer was more of a "yeah... but be careful"😅

Why did you choose a computational project?

In all honesty, I've always liked mathematics and physics, and I'm quite confident at them.  I've also had an affinity for computers.  When I was a child, I'd always play computer games when I wasn't doing my homework, and I even joined the computer club at school, where we mostly competed against each other in touch typing.  When I found out that my program was offering a project which combines neurodegeneration (my main research interest) and computer science, I was ecstatic😲 When I first met my supervisor, they seemed really nice and assured me that they didn't expect me to know everything about coding straight away, but that I should at least have some interest in learning the skills that would be crucial for my project.  I had already started learning to code and was mostly brushing up on my loops, so I felt like it was a good fit for me.

How did you acquire the coding skills necessary for the project?

My supervisor initially recommended that I take some online courses that teach basic data science coding.  My primary resource was the freeCodeCamp.org classes on YouTube, which taught me the basics of data science coding in Python (e.g. importing datasets using pandas, data visualization using the matplotlib and seaborn libraries).  My project started with applying for the essential resources, so the application process gave me a time gap in which to acquire these skills.  Throughout the project, my supervisor would recommend other essential tools (e.g. OpenRefine) for data cleaning and mining.  For those who don't know what data cleaning is, it's basically the process of removing unwanted data or making the data neater and more consistent, since the people who entered the data may have made some mistakes during input.  I'd also think about what I wanted to do with the data, then find libraries or functions that would help reach those objectives.  Many of those were functions from the pandas library to transform my data, or from the scipy.stats module when performing statistical tests for my project.  The websites pandas.pydata.org and docs.scipy.org were particularly useful for learning new functions from the pandas and scipy libraries, as they provide the basic outline of the code, detailed explanations of its components, and some example code with the corresponding output.  I had to refresh my memory on different statistical tests to see which would be most useful for my project.  Then it's about practice, practice, and more practice!  I learned a lot about what kind of coding I would need by testing out different code on my data to see what turned up and finding which pieces were the most useful.  If there was anything I couldn't understand on my own, I would ask members of my research team who were more knowledgeable about statistics and coding. 
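To give a flavour of what that workflow can look like, here's a rough sketch that tidies a small table with pandas and then runs a test from scipy.stats.  Everything in it is made up for illustration: the data, the column names (group, score), and the choice of the Mann-Whitney U test are my assumptions for this example, not the data or the tests from my actual project.

```python
import pandas as pd
from scipy import stats

# Made-up data with the kind of input mistakes that data cleaning deals with:
# stray spaces, inconsistent capitalisation, and a missing value
raw = pd.DataFrame({
    "group": [" control", "Control", "patient", "Patient ", "patient", "control"],
    "score": [10.0, 12.0, 15.0, None, 17.0, 11.0],
})

# Cleaning: make the labels consistent and drop rows with no score
clean = raw.copy()
clean["group"] = clean["group"].str.strip().str.lower()
clean = clean.dropna(subset=["score"])

# Split the scores by group
control = clean.loc[clean["group"] == "control", "score"]
patient = clean.loc[clean["group"] == "patient", "score"]

# Example statistical test: Mann-Whitney U (chosen only for illustration)
result = stats.mannwhitneyu(control, patient, alternative="two-sided")
print(result.statistic, result.pvalue)
```

Which test is appropriate depends entirely on your data and your question, so treat the last step as a placeholder for whatever test suits your own project.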

Is there a difference in research between data science vs wet lab?


The biggest difference I've found is that the methodology for data science kept updating itself as I learned new things about my data.  For wet lab research, there's usually a specific protocol for how to carry out certain types of investigations (e.g. western blot, PCR, DNA sequencing etc.) and it's a lot of repetition of those routines.  Most of the planning for wet lab research is focused on figuring out the equipment, solutions, cell types, and concentrations, but the overall procedure tends to be more or less similar.  That's why in many biology or chemistry research projects, the methodology section is the easiest part to write and finalize in a thesis.  For data science though, the full methods section is something that we can only write towards the end of the project, because it relies heavily on what kind of data we are able to obtain.  There are a few questions to consider when figuring out the methods for data science:
  • Where am I going to collect my data?
  • What kind of data am I going to have?
  • How complete is my data going to be?
  • How big is my data?
  • Are my data coming from one source or multiple sources?
  • What kind of statistical methods would most suit my data?
  • What programming language am I going to use?
etc. etc.  

Data science involves a lot of exploration, so it takes a bit longer to figure out a specific route to follow, and there are bound to be some bumps along the way which will sometimes make you choose a different path.  

I suppose another difference between data science and wet labs is the nature of the supervision.  Wet labs tend to be more about completing routine tasks on a daily basis, so your supervisor is able to check in on your progress regularly and see if you've made any errors when carrying out the protocols or if there are any unexpected results.  For data science, there's a lot more independence when carrying out tests because, as I've mentioned before, it's a lot more journey-based.  Supervisors are more interested in understanding the logic behind the tests you've decided to carry out and how they turned out.  Depending on what results you get, the supervisors will nudge you in a certain direction.  

In essence, the main difference between data science and wet labs is that data science is more about the journey while wet labs focus more on the destination.

Would you recommend learning computational skills even if you're not sure if you'd like heavily quantitative research?


Yes.  Primarily because there's a lot more emphasis on collaborative research nowadays, with interdisciplinary work becoming more popular.  More and more wet labs favor those who have computational skills, and computational scientists often work with life science researchers when undertaking projects.  There is a higher demand for bioinformatics researchers who are experts at both the biology and the computer science aspects of the research and can be the "middle guy" between the pure computer people and the pure life sciences people.  Even if you ultimately decide not to become a data scientist, it's useful to learn quantitative research skills.

Since your blog is a combination of python coding and neurodivergence, would you give any advice specific to researchers (or people who want to be researchers) who identify as neurodivergent?

I'd say the most important attributes of a grad student are being consistent, organized, and willing to work with others.  I wouldn't worry too much about being the smartest person in the room because it's likely you won't be when surrounded by a bunch of experts in their field.  I would recommend making regular reports of what you've done during your research, as they will come in handy during the write-up process of your dissertation.

Generally I'd say think of your strengths and weaknesses.  For example,
Strengths:
  • Learning independently
  • Numeracy 
  • Presentation skills
Weaknesses:
  • Planning and writing long papers
  • Managing stress levels
  • Communication skills
Once you've figured out what you're good at and not so good at, find the resources that would be most suitable for you.  What skills do you think you could use to your advantage?  What are the skills you might need to work on?  I knew I liked routines, so I felt good when I planned ahead and stuck to my routines for consistent output.  I also knew that I needed to work on my writing and on making sure I didn't burn out easily, so I found ways to sense my limits and manage my stress levels.  The disability services at university can be a good place to start figuring out your options when seeking help.  Even if you don't have a diagnosis, they might be able to refer you for an assessment.  They'll often ask why you'd like help, so it's better to make a list of the struggles you have on a personal and academic basis.  I found that working with a 1:1 study skills tutor was useful for figuring out what kind of support I needed, writing drafts, and learning how to advocate for myself.

Would you continue to pursue computational research?

Definitely!  I greatly enjoyed my time in my research project.  It's something that I never considered before but I now realize is actually a good option for me.  Special shoutout to everyone who helped me throughout my journey💖

Final thoughts

I hope you all enjoyed reading my thoughts and advice about taking on a computational research project.  If there's anything more specific you'd like to hear about, please comment down below and I'll consider your requests!  Next month, I'll hopefully have some new code to share😇  Check out my past posts in the archive section to see more of my work.  I'm fairly active on Twitter, so if you're interested in my daily tweets, please follow me!

Resources (in order of appearance)
