Analyzing Anti-Cancer Medications in Mice using Jupyter Notebook, Pandas, & Matplotlib

Resources


Data sources: Mouse_metadata.csv, Study_results.csv

Software: Python 3.9.7; Jupyter Notebook 6.4.11; pandas 1.3.5; Matplotlib 3.5.1

Project Objectives


Use the provided data to compare the effectiveness of the drug Capomulin against the other treatment regimens for squamous cell carcinoma (SCC), a common form of skin cancer.

Tasks include:

  1. Preparing and cleaning the data by merging the two datasets and dropping duplicate mouse IDs
  2. Generating summary statistics for each drug regimen
  3. Creating visualizations such as bar charts, pie charts, line charts, and scatter plots
  4. Calculating quartiles, finding outliers, and creating a box plot
  5. Calculating the correlation and a linear regression between mouse weight and average tumor volume for the selected drug regimen, Capomulin

Results & Analysis


Screenshot

The table above displays the cleaned DataFrame after merging the two datasets and dropping duplicate mouse IDs. The cleaned dataset contains 248 unique mouse IDs, with metastatic sites ranging from 0 to 4, ages from 1 to 24 months, and weights from 15 to 30 grams.
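
The merge-and-clean step can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (`Mouse ID`, `Timepoint`, and so on) are assumptions, the rows are made up, and "duplicate" is assumed to mean a mouse that has more than one row for the same timepoint, in which case all of that mouse's rows are dropped.

```python
import pandas as pd

# Tiny made-up stand-ins for Mouse_metadata.csv and Study_results.csv.
# Column names are assumptions, not confirmed by this write-up.
mouse_metadata = pd.DataFrame({
    "Mouse ID": ["a1", "b2", "c3"],
    "Drug Regimen": ["Capomulin", "Ramicane", "Infubinol"],
    "Weight (g)": [17, 16, 25],
})
study_results = pd.DataFrame({
    "Mouse ID": ["a1", "a1", "b2", "b2", "b2", "c3"],
    "Timepoint": [0, 5, 0, 0, 5, 0],
    "Tumor Volume (mm3)": [45.0, 43.9, 45.0, 45.0, 44.1, 45.0],
})


def merge_and_clean(metadata, results):
    """Merge the two tables and drop every mouse with a repeated timepoint."""
    merged = results.merge(metadata, on="Mouse ID", how="left")
    # Flag every row whose (Mouse ID, Timepoint) pair occurs more than once.
    dup_mask = merged.duplicated(subset=["Mouse ID", "Timepoint"], keep=False)
    bad_ids = merged.loc[dup_mask, "Mouse ID"].unique()
    # Remove all rows belonging to the flagged mice.
    clean = merged[~merged["Mouse ID"].isin(bad_ids)].reset_index(drop=True)
    return clean, list(bad_ids)


clean_df, dropped_ids = merge_and_clean(mouse_metadata, study_results)
unique_mice = clean_df["Mouse ID"].nunique()
```

With the real files, the two DataFrames would instead come from `pd.read_csv`, and `unique_mice` would be the figure to compare against the count reported above.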

Screenshot

The table above displays the summary statistics for each drug regimen. Ramicane leads across every statistic, with the smallest mean tumor volume, median tumor volume, variance, standard deviation, and standard error of the mean.
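
A summary table of this kind can be built with a single `groupby` and `agg` call. The sketch below uses made-up values and assumed column names (`Drug Regimen`, `Tumor Volume (mm3)`); it shows the technique only and does not reproduce the figures in the screenshot.

```python
import pandas as pd

# Made-up rows for illustration; column names are assumptions.
df = pd.DataFrame({
    "Drug Regimen": ["Capomulin"] * 3 + ["Ramicane"] * 3,
    "Tumor Volume (mm3)": [45.0, 43.0, 41.0, 45.0, 42.0, 39.0],
})

# One row per regimen, one column per statistic.
# "var" and "std" use the sample (ddof=1) definitions; "sem" is the
# standard error of the mean.
summary = (
    df.groupby("Drug Regimen")["Tumor Volume (mm3)"]
      .agg(["mean", "median", "var", "std", "sem"])
)
```

Sorting the result, for example with `summary.sort_values("mean")`, makes it easier to see which regimen has the smallest average tumor volume.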

Screenshot

The bar chart above shows the number of observations (one row per mouse per timepoint) recorded for each drug regimen. Capomulin and Ramicane have the most, each exceeding 200.
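
A bar chart like this can be drawn from per-regimen counts. In the sketch below (made-up rows, assumed column names), `value_counts` counts rows, so a mouse measured at several timepoints is counted once per measurement, while `nunique` on the ID column counts distinct mice. Which of the two the chart should show is a choice worth making explicitly.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up rows for illustration; column names are assumptions.
df = pd.DataFrame({
    "Mouse ID": ["a1", "a1", "a1", "b2", "b2", "c3"],
    "Drug Regimen": ["Capomulin"] * 3 + ["Ramicane"] * 2 + ["Infubinol"],
})

# Rows (measurements) per regimen versus distinct mice per regimen.
row_counts = df["Drug Regimen"].value_counts()
mouse_counts = df.groupby("Drug Regimen")["Mouse ID"].nunique()

fig, ax = plt.subplots()
ax.bar(row_counts.index.tolist(), row_counts.values)
ax.set_xlabel("Drug Regimen")
ax.set_ylabel("Number of observations")
ax.tick_params(axis="x", rotation=45)
```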

Screenshot

The pie chart above depicts the distribution of male and female mice tested, showing a nearly even split of roughly 50% for each sex.
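
One way to produce such a pie chart is sketched below, with made-up rows and assumed column names (`Mouse ID`, `Sex`). Here each mouse is counted once by dropping repeated IDs first; counting rows instead would weight each mouse by its number of measurements.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up rows for illustration; column names are assumptions.
df = pd.DataFrame({
    "Mouse ID": ["a1", "a1", "b2", "c3", "c3", "d4"],
    "Sex": ["Male", "Male", "Female", "Male", "Male", "Female"],
})

# Count each mouse once, then convert the counts to percentages.
sex_counts = df.drop_duplicates(subset="Mouse ID")["Sex"].value_counts()
shares = sex_counts / sex_counts.sum() * 100

fig, ax = plt.subplots()
ax.pie(sex_counts.values, labels=sex_counts.index.tolist(), autopct="%1.1f%%")
ax.set_title("Mice by sex")
```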

Screenshot

The image above shows the results of my outlier calculations for the four drug regimens. Only one outlier was found: a single Infubinol value that falls below the lower bound.
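
The bounds referred to here are typically derived from the interquartile range (IQR). The sketch below assumes the common 1.5 × IQR rule, which this write-up does not state explicitly, and uses made-up values; `iqr_outliers` is a hypothetical helper name.

```python
import pandas as pd


def iqr_outliers(values, k=1.5):
    """Return (lower bound, upper bound, outliers) using the k * IQR rule."""
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # Anything strictly outside the bounds is treated as a potential outlier.
    outliers = values[(values < lower) | (values > upper)]
    return lower, upper, outliers


# Made-up tumor volumes for a single regimen.
volumes = pd.Series([60.0, 62.0, 64.0, 66.0, 68.0, 36.0])
lower, upper, outliers = iqr_outliers(volumes)
```

Applying the helper once per regimen, for example inside a loop over the four regimen names, gives one set of bounds and outliers for each.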

Screenshot

The box plot above shows the distribution for each of the four drug regimens, along with the single Infubinol outlier identified in the previous calculations.
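
A box plot of this kind can be sketched with Matplotlib's `Axes.boxplot`, whose default whiskers also use the 1.5 × IQR rule. The regimen labels and values below are placeholders, not data from the study.

```python
import matplotlib.pyplot as plt

# Placeholder groups with made-up values; the second contains one low value.
volumes_by_regimen = {
    "Regimen A": [40.0, 41.0, 42.0, 43.0, 44.0],
    "Regimen B": [60.0, 62.0, 64.0, 66.0, 68.0, 36.0],
}

fig, ax = plt.subplots()
result = ax.boxplot(list(volumes_by_regimen.values()))
# Label the boxes after plotting so the code does not depend on the
# keyword used for labels, which differs between Matplotlib versions.
ax.set_xticklabels(list(volumes_by_regimen.keys()))
ax.set_ylabel("Tumor Volume (mm3)")

# Points drawn beyond the whiskers (the "fliers"), one list per box.
flier_values = [line.get_ydata().tolist() for line in result["fliers"]]
```

Reading the fliers back from the plot is a quick cross-check against outliers computed by hand.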

Screenshot

The line chart above shows the decrease in tumor volume for mouse “l509”, treated with Capomulin, over a period of more than 40 days.
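
Plotting one mouse's tumor volume over time amounts to filtering on its ID and drawing a line. The sketch below uses a made-up ID (`m001`), made-up values, and assumed column names; with the real data, `mouse_id` would be set to the mouse of interest.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up rows for illustration; IDs, values, and column names are assumptions.
df = pd.DataFrame({
    "Mouse ID": ["m001"] * 4 + ["m002"] * 2,
    "Timepoint": [0, 5, 10, 15, 0, 5],
    "Tumor Volume (mm3)": [45.0, 44.0, 42.5, 41.0, 45.0, 46.0],
})

mouse_id = "m001"
# Keep only this mouse's rows, ordered by time.
one_mouse = df[df["Mouse ID"] == mouse_id].sort_values("Timepoint")

fig, ax = plt.subplots()
ax.plot(one_mouse["Timepoint"], one_mouse["Tumor Volume (mm3)"], marker="o")
ax.set_xlabel("Timepoint (days)")
ax.set_ylabel("Tumor Volume (mm3)")
ax.set_title(f"Tumor volume over time for mouse {mouse_id}")
```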

Screenshot

The scatter plot above depicts the relationship between mouse weight and average tumor volume for mice treated with Capomulin.
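
The points in such a scatter plot are one per mouse: each mouse's weight paired with its tumor volume averaged over all of its measurements. A minimal sketch, with made-up rows and assumed column names:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up rows for illustration; column names are assumptions.
df = pd.DataFrame({
    "Mouse ID": ["a1", "a1", "b2", "b2", "c3", "c3"],
    "Drug Regimen": ["Capomulin"] * 4 + ["Ramicane"] * 2,
    "Weight (g)": [17, 17, 21, 21, 19, 19],
    "Tumor Volume (mm3)": [45.0, 41.0, 45.0, 43.0, 45.0, 40.0],
})

# Restrict to one regimen, then collapse to one row per mouse.
capomulin = df[df["Drug Regimen"] == "Capomulin"]
per_mouse = capomulin.groupby("Mouse ID").agg(
    weight=("Weight (g)", "first"),
    avg_volume=("Tumor Volume (mm3)", "mean"),
)

fig, ax = plt.subplots()
ax.scatter(per_mouse["weight"], per_mouse["avg_volume"])
ax.set_xlabel("Weight (g)")
ax.set_ylabel("Average Tumor Volume (mm3)")
```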

Screenshot

The image above shows the correlation coefficient (0.84) between mouse weight and average tumor volume for Capomulin, which indicates a strong positive correlation. It also displays the linear regression model I calculated to predict average tumor volume from mouse weight.
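
Both quantities can be computed from a per-mouse table of weight and average tumor volume. The sketch below uses pandas for the Pearson correlation and NumPy's `polyfit` for a least-squares line; the notebook itself may have used a different library, and the values here are made up, so the results will not match the 0.84 reported above. `predict_volume` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Made-up per-mouse values; column names are assumptions.
per_mouse = pd.DataFrame({
    "weight": [15, 17, 19, 21],
    "avg_volume": [36.0, 40.0, 41.0, 45.0],
})

# Pearson correlation coefficient between the two columns.
r = per_mouse["weight"].corr(per_mouse["avg_volume"])

# Least-squares fit of avg_volume = slope * weight + intercept.
slope, intercept = np.polyfit(per_mouse["weight"], per_mouse["avg_volume"], deg=1)


def predict_volume(weight_g):
    """Predict average tumor volume (mm3) from mouse weight (g)."""
    return slope * weight_g + intercept
```

The fitted line can be overlaid on a weight-versus-volume scatter plot by evaluating `predict_volume` at the minimum and maximum weights.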

Drawing Insights


