17 Must Know Commands for Any ML Engineer

Date of publication: 2 years ago

Author: Adam G. Dobrakowski
Editor: Zuzanna Kwiatkowska
 

When I started working on Machine Learning projects a couple of years ago, I decided to build a cheatsheet of the commands I need in my day-to-day work.

Most of them are used rarely enough that they are hard to remember, and I soon realised that I was searching Stack Overflow for the same information over and over again.

That’s why, in this short article, I would like to share a part of my cheatsheet with you. I hope you find it useful in your work.

 

Tech Stack

 
The technologies I use most often in my projects are:

  1. Jupyter Notebook – for quick data analysis and experimentation,
  2. Visual Studio Code – as an IDE to write Python code,
  3. Remote repository,
  4. Linux.

 

To analyse data efficiently and quickly, I use Python libraries such as Pandas and Matplotlib.

Let’s start with imports!

import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import HTML, display

 

Pandas

 

1. Show all rows in a table

 
# turn on
pd.set_option('display.max_rows', None)

# turn off
pd.reset_option('display.max_rows')

or alternatively

with pd.option_context("display.max_rows", 1000):
    display(df)

In my opinion, the second option is better, because we don’t have to remember to switch it off every time. This is based on my own experience: I often forgot to do so and crashed my Jupyter Notebook when trying to display a large table.
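
To illustrate why the context manager is safer, here is a minimal sketch. The DataFrame `df` and its values are made up for the example. It checks that the option is changed only inside the `with` block and restored afterwards.

```python
import pandas as pd

# Hypothetical table, long enough to be truncated with default settings.
df = pd.DataFrame({"value": range(200)})

before = pd.get_option("display.max_rows")

with pd.option_context("display.max_rows", 1000):
    # Inside the block the temporary limit applies.
    inside = pd.get_option("display.max_rows")
    full_text = repr(df)

# Outside the block the previous value is back, with no manual reset.
after = pd.get_option("display.max_rows")
print(before, inside, after)
```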
 

2. One-liners to make DataFrame processing easier

 
I particularly like 3 of them. Each returns a new DataFrame rather than modifying df in place.

To rename a single column, for example from “A” to “B”, use:

df.rename(columns={'A': 'B'})

To add a column and automatically fill it with ones:

df.assign(one=1)

To delete a column called “campaign_id”:

df.drop(columns=['campaign_id'])
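
Because these methods return new DataFrames, they can be chained. The sketch below uses a made-up two-column table to show the three one-liners together and that the original `df` stays untouched.

```python
import pandas as pd

# Hypothetical input table.
df = pd.DataFrame({"A": [10, 20], "campaign_id": [1, 2]})

result = (
    df.rename(columns={"A": "B"})    # rename column A to B
      .assign(one=1)                 # add a column filled with ones
      .drop(columns=["campaign_id"]) # remove the campaign_id column
)

print(list(result.columns))
print(list(df.columns))  # the original frame is unchanged
```

To keep the result, assign it to a variable, for example `df = df.rename(columns={'A': 'B'})`.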
 

3. Merging the DataFrames

 
If you want to do it row-wise, you can use pd.concat (DataFrame.append was removed in pandas 2.0):

df = pd.concat([df1, df2])

When merging column-wise, you just have to add the axis argument:

pd.concat([df1, df2], axis=1)
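
A small sketch of both directions, using two made-up frames. The `ignore_index=True` argument is an addition of mine that renumbers the rows after stacking; without it the original indices are kept.

```python
import pandas as pd

# Hypothetical inputs.
df1 = pd.DataFrame({"x": [1, 2]})
df2 = pd.DataFrame({"x": [3, 4]})

# Row-wise: stack df2 under df1 and renumber the index.
rows = pd.concat([df1, df2], ignore_index=True)

# Column-wise: place the frames side by side, aligned on the index.
cols = pd.concat([df1, df2.rename(columns={"x": "y"})], axis=1)

print(rows)
print(cols)
```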
 

4. Converting DataFrame with 2 columns to dictionary

 
df.set_index('Column1')['Column2'].to_dict()

You can also do the reverse operation and quickly create a DataFrame from a dict:

pd.DataFrame.from_dict(my_dict, orient='index')
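
Here is a round trip on a made-up table. The `columns=` argument and the `rename_axis(...).reset_index()` step are my additions to restore the original column names; they are not part of the one-liner above.

```python
import pandas as pd

# Hypothetical two-column table.
df = pd.DataFrame({"Column1": ["a", "b"], "Column2": [1, 2]})

# DataFrame -> dict: keys come from Column1, values from Column2.
my_dict = df.set_index("Column1")["Column2"].to_dict()

# dict -> DataFrame: keys become the index, values a single column.
back = pd.DataFrame.from_dict(my_dict, orient="index", columns=["Column2"])
back = back.rename_axis("Column1").reset_index()

print(my_dict)
print(back)
```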
 

5. Creating additional column with percentage statistics

 
Imagine you have a dataset of ad campaigns. For each campaign, we know how many clicks it got on each day. Now, for each day, we want to know what share of that day’s clicks each campaign contributed. It sounds difficult, but we can actually do it in a single line! Since the shares are computed within a day, we group by day only:

df['clicks_perc'] = df.groupby('day')['clicks'].transform(lambda x: x / x.sum())
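
A worked example with made-up numbers may help. It assumes one row per campaign per day and groups by day only, so that the shares within each day sum to 1.

```python
import pandas as pd

# Hypothetical click counts for two campaigns over two days.
df = pd.DataFrame({
    "campaign_id": [1, 2, 1, 2],
    "day": ["mon", "mon", "tue", "tue"],
    "clicks": [30, 70, 5, 15],
})

# Divide each row's clicks by the total clicks of its day.
df["clicks_perc"] = df.groupby("day")["clicks"].transform(lambda x: x / x.sum())

# Sanity check: shares add up to 1 within every day.
totals = df.groupby("day")["clicks_perc"].sum()
print(df)
print(totals)
```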
 

6. Plotting 2 variables in a single graph using Pandas

 
df[['income', 'cost']].plot()
plt.show()

 

7. Creating a plot for a single category

 
Imagine you have a dataset of ad clicks, measured every hour for each of your websites. How would you create a plot showing the number of ad clicks over time, with a separate line for every website? My solution would be:

plot_df = df[['clicks', 'page', 'hour']].set_index(['page', 'hour']).unstack('page')
plot_df.columns = [c for (_, c) in plot_df.columns]
plot_df.plot()
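
The reshaping step is easier to follow on a tiny made-up table. In this sketch the pages move into the columns and the hours stay in the index, which is the layout `DataFrame.plot()` needs to draw one line per page. Selecting the `clicks` Series before unstacking is my variation; it avoids the extra column level.

```python
import pandas as pd

# Hypothetical hourly click counts for two pages.
df = pd.DataFrame({
    "page": ["a.com", "a.com", "b.com", "b.com"],
    "hour": [0, 1, 0, 1],
    "clicks": [5, 7, 2, 9],
})

# Index by (hour, page), then pivot the page level into columns.
plot_df = df.set_index(["hour", "page"])["clicks"].unstack("page")

print(plot_df)
```

Calling `plot_df.plot()` on this frame then uses the hour as the x-axis and draws a separate line for each page.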

 

Plots

 

8. Quickly beautify plots in Matplotlib

 
plt.rcParams["figure.figsize"] = (20,10)
plt.rcParams["font.size"] = 22
plt.style.use('bmh')

# reset
plt.rcParams.update(plt.rcParamsDefault)
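
As with the Pandas display options, the settings can also be applied temporarily. The sketch below uses `matplotlib.rc_context`, which is not part of my original cheatsheet entry, to change two parameters only inside a `with` block.

```python
import matplotlib as mpl

before = mpl.rcParams["font.size"]

# Settings passed here apply only within the block.
with mpl.rc_context({"figure.figsize": (20, 10), "font.size": 22}):
    inside = mpl.rcParams["font.size"]
    # ... create the plot here ...

# The previous value is restored automatically.
after = mpl.rcParams["font.size"]
print(before, inside, after)
```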

 

9. Add vertical and horizontal reference lines to your plot

 
For vertical:

plt.axvline(x=0, color='grey', linestyle='-')

And horizontal:

plt.axhline(y=0.0, color='k', linestyle='-')
 

Jupyter Notebook

 

10. Using Python code from .py files in the notebook

 
Imagine you have a project directory with two sub-directories: ipython with Jupyter Notebooks and lib with your Python code in .py files. To import from lib inside a notebook, first move the working directory up to the project root:

import os
while 'ipython' in os.getcwd():
    os.chdir("../")
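
An alternative that leaves the working directory alone is to add the project root to `sys.path`. This is a sketch of a different technique, not the snippet above; the helper names `find_project_root` and `add_project_root_to_path` are hypothetical, and it assumes the root is the first parent directory that contains a `lib` folder.

```python
import sys
from pathlib import Path


def find_project_root(start: Path, marker: str = "lib") -> Path:
    """Walk upwards from `start` until a directory containing `marker` is found."""
    for candidate in [start, *start.parents]:
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No parent of {start} contains a '{marker}' directory")


def add_project_root_to_path(start=None) -> Path:
    """Put the project root at the front of sys.path so `import lib...` works."""
    root = find_project_root(start or Path.cwd())
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root


# In a notebook stored in the ipython directory you would call:
# add_project_root_to_path()
```

Relative file paths in the notebook keep pointing at the notebook’s own directory with this approach, which may or may not be what you want.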

 

11. Making the command windows larger

 
By default, the code cells in Jupyter don’t span the full width of your browser. If you have a wide monitor, this can be frustrating, especially when you want to analyse tables with a lot of columns. You can change it using:

display(HTML("<style> .container {width: 100% !important; } </style>"))
 

12. Beautify HTML titles in the notebook

 
 

Terminal and Git

 

13. Displaying JSON-like format in your terminal

 
echo '{"a":[2,3]}' | json_pp
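
If json_pp is not installed, the same result can be obtained with Python’s standard library. A minimal sketch, using the same sample string as above:

```python
import json

raw = '{"a":[2,3]}'

# Parse the string and write it back with two-space indentation.
pretty = json.dumps(json.loads(raw), indent=2)
print(pretty)
```

From the terminal, `python -m json.tool` does the same for piped input.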
 

14. Find the system processes that use your computer memory the most

 
ps aux --sort=-%mem | head
 

15. Running Jupyter Notebooks from the terminal

 
runipy -o my_notebook.ipynb
 

16. Choosing a file when you have a merge conflict in Git

 
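In my opinion, this is particularly useful when the conflict is between Jupyter Notebooks, which are hard to merge by hand. During a merge, --theirs keeps the version from the branch being merged in, and --ours keeps the version from your current branch:

git checkout --theirs path/to/file
git checkout --ours path/to/file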
 

17. Reverting your commit

 
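Imagine you want to revert several commits at once, for example the three commits from HEAD~4 up to HEAD~2. You can pass a range to git revert:

git revert HEAD~5..HEAD~2

The range excludes its start (HEAD~5) and includes its end (HEAD~2). Passing just HEAD~2 would revert only that single commit.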
 

Conclusions

 
I hope that some of these commands were new to you and that you’re going to use them!

Do you also have your own cheat sheet of commands and functions? If so, share your best ones with us on LinkedIn!
