Blog

Find out what our ML and Big Data experts think.

Fine-tuning BERT model for arbitrarily long texts, Part 2

Author: Michał Brzozowski

This is part 2 of our series about fine-tuning BERT. If you want to read the first part, go to this link, and if you want to use the code, go to our GitHub.

Fine-tuning the pre-trained BERT on longer texts

Now it is time to address the elephant in the room of the previous approach. We were lucky to find an already fine-tuned model for our IMDB dataset. More often, however, we are in a less fortunate situation: we have a labelled dataset and need to fine-tune the classifier from

Fine-tuning BERT model for arbitrarily long texts, Part 1

Author: Michał Brzozowski

Models based on the transformer architecture have become the state-of-the-art solution in NLP. The word “transformer” is indeed what the letter “T” stands for in the names of the famous BERT, GPT-3 and the now massively popular ChatGPT. A common obstacle when applying these models is the constraint on input length. For example, the BERT model cannot process texts longer than 512 tokens (roughly speaking, one token corresponds to one word). A method to overcome this issue was proposed by Devlin (one of the authors of BERT) in the discussion. In this

6 inspirations for making hypotheses in Data Science

Author: Adam G. Dobrakowski
Editor: Zuzanna Kwiatkowska

In the Data Science literature, we can find quite a few articles that describe how to do exploratory data analysis (EDA) from a technical point of view. However, they usually say nothing about where to find inspiration for the hypotheses to test during EDA. That is why, in this post, I would like to share my thoughts on how to search for such inspiration. As always, I will be relying heavily on my own experience. If you have any ideas that I haven’t included here, be sure to let me

A/B Testing in Machine Learning. Part 3: 4 most common mistakes

Author: Adam G. Dobrakowski
Editor: Zuzanna Kwiatkowska

This post is the third and last in a series of posts about A/B testing. The others are:

A/B Testing in Machine Learning. Part 1: How to prepare the A/B tests?
A/B Testing in Machine Learning. Part 2: Most common problems

Here, I will show you common mistakes that inexperienced and even advanced ML engineers make. Usually, a lot of people are involved in conducting A/B tests: ML engineers and data analysts, but also people responsible for the deployment or operation of a given element on the client’s side. Sometimes, we

A/B Testing in Machine Learning. Part 2: Most common problems

Author: Adam G. Dobrakowski
Editor: Zuzanna Kwiatkowska

This article is the second part of the series on A/B testing. You can access the other articles with the following links:

A/B Testing in Machine Learning. Part 1: How to prepare the A/B tests?
A/B Testing in Machine Learning. Part 3: 4 most common mistakes

In this article, I will show you the most common problems I have encountered when performing A/B tests in real-life scenarios. I will also tell you how to deal with those challenges!

How to construct two identical test groups?

As I already mentioned in

How to Make the Polish Healthcare Sector More Efficient with (Simple) Natural Language Processing?

Author: Zuzanna Kwiatkowska
Scientific collaboration: Mateusz Rapicki

Have you ever wondered how many things must happen during your 15-minute-long visit to the doctor’s office? Imagine yourself walking into the room. You sit down, and the doctor asks you what the reason for your visit is. They listen to you and ask about your medical history and any information relevant to your problem. Maybe they already have some of that information in their computer files, so they must read them and recollect everything about your case. Or worse, you bring that information with you in a potentially chaotic stack

