Upgrade to SDK2

sara rezaei
2023-08-14T09:19:23.3+00:00

I am currently training a model with the LightGBM (LGBM) algorithm on five categorical features and one text feature.

For preprocessing, I use CountVectorizer for the categorical features. The text is in Persian, so I first preprocess it with the "hazm" library and then apply a TF-IDF vectorizer.
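A minimal sketch of the preprocessing described above. It assumes scikit-learn's `ColumnTransformer` is used to combine one `CountVectorizer` per categorical column with a `TfidfVectorizer` for the Persian text. The column names (`cat_1` to `cat_5`, `text`) and the helpers `persian_preprocess` and `persian_tokenize` are hypothetical. `hazm.Normalizer` and `hazm.word_tokenize` are used only if hazm is installed; otherwise the sketch falls back to plain whitespace splitting, which is not equivalent to hazm's processing.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical column names; replace them with the real ones.
CATEGORICAL_COLS = [f"cat_{i}" for i in range(1, 6)]
TEXT_COL = "text"

try:
    # hazm supplies Persian normalization and word tokenization.
    from hazm import Normalizer, word_tokenize

    _normalizer = Normalizer()

    def persian_preprocess(text):
        return _normalizer.normalize(text)

    def persian_tokenize(text):
        return word_tokenize(text)

except ImportError:
    # Fallback so the sketch still runs when hazm is not installed.
    def persian_preprocess(text):
        return text.strip()

    def persian_tokenize(text):
        return text.split()


def build_preprocessor():
    """Vectorize the five categorical columns and the text column."""
    transformers = [
        # A scalar column name hands CountVectorizer a 1-D column of strings.
        (f"count_{col}", CountVectorizer(), col)
        for col in CATEGORICAL_COLS
    ]
    transformers.append(
        (
            "tfidf_text",
            TfidfVectorizer(
                preprocessor=persian_preprocess,
                tokenizer=persian_tokenize,
                token_pattern=None,  # unused when a tokenizer is given
            ),
            TEXT_COL,
        )
    )
    # Stack the sparse outputs into one sparse matrix (unless fully dense).
    return ColumnTransformer(transformers, sparse_threshold=1.0)


if __name__ == "__main__":
    # Tiny illustrative frame; not real data.
    df = pd.DataFrame(
        {
            **{col: ["red", "blue", "red"] for col in CATEGORICAL_COLS},
            TEXT_COL: ["این یک متن است", "متن دوم", "سلام دنیا"],
        }
    )
    X = build_preprocessor().fit_transform(df)
    print(X.shape)
```

Keeping the result sparse matters here, because LightGBM's `LGBMClassifier.fit` accepts SciPy sparse matrices directly, so the TF-IDF columns never need to be densified.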

With SDK1, the code ran in about 6 minutes. Since moving to SDK2 and adding MLflow, the execution time has increased dramatically. Because my compute resources are limited, I have to cancel the run by the 30-minute mark.

I don't know what the root cause is, because I haven't changed the preprocessing, the model, or the data.

However, I have noticed that the execution time increases sharply when I add the text feature to the model. I'm not sure which part I should investigate.
Preprocessing of the text feature:
[Screenshot]
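One thing that may be worth checking, offered as an assumption rather than a diagnosis: an unbounded TF-IDF vocabulary can add many thousands of sparse columns, and LightGBM's training time generally grows with the number of features. The sketch below compares the vocabulary size of an unbounded `TfidfVectorizer` with one limited by its `min_df` and `max_features` parameters. The helper name `vocabulary_sizes`, the whitespace `token_pattern`, and the default thresholds are illustrative choices, not values from this post.

```python
from sklearn.feature_extraction.text import TfidfVectorizer


def vocabulary_sizes(texts, min_df=2, max_features=5000):
    """Return (unbounded, bounded) TF-IDF feature counts for `texts`.

    Hypothetical helper: the thresholds are illustrative, not tuned values.
    """
    # r"\S+" treats every whitespace-separated token as a term.
    unbounded = TfidfVectorizer(token_pattern=r"\S+").fit(texts)
    bounded = TfidfVectorizer(
        token_pattern=r"\S+",
        min_df=min_df,              # drop terms found in fewer than min_df documents
        max_features=max_features,  # keep at most this many most-frequent terms
    ).fit(texts)
    return len(unbounded.vocabulary_), len(bounded.vocabulary_)
```

For example, calling `vocabulary_sizes(df["text"])` on the training data (with `df` a hypothetical DataFrame) would show how many columns the text feature contributes and how much the limits shrink that number.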

There's one more important point. I added log statements at several places in the code to find out where the time is being spent. However, the code doesn't seem to enter the main section, so nothing gets logged on the Azure server.
[Screenshot]
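Regarding the missing logs, here is a stdlib-only sketch for making stage-level logs visible and timing each stage. It assumes the job runs the script directly (for example `python train.py`, where `train.py` is a hypothetical name), in which case `__name__` is `"__main__"`. `logging.basicConfig` does nothing if the root logger already has handlers, for example handlers attached by an imported library. Passing `force=True` (Python 3.8+) replaces them, so INFO messages are not silently dropped. A plain `print(..., flush=True)` confirms whether the main block is entered at all. `log_stage` and the stage names are hypothetical.

```python
import logging
import sys
import time
from contextlib import contextmanager

logger = logging.getLogger("train")


def configure_logging():
    # force=True (Python 3.8+) removes handlers already on the root logger;
    # without it, basicConfig is a no-op when such handlers exist.
    logging.basicConfig(
        level=logging.INFO,
        stream=sys.stdout,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,
    )


@contextmanager
def log_stage(name):
    """Log the start and elapsed time of the wrapped block (hypothetical helper)."""
    start = time.perf_counter()
    logger.info("start: %s", name)
    try:
        yield
    finally:
        logger.info("done: %s (%.2f s)", name, time.perf_counter() - start)


def main():
    logger.info("entered main()")
    with log_stage("load data"):
        time.sleep(0.01)  # placeholder for the real work
    with log_stage("preprocess text"):
        time.sleep(0.01)  # placeholder for hazm + TF-IDF
    with log_stage("train model"):
        time.sleep(0.01)  # placeholder for LightGBM training


if __name__ == "__main__":
    # Independent of logging config: shows up even if handlers are broken.
    print("entered __main__", flush=True)
    configure_logging()
    main()
```

If the `print` line appears in the job output but the log lines do not, the logging configuration is the likelier culprit rather than the main guard itself.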

Is there a specific reason or factor causing this issue?

Azure Machine Learning