Training much slower after upgrade to SDK2
I am training a model with LightGBM (LGBM). The features are five categorical features and one text feature.
For preprocessing, I apply CountVectorizer to the categorical features. The text is in Persian, so I first preprocess it with the "hazm" library and then vectorize it with a TF-IDF vectorizer (TfidfVectorizer).
With SDK1, the code ran in about 6 minutes. Since moving to SDK2 and adding MLflow, the execution time has increased significantly. Because of limited compute resources and my own constraints, I have to cancel the run after 30 minutes.
At present, I'm unsure of the root cause of the issue, as I haven't altered the preprocessing methods, the model, or the data itself.
However, I have noticed that the execution time increases significantly when I add the text feature to the model. I'm not sure which part I should investigate first.
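A minimal sketch of how the two setups (with and without the text feature) could be timed side by side on a small sample. The helper `time_variants` and the two stand-in functions are hypothetical names used only for illustration; each stand-in would be replaced by code that builds the feature matrix and fits the model.

```python
import time


def time_variants(variants, repeats=1):
    """Run each callable and return its best wall-clock time in seconds, keyed by name."""
    results = {}
    for name, build in variants.items():
        best = float("inf")
        for _ in range(repeats):
            begin = time.perf_counter()
            build()
            best = min(best, time.perf_counter() - begin)
        results[name] = best
    return results


# Hypothetical stand-ins: replace the bodies with the real preprocessing and
# training steps, run on a small sample of the data.
def categorical_only():
    return sum(range(10_000))


def categorical_plus_text():
    return sum(range(1_000_000))


if __name__ == "__main__":
    timings = time_variants(
        {
            "categorical only": categorical_only,
            "categorical + text": categorical_plus_text,
        }
    )
    for name, seconds in timings.items():
        print(f"{name}: {seconds:.4f} s")
```

Running the same comparison under both SDK versions would show whether the extra time comes from the text feature itself or from something around it.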
Preprocessing of the text feature:
One more important point: I added log statements at several places in the code to see where the time is being spent. However, the code does not seem to reach the main section, so nothing gets logged on the Azure server.
Is there a specific reason or factor causing this issue?
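A minimal sketch of the kind of logging setup that could show how far the script gets, assuming the missing logs come from something that runs before the main section (for example a slow import or module-level code). Logging is configured at the very top and flushed after every stage. The stage names and the placeholder work inside `main` are hypothetical and stand in for the real data loading, hazm + TF-IDF preprocessing, and LightGBM training.

```python
import logging
import sys
import time
from contextlib import contextmanager

# Configure logging before any heavy import so the first message is emitted
# even if a later import or module-level statement hangs.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
    stream=sys.stdout,
    force=True,  # replace handlers another library may already have installed
)
log = logging.getLogger("train")
log.info("script started")


@contextmanager
def timed(stage):
    """Log when a stage starts and how long it took, even if it raises."""
    log.info("start: %s", stage)
    sys.stdout.flush()
    begin = time.perf_counter()
    try:
        yield
    finally:
        log.info("end: %s (%.2f s)", stage, time.perf_counter() - begin)
        sys.stdout.flush()


def main():
    with timed("load data"):
        rows = list(range(1000))  # placeholder for reading the dataset
    with timed("preprocess text"):
        tokens = [str(r) for r in rows]  # placeholder for hazm + TF-IDF
    with timed("train model"):
        total = sum(len(t) for t in tokens)  # placeholder for the model fit
    return total


if __name__ == "__main__":
    main()
```

If "script started" shows up in the job output but "start: load data" never does, the time is being spent before `main` is called, which narrows the search to imports and module-level code.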