Forum Discussion

Rohitshah1990
Copper Contributor
Nov 09, 2020

Common pitfalls of using Databricks with pandas instead of Spark

Hi Team,

 

I want to understand the common pitfalls of using pandas on Databricks instead of Spark. We decided to go with Databricks rather than the Azure AI platform (Jupyter notebooks) for two reasons:

  1. Experiment tracking with MLOps tooling
  2. Hyperparameter tuning with SparkTrials, which runs the trials in parallel
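
For illustration, the idea behind point 2 can be sketched without Spark: each trial is independent, so the candidates can be evaluated concurrently and the best one kept. The sketch below is only a local stand-in using Python's standard library, not the Databricks or SparkTrials API. The `objective` function, its two parameters, and the grid values are hypothetical placeholders for "train a pandas/sklearn model and return its validation loss".

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def objective(params):
    """Hypothetical stand-in for 'train a model and return validation loss'."""
    lr, depth = params
    return (lr - 0.1) ** 2 + (depth - 5) ** 2


def parallel_search(grid, max_workers=4):
    """Evaluate every candidate concurrently and return (best_params, best_loss)."""
    candidates = list(grid)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so losses[i] belongs to candidates[i].
        losses = list(pool.map(objective, candidates))
    best_loss, best_params = min(zip(losses, candidates))
    return best_params, best_loss


# Assumed toy grid: three learning rates times three depths.
grid = product([0.01, 0.1, 1.0], [3, 5, 7])
best_params, best_loss = parallel_search(grid)
print(best_params, best_loss)
```

On a cluster the same shape applies, except that the trials are dispatched to worker nodes instead of local threads while each trial still runs ordinary single-node pandas/sklearn code.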

I would like to understand what could go wrong if we train the model on Databricks using only pandas and sklearn. The points I am unsure about:

  1. Deployment: we will deploy the final model offline. Will the different environment cause an issue?
  2. Cost: does the AI platform support the points mentioned above (experiment tracking and parallel hyperparameter tuning)?
  3. Ease of use
  4. Other advantages offered by the AI platform (e.g. automatic hyperparameter tuning)
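
On the deployment point, one common source of trouble is a version mismatch between the training and serving environments, since a pickled scikit-learn model is not guaranteed to load correctly under a different library version. A minimal sketch of one way to detect this, assuming the package list below is what the model depends on (the helper names `snapshot_versions` and `find_mismatches` are made up for this example):

```python
from importlib import metadata


def snapshot_versions(packages):
    """Return {package: installed version or None} for the current environment."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed here
    return versions


def find_mismatches(recorded, current):
    """Return {package: (recorded, current)} for every package whose version differs."""
    return {
        name: (version, current.get(name))
        for name, version in recorded.items()
        if current.get(name) != version
    }


# At training time: store this dict next to the serialized model.
recorded = snapshot_versions(["pandas", "scikit-learn"])

# At deployment time: recompute in the target environment and compare.
mismatches = find_mismatches(recorded, snapshot_versions(list(recorded)))
print(mismatches)
```

Run in a single environment, the comparison finds no differences. The check becomes useful when `recorded` is saved on Databricks and compared against the offline environment before the model is loaded.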

I am new to Azure services, so it would be really helpful if you could share a detailed answer to the points above, along with your preference (what would you have chosen, and why?).

 

Thanks in advance.
