Episode
Simplifying Data Analysis & Visualization with Developer Tools & AI | Python Data Science Day
with Nitya Narasimhan
Having data analysis and visualization skills is increasingly important in the new age of Large Language Models and generative AI. But how does a non-Python developer skill up rapidly with the tools & best practices required to achieve project goals, without having the benefit of years of Python or data science experience? This is where the right developer tooling, with a little bit of AI assistance, can help.
In this talk, we'll go from identifying an open-source data set, to analyzing it for insights and visualizing relevant outcomes, in 25 minutes - with just a GitHub account and an OpenAI endpoint. Along the way, we'll introduce you to a series of developer tools that make your journey easier:
- Open Dataset: to "analyze" - from Kaggle, Hugging Face, or Azure
- Data Wrangler: to "sanitize" data - extension from Visual Studio Code
- Jupyter Notebook: to "record" process - for transferable learning
- GitHub Codespaces: to "pre-build" environment - for consistent reuse
- GitHub Copilot: to "explain/fix" code - for focused learning with AI help
- Microsoft LIDA: to "suggest/build" visualization goals - for building your intuition with AI help
The talk comes with an associated repo that you can fork - then replace with your own dataset to extend or experiment on your own later. By the end of the talk you should have a sense of how you can go from discovering a data set to getting some visual insights about it, using existing tools with a little AI assistance.
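As a rough illustration of the "sanitize" and "analyze" steps, here is a minimal sketch in plain Python. It is not taken from the talk or its repo: the inline dataset, the column names, and the helper functions `load_rows`, `sanitize`, and `summarize` are all hypothetical, and it uses only the standard library in place of Data Wrangler or a notebook.

```python
import csv
import io
import statistics

# Hypothetical inline dataset, standing in for a file downloaded
# from an open-data source. Two rows have unusable temperature values.
RAW_CSV = """city,temperature_c
Seattle,12.5
Austin,
Boston,9.0
Denver,n/a
Miami,27.5
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def sanitize(rows, column):
    """Return copies of the rows whose `column` parses as a float."""
    clean = []
    for row in rows:
        try:
            value = float(row[column])
        except (TypeError, ValueError):
            continue  # skip missing or non-numeric entries
        clean.append({**row, column: value})
    return clean


def summarize(rows, column):
    """Compute simple descriptive statistics for a numeric column."""
    values = [row[column] for row in rows]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }


rows = load_rows(RAW_CSV)
clean_rows = sanitize(rows, "temperature_c")
summary = summarize(clean_rows, "temperature_c")
print(summary)
```

Replacing `RAW_CSV` with the contents of your own file is the plain-Python equivalent of swapping in your own dataset; the summary is the kind of starting point you would then turn into a chart.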
Chapters
- 00:00 - Simplifying Data Analysis & Visualization with Developer Tools & AI
- 00:29 - Follow along
- 00:54 - Introduction - Data Analysis Challenges & Goals
- 04:44 - GitHub Codespaces - Reusable environments
- 08:32 - Jupyter Notebooks - Make it reproducible
- 11:18 - GitHub Copilot - AI-assisted learning
- 14:43 - Visual Studio Code - Productivity extensions
- 15:39 - Open Datasets - Data Wrangler
- 19:15 - Responsible AI toolkit - Model debugging for fairness
- 21:13 - Project LIDA - AI-assisted intuition & visualization
- 25:24 - Azure AI Studio - Paradigm shift to LLM Ops
- 25:47 - Summary - Questions & Next Steps
Connect
- Nitya Narasimhan | Twitter/X: @nitya