Episode

Simplifying Data Analysis & Visualization with Developer Tools & AI | Python Data Science Day

with Nitya Narasimhan

Having data analysis and visualization skills is increasingly important in the new age of Large Language Models and generative AI. But how does a non-Python developer skill up rapidly with the tools & best practices required to achieve project goals, without having the benefit of years of Python or data science experience? This is where the right developer tooling, with a little bit of AI assistance, can help.

In this talk, we'll go from identifying an open-source data set, to analyzing it for insights and visualizing relevant outcomes, in 25 minutes - with just a GitHub account and an OpenAI endpoint. Along the way, we'll introduce you to a series of developer tools that make your journey easier:

  • Open Dataset: to "analyze" - from Kaggle, Hugging Face, or Azure
  • Data Wrangler: to "sanitize" data - extension from Visual Studio Code
  • Jupyter Notebook: to "record" process - for transferable learning
  • GitHub Codespaces: to "pre-build" environment - for consistent reuse
  • GitHub Copilot: to "explain/fix" code - for focused learning with AI help
  • Microsoft LIDA: to "suggest/build" visualization goals - for building your intuition with AI help

The talk comes with an associated repo that you can fork, then swap in your own dataset to extend it or experiment on your own later. By the end of the talk you should have a sense of how you can go from discovering a data set to getting some visual insights about it, using existing tools with a little AI assistance.
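To make the "discover, sanitize, summarize" flow concrete, here is a minimal sketch in plain Python. It is not taken from the talk or its repo: the inline dataset, the column names, and the helper functions (`load_rows`, `sanitize`, `mean_by`) are all hypothetical, and the talk itself relies on tools such as Data Wrangler and LIDA rather than hand-written code like this.

```python
import csv
import io
from statistics import mean

# Hypothetical stand-in for an open dataset; a real run would read a downloaded CSV file.
RAW_CSV = """city,temperature_c
Seattle,11.5
Austin,
Boston,9.0
Seattle,12.5
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def sanitize(rows, column):
    """Keep only rows whose value in `column` parses as a number."""
    clean = []
    for row in rows:
        try:
            clean.append({**row, column: float(row[column])})
        except (TypeError, ValueError):
            # Missing or non-numeric value: drop the row.
            continue
    return clean


def mean_by(rows, key, column):
    """Average `column` for each distinct value of `key`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row[column])
    return {name: mean(values) for name, values in groups.items()}


rows = load_rows(RAW_CSV)
clean = sanitize(rows, "temperature_c")
summary = mean_by(clean, "city", "temperature_c")
print(summary)
```

A per-group summary like this is the kind of table you would then hand to a plotting library or a visualization assistant; swapping `RAW_CSV` for the contents of your own file is the only change needed to try it on other data.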

Chapters

  • 00:00 - Simplifying Data Analysis & Visualization with Developer Tools & AI
  • 00:29 - Follow along
  • 00:54 - Introduction - Data Analysis Challenges & Goals
  • 04:44 - GitHub Codespaces - Reusable environments
  • 08:32 - Jupyter Notebooks - Make it reproducible
  • 11:18 - GitHub Copilot - AI-assisted learning
  • 14:43 - Visual Studio Code - Productivity extensions
  • 15:39 - Open Datasets - Data Wrangler
  • 19:15 - Responsible AI toolkit - Model debugging for fairness
  • 21:13 - Project LIDA - AI-assisted intuition & visualization
  • 25:24 - Azure AI Studio - Paradigm shift to LLM Ops
  • 25:47 - Summary - Questions & Next Steps

Connect

  • Nitya Narasimhan | Twitter/X: @nitya

Developer
Python