Explore CodeQL in GitHub

9 minutes

What is CodeQL

CodeQL is a semantic code analysis engine developed by GitHub that treats code as data. Instead of simply searching for text patterns, CodeQL understands the structure and meaning of your code, enabling sophisticated security and quality analysis.

Traditional static analysis tools often produce false positives because they use simple pattern matching. CodeQL's semantic approach understands code context, relationships between code elements, and data flow through applications, resulting in more accurate vulnerability detection.

Key characteristics of CodeQL

CodeQL treats code as a database:

Structural representation: Converts source code into a database that captures syntax trees, control flow graphs, and data flow paths.
Queryable format: Makes code queryable using a specialized query language, similar to querying a traditional database.
Language-agnostic approach: Supports multiple programming languages including C/C++, C#, Java, JavaScript/TypeScript, Python, Ruby, Go, and Swift.
Comprehensive coverage: Analyzes not just individual files but entire codebases with all their interconnections.

CodeQL enables precise security analysis:

Variant analysis: Once you identify one vulnerability, you can write queries to find similar issues across your entire codebase.
Data flow analysis: Tracks how data moves through your application from sources (user input) to sinks (sensitive operations).
Taint tracking: Identifies when untrusted data reaches sensitive operations without proper validation or sanitization.
Control flow analysis: Understands execution paths and conditional logic to find vulnerabilities that only occur under specific conditions.

How CodeQL works

CodeQL analysis involves three distinct phases that transform source code into actionable security findings:

Phase 1: Create a CodeQL database

The first step extracts a structured representation of your code:

Code extraction: Analyzes your source files during compilation or through static analysis.
Database creation: Builds a comprehensive database representing your code's structure, including abstract syntax trees, control flow graphs, and data dependencies.
Metadata capture: Records file locations, line numbers, variable scopes, function calls, and class hierarchies.
Optimization: Indexes the database for efficient querying, even on large codebases.

This database becomes the foundation for all subsequent analysis. It's created once and can be queried multiple times, making iterative security analysis efficient.

Phase 2: Run CodeQL queries

Once the database exists, you execute queries to find security issues:

Standard query packs: GitHub provides curated query sets for common vulnerabilities (OWASP Top 10, CWE standards).
Custom queries: Write your own queries to find organization-specific security patterns or coding standards violations.
Query execution: The CodeQL engine runs queries against the database, searching for code patterns that match vulnerability signatures.
Performance: Queries execute quickly because they operate on the indexed database rather than raw source files.

Example query categories include:

Injection vulnerabilities: SQL injection, command injection, cross-site scripting.
Authentication issues: Weak password policies, missing authentication checks, insecure session management.
Cryptography problems: Weak algorithms, hard-coded credentials, insufficient randomness.
Resource management: Memory leaks, resource exhaustion, uncontrolled resource consumption.

Phase 3: Interpret results

The final phase presents findings in actionable formats:

Result ranking: CodeQL prioritizes findings by severity, confidence level, and exploitability.
Contextual information: Each finding includes file locations, line numbers, affected code snippets, and data flow paths.
Remediation guidance: Results include explanations of the vulnerability and recommendations for fixing it.
Integration: Results integrate with GitHub Security tab, pull request annotations, and SARIF files for external tools.

The CodeQL query language

CodeQL queries are written in a declarative language specifically designed for analyzing code:

Query structure and syntax

CodeQL uses object-oriented logic programming:

Classes and predicates: Define what you're searching for using classes that represent code elements (functions, variables, expressions).
Declarative approach: Describe what you want to find rather than how to find it.
Pattern matching: Use predicates to match code patterns and relationships.
Composability: Build complex queries by combining simpler predicates.

Example query structure:

import javascript

from SqlExecution sql, Source source
where source.flowsTo(sql.getAnArgument())
select sql, "SQL query vulnerable to injection from $@.", source, "user input"

This query finds SQL injection vulnerabilities by:

Identifying SQL execution points.
Finding sources of user input.
Tracking data flow from input to SQL execution.
Reporting vulnerabilities with context.

Standard query libraries

GitHub provides extensive query libraries:

Security queries: Detect OWASP Top 10 vulnerabilities, CWE categories, and language-specific security issues.
Code quality queries: Find code smells, maintainability issues, performance problems, and best practice violations.
Community contributions: Thousands of queries contributed by security researchers and developers.
Regular updates: GitHub Security Lab continuously adds new queries for emerging vulnerabilities.

You can use these queries as-is or customize them for your specific needs.

CodeQL in GitHub security

CodeQL integrates deeply with GitHub's security features:

Code scanning with CodeQL

Automated security analysis:

Default setup: Enable CodeQL scanning with one click in repository settings.
Scheduled scans: Automatically scan on every push, pull request, or on a schedule.
Multi-language support: Automatically detects languages in your repository and runs appropriate queries.
Result presentation: Security findings appear in the Security tab with detailed explanations.

Pull request integration:

Inline annotations: Security findings appear as comments directly on the vulnerable lines of code in pull requests.
Blocking checks: Configure CodeQL as a required check that must pass before merging.
Differential scanning: Only reports new vulnerabilities introduced by the pull request, reducing noise.
Developer feedback: Developers see security issues immediately while code is fresh in their minds.

GitHub Advanced Security

For organizations, GitHub Advanced Security provides additional capabilities:

Private repository scanning: Run CodeQL on private repositories.
Custom query execution: Upload and run organization-specific queries.
Security overview: Dashboard showing security posture across all repositories.
Alert management: Triage, assign, and track security findings across teams.

Using CodeQL in CI/CD pipelines

CodeQL extends beyond GitHub to integrate with various CI/CD systems:

Integration approaches

GitHub Actions integration:

- name: Initialize CodeQL
  uses: github/codeql-action/init@v2
  with:
    languages: javascript, python

- name: Perform CodeQL Analysis
  uses: github/codeql-action/analyze@v2

This workflow:

Initializes CodeQL for specified languages.
Builds your application (or analyzes without building).
Runs security queries.
Uploads results to GitHub Security tab.

Azure Pipelines integration:

CodeQL can run in Azure Pipelines using the command-line interface:

Install CodeQL CLI: Download and install the CodeQL bundle in your pipeline.
Create database: Run codeql database create during your build.
Analyze database: Execute codeql database analyze with selected query packs.
Export results: Generate SARIF files for visualization in Azure DevOps.

Other CI/CD systems:

CodeQL CLI supports any CI/CD platform:

Jenkins: Execute CodeQL analysis as build steps.
GitLab CI/CD: Run CodeQL in GitLab pipelines with SARIF output.
CircleCI: Integrate CodeQL scans into CircleCI workflows.
Custom systems: Use CodeQL CLI from any environment that can run command-line tools.

Security gates

Use CodeQL results as quality gates:

Fail builds: Configure pipelines to fail when CodeQL finds high-severity vulnerabilities.
Trend analysis: Track security metrics over time to measure improvement.
Compliance requirements: Generate evidence of security scanning for audits and compliance certifications.
Automatic remediation: Trigger automated workflows when specific vulnerabilities are detected.

CodeQL development tools

CodeQL provides tools for creating and testing queries:

Visual Studio Code extension

The official CodeQL extension for VS Code offers:

Query development: Write and test CodeQL queries with syntax highlighting, autocomplete, and inline documentation.
Local database analysis: Run queries against databases created from local codebases.
Result visualization: See query results with source code navigation and data flow paths.
Debugging support: Step through query execution to understand results and optimize performance.

Command-line interface

The CodeQL CLI enables scriptable analysis:

Database creation: codeql database create extracts code into queryable format.
Query execution: codeql database analyze runs queries and generates results.
Test queries: codeql test run validates queries against test cases.
Pack management: Download and manage standard query packs.

Benefits of CodeQL for security automation

Integrating CodeQL into your DevSecOps process provides significant advantages:

Developer productivity

Early detection:

Shift-left security: Find vulnerabilities during development rather than in production.
Faster remediation: Fix issues when code is fresh and changes are small.
Learning opportunities: Developers learn secure coding practices from immediate feedback.
Reduced context switching: Security findings appear in familiar development tools.

Accurate results:

Low false positives: Semantic analysis produces more accurate findings than pattern matching.
Contextual information: Results include data flow paths showing exactly how vulnerabilities occur.
Prioritized findings: Focus on exploitable issues rather than theoretical concerns.
Variant discovery: Find all instances of a vulnerability pattern, not just obvious examples.

Organizational security

Comprehensive coverage:

Entire codebase: Analyze all code, including third-party dependencies and legacy components.
Multiple languages: Uniform security analysis across polyglot applications.
Consistent standards: Apply the same security rules across all repositories.
Historical analysis: Scan existing code to establish security baselines.

Scalable security:

Automated scanning: No manual security reviews required for every commit.
Continuous monitoring: Regular scans detect newly disclosed vulnerabilities.
Security as code: Codify security requirements as queries stored in version control.
Knowledge sharing: Query libraries capture institutional security knowledge.

Compliance and governance

Audit trails:

Scan history: Record of all security scans with timestamps and results.
Finding lifecycle: Track vulnerabilities from detection through remediation.
Policy enforcement: Demonstrate that security scans occur for every release.
Evidence generation: Produce reports for auditors and compliance frameworks.

For more information about CodeQL, see CodeQL Overview.

For the available tools, see CodeQL Tools.

Feedback

Was this page helpful?