Optimizing 500MB+ GeoJSON Datasets for Mapbox: A Performance Journey

February 2025

The Challenge

The Seismic Risk Information System needed to visualize 7,000+ critical infrastructure buildings across Kazakhstan—schools, hospitals, emergency facilities—each with detailed seismic vulnerability data.

Initial implementation problems:

  • GeoJSON file: 500MB+ uncompressed
  • Initial load time: 35 seconds on government networks
  • Browser crashes on devices with <4GB RAM
  • Complete UI freeze during data parsing
  • Unusable for emergency response scenarios

This was unacceptable. Emergency officials needed instant access to infrastructure data, not a 35-second loading screen.

Understanding the Problem

Let's break down what was happening:

// ❌ The problematic approach
const response = await fetch('/api/buildings.geojson')
const geojson = await response.json() // 500MB parsed in browser!

map.addSource('buildings', {
  type: 'geojson',
  data: geojson
})

Why this failed:

  1. Massive payload - Downloading 500MB over government networks
  2. JSON parsing - Browser parsing 500MB JSON freezes UI
  3. Memory pressure - Keeping entire dataset in memory
  4. No progressive loading - All-or-nothing approach
  5. Poor zoom performance - Rendering 7,000 markers at once
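To make the parsing cost in point 2 concrete, here is a small sketch that builds a synthetic FeatureCollection and times `JSON.parse` on it. The schema and feature count are assumptions for illustration, not the project's real data; the point is only that parsing is synchronous, so nothing else runs on that thread until it returns.

```javascript
// Build a synthetic GeoJSON FeatureCollection (hypothetical schema).
function makeFeatureCollection(count) {
  const features = []
  for (let i = 0; i < count; i++) {
    features.push({
      type: 'Feature',
      geometry: { type: 'Point', coordinates: [50 + (i % 30), 42 + (i % 12)] },
      properties: { id: i, risk_level: ['high', 'medium', 'low'][i % 3] }
    })
  }
  return { type: 'FeatureCollection', features }
}

// Time a synchronous JSON.parse; the calling thread is blocked for the whole duration.
function timeParse(text) {
  const start = Date.now()
  const parsed = JSON.parse(text)
  return { parsed, ms: Date.now() - start }
}

const text = JSON.stringify(makeFeatureCollection(7000))
const { parsed, ms } = timeParse(text)
console.log(`${(text.length / 1e6).toFixed(1)} MB parsed in ${ms} ms`)
```

Scaling the synthetic payload up gives a rough feel for how long the UI would stay frozen on a given device.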

Solution 1: Vector Tiles

The game-changer was switching from raw GeoJSON to vector tiles (PBF format).

What are Vector Tiles?

Vector tiles are pre-processed chunks of geographic data, divided by zoom level and geographic area. Instead of sending everything at once, the map only loads tiles for the current viewport.
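The "current viewport" idea can be sketched with the standard Web Mercator tile formula. The function names and the sample bounding box are illustrative assumptions, and the sketch ignores viewports that cross the antimeridian.

```javascript
// Convert a longitude/latitude pair to XYZ tile coordinates (Web Mercator).
function lngLatToTile(lng, lat, zoom) {
  const n = 2 ** zoom
  const latRad = (lat * Math.PI) / 180
  const x = Math.floor(((lng + 180) / 360) * n)
  const y = Math.floor(
    ((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2) * n
  )
  return { z: zoom, x, y }
}

// List every tile that intersects a bounding box at a given zoom.
function tilesForBounds({ west, south, east, north }, zoom) {
  const min = lngLatToTile(west, north, zoom) // top-left tile
  const max = lngLatToTile(east, south, zoom) // bottom-right tile
  const tiles = []
  for (let x = min.x; x <= max.x; x++) {
    for (let y = min.y; y <= max.y; y++) tiles.push({ z: zoom, x, y })
  }
  return tiles
}

// A viewport roughly around Almaty (coordinates are illustrative).
const viewport = { west: 76.7, south: 43.1, east: 77.1, north: 43.4 }
console.log(tilesForBounds(viewport, 10).length, 'tiles at zoom 10')
```

Only the tiles in that list need to be requested, which is why the initial transfer stays small no matter how large the full dataset is.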

Implementation

Step 1: Generate vector tiles from GeoJSON

# Install Tippecanoe (vector tile generator)
brew install tippecanoe

# Generate tiles
tippecanoe -o buildings.mbtiles \
  --minimum-zoom=5 \
  --maximum-zoom=14 \
  --drop-densest-as-needed \
  --extend-zooms-if-still-dropping \
  buildings.geojson

Step 2: Serve tiles

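// api/tiles/[z]/[x]/[y].js (Next.js API route)
import MBTiles from '@mapbox/mbtiles'

// @mapbox/mbtiles is callback-based, so wrap its calls in Promises.
// Open the tileset once and reuse it across requests.
const tileset = new Promise((resolve, reject) => {
  new MBTiles('./data/buildings.mbtiles?mode=ro', (err, mbtiles) =>
    err ? reject(err) : resolve(mbtiles)
  )
})

export default async function handler(req, res) {
  // parseInt also drops the ".pbf" suffix from the last path segment
  const [z, x, y] = ['z', 'x', 'y'].map((key) => parseInt(req.query[key], 10))
  const mbtiles = await tileset

  const tile = await new Promise((resolve) =>
    mbtiles.getTile(z, x, y, (err, data) => resolve(err ? null : data))
  )
  if (!tile) return res.status(404).end() // no tile at this address

  res.setHeader('Content-Type', 'application/x-protobuf')
  res.setHeader('Content-Encoding', 'gzip') // Tippecanoe gzips tiles by default
  res.send(tile)
}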
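Because the route parameters arrive as strings, it can help to validate them before touching the tileset. The helper below is a sketch under stated assumptions: the name `parseTileParams` is hypothetical, and the zoom limits are taken from the Tippecanoe command in Step 1.

```javascript
// Parse and validate z/x/y route parameters.
// Returns null when they do not form a valid tile address.
function parseTileParams(query, minZoom = 5, maxZoom = 14) {
  const z = Number.parseInt(query.z, 10)
  const x = Number.parseInt(query.x, 10)
  const y = Number.parseInt(query.y, 10) // parseInt ignores a trailing ".pbf"
  if (![z, x, y].every(Number.isInteger)) return null
  if (z < minZoom || z > maxZoom) return null
  const max = 2 ** z - 1 // highest valid column/row index at this zoom
  if (x < 0 || x > max || y < 0 || y > max) return null
  return { z, x, y }
}

console.log(parseTileParams({ z: '5', x: '22', y: '11.pbf' }))
console.log(parseTileParams({ z: '5', x: '99', y: '11.pbf' }))
```

A handler could return a 400 response when the result is `null`, which keeps malformed requests away from the tile store.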

Step 3: Update map source

// ✅ Vector tile source
map.addSource('buildings', {
  type: 'vector',
  tiles: ['https://api.example.com/tiles/{z}/{x}/{y}.pbf'],
  minzoom: 5,
  maxzoom: 14
})

map.addLayer({
  id: 'buildings-layer',
  type: 'circle',
  source: 'buildings',
  'source-layer': 'buildings',
  paint: {
    'circle-radius': [
      'interpolate', ['linear'], ['zoom'],
      10, 3,
      14, 8
    ],
    'circle-color': [
      'match',
      ['get', 'risk_level'],
      'high', '#ef4444',
      'medium', '#f59e0b',
      'low', '#10b981',
      '#6b7280'
    ]
  }
})
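The `interpolate` expression for `circle-radius` is easier to reason about with the arithmetic written out. This sketch mirrors the two stops used above (zoom 10 gives 3px, zoom 14 gives 8px); the function name is an assumption, and it covers only the two-stop linear case.

```javascript
// Evaluate a two-stop linear zoom interpolation:
// clamp outside the stops, interpolate linearly between them.
function interpolateLinear(zoom, [z0, v0], [z1, v1]) {
  if (zoom <= z0) return v0
  if (zoom >= z1) return v1
  const t = (zoom - z0) / (z1 - z0)
  return v0 + t * (v1 - v0)
}

for (const zoom of [5, 10, 12, 14, 16]) {
  console.log(`zoom ${zoom}: radius ${interpolateLinear(zoom, [10, 3], [14, 8])}px`)
}
```

Below zoom 10 the circles stay at 3px, so the tiles served from zoom 5 still render small markers.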

Results from vector tiles alone:

  • Initial load: 35s → 3s (91% faster)
  • Data transfer: 500MB → 50kB initial tiles
  • Memory usage: 500MB → ~10MB active tiles
  • No browser crashes ✅

Solution 2: Clustering for Dense Areas

At lower zoom levels, rendering thousands of individual markers is inefficient. Clustering groups nearby points into a single marker with a count.

One caveat: in Mapbox GL JS the cluster options are supported only on GeoJSON sources, not on vector sources. One workable setup, sketched here with a hypothetical endpoint, is to feed the clustered layers from a slim points-only GeoJSON file (ids and coordinates, no vulnerability details) while the vector tiles keep serving the detailed layer.

map.addSource('building-points', {
  type: 'geojson',
  // Hypothetical endpoint: points-only export, a small fraction of the full dataset
  data: '/api/building-points.geojson',
  cluster: true,
  clusterMaxZoom: 12, // Cluster up to zoom 12
  clusterRadius: 50 // Cluster radius in pixels
})

// Cluster circles
map.addLayer({
  id: 'clusters',
  type: 'circle',
  source: 'building-points',
  filter: ['has', 'point_count'],
  paint: {
    'circle-color': [
      'step',
      ['get', 'point_count'],
      '#51bbd6', 100,
      '#f1f075', 500,
      '#f28cb1'
    ],
    'circle-radius': [
      'step',
      ['get', 'point_count'],
      20, 100,
      30, 500,
      40
    ]
  }
})

// Cluster counts
map.addLayer({
  id: 'cluster-count',
  type: 'symbol',
  source: 'building-points',
  filter: ['has', 'point_count'],
  layout: {
    'text-field': ['get', 'point_count_abbreviated'],
    'text-size': 12
  }
})

// Individual points (shown at higher zoom)
map.addLayer({
  id: 'unclustered-point',
  type: 'circle',
  source: 'building-points',
  filter: ['!', ['has', 'point_count']],
  paint: {
    'circle-radius': 6,
    'circle-color': '#11b4da'
  }
})

Clustering results:

  • Zoom 5-9: Render 50-100 clusters instead of 7,000 points
  • Smooth 60fps panning and zooming
  • Instant visual feedback when zooming in
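To show why the cluster count shrinks as you zoom out, here is a deliberately simplified grid-based clustering sketch. It is not the algorithm Mapbox GL JS uses internally; the 512px tile size, the function names, and the sample coordinates are assumptions for illustration.

```javascript
// Project lon/lat to Web Mercator pixel coordinates, assuming 512px tiles.
function project(lng, lat, zoom) {
  const worldSize = 512 * 2 ** zoom
  const sin = Math.sin((lat * Math.PI) / 180)
  const x = ((lng + 180) / 360) * worldSize
  const y = (0.5 - Math.log((1 + sin) / (1 - sin)) / (4 * Math.PI)) * worldSize
  return [x, y]
}

// Group points that fall into the same radius-sized grid cell at this zoom.
function gridCluster(points, zoom, radius = 50) {
  const cells = new Map()
  for (const { lng, lat } of points) {
    const [x, y] = project(lng, lat, zoom)
    const key = `${Math.floor(x / radius)}:${Math.floor(y / radius)}`
    const cell = cells.get(key) || { point_count: 0, sumLng: 0, sumLat: 0 }
    cell.point_count += 1
    cell.sumLng += lng
    cell.sumLat += lat
    cells.set(key, cell)
  }
  // Place each cluster at the average position of its members.
  return [...cells.values()].map((c) => ({
    point_count: c.point_count,
    lng: c.sumLng / c.point_count,
    lat: c.sumLat / c.point_count
  }))
}

const sample = [
  { lng: 76.8897, lat: 43.2389 }, // Almaty
  { lng: 76.9, lat: 43.24 },      // less than 1 km away
  { lng: 71.4491, lat: 51.1694 }  // Astana
]
console.log(gridCluster(sample, 5).length, 'clusters at zoom 5')
console.log(gridCluster(sample, 16).length, 'clusters at zoom 16')
```

At zoom 5 the two Almaty points share a cell and collapse into one marker; at zoom 16 they are hundreds of pixels apart and render separately.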

Solution 3: Progressive Data Loading

Even with vector tiles, we can optimize further by loading data strategically.

Viewport-based Filtering

import debounce from 'lodash/debounce'

function loadVisibleBuildings() {
  // With no geometry argument, queryRenderedFeatures searches the whole viewport.
  // It expects pixel coordinates, so passing map.getBounds() (geographic) is wrong.
  const visibleFeatures = map.queryRenderedFeatures({
    layers: ['buildings-layer']
  })

  // Load detailed data only for visible buildings
  const ids = visibleFeatures.map(f => f.properties.id)
  fetchDetailedData(ids) // app-specific helper, not shown
}

// Load when map movement stops
map.on('moveend', debounce(loadVisibleBuildings, 300))
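Rendered-feature queries can return the same feature more than once when it appears in several tiles, so it is worth de-duplicating ids before requesting details. The sketch below also splits the ids into batches; the helper names and the batch size are assumptions, not part of the original implementation.

```javascript
// Collect unique building ids from a list of rendered features.
function uniqueIds(features) {
  return [...new Set(features.map((f) => f.properties.id))]
}

// Split ids into fixed-size batches so each detail request stays small.
function chunk(items, size) {
  const batches = []
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size))
  }
  return batches
}

const features = [
  { properties: { id: 7 } },
  { properties: { id: 7 } }, // duplicate from a neighbouring tile
  { properties: { id: 9 } }
]
console.log(chunk(uniqueIds(features), 1))
```

Each batch can then be passed to the detail loader in turn, which keeps request sizes predictable when the viewport contains many buildings.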

Priority-based Loading

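// Load critical buildings first (hospitals, emergency services).
// Lower number = higher priority.
const priorityLevels = {
  hospital: 1,
  fire_station: 1,
  police: 2,
  school: 3,
  residential: 4
}

async function loadBuildingsByPriority() {
  // Sort explicitly instead of relying on the object's key order
  const types = Object.keys(priorityLevels)
    .sort((a, b) => priorityLevels[a] - priorityLevels[b])

  for (const type of types) {
    await loadBuildingsOfType(type) // app-specific loader, not shown
    updateMap() // re-render after each type arrives
  }
}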
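Types that share a priority level do not need to wait for each other. As a sketch of that refinement, the code below groups types by priority and loads each group in parallel; `loadType` is a hypothetical async loader supplied by the caller.

```javascript
const priorityLevels = { hospital: 1, fire_station: 1, police: 2, school: 3, residential: 4 }

// Group building types by priority, lowest number (most critical) first.
function groupByPriority(levels) {
  const groups = new Map()
  for (const [type, priority] of Object.entries(levels)) {
    if (!groups.has(priority)) groups.set(priority, [])
    groups.get(priority).push(type)
  }
  return [...groups.entries()]
    .sort(([a], [b]) => a - b)
    .map(([, types]) => types)
}

// Load each priority group in turn; types inside a group load in parallel.
async function loadInPriorityOrder(levels, loadType, onGroupLoaded = () => {}) {
  for (const types of groupByPriority(levels)) {
    await Promise.all(types.map(loadType))
    onGroupLoaded(types)
  }
}

console.log(groupByPriority(priorityLevels))
```

With the levels above, hospitals and fire stations load together first, followed by police, schools, and residential buildings.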

Solution 4: Caching Strategy

Browser Caching

// Service Worker: cache-first strategy for tiles (also enables offline use)
self.addEventListener('fetch', (event) => {
  if (event.request.url.includes('/tiles/')) {
    event.respondWith(
      caches.open('map-tiles-v1').then((cache) => {
        return cache.match(event.request).then((cached) => {
          return cached || fetch(event.request).then((response) => {
            // Store only successful responses so errors are never served from cache
            if (response.ok) cache.put(event.request, response.clone())
            return response
          })
        })
      })
    )
  }
})
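The `-v1` suffix in the cache name suggests a versioning scheme: when tiles are regenerated, the name can be bumped and older caches deleted. The helpers below sketch that cleanup logic as plain functions; the names and the `map-tiles-` prefix convention are assumptions.

```javascript
const CURRENT_CACHE = 'map-tiles-v1'

// Given all cache names, return the outdated tile caches that can be deleted.
function staleTileCaches(cacheNames, current = CURRENT_CACHE) {
  return cacheNames.filter(
    (name) => name.startsWith('map-tiles-') && name !== current
  )
}

// Decide whether a response is worth storing: only successful tile responses.
function isCacheable(response) {
  return Boolean(response) && response.ok === true && response.status === 200
}

console.log(staleTileCaches(['map-tiles-v0', 'map-tiles-v1', 'app-shell']))

// In the service worker this would typically run on activation, for example:
// self.addEventListener('activate', (event) => {
//   event.waitUntil(
//     caches.keys().then((names) =>
//       Promise.all(staleTileCaches(names).map((name) => caches.delete(name)))
//     )
//   )
// })
```

Keeping the decision logic in pure functions makes it testable outside the browser, while the commented handler shows where it would plug in.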

Server-side Caching

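// Redis cache for frequently accessed tiles
import { Redis } from '@upstash/redis'

const redis = new Redis({
  url: process.env.REDIS_URL,
  token: process.env.REDIS_TOKEN
})

export default async function handler(req, res) {
  const { z, x, y } = req.query
  const cacheKey = `tile:${z}:${x}:${y}`

  // Hits and misses need the same headers, otherwise the map cannot decode
  // the tile (tiles are assumed to be stored gzipped, as in Step 2)
  res.setHeader('Content-Type', 'application/x-protobuf')
  res.setHeader('Content-Encoding', 'gzip')

  // Check cache first
  const cached = await redis.get(cacheKey)
  if (cached) {
    res.setHeader('X-Cache', 'HIT')
    return res.send(Buffer.from(cached, 'base64'))
  }

  // Generate tile (app-specific helper, not shown; expected to return a Buffer)
  const tile = await generateTile(z, x, y)

  // Cache for 24 hours; tiles are binary, so store them base64-encoded
  await redis.set(cacheKey, tile.toString('base64'), { ex: 86400 })

  res.setHeader('X-Cache', 'MISS')
  res.send(tile)
}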
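The handler follows the cache-aside pattern: look up, fall back to generation, then store with an expiry. The sketch below reproduces that flow with an in-memory stand-in for Redis so it can be run locally; `createTtlCache` and `getTile` are hypothetical names, and the injected clock exists only to make expiry testable.

```javascript
// Minimal in-memory cache with per-entry expiry; `now` is injectable for testing.
function createTtlCache(now = () => Date.now()) {
  const entries = new Map()
  return {
    set(key, value, ttlSeconds) {
      entries.set(key, { value, expiresAt: now() + ttlSeconds * 1000 })
    },
    get(key) {
      const entry = entries.get(key)
      if (!entry) return null
      if (now() >= entry.expiresAt) {
        entries.delete(key) // expired: behave like a miss
        return null
      }
      return entry.value
    }
  }
}

// Cache-aside lookup: return the cached tile or generate and store it.
function getTile(cache, z, x, y, generate) {
  const key = `tile:${z}:${x}:${y}`
  const cached = cache.get(key)
  if (cached !== null) return { tile: cached, cache: 'HIT' }
  const tile = generate(z, x, y)
  cache.set(key, tile, 86400) // 24 hours, matching the handler above
  return { tile, cache: 'MISS' }
}

const cache = createTtlCache()
const fakeGenerate = (z, x, y) => `tile-${z}-${x}-${y}` // stand-in for real tile bytes
console.log(getTile(cache, 5, 22, 11, fakeGenerate).cache)
console.log(getTile(cache, 5, 22, 11, fakeGenerate).cache)
```

The first call reports a miss and the second a hit, which is the same signal the `X-Cache` header exposes in the real handler.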

Solution 5: Optimized Layer Styling

Reduce GPU load with efficient styling:

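// ❌ Inefficient: one layer per risk level, each with its own filter
map.addLayer({
  id: 'high-risk',
  type: 'circle',
  source: 'buildings',
  'source-layer': 'buildings',
  filter: ['==', ['get', 'risk_level'], 'high'],
  paint: { 'circle-color': 'red' }
})
map.addLayer({
  id: 'medium-risk',
  type: 'circle',
  source: 'buildings',
  'source-layer': 'buildings',
  filter: ['==', ['get', 'risk_level'], 'medium'],
  paint: { 'circle-color': 'orange' }
})

// ✅ Efficient: single layer with data-driven styling
map.addLayer({
  id: 'buildings',
  type: 'circle',
  source: 'buildings',
  'source-layer': 'buildings',
  paint: {
    'circle-color': [
      'match',
      ['get', 'risk_level'],
      'high', '#ef4444',
      'medium', '#f59e0b',
      'low', '#10b981',
      '#6b7280' // fallback for unknown or missing values
    ]
  }
})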
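When the color palette lives in application config, the `match` expression can be generated rather than hand-written. This is a small sketch; `buildMatchExpression` is a hypothetical helper, and the `risk_level` property name follows the vector tile layer defined earlier.

```javascript
// Build a Mapbox GL `match` expression from a value-to-color mapping.
function buildMatchExpression(property, colors, fallback) {
  const expression = ['match', ['get', property]]
  for (const [value, color] of Object.entries(colors)) {
    expression.push(value, color)
  }
  expression.push(fallback) // the final element is the default output
  return expression
}

const riskColors = { high: '#ef4444', medium: '#f59e0b', low: '#10b981' }
console.log(JSON.stringify(buildMatchExpression('risk_level', riskColors, '#6b7280')))
```

The result can be assigned directly to `'circle-color'`, so adding a new risk category means changing one object instead of adding another layer.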

Final Results

Performance Metrics

| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| Initial load time | 35s | 3s | 91% faster |
| Data transfer (initial) | 500MB | 50kB | 99.99% less |
| Memory usage | 500MB | 10MB | 98% less |
| FPS (panning) | 15fps | 60fps | 300% better |
| Time to first render | 38s | 1.2s | 97% faster |
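The improvement percentages follow from simple ratios. As a quick check, this sketch recomputes several of them from the before and after values reported in this article, treating MB and kB as decimal units (an assumption).

```javascript
// Percentage reduction for "lower is better" metrics (time, bytes, memory).
function percentReduction(before, after) {
  return ((before - after) / before) * 100
}

// Percentage increase for "higher is better" metrics (frames per second).
function percentIncrease(before, after) {
  return ((after - before) / before) * 100
}

console.log(percentReduction(500e6, 50e3).toFixed(2)) // data transfer, bytes
console.log(percentReduction(500, 10).toFixed(0))     // memory, MB
console.log(percentReduction(38, 1.2).toFixed(0))     // time to first render, s
console.log(percentIncrease(15, 60).toFixed(0))       // FPS while panning
```

Note that FPS is an increase rather than a reduction, which is why going from 15fps to 60fps counts as 300% rather than 75%.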

User Impact

  • Zero browser crashes since deployment
  • Field workers can now use the system on mobile devices
  • Emergency response time improved due to instant map access
  • Positive feedback from 500+ government users
  • System handles peak loads during emergency drills without degradation

Key Takeaways

  1. Vector tiles are essential for large datasets - Don't try to load 500MB+ GeoJSON directly. Use vector tiles, either self-hosted as shown here or uploaded to Mapbox as a hosted tileset.

  2. Clustering improves performance dramatically - Reduce rendered features from thousands to hundreds at lower zoom levels.

  3. Progressive loading beats all-or-nothing - Load critical data first, details on-demand.

  4. Cache aggressively - Tiles rarely change; cache them in service workers, Redis, and CDN.

  5. Optimize styling - Use data-driven styling instead of multiple filtered layers.

  6. Test on real networks - Simulate 3G connections; what works on fiber doesn't work in the field.
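For takeaway 6, a back-of-the-envelope estimate already shows how differently the two payloads behave on slow links. The throttling profiles below are assumed round numbers for illustration, not measurements from this project, and the estimate ignores latency and protocol overhead.

```javascript
// Estimate transfer time for a payload at a given downlink speed.
function transferSeconds(bytes, megabitsPerSecond) {
  return (bytes * 8) / (megabitsPerSecond * 1e6)
}

// Assumed throttling profiles in Mbit/s (illustrative values).
const profiles = { 'slow-3g': 0.4, 'fast-3g': 1.6, '4g': 9 }

for (const [name, mbps] of Object.entries(profiles)) {
  const full = transferSeconds(500e6, mbps) // the original 500MB GeoJSON
  const tiles = transferSeconds(50e3, mbps) // ~50kB of initial tiles
  console.log(`${name}: GeoJSON ${full.toFixed(0)}s, tiles ${tiles.toFixed(2)}s`)
}
```

Browser devtools throttling gives a more realistic picture, but even this rough arithmetic makes it clear that the raw GeoJSON was never viable outside a fast office network.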

What's Next?

In my next article, I'll cover building real-time WebSocket systems that handle 200+ concurrent users with sub-second latency, based on my work on the Hospital Capacity Monitoring system.

Questions about GeoJSON optimization or Mapbox performance? Reach out on LinkedIn or check out the code examples on GitHub.


This optimization was implemented for the Seismic Risk Information System, visualizing 7,000+ critical infrastructure buildings for Kazakhstan's Ministry of Emergency Situations.


© 2025 Arlen. All rights reserved.