Reading Spatial Data in Python
Working with geospatial data locally offers numerous advantages including faster processing, enhanced privacy, and the ability to work without an internet connection. This guide focuses on two critical spatial data formats: GeoTIFF and NetCDF files, providing comprehensive instructions for setting up your environment, loading data, visualization techniques, and advanced spatial operations.
Whether you’re analysing remote sensing imagery, climate data, or creating specialised maps, these step-by-step instructions will help you build a robust offline geospatial workflow in Python. By the end of this guide, you’ll be able to confidently manipulate spatial data, create informative visualisations, and perform common geographic operations such as clipping to specific regions.
- Python Setup
- Reading GeoTiff
- Reading NetCDF
Geospatial Python Environment Setup (Windows)
This guide will walk you through setting up a complete geospatial Python environment on Windows, with all the essential libraries for geospatial data analysis, visualisation, and processing.
1. Install Anaconda (Python 3.10 recommended)
- Go to: https://www.anaconda.com/products/distribution
- Download the Windows installer (64-bit) for Python 3.10
- Run the installer:
- ✅ Add Anaconda to PATH (optional but helpful)
- ✅ Register Anaconda as default Python (recommended)
2. Create a New Conda Environment (geo_env)
- Open Anaconda Prompt (search it from Start Menu)
- Run the following commands:
conda create -n geo_env python=3.10
conda activate geo_env
3. Install Required GeoPython Packages
Install all required packages in one line:
conda install -c conda-forge rasterio geopandas xarray geemap localtileserver jupyterlab notebook earthengine-api ipykernel
This installs everything:
rasterio,geopandas,xarray,geemap,localtileserver, etc.jupyter notebookandjupyterlabfor developmentearthengine-apiif using Google Earth Engine
4. Add geo_env to Jupyter Notebook Kernels
So Jupyter can use the new environment:
python -m ipykernel install --user --name=geo_env --display-name "Python (geo_env)"
5. Launch Jupyter Notebook
Still inside the same terminal:
jupyter notebook
- A browser window will open
- Choose: New → Python (geo_env)
- Now you’re running inside the correct environment!
✅ Test Your Setup
Create a new notebook and run:
import rasterio
import geemap
print("Everything is working ✅")
📌 Notes
- Always activate your environment before working:
conda activate geo_env
jupyter notebook
- To stop the notebook server, just press
Ctrl + Cin the terminal.
6. Troubleshooting Common Issues
Package Installation Errors
If you encounter errors during package installation, try installing packages one by one:
conda install -c conda-forge rasterio
conda install -c conda-forge geopandas
conda install -c conda-forge xarray
# and so on...
GDAL/Proj Issues
Some geospatial libraries depend on GDAL. If you face GDAL-related errors:
conda install -c conda-forge gdal
Memory Errors During Installation
If installation fails due to memory issues, try closing other applications or installing packages individually.
Python Version Conflicts
If you need a specific Python version for compatibility:
conda create -n geo_env python=3.9 # or another version
Updating Your Environment
To update all packages to the latest versions:
conda update --all
Alternative: Using pip
If conda installation doesn’t work for certain packages, pip can be used as a fallback:
pip install rasterio geopandas xarray geemap
However, for geospatial packages with complex dependencies like GDAL, conda is generally more reliable.
Additional Useful Packages
You might want to add these packages to your environment depending on your needs:
conda install -c conda-forge matplotlib seaborn plotly contextily folium rasterstats
matplotlib&seaborn– For plotting and visualizationplotly– For interactive plotscontextily– For adding basemaps to plotsfolium– For interactive mapsrasterstats– For raster analysis
🔗 Extra: Setting Up Google Earth Engine Authentication
If you’re using Google Earth Engine (GEE), you’ll need to authenticate:
- Make sure you have a GEE account (sign up at https://earthengine.google.com/)
- Run the following in your Python environment:
import ee
ee.Authenticate() # Follow the instructions in your browser
ee.Initialize()
After the first authentication, you can just use ee.Initialize() in future sessions.
GeoTIFF Data Handling and Visualisation
This tutorial demonstrates how to work with GeoTIFF data offline using Python. You’ll learn how to download, read, visualise, and manipulate geospatial raster data – all while working locally on your machine. This approach is ideal for environments with limited internet connectivity or when working with large datasets.
Prerequisites – – Verify Package Installation
To ensure all packages are correctly installed:
# Test importing each package
import rasterio
import geopandas
import matplotlib
import numpy
import requests
print("All packages imported successfully!")Now you’re ready to proceed with the tutorial on GeoTIFF data handling.
Tutorial Workflow1. Download GeoTIFF from Our Platform and Save Locally
First, let’s download a sample GeoTIFF file and save it to your local machine. There are two ways:
- Download directly from our platform
- Download using python code
Figure: How to get the link from our platform to be used in Python
import requests
# Direct download link from the platform
url = "https://dl.dropboxusercontent.com/scl/fi/ijfjpdntc1qv068wjlbbb/AvgSurfT.tif?rlkey=wx9lgdgkcf2ahrymemln1y37v&dl=0"
output_path = "AvgSurfT.tif"
# Download and save locally
response = requests.get(url)
with open(output_path, "wb") as f:
f.write(response.content)
print("File downloaded and saved as:", output_path)
2. Read the GeoTIFF and Explore Its Properties
Now that we have the file locally, let’s examine its metadata and properties:
import rasterio
# Open the GeoTIFF file
with rasterio.open("AvgSurfT.tif") as dataset:
print("Metadata:", dataset.profile)
print("CRS:", dataset.crs)
print("Number of bands:", dataset.count)
This will output information about the file’s coordinate reference system (CRS), dimensions, and other properties.
3. Read Band Data and Compute Statistics
Let’s extract data from the first band and calculate some basic statistics:
import numpy as np
with rasterio.open("AvgSurfT.tif") as dataset:
band1 = dataset.read(1).astype("float64") # Read first band
valid_data = band1[~np.isnan(band1)] # Mask NaNs
stats = {
'min': float(np.min(valid_data)),
'max': float(np.max(valid_data)),
'mean': float(np.mean(valid_data)),
'std': float(np.std(valid_data))
}
print("Band 1 Statistics:", stats)
4. Visualise Band Data with a Colormap
We can create a simple visualisation of our temperature data:
import matplotlib.pyplot as plt
plt.figure(figsize=(6, 8))
plt.imshow(band1, cmap='hot_r', vmin=280, vmax=320)
plt.colorbar(label="Temperature (K)")
plt.title("AvgSurfT - Band 1")
plt.xlabel("X (pixels)")
plt.ylabel("Y (pixels)")
plt.show()
Figure: Basic visualisation of Surface temperature data
5. Replace Pixel Coordinates with Geographic Coordinates
To make the visualisation more meaningful, let’s use longitude and latitude instead of pixel coordinates:
with rasterio.open("AvgSurfT.tif") as src:
band1 = src.read(1)
transform = src.transform
from rasterio.transform import xy
# Get dimensions
nrows, ncols = band1.shape
# Get lon/lat of top-left and bottom-right
lon_min, lat_max = xy(transform, 0, 0)
lon_max, lat_min = xy(transform, nrows - 1, ncols - 1)
# Create extent
extent = [lon_min, lon_max, lat_min, lat_max]
plt.figure(figsize=(10, 6))
plt.imshow(band1, cmap='hot_r', vmin=280, vmax=320, extent=extent)
plt.colorbar(label="Temperature (K)")
plt.title("AvgSurfT - Band 1")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.show()
Figure: Temperature data with geographic coordinates
6. Plot All Bands
Let’s visualize all bands in the GeoTIFF file:
# Open the dataset
with rasterio.open("AvgSurfT.tif") as dataset:
num_bands = dataset.count # Number of bands
transform = dataset.transform
height = dataset.height
width = dataset.width
# Get the geographic extent
lon_min, lat_max = xy(transform, 0, 0)
lon_max, lat_min = xy(transform, height - 1, width - 1)
extent = [lon_min, lon_max, lat_min, lat_max]
# Create subplots for each band
fig, axes = plt.subplots(1, num_bands, figsize=(5 * num_bands, 6))
if num_bands == 1:
axes = [axes] # Ensure axes is always iterable
for i in range(num_bands):
band = dataset.read(i + 1) # Read band (1-based index)
ax = axes[i]
im = ax.imshow(band, cmap='hot_r', vmin=280, vmax=320, extent=extent)
ax.set_title(f"Band {i + 1}")
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
plt.colorbar(im, ax=ax, shrink=1)
plt.tight_layout()
plt.show()
Figure: All bands visualisation
7. Clip the GeoTIFF to a Region (Thailand Boundary)
Now let’s work with a more specific region by clipping the data to Bangkok:
import geopandas as gpd
from rasterio.mask import mask
# Load the GADM level-1 shapefile for Thailand
thailand = gpd.read_file("path/to/your/shapefile.shp") # Update with your path
# Clip only to a specific province
bangkok = thailand[thailand["shapeName"] == "Bangkok"]
# Open GeoTIFF and perform clipping
with rasterio.open("AvgSurfT.tif") as src:
if bangkok.crs != src.crs:
bangkok = bangkok.to_crs(src.crs)
# Clip the raster
out_image, out_transform = mask(src, bangkok.geometry, crop=True)
out_meta = src.meta.copy()
# Update metadata for output
out_meta.update({
"height": out_image.shape[1],
"width": out_image.shape[2],
"transform": out_transform
})
8. Save and Visualize the Clipped Output
Finally, let’s save our clipped data and create a visualization with the Bangkok boundary:
# Save clipped GeoTIFF
with rasterio.open("AvgSurfT_bkk.tif", "w", **out_meta) as dest:
dest.write(out_image)
# Get number of rows and columns
nrows, ncols = out_image.shape[1], out_image.shape[2]
# Get the bounding coordinates using the affine transform
lon_min, lat_max = xy(out_transform, 0, 0)
lon_max, lat_min = xy(out_transform, nrows - 1, ncols - 1)
# Create extent for imshow (left, right, bottom, top)
extent = [lon_min, lon_max, lat_min, lat_max]
# Plot raster and boundary
fig, ax = plt.subplots(figsize=(10, 6))
raster = ax.imshow(out_image[0], cmap='hot_r', vmin=300, vmax=320, extent=extent)
bangkok.boundary.plot(ax=ax, edgecolor='black', linewidth=0.5)
plt.colorbar(raster, ax=ax, label="Temperature (K)")
ax.set_title("AvgSurfT - Clipped to Bangkok with Boundary")
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.set_axis_on()
plt.savefig("AvgSurfT_bkk_with_boundary.png", dpi=300)
plt.show()
Figure: Clipped Bangkok temperature data with boundary
Conclusion
This tutorial has demonstrated how to work with GeoTIFF data offline using Python. You’ve learned how to:
- Download and save geospatial data locally
- Read and explore GeoTIFF properties
- Calculate basic statistics on raster data
- Create visualizations with proper geographic coordinates
- Clip raster data to specific regions
- Save and export your results
For questions or additional information, please leave a message below!
- Or download Python Jupyter notebook here: Jupyter notebook
Note: This tutorial assumes you have basic familiarity with Python programming and geospatial concepts. For more detailed explanations of specific functions, please refer to the documentation for rasterio, GeoPandas, and Matplotlib.
Processing and Visualising NetCDF Data
This guide provides comprehensive instructions for working with NetCDF (.nc) files in Python, including data loading, analysis, and creating various visualisations with a focus on meteorological and geospatial data.
Prerequisites — Download NetCDF file from our platform:
1. Installation of Required Libraries
First, install the necessary Python libraries:
pip install xarray netCDF4 matplotlib numpy scipy cartopy geopandas
xarray: For working with labeled multi-dimensional arrays (essential for NetCDF files)netCDF4: Backend for xarray to work with NetCDF filesmatplotlib: For creating visualizationsnumpy: For numerical operationsscipy: For statistical analysiscartopy: For advanced mapping capabilities (optional but recommended)geopandas: For working with geospatial data
2. Loading and Exploring NetCDF Data
import xarray as xr
import os
import matplotlib.pyplot as plt
import geopandas as gpd
import numpy as np
import pandas as pd
# Path to your NetCDF file
nc_file_path = r"C:\Users\ASUS\Jupyter\ALICE\data\AvgSurfT.nc"
# Open the dataset
ds = xr.open_dataset(nc_file_path)
# Explore the dataset structure
print("Dataset overview:")
print(ds)
# View available variables in the dataset
print(f"Available variables: {list(ds.data_vars)}")
# Extract the variable of interest
var_data = ds['AvgSurfT'] # Replace with your variable name if different
# Examine metadata and attributes
print(f"Variable attributes: {var_data.attrs}")
print(f"Dimensions: {var_data.dims}")
print(f"Coordinates: {var_data.coords}")
print(f"Shape: {var_data.shape}")
3. Basic Statistics and Data Exploration
# Calculate basic statistics
ds_mean_value = var_data.mean().values
ds_min_value = var_data.min().values
ds_max_value = var_data.max().values
ds_std_dev = var_data.std().values
# Get time range
min_date = str(var_data.time.values.min())[:10]
max_date = str(var_data.time.values.max())[:10]
# Display statistics
print(f"Time range: {min_date} to {max_date}")
print(f"Mean: {ds_mean_value:.4f} {var_data.attrs.get('units', '')}")
print(f"Min: {ds_min_value:.4f} {var_data.attrs.get('units', '')}")
print(f"Max: {ds_max_value:.4f} {var_data.attrs.get('units', '')}")
print(f"Std Dev: {ds_std_dev:.4f} {var_data.attrs.get('units', '')}")
# Additional spatial statistics
print("\nSpatial Analysis:")
spatial_mean = var_data.mean(dim=['lat', 'lon']).values
print(f"Mean across all locations: {spatial_mean.mean():.4f}")
# Temporal statistics
print("\nTemporal Analysis:")
if 'time' in var_data.dims:
temporal_mean = var_data.mean(dim='time')
print(f"Mean temporal value shape: {temporal_mean.shape}")
print(f"Max value in temporal mean: {temporal_mean.max().values:.4f}")
print(f"Min value in temporal mean: {temporal_mean.min().values:.4f}")
4. Spatial Visualisation with Geographic Boundaries
# Load Thailand boundary
thailand_boundary = gpd.read_file("https://geodata.ucdavis.edu/gadm/gadm4.1/json/gadm41_THA_0.json")
# Create a figure with two subplots
fig, axes = plt.subplots(1, 2, figsize=(14, 6), constrained_layout=True)
# Historical spatial average (left subplot)
temp = ds # Using the original dataset
temp['AvgSurfT'].mean(dim="time").plot(
cmap="hot_r", ax=axes[0], vmin=290, vmax=310,
cbar_kwargs={"label": "Historical AvgSurfT (K)"}
)
thailand_boundary.plot(edgecolor="black", facecolor="none", lw=0.7, ax=axes[0])
axes[0].set_title(f"Historical AvgSurfT\n{min_date} to {max_date}")
axes[0].set_xlabel("Longitude")
axes[0].set_ylabel("Latitude")
# Standard deviation plot (right subplot)
temp['AvgSurfT'].std(dim="time").plot(
cmap="viridis", ax=axes[1],
cbar_kwargs={"label": "AvgSurfT Standard Deviation (K)"}
)
thailand_boundary.plot(edgecolor="black", facecolor="none", lw=0.7, ax=axes[1])
axes[1].set_title(f"AvgSurfT Variability\n{min_date} to {max_date}")
axes[1].set_xlabel("Longitude")
axes[1].set_ylabel("Latitude")
# Save plot
fig.savefig(r"C:\Users\ASUS\Jupyter\ALICE\data\AvgSurfT\ncAvgSurfT_map.png", dpi=300, bbox_inches='tight')
plt.show()
5. Multiple Time Step Visualisation
# Get the number of available time steps
time_steps = len(var_data.time)
# Determine how many to plot (up to 6)
n_plots = min(time_steps, 6)
# Create a 1xn subplot
fig, axes = plt.subplots(nrows=1, ncols=n_plots, figsize=(4*n_plots, 4), constrained_layout=True)
# If only one subplot, make axes iterable
if n_plots == 1:
axes = [axes]
# Plot each time step in its subplot
for i, time_idx in enumerate(range(n_plots)):
ax = axes[i]
time_value = var_data.time.values[time_idx]
time_data = var_data.sel(time=time_value)
# Plot temperature and overlay boundary
time_data.plot(ax=ax, cmap="hot_r", vmin=280, vmax=315,
cbar_kwargs={'label': '' if i < n_plots-1 else 'AvgSurfT (K)'}) # Only last plot shows colorbar label
thailand_boundary.plot(ax=ax, edgecolor="black", facecolor="none", linewidth=0.7)
# Format time string for title
time_str = str(time_value)[:10] # Get YYYY-MM-DD format
ax.set_title(f"{time_str}")
ax.set_xlabel("")
ax.set_ylabel("" if i > 0 else "Latitude")
# Add longitude label only to the bottom plots
if i == n_plots // 2:
ax.set_xlabel("Longitude")
# Global title
plt.suptitle(f"AvgSurfT Maps ({min_date} to {max_date})", fontsize=16, y=1.05)
# Save the plot
plt.savefig(r"C:\Users\ASUS\Jupyter\ALICE\data\AvgSurfT\ncAvgSurfT_allmap.png", dpi=300, bbox_inches='tight')
plt.show()
6. Clip the NetCDF to a Region (Bangkok Boundary)
To properly clip NetCDF data to the Bangkok boundary, we need to ensure our geospatial data is correctly referenced:
import rioxarray as rio
# Ensure NetCDF data has proper spatial dimensions and CRS
ds_geo = temp['AvgSurfT'].rio.set_spatial_dims(x_dim="lon", y_dim="lat")
ds_geo = ds_geo.rio.write_crs("EPSG:4326") # Set to WGS84 coordinate system
# Filter GeoDataFrame to select only Bangkok
thailand_boundary = gpd.read_file("https://geodata.ucdavis.edu/gadm/gadm4.1/json/gadm41_THA_1.json")
bangkok = thailand_boundary[thailand_boundary["NAME_1"].isin(["BangkokMetropolis"])]
# Check that the boundary is in the correct CRS
if bangkok.crs != "EPSG:4326":
bangkok = bangkok.to_crs("EPSG:4326")
# Clip the NetCDF data to the Bangkok boundary
bangkok_data = ds_geo.rio.clip(bangkok.geometry, bangkok.crs)
# Calculate temporal mean for visualization
bangkok_mean = bangkok_data.mean(dim='time')
# Visualize the clipped data
plt.figure(figsize=(10, 8))
im = bangkok_mean.plot(cmap='hot_r', vmin=300, vmax=315, add_colorbar=False) # Remove default color bar
plt.colorbar(im, shrink=0.8) # Add custom-sized color bar (adjust shrink value as needed)
bangkok.boundary.plot(edgecolor='black', linewidth=1.5, ax=plt.gca())
plt.title(f"Average Surface Temperature in Bangkok\n{min_date} to {max_date}")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.tight_layout()
# Save the clipped data if needed
plt.savefig("Bangkok_AvgSurfT.png", dpi=300, bbox_inches='tight')
plt.show()
Summary and Conclusion
When working with NetCDF (.nc) files in Python, remember these key steps:
- Initial Exploration: Open the file with xarray and examine its structure, variables, and dimensions.
- Data Extraction: Select the specific variables and regions of interest from the dataset.
- Statistical Analysis: Calculate basic statistics and perform temporal/spatial analysis.
- Visualisation: Create meaningful plots to visualise your data (time series, maps, etc.).
- Data Export: Save processed data and results in appropriate formats.
By following this guide, you should be able to effectively process, analyse, and visualise data from NetCDF files for meteorological, climate, and geospatial applications.
Remember to always check the coordinate systems, units, and metadata of your NetCDF files, as these can vary depending on the source and type of data.
- To download the python script please click:Python Time Series
This concludes our comprehensive guide on reading, processing, and visualizing spatial data in Python. By following these techniques, you can effectively work with both GeoTIFF and NetCDF data formats offline, enabling sophisticated geospatial analysis even without internet connectivity.







