THSF time series

How to read/process/plot time series data using Python

Time series data is essential for tracking changes over time—whether it’s climate measurements, hydrological trends, or environmental indicators. In geospatial workflows, we often work with time series from different sources such as Google Earth Engine (GEE) or Excel spreadsheets.

This guide walks you through two practical workflows to handle time series data using Python:

  1. From Google Earth Engine to Python

  2. From Excel-based datasets (e.g., basin or district-level stats)

Each tab provides step-by-step instructions for reading the data, converting date fields, analysing statistics, and visualising over time.

Read Time Series Data from GEE into Python

Google Earth Engine (GEE) is a powerful platform for analysing geospatial data. It allows users to store, process, and visualise satellite imagery and other geospatial datasets. Sometimes, you may export time series data (e.g., climate, land use, temperature) from GEE as a FeatureCollection (table format). This guide walks you through how to read and analyze that time series table data in Python using earthengine-api and pandas.

Whether you’re tracking surface temperature over time or rainfall across basins, this guide gives you the tools to process and visualize it with ease.


 1. Install Required Libraries

Make sure you have the necessary packages installed:

pip install earthengine-api pandas matplotlib

Optional: If using Jupyter, also install ipython, notebook, or jupyterlab.


 2. Authenticate and Initialize Earth Engine

Before using Earth Engine in Python, you must authenticate your Google account:

import ee

# First-time authentication
ee.Authenticate()

# Initialize the Earth Engine API
ee.Initialize()

 3. Load Your GEE Table Asset (Time Series Data)

You’ll need the path to your GEE table asset. It might look like this:

projects/ee-<username>/assets/<asset_name>

Example:

asset_path = "projects/ee-udomporn/assets/AvgSurfT_basin"
basin_fc = ee.FeatureCollection(asset_path)

 4. Convert to pandas DataFrame

Once the asset is loaded, extract the features and convert them to a DataFrame:

import pandas as pd

features = basin_fc.getInfo()['features']
records = [f['properties'] for f in features]

basin_df = pd.DataFrame(records)

 5. Parse Dates and Set Time Index

If your table includes 'Year', 'Month', and 'Day' columns, combine them into a datetime column:

# Convert to datetime and set index
basin_df['Date'] = pd.to_datetime(basin_df[['Year', 'Month', 'Day']])
basin_df.set_index('Date', inplace=True)

# Optional: remove separate date parts
basin_df.drop(columns=['Day', 'Month', 'Year'], inplace=True, errors='ignore')

 6. Explore Statistics

Get basic stats like mean, min, max, and standard deviation for each region:

# Select only numeric columns
numeric_data = basin_df.select_dtypes(include='number')

# Basic stats
stats = pd.DataFrame({
    'Mean (K)': numeric_data.mean(),
    'Min (K)': numeric_data.min(),
    'Max (K)': numeric_data.max(),
    'Std Dev (K)': numeric_data.std()
}).round(2)

print(stats)

Optional: visualise the average by basin:

stats['Mean (K)'].sort_values().plot(kind='barh', figsize=(10, 6), color='skyblue')
plt.title('Average Surface Temperature by Basin')
plt.xlabel('Temperature (K)')
plt.grid(True)
plt.tight_layout()
plt.show()

 7. Visualise Time Series (Example)

Here’s a basic time series plot for one or more basins:

▶️ Plot a single basin:

import matplotlib.pyplot as plt

plt.figure(figsize=(10, 4))
plt.plot(basin_df.index, basin_df['Chi'], label='Chi Basin', color='tab:blue')
plt.title('Surface Temperature - Chi Basin')
plt.xlabel('Date')
plt.ylabel('AvgSurfT (K)')
plt.grid(True)
plt.legend()
plt.tight_layout()
plt.show()



▶️ Subplot multiple basins:

basins = ['Chi', 'Mun', 'Ping']
fig, axes = plt.subplots(1, 3, figsize=(18, 4), constrained_layout=True)

for i, basin in enumerate(basins):
    axes[i].plot(basin_df.index, basin_df[basin], label=basin)
    axes[i].set_title(basin)
    axes[i].set_xlabel('Date')
    axes[i].set_ylabel('AvgSurfT (K)')
    axes[i].grid(True)

plt.suptitle('AvgSurfT - Selected Basins')
plt.show()



8. Saving Processed Data

# Save the processed data to a new Excel file
output_path = "processed_basin_data.xlsx"
basin_df.to_excel(output_path)
print(f"Processed data saved to {output_path}")

# Save plots as image files
fig.savefig("basin_plot.png", dpi=300, bbox_inches='tight')
print("Plot saved as image file")

 9. Conclusion

You’ve now learned how to:

  • Access time series data stored in Google Earth Engine
  • Convert it into a pandas DataFrame
  • Parse datetime fields and explore key statistics
  • Create simple and multi-panel time series visualisations

Processing and Visualising Time Series Data from Excel

This guide will walk you through the process of reading, processing, and visualising time series data from Excel files using Python. We’ll cover installation of necessary libraries, data loading, preprocessing, statistical analysis, and creating various visualizations.

Prerequisites — Download Excel file from our platform:




Then will get .tar.gz zip file next extract into your local drive.

1. Installation of Required Libraries

First, install the necessary Python libraries:

pip install pandas matplotlib numpy scipy seaborn openpyxl
  • pandas: For data manipulation and analysis
  • matplotlib: For creating visualizations
  • numpy: For numerical operations
  • scipy: For statistical analysis
  • seaborn: For enhanced visualizations
  • openpyxl: Backend for pandas to work with Excel files

2. Loading Data from Excel

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from scipy import stats

# Set the path to your Excel file
basin_excel_path = r"C:\data\AvgSurfT\AvgSurfT_basin.xlsx"

# Preview the data
basin_df = pd.read_excel(basin_excel_path)
print(basin_df.head())  # Display the first 5 rows
print(basin_df.columns)  # Display column names

# Re-read the Excel file, skipping the first row (for Thai column headers)
basin_df = pd.read_excel(basin_excel_path, skiprows=1)
print(basin_df.columns)  # Display column names after skipping first row

3. Data Preprocessing

# Convert date-related columns to integers
basin_df['Year'] = basin_df['Year'].astype(int)
basin_df['Month'] = basin_df['Month'].astype(int)
basin_df['Day'] = basin_df['Day'].astype(int)

# Convert to datetime and set as index
basin_df['Date'] = pd.to_datetime(basin_df[['Year', 'Month', 'Day']])
basin_df.set_index('Date', inplace=True)

# Drop original date columns if desired
basin_df.drop(columns=['Year', 'Month', 'Day'], inplace=True)

# Check the processed data
print(basin_df.info())
print(basin_df.describe())

4. Exploratory Data Analysis

# Check for missing values
print("\nMissing values in each column:")
print(basin_df.isna().sum())

# Basic statistics for each basin
print("\nBasic statistics:")
print(basin_df.describe())

# Check the temporal coverage
print("\nData spans from:", basin_df.index.min(), "to", basin_df.index.max())

# Check for outliers using Z-score
z_scores = stats.zscore(basin_df)
potential_outliers = (abs(z_scores) > 3).any(axis=1)
print("\nNumber of potential outliers:", potential_outliers.sum())

# Monthly and seasonal patterns
basin_df['Month'] = basin_df.index.month
basin_df['Year'] = basin_df.index.year

# Calculate monthly averages
monthly_avg = basin_df.groupby('Month').mean()
print("\nMonthly averages:")
print(monthly_avg)

# Remove the added columns if you don't need them anymore
basin_df.drop(columns=['Month', 'Year'], inplace=True, errors='ignore')

5. Basic Visualisations

Single Basin Visualisation

plt.figure(figsize=(10, 4))
plt.plot(basin_df.index, basin_df['Chi'], color='tab:blue')
plt.title('AvgSurfT - Chi Basin')
plt.xlabel('Date')
plt.ylabel('AvgSurfT (K)')
plt.grid(True)
plt.tight_layout()
plt.show()



Multiple Basins Comparison

basins_to_plot = ['Chi', 'Mun', 'Ping']
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(18, 4), sharey=True)

for i, basin in enumerate(basins_to_plot):
    axes[i].plot(basin_df.index, basin_df[basin], label=basin, color=f'tab:{["blue", "orange", "green"][i]}')
    axes[i].set_title(f'{basin} Basin')
    axes[i].set_xlabel('Date')
    axes[i].grid(True)

axes[0].set_ylabel('AvgSurfT (K)')
plt.suptitle('AvgSurfT - Selected Basins', fontsize=16)
plt.tight_layout(rect=[0, 0, 1, 0.95])
plt.show()




6. Saving Processed Data

# Save the processed data to a new Excel file
output_path = "processed_basin_data.xlsx"
basin_df.to_excel(output_path)
print(f"Processed data saved to {output_path}")

# Save plots as image files
fig.savefig("basin_plot.png", dpi=300, bbox_inches='tight')
print("Plot saved as image file")

By following this comprehensive guide, you can effectively process, analyse, and visualise time series data from Excel files using Python. The techniques demonstrated can be applied to various types of time series data, not just surface temperature measurements.