Containerising AEMO’s PDR: Running 2000s Java in a 2025 Cloud Stack

Moby Dock(er)

Written by Sebastian Machuca, Head of Engineering at Hachiko

I. Introduction – When FTP Met Docker

Australia’s energy market is one of the most open and sophisticated in the world. The architecture AEMO has designed is pragmatic, resilient, and leans heavily on proven infrastructure: CSV files delivered over FTP, gated by VPNs, and updated on a regular schedule. It is simple, effective, and has stood the test of time.

To help participants work with this data, AEMO provides a set of Java applications that monitor the FTP server, download new files, and load them into a relational database. It is not flashy, but it is reliable, and has been operating successfully for more than twenty years.

More importantly, the data itself is wide. AEMO publishes deep, granular information across the entire market landscape: participant identifiers, bids, demand forecasts, dispatch outcomes, P5MIN prices, regional emissions, generation mix. Much of it updates every five minutes. For energy-aware software teams, this level of operational transparency is rare. Most markets do not even try.

This openness is a quiet triumph. And it is worth acknowledging.

Now, if you are a software engineer in a start-up or a modern cloud-native team, and you are trying to integrate with AEMO’s price feed to build against the Australian spot market, there is a moment that hits you early. If you have never dealt with early 2000s Java tooling, it usually lands somewhere between receiving your welcome email and Googling how to run the PDR Loader, when you realise: “This was not built for me…”

And you would be right.

The tooling assumes a world that predates Docker. There is no SDK, no published container images, no example docker-compose.yaml. The standard approach is to pull down binaries from an FTP server, run them on a virtual machine, and manage configuration and scheduling manually.

This is not a complaint. It is a design. AEMO delivers data, not infrastructure. The tools expect you to adapt your environment to their release process. That decision has scaled remarkably well across a complex and diverse market. But for modern teams used to declarative config, ephemeral runners, and reproducible builds, it leaves a gap.

This post is meant to bridge that gap. Thank us later.

II. The PDR Model – Clever, Clunky, and Kind of Beautiful

A. What works:

The Participant Data Replication (PDR) model is one of those things that makes more sense the longer you stare at it. It doesn’t resemble modern web integrations, and it’s definitely not what most engineers expect when thinking about real-time data delivery. But once you understand its constraints and goals, there’s a certain beauty in how deliberately minimal it is.

At its core, the architecture does something clever: it separates the act of publishing from the act of processing. AEMO operates the shared file infrastructure and handles the preparation and publication of structured data. Each participant is given access to this data via a managed file server, based on their subscription. From that point forward, it is entirely up to each organisation to decide how, when, and where to ingest and work with it.

The choice of CSV over FTP might seem outdated, but it turns out to be quite effective. These protocols are simple, widely supported, and operationally stable. You don’t need to work with message brokers, schemas, or real-time APIs. You need a VPN, an FTP client, and a cron job.

It’s a system designed for distribution, not orchestration. There is no webhook to configure, no event stream to manage, no API throttling to consider. AEMO’s responsibility is to publish the data. Your responsibility is to collect it, parse it, and do something useful with it.

By avoiding real-time delivery or hosted compute, AEMO has sidestepped the operational and support burden of large-scale infrastructure. There is no central pipeline to maintain on your behalf. Just structured files, published on schedule, waiting to be consumed.

This is deliberate. It is minimalism with intent. Or at least, it seems to be.

B. What hurts:

But while the data distribution model is clever, the developer experience around it is clunky.

There is no support for containers or modern workflows. There is no reference Docker image, no documented health checks, no .env template, and no integration path for CI pipelines. The implicit expectation is that you’re running this on a long-lived server, configuring things manually.

The documentation is fragmented. Some of it is delivered as PDFs, some as nested help portals, some as internal-style diagrams that assume familiarity with the old way of doing things, with hyperlinks that lead you nowhere. Even when the right file exists, it’s not always clear how to connect the dots.

The tools reflect a Java-first mindset. If you’re not already comfortable running .jar files, managing Java classpaths, or wrapping batch scripts around JVM options, you’re on your own. Developers coming from Go, Python, or container-native workflows won’t find much to scaffold from.

And finally, the release process is built for humans, not systems. You are expected to manually retrieve updates from the /Releases FTP folder, inspect the contents, and upgrade your environment accordingly. There’s no version feed, no semantic versioning, and no automation guidance.

It’s not broken. But it’s also not easy.

C. AEMO’s ethos:

AEMO describes its purpose as:

“Our purpose is to ensure safe, reliable and affordable energy and enable the energy transition for the benefit of all Australians.”

This isn’t a critique of that mission. The system works, and the design reflects decades of real-world operational experience. AEMO’s data infrastructure is stable, transparent, and has enabled an entire industry of energy innovation that is the envy of the world (alongside our weather and crocodiles).

But if enabling the energy transition includes supporting the next wave of software teams — the ones building optimisation tools, automations, and real-time decision systems — then the developer experience matters too.

Modern engineering teams expect reproducibility, automation, and composability. They want systems they can test, containerise, and integrate. These aren’t luxuries. They’re foundations. And if AEMO’s goal is to empower stakeholders to innovate on top of the energy system, it’s time to meet engineers where they are.

III. What You’re Actually Dealing With – PDR’s Architecture in Plain Terms

A. The Three Tools:

  • PDR Batcher: Brings market data files from AEMO’s FTP server to your local storage (EFS, S3, etc.). It authenticates, scans directories, and downloads zip archives on schedule.

  • PDR Loader: Loads the downloaded files from local storage into your relational database, parsing formats based on the MMS Data Model. The Loader does not fetch files — it expects them to be present locally.

  • PDR Monitor: Oversees both the Batcher and Loader. It provides health status visibility and centralises configuration for the Loader.

B. The file infrastructure:

  • /Releases: This is where AEMO publishes new versions of PDR tools. It’s hosted on their FTP server. Importantly, only the pre-production server exposes these packages — the IP listed in their docs points there.

  • MMS Data Model: Found at ftp://146.178.211.25/Releases/MMS%20Data%20Model, this is required before the Loader can run properly. It defines the relational schema expected by the incoming market files.

  • Versioning: Releases are bundled by versioned folders. The structure is consistent, but it’s not self-describing. You’ll often see something like:
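A typical layout looks roughly like this (folder and file names here are illustrative; the exact naming varies by tool and release):

```text
/Releases/
├── PDR Batcher/
│   └── v4.x/                  <- the version lives in the folder name
│       ├── PDRBatcher.zip
│       └── ChangeLog.txt
└── PDR Loader/
    └── v4.x/
        ├── PDRLoader.zip
        └── ChangeLog.txt
```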

Each folder contains a zip archive, a changelog, and assorted JARs and config templates.

C. How it’s “meant” to be run (If You’re Set Up Like It’s 2003):

Each tool ships as a standalone Java application. You’re expected to download a zip file from an FTP server, unzip it onto a dedicated machine, and edit .properties files directly. The versioning lives in the folder name. Upgrades mean pointing your batch script at a new directory.

To their credit, the tools do support some modern conveniences. You can inject environment variables into the configs, and there’s even optional support for HashiCorp Vault. But everything else about the setup feels like a throwback.

It’s the kind of system that assumes one server per app, lovingly maintained by hand. No containers. No orchestration. No CI/CD. Just shell scripts in rc.local, or, if someone on the team had a lapse in judgment, a Windows service with a GUI and a prayer. If something crashes, it stays crashed until a human notices and manually kicks it back to life.

The software handles retries, logging, and persistence internally. But you are still responsible for provisioning, lifecycle management, and ensuring everything recovers when it doesn’t. It’s not infrastructure-as-code. It’s infrastructure held together by hope, habit, and half-remembered bash scripts.

If your world runs on ephemeral compute, reproducible builds, and declarative deployments, integrating this will feel like assembling IKEA furniture with ancient Roman tools. It can be done. But not without a few headaches and some hair-pulling.

How you look after integrating PDR

IV. From Zip Files to Containers – Running AEMO Tools in 2025

Containerising the PDR suite isn’t hard — it just means retrofitting best practices onto something that predates them.

You don’t need to touch the JARs. You don’t need to rebuild anything. What you do need is to bring modern discipline to a legacy layout: replace hardcoded config values with ${env:VAR} placeholders, wire in secrets through environment variables, and clean up the shell scripts that assume they’re running on Windows XP.

It’s safe (and encouraged) to delete all the .bat files and Windows-only extras. You’ll also want to normalise line endings — many of the files ship with Windows CRLF, which will break your build or runtime if left uncorrected. A quick pass of dos2unix over all .properties, .sql, and .sh files usually does the trick:
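Something like the following, run from the root of the extracted tool (the sed branch is a fallback for images where dos2unix isn’t installed):

```shell
# Normalise CRLF -> LF on every config, SQL, and shell file in the tree.
# Falls back to sed where dos2unix isn't available.
find . -type f \( -name '*.properties' -o -name '*.sql' -o -name '*.sh' \) \
  -exec dos2unix {} + 2>/dev/null \
|| find . -type f \( -name '*.properties' -o -name '*.sql' -o -name '*.sh' \) \
  -exec sed -i 's/\r$//' {} +
```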

From there, it’s just a matter of using a sensible base image, setting up a clean entrypoint, and ensuring logs land where your platform expects them.

Common Setup Across All Components

All three PDR applications follow the same shape: you extract a ZIP, point a shell script at some properties files, and it spins up a long-running Java process. Dockerising that pattern just means cleaning up what’s there and leaning on environment injection instead of hand-edits.

Start with a sane base image like eclipse-temurin or amazoncorretto. You don’t need Maven, Tomcat, or anything bloated — just a JDK and a pulse.
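As a sketch, the Dockerfile for any of the three apps can be very small. Paths, the Java version, and the entrypoint script name below are assumptions based on our own setup — adjust them to the ZIP you actually extracted:

```dockerfile
# Minimal JDK base image — no Maven, no Tomcat, nothing bloated.
FROM eclipse-temurin:17
WORKDIR /opt/pdr

# Ship the JARs and configs inside the image; no mounts needed for these.
COPY Lib/ ./Lib/
COPY *.properties log4j.xml pdrLoader.sh ./

# Strip CRLF endings and make the entrypoint executable.
RUN sed -i 's/\r$//' pdrLoader.sh *.properties && chmod +x pdrLoader.sh

# Stick to /bin/sh — the stock scripts don't need bash.
ENTRYPOINT ["/bin/sh", "./pdrLoader.sh"]
```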

Copy in the JARs and the relevant *.properties and log4j.xml files. If you’re being responsible, you’ll patch those configs to use ${env:XXX} instead of hardcoded values (see example). No need to mount them; they can ship inside the image.
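For example, a database section of a .properties file can be patched like this. The property keys shown are illustrative — keep whatever keys your shipped file uses and swap only the values:

```properties
# Before (shipped default, hardcoded):
#   Db.Url=jdbc:postgresql://10.0.0.5:5432/pdr
#   Db.Password=changeme

# After (injected at runtime by the orchestrator):
Db.Url=jdbc:postgresql://${env:DB_HOST}:${env:DB_PORT}/${env:DB_NAME}
Db.User=${env:DB_USER}
Db.Password=${env:DB_PASSWORD}
```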

AEMO refactored their Java packages but didn’t update the default config. If you’re using PostgreSQL, you must fix this manually:
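We can’t reproduce the exact class names here (they differ by release), but the shape of the fix is: find the database-handler class reference in the shipped .properties and update its package path to match what is actually inside the refactored JARs. All names below are illustrative:

```properties
# Shipped default points at the old package path (illustrative names):
#   Loader.DbHandlerClass=au.com.aemo.pdr.PostgresHandler
#
# Point it at the class actually present in the refactored JARs —
# inspect with: unzip -l Lib/*.jar | grep -i postgres
Loader.DbHandlerClass=au.com.aemo.pdr.db.postgres.PostgresHandler
```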

Otherwise you’ll get a mysterious class-not-found error.

Skip the .bat scripts and anything that smells like Windows. In fact, if you haven’t yet, run dos2unix across the board: shell scripts, SQL migrations, and config files. They ship with carriage returns from the XP era.

For entrypoints, stick to the provided pdr*.sh scripts — just make sure they use /bin/sh, not /bin/bash, if you’re on amazoncorretto.

The apps log to files by default. You can fix that with a simple log4j.xml tweak to emit JSON or console output. Structured logs work better with modern observability stacks anyway.

Here’s how we enabled JSON output:
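In our setup this is a Log4j 2 configuration with a console appender using JsonLayout — which is what the Jackson JARs listed below are for. This is a simplified sketch (the LOG_FORMAT toggle is omitted), and the appender name and env-variable defaults are our own conventions:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Configuration>
  <Appenders>
    <!-- Structured JSON to stdout, one event per line -->
    <Console name="JsonConsole" target="SYSTEM_OUT">
      <JsonLayout compact="true" eventEol="true"/>
    </Console>
  </Appenders>
  <Loggers>
    <!-- Level driven by the LOG_LEVEL env var, defaulting to INFO -->
    <Root level="${env:LOG_LEVEL:-INFO}">
      <AppenderRef ref="JsonConsole"/>
    </Root>
  </Loggers>
</Configuration>
```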

You can toggle formats using an environment variable like LOG_FORMAT=json-console, and set levels with LOG_LEVEL=INFO, DEBUG, etc. It’s clean, structured, and plays nicely with log shippers.

Important: JSON logging requires Jackson. Make sure you include the following JARs in your Lib/ folder:

  • jackson-core-2.17.0.jar

  • jackson-databind-2.17.0.jar

  • jackson-annotations-2.17.0.jar

Finally, make sure you carve out mount points for the input/output folders. These aren’t just temporary dirs — they’re how batcher hands off to loader, and how loader queues up re-requests. Use EFS, S3, or something persistent.

Containerising PDR Batcher

Batcher connects to the AEMO FTP and writes ZIPs to disk. Expect to inject:

  • BATCHER_REMOTE_HOST, BATCHER_REMOTE_USERNAME, BATCHER_REMOTE_PASSWORD

  • BATCHER_DATA_DIR — internal holding area

  • SHARED_DATA_DIR — shared staging folder with loader

  • INSTANCE_IDENTIFIER

Volume mounts must allow bidirectional access between batcher and loader:
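In docker-compose terms (service, volume, and path names here are our choices, not AEMO defaults), that looks like:

```yaml
services:
  pdr-batcher:
    volumes:
      - pdr-shared:/mnt/Reports     # SHARED_DATA_DIR — batcher writes ZIPs here
      - batcher-data:/opt/pdr/data  # BATCHER_DATA_DIR — internal holding area
  pdr-loader:
    volumes:
      - pdr-shared:/mnt/Reports     # loader reads ZIPs, writes re-requests

volumes:
  pdr-shared:
  batcher-data:
```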

Containerising PDR Loader

Loader reads from a shared mount (e.g., /mnt/Reports) and writes to your DB. It requires a working MMS config and a PDR data model preloaded. You’ll typically inject:

  • DB_HOST, DB_PORT, DB_NAME, DB_USER, DB_PASSWORD

  • INSTANCE_IDENTIFIER — used for distinguishing logs/events

  • SHARED_DATA_DIR — used for ZIP input, trickle reports, and re-requests

  • LOG_FORMAT, LOG_LEVEL — useful for structured output

Mount layout should align with what batcher writes:
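Concretely, both containers should see the same tree. Subfolder names below are illustrative — the loader creates what it needs on first run:

```text
/mnt/Reports/                # SHARED_DATA_DIR in both containers
├── *.zip                    # written by batcher, consumed by loader
├── Trickle/                 # trickle reports (illustrative name)
└── ReRequests/              # re-request queue written by loader (illustrative name)
```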

The loader should be set as the entrypoint using pdrLoader.sh. Log to stdout in JSON, and inject config via ${env:...} syntax.

Database Migrations for pdr-loader

Unlike pdr-batcher — which doesn’t touch a database — pdr-loader requires a working schema before it can ingest anything. Two upstream sources define this schema:

  1. PDR DDL, shipped alongside the loader in the Lib/ folder.

  2. MMS Data Model, provided separately under /Releases/MMS Data Model/ on the AEMO FTP server.

Each is delivered as raw .sql — no versioning, no annotations, and no tooling guidance.

We track each file explicitly in Git using timestamped Atlas-style migrations:
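Our repo layout looks like this (the timestamps and file names are our own convention, since the upstream files carry none):

```text
migrations/
├── 20240101000000_pdr_ddl.sql          # from the loader's Lib/ folder
├── 20240101000100_mms_data_model.sql   # from /Releases/MMS Data Model/
└── atlas.sum                           # integrity file maintained by Atlas
```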

We apply these in CI/CD using Atlas, with no manual steps. Here’s how that looks in docker-compose.yaml:
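A sketch of that flow: Postgres, an Atlas migration step, then the loader. Image tags, credentials, and paths are placeholders for illustration:

```yaml
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_DB: pdr
      POSTGRES_USER: pdr
      POSTGRES_PASSWORD: pdr
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U pdr -d pdr"]
      interval: 5s
      retries: 10

  migrate:
    image: arigaio/atlas:latest
    command: migrate apply --dir file://migrations --url "postgres://pdr:pdr@db:5432/pdr?sslmode=disable"
    volumes:
      - ./migrations:/migrations
    depends_on:
      db:
        condition: service_healthy

  pdr-loader:
    build: ./pdr-loader
    depends_on:
      migrate:
        condition: service_completed_successfully
```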

This ensures:

  • The database is created and healthy.

  • Migrations are applied once, cleanly.

  • Loader only starts after the schema is ready.

By treating DDL as first-class, versioned, testable infrastructure, we avoid surprises and ship with confidence, even when the upstream inputs are informal and unlabelled.

Environment-Specific Configuration and MMS Ingestion

One subtle but critical aspect of running PDR Loader correctly is setting the INSTANCE_DATA_SOURCES variable. This setting controls which data sources the instance will accept, and it must match your target AEMO environment. The shipped .properties file includes this reference:
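Paraphrased, the relevant section looks like this. The key name is illustrative, and the placeholders stand in for the source codes the shipped file actually documents:

```properties
# The shipped file documents the valid source codes per environment:
#   Production:     <production source codes>
#   PreProduction:  <pre-production source codes>
InstanceManager.DataSources=${env:INSTANCE_DATA_SOURCES}
```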

We inject this as an environment variable:
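In the loader’s service definition, that amounts to something like the following (the values are placeholders — use the codes documented in your shipped .properties for the environment you target):

```yaml
services:
  pdr-loader:
    environment:
      # Comma-separated list; must match the target AEMO environment.
      INSTANCE_DATA_SOURCES: "<source-code-1>,<source-code-2>"
```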

You’ll need to explicitly set INSTANCE_DATA_SOURCES to the correct comma-separated list depending on which dataset (Production or PreProduction) you’re targeting. Without this, Loader will ignore relevant ZIP files even if everything else is configured correctly.

Loading PDR configuration

Once the container is running, you still need to seed it with MMS configuration. AEMO publishes two folders under /Releases/MMS Data Model: one for Production and one for PreProduction. Each contains its own create and upgrade directory, along with a versioned PDR config ZIP file like:
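The path and name pattern look roughly like this (the version segment is a placeholder — take the latest ZIP from the folder matching your environment):

```text
/Releases/MMS Data Model/<Production|PreProduction>/PDR_Config_<version>.zip
```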

After the loader is fully up and running, you must place this ZIP file inside the Reports/ directory (or the folder pointed to by your SHARED_DATA_DIR environment variable). The loader will detect the file, unzip it, and apply its contents.

This step is critical: it populates metadata tables such as PDR_REPORT_TYPE_CONFIG, which are required for the loader to recognise and process incoming ZIP files. Without it, ingestion may fail silently: no records will appear in the database even though the schema is correct.

For development and testing, we recommend treating this ZIP as a fixture and checking it into your Git repo alongside the DDLs. That way, your CI pipeline can apply schema changes and load the config in the same flow, ensuring everything is validated before reaching production.

Containerising PDR Monitor

Monitor is the least operationally critical but often the most awkward. It watches the loader and batcher, exposes a local web UI, and emits status information via logs — but not via any structured API.

The config format also deviates: unlike the .properties files used by loader and batcher, Monitor ships an XML-based config file (Config.xml). Fortunately, it still supports ${env:...} syntax for environment injection — so you can manage secrets and runtime values cleanly without resorting to custom tooling.

Here’s an excerpt of how database configuration looks:
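For example (the element names here are illustrative — keep the structure your shipped Config.xml uses and swap only the values):

```xml
<!-- Database section of Config.xml, with values injected from the environment -->
<Database>
  <Url>jdbc:postgresql://${env:DB_HOST}:${env:DB_PORT}/${env:DB_NAME}</Url>
  <Username>${env:DB_USER}</Username>
  <Password>${env:DB_PASSWORD}</Password>
</Database>
```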

You can inject other paths, ports, and options the same way. For example, the web UI root directory can be passed via PROJECT_ROOT, e.g.:
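For instance (again, the element name is illustrative):

```xml
<!-- Web UI root directory, injected via the PROJECT_ROOT env var -->
<WebRoot>${env:PROJECT_ROOT}/web</WebRoot>
```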

Just like the other apps:

  • Stick with the provided pdrMonitor.sh script as your container entrypoint.

  • Use /bin/sh, not /bin/bash, to avoid compatibility issues with amazoncorretto images.

  • Redirect logs to stdout and enable structured logging using log4j.xml, same as the other containers.

  • Mount any required folders (e.g., for reports or requests), and use a shared Docker/ECS network to allow Monitor to reach the other two services.

Heads-up: the tools support vault-style secrets integration (e.g., AWS Secrets Manager), but we recommend handling secrets via orchestrator-level injection (env vars or mounted secrets) rather than wiring vault logic directly into the app.

V. Lessons Learned – Painful Paths and Cleaner Solutions

1. VPNs, Not Tokens

FTP access requires a site-to-site VPN using IKEv2 and IPsec: no bearer tokens, no password auth, no modern service credentials. Setup involves BGP configuration, shared secrets, and a change window with AEMO to align on encryption parameters. This isn’t unusual in regulated sectors, but it’s worth planning for.

2. You’ll Need to Bring Reproducibility Yourself

The tools were not designed for ephemeral hosts or cloud-native environments, and that is expected. What mattered was deciding how to standardise configuration, inject secrets, and maintain portability without modifying upstream artifacts. Once that scaffolding was in place, our environment became testable and consistent.

3. CI/CD Enables Change Confidence

We still download new PDR releases manually from AEMO’s FTP. But by versioning configs and DDLs in Git, validating schema changes through CI pipelines, and rebuilding containers with each release bump, we can absorb upstream changes with confidence, even without full automation (yet).

VI. Why AEMO Should Open-Source PDR – The Fastest Way to Unlock Value

The PDR suite is distributed to market participants (and some non-participants) but not openly developed. That is a missed opportunity, not just for market participants, but for AEMO itself. As a non-profit entity serving the broader energy ecosystem, enabling community contributions could reduce duplicated effort, accelerate delivery improvements, and strengthen the overall reliability of the tools we all depend on.

Today, each participant unpacks the same ZIP files, patches the same shell scripts, wrestles with the same brittle defaults, and builds their own container images in isolation. That effort is duplicated across the industry. It doesn’t need to be.

Open-sourcing the PDR tools, or even just opening their build process, would enable the community to contribute improvements that directly benefit AEMO’s operational goals: higher adoption, faster support resolution, and greater confidence in the data pipeline. Instead of fielding fragmented support queries, AEMO could point to a shared, version-controlled repository with tagged releases and clear deployment patterns.

Even a modest step — publishing official Dockerfiles and property templates — would save hours per participant and weeks of collective effort across the ecosystem. Those who want to run PDR in cloud-native environments (like we do) would not need to “reverse-engineer” batch scripts designed for Windows. We would just use the container AEMO publishes, inject our own config, and go.

This is not about intellectual property. The logic in PDR is not proprietary. It is a delivery mechanism for publicly governed data. And the existing workaround for many non-participants is to download CSVs and write fragile parsers. That is not secure, not reliable, and not what an energy market should settle for.

AEMO has already done the hard work of implementing the business logic. By letting the community help with the packaging, deployment, and testability, they would not be giving up control. They would be upgrading their delivery.

The fastest way to improve the PDR experience is to stop solving the same problem in private and start solving it together.

VII. Closing – Bridging a 20-Year Gap With a Container

PDR is not broken. It’s just from another era. And that’s fine — as long as you know how to bring it forward.

What we’ve shown here is that you don’t need to rewrite the tools to make them work in a modern stack. With careful defaults, reproducible config, and containerised deployment, you can integrate AEMO’s legacy apps into today’s workflows without compromising reliability.

But the real opportunity is bigger than just one team’s solution. The more PDR becomes a shared, composable, testable foundation, the easier it gets for others to build on top of it — whether that’s loading dispatch prices into a lakehouse, running optimisation models in real time, or simply validating config before prod.

The market already benefits from open data. It’s time the tooling caught up.

Let’s modernise this together.
