CoE Development Update: July 2023

Greetings, Elyrians!

After four and a half months of scaffolding and building the necessary prerequisites, I've finally rebuilt the backend server for Chronicles of Elyria and Kingdoms of Elyria. Welcome to the exciting July edition of the monthly CoE Development Update!

Soulbound Studios' Mission Statement is to:

"Enrich peoples' lives with experiences that fuel imagination and provide accessible, engaging, and meaningful interaction through entertainment and technology."

The core of Soulbound Studios' games lies in multiplayer evolving online worlds. But you can only have online worlds with a server platform to host the world. So earlier this year, I began working on improvements to the Server-Side Game Engine and the Server Platform to prepare the world for being hosted as part of a distributed system. That is, to make CoE and KoE capable of being called MMOGs.

In the March Development Update, I started laying out a plan that included building a spatial partitioning system for the Soulborn Engine, as well as the implementation of a service host and networking layer. After several months, the game server is once again up and running!

To learn more about what work has been done, what the implications and possibilities are, and to learn more about what's next, either follow the links below or keep reading!

  1. Progress Update
  2. The Distributed ECS Actor System in Action!
  3. A Different Perspective
  4. Implications and Possibilities
  5. What's Up Next?

Progress Update

As usual, let's look at the CoE Scope Document to understand what work has been done over the last month.

Scope Document

As you can see from the screenshot, a lot has been worked on this last month! In particular, the following systems have been highlighted as things that have been completed:

  • Repository Synchronization
  • Client Messaging Library
  • Communication Protocol (Message Classes)
  • Service Host
  • Transport Protocol
  • Edge / Portal Service
  • Health Monitoring Service

Now, while I've marked those all as complete, the reality is that they still need to be made production-ready. Many of them still have bugs that need fixing, and all of them still need stress testing and a security audit.

Even so, that's a ton of work completed. And as you'll see from the media below, it's mostly functional. More on that later. Since those are all a bunch of technical terms that mean little on their own, let's look at my work in action.

The Distributed ECS Actor System in Action!

This first video shows me adding 40k+ entities to the world. While last month's update showed the framerate for Medium capping out at around 40k entities, with it just barely holding onto the minimum 30 FPS, you can see in this video that Medium is now able to maintain a steady 60 FPS at a much higher entity count.

The ability to go to 100k+ entities without affecting the framerate is because Medium no longer hosts the world "in-process." Medium is now exclusively an admin/debug client that requires a game server to connect to. Since it's no longer hosting the simulation, it has plenty of cycles to do other things.

Because it's so lightweight, I can run multiple instances of Medium on my desktop simultaneously, all connecting to the same game world. Take a look at the following video to see that in action.

In the above video, I started two separate instances of Medium connected to the same game world, each with different camera angles. Then I populated the world with some entities and buildings so you could get a frame of reference. Then I started running the simulation so you could see it affecting both clients in real-time.

If you've been watching the performance in the two previous videos, you'll have noticed that while the framerate of the clients remains steady, the actual simulation starts to look choppy.

That's because the simulation in those examples was still running in a single process. And, just like in June, the simulation starts dropping frames at around 20-30k entities and then gets slower from there. Why? Because that's 20-30k entities all living within a tight area of just 1km x 1km, all performing physics and collision detection.
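To give a feel for why density matters so much for collision detection, here is a minimal Python sketch of a uniform-grid broad phase, a commonly used technique. It is not a description of the Soulborn Engine's own spatial partitioning; the function name `colliding_pairs`, the circular collision shape, and the 0.5 m radius are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, Set, Tuple

Position = Tuple[float, float]


def colliding_pairs(positions: Dict[int, Position],
                    radius: float = 0.5) -> Set[Tuple[int, int]]:
    """Find overlapping circles by only comparing entities in adjacent grid cells."""
    # Assumed: every entity is a circle of the same radius. With a cell size of
    # 2 * radius, two touching circles are always in the same or adjacent cells.
    cell = 2 * radius
    grid = defaultdict(list)
    for eid, (x, y) in positions.items():
        grid[(math.floor(x / cell), math.floor(y / cell))].append(eid)

    pairs = set()
    for (cx, cy), ids in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for other in grid.get((cx + dx, cy + dy), ()):
                    for eid in ids:
                        if eid >= other:
                            continue  # report each pair once, as (low, high)
                        ax, ay = positions[eid]
                        bx, by = positions[other]
                        if math.hypot(ax - bx, ay - by) <= 2 * radius:
                            pairs.add((eid, other))
    return pairs
```

The grid keeps the work roughly proportional to the number of nearby neighbors rather than to every possible pair, but when tens of thousands of entities are packed into a small area, each cell's neighborhood fills up and the per-frame cost climbs quickly.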

To capitalize on the benefits of the Soulborn Engine, you have to run the server distributed across multiple processes. In the following video, I do just that.

In the above video, I start the Health Monitor, the Gateway (Edge/Portal), and a Facet Server (the ServiceHost that runs DECS), and then launch the client. When I first add entities to the world, you'll notice that they're only added to the top-left quadrant of the world. That's because the Facet server was configured to handle only that quadrant.

Next, I add a second server configured to process the bottom-right quadrant. Now when I instruct the server to add entities to the world, you can see them added to both quadrants, with each Facet server handling its own.
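To make the quadrant assignment concrete, here is a minimal Python sketch of how an entity's position might be routed to the Facet server responsible for it. The `Region` type, the `facet_for` function, the facet names, and the coordinate convention are illustrative assumptions, not the Soulborn Engine's actual API.

```python
from dataclasses import dataclass
from typing import Optional

WORLD_SIZE = 1000.0  # the 1km x 1km test world, in metres


@dataclass(frozen=True)
class Region:
    """Axis-aligned rectangle owned by one (hypothetical) Facet server."""
    name: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        # Half-open bounds so a point on a shared border has exactly one owner.
        return self.x_min <= x < self.x_max and self.y_min <= y < self.y_max


HALF = WORLD_SIZE / 2
# Only two of the four quadrants have a server, as in the video.
# Assumed convention: y grows downward, so "top-left" is the low-x, low-y corner.
FACETS = [
    Region("facet-top-left", 0.0, 0.0, HALF, HALF),
    Region("facet-bottom-right", HALF, HALF, WORLD_SIZE, WORLD_SIZE),
]


def facet_for(x: float, y: float) -> Optional[str]:
    """Return the name of the facet that owns (x, y), or None if no server hosts it."""
    for region in FACETS:
        if region.contains(x, y):
            return region.name
    return None
```

With this configuration, `facet_for(100, 100)` returns the top-left facet, while `facet_for(900, 100)` returns `None`, which mirrors entities only appearing in the quadrants that have a server behind them.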

Next, I re-open the client to show that after reconnecting to the Gateway, it immediately receives the starting state of the world and proceeds to visualize it. Finally, I end the third video by closing all the services down.

While the previous video shows what can be done with two servers running on a single host, the goal of any distributed architecture is to be able to subdivide the world as necessary. So in the following video, I launch eight different Facet servers on my desktop machine and then connect the client to the world.

As you can see from the previous video, with eight Facet servers running on my local machine, I can now run with 160k entities in the world while still maintaining 30 FPS. Just last month, that capped out at 40k. That's a 300% increase from dividing the world into different processes.

At this point, you're probably thinking what I was. "Sure, that's great that I can run eight instances on my local machine, but what about running it on a cluster of machines?"

Great question. I decided to find out. I set up the following server cluster, with each node running on an Intel i7 4790K with 32GB of RAM. Each of those processors has four physical cores and eight logical cores. Across the eight nodes, that's a total of 32 physical cores and 64 logical cores.

Server Cluster

After configuring each server with the correct software and frameworks, installing the Soulborn Engine, and configuring them to point to the same Gateway, I captured the following video to see the server's current max capabilities. Sort of. I'll explain what I mean in a bit.

Ok. Now we're getting somewhere! In the above video, I add 200k entities to a world that's just 1km x 1km and comprises eight different Facet servers, each living on separate host machines.

You'll notice in the video that the different Facet servers begin dropping packets to the Gateway very soon after the test starts, and eventually, the entire world gets bogged down.

So at the moment, the maximum demonstrable performance is around 200k entities in a 1km x 1km area. That said, I suspect this results from a bug, misconfigured network buffers on the servers, or the upper limit of what the network can handle (see below for an explanation of why).

Conceptually, I could run a single server process on my host machine with 30k entities without any significant slow-down. When I increased the number of processes to eight, I got over 100k.

By spreading the eight different processes out over multiple host machines, I'd expect to get 300k - 800k entities. When I tried subdividing the world into 16 smaller processes (two per host machine), I noticed the problem worsened, and things started to crash sooner.

So again, there's either a bug somewhere, a misconfiguration, or a network bottleneck I'll need to fix before we can get a truly accurate measure of the current performance. I would have dug into those to fix them, but alas, I ran out of time for this update (and then some), so I'm afraid we'll have to accept the numbers as-is until I can explore the issue next week.

A Different Perspective

While the above demonstration shows some technical issues still need resolving, they may be due to some things that aren't obvious from the videos.

First, I wanted to stress the server and network to demonstrate how this would do in a worst-case scenario. As a result, the client is performing no calculations and no client-side prediction. The server sends the updated positions and collision information of every entity in the world... 60 times a second. That's a ton of chatter on the network. With 200k entities in the world, sending 60 times a second, that's 12 million messages per second, not counting the collision information. That alone would stress most Linux servers.
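As a rough back-of-the-envelope check of that figure, assuming one position message per entity per update (the helper name is hypothetical):

```python
def messages_per_second(entity_count: int, updates_per_second: int) -> int:
    """Messages per second, assuming one message per entity per update."""
    return entity_count * updates_per_second


rate = messages_per_second(200_000, 60)
print(f"{rate:,} messages per second")  # prints "12,000,000 messages per second"
```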

Under normal gaming conditions, the server would send the client information, such as the position and velocity of the entities, and let the client perform its calculations and predictions, only sending updates or corrections periodically or as needed. So while I've only got a couple hundred thousand entities in the world, the actual load on the network is representative of a MUCH higher volume of entities. Like... 60x as many.
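Here is a minimal sketch of that idea, often called dead reckoning: the client extrapolates from the last known position and velocity, and the server only sends a correction when the client's guess would drift too far. The `EntityState` type, the function names, and the 0.5 m tolerance are illustrative assumptions, not anything taken from the Soulborn Engine.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class EntityState:
    x: float
    y: float
    vx: float  # metres per second
    vy: float


def predict(state: EntityState, dt: float) -> Tuple[float, float]:
    """Client side: extrapolate the position dt seconds after the last update."""
    return state.x + state.vx * dt, state.y + state.vy * dt


def needs_correction(last_sent: EntityState, dt: float,
                     actual_x: float, actual_y: float,
                     tolerance: float = 0.5) -> bool:
    """Server side: send an update only if the client's guess is off by more than tolerance."""
    px, py = predict(last_sent, dt)
    return math.hypot(actual_x - px, actual_y - py) > tolerance
```

An entity walking in a straight line at constant speed would need no further messages at all under this scheme, which is where most of the bandwidth savings over sending every position 60 times a second would come from.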

The other thing, which I've stated before, is that virtually every entity in the world is moving simultaneously. Very few games have 200k+ objects moving within a 1km x 1km area, each performing physics and collision detection. So this, again, is a worst-case scenario.

So in light of those two bits of information, the current test differs significantly from actual gaming conditions. Overall, the performance of the server is phenomenal! The server should run well with just a few changes, even under extreme circumstances.

Implications and Possibilities

With all the above in mind, the platform intended to host CoE and KoE is a functional MMO engine once again. That has a few implications and possibilities.

First, I know it isn't easy to look at Medium and see an MMO. It's like Neo seeing The Matrix as a stream of bits and bytes. There are no animations, no pretty graphics, and no special effects. There's no music and no sound effects.

It's tough to see this as either CoE or KoE.

But for a minute, in your mind's eye, I want you to imagine the terrain with high-resolution textures and advanced materials instead of green wireframes. I want you to imagine that every box moving around the world is an NPC, player character, or animal. Each wears clothing and armor made by highly skilled character artists.

The stationary boxes are trees or other foliage, buildings, boulders, or other structures.

Imagine that there's a procedural skydome that shows the hues of sunset painted against the scattered clouds above.

I want you to imagine that instead of the Facet servers only updating the world's positions, bounding volumes, and collision data, the servers are also spending each update cycle calculating the age, physical condition, health, stamina, and other attributes of every character in the world.

The servers keep track of the nutrients in the ground, the growth stages of all the flora, and the current temperature and wind speeds over each parcel of land.

Some facets track the age of food, carcasses, and other organic matter to see how spoiled they've become.

And much, much more.

The power of MMOGs like Elyria is that they exist even when you're not there - when you can't see them. The simulation keeps moving forward. The things you did continue to impact the world, and, just as importantly, the world continues to evolve in your absence.

But a world like Elyria can only exist once it runs as a separate entity from the client, capable of performing all the above calculations in real-time in a fully distributed environment.

And now we're there. In many ways, this is the tipping point for the rapid advancement of CoE as an MMO.

This section was titled "Implications and Possibilities," not just because I wanted you to imagine what CoE will be like, but also because this particular juncture in development opens up so many possibilities.

To name a few, I could...

  • Implement my original plan of launching ElyriaChat so people can interact and engage as their characters with other CoE community members
  • Spend time getting the VoxElyria or Prelyria client to talk to the new server so we can launch ElyriaMud
  • Spend time getting the UE4 CoE client working against the new server so we can resume the march towards Alpha testing CoE
  • Get the website up and running with the new backend so we can reopen the Official CoE Discord
  • Convert KoE's alpha test into an online experience
  • Implement the My Akashic Records client so people can see their holdings and other purchases in a 3D environment

And those are just the immediately apparent choices. That said, I'm still just a single individual until KoE: Settlements launches and the studio starts bringing in more revenue. So, for the time being, my focus will have to primarily remain on developing and shipping KoE: Settlements.

But on the plus side, I'll soon be able to turn my attention entirely toward game mechanics, and that's when the real fun begins. That's when we really start delivering on our mission.

What's Up Next?

Well, that wraps up this month's CoE Development Update. July was a short month due to the US holiday and my family having visitors. August will be even shorter because this update is late. Only about two weeks remain until the next regularly scheduled CoE Update, and there's also a Developer Journal next week.

Given that, the scope of next month's update will be small. Looking at the outstanding work in the scope document, there are still a few more items in the server-side game engine to work on, most importantly storage & serialization, so we can take the server down and bring it back up in a consistent state.
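As a rough illustration of what "a consistent state" means here, the sketch below saves a world snapshot and restores it, with the goal that the restored world is identical to the one that was saved. The `Entity` fields, the function names, and the choice of JSON are assumptions made for the example; the storage format for the real server hasn't been described.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Tuple


@dataclass
class Entity:
    x: float
    y: float
    vx: float
    vy: float


def save_snapshot(tick: int, entities: Dict[int, Entity]) -> str:
    """Serialize the tick counter and every entity to a JSON string."""
    return json.dumps({
        "tick": tick,
        # JSON object keys must be strings, so entity ids are converted here.
        "entities": {str(eid): asdict(e) for eid, e in entities.items()},
    })


def load_snapshot(data: str) -> Tuple[int, Dict[int, Entity]]:
    """Rebuild the tick counter and the entity table from a JSON string."""
    raw = json.loads(data)
    entities = {int(eid): Entity(**fields) for eid, fields in raw["entities"].items()}
    return raw["tick"], entities
```

The useful property to test is the round trip: loading what was just saved should give back the same tick and the same entities.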

There's also more Server Platform work to do. While I'm using the Gateway as a single point of connectivity, I still need a proper pub/sub broker so collision detection can be handled appropriately along the borders of the different servers.
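To illustrate why borders need special handling, here is a rough sketch of a tiny in-process publish/subscribe broker. One Facet publishes entities that come within a margin of its edge, and the neighboring Facet subscribes to that topic so it can include them in its own collision checks. Everything here (the `Broker` class, the topic name, the border position, the 5 m margin) is a hypothetical illustration rather than the planned design.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Tuple

Position = Tuple[float, float]
Handler = Callable[[int, Position], None]


class Broker:
    """Minimal in-process pub/sub: each topic maps to a list of subscriber callbacks."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, entity_id: int, pos: Position) -> None:
        for handler in self._subscribers[topic]:
            handler(entity_id, pos)


BORDER_X = 500.0  # assumed: vertical border between a west facet and an east facet
MARGIN = 5.0      # assumed: how close to the border an entity must be to matter


def publish_border_entities(broker: Broker, entities: Dict[int, Position]) -> None:
    """West facet: publish entities close enough to the border to collide across it."""
    for entity_id, (x, y) in entities.items():
        if BORDER_X - x <= MARGIN:
            broker.publish("border/west-east", entity_id, (x, y))


# The east facet keeps "ghost" copies of its neighbor's border entities.
ghosts: Dict[int, Position] = {}
broker = Broker()
broker.subscribe("border/west-east", lambda eid, pos: ghosts.update({eid: pos}))
publish_border_entities(broker, {1: (498.0, 10.0), 2: (100.0, 10.0)})
print(ghosts)  # prints "{1: (498.0, 10.0)}"
```

Only entity 1 is shared, because it sits 2 m from the border; entity 2 is deep inside the west facet and never needs to leave it.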

All that said, the next two weeks will likely be spent finishing up the work necessary to dive into AI and gameplay mechanics beginning in September.

Until next month!

Pledged to the Continued Development of the Soulborn Engine and the Chronicles of Elyria,