This is an introduction to the challenges you’ll face, and the options for overcoming them, when developing applications that involve digital communications, particularly real-time communications.
There are many streaming communication protocols to choose from, including plain socket communications, WebSockets, MQTT and the Vert.x event bus. I’ll focus on dedicated connections and leave request/response paradigms such as REST and gRPC out of this discussion. Note that RESTful APIs have been evolving to support streaming capabilities that overlap these use cases in some ways, such as Spring’s StreamingResponseBody, Spring WebFlux, or Mutiny’s Multi with Resteasy Reactive.
Recovering from a Disconnect
Because we’re talking about dedicated connections such as WebSockets, you’ll likely want some way for your clients to retry connecting after a loss of network connectivity, server downtime, or anything else that temporarily affects reachability.
In the early days of WebSockets, we had to handle this ourselves: detect a premature disconnect and start a retry process in the connectivity layer as soon as the unexpected disconnect occurs. Today, libraries such as socket.io help with this.
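For a sense of what such a retry process involves, here is a minimal, library-agnostic Python sketch of reconnecting with exponential backoff and full jitter. The `connect` callable is a hypothetical stand-in for your client’s open call, and the default delays are illustrative assumptions, not recommendations. socket.io’s client covers similar ground with its `reconnectionDelay`, `reconnectionDelayMax` and `randomizationFactor` options.

```python
import random
import time
from typing import Callable, Optional


def reconnect_with_backoff(
    connect: Callable[[], bool],
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
    rand: Callable[[], float] = random.random,
) -> int:
    """Call `connect` until it returns True; return the number of attempts used.

    Delays grow exponentially (base_delay * 2**n) up to max_delay, and each
    wait is a random fraction of that cap ("full jitter") so that clients
    dropped by the same outage don't all retry at the same instant.
    """
    attempt = 0
    while True:
        attempt += 1
        if connect():
            return attempt
        if max_attempts is not None and attempt >= max_attempts:
            raise ConnectionError(f"gave up after {attempt} attempts")
        cap = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(cap * rand())
```

The jitter matters when a server restart drops many clients at once: without it, they retry in lockstep and hit the server at the same moments.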
You’ll want to be sure you can broadcast these connection events to the UI. In a trading system I created, I show a connected icon while connected, so you always know you have a valid connection. I also like to show the timestamp of the last received message.
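One way to broadcast connection events to the UI is an observable status object that the connectivity layer updates and the UI subscribes to. This Python sketch is illustrative; `ConnectionStatus` and its hook methods are hypothetical names that you would call from your client library’s connect, disconnect and message callbacks.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, Optional


@dataclass
class ConnectionStatus:
    """Tracks link state and last-message time, and notifies UI listeners."""
    connected: bool = False
    last_message_at: Optional[datetime] = None
    _listeners: List[Callable[["ConnectionStatus"], None]] = field(default_factory=list)

    def subscribe(self, listener: Callable[["ConnectionStatus"], None]) -> None:
        self._listeners.append(listener)

    def _publish(self) -> None:
        # Push the current state to every listener (e.g. an icon and a timestamp label).
        for listener in self._listeners:
            listener(self)

    def on_connected(self) -> None:
        self.connected = True
        self._publish()

    def on_disconnected(self) -> None:
        self.connected = False
        self._publish()

    def on_message(self, received_at: Optional[datetime] = None) -> None:
        self.last_message_at = received_at or datetime.now(timezone.utc)
        self._publish()
```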
How you handle recovery once a valid connection is re-established varies by use case. If you only care about the latest state, you’ll simply want to be sure you receive the latest state upon reconnecting. For example, if you re-subscribe to a feed such as live stock quotes, you’ll want to receive the current quote as soon as the subscription begins, even if the price hasn’t changed.
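A common way to guarantee that a subscribing or re-subscribing client gets the current state is a last-value cache that replays the latest value to each new subscriber. Here is a minimal in-memory Python sketch, assuming one price per symbol; the class name and the `ACME` symbol are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Listener = Callable[[str, float], None]


class LastValueCache:
    """Pub/sub that replays the latest value to each new subscriber."""

    def __init__(self) -> None:
        self._latest: Dict[str, float] = {}
        self._subscribers: Dict[str, List[Listener]] = defaultdict(list)

    def publish(self, symbol: str, price: float) -> None:
        self._latest[symbol] = price
        for listener in self._subscribers[symbol]:
            listener(symbol, price)

    def subscribe(self, symbol: str, listener: Listener) -> None:
        self._subscribers[symbol].append(listener)
        # Send the current value immediately, even if it hasn't changed,
        # so a re-subscribing client isn't left with stale or missing state.
        if symbol in self._latest:
            listener(symbol, self._latest[symbol])
```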
While much of this is handled at the application level rather than by the third-party protocol, libraries such as socket.io can help with conventions such as letting the client request all data since the last message ID it received, or buffering a server-side message backlog for you.
If your data is persisted, clients can query for whatever they missed upon reconnecting to fill in the gaps, as long as they can supply parameters describing the range from the last message they received before the disconnect to the first message they received after reconnecting.
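Assuming each message carries a monotonically increasing ID (an assumption; your persistence layer may use timestamps or offsets instead), the gap query reduces to a range lookup. This Python sketch uses an in-memory list as a hypothetical stand-in for the persisted store.

```python
import bisect
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Message:
    id: int       # assumed monotonically increasing sequence number
    payload: str


class MessageLog:
    """In-memory stand-in for a persisted, append-only message store."""

    def __init__(self) -> None:
        self._messages: List[Message] = []
        self._ids: List[int] = []

    def append(self, message: Message) -> None:
        if self._ids and message.id <= self._ids[-1]:
            raise ValueError("message IDs must increase")
        self._messages.append(message)
        self._ids.append(message.id)

    def gap(self, last_seen_id: int, first_id_after_reconnect: int) -> List[Message]:
        """Messages strictly between the last one received before the
        disconnect and the first one received after reconnecting."""
        start = bisect.bisect_right(self._ids, last_seen_id)
        end = bisect.bisect_left(self._ids, first_id_after_reconnect)
        return self._messages[start:end]
```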
If your server builds up a backlog of messages because a client consumes them more slowly than the server produces them, you’ll need a way to handle it.
Libraries vary in how they support this, so ideally you’ll understand your backlog requirements before picking one. In many cases, you’ll depend on your own server-side logic to handle it.
For instance, in a video pipeline for one application, I implemented an H264 frame dropper that detects lag and, based on how far behind it is, adjusts both the rate of dropping and the priority of which types of video frames to drop. Our goal was live video: if we didn’t drop frames once we were 5 minutes behind, we’d always stay 5 minutes behind. Dropping in a way that provides the best user experience can be complicated, and no third-party library could do it for us. Dropping on the server side helps us mitigate lag caused by network contention, such as when the user is on a poor wi-fi connection.
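To illustrate the general idea (this is not the implementation described above), here is a simplified Python sketch of a lag-aware dropping policy. It assumes B-frames are never used as references, which holds for many but not all H.264 encoder configurations, and that dropping a P-frame leaves later frames undecodable until the next keyframe. The thresholds are arbitrary placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    kind: str          # "I" (keyframe), "P" or "B"
    timestamp: float   # capture time in seconds


class LagFrameDropper:
    """Decides per frame whether to forward it, based on how far behind live it is."""

    def __init__(self, soft_lag: float = 0.5, hard_lag: float = 2.0) -> None:
        self.soft_lag = soft_lag
        self.hard_lag = hard_lag
        self._waiting_for_keyframe = False

    def should_send(self, frame: Frame, now: float) -> bool:
        lag = now - frame.timestamp
        if frame.kind == "I":
            # A keyframe lets the decoder start fresh, so resume sending here.
            self._waiting_for_keyframe = False
            return True
        if self._waiting_for_keyframe:
            # Frames after a dropped P-frame can't be decoded correctly.
            return False
        if lag >= self.hard_lag:
            # Far behind: drop everything until the next keyframe.
            self._waiting_for_keyframe = True
            return False
        if lag >= self.soft_lag and frame.kind == "B":
            # Moderately behind: shed frames that (by assumption) nothing references.
            return False
        return True
```

A production dropper might also weigh outbound queue depth rather than per-frame lag alone, and might read the reference flag from NAL unit headers instead of relying on a pre-labeled frame type.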
An unbounded backlog is also a problem because the server will eventually run out of memory, especially in something like a video pipeline for high-FPS, high-resolution cameras, or when thousands of clients subscribe to a feed and you have to handle each client’s backlog individually.
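A simple way to put a ceiling on memory is a fixed-size queue per client. In this Python sketch (the names are hypothetical), the oldest messages are discarded once the cap is reached, which suits latest-state feeds; for data that must not be lost, you might disconnect the slow client instead.

```python
from collections import deque
from typing import Deque, Generic, List, TypeVar

T = TypeVar("T")


class BoundedBacklog(Generic[T]):
    """Per-client outbound queue that caps memory by discarding the oldest messages."""

    def __init__(self, max_messages: int) -> None:
        self._queue: Deque[T] = deque(maxlen=max_messages)
        self.dropped = 0

    def offer(self, message: T) -> None:
        if len(self._queue) == self._queue.maxlen:
            self.dropped += 1  # deque(maxlen=...) evicts the oldest item on append
        self._queue.append(message)

    def drain(self, limit: int) -> List[T]:
        """Remove and return up to `limit` messages for sending."""
        batch: List[T] = []
        while self._queue and len(batch) < limit:
            batch.append(self._queue.popleft())
        return batch

    def __len__(self) -> int:
        return len(self._queue)
```

Worst-case memory is then roughly clients * max_messages * message size, which you can budget for up front.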
Some libraries, such as RSocket, offer flow control (backpressure), which can help you make better decisions on the server side and support topologies such as load balancing.
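RSocket’s flow control follows Reactive Streams semantics, in which the consumer grants demand and the producer never sends more than was requested. The Python sketch below illustrates that credit-based idea; it is not RSocket’s API, and `CreditedStream` is a hypothetical name.

```python
from typing import Callable, Iterator


class CreditedStream:
    """Producer side of credit-based flow control: emit only what the consumer requested."""

    def __init__(self, source: Iterator[str], send: Callable[[str], None]) -> None:
        self._source = source
        self._send = send
        self._credits = 0
        self.completed = False

    def request(self, n: int) -> None:
        """Called when the consumer grants n more messages of demand."""
        if n <= 0:
            raise ValueError("demand must be positive")
        self._credits += n
        self._drain()

    def _drain(self) -> None:
        # Send while there is outstanding demand and data left to send.
        while self._credits > 0 and not self.completed:
            try:
                item = next(self._source)
            except StopIteration:
                self.completed = True
                return
            self._credits -= 1
            self._send(item)
```

Because the producer never has more in flight than the consumer asked for, a slow consumer shows up as zero remaining credit rather than as a growing server-side queue, which is a signal the server can act on.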
To ACK or not to ACK
Acknowledgements (ACKs) can help provide message delivery guarantees. However, they come at a cost. If your messages are ordered (a pipeline) and each one waits for its ACK before the next is sent, you introduce latency that can become significant when, for instance, you are catching up on a backlog.
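To put rough numbers on that latency: when at most `window` messages may be unacknowledged at once (a window of 1 means each message waits for its ACK before the next is sent), draining a backlog costs at least one round trip per window. A back-of-the-envelope Python helper, with a hypothetical name:

```python
import math


def ack_bound_catch_up_ms(backlog: int, rtt_ms: int, window: int = 1) -> int:
    """Approximate time to drain `backlog` messages when at most `window`
    may be unacknowledged at once (window=1 is strict stop-and-wait).

    Ignores serialization and processing time, so real numbers are higher.
    """
    round_trips = math.ceil(backlog / window)
    return round_trips * rtt_ms
```

With an assumed 50 ms round trip, `ack_bound_catch_up_ms(10_000, rtt_ms=50)` returns 500,000 ms, more than eight minutes to catch up on 10,000 messages, while allowing 100 messages in flight brings it down to 5,000 ms.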
There are also cases where you need acknowledgement, just not at the communications level. For instance, in trading, the client needs acknowledgement when it places an order. What it really needs to know, though, isn’t that the server endpoint received its order, but that the order fulfillment system did. You need to know that the order is in process and has certain transactional guarantees. For this reason, you’ll prefer an order confirmation message over a protocol ACK. Until that confirmation arrives, your client will consider the order “pending receipt”.
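Here is a client-side Python sketch of that distinction: a transport-level ACK leaves an order pending, and only a confirmation from the fulfillment system resolves it. The state names mirror the text; the class, methods and order IDs are hypothetical.

```python
from enum import Enum
from typing import Dict, List


class OrderState(Enum):
    PENDING_RECEIPT = "pending receipt"
    CONFIRMED = "confirmed"
    REJECTED = "rejected"


class OrderTracker:
    """Client-side view of orders; only fulfillment confirmations change state."""

    def __init__(self) -> None:
        self._orders: Dict[str, OrderState] = {}

    def place(self, client_order_id: str) -> None:
        self._orders[client_order_id] = OrderState.PENDING_RECEIPT

    def on_transport_ack(self, client_order_id: str) -> None:
        # The server endpoint got the bytes; the order may not exist downstream yet.
        pass

    def on_confirmation(self, client_order_id: str, accepted: bool) -> None:
        if self._orders.get(client_order_id) is not OrderState.PENDING_RECEIPT:
            return  # unknown or already resolved: ignore duplicates
        self._orders[client_order_id] = (
            OrderState.CONFIRMED if accepted else OrderState.REJECTED
        )

    def state(self, client_order_id: str) -> OrderState:
        return self._orders[client_order_id]

    def pending(self) -> List[str]:
        return [oid for oid, s in self._orders.items() if s is OrderState.PENDING_RECEIPT]
```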
Socket.io provides ACKs via a callback with an optional timeout, and the Vert.x event bus offers a reply option. In practice, though, I don’t use them for pipelines. I do use them for request/response semantics, often to report the success or failure of an operation driven by the message, and sometimes to return a rich response to a query.
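Request/response over a streaming connection generally comes down to tagging each request with a correlation ID, matching the reply to it, and failing the request if no reply arrives in time. That is roughly what socket.io’s acknowledgement callbacks with `timeout()` and the Vert.x event bus’s `request` provide. The asyncio sketch below is a hypothetical illustration, with `transmit` standing in for your connection’s send method.

```python
import asyncio
import itertools
from typing import Any, Callable, Dict


class RequestReplyChannel:
    """Correlates replies to requests by ID and fails requests that time out."""

    def __init__(self, transmit: Callable[[int, Any], None]) -> None:
        self._transmit = transmit
        self._ids = itertools.count(1)
        self._pending: Dict[int, asyncio.Future] = {}

    async def request(self, payload: Any, timeout: float) -> Any:
        correlation_id = next(self._ids)
        future = asyncio.get_running_loop().create_future()
        self._pending[correlation_id] = future
        try:
            self._transmit(correlation_id, payload)
            # Raises asyncio.TimeoutError if no reply arrives in time.
            return await asyncio.wait_for(future, timeout)
        finally:
            self._pending.pop(correlation_id, None)

    def on_reply(self, correlation_id: int, result: Any) -> None:
        """Call this when a reply frame arrives; late replies are ignored."""
        future = self._pending.get(correlation_id)
        if future is not None and not future.done():
            future.set_result(result)
```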
Text or binary
In some use cases, binary can increase the performance or throughput of network connections. It can also help you scale where network bandwidth is the primary constraint, and it can decrease egress costs in the cloud.
However, there are many cases where text does the job just fine. We were streaming video as binary but then had to switch very quickly to a different connector because of limitations in certain network topologies and the limited availability of options with a fast time to market. This meant sending Base64 in JSON, which increased our bandwidth consumption by about 30%. To our surprise, it handled our load well, even with many clients streaming 4K. Of course, we had an H264 dropper for clients with really constrained bandwidth. Still, this worked for our clients; in testing, I had no problem streaming real-time high-resolution video over VPN and wi-fi.
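The overhead comes mostly from Base64 itself, which encodes every 3 bytes as 4 ASCII characters, so the payload grows by about a third before any JSON framing, in the same range as the figure above. A quick Python check, using a hypothetical single-field envelope:

```python
import base64
import json


def base64_json_size(raw: bytes) -> int:
    """Size in bytes of a minimal JSON envelope carrying `raw` as Base64."""
    envelope = {"frame": base64.b64encode(raw).decode("ascii")}
    # Compact separators so the size reflects the payload, not whitespace.
    return len(json.dumps(envelope, separators=(",", ":")).encode("utf-8"))
```

For a 30,000-byte frame, `base64_json_size` returns 40,012 bytes, about 33% more than the raw binary.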
The lesson here is that if you need binary, you need it; if you don’t, you don’t. It only pays off when you need it, and when you don’t, text can be simpler to develop with.
The Future of Communications
We’ve come very far in capabilities in the past 10 years. For an individual client, it’s hard to imagine a use case where the benefits of low-latency, high-bandwidth real-time communication are more apparent than a user viewing real-time video and using PTZ (pan-tilt-zoom) controls to move a camera in another part of the world while watching the camera respond to their commands. Today’s technology can do just that.
While the Internet is not on par with a 100 Gigabit LAN connection to a commodities exchange, it is good enough and fast enough for retail traders to profit, for security personnel to protect high-value assets, and for people to navigate their cars around traffic congestion.
Looking ahead, I expect network costs in cloud environments to keep falling, scalability to keep improving to handle thousands if not millions of clients, and IoT data collection to increase.
For this reason, I’ll end by introducing MQTT. Though not new (it dates back to 1999), this lightweight protocol has become the de facto way to collect IoT data from remote devices. As with WebSockets, clients initiate the connection. MQTT can provide an ACK if needed (QoS levels 1 and 2), can support authorization, and can broadcast when a client connects or disconnects. MQTT clients are also easy to add to IoT devices such as Arduino boards, and the server side is very easy to scale with Kubernetes. Its pub/sub model makes it easy to route messages, and it integrates well with other communication technologies such as Vert.x.
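Much of that routing flexibility comes from topic filters with the `+` (single-level) and `#` (multi-level) wildcards. Brokers do this matching for you, but a minimal Python sketch of the wildcard rules from the MQTT specification can help when designing a topic hierarchy. The topic names used with it, such as `site1/camera7/analytics`, are hypothetical.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic name matches a subscription filter.

    '+' matches exactly one level, '#' (valid only as the last level) matches
    the parent level and any number of child levels, and a wildcard in the
    first level does not match topics that start with '$'.
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    if topic.startswith("$") and filter_levels[0] in ("+", "#"):
        return False
    for i, level in enumerate(filter_levels):
        if level == "#":
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)
```

For example, a subscriber to `site1/+/analytics` would receive analytics from every camera at one site, while `site1/#` would also receive every other message published under `site1`.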
In a real-time security monitoring application, I added MQTT to our solution so that cameras could transmit JSON analytics (ONVIF Profile-M) when they recognize a person or vehicle, and so that IoT devices could transmit their signals.