It turns out there are a lot of subtleties when dealing with UDP, even before multicast is mixed in. We'll abandon the comparisons to netcat, as we've exceeded what netcat can do. But first, a quick reminder of one way socat does UDP.
socat as a UDP server on port 11111.
socat STDIO UDP-LISTEN:11111
and then as a UDP client.
socat - UDP:localhost:11111
Recall from the previous article that socat's command-line structure requires two addresses. The first command is the server because it connects its standard I/O to the UDP-LISTEN: (UDP-L for short) address. So this is a UDP server listening on port 11111. The second one connects its standard I/O ("-" is a synonym for "STDIO") to UDP, connecting out to port 11111 of localhost. This is the client. Both read from standard input and send the data over the network, and both print to standard output what they receive from the network.
UDP Connection Behaviours
Most textbooks make a big deal about UDP being connectionless, but I think this tends to make people give up on UDP prematurely. Notionally, there is a limited concept of a connection between the above pair of commands. That is, there's a unique pair of address/port tuples that unambiguously defines whether a UDP packet belongs to this connection. If it has the address and port of the server and the address and port of the client, then it's part of this connection.
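To make that concrete, here is a minimal sketch of the membership test in Python. The Endpoint and Datagram types and the belongs_to() name are made up for illustration; they are not part of socat or of any socket API.

```python
from typing import NamedTuple


class Endpoint(NamedTuple):
    address: str
    port: int


class Datagram(NamedTuple):
    source: Endpoint
    destination: Endpoint
    payload: bytes


def belongs_to(datagram, client, server):
    """True if the datagram travels between exactly these two endpoints."""
    route = (datagram.source, datagram.destination)
    return route == (client, server) or route == (server, client)


if __name__ == "__main__":
    client = Endpoint("10.0.0.5", 40000)
    server = Endpoint("10.0.0.9", 11111)
    print(belongs_to(Datagram(client, server, b"hi"), client, server))
```

A datagram from the same client address but a different source port fails the test, which is the situation a restarted client ends up in.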
Behaviour: Client-Server, single-plexed
To see this, start a pair of socat processes as described above, one using UDP: (client) and the other UDP-LISTEN: (server), and have the client send data. This effectively starts a connection (although a weak one). At first, when you start both the client and the server, the server cannot send any data to the client, because it doesn't know how to talk to the client. The client must send some data so the server learns about it.
More significantly, if you kill the client, restart it, then try to send data again (from either client or server), it may return a permission denied error, but regardless, the new data won't be received on the other end. This is because the server determines that a connection is established based on the source and destination IP addresses and ports. When you restart the client, it chooses a new source port, so when it sends new data, the server doesn't recognize it as part of the old connection. Because UDP has no connection semantics, the server has no idea the original client was killed, but it still rejects new connections. If you add the fork option to the server, new connections will be accepted, but old connections will hang around indefinitely, and you can't predict which client will receive the data sent from the server. In a script, if you do have to handle multiple connections, you'll often be better off using a loop structure around calls to socat (omitting the fork option).
It's true that there's no way to test if a UDP socket is "connected". So UDP definitely makes no service level guarantees such as ordering, guaranteed delivery, or acknowledgment of existence, but these are qualities of a connection, not part of the definition. Meh, semantics.
(By the way, the reason you may get a permission denied message is that most IP stacks will send ICMP Port Unreachable packets in response to incoming UDP packets that aren't delivered to some receiving application. When the sending system receives the ICMP packets, it reports an error to the sending process. On Linux that error usually reads "Connection refused" rather than "permission denied". However, such packets are often dropped by firewalls, and may not be required in the IP stack implementation, which is why you might not get any error message at all.)
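You can watch this happen without socat. The following is a rough Python sketch, assuming the loopback interface and an OS that passes ICMP errors back to connected UDP sockets; probe_closed_port() is just a name I made up. It sends one datagram to a port nobody is listening on and reports what comes back.

```python
import socket


def probe_closed_port(host="127.0.0.1"):
    """Send one datagram to a port nobody listens on and report what happens."""
    # Borrow a port number from the OS, then release it so it is very likely unused.
    tmp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tmp.bind((host, 0))
    port = tmp.getsockname()[1]
    tmp.close()

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(1.0)
    try:
        sock.connect((host, port))  # a connected socket, like socat's UDP: address
        sock.send(b"hello")         # the send itself succeeds
        sock.recv(1024)             # a pending ICMP error, if any, is reported here
        return "data"
    except ConnectionRefusedError:
        return "refused"            # an ICMP Port Unreachable came back
    except socket.timeout:
        return "timeout"            # the ICMP reply was dropped or never sent
    finally:
        sock.close()


if __name__ == "__main__":
    print(probe_closed_port())
```

Which of the two outcomes you get depends on the OS and on any firewall in between, which matches the "you might not get the message" caveat above.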
Behaviour: Client to Client
Oddly, if you kill and restart the server, the client has no problem sending data on the new connection. So the client (the UDP: address) has an even looser concept of a connection. This makes sense. A client sending its 500th packet does little differently than when it sent the first one. (There's no set-up protocol.) So you could do this: on one system "foo", run "socat STDIO UDP:bar:11111,sourceport=11111" and on system "bar", run "socat STDIO UDP:foo:11111,sourceport=11111". By causing both "clients" to bind to a specific source port, they can act as peers and talk to each other. Either process can be killed and restarted as many times as you want, and they will always resume their conversation.
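Here is a small Python sketch of the same peer arrangement, run on one machine over loopback instead of on "foo" and "bar". The helper names and the OS-chosen port numbers are my own assumptions; the point is only that a peer which binds to the same source port after a restart slots straight back into the conversation.

```python
import socket

HOST = "127.0.0.1"


def pick_two_ports():
    """Ask the OS for two distinct free UDP ports (held open together, then released)."""
    holders = [socket.socket(socket.AF_INET, socket.SOCK_DGRAM) for _ in range(2)]
    for holder in holders:
        holder.bind((HOST, 0))
    ports = [holder.getsockname()[1] for holder in holders]
    for holder in holders:
        holder.close()
    return ports


def make_peer(local_port, remote_port):
    """bind() to a fixed source port, then connect() to the other peer's port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((HOST, local_port))       # the effect of sourceport=
    sock.connect((HOST, remote_port))   # what the UDP: address does
    sock.settimeout(1.0)
    return sock


def demo():
    port_a, port_b = pick_two_ports()
    a = make_peer(port_a, port_b)
    b = make_peer(port_b, port_a)
    heard = []

    a.send(b"one")
    heard.append(b.recv(65535))

    a.close()                           # "kill" peer a ...
    a = make_peer(port_a, port_b)       # ... and restart it on the same port
    a.send(b"two")
    heard.append(b.recv(65535))
    b.send(b"three")
    heard.append(a.recv(65535))

    a.close()
    b.close()
    return heard


if __name__ == "__main__":
    print(demo())
```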
SysCall Reference: connect() & bind()
(Also, for my own reference, the UDP: socat address type creates the socket handle, then waits. When data is available to send, it calls bind() on the socket only if the sourceport option is set, then calls connect() on the socket, attaching it to the destination IP address, then uses read(), write(), and select() to share data. UDP-L:, on the other hand, creates the UDP socket handle, immediately calls bind() to attach it to the listen port, then waits on select(). When data is incoming, it calls recvfrom() with the "MSG_PEEK" option so that it can figure out the source port and IP address, then uses connect() to attach that source IP and port to the socket. It uses read() and write() after that. Because it uses connect(), it can't receive from a new client unless you use the fork option.)
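The UDP-L: sequence is easy to imitate in Python. This is only a sketch of the call order described above (bind, peek, connect), not socat's actual code, and the function names are hypothetical. It also shows the consequence: once the socket is connected to the first client, a datagram from a second client never arrives.

```python
import socket

HOST = "127.0.0.1"


def adopt_first_client(server):
    """Peek at the first datagram, then connect() the socket to its sender."""
    # MSG_PEEK reports the sender but leaves the datagram in the receive queue.
    _, peer = server.recvfrom(65535, socket.MSG_PEEK)
    server.connect(peer)
    return peer


def demo():
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind((HOST, 0))            # bind() right away, like UDP-L:
    server.settimeout(0.5)
    address = server.getsockname()

    first = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    first.bind((HOST, 0))
    first.settimeout(0.5)
    first_address = first.getsockname()
    second = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    first.sendto(b"from first", address)
    peer = adopt_first_client(server)
    received = [server.recv(65535)]   # the peeked datagram is still queued

    second.sendto(b"from second", address)
    try:
        received.append(server.recv(65535))
    except socket.timeout:
        pass  # a connected socket only accepts datagrams from its peer

    server.send(b"reply")             # no address needed after connect()
    reply = first.recv(65535)

    for sock in (server, first, second):
        sock.close()
    return peer == first_address, received, reply


if __name__ == "__main__":
    print(demo())
```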
The downside of the peer approach is that they will only talk to each other, like peers, rather than one being open to receiving from anyone else, like a server receiving from a client. Fortunately, socat also has UDP-SENDTO:, UDP-RECVFROM:, and UDP-RECV: addresses.
UDP-SENDTO: doesn't seem to behave any differently from UDP: address. Perhaps there is a subtle difference that I can't see at the moment. Example: "socat STDIO UDP-SENDTO:foo:11111".
SysCall Reference: sendto()/recvfrom()
(Again, for reference, the UDP-SENDTO: address does nothing on startup except make the socket handle. When data is sent, it is sent via the sendto() system call. No call to connect() is made; bind() is called only if the sourceport or bind options are used. It uses recvfrom() to read data, and select() so that it doesn't deadlock. I understand the theoretical difference between UDP:'s behaviour of connect() followed by write() and UDP-SENDTO:'s behaviour of using only sendto(). But I fail to appreciate a meaningful difference in overall behaviour. Specifically, while in general successive calls to sendto() can be directed at different destination IP addresses, socat has no way of arranging that to happen.)
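For what it's worth, here is the theoretical difference in a few lines of Python: one unconnected socket sending to two different destinations with successive sendto() calls, which a connect()ed socket could not do without reconnecting. This is a loopback sketch with made-up names, and it says nothing about what socat itself can be told to do.

```python
import socket

HOST = "127.0.0.1"


def make_receiver():
    """A bound UDP socket on an OS-chosen port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((HOST, 0))
    sock.settimeout(1.0)
    return sock


def demo():
    rx1, rx2 = make_receiver(), make_receiver()
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # never connect()ed

    # Each sendto() names its own destination.
    tx.sendto(b"to first", rx1.getsockname())
    tx.sendto(b"to second", rx2.getsockname())

    data1, source1 = rx1.recvfrom(65535)
    data2, source2 = rx2.recvfrom(65535)

    for sock in (rx1, rx2, tx):
        sock.close()
    # Both datagrams come from the same source port of the one sending socket.
    return data1, data2, source1[1] == source2[1]


if __name__ == "__main__":
    print(demo())
```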
Behaviour: Simple Multiplexed Server
UDP-RECVFROM: will wait for incoming data. When it gets a packet, it will then send any number of packets back to whoever sent the incoming one. But it won't ever wait for any more incoming packets. It will only send packets back to the source of the first packet received. This puzzled me, so I checked the man page, and indeed that's exactly what it's supposed to do. The man page goes on to say that this behaviour, when augmented with the fork option, is "similar to typical UDP based servers like ntpd or named." I suppose it's because some UDP-based services are strictly packet-based--a single packet from the client is answered by the server (with one or more response packets), after which the transaction is over (e.g. NTP and DNS). Handling multiple packets in both directions would require an extended application protocol to sort out ordering and retries (or at least acknowledgments) and that's not necessary for every application. It's suitable for very simple and short-messaged client-server applications. You can catch a single message from the above UDP-SENDTO example with "socat STDIO UDP-RECVFROM:11111,fork".
SysCall Reference: socat-specific behaviour
(For reference, UDP-RECVFROM creates the socket, bind()'s to the given listen port, then waits for data using select(). When it receives a single packet, it calls recvfrom(). Then it goes back to select(), but only to wait for more data to be ready to send over the socket. This makes me wonder whether this "one incoming packet only" behaviour is built in to the recvfrom() system call. I tend to think it is not, but rather is a conscious design decision on the part of the socat author(s).)
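A quick Python experiment supports that hunch: recvfrom() hands back one datagram per call, and nothing stops you from calling it again. The sketch below uses a hypothetical answer_one() helper that does the "one packet in, reply to its sender" transaction. Calling it once per request in a loop is a loose single-process stand-in for what the fork option arranges with child processes.

```python
import socket

HOST = "127.0.0.1"


def answer_one(sock, handler):
    """Receive exactly one datagram and send the handler's reply to its sender."""
    data, peer = sock.recvfrom(65535)
    sock.sendto(handler(data), peer)
    return peer


def demo():
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind((HOST, 0))
    server.settimeout(1.0)
    address = server.getsockname()

    clients = []
    for payload in (b"ping", b"pong"):
        client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        client.settimeout(1.0)
        client.sendto(payload, address)
        clients.append(client)

    # One transaction per incoming request; each reply goes to its own sender.
    for _ in clients:
        answer_one(server, bytes.upper)

    replies = [client.recv(65535) for client in clients]
    for sock in [server] + clients:
        sock.close()
    return replies


if __name__ == "__main__":
    print(demo())
```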
Behaviour: Data Receiver
The UDP-RECV address will also wait for incoming data, just as UDP-LISTEN and UDP-RECVFROM do. However, UDP-RECV will receive all packets sent to its listen port, from any and all clients. And it cannot send data back to any client. It's suitable for data collector applications. You can use both the UDP and UDP-SENDTO addresses to send to it. It aborts with an error if you try to make it send data. The "-u" option might be useful to prevent trying to use UDP-RECV in the wrong direction, should that be a problem. Example: "socat STDIO UDP-RECV:11111".
SysCall Reference: recvfrom()
(UDP-RECV uses recvfrom() to receive.)
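As a sketch of that collector pattern in Python (my own helper names, loopback only): a bound socket that is never connect()ed keeps calling recvfrom() and takes datagrams from whoever sends them, whether the sender used connect() plus send() or a plain sendto().

```python
import socket

HOST = "127.0.0.1"


def collect(sock, count):
    """Receive `count` datagrams from any sender; never reply."""
    seen = []
    for _ in range(count):
        data, peer = sock.recvfrom(65535)
        seen.append((peer, data))
    return seen


def demo():
    collector = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    collector.bind((HOST, 0))
    collector.settimeout(1.0)
    address = collector.getsockname()

    connected = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    connected.connect(address)              # roughly what UDP: does
    connected.send(b"via connect")

    unconnected = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    unconnected.sendto(b"via sendto", address)  # roughly what UDP-SENDTO: does

    seen = collect(collector, 2)
    for sock in (collector, connected, unconnected):
        sock.close()
    return seen


if __name__ == "__main__":
    for peer, data in demo():
        print(peer, data)
```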
Finally (sorta), the UDP-DATAGRAM address exists primarily for broadcast and multicast applications, both symmetric and asymmetric. You need to use the broadcast option to make a broadcast address work, otherwise you get an error. You most likely need to use the ip-add-membership= option to make multicast work. (You wouldn't if some other application had already instructed the OS to do the IGMP signalling that makes multicast work.) It also works on standard unicast addresses.
Broadcast example: "socat - UDP-DATAGRAM:10.0.0.255:11111,broadcast,s
Multicast example: "socat - UDP-DATAGRAM:18.104.22.168:11111,bind=:111
Unicast example: "socat - UDP-DATAGRAM:10.0.0.5:11111,sp=11111"
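At the socket level, as far as I can tell, the broadcast option corresponds to setting SO_BROADCAST and ip-add-membership to an IP_ADD_MEMBERSHIP request. Here is a hedged Python sketch of both. The helper names are mine, and the group 239.1.1.1 is an address I picked for illustration, not one from the examples above. Whether the join succeeds depends on the host having a multicast-capable interface.

```python
import socket


def make_broadcast_socket():
    """A UDP socket that is allowed to send to broadcast addresses."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    return sock


def membership_request(group, interface="0.0.0.0"):
    """Pack the 8-byte request: group address followed by local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(interface)


def join_group(sock, group, interface="0.0.0.0"):
    """Ask the OS to join the multicast group, which is what triggers IGMP."""
    sock.setsockopt(
        socket.IPPROTO_IP,
        socket.IP_ADD_MEMBERSHIP,
        membership_request(group, interface),
    )


if __name__ == "__main__":
    sender = make_broadcast_socket()
    print(sender.getsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST) != 0)
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("", 11111))
    join_group(receiver, "239.1.1.1")  # hypothetical group; may fail without multicast support
```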
None of these differentiate between sources. They truly are connectionless. On receive, they'll pick up any packet that makes its way to their network interface, assuming the packet is destined for the same port that they are listening on. On send, they transfer packets to the IP address and port listed in the address. The port they listen on (and send from) is controlled by the "sp=port" option (or "bind=:port" option for multicast). (You could argue the unicast example isn't connectionless because it won't send to just anybody, but it has to put something as the destination address.) All these examples listen on and send to the same port. You could listen on 11111 and send to 11112, for example, but then to get two-way communication the other side would have to do the opposite, and in a broadcast or multicast example, you'd end up with a very strange partition of nodes, where one portion of nodes can talk to the other portion, but not to other members of their own portion. When everyone sends to and listens on one port, everyone can send to and receive from everyone else.
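The partition argument is easier to see written out. This is a toy model in Python, not a network test: it assumes every datagram physically reaches every node (as with broadcast) and only asks whether the destination port matches the receiver's listen port. The Node type and who_hears_whom() are invented for the illustration.

```python
from typing import NamedTuple


class Node(NamedTuple):
    name: str
    listen_port: int
    send_port: int


def who_hears_whom(nodes):
    """Return (sender, receiver) pairs, assuming every datagram reaches every node."""
    return {
        (tx.name, rx.name)
        for tx in nodes
        for rx in nodes
        if tx.name != rx.name and tx.send_port == rx.listen_port
    }


if __name__ == "__main__":
    split = [
        Node("a", 11111, 11112),
        Node("b", 11111, 11112),
        Node("c", 11112, 11111),
        Node("d", 11112, 11111),
    ]
    for pair in sorted(who_hears_whom(split)):
        print(pair)
```

With the split port assignment, "a" and "b" can each reach "c" and "d" but never each other. Give every node the same listen and send port and every pair shows up.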
If you had three machines, each running one of the above examples, then the unicast system could send to either the broadcast or multicast system, assuming it was using the right destination unicast address. The unicast system would receive from the broadcast system, but not the multicast one. The multicast system can receive whatever the broadcast system sends, but not the reverse. These behaviours might depend on the OS you're running, and perhaps even the ethernet driver it has. I don't think I would ever count on any of these behaviours for a production system, but they might make good tricks for testing or network investigation.
SysCall Reference: sendto()/recvfrom(), again
(UDP-DATAGRAM calls recvfrom() (with the MSG_PEEK option), then again without that option. It does so continuously, and from all clients that send data. When sending data, it calls sendto().)
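The peek-then-read pattern looks like this in Python. It is a sketch of the call sequence just described, with a helper name of my own. The first recvfrom() with MSG_PEEK reports the datagram and its sender without consuming it, and the second call actually removes it from the queue. Because the socket is never connect()ed, the next datagram can come from someone else.

```python
import socket

HOST = "127.0.0.1"


def peek_then_read(sock):
    """Look at the next datagram without consuming it, then consume it."""
    peeked, peer = sock.recvfrom(65535, socket.MSG_PEEK)
    data, same_peer = sock.recvfrom(65535)
    return peer, same_peer, peeked, data


def demo():
    endpoint = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    endpoint.bind((HOST, 0))
    endpoint.settimeout(1.0)
    address = endpoint.getsockname()

    senders = []
    for payload in (b"first sender", b"second sender"):
        sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sender.sendto(payload, address)
        senders.append(sender)

    results = [peek_then_read(endpoint) for _ in senders]
    for sock in [endpoint] + senders:
        sock.close()
    return results


if __name__ == "__main__":
    for peer, same_peer, peeked, data in demo():
        print(peer, data)
```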
UDP-DATAGRAM with the same source and destination ports is what you'd most commonly use with multicast applications. It might also be useful with UDP-RECV if you just need to listen (although UDP-DATAGRAM will do that, too), and with UDP and UDP-SENDTO for just sending data (again, UDP-DATAGRAM does that, too). I can't think of any cases where it's useful with UDP-RECVFROM.
Even more fun, if you had three machines, run this on two of them: "socat -d -d UDP-DATAGRAM:10.0.0.255:11111,broadcast,s
Well, that's all I got for UDP and socat. It should augment the previous article. I've got one more planned, for covering some advanced topics.