Reverse-engineering Insta360 Link Controller WebSockets protocol
Version 1.4.1 of the Insta360 Link Controller software supports remotely controlling the webcam from a web interface on a phone. I wanted to try reverse-engineering its communication protocol so that we could build custom controller software.
With the help of Claude, we were able to reverse-engineer the protocol in 20 minutes, and @rayriffy was able to implement controller software for it.
Today me and @dtinth working on reverse-engineering Insta360 Link communication protocol to make it controllable with Xbox controller.
Now we can control camera smoothly without any messy movement by dragging a mouse to move a camera itself. pic.twitter.com/UMsKoHlwqX
— Riffy 7/LIVE FUN!! 10/JSBKK (@rayriffy) August 31, 2024
Background
I am part of Creatorsgarten’s livestreaming crew. We use an Insta360 Link to stream our events, and the camera is controlled by the Insta360 Link Controller software. Because it is a consumer product, Insta360 does not provide an SDK for it; on a Mac, the only way to control the camera is through its desktop app. That meant moving the camera with a mouse, which is not ideal for our use case.
In August 2023, Insta360 released an update to the software that allows controlling the camera via a mobile web interface.
The desktop app launches a WebSocket server, and the mobile web interface connects to it. Therefore, the desktop app must be running for the mobile web interface to work. This also means we can use Google Chrome’s Developer Tools to inspect the WebSocket messages. Time for some hardware hacking.
Reverse engineering the WebSocket protocol
A preliminary inspection showed that the messages are in a binary format.
By looking at the Initiator tab, we can find the JavaScript code that is responsible for the communication.
When beautified, this code runs to 7,000+ lines. Reverse-engineering that by hand would take a lot of time, so I asked Claude 3.5 Sonnet (through the Anthropic Console) to help me with the task.
…and Claude told me that the code looked like protobuf (Protocol Buffers) code.
This is great to hear! I had never worked with protobuf or gRPC before, so I didn’t know what its generated client code looks like or what its binary format is, and I wasn’t able to recognize it. I’m impressed that Claude could recognize it from the JavaScript code (and also told me the telltale signs to look for).
But I remembered that protobuf uses a .proto file that describes the message format, and client libraries are generated from this file. With this knowledge, I asked Claude to reconstruct the protobuf schema from the JavaScript code.
With that, the job was done in mere minutes, and we had the protobuf file.
Here’s what the Anthropic Console session looked like.
Understanding the communication
In the Google Chrome inspector, each message shows up as a hexdump that looks like this:
00000000: 1001 5a28 0a0e 3939 3939 3939 3939 3939 ..Z(..9999999999
00000001: 3939 3939 100c 1816 2001 2004 20ff 0120 9999.... . . ..
00000002: 0228 ffff ffff ffff ffff ff01 .(..........
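However, Protobufpal expects just the hex bytes, so once more I asked Claude for a command to extract them (this time via Open WebUI + LiteLLM).
Claude’s command was almost correct; the right one is actually cut -d' ' -f2-9, which keeps fields 2 through 9 (the eight hex groups) of each line.
With the hex extracted, I could decode the message in Protobufpal.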
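As an alternative to pasting into Protobufpal, the same capture can be decoded with a short script. This is a minimal Python sketch that needs neither a protobuf library nor the .proto file: it mimics the cut step, then splits the bytes using the protobuf wire format, where every field starts with a varint key equal to field_number << 3 | wire_type, wire type 0 is a varint, and wire type 2 is a length-prefixed byte string. The helper names (hexdump_to_bytes, decode_fields, as_int64) are made up for this example, and only the two wire types that appear in this capture are handled.

```python
import re

# The hexdump as copied from Chrome DevTools (same bytes as above).
HEXDUMP = """\
00000000: 1001 5a28 0a0e 3939 3939 3939 3939 3939 ..Z(..9999999999
00000001: 3939 3939 100c 1816 2001 2004 20ff 0120 9999.... . . ..
00000002: 0228 ffff ffff ffff ffff ff01 .(..........
"""


def hexdump_to_bytes(dump: str) -> bytes:
    """Rough equivalent of `cut -d' ' -f2-9`: keep the hex groups of each line."""
    groups = []
    for line in dump.splitlines():
        # Field 0 is the offset; fields 1-8 hold up to eight 2-byte hex groups.
        # On a short last line the ASCII column can fall inside that slice,
        # so only tokens that look like hex are kept.
        for token in line.split(" ")[1:9]:
            if re.fullmatch(r"(?:[0-9a-fA-F]{2}){1,2}", token):
                groups.append(token)
    return bytes.fromhex("".join(groups))


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a base-128 varint at buf[pos]; return (value, next position)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of this varint
            return value, pos
        shift += 7


def decode_fields(buf: bytes) -> list[tuple[int, int, object]]:
    """Split one message into (field_number, wire_type, value) triples, schema-free."""
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited: strings, bytes, nested messages
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
        fields.append((field_number, wire_type, value))
    return fields


def as_int64(value: int) -> int:
    """Reinterpret a raw varint as signed 64-bit (negative ints are sign-extended)."""
    return value - (1 << 64) if value >= 1 << 63 else value


raw = hexdump_to_bytes(HEXDUMP)
outer = decode_fields(raw)
print(outer[0])  # (2, 0, 1)
inner = decode_fields(outer[1][2])  # outer field 11 holds a nested message
for field_number, wire_type, value in inner:
    print(field_number, as_int64(value) if wire_type == 0 else value)
# 1 b'99999999999999'
# 2 12
# 3 22
# 4 1
# 4 4
# 4 255
# 4 2
# 5 -1
```

Read against the pan/tilt message documented further down, the values appear to line up: field 1 is the serial number, fields 2 and 3 presumably carry the paramType and selector enum values (12 and 22), field 4 repeats once per element of data ([1, 4, 255, 2]), and field 5 is presetPosIndex. The -1 takes ten bytes because protobuf sign-extends negative int32 values to 64 bits. What the outer field 2 (value 1) means is not determined by this capture alone.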
Notes on communication flow
Upon first communication
Upon connection, we receive:
"connectionNotify": {
"connectNum": 1,
"inControl": false
}
Requesting control
Then we send this message to request control of the camera:
"controlRequest": {
"token": "AAaaaAAAAaaAaaaaAAAAaaaAAaAAaa"
}
It will send a success message back:
"controlResponse": {
"success": true
}
It also sends the device’s serial number, which we need to include in later messages:
"deviceInfoNotify": {
"deviceSerialNum": "99999999999999"
// ...
}
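The handshake above can be modeled as a small state machine. The sketch below is hypothetical: LinkSession and handle() are names made up for illustration, messages are plain dicts shaped like the decoded JSON in this post rather than real protobuf objects, and encoding and sending over the WebSocket are left out. The token is passed in as-is, since how it is obtained is not covered here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkSession:
    """Hypothetical tracker for the connect -> request control -> device info flow."""

    token: str
    in_control: bool = False
    serial: Optional[str] = None

    def handle(self, message: dict) -> list[dict]:
        """Update state from one decoded incoming message; return replies to send."""
        outgoing: list[dict] = []
        if "connectionNotify" in message:
            notify = message["connectionNotify"]
            self.in_control = bool(notify.get("inControl", False))
            if not self.in_control:
                # Not in control yet, so ask for it.
                outgoing.append({"controlRequest": {"token": self.token}})
        elif "controlResponse" in message:
            self.in_control = bool(message["controlResponse"].get("success", False))
        elif "deviceInfoNotify" in message:
            # Needed as curDeviceSerialNum in later requests.
            self.serial = message["deviceInfoNotify"].get("deviceSerialNum")
        return outgoing

    @property
    def ready(self) -> bool:
        """True once we hold control and know which device to address."""
        return self.in_control and self.serial is not None


session = LinkSession(token="<token>")
for incoming in [
    {"connectionNotify": {"connectNum": 1, "inControl": False}},
    {"controlResponse": {"success": True}},
    {"deviceInfoNotify": {"deviceSerialNum": "99999999999999"}},
]:
    for reply in session.handle(incoming):
        print("send:", reply)  # send: {'controlRequest': {'token': '<token>'}}
print("ready:", session.ready, session.serial)  # ready: True 99999999999999
```

Fields are read with .get() and a default because protobuf’s JSON mapping may omit fields that hold their default value (such as false), so inControl could be absent rather than explicitly false.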
Heartbeating
Every once in a while, we must send a heartbeat message to keep the connection alive:
"heartbeatRequest": {}
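A heartbeat fits naturally into a background task. This asyncio sketch sends {"heartbeatRequest": {}} on a fixed interval through a caller-supplied send coroutine. heartbeat_loop is a hypothetical name, and the 1-second default interval is an assumption, since how often the server expects a heartbeat was not measured.

```python
import asyncio
from typing import Awaitable, Callable

HEARTBEAT = {"heartbeatRequest": {}}


async def heartbeat_loop(
    send: Callable[[dict], Awaitable[None]],
    interval: float = 1.0,  # assumed value, not taken from the protocol
) -> None:
    """Send a heartbeatRequest every `interval` seconds until the task is cancelled."""
    while True:
        await send(HEARTBEAT)
        await asyncio.sleep(interval)


async def demo() -> list[dict]:
    """Run the loop briefly against a fake sender that just records messages."""
    sent: list[dict] = []

    async def fake_send(message: dict) -> None:
        sent.append(message)

    task = asyncio.create_task(heartbeat_loop(fake_send, interval=0.01))
    await asyncio.sleep(0.055)
    task.cancel()  # stop the heartbeat, e.g. when the WebSocket closes
    try:
        await task
    except asyncio.CancelledError:
        pass
    return sent


sent = asyncio.run(demo())
print(len(sent), sent[0])
```

In a real client, send would encode the message and write it to the WebSocket, and the task would be cancelled when the connection closes.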
Controlling pan and tilt speed
Send this message to control the pan and tilt speed.
"uvcExtendRequest": {
"data": [
1,
4,
255,
2
],
"curDeviceSerialNum": "99999999999999",
"paramType": PARAM_PAN_TILT_RELATIVE,
"selector": XU_PANTILT_RELATIVE_CONTROL,
"presetPosIndex": -1
}
The data format is [signX, magnitudeX, signY, magnitudeY].
- Sign is 0 for no movement, 1 for positive, and 255 for negative.
- Magnitude is a positive integer. From our testing, we don’t recommend going above 30.
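Since the goal was driving the camera from an Xbox controller, the sign/magnitude encoding maps naturally from an analog stick. This is a sketch assuming stick axes arrive as floats in [-1.0, 1.0]; the helper names, the deadzone, and sending magnitude 0 together with sign 0 are assumptions, and which physical direction counts as "positive" on each axis is not documented here. The magnitude is capped at 30 per the recommendation above.

```python
MAX_MAGNITUDE = 30  # from testing, going above this is not recommended


def axis_to_sign_magnitude(
    axis: float, deadzone: float = 0.1, max_magnitude: int = MAX_MAGNITUDE
) -> tuple[int, int]:
    """Map a stick axis in [-1.0, 1.0] to the protocol's (sign, magnitude) pair.

    Sign: 0 = no movement, 1 = positive, 255 = negative.
    """
    axis = max(-1.0, min(1.0, axis))  # clamp out-of-range readings
    if abs(axis) <= deadzone:
        return 0, 0  # magnitude 0 alongside sign 0 is an assumption
    magnitude = max(1, round(abs(axis) * max_magnitude))  # keep it a positive integer
    return (1 if axis > 0 else 255), magnitude


def pan_tilt_request(serial: str, x: float, y: float) -> dict:
    """Build a uvcExtendRequest (as decoded JSON) for one stick reading."""
    sign_x, magnitude_x = axis_to_sign_magnitude(x)
    sign_y, magnitude_y = axis_to_sign_magnitude(y)
    return {
        "uvcExtendRequest": {
            "data": [sign_x, magnitude_x, sign_y, magnitude_y],
            "curDeviceSerialNum": serial,
            # Enum values written as names, the way protobuf's JSON mapping does.
            "paramType": "PARAM_PAN_TILT_RELATIVE",
            "selector": "XU_PANTILT_RELATIVE_CONTROL",
            "presetPosIndex": -1,
        }
    }


print(pan_tilt_request("99999999999999", 0.5, -0.2)["uvcExtendRequest"]["data"])
# [1, 15, 255, 6]
```

When the stick returns to center, both axes fall inside the deadzone and the data becomes [0, 0, 0, 0], which, per the sign rule above, should mean no movement.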
Controlling zoom level
"uvcRequest": {
"curDeviceSerialNum": "99999999999999",
"paramType": PARAM_ZOOM,
"value": 100
}
The value must be between 100 and 400.
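A trigger or a pair of buttons can drive zoom by mapping a 0.0 to 1.0 position onto the accepted 100 to 400 range. This is a minimal sketch; the function names and the assumption that larger values mean a tighter zoom are illustrative, not taken from the protocol.

```python
ZOOM_MIN, ZOOM_MAX = 100, 400  # accepted range for PARAM_ZOOM


def zoom_value(fraction: float) -> int:
    """Map 0.0 .. 1.0 onto 100 .. 400, clamping out-of-range input."""
    fraction = max(0.0, min(1.0, fraction))
    return ZOOM_MIN + round(fraction * (ZOOM_MAX - ZOOM_MIN))


def zoom_request(serial: str, fraction: float) -> dict:
    """Build a uvcRequest (as decoded JSON) that sets the zoom level."""
    return {
        "uvcRequest": {
            "curDeviceSerialNum": serial,
            "paramType": "PARAM_ZOOM",
            "value": zoom_value(fraction),
        }
    }


print(zoom_request("99999999999999", 0.5)["uvcRequest"]["value"])  # 250
```

Clamping on the client side keeps a noisy trigger reading from ever producing a value outside the range the camera accepts.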
Code
Check out the code on GitHub.