Cloud-Controlled Remote Pan Tilt Zoom Camera API for a Logitech BCC950 Camera with Azure and SignalR
I like my giant blog post titles. Nyah.
As a remote worker for almost 5 years now, I live in video conferences. I feel really strongly about the power of seeing someone's face rather than just being a voice on a scratchy speaker phone. I've built an AutoAnswer Kiosk for Lync with some friends that you can get for free at http://lyncautoanswer.com (and read about the code here), I've got a BusyLight so the kids know I'm on a call, and the Holy Grail for the last few years has been a reliable Pan Tilt Zoom camera that I could control remotely.
Related Reading
A few years ago I super-glued a LifeCam camera to an Eagletron TrackerPod and built a web interface to it. I wanted to do this on the cheap as I can't afford (and my boss isn't into) a $1500 Panasonic IP Camera.
The Solution...er, the Problem
I have found my camera and built my solution. The Logitech BCC950 Conference Cam is the best balance between cost and quality and it's got Pan Tilt and (digital) Zoom functionality. The Zoom is less interesting to me than the motorized Pan Tilt.
Let's think about the constraints.
- A Logitech BCC950 PTZ camera is installed on a Windows machine in my office in Seattle.
- I'm anywhere. I'm usually in Portland but could be in a hotel.
- I may or may not be VPN'ed into work. This means I want to be able to communicate with the camera across networks, traverse NATs and generally not worry about being able to connect.
- I want to be able to control the camera in a number of ways, Web API, whatever, but ideally with cool buttons that are (or look) integrated with my corporate instant messaging system.
There are three interesting parts here, then.
- Can I even control the camera's PTZ functions programmatically?
- Can I relay messages across networks to the camera?
- Can I make a slick client interface easily?
Let's figure them out one at a time.
Can I even control the camera's PTZ functions programmatically?
I looked all over and googled my brains out trying to find an API to talk to the Logitech camera. I emailed the Logitech people and they told me that the camera would respond to DirectShow APIs. This means I can control the camera without any drivers!
MSDN showed me PROPSETID_VIDCAP_CAMERACONTROL which has an enumeration that includes things like:
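As a rough sketch, the first members of that enumeration can be mirrored in C# like this. The member names are the ones declared in the Windows SDK header ksmedia.h. The enum name CameraControlProperty is a made-up placeholder, and the numeric values assume the native members are numbered in declaration order starting at zero, so check them against your SDK before relying on them.

```csharp
// Sketch: a managed mirror of the first members of the native
// KSPROPERTY_VIDCAP_CAMERACONTROL enumeration from ksmedia.h.
// ASSUMPTION: values follow the native declaration order, starting at 0.
// "CameraControlProperty" is a hypothetical name used only for illustration.
public enum CameraControlProperty
{
    KSPROPERTY_CAMERACONTROL_PAN = 0,
    KSPROPERTY_CAMERACONTROL_TILT = 1,
    KSPROPERTY_CAMERACONTROL_ROLL = 2,
    KSPROPERTY_CAMERACONTROL_ZOOM = 3,
    KSPROPERTY_CAMERACONTROL_EXPOSURE = 4,
    KSPROPERTY_CAMERACONTROL_IRIS = 5,
    KSPROPERTY_CAMERACONTROL_FOCUS = 6,
    KSPROPERTY_CAMERACONTROL_SCANMODE = 7,
    KSPROPERTY_CAMERACONTROL_PRIVACY = 8,
    KSPROPERTY_CAMERACONTROL_PANTILT = 9,
    // The relative variants are the ones that matter for this camera.
    KSPROPERTY_CAMERACONTROL_PAN_RELATIVE = 10,
    KSPROPERTY_CAMERACONTROL_TILT_RELATIVE = 11,
    KSPROPERTY_CAMERACONTROL_ROLL_RELATIVE = 12,
    KSPROPERTY_CAMERACONTROL_ZOOM_RELATIVE = 13
}
```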
This led me to this seven-year-old DirectShow .NET library that wraps the hardest parts of the DirectShow COM API. There's a little utility called GraphEdt.exe (GraphEdit) that you can get in the Windows SDK that lets you look at all the DirectShow-y things and devices and filters on your system.
This utility let me control the camera's Zoom but Pan and Tilt were grayed out! Why?
Turns out that this Logitech Camera supports only relative Pan and Tilt, not absolute. Whatever code creates this Properties dialog was never updated to support relative pan and tilt, but the API supports it via KSPROPERTY_CAMERACONTROL_PAN_RELATIVE!
That means I can send a start message quickly followed by a stop message to pan. It's not super exact, but it should work.
Here's the C# code for my move() method. Note the scandalous Thread.Sleep call.
private void MoveInternal(KSProperties.CameraControlFeature axis, int value)
{
    // Create and prepare data structures
    var control = new KSProperties.KSPROPERTY_CAMERACONTROL_S();
    IntPtr controlData = Marshal.AllocCoTaskMem(Marshal.SizeOf(control));
    IntPtr instData = Marshal.AllocCoTaskMem(Marshal.SizeOf(control.Instance));
    control.Instance.Value = value;
    //TODO: Fix for Absolute
    control.Instance.Flags = (int)CameraControlFlags.Relative;
    Marshal.StructureToPtr(control, controlData, true);
    Marshal.StructureToPtr(control.Instance, instData, true);
    var hr2 = _ksPropertySet.Set(PROPSETID_VIDCAP_CAMERACONTROL, (int)axis,
        instData, Marshal.SizeOf(control.Instance), controlData, Marshal.SizeOf(control));
    //TODO: It's a DC motor, no better way?
    Thread.Sleep(20);
    control.Instance.Value = 0; //STOP!
    control.Instance.Flags = (int)CameraControlFlags.Relative;
    Marshal.StructureToPtr(control, controlData, true);
    Marshal.StructureToPtr(control.Instance, instData, true);
    var hr3 = _ksPropertySet.Set(PROPSETID_VIDCAP_CAMERACONTROL, (int)axis,
        instData, Marshal.SizeOf(control.Instance), controlData, Marshal.SizeOf(control));
    if (controlData != IntPtr.Zero) { Marshal.FreeCoTaskMem(controlData); }
    if (instData != IntPtr.Zero) { Marshal.FreeCoTaskMem(instData); }
}
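The start-then-stop pattern above can be pulled out into a tiny helper so it's easier to reason about and test without a camera attached. This is only a sketch: MotorPulse, Pulse and the setSpeed delegate are hypothetical names, not part of the PTZDevice library. The delegate stands in for whatever actually pushes a value to the device.

```csharp
using System;
using System.Threading;

// Hypothetical helper, not part of the PTZDevice library.
public static class MotorPulse
{
    // Sends a non-zero value to start the motor, waits, then sends 0 to stop.
    // The stop is in a finally block so it is still sent if the wait is interrupted.
    public static void Pulse(Action<int> setSpeed, int direction, int durationMs)
    {
        if (setSpeed == null) throw new ArgumentNullException("setSpeed");

        setSpeed(direction);      // start moving
        try
        {
            Thread.Sleep(durationMs);
        }
        finally
        {
            setSpeed(0);          // STOP!
        }
    }
}
```

A call such as MotorPulse.Pulse(v => Console.WriteLine(v), 1, 20) would print 1 and then 0. With a real device the delegate would wrap the property-set call instead.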
All the code for this PTZDevice wrapper is here. Once that library was working, creating a little console app to move the camera around with a keyboard was trivial.
var p = PTZDevice.GetDevice(ConfigurationManager.AppSettings["DeviceName"], PTZType.Relative);
while (true)
{
    ConsoleKeyInfo info = Console.ReadKey();
    if (info.Key == ConsoleKey.LeftArrow)
    {
        p.Move(-1, 0);
    }
    else if (info.Key == ConsoleKey.RightArrow)
    {
        p.Move(1, 0);
    }
    else if (info.Key == ConsoleKey.UpArrow)
    {
        p.Move(0, 1);
    }
    else if (info.Key == ConsoleKey.DownArrow)
    {
        p.Move(0, -1);
    }
    else if (info.Key == ConsoleKey.Home)
    {
        p.Zoom(1);
    }
    else if (info.Key == ConsoleKey.End)
    {
        p.Zoom(-1);
    }
}
Also easy was a simple WebAPI. (I put the name of the camera to look for in a config file in both these cases.)
[HttpPost]
public void Move(int x, int y)
{
    var p = PTZDevice.GetDevice(ConfigurationManager.AppSettings["DeviceName"], PTZType.Relative);
    p.Move(x,y);
}

[HttpPost]
public void Zoom(int value)
{
    var p = PTZDevice.GetDevice(ConfigurationManager.AppSettings["DeviceName"], PTZType.Relative);
    p.Zoom(value);
}
At this point I've got the camera moving LOCALLY. Next, I mail it to Damian (my office buddy) in Seattle and he hooks it up to my office computer. But I need something to control it running on THAT machine...and talking to what?
Can I relay messages across networks to the camera?
Here's the architecture. Since I can't talk point to point via TCP between wherever I am and wherever the camera is, I need a relay. I could use a Service Bus Relay which would be great for something larger but I wanted to see if I could make something even simpler. I'd like to use HTTP since it's, well, it's HTTP.
Since Azure lets me have 10 free websites and automatically supports SSL via a wildcard cert for sites at the *.azurewebsites.net domain, it was perfect for what I needed. I want to use SSL because it's the best way to keep my traffic from being affected by corporate proxy servers.
There are three parts. Let's start in the middle. What's the Relay look like? I'm going to use SignalR because it will let me not only call methods easily and asynchronously but, more importantly, it will abstract away the connection details from me. I'm looking to relay messages over a pseudo-persistent connection.
So what's the code look like for a complex relay system like this? ;)
using System;
using SignalR.Hubs;

namespace PTZSignalRRelay
{
    public class RelayHub : Hub
    {
        public void Move(int x, int y, string groupName)
        {
            Clients[groupName].Move(x, y); //test
        }

        public void Zoom(int value, string groupName)
        {
            Clients[groupName].Zoom(value);
        }

        public void JoinRelay(string groupName)
        {
            Groups.Add(Context.ConnectionId, groupName);
        }
    }
}
Crazy, eh? That's it. Clients call JoinRelay with a name. The name is the name of the computer with the camera attached. (More on this later) This means that this single relay can handle effectively any number of clients. When a client calls the Relay with a message and a group name, the relay then broadcasts to the clients that have joined that group name.
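To make the group semantics concrete, here is a toy in-memory model of the same idea. It is not how SignalR is implemented and it does no networking; InMemoryRelay and its members are hypothetical names used only to show "join a named group, then broadcast to everyone in it."

```csharp
using System;
using System.Collections.Generic;

// Toy model of the relay's group behavior. Hypothetical type, no SignalR involved.
public class InMemoryRelay
{
    // group name -> handlers registered by the listeners that joined that group
    private readonly Dictionary<string, List<Action<int, int>>> _groups =
        new Dictionary<string, List<Action<int, int>>>();

    // A listener joins a group and supplies the callback to run on Move.
    public void JoinRelay(string groupName, Action<int, int> onMove)
    {
        List<Action<int, int>> handlers;
        if (!_groups.TryGetValue(groupName, out handlers))
        {
            handlers = new List<Action<int, int>>();
            _groups[groupName] = handlers;
        }
        handlers.Add(onMove);
    }

    // Broadcasts to every listener in the named group; other groups see nothing.
    public void Move(int x, int y, string groupName)
    {
        List<Action<int, int>> handlers;
        if (!_groups.TryGetValue(groupName, out handlers)) return;
        foreach (var handler in handlers)
        {
            handler(x, y);
        }
    }
}
```

In this model, as in the hub code above, the sender only has to name the group. Nothing in Move checks whether the caller has joined it.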
Can I make a slick client interface easily?
I created a super basic WPF app that's just a transparent window with buttons. In fact, the background isn't white or black, it's transparent. It's a SolidColorBrush that is all but invisible. It's not totally transparent or I wouldn't be able to grab it with the mouse!
<SolidColorBrush x:Key="NotQuiteTransparent" Color="#01000000"></SolidColorBrush>
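WPF reads an eight-digit color like this as #AARRGGBB, so the leading 01 is the alpha channel: 1 out of 255, which is as close to invisible as you can get without being zero. Here's a small sketch that pulls the alpha byte out of such a string. ArgbHex is a hypothetical helper written for illustration, not a WPF API.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration; WPF has its own color parsing.
public static class ArgbHex
{
    // Parses a "#AARRGGBB" string and returns the alpha channel (0-255).
    public static byte Alpha(string argb)
    {
        if (argb == null || argb.Length != 9 || argb[0] != '#')
        {
            throw new FormatException("Expected a color in the form #AARRGGBB");
        }
        // Characters 1-2 are the alpha channel in hexadecimal.
        return byte.Parse(argb.Substring(1, 2), NumberStyles.HexNumber, CultureInfo.InvariantCulture);
    }
}
```

ArgbHex.Alpha("#01000000") returns 1, so the brush is black at roughly 0.4% opacity.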
The buttons use the .NET SignalR library and call it like this.
HubConnection connection = null;
IHubProxy proxy = null;
string remoteGroup;
string url;

private void MainWindow_MouseDown(object sender, MouseButtonEventArgs e)
{
    if (e.ChangedButton == MouseButton.Left)
        this.DragMove();
}

private async void MoveClick(object sender, RoutedEventArgs e)
{
    var ui = sender as Control;
    Point p = Point.Parse(ui.Tag.ToString());
    await proxy.Invoke("Move", p.X, p.Y, remoteGroup);
}

private async void ZoomClick(object sender, RoutedEventArgs e)
{
    var ui = sender as Control;
    int z = int.Parse(ui.Tag.ToString());
    await proxy.Invoke("Zoom", z, remoteGroup);
}

private async void MainWindow_Loaded(object sender, RoutedEventArgs e)
{
    url = ConfigurationManager.AppSettings["relayServerUrl"];
    remoteGroup = ConfigurationManager.AppSettings["remoteGroup"];
    connection = new HubConnection(url);
    proxy = connection.CreateProxy("RelayHub");
    await connection.Start();
    await proxy.Invoke("JoinRelay", remoteGroup);
}
The client app just needs to know the name of the computer with the camera it wants to control. That's the "GroupName" or in this case, from the client side, the "RemoteGroup." Then it knows the Relay Server URL, like https://foofooserver.azurewebsites.net. The .NET client uses async and await to make the calls non-blocking so the UI remains responsive.
Here's a bunch of traffic going through the Relay while I was testing it this afternoon, as seen by the Azure Dashboard.
The client calls the Relay and the Relay broadcasts to connected clients. The Remote Camera Listener responds to the calls. We get the machine name, join the relay and set up two methods that will respond to Move and Zoom.
The only hard thing we ran into (Thanks David Fowler!) was that the calls to the DirectShow API actually have to be on a UI thread rather than a background thread, so we have to get the current SynchronizationContext and post our messages with it. This results in a little indirection but it's not too hard to read. Note the comments.
private async void MainWindow_Loaded(object sender, RoutedEventArgs e)
{
    var deviceName = ConfigurationManager.AppSettings["DeviceName"];
    device = PTZDevice.GetDevice(deviceName, PTZType.Relative);
    url = ConfigurationManager.AppSettings["relayServerUrl"];
    remoteGroup = Environment.MachineName; //They have to hardcode the group, but for us it's our machine name
    connection = new HubConnection(url);
    proxy = connection.CreateProxy("RelayHub");
    //Can't do this here because DirectShow has to be on the UI thread!
    // This would cause an obscure COM casting error with no clue what's up. So, um, ya.
    //proxy.On<int, int>("Move",(x,y) => device.Move(x, y));
    //proxy.On<int>("Zoom", (z) => device.Zoom(z));
    magic = SynchronizationContext.Current;
    proxy.On<int, int>("Move", (x, y) => {
        //Toss this over the fence from this background thread to the UI thread
        magic.Post((_) => {
            Log(String.Format("Move({0},{1})", x,y));
            device.Move(x, y);
        }, null);
    });
    proxy.On<int>("Zoom", (z) => {
        magic.Post((_) =>
        {
            Log(String.Format("Zoom({0})", z));
            device.Zoom(z);
        }, null);
    });
    try {
        await connection.Start();
        Log("After connection.Start()");
        await proxy.Invoke("JoinRelay", remoteGroup);
        Log("After JoinRelay");
    }
    catch (Exception pants) {
        //Only a WebException carries an HTTP response body we can log.
        //A hard cast here would throw its own exception and hide the real one.
        var foo = pants.GetBaseException() as WebException;
        if (foo != null && foo.Response != null) {
            using (var r = new StreamReader(foo.Response.GetResponseStream())) {
                string yousuck = r.ReadToEnd();
                Log(yousuck);
            }
        }
        throw;
    }
}
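If the SynchronizationContext trick feels like magic, here's a stripped-down sketch of the idea with no WPF in it. In a WPF app, SynchronizationContext.Current on the UI thread hands you a context whose Post queues work onto the dispatcher. The SingleThreadContext below is a hypothetical stand-in for that: it queues callbacks and runs them on whichever thread pumps the loop. All of the type and method names here are made up for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical stand-in for a UI thread's context: callbacks are queued by Post
// and executed on the thread that calls RunLoop().
public sealed class SingleThreadContext : SynchronizationContext
{
    private readonly BlockingCollection<Action> _queue = new BlockingCollection<Action>();

    public override void Post(SendOrPostCallback d, object state)
    {
        _queue.Add(() => d(state));
    }

    // Runs queued callbacks on the calling thread until Complete() is called
    // and the queue has drained.
    public void RunLoop()
    {
        foreach (var work in _queue.GetConsumingEnumerable())
        {
            work();
        }
    }

    public void Complete()
    {
        _queue.CompleteAdding();
    }
}

public static class MarshalDemo
{
    // Posts from a background thread and reports whether the callback
    // ran on the thread that owns the loop.
    public static bool CallbackRunsOnLoopThread()
    {
        var context = new SingleThreadContext();
        int loopThreadId = Thread.CurrentThread.ManagedThreadId;
        int callbackThreadId = -1;

        var background = new Thread(() =>
        {
            // Toss the work over the fence, just like magic.Post in the listener.
            context.Post(_ => { callbackThreadId = Thread.CurrentThread.ManagedThreadId; }, null);
            context.Complete();
        });

        background.Start();
        context.RunLoop();   // pump on this thread, standing in for the UI message loop
        background.Join();

        return callbackThreadId == loopThreadId;
    }
}
```

The point is the same as in the listener: the SignalR callback fires on a background thread, but the work that touches DirectShow gets queued and executed on the thread that owns the loop.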
It All Works Together
Now I've got all the parts. Buttons that call a Relay that then call back - through NAT and networks - to the Remote Camera Listener which uses the Camera library to move it.
It works like a champ. And, because the buttons are transparent, I can put them over the Lync window and pretend it's all integrated.
TODO: I'm hoping that someone who knows more about Windows Internals will volunteer to create some code that will automatically move the buttons as the Lync Window moves and position them over the video window in the corner. Ahem.
You can set this up yourself, but I haven't gotten around to making an install or anything. If you have a Logitech BCC950 you are welcome to use my Relay until it costs me something. There's a preliminary download up here so you'd only need the Listener on one side and the Buttons on the other. No drivers are needed since we're using DirectShow itself.
This was great fun, and more importantly, I use this PanTiltZoom System every day and it makes my life better. The best part was that I was able to do the whole thing in C#. From client UI to cloud-based relay to device control to COM wrapper, it was all C#. It makes me feel very empowered as a .NET developer to be able to make systems like this with a minimal amount of code.
Lync Developer Resources
- CodeLync
- Developing Lync (Tom Morgan)
- Justin Morris on UC
- Lync Development by Michael Greenlee
- Lync'd Up (Tom Arbuthnot)
- The Modality Systems blog
Related Links
- Hanselminutes Podcast 242 - The Plight of the Remote Worker with Pete Brown
- Building an Embodied Social Proxy or Crazy Webcam Remote Cart Thing
- Virtual Camaraderie - A Persistent Video "Portal" for the Remote Worker
- Working Remotely from Home, Telepresence and Video Conferencing: One Year Later
About Scott
Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.
You're more than likely right about that. My guess is that is a conference room of some degree. Personally, I thought the use would probably be in someone remotely monitoring their house for those that are into that sort of thing (I'd probably benefit from it purely because my short term memory is horrid and I find myself driving back to the driveway when I question whether I locked the door after driving 40 ft away).
All in all, as a web-only guy this is incredibly neat to me. I wonder how long before someone does it on Windows 8 (take that, Android@Home).
Scott, you unintentionally answered a question I was thinking about this morning (I think. I believe it is answered, anyway). My knowledge of SignalR is non-existent (currently), but I was actually wondering if SignalR and async competed with each other or complemented each other. Judging from what I got from this, they don't appear to really overlap but you could use them together so the complementing is closer, right?
Is there any way that one can use Skype instead of Lync, as our business does not have a Lync Server?
Specifically, how would one replace the Lync references in the UI Suppressed or Non-Suppressed Auto Answer App with Skype references?
I have one question though. Why is it necessary for your machine (the one with the WPF app) to join the relay? I would have thought it's enough just to call the zoom/tilt methods with the right group name? SignalR then only sends the command to the computer with the camera.
Sorry if I'm asking something obvious but it currently doesn't make much sense to me :)
http://dilbert.com/strips/comic/2012-10-26/
No good wide angle lenses exist for webcams so you can't see more than 3-4 people in a conference room filled with 15 to 20.
Hi Scott, I am also a remote worker in Melbourne, Australia and had found similar issues being part of meetings. For a long time we had a person in the conference room responsible for manually aiming a webcam, which wasn't ideal. Earlier this year though I found the Genius WideCam 1050 webcam which has a 120 degree wide angle lens. It has made attending meetings much better. We sit it at the end of the table and it's like sitting at the table. I can see everyone.
The main drawback is that everything in the image is much smaller. This has two consequences. First, as you pointed out, being able to see whiteboards is a problem. We get around this by using oovoo for the video and adding a laptop with another webcam just on the whiteboard.
The second is that you can't see as much detail in people's faces, but I think seeing everyone outweighs this. I'd like to try the F100 Full HD version to see if that improves things, but it doesn't seem to be available in Australia.
Also, if you find that you're using GraphEdt.exe much, like I do, then you might be interested in monogram graphstudio which is an open source version of the GraphEdit tool with heaps of extra handy features added.
Nate
I would have preferred to do it with hooks (push vs. poll), but you need to delve into C++ to do that and I didn't feel like it. So the code uses P/Invoke to check whether the window we want is active (once a second), and once it finds it, it tracks where it is (every 100ms). I went and did this for Lync, at least as far as my version is involved. If you need to hack it yourself for your version, grab Window Detective (drag the "Pick Window" toolbar button to the Lync window) and figure out what gets updated when a video session is active. For my version the *last* window (vis-à-vis control) with the class "CtrlControlSink" that contains a "LCC_VideoParent" is shown (WindowStyles.Visible). Hopefully my P/Invoke wrapping efforts make it obvious.
But I have a bumblebee question. When I look at the UVC 1.1 specs and descriptors for the BCC950 there seems to be a mismatch concerning pan relative and tilt relative. Only the latter is checked in the BCC950 descriptors. So my question is: how does your bumblebee fly?
+1 for the Genius WideCam 1050. We have several of them throughout our offices and they work great. As Nate said, they are more for the feeling of "being there" than reading whiteboards, but they do improve the experience quite a lot. One thing we don't like about them is the integrated microphone, so we actually use the CadAudio U7 microphone for sound.
Think about using the windows kinect sdk. With audio and skeleton recognition you can determine the speaker and where he sits and then move the camera to him automagically.
Or you can implement person buttons in your remote control and just click on the person you want to see.
I'm sure one day you will present us with an update of this project :-)
Any chance you could release a compiled version of the basic applet that does the PTZ for us non-coder type remote workers?
Will this solution work with any PTZ camera or does it have to have some specific compatibility requirements?
Thanks again!
Shawn
If you do decide to improve it, may I suggest making it so that it doesn't need a remote client, but can run from a browser, relaying actual https to the relay server or the camera. That way people could control from all platforms with no install. I realize the camera needs to be in windows to use your code base and the APIs you found.
I'm playing around with my BCC950 and your c# code, but I'm stuck on the xml.config file (for the PTZControl program) where you describe the Device to be connected to.
Just
<appSettings>
<add DeviceName = "BCC950 ConferenceCam" />
</appSettings>
is not working. You have a clue?
Thanks & Cheers Enrico
I'm running into an unusual problem... I have a BCC950 but none of my code, your code or Logitech's PTZ example code seems to work. For example, your PTZDeviceConsole.exe gives this error:
C:\Users\Ian\Desktop\PanTiltZoomSystemv0.9\PanTiltZoomSystemv0.9\ConsoleTester>PTZDeviceConsole.exe
Unhandled Exception: System.NotSupportedException: This camera doesn't appear to support Relative Pan and Tilt
   at PTZ.PTZDevice..ctor(String name, PTZType type)
   at PTZ.PTZDevice.GetDevice(String name, PTZType type)
   at PTZDeviceConsole.Program.Main(String[] args)
Any clue what might be causing this error? Essentially what I've found is that my system recognizes the camera, but any attempt to get() or set() camera properties fails.
For example: SupportFor(KSProperties.CameraControlFeature.KSPROPERTY_CAMERACONTROL_PAN_RELATIVE)
I'm starting to think it's something firmware related... though I haven't found any tools for flashing the firmware.
Any clues?
It would be really good to have that, as the lack of controls is what is preventing me from buying the BCC950. Thanks!