Scott Hanselman

CLR and DLR and BCL, oh my! - Whirlwind Tour around .NET 4 (and Visual Studio 2010) Beta 1

May 20, 2009 Comment on this post [18] Posted in BCL | Learning .NET | TechEd
Sponsored By

Just got a great tweet from Jeremiah Morrill about .NET 4.

"Office and COM Interop that is actually fun to do" Show me! Until then, I'm calling shenanigans.

I love a challenge! In my first Whirlwind Tour post on ASP I mentioned how COM Interop and Office Interop was fun with 4.

I've done a lot of COM Interop with C# and a LOT of Office Automation. Once upon a time, I worked at a company called Chrome Data, and we created a Fax Server with a Digiboard. Folks would call into a number, and the person who took the order would pick the make/model/style/year of the car and "instantly" fax a complete report about the vehicle. It used VB3, SQL Server 4.21 and Word 6.0 and a magical thing called "OLE Automation."

Fast forward 15 years and I sent an email to Mads Torgerson, a PM on C# that said:

I’m doing a sample for a friend where I’m simply spinning through an Automation API over a Word Doc to get and change some CustomDocumentProperties.

I’m really surprised at how current C# sucks at this. Of course, it makes sense, given all the IDispatch code in Word, but still. Dim != var as they say. Fix it!

Well, everything except "fix it!" is true. I added that just now. ;) I did a post on this a while back showing how scary the C# code was. This is/was somewhere where Visual Basic truly excels. I vowed to only use VB for Office Automation code after this fiasco.

If you want to melt your brain, check out the old code. No joke. I've collapsed the block because it's too scary. See the "ref missings"? The reflection? The Get/Sets? Scandalous!

{
ApplicationClass WordApp = new ApplicationClass();
WordApp.Visible = true;
object missing = System.Reflection.Missing.Value;
object readOnly = false;
object isVisible = true;
object fileName = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, @"..\..\..\NewTest.doc");
Microsoft.Office.Interop.Word.Document aDoc = WordApp.Documents.Open(
ref fileName,ref missing,ref readOnly,ref missing,
ref missing,ref missing,ref missing,ref missing,
ref missing,ref missing,ref missing,ref isVisible,
ref missing,ref missing,ref missing,ref missing);

aDoc.Activate();

string propertyValue = GetCustomPropertyValue(aDoc, "CustomProperty1");
SetCustomPropertyValue(aDoc, "CustomProperty1", "Hanselman");

foreach (Range r in aDoc.StoryRanges)
{
r.Fields.Update();
}
}

public string GetCustomPropertyValue(Document doc, string propertyName)
{
object oDocCustomProps = doc.CustomDocumentProperties;
Type typeDocCustomProps = oDocCustomProps.GetType();
object oCustomProp = typeDocCustomProps.InvokeMember("Item",
BindingFlags.Default |
BindingFlags.GetProperty,
null, oDocCustomProps,
new object[] { propertyName });

Type typePropertyValue = oCustomProp.GetType();
string propertyValue = typePropertyValue.InvokeMember("Value",
BindingFlags.Default |
BindingFlags.GetProperty,
null, oCustomProp,
new object[] { }).ToString();

return propertyValue;
}

public void SetCustomPropertyValue(Document doc, string propertyName, string propertyValue)
{
object oDocCustomProps = doc.CustomDocumentProperties;
Type typeDocCustomProps = oDocCustomProps.GetType();
typeDocCustomProps.InvokeMember("Item",
BindingFlags.Default |
BindingFlags.SetProperty,
null, oDocCustomProps,
new object[] { propertyName, propertyValue });
}

Fast forward to C# under .NET 4.

var WordApp = new ApplicationClass();
WordApp.Visible = true;
string fileName = @"NewTest.doc";
Document aDoc = WordApp.Documents.Open(fileName, ReadOnly: true, Visible: true);
aDoc.Activate();

string propertyValue = aDoc.CustomDocumentProperties["FISHORTNAME"].Value;
aDoc.CustomDocumentProperties["FISHORTNAME"].Value = "HanselBank";
string newPropertyValue = aDoc.CustomDocumentProperties["FISHORTNAME"].Value

foreach (Range r in aDoc.StoryRanges)
{
foreach (Field b in r.Fields)
{
b.Update();
}
}

See how all the crap that was originally reflection gets dispatched dynamically? But that's not even that great an example.

Word and Excel Automation with C# 4

That's just an example from Jonathan Carter and Jason Olson that gets running processes (using LINQ, woot) then makes a chart in Excel, then puts the chart in Word.

using System;
using System.Diagnostics;
using System.Linq;
using Excel = Microsoft.Office.Interop.Excel;
using Word = Microsoft.Office.Interop.Word;

namespace One.SimplifyingYourCodeWithCSharp
{
class Program
{
static void Main(string[] args)
{
GenerateChart(copyToWord: true);
}

static void GenerateChart(bool copyToWord = false)
{
var excel = new Excel.Application();
excel.Visible = true;
excel.Workbooks.Add();

excel.get_Range("A1").Value2 = "Process Name";
excel.get_Range("B1").Value2 = "Memory Usage";

var processes = Process.GetProcesses()
.OrderByDescending(p => p.WorkingSet64)
.Take(10);
int i = 2;
foreach (var p in processes)
{
excel.get_Range("A" + i).Value2 = p.ProcessName;
excel.get_Range("B" + i).Value2 = p.WorkingSet64;
i++;
}

Excel.Range range = excel.get_Range("A1");
Excel.Chart chart = (Excel.Chart)excel.ActiveWorkbook.Charts.Add(
After: excel.ActiveSheet);

chart.ChartWizard(Source: range.CurrentRegion,
Title: "Memory Usage in " + Environment.MachineName);

chart.ChartStyle = 45;
chart.CopyPicture(Excel.XlPictureAppearance.xlScreen,
Excel.XlCopyPictureFormat.xlBitmap,
Excel.XlPictureAppearance.xlScreen);

if (copyToWord)
{
var word = new Word.Application();
word.Visible = true;
word.Documents.Add();

word.Selection.Paste();
}
}
}
}

Notice the named parameters in C#, like 'Title: "whatever"' and "copyToWord: true"?

PIAs no long stand for Pain in the *ss - Type Equivalence and Embedded Interop Assemblies

Primary Interop Assemblies are .NET assemblies that bridge the gap between a .NET app and a COM server. They are also a PIA, ahem. When there's articles about your technology on the web called "Common Pitfalls With ______" you know there's trouble.

Typically you reference these Interop assemblies in Visual Studio and they show up, predictably, as references in your assembly. Here's a screenshot with the project on the right, and the assembly under Reflector on the left.

image

This means that those PIAs better be deployed on the end user's machine. Some PIAs can be as large as 20 megs or more! That's because for each COM interface, struct, enum, etc, there is a managed equivalent for marshalling data. That sucks, especially if I just want to make a chart and I'm not using any other types. It's great that Office has thousands of types, but don't make me carry them all around.

However, now I can click on Properties for these references and click Embed Interop Types = true. Now, check the screenshot. The references are gone and new types have appeared.

 image

Just the types I was using are now embedded within my application. However, since the types are equivalent, the runtime handles this fact and we don't have to do anything.

This all adds up to Office and COM Interop that is actually fun to do. Ok, maybe not fun, but better than a really bad paper cut. Huh, Jeremiah? ;)

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.

facebook bluesky subscribe
About   Newsletter
Hosting By
Hosted on Linux using .NET in an Azure App Service
May 20, 2009 9:48
Good news on the cleaning up of the Office interop interface.

Thinking it would be great now if 'Record Macro' had an option to generate c# instead of VBA, though it would be simple enough to do the conversion manually now.

I currently, with .NET 3.5, do early development in VBA before switching to C# and I suspect I'm not the only one.
May 20, 2009 10:56
Well nothing new on my side, because i am doing office automation using VB.Net, for my opinion the best interop language for office until c# 4.0.
What i liked is the embed interop types, but why the well didn't the collegues did this before?
Most of the stuff coming for c# 4.0 is nearly already available in vb.net so no new features for me :)
May 20, 2009 11:23
It's true! This is stuff that VB has always excelled in!
May 20, 2009 12:48
Scott, your second screenshot shows the 'References' node collapsed in Reflector. This doesn't proove that the references are gone at all! Lies!
May 20, 2009 12:55
Matt - Updated with new screenshot. Happy now, Professor Positive? ;)
May 20, 2009 13:38
Missing from both your old code sample and the new c# 4.0 ones is dereferencing the com objects, e.g.:

Excel.Range worksheetCells = worksheet.Cells;

//do stuff

Marshal.ReleaseComObject(worksheetCells);


Not doing this (ReleaseComObject) with 2.x/3.x code leads to memory leaks (and occasional crashes) in the office app you're interfacing with. Is that still the case with 4.x or does it have a built-in automagic refcounter decrement mechanism? Triggered by the GC maybe?
May 20, 2009 13:59
Scott, much happier thanks! I'm considering using the Excel Automation in some LOB web applications and I'm hoping this feature will reduce the overall impact on deployment and the runtime.
May 20, 2009 14:09
Hey Now Scott,

These posts on .NET 4 & VS10 have been great. The Chart example is so nice & useful.

Hanselminutes fan,

Catto
May 20, 2009 16:35
So, 8 or so years after the release of .NET 1.0 the office team is still pushing COM and Interop ?

Where is the managed API for Office ?

The biggest grief is dealing with 'is it XP/2003/2007 - is it html, etc...' as well. I cringe whenever I see the 'read the excel document in and do such and such' - or create a multi-page dynamic excel spreadsheet.

I find the current 'interop' API to be a complete mess as you show above... :)

May 20, 2009 18:57
Is this different than the Office Open XML library? Or has that library now been sucked up into the 4.0 framework?
May 20, 2009 19:23




I agree with Steve. I understand why this new com interop stuff is useful. But seriously, IDispatch needs to die. When was VB6 released again?
it's been 10 years and office (and others) are still releasing these horrendous APIs.

How about writing (and enforcing, company-wide use of) an fxcop-like tool for idl files?
May 20, 2009 19:52
I can only second what steve and Kristofer mention above. They need to sort out the whole ReleaseComObject issue and clean up the API while they are at it.

The obvious example of the ReleaseComObject problem is a For Each loop where the IL uses an intermediary variable behind the sceens to itterate the objects through IEnumberable but dosen't have the sense to realsie that it is working with a COM API and so therefore dosent call ReleaseComObject and each time round the loop you creat another orphaned COM ptr that dosen't get cleaned up until the process terminates!

We really should not have to handle all of that junk, why can MS not make a single interop library that takes care of multiple office versions and handles all COM ptr ReleaseComObject calls and memory management for the developer so we can treat it like a first class .net API even if it is not behind the sceens!
May 20, 2009 20:55
Kristofer, we’ve always had automatic releases for COM objects as the GC determines it needs to clean them up. They don’t get cleaned up immediately, but do get cleaned up at some point later after a GC or two. I've confirmed with this with the team.

They said:

In the type of app in the blog it doesn’t really matter, if people screw up their app will fail and they’ll get obvious ref-counting bugs. In cases where their code is being loaded inside of office, rather than the other way around, it is much more concerning because you can end up releasing someone else’s reference then there can be problems that they don’t notice in their code but breaks someone elses office add-in instead.

So, the point is that while ReleaseComObject is more deterministic, it's usually not necessary as the GC will do the same thing.
May 20, 2009 21:00
Visual Foxpro (VFP) was also great with COM Interop, with intellisense support, etc.

Ah, but VFP lives in .Net: http://www.etecnologia.net/Products/VFPCompiler/VFPDeveloperStudio.html (I am not affiliated). Here's an example of running Word from within .Net, with no references added to the project:

USING NAMESPACE System
TLocal loWord
loWord = CreateObject("word.application")
loWord.Application.Documents.Add()
loWord.ActiveDocument.SaveAs("c:\temp\myvfpdotnetdoc.doc")
loWord.quit()

That's it. I agree .Net 4 is heading in the right direction. For many of us, the eTecnologia effort bringing VFP capabilities to .Net is already there. The result of the above, btw, is a CLI-compliant .dll or .exe.
May 20, 2009 21:38
Sorry, "simple" != "fun". Better luck next time!
May 21, 2009 2:35
Oh, if only we'd had this five years ago.
J
May 21, 2009 4:46
Scott,

Maybe that's the intention, but in the real world it unfortunately won't dereference office interop com objects.

I have a bunch of excel interop apps that just read and write stuff to/from excel spreadsheets. Without explicit calls to ReleaseComObject, the Excel process keeps growing and growing. It stays bloated after GC. Sometimes it will crash. Maybe something excel-specific..?


private static void SetCellFormula(Excel.Range range, int rowNo, int colNo, string formula)
{
Excel.Range cell = (Excel.Range)range.Cells[rowNo, colNo];
cell.Formula = formula;
Marshal.ReleaseComObject(cell); //without explicit cleanup, excel will bloat...
}


Oh well. As others said above, I hope MSFT will supply a managed API for Office, Visual Studio etc some day....
May 21, 2009 13:51
The Embed Interop Type thingie sounds a lot like classic static linking from the C/C++ world, is that right? I really miss that feature in C#.

Comments are closed.

Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.