Assembly Fiefdoms: What's the Right Number of Assemblies/Libraries?
Deciding how to physically partition your application isn't a .NET-specific topic, but I'll say "assemblies" and use .NET terminology, since we work largely in .NET.
What's the right number of assemblies? When do you split up functionality into another assembly? Should you use ILMerge to make a single über-assembly? Perhaps one class or one namespace should be inside each assembly?
Here's my thinking...organized very simply.
- The number of assemblies should NOT relate or correlate in any way to the number of developers.
- Just as your logical design doesn't have to be paralleled in your physical deployment, your number of employees shouldn't affect the number of assemblies you have. Whether you're a single developer or one of 50, there's a "comfortable" number of assemblies (a number that's right for that app) your application should have, and that number doesn't change when you add or remove people.
- Your source control system shouldn't affect your assembly count.
- May karma help you if you're using Visual SourceSafe or another source control system that requires exclusive checkouts, but try to avoid letting your source management system directly or indirectly pressure developers toward a certain number of assemblies. Avoid Assembly Fiefdoms.
- 1 Namespace != 1 Assembly
- Consider first organizing your code logically (namespaces) rather than physically (assemblies). Use namespace hierarchies that make sense. System.IO, System.IO.Compression, and System.IO.Ports come to mind; see the sketch just below this list.
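To make the namespace/assembly distinction concrete, here's a minimal sketch (all assembly and type names are hypothetical, mine, not from any real product) of one logical namespace spanning two physical assemblies, the same way System.IO spans more than one shipping DLL:

```csharp
// Compiled into MyCompany.Orders.Contracts.dll (hypothetical names throughout)
namespace MyCompany.Orders
{
    public interface IOrderRepository
    {
        void Save(string orderId);
    }
}

// Compiled into MyCompany.Orders.Data.dll -- same logical namespace, different physical assembly
namespace MyCompany.Orders
{
    public class SqlOrderRepository : IOrderRepository
    {
        public void Save(string orderId)
        {
            // Persistence details elided; the point is that callers see one
            // namespace (MyCompany.Orders), not the assembly boundary.
        }
    }
}
```

A caller references both DLLs but writes a single `using MyCompany.Orders;` statement. The physical split is invisible at the call site.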
Robert Martin has an excellent PDF on Granularity; while it's focused on C++, the concepts aren't C++-specific. He has a section called "Designing with Packages" where he asks these questions, the same ones we should ask ourselves when we start organizing an application. Substitute "assemblies" for "packages" if you like. He defines a package as a "releasable entity":
1. What are the best partitioning criteria?
2. What are the relationships that exist between packages, and what design principles govern their use?
3. Should packages be designed before classes (Top down)? Or should classes be designed before packages (Bottom up)?
4. How are packages physically represented? In C++? In the development environment?
5. Once created, to what purpose will we put these packages?
[Robert Martin's "Granularity"]
He then goes on to answer these questions. Do read the PDF, but here are some terse quotes along with my own commentary:
"The granule of reuse is the granule of release."
If you're creating a reusable library or framework, think about the developer who is "downstream" from you and how they will reuse your library.
"The classes in a package are reused together. If you reuse one of the classes in a package, you reuse them all."
Not only this, but you reuse all their dependencies. Consider what the developer will be creating with your library and if he/she needs to include other assemblies of yours to accomplish the goal. If they have to add 4 assemblies to implement 1 plugin, you might consider rethinking the number of assemblies in your deployment.
Consider the effect of changes on the user of your library. When you make a change to a class, perhaps one that the user/developer doesn't care about, do they need to distribute that newly versioned library anyway, receiving a change in a class they didn't use?
"The classes in a package should be closed together against the same kinds of changes."
Developers don't usually use/reuse a single class from your assembly; rather, they'll use a series of related classes, either in the same namespace or in a parent namespace. Consider this, and the guideline above, when you decide which classes/namespaces go in which assembly.
Classes that are related usually change for related reasons and should be kept together in a related assembly, so that when change happens, it's contained in a reasonably sized physical package.
"The dependant structure between packages must be a directed acyclic graph."
Basically, avoid circular dependencies. Avoid leapfrogging and having a (stable) Common assembly dependent on a downstream assembly. Remember that stability measures how easily an assembly can change without affecting everyone else. Ralf Westphal has a great article on Dependency Structure using Lattix, an NDepend competitor. Patrick Smacchia, the NDepend Lead Developer, has a fine article on the evils of Dependency Cycles.
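Here's a minimal sketch of breaking such a cycle (the assembly and type names are hypothetical): rather than Common referencing Logging while Logging references Common back, both depend on a small, stable contracts assembly, and the dependency graph stays acyclic:

```csharp
// --- Contracts.dll: small, stable, depends on nothing ---
namespace Contracts
{
    public interface ILogger
    {
        void Write(string message);
    }
}

// --- Logging.dll: references Contracts.dll only ---
namespace Logging
{
    public class FileLogger : Contracts.ILogger
    {
        public void Write(string message)
        {
            System.IO.File.AppendAllText("app.log", message + System.Environment.NewLine);
        }
    }
}

// --- Common.dll: also references Contracts.dll only; it never sees Logging.dll ---
namespace Common
{
    public class OrderProcessor
    {
        private readonly Contracts.ILogger _log;

        public OrderProcessor(Contracts.ILogger log)
        {
            _log = log; // the concrete logger is handed in at runtime by the host
        }

        public void Process(string orderId)
        {
            _log.Write("Processed order " + orderId);
        }
    }
}
```

The host executable sits at the top of the graph, references everything, and does the one `new Common.OrderProcessor(new Logging.FileLogger())`; no assembly below it points back up.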
Ralf has some very pragmatic advice that I agree with:
"Distinguish between implementation and deployment. Split code into as many assemblies as seems necessary to get high testability, and productivity, and retain a good structure." [Ralf Westphal]
This might initially sound like a punt or a cop-out, but it's not. The number of assemblies you release will likely asymptotically approach one. It'll definitely be closer to one than to one-million. However, if you ever reach one, worry, and think seriously about how you got there.
Udi Dahan commented on Patrick's Guiding Principles of Software Development and said, quoting himself from elsewhere (emphasis mine):
In the end, you should see bunches of projects/dlls which go together, while between the bunches there is almost no dependence whatsoever. The boundaries between bunches will almost always be an interface project/dll.
*This will have the pleasant side-effect of enabling concurrent development of bunches with developers hardly ever stepping on each other’s toes.* Actually, the number of projects in a given developer’s solution will probably decrease, since they no longer have to deal with all parts of the system.
I would respectfully disagree with the emphasized point and direct back to the principle that team structure should never dictate deployment structure. A good source control system, pervasive communication, and good task assignment should prevent "stepping on...toes." Sure, it's perhaps a side effect, but definitely not one worth mentioning or focusing on.
What's the conclusion? Consider these things when designing your deployment structure (that is, when deciding if something should be a new assembly or not):
- Where will it be deployed?
- Remember that compile-time references aren't necessarily runtime references. Assemblies aren't loaded until the code that uses them is JITted (see the first sketch after this list).
- Think about where your 3rd Party Libraries fit in. Are they hanging off your "root" assembly, or are they abstracted away behind a plugin assembly?
- Consider putting "undesirable dependencies" as far away from your main stuff as possible. Don't get them tangled in your main tree. "Things burn, Colonel."
- If you're making a library, think about how many dependencies "come along for the ride" when your downstream developer/user does an Add Reference.
- If they have to add 4 assemblies to make 1 plugin, that's weak. (DasBlog has this problem with custom macros; we force you to add 2. That's lame of us.)
- If you're big into plugins and you're *all about* interfaces, do consider putting them in their own assembly (see the second sketch after this list).
- Consider grouping assemblies based on shared functionality, where Assembly == Subsystem, while watching for cycles and, ahem, inappropriate coupling.
- If you notice that some assemblies seem to always go together, always versioning and changing together, like peas and carrots, perhaps they should be one.
- Notice "natural clumping" and consider if that clumping is happening for reasons of Good Design or Bad Design, and act accordingly.
- Personally, while I think ILMerge is a very cool tool, I have yet to find a good reason to use it in the Enterprise, except when releasing small XCopy-able utilities to end users' machines where I really want a single file.
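On the JIT point in the list above, here's a small console sketch, assuming the classic .NET Framework CLR (NGen, inlining, and newer runtimes can shift the exact moment), that shows an assembly staying out of the AppDomain until the method that touches it actually runs:

```csharp
using System;

static class Program
{
    static void Main()
    {
        Dump("Before");   // System.Xml is typically not in this list yet
        UseXml();         // JIT-compiling and running this pulls System.Xml in
        Dump("After");    // ...now it is
    }

    static void Dump(string label)
    {
        Console.WriteLine(label + ":");
        foreach (var asm in AppDomain.CurrentDomain.GetAssemblies())
            Console.WriteLine("  " + asm.GetName().Name);
    }

    static void UseXml()
    {
        // First use of a System.Xml type; the assembly loads when this method is JITted.
        var doc = new System.Xml.XmlDocument();
        doc.LoadXml("<root/>");
    }
}
```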
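And on the plugin point, a minimal sketch (hypothetical names) of contracts living in their own assembly: a plugin author references only the tiny contracts DLL, and the host binds to implementations at runtime:

```csharp
// --- Plugins.Contracts.dll: the ONLY assembly a plugin author must reference ---
namespace Plugins.Contracts
{
    public interface IPlugin
    {
        string Name { get; }
        void Execute();
    }
}

// --- Host executable: discovers and instantiates plugins by reflection ---
namespace Host
{
    using System;
    using System.Reflection;
    using Plugins.Contracts;

    public static class PluginLoader
    {
        public static IPlugin Load(string assemblyPath, string typeName)
        {
            Assembly pluginAssembly = Assembly.LoadFrom(assemblyPath);
            Type pluginType = pluginAssembly.GetType(typeName, true); // throw if the type is missing
            return (IPlugin)Activator.CreateInstance(pluginType);
        }
    }
}
```

Because the contracts assembly is small and changes rarely, plugin authors aren't dragged along by churn in the host's internals.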
What IS the right number of assemblies, Dear Reader?
The answer is 7.
Yes. 7.
I think basing assemblies around re-use is a great determinant. However, I would focus on actual reuse and not on planned reuse. Also, it has to be reuse outside of this particular project.
I think it'd be interesting to see if there's a statistical significance to the number of lines of code per assembly in projects in the wild. Hmm....
I do get the feeling you are ignoring the Stable Dependencies Principle and the Stable Abstractions Principle...
http://www.objectmentor.com/resources/articles/stability.pdf
One of the greatest benefits of building an application using TDD (to me) is the effect of promoting testability to a major concern. This drives me into the land of interface-based programming: creating islands of isolation/testability/stability within my application. For someone who may not practice TDD, this usually means I write a unit test, then I write an interface, then I write a class, then run the test (CustomerTest, ICustomer, Customer). The interfaces are "maximally stable" (to borrow words from Bob Martin).

Now, you can leave all that stuck in a single project if you like, but if I lean back on the Stable Dependencies Principle, I see that I can create a more stable system if I split the interfaces (and abstract classes) into assemblies of stability. This means any code consuming the component/layer will depend on the interface assembly. The implementation assembly will also depend on the interface assembly. This leads both the implementation and the client code to depend on a maximally stable package (an island of stability, if you will). I then use an IoC container to wire the two together at runtime. This gives me a very stable design to work with. I could drive a car between my dependencies if I wanted.

That's from a TDD perspective. If I fall back on some of the more waterfall-like practices, this also agrees very much with the principle of "Separation of Interface and Implementation" (POSA1, "Enabling Techniques for Software Architecture"). While not specifically directed at packaging, it too deals with splitting the contract (interface/abstract class) from the implementation. This again drives me to separate the contract from the implementation in terms of packaging so that stability is increased. I hope I don't come across as "the guy with the obscure book," but there are many more examples I could cite from POSA1. I could take this post farther, but I don't want to hijack your post any further than I already have. And don't take any of this personally, I'm a huge Hanselman fan!! :-)
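A minimal sketch of what this comment describes, using its ICustomer/Customer naming (the wiring is done by hand below; a container like Windsor or Spring.NET would do the same job through configuration, and the assembly names are hypothetical):

```csharp
// --- Customers.Contracts.dll: the "maximally stable" island (interfaces only) ---
namespace Customers.Contracts
{
    public interface ICustomer
    {
        string Name { get; }
    }
}

// --- Customers.Impl.dll: references only Customers.Contracts.dll ---
namespace Customers.Impl
{
    public class Customer : Customers.Contracts.ICustomer
    {
        private readonly string _name;

        public Customer(string name)
        {
            _name = name;
        }

        public string Name
        {
            get { return _name; }
        }
    }
}

// --- Composition root: the single place that knows about both assemblies ---
namespace App
{
    public static class CompositionRoot
    {
        public static Customers.Contracts.ICustomer CreateCustomer()
        {
            return new Customers.Impl.Customer("Ada"); // an IoC container would resolve this instead
        }
    }
}
```

Client code that references only Customers.Contracts.dll literally cannot say `new Customer(...)`; that's a compile-time error, which is the "can't cheat the design" property a later comment picks up on.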
The Alhambra viewed at dusk, when the lights first turn on, is truly one of the images that I will never in my life forget. It's worth whatever it takes to get there; Granada is amazing.
Evan - Good points...I think it's time I did either an IOC post or podcast. What's your favorite framework? Microsoft's or Windsor?
Oran - Would you mind expanding on your Amazon comment? You are saying that Amazon's well factored SOA is as a result of team configuration, rather than them deciding as a group to have contracts between teams? I'm trying to understand why the "dog" of their team structure "wagged" the tail of their deployment strategy. Please, share.
That would be awesome! My personal preference would have to go to Windsor, but that's just me.. Spring would probably be my #2..
Re: Amazon..
http://www.se-radio.net/index.php?post_id=157593
i.e. WebAppX depends on Logging.dll which depends on LogDataLayer.dll etc.
I would be interested to see if we have any assemblies that should be merged with another, where we have 2 assemblies that are always used together in our projects.
Each dev team is also responsible for deployment and operation of their service. You build it, you run it. So quality is higher, and you've got a tighter feedback loop with users of your service.
This Jim Gray interview with Werner Vogels has more details:
http://www.acmqueue.com/modules.php?name=Content&pa=showpage&pid=388
MS themselves have namespaces that span multiple assemblies: System.Configuration and System.IO are two that come to mind.
The second most important factor is probably location. Some assemblies should be in the GAC while others should be in some program folder. Merging these assemblies might be a bad idea.
Having read your recent guiding principles article in conjunction with Agile Software Development, Principles, Patterns, and Practices, I was puzzled by Patrick's comment that "Fewer assemblies is better". Whilst I agree that less is more, applying the principles from ASPPP leads to more assemblies, not fewer. An example: I have a FileConverter interface that describes an interface for converting a given input file into an arbitrary format (defined by the implementation), outputting to a given output folder.
Implementations of this might include a PdfToPngFileConverter or a WmfToPngFileConverter (spot the theme here ;-). Now, following ASPPP and using dependency injection, I would be inclined to create an assembly for each (interface and implementation, or possibly merge the interface into each implementation's assembly). Is this a good or bad approach?
Software projects are never simply conceived, designed, implemented, then deployed. There's evolution throughout that whole cycle, and then it essentially repeats upon delivery. If you don't recognize that post-delivery evolution (e.g., you'll add a class to this namespace, or add a method/property to that class, etc.) and the restrictions it will impose, you run the risk of many problems in the long term. It's often best to decide up front what you hope will be stable and what you expect to be more evolutionary, and separate the two both logically and physically. A set of basic math functionality is a good example of something that will remain stable: the laws of mathematics never change, so it can be safely separated from other functionality (e.g., a customer can't ask you to change how the sine of an angle is calculated).
The degree to which you need to disassociate functionality, if applicable, should also be taken into account as early as possible. If you need the ability to deploy subsets of functionality without including all of it, for sales requirements, that should be pretty high on the criteria for physical separation. If a subset of functionality needs to be updated in real time, that is also a disassociation criterion. If one particular assembly needs to be loaded in its own AppDomain (.NET specific), to be unloaded, updated, and reloaded, that should take precedence over other criteria.
On a more general note: appropriate class and namespace design should, to a certain degree, automatically insulate the people working on them from the lower-level organization. For example, if you find that two or more developers are stepping on each other's toes checking a particular file in, then it's not designed appropriately.
And there is another important factor in why I like to create multiple assemblies. I make a strong effort to keep all accessibility modifiers at the lowest level. When class methods are internal to an assembly, they can't be 'accidentally' used from other assemblies. It forces all the developers to go through the main door. And it avoids cyclic dependencies not just among packages, but also among user controls and classes.
I've often caught myself wanting to reuse some class functionality in a different assembly where it wasn't accessible to me, and it forced me to refactor the design with adapter patterns, interfaces, etc. right at that time.
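Here's a minimal sketch of that "main door" idea (the names are hypothetical):

```csharp
// --- Billing.dll ---
namespace Billing
{
    // The one public entry point: the "main door" into the assembly.
    public class InvoiceService
    {
        public decimal Total(decimal amount)
        {
            return TaxCalculator.AddTax(amount);
        }
    }

    // internal: freely usable anywhere inside Billing.dll, invisible to every
    // other assembly, so nobody can "accidentally" take a dependency on it.
    internal static class TaxCalculator
    {
        internal static decimal AddTax(decimal amount)
        {
            return amount * 1.08m; // hypothetical flat tax rate
        }
    }
}
```

If a test assembly genuinely needs in, `[assembly: InternalsVisibleTo("Billing.Tests")]` opens the door deliberately rather than accidentally.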
I often read blogs that argue to keep the number of assemblies as small as possible, mainly to save on startup loading time. I think you could get this startup time down by signing the assemblies and placing them into the GAC. Personally, I just show a splash screen and let the end user take a sip of his/her coffee.
To Chris: There is a plugin for Lutz Roeder's Reflector that shows the assembly dependencies. The plugin is using the graph rendering work of Microsoft GLEE. The graph component is developed by Jonathan de Halleux. (see http://www.codeplex.com/reflectoraddins/Wiki/View.aspx?title=Graph&referringTitle=Home)
From what I understand about your Amazon comments, the structure would be related to inter-team structure. So, to the extent that you design your teams to align with functional blocks, this would be correct.
Scott is referring to intra-team structure, i.e., between individual developers. It would be just horrible to see "Chris'sAssembly.dll" and "Scott'sAssembly.dll". Indeed, I once had to clean up legacy code that had been developed by a different company in just this manner, and understanding the code was well nigh impossible.
These projects are important for managing dependencies during development, testing, integration, etc., but when going into production there may be a case for rolling them into fewer physical assemblies (but not one that I can think of, besides mscorlib).
http://msdn2.microsoft.com/en-us/library/ms173121(VS.80).aspx
To make it a bit more interesting, consider how you would handle these changes if the assemblies are strongly named and tossed into the GAC, with each release getting an updated version number. And what if you are building a distributed SOA solution with many small services? How does that affect the above guidelines for the number of assemblies?
Where I work we are doing a lot more with WCF and resolving these issues is starting to become a priority as I am sure it is elsewhere.
"In the end, you should see bunches of projects/dlls which go together, while between the bunches there is almost no dependence whatsoever. The boundaries between bunches will almost always be an interface project/dll.
*This will have the pleasant side-effect of enabling concurrent development of bunches with developers hardly ever stepping on each other’s toes.* Actually, the number of projects in a given developer’s solution will probably decrease, since they no longer have to deal with all parts of the system."
That's the description from Udi that triggered the Amazon analogy for me. The emphasized part is what Scott disagreed with. The structure and benefits listed are the same as for the Amazon scenario. Scott went on to list pervasive communication as one of the solutions for stepping on toes. Intra-team, yes. Inter-team, restructuring is probably a more efficient long-term solution. But even within a team (or within a developer's own code), information hiding and interfaces help to minimize the amount of information someone must understand before they can safely start using your code.
Although the cost of communication is less within a small team, it is still a factor that influences design structure. If a particular dependency is going to require much conversation between developers, that "face-to-face contract" should be reflected in the actual code contract as well for the benefit of the next developer who comes along. So in this sense, I think inter-personal communication is always going to have some impact on inter-module communication.
Here's our assembly structure:
OurCompany.OurProduct.Partners.WebControls
OurCompany.OurProduct.Partners.WebUI
OurCompany.OurProduct.Administration.WebControls
OurCompany.OurProduct.Administration.WebUI
OurCompany.OurProduct.Services.Main
OurCompany.OurProduct.Services.OtherService
OurCompany.OurProduct.Services.3rdService
OurCompany.OurProduct.Engine
The Engine is where the data interface, global enums, business objects, etc. are.
Whaddya think? Based on Scott's comments, I was thinking that perhaps the WebControls aspect of things could be merged into the Engine, since they're highly dependent on it. However, thinking about deployment, I don't want partner controls installed in the admin environment, so that's why they were split up: to provide the minimum amount of code deployed for each purpose.
Comments? Hideous? Decent? Give it to me straight :)
"But even within a team (or within a developer's own code), information hiding and interfaces help to minimize the amount of information someone must understand before they can safely start using your code."
That was the nugget of truth in case anyone glossed over it. Information Hiding is one of the key tools we have as developers for dealing with complexity. Also note that information hiding is not necessarily the same as encapsulation. The splitting of type (interface/abstract class) from implementation (class) is one technique for information hiding.
How you leverage this is probably up for debate. I prefer to package the contract and implementation separately (effectively doubling the number of assemblies). This forces me to focus only on little parts of the application at any given time, and happens to be very valuable in the current application I'm working on (health care risk modeling). In a simpler application, I might be tempted to forgo this, gambling that the loan on my "technical debt" never gets called in.
But there are other benefits too. It forces me as a developer to program to the interface and not to the implementation of other classes. Given that Layer A references only Layer B's interface assembly, trying to use the implementation gives me a compile-time error. This prevents me from "cheating" the design.
I'm sure this might come off as radical, but it has held true for me so far. The first time I started developing like this, I had a small "aha!" moment. It was the first time I had seen my application design (the evolutionary kind) cleanly pull away from the application implementation. A good question to ask yourself is: where is my design? Can you show someone? I'll give you a hint: the implementation is not the design, and the UML model is not the design.
I have not yet found a principle, pattern, or scenario which disagrees with the above (in terms of Enterprise Applications).
I'm sure there are reasons to care, but what are they?
The second issue to consider is that each assembly carries a certain amount of overhead, which argues for limiting the number of assemblies. In DNN we have a situation where we have 6 different assemblies for different HttpModules. Most of these modules could be swapped out with alternate implementations; however, in most cases they are not. When compiled separately, these assemblies require 150K of memory. However, a quick test showed that if we combined them, we could shrink the combined size down to 35K. In most environments this would not be that significant; however, when running in a shared hosting environment this memory usage can have a significant impact. When you multiply this 100K+ of overhead by the number of apps running in some shared hosting environments, the waste adds up quickly.
I guess the bottom line is that it is also important to look at your target platform and deployment scenario and ensure that your design choices are not going to have adverse impacts on application usage.
I think, like most questions, the right answer is different for everyone and the aggregate answers generally follow a bell curve. If I were to throw in my opinion, which it looks like I am, I'd say that development does not necessarily need to dictate what deployment does. And that's a relief because they have entirely different sets of concerns! A development team may want to analyze how work is going to be divided for construction and testing with an eye forward to how it will be maintained after release. (Scott, VSS 2K5 allows Exclusive Locks to be disabled at installation time) The deployment team is, in my experience, usually guided by development and architecture teams in how to approach this question. That's not to say that there aren't many conscientious build engineers out there who make an effort to address production performance issues in their configurations.
I have to admit that, except when it has directly affected my ability to edit or test code, I've never given this topic much thought. At the beginning of each project you always have the obligatory exchange among the developers.... "So did you put that in source control? Where?" "Oh, ok, I'll just start putting these files in that project, too."
It's scary stuff, but it's the reality in a lot of shops. I won't say my org is innocent, but I will say that in the year and a half I've been here, we've implemented namespace standards and codepage:class relationship standards, and a couple of solutions (I use that term loosely), including our primary money-maker, were re-tooled only a year ago to the point that they could be compiled in a single Visual Studio Solution file.
We are constantly seeking improvement. Our latest leaps in improvement have been performance- and stability-centric. But I believe the next improvements will be process-related and we need to make sure dev-to-deployment code hand-offs get fair attention.
Thanks for the great blog post, Scott!