Man Crushed Under Weight of 34 Terabytes

Well, that is the headline for the article about why I haven’t been posting very much… OK, that and about a million other reasons – primarily, it’s because I’ve been distracted by playing too much WoW after work.

Mainly, my daytime hours are consumed by my job. I’ve got a lot on my plate and many issues to deal with every day. A lot of them cycle back to being causes or symptoms of one thing in particular: digital asset management. Here is a glimpse into the problem:

[Screenshot: total count of files]

Yes, that’s 34 terabytes spread over more than 1 million files…sigh. It lives on some 50 disks – FireWire, USB, and Storage Area Network (SAN) volumes. I’ve become Chief Media Wrangler (in addition to being Director of Post-Production).
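For the curious, totals like these can be pulled straight from Terminal – a rough sketch, and fair warning that counting across 50 disks takes a while:

    # Size of each mounted volume, then a total file count across them all
    du -sh /Volumes/*
    find /Volumes -type f | wc -l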

“Time for a little Media Management,” I used to joke with a freelance client of mine back in the days before I took this job. Now, if only I had a little time for managing this media, it would be a little more, well, manageable. I know there are pieces of this puzzle I am missing (Final Cut Server, for one) and am working towards adding some of those important bits, but I thought I’d take a step back and go over what I’m doing to manage this stuff and how I’m trying to get a handle on it. This is a bit deep and gets into the larger issue of project and asset management from the 50,000-foot view. If you don’t have a problem managing what you’ve got, then more power to you. However, if you’re like many of the people I’ve encountered in this business, you don’t even realize you have a problem until it’s much, much larger than 34TBs or over 1,000,000 files/assets.

{This post is very long, technical, and rather detailed – don’t click More unless you know what you’re getting yourself into}

What is all that stuff?

OK, so to start we’ve got to figure out what all this stuff is. What are all these files and what’s important to keep? Unfortunately, there has been quite a bit of sloppiness in how these assets have been created and stored. I’m sure my situation is not unique. A large facility and a large number of people add up to a bit of a mess when it comes to assets used for creating multiple projects, from broadcast to corporate to simple conversions – particularly when things aren’t kept orderly as you go.

If you don’t already know, the term asset refers to anything that goes into making your project. Technically, they are Digital Assets, as defined by Wikipedia, but for the sake of simplicity and inclusiveness I use asset to mean anything that can be associated with a project – still images, movies, audio clips, documents, tape logs, and every other file associated with a given project.

To back up a bit, it all starts with adopting a structure for how & where you store your assets and then sticking to it. Consistency is one of the most important aspects of good asset management. Making a plan and then sticking to it will help solve many more problems than you realize – mostly before they become problems in the first place.

Here’s my current plan:

[Screenshot: our asset management folder setup]

And what you are looking at above:

  1. Each edit suite gets its own SAN volume for the files it creates (renders, assets it generates, etc.). What you don’t see in this screen capture are the SAN volumes for the various projects. Each broadcast show gets its own SAN volume, as does in-house work, and we have an Elements SAN volume (for SFX, music, etc.)
  2. On each suite’s media volume there are only two folders at the root level (well, that’s all I’ve approved to be there!): a folder called Assets and a folder called Final Cut Pro Documents:
    • The FCP folder is the same one that FCP creates by default. All captured footage (i.e. footage coming from timecoded tape) goes into the Capture Scratch folder by project. I know there is quite a bit of contention about the system FCP uses for media files. The big problem we ran into when we tried to adopt a customized system was that as soon as someone either trashed prefs or didn’t know about the custom system, it all went to pot – my favorite symptom of this problem is the Capture Scratch folder inside a Capture Scratch folder. This way I can say to someone, “just trash your FCP prefs,” and they can set the primary media drive to the right drive for their suite and off they go. Since we do 98% of capturing on our two ingest stations, most edit suites don’t have any media in their capture scratch folders anyway.
    • The Assets folder contains all other files for a given project (see number 3 below for hierarchy info). This would be all the files that go into making a project – graphics, VO, music, AE, Motion, etc. Anything that isn’t already on a timecoded tape backup somewhere.

    These two folders give us the ability to easily back up (or move) a given project. With the SAN, we don’t move things around much now. At the conclusion of a given project we just back up the project’s Assets folder, and we can delete the Capture Scratch (because it can be re-captured from tape if it needs to be).

  3. The Generic Assets folder is what we use to create an initial starting point for every project. We have a copy of this folder available on every system; editors just duplicate it into their Assets folder and rename it for their given project. It should be self-explanatory. Feel free to download your own copy of this blank hierarchy to use on your own projects, and add to it however you see fit.
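If you’d rather build a blank hierarchy from Terminal than download one, here’s a minimal sketch – the subfolder names below are illustrative guesses, not the exact contents of my template:

    #!/bin/bash
    # Sketch: build a blank "Generic Assets" template folder.
    # These subfolder names are illustrative guesses, not the exact
    # hierarchy in the downloadable template.
    TEMPLATE="Generic Assets"
    mkdir -p "$TEMPLATE/Graphics" \
             "$TEMPLATE/Audio/VO" "$TEMPLATE/Audio/Music" "$TEMPLATE/Audio/SFX" \
             "$TEMPLATE/Motion" "$TEMPLATE/After Effects" \
             "$TEMPLATE/Documents" "$TEMPLATE/Exports"
    # Starting a new project is then just duplicate-and-rename:
    # cp -R "Generic Assets" "/Volumes/Suite1_Media/Assets/My Project"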

OK, so now you see how it “should” be structured. Of course, it isn’t all that perfect. It’s a “too many cooks” problem. Each editor does things their own way and as soon as that starts happening, it all goes awry. The most important point I can make about this is: Pick a system and stick to it! Write it up and share it with everyone who touches your projects. Make sure that it is consistently applied to every project that comes in.

BTW, my system is somewhat adaptable to most styles of working, as long as everyone adheres to the Assets folder and FCP Documents folder system. Everything else is pretty malleable.

One flaw in all Asset Management Systems (whatever they are) is the failure of the humans feeding the system to do so properly. All the tools in the world aren’t going to be of much help if the people you have creating the content don’t play by the rules or put stuff in the proper place.

For backing up assets, we have a Quantum Superloader 3A. It is an autoloading LTO backup device, attached to the SAN-connected systems via Ethernet. It can back up 400GB per tape, and the unit holds up to 16 tapes – 6.4TB of backup goodness.

Get your @#%$ together!

So, how did we get 34TBs in over 1 million files? Just doing our jobs. Face it: media productions generate assets. Lots of assets. That is the nature of the beast. I have boxes of camera originals lining the walls of our studio. At one point we figured out that we probably have 20,000–25,000 camera original and master tapes floating around here. The conversion to digital – whether digitizing those tapes or creating assets in digital form from scratch – plus the sheer ease of asset creation means that we’ve now got that much more stuff to work with and manage.

My current problem is more an issue of detective work than anything else at this point. I have to work across all those disks and systems to figure out which version of each asset is the latest, what we need to keep, and what can be purged. More on that later.

Sadly, the first step is getting it all consolidated into one location – or as few locations as possible. In my case, I’ve created 5 SAN volumes to use as staging for archival. I send projects to the archive volumes, and when one archive volume fills up (400GB), it gets backed up to tape. I hope to eliminate the interim consolidation step one day by adding a tape backup/cataloging server to my LTO Superloader 3A, but there is something satisfying about consolidating all the assets for a given project, sending it to backup, and erasing it from the drives.
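If you want to script the “is this staging volume full yet?” check, here’s a rough sketch (the Archive_* volume names are placeholders, not my actual naming):

    #!/bin/bash
    # Sketch: flag archive staging volumes that are full enough to
    # send to tape. The Archive_* names are placeholders.
    for vol in /Volumes/Archive_*; do
        [ -d "$vol" ] || continue
        # Column 5 of `df -h` on the Mac is Capacity (percent used)
        pct=$(df -h "$vol" | awk 'NR==2 { gsub("%", "", $5); print $5 }')
        [ "$pct" -ge 95 ] && echo "$vol is ${pct}% full - ready for LTO"
    done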

One important issue to confront when doing this kind of work is version control. How do you know which file is the most recent one and which version is the FINAL? Most people go by dates. That’s fine, to a point. Here’s a problem we’re having at the moment where dates are actually the cause, not the solution: a project got worked on several months ago. The English version was finished and approved. Then work started on the Spanish version. At some point, it appears, the English version got updated and the update didn’t get sent to the editor working on the Spanish version. So, months down the line, when the two versions are done and approved and we go to online both shows, we discover that the English version isn’t the final, revised version. But the FCP project with the final Spanish is dated March 9, 2009. It’s the most recent, so it has to be the final version, right? Not so with this project. Now I’m on a wild goose chase across all these systems to track down what is actually the final version. (The neat thing is, with the tools I’m telling you about here, we were able to track down the correct final English version, so I know this stuff can work to help resolve problems.)

Now, back to how to handle version control. I’ve got two tools in my pocket so far that have been very helpful.

  • FolderSynchronizer – Created by Softobe, this program is a real help when it comes to making sure folders are in sync across more than one drive. It can handle backing up or syncing folders, and it can do comparisons between folders and sync based on creation/modification dates. You can get a preview of what will be copied before you do it, to make sure you don’t lose something important. It is a great first-line syncing tool because it can merge multiple folders into one very smoothly.
  • GridIron Flow – Currently in beta, this program will track all the elements that go into an FCP, AE, or Photoshop project. It makes tracking down what’s used in a project and what came out of a project much easier. This app can also do version tracking and scan drives to root out links between files and projects. I can’t do it justice in this explanation – go see for yourself. So far it has worked as advertised and seems to be helpful in tracking down what went into a given project, but I can’t say I’ve had a problem that only Flow could solve yet – time will tell.

I do know that Final Cut Server will help immensely with these problems. I have plans to get one in and working with our system this year, so we’ll see how that works out and what difference it really makes.

Binge! Purge!

We installed our SAN in April 2008. In less than a year, we’ve accumulated 34TBs of assets. We’ve actually had much more than that go through the system – several TBs of media have been loaded, edited, and purged over the last 12 months. The real question, always, is: what should we keep? What can we throw away? What do we have to back up? That’s my problem now. I haven’t had much time to do media management as of late, and things are piling up. I’m behind on it and it shows.

The single best criterion I have for deciding what stays and what goes is the answer to this question: how hard would it be to get it back on the system? If it comes from a timecoded source tape, it can be deleted as soon as I get confirmation that the final project has been delivered. If it comes from a tapeless camera, I only need to make sure the original files are backed up to LTO, and then I can delete both the card backups and the imported media from the drives. If they are non-timecoded assets, they need to be backed up to LTO as well, and then they can be deleted from the SAN.
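Written out as a quick shell sketch, the rule looks like this (the source-type labels are mine, made up for illustration):

    #!/bin/bash
    # Sketch: the stays-or-goes rule as a function. The source-type
    # labels are made up for illustration.
    disposition() {
        source_type="$1"
        confirmed="$2"   # "yes" once delivery/LTO backup is confirmed
        if [ "$confirmed" != "yes" ]; then
            echo "keep on the SAN for now"
            return
        fi
        case "$source_type" in
            timecoded-tape)  echo "delete from SAN (re-capturable from tape)" ;;
            tapeless-camera) echo "delete card backups + imported media (originals on LTO)" ;;
            non-timecoded)   echo "delete from SAN (copy is on LTO)" ;;
        esac
    }

    disposition timecoded-tape yes   # -> delete from SAN (re-capturable from tape)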

We do run into the issue of re-using footage quite often. We’re lucky enough to have 48TBs worth of SAN at our disposal, so we can keep a lot of generic B-roll footage online and available whenever it’s needed. I’m still working on a system for deciding whether footage is worth keeping around and how best to catalog it all, but that’s a work in progress.

Keeping track of it all going forward

OK, so after we go through all these assets, how can we keep track of everything we’ve got and where it all is? Simple answer: a database. The kind of database really depends on the level of sophistication you’d like (or need). The range runs from a simple text file to purpose-built asset management applications. Either end of the spectrum can help you get a handle on the problem of keeping tabs on all these elements.

Here are a few of the options I’m using to keep track of things:

  • Simple Text Files – Running this simple bit of code in a Terminal window will create a useful bit of information: ls -lpR > index.txt (or you could just get this Automator Workflow and make your own droplet). It makes a file listing of the complete hierarchy of a drive or a set of folders. Here’s what its output looks like: [Screenshot: sample ls listing] You can see that the listing contains a bunch of info about every file and every folder in the hierarchy. This info is useful even by itself. And because you are on a Mac, you have this blessing/curse called Spotlight. Many have issues with it and find that it causes more trouble than it solves, but one thing it does do is catalog the contents of text files. If you create indices for all your drives, Spotlight will catalog their contents and they will show up in Spotlight searches (there’s a sketch for automating this across all your drives right after this list). We found that if you use this method, it helps to add “.txt” to any search terms you are looking for – it forces Spotlight to look into those .txt files, and your results will be more accurate.
  • The next level involves bringing in some sort of application to handle the cataloging and the searching. There are a couple out there that can handle this kind of work. CDFinder is a good choice. Another good choice is FileFinder, which is what I use – I compared the two and they both did well, but FileFinder was much faster at cataloging than CDFinder, so I went with it. YMMV. FileFinder allows me to scan a volume, create a catalog of that volume, and then search across all saved catalogs. It works with anything that mounts as a volume – data CDs, data DVDs, internal or external drives, SAN volumes. It stores the catalog in a database, so you can disconnect or eject the media and still be able to search its catalog later on. It will also do preview import – creating thumbnail images of the stills on the volume. These are a big help when trying to track down a graphic; you aren’t forced to go all the way to restoring the file only to find it isn’t the right version.
  • The upper level involves an asset management tool like Avid Interplay or Final Cut Server. I haven’t gotten there yet, but will likely enter the world of Final Cut Server later this year.
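As promised above, here’s a sketch that builds those index files for every mounted volume in one pass (the index_<volume>.txt naming convention is made up):

    #!/bin/bash
    # Sketch: drop a Spotlight-indexable listing at the root of every
    # mounted volume. The index_<volume>.txt naming is made up.
    for vol in /Volumes/*/; do
        name=$(basename "$vol")
        # -l long listing, -p trailing slash on folders, -R recursive
        ls -lpR "$vol" > "${vol}index_${name}.txt" 2>/dev/null
    done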

Back it up and get it out of here

After I get it all sorted and cataloged, I will need to back it up. You’ve already heard about the Superloader 3A. It’s a great tool, though I’m not crazy about the amount of manual interaction it requires. A cataloging server will make a big difference in how I use it.

Two important issues to watch out for when dealing with an LTO backup drive: file naming, and backing up in 400GB chunks.

For file naming, there is a rather restrictive set of rules about how to go about naming your files and what can and cannot be included in the filenames of anything you are backing up. Here are the rules I apply to anything that gets backed up:

Our file naming limitations include the following recommendations:

  • A-Series filenames cannot exceed 97 characters.
  • To be absolutely safe, adhere to the Windows filename restrictions:
    • No control characters – carriage return (CR), NULL, and linefeed (LF) are all control characters.
    • Don’t use < > : " / \ | ? * %
    • Don’t use a space or period as the last character.

I use an A Better Finder Rename droplet I created to perform this cleanup work. Doing this renaming has the unfortunate side effect of breaking some of the links to files in FCP/AE projects, but better to have a few broken links than lose an entire project because a file name prevented the files from being backed up in the first place.
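For anyone who wants to see the rules in action without the droplet, here’s a rough shell equivalent. It is only a sketch – unlike the real app, it doesn’t protect file extensions from the 97-character cut or handle name collisions:

    #!/bin/bash
    # Sketch: apply the LTO-safe naming rules to everything under $1.
    # Not the actual droplet: no extension protection, no collision
    # handling. -depth renames children before their parent folders.
    find "$1" -depth -print0 | while IFS= read -r -d '' path; do
        dir=$(dirname "$path")
        base=$(basename "$path")
        # Strip forbidden characters (a slash can't occur in a basename),
        # cap at 97 characters, and trim trailing spaces/periods.
        clean=$(printf '%s' "$base" | tr -d '<>:"|?*%\\\r\n' | cut -c1-97 | sed 's/[. ]*$//')
        [ -n "$clean" ] && [ "$clean" != "$base" ] && mv "$path" "$dir/$clean"
    done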

Since I need to break things up into 400GB chunks, I have two down-and-dirty methods for achieving this. I already mentioned the 400GB SAN volumes. Those are super easy to rely on: just fill one up, then back it up. The other method is for when I need to break a set of folders/files down into 400GB chunks – or any size chunks, for that matter. Folder Splitter will do just that. It can take a folder or folders full of files and split them out into folders of any size you specify. You can use it to split an 80GB folder into 4GB chunks for backup to DVDs, or take a 1TB monster folder hierarchy and split it into 400GB chunks for backup to an LTO3 tape. Works great.
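The splitting logic itself is simple enough to sketch in shell – a first-fit pass over a folder’s top-level items, not a reimplementation of Folder Splitter (note that an item bigger than the cap still gets a chunk to itself, and won’t fit on one tape):

    #!/bin/bash
    # Sketch: first-fit split of a folder's top-level items into
    # numbered chunk folders under a size cap. Naming is made up.
    SRC="$1"
    LIMIT_KB=$((400 * 1024 * 1024))   # 400GB, in KB to match du -sk
    chunk=1
    used_kb=0
    mkdir -p "chunk_$chunk"
    for item in "$SRC"/*; do
        size_kb=$(du -sk "$item" | cut -f1)
        # Start a new chunk when this item would push us over the cap
        if [ $((used_kb + size_kb)) -gt "$LIMIT_KB" ] && [ "$used_kb" -gt 0 ]; then
            chunk=$((chunk + 1))
            used_kb=0
            mkdir -p "chunk_$chunk"
        fi
        mv "$item" "chunk_$chunk/"
        used_kb=$((used_kb + size_kb))
    done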

Wrapping Up

Wow! That’s a lot of info. I blasted it all out there in a super-duper quick fashion. Several of the stages of this process could easily merit long posts in and of themselves. I will definitely go back and delve into sections of this larger post in much more detail in the future. Leave comments below if you have any questions or wish for me to expand on a topic. Thanks for reading this far & I hope this helped at least get you thinking in the right direction about your own media management.

Resources

If you wish to learn more about the broad area of Digital Asset Management (DAM), I’d start with some books to get you into the right mindset and to start learning some of the lingo and the arenas the field encompasses:

Focal Press offers two volumes that I find very informative and very in-depth. Given the nature of the subject matter they can also be difficult to get into, but I recommend picking up at least one, if not both, to serve as a launchpad for investigating and conceptualizing your own DAM procedures.

“Digital Asset Management” – First Edition / Second Edition – by David Austerberry

The book is highly recommended for both beginners and more advanced readers, although perhaps not for engineers who are involved in digital asset management on a daily basis. The book is up-to-date, informative, relevant, comprehensive and accurate. – European Broadcasting Union, January 2007

“Implementing a Digital Asset Management System” – First Edition – by Jens Jacobsen, Tilman Schenker, Lisa Edwards

Learn how the top CG film, computer game, and web development companies have saved significant time and money on their projects by optimizing digital asset management systems and streamlining production processes. Success stories from Sony Pictures Imageworks, Lionhead, and other big players illustrate the way of working in big companies; success stories from several small but very agile companies show the reader how the techniques are applied when the budget is small.

Another book that I find incredibly useful – both as info for me and as a tool for describing the process and workflow considerations to my colleagues (even the ones who don’t eat, sleep, and breathe this stuff) – is written by my friends Robbie Carman and Jason Osder.

“Final Cut Pro Workflows” – Site

From my previous review:

It is a great resource for all things workflow-related, but it also has a great set of chapters that cover formats, codecs, and compression. It will be a great book to get non-technical people to understand the complicated world that is non-linear, digital media production. It is also going to become an important part of getting everyone I work with to really think hard about the workflows of our projects and how to streamline them.

Web resources are sparse, but they are out there:

Wikipedia’s entry

DAM Users – more of an event planning group, but there might be good networking and info at the events

DAM Article – gives a good overview of the topic

5 Replies to “Man Crushed Under Weight of 34 Terabytes”

  1. Great post, Ben! I’ll be rereading and digesting this one. Even with only two edit suites, dealing with asset management is easier said than done! There’s always that nagging fear, after a project has been finalized and archived, that you’re going to accidentally throw away some irreplaceable file that got saved in the wrong place in the heat of trying to hit a deadline, and didn’t make it on to the archive.

    Thanks for sharing this.

  2. Yeah, the loss of that one little, essential file is always an error I fear. The risk of it happening goes up exponentially as you add more suites and editors to the mix. GridIron Flow will go a long way toward solving that. Also, the new version of Automatic Duck Media Copy does the job of consolidating the media used for a given project into one location for archival. I’ll probably add that to the mix soon.

  3. Nice post. In the future I think we’ll just upload everything to Google. Then, thanks to automatic facial recognition and text to speech, we’ll just search for everything.

  4. Nice post, Ben,

    It’s a relief to see that I’m not (you’re not) the only one dealing with asset management and huge amounts of data.
    At our studio we have three to four sets using one shared 6TB RAID 5 – less data than your 34TB, but keeping it workable brings you to the same workflow.

    For syncing our data from the “working” RAID 5 to our backup RAID 5 we use Synchronize! X Plus.
    It does a sync every night.

    When a project is finished we back it up to (custom) external hard drives:
    drive A stays with us and drive B (containing the same data as drive A) goes to an external location. When material is needed we search for it in the DiskCatalogMaker database, and it’s easy to access and retrieve once the drive is plugged into the system.

  5. Very useful and informative tutorial.
    I’m currently looking at the workflow transition to Final Cut Server, which we will be receiving shortly.
    Very interesting info about the LTO3 Superloader. We have a regular LTO3 drive without the superloader: one tape at a time. I think it will work, and with FCS’s automatic proxy file generation the process will be even easier.

    At the moment, for our LTO archive system, we use “alias archive” to create a “carbon copy” of a folder but with alias files. We can see which project is where, but not what each individual media file contains.
