The 5 things AI tools need for good workflows

I’ve been getting questions about my thoughts on AI in workflows, both in my consultancy and just from regularly talking to folks. Artificial intelligence (AI) is a big story in 2022, so many folks know about it, but are unsure what it means. To talk about AI you first have to make sure you are using the right definition for the right conversation. For example, in game development the job of AI programmer has existed for 30+ years, and if you have played a game then there is a good chance it used AI. In games, AI may drive non-player characters (NPCs). Those NPCs may be tasked with walking from point A to point B, taking cover when they are attacked, responding with pre-recorded lines when a specific action happens nearby, and of course reacting to what the player does. We call that AI.

Lately though, the internet has been thinking about AI more in the sense of ChatGPT or DLSS (Deep Learning Super Sampling, by Nvidia). That’s also AI, depending on what you think AI means. So it is important to nail down what kind of AI you are talking about, and in this particular post I am focusing on AI for workflows in 3D content development. A lot of this will also be applicable to other content creation workflows, but again, it’s good to specify, given the many different AI fields. So let’s go.

1. AI content should be editable

Some of the visual content generation AIs out there can currently generate a specific image, or a set of images. If there is a particular issue with one of those images though, such as an ear or a hand not rendering correctly, you’re out of luck. The AI can regenerate an entirely new image, but often not an edited or ‘fixed’ image. You cannot ask it to change small, specific things, such as adding an earring or removing one. There is one AI I have seen that can do this on a text basis, which is ChatGPT:

To be fair, getting ChatGPT to understand that prompt, and accurately edit its response, took about 6 tries.

If the content is only generated, and not editable afterwards, then it isn’t useful inside workflows. Generated content cannot be perfect every single time for every generation. There will be changes required, sometimes immediately after generation, and sometimes days, weeks, or months later. The needs of a project change all the time, and this is a normal and organic part of any development process. A lot of that change comes from user testing and user observation.

You may have seen a lot of videos lately claiming to ‘texture any scene in one click!’ and similar. That may be true, but does it also produce correct materials? Are the shaders expensive? Are the UVs clean? Can you edit all of those parts and easily import them into existing editors and engines? If not, then it’s just neat, and not useful outside of very particular use cases.

Promethean AI is, I think, a great example of doing editable content well. It’s a technology that does many things inside 3D workflows, such as detecting what an object is, generating a scene with those objects, and then letting both the user and the technology edit that scene.

This is a very interesting technology which I think will become more mainstream over the years, exactly because it uses the existing workflows and tools, and fits inside them to allow more powerful usage of those tools.

Two more examples:

  1. If an AI generates 2D content, and allows a download to .PSD or some other layered format: Are those layers organized in a clear and effective way for humans to work with and edit them? If not, there's now just more boring work to fix that.

  2. If an AI generates 3D content, and allows a download to .obj, .USD or some other format: Are the meshes, animations, etc. organized in a clear and effective way for humans to work with and edit them? If not, there's now just more boring work to fix that. (A tiny sketch of what such an organization check could look like follows below.)
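As a thought experiment, here is a minimal sketch of that kind of check, in the form of a lint pass over generated layer or object names. Everything here is hypothetical, including the naming rules; a real version would read the actual .PSD or .USD structure through a parser library.

```python
import re

# Auto-generated names like 'Layer 1' or 'mesh_003' are the kind of thing
# a human then has to spend boring time renaming and reorganizing.
GENERIC_NAME = re.compile(r"^(layer|group|mesh|object)[ _]?\d*$", re.IGNORECASE)

def audit_names(names: list) -> list:
    """Flag generated layers/objects a human can't easily work with."""
    problems = []
    seen = set()
    for name in names:
        if GENERIC_NAME.match(name.strip()):
            problems.append(f"unhelpful auto-generated name: '{name}'")
        if name in seen:
            problems.append(f"duplicate name: '{name}'")
        seen.add(name)
    return problems

# 'Layer 1', 'Layer 2', ... is exactly the cleanup work both points above warn about:
print(audit_names(["background", "Layer 1", "Layer 2", "Layer 1"]))
```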

If the content is not easily editable, then it adds work instead of removing it. To be editable, the content needs to be generated in a way human beings can work with, but it also has to work within existing tools. Which brings me to point two.

2. AI content should fit within existing pipelines and workflows

If an AI technology cannot easily integrate into an existing workflow, it’ll be nearly useless. The power of DLSS, for example, is that it just works for the end user. It renders at a lower resolution and reconstructs a higher-resolution image (and newer versions can even generate in-between frames) based on what is shown. The user does not have to do anything for it, or think about how to use it; they just turn it on in their options. On the developer side, Nvidia created integrations for Unity and Unreal, so adoption there is easier too. It still takes some effort of course, it isn’t free, but there are tried and proven ways of doing it.

For example, let’s say you have built an AI that can automatically generate 3D content. If that content is hard to get into Unreal or Unity, then the AI has already lost most of its value. Even if the 3D content is amazingly good, looks beautiful, gets generated fast, and is editable, if it cannot be used with existing systems and instead requires its own renderer, engine, or file format, then the barrier to entry will be too big. Not because the technology isn’t good, or neat. It may be super good, and really neat. But you cannot ask everyone to switch over to whatever system works for your AI. Those changes cost a lot of time and money. Even for technologies that are already used in many places and have shipped in released projects, such as the USD file format, adoption is slow. That is because the integration of USD into existing tools and workflows isn’t yet at a level where everyone can easily jump into it. Good AI needs to integrate into existing workflows.

Of course it’s not just about Unity or Unreal; many studios have their own internal renderer or engine as well. If you have to head off into a separate app to get the AI workflow done, as well as to do any iteration, then many of the gains of using the tech are already lost.

So even if an AI tool is amazing, it has to fit within existing bounds. A full reworking of everything sounds neat, but it’s like inventing a car that levitates over magnets inside the road. I bet it works perfectly great on your test track, and it looks awesome! How are you going to get every country in the world to change their roads to have magnets inside them though? You won’t. You have to think about those things before making those innovations. If your tools do not fit within existing bounds, that means you would also have to make a full-on engine, editor, and tools for all workflows, all by yourself. Will you do that?

AI sometimes feels like this really powerful ball of energy that is very hard to 'focus' in the right direction for what you need, where you need it. If the AI can do something way faster and better than a human being, that’s cool. If the AI can do something that was previously entirely impossible for humans, that is amazing! If that thing then cannot easily be used and iterated on in actual production workflows though, it doesn’t matter how neat it is, because it turns out it’s just a tech demo. We have all seen a lot of tech demos over the years, and they are really neat, but they have to actually resolve a problem within users’ daily use cases, which brings me to the third point.

3. AI should fully solve a problem

Every year I see motion tracking hardware being shown at conferences and events. It’s getting more accurate, but at this point it’s becoming hard to tell the difference between one year’s showcase and the previous year’s. It’s cool to see. The mocap actors do dances, throw around swords, do backflips, and it all looks super accurate on the screen next to them. Cool. But as long as I hear of animators having to spend days and weeks to ‘clean up’ mocap data, we haven’t fully improved this workflow. It just becomes easier to record raw mocap faster, which creates even more content that has to be cleaned up. In the end, are animators doing more menial and boring work, instead of less? If technology, AI or otherwise, were to solve that cleanup problem, that would be neat. When you allow smart folks to do more interesting, higher-quality work, that’s progress.

It’s a bit like seeing ‘fusion energy has been invented!’ in the news every few years. I am guessing they are making big strides, and I am really happy that they are doing so. Until it is directly connected to existing power infrastructure though, it does not matter for the end user. For scientists it may be a neat bit of progress. And that’s the key here: does it fundamentally, actually, fully solve a problem for the end user? If not, then it’s a cool tech demo. It’s cool. Keep on making those. But it’s not an end result that folks can work with. AI, and general automation, often has that problem too. Powerful automation isn’t a big help if the basic user experience still doesn’t feel good.

For example, let’s talk about a real problem that many folks deal with in many projects: can AI solve a translucency issue for me? Let’s say you’re in a 3D editor. You have a glass door, a window, and a semi-translucent cape on a character, all on screen at the same time. The amount of depth fighting is ridiculous, and it looks awful. A classic problem! What if you could highlight that area of depth fighting, ask the AI ‘How do I solve this?’, and right inside the editor it either solves it for you, or immediately shows you different ways to resolve it? That is an interesting future! AI in the middle of a workflow, solving a direct problem many folks deal with. Like hotspot texturing, which removes really annoying and monotonous work like UVing a mesh to match a trim sheet. That is useful automation! AI that can generate concept art is neat, but that is not a huge daily workflow issue or a huge cost to reduce. The fact that concept art AI is on the road to better AI is neat, but it doesn’t mean we’re good to go and AI is now part of tooling. Regular daily workflow issues need to be addressed for AI to be adopted.
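As a thought experiment, here is a minimal sketch of the detection half of such an assistant: flag translucent objects whose bounds overlap, which is where sort-order trouble usually starts. The scene representation and all names here are made up for illustration; a real version would hook into the editor’s own scene API and renderer.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass
class Mesh:
    name: str
    translucent: bool
    bounds_min: tuple  # axis-aligned bounding box corners, world space
    bounds_max: tuple

def aabbs_overlap(a: Mesh, b: Mesh) -> bool:
    """True if the two bounding boxes intersect on every axis."""
    return all(a.bounds_min[i] <= b.bounds_max[i] and
               b.bounds_min[i] <= a.bounds_max[i] for i in range(3))

def flag_translucency_conflicts(scene: list) -> list:
    """Find pairs of overlapping translucent meshes: the classic
    trouble spots described above (glass door + window + cape)."""
    translucent = [m for m in scene if m.translucent]
    return [(a.name, b.name)
            for a, b in combinations(translucent, 2)
            if aabbs_overlap(a, b)]

scene = [
    Mesh("glass_door", True,  (0.0, 0.0, 0.0),   (1.0, 2.0, 0.1)),
    Mesh("window",     True,  (0.5, 1.0, -0.5),  (1.5, 2.0, 0.5)),
    Mesh("cape",       True,  (0.8, 0.5, 0.0),   (1.2, 1.8, 0.4)),
    Mesh("wall",       False, (-5.0, 0.0, -1.0), (5.0, 3.0, -0.5)),
]
# Each flagged pair could then be surfaced with concrete fixes: force a
# render-order priority, switch a material to alpha-tested, or split a mesh.
print(flag_translucency_conflicts(scene))
```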

This does not just go for content generation either. Here is another, much more basic and mundane real world problem: why am I still getting out-of-office replies every single time I reply to an e-mail thread, when I have obviously already seen, opened, and closed the previous 5 out-of-office replies from the same person? It may seem silly, but resolving these kinds of real daily issues is what can get people on board with automation and AI within daily uses and workflows.

Another example: what about senior and principal artists, really smart folks, having to spend hours and hours UVing their assets? A boring and monotonous task that is absolutely necessary to do. I’ve seen Ministry of Flat take a shot at resolving this, and it looks pretty neat! Resolving a real issue, and trying to put the result in an editable shape that is usable with existing pipelines!

The question there of course is whether that even counts as AI. Is a sufficiently powerful algorithm automatically counted as AI? Is texture hotspotting AI? Is a fence that automatically adjusts to height and length changes AI? Or do we need to count those as generative algorithms? For example, here is a video about SpeedTree:

Is this more or less neat than the other 3D content generation AI stuff you've seen lately on social media? Were you impressed by that video, and do you think tree and world generation is right on the cusp of being able to generate entire AAA games? That video was uploaded to SpeedTree’s YouTube channel in 2009. That is 13 years ago as of today. SpeedTree was originally released in 2002. 20 years ago. A very, very long time, especially in the world of tech. Does knowing it was that long ago change your mind on the timeline of AI? Would you count it as AI? Does knowing when that video was uploaded make you less impressed, or more impressed? Ask yourself: why?

Sure, it’s a procedural algorithm that needs certain values exposed to work with. With an AI, those values may be learned by a network from millions of existing data points. Is that difference in data point scale the difference between AI and procedural content? Or is it that you can be surprised by the results of the generated content? You can be just as surprised and delighted by a procedural algorithm with few options as by one with millions of options. Are sufficiently advanced generative shaders AI?

Automation is great and all, but if AI software can't manage basic pattern recognition, then we still have a long way to go. It's the same thing in so many software tools: you're in an editor, you select a specific object, and you change a specific value on it. You do the exact same thing to another one. Then another one. Then another one. Shouldn't the system ask 'Hey, do you want to change that specific value on all those specific objects? Or select them all at once to do it in bulk? I can do that for you. Because this is getting awfully repetitive for a computer that has a couple billion transistors.'
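Here is a minimal sketch of what that repetition detector could look like, assuming an editor that exposes a stream of property edits. The names (Edit, RepetitionWatcher) and the threshold are all hypothetical; a real version would subscribe to the editor’s undo or edit events.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass(frozen=True)
class Edit:
    object_id: str
    property_name: str
    new_value: object

class RepetitionWatcher:
    """Offer a bulk action once the same property change hits N distinct objects."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.seen: dict[tuple, set[str]] = {}  # (property, value) -> object ids

    def observe(self, edit: Edit) -> str | None:
        key = (edit.property_name, repr(edit.new_value))
        objects = self.seen.setdefault(key, set())
        objects.add(edit.object_id)
        if len(objects) == self.threshold:
            return (f"You've set {edit.property_name} = {edit.new_value!r} on "
                    f"{self.threshold} objects. Apply it to all matching objects?")
        return None

watcher = RepetitionWatcher()
for obj in ["door_a", "door_b", "door_c"]:
    prompt = watcher.observe(Edit(obj, "cast_shadows", False))
    if prompt:
        print(prompt)  # fires on the third identical edit
```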

The very first goal of that AI should be to understand the intention of the user. And give them ways to solve real problems. Another real world example: What about when you are in a group call, and the person talking forgets to unmute? Could an AI automatically unmute them at the right times, and not the wrong times? Now that is useful in daily workflows! Another one: How many times have you had to upload your CV to some automated system, and then right afterwards still had to fill in all your information?! Those are real and actual issues millions of people deal with every day, where the intention is very clear, yet the machine with billions of transistors still does it wrong. So what’s all that power used for then? If you make a very powerful AI tool, it doesn’t mean that through sheer power it is useful or good.

There are also similar issues with terrain generation. Being able to generate a terrain is neat, but can it also allow for caves? The answer is usually a very quick ‘no’, because most terrain generators output a heightmap, which stores exactly one height per point on the ground and therefore cannot represent overhangs or caves at all. Does the world feel empty because it’s just very big, without content that is worthwhile to be in? Does it not provide an easy enough editing workflow to change and iterate on that content later? Having a lot of power to generate stuff is neat, but if the basic use cases for end users are not taken into account, then what comes out of the content generation pipeline only resolves half a problem. And in the end the other problems are then usually resolved with an excessive amount of manual work, and worse: crunching through weekends. Like with the mocap example above. It’s important to understand that more automation sometimes creates more work, not less. Garry, from Garry’s Mod, also wrote a neat blogpost about procedural generation for Rust on his website, here.

Procedural generation is neat. Sometimes it can resolve half of the issue, and that sounds better than nothing. When the limitations are eventually reached though, those issues have to be manually fixed. Powerful technology in and of itself isn’t always helpful; it can be technologically neat but bring more trouble than it solves. This is why tech demos are always best viewed as neat magic tricks. None of them are real until you see them integrated into an existing workflow that allows human beings to do more high quality work that can ship in projects. Not more content, but better content.

That is what I like about the intention behind Nanite in Unreal Engine: it truly wants to take work away from those making projects, and it is intended to ‘just work’ right away. There’s no need to write your code in a different way, or to make content in a different way. It should ‘just’ work. You mark a checkbox on import, and it should work. You want to change the content later, after import? Sure, just mark that same checkbox afterwards.

So, what do developers have to deal with every single day? Performance budgets. The idea behind Nanite is to take that issue out of their hands, without forcing those same users to make all their content in a completely new way. To let them use their current digital content creation (DCC) tools, like Maya, 3ds Max, Blender, etc., and have that content ‘just’ work without first editing it into a particular file format. Of course there are still some limitations, such as assets having to be static, or translucency not being supported, but being able to pull in most assets made over the last few decades is a huge win for usability. Brian Karis from Epic Games gave a great talk about getting to Nanite, and this is the key slide for me: understand your users.

Solve the issues your users are dealing with. And solve them without needing those users to adjust what they do well already. If your particular AI cannot do that, then it’s a neat tech demo. Neat, and on the road to potentially a better and more powerful tool, but going from neat tech demo to actual real world implementation of solving a real user problem is a huge step that can take a very long time. It can take years.

4. AI should be honest about being human

The wrong answers from ChatGPT remind me of what I call the ‘newspaper problem’, though its more official name is the ‘Gell-Mann Amnesia effect’. It works like this: you read the paper, and think you’re learning a lot. Suddenly you see an article related to your specific expertise. It’s completely wrong about the topic, it’s missing a lot of detail, and it misses obvious counterarguments. You know enough about this topic to know this article is wrong. Anyway, on to the next article. Hey, that next one is pretty neat, you didn’t know about that.

There is this assumption of accuracy that we all easily fall into. I love how QI handled this. It’s a British quiz show, and in one of the later episodes they removed points that contestants had won in old episodes, pointing out that the scientific method sometimes makes old information obsolete or wrong. The answers they gave were correct then, but are wrong now. It happens! That’s normal, and that’s progress. But the old QI episodes are still out there, without corrections. That wrong information still gets spread to folks. With ChatGPT it will be similar.

How do you know a piece of information is accurate without being an expert on the topic? Will you actually read the sources, if those even get mentioned? Will you read a scientific paper full of complex jargon you don’t know the meaning of? How much of the information given to you on a daily basis is wrong? Even things you learned in high school can be out of date. Were you taught the tongue map back in the day, with the different taste buds around different areas of the tongue? Turns out that’s debunked. Or is it? I’m not an expert on tongue science! I can only believe what I believe to be true! People in my high school class were using pipettes to drop different flavours around their tongue, and thus believed, through their own actions and senses, that the information about the tongue map was true. And even then it still wasn’t actually true! They fooled even themselves, as we all often do. And if someone spoke up that it didn’t work for them, maybe you were told it was their problem, since it did work for others.

All information is constantly changing. Experts argue at conferences about what is true. Seasoned experts with decades of experience cannot agree on certain topics, so how could you? And because of that, how can AI? Depending on which expert and source you ask, you may just as well be told a piece of information is correct as that it’s wrong. So of course ChatGPT will be wrong too, sometimes. And it will probably stay that way, because it uses human information: information that changes, that experts can’t agree on, and that is constantly iterated on. AI is fed by human information, and humans are fallible. Garbage in, garbage out. Gold nuggets in, gold nuggets out. Human information in, human information out.

So if you’re building a way for AI to generate content, make sure the user understands it is fallible. That it can be wrong. That not everything it generates is true or correct. That it can only do as much as the data it was trained on allows. Cultural information especially is very easy to get wrong. What if you asked an AI to generate a Dutch newspaper article versus a Belgian newspaper article? Would they both be in Dutch or Flemish? Or would the Belgian one be in German? Or French? How would it decide, and how would the human asking it to generate that content even know that Belgium has three official languages? The AI needs to explain this to the user, or at least make clear that it is easily fallible.

5. AI should have a good user experience and interface

Take boring tasks or unknown tasks out of folks’ hands. Make that content editable and changeable. Solve a real problem. Those are important, and they all come down to the final and most important item: it does not matter how technologically impressive an AI is if the user experience is bad to the point of being annoying.

How many procedural building generation demos have you seen since 2005? I’m talking about the ones where they can drag the outline of the building around, and it automatically adjusts the number of windows, doors, roof tiles, etc. It was neat the first few times, but it’s getting a bit stale to see those demos again and again. Until that kind of tech is integrated into a system with a good user experience, and is useful for the general building of projects, the impressiveness of it will keep fading. It’s neat for the person who built it, and it’s nice to see them share their personal accomplishments, but a lot of the people who view those demos immediately think such a tool is ready to use in a real project situation. It is not. I think we should all praise the personal work that went into it, and cheer on the learning experience, but realize that it is only truly impressive once the user experience and adaptability of such a tech demo are shown to be good.

For example, it would be impressive when you can easily make a building of any shape, including triangular and hexagonal bases. When you can do this by simply clicking on the ground to make a shape. When you can easily swap to different window styles and roof styles, and create your own assets within simple metrics so that they always fit. When you can easily integrate this into your project, and change that content later. When you do not have to ask your users to read the Python code, or go through some weird install method with a command line terminal or .ini files to get it going. When it has a good user experience. Otherwise it is still that extremely powerful ball of energy that can really only be aimed in one specific direction. If you don’t need that direction, it will be useless to you.
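To make ‘simple metrics’ concrete, here is a hedged sketch of what the user-facing parameters of such a generator might look like: plain, editable values instead of code spelunking. All names here are hypothetical, not any real tool’s API.

```python
import math
from dataclasses import dataclass

@dataclass
class BuildingParams:
    footprint: list                # (x, y) polygon clicked out on the ground
    floors: int = 3
    floor_height_m: float = 3.2
    window_style: str = "default"  # swappable asset set
    roof_style: str = "gabled"
    window_spacing_m: float = 2.0

def windows_per_wall(params: BuildingParams) -> list:
    """Derive window counts from wall lengths, so resizing the footprint
    automatically adjusts the windows, like in the demos."""
    pts = params.footprint
    counts = []
    for i in range(len(pts)):
        x0, y0 = pts[i]
        x1, y1 = pts[(i + 1) % len(pts)]  # wrap around to close the polygon
        wall_len = math.hypot(x1 - x0, y1 - y0)
        counts.append(max(0, int(wall_len // params.window_spacing_m)))
    return counts

# A triangular base works exactly like a rectangular one:
tri = BuildingParams(footprint=[(0, 0), (12, 0), (6, 9)])
print(windows_per_wall(tri))  # [6, 5, 5]
```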

Here is another great example, covered in a Twitter thread: https://twitter.com/AntonioCasilli/status/1565821558114979842?t=PZYjD9tZ6COIHXfrUfNcLQ&s=19. It is about an AI food scanner, installed to save time, but in the end the users have to adjust how they place their food, and someone has to take time to make sure all the scans are actually correct. So the problem is only partially solved, everyone around it ends up with more work, and the user experience is degraded instead of improved.

I am sure every technology could, in theory, be fully driven through a command line terminal, but we invented user interfaces for good reasons, and they have allowed technology to flourish outside the walled garden of programmers and other technical folks. Fantastic technology without a good user experience isn’t worth much to the wide audience of humanity. If you are able to generate a city, but are later unable to adjust the city size or streets, adjust its navmesh, change components of the city, add trees, change materials and textures at scale, etc., then it’s just a neat tech demo. It’s not a real world use case. While a neat tech demo can give a glimpse of the future, and can help us get to that end goal, it cannot itself be that future without a good user experience and user interface. Again, that SpeedTree demo video is from 2009. Neat procedural technology has existed for a long while. The technologies that stick around for the general public, that dictate the future and get used for real projects, are the easily usable ones. Sometimes that even means the less technologically advanced ones stick around, just because of better user experiences. Neat tech alone won’t save you.

That’s just a top 5

There are more issues I haven’t touched on here, like prompting having to become a skill. You start having to optimize all these tools for the various words thrown into them, as if you’re doing search engine optimization (SEO). And SEO is already causing trouble all over the internet, because when a metric becomes a goal, it ceases to be a good metric. That ouroboros will keep eating itself until it dies. Maybe it already has. Like folks adding ‘reddit’ to any Google search to make sure they get a useful result. That problem gets even worse with AI, because it is supposed to learn from every interaction. So you will end up with everyone having to say or type ‘artstation award winning best beautiful’ before every visual prompt just to make sure they get something that looks good. Words will lose their meaning.

And then there are the copyright issues. How much has the AI learned, and copied, from copyrighted content? I think this is going to become a very tough topic, and it may explode into legal battles that force everyone to redo or close projects entirely. It may even shut down existing games that have used such content, or cause big payouts for the use of content that was generated from datasets containing copyrighted work.

So will AI become a big thing in the future? Of course, depending on what you think counts as AI. Will automation become a bigger and bigger thing in the future? Absolutely, as we have already seen over the last 200 years. So automated automation, called AI, will always be the future. Though I think the only tools that will really do well, and stand the test of time, are those that achieve the above 5 points.


Thank you for reading! If you enjoyed this post, you can find my talks here, and my other posts here. You can also subscribe below to get a notification when I release a new post.
