Categories
Artificial Intelligence Journalism Software Engineering Tutorial

Evolving Errors – New Error Patterns In Remote Services, APIs, and Software With AI Agents

AI Agents Create New Types of Error Patterns in Remote Services, APIs, and Software

This article is about how AI agents produce new kinds of errors that we have not previously faced in computing and software engineering.

Why Are These Errors Novel?

Until the release and promotion of the OpenAI GPT Store, authorized third-party AI agents interacting with your API or system remotely on behalf of consumers or businesses were not widespread, high-volume, or using higher levels of threaded AI reasoning.

With the introduction of conversational, threaded AI agents available to all consumers come brand-new ways errors can occur on your servers. Once AI agents are calling your services, expect to see things that traditional functional, algorithmic programs simply wouldn’t do. Some of these errors are similar to errors or attack vectors that already exist; the fundamentals of 1’s and 0’s still remain, but the novelty in this situation (novelty like new, not novelty like fun) is that these errors have changed shape and will manifest in strange new ways.

Let’s look at some generic ways AI agents manifest new types of errors in your server logs.

Error #1 – The Fake ID

Summary

When an AI agent exceeds its context window, it may drop the tokens containing system IDs it received from your server. On subsequent requests, the AI may spontaneously generate remote system IDs that are type-correct but wrong.

Error Process

  1. User triggers AI agent to fetch the data of their latest post from the server.
  2. The server responds with the latest post and the latest post’s ID number.
  3. User uses AI agent to edit the content of the post.
  4. The action of editing the post causes the AI agent to exceed its input token context window, and the post ID drops out of context.
  5. The user completes editing their post and instructs the AI agent to upload the edits to the server.
  6. The edited post content is sent to the server, and to form a valid request the AI agent generates a type-correct value for the ID.
  7. The request is rejected because the ID is incorrect.
  8. The AI agent is unable to adjust and fix the error as it no longer has access to the remote system ID, and it ultimately ends up in a failure state for the action.

Novelty

Previously, computer programs did not spontaneously generate type-correct remote system IDs client-side.

Real-World Impact

    • You may accidentally overwrite entirely different objects or entities.
    • You may see a large increase in “incorrect ID” related errors in your server logs.
    • You may need to introduce AI directives to cache or store important IDs.
    • You may need additional validation code.
    • You may need additional confirmation flows.
    • You may need to introduce AI directives related to caching or repetition of important IDs within a process to keep them in the token context window.
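
On the server side, the "additional validation code" can be as simple as refusing an update when the ID doesn't exist or doesn't belong to the caller. Here is a minimal Python sketch of that idea, assuming a hypothetical in-memory PostStore and Post type (not from any real framework); in a real service the same two checks would sit in front of your database write.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Post:
    post_id: int
    owner_id: int
    title: str
    body: str


class PostStore:
    """Hypothetical in-memory store used only to illustrate ID validation."""

    def __init__(self) -> None:
        self._posts: dict[int, Post] = {}

    def add(self, post: Post) -> None:
        self._posts[post.post_id] = post

    def get(self, post_id: int) -> Optional[Post]:
        return self._posts.get(post_id)

    def update_body(self, user_id: int, post_id: int, body: str) -> Post:
        """Update a post's body only if the ID is real and belongs to the caller."""
        post = self._posts.get(post_id)
        if post is None:
            # A generated ID often points at nothing: tell the agent how to recover.
            raise LookupError(f"post {post_id} not found; fetch the post again before editing")
        if post.owner_id != user_id:
            # A generated ID can also collide with someone else's real post.
            raise PermissionError(f"post {post_id} does not belong to user {user_id}")
        post.body = body
        return post
```

The error messages are written to tell the agent what to do next (fetch the post again), on the assumption that the agent reads error bodies and can act on them.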

Error #2 – Acts of Creation

Summary

When an AI agent exceeds its context window, it may drop the tokens containing system IDs it received from your server.
On subsequent requests, the AI may recognize that it does not have an ID and attempt to make a call to create a new entity or object. This can cause a number of issues depending on the type of object being created.

Error Process

      1. User triggers AI agent to fetch the data of their latest post from the server.
      2. The server responds with the latest post and the latest post’s ID number.
      3. User uses AI agent to edit the content of the post.
      4. The action of editing the post causes the AI agent to exceed its input token context window, and the post ID drops out of context.
      5. The user completes editing their post and instructs the AI agent to upload the edits to the server.
      6. The AI agent recognizes it does not have an ID and calls a creation endpoint, thereby creating a new post.
      7. The request is accepted and the post is duplicated.
      8. The AI agent is unable to get the original system ID and continues to spam the creation endpoint.

Novelty

Previously, computer programs in editing mode did not spontaneously switch to a creation mode.

Real-World Impact

      • You may duplicate or recreate objects and data erroneously.
      • You may see a large increase in calls to creation methods.
      • You may need to introduce AI directives to cache or store important IDs.
      • You may need additional validation code.
      • You may need additional confirmation flows.
      • You may need to introduce AI directives related to caching or repetition of important IDs within a process to keep them in the token context window.
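
One way to blunt the creation spam is to have the creation endpoint recognize a repeat. Below is a minimal sketch, assuming a hypothetical CreateGuard that fingerprints each (user, content) pair and hands back the existing post ID instead of creating another post. Note that it catches the repeated calls in step 8, not the first duplicate in step 7, because the edited content no longer matches the original post.

```python
import hashlib
import time
from typing import Callable


class CreateGuard:
    """Hypothetical guard for a create-post endpoint.

    If the same user submits the same content again within `window_seconds`,
    the existing post ID is returned instead of creating another post.
    """

    def __init__(self, window_seconds: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window_seconds = window_seconds
        self.clock = clock
        # (user_id, content fingerprint) -> (post_id, time created)
        self._recent: dict[tuple[int, str], tuple[int, float]] = {}
        self._next_id = 1

    @staticmethod
    def _fingerprint(title: str, body: str) -> str:
        # Collapse whitespace and ignore case so trivial formatting differences still match.
        normalized = " ".join(f"{title}\n{body}".split()).lower()
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def create(self, user_id: int, title: str, body: str) -> tuple[int, bool]:
        """Return (post_id, created); created is False when a repeat was detected."""
        key = (user_id, self._fingerprint(title, body))
        now = self.clock()
        previous = self._recent.get(key)
        if previous is not None and now - previous[1] < self.window_seconds:
            # Hand the agent back the ID it lost instead of making a duplicate.
            return previous[0], False
        post_id = self._next_id  # stand-in for a real database insert
        self._next_id += 1
        self._recent[key] = (post_id, now)
        return post_id, True
```

Returning the existing ID also gives the agent a second chance to recover the identifier it dropped, if it pays attention to the response.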

Error #3 – Gobbledygook

Summary

An AI may correctly generate part of a data structure for a request but send along type-correct nonsense for the rest of it.

Error Process

        1. User triggers AI agent to fetch the data of their latest post from the server.
        2. The server responds with the latest post and the latest post’s ID number.
        3. User uses AI agent to edit the content of the post.
        4. The action of editing the post causes the AI agent to exceed its input token context window, and the post title drops out of context.
        5. The user completes editing their post and instructs the AI agent to upload the edits to the server.
        6. The AI agent recognizes it does not have a title for the post and generates one to properly form the request.
        7. The request is accepted and the title is erroneously changed.

Novelty

Previously, computer programs did not spontaneously generate type-correct nonsense client-side as part of building a request.

Real-World Impact

        • You may edit or overwrite data accidentally but not fail any type checks.
        • You may need to introduce AI directives to cache or store important parts of objects and entities.
        • You may need additional validation code.
        • You may need additional confirmation flows.
        • You may need to introduce AI directives related to caching or repetition of important data within a process to keep it in the token context window.
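
Type checks won't catch a made-up title, so the validation has to look at which fields actually changed. Below is a minimal sketch, assuming the request carries a hypothetical declared set naming the fields the user asked to edit; any other field whose value changed raises ConfirmationRequired so you can route it to a confirmation flow instead of silently overwriting. Accepting only the changed fields (PATCH-style) is an alternative that avoids asking the agent to resend fields it may have lost.

```python
from typing import Any, Mapping


class ConfirmationRequired(Exception):
    """Raised when an update would change fields the caller did not declare."""

    def __init__(self, fields: list[str]) -> None:
        super().__init__("update would change undeclared fields: " + ", ".join(fields))
        self.fields = fields


def changed_fields(current: Mapping[str, Any], incoming: Mapping[str, Any]) -> list[str]:
    """Names of fields present in both objects whose values differ, sorted."""
    return sorted(k for k, v in incoming.items() if k in current and current[k] != v)


def apply_update(current: Mapping[str, Any], incoming: Mapping[str, Any],
                 declared: set[str]) -> dict[str, Any]:
    """Return an updated copy, or raise if an undeclared field would change.

    `declared` is a hypothetical request parameter naming the fields the user
    asked to edit, e.g. {"body"}.
    """
    unexpected = [name for name in changed_fields(current, incoming) if name not in declared]
    if unexpected:
        # Route this to a confirmation flow instead of silently overwriting.
        raise ConfirmationRequired(unexpected)
    updated = dict(current)
    # Only declared fields are written; everything else keeps its stored value.
    updated.update({k: v for k, v in incoming.items() if k in declared})
    return updated
```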

Error #4 – Make Fetch Happen (aka AI Stampede)

Summary

If a request fails, the AI will not back off from making that request again (unless you give it an explicit directive to stop retrying).

Error Process

          1. User triggers AI agent to fetch the data of their latest post from the server.
          2. The server responds with the latest post and the latest post’s ID number.
          3. User uses AI agent to edit the content of the post.
          4. The action of editing the post causes the AI agent to exceed its input token context window, and the post ID drops out of context.
          5. The user completes editing their post and instructs the AI agent to upload the edits to the server.
          6. The AI agent generates a post ID and it is incorrect.
          7. The request fails, but the AI agent has no directive to stop retrying, so it keeps making the request indefinitely.

Novelty

Previously, computer programs would fail on incorrect requests and not keep retrying them unless explicitly programmed to retry.

Real-World Impact

          • You may DDoS your own server.
          • You may tie up your AI agents and models processing the same request over and over again.
          • You may need to introduce AI directives to cache or store important IDs.
          • You may need additional confirmation flows.
          • You may need to introduce AI directives related to caching or repetition of important data within a process to keep it in the token context window.
          • You may need to introduce AI directives to stop retrying requests.
          • You may need to introduce ways to terminate AI agents or request processes.
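
Since the agent may not back off on its own, the server can do it for the agent. Below is a minimal sketch, assuming a hypothetical RetryGuard that fingerprints each request by client, method, path, and payload, and refuses further identical requests after a few failures within a time window.

```python
import hashlib
import json
import time
from collections import defaultdict, deque
from typing import Any, Callable, Deque


class RetryGuard:
    """Hypothetical guard that stops an agent from repeating the same failing request.

    After `max_failures` identical failures inside `window_seconds`, further
    identical requests should be refused before they reach the handler.
    """

    def __init__(self, max_failures: int = 3, window_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.clock = clock
        # fingerprint -> timestamps of recent failures, oldest first
        self._failures: dict[str, Deque[float]] = defaultdict(deque)

    @staticmethod
    def fingerprint(client_id: str, method: str, path: str, payload: Any) -> str:
        # sort_keys makes logically identical JSON payloads hash the same.
        raw = json.dumps([client_id, method, path, payload], sort_keys=True)
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def _prune(self, key: str) -> Deque[float]:
        """Drop failures that fell outside the window and return what remains."""
        times = self._failures[key]
        cutoff = self.clock() - self.window_seconds
        while times and times[0] < cutoff:
            times.popleft()
        return times

    def should_block(self, key: str) -> bool:
        return len(self._prune(key)) >= self.max_failures

    def record_failure(self, key: str) -> None:
        self._prune(key).append(self.clock())
```

When should_block returns True, you might answer with an HTTP 429 and a plain-language message telling the agent to stop retrying; whether a given agent honors that still depends on its directives.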

This Is Not A Comprehensive List of Errors

These are real-world errors I have encountered in my own work with AI agents, including developing with ChatGPT Plugins, Custom GPTs, BabyAGI, and AutoGPT.

There are likely many more errors that can occur and these errors will manifest in their own novel ways depending on your systems.

Categories
Artificial Intelligence Journalism Software Engineering Tutorial

GitHub Copilot – VSCode – Add New File Extensions, File Types, and Programming Languages for Completion

This article is a short description of how to add new file extensions, file types, and new programming languages to your GitHub Copilot extension in VSCode.

Since you are already here I will assume you know what GitHub, VSCode, VSCode Extensions, and VSCode Extension Settings are.

By default, the GitHub Copilot Visual Studio Code Extension does not have completion enabled for all file types. For example, if you want to add .txt files, .yaml files, or other files like that, you can do so easily in the extension settings.

Step 1 – Open The VSCode Extension Settings

I did this by opening Settings and searching “copilot”.

How To Enable or Disable Copilot for Other Programming Languages or File Types? What Languages Are There?

If you click the tiny link that says “languages”, you get sent to an extremely helpful website that documents exactly which languages are supported. Generally speaking, the language identifier you choose determines which file extensions it applies to.

These are called “Language Identifiers” by VSCode.

Click Here to See the Languages GitHub Copilot VS Code supports.

Remember: GitHub Copilot Does Not Support Every Language

It supports some subset of languages, and that subset changes. Try your best to find a Language Identifier that works with your file extension.

GitHub Copilot Doesn’t Support My File, File Extension, or Programming Language

It might! Read some of the VSCode Language Descriptions carefully and see if they apply to your filetype.

There Is No Way For GitHub Copilot To Support .txt Text Files?

Oh, but there is! It is done by putting the VSCode language identifier called plaintext in the correct place in the GitHub Copilot VSCode Extension Settings.

Click Edit in settings.json

This will open up the GitHub Copilot section of your VSCode Settings JSON file.

Add Your Language Identifier

In this example, we want completion in our .txt files, so we add the “plaintext” language identifier to our settings.json file and set it to “true”.
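
If you would rather script the change, here is a minimal Python sketch that edits the settings as a dictionary. It assumes the setting key is github.copilot.enable (an object mapping language identifiers to true or false, with "*" covering everything else), which is what the extension uses as far as I know; check the key in your own settings.json. VSCode's settings.json also allows comments, so a plain json.load on that file may fail; the hypothetical helper below only works on settings you have already parsed.

```python
import json
from typing import Any

# Assumed key name; verify it against your own settings.json.
COPILOT_ENABLE_KEY = "github.copilot.enable"


def enable_copilot_language(settings: dict[str, Any], language_id: str) -> dict[str, Any]:
    """Return a copy of `settings` with Copilot enabled for `language_id`.

    Existing per-language entries (including the "*" default) are kept.
    """
    updated = dict(settings)
    per_language = dict(updated.get(COPILOT_ENABLE_KEY, {"*": True}))
    per_language[language_id] = True
    updated[COPILOT_ENABLE_KEY] = per_language
    return updated


if __name__ == "__main__":
    current = {"editor.fontSize": 14, COPILOT_ENABLE_KEY: {"*": True, "plaintext": False}}
    print(json.dumps(enable_copilot_language(current, "plaintext"), indent=4))
```

Running it prints the settings with "plaintext" flipped to true alongside the existing "*" entry, which is the same edit described above.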

Yay! You Have GitHub Copilot VSCode in New Files!

Your to-do lists can now complete themselves.

Categories
Artificial Intelligence DrawGPT Journalism Software Engineering

DrawGPT – Make AI Art & Draw Images Using An AI That Only Knows Text


Use DrawGPT to Draw Anything With an AI… Using Only Words

I recently created a new way to generate AI art that does not directly use or copy artists’ work to generate images; it is also an exploration of how to visually enable large language models (LLMs).

Click this link to try it out and see what you can draw and get a sense of what the app is like.

How Can an LLM Know About the Visual World?

I was interested in how ChatGPT was able to understand the visual world despite being an AI trained only on text and words. It does not use any images, so how does it know what things look like?

How can an AI that has never seen an image, had no images in its training set, and cannot output an image know what the visual world looks like?

I spent a few days puzzling over this and came up with a solution that I think is pretty cool and offers a nice proof that LLMs can become visually enabled.

DrawGPT – An Exploration in Visually Enabled LLMs

After thinking about how to get an AI LLM to render images, I decided that instead of just a proof of concept I would try to create an entire application that showcases exactly how this could be done.

You can see it here at this link DrawGPT.

How Can an LLM Become Visually Enabled to Generate Pictures and Images?

The first step in creating a visually enabled LLM is of course the training data.

In my experience with ChatGPT, I found it highly likely that OpenAI had in fact used CLIP or CLIP-like data in their training data for GPT-3. It would be very difficult for a large language model to understand visual objects, their color, the relative visual composition of objects, and everything else based on purely textual information alone.

While I cannot definitively prove this is true, it seems likely given OpenAI’s products like DALL-E.

There is certainly a lot of visual information in large language model training sets that use only text. Paintings like the Mona Lisa are discussed in depth in art reviews, basic anatomical structures of things like animals are discussed in biology textbooks, things like buildings and skylines and landscapes are written about endlessly in literature. But I do not believe that would be enough to enable an LLM to become visually enabled in a way that would consistently output correct visual imagery.

CLIP (an OpenAI model trained on pairs of images and captions so that it can match an image with the text that describes it) is a tool that can take visual text descriptions to the next level. By tying a visual image to distinct text tokens, CLIP and CLIP-like data create a direct set of tokens related to visual imagery.

We know CLIP data works very well for creating AI art and generating images with AI because things like Stable Diffusion, Midjourney, and DALL-E all use CLIP or CLIP-like data to generate images. This pointed me in a direction for DrawGPT.

Text Tokens, Pixel Data, and Diffusion, Oh My!

Most of the AI art tools we see right now (Jan 2023) are based on a combination of CLIP data to create text tokens and latent pixel diffusion. This is what allows “text to image” AI art.

In order to create “any” image, these pixel diffusers need to be trained on copious amounts of images whose subject matter is extracted either from metadata provided in the training set or by running the images through CLIP and using the output alongside each image.

What is going on behind the scenes is that the text prompt is broken down into tokens that guide, or condition, the diffusion process. The model starts from random noise and, over many steps, keeps guessing what pixel goes where to move that noise toward an image that matches the text; generally, the more steps it takes, the better the output image.

This is a phenomenal way to create AI art and it is very effective. But it also has some major issues.

The major problem with things like DALL-E and Stable Diffusion is that the image sets they were trained on did not necessarily credit the artists properly. Things like the artist’s style, the subject matter, the image composition, and much more were extracted during training using CLIP or available metadata.

And we’re not talking about a few images here. We’re talking millions of images scraped from the Internet, possibly from sources that did not even know they were being scraped. Yes, technically the terms of service were not broken during the collection of the images for the training set, but the resulting backlash suggests that the image collection was in an ethical gray zone.

As we’ve seen online there are many artists who are not happy with the way their work is being used in these AI art tools.

This is a major issue, and it is something I thought I could uniquely address with DrawGPT by using ONLY an LLM… no actual pixel data. An LLM cannot copy anything about an artist’s work directly because it is not sampling or reading the pixel data of the images, only the text descriptions of them from CLIP data.

DrawGPT – Part of the Solution to Potential Art Theft & Ethical Dubiousness

One easy way to get around the issue of artists feeling that their work is being copied is simply not to copy it.

That seems simple enough on the surface but in practice has not really been realistic. With the introduction of genuinely large LLMs like GPT-3, GPT-3 DaVinci, ChatGPT, Bloom, and others, the total corpus of textual works in the training set, including any CLIP data, should be sufficient to provide enough visual references for an LLM to create images simply from words.

The problem is that LLMs are not trained to create images. They are trained to create text. And while they can be trained to create images, they are not trained to create images in a way that is visually coherent.

That raises the question of how a visually enabled LLM can express itself. While it may know what a dog is, it may not know what a dog looks like. And even if it knows what a dog is and what a dog looks like from written examples, how would it draw one given that it cannot output pixel data?

How Can An AI LLM Draw?

This was my first question. Because the field of AI research with these LLMs, transformers, and diffusers is so new, it wasn’t really something AI researchers were looking at. I did not have a lot of work to reference, as no one had really been considering how to get the LLM itself to draw.

Much like the need for a truly massive training set, the LLMs themselves needed to reach a certain maturity before this was realistic to explore.

Even if the AI LLM has enough visual reference data, it also needs a sufficiently large corpus of training data on an output medium so that it can output tokens accurately enough for images to be rendered.

With the introduction of GPT-3 and the checkpoint GPT-3 DaVinci we have reached a point where the AI can in fact command a visual medium with enough complexity to correctly render images.

What is the medium for an LLM? Well, seeing how it can only use text it needs the text that it outputs to create an image. Since the images are digital this means the LLM needs to output instructions to draw a digital image.

This leaves only a few options for visual, artistic mediums for an LLM:

  • SVG – an XML-based plain-text format for web-enabled vector images.
  • HTML – Using the HTML5 canvas tag with Javascript draw commands. It’s well supported in all browsers now.
  • LaTeX – A way to express complex equations which can draw lines but is not very suited for visual work.
  • ASCII – Using text characters to create a visual image by using each character as a “pixel”.

Of these options the only realistic choices are SVG and HTML5 canvas. LaTeX is not really suited for visual work and ASCII is not really suited for actual drawing (it’s great for CLI output or things like comments in web3 smart contracts).

SVGGPT? Nope.

SVG turned out to be a little too complex and verbose. It’s a very powerful format, but the extra characters required by the XML syntax, plus all of the attributes, made it very difficult to create an image with.

While SVG does work, and it was the first format I tried because it seemed ideal, there were some major issues. Notably, limits on output tokens often resulted in partial SVG drawings, and without closing tags for every open tag it just wasn’t possible to consistently generate complete images, even on a basic level.
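
To make the truncation problem concrete: an SVG that gets cut off mid-output is not well-formed XML, so a strict XML parser rejects the whole document rather than rendering the part that made it through. Here is a minimal Python sketch of that check using the standard library's xml.etree.ElementTree; is_complete_svg is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def is_complete_svg(text: str) -> bool:
    """Return True if `text` parses as XML and its root element is <svg>."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        # Output cut off by a token limit leaves unclosed tags, so parsing fails.
        return False
    # The root tag may be namespaced, e.g. "{http://www.w3.org/2000/svg}svg".
    return root.tag.split("}")[-1] == "svg"
```

Any prefix of an SVG document that stops before the closing svg tag fails this check, which matches the partial drawings described above.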

HTML5 Canvas GPT? Yep.

It turned out that using the 2D context of an HTML5 canvas tag with draw commands in Javascript was the perfect way to draw basic images with an LLM.

Using a very complex prompt that limits the output to only the relevant code, I was able to consistently get DrawGPT to output code that would draw images. You can see the Javascript draw commands on DrawGPT when you create an image. Give it a try! All the Javascript code for any image is currently open source on the website.

2D canvas context draw commands in Javascript are not really meant for drawing complex, detailed images. They are the standard draw commands you see in most low-level visual systems: things like fillRect, lineTo, arc, and fill. They are not suited to complex images, but they are perfect for drawing basic ones.

This is why most of the output of DrawGPT is not detailed imagery like you expect from Stable Diffusion, DALL-E or any of the latent pixel diffusion methods used by other AI art models.

While it would be possible to draw more detailed images using an LLM + Javascript draw commands, given the output token limit of the GPT-3 API calls it is just not feasible for this particular proof of concept.

Note: if the prompt is changed to ask for more detailed images, or more detailed pixel art, the AI LLM models will attempt to draw more detailed images. But the output will still be limited by the output token limit of the GPT-3 API calls.

How Can We Know An LLM Is Drawing Things Correctly?

Once I was able to get the LLM to consistently render images, the question became, “Is it drawing things correctly?” There was some difficulty at first with more complex scenes or objects, as it wasn’t clear exactly what the AI was drawing. Are those dots in the sky birds, or are they just noise and artifacts like the ones traditional pixel diffusion methods often produce?

It’s easy to see when DALL-E or Stable Diffusion create an image and the tokens are correctly represented but sometimes it’s not so obvious with a simplified image.

One massive advantage of using an LLM for drawing is that you can simply have it tell you what each object is supposed to be. This isn’t really an option with most other AI art methods, as they are not trained to output text alongside the image perfectly describing each feature or token in the output image. You can always run the output image through CLIP, but that does not give insight into the actual drawing process or what specifically each object should be.

By forcing the output to include relevant code comments in the Javascript (you can see them in the code on the page) I was able to get the LLM to reveal the various objects it was attempting to draw.
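
Pulling those comments back out is easy to automate if you want to check many drawings at once. Here is a minimal Python sketch, assuming the comments are full-line // comments; SAMPLE_JS is a made-up example of commented canvas code (not actual DrawGPT output), and drawn_objects is a hypothetical helper.

```python
import re

# Hypothetical example of commented canvas code; not actual DrawGPT output.
SAMPLE_JS = """
const ctx = canvas.getContext("2d");
// sky
ctx.fillStyle = "skyblue";
ctx.fillRect(0, 0, 512, 300);
// ground
ctx.fillStyle = "green";
ctx.fillRect(0, 300, 512, 212);
// sun
ctx.fillStyle = "yellow";
ctx.beginPath();
ctx.arc(420, 80, 40, 0, Math.PI * 2);
ctx.fill();
"""

# Matches a // comment that starts a line (after optional spaces or tabs).
# Trailing comments after code and // inside string literals are ignored.
LINE_COMMENT = re.compile(r"^[ \t]*//[ \t]?(.*?)[ \t]*$", re.MULTILINE)


def drawn_objects(js_source: str) -> list[str]:
    """Return the text of each full-line comment, in drawing order."""
    return [m.group(1) for m in LINE_COMMENT.finditer(js_source)]


if __name__ == "__main__":
    print(drawn_objects(SAMPLE_JS))  # ['sky', 'ground', 'sun']
```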

I was surprised.

Not only was the LLM (the default OpenAI GPT-3 DaVinci) now creating images, but I was also able to verify that the things it was drawing were correct.

DrawGPT Draws Really Well, It Knows What It Is Drawing

It was stunning to see the AI generated images coming out consistently & correctly.

What do I mean by that? For example:

  • Portraits – Things like hair, eyes, nose, ears, mouth are all in the correct places. It draws those things “inside” a circle it will draw for a head and they will be correctly ordered vertically (the eyes are never below the mouth)
  • Landscapes – Mountains, sunsets, birds in the sky, clouds, trees, etc. are all in the correct place. It never tries to put the ground above the sky or have mountains strangely floating in space.
  • Objects – It knows the basic layout of common but complex objects like bicycles, lamps, and many other things. While it cannot draw a fully perfect bicycle, the image it renders features the basic elements in the correct places.
  • Animals – It understands the basic layout of animals, including the number of legs and relevant features like ears or fins, and attempts to place them correctly. A great example is the image used for the DrawGPT AI Art Twitter Bot. You can clearly see it was trying to draw a bird.

Regardless of whether this comes from CLIP data, the reality is that the LLM is drawing things correctly.

It is not just drawing random things in random places on the image. It does have some issues with relative scaling, but they are hardly ever so bad that the image itself is unrecognizable.

It is also drawing things in the correct order. It will draw the ground before the sky, the sky before the clouds, the clouds before the sun, the sun before the mountains, the mountains before the trees, the trees before the birds, etc.
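
That ordering matters because canvas drawing works like the painter's algorithm: each new shape is painted over whatever is already there, so things further back have to be drawn first. Here is a minimal Python sketch of the idea on a character grid standing in for canvas fillRect calls; the Rect type, render function, and scene are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    """A filled rectangle on a character grid; `char` stands in for a fill color."""
    x: int
    y: int
    w: int
    h: int
    char: str


def render(width: int, height: int, layers: list[Rect]) -> list[str]:
    """Paint layers in list order (back to front); later layers cover earlier ones."""
    grid = [["." for _ in range(width)] for _ in range(height)]
    for rect in layers:
        # Clip each rectangle to the grid, then overwrite its cells.
        for row in range(max(rect.y, 0), min(rect.y + rect.h, height)):
            for col in range(max(rect.x, 0), min(rect.x + rect.w, width)):
                grid[row][col] = rect.char
    return ["".join(row) for row in grid]


if __name__ == "__main__":
    ground = Rect(0, 3, 8, 2, "g")
    sky = Rect(0, 0, 8, 3, "s")
    sun = Rect(5, 0, 2, 2, "o")
    mountain = Rect(2, 1, 4, 2, "m")
    for line in render(8, 5, [ground, sky, sun, mountain]):
        print(line)
```

In that order the mountain partly covers the sun; reversing the list would let the sky paint over both, which is the kind of mistake the LLM avoids.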

In addition to drawing concrete objects it is also able to draw things like abstract shapes and patterns. It is not perfect but it is able to draw things like circles, squares, triangles, and other basic shapes. It is also able to draw things like stripes, polka dots, and other patterns.

It will use loops, if statements, and other basic programming constructs to draw things like a grid of squares, a pattern of circles, birds in the sky, and fruit on trees.

Sometimes the LLM chooses to express itself with text as well. It is able to use the text commands to label things or make statements within the image itself.

One truly surprising thing was what happened when I sent in no subject to draw at all. The AI will just draw something totally random: portraits, fine art, landscapes, and of course its all-time favorite, the Mona Lisa.

It loves to draw the Mona Lisa.

DrawGPT Is Not Perfect

If you use the app you’ll see that yes, the images are very simplistic. It is sometimes difficult to tell what you are looking at, because each image is just a series of boxes and circles.

Portraits will occasionally be unrecognizable because it picks similar colors for several features and makes the image a mess. I believe that issue could likely be solved very easily with a better model or more specific training data designed to allow better visual responses.

The LLM is not perfect but it is drawing things correctly. If you reference the comments in the code it becomes clear that the concepts and tokens in the image are correct even if it is limited by the simplicity of the medium it has to use.

This is mostly a tradeoff of drawing images with simple, text-only draw commands, and is rarely an issue with the actual output tokens of the AI.

DrawGPT – Adding Some Character + An Impish Twitter Bot

For fun I have the prompt adjust the comments in the code to add a little flavor to the output, often including a humorous take on the prompt or subject matter.

This was important because it gives the images, the output, and the entire AI the feeling of being a character that you are interacting with. This is similar to the way people feel they are speaking conversationally with ChatGPT, and it is incredibly important for interacting with AI.

Seeing as how DrawGPT was able to draw things correctly & provide a little flavor, character, and humor I decided to create a Twitter bot that would allow users to reply to a tweet and have DrawGPT reply with an image. This also allowed me to experiment with incredibly complex input prompts that I would have otherwise not thought of on my own.

If you’d like to use the DrawGPT Twitter bot you can reply to any tweet with “@DrawGPT draw” and it will respond with an image of the tweet you are replying to and include a link to the image on the website so you can see the code & comments as well as share the link.

DrawGPT – A New Way To Create AI Art

DrawGPT will likely never be a commercial hit. The art is too simplistic to appeal to most people and the output tokens are too limited to be useful for most image generation tasks.

At the same time the simplicity of the images, combined with the LLM drawing important features of the subject, often creates a sort of “caricature” of the subject. For example if you have it draw Trump it will almost always try to draw some sort of hair.

It’s a really fun thing, and the creativity of the AI LLM and how it draws is pretty mind-blowing. It’s also a great way to get a glimpse into how the AI is thinking.

DrawGPT – The Code & The Images & The Prompt & License

DrawGPT currently uses the stock OpenAI GPT-3 DaVinci model. There are no additional fine tuning or additional training sets added.

At this time I will not be releasing the prompt I am using.

I do list on the website the prompt tokens & the output tokens as returned so users and researchers can get a feeling for what the prompt may be like.

All of the code and images on the website generated by DrawGPT are currently under the CC0 license. This may change some day, but the intent is to provide an open source, fun project that publicly showcases the concepts for users and AI researchers.

What Is Next For AI Art and DrawGPT?

The front facing portion of every AI that interacts with humans is a language model.

As humans we express ourselves through language. Regardless of whether the AI is an LLM or something like Stable Diffusion, Disco, DALL-E, VQGAN, POINT-E, or any other AI, we as humans still have to instruct it with language.

At this time I do not have any huge plans for DrawGPT. I may attempt to introduce other LLMs as a sort of litmus test for how visually enabled they are and I will certainly be giving it a spin with GPT-4 when it comes out.

I chose to output images at 512×512 pixels, the size most img2img inputs for other models expect, so that the outputs can be used as inputs to more complex AI art models and are fully compatible with things like Stable Diffusion.

I am extremely pleased with the way DrawGPT turned out.

I think that I have conceptually proved a few things, and hopefully other AI researchers in the future can build on some of the fundamentals, tips, and tricks I explored:

  • Visually enable LLMs by including CLIP data in the language training set.
  • LLM must also have sufficient training on the output medium.
  • Use the visual output to correctly identify if the AI and large language model “understands” complex visual concepts.
  • Include code comments or metadata of tokens in the output linked to specific parts of the image to identify if the drawing is “correct”.
  • Give the AI character and flavor to make it fun to interact with.
  • Enable the use of crowdsourced or social inputs to explore complex inputs you would not normally think of yourself.

Did You Write This With AI?

No. The horrendous spelling mistakes and terrible grammar are my own. I’m a programmer, not an English teacher.

Did You Really Not Click the Link Yet?

If you have somehow made it this far into the article without clicking, now is the time.

Click here to try out DrawGPT and draw your own images with AI and generate art with an AI that only knows written words and has never seen a pixel in its life.