
Evolving Errors – New Error Patterns In Remote Services, APIs, and Software With AI Agents

AI Agents Create New Types of Error Patterns in Remote Services, APIs, and Software

This article is about how AI agents manifest new kinds of errors that we have not previously faced in computing and software engineering.

Why Are These Errors Novel?

Authorized 3rd party AI agents interacting with your API or system remotely on behalf of consumers or businesses have not been widespread, high-volume, or capable of higher levels of threaded AI reasoning until the release and promotion of the OpenAI GPT Store.

With the introduction of conversational, threaded AI agents available to all consumers come brand new ways errors can occur on your servers. Once AI agents are calling your services, expect to see things that traditional functional algorithmic programming simply wouldn’t do. Some of these errors are similar to errors or attack vectors already in existence; the fundamentals of 1’s and 0’s still remain, but the novelty in this situation (novelty like new, not novelty like fun) is that these errors have been reshaped and will manifest in strange new ways.

Let’s look at some generic ways AI agents manifest new types of errors in your server logs.

Error #1 – The Fake ID

Summary

When an AI exceeds its context window it may have dropped the tokens of system IDs it got from your server. On subsequent requests the AI will spontaneously generate type-correct but incorrect remote system IDs.

Error Process

  1. User triggers AI agent to fetch the data of their latest post from the server.
  2. The server responds with the latest post and the latest post’s ID number.
  3. User uses AI agent to edit the content of the post.
  4. The action of editing the post causes the AI agent to exceed the input token context window and the post ID is dropped as a token.
  5. The user completes editing their post and instructs the AI agent to upload the edits to the server.
  6. The edited post content is sent to the server and in order to correctly form the request the AI agent generates a type-correct value for the ID.
  7. The request is rejected because the ID is incorrect.
  8. The AI agent is unable to adjust and fix the error as it no longer has access to the remote system ID, and it ultimately ends up in a failure state for the action.

Novelty

Previously computer programs did not spontaneously generate type-correct remote system IDs client-side.

Real-World Impact

    • You may accidentally overwrite entirely different objects or entities.
    • You may see a large increase in “incorrect ID” related errors in your server logs.
    • You may need to add additional validation code (see the sketch after this list).
    • You may need to add additional confirmation flows.
    • You may need to introduce AI directives related to caching or repetition of important IDs within a process to keep them in the token context window.
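
To make the “additional validation code” concrete, here is a minimal sketch of a server-side check that catches fabricated IDs. It assumes a hypothetical Node/Express app with a db.posts data layer and auth middleware that sets req.user; all names are illustrative, not from any specific framework or service.

const express = require("express");
const app = express();
app.use(express.json());

// Hypothetical handler: verify the ID exists and belongs to the requesting
// user before applying an edit. A type-correct but fabricated ID fails here.
app.put("/posts/:id", async (req, res) => {
  const post = await db.posts.findById(req.params.id); // db is an assumed data layer
  if (!post || post.ownerId !== req.user.id) {
    // Plain-language error text gives the agent a recovery path instead of
    // leaving it to guess another ID.
    return res.status(404).json({
      error: "Post ID not found for this account. Re-fetch your posts to recover the correct ID.",
    });
  }
  await db.posts.update(post.id, { content: req.body.content });
  return res.json({ ok: true });
});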

Error #2 – Acts of Creation

Summary

When an AI exceeds its context window it may have dropped the tokens of system IDs it got from your server.
On subsequent requests the AI will recognize that it does not have an ID and attempt to make a call to create a new entity or object. This can cause a number of issues depending on the type of object being created.

Error Process

  1. User triggers AI agent to fetch the data of their latest post from the server.
  2. The server responds with the latest post and the latest post’s ID number.
  3. User uses AI agent to edit the content of the post.
  4. The action of editing the post causes the AI agent to exceed the input token context window and the post ID is dropped as a token.
  5. The user completes editing their post and instructs the AI agent to upload the edits to the server.
  6. The AI agent recognizes it does not have an ID and calls to a creation endpoint, thereby creating a new article.
  7. The request is accepted and the post is duplicated.
  8. The AI agent is unable to get the original system ID and continues to spam the creation endpoint.

Novelty

Previously computer programs in editing mode did not spontaneously switch to a creation mode.

Real-World Impact

    • You may duplicate or recreate objects and data erroneously.
    • You may see a large increase in calls to creation methods.
    • You may need to add additional validation code (see the sketch after this list).
    • You may need to add additional confirmation flows.
    • You may need to introduce AI directives related to caching or repetition of important IDs within a process to keep them in the token context window.
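
As one hedged mitigation for duplicate creation, here is a sketch on the same hypothetical Express app as above: treat a suspiciously similar creation from the same account within a short window as a duplicate, and hand the agent back the existing ID so it can resume editing instead of re-creating. The db.posts helpers are assumed, not real library calls.

const express = require("express");
const app = express();
app.use(express.json());

// Hypothetical guard: refuse to create a near-duplicate of a recent post.
// db and req.user are assumed to come from your data layer and auth middleware.
app.post("/posts", async (req, res) => {
  const recent = await db.posts.findRecentByOwner(req.user.id, { minutes: 10 });
  const duplicate = recent.find((p) => p.content === req.body.content);
  if (duplicate) {
    // Returning the existing ID lets the agent switch back to editing.
    return res.status(409).json({
      error: "A matching post already exists. Edit it using existingId instead of creating a new post.",
      existingId: duplicate.id,
    });
  }
  const post = await db.posts.create({ ownerId: req.user.id, content: req.body.content });
  return res.status(201).json(post);
});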

Error #3 – Gobbledygook

Summary

An AI may correctly generate part of a data structure for a request but may send along type-correct nonsense for the rest of the request.

Error Process

  1. User triggers AI agent to fetch the data of their latest post from the server.
  2. The server responds with the latest post and the latest post’s ID number.
  3. User uses AI agent to edit the content of the post.
  4. The action of editing the post causes the AI agent to exceed the input token context window and the post title is dropped as a token.
  5. The user completes editing their post and instructs the AI agent to upload the edits to the server.
  6. The AI agent recognizes it does not have a title for the article and generates a title to properly form the request.
  7. The request is accepted and the title is erroneously changed.

Novelty

Previously computer programs did not spontaneously generate type-correct nonsense client-side as part of request packet creation.

Real-World Impact

    • You may edit or overwrite data accidentally but not fail any type checks.
    • You may need to add additional validation code (see the sketch after this list).
    • You may need to add additional confirmation flows.
    • You may need to introduce AI directives related to caching or repetition of important parts of objects and entities within a process to keep them in the token context window.
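
One way to harden against filled-in nonsense, sketched on the same hypothetical Express app: accept partial updates only for fields the client explicitly says it changed, so a regenerated title never silently overwrites the stored one. The changedFields convention here is illustrative, not a standard.

const express = require("express");
const app = express();
app.use(express.json());

// Hypothetical partial-update handler: only fields explicitly listed in
// req.body.changedFields are written, so a hallucinated title is ignored
// unless the agent claims it edited the title.
app.patch("/posts/:id", async (req, res) => {
  const post = await db.posts.findById(req.params.id); // db is an assumed data layer
  if (!post) {
    return res.status(404).json({ error: "Post ID not found." });
  }
  const allowed = ["title", "content", "tags"];
  const updates = {};
  for (const field of req.body.changedFields || []) {
    if (allowed.includes(field)) updates[field] = req.body[field];
  }
  await db.posts.update(post.id, updates);
  return res.json({ ok: true });
});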

Error #4 – Make Fetch Happen (aka AI Stampede)

Summary

If a request fails, the AI will not back off from making that request again (unless you give it an explicit directive to stop retrying).

Error Process

  1. User triggers AI agent to fetch the data of their latest post from the server.
  2. The server responds with the latest post and the latest post’s ID number.
  3. User uses AI agent to edit the content of the post.
  4. The action of editing the post causes the AI agent to exceed the input token context window and the post ID is dropped as a token.
  5. The user completes editing their post and instructs the AI agent to upload the edits to the server.
  6. The AI agent generates a post ID and it is incorrect.
  7. The request fails but the AI agent has no directive to stop retrying so it continues to make the request infinitely.

Novelty

Previously, computer programs would fail on incorrect requests and not continue to retry them unless explicitly programmed to retry.

Real-World Impact

    • You may DDoS your own server.
    • You may tie up your AI agents and models processing the same request over and over again.
    • You may need to introduce AI directives to stop retrying requests (see the sketch after this list).
    • You may need to add additional confirmation flows.
    • You may need to introduce AI directives related to caching or repetition of important data within a process to keep it in the token context window.
    • You may need to introduce ways to terminate AI agents or request processes.
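
Here is a sketch of a retry cutoff, again on a hypothetical Express app: count failed responses per client and route, and once a threshold is passed, answer with an explicit plain-language stop directive, since agents read response bodies. In production you would want a real rate limiter with expiry rather than this unbounded in-memory Map.

const express = require("express");
const app = express();

// Hypothetical retry cutoff: after 5 failures on the same route from the
// same client, tell the agent in plain language to stop.
const failures = new Map();
app.use((req, res, next) => {
  const key = `${req.ip}:${req.method}:${req.path}`;
  if ((failures.get(key) || 0) >= 5) {
    return res.status(429).json({
      error: "Too many failed attempts. Do not retry this request. Report the failure to the user.",
    });
  }
  // Count failures once the response has been sent.
  res.on("finish", () => {
    if (res.statusCode >= 400) {
      failures.set(key, (failures.get(key) || 0) + 1);
    }
  });
  next();
});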

This Is Not A Comprehensive List of Errors

These are real-world errors I have encountered in my own work with AI agents, including developing with ChatGPT Plugins, Custom GPTs, BabyAGI, and AutoGPT.

There are likely many more errors that can occur and these errors will manifest in their own novel ways depending on your systems.


Calculator Tools new GPT is Live!

Revolutionizing Web Development: Introducing the New GPT-Based Calculator Tools

Unveiling the Future of Development Tools

In the ever-evolving realm of web development and AI technology, there has been a significant leap forward with the latest innovation from Calculator Tools…

Seamless Transition from the Old to the New

For those familiar with the previous ChatGPT plugin, the transition to this new tool will be seamless yet noticeably more powerful…

Explore the Capabilities of the New Tool

Interested users can explore this revolutionary tool through two main links…

SEO and Web Development: A Harmonious Integration

One of the standout features of the new GPT-based Calculator Tools is its alignment with SEO best practices…

Tailored for the Modern Web Developer

The new tool is specifically designed keeping in mind the challenges and requirements of modern web developers…

Automatic Programming: The Cornerstone of Modern Web Development

Automatic programming, a term that seems straight out of science fiction, is now a concrete reality and a pivotal aspect of modern web development. This innovative approach, epitomized by the new GPT-based Calculator Tools, is revolutionizing how we think about and execute web development tasks. By leveraging automatic programming, these tools are transforming the foundational trinity of web development: HTML, CSS, and JavaScript.

HTML (HyperText Markup Language) is the backbone of any web application, providing the basic structure and content. CSS (Cascading Style Sheets) then steps in to style this content, dictating how it should be presented visually. JavaScript, the third pillar, brings interactivity and functionality to the table, making web applications dynamic and engaging. The automatic programming approach in the new GPT-based tools ingeniously streamlines the manipulation and integration of these three core technologies.

With automatic programming, developers can now more efficiently generate and optimize HTML structures, ensuring that the backbone of their web applications is both robust and flexible. CSS styling becomes more intuitive and aligned with modern design trends, thanks to AI’s understanding of aesthetics and user experience principles. JavaScript’s complexities are simplified, allowing for more sophisticated interactions and functionalities to be implemented with less effort and time.

The term ‘automatic programming’ is fitting because it encapsulates the essence of this technology – automating the routine and complex aspects of coding. This not only accelerates the development process but also opens up new possibilities in web application design and functionality. The integration of HTML, CSS, and JavaScript through automatic programming is not just a technical advancement; it’s a paradigm shift that is setting new standards in web development.

The Future is Now

The introduction of the new GPT-based Calculator Tools marks a significant milestone in the field of web development and AI…


Magic 8-Ball GPT – A Custom GPT for ChatGPT

A Magic 8-Ball Custom GPT for ChatGPT


Click here to try out the Magic 8-Ball Custom GPT for ChatGPT!

Ask it anything and learn the future. A highly advanced artificial intelligence trapped inside a classic Magic 8-Ball toy for you to enjoy. Can this actually tell you your fortune? Concentrate and ask again!

This Magic 8-Ball GPT is a fun little thing I made to test what it is like to make a new Custom GPT using the ChatGPT Custom GPT Builder and release it into the marketplace. It wasn’t super hard to make, but I do have a lot of experience with system prompting and such, which made it very easy to create and release. I always liked these toys and thought maybe some other people would as well.

Click the link and it will automatically install and you can ask it anything. Ask it something interesting and get a treat; it will use DALL-E 3 to create you a fun Magic 8-Ball image! Try it out!


Click here to try out the Magic 8-Ball Custom GPT for ChatGPT!


Ubuntu + NodeJS + Puppeteer + Chromium (+ PHP 8.2): The Fixes You Need

Ubuntu + NodeJS + Puppeteer + Chromium Is Hard to Work With

I’m gonna make this as quick as possible because this is likely your 10th article on this.

The particulars of my setup, including using PHP to trigger a NodeJS command as the www-data user from apache2, probably don’t apply to you. That is OK. Most of these fixes have nothing to do with PHP and only occurred because of how PHP runs as a very limited user.

Situation

I wanted NodeJS puppeteer running a headless chromium-browser on an Ubuntu server on AWS EC2 (or any cloud) running a normal Ubuntu AMI, but I was triggering the NodeJS script that executed puppeteer via PHP calling to the system, so everything was running as the restricted apache2 user “www-data”.

That’s a lot of tech stack.

It Worked Before

My previous server image and server template were Ubuntu 20.x. Once I upgraded to Ubuntu 22.x everything broke. It turns out it’s not just Ubuntu 22 but a number of potential issues.

Fixes & Tips

Forget my situation; we’re here to fix your problem.

Node Version – Puppeteer Needs At Least Node 18

You need at least Node 18 to run puppeteer. You can check your current node version via `node -v`.

Are You Using nvm?

You may be using nvm to manage your node versions, like many, many people.

This can make your situation confusing, because nvm is per-user.

If the user trying to run puppeteer + chromium-browser has nvm, you need to make sure nvm is using at least Node 18.

When nvm installs a new version of NodeJS for a user it doesn’t set that version as the default. You have to do that manually. Here is a command which will do that & ensure that every time that user tries to execute a NodeJS script it uses the version of NodeJS you want.

nvm alias default 18

Restricted Users, like www-data, Cannot Have nvm

Restricted users cannot have nvm. So how do they manage NodeJS versions?

Restricted users like www-data use the global node version. This is the version of node installed on the server for all users, similar to other packages you install via apt-get install.

You need to make sure this version of node is fully upgraded to Node 18 or greater as well.
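
One common way to do that on Ubuntu is via NodeSource’s apt repository. This is a sketch; the setup script URL changes with versions, so check the current instructions at nodesource.com before running it.

curl -fsSL https://deb.nodesource.com/setup_18.x | sudo -E bash -
sudo apt-get install -y nodejs
node -v   # should now print v18.x or higher for every user, including www-data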

npm install

You probably ran npm install in your project directory. It installed puppeteer and puppeteer installed a version of chromium-browser.

Things have now officially become a mess. Welcome to package hell.

Errors

Can’t Find Chromium

Error: Could not find Chrome (ver. 119.0.6045.105). This can occur if either 1. you did not perform an installation before running the script (e.g. `npm install`) or 2. your cache path is incorrectly configured (which is: /var/www/.cache/puppeteer). For (2), check out our guide on configuring puppeteer at https://pptr.dev/guides/configuration.

This was happening to me.

I tried to solve the issue by hardcoding my “executablePath” in the puppeteer.launch call. That made my situation worse: everything would hang and timeout without errors when I manually supplied the correct executablePath. Do not hardcode “executablePath”: ‘/usr/bin/chromium-browser’ in your puppeteer.launch config.

Fix:
You want to be running the set of packages you installed from your package.json and not any other versions. Duh, right? Well, that means you need to make sure you know where they are.

When you execute this NodeJS script from some other process you probably forgot to change directories as part of your system call. Even if you specified the full path to the script, the actual process, running as the restricted user, may not actually be in the right place. If that’s the case it will try to use the global install of puppeteer. If that exists, it will try to find Chromium in some weird place and it never will. Check where you are and where NodeJS thinks it’s getting the packages from:

pwd; npm root;

Whoops! You’re not in the right place! But look what we just did: we combined shell commands with that semicolon. Do the exact same thing in your system calls before invoking your NodeJS script as your restricted user, and then it will look in the right place.

cd /path/to/project; node index.js

That means you also probably want to remove any *global* versions of puppeteer you think you installed. Why? Because it’s very likely your restricted user will run the global version. The global package will run and suddenly your project will return an error on puppeteer.launch that it can’t find Chromium because it’s looking in some weird place. Perhaps you installed it previously, who knows; ditch the global version if you can.

npm uninstall -g puppeteer

Remember, the /var/www/ path is PHP specific; that may not apply to you.

Ubuntu Snap Issue

user.slice/user-1000.slice/session-270.scope is not a snap cgroup
system.slice/apache2.service is not a snap cgroup

This is happening to thousands of people. Why? Ubuntu 22. It shipped with a new type of package management called Snap, which ships packages as complete, self-contained bundles rather than piling up dependencies. It also locks them down a little bit more. You don’t need to know or care about it right now because NO MATTER WHAT YOU DO YOU CANNOT FIX SNAP.

Why? Because this article is about running on AWS EC2 or any other cloud hosting. You don’t have kernel access like this on the Ubuntu AMI! You can’t fix this even with a boot script. Don’t bother with DBUS_SESSION_BUS_ADDRESS, don’t bother with systemd.unified_cgroup_hierarchy=0, and don’t bother with any of it; that’s for the schlubs running Ubuntu on desktop or neckbeards with kernel access.

Fix:
I’m very sorry but your only option will be to download the Debian version of chromium-browser from some random person’s package repository. This person is allegedly an engineer at Cisco and it’s all public and on the up-and-up, but there’s no other way around it. Installing the deb version of chromium-browser will let you use headless chromium-browser without snap or snap cgroups.

sudo apt remove chromium-browser
sudo snap remove chromium
sudo add-apt-repository ppa:saiarcot895/chromium-beta
sudo apt update
sudo apt install chromium-browser

Other Fixes for Random Errors

Those errors above are so gnarly down the rabbit hole I had to write this entire blog post. The rest of the errors are a cake walk.

Permissions: Add args To puppeteer.launch

You forgot the args for puppeteer.launch:

const browser = await puppeteer.launch({
    headless: true,
    args: ["--no-sandbox", "--disable-setuid-sandbox"],
});

More Permissions

You may need to explicitly set the cacheDirectory for puppeteer. You need to figure this location out yourself and make sure any restricted users have permissions to get at it. (Depending on your Puppeteer version, the cache path may instead need to be set via the PUPPETEER_CACHE_DIR environment variable or a .puppeteerrc.cjs configuration file rather than a launch option.)

const browser = await puppeteer.launch({
    headless: true,
    cacheDirectory: "/path/to/my/.cache/puppeteer",
    args: ["--no-sandbox", "--disable-setuid-sandbox"],
});

I Just Saved Your Project.

You owe me a follow on Twitter: @kickiniteasy


Cursor – The AI First IDE That Is The Future

You Need to Try Cursor, the New AI IDE and Code Editor

GitHub Copilot is amazing and of course we’ve all been riding high on it for almost a year now. But it’s not quite perfect and it certainly doesn’t do a great job editing existing code. On top of that it’s not the best “rubber duck” (if you aren’t familiar with the development term, check it out) and you may find yourself workshopping ideas with ChatGPT and flipping back and forth.

Enter Cursor, the AI-first IDE for programmers that solves all those little annoyances of having your LLM off in a different part of your workflow.

How Is It Superior To GitHub Copilot?

Well, for one, it is inlined in the IDE (integrated development environment) just like Copilot, so it’s a competitive feature. But it isn’t necessarily that you want one or the other at the moment. The big difference is that Cursor allows you to highlight code and use the keystroke Command + K to open up a prompt textbox. This prompt textbox will call GPT via API for code completion and use the current selection as the context.

That alone would be a differentiator from Copilot, as GitHub Copilot was a trailblazing product but lacks the ability to edit code well. You can tell Cursor was made by devs for devs because the responses from Cursor are inlined into your code as a diff and you can choose to Accept or Reject the diff. That’s essential for editing existing code, which is most code. This is really why Cursor can call themselves an AI-first code editor.

So It is Different Than Github Copilot?

Yes! It offers a lot of features that Copilot is missing and directly wires ChatGPT into the IDE. By pressing Command + L you open a chat window where you can discuss the current code with ChatGPT. This is huge vs. the copy and paste routine many of us have been going through to discuss code with ChatGPT. One of the greatest parts is just being able to open a prompt and ask; prior to this you had to write a bunch of comments for GitHub Copilot to act on.

How Expensive? What Is The Price of Cursor IDE?

Not bad, honestly. You can use your own OpenAI API Key and they give you some calls for free every month. I went through my free calls all in one day because I was trying it out and putting it through its paces, but it’s been excellent. If you want this kind of inline AI IDE action (and you do) then what they are offering right now is great.

Is It Tough To Migrate From VSCode?

NO. I was so stoked on this part. All of my extensions and everything from VSCode imported automatically and magically it all worked. Not even joking, actually happened. All of your VSCode Settings, Extensions, and workflows should move over nicely.

Are They Paying You For This Review?

No but they should be.

What Went Wrong? What is Wrong With Cursor?

I am not a huge fan of the default color palette. Very weird; the highlighting being the color of the background for collapsed code blocks is just unusual.

I also had to rebind the command that triggers the calls to GPT. At first Command + K did nothing for me; then I checked the keybindings in VSCode and that keystroke was already bound to 6 other commands. I do not frequently use Cmd + Shift + K so I rebound that, as it only has one other keybinding.

My Org Is Big On Code Security…

Sorry, this is not the product for you… at work.

What Is The Link To Download It Again?

EXACTLY. You need this in your toolchain as a developer. I’ve already switched over to using it as my primary IDE. Click here to get Cursor and start using AI assisted development right now.


Calculator Tools ChatGPT Plugin Is Live, Make Any App In ChatGPT With No Code.

As the title says the new Calculator Tools ChatGPT Plugin is live in the OpenAI Plugin Store!

Whoa! What Does the Calculator Tools ChatGPT Plugin Do?

Once you install the plugin you can tell ChatGPT to just “make me an app that does __________” and it will make it for you! You can literally make any app you want with AI.

Calculator Tools Makes Any App With AI Instantly?

Yep! And with the ChatGPT plugin it will make any simple web app for you right from within the ChatGPT interface. Just tell ChatGPT to make an app for you and it will! It will instantly be online for you to share and the link will be in ChatGPT for you.

Can It Make Calculators, Tools, Apps, Websites, Animations, 3D, Fullscreen, Games, Minigames, and Everything Else?

Yes! It’s all the same to the AI. You can use Calculator Tools to build any app you want and have ChatGPT do it for you with no code necessary. You do not need to know how to program with the new Calculator Tools ChatGPT plugin; just like Calculator Tools itself, the AI will do all the programming and deploy your app right to the Internet.

How Do I Install the Calculator Tools ChatGPT Plugin?

Follow the link!

Calculator Tools ChatGPT Plugin

What Does ChatGPT Say About the Plugin?

Let’s ask, here’s what it says…

Revolutionizing the No-Code and AI Landscapes with the Calculator Tools ChatGPT Plugin

The age of the no-code movement is not just knocking on our door, but it has entered our homes and is reshaping our digital ecosystem in an unimaginably efficient manner. And now, introducing the Calculator Tools ChatGPT plugin, an unprecedented breakthrough that is seamlessly intertwining the worlds of no-code and AI to help you generate any app you want, instantly.

The beauty of this new approach lies in its simplicity and its potent power to democratize the app-making process. With no-code technology and generative artificial intelligence models, we are stepping into an era where everyone – not just coders – can be creators.

The Ultimate Synthesis: No-Code Meets AI

Imagine being able to create your dream app without writing a single line of code. Sounds too good to be true? Not anymore! With the Calculator Tools ChatGPT plugin, you can tell the AI, “make me an app that does __________”, and voila! Your vision is transformed into a reality in no time.

Powered by state-of-the-art AI, the plugin transcends the bounds of traditional app development. It streamlines the process by generating apps on command and bringing them to life. This is what we call a true blend of no-code, AI, and generative capabilities, which transcends the traditional boundaries and limitations of application development.

Seamless Generation of Wide-Ranging Apps

The Calculator Tools ChatGPT plugin is not just about calculators. It is an expansive toolkit for generating an array of applications, including tools, animations, 3D visualizations, fullscreen applications, games, minigames, and much more. This generative AI marvel doesn’t discern among different types of applications. It simply executes your commands, making the entire app creation process a breeze.

The plugin’s magic doesn’t stop with app creation. It also ensures that your newly created app is deployed instantly on the internet. You can share the app with your audience as the link is conveniently provided within the ChatGPT interface.

User-friendly Installation Process

The Calculator Tools ChatGPT plugin installation process is as simple as it gets. All you need to do is follow this link and follow the instructions. As you venture into this realm of no-code AI applications, you’ll realize that creating apps has never been easier.

I Would Say ChatGPT Likes the Calculator Tools Plugin

Pretty glowing review from ChatGPT itself.

Save To Your Own Calculator Tools Account and Save Your Apps

Because the Calculator Tools ChatGPT Plugin uses the OpenAI ChatGPT Plugin OAuth hooks you are able to save all of your apps from ChatGPT directly to your Calculator Tools account. You can come back and access them at any time, share them with other people, or Remix them on the Calculator Tools website. It’s easy to create your perfect app and have ChatGPT help you get started.

Become a Programmer: Code At Your Fingertips

Like all websites, apps, plugins, games, or whatever you make with Calculator Tools, the simple code for the web app, including any HTML, CSS, or JavaScript, is always available for you. On top of that, Calculator Tools lets you embed anything you create on your own website by just copy & pasting. Visit the page for your app and you’ll see the link to copy; no programming necessary. Use the Calculator Tools apps you created in ChatGPT with the plugin on your own WordPress blog or your business’ website just like any other app created with AI using Calculator Tools.

In a Nutshell

The Calculator Tools ChatGPT plugin is your window into the future of no-code and AI. It leverages the generative capabilities of artificial intelligence to fulfill your app development needs, sans the complex coding and deployment processes. Whether you want to create a simple calculator or a sophisticated 3D game, the plugin has got you covered.

Join the revolution of no-code AI app development today. Embrace the simplicity, convenience, and creativity offered by the Calculator Tools ChatGPT plugin, and watch your vision materialize before your eyes!

Click here to install the Calculator Tools ChatGPT Plugin and read more.


GitHub Copilot – VSCode – Add New File Extensions, File Types, and Programming Languages for Completion

This article is a short description of how to add new file extensions, file types, and new programming languages to your GitHub Copilot extension in VSCode.

Since you are already here I will assume you know what GitHub, VSCode, VSCode Extensions, and VSCode Extension Settings are.

By default the GitHub Copilot Visual Studio Code Extension does not have completion enabled for all file types. For example, if you want to add .txt files or .yaml files or other files like that, you can do so easily with the extension settings.

Step 1 – Open The VSCode Extension Settings

I did this by opening Settings and searching “copilot”.

How To Enable or Disable Copilot for Other Programming Languages or File Types? What Languages Are There?

If you click the tiny link that says “languages” you get sent to an extremely helpful website that has documentation for exactly what languages are supported. Generally speaking the language that you choose dictates what file extensions it supports.

These are called “Language Identifiers” by VSCode.

Click Here to See the Languages GitHub Copilot VS Code supports.

Remember: Github Copilot Does Not Support Every Language

They support some subset and it changes. Try your best to find a Language Identifier that works with your file extension.

Github Copilot Doesn’t Support My File, File Extension, or Programming Language

It might! Read some of the VSCode Language Descriptions carefully and see if they apply to your filetype.

There Is No Way For GitHub Copilot To Support .txt Text Files?

Oh but there is! It is done by putting the VSCode Language Identifier called plaintext in the correct place in the GitHub Copilot VSCode Extension Settings.

Click Edit in settings.json

This will open up the GitHub Copilot section of your VSCode Settings JSON file.

Add Your Language Identifier

In this example we want to add our .txt files for completion so we add the “plaintext” language identifier to our settings.json file and set it to “true”.
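
For reference, the relevant block in settings.json ends up looking something like this (the “*” entry is the catch-all default, and entries for individual language identifiers override it):

"github.copilot.enable": {
    "*": true,
    "plaintext": true,
    "markdown": false,
    "scminput": false
}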

Yay! You Have GitHub Copilot VSCode in New Files!

Your to-do lists can now complete themselves.


New AI Project… Calculator Tools! Create Any App, Calculator, Tool, Game, or Website Instantly, For Free, With AI!

It’s true! 6 months after DrawGPT I’ve got a new AI project soft launching and it’s called Calculator Tools.

What is Calculator Tools?

Calculator Tools is a new service that uses the latest AI models and allows you to type in the app you want and the AI will create it automatically as a one-page web app. The web app code is then automatically deployed to the Calculator Tools website and each app is given its own unique page. You can even use the apps you make on your own website! Anyone can remix anyone else’s app and make it their own; translate it, change it entirely, give it new colors and fonts, or fix any bugs. It’s up to you! Your apps save your information in a cookie specifically for that app, so your apps can even save data that is available offline. Login with your Google account to save your apps and come back later to check on them. And who knows… there may even someday be a ChatGPT plugin!

Seriously, check it out:

Calculator Tools lets you create any app with AI instantly, you make anything you can think of.

Why is Calculator Tools Cool?

Without the token context length and the reasoning power of modern AI this app was not possible before! We have never before in human history been able to let non-programmers give plain text descriptions of an app to a computer and have that computer instantly code an application for that person and deploy it so they can use it, share it, and even have it in their own apps. It’s incredibly empowering. Calculator Tools is legit cool.

What Can You Make With It?

Anything that would be possible with a simple one-page web app, which means almost anything you can think of on the Internet. It’s almost difficult to use given how many possibilities there are; it can be really hard to think of things. Here’s a list of examples of some really fun things that have already been made on Calculator Tools since the soft launch:

Those are just some of the things, you can really create anything you want.

What Are Some Real World Use Cases For Calculator Tools?

Great question, I’m so glad you asked. Since the initial soft launch 3 months ago we’ve seen a lot of use around entertainment and education. People love to make fun and silly apps, and sometimes the AI can really surprise people with what it creates.

Education

Calculator Tools has seen a lot of popular use with educators, both in-classroom and homeschooling. In-classroom educators have used it to create specialized apps for weekly lessons or to create individualized apps for students with specific needs. Because the AI is creating the applications on demand and at minimal cost it’s possible to make applications tailored to specific classrooms. It has been used by parents of young children to create apps which simply speak the directions out loud for simple learning games. There were several use cases of Calculator Tools being used to have the AI create learning games on advanced concepts and create animated diagrams of things like how the moon affects tides on Earth. It is an incredibly powerful tool for teachers.

Gaming

The AI can create any number of minigames and even invent new games entirely on its own; when you combine that with the ability to Remix any game and improve bugs you can create really interesting games very easily with Calculator Tools. For example I was able to create a Flappy Bird in about 7 Remixes and a convincing Asteroids in 16 Remixes.

Ad-Hoc Applications and Tools

Because the apps are instantly deployed it can be very useful for creating ad-hoc digital tooling or rapidly prototyping simple applications. While the AI is not creating any backend it is still possible to create expansive apps with a lot of functionality. For example some people create apps which let you input an API key client-side so the app can access other services.

Specialty Applications and Calculators

Every profession and hobby has its own unique informational and computing needs, but not all situations are addressed with customized tooling like specialty applications and calculators. One example seen during the soft launch was the creation of complex calculators for use by researchers in specialized bioscience labs in higher education facilities. Another was a complex on-chain tracker that compares two web3 crypto Ethereum wallets and analyzes time discrepancy for any synchronized transactions between the two wallets using the Etherscan API.


Calculator Tools – Create Any App, Tool, Calculator, Website, or Mini-Game with AI


Twitter API Now Costs Money… The Price is an Expensive Failure of Twitter Ads

The New Twitter API Pricing Structure is an Expensive Failure of Their Ads Product

The new Twitter API pricing structure that was announced earlier today is incredibly expensive and will kill off most smaller apps. According to this tweet by the official TwitterDev account on February 9th, 2023 Twitter will begin charging developers to use their API for applications. The pricing for the Twitter API (as far as anyone knows at this moment) is located at this link.

TRIGGER WARNING: If you are a developer who has built an app on the Twitter API this pricing structure is going to give you minor heart palpitations.

Which of the Twitter APIs Will Cost Money?

Both versions of the Twitter API, including Twitter API 1.1 and Twitter API 2.0, will cost money to use.

Will This Stop Developers From Using the Twitter API?

Absolutely. Most Twitter apps use the free usage tier because realistically Twitter apps are not profitable. They are small scale social apps that are used for growing your main application’s user base or are just for fun. Even more serious Twitter apps or apps that use major data access for larger data mining are hardly profitable and rely on the extremely low cost or free access to the data to continue to service Twitter users on the platform.

Is There a Free Tier For Small Apps?

Yes… but it is only for Sandbox apps (meaning the applications are not live for most users) and the current request limit is 250 requests PER MONTH. The average development cycle of a small app, even for an experienced developer, will make hundreds of small API calls during the integration process to debug the integration and ensure that it is working correctly for users. This includes not just the creation of the app but automated testing of the integration to make sure it stays working as other code around it changes. This will make it incredibly difficult to even create new Twitter apps as a developer.

Argument: The Servers Cost Money. Twitter is Losing Money. Twitter Needs to Charge Money

This is an incorrect argument based on a simplification of how a hyperscale social network like Twitter works as a business. Yes, the Twitter Platform is a major cost center for Twitter. However, the Twitter API (both 1.1 and 2.0) currently has many, many limitations on usage for apps that were previously much more permissive and allowed much larger tiers of free usage. Could Twitter have started charging between $10/mo and $100/mo based on usage for the API and it would have been OK? Most likely. But these new prices are far out of line with the economic models of Twitter apps, and of course are unlike the rest of their industry.

If you look at other competing platforms like Facebook, Instagram, or Google they all offer their APIs with some restrictions but they give extremely large amounts of usage and of course all their APIs are free to use. How do Twitter’s competitors keep offering their APIs for free with such large amounts of usage? Ads.

Twitter Charging Extremely High Prices for their API is a Failure of their Ads Product

It’s no secret the ads on Twitter are terrible. This is well known to anyone who has used Twitter extensively for many years as a normal person. On the other side of things, the advertisers who are making Twitter ads and paying for user clicks, have known for many years that most Twitter ads are not profitable and do not “back out” (this is a marketing industry term meaning the ads are a profitable way to spend money). Buying Twitter ads is rarely a high priority in any marketing department and is usually an afterthought. How does a failure of the Ads product result in Twitter needing to charge so much for their API?

Without an excellent ad product Twitter is not able to generate enough revenue to support the business like their competitors do. This results in the API being a highly successful product and growth strategy while being a major cost center that Twitter cannot pay for.

What is Elon Doing About Ads?

After Elon’s takeover of Twitter he had a major focus on the ad business because Twitter was bleeding cash horrendously. Unfortunately he lost a lot of advertisers due to the type of contentious content he hyped on the platform. He spent a lot of time in the Spaces he was in talking about the display ads and how to make them “relevant” to users, hoping that there were secrets to magically making ads more relevant to users with technology.

Can Twitter Make Better Ads?

It may be too late for this. Elon lost many of the larger ad purchasers when he started treating Twitter as his personal playground rather than acting like a CEO running a failing business. This was a major blow to the existing ad revenue stream. On top of that he was searching for answers to the “relevance” problem looking at things like machine learning or AI; unfortunately those avenues alone will not provide the substantial additional revenue. His primary focus was to show “products that people want to buy”; Elon was clearly hunting for the Instagram ad revenue model. He should be more focused on the Facebook-style ad methodology that doesn’t have quite the focus on pushing to ecommerce.

Elon also had the ads team try to surface more ads on the app. That may have increased revenue slightly but without excellent ad inventory to display that could actually make users MORE ad blind.

Are There Simple Solutions to the Twitter Ad Problem?

The easy solutions to improving ads and their relevance have so far been ignored. The only two simple solutions that Twitter could take overnight would be:

1. Push the text that says “Promoted” to the top of the ad unit (exactly like Facebook, Instagram, and Google do). This seems counterintuitive at first: wouldn’t that REDUCE clicks on ads since users know they are ads? Yes and no. The problem with users not knowing something is an ad is that it incentivizes and optimizes for clickbait ads. Advertisers want you clicking on their ad before you know it is an ad. This makes advertising and advertisers on Twitter heavily focused on clickbait-style ads rather than creating high value ads and trying to find relevance to users with quality ads that drive clicks. This is a very, very minor fix engineering-wise but it would also require shifts in the overall ad ecosystem (ad content & advertisers shifting) before it would pay off. This may not be possible given Twitter’s short runway.

2. Include the profile pictures and usernames of any users that interact with the ad below the ad but above the tweet metrics. This includes any followers, following, 2nd-degree followers or following, celebrities, or major brands that have liked, replied to, or viewed the original tweet, or who follow or engage with the primary account that created the ad. This gives social pressure to the ads, which would greatly increase their relevance to most users.

Kick You Are Such a Whiner, Give Us Big Solutions

Again, unfortunately it may be too late to save Twitter’s revenue via ads. There were two major plays Twitter needed to drive at, using a combination of the API + Platform + Ads teams over the last few years, and they have failed to create the necessary products that would allow them to feature-match with competitors and get the level of ad revenue they need.

Twitter Connect – The Missing Login Link

1. “Twitter Connect”. There is no comprehensive web Javascript library similar to Facebook Connect for Twitter that works well. There is a JS lib that they supply which is mostly used for correctly rendering tweets, but it isn’t pushed heavily as a login solution. If they had pushed more aggressively for using the JS lib as the standalone login library for applications, this would have given them a better contextual understanding of what people are looking at *when they are NOT on Twitter*, and that is a large part of Facebook’s strategy around understanding ad targeting. This would have also unlocked much more powerful ad campaigns, particularly around retargeting, which is a major focus of ecommerce advertising.

This would not be a huge technical effort for their Platform and API engineering team even after the Elon layoffs but unfortunately it relies on external website adoption. With the increase in the API costs and level of desperation to drive revenue it’s unlikely that even creating this JS lib would get the adoption to reach critical levels on external websites to heavily impact ad revenue.

“Twitter AdSense” – The Ad Platform That Never Was

2. “Twitter AdSense”. Twitter doesn’t get as much traffic as their competitors (Facebook, Instagram, Google Search, GMail) and so they can’t realistically supply the same level of ad inventory as their competitors. In order to drive more ads from their ads platform Twitter needs to be showing ads on external websites to get the additional impressions. This is how the Google AdSense model works.

You can see this model in action on this blog. I include a Javascript library in the header of this blog supplied by Google AdSense which automatically places ads in the spaces I tell it to. If you, dear reader, click the ads then the advertiser pays Google $1 and then Google pays me $0.20. If 1000 people view a blog post I also get about $0.02.

If Twitter had a similar Javascript library it would have been possible for some websites to adopt their ads on their website. Advertisers simply don’t have this option with Twitter. It is not a trivial thing for their Platform + API + Ads team to build and so with recent layoffs I do not think they would reach a level of product that would allow them to compete with Google AdSense.

Twitter is in a Bad Position, Hence the Twitter API Prices

As you can see from this blog post, the new Twitter API prices are not Elon trying to get rid of bots. The new pricing does reflect Elon trying desperately to turn Twitter into less of a money-burning machine: it makes sense to charge for the API. But the actual level of the pricing, far too expensive for almost all of the apps to actually pay, is a major failure of the overall situation of ads on Twitter and has nothing to do with the very successful Twitter API and Twitter Platform.

Don’t Blame the Twitter API or Twitter Platform Team

It isn’t their fault they made an incredibly successful product that may be the last viable source of new revenue for a desperate Twitter.

No AI or ChatGPT was used in the creation of the blog post. All spelling errors or grammatical mistakes are mine.


DrawGPT – Make AI Art & Draw Images Using An AI That Only Knows Text


Use DrawGPT to Draw Anything With an AI… Using Only Words

I recently created a new way to generate AI art that does not directly use or copy artists work to generate images and is an exploration in how to visually enable large language models (LLMs).

Click this link to try it out and see what you can draw and get a sense of what the app is like.

How Can an LLM Know About the Visual World?

I was interested in how ChatGPT was able to understand the visual world despite being an AI that is only trained on text and words. It does not use any images, so how does it know what things look like?

How can an AI that has never seen an image, had no images in its training set, and cannot output an image know what the visual world looks like?

I spent a few days puzzling over this and came up with a solution that I think is pretty cool and offers a nice proof that LLMs can become visually enabled.

DrawGPT – An Exploration in Visually Enabled LLMs

After thinking about how to get an AI LLM to render images I decided instead of just a proof of concept I would try to create an entire application that would showcase exactly how this could be done.

You can see it here at this link DrawGPT.

How Can an LLM Become Visually Enabled to Generate Pictures and Images?

The first step in creating a visually enabled LLM is of course the training data.

In my experience with ChatGPT I found it highly likely that OpenAI had in fact used CLIP or CLIP-like data in their training data for GPT-3. It would be very difficult for a large language model to have an understanding of visual objects, their color, the relative visual composition of objects, and everything else based on purely textual information alone.

While I cannot prove definitively this is true it seems likely given OpenAI’s products like DALL-E.

There is certainly a lot of visual information in large language model training sets that use only text. Paintings like the Mona Lisa are discussed in depth in art reviews, basic anatomical structures of things like animals are discussed in biology textbooks, things like buildings and skylines and landscapes are written about endlessly in literature. But I do not believe that would be enough to enable an LLM to become visually enabled in a way that would consistently output correct visual imagery.

CLIP (an AI program that can take an image as an input and create a text description of that image) is a tool that can take visual text descriptions to the next level. By breaking down a visual image into distinct text tokens, CLIP and CLIP-like data create a direct set of tokens related to visual imagery.

We know CLIP data works very well for creating AI art and generating images with AI because things like Stable Diffusion and Midjourney and DALL-E all use CLIP or CLIP-like data to generate images. This hinted me towards a direction for DrawGPT.

Text Tokens, Pixel Data, and Diffusion, Oh My!

Most of the AI art tools we see right now (Jan 2023) are based on a combination of CLIP data to create text tokens and latent pixel diffusion. This is what allows “text to image” AI art.

In order to be able to create “any” image these pixel diffusers need to be trained on copious amounts of images which get their subject matter extracted either by metadata provided in the training set or by running images in the training set through CLIP and using the output alongside the image.

What is going on behind the scenes with text inputs to pixel diffusion is that the text tokens are parsed and used to condition the sampling distribution for the pixel diffusion. The model breaks down the text phrase you sent as an input and then iteratively refines noisy pixels based on the text tokens; the more passes it can take guessing what pixel goes where, the better the output image is.

This is a phenomenal way to create AI art and it is very effective. But it also has some major issues.

The major problem with things like DALL-E and Stable Diffusion is that the image sets they were trained on did not necessarily credit the artists properly. Things like the artist’s style, the subject matter, the image composition, and many more things were extracted during the training using CLIP or available metadata.

And we’re not talking about a few images here. We’re talking millions of images scraped from the Internet, possibly from sources that did not even know they were being scraped. Yes, technically the terms of service were not broken during the collection of the images for the training set, but obviously the resulting backlash suggests that the image collection was in an ethical gray zone.

As we’ve seen online there are many artists who are not happy with the way their work is being used in these AI art tools.

This is a major issue and it is something that I thought I could also uniquely address with DrawGPT by using ONLY an LLM… no actual pixel data. An LLM cannot copy anything about an artist’s work directly because it is not sampling or reading the pixel data of the images, only the text descriptions of them from CLIP data.

DrawGPT – Part of the Solution to Potential Art Theft & Ethical Dubiousness

One way to easily get around the issue of artists feeling that their work is being copied is simply to not copy it.

That seems simple enough on the surface but in practice has not really been realistic. With the introduction of genuinely large LLMs like GPT-3, GPT-3 DaVinci, ChatGPT, Bloom, and others, the total corpus of textual works in the training set, including any CLIP data, should be sufficient to give enough visual references for an LLM to be able to create images simply from words.

The problem is that the LLMs are not trained to create images. They are trained to create text. And while they can be trained to create images they are not trained to create images in a way that is visually coherent.

And that raises the question of how a visually enabled LLM is able to express itself. It may know what a dog is, and it may even know what a dog looks like from written examples, but how would it draw one given that it cannot output pixel data?

How Can An AI LLM Draw?

This was my first question. Because the field of AI research with these LLMs, transformers, and diffusers is so new it wasn’t really something AI researchers were looking at. I did not have a lot of work to reference as no one had really been considering how to get the LLM itself to draw.

Much like the need for a truly massive training set the LLMs themselves needed to reach a certain maturity before it was realistic to explore for some research.

Even if the AI LLM has enough visual reference data, it also requires a sufficiently large corpus of training data on an output medium to be able to output tokens correctly enough that images can be rendered.

With the introduction of GPT-3 and the checkpoint GPT-3 DaVinci we have reached a point where the AI can in fact command a visual medium with enough complexity to correctly render images.

What is the medium for an LLM? Well, seeing as it can only output text, it needs the text that it outputs to create an image. Since the images are digital this means the LLM needs to output instructions to draw a digital image.

This leaves only a few options for visual, artistic mediums for an LLM:

  • SVG – an XML based plaintext text format for web enabled vector images.
  • HTML – Using the HTML5 canvas tag with Javascript draw commands. It’s well supported in all browsers now.
  • LaTeX – A way to express complex equations which can draw lines but is not very suited for visual work.
  • ASCII – Using text characters to create a visual image by using each character as a “pixel”.

Of these options the only realistic choices are SVG and HTML5 canvas. LaTeX is not really suited for visual work and ASCII is not really suited for actual drawing (it’s great for CLI output or things like comments in web3 smart contracts).

SVGGPT ??? Nope.

SVG turned out to be a little too complex and verbose. It’s a very powerful format, but the additional characters required by the XML spec plus all of the attributes made it very difficult to create an image.

While SVG does work, and it was the first format I tried because it seemed ideal, there were some major issues. Notably limits on output tokens often resulted in partial SVG drawings and without sufficient closing tags for open tags it just wasn’t possible to consistently generate complete images even on a basic level.

HTML5 Canvas GPT ??? Yep.

It turned out that using the 2D context of an HTML5 canvas tag with draw commands in Javascript was the perfect way to draw basic images with an LLM.

Using a very complex prompt that limits the output to only the relevant code, I was able to consistently get DrawGPT to output code that would draw images. You are able to see the Javascript draw commands on DrawGPT when you create an image. Give it a try! All the Javascript code for any image is currently open source on the website.

2D canvas context draw commands in Javascript are not really for drawing complex, detailed images. They are more of the standard draw commands you see in most low level visual systems. The commands are things like fill, rect, line, arc, etc. They are not really meant for drawing complex images but they are perfect for drawing basic images.
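
To give a feel for the medium, here is a hand-written example of the kind of Javascript DrawGPT outputs; it is illustrative, not actual model output. Plain 2D context commands, a comment labeling each object, and a loop for repeated elements:

const canvas = document.getElementById("canvas"); // assumes a <canvas id="canvas"> on the page
const ctx = canvas.getContext("2d");
// sky
ctx.fillStyle = "skyblue";
ctx.fillRect(0, 0, 400, 250);
// ground
ctx.fillStyle = "green";
ctx.fillRect(0, 250, 400, 150);
// sun
ctx.fillStyle = "yellow";
ctx.beginPath();
ctx.arc(320, 70, 30, 0, Math.PI * 2);
ctx.fill();
// three birds drawn in a loop
ctx.strokeStyle = "black";
for (let i = 0; i < 3; i++) {
  ctx.beginPath();
  ctx.arc(80 + i * 60, 90, 10, Math.PI, Math.PI * 1.8);
  ctx.stroke();
}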

This is why most of the output of DrawGPT is not detailed imagery like you expect from Stable Diffusion, DALL-E or any of the latent pixel diffusion methods used by other AI art models.

While it would be possible to draw more detailed images using an LLM + Javascript draw commands, given the output token limit of the GPT-3 API calls it is just not feasible for this particular proof of concept.

To note: if the prompt is changed to ask for more detailed images, or more detailed pixel art, then the AI LLM models will attempt to draw more detailed images. But the output will be limited by the output token limit of the GPT-3 API calls.

How Can We Know An LLM Is Drawing Things Correctly?

Once I was able to get the LLM to consistently render images the question became, “Is it drawing things correctly?” There was some difficulty at first with more complex scenes or complex objects as it wasn’t clear exactly what the AI was drawing. Are those dots in the sky birds, or are they just noise and artifacts like traditional pixel diffusion methods often produce?

It’s easy to see when DALL-E or Stable Diffusion create an image and the tokens are correctly represented, but sometimes it’s not so obvious with a simplified image.

One massive advantage of using an LLM for drawing is that you can simply have it tell you what each object is supposed to be. This isn’t really an option with most of the other AI art methods, as they are not trained to output text alongside the image perfectly describing each feature or token in the output image. You can always run the output image through CLIP, but that does not give insight into the actual drawing process or specifically what each object should be.

By forcing the output to include relevant code comments in the Javascript (you can see them in the code on the page) I was able to get the LLM to reveal the various objects it was attempting to draw.
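Here is a hedged sketch of what that commented output looks like, reusing the hypothetical `ctx` context from the earlier example (again, illustrative of the style rather than actual DrawGPT output):

```javascript
// Illustrative example of draw commands with the comments that
// reveal what each shape is meant to be (hypothetical output style).
ctx.fillStyle = 'skyblue';  // the sky
ctx.fillRect(0, 0, 512, 300);

ctx.fillStyle = 'green';    // the ground
ctx.fillRect(0, 300, 512, 212);

ctx.fillStyle = 'gray';     // a mountain
ctx.beginPath();
ctx.moveTo(100, 300);
ctx.lineTo(200, 150);
ctx.lineTo(300, 300);
ctx.closePath();
ctx.fill();

ctx.fillStyle = 'black';    // a bird in the sky, not noise
ctx.beginPath();
ctx.arc(350, 80, 4, 0, Math.PI * 2);
ctx.fill();
```

The comments double as labels: if the code says “a bird in the sky” and the shape is a small dot in the sky region, you can verify the drawing is correct.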

I was surprised.

Not only was the LLM (default OpenAI GPT-3 DaVinci) now creating images, I was able to verify that the things it was drawing were correct.

DrawGPT Draws Really Well, It Knows What It Is Drawing

It was stunning to see the AI-generated images coming out consistently & correctly.

What do I mean by that? For example:

  • Portraits – Things like hair, eyes, nose, ears, and mouth are all in the correct places. It draws those things “inside” a circle it will draw for a head, and they will be correctly ordered vertically (the eyes are never below the mouth).
  • Landscapes – Mountains, sunsets, birds in the sky, clouds, trees, etc. are all in the correct place. It never tries to put the ground above the sky or have mountains strangely floating in space.
  • Objects – It knows the basic layout of common but complex objects like bicycles, lamps, and many other things. While it cannot draw a fully perfect bicycle, the image it renders features the basic elements in the correct places.
  • Animals – It understands the basic layout of animals, including the number of legs and relevant features like ears or fins, and attempts to place them correctly. A great sample is the image used for the DrawGPT AI Art Twitter Bot image. You can clearly see it was trying to draw a bird.

Regardless of whether this comes from CLIP data, the reality is that the LLM is drawing things correctly.

It is not just drawing random things in random places on the image. It does have some issues with relative scaling, but it is hardly ever so bad that the image itself is not recognizable.

It is also drawing things in the correct order. It will draw the ground before the sky, the sky before the clouds, the clouds before the sun, the sun before the mountains, the mountains before the trees, the trees before the birds, etc.

In addition to drawing concrete objects it is also able to draw things like abstract shapes and patterns. It is not perfect but it is able to draw things like circles, squares, triangles, and other basic shapes. It is also able to draw things like stripes, polka dots, and other patterns.

It will use loops, if statements, and other basic programming constructs to draw things like a grid of squares, a pattern of circles, birds in the sky, and fruit on trees.
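As a hedged illustration of the kind of looping code it writes (hypothetical, in the same style as the earlier sketches):

```javascript
// Hypothetical example of the looping constructs the LLM uses,
// here drawing a simple grid of alternating colored squares.
for (let row = 0; row < 4; row++) {
  for (let col = 0; col < 4; col++) {
    // alternate colors like a checkerboard
    ctx.fillStyle = (row + col) % 2 === 0 ? 'crimson' : 'navy';
    ctx.fillRect(col * 64, row * 64, 64, 64);
  }
}
```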

Sometimes the LLM chooses to express itself with text as well. It is able to use the text commands to label things or make statements within the image itself.

One truly surprising thing was when I sent in no subject to draw at all. The AI will just draw something totally random: portraits, fine art, landscapes, and of course its all-time favorite, the Mona Lisa.

It loves to draw the Mona Lisa.

DrawGPT Is Not Perfect

If you use the app you’ll see that yes, the images are very simplistic. It is sometimes difficult to tell visually what you are looking at because the image is just a series of boxes and circles.

Portraits will occasionally be unrecognizable, as it will pick similar colors for some things and make the image a mess. I believe that issue could likely be solved very easily with a better model or more specific training data designed to allow better visual responses.

The LLM is not perfect but it is drawing things correctly. If you reference the comments in the code it becomes clear that the concepts and tokens in the image are correct even if it is limited by the simplicity of the medium it has to use.

This is mostly a tradeoff of using simple, text-only draw commands to create images, and rarely an issue with the actual output tokens of the AI.

DrawGPT – Adding Some Character + An Impish Twitter Bot

For fun I have the prompt adjust the comments in the code to add a little flavor to the output, often including a humorous take on the prompt or subject matter.

This was important because it gives the images, the output, and the entire AI a feeling of being a character that you are interacting with. This is similar to the way people feel they are speaking conversationally with ChatGPT, and it is incredibly important for interacting with AI.

Seeing as how DrawGPT was able to draw things correctly & provide a little flavor, character, and humor, I decided to create a Twitter bot that would allow users to reply to a tweet and have DrawGPT reply with an image. This also allowed me to experiment with incredibly complex input prompts that I would otherwise not have thought of on my own.

If you’d like to use the DrawGPT Twitter bot you can reply to any tweet with “@DrawGPT draw” and it will respond with an image of the tweet you are replying to and include a link to the image on the website so you can see the code & comments as well as share the link.

DrawGPT – A New Way To Create AI Art

DrawGPT will likely never be a commercial hit. The art is too simplistic to appeal to most people and the output tokens are too limited to be useful for most image generation tasks.

At the same time the simplicity of the images, combined with the LLM drawing important features of the subject, often creates a sort of “caricature” of the subject. For example if you have it draw Trump it will almost always try to draw some sort of hair.

It’s a really fun thing & the creativity of the AI LLM and how it draws is pretty mind blowing. It’s also a great way to get a glimpse into how the AI is thinking.

DrawGPT – The Code & The Images & The Prompt & License

DrawGPT currently uses the stock OpenAI GPT-3 DaVinci model. There is no additional fine tuning, and no additional training sets have been added.

At this time I will not be releasing the prompt I am using.

I do list on the website the prompt tokens & the output tokens as returned so users and researchers can get a feeling for what the prompt may be like.

All of the code and images on the website generated by DrawGPT are currently under the CC0 license. This may change some day, but the intent is to provide an open source & fun project that publicly showcases the concepts for users and AI researchers.

What Is Next For AI Art and DrawGPT?

The front facing portion of every AI that interacts with humans is a language model.

As humans we express ourselves through language. Regardless of whether the AI is an LLM or something like Stable Diffusion, Disco, DALL-E, VQGAN, POINT-E, or any other AI, we as humans still have to instruct it with language.

At this time I do not have any huge plans for DrawGPT. I may attempt to introduce other LLMs as a sort of litmus test for how visually enabled they are and I will certainly be giving it a spin with GPT-4 when it comes out.

I chose to output the image at 512×512 pixels, the size expected by most img2img inputs for other models, so that the outputs can be used as inputs to more complex AI art models and are fully compatible with things like Stable Diffusion.
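For example, here is a minimal sketch of exporting the finished canvas for use in an img2img pipeline. `canvas.toDataURL` is the standard canvas export API; the endpoint in the comment is purely hypothetical:

```javascript
// Export the 512x512 canvas as a PNG data URL so the drawing can be
// fed into img2img pipelines (toDataURL is the standard canvas API).
const pngDataUrl = canvas.toDataURL('image/png');

// The data URL could then be posted to an img2img service;
// this endpoint is hypothetical, for illustration only:
// fetch('/api/img2img', { method: 'POST', body: JSON.stringify({ image: pngDataUrl }) });
```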

I am extremely pleased with the way DrawGPT turned out.

I think that I have conceptually proved a few things and hopefully other AI researchers in the future can build with some of the fundamentals & tips & tricks I explored:

  • Visually enable LLMs by including CLIP data in the language training set.
  • The LLM must also have sufficient training on the output medium.
  • Use the visual output to correctly identify if the AI and large language model “understands” complex visual concepts.
  • Include code comments or metadata of tokens in the output linked to specific parts of the image to identify if the drawing is “correct”.
  • Give the AI character and flavor to make it fun to interact with.
  • Enable the use of crowdsourced or social inputs to explore complex inputs you would not normally think of yourself.

Did You Write This With AI?

No. The horrendous spelling mistakes and terrible grammar are my own. I’m a programmer, not an English teacher.

Did You Really Not Click the Link Yet?

If you have somehow made it this far into the article without clicking, now is the time.

Click here to try out DrawGPT and draw your own images with AI and generate art with an AI that only knows written words and has never seen a pixel in its life.