
How can you recognize AI-generated content?

AI is everywhere, and in this context distinguishing the real from the fake is becoming increasingly pressing. We take stock.

Thanks to its ability to create remarkably realistic content, AI has become a powerful tool capable of shaping and manipulating information like never before. As AI-generated content becomes more widespread, an important question arises: how can we distinguish between what is real and what is artificially created?

In less time than it takes to say it, generative AI tools have gone from research prototypes to commercial products. Models such as ChatGPT, DALL-E and Sora from OpenAI, or Gemini and Veo from Google, can now generate text, images and videos that are often indistinguishable from human-created content. As such, generative AI models raise concerns about the credibility of digital content and about how easily harmful content can be produced.

Why is it important to detect AI-created content?

The motivations of politicians and regulators are diverse: limiting the proliferation of spam, scams, non-consensual pornography, targeted harassment and fake news, and confirming the authenticity of content used in legal proceedings. Implementing effective detection methods is therefore a priority. At this stage, while some technologies are being developed to detect AI-generated content, few tools are actually operational. In the meantime, a few techniques can help identify AI-generated content and mitigate its negative impacts.

How do you recognize AI-generated text?

AI-generated articles are increasingly being published in the media and on blogs, so it's important to be able to detect this type of content. Here are some practical ways to identify them:

  • Analyze the writing style: AI-generated articles often lack the human touch and can lack depth and originality. Watch out for robotic-sounding phrases or unnatural language patterns, and look for inconsistencies in grammar, punctuation and sentence structure. AI can also have difficulty maintaining coherence throughout an article.
  • Analyze language: examine the use of jargon or technical terms. AI-generated content may overuse them without providing clear explanations or context. Take note of excessive repetition or unnecessary verbosity: AI may repeat phrases or use overly complex language to compensate for limited comprehension (a crude way to measure repetition is sketched just after this list).
  • Spot factual errors: even if AI can produce coherent text, it can sometimes contain false or inaccurate information. Cross-checking facts with reliable sources can reveal AI-generated content. AI can also produce rambling or illogical arguments, lacking the natural progression of thought often found in human-written content.
  • Think speed and volume: AI algorithms can generate articles at extraordinary speed and in large quantities. If a news source suddenly produces an unusually high volume of articles, this may indicate the use of AI.
  • Absence of bias or controversial opinions: AI-generated articles are less likely to express controversial viewpoints or prejudices. They tend to provide neutral information without taking sides.
  • Check the source or author: if it's a press article, don't hesitate to compare it with other pieces by the same author.
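To make the repetition signal above concrete, here is a minimal sketch in Python. It only measures how often three-word sequences recur, which is one weak signal among many, not a reliable detector; the threshold at which a ratio becomes suspicious is left to the reader.

```python
# A minimal sketch, assuming whitespace tokenization is good enough:
# measures how often three-word sequences recur in a text. A high ratio
# is one weak repetition signal among many, not a verdict.
from collections import Counter

def repeated_trigram_ratio(text: str) -> float:
    """Return the share of word trigrams that occur more than once."""
    words = text.lower().split()
    trigrams = [" ".join(words[i:i + 3]) for i in range(len(words) - 2)]
    if not trigrams:
        return 0.0
    counts = Counter(trigrams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(trigrams)

if __name__ == "__main__":
    sample = ("The product is great. The product is great "
              "because the product is great.")
    print(f"repeated trigram ratio: {repeated_trigram_ratio(sample):.2f}")
```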

How do you detect an AI-created image?

MidJourney, DALL-E, Stable Diffusion: these programs can generate an infinite number of images. Some are ultra-realistic, tied to current events, and genuinely confusing. We remember the Pope looking as stylish as ever in a Balenciaga down jacket, Donald Trump being arrested in the street by police officers, or Emmanuel Macron as a garbage collector... Fake, fake, fake. So how do you avoid falling for it? Axel Legay, a cybersecurity and AI specialist and lecturer at the University of Louvain, told La Voix du Nord: "The first thing to do, if in doubt, is a reverse image search on Google. This tool, which has been around for several years, lets us find the original context of the photograph: who talked about it first, and in what setting. If nothing comes up, that's already a good indication. If the image is linked to a site, make sure the associated information makes sense. If the photo comes from La Voix du Nord, for example, it's serious. If the photo comes from a blog called "Hate Emmanuel Macron", be wary."
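If you check images often, that reverse-search step can be scripted. The sketch below builds a Google Lens reverse-search link for a publicly hosted image; the uploadbyurl endpoint reflects current behavior and is an assumption that may change over time.

```python
# Sketch: build a reverse-image-search link for a publicly hosted image.
# The Google Lens "uploadbyurl" endpoint is an assumption based on current
# behavior and may change; the result opens in your default browser.
import webbrowser
from urllib.parse import quote

def reverse_search_url(image_url: str) -> str:
    return "https://lens.google.com/uploadbyurl?url=" + quote(image_url, safe="")

if __name__ == "__main__":
    url = reverse_search_url("https://example.com/suspicious-photo.jpg")
    print(url)
    webbrowser.open(url)
```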

  • Hands and limbs: Most people have five fingers on each hand, two arms and two legs. Many AI generators get carried away. The latest technologies are certainly better, and the characteristic six fingers or claw-like hands of early images are now rare. Still, watch the details. In group scenes, for example, pay close attention to characters in the background: it's not impossible to spot extra legs, gnarled hands or an arm slung around a bodiless shoulder.
  • Words: Image generators are not text generators, and creating images with text-like elements is a very different job from creating readable text. Misspelled words, blurred letters and mysterious characters are good giveaways. For example, in the fake photos of Donald Trump's arrest, the lettering on the police officers' hats is illegible.
  • Hair: Human hair is made up of strands that extend downwards from the head. AI hair strands often have a less-defined beginning and end. Up close, they can almost appear painted. Beware, however: sometimes image compression can also make things look inconsistent.
  • Symmetry: Objects can often be found in pairs or groups. Think of earrings or cutlery. Some AI systems can forget what's happening on the left side of a face or table once it's time to render the other side.
  • Textures: Repeated patterns, fabrics and textures are difficult to render. In the real world, bricks tend to be uniform in size and shape throughout a building, and a floral print on wallpaper repeats identically line after line; AI-generated versions often drift, with patterns that mutate as they repeat.
  • Geometry: Look at the space in which an image is located and the objects it contains. Are the right angles straight? Does a wall blend harmoniously with a shelf in the background? Can you visualize how the sofa fits behind the table that appears to be flush with the wall? Inconsistencies such as these may suggest that the image has been created by a system that has no understanding of 3D space.
  • Consistency: Are there several images claiming to show the same thing? Compare them! Generating multiple images of the same space from different angles and at different times is trivial in the real world but still at the cutting edge for AI. Even video generators such as Sora, which can create videos moving through a virtual space, will rarely step back to show something they've moved away from, as this would reveal that they've "forgotten" what was there in the first place.

Note that some AIs "sign" their works. This is the case with DALL-E, for example, which automatically adds a multicolored bar to its images, or Craiyon, which places a small red crayon in the corner.
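Signatures like these can even be checked programmatically. The sketch below is a rough illustration, not DALL-E's documented scheme: it assumes the signature is a small strip of flat colored blocks in the bottom-right corner, and both the strip size and the thresholds are arbitrary assumptions. Absence of the strip proves nothing, since it is trivially cropped away.

```python
# Hypothetical check: does the bottom-right corner hold a strip made of a
# few flat color blocks, like DALL-E's signature bar? Strip size and
# thresholds are arbitrary assumptions; a missing strip proves nothing,
# since signatures are trivially cropped away.
from PIL import Image  # pip install Pillow

def has_corner_signature(path: str, strip_w: int = 80, strip_h: int = 16) -> bool:
    img = Image.open(path).convert("RGB")
    w, h = img.size
    corner = img.crop((w - strip_w, h - strip_h, w, h))
    colors = corner.getcolors(maxcolors=strip_w * strip_h)
    if colors is None:  # too many unique colors: a photo, not a flat strip
        return False
    # Keep only colors that each cover a sizeable chunk of the strip.
    dominant = [c for count, c in colors if count > (strip_w * strip_h) / 20]
    return 3 <= len(dominant) <= 8  # a handful of flat blocks looks like a signature

if __name__ == "__main__":
    print(has_corner_signature("image.jpg"))
```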

Don't dwell (too much) on AI

If an image looks dubious enough that you're examining it to work out whether it's AI-generated, step back and ask yourself whether you should trust your instincts anyway. Maybe the image isn't AI-generated at all, but it could still be the result of an AI face swap, edited in old-fashioned Photoshop, fully staged, or simply badly captioned (a "cheapfake"). Just because an AI hasn't created something from scratch doesn't mean what you're seeing is true.

How do you spot an AI-generated video?

Even though AI is getting better at creating videos, you can still try to determine whether a video was created by artificial intelligence. Look for the following:

  • Strange movements or expressions: watch how the person moves and acts. Real people move naturally and freely. AI-created videos may feature movements that are too fluid or too rigid, missing the subtle complexity of real human behavior.
  • Audio and video don't match: AI-created videos can sometimes mishandle the synchronization between what you see and what you hear, so lip movement and sound don't match well and may be out of sync. While this isn't a telltale sign, as desynchronization and audio editing errors can occur, it can be a warning.
  • Unusual textures and lighting: the visual aesthetics of a video can reveal crucial information about its authenticity. Bright textures and unnatural lighting are common signals in AI-generated content. Whether it's an overly polished skin texture or lighting conditions that deviate from natural norms, these anomalies can be a sign of AI tampering.
  • A little too perfect: beware of videos that look excessively polished. Often, AI content looks shiny and smooth. So if people or objects look a little too perfect, with few or no imperfections, it could be the AI.
  • Objects appearing/disappearing or transforming: Watch for sudden, inexplicable changes in objects or landscapes. These shifts, such as objects appearing or disappearing, or scenes transforming abruptly, can be good indicators of AI involvement. While human-created content tends to maintain its coherence, AI algorithms may struggle to blend scenes or keep transitions logical (a crude frame-comparison sketch follows this list).
  • Try to find the video's source: you can go a step further by trying to trace the video's origin. "Authentic" content usually has a traceable source, whether it's the person who posted the original content online or the content creator's platform. In contrast, AI-generated videos may lack a clear source. What's more, AI videos are often edited videos, so if you can find the original video, you'll have the answer to your question. So, as with images, don't hesitate to perform a reverse search.
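As promised above, here is a minimal frame-comparison sketch using OpenCV. It flags frames whose color histogram differs sharply from the previous frame; the 0.5 correlation threshold is an arbitrary assumption, and legitimate editing cuts will also trigger it, so flagged frames are only candidates for manual review.

```python
# Sketch: flag frames whose color histogram differs sharply from the
# previous frame, a crude proxy for objects or scenes changing abruptly.
# The 0.5 correlation threshold is an arbitrary assumption, and normal
# editing cuts will also trigger it; flagged frames are review candidates.
import cv2  # pip install opencv-python

def abrupt_changes(path: str, threshold: float = 0.5) -> list[int]:
    cap = cv2.VideoCapture(path)
    flagged, prev_hist, index = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            # Correlation near 1.0 means similar frames; a low value means a jump.
            if cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL) < threshold:
                flagged.append(index)
        prev_hist, index = hist, index + 1
    cap.release()
    return flagged

if __name__ == "__main__":
    print(abrupt_changes("clip.mp4"))
```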

How can you spot a deepfake?

Deepfakes on social networks are manipulated videos or images created using AI technology. Detecting this type of content is becoming increasingly difficult, as in the case of the TikTok account @deeptomcruise posing as the American star. And the result is astonishingly convincing!

One way of identifying deepfakes is to look for anomalies in facial expressions or inconsistencies in tone of voice. As with a video or image, analyzing the background and assessing the context of the content can help uncover potential anomalies. Cross-checking with reliable sources or a reverse image search is recommended.

What technologies are being developed to detect AI-generated content?

In this context, distinguishing the real from the fake is becoming ever more pressing, and so is the need for regulation. The EU's AI Act, for example, contains provisions requiring users of AI systems to disclose and label their AI-generated content, as well as provisions requiring people to be informed when they interact with AI systems. In the USA, various avenues are being explored, including through the NDAA (National Defense Authorization Act) and the Department of Defense, to integrate content provenance information into the metadata of official audio/video files made public. For its part, the White House announced last summer that it had obtained voluntary commitments from major AI companies to develop "robust technical mechanisms to ensure that users know when content is generated by AI", such as watermarking or content provenance for audiovisual media. Unfortunately, the commitment seems limited to audiovisual content and excludes language models.

Watermarking (known in French as "tatouage numérique", or digital tattooing) appears to be the most promising approach currently under development.

The watermark

Watermarking (in its various forms) involves embedding an identifiable pattern in a piece of content to track its origin. The simplest digital watermarks add a visible label to an image, a distinctive sound to an audio clip or a notice in a text. A simple example is the bar of five colored squares at the bottom of images generated by DALL-E. Unfortunately, watermarks of this kind are easy to remove and tamper with.
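To illustrate just how simple the simplest form is, here is a visible-label sketch with Pillow. File names and label text are placeholders; as noted above, anyone can defeat this kind of watermark by cropping or painting over the label.

```python
# Minimal sketch of the simplest watermark described above: a visible,
# semi-transparent label stamped on an image with Pillow. File names and
# label text are placeholders.
from PIL import Image, ImageDraw  # pip install Pillow

def add_visible_watermark(src: str, dst: str, label: str = "AI-generated") -> None:
    img = Image.open(src).convert("RGBA")
    overlay = Image.new("RGBA", img.size, (0, 0, 0, 0))
    draw = ImageDraw.Draw(overlay)
    # Stamp the label near the bottom-left corner in translucent white.
    draw.text((10, img.height - 24), label, fill=(255, 255, 255, 160))
    Image.alpha_composite(img, overlay).convert("RGB").save(dst)

if __name__ == "__main__":
    add_visible_watermark("input.jpg", "watermarked.jpg")
```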

Researchers have recently begun exploring machine learning-based approaches to image watermarking. One approach studied by Meta researchers uses a machine learning model to embed a hidden watermark based on a "key" that can then be detected by another machine learning model. Google, for its part, has been working on SynthID, an experimental tool for watermarking and identifying images generated by the company's AI models: one machine learning model embeds an imperceptible watermark and another detects it. SynthID is also intended to watermark audio, although the tool is still being tested and details of how it works have not been disclosed.

Perhaps the watermarking technique attracting the most research interest is "statistical watermarking", one of the most accurate and tamper-resistant approaches studied so far. Instead of embedding a clearly defined pattern in text or audiovisual content, an algorithm embeds a statistically unusual arrangement of words, pixels or sounds.
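The toy sketch below conveys the statistical idea for text. Real schemes bias a language model's token probabilities; this word-level stand-in, with its hypothetical shared key, only shows how a detector can flag a statistically unusual arrangement with no visible mark at all.

```python
# Toy sketch of statistical watermarking for text. A secret key splits the
# vocabulary into "green" and "non-green" words; a generator that prefers
# green synonyms leaves no visible mark, yet a detector holding the key can
# spot a green fraction far above the ~0.5 expected by chance. Real schemes
# bias model token probabilities; this word-level version is illustrative.
import hashlib

SECRET_KEY = "demo-key"  # assumption: shared by generator and detector

def is_green(word: str) -> bool:
    digest = hashlib.sha256((SECRET_KEY + word.lower()).encode()).digest()
    return digest[0] % 2 == 0  # pseudo-random, key-dependent coin flip

def green_fraction(text: str) -> float:
    words = text.split()
    return sum(is_green(w) for w in words) / max(len(words), 1)

if __name__ == "__main__":
    print(f"{green_fraction('The quick brown fox jumps over the lazy dog'):.2f}")
```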

While watermarking is perhaps the most studied AI detection approach, three other approaches are attracting researchers' interest.

Post-hoc detectors

The simplest idea for detecting whether content has been produced by humans or by AI is to "fight AI with AI". The assumption is that AI-generated content has systematic (albeit subtle) differences from content created by humans, which can be detected by a machine learning model. This approach requires no intervention when the AI model is generating content, concentrating solely on verifying the content after it has been generated (i.e. post-hoc).
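A minimal sketch of this idea, assuming scikit-learn is available: train a text classifier on labeled human and AI examples, then score new text after the fact. The four training sentences are placeholder assumptions; a usable detector would need large, diverse corpora and would still make mistakes.

```python
# Sketch of a post-hoc detector: a classifier trained on labeled human vs.
# AI text, applied after generation. The four training sentences are
# placeholder assumptions; a real detector needs large, diverse corpora.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "honestly the gig last night was a mess but i loved every second",
    "we missed the bus so we just walked and argued about the playlist",
    "In today's fast-paced world, it is important to note that synergy matters.",
    "Furthermore, it is worth mentioning that the aforementioned points apply.",
]
labels = [0, 0, 1, 1]  # 0 = human-written, 1 = AI-generated (toy labels)

detector = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
detector.fit(texts, labels)

# Probability that a new sentence is AI-generated, per this toy model.
print(detector.predict_proba(["It is important to note that results may vary."])[:, 1])
```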

Retrieval-based detection

Retrieval-based detection involves storing the content generated by a given model in a database managed by the model's developer, and then comparing any query against this database to check whether it was generated by the AI. In the context of generative AI, a retrieval-based detector wouldn't just look for an exact match as simple plagiarism checkers usually do; instead, it would look for matches in the database based on underlying meaning or content. In the linguistic context, this technique has proved to be more accurate than post-hoc detection, although it may not yet be sufficiently reliable.
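Here is a sketch of that matching step using the sentence-transformers library: the developer stores embeddings of everything the model generates, and a query is flagged when something in the store is close in meaning. The stored sentences and the 0.9 similarity threshold are illustrative assumptions.

```python
# Sketch of retrieval-based detection with sentence-transformers: embed
# everything the model generates, then flag queries that are close in
# meaning to a stored item. Stored texts and the 0.9 threshold are
# illustrative assumptions.
from sentence_transformers import SentenceTransformer, util  # pip install sentence-transformers

model = SentenceTransformer("all-MiniLM-L6-v2")

generated_store = [
    "The Eiffel Tower was completed in 1889 for the World's Fair in Paris.",
    "Quantum computers exploit superposition to explore many states at once.",
]
store_embeddings = model.encode(generated_store, convert_to_tensor=True)

def was_probably_generated(text: str, threshold: float = 0.9) -> bool:
    query = model.encode(text, convert_to_tensor=True)
    best = util.cos_sim(query, store_embeddings).max().item()
    return best >= threshold  # near-duplicate in meaning, not just wording

print(was_probably_generated(
    "The Eiffel Tower was finished in 1889 for the Paris World's Fair."))
```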

Store information on content origin in metadata

Another approach to detecting AI-generated content is to store information about the origin (or provenance) of a piece of content in its metadata. The most important initiative of this type is the Coalition for Content Provenance and Authenticity (C2PA), closely linked to the Content Authenticity Initiative. C2PA is an open technical standard that enables publishers, companies and others to embed metadata in media to certify the source and provenance of online content. Adobe, Google, OpenAI, Microsoft, Sony and the BBC are all members of C2PA.
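Proper verification means cryptographically validating the manifest with a C2PA SDK, but a crude first check is possible by scanning a file for the JUMBF box markers that C2PA manifests use. The sketch below only hints that a manifest may be present; it proves nothing on its own.

```python
# Crude sketch: scan a file's raw bytes for the JUMBF box markers that C2PA
# manifests use. This only hints that a manifest may be present; real
# verification parses and cryptographically validates the manifest with a
# C2PA SDK.
from pathlib import Path

def hints_at_c2pa(path: str) -> bool:
    data = Path(path).read_bytes()
    return b"jumb" in data and b"c2pa" in data

if __name__ == "__main__":
    print(hints_at_c2pa("photo.jpg"))
```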

Social networks start tagging AI-created content

On social networks, TikTok, YouTube, Meta and now Vimeo are also tracking AI-generated content. While some rely on creators to inform users, others such as TikTok and Meta are starting to tag AI-generated content automatically. How? By relying on C2PA. Although the technology is not yet stable (some photographers have complained that Meta applied labels to real photos on which they had used basic editing tools), it offers a first safeguard in the fight against fake news.
