Generative AI foundation models and platforms

Models for Generative AI

Artificial neural networks (ANNs)
Supervised learning from labeled data sets is constrained to deliver a predefined response and data is time-consuming and costly to obtain. Deep learning systems are trained on unsupervised or unlabelled data set. They are freer to discover patterns and hierarchies, and produce more efficient, accurate results.

Deep Learning Architectures

These are the building blocks used to process different kinds of data:

1. Convolutional Neural Networks (CNNs)

Best for: Grid-like data (images, video frames).
How they work: Capture local spatial patterns (edges, shapes) through convolution filters.
Use cases: Image classification, object detection, segmentation, video recognition.
Note: Often used inside other models (e.g. VAEs, GANs for images).

2. Recurrent Neural Networks (RNNs)

Best for: Sequential data (text, speech, time series).
How they work: Maintain hidden states (memory) across steps to capture context over time.
Use cases: Speech recognition, translation, sentiment analysis.
Limitation: Struggle with long-range dependencies due to vanishing gradients.

3. Transformer Architectures

Best for: Sequences with long-range dependencies (text, audio, code, etc.).
How they work: Use attention mechanisms instead of recurrence/convolutions. This lets them weigh all parts of the input sequence at once.
Structure: Typically encoder–decoder stacks.
Use cases: Language modeling (GPT, BERT, T5), vision (ViTs), multimodal AI (CLIP, GPT-4V).
Why important: Form the backbone of Large Language Models (LLMs).

Core Generative AI Model Families

These are frameworks for generating new data (text, images, audio, video). They can use different architectures (CNNs, Transformers, etc.) inside them.

1. Variational Autoencoders (VAEs)

How they work: Compress data into a low-dimensional latent space, then reconstruct it with variation.
Use cases: Image synthesis, data compression, anomaly detection.
Architecture inside: Often CNNs.

2. Generative Adversarial Networks (GANs)

How they work: Two networks compete —
- Generator tries to create realistic samples.
- Discriminator tries to detect fakes.
Use cases: Realistic image generation, style transfer, deepfakes, synthetic data for finance/geospatial.
Architecture inside: Typically CNNs.
Challenge: Hard to train, require lots of data, prone to instability.

3. Transformer-Based Generative Models

How they work: Use attention to capture long dependencies and generate coherent sequences.
Use cases: Text generation (GPT series), translation (T5), multimodal tasks (text-to-image, music generation).
Why important: Revolutionised NLP and beyond; foundation of LLMs.

4. Diffusion Models

How they work:
- Forward process: Gradually add noise to data.
- Reverse process: Learn to denoise and reconstruct samples.
Use cases: High-quality image/video generation (DALL·E 2, Stable Diffusion, Imagen).
Note: Training is computationally heavy but produces state-of-the-art results.

Key Takeaway

Architectures = CNNs, RNNs, Transformers (general-purpose neural building blocks).
Generative Model Families = VAEs, GANs, Transformers, Diffusion (frameworks for creating new data).
Transformers bridge both: they’re an architecture and a generative modelling approach.

Foundation models

A new successful paradigm for building AI systems. Train one model on a huge amount of data and adapt it to many applications. We call such a model a foundation model.

GPT 4 and Google Gemini: Multimodal foundational models

The foundational model google/flan-ul2 helps classify text and detects its sentiment and tone.

Distinguishing features of various foundation models

Model	Distinguishing features
gpt-4o-mini	A lightweight, optimized version of GPT-4o, designed to provide faster responses at lower cost while maintaining strong reasoning and language understanding capabilities. It is well-suited for tasks requiring efficiency, such as chat applications, quick content generation, and interactive use cases.
gpt-4-turbo	GPT-4-turbo is an advanced large language model trained on a massive dataset, designed to understand language at a deep level. Its sophisticated architecture allows it to generate highly coherent, contextually accurate, and informative responses across a broad range of prompts. With optimized performance, GPT-4-turbo is well-suited for various applications, including natural language understanding, content creation, code generation, translation, and more.
tiiuae/falcon-7b-instruct	Falcon-7B-Instruct is a ready-to-use chat-based or instruct model that follows instructions in a prompt and provides a detailed response. It is mostly trained on English data and cannot be generalized appropriately to other languages. Based on the instruct model, it may be ideal for most text generation tasks other than further fine-tuning. Furthermore, as it is trained on a large-scale corpora representative of the web, it will carry the stereotypes and biases commonly encountered online.
meta–llama/Llama–2–7b–chat–hf	Llama is intended for commercial use and research in English. The scope of this model is limited to the English language and does not generate appropriate completion for other language prompts. It can be further adapted to develop tuned models intended for assistant-like chat and pre-trained models for various natural language generation tasks. However, this is not an instruct model and may further fine-tune.
IBM Granite models	Recognizing the fact that a single model cannot cater to the distinct demands of every business use case, the Granite models, which are specially designed for businesses, are being developed in multiple sizes.These multi-size foundation models apply generative AI to both language and code.

IBM Granite Foundation Models

As this course is put together by IBM it features info about their models, specifically the granite models. There are two models, Granite.13b.instruct and Granite.13b.chat. The instruct one is interesting to me as I learnt a new term, Supervised fine-tuning. The granite-13b-instruct variant is an instruction-tuned Supervised Fine-Tuning (SFT) model, which was further tuned using a combination of the Flan Collection, 15k samples from Dolly, Anthropic's human preference data about helpfulness and harmlessness, Instructv3, and internal synthetic data sets designed explicitly for summarization and dialogue tasks (~700K samples).

Platforms for Generative AI

Text-to-text generation models

Text-to-image generation models

DALL-E is developed by Open AI
Imagen is by Google
Stability AI has built Stable Diffusion
Midjourney can create beautiful AI-generated artwork

Text-to-code generation models

Uses neural code generation, which uses ANNs based on neural networks in the brain
Types of models:
- Seq2seq models:
  - translate from one domain to another.
  - translate NL descriptions into a code sequence
- Transformer models
  - Learn long-range dependencies between words
  - Learn relationships between words in natural language and corresponding code
CodeTG by Google AI is trained on large dataset of text and code.
Code2seq by OpenAI
PanGu-Coder by Microsoft Research

Benefits

Code completion
Code generation
Debugging
Code translation between different programming languages
Code refactoring and application modernisation
Provide recommendations for libraries and frameworks
Test data generation
Code documentation

Code generation lab

GPT-4o Mini is buggy in the labs. I switched to using GPT4.1 Mini.

https://codetester.io/
A tool for run C, C++, Python, Ruby, Javascript and Java code within a browser

IBM watsonx.ai

IBM watsonx.ai is IBM's enterprise AI development platform, part of their broader watsonx suite of AI and data tools. It's designed to help organisations build, train, tune, and deploy both traditional machine learning models and generative AI applications at scale.

Key features include:

Foundation Models: Access to IBM's own foundation models as well as third-party models from companies like Meta, Hugging Face, and others. These can be used for various tasks like text generation, summarisation, and analysis.

Model Development: Tools for training custom models using your own data, fine-tuning existing models, and managing the entire machine learning lifecycle from development to deployment.

Enterprise Focus: Built with enterprise requirements in mind, including governance controls, security features, and compliance capabilities that many businesses need when deploying AI solutions.

Integration: Connects with IBM's other watsonx components (watsonx.data for data management and watsonx.governance for AI governance) as well as hybrid cloud environments.

Prompt Engineering: Interfaces for working with large language models through prompt engineering, allowing users to customise model behaviour without extensive coding.

The platform is positioned as IBM's answer to enterprise AI needs, competing with offerings from Microsoft Azure AI, Google Cloud AI, and AWS AI services. It's particularly aimed at organisations that want to deploy AI solutions whilst maintaining strict governance and security standards.

watsonx.ai's Key USPs

Hybrid Cloud Focus: Unlike purely cloud-native solutions, watsonx.ai is designed to work across on-premises, private cloud, and public cloud environments - crucial for organisations with strict data residency requirements.

Enterprise Governance: Built-in AI governance, explainability, and bias detection tools that are often afterthoughts in other platforms. IBM emphasises "trustworthy AI" throughout the development lifecycle.

Industry Expertise: IBM leverages decades of enterprise consulting experience to offer industry-specific AI solutions and templates, particularly strong in financial services, healthcare, and manufacturing.

Traditional ML + Generative AI: Whilst many newer platforms focus primarily on generative AI, watsonx.ai maintains strong traditional machine learning capabilities alongside foundation models.

Integration with IBM Ecosystem: Seamless integration with IBM's broader technology stack (Red Hat, IBM Cloud, existing enterprise software), which matters for organisations already invested in IBM technologies.

Data Fabric Approach: The watsonx suite's integration between data management (watsonx.data) and AI development is tighter than most competitors offer.

However, IBM faces challenges including perception as legacy technology, pricing competitiveness, and the rapid pace of innovation from cloud-native competitors. The platform's success largely depends on IBM's ability to leverage its enterprise relationships and governance strengths whilst keeping pace with AI innovation.

Hugging Face

Hugging Face is an open source artificial intelligence platform where scientists, developers and businesses collaborate to build personalized machine learning tools. The platform was built with the purpose of creating a hub for the open source AI community to share models, data sets and applications.

They provide educational and learning materials. And the biggest benefit is the thriving community.

Course Quiz, Project and Wrap-up

Term	Definition
Artificial neural networks (ANNs)	A collection of smaller computing units called neurons which are modeled in a manner similar to how a human brain processes information.
Bidirectional autoregressive transformer model (BART)	A text-to-text transfer transformer model developed by Facebook AI with a seq2seq translation architecture with bidirectional encoder representation like BERT and a left-to-right decoder like GPT.
Bidirectional encoder representations from transformers (BERT)	A family of language models by Google that uses pre-training and fine-tuning to create models that can accomplish several tasks.
Chatbot	A computer program that simulates human conversation with an end user. Though not all chatbots are equipped with artificial intelligence (AI), modern chatbots increasingly use conversational AI techniques like natural language processing (NLP) to make sense of the user's questions and automate their responses.
Clustering	An application of unsupervised learning wherein the algorithms group similar instances together based on their inherent properties.
Code2Seq	A text-to-code seq2seq model developed by OpenAI trained on a substantial text and code data set. It leverages the syntactic structure of programming languages to encode source code.
CodeT5	A text-to-code seq2seq model developed by Google AI trained on a large data set of text and code. CodeT5 is the first pre-trained programming language model that is code-aware and encoder-decoder based.
Convolutional neural networks (CNNs)	Deep learning architecture networks that contain a series of layers, each conducting a convolution or mathematical operation on a previous layer.
DALL-E	A text-to-image generation model developed by OpenAI that is trained on a large data set of text and images and can be used to generate realistic images from various text descriptions.
Deep learning	A type of machine learning focused on training computers to perform tasks through learning from data. It uses artificial neural networks.
Diffusion model	A type of generative model that is popularly used for generating high-quality samples and performing various tasks, including image synthesis. It is trained by gradually adding noise to an image and then learning to remove the noise. This process is called diffusion.
Dimensionality reduction	An application of unsupervised learning wherein the algorithms capture the most essential data features while discarding redundant or less informative ones.
Falcon	A large language model developed by the Technology Institute of Innovation (TII). Its variant, falcon-7b-instruct, is a 7-billion-parameter model based on the decoder-only model.
Foundational models	AI models with broad capabilities that can be adapted to create more specialized models or tools for specific use cases.
Generative adversarial network (GAN)	A type of generative model that includes two neural networks: Generator and discriminator. The generator is trained on vast data sets to create samples like text and images. The discriminator tries to distinguish whether the sample is real or fake.
Generative AI models	Models that can understand the context of input content to generate new content. In general, they are used for automated content creation and interactive communication.
Generative pre-trained transformer (GPT)	A series of large language models developed by OpenAI designed to understand language by leveraging a combination of two concepts: Training and transformers.
Google flan	An encoder-decoder foundation model based on the T5 architecture.
Google JAX	A machine learning framework used for transforming numerical functions that combines autograd (automatic obtaining of the gradient function through differentiation of a function) as well as TensorFlow's XLA (accelerated linear algebra).
Hugging Face	An AI platform that allows open-source scientists, entrepreneurs, developers, and individuals to collaborate and build personalized machine learning tools and models.
IBM Granite	Multi-size foundation models that are specially designed for businesses. These models use a decoder architecture to apply generative AI to both language and code.
IBM watsonx	An integrated AI and data platform with a set of AI assistants designed to scale and accelerate the impact of AI with trusted data across businesses.
Imagen	A text-to-image generation model developed by Google AI trained on a large data set of text and images. Imagen is used to generate realistic images from various text descriptions.
Large language models (LLMs)	A deep learning model trained on substantial text data to learn the patterns and structures of language. They can perform language-related tasks, including text generation, translation, summarization, sentiment analysis, and more.
Llama	A large language model from Meta AI.
Natural language processing (NLP)	A subset of artificial intelligence that enables computers to understand, manipulate, and generate human language (natural language).
Neural code generation	A process that uses artificial neural networks like neural networks work in the human brain.
Neural network model	A type of text-to-text generation model that uses artificial neural networks to generate text.
Neural networks	Computational models inspired by the human brain's structure and functioning. They are a fundamental component of deep learning and artificial intelligence.
Open lakehouse architecture	A data lakehouse architecture that combines elements of data lakes and data warehouses.
PanGu-Coder	A text-to-code transformer model developed by Microsoft Research. It is a pre-trained decoder-only language model that generates code from natural language descriptions.
Pre-trained models	A machine learning model trained on an extensive data set before being fine-tuned or adapted for a specific task or application. These models are a type of transfer learning where the knowledge gained from one task (the pre-training task) is leveraged to perform another task (the fine-tuning task).
Pre-training	A technique in which unsupervised algorithms are repeatedly given the liberty to make connections between diverse pieces of information.
Prompt	An instruction or question given to a generative AI model to generate new content.
PyTorch	An open-source machine learning framework based on the Torch library. This framework is used for applications such as computer vision and natural language processing.
Recurrent neural networks (RNNs)	Deep learning architecture designed to handle sequences of data by maintaining hidden states that capture information from previous steps in the sequence.
Seq2seq model	A text-to-text generation model that first encodes the input text into a sequence of numbers and then decodes this sequence into a new one, representing the generated text.
Statistical model	A type of text-to-text generation model that uses statistical techniques to generate text.
Supervised learning	A subset of AI and machine learning that uses labeled data sets to train algorithms to classify data or predict outcomes accurately.
T5	A text-to-text transfer transformer model developed by Google AI trained on a substantial data set of code and text. It can be used for various tasks, including summarization, translation, and question-answering.
TensorFlow	A free and open-source software library used for machine learning and artificial intelligence.
Text-to-code generation model	A type of machine learning model used to generate code from natural language descriptions. It uses generative AI to write code through neural code generation.
Text-to-image generation model	A type of machine learning model used to generate images from text descriptions. It uses generative AI to make meaning out of words and turn them into unique images.
Text-to-text generation model	A type of machine learning model used to generate text from a given input. It is trained on a large text corpus and is taught to learn patterns, grammar, and causal information. Using the given input, the models generate the new text.
Training data	Data (generally, large data sets that also have examples) used to teach a machine learning model.
Transformers	A deep learning architecture that uses an encoder-decoder mechanism. Transformers can generate coherent and contextually relevant text.
Unsupervised learning	A subset of machine learning and artificial intelligence that uses algorithms based on machine learning to analyze and cluster unlabeled data sets. These algorithms can discover hidden patterns or data groupings without human intervention.
Variational autoencoder (VAE)	A generative model that is a neural network model designed to learn the efficient representation of input data by encoding it into a smaller space and decoding it back to the original space.
watsonx.ai	A studio of integrated tools for working with generative AI capabilities powered by foundational models and building machine learning models.
watsonx.data	A massive, curated data repository that can be used to train and fine-tune models with a state-of-the-art data management system.
watsonx.governance	A powerful toolkit to direct, manage, and monitor your organization's AI activities.