ilteris kaplan blog

AI as a Service

January 19, 2024

Creating a platform that provides AI as a service can be a substantial endeavor, similar to what OpenAI offers with its APIs. Here’s a strategy to consider:

  1. Focus on a Niche: Starting with a specific industry or application can help you create tailored solutions that meet specific needs, which can be a significant value proposition for small companies.
  2. Data Management: Offer tools for companies to manage their data, including options for using synthetic data where appropriate, to train models while respecting privacy and security.
  3. Model Training and Tuning: Provide services to train and fine-tune models, including advanced techniques like Reinforcement Learning from Human Feedback (RLHF) or Retrieval-Augmented Generation (RAG).
  4. Customer Service: Develop a strong customer support system that helps clients integrate AI into their workflows and assists with model tuning and problem-solving.
  5. Continuous Innovation: Keep abreast of state-of-the-art AI developments to continuously improve and update the services you offer.
  6. Scalability: Design your system to be scalable, so it can grow with your clients as they expand their AI capabilities.

As a customer, let’s say I run a small online retail business and I’m looking to enhance customer experience through personalized product recommendations. Here’s how I could benefit from your AI-as-a-Service platform:

  1. Define Objective: I want to develop an AI model that analyzes customer behavior and provides personalized product recommendations on my website.
  2. Access the Playground: Use your platform’s playground to upload my customer data and test pre-built recommendation algorithms.
  3. Model Customization: Utilize the platform’s tools to fine-tune the recommendation model based on my specific product categories and customer demographics.
  4. Integrate Feedback: Implement RLHF to improve the model by providing human feedback on the relevance of recommendations made during testing.
  5. Leverage RAG for Content: Use RAG to augment the model’s capability by pulling in relevant product details and reviews to enhance recommendation quality.
  6. Deploy & Monitor: Once satisfied with the performance in the playground, deploy the model to my online store and monitor its impact on sales and customer engagement.
  7. Ongoing Support: Rely on the platform’s customer service for ongoing support and to integrate new features or data sources to continuously improve the recommendation system.

This scenario allows for a hands-on approach to developing an AI model tailored to my business needs with expert support and without the need for extensive in-house AI expertise.
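To make the flow concrete, here is a minimal, self-contained Python sketch of steps 1–5: a toy co-occurrence recommender standing in for the platform’s pre-built algorithms, plus a small retrieval step in the spirit of RAG that attaches stored product reviews to each suggestion. All class and function names are illustrative, not part of any real platform API.

```python
from collections import defaultdict

class CooccurrenceRecommender:
    """Toy recommender: suggests products frequently bought together.

    A stand-in for the platform's pre-built recommendation algorithms;
    real systems would use matrix factorization or learned embeddings.
    """

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def fit(self, orders):
        # Count how often each pair of products appears in the same order.
        for order in orders:
            for a in order:
                for b in order:
                    if a != b:
                        self.counts[a][b] += 1

    def recommend(self, product, k=3):
        # Rank co-purchased products by frequency.
        ranked = sorted(self.counts[product].items(),
                        key=lambda kv: kv[1], reverse=True)
        return [p for p, _ in ranked[:k]]

def augment_with_reviews(recommendations, reviews):
    # A RAG-flavoured step: retrieve stored reviews for each recommended
    # product so the storefront can show supporting context.
    return [(p, reviews.get(p, "no reviews yet")) for p in recommendations]

orders = [
    ["tea", "kettle", "mug"],
    ["tea", "mug"],
    ["coffee", "mug"],
]
reviews = {"mug": "Sturdy and dishwasher-safe.", "kettle": "Boils fast."}

model = CooccurrenceRecommender()
model.fit(orders)
recs = model.recommend("tea")
print(augment_with_reviews(recs, reviews))
# [('mug', 'Sturdy and dishwasher-safe.'), ('kettle', 'Boils fast.')]
```

On a real platform, the RLHF step would then collect human judgments on outputs like these and feed them back into model tuning.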

To build an AI-as-a-Service platform on Google Cloud Platform (GCP) while abstracting the GCP details from your customers, you would typically combine several services to manage data, train models, and serve predictions. Here’s a step-by-step guide:

  1. Google Cloud Storage (GCS): Store and manage large amounts of data needed for AI model training and operations.
  2. BigQuery: Analyze data with SQL queries to gain insights and prepare datasets for training.
  3. Vertex AI: Use this unified platform for training, hosting, and managing ML models; it’s also where techniques like RLHF and RAG can be applied.
  4. AI Platform Data Labeling Service: For customers needing to label data for training models, this service can be instrumental.
  5. Cloud SQL or Firestore: Manage structured data required for your application, like user profiles and interaction logs.
  6. Cloud Run or Kubernetes Engine: Deploy and scale your web application and any stateless API services you develop for your platform.
  7. Cloud Functions: For event-driven, serverless computations in response to changes in data or files in GCS.
  8. Cloud Build and Cloud Deployment Manager: Automate the deployment of your platform and manage resources using infrastructure as code.
  9. API Gateway: Create, secure, and monitor APIs for clients to interact with your platform.
  10. Cloud Endpoints: Develop, deploy, and manage APIs with integrated monitoring, logging, and tracing.
  11. Identity Platform: Manage user authentication and authorization without giving direct access to Google Cloud.
  12. Cloud IAM: Control access to resources on GCP, ensuring customers can only access what they need on your platform.

By combining these services, you can build a comprehensive platform that offers a seamless AI service to your customers while abstracting the complexities of GCP.
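The point of the list above is that customers interact with a simple surface while GCP stays hidden. Here is a minimal Python sketch of that facade, with in-memory stubs standing in for Cloud Storage and Vertex AI; the class and method names are made up for illustration, not a real SDK.

```python
class PlatformClient:
    """The surface a customer sees: datasets, training, predictions.

    The storage and model stores are in-memory dicts here; in
    production they would wrap GCS and Vertex AI clients, and the
    customer would never touch GCP credentials directly.
    """

    def __init__(self):
        self._datasets = {}   # stand-in for Cloud Storage
        self._models = {}     # stand-in for Vertex AI models

    def upload_dataset(self, name, rows):
        self._datasets[name] = list(rows)
        return f"dataset/{name}"

    def train(self, model_name, dataset_name):
        # "Training" here just records the dataset size; the point is
        # the interface shape, not the learning algorithm.
        data = self._datasets[dataset_name]
        self._models[model_name] = {"trained_on": len(data)}
        return f"model/{model_name}"

    def predict(self, model_name, example):
        if model_name not in self._models:
            raise KeyError(f"model {model_name!r} is not trained")
        return {"input": example, "model": model_name}

client = PlatformClient()
client.upload_dataset("clicks", [{"user": 1, "item": "mug"}])
client.train("recommender-v1", "clicks")
print(client.predict("recommender-v1", {"user": 1}))
```

The design choice this illustrates: every GCP-specific detail lives behind the facade, so swapping a backend service later doesn’t break customer code.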

Your AI-as-a-Service platform could provide several key advantages over a direct approach with Google Cloud:

  1. Customization and Ease of Use:

    • Tailored UI/UX designed for non-tech-savvy users, simplifying the model training and deployment process.
  2. Integrated Data Solutions:

    • Pre-processing and feature engineering tools specific to retail, which might be more sophisticated than standard offerings.
  3. Domain-Specific Model Tuning:

    • Pre-trained models on retail data that can be further customized, offering a head start compared to training from scratch.
  4. Continuous Updates:

    • Automatic updates with the latest AI advancements without the need for user intervention.
  5. Expert Support:

    • Access to domain experts for model tuning and problem-solving, which Google Cloud doesn’t provide directly.
  6. Cost-Effective Scaling:

    • Optimized resource usage for cost-effective scaling tailored for retail operations.
  7. Security and Compliance:

    • Enhanced security measures and compliance with industry-specific regulations.

By abstracting the complexity of building and deploying AI models, your platform would let a small retail business focus on its core operations while leveraging advanced AI capabilities.

Even without a specific domain target, there are valuable aspects of building an AI-as-a-Service platform you can start on:

  1. Core Platform Development:

    • Design a flexible architecture that can adapt to various domains.
  2. General AI Model Research:

    • Explore and experiment with different types of AI models that can serve multiple use cases.
  3. Market Research:

    • Conduct market analysis to identify potential high-impact domains.
  4. Regulatory and Compliance Framework:

    • Develop a robust privacy and data security framework that can adapt to different industries’ regulations.
  5. Business Model Canvas:

    • Outline your business model, including value proposition, customer relationships, channels, and revenue streams.
  6. Networking:

    • Start building relationships with potential customers, partners, and industry experts.
  7. Learning and Development:

    • Enhance your understanding of AI and cloud services through continuous learning.

This groundwork will be instrumental in swiftly adapting your service to a chosen domain once you’ve made that decision.

Designing a flexible architecture for an AI-as-a-Service platform involves creating a system that is modular, scalable, and domain-agnostic. Here’s a high-level view:

1. Microservices Architecture:

  • Design your system as a collection of loosely coupled services. This allows for easier updates, scaling, and maintenance.

2. API-First Approach:

  • Develop RESTful APIs or GraphQL for internal and external communication, ensuring that different parts of your system can interact seamlessly.

3. Containerization:

  • Use Docker or similar tools to containerize your services, which allows for consistent deployment across different environments.

4. Orchestration:

  • Implement an orchestration system like Kubernetes for managing containerized services, handling scaling, and recovery.

5. Data Storage:

  • Choose a mix of databases (SQL for structured data, NoSQL for unstructured data) to provide flexibility in data handling.

6. AI Model Management:

  • Use a platform like MLflow or Kubeflow for managing the lifecycle of machine learning models.

7. CI/CD Pipeline:

  • Integrate a continuous integration and delivery pipeline to automate testing and deployment processes.

8. Security:

  • Design a robust security system with authentication, authorization, encryption, and regular audits.

9. Monitoring and Logging:

  • Implement comprehensive monitoring and logging to keep track of system health and performance.

10. User Interface:

  • Develop a user-friendly front end that allows users to interact with the AI models without needing technical expertise.

11. Scalable Compute:

  • Integrate with cloud services that provide scalable compute resources for training and inferencing models.

12. Experimentation and Testing:

  • Provide an environment for A/B testing and experimentation for new features and models.

This architecture can serve as a foundation, which you can tailor as you define your target domain and specific customer needs.
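As one concrete piece of item 6 (AI Model Management), here is a minimal Python sketch of a model registry with lifecycle stages, in the spirit of MLflow’s model registry but not its actual API; all names and the staging/production/archived stages are illustrative.

```python
class ModelRegistry:
    """Tracks numbered versions of each model through lifecycle stages.

    Each model name maps to a list of versions; at most one version at
    a time holds the 'production' stage.
    """

    def __init__(self):
        self._versions = {}   # name -> list of version dicts

    def register(self, name, artifact_uri):
        versions = self._versions.setdefault(name, [])
        version = {"version": len(versions) + 1,
                   "uri": artifact_uri,
                   "stage": "staging"}
        versions.append(version)
        return version["version"]

    def promote(self, name, version):
        # Archive whatever is currently in production, then promote.
        for v in self._versions[name]:
            if v["stage"] == "production":
                v["stage"] = "archived"
        self._versions[name][version - 1]["stage"] = "production"

    def production_uri(self, name):
        for v in self._versions[name]:
            if v["stage"] == "production":
                return v["uri"]
        return None

registry = ModelRegistry()
registry.register("recommender", "gs://bucket/models/rec/1")
registry.register("recommender", "gs://bucket/models/rec/2")
registry.promote("recommender", 2)
print(registry.production_uri("recommender"))  # gs://bucket/models/rec/2
```

Keeping this bookkeeping explicit is what lets the CI/CD pipeline (item 7) deploy "whatever is in production" without hard-coding model versions.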

Building an AI-as-a-Service platform on Google Cloud Platform (GCP) based on the high-level architecture previously described can involve the following services:

  1. Microservices Architecture:

    • Cloud Run or Google Kubernetes Engine (GKE) to manage and deploy microservices.
  2. API-First Approach:

    • API Gateway to create, secure, and manage APIs.
    • Endpoints for developing, deploying, and managing APIs with integrated monitoring.
  3. Containerization:

    • Container Registry to store and manage container images.
    • Artifact Registry to store, manage, and secure your packages.
  4. Orchestration:

    • Google Kubernetes Engine (GKE) for orchestrating containerized applications.
  5. Data Storage:

    • Cloud SQL and Cloud Spanner for relational data.
    • Firestore or Cloud Bigtable for NoSQL data.
  6. AI Model Management:

    • Vertex AI for managing the machine learning model lifecycle.
  7. CI/CD Pipeline:

    • Cloud Build for continuous integration and delivery.
    • Cloud Source Repositories for hosting source code.
  8. Security:

    • Identity-Aware Proxy and Cloud IAM for authentication and access management.
    • Security Command Center for security management and data risk analysis.
  9. Monitoring and Logging:

    • Operations Suite (formerly Stackdriver) for monitoring, logging, and diagnostics.
  10. User Interface:

    • App Engine for hosting the front end.
    • Firebase for additional frontend tools and real-time database.
  11. Scalable Compute:

    • Compute Engine for custom and scalable compute resources.
  12. Experimentation and Testing:

    • Firebase A/B Testing and Google Optimize for running experiments.

Start by setting up each service’s basic configuration and gradually integrate them as your platform’s needs evolve.
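Much of the glue in a setup like this ends up as small event-driven functions, as with the Cloud Functions entry in the first GCP list. The sketch below mimics the shape of a Cloud Storage background-event handler; the function name and return messages are made up, and the event dict is assumed to carry the bucket and name fields that GCS events provide.

```python
def on_new_training_file(event, context):
    """Sketch of a GCS-triggered serverless handler.

    `event` carries the bucket and object name; `context` holds
    trigger metadata and is unused here.
    """
    bucket = event["bucket"]
    name = event["name"]
    if not name.endswith(".csv"):
        return f"ignored {name}"
    # In a real platform this would enqueue a preprocessing or
    # retraining job rather than just returning a message.
    return f"queued gs://{bucket}/{name} for preprocessing"

print(on_new_training_file({"bucket": "training-data", "name": "orders.csv"}, None))
# queued gs://training-data/orders.csv for preprocessing
```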

Creating a complete Terraform setup for an AI-as-a-Service platform on Google Cloud would involve several resources and potentially sensitive configurations that should be carefully managed. However, I can provide you with a starting point for your Terraform configuration. This will include the setup for API Gateway and Cloud Run, which are essential components of such a platform.

Please note that for a full setup, you would need to add more details and possibly include other resources like IAM roles, networking, and data storage options.

Here is a basic Terraform script to get you started:

provider "google" {
  credentials = file("path-to-your-service-account-key.json")
  project     = "your-project-id"
  region      = "us-central1"
}

# The API Gateway resources require the google-beta provider
provider "google-beta" {
  credentials = file("path-to-your-service-account-key.json")
  project     = "your-project-id"
  region      = "us-central1"
}

# API Gateway to manage APIs
resource "google_api_gateway_gateway" "api_gateway" {
  provider   = google-beta
  project    = "your-project-id"
  region     = "us-central1"
  api_config = google_api_gateway_api_config.api_config.id
  gateway_id = "my-api-gateway"
}

resource "google_api_gateway_api" "api" {
  provider = google-beta
  project  = "your-project-id"
  api_id   = "my-api"
}

resource "google_api_gateway_api_config" "api_config" {
  provider      = google-beta
  project       = "your-project-id"
  api           = google_api_gateway_api.api.api_id
  api_config_id = "my-api-config"

  openapi_documents {
    document {
      path     = "path-to-your-openapi-spec.yaml"
      contents = filebase64("path-to-your-openapi-spec.yaml")
    }
  }
}

# Cloud Run to deploy containerized applications
resource "google_cloud_run_service" "default" {
  provider = google
  name     = "my-service"
  location = "us-central1"
  project  = "your-project-id"

  template {
    spec {
      containers {
        image = "your-container-image-url"
      }
    }
  }

  traffic {
    percent         = 100
    latest_revision = true
  }
}

# Ensure the service account has permissions to invoke the service
resource "google_cloud_run_service_iam_member" "invoker" {
  provider = google
  service  = google_cloud_run_service.default.name
  location = google_cloud_run_service.default.location
  project  = "your-project-id"
  role     = "roles/run.invoker"
  member   = "serviceAccount:service-account-email"
}

In this script:

  • Replace path-to-your-service-account-key.json with the path to your service account key.
  • Replace your-project-id with your Google Cloud project ID.
  • Replace path-to-your-openapi-spec.yaml with the path to your OpenAPI spec file.
  • Replace the container image value with the URL of your container image.
  • Replace service-account-email with the email of the service account that should have permissions to invoke the Cloud Run service.

This script sets up an API Gateway to manage your APIs and a Cloud Run service to run your containerized application. Remember that this is just a starting point, and you’ll need to expand upon this configuration based on your specific platform requirements.

To run the provided Terraform script, follow these steps:

  1. Install Terraform: If you haven’t already, download and install Terraform.
  2. Google Cloud Project Setup:

    • Create or select a Google Cloud project and enable the APIs you plan to use (API Gateway and Cloud Run).

  3. Service Account and Permissions:

    • Create a service account in your Google Cloud project and download the JSON key.
    • Assign necessary roles to the service account, such as roles/apigateway.admin for API Gateway and roles/run.admin for Cloud Run.

    You can create the service account and assign these roles through the Google Cloud CLI (gcloud). Here are the steps and commands:

  4. Set your project in the gcloud CLI:

    gcloud config set project [PROJECT_ID]
  5. Create the service account:

    gcloud iam service-accounts create [SERVICE_ACCOUNT_NAME] --description="[DESCRIPTION]" --display-name="[DISPLAY_NAME]"
  6. Assign roles to the service account:

    gcloud projects add-iam-policy-binding [PROJECT_ID] --member="serviceAccount:[SERVICE_ACCOUNT_NAME]@[PROJECT_ID].iam.gserviceaccount.com" --role="roles/apigateway.admin"
    gcloud projects add-iam-policy-binding [PROJECT_ID] --member="serviceAccount:[SERVICE_ACCOUNT_NAME]@[PROJECT_ID].iam.gserviceaccount.com" --role="roles/run.admin"
  7. Generate the key file:

    gcloud iam service-accounts keys create [PATH_TO_SAVE_JSON_KEY] --iam-account [SERVICE_ACCOUNT_NAME]@[PROJECT_ID].iam.gserviceaccount.com

Replace [PROJECT_ID], [SERVICE_ACCOUNT_NAME], [DESCRIPTION], [DISPLAY_NAME], and [PATH_TO_SAVE_JSON_KEY] with your Google Cloud project ID, desired service account name, description, display name, and the path where you want to save the JSON key respectively.

This will create a new service account, assign it the necessary roles, and download the JSON key to the specified path.

  1. Prepare Terraform Files:

    • Save the Terraform script with a .tf extension, for example main.tf.
    • Update the placeholders (like path-to-your-service-account-key.json, your-project-id, path-to-your-openapi-spec.yaml, the container image value, and service-account-email) with actual values.
  2. Initialize Terraform:

    • Open a terminal and navigate to the directory containing your Terraform file.
    • Run terraform init to initialize the directory based on the configuration files.
  3. Create a Terraform Plan:

    • Run terraform plan to see what actions Terraform will perform.
  4. Apply the Terraform Configuration:

    • If the plan looks correct, run terraform apply to apply the configuration.
    • Confirm the actions by typing yes when prompted.

After these steps, Terraform will communicate with Google Cloud to create the resources defined in your Terraform script. Always ensure that your Terraform configurations are secure and do not expose sensitive information, especially when versioning them with source control systems like git.

Written by Ilteris Kaplan who still lives and works in New York. Twitter