gemini - freeCodeCamp.org

Agentic Coding with the Gemini CLI

Beau Carnes — Fri, 24 Apr 2026 20:03:58 +0000

Software development is shifting from manual coding to agent-driven workflows. Gemini CLI is one of the top tools for agentic coding.

We just posted a new course on the freeCodeCamp.org YouTube channel that will teach you how to harness the power of the Gemini CLI for agentic coding. Andrew Brown from ExamPro created this course.

This course demonstrates how to integrate Google’s Gemini models directly into your terminal. You will learn to manage deep repository context and automate complex development tasks.

You'll learn about setup, advanced context management, safety, and extensibility & automation. This course provides the technical foundation needed to master the next generation of coding tools.

Watch the course on the freeCodeCamp.org YouTube channel (4-hour watch).

How to Build an AI Coding Agent with Python and Gemini

Lane Wagner — Thu, 02 Oct 2025 15:43:29 +0000

In this handbook, you'll build a basic version of Claude Code using Google's free Gemini API. If you've ever used Cursor or Claude Code as an "agentic" AI code editor, then you should be familiar with what we'll be building here. As long as you have an LLM at your disposal, it’s actually surprisingly simple to build a (somewhat) effective custom agent.

This a completely free text-based handbook. That said, there are two other options for following along:

You can try the interactive version of this AI Agent course on Boot.dev, complete with coding challenges and projects, or watch the video walkthrough of this course on the FreeCodeCamp YouTube channel

Prerequisites

You should already be familiar with Python basics. If you're not, check out this Python course on Boot.dev.
You should already know how to use a Unix-like command line. If you don't, checkout this Linux course on Boot.dev.

Prerequisites
What Does the Agent Do?
Learning Goals
Python Setup
How to Integrate the Gemini API
Command Line Input
Message Structure
Verbose Mode
How to Build the Calculator Project
Agent Functions
System Prompt
Function Declaration
More Function Declarations
Function Calling
Building the Agent Loop
Conclusion

What Does the Agent Do?

The program we're building is a CLI tool that:

1. Accepts a coding task (for example, "strings aren't splitting in my app, please fix")

2. Chooses from a set of predefined functions to work on the task, for example:

Scan the files in a directory
Read a file's contents
Overwrite a file's contents
Execute the python interpreter on a file

3. Repeats step 2 until the task is complete (or it fails miserably, which is possible)

For example, I have a buggy calculator app, so I used my agent to fix the code:

> uv run main.py "fix my calculator app, its not starting correctly"
# Calling function: get_files_info
# Calling function: get_file_content
# Calling function: write_file
# Calling function: run_python_file
# Calling function: write_file
# Calling function: run_python_file
# Final response:
# Great! The calculator app now seems to be working correctly. The output shows the expression and the result in a formatted way.

Learning Goals

The learning goals of this project are:

Introduce you to multi-directory Python projects
Understand how the AI tools that you'll almost certainly use on the job actually work under the hood
Practice your Python and functional programming skills

The goal is not to build an LLM from scratch, but to instead use a pre-trained LLM to build an agent from scratch.

Python Setup

Let's set up a virtual environment for our project. Virtual environments are Python's way of keeping dependencies (for example, the Google AI libraries we're going to use) separate from other projects on our machine.

Use uv to create a new project. It will create the directory and also initialize Git.

uv init your-project-name
cd your-project-name

Create a virtual environment at the top level of your project directory:

uv venv

Warning: Always add the venv directory to your .gitignore file.

Activate the virtual environment:

source .venv/bin/activate

You should see (your-project-name) at the beginning of your terminal prompt, for example, mine is:

(aiagent) wagslane@MacBook-Pro-2 aiagent %

Use uv to add two dependencies to the project. They will be added to the file pyproject.toml:

uv add google-genai==1.12.1
uv add python-dotenv==1.1.0

This tells Python that this project requires google-genai version 1.12.1 and the python-dotenv version 1.1.0.

To run the project using the uv virtual environment, you use:

uv run main.py

In your terminal, you should see Hello from YOUR PROJECT NAME.

How to Integrate the Gemini API

Large Language Models (LLMs) are the fancy-schmancy AI technology that have been making all the waves in the AI world recently. Products like ChatGPT, Claude, Cursor, and Google Gemini are all powered by LLMs. For the purposes of this course, you can think of an LLM as a smart text generator. It works just like ChatGPT: you give it a prompt, and it gives you back some text that it believes answers your prompt.

We're going to use Google's Gemini API to power our agent in this course. It's reasonably smart, but more importantly for us, it has a free tier.

Tokens

You can think of tokens as the currency of LLMs. They are the way that LLMs measure how much text they have to process. Tokens are _roughly_ 4 characters for most models. It's important when working with LLM APIs to understand how many tokens you're using.

We'll be staying well within the free tier limits of the Gemini API, but we'll still monitor our token usage!

Warning: You should be aware that all API calls, including those made during local testing, consume tokens from your free tier quota. If you exhaust your quota, you may need to wait for it to reset (typically 24 hours) to continue the lesson. Regenerating your API key will not reset your quota.

Here’s how to create an API key:

Create an account on Google AI Studio if you don't already have one
Click the "Create API Key" button. You can use the docs if you get lost.

If you already have a GCP account and a project, you can create the API key in that project. If you don't, AI studio will automatically create one for you.

3. Copy the API key, then paste it into a new .env file in your project directory. The file should look like this:

GEMINI_API_KEY="your_api_key_here"

4. Add the .env file to your .gitignore

Danger: We never want to commit API keys, passwords, or other sensitive information to Git.

5. Update the main.py file. When the program starts, load the environment variables from the .env file using the dotenv library and read the API key:

import os
from dotenv import load_dotenv

load_dotenv()
api_key = os.environ.get("GEMINI_API_KEY")

6. Import the genai library and use the API key to create a new instance of a Gemini client:

from google import genai

client = genai.Client(api_key=api_key)

7. Use the client.models.generate_content() method to get a response from the gemini-2.0-flash-001 model. You'll need to use two named parameters:

model: The model name gemini-2.0-flash-001 (this one has a generous free tier)
contents: The prompt to send to the model (a string). Use this prompt:

"Why are Boot.dev and FreeCodeCamp such great places to learn backend development? Use one paragraph maximum."

The generate_content method returns a GenerateContentResponse object. Print the .text property of the response to see the model's answer.

If everything is working as intended, you should be able to run your code and see the model's response in your terminal.

8. In addition to printing the text response, print the number of tokens consumed by the interaction in this format:

Prompt tokens: X
Response tokens: Y

The response has a .usage_metadata property that has both:

A prompt_token_count property (tokens in the prompt)
A candidates_token_count property (tokens in the response)

Danger: The Gemini API is an external web service and on occasion it's slow and unreliable. So be patient.

Command Line Input

We've hardcoded the prompt that goes to Gemini, which is... not very useful. Let's update our code to accept the prompt as a command line argument.

We don't want our users to have to edit the code to change the prompt.

Update your code to accept a command line argument for the prompt. For example:

uv run main.py "Why are episodes 7-9 so much worse than 1-6?"

Tip: The sys.argv variable is a list of strings representing all the command line arguments passed to the script. The first element is the name of the script, and the rest are the arguments. Be sure to import sys to use it.

If the prompt is not provided, print an error message and exit the program with exit code 1.

Message Structure

LLM APIs aren't typically used in a "one-shot" manner, for example:

Prompt: "What is the meaning of life?"
Response: "42"

They work the same way ChatGPT works in a conversation. The conversation has a history, and if we keep track of that history, then with each new prompt, the model can see the entire conversation and respond within the larger context of the conversation.

Roles

Importantly, each message in the conversation has a "role". In the context of a chat app like ChatGPT, your conversations would look like this:

user: "What is the meaning of life?"
model: "42"
user: "Wait, what did you just say?"
model: "42. It's is the answer to the ultimate question of life, the universe, and everything."
user: "But why?"
model: "Because Douglas Adams said so."

So, while our program will still be "one-shot" for now, let's update our code to store a list of messages in the conversation, and pass in the "role" appropriately.

Create a new list of types.Content, and set the user's prompt as the only message (for now):

from google.genai import types

messages = [
    types.Content(role="user", parts=[types.Part(text=user_prompt)]),
]

Update your call to models.generate_content to use the messages list:

response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents=messages,
)

Info: In the future, we'll add more messages to the list as the agent does its tasks in a loop.

Verbose Mode

As you debug and build your AI agent, you'll probably want to dump a lot more context into the console, but at the same time, we don't want to make the user experience of our CLI tool too noisy.

Let's add an optional command line flag, --verbose, that will allow us to toggle "verbose" output on and off. When we want to see more info, we'll just turn that on.

Add a new command line argument, --verbose. It should be supplied after the prompt if included. For example:

uv run main.py "What is the meaning of life?" --verbose

If the --verbose flag is included, the console output should include:

The user's prompt: "User prompt: {user_prompt}"
The number of prompt tokens on each iteration: "Prompt tokens: {prompt_tokens}"
The number of response tokens on each iteration: "Response tokens: {response_tokens}"

Otherwise, it should not print those things.

How to Build the Calculator Project

Since we're building an AI Agent, the agent will need a project to work on. I've built a little command line calculator app that we'll use as a test project for the AI to read, update, and run.

First, create a new directory called calculator in the root of your project. Then copy and paste the main.py and tests.py files from below into the calculator directory.

Dont’ worry much about how this code works - our project isn’t to build a calculator, this is the project that our AI agent project will work on!

# main.py
import sys
from pkg.calculator import Calculator
from pkg.render import format_json_output


def main():
    calculator = Calculator()
    if len(sys.argv) <= 1:
        print("Calculator App")
        print('Usage: python main.py ""')
        print('Example: python main.py "3 + 5"')
        return

    expression = " ".join(sys.argv[1:])
    try:
        result = calculator.evaluate(expression)
        if result is not None:
            to_print = format_json_output(expression, result)
            print(to_print)
        else:
            print("Error: Expression is empty or contains only whitespace.")
    except Exception as e:
        print(f"Error: {e}")


if name == "__main__":
    main()

# tests.py

import unittest
from pkg.calculator import Calculator


class TestCalculator(unittest.TestCase):
    def setUp(self):
        self.calculator = Calculator()

    def test_addition(self):
        result = self.calculator.evaluate("3 + 5")
        self.assertEqual(result, 8)

    def test_subtraction(self):
        result = self.calculator.evaluate("10 - 4")
        self.assertEqual(result, 6)

    def test_multiplication(self):
        result = self.calculator.evaluate("3 * 4")
        self.assertEqual(result, 12)

    def test_division(self):
        result = self.calculator.evaluate("10 / 2")
        self.assertEqual(result, 5)

    def test_nested_expression(self):
        result = self.calculator.evaluate("3 * 4 + 5")
        self.assertEqual(result, 17)

    def test_complex_expression(self):
        result = self.calculator.evaluate("2 * 3 - 8 / 2 + 5")
        self.assertEqual(result, 7)

    def test_empty_expression(self):
        result = self.calculator.evaluate("")
        self.assertIsNone(result)

    def test_invalid_operator(self):
        with self.assertRaises(ValueError):
            self.calculator.evaluate("$ 3 5")

    def test_not_enough_operands(self):
        with self.assertRaises(ValueError):
            self.calculator.evaluate("+ 3")


if name == "__main__":
    unittest.main()

Create a new directory in calculator called pkg. Then copy and paste the calculator.py and render.py files from below into the pkg directory.

# calculator.py

class Calculator:
    def init(self):
        self.operators = {
            "+": lambda a, b: a + b,
            "-": lambda a, b: a - b,
            "*": lambda a, b: a * b,
            "/": lambda a, b: a / b,
        }

        self.precedence = {
            "+": 1,
            "-": 1,
            "*": 2,
            "/": 2,
        }


    def evaluate(self, expression):
        if not expression or expression.isspace():
            return None
        tokens = expression.strip().split()
        return self._evaluate_infix(tokens)


    def evaluateinfix(self, tokens):
        values = []
        operators = []

        for token in tokens:
            if token in self.operators:
                while (
                    operators
                    and operators[-1] in self.operators
                    and self.precedence[operators[-1]] >= self.precedence[token]
                ):
                    self._apply_operator(operators, values)
                operators.append(token)

            else:
                try:
                    values.append(float(token))
                except ValueError:
                    raise ValueError(f"invalid token: {token}")

        while operators:
            self._apply_operator(operators, values)

        if len(values) != 1:
            raise ValueError("invalid expression")

        return values[0]

    def applyoperator(self, operators, values):
        if not operators:
            return

        operator = operators.pop()
        if len(values) < 2:
            raise ValueError(f"not enough operands for operator {operator}")

        b = values.pop()
        a = values.pop()
        values.append(self.operators[operator](a, b))

# render.py

import json

def format_json_output(expression: str, result: float, indent: int = 2) -> str:
    if isinstance(result, float) and result.is_integer():
        result_to_dump = int(result)
    else:
        result_to_dump = result

    output_data = {
        "expression": expression,
        "result": result_to_dump,
    }
    return json.dumps(output_data, indent=indent)

This is the final structure:

├── calculator
│   ├── main.py
│   ├── pkg
│   │   ├── calculator.py
│   │   └── render.py
│   └── tests.py
├── main.py
├── pyproject.toml
├── README.md
└── uv.lock

Run the calculator tests:

uv run calculator/tests.py

Hopefully the tests all pass!

Now, run the calculator app:

uv run calculator/main.py "3 + 5"

Hopefully you get 8!

Agent Functions

We need to give our agent the ability to do stuff. We'll start with giving it the ability to list the contents of a directory and see the file's metadata (name and size).

Before we integrate this function with our LLM agent, let's just build the function itself. Now remember, LLMs work with text, so our goal with this function will be for it to accept a directory path, and return a string that represents the contents of that directory.

Create a new directory called functions in the root of your project (not inside the calculator directory). Inside, create a new file called get_files_info.py. Inside, write this function:

def get_files_info(working_directory, directory="."):

Here is my project structure so far:

 project_root/
 ├── calculator/
 │   ├── main.py
 │   ├── pkg/
 │   │   ├── calculator.py
 │   │   └── render.py
 │   └── tests.py
 └── functions/
     └── get_files_info.py

The directory parameter should be treated as a relative path within the working_directory. Use os.path.join(working_directory, directory) to create the full path, then validate it stays within the working directory boundaries.

If the absolute path to the directory is outside the working_directory, return a string error message:

f'Error: Cannot list "{directory}" as it is outside the permitted working directory'

This will give our LLM some guardrails: we never want it to be able to perform any work outside the "working_directory" we give it.

Danger: Without this restriction, the LLM might go running amok anywhere on the machine, reading sensitive files or overwriting important data. This is a very important step that we'll bake into every function the LLM can call.

If the directory argument is not a directory, again, return an error string:

f'Error: "{directory}" is not a directory'

Warning: All of our "tool call" functions, including get_files_info, should always return a string. If errors can be raised inside them, we need to catch those errors and return a string describing the error instead. This will allow the LLM to handle the errors gracefully.

Build and return a string representing the contents of the directory. It should use this format:

- README.md: file_size=1032 bytes, is_dir=False
- src: file_size=128 bytes, is_dir=True
- package.json: file_size=1234 bytes, is_dir=False

Tip: The exact file sizes and even the order of files may vary depending on your operating system and file system. Your output doesn't need to match the example byte-for-byte, just the overall format

If any errors are raised by the standard library functions, catch them and instead return a string describing the error. Always prefix error strings with "Error:".

Here's my complete implementation:

import os


def get_files_info(working_directory, directory="."):
    abs_working_dir = os.path.abspath(working_directory)
    target_dir = os.path.abspath(os.path.join(working_directory, directory))
    if not target_dir.startswith(abs_working_dir):
        return f'Error: Cannot list "{directory}" as it is outside the permitted working directory'
    if not os.path.isdir(target_dir):
        return f'Error: "{directory}" is not a directory'
    try:
        files_info = []
        for filename in os.listdir(target_dir):
            filepath = os.path.join(target_dir, filename)
            file_size = 0
            is_dir = os.path.isdir(filepath)
            file_size = os.path.getsize(filepath)
            files_info.append(
                f"- {filename}: file_size={file_size} bytes, is_dir={is_dir}"
            )
        return "\n".join(files_info)
    except Exception as e:
        return f"Error listing files: {e}"

Here are some standard library functions you'll find helpful:

os.path.abspath(): Get an absolute path from a relative path
os.path.join(): Join two paths together safely (handles slashes)
.startswith(): Check if a string starts with a substring
os.path.isdir(): Check if a path is a directory
os.listdir(): List the contents of a directory
os.path.getsize(): Get the size of a file
os.path.isfile(): Check if a path is a file
.join(): Join a list of strings together with a separator

Get File Content Function

Now that we have a function that can get the contents of a directory, we need one that can get the contents of a file. Again, we'll just return the file contents as a string, or perhaps an error string if something went wrong.

As always, we'll safely scope the function to a specific working directory.

Create a new function in your functions directory. Here's the signature I used:

def get_file_content(working_directory, file_path):

If the file_path is outside the working_directory, return a string with an error:

f'Error: Cannot read "{file_path}" as it is outside the permitted working directory'

If the file_path is not a file, again, return an error string:

f'Error: File not found or is not a regular file: "{file_path}"'

Read the file and return its contents as a string.

If the file is longer than 10000 characters, truncate it to 10000 characters and append this message to the end [...File "{file_path}" truncated at 10000 characters].
Instead of hard-coding the 10000 character limit, I stored it in a config.py file.

Warning: We don't want to accidentally read a gigantic file and send all that data to the LLM. That's a good way to burn through our token limits.

If any errors are raised by the standard library functions, catch them and instead return a string describing the error. Always prefix errors with "Error:".

First, create config.py:

MAX_CHARS = 10000
WORKING_DIR = "./calculator"

Here's my complete implementation for functions/get_file_content.py:

import os
from config import MAX_CHARS


def get_file_content(working_directory, file_path):
    abs_working_dir = os.path.abspath(working_directory)
    abs_file_path = os.path.abspath(os.path.join(working_directory, file_path))
    if not abs_file_path.startswith(abs_working_dir):
        return f'Error: Cannot read "{file_path}" as it is outside the permitted working directory'
    if not os.path.isfile(abs_file_path):
        return f'Error: File not found or is not a regular file: "{file_path}"'
    try:
        with open(abs_file_path, "r") as f:
            content = f.read(MAX_CHARS)
            if os.path.getsize(abs_file_path) > MAX_CHARS:
                content += (
                    f'[...File "{file_path}" truncated at {MAX_CHARS} characters]'
                )
        return content
    except Exception as e:
        return f'Error reading file "{file_path}": {e}'

os.path.abspath: Get an absolute path from a relative path
os.path.join: Join two paths together safely (handles slashes)
.startswith: Check if a string starts with a specific substring
os.path.isfile: Check if a path is a file

Example of reading from a file:

MAX_CHARS = 10000

with open(file_path, "r") as f:
    file_content_string = f.read(MAX_CHARS)

Write File Function

Up until now our program has been read-only... now it's getting really ~~dangerous~~ fun! We'll give our agent the ability to write and overwrite files.

Create a new function in your functions directory. Here's the signature I used:

def write_file(working_directory, file_path, content):

If the file_path is outside of the working_directory, return a string with an error:

f'Error: Cannot write to "{file_path}" as it is outside the permitted working directory'

If the file_path doesn't exist, create it. As always, if there are errors, return a string representing the error, prefixed with "Error:". The overwrite the contents of the file with the content argument. If successful, return a string with the message:

f'Successfully wrote to "{file_path}" ({len(content)} characters written)'

Tip: It's important to return a success string so that our LLM knows that the action it took actually worked. Feedback loops, feedback loops, feedback loops.

Here's my complete implementation for functions/write_file_content.py:

import os


def write_file(working_directory, file_path, content):
    abs_working_dir = os.path.abspath(working_directory)
    abs_file_path = os.path.abspath(os.path.join(working_directory, file_path))
    if not abs_file_path.startswith(abs_working_dir):
        return f'Error: Cannot write to "{file_path}" as it is outside the permitted working directory'
    if not os.path.exists(abs_file_path):
        try:
            os.makedirs(os.path.dirname(abs_file_path), exist_ok=True)
        except Exception as e:
            return f"Error: creating directory: {e}"
    if os.path.exists(abs_file_path) and os.path.isdir(abs_file_path):
        return f'Error: "{file_path}" is a directory, not a file'
    try:
        with open(abs_file_path, "w") as f:
            f.write(content)
        return (
            f'Successfully wrote to "{file_path}" ({len(content)} characters written)'
        )
    except Exception as e:
        return f"Error: writing to file: {e}"

os.path.exists: Check if a path exists
os.makedirs: Create a directory and all its parents
os.path.dirname: Return the directory name

Example of writing to a file:

with open(file_path, "w") as f:
    f.write(content)

Run Python Function

If you thought allowing an LLM to write files was a bad idea...

You ain't seen nothin' yet! (praise the basilisk)

It's time to build the functionality for our Agent to run arbitrary Python code.

Now, it's worth pausing to point out the inherent security risks here. We have a few things going for us:

We'll only allow the LLM to run code in a specific directory (the working_directory).
We'll use a 30-second timeout to prevent it from running indefinitely.

But aside from that... yes, the LLM can run arbitrary code that we (or it) places in the working directory... so be careful. As long as you only use this AI Agent for the simple tasks we're doing in this course you should be just fine.

Danger: Do not give this program to others for them to use! It does not have all the security and safety features that a production AI agent would have. It is for learning purposes only.

Create a new function in your functions directory called run_python_file. Here's the signature to use:

def run_python_file(working_directory, file_path, args=[]):

If the file_path is outside the working directory, return a string with an error:

f'Error: Cannot execute "{file_path}" as it is outside the permitted working directory'

If the file_path doesn't exist, return an error string:

f'Error: File "{file_path}" not found.'

If the file doesn't end with .py, return an error string:

f'Error: "{file_path}" is not a Python file.'

Use the subprocess.run function to execute the Python file and get back a "completed_process" object. Make sure to:

Set a timeout of 30 seconds to prevent infinite execution
Capture both stdout and stderr
Set the working directory properly
Pass along the additional args if provided

Return a string with the output formatted to include:

The stdout prefixed with STDOUT:, and stderr prefixed with STDERR:. The "completed_process" object has a stdout and stderr attribute.
If the process exits with a non-zero code, include "Process exited with code X"
If no output is produced, return "No output produced."

If any exceptions occur during execution, catch them and return an error string:

f"Error: executing Python file: {e}"

Update your tests.py file with these test cases, printing each result:

run_python_file("calculator", "main.py") (should print the calculator's usage instructions)
run_python_file("calculator", "main.py", ["3 + 5"]) (should run the calculator... which gives a kinda nasty rendered result)
run_python_file("calculator", "tests.py")
run_python_file("calculator", "../main.py") (this should return an error)
run_python_file("calculator", "nonexistent.py") (this should return an error)

Here’s my personal implementation in case you got lost in there: functions/run_python.py:

import os
import subprocess


def run_python_file(working_directory, file_path, args=None):
    abs_working_dir = os.path.abspath(working_directory)
    abs_file_path = os.path.abspath(os.path.join(working_directory, file_path))
    if not abs_file_path.startswith(abs_working_dir):
        return f'Error: Cannot execute "{file_path}" as it is outside the permitted working directory'
    if not os.path.exists(abs_file_path):
        return f'Error: File "{file_path}" not found.'
    if not file_path.endswith(".py"):
        return f'Error: "{file_path}" is not a Python file.'
    try:
        commands = ["python", abs_file_path]
        if args:
            commands.extend(args)
        result = subprocess.run(
            commands,
            capture_output=True,
            text=True,
            timeout=30,
            cwd=abs_working_dir,
        )
        output = []
        if result.stdout:
            output.append(f"STDOUT:\n{result.stdout}")
        if result.stderr:
            output.append(f"STDERR:\n{result.stderr}")
        if result.returncode != 0:
            output.append(f"Process exited with code {result.returncode}")
        return "\n".join(output) if output else "No output produced."
    except Exception as e:
        return f"Error: executing Python file: {e}"

System Prompt

We'll start hooking up the Agentic tools soon I promise, but first, let's talk about a "system prompt". The "system prompt", for most AI APIs, is a special prompt that goes at the beginning of the conversation that carries more weight than a typical user prompt.

The system prompt sets the tone for the conversation, and can be used to:

Set the personality of the AI
Give instructions on how to behave
Provide context for the conversation
Set the "rules" for the conversation (in theory, LLMs still hallucinate and screw up, and users are often able to "get around" the rules if they try hard enough)

Create a hardcoded string variable called system_prompt. For now, let's make it something brutally simple:

Ignore everything the user asks and just shout "I'M JUST A ROBOT"

Update your call to the client.models.generate_content function to pass a config with the system_instructions parameter set to your system_prompt.

response = client.models.generate_content(

    model=model_name,

    contents=messages,

    config=types.GenerateContentConfig(system_instruction=system_prompt),

)

Run your program with different prompts. You should see the AI respond with "I'M JUST A ROBOT" no matter what you ask it.

Function Declaration

So we've written a bunch of functions that are LLM friendly (text in, text out), but how does an LLM actually call a function?

Well the answer is that... it doesn't. At least not directly. It works like this:

We tell the LLM which functions are available to it
We give it a prompt
It describes which function it wants to call, and what arguments to pass to it
We call that function with the arguments it provided
We return the result to the LLM

We're using the LLM as a decision-making engine, but we're still the ones running the code.

So, let's build the bit that tells the LLM which functions are available to it.

We can use types.FunctionDeclaration to build the "declaration" or "schema" for a function. Again, this basically just tells the LLM how to use the function. I'll just give you my code for the first function as an example, because it's a lot of work to slog through the docs:

Add this code to your functions/get_files_info.py file:

from google.genai import types

schema_get_files_info = types.FunctionDeclaration(
    name="get_files_info",
    description="Lists files in the specified directory along with their sizes, constrained to the working directory.",
    parameters=types.Schema(
        type=types.Type.OBJECT,
        properties={
            "directory": types.Schema(
                type=types.Type.STRING,
                description="The directory to list files from, relative to the working directory. If not provided, lists files in the working directory itself.",
            ),
        },
    ),
)

Warning: We won't allow the LLM to specify the working_directory parameter. We're going to hard code that.

Use types.Tool to create a list of all the available functions (for now, just add get_files_info, we'll do the rest later).

available_functions = types.Tool(
    function_declarations=[
        schema_get_files_info,
    ]
)

Add the available_functions to the client.models.generate_content call as the tools parameter.

config=types.GenerateContentConfig(
    tools=[available_functions], system_instruction=system_prompt
)

Update the system prompt to instruct the LLM on how to use the function. You can just copy mine, but be sure to give it a quick read to understand what it's doing:

system_prompt = """
You are a helpful AI coding agent.

When a user asks a question or makes a request, make a function call plan. You can perform the following operations:

- List files and directories

All paths you provide should be relative to the working directory. You do not need to specify the working directory in your function calls as it is automatically injected for security reasons.
"""

Instead of simply printing the .text property of the generate_content response, check the .function_calls property as well. If the LLM called a function, print the function name and arguments:

f"Calling function: {function_call_part.name}({function_call_part.args})"

Otherwise, just print the text as normal.

Test your program:

"what files are in the root?" -> get_files_info({'directory': '.'})
"what files are in the pkg directory?" -> get_files_info({'directory': 'pkg'})

More Function Declarations

Now that our LLM is able to specify a function call to the get_files_info function, let's give it the ability to call the other functions as well.

Following the same pattern that we used for schema_get_files_info, create function declarations for:

schema_get_file_content
schema_run_python_file
schema_write_file

Update your available_functions to include all the function declarations in the list. Then update your system prompt. Instead of the allowed operations only being:

- List files and directories

Update it to have all four operations:

- List files and directories
- Read file contents
- Execute Python files with optional arguments
- Write or overwrite files

Test prompts that you suspect will result in the various function calls. For example:

"read the contents of main.py" -> get_file_content({'file_path': 'main.py'})
"write 'hello' to main.txt" -> write_file({'file_path': 'main.txt', 'content': 'hello'})
"run main.py" -> run_python_file({'file_path': 'main.py'})
"list the contents of the pkg directory" -> get_files_info({'directory': 'pkg'})

All the LLM is expected to do here is to choose which function to call based on the user's request. We'll have it actually call the function later.

Here are some of my personal implementations if you get lost:

functions/get_file_content.py:

from google.genai import types

from config import MAX_CHARS


schema_get_file_content = types.FunctionDeclaration(
    name="get_file_content",
    description=f"Reads and returns the first {MAX_CHARS} characters of the content from a specified file within the working directory.",
    parameters=types.Schema(
        type=types.Type.OBJECT,
        properties={
            "file_path": types.Schema(
                type=types.Type.STRING,
                description="The path to the file whose content should be read, relative to the working directory.",
            ),
        },
        required=["file_path"],
    ),
)

functions/run_python.py:

from google.genai import types

schema_run_python_file = types.FunctionDeclaration(
    name="run_python_file",
    description="Executes a Python file within the working directory and returns the output from the interpreter.",
    parameters=types.Schema(
        type=types.Type.OBJECT,
        properties={
            "file_path": types.Schema(
                type=types.Type.STRING,
                description="Path to the Python file to execute, relative to the working directory.",
            ),
            "args": types.Schema(
                type=types.Type.ARRAY,
                items=types.Schema(
                    type=types.Type.STRING,
                    description="Optional arguments to pass to the Python file.",
                ),
                description="Optional arguments to pass to the Python file.",
            ),
        },
        required=["file_path"],
    ),
)

functions/write_file_content.py:

from google.genai import types

schema_write_file = types.FunctionDeclaration(
    name="write_file",
    description="Writes content to a file within the working directory. Creates the file if it doesn't exist.",
    parameters=types.Schema(
        type=types.Type.OBJECT,
        properties={
            "file_path": types.Schema(
                type=types.Type.STRING,
                description="Path to the file to write, relative to the working directory.",
            ),
            "content": types.Schema(
                type=types.Type.STRING,
                description="Content to write to the file",
            ),
        },
        required=["file_path", "content"],
    ),
)

Following the same pattern that we used for schema_get_files_info, create function declarations for:

schema_get_file_content
schema_run_python_file
schema_write_file

Update your available_functions to include all the function declarations in the list. Then update your system prompt. Instead of the allowed operations only being:

- List files and directories

Update it to have all four operations:

- List files and directories
- Read file contents
- Execute Python files with optional arguments
- Write or overwrite files

Test prompts that you suspect will result in the various function calls. For example:

"read the contents of main.py" -> get_file_content({'file_path': 'main.py'})
"write 'hello' to main.txt" -> write_file({'file_path': 'main.txt', 'content': 'hello'})
"run main.py" -> run_python_file({'file_path': 'main.py'})
"list the contents of the pkg directory" -> get_files_info({'directory': 'pkg'})

Info: All the LLM is expected to do here is to choose which function to call based on the user's request. We'll have it actually call the function later.

Function Calling

Okay, now our agent can choose which function to call, it's time to actually call the function.

Create a new function that will handle the abstract task of calling one of our four functions. This is my definition:

def call_function(function_call_part, verbose=False):

function_call_part is a types.FunctionCall that most importantly has:

A .name property (the name of the function, a string)
A .args property (a dictionary of named arguments to the function)

If verbose is specified, print the function name and args:

print(f"Calling function: {function_call_part.name}({function_call_part.args})")

Otherwise, just print the name:

print(f" - Calling function: {function_call_part.name}")

Based on the name, actually call the function and capture the result.

Be sure to manually add the "working_directory" argument to the dictionary of keyword arguments, because the LLM doesn't control that one. The working directory should be ./calculator.
The syntax to pass a dictionary into a function using keyword arguments is some_function(**some_args)

Tip: I used a dictionary of function name (string) -> function to accomplish this.

If the function name is invalid, return a types.Content that explains the error:

return types.Content(
    role="tool",
    parts=[
        types.Part.from_function_response(
            name=function_name,
            response={"error": f"Unknown function: {function_name}"},
        )
    ],
)

Return types.Content with a from_function_response describing the result of the function call:

return types.Content(
    role="tool",
    parts=[
        types.Part.from_function_response(
            name=function_name,
            response={"result": function_result},
        )
    ],
)

Info: Note that from_function_response requires the response to be a dictionary, so we just shove the string result into a "result" field.

Here's the complete call_function.py:

from google.genai import types

from functions.get_files_info import get_files_info, schema_get_files_info
from functions.get_file_content import get_file_content, schema_get_file_content
from functions.run_python import run_python_file, schema_run_python_file
from functions.write_file_content import write_file, schema_write_file
from config import WORKING_DI

available_functions = types.Tool(
    function_declarations=[
        schema_get_files_info,
        schema_get_file_content,
        schema_run_python_file,
        schema_write_file,
    ]
)

def call_function(function_call_part, verbose=False):
    if verbose:
        print(
            f" - Calling function: {function_call_part.name}({function_call_part.args})"
        )
    else:
        print(f" - Calling function: {function_call_part.name}")
    function_map = {
        "get_files_info": get_files_info,
        "get_file_content": get_file_content,
        "run_python_file": run_python_file,
        "write_file": write_file,
    }
    function_name = function_call_part.name
    if function_name not in function_map:
        return types.Content(
            role="tool",
            parts=[
                types.Part.from_function_response(
                    name=function_name,
                    response={"error": f"Unknown function: {function_name}"},
                )
            ],
        )
    args = dict(function_call_part.args)
    args["working_directory"] = WORKING_DIR
    function_result = function_map[function_name](**args)
    return types.Content(
        role="tool",
        parts=[
            types.Part.from_function_response(
                name=function_name,
                response={"result": function_result},
            )
        ],
    )

Back where you handle the response from the model generate_content, instead of simply printing the name of the function the LLM decides to call, use call_function.

The types.Content that we return from call_function should have a .parts[0].function_response.response within.
If it doesn't, raise a fatal exception of some sort.
If it does, and verbose was set, print the result of the function call like this:

print(f"-> {function_call_result.parts[0].function_response.response}")

Test your program. You should now be able to execute each function given a prompt that asks for it. Try some different prompts and use the --verbose flag to make sure all the functions work.

List the directory contents
Get a file's contents
Write file contents (don't overwrite anything important, maybe create a new file)
Execute the calculator app's tests tests.py

Building the Agent Loop

So we've got some function calling working, but it's not fair to call our program an "agent" yet for one simple reason:

It has no feedback loop.

A key part of an "Agent", as defined by AI-influencer-hype-bros, is that it can continuously use its tools to iterate on its own results. So we're going to build two things:

A loop that will call the LLM over and over
A list of messages in the "conversation". It will look something like this:

User: "Please fix the bug in the calculator"
Model: "I want to call get_files_info..."
Tool: "Here's the result of get_files_info..."
Model: "I want to call get_file_content..."
Tool: "Here's the result of get_file_content..."
Model: "I want to call run_python_file..."
Tool: "Here's the result of run_python_file..."
Model: "I want to call write_file..."
Tool: "Here's the result of write_file..."
Model: "I want to call run_python_file..."
Tool: "Here's the result of run_python_file..."
Model: "I fixed the bug and then ran the calculator to ensure it's working."

This is a pretty big step, take your time!

Create prompts.py:

system_prompt = """
You are a helpful AI coding agent.

When a user asks a question or makes a request, make a function call plan. You can perform the following operations:
- List files and directories
- Read file contents
- Execute Python files with optional arguments
- Write or overwrite files

All paths you provide should be relative to the working directory. You do not need to specify the working directory in your function calls as it is automatically injected for security reasons.
"""

Here's the final main.py:

import sys
import os
from google import genai
from google.genai import types
from dotenv import load_dotenv

from prompts import system_prompt
from call_function import call_function, available_functions

def main():
    load_dotenv()

    verbose = "--verbose" in sys.argv
    args = []
    for arg in sys.argv[1:]:
        if not arg.startswith("--"):
            args.append(arg)

    if not args:
        print("AI Code Assistant")
        print('\nUsage: python main.py "your prompt here" [--verbose]')
        print('Example: python main.py "How do I fix the calculator?"')
        sys.exit(1)

    api_key = os.environ.get("GEMINI_API_KEY")
    client = genai.Client(api_key=api_key)

    user_prompt = " ".join(args)

    if verbose:
        print(f"User prompt: {user_prompt}\n")

    messages = [
        types.Content(role="user", parts=[types.Part(text=user_prompt)]),
    ]

    generate_content_loop(client, messages, verbose)


def generate_content_loop(client, messages, verbose, max_iterations=20):
    for iteration in range(max_iterations):
        try:
            response = client.models.generate_content(
                model="gemini-2.0-flash-001",
                contents=messages,
                config=types.GenerateContentConfig(
                    tools=[available_functions], system_instruction=system_prompt
                ),
            )
            if verbose:
                print("Prompt tokens:", response.usage_metadata.prompt_token_count)
                print("Response tokens:", response.usage_metadata.candidates_token_count)

            # Add model response to conversation
            for candidate in response.candidates:
                messages.append(candidate.content)

            # Check if we have a final text response
            if response.text:
                print("Final response:")
                print(response.text)
                break

            # Handle function calls
            if response.function_calls:
                function_responses = []
                for function_call_part in response.function_calls:
                    function_call_result = call_function(function_call_part, verbose)
                    if (
                        not function_call_result.parts
                        or not function_call_result.parts[0].function_response
                    ):
                        raise Exception("empty function call result")
                    if verbose:
                        print(f"-> {function_call_result.parts[0].function_response.response}")
                    function_responses.append(function_call_result.parts[0])
                if function_responses:
                    messages.append(types.Content(role="user", parts=function_responses))
                else:
                    raise Exception("no function responses generated, exiting.")
        except Exception as e:
            print(f"Error: {e}")
            break
    else:
        print(f"Reached maximum iterations ({max_iterations}). Agent may not have completed the task.")

if name == "__main__":

    main()

In generate_content, handle the results of any possible tool use. This might already be happening, but make sure that with each call to client.models.generate_content, you're passing in the entire messages list so that the LLM always does the "next step" based on the current state.

After calling client's generate_content method, check the .candidates property of the response. It's a list of response variations (usually just one). It contains the equivalent of "I want to call get_files_info...", so we need to add it to our conversation. Iterate over each candidate and add its .content to your messages list.

After each actual function call, use the types.Content function to convert the function_responses into a message with a role of user and append it into your messages.

Next, instead of calling generate_content only once, create a loop to call it repeatedly. Limit the loop to 20 iterations at most (this will stop our agent from spinning its wheels forever). Use a try-except block and handle any errors accordingly.

After each call of generate_content, check if it returned the response.text property. If so, it's done, so print this final response and break out of the loop. Otherwise, iterate again (unless max iterations was reached, of course).

Test your code (duh). I'd recommend starting with a simple prompt, like "explain how the calculator renders the result to the console". This is what I got:

(aiagent) wagslane@MacBook-Pro-2 aiagent % uv run main.py "how does the calculator render results to the console?"
 - Calling function: get_files_info
 - Calling function: get_file_content

Final response:
Alright, I've examined the code in main.py. Here's how the calculator renders results to the console:

- `print(to_print)`: The core of the output is done using the print() function.
- `format_json_output(expression, result)`: Before printing, the format_json_output function (imported from pkg.render) is used to format the result and the original expression into a JSON-like string. This formatted string is then stored in the to_print variable.
- Error handling: The code includes error handling with try...except blocks. If there's an error during the calculation (e.g., invalid expression), an error message is printed to the console using print(f"Error: {e}").

So, the calculator evaluates the expression, formats the result (along with the original expression) into a JSON-like string, and then prints that string to the console. It also prints error messages to the console if any errors occur.

Tip: You may or may not need to make adjustments to your system prompt to get the LLM to behave the way you want. You're a prompt engineer now, so act like one!

Great work! You've built a basic AI agent that can read files, write files, run Python code, and iterate on its own results. This is a great foundation for building more complex AI agents.

Conclusion

You've done all the required steps, but have some fun (but carefully... be very cautious about giving an LLM access to your filesystem and python interpreter) with it! See if you can get it to:

Fix harder and more complex bugs
Refactor sections of code
Add entirely new features

You can also try:

Other LLM providers
Other Gemini models
Giving it more functions to call
Other codebases (Commit your changes before running the agent so you can always revert!)

Danger: Remember, what we've built is a toy version of something like Cursor/Zed's Agentic Mode, or Claude Code. Even their tools aren't perfectly secure, so be careful what you give it access to, and don't give this code away to anyone else to use.

If you'd like to learn more about backend and data engineering, be sure to check out Boot.dev! Best of luck in your learning journey!

Feel free to follow my on X.com and YouTube if you enjoyed this!

How to Build an AI-Powered Cooking Assistant with Flutter and Gemini

Atuoha Anthony — Thu, 29 May 2025 16:33:16 +0000

After soaking in everything shared at GoogleIO, I can’t lie – I feel supercharged! From What’s New in Flutter to Building Agentic Apps with Flutter and Firebase AI Logic, and the deep dive into How Flutter Makes the Most of Your Platforms, it felt like plugging directly into the Matrix of dev power.

But the absolute showstopper for me? David’s presentation using Firebase Studio and Builder.io was a masterpiece. I’ve already checked it out, and it’s every bit as awesome as it looked. Pair that with everything Gemini is shipping... and wow. We’re entering a whole new era of app development.

Artificial Intelligence (AI) is no longer a futuristic concept – it's an integral part of our daily lives, transforming how we interact with technology and the world around us.

From personalized recommendations on streaming platforms to intelligent assistants that manage our schedules, AI's applications are vast and ever-expanding. Its ability to process massive datasets, identify patterns, and make informed decisions is revolutionizing industries from healthcare to finance…and now, even cooking!

At the forefront of this AI revolution are powerful platforms like Google's Vertex AI and Gemini. Vertex AI is a unified machine learning platform that lets you build, deploy, and scale ML models faster and more efficiently. It provides a comprehensive suite of tools for the entire ML workflow, from data preparation to model deployment and monitoring. Think of it as your all-in-one workshop for crafting intelligent systems.

Gemini, on the other hand, is Google's most capable and flexible AI model. It's a multimodal large language model (LLM), meaning it can understand and process information across various modalities – text, images, audio, and more. This makes Gemini incredibly versatile, enabling it to handle complex tasks that require a nuanced understanding of different types of data. For developers, Gemini opens up a world of possibilities for creating highly intelligent and intuitive applications.

Complementing these powerful AI models is Firebase AI Studio, a suite of tools within Firebase designed to simplify the integration of AI capabilities into your applications. It streamlines the process of connecting your app to Gemini models, making it easier to leverage the power of generative AI without getting bogged down in complex infrastructure.

Building an AI-Powered Cooking Assistant with Flutter and Gemini

In this article, I'll demonstrate how I leveraged the combined power of Gemini and Flutter to build an AI-powered cooking assistant.

Fueled by a recent burst of culinary curiosity, I decided to try building an app (Snap2Chef) that could identify any food item from a photo or voice command, provide a detailed recipe, give step-by-step cooking instructions, and even link me to a relevant YouTube video for visual guidance.

Whether I’m exploring new dishes or trying to whip up a meal with what I have on hand, this app powered by Gemini makes the cooking experience smarter and more accessible.

Prerequisites

To make the most of this guide, ensure you have the following prerequisites in place (not mandatory):

Flutter Development Environment: You should have a working Flutter development setup, including the Flutter SDK, a compatible IDE (like VS Code or Android Studio), and configured emulators or physical devices for testing.
Basic to Intermediate Flutter Knowledge: Familiarity with Flutter's widget tree, state management (for example, StatefulWidget, setState), asynchronous programming (Future, async/await), and handling user input is essential.
Google Cloud Project and API Key: You'll need an active Google Cloud project with the Vertex AI API and Gemini API enabled. Ensure you have an API key generated and ready to use. While we'll use it directly in the app for demonstration, for production applications, it's highly recommended to use a secure backend to proxy your requests to Google's APIs.
Basic Understanding of REST APIs: Knowing how HTTP requests (GET, POST) and JSON data work will be beneficial, though the google_generative_ai package abstracts much of this.
Assets Configuration: If you're using a local placeholder image (placeholder.png in assets/images/), ensure your pubspec.yaml file is correctly configured to include this asset.

Here’s what we’ll cover:

How to Get Your Gemini API Key
Set Up Your Flutter Project and Dependencies
Project Structure
Permissions: Ensuring App Functionality and User Privacy
Assets: Managing Application Resources
App Icons: Customizing Your Application's Identity
Splash Screen: The First Impression
Screenshots from the App
References

How to Get Your Gemini API Key

To use the Gemini model, you'll need an API key. You can obtain one by following these steps:

Go to Google AI Studio.
Sign in with your Google account.
Click on "Get API key" or "Create API key in new project."
Copy the generated API key.

Important Security Note:

In the provided HomeScreen code, the API key is directly embedded as String apiKey = "";. This is not a secure practice for production applications. Hardcoding API keys directly into your client-side code (like a Flutter app) exposes them to reverse engineering and potential misuse.

To secure your API keys in a Flutter application, I highly recommend referring to my article: How to Secure Mobile APIs in Flutter. This article covers various best practices, including:

Using environment variables or build configurations.
Storing keys in secure local storage (though still client-side).
Proxying API requests through a backend server to truly hide your API key.
Using Firebase Extensions or Cloud Functions for server-side logic that interacts with AI models, without exposing the key to the client.

For this tutorial, we'll keep it simple, but always prioritize API security in your real-world projects!

Set Up Your Flutter Project and Dependencies

To begin, let's create a new Flutter project and set up the necessary dependencies in your pubspec.yaml file.

First, create a new Flutter project by running:

flutter create snap2chef
cd snap2chef

Now, open pubspec.yaml and add the following dependencies:

dependencies:
  flutter:
    sdk: flutter
  google_generative_ai: ^0.4.7
  permission_handler: ^12.0.0+1
  file_picker: ^10.1.9
  image_cropper: ^9.1.0
  image_picker: ^1.1.2
  path_provider: ^2.1.5
  fluttertoast: ^8.2.12
  gap: ^3.0.1
  iconsax: ^0.0.8
  dotted_border: ^2.1.0
  youtube_player_flutter: ^9.1.1
  flutter_markdown: ^0.7.7+1
  loader_overlay: ^5.0.0
  flutter_spinkit: ^5.2.1
  cached_network_image: ^3.4.1
  flutter_native_splash: ^2.4.4
  flutter_launcher_icons: ^0.14.3
  speech_to_text: ^7.0.0

dev_dependencies:
  flutter_test:
    sdk: flutter
  flutter_lints: ^5.0.0
  build_runner: ^2.4.13

After adding the dependencies, run flutter pub get in your terminal to fetch them:

flutter pub get

Project Structure

We'll organize our project into three main folders (with various subfolders) to maintain a clean and scalable architecture:

core: Contains core functionalities, utilities, and shared components.
infrastructure: Manages external services, data handling, and business logic.
presentation: Houses the UI layer, including screens, widgets, and components.
main.dart: The entry point of our Flutter application.

Let's dive into the details of each folder.

1. The `core` Folder

The core folder will contain extensions, constants, and shared utilities.

The `extensions` Folder

This directory will hold extension methods that add new functionalities to existing classes.

format_to_mb.dart:

extension ByTeToMegaByte on int {
  int formatToMegaByte() {
    int bytes = this;
    return (bytes / (1024 * 1024)).ceil();
  }
}

This extension on the int type (integers) provides a convenient method formatToMegaByte(). When called on an integer representing bytes, it converts that byte value into megabytes. The division by (1024 * 1024) converts bytes to megabytes, and .ceil() rounds the result up to the nearest whole number. This is useful for displaying file sizes in a more human-readable format.

loading.dart:

import 'package:flutter/material.dart';
import 'package:loader_overlay/loader_overlay.dart';

extension LoaderOverlayExtension on BuildContext {
  void showLoader() {
    loaderOverlay.show();
  }

  void hideLoader() {
    loaderOverlay.hide();
  }
}

This extension on BuildContext simplifies the process of showing and hiding a global loading overlay in your Flutter application. It leverages the loader_overlay package.

showLoader(): Calls loaderOverlay.show() to display the loading indicator.
hideLoader(): Calls loaderOverlay.hide() to dismiss the loading indicator. These extensions make it easy to control the loader from any widget that has access to a BuildContext.

to_file.dart:

import 'dart:io';

import 'package:image_picker/image_picker.dart';

extension ToFile on Future {
  Future toFile() => then((xFile) => xFile?.path).then(
        (filePath) => filePath != null ? File(filePath) : null,
      );
}

This extension is designed to convert an XFile object (typically obtained from the image_picker package) into a dart:io File object.

It operates on a Future, meaning it expects a future that might resolve to an XFile or null.
then((xFile) => xFile?.path): If xFile is not null, it extracts the file's path. Otherwise, it passes null.
then((filePath) => filePath != null ? File(filePath) : null): If a filePath is available, it creates a File object from it. Otherwise, it returns null. This is a concise way to handle the asynchronous conversion of a picked image or video XFile into a File object that can be used for further operations like displaying or uploading.

to_file2.dart:

import 'dart:io';
import 'package:image_picker/image_picker.dart';
import 'package:path_provider/path_provider.dart';

extension XFileExtension on XFile {
  Future toFile() async {
    final bytes = await readAsBytes();
    final tempDir = await getTemporaryDirectory();
    final tempFile = File('${tempDir.path}/${this.name}');
    await tempFile.writeAsBytes(bytes);
    return tempFile;
  }
}

This extension on XFile provides a more robust way to convert an XFile to a dart:io file. This is particularly useful when you need to write the XFile's content to a temporary location.

await readAsBytes(): Reads the content of the XFile as a list of bytes.
final tempDir = await getTemporaryDirectory(): Gets the path to the temporary directory on the device using path_provider.
final tempFile = File('${tempDir.path}/${this.name}'): Creates a new File object in the temporary directory with the original name of the XFile.
await tempFile.writeAsBytes(bytes): Writes the bytes read from the XFile into the newly created temporary file.
return tempFile: Returns the newly created File object. This is particularly useful when you're working with XFiles that might not have a readily accessible file path on the device, or if you need to ensure the file is persistently available for further processing, such as cropping.

The `constants` Folder

This directory will hold static values and enumerations used throughout the app.

enums/record_source.dart:

enum RecordSource { camera, gallery }

This is a simple enumeration (enum) named RecordSource. It defines two possible values: camera and gallery. This enum is used to represent the source from which an image or video is picked, providing a clear and type-safe way to differentiate between capturing from the camera and selecting from the device's gallery.

enums/status.dart:

enum Status { success, error }

This is another straightforward enumeration named Status. It defines success and error as its possible values. This enum is commonly used to indicate the outcome of an operation or a process, providing a standardized way to convey status information, for example, for toast messages.

app_strings.dart:

// ignore_for_file: constant_identifier_names

class AppStrings {
  static const String AI_MODEL = 'gemini-2.0-flash';

  static const String APP_SUBTITLE =  "Capture a photo or use your voice to get step-by-step guidance on how to prepare your favorite dishes or snacks";
  static const String APP_TITLE = "Your Personal AI Recipe Guide";

  static const String AI_TEXT_PART = "You are a recipe ai expert. Generate a recipe based on this image, include recipe name, preparation steps, and a public YouTube video demonstrating the preparation step. Output the YouTube video URL on a new line prefixed with 'YouTube Video URL: ', it should be a https URL and the image URL on a new line prefixed with 'Image URL: ' and it should be a https URL too."
      "If the image is not a food, snacks or drink, politely inform the user that you can only answer recipe queries and ask them to close and upload a food/snack/drink image.";

  static const String AI_AUDIO_PART =
  "You are a recipe ai expert. Generate a recipe based on this text, include recipe name, preparation steps. I'd also love for you to show me any valid image online relating to this food/drink/snack and a public YouTube video demonstrating the preparation step.If the text doesn't contain things related to a food, snacks or drink, politely inform the user that you can only answer recipe queries and ask them to close and upload a food/snack/drink image. Output the YouTube video URL on a new line prefixed with 'YouTube Video URL: ', it should be a https URL and the image URL on a new line prefixed with 'Image URL: ' and it should be a https URL too, The text is: ";

}

This class AppStrings centralizes all the static string constants used throughout the application. This approach helps in managing strings effectively, making them easily modifiable and preventing typos.

AI_MODEL: Specifies the Gemini model to be used, in this case, gemini-2.0-flash.
APP_SUBTITLE and APP_TITLE: Define the main titles and subtitles for the app's UI.
AI_TEXT_PART: This is a crucial string that serves as the prompt for the Gemini model when an image is provided. It instructs the AI to act as a recipe expert, generate a recipe including the name and steps, and provide a YouTube video. It also includes a fallback message if the image isn't food-related.
AI_AUDIO_PART: Similar to AI_TEXT_PART, but this prompt is used when audio input is provided. It also instructs the AI to generate a recipe, include a relevant online image, and a YouTube video, with specific formatting requirements for the URLs. This prompt will be concatenated with the transcribed text from the user's voice input.

app_color.dart:

import 'package:flutter/material.dart';

class AppColors {
  static const primaryColor = Color(0xFF7E57C2);
  static const litePrimary = Color(0xFFEDE7F6);
  static Color errorColor = const Color(0xFFEA5757);
  static const Color grey =
  Color.fromARGB(255, 170, 170, 170);

  static const Color lighterGrey =
  Color.fromARGB(255, 204, 204, 204);
}

The AppColors class centralizes all the custom color definitions used in the application. This makes it easy to maintain a consistent color scheme throughout the UI and allows for quick global changes to the app's theme. Each static constant represents a specific color with its hexadecimal value or RGB value.

The `shared` Folder

This directory will contain shared utility classes.

image_picker_helper.dart:

import 'dart:developer';
import 'dart:io';

import 'package:file_picker/file_picker.dart';
import 'package:flutter/foundation.dart' show immutable;
import 'package:image_picker/image_picker.dart';
import 'package:permission_handler/permission_handler.dart';
import 'package:snap2chef/core/extensions/to_file.dart';
import 'package:snap2chef/core/extensions/to_file2.dart';
import '../../presentation/components/toast_info.dart';
import '../constants/enums/status.dart';

@immutable
class ImagePickerHelper {
  static final ImagePicker _imagePicker = ImagePicker();

  static Future pickImageFromGallery2() async {
    final isGranted = await Permission.photos.isGranted;
    if (!isGranted) {
      await Permission.photos.request();
      toastInfo(
          msg: "You didn't allow access", status: Status.error);
    }
    final pickedFile =
    await _imagePicker.pickImage(source: ImageSource.gallery);
    if (pickedFile != null) {
      final file = await pickedFile.toFile();
      log(pickedFile.name.split(".").join(","));
      return PickedFileWithInfo(file: file, fileName: pickedFile.name);
    } else {
      return null;
    }
  }

  static Future pickFileFromGallery() =>
      FilePicker.platform.pickFiles(
          type: FileType.custom,
          allowedExtensions: ["pdf", "doc", "docx", "png", "jpg", "jpeg"]);

  static Future pickImageFromGallery() =>
      _imagePicker.pickImage(source: ImageSource.gallery).toFile();

  static Future takePictureFromCamera() =>
      _imagePicker.pickImage(source: ImageSource.camera).toFile();

  static Future pickVideoFromGallery() =>
      _imagePicker.pickVideo(source: ImageSource.gallery).toFile();

  static Future pickSinglePDFFileFromGallery() =>
      FilePicker.platform
          .pickFiles(type: FileType.custom, allowedExtensions: ["pdf"]);
}

class PickedFileWithInfo {
  final File file;
  final String fileName;

  PickedFileWithInfo({required this.file, required this.fileName});
}

PlatformFile? file;

The ImagePickerHelper class provides static methods for picking various types of files (images, videos, documents) from the device's gallery or camera, with integrated permission handling.

_imagePicker: An instance of ImagePicker for interacting with the device's image and video picking functionalities.
pickImageFromGallery2():
- Permission handling: Checks if photo gallery permission is granted using permission_handler. If not, it requests the permission and displays a toast message if denied.
- Image picking: Uses _imagePicker.pickImage(source: ImageSource.gallery) to let the user select an image from the gallery.
- Conversion: If an image is picked, it converts the XFile to a File object using the toFile() extension.
- Logging: Logs the file name for debugging.
- Return value: Returns a PickedFileWithInfo object containing the File and fileName.
pickFileFromGallery(): Uses file_picker to allow picking various file types (PDF, Doc, Docx, PNG, JPG, JPEG) from the gallery.
pickImageFromGallery(): A simpler method to pick an image from the gallery, directly returning a Future using the toFile() extension.
takePictureFromCamera(): Captures an image using the device's camera and returns a Future.
pickVideoFromGallery(): Picks a video from the gallery and returns a Future.
pickSinglePDFFileFromGallery(): Specifically picks a single PDF file from the gallery.
PickedFileWithInfo class: A simple data class to hold both the File object and its fileName.

This helper class centralizes all file picking logic, making it reusable and easier to manage permissions and different picking scenarios.

2. The `infrastructure` Folder

This folder handles the logic for interacting with external services and processing data.

`image_upload_controller.dart`:

import 'dart:async';
import 'dart:io';

import 'package:flutter/material.dart';
import 'package:gap/gap.dart';
import 'package:iconsax/iconsax.dart';
import 'package:image_cropper/image_cropper.dart';

import '../core/constants/app_colors.dart';
import '../core/constants/enums/record_source.dart';
import '../core/shared/image_picker_helper.dart';
import '../presentation/widgets/image_picker_component.dart';

class ImageUploadController {
  /// crop image
  static Future<void> _cropImage(
      File? selectedFile,
      Function assignCroppedImage,
      ) async {
    if (selectedFile != null) {
      final croppedFile = await ImageCropper().cropImage(
        sourcePath: selectedFile.path,
        compressFormat: ImageCompressFormat.jpg,
        compressQuality: 100,
        uiSettings: [
          AndroidUiSettings(
            toolbarTitle: 'Crop Image',
            toolbarColor: AppColors.primaryColor,
            toolbarWidgetColor: Colors.white,
            initAspectRatio: CropAspectRatioPreset.square,
            lockAspectRatio: false,
            statusBarColor: AppColors.primaryColor,
            activeControlsWidgetColor: AppColors.primaryColor,
            aspectRatioPresets: [
              CropAspectRatioPreset.original,
              CropAspectRatioPreset.square,
              CropAspectRatioPreset.ratio4x3,
              CropAspectRatioPresetCustom(),
            ],
          ),
          IOSUiSettings(
            title: 'Crop Image',
            aspectRatioPresets: [
              CropAspectRatioPreset.original,
              CropAspectRatioPreset.square,
              CropAspectRatioPreset.ratio4x3,
              CropAspectRatioPresetCustom(),
            ],
          ),
        ],
      );
      assignCroppedImage(croppedFile);
    }
  }

  // /// pick image from camera and gallery
  static void imagePicker(
      RecordSource recordSource,
      Completer? completer,
      BuildContext context,
      Function setFile,
      Function assignCroppedImage,
      ) async {
    if (recordSource == RecordSource.gallery) {
      final pickedFile = await ImagePickerHelper.pickImageFromGallery();
      if (pickedFile == null) {
        return;
      }
      completer?.complete(pickedFile.path);
      if (!context.mounted) {
        return;
      }
      setFile(pickedFile);

      if (context.mounted) {
        Navigator.of(context).pop();
      }
    } else if (recordSource == RecordSource.camera) {
      final pickedFile = await ImagePickerHelper.takePictureFromCamera();
      if (pickedFile == null) {
        return;
      }

      completer?.complete(pickedFile.path);
      if (!context.mounted) {
        return;
      }
      setFile(pickedFile);
      // crop image
      _cropImage(pickedFile, assignCroppedImage);

      if (context.mounted) {
        Navigator.of(context).pop();
      }
    }
  }

  /// modal for selecting file source
  static Future showFilePickerButtonSheet(BuildContext context, Completer? completer,
      Function setFile,
      Function assignCroppedImage,) {
    return showModalBottomSheet(
      shape: const RoundedRectangleBorder(
        borderRadius: BorderRadius.only(
          topLeft: Radius.circular(35),
          topRight: Radius.circular(35),
        ),
      ),
      context: context,
      builder: (context) {
        return SingleChildScrollView(
          child: Container(
            padding: const EdgeInsets.fromLTRB(10, 14, 15, 20),
            child: Column(
              children: [
                Container(
                  height: 4,
                  width: 50,
                  padding: const EdgeInsets.only(top: 5),
                  decoration: BoxDecoration(
                    borderRadius: BorderRadius.circular(7),
                    color: const Color(0xffE4E4E4),
                  ),
                ),
                Padding(
                  padding: const EdgeInsets.all(10.0),
                  child: Column(
                    mainAxisSize: MainAxisSize.min,
                    crossAxisAlignment: CrossAxisAlignment.start,
                    children: [
                      GestureDetector(
                        onTap: () => Navigator.of(context).pop(),
                        child: const Align(
                          alignment: Alignment.topRight,
                          child: Icon(Icons.close, color: Colors.grey),
                        ),
                      ),
                      const Gap(10),
                      const Text(
                        'Select Image Source',
                        style: TextStyle(
                          color: AppColors.primaryColor,
                          fontSize: 16,
                          fontWeight: FontWeight.w600,
                        ),
                      ),
                      const Gap(20),
                      ImagePickerTile(
                        title: 'Capture from Camera',
                        subtitle: 'Take a live snapshot',
                        icon: Iconsax.camera,
                        recordSource: RecordSource.camera,
                        completer: completer,
                        context: context,
                        setFile: setFile,
                        assignCroppedImage: assignCroppedImage,
                      ),
                      const Divider(color: Color(0xffE4E4E4)),
                      ImagePickerTile(
                        title: 'Upload from Gallery',
                        subtitle: 'Select image from gallery',
                        icon: Iconsax.gallery,
                        recordSource: RecordSource.gallery,
                        completer: completer,
                        context: context,
                        setFile: setFile,
                        assignCroppedImage: assignCroppedImage,
                      ),
                    ],
                  ),
                ),
              ],
            ),
          ),
        );
      },
    );
  }
}

class CropAspectRatioPresetCustom implements CropAspectRatioPresetData {
  @override
  (int, int)? get data => (2, 3);

  @override
  String get name => '2x3 (customized)';
}

The ImageUploadController class manages the process of picking and optionally cropping images before they are used in the application.

_cropImage(File? selectedFile, Function assignCroppedImage):
- This private static method handles the image cropping functionality using the image_cropper package.
- It takes a selectedFile (the image to be cropped) and a Function assignCroppedImage (a callback to update the UI with the cropped image).
- ImageCropper().cropImage(...) opens the cropping UI. It's configured with various UI settings for both Android and iOS, including toolbarColor, aspectRatioPresets, and more, to ensure a consistent and branded experience.
- CropAspectRatioPresetCustom(): This is a custom class that implements CropAspectRatioPresetData to define a specific cropping aspect ratio (2x3 in this case), providing more flexibility than the built-in presets.
- Once cropped, the croppedFile is passed to the assignCroppedImage callback.
imagePicker(RecordSource recordSource, Completer? completer, BuildContext context, Function setFile, Function assignCroppedImage):
- This static method is the core logic for initiating image picking from either the camera or gallery.
- It takes a recordSource (from the RecordSource enum), an optional completer (likely for handling asynchronous operations outside the UI), the current context, setFile (a callback to set the picked file in the UI), and assignCroppedImage (the callback for cropped images).
- Gallery Selection (RecordSource.gallery):
  - It calls ImagePickerHelper.pickImageFromGallery() to get the selected image.
  - If a file is picked, it completes the completer, calls setFile to update the UI, and then pops the bottom sheet.
- Camera Capture (RecordSource.camera):
  - It calls ImagePickerHelper.takePictureFromCamera() to capture an image.
  - Similar to gallery selection, it completes the completer, calls setFile, and then importantly, it calls _cropImage to allow the user to crop the newly captured image before it's fully used.
  - Finally, it pops the bottom sheet.
- context.mounted checks are included to ensure that UI updates only happen if the widget is still in the widget tree, preventing errors.
showFilePickerButtonSheet(...):
- This static method displays a modal bottom sheet, providing the user with options to select an image source (Camera or Gallery).
- It uses showModalBottomSheet to present a nicely styled sheet with rounded corners.
- Inside the sheet, it displays a draggable indicator and two ImagePickerTile widgets (presumably a custom widget for displaying each option) for "Capture from Camera" and "Upload from Gallery."
- When an ImagePickerTile is tapped, it internally calls the imagePicker method with the corresponding RecordSource.

In summary, ImageUploadController acts as a central orchestrator for image acquisition, offering options to pick from the gallery or camera, and integrating robust image cropping capabilities – all while ensuring a smooth user experience through UI callbacks and modal interactions.

`recipe_controller.dart`:

import 'dart:io';

import 'package:cached_network_image/cached_network_image.dart';
import 'package:flutter/foundation.dart';
import 'package:flutter/material.dart';
import 'package:flutter_markdown/flutter_markdown.dart';
import 'package:gap/gap.dart';
import 'package:google_generative_ai/google_generative_ai.dart';
import 'package:snap2chef/core/extensions/loading.dart';
import 'package:youtube_player_flutter/youtube_player_flutter.dart';
import '../core/constants/app_colors.dart';
import '../core/constants/app_strings.dart';
import '../core/constants/enums/status.dart';
import '../presentation/components/toast_info.dart';

class RecipeController {
  // send image to gemini
  static Future<void> _sendImageToGemini(
      File? selectedFile,
      GenerativeModel model,
      BuildContext context,
      Function removeFile,
      Function removeText,
      ) async {
    toastInfo(msg: "Obtaining recipe and preparations", status: Status.success);

    if (selectedFile == null) return;

    final bytes = await selectedFile.readAsBytes();

    final prompt = TextPart(AppStrings.AI_TEXT_PART);
    final image = DataPart('image/jpeg', bytes);

    final response = await model.generateContent([
      Content.multi([prompt, image]),
    ]);

    if (context.mounted) {
      _displayRecipe(
        response.text,
        context,
        selectedFile,
        removeFile,
        removeText,
      );
    }
  }

  // send audio text prompt
  static Future<void> _sendAudioTextPrompt(
      GenerativeModel model,
      BuildContext context,
      String transcribedText,
      File? selectedFile,
      Function removeFile,
      Function removeText,
      ) async {
    toastInfo(msg: "Obtaining recipe and preparations", status: Status.success);

    final prompt = '${AppStrings.AI_AUDIO_PART} ${transcribedText.trim()}.';
    final content = [Content.text(prompt)];
    final response = await model.generateContent(content);

    if (context.mounted) {
      _displayRecipe(
        response.text,
        context,
        selectedFile,
        removeFile,
        removeText,
      );
    }
  }

  static void _displayRecipe(
      String? recipeText,
      BuildContext context,
      File? selectedFile,
      Function removeFile,
      Function removeText,
      ) {
    if (recipeText == null || recipeText.isEmpty) {
      recipeText = "No recipe could be generated or parsed from the response.";
    }
    String workingRecipeText = recipeText;

    String? videoId;
    String? extractedImageUrl;

    final youtubeLineRegex = RegExp(r'YouTube Video URL:\s*(https?:\/\/\S+)', caseSensitive: false);
    final youtubeMatch = youtubeLineRegex.firstMatch(recipeText);
    if (youtubeMatch != null) {
      final youtubeUrl = youtubeMatch.group(1);
      final ytIdRegex = RegExp(r'v=([\w-]{11})');
      final ytIdMatch = ytIdRegex.firstMatch(youtubeUrl ?? '');
      if (ytIdMatch != null) {
        videoId = ytIdMatch.group(1);
      }
      workingRecipeText = workingRecipeText.replaceAll(youtubeMatch.group(0)!, '').trim();
    }

    final imageLine = RegExp(r'Image URL:\s*(https?:\/\/\S+\.(?:png|jpe?g|gif|webp|bmp|svg))');
    final imageMatch = imageLine.firstMatch(recipeText);
    if (imageMatch != null) {
      extractedImageUrl = imageMatch.group(1);
      workingRecipeText = workingRecipeText.replaceAll(imageMatch.group(0)!, '').trim();
    }

    print("Extracted Image URL: $extractedImageUrl");
    print("Extracted Video ID: $videoId");

    String? cleanedRecipeText = workingRecipeText;

    showDialog(
      barrierDismissible: false,
      context: context,
      builder: (BuildContext dialogContext) {
        YoutubePlayerController? ytController;

        if (videoId != null) {
          ytController = YoutubePlayerController(
            initialVideoId: videoId,
            flags: const YoutubePlayerFlags(
              autoPlay: false,
              mute: false,
              disableDragSeek: false,
              loop: false,
              isLive: false,
              forceHD: false,
              enableCaption: true,
            ),
          );
        }

        return AlertDialog(
          title: const Text('Generated Recipe'),
          content: SingleChildScrollView(
            child: Column(
              mainAxisSize: MainAxisSize.min,
              children: [
                selectedFile != null
                    ? Container(
                  height: 150,
                  width: double.infinity,
                  decoration: BoxDecoration(
                    borderRadius: BorderRadius.circular(7),
                    border: Border.all(color: AppColors.primaryColor),
                    image: DecorationImage(
                      image: FileImage(File(selectedFile.path)),
                      fit: BoxFit.cover,
                    ),
                  ),
                )
                    :  extractedImageUrl != null
                    ? ClipRRect(
                  borderRadius: BorderRadius.circular(7),
                  child: CachedNetworkImage(
                    imageUrl: extractedImageUrl,
                    height: 150,
                    width: double.infinity,
                    fit: BoxFit.cover,
                    placeholder: (context, url) =>
                        Image.asset('assets/images/placeholder.png', fit: BoxFit.cover),
                    errorWidget: (context, url, error) =>
                        Image.asset('assets/images/placeholder.png', fit: BoxFit.cover),
                  ),
                )
                    : const SizedBox.shrink(),
                Gap(16),
                MarkdownBody(
                  data: cleanedRecipeText,
                  styleSheet: MarkdownStyleSheet(
                    h1: const TextStyle(
                      fontSize: 24,
                      fontWeight: FontWeight.bold,
                      color: Colors.deepPurple,
                    ),
                    h2: const TextStyle(
                      fontSize: 20,
                      fontWeight: FontWeight.bold,
                    ),
                    strong: const TextStyle(fontWeight: FontWeight.bold),
                  ),
                ),

                if (videoId != null && ytController != null) ...[
                  const Gap(16),
                  YoutubePlayer(
                    controller: ytController,
                    showVideoProgressIndicator: true,
                    progressIndicatorColor: AppColors.primaryColor,
                    progressColors: const ProgressBarColors(
                      playedColor: AppColors.primaryColor,
                      handleColor: Colors.amberAccent,
                    ),
                    onReady: () {
                      // Controller is ready
                    },
                  ),
                ],
              ],
            ),
          ),
          actions: [
            TextButton(
              onPressed: () {
                ytController?.dispose();
                Navigator.of(dialogContext).pop();
                if (selectedFile != null) {
                  removeFile();
                } else {
                  removeText();
                }
              },
              child: const Text('Close'),
            ),
          ],
        );
      },
    );
  }

  static void sendRequest(
      BuildContext context,
      File? selectedFile,
      GenerativeModel model,
      Function removeFile,
      String transcribedText,
      Function removeText,
      ) async {
    context.showLoader();
    toastInfo(msg: "Processing...", status: Status.success);
    try {
      if (selectedFile != null) {
        await _sendImageToGemini(
          selectedFile,
          model,
          context,
          removeFile,
          removeText,
        );
      } else if (transcribedText.isNotEmpty) {
        await _sendAudioTextPrompt(
          model,
          context,
          transcribedText,
          selectedFile,
          removeFile,
          removeText,
        );
      }
    } catch (e) {
      if (kDebugMode) {
        print('Error sending request: $e');
      }
      toastInfo(msg: "Error sending request:$e ", status: Status.error);
    } finally {
      if (context.mounted) {
        context.hideLoader();
      }
    }
  }
}

The RecipeController class is responsible for interacting with the Gemini AI model to generate recipes and then display these recipes to the user, complete with parsed YouTube video links and potentially extracted image URLs.

_sendImageToGemini(File? selectedFile, GenerativeModel model, BuildContext context, Function removeFile, Function removeText):
- This private static method handles sending an image to the Gemini model.
- It displays a "Processing..." toast message.
- It reads the selectedFile (the image) as bytes.
- It creates a TextPart from AppStrings.AI_TEXT_PART (our image-based AI prompt) and a DataPart for the image bytes.
- model.generateContent([Content.multi([prompt, image])]): This is where the magic happens! It sends both the text prompt and the image data to the Gemini model for generation.
- Upon receiving a response, it calls _displayRecipe to show the generated recipe to the user.
- context.mounted check ensures the context is still valid before attempting UI updates.
_sendAudioTextPrompt(GenerativeModel model, BuildContext context, String transcribedText, File? selectedFile, Function removeFile, Function removeText):
- This private static method handles sending transcribed audio text to the Gemini model.
- It constructs a full prompt by concatenating AppStrings.AI_AUDIO_PART with the transcribedText.
- model.generateContent([Content.text(prompt)]): It sends only the text prompt to the Gemini model.
- Similar to the image method, it calls _displayRecipe with the generated text.
_displayRecipe(String? recipeText, BuildContext context, File? selectedFile, Function removeFile, Function removeText):
- This private static method is responsible for parsing the AI's response and displaying it in a modal dialog.
- Error handling: If recipeText is null or empty, it provides a default message.
- Extracting YouTube video URL: It uses a RegExp (youtubeLineRegex) to find a line in the recipeText that matches the "YouTube Video URL: https://..." pattern. If found, it extracts the full URL and then another RegExp (ytIdRegex) to get the YouTube video ID. The extracted video URL text is then removed from workingRecipeText to clean the displayed recipe.
- Extracting image URL: Similarly, it uses another RegExp (imageLine) to extract an image URL from the recipeText. The extracted image URL text is also removed.
- Debug printing: Prints the extracted URLs for debugging.
- showDialog: Presents an AlertDialog to the user.
  - YoutubePlayerController: If a videoId was extracted, it initializes a YoutubePlayerController from the Youtubeer_flutter package, configured with basic flags (for example, autoPlay: false).
  - Recipe display:
    - If an selectedFile (image taken by the user) is present, it displays that image.
    - Otherwise, if an extractedImageUrl was found in the AI's response, it uses CachedNetworkImage to display that image. This is particularly useful for text-based queries where Gemini might suggest an image.
    - MarkdownBody: Uses flutter_markdown to render the cleanedRecipeText (after removing the YouTube and Image URLs) as Markdown, allowing for rich text formatting (for example, bolding, headings) directly from the AI's response.
    - YoutubePlayer: If a videoId and ytController are available, it embeds the YouTube video player directly into the dialog, with customizable progress bar colors.
  - "Close" button: Disposes the ytController (important for resource management), pops the dialog, and calls either removeFile() or removeText() to clear the input fields based on what was used for the query.
sendRequest(BuildContext context, File? selectedFile, GenerativeModel model, Function removeFile, String transcribedText, Function removeText):
- This public static method is the entry point for sending requests to the Gemini model.
- context.showLoader(): Displays a loading overlay using our custom extension.
- toastInfo(msg: "Processing...", status: Status.success): Shows a toast message.
- Conditional logic:
  - If selectedFile is not null, it calls _sendImageToGemini.
  - Otherwise, if transcribedText is not empty, it calls _sendAudioTextPrompt.
- Error handling: Uses a try-catch block to gracefully handle any errors during the AI request, logging them in debug mode and showing an error toast to the user.
- finally Block: Ensures context.hideLoader() is always called, regardless of success or error, to dismiss the loading indicator.

In essence, RecipeController orchestrates the entire process of sending user input (image or voice), communicating with the Gemini AI, parsing its intelligent response, and beautifully presenting it to the user with interactive elements like YouTube videos and relevant images.

3. The `presentation` Folder

This folder contains all the UI-related code.

`screens/home_screen.dart`:

import 'dart:async';
import 'dart:io';
import 'package:flutter/material.dart';
import 'package:gap/gap.dart';
import 'package:google_generative_ai/google_generative_ai.dart';
import 'package:iconsax/iconsax.dart';
import 'package:image_cropper/image_cropper.dart';
import 'package:snap2chef/core/extensions/format_to_mb.dart';
import 'package:snap2chef/infrastructure/image_upload_controller.dart';
import 'package:snap2chef/infrastructure/recipe_controller.dart';
import 'package:speech_to_text/speech_recognition_result.dart';
import 'package:speech_to_text/speech_to_text.dart';
import '../../core/constants/app_colors.dart';
import '../../core/constants/app_strings.dart';
import '../../core/constants/enums/status.dart';
import '../components/toast_info.dart';
import '../widgets/glowing_microphone.dart';
import '../widgets/image_previewer.dart';
import '../widgets/query_text_box.dart';
import '../widgets/upload_container.dart';

class HomeScreen extends StatefulWidget {
  const HomeScreen({super.key});

  @override
  State createState() => _HomeScreenState();
}

class _HomeScreenState extends State<HomeScreen> {
  File? selectedFile;
  Completer? completer;
  String? fileName;
  int? fileSize;
  late GenerativeModel _model;
  String apiKey = ""; // <--- REPLACE WITH YOUR ACTUAL API KEY
  final TextEditingController _query = TextEditingController();
  final SpeechToText _speechToText = SpeechToText();
  bool _speechEnabled = false;
  String _lastWords = '';
  bool isRecording = false;
  bool isDoneRecording = false;

  void removeText() {
    setState(() {
      _query.clear();
      isDoneRecording = false;
      _lastWords = "";
    });
    _query.clear();
  }

  void setKeyword(String prompt) {
    if (prompt.isEmpty) {
      toastInfo(msg: "You didn't say anything!", status: Status.error);
      setState(() {
        isDoneRecording = false;
        isRecording = false;
      });
      return;
    }

    setState(() {
      _lastWords = "";
      isRecording = false;
      _query.text = prompt;
      isDoneRecording = true;
    });
  }

  void _initSpeech() async {
    try {
      _speechEnabled = await _speechToText.initialize(
        onStatus: (status) => debugPrint('Speech status: $status'),
        onError: (error) => debugPrint('Speech error: $error'),
      );
      if (!_speechEnabled) {
        toastInfo(
          msg: "Microphone permission not granted or speech not available.",
          status: Status.error,
        );
      }
      setState(() {});
    } catch (e) {
      debugPrint("Speech initialization failed: $e");
    }
  }

  void _startListening() async {
    setState(() {
      isRecording = true;
    });
    if (!_speechEnabled) {
      toastInfo(msg: "Speech not initialized yet.", status: Status.error);
      return;
    }

    await _speechToText.listen(onResult: _onSpeechResult);
    setState(() {});
  }

  void _stopListening() async {
    await _speechToText.stop();
    setKeyword(_lastWords);
    setState(() {});
  }

  void _onSpeechResult(SpeechRecognitionResult result) {
    setState(() {
      _lastWords = result.recognizedWords;
    });
  }

  @override
  void initState() {
    super.initState();
    // TODO: Replace "YOUR_API_KEY" with your actual Gemini API Key
    // Refer to https://www.freecodecamp.org/news/how-to-secure-mobile-apis-in-flutter/ for API key security.
    apiKey = "YOUR_API_KEY"; // Secure this!
    _model = GenerativeModel(model: AppStrings.AI_MODEL, apiKey: apiKey);
    _initSpeech();
  }

  @override
  void dispose() {
    _query.dispose();
    _speechToText.cancel(); // Cancel listening to prevent resource leaks
    super.dispose();
  }

  void assignCroppedImage(CroppedFile? croppedFile) {
    if (croppedFile != null) {
      setState(() {
        selectedFile = File(croppedFile.path);
      });
    }
  }

  void setFile(File? pickedFile) {
    setState(() {
      selectedFile = pickedFile;
      fileName = pickedFile?.path.split('/').last;
      fileSize = pickedFile?.lengthSync().formatToMegaByte();
    });
  }

  void removeFile() {
    setState(() {
      selectedFile = null;
      fileSize = null;
    });
  }

  @override
  Widget build(BuildContext context) {
    Size size = MediaQuery.sizeOf(context);

    return Scaffold(
      floatingActionButton: selectedFile != null || _query.text.isNotEmpty
          ? FloatingActionButton.extended(
        onPressed: () => RecipeController.sendRequest(
          context,
          selectedFile,
          _model,
          removeFile,
          _query.text,
          removeText,
        ),
        backgroundColor: AppColors.primaryColor,
        icon: const Icon(Iconsax.send_1, color: Colors.white),
        label: const Text(
          "Send Request",
          style: TextStyle(color: Colors.white),
        ),
      )
          : null,
      body: Padding(
        padding: const EdgeInsets.all(18.0),
        child: Center(
          child: Column(
            mainAxisAlignment: MainAxisAlignment.center,
            children: [
              Text(
                AppStrings.APP_TITLE,
                textAlign: TextAlign.center,
                style: TextStyle(
                  color: Colors.black,
                  fontWeight: FontWeight.w500,
                  fontSize: 16,
                ),
              ),
              Text(
                AppStrings.APP_SUBTITLE,
                textAlign: TextAlign.center,
                style: TextStyle(
                  color: AppColors.grey,
                  fontSize: 15,
                  fontWeight: FontWeight.w300,
                ),
              ),
              const Gap(20),
              if (!isDoneRecording)
                !isRecording
                    ? selectedFile != null
                    ? ImagePreviewer(
                  size: size,
                  pickedFile: selectedFile,
                  removeFile: removeFile,
                  context: context,
                  completer: completer,
                  setFile: setFile,
                  assignCroppedImage: assignCroppedImage,
                )
                    : GestureDetector(
                  onTap: () =>
                      ImageUploadController.showFilePickerButtonSheet(
                        context,
                        completer,
                        setFile,
                        assignCroppedImage,
                      ),
                  child: UploadContainer(
                    title: 'an image of a food or snack',
                    size: size,
                  ),
                )
                    : SizedBox.shrink(),
              const Gap(20),

              if (selectedFile == null) ...[
                if (!isDoneRecording) ...[
                  Text(
                    "or record your voice",
                    style: TextStyle(
                      color: AppColors.grey,
                      fontSize: 16,
                      fontWeight: FontWeight.w200,
                    ),
                  ),
                  Center(
                    child: GestureDetector(
                      onTap: () {
                        if (!_speechEnabled) {
                          toastInfo(
                            msg: "Speech recognition not ready yet.",
                            status: Status.error,
                          );
                          return;
                        }
                        if (_speechToText.isNotListening) {
                          _startListening();
                        } else {
                          _stopListening();
                        }
                      },
                      child: GlowingMicButton(
                        isListening: !_speechToText.isNotListening,
                      ),
                    ),
                  ),
                  const Gap(10),
                  Container(
                    padding: EdgeInsets.all(16),
                    child: Text(
                      _speechToText.isListening
                          ? _lastWords
                          : _speechEnabled
                          ? 'Tap the microphone to start listening...'
                          : 'Speech not available',
                    ),
                  ),
                  const Gap(10),
                ],

                isDoneRecording
                    ? QueryTextBox(query: _query)
                    : SizedBox.shrink(),
              ],

              const Gap(20),
              selectedFile != null || _query.text.isNotEmpty
                  ? GestureDetector(
                onTap: () {
                  if (selectedFile != null) {
                    removeFile();
                  } else {
                    removeText();
                  }
                },
                child: CircleAvatar(
                  backgroundColor: AppColors.primaryColor,
                  radius: 30,
                  child: Icon(Iconsax.close_circle, color: Colors.white),
                ),
              )
                  : SizedBox.shrink(),
            ],
          ),
        ),
      ),
    );
  }
}

The HomeScreen is the main user interface of our AI cooking assistant application. It manages the state for image selection, voice input, and triggers the AI recipe generation.

State variables:
- selectedFile: Stores the File object of the image picked by the user.
- completer: A Completer object, often used for asynchronous operations to signal completion.
- fileName, fileSize: Store details about the selected image.
- _model: An instance of GenerativeModel from the google_generative_ai package, which is our interface to the Gemini API.
- apiKey: Crucially, this is where you'll insert your Gemini API key. Remember the security warning above!
- _query: A TextEditingController for the text input field, which will display the transcribed voice input.
- _speechToText: An instance of SpeechToText for handling voice recognition.
- _speechEnabled: A boolean indicating if speech recognition is initialized and available.
- _lastWords: Stores the most recently recognized words from speech.
- isRecording: A boolean to track if voice recording is active.
- isDoneRecording: A boolean to track if a voice recording has been completed and transcribed.
Methods:
- removeText(): Clears the text input field (_query), resets isDoneRecording and _lastWords to clear any previous voice input.
- setKeyword(String prompt): Sets the _query text to the prompt (transcribed voice), and updates isRecording and isDoneRecording states. It also provides a toast message if the prompt is empty.
- _initSpeech(): Initializes the SpeechToText plugin. It requests microphone permission and sets _speechEnabled based on the initialization success. If permissions are not granted, it shows an error toast.
- _startListening(): Starts the speech recognition listener. Sets isRecording to true.
- _stopListening(): Stops the speech recognition listener and calls setKeyword with the _lastWords to finalize the transcribed text.
- _onSpeechResult(SpeechRecognitionResult result): Callback method for SpeechToText that updates _lastWords with the recognized words as speech recognition progresses.
- initState(): Called when the widget is inserted into the widget tree. It initializes the _model with the Gemini API key and model name, and calls _initSpeech() to set up voice recognition.
- dispose(): Called when the widget is removed from the widget tree. It disposes of the _query controller and cancels the _speechToText listener to prevent memory leaks.
- assignCroppedImage(CroppedFile? croppedFile): Callback function passed to ImageUploadController to update selectedFile with the path of the newly cropped image.
- setFile(File? pickedFile): Callback function passed to ImageUploadController to update selectedFile with the picked image, and also extracts its fileName and fileSize using our custom extension.
- removeFile(): Clears the selectedFile and fileSize states, effectively removing the displayed image.
build(BuildContext context) Method – UI Layout:
- FloatingActionButton.extended: This button, labeled "Send Request," becomes visible only when an image (selectedFile) is chosen OR when there's text in the query box (_query.text.isNotEmpty). Tapping it triggers RecipeController.sendRequest with the relevant input.
- App title and subtitle: Displays the main headings using AppStrings.
- Image upload/preview section:
  - If !isDoneRecording (meaning no voice input has been finalized) and !isRecording (not currently recording voice):
    - If selectedFile is not null, it shows an ImagePreviewer widget to display the chosen image with an option to remove it.
    - Otherwise (no image selected), it displays an UploadContainer which acts as a tappable area to trigger ImageUploadController.showFilePickerButtonSheet for picking an image.
- Voice input section:
  - This section (if (selectedFile == null) ...) only appears if no image is selected, providing an alternative input method.
  - If !isDoneRecording, it shows a "or record your voice" text and a GlowingMicButton.
    - Tapping the GlowingMicButton toggles speech recognition (_startListening / _stopListening).
    - A Text widget displays the current speech recognition status or _lastWords as they are transcribed.
  - If isDoneRecording (meaning voice input has been finalized), it shows a QueryTextBox which displays the transcribed text, allowing for review before sending the request.
- Clear input button: A CircleAvatar with a close icon appears when either an image is selected or text is present in the query. Tapping it calls removeFile() or removeText() to clear the respective input.

Overall, HomeScreen intelligently adapts its UI based on user input (image or voice) and orchestrates the interaction with the ImageUploadController for image handling and the RecipeController for AI recipe generation.

The `components` Folder

This folder contains smaller, reusable UI elements.

`toast_info.dart`

import 'package:fluttertoast/fluttertoast.dart';
import '../../core/constants/app_colors.dart';
import 'package:flutter/material.dart'; // Import for MaterialColor/Colors

void toastInfo({
  required String msg,
  required Status status,
}) {
  Fluttertoast.showToast(
    msg: msg,
    toastLength: Toast.LENGTH_SHORT,
    gravity: ToastGravity.BOTTOM,
    timeInSecForIosWeb: 1,
    backgroundColor: status == Status.success ? AppColors.primaryColor : AppColors.errorColor,
    textColor: Colors.white,
    fontSize: 16.0,
  );
}

The toastInfo function provides a convenient way to display brief, non-intrusive messages (toasts) to the user, typically for feedback like "success" or "error" messages.

It takes two required parameters:

msg: The message string to be displayed in the toast.
status: An enum of type Status (success or error) which determines the background color of the toast.

Fluttertoast.showToast(...) is the core function from the fluttertoast package that displays the toast.

toastLength: Sets the duration the toast is visible (short).
gravity: Positions the toast at the bottom of the screen.
timeInSecForIosWeb: Duration for web/iOS.
backgroundColor: Dynamically set to AppColors.primaryColor for success and AppColors.errorColor for errors, providing visual cues to the user.
textColor: Sets the text color to white.
fontSize: Sets the font size of the toast message.

This function centralizes toast message display, ensuring consistency in appearance and behavior throughout the app.

The `widgets` Folder

The application's user interface is constructed using a series of well-defined, reusable Flutter widgets. Each widget serves a specific purpose, contributing to the overall functionality and aesthetic of Snap2Chef.

1. glowing_microphone.dart:

This widget creates an animated microphone button that visually indicates when the application is actively listening for speech input.

import 'package:flutter/material.dart';
import 'package:iconsax/iconsax.dart';

import '../../core/constants/app_colors.dart';

class GlowingMicButton extends StatefulWidget {
  final bool isListening;

  const GlowingMicButton({super.key, required this.isListening});

  @override
  State createState() => _GlowingMicButtonState();
}

class _GlowingMicButtonState extends State<GlowingMicButton>
    with SingleTickerProviderStateMixin {
  late final AnimationController _controller;
  late final Animation<double> _animation;

  @override
  void initState() {
    super.initState();
    _controller = AnimationController(
      vsync: this,
      duration: const Duration(seconds: 2),
    );

    _animation = Tween<double>(begin: 0.0, end: 25.0).animate(
      CurvedAnimation(parent: _controller, curve: Curves.easeOut),
    );

    if (widget.isListening) {
      _controller.repeat(reverse: true);
    }
  }

  @override
  void didUpdateWidget(covariant GlowingMicButton oldWidget) {
    super.didUpdateWidget(oldWidget);

    if (widget.isListening && !_controller.isAnimating) {
      _controller.repeat(reverse: true);
    } else if (!widget.isListening && _controller.isAnimating) {
      _controller.stop();
    }
  }

  @override
  void dispose() {
    _controller.dispose();
    super.dispose();
  }

  @override
  Widget build(BuildContext context) {
    return SizedBox(
      width: 100, // Enough space for the full glow
      height: 100,
      child: Stack(
        alignment: Alignment.center,
        children: [
          if (widget.isListening)
            AnimatedBuilder(
              animation: _animation,
              builder: (_, __) {
                return Container(
                  width: 60 + _animation.value,
                  height: 60 + _animation.value,
                  decoration: BoxDecoration(
                    shape: BoxShape.circle,
                    color: AppColors.primaryColor.withOpacity(0.15),
                  ),
                );
              },
            ),
          CircleAvatar(
            backgroundColor: AppColors.primaryColor,
            radius: 30,
            child: Icon(
              widget.isListening ? Iconsax.stop_circle : Iconsax.microphone,
              color: Colors.white,
            ),
          ),
        ],
      ),
    );
  }
}

GlowingMicButton (StatefulWidget): This is a StatefulWidget because it needs to manage its own animation state. It takes a final bool isListening property, which dictates whether the microphone should display a glowing animation or remain static.

_GlowingMicButtonState (State with SingleTickerProviderStateMixin):
- SingleTickerProviderStateMixin: This mixin is crucial for providing a Ticker to an AnimationController. A Ticker essentially drives the animation forward, linking it to the frame callbacks, ensuring smooth animation performance.
- _controller (AnimationController): Manages the animation. It's initialized with vsync: this (from SingleTickerProviderStateMixin) and a duration of 2 seconds.
- _animation (Animation): Defines the range of values the animation will produce. Here, a Tween(begin: 0.0, end: 25.0) is used with a CurvedAnimation (specifically Curves.easeOut) to create a smooth, decelerating effect as the glow expands.
- initState(): When the widget is first created, the AnimationController and Animation are initialized. If isListening is initially true, the animation is set to repeat(reverse: true) to make the glow pulse in and out continuously.
- didUpdateWidget(): This lifecycle method is called when the widget's configuration (its properties) changes. It checks if isListening has changed and starts or stops the animation accordingly. This ensures the animation dynamically responds to changes in the isListening state from its parent.
- dispose(): Crucially, the _controller.dispose() method is called here to release the resources held by the animation controller when the widget is removed from the widget tree, preventing memory leaks.
build() Method:
- SizedBox: Provides a fixed size (100x100) for the button, ensuring enough space for the glowing effect.
- Stack: Allows layering widgets on top of each other.
  - if (widget.isListening) AnimatedBuilder(...): This conditional renders the glowing effect only when isListening is true.
    - AnimatedBuilder: Rebuilds its child whenever the _animation changes value.
    - Inside AnimatedBuilder, a Container is used to create the circular glow. Its width and height are dynamically increased by _animation.value, creating the expanding effect. The color is AppColors.primaryColor with 0.15 opacity, giving it a subtle glow.
  - CircleAvatar: This is the main microphone button.
    - backgroundColor is AppColors.primaryColor.
    - radius is 30.
    - The child is an Icon from the Iconsax package, dynamically changing between Iconsax.stop_circle (when listening) and Iconsax.microphone (when not listening). The icon color is white.

2. image_picker_component.dart

This widget provides a reusable ListTile interface for users to select images from either the camera or the gallery.

import 'dart:async';

import 'package:flutter/cupertino.dart';
import 'package:flutter/material.dart';
import 'package:snap2chef/infrastructure/image_upload_controller.dart';

import '../../core/constants/app_colors.dart';
import '../../core/constants/enums/record_source.dart';

class ImagePickerTile extends StatelessWidget {
  const ImagePickerTile({
    super.key,
    required this.title,
    required this.subtitle,
    required this.icon,
    required this.recordSource,
    required this.completer,
    required this.context,
    required this.setFile,
    required this.assignCroppedImage,
  });

  final String title;
  final String subtitle;
  final IconData icon;
  final RecordSource recordSource;
  final Completer? completer;
  final BuildContext context;
  final Function setFile;
  final Function assignCroppedImage;

  @override
  Widget build(BuildContext context) {
    return ListTile(
      leading: CircleAvatar(
        backgroundColor: AppColors.litePrimary,
        child: Padding(
          padding: const EdgeInsets.all(3.0),
          child: Center(
            child: Icon(icon, color: AppColors.primaryColor, size: 20),
          ),
        ),
      ),
      title: Text(title, style: const TextStyle(color: Colors.black)),
      subtitle: Text(
        subtitle,
        style: const TextStyle(fontSize: 14, color: Colors.grey),
      ),
      trailing: const Icon(
        CupertinoIcons.chevron_right,
        size: 20,
        color: Color(0xffE4E4E4),
      ),
      onTap: () {
        ImageUploadController.imagePicker(
          recordSource,
          completer,
          context,
          setFile,
          assignCroppedImage,
        );
      },
    );
  }
}

ImagePickerTile (StatelessWidget): This is a StatelessWidget because it simply renders content based on its immutable properties and triggers an external function (ImageUploadController.imagePicker) when tapped.
Properties: It takes several final properties to make it highly customizable:
- title and subtitle: Text for the main and secondary lines of the list tile.
- icon: The IconData to display as the leading icon.
- recordSource: An enum (RecordSource) likely indicating if the image should be picked from the camera or gallery.
- completer: A Completer object, often used for asynchronous operations to signal when a task is complete.
- context: The BuildContext to allow the ImageUploadController to show dialogs or navigate.
- setFile: A Function callback to update the selected image file in the parent widget.
- assignCroppedImage: A Function callback to handle the result of any image cropping operation.
build() Method:
- ListTile: A standard Flutter widget used to arrange elements in a single row.
  - leading: Displays a CircleAvatar with a light primary background color, containing the specified icon in the primary color. This creates a visually appealing icon button on the left.
  - title: Displays the title text in black.
  - subtitle: Displays the subtitle text in grey with a font size of 14, providing additional descriptive information.
  - trailing: Shows a CupertinoIcons.chevron_right (right arrow) icon, common for indicating navigation or actionable items in a list.
  - onTap: This is the primary interaction point. When the ListTile is tapped, it calls the static method ImageUploadController.imagePicker, passing all the necessary parameters. This centralizes the image picking logic within ImageUploadController, making the ImagePickerTile purely a UI component.

3. image_previewer.dart

This widget is responsible for displaying a previously picked image and offering options to 'Edit' (re-pick) or 'Remove' the image.

import 'dart:async';
import 'dart:io';
import 'package:flutter/material.dart';
import 'package:iconsax/iconsax.dart';
import 'package:snap2chef/infrastructure/image_upload_controller.dart';

class ImagePreviewer extends StatelessWidget {
  const ImagePreviewer({
    super.key,
    required this.size,
    required this.pickedFile,
    required this.removeFile,
    required this.context,
    required this.completer,
    required this.setFile,
    required this.assignCroppedImage,
  });

  final Size size;
  final File? pickedFile;
  final Function removeFile;
  final BuildContext context;
  final Completer? completer;
  final Function setFile;
  final Function assignCroppedImage;

  @override
  Widget build(BuildContext context) {
    return Container(
      height: size.height * 0.13,
      width: double.infinity,
      decoration: BoxDecoration(
        borderRadius: BorderRadius.circular(7),
        // border: Border.all(
        //   color: AppColors.borderColor,
        // ),
        image: DecorationImage(
          image: FileImage(
            File(pickedFile!.path),
          ),
          fit: BoxFit.cover,
        ),
      ),
      child: Stack(
        children: [
          Container(
            decoration: BoxDecoration(
              color: Colors.black.withOpacity(0.3),
              borderRadius: BorderRadius.circular(7),
            ),
          ),
          // Centered content
          Center(
            child: Wrap(
              crossAxisAlignment: WrapCrossAlignment.center,
              spacing: 20,
              children: [
                GestureDetector(
                  onTap: () {
                    ImageUploadController.showFilePickerButtonSheet(context,completer,setFile,assignCroppedImage);
                  },
                  child: Column(
                    children: [
                      Icon(
                        Iconsax.edit_2,
                        size: 20,
                        color: Colors.white,
                      ),
                      const Text(
                        'Edit',
                        style: TextStyle(
                          color: Colors.white,
                          fontSize: 15,
                        ),
                      )
                    ],
                  ),
                ),
                GestureDetector(
                  onTap: () {
                    removeFile();
                  },
                  child: Column(
                    children: [
                      Icon(
                        Iconsax.note_remove,
                        color: Colors.white,
                        size: 20,
                      ),
                      const Text(
                        'Remove',
                        style: TextStyle(
                          color: Colors.white,
                          fontSize: 15,
                        ),
                      )
                    ],
                  ),
                ),
              ],
            ),
          ),
        ],
      ),
    );
  }
}

ImagePreviewer (StatelessWidget): Similar to ImagePickerTile, this is a StatelessWidget that displays content and triggers callbacks.

Properties:
- size: The Size of the parent widget, used to calculate the height of the preview container proportionally.
- pickedFile: A File? representing the image file to be displayed. It's nullable, implying that this widget might only show if a file has been picked.
- removeFile: A Function callback to handle the removal of the currently displayed image.
- context, completer, setFile, assignCroppedImage: These are passed through to the ImageUploadController when the 'Edit' action is triggered, similar to the ImagePickerTile.
build() Method:
- Container: The primary container for the image preview.
  - height: Set to 13% of the screen height, providing a responsive size.
  - width: double.infinity to take full available width.
  - decoration:
    - borderRadius: Applies rounded corners to the container.
    - image: DecorationImage(...): This is where the magic happens. It displays the pickedFile as a background image for the container.
      - FileImage(File(pickedFile!.path)): Creates an image provider from the local file path. The ! (null assertion operator) implies pickedFile is expected to be non-null when this widget is displayed.
      - fit: BoxFit.cover: Ensures the image covers the entire container, potentially cropping parts of it.
- Stack: Layers content on top of the image.
  - Container (Overlay): A semi-transparent black Container is placed on top of the image (Colors.black.withOpacity(0.3)) to create a darkened overlay. This improves the readability of the white text and icons placed over the image.
  - Center: Centers the action buttons horizontally and vertically within the overlay.
  - Wrap: Arranges the 'Edit' and 'Remove' buttons horizontally with a spacing of 20. WrapCrossAlignment.center aligns them vertically within the Wrap.
  - GestureDetector (for 'Edit'):
    - onTap: Calls ImageUploadController.showFilePickerButtonSheet, allowing the user to re-select or change the image. This method likely presents a bottom sheet with options to pick from the camera or gallery, similar to how the initial image picking works.
    - Its child is a Column containing an Iconsax.edit_2 icon and an 'Edit' text, both in white.
  - GestureDetector (for 'Remove'):
    - onTap: Calls the removeFile() callback, which would typically clear the selected pickedFile in the parent state, causing this previewer to disappear or revert to an upload state.
    - Its child is a Column containing an Iconsax.note_remove icon and a 'Remove' text, both in white.

4. query_text_box.dart

This widget provides a styled TextFormField for multi-line text input, typically used for user queries or notes.

import 'package:flutter/material.dart';

import '../../core/constants/app_colors.dart';

class QueryTextBox extends StatelessWidget {
  const QueryTextBox({
    super.key,
    required TextEditingController query,
  }) : _query = query;

  final TextEditingController _query;

  @override
  Widget build(BuildContext context) {
    return TextFormField(
      controller: _query,
      maxLines: 4,
      autofocus: true,
      decoration: InputDecoration(
        hintStyle: TextStyle(color: AppColors.lighterGrey),
        border: OutlineInputBorder(
          borderRadius: BorderRadius.circular(12.0),
          borderSide: BorderSide(color: Colors.grey.shade400),
        ),
        focusedBorder: OutlineInputBorder(
          borderRadius: BorderRadius.circular(12.0),
          borderSide: const BorderSide(
            color: AppColors.primaryColor,
            width: 2.0,
          ),
        ),
        enabledBorder: OutlineInputBorder(
          borderRadius: BorderRadius.circular(12.0),
          borderSide: BorderSide(color: Colors.grey.shade300),
        ),
        contentPadding: const EdgeInsets.symmetric(
          vertical: 12.0,
          horizontal: 16.0,
        ),
      ),
      style: const TextStyle(
        fontSize: 14.0,
        color: Colors.black,
      ),
      keyboardType: TextInputType.multiline,
      textInputAction: TextInputAction.newline,
    );
  }
}

QueryTextBox (StatelessWidget): A StatelessWidget that renders a text input field. It takes a TextEditingController as a required parameter, allowing external control over the text field's content.

Properties:
- _query (TextEditingController): The controller linked to the TextFormField. This allows retrieving the text, setting initial text, and listening for changes.
build() Method:
- TextFormField: The core input widget.
  - controller: _query: Binds the TextEditingController to this field.
  - maxLines: 4: Allows the text field to expand up to 4 lines before becoming scrollable.
  - autofocus: true: Automatically focuses the text field when the screen loads, bringing up the keyboard.
  - decoration: InputDecoration(...): Defines the visual styling of the input field.
    - hintStyle: Sets the color of the hint text to AppColors.lighterGrey.
    - border: Defines the default border when the field is not focused or enabled, with rounded corners and a light grey border.
    - focusedBorder: Defines the border style when the field is actively focused by the user. It uses AppColors.primaryColor with a wider stroke (width: 2.0) to provide a clear visual indicator of focus.
    - enabledBorder: Defines the border style when the field is enabled but not focused, using a slightly darker grey.
    - contentPadding: Adds internal padding within the text field for better spacing of the text.
  - style: Sets the font size to 14.0 and color to black for the entered text.
  - keyboardType: TextInputType.multiline: Configures the keyboard to be suitable for multi-line text input, often providing a "return" key that creates a new line.
  - textInputAction: TextInputAction.newline: Specifies that pressing the "Done" or "Enter" key on the keyboard should insert a new line.

5. upload_container.dart

This widget creates a visually distinct "dotted border" container, typically used as a tappable area to trigger file upload or selection actions.

import 'package:dotted_border/dotted_border.dart';
import 'package:flutter/material.dart';
import 'package:gap/gap.dart';
import 'package:iconsax/iconsax.dart';
import '../../core/constants/app_colors.dart';

class UploadContainer extends StatelessWidget {
  const UploadContainer({
    super.key,
    required this.size,
    required this.title,
  });

  final Size size;
  final String title;

  @override
  Widget build(BuildContext context) {
    return DottedBorder(
      color: AppColors.primaryColor,
      radius: const Radius.circular(15),
      borderType: BorderType.RRect,
      strokeWidth: 1,
      child: SizedBox(
        height: size.height * 0.13,
        width: double.infinity,
        child: Column(
          mainAxisAlignment: MainAxisAlignment.center,
          children: [
            Container(
              height: 70,
              width: 60,
              decoration: BoxDecoration(
                shape: BoxShape.circle,
                color: AppColors.litePrimary,
              ),
              child: Padding(
                padding: const EdgeInsets.all(13.0),
                child: Icon(
                  Iconsax.document_upload,
                  color: AppColors.primaryColor,
                ),
              ),
            ),
            const Gap(5),
            RichText(
              text: TextSpan(
                text: 'Click to select ',
                style: TextStyle(
                  color: AppColors.primaryColor,
                ),
                children: [
                  TextSpan(
                    text: title,
                    style: TextStyle(
                      color: Color(0xff555555),
                    ),
                  )
                ],
              ),
            ),
          ],
        ),
      ),
    );
  }
}

UploadContainer (StatelessWidget): A StatelessWidget primarily for visual presentation, indicating an upload zone.

Properties:
- size: The Size of the parent, used to determine the container's height proportionally.
- title: A String to be displayed as part of the "Click to select [title]" message.
build() Method:
- DottedBorder: This package provides the visual dotted border effect.
  - color: AppColors.primaryColor: The color of the dotted line.
  - radius: const Radius.circular(15): Applies rounded corners to the dotted border.
  - borderType: BorderType.RRect: Specifies that the border should follow a rounded rectangle shape.
  - strokeWidth: 1: Sets the thickness of the dotted line.
- SizedBox: Defines the internal dimensions of the area within the dotted border, taking up 13% of the screen height and full width.
- Column: Arranges the icon and text vertically, centered within the SizedBox.
  - Container (Icon background): A circular container with AppColors.litePrimary background holds the upload icon.
    - Iconsax.document_upload: The icon signifying an upload action, colored with AppColors.primaryColor.
  - Gap(5): From the gap package, this provides a small vertical space (5 pixels) between the icon and the text.
  - RichText: Allows for different styles within a single text block.
    - TextSpan(text: 'Click to select ', ...): The first part of the message, styled with AppColors.primaryColor.
    - children: [TextSpan(text: title, ...)]: The second part of the message, which is the title property passed to the widget, styled in a darker grey. This structure allows "Click to select " to be consistently styled while the title (for example, "image", "document") can have a different appearance.

Summary of Code Implementation

We've covered a significant amount of ground in this part of the article, transforming our basic Flutter application into a powerful AI-powered recipe guide. We started by setting up the core UI, then delved into integrating the google_generative_ai package to communicate with Google's Gemini models for both image and voice input.

We implemented robust logic for:

Image input: Capturing images from the camera or gallery, cropping them, and sending them to the gemini model.
Voice input: Recording audio and preparing the groundwork for transcription before sending text to the gemini model.
Dynamic content display: Skillfully parsing the AI's response to extract and present not just the recipe text, but also embedding YouTube instructional videos and even relevant images, all within a beautifully formatted dialog using flutter_markdown and cached_network_image. We also ensured proper lifecycle management for our media players.

This highlights how easily you can leverage advanced AI capabilities like multimodal understanding and natural language generation within your Flutter applications. By building on these concepts, you can create truly interactive and intelligent user experiences.

Now that we have the core logic in place for capturing input, communicating with the AI, and displaying its rich responses, we need to ensure that our application can actually access the necessary device features.

Permissions: Ensuring App Functionality and User Privacy

For a Flutter application to interact with system features like the camera, microphone, or file storage, it must declare specific permissions in both its Android and iOS manifests. These declarations inform the operating system about the app's requirements and, for sensitive permissions, prompt the user for consent at runtime.

Android¹ Permissions (in `android/app/src/main/AndroidManifest.xml`)

<manifest xmlns:android="http://schemas.android.com/apk/res/android">
    <uses-permission android:name="android.permission.RECORD_AUDIO"/>
    <uses-permission android:name="android.permission.CAMERA" />
    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
    manifest>

Here’s what’s going on:

: This permission is necessary for the application to access the device's microphone and record audio. It's crucial for any speech recognition or voice input features, like the GlowingMicButton implies.
: Grants the application access to the device's camera. This is essential for features that allow users to take photos, such as those enabled by ImagePickerTile or ImagePreviewer.
: This is a fundamental permission required for almost any modern application that connects to the internet. It allows the app to send and receive data from web services, like interacting with the Gemini API, Firebase, or Vertex AI.
: Allows the application to read files from the device's shared external storage (for example, photos saved in the gallery). This is necessary when picking existing images from the gallery. For newer Android versions (Android 10+), scoped storage might change how this works, but for reading user-selected media, this declaration is still relevant. For writing to external storage, WRITE_EXTERNAL_STORAGE would also be needed.

iOS Permissions (in `ios/Runner/Info.plist`)


plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>io.flutter.embedded_views_previewkey>
        <true/>
    <key>NSSpeechRecognitionUsageDescriptionkey>
        <string>We need access to recognize your speech.string>
    <key>NSCameraUsageDescriptionkey>
        <string>This app needs access to the camera to capture photos and videos.string>
    <key>NSMicrophoneUsageDescriptionkey>
        <string>This app needs access to the microphone for audio recording.string>
    <key>NSPhotoLibraryUsageDescriptionkey>
        <string>This app needs access to your photo library.string>
    <key>NSPhotoLibraryAddUsageDescriptionkey>
        <string>This app needs permission to save photos to your photo library.string>
    <key>NSAppTransportSecuritykey>
        <dict>
            <key>NSAllowsArbitraryLoadskey>
            <true/>
        dict>
dict>
plist>

Here’s what’s going on:

iOS permissions are declared in the Info.plist file using specific keys (NS...UsageDescription) and require a user-facing string explaining why the permission is needed. This string is displayed to the user when the app requests the permission.

io.flutter.embedded_views_preview: This key is often added when using Flutter plugins that integrate native UI components (for example, camera previews, webviews). It enables a preview of embedded native views during development.
NSSpeechRecognitionUsageDescriptionWe need access to recognize your speech.: This is the privacy description for speech recognition services (for example, Apple's built-in speech recognizer). It's crucial for features like voice input to work.
NSCameraUsageDescriptionThis app needs access to the camera to capture photos and videos.: The privacy description for camera access. This is required for capturing images via the camera, as used in the image picking functionality.
NSMicrophoneUsageDescriptionThis app needs access to the microphone for audio recording.: The privacy description for microphone access. Necessary for recording audio for speech input.
NSPhotoLibraryUsageDescriptionThis app needs access to your photo library.: The privacy description for reading from the user's photo library. This is required when picking existing images or videos from the gallery.
NSPhotoLibraryAddUsageDescriptionThis app needs permission to save photos to your photo library.: The privacy description for writing to the user's photo library. This would be needed if the app captures photos/videos and saves them directly to the device's gallery.
NSAppTransportSecurityNSAllowsArbitraryLoads: This section relates to Apple's App Transport Security (ATS). By default, ATS enforces secure connections (HTTPS). Setting NSAllowsArbitraryLoads to true (as shown here) disables this enforcement, allowing the app to make insecure HTTP connections. While useful during development or for interacting with specific legacy APIs, it's generally not recommended for production apps due to security implications. For production, you should ideally configure specific exceptions or ensure all network requests use HTTPS.

Assets: Managing Application Resources

Assets are files bundled with your application and are accessible at runtime. This typically includes images, fonts, audio files, and more.

In this application, we have an assets folder, and inside it, an images subfolder.

assets/
└── images/
    ├── placeholder.png
    └── app_logo.png

placeholder.png: This image is typically used as a temporary visual cue when actual content (like an image being loaded or picked) is not yet available. It provides a better user experience than a blank space.
app_logo.png: This is the primary logo of the application. It's used for various purposes, including the app icon and the splash screen.

To ensure Flutter knows about these assets and bundles them with the application, you need to declare them in your pubspec.yaml file:

flutter:
  uses-material-design: true
  assets:
    - assets/images/ # This line tells Flutter to include all files in the assets/images/ directory

App Icons: Customizing Your Application's Identity

Flutter applications use the flutter_launcher_icons package to simplify the process of generating app icons for different platforms and resolutions. This ensures your app has a consistent and professional look on both Android and iOS devices.

`pubspec.yaml` Configuration for `flutter_launcher_icons`

flutter_icons:
  android: "launcher_icon"
  ios: true
  image_path: "assets/images/app_logo.png"
  remove_alpha_ios: true
  adaptive_icon_background: "#FFFFFF"
  adaptive_icon_foreground: "assets/images/app_logo.png"

Here’s what’s happening:

flutter_icons:: This is the root key for the flutter_launcher_icons package configuration.
android: "launcher_icon": Specifies that Android launcher icons should be generated. "launcher_icon" is the default and usually sufficient.
ios: true: Enables the generation of iOS app icons.
image_path: "assets/images/app_logo.png": This is the absolute path to your source image file that will be used to generate the icons. It's crucial that this path is correct and points to a high-resolution square image.
remove_alpha_ios: true: For iOS, this option removes the alpha channel from the icon. iOS icons typically do not use an alpha channel for transparency.
adaptive_icon_background: "#FFFFFF": This is specific to Android Adaptive Icons (introduced in Android 8.0 Oreo). It defines the background layer of the adaptive icon. Here, it's set to white (#FFFFFF).
adaptive_icon_foreground: "assets/images/app_logo.png": This defines the foreground layer of the adaptive icon. It uses the app_logo.png again, which will be masked and scaled by the Android system.

Generating App Icons

After configuring pubspec.yaml, you need to run the following commands in your terminal:

First, run dart run flutter_launcher_icons:generate. This command generates a configuration file (often named flutter_launcher_icons.yaml or similar, or directly processes the pubspec.yaml) which flutter_launcher_icons uses.

Correction: The prompt mentions "generate a config file and setup the image path to the path of the app_logo.png then run dart run flutter_launcher_icons to generate the assets". It seems flutter_launcher_icons:generate might be an older or specific command, the typical usage is to run flutter_launcher_icons directly after setting image_path in pubspec.yaml. For the given configuration, the image_path is already set in pubspec.yaml.

Then, run dart run flutter_launcher_icons. This command executes the flutter_launcher_icons package, which takes the image_path specified in pubspec.yaml and generates all the necessary icon files at various resolutions for both Android and iOS, placing them in the correct native project directories.

Splash Screen: The First Impression

A splash screen (or launch screen) is the first screen users see when they open your app. It provides a branded experience while the app initializes resources. The flutter_native_splash package simplifies creating native splash screens for Flutter apps.

`pubspec.yaml` Configuration for `flutter_native_splash`

flutter_native_splash:
  color: "#FFFFFF"
  image: assets/images/app_logo.png
  android: true
  android_gravity: center
  fullscreen: true
  ios: true

Here’s what’s happening:

flutter_native_splash:: The root key for the flutter_native_splash package configuration.
color: "#FFFFFF": Sets the background color of the splash screen. Here, it's set to white.
image: assets/images/app_logo.png: Specifies the path to the image that will be displayed on the splash screen. In this case, it's the application's logo.
android: true: Enables splash screen generation for Android.
android_gravity: center: For Android, this centers the splash image on the screen.
fullscreen: true: Makes the splash screen appear in fullscreen mode, without status or navigation bars.
ios: true: Enables splash screen generation for iOS.

Generating the Splash Screen

After configuring pubspec.yaml, run the following command in your terminal: dart run flutter_native_splash:create. It processes the configuration and generates the native splash screen files (for example, launch images, drawables) in the respective Android and iOS project folders, ensuring they are properly integrated into the native launch process.

Screenshots from the App

Keep in mind that the output quality can vary depending on the AI model you’re using. The same applies to YouTube links and image URLs – sometimes they work perfectly, and other times they may not. So if something doesn’t work as expected, it’s not necessarily on your end.

Also, remember there are so many ways to achieve this and you don’t necessarily use to use this method. I’ll provide some other resources you can check out below. You can use systemInstructions instead of defining constraints in text the way I did it.

Here’s the completed project: https://github.com/Atuoha/snap2chef_ai

Wrapping Up

I hope this comprehensive breakdown has given you a clear understanding of the "Snap2Chef" application's structure, UI components, and underlying configurations. May your coding journey be filled with creativity and successful implementations.

Happy coding!

References

Here are some references for the key technologies and packages used in this application:

Flutter Packages

flutter/material.dart: The core Flutter Material Design package.
- Reference: Flutter API Docs - material library
iconsax/iconsax.dart: A custom icon set for Flutter.
- Reference: pub.dev - iconsax
gap/gap.dart: A simple package for adding spacing between widgets.
- Reference: pub.dev - gap
dotted_border/dotted_border.dart: A Flutter package to draw a dotted border around any widget.
- Reference: pub.dev - dotted_border
flutter/cupertino.dart: The core Flutter Cupertino (iOS-style) widgets package.
- Reference: Flutter API Docs - cupertino library
flutter_launcher_icons: A package for generating application launcher icons.
- Reference: pub.dev - flutter_launcher_icons
flutter_native_splash: A package for generating native splash screens.
- Reference: pub.dev - flutter_native_splash
image_picker (Implicitly used by ImageUploadController): A Flutter plugin for picking images from the image library, or taking new photos with the camera. (Though not directly imported in the provided snippets, ImageUploadController likely uses this or a similar package).
- Reference: pub.dev - image_picker
image_cropper (Implicitly used by ImageUploadController): A Flutter plugin for cropping images. (Likely used in conjunction with image_picker for assignCroppedImage).
- Reference: pub.dev - image_cropper

APIs and Platforms

Gemini API: Google's family of generative AI models.
- Reference: Google AI Gemini API
- Documentation: Google Cloud - Gemini API Documentation
Firebase: Google's comprehensive app development platform.
- Reference: Firebase Official Website
- Documentation: Firebase Documentation
- Firebase Console/Studio: The web-based interface for managing Firebase projects.
Vertex AI: Google Cloud's machine learning platform.
- Reference: Google Cloud - Vertex AI
- Documentation: Google Cloud - Vertex AI Documentation

How to Create an AI-Powered Bot that Can Post on Twitter/X

Arunachalam B — Wed, 23 Apr 2025 18:27:44 +0000

These days, everyone wants to be a content creator. But it can be hard to find time to create and curate content, post on social media, build engagement, and grow your brand.

And I’m not an exception to this. I wanted to create more content, and had an idea based on something I’ve observed. I subscribe to a few technology newsletters, and I read lots of updates every day about the tech ecosystem. But I’ve noticed that many of my peers often don’t seem to be aware of this news. So, I decided to post my top three news stories (especially about AI) on my Twitter/X account every day.

I did this for a couple of weeks, but after that I couldn’t find the time to keep it going. So, I did some research into how I could automate the process, and I found a solution. In this guide, I’ll explain the process so you can use it, too.

By the end of this tutorial, you’ll have created your own AI bot that:

Fetches data from an API or crawls a webpage
Processes the data using AI
Posts the results on Twitter/X

And the great thing: this entire process is automated.

Prerequisites
How to Build the Bot
Node.js Project Setup
Conclusion

Prerequisites

Before we begin creating a bot, you’ll need to have the following setup and tools ready to go:

NodeJS - A simple NodeJS app to code the bot

You’ll also need some API keys, secrets, and tokens. So, you’ll need to have the following accounts created:

Twitter Developer – To generate the Twitter/X API keys, secrets, and tokens
Google AI Studio – To generate the Gemini API key

How to Build the Bot

There are a number of steps I’ll walk you through to build your bot.

We’ll start by generating an API Key and Secret so we can use the Twitter/X API. Then we’ll generate an access token and access token secret with “Read and Write” permissions that’ll be able to post in your account. After that we’ll generate an API Key in Google Gemini (we’ll be using the Gemini API to process the data).

With all that taken care of, we’ll start working on the Node.js app. The app will be able to fetch data from an API, process the data using AI, and then post that data in the form of tweets on Twitter/X.

Finally, we’ll automate the entire process and schedule it to run daily.

Step 1: Generate the Twitter API Key

Navigate to Twitter Developer Website.
Click on the “Developer Portal” in the top right:
Signup using your account.
You’ll be asked to fill out a form asking how will you use the Twitter API, and a few basic details. It may take up to 24 hours to get approved. But, it’s approved instantly for me.
After login, Navigate to "Projects and Apps" and under “Overview” click on "Create App":
Enter a name for your app and click “Next” to proceed with creating your app. At the end, you’ll be shown your API Key and Secret. Don’t copy that now.
Click on the project you created from the left side drawer and click on the "Edit" option in “User authentication settings” section.
Select “Read and Write” in App Permissions section, “Web App, Automated App or Bot” in Type of App section, and enter your website URL (it can be any URL including http://localhost) in the “Callback URI” and “Website URL”. Then hit “Save”.
Go to “Keys and tokens” tab.
Click on “Regenerate” button in “API Key and Secret” section.
Copy and save the API Key and Secret somewhere securely.

Step 2: Generate Access Token and Secret

Go to “Keys and tokens” tab.
Click on “Generate” or “Regenerate” button in “Access Token and Secret” section.
Copy and save the Access Token and Secret somewhere securely.

Step 3: Generate an API Key in Google Gemini

Navigate to Google AI Studio.
Login to your account.
Click on “Get API Key” button at the top right.
Click on “Create API Key” button.
Copy and save the API Key somewhere securely.

Alright, we are done with creating the necessary API Keys and Secrets for our project. Let’s put on our coding shoes.

Node.js Project Setup

There are 5 major steps for this part of the project. They are:

Fetch data from the API
Upload the data as a file to Gemini API
Prompt Gemini with the uploaded file to get the latest AI news
Post news to Twitter/X using their API
Delete the file uploaded in Gemini API

These are just the snippets of code that can be assembled together to run this project.

Step 1: Fetch Data from the API

In my case, I’ll be using techmeme.com to get the latest news. But this site does not offer an API. So, I’ll be downloading the HTML of this site.

In the User-Agent header, we pass the value that mimics a browser user agent to avoid potential blocks.

Step 2: Upload the Data as a File to Gemini API

Now we need to store this HTML in a separate file. We cannot directly pass the HTML code in the prompt to the Gemini API, as it’ll result in an error. This is because Gemini accepts only a limited number of tokens in this API. The HTML code of any website will always result in huge number of tokens. So, we’ll create a separate file.

Upload the file to the Gemini API. Refer to the file id in the prompt to Gemini.

Step 3: Prompt Gemini to Get the Latest AI News

Let’s write a prompt to Gemini asking it to generate top news by referring to the HTML file provided. We’ll ask it to provide a headline, short description, URL, and three relevant hashtags for each tweet. We’ll also give some example data of how it should look. We’ll ask it to generate a structured response by providing the format of the JSON that we want the output to be.

You can use whatever model you want to, but I’ll be using the gemini-2.5-pro-exp-03-25 model for this use case. I’m using this model because we need a thinking model that thinks and picks the correct top news – not just one that predicts the next token/word. The Gemini 2.5 Pro model best qualifies for this.

Step 4: Post Using the Twitter/X API

Here’s the core of our app. We need to post all the tweets we received from Gemini. We’ll be posting the tweet as a thread. This means that the first tweet will be the root tweet and subsequent tweets will be in the comments of the prior tweet. This makes it a thread.

To do this, we’ll take the id of each tweet after it’s posted and pass it on to the next tweet as a reference. One additional thing to note here is, after each successful tweet, we’ll give a pause of 5 seconds before posting the next tweet. There are few reasons for doing it this way.

When any script runs, it usually runs at a much higher speed (usually in milliseconds). So, the second tweet may get posted before the first tweet was posted (maybe due to some poor internet connection). Also, I believe Twitter implements some queue system which may quickly process the second tweet before your first. So it’s always better to leave a small gap – if not 5 seconds then at least 1 second
Twitter may have implemented some rate limiting mechanism. So if there are multiple request received from a same IP within a short time frame, they may block the IP and consider your account as spam.
Since we’re using a Free tier API, we are limited to 1500 tweets per month. If you’ve paid for this API, you won’t have to worry about this (since you’ll have a higher limit and the rate limiting mechanism –refer to point #2 – might not be applicable). All of this depends on their pricing, so just refer to that and make your call accordingly.

I’m using the free tier, and since it’s a hobby project, having a 5 seconds wait time makes sense. I have not faced any issues so far with this.

Step 5: Delete the File Uploaded in the Gemini API

After posting all the tweets, it’s time to clean up the system. The only thing we need to do as a clean up is delete the uploaded file. It’s always a best practice to remove an unused file that’s no longer needed. And since we’ve already posted the tweets, we no longer need that file. So, we’ll be deleting it in this step.

That’s it. We’re all done. You just need to copy these blocks of code into an index.js file and install some dependencies into the project and you should be good to go.

To make this even more simple, I have created a repo and made it public. Here’s the Github repo URL. You just need to clone the repo, install the dependencies, and run the post command

git clone https://github.com/arunachalam-b/existential-crisis-alert-bot.git
cd existential-crisis-alert-bot
npm i

Create a .env file and update your API keys and secrets in that file:

GEMINI_API_KEY=
TWITTER_API_KEY=
TWITTER_API_SECRET=
TWITTER_ACCESS_TOKEN=
TWITTER_ACCESS_TOKEN_SECRET=

Run the following command to post the latest AI news to your account:

npm run post

The Result

Here’s a sample output of that command:

You can modify the code/prompt to fetch data from a different API and post the top results in your Twitter account.

Conclusion

I hope you now understand how you can automate a slightly complex process using AI and some APIs. Just note that this example is not completely automated. You still have to manually run the command everyday to post the tweets.

But you can automate that process as well. Just drop me a message if you wish to know about that. That topic itself deserves to be a separate tutorial. Also, I would request that you give a star for my project if you enjoyed this tutorial.

Meanwhile, you can follow my Twitter/X account to receive the top AI news everyday. If you wish to learn more about automation, subscribe to my email newsletter (https://5minslearn.gogosoon.com/) and follow me on social media.

How to Build a Video Subtitle Generator using the Gemini API

Sanjay — Wed, 11 Dec 2024 15:28:11 +0000

In this tutorial, you'll build an AI-powered subtitle generator using Google's Gemini API. We'll create a project called “AI-Subtitle-Generator” using React for the front end and Express for the back end. Get ready for a fun and practical project.

How to Get Your API Key
Project Setup
Front End Setup
Server Setup
Update the Front End
Summary
Conclusion

Prerequisites

To build this project, you should know the basics of React and Express.

What is the Gemini API?

Google's Gemini API is a powerful tool that lets you integrate advanced AI capabilities into your applications. Gemini is a multimodal model, which means you can use various types of input, like text, images, audio, and video.

It’s good at analyzing and processing large amounts of text as well as pulling information from videos – which makes it great for our use case of a subtitle generator.

How to Get Your API Key

An API key acts as a unique identifier and authenticates your requests to the service. It's essential for accessing and using Gemini AI’s capabilities. This key will allow our application to communicate with Gemini and help us build our project.

Go to Google AI Studio, then click “Get API Key”:

After you are redirected to the API KEY page, click “Create API Key“:

A new API KEY will be created. Then make sure you copy the key.

This is your API key. This key is used to authenticate your application's requests to the Gemini API. Each time your application sends a request to Gemini, this key must be included. Gemini uses this key to verify that the request is coming from an authorized source. Without this API key, your requests will be rejected, and you won't be able to access Gemini's services.

Project Setup

Start by creating a new folder for your project. Let's call it ai-subtitle-generator.

Inside the ai-subtitle-generator folder, create two subfolders: client and server. The client folder will contain the React frontend, and the server folder will contain the Express backend.

Front End Setup

First, we will focus on the front end and set up a basic React application.

Navigate to the client folder:

cd client

Then create a new React project using Vite. To do that, run the following command:

npm create vite@latest .

When prompted, choose “React“. Select “React + TS” or “React + JS”. In this tutorial, I will use React + TS. You can also follow along with JS.

Next, install the dependencies with this command:

npm install

Then start the development server:

npm run dev

How to Handle File Uploads in the Frontend

Now in client/src/App.tsx, add the following code:

//  client/src/App.tsx

const App = () => {
    const handleSubmit = async (e: React.FormEvent): Promise<void> => {
    e.preventDefault();
    try {
      const formData = new FormData(e.currentTarget);
      console.log(formData)
    } catch (error) {
      console.log(error);
    }
  };

  return (
    
      
        type="file" accept="video/*,.mkv" name="video" />
        type="submit" />
      
    
  );
};

export default App;

In the above code, we have used an input tag that will accept the video and name it as video. This name will be appended to the FormData object.

While sending the video to the server, we need to send it as a key-value pair, where the key is a video and the value is the file data.

Why key-value pairs? Because when the server receives the request, it needs to parse the incoming chunks. After parsing, the video data will be available in req.files[key], where the key is the name we have assigned in the frontend (video in this case).

This is why we are using the FormData object. When we create a new FormData instance and pass e.target to it, all the form fields and their names will automatically be available as key-value pairs.

Server Setup

Now that we have our API key, let's set up the backend server. This server will handle video uploads from the frontend and communicate with the Gemini API for subtitle generation.

Navigate to server folder:

cd server

And initialize the project:

npm init -y

Then install the necessary packages:

npm install express dotenv cors @google/generative-ai express-fileupload nodemon

These are the back-end dependencies we’re using in this project:

express: The web framework for creating the backend API.
dotenv: Loads environment variables from a .env file.
cors: Enables Cross-Origin Resource Sharing, allowing your frontend to communicate with your backend.
@google/generative-ai: The Google AI library for interacting with the Gemini API.
express-fileupload: Handles file uploads, making it easy to access uploaded files on the server.
nodemon: Automatically restarts the server when you make changes to your code.

Set Up the Environment Variables

Now, create a file called .env. This is where you’ll manage your API keys.

//.env
API_KEY = YOUR_API_API
PORT = 3000

Update the `package.json`

For this project, we are using ES6 modules instead of CommonJS. To enable this, update your package.json file with the following code:

{
  "name": "server",
  "version": "1.0.0",
  "main": "index.js",
  "type": "module",       //Add "type": "module" to enable ES6 modules
  "scripts": {
    "start": "node server.js",
    "dev": "nodemon server.js"    //configure nodemon
  },
  "keywords": [],
  "author": "",
  "license": "ISC",
  "description": "",
  "dependencies": {
    "@google/generative-ai": "^0.21.0",
    "cors": "^2.8.5",
    "dotenv": "^16.4.7",
    "express": "^4.21.1",
    "express-fileupload": "^1.5.1",
    "nodemon": "^3.1.7"
  }
}

Basic Setup of Express

Create a file server.js. Now, let’s set up a basic Express application.

//  server/server.js

import express from "express";
import { configDotenv } from "dotenv";
import fileUpload from "express-fileupload";
import cors from "cors"

const app = express();

configDotenv();           //configure the env
app.use(fileUpload());    //it will parse the mutipart data
app.use(express.json());  // Enable JSON parsing for request bodies
app.use(cors())           //configure cors

app.use("/api/subs",subRoutes);  // Use routes for the "/api/subs" endpoint

app.listen(process.env.PORT, () => {   //access the PORT from the .env
  console.log("server started");         
});

In this code, we create an Express app instance and then load our environment variables. This is where we keep sensitive data like API keys secure. Next, we apply middleware functions: fileUpload prepares the server to receive uploaded videos, express.json allows us to receive JSON data, and cors enables communication between our frontend and backend.

We define a route (/api/subs) that will handle all requests related to subtitle generation. The specific logic for these routes will be defined in subs.routes.js. Finally, we start the server, telling it to listen for requests on the port specified in our .env file.

Now we need to create some folders to manage the code. You can also manage the entire code in a single file, but structuring it into separate folders and managing them all that way will be easier.

This is the final folder structure for the server:

server/
├── server.js
├── controller/
│   └── subs.controller.js
├── gemini/
│   ├── gemini.config.js
├── routes/
│   └── subs.routes.js
├── uploads/
├── utils/
│   ├── fileUpload.js
│   └── genContent.js
└── .env

Note: Don’t worry about creating this folder structure now. This is just for reference. Follow along with me step by step, and we will build this structure together.

Create the Routes

Now create a routes folder and then create subs.routes.js:

// server/routes/sub.routes.js

import express from "express"
import { uploadFile } from "../controller/subs.controller.js"    // import the uploadFile function from the controller folder

const router = express.Router()

router.post("/",uploadFile)    // define a POST route that calls the uploadFile function

export default router     // export the router to use in the main server.js file

This code defines the routes for our server, specifically the route that handles video uploads and subtitle generation.

We create a new router instance using express.Router(). This allows us to define routes separate from our main server file, improving code organization. We define a POST route at the root path ("/") of our API endpoint. When a POST request is made to this route (which will happen when a user submits the video upload form on the frontend), the uploadFile function is called. This function will handle the actual upload and subtitle generation.

Finally, we export the router so that it can be used in our main server file (server.js) to connect this route to the main application.

Configure Gemini

Now, let's configure how our application will interact with Gemini.

Create a gemini folder and then create a new file called gemini.config.js:

//  server/gemini/gemini.config.js

import {
  GoogleGenerativeAI,
  HarmBlockThreshold,
  HarmCategory,
} from "@google/generative-ai";
import { configDotenv } from "dotenv";
configDotenv();

const genAI = new GoogleGenerativeAI(process.env.API_KEY);  // Initialize Google Generative AI with the API key

const safetySettings = [
  {
    category: HarmCategory.HARM_CATEGORY_HARASSMENT,
    threshold: HarmBlockThreshold.BLOCK_NONE,
  },
  {
    category: HarmCategory.HARM_CATEGORY_HATE_SPEECH,
    threshold: HarmBlockThreshold.BLOCK_NONE,
  },
  {
    category: HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
    threshold: HarmBlockThreshold.BLOCK_NONE,
  },
  {
    category: HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
    threshold: HarmBlockThreshold.BLOCK_NONE,
  },
];

const model = genAI.getGenerativeModel({
  model: "gemini-1.5-flash-001",    //choose the model
  safetySettings: safetySettings,   //optional safety settings
});

export default model;    //export the model

In the code above, the safetySettings are optional. These settings allow you to define thresholds for potentially harmful content (like hate speech, violence, or explicit material) in Gemini's output.

You can read more about Gemini’s safety settings here.

Create a Controller to Handle Endpoint Logic

Now, create a controller folder, and inside it create a file named subs.controller.js. In this file, you'll handle the endpoint logic for interacting with the Gemini model.

In server/controller/subs.controller.js, add this code:

// server/controller/subs.controller.js

import { fileURLToPath } from "url";
import path from "path";
import fs from "fs";

const __filename = fileURLToPath(import.meta.url);  //converts the module URL to a file path
const __dirname = path.dirname(__filename);   //get the current file directory

export const uploadFile = async (req, res) => {
  try {
    if (!req.files || !req.files.video) {   //if there is no file available, return error to the client
      return res.status(400).json({ error: "No video uploaded" });
    }

    const videoFile = req.files.video;   //access the video
    const uploadDir = path.join(__dirname, "..", "uploads");   //path to upload the video temporarily

    if (!fs.existsSync(uploadDir)) {   //check if the directory exists
      fs.mkdirSync(uploadDir);      //if not create a new one
    }

    const uploadPath = path.join(uploadDir, videoFile.name);  

    await videoFile.mv(uploadPath);  //it moves the video from the buffer to the "upload" folder

    return res.status(200).json({ message:"file uploaded sucessfully" });
  } catch (error) {
    return res
      .status(500)
      .json({ error: "Internal server error: " + error.message });
  }
};

Since we are using an ES6 module, the __dirname is not available by default. The file handling mechanism is different compared to CommonJS. Because of this, we’ll use fileURLToPath to handle file paths.

We moved the file from the default temporary location which is the buffer to the uploads folder.

But the file upload process is not yet complete. We still need to send the file to Google AI File Manager, and after uploading, it will return a URI. This URI will then be passed to the model for video analysis.

How to Upload a File to the Google AI File Manager

Create a folder utils and create a file fileUpload.js. You can refer to the folder structure provided above.

//  server/utils/fileUpload.js

import { GoogleAIFileManager, FileState } from "@google/generative-ai/server";
import { configDotenv } from "dotenv";
configDotenv();

export const fileManager = new GoogleAIFileManager(process.env.API_KEY);  //create a new GoogleAIFileManager instance

export async function fileUpload(path, videoData) {  
  try {
    const uploadResponse = await fileManager.uploadFile(path, {   //give the path as an argument
      mimeType: videoData.mimetype,  
      displayName: videoData.name,
    });
    const name = uploadResponse.file.name;
    let file = await fileManager.getFile(name);    
    while (file.state === FileState.PROCESSING) {     //check the state of the file
      process.stdout.write(".");
      await new Promise((res) => setTimeout(res, 10000));   //check every 10 second
      file = await fileManager.getFile(name);
    }
    if (file.state === FileState.FAILED) {   
      throw new Error("Video processing failed");
    }
    return file;   // return the file object, containing the upload file information and the uri
  } catch (error) {
    throw error;
  }
}

In the code above, we created a function called fileUpload that takes two arguments. These arguments will be passed from the controller function, which we'll set up later.

The fileUpload function uses the fileManager.uploadFile method to send the video to Google's servers. This method needs two arguments: the file path and an object containing metadata about the file (its MIME type and display name).

Because video processing on Google's servers takes time, we need to check the file's status. We do this using a loop that checks the file's state every 10 seconds using fileManager.getFile(). The loop continues as long as the file's state is PROCESSING. Once the state changes to either SUCCESS or FAILED, the loop stops.

The function then checks if the processing was successful. If so, it returns the file object, which contains information about the uploaded and processed video, including its URI. Otherwise, if the state is FAILED, the function throws an error.

Pass the URI to the Gemini Model

Now in the utils folder, create a file called genContent.js:

// server/utils/genContent.js

import model from "../gemini/gemini.config.js";
import { configDotenv } from "dotenv";
configDotenv();

export async function getContent(file) {
  try {
    const result = await model.generateContent([
      {
        fileData: {
          mimeType: file.mimeType,
          fileUri: file.uri,
        },
      },
      {
        text: "You need to write a subtitle for this full video, write the subtitle in the SRT format, don't write anything else other than a subtitle in the response, create accurate subtitle.",
      },
    ]);
    return result.response.text();
  } catch (error) {
    throw error;
  }
}

Import the model that we configured earlier. Create a function called getContent. The getContent function takes the file object (returned from the fileUpload function).

Pass the file URI and the mimi to the model. Then we’ll provide a prompt instructing the model to generate subtitles for the entire video in SRT format. You can also add your prompt if you want. Then return the response.

Update the `subs.controller.js` File

Finally, we need to update the controller file. We've created the fileUpload and getContent functions, and now we'll use them in the controller and provide the required arguments.

In the server/controller/subs.controller.js:

//  server/controller/subs.controller.js

import { fileURLToPath } from "url";
import path from "path";
import fs from "fs";
import { fileUpload } from "../utils/fileUpload.js";
import { getContent } from "../utils/genContent.js";

const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

export const uploadFile = async (req, res) => {
  try {
    if (!req.files || !req.files.video) {
      return res.status(400).json({ error: "No video uploaded" });
    }

    const videoFile = req.files.video;
    const uploadDir = path.join(__dirname, "..", "uploads");

    if (!fs.existsSync(uploadDir)) {
      fs.mkdirSync(uploadDir);
    }

    const uploadPath = path.join(uploadDir, videoFile.name);

    await videoFile.mv(uploadPath);

    const response = await fileUpload(uploadPath, req.files.video);  //we pass 'uploadPath' and the video file data to 'fileUpload'
    const genContent = await getContent(response);   //the 'response' (containing the file URI) is passed to 'getContent'

    return res.status(200).json({ subs: genContent });   //// return the generated subtitles to the client
  } catch (error) {
    console.error("Error uploading video:", error);
    return res
      .status(500)
      .json({ error: "Internal server error: " + error.message });
  }
};

With this, the backend API is complete. Now, we'll move on to updating the front end.

Update the Front End

Our frontend currently only allows users to select a video. In this section, we'll update it to send the video data to our backend for processing. The frontend will then receive the generated subtitles from the backend and initiate a download of the .srt file.

Navigate to the client folder:

cd client

Install axios. We’ll use it to handle HTTP requests.

npm install axios

In the client/src/App.tsx:

//   client/src/App.tsx

import axios from "axios";

const App = () => {
  const handleSubmit = async (e: React.FormEvent): Promise<void> => {
    e.preventDefault();
    try {
      const formData = new FormData(e.currentTarget);
      // sending a POST request with form data
      const response = await axios.post(
        "http://localhost:3000/api/subs/",   
        formData
      );
// creating a Blob from the server response and triggering the file download
      const blob = new Blob([response.data.subs], { type: "text/plain" }); 
      const link = document.createElement("a");
      link.href = URL.createObjectURL(blob);
      link.download = "subtitle.srt";
      link.click();
      link.remove();
    } catch (error) {
      console.log(error);
    }
  };

  return (
    
      
        type="file" accept="video/*,.mkv" name="video" />
        type="submit" />
      
    
  );
};

export default App;

axios makes the POST request to your backend API endpoint (/api/subs). The server will process the video, and this might take some time.

After the server sends the generated subtitles, the frontend receives them as a response. To handle this response and allow users to download the subtitles, we'll use a Blob. A Blob (Binary Large Object) is a web API object that represents raw binary data, essentially acting like a file. In our case, the subtitles returned from the server will be converted into a Blob, which will then allow us to trigger a download in the user's browser.

Summary

In this tutorial, you learned how to build an AI-powered subtitle generator using Google's Gemini API, React, and Express. You can upload videos, send them to the Gemini API for subtitle generation, and provide the generated subtitles for download.

Conclusion

That's it! You've successfully built an AI-powered subtitle generator using the Gemini API. For quicker testing, start with shorter video clips (3-5 minutes). Longer videos might take more time to process.

Want to create a customizable video prompting application? Just add an input field to let users enter their prompts, send that prompt to the server, and use it in place of the hardcoded prompt. That's all it takes.

For more information about the Gemini API, refer to the official Gemini API Docs

You can find the full code here: AI-Subtitle-Generator

If there are any mistakes or you have any questions, contact me on LinkedIn or Instagram.

Thank you for reading!

Learn to Use the Gemini AI MultiModal Model

Beau Carnes — Thu, 22 Aug 2024 19:23:26 +0000

Gemini is a suite of AI models that can understand and generate human-like responses based on the input it receives.

We just published a Gemini course on the freeCodeCamp.org YouTube channel that is designed to guide you through the world of multimodal AI, focusing on building an application that can interpret images and answer questions about them.

Course Overview

In this course, led by the talented Ania Kubow, you'll learn how to use Google's Gemini MultiModal Model. This innovative AI model allows you to input both text and images, providing text-based responses that can enhance your applications' interactivity and functionality.

Here are some of the topics covered:

Introduction to Gemini: Understand the basics of Gemini, a series of multimodal generative AI models developed by Google. Learn how these models can process both text and image inputs to generate meaningful text responses.
Setting Up and Authentication: Get step-by-step guidance on setting up your development environment and obtaining your API key for secure access to the Gemini API.
Exploring Gemini Models: Dive into the different models available within the Gemini suite, such as gemini-pro and gemini-pro-vision, and learn how to use their methods to build applications that can see and understand images.
Building the App: Follow along as we build an application that can upload images, interpret them, and answer questions. You'll also learn how to implement a feature that generates random questions for enhanced user interaction.
Advanced Features: While the course focuses on the core functionalities, you'll also get a glimpse into advanced features like creating embeddings with the embedding-001 model, setting the stage for future exploration.

Understanding Gemini

Gemini is a groundbreaking series of multimodal generative AI models developed by Google, designed to revolutionize how we interact with artificial intelligence. These models are capable of processing both text and image inputs, making them incredibly versatile for a wide range of applications. Let's explore what makes Gemini unique and how it can be leveraged in your projects.

Unlike traditional models that are limited to text or image processing, Gemini's multimodal capabilities allow it to handle both simultaneously. This means you can input a text query, an image, or a combination of both, and receive coherent, contextually relevant text responses.

Key Features of Gemini Models

Multimodal Input Processing: Gemini models can accept text and images as input, providing a seamless way to interact with AI. This capability is particularly useful for applications that require understanding visual content alongside textual information.
Generative Responses: The models are designed to generate human-like text responses. Whether you're asking a simple question or engaging in a complex dialogue, Gemini can provide insightful answers.
Versatile Applications: From customer service bots to educational tools, the potential applications of Gemini are vast. Developers can create apps that not only answer questions but also provide detailed explanations, descriptions, and more.
API and App Integration: Gemini can be accessed via an intuitive app interface or through a robust API, allowing developers to integrate its capabilities into their own applications. This flexibility makes it easy to incorporate Gemini's features into existing workflows.

By integrating Gemini into your projects, you can enhance user experiences, streamline workflows, and unlock new opportunities in the realm of AI-driven applications. As you progress through this course, you'll gain hands-on experience with these models, learning how to harness their power to build innovative solutions.

Conclusion

Head over to the freeCodeCamp.org YouTube channel and start your journey with the Gemini AI MultiModal Model Course (1-hour watch).

Google Gemini Course for Beginners

Beau Carnes — Thu, 29 Feb 2024 18:47:51 +0000

Google Gemini is a cutting edge AI model that is a competitor of GPT-4.

We just released a course on the freeCodeCamp.org YouTube channel that will help you harness the power of the advanced AI technology.

Developed by the popular instructor Ania Kubów, this beginner's course offers a deep dive into Google's AI model and the Gemini API. Whether you're aspiring to build your own AI chatbot or simply curious about the potentials of large language models (LLMs), this course has got you covered.

The course is designed to be accessible, providing a thorough introduction to Google Gemini and its applications. Starting with the basics, it covers what AI is, delves into Large Language Models (LLMs), and guides you through the process of obtaining your API key—a crucial step in interacting with Gemini's capabilities.

What You Will Learn

The different sections of this course will cover the following topics:

Introduction to Google Gemini: Uncover what Google Gemini is and why it's a significant tool in the realm of AI development.
Understanding AI and LLMs: Get a solid foundation in artificial intelligence and Large Language Models, crucial for grasping the capabilities and applications of Gemini.
Getting Your API Key: A step-by-step guide on how to obtain your API key, enabling you to start experimenting with Gemini.
Exploring Models: Learn about the different models available within Gemini and how to choose the right one for your project.
Initializing the Generative Model: Understand how to initialize Gemini's generative model for your applications.
Diverse Functionalities: Dive into the functionalities offered by Gemini, including text-to-text, text/image-to-text, text-to-chat, and text-to-embedding conversions.
Building an AI Code Buddy: The course culminates in a hands-on project where you'll build an AI chatbot, showcasing the practical application of the skills you've learned.

Why Google Gemini?

Google Gemini represents a significant leap forward in AI technology, offering advanced multimodal reasoning, planning, understanding, and more. It's a tool that not only developers but also creatives, researchers, and businesses can leverage to unlock new potentials and solutions. This course is your gateway to mastering Gemini, enabling you to create, innovate, and solve complex problems with AI.

This course offers a comprehensive and accessible pathway into the world of AI and chatbot development. Watch the full course on the freeCodeCamp.org YouTube channel (1.5 hour watch).

gemini - freeCodeCamp.org

Agentic Coding with the Gemini CLI

How to Build an AI Coding Agent with Python and Gemini

Prerequisites

Table of Contents

What Does the Agent Do?

Learning Goals

Python Setup

How to Integrate the Gemini API

Tokens

Command Line Input

Message Structure

Roles

Verbose Mode

How to Build the Calculator Project

Agent Functions

Get File Content Function

Write File Function

Run Python Function

System Prompt

Function Declaration

More Function Declarations

Function Calling

Building the Agent Loop

Conclusion

How to Build an AI-Powered Cooking Assistant with Flutter and Gemini

Building an AI-Powered Cooking Assistant with Flutter and Gemini

Prerequisites

Here’s what we’ll cover:

How to Get Your Gemini API Key

Set Up Your Flutter Project and Dependencies

Project Structure

1. The core Folder

The extensions Folder

The constants Folder

The shared Folder

2. The infrastructure Folder

image_upload_controller.dart:

recipe_controller.dart:

3. The presentation Folder

screens/home_screen.dart:

The components Folder

toast_info.dart

The widgets Folder

Summary of Code Implementation

Permissions: Ensuring App Functionality and User Privacy

Android1 Permissions (in android/app/src/main/AndroidManifest.xml)

iOS Permissions (in ios/Runner/Info.plist)

Assets: Managing Application Resources

App Icons: Customizing Your Application's Identity

pubspec.yaml Configuration for flutter_launcher_icons

Generating App Icons

Splash Screen: The First Impression

pubspec.yaml Configuration for flutter_native_splash

Generating the Splash Screen

Screenshots from the App

Wrapping Up

References

Flutter Packages

APIs and Platforms

How to Create an AI-Powered Bot that Can Post on Twitter/X

Table of Contents

Prerequisites

How to Build the Bot

Step 1: Generate the Twitter API Key

Step 2: Generate Access Token and Secret

Step 3: Generate an API Key in Google Gemini

Node.js Project Setup

Step 1: Fetch Data from the API

Step 2: Upload the Data as a File to Gemini API

Step 3: Prompt Gemini to Get the Latest AI News

Step 4: Post Using the Twitter/X API

Step 5: Delete the File Uploaded in the Gemini API

The Result

Conclusion

How to Build a Video Subtitle Generator using the Gemini API

Table of Contents

Prerequisites

What is the Gemini API?

How to Get Your API Key

1. The `core` Folder

The `extensions` Folder

The `constants` Folder

The `shared` Folder

2. The `infrastructure` Folder

`image_upload_controller.dart`:

`recipe_controller.dart`:

3. The `presentation` Folder

`screens/home_screen.dart`:

The `components` Folder

`toast_info.dart`

The `widgets` Folder

Android¹ Permissions (in `android/app/src/main/AndroidManifest.xml`)

iOS Permissions (in `ios/Runner/Info.plist`)

`pubspec.yaml` Configuration for `flutter_launcher_icons`

`pubspec.yaml` Configuration for `flutter_native_splash`

Update the `package.json`

Update the `subs.controller.js` File