Friday, 14 February 2025

Setting Up an NES Emulator for AI Reinforcement Learning: Research

In this article, I explore what the process of setting up an NES emulator and integrating it with an AI model for reinforcement learning might look like.

This post is based on a conversation with the Llama 3.1 large language model, which I used to help me get started. This blog post was not written by the AI model; I spent a significant amount of time going over the conversation and my own subsequent actions to write this content.

During my evenings of research and documenting, I bounced around a lot between scoping, environment setup and planning next steps, so please bear with me. Here, I am trying to strike a balance between documenting "how I got here" and setting out a more logical approach.

Due to my complete lack of knowledge of data science, any kind of learning models, or indeed real-world programming, I started looking at "off-the-shelf" applications and tools that can assist in the training process:

1. OpenAI Gym: A popular, open-source toolkit for developing and testing reinforcement learning models. 

2. TensorFlow: A widely used machine learning library developed by Google, with built-in support for building and training neural networks.

3. Keras: A high-level neural networks API, written in Python, capable of running on top of TensorFlow or Theano. Keras provides an easy-to-use interface for building and training models.

4. PyTorch: Another popular machine learning library, known for its simplicity and ease of use. PyTorch supports a wide range of neural network architectures.

5. RLlib: A reinforcement learning library built on the Ray project. RLlib provides a range of tools and algorithms for training agents in complex environments, including those with video game-like settings.

PyTorch

I opted for PyTorch as I have the most experience with Python (a little, compared to none with the other options) and my PC already has Python components installed.

According to Llama, the first step in creating a neural network-based model to learn how to play Super Mario Bros. 1 using LLaMA3 with PyTorch would be:

Install PyTorch and the relevant libraries, then verify the installation

With PyTorch and LLaMA3 installed, you can start exploring the libraries' documentation and examples to learn how to create a neural network-based model for Super Mario Bros. 1. You'll likely need to:

  • Set up an environment for Super Mario Bros. 1.
  • Prepare the necessary data (e.g., screen captures, game states) to train your LLaMA3 model.
  • Define a neural network architecture using PyTorch and LLaMA3.

I went back and forth with Llama for a while about package managers and environment setup. The first question was Conda vs. venv, as I've used venv in the past but not Conda. Here's what it had to say:

The main difference between conda and venv lies in their design goals, features, and underlying technology.

`conda` (Continuum Analytics)

  • Designed primarily for data science and scientific computing
  • Focuses on reproducibility, dependency management, and package installation
  • Uses a package manager system (similar to pip) to install packages
  • Supports Python 3.6+ and provides binary wheels for popular packages
  • Can manage multiple environments simultaneously (e.g., `conda env list`)
  • Provides additional features like environment activation, updating, and removal

`venv` (Python Standard Library)

  • Designed specifically for Python development
  • Focuses on creating isolated Python environments with minimal dependencies
  • Uses the `site-packages` directory to store installed packages
  • Supports Python 3.3+ and provides a simple, lightweight way to create virtual environments
  • Can be used to create multiple environments (e.g., `python -m venv my_env1`, `python -m venv my_env2`)
  • Provides basic features like environment activation, updating, and removal

Key differences:

  1. Package manager: Conda uses a package manager system, while `venv` relies on pip to install packages.
  2. Reproducibility: Conda emphasizes reproducibility by storing package versions and dependencies, making it easier to recreate environments.
  3. Binary wheels: Conda provides binary wheels for popular packages, which can speed up installation times.
  4. Multi-environment support: Conda allows you to manage multiple environments simultaneously, while `venv` requires separate commands to create and manage each environment.

When to use each:

Use `conda` if:

  • You're working with data science or scientific computing projects
  • You need advanced package management features
  • You want to focus on reproducibility and dependency management

Use `venv` if:

  • You're working on a Python development project (e.g., web scraping, scripting)
  • You prefer a lightweight, simple way to create virtual environments
  • You don't need advanced package management features


Getting Started with PyTorch: A Low-Barrier Entry Point

Considering you have limited real-world programming experience, I recommend starting with a simple yet powerful PyTorch concept: Tensors.

Tensors are the fundamental building blocks of PyTorch. They're multi-dimensional arrays that can be used to represent various types of data, such as images, audio, or text. By working with tensors, you'll get familiar with PyTorch's core functionality and syntax.


Here's why I suggest starting with tensors:

  1. Easy to grasp: Tensors are a natural fit for beginners, as they're similar to Python lists or NumPy arrays.
  2. Immediate results: You can see the effects of your code quickly, making it an excellent way to get started.
  3. Foundation for more complex concepts: Understanding tensors is essential for working with PyTorch's more advanced features, such as neural networks and optimizers.
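
To make that concrete, here's a minimal tensor sketch (nothing project-specific, just the basics):

import torch

# A 2x3 tensor, much like a nested Python list or NumPy array
x = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])
print(x.shape)    # torch.Size([2, 3])

# Element-wise maths gives immediate, visible results
y = x * 2 + 1
print(y)

# Tensors can be moved to the GPU if one is available
if torch.cuda.is_available():
    print(y.to("cuda").device)    # cuda:0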

Based on this information, I opted for Conda as it seemed like a more sensible choice, given the nature of the project.

I started looking at the dependencies and requirements for Anaconda, particularly disk requirements, as I don't have a great deal of space on my main drive, normally opting to have data on a separate drive and linking back with configuration parameters or symlinks. During this investigation, I came across Miniconda.

Miniconda vs Anaconda: A Brief Comparison

Difference in scope and size:


1. Anaconda: A full-fledged distribution that includes:

        * Over 7,000 packages from the Anaconda Cloud (including popular ones like NumPy, pandas, scikit-learn, etc.)

        * Various tools and libraries for data science, machine learning, and more

        Total size: around 5 GB

2. Miniconda: A lightweight version of Anaconda that includes:

        * Only the most essential packages (e.g., NumPy, pandas) to get you started with minimal overhead

        Total size: around 440 MB


Given that I won't require all 7,000 packages, and I certainly don't want them on my main drive, I opted for Miniconda, happy that I can install components as I need them.


Setting Up a Python Environment

Open a terminal or command prompt and run the following command:


conda create --name gamelearning

This will create a new environment named gamelearning. To start using the new environment, activate it by running:


conda activate gamelearning


You should see `(gamelearning)` at the beginning of your command prompt or terminal. This indicates that you're now working within the new environment.


Install the required packages

Once you've activated the new environment, install the necessary packages using `conda` (or `pip` if you prefer). For this project, you'll need:


conda install -c conda-forge pytorch torchvision


This will install PyTorch and its dependencies within the new environment.
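
The "verify the installation" step from earlier is then just a couple of lines in the Python interpreter, something like:

import torch

print(torch.__version__)          # confirms PyTorch imports cleanly
print(torch.cuda.is_available())  # True if the install can see the GPU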

Tips and Variations:

  • You can create multiple environments for different projects or purposes.
  • To list all available environments, run `conda info --envs`.
  • To remove an environment, run `conda env remove --name my_env` (replace `my_env` with your desired environment name).
  • Consider using a virtual environment manager like `venv` if you're not comfortable with `conda`.


By following these steps, you'll have a dedicated Python environment for your project, keeping things organised and separate from the global Python environment.


It's at this point that I took a step back to clarify some things. This is what Llama had to say:


Next Steps for the Project


Define your project scope: Clarify what you want to achieve with the project. Is it a classification problem, regression task, or something else?

Choose your dataset: Select a suitable dataset for your project. You can use publicly available datasets like MNIST, CIFAR-10, or IMDB, or download a dataset specific to your domain.

Design your neural network architecture: Based on your project scope and dataset, design a neural network architecture that's suitable for the task. This might involve choosing the right number of layers, hidden units, and activation functions.

Implement your model: Write Python code using PyTorch to implement your chosen architecture. You can start with a simple model and then modify it as needed.

Train and evaluate your model: Use PyTorch's built-in functionality (like `torch.nn.CrossEntropyLoss()` and `torch.optim.Adam()`) to train your model on the dataset, and evaluate its performance using metrics like accuracy, precision, or F1-score.

Experiment with hyperparameters: Tune hyperparameters like learning rate, batch size, and number of epochs to improve your model's performance.
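
To illustrate the train-and-evaluate step, here's a minimal, self-contained sketch using the CrossEntropyLoss/Adam combination mentioned above on random stand-in data (not Mario-specific, just the general pattern):

import torch
import torch.nn as nn

# A tiny classifier: 16 input features, 4 output classes
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(5):
    inputs = torch.randn(8, 16)          # stand-in batch of 8 samples
    targets = torch.randint(0, 4, (8,))  # stand-in class labels
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")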


Tips:

Start simple and gradually add complexity as needed.

  • Use Jupyter Notebooks to visualize and explore your data, models, and results.
  • Leverage PyTorch's built-in functionality and pre-trained models (like torchvision.models) to speed up development.
  • Don't hesitate to ask for help or search online resources if you encounter issues or need guidance.
  • Remember, the key to success is patience, persistence, and experimentation!


Project Scope: Training an Ollama Model to Play Super Mario Bros.

I want to train an existing Ollama model (LLaMA3) to play Super Mario Bros. 1 in an emulated environment on the same machine. I plan to document the process and screen-record the model's progress as it learns to play the game.


To clarify, your project scope involves:

  • Using LLaMA3: You'll be training a language model using the LLaMA3 architecture, which is designed for text-to-text tasks.
  • Playing Super Mario Bros. 1: Your goal is to train the OLLAMA model to play Super Mario Bros. 1, a classic platformer game. This requires generating text commands that control the game's actions (e.g., movement, jumping, and power-ups).
  • Emulation: You'll emulate the Super Mario Bros. 1 game on the same machine where you're training the model. This will allow you to interact with the game and provide input for the OLLAMA model.
  • Documentation and screen recording: You want to document your progress, including any challenges or successes, and create a screen record of the training process.


To get started, let's outline some key steps:

Prepare the environment: Set up an emulator for Super Mario Bros. 1 on your machine.

Design the input and output formats: Determine how you'll provide input to the OLLAMA model (e.g., game state information, user commands). Also, decide on the format of the output text commands that control the game's actions.

Prepare the training data: Collect a dataset of relevant text inputs and corresponding outputs (game states) for training the OLLAMA model. This might involve playing the game manually, taking notes, or using existing resources (e.g., Super Mario Bros. 1 level maps).

Train the OLLAMA model: Use LLaMA3 to train your OLLAMA model on the prepared dataset.

Test and refine the model: Evaluate the trained model's performance by playing Super Mario Bros. 1 with it. Refine the model as needed to improve its control of the game.


Some potential challenges you might encounter:

  • Game complexity: Super Mario Bros. 1 is a challenging game, especially when considering the vast number of possible game states and actions.
  • LLaMA3 limitations: The LLaMA3 model may not be well-suited for this task, due to its original design for text-to-text tasks rather than controlling game actions.
  • Emulation quirks: You might encounter issues with the emulator, such as compatibility problems or lag, which could affect your training process.


Now that we have a clear project scope, are you ready to start preparing the environment and designing the input and output formats?

Given what you have mentioned about LLaMA3's limitations, do you believe I should use something different, and if so, what would you suggest?


Considering Alternative Models

Given LLaMA3's limitations in controlling game actions, it might not be the best choice for this task. Let's explore alternative models that could be more suitable:

Reinforcement Learning (RL) models: RL models are designed to learn from interactions with an environment and make decisions based on rewards or penalties. They're well-suited for tasks like game playing, where the goal is to maximize a reward function. You can use libraries like PyTorch or TensorFlow to implement RL models.

Generative Adversarial Networks (GANs): GANs are capable of generating data that resembles a specific distribution. In this case, you could use a GAN to generate game actions based on the game's state. This approach might require more work in terms of designing the generator and discriminator networks.

Deep Q-Networks (DQN) or Policy Gradient Methods: DQNs and policy gradient methods are both variants of RL models that can be used for control tasks like playing games. They're designed to learn a policy that maximizes rewards.


When choosing an alternative model, consider the following factors:


  • Model complexity: Some models (like GANs) might require more complex architecture design and training procedures compared to simpler models like DQN.
  • Training data requirements: You may need to collect additional training data for some models, such as RL or GANs, whereas others (like DQN) can learn from relatively small datasets.
  • Computational resources: Some models might require more computational resources than others due to their complexity or the amount of data they process.


Before committing to a specific model, I recommend exploring each option's strengths and limitations. You may also want to consider your own expertise and the resources available to you when deciding which model to use.

So now I have a design brief to set parameters for the project, albeit fairly loosely. I have decided on a development environment and largely decided what modules I'm going to use, at least for the time being.

It was at this point Llama wanted to get down and dirty:

"How do you feel about starting with tensors? Are you ready to dive in and get a feel for PyTorch's core functionality?"

I really wasn't ready for that yet. Before that, I wanted to discuss the method by which the model will control the game.

 

Control Method: How Will the Model Interact with the Game?


Before diving into the Python script, let's discuss how you want the model to control the game. This is crucial for designing an effective interaction mechanism.


Here are some possible methods:


  1. Action Space: Define a set of predefined actions (e.g., move left, jump, fire) and assign probabilities or weights to each action based on the model's output. The agent will choose an action based on these probabilities.
  2. Continuous Control: Use a continuous control approach, where the model outputs a vector representing the game state (e.g., position, velocity, power-ups). This vector can be used to directly control the game.
  3. Discrete State-Based Control: Divide the game into discrete states (e.g., Mario's position on the map) and have the model output a probability distribution over these states. The agent will choose the next state based on these probabilities.
  4. Hybrid Approach: Combine two or more of the above methods to create a hybrid control scheme.


Consider the following factors when choosing a method:


  • Game complexity: If the game has many possible actions or complex interactions, a discrete action space might be more suitable.
  • Model complexity: If your model is simple and outputs a small number of possible actions, an action space approach could work well. If your model is more complex and outputs continuous values, a continuous control method might be better.
  • Training data availability: If you have limited training data, a discrete state-based control approach can help reduce the dimensionality of the problem.


Which control method do you think would be most suitable for controlling Super Mario Bros.?


 As far as training data goes, I assume I will be starting from zero. 

 The controls will be:

  1. move left
  2. move right
  3. jump 
  4. fire (when available with a powerup)


I guess, from analysing my own actions when playing the game, that there should only be one or two actions performed at the same time, like:

  1. moving forward is one action
  2. moving forward and jumping over a gap or up a hill is two


I also assume that the model will have to learn not only those controls, but also combinations of them and patterns for common behaviours.


Starting from Scratch: Training Data Considerations


You're correct in assuming you'll start from zero training data. Given your control scheme and game mechanics, let's revisit the considerations:

1. Action Space: With only 4 actions (move left/right, jump, fire with power-up), a discrete action space approach seems suitable.

2. State Representation: Since the model needs to learn patterns and combinations of controls, you might want to represent the game state as a combination of:
  • Mario's position (x, y)
  • Velocity (dx, dy)
  • Power-ups available
  • Obstacles or enemies present
      
3. Episode Structure: Define how each episode will be structured:
  • Start with Mario at a random position
  • The model takes actions to navigate the level
  • Episode ends when Mario reaches a goal (e.g., flagpole) or dies
     
4. Reward Function: Design a reward function that encourages the model to learn optimal control patterns. For example:
  • Reach the goal: +1 reward
  • Die: -1 reward
  • Collect power-ups: +0.5 reward
 
5. Exploration vs. Exploitation: Since the model needs to learn complex patterns and combinations, you might want to use exploration techniques (e.g., epsilon-greedy) to balance exploitation (choosing actions based on current knowledge) with exploration (trying new actions to learn more). There's a rough sketch of points 1, 4 and 5 after this list.
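
To ground points 1, 4 and 5, here's a rough Python sketch of the action space, reward function and epsilon-greedy choice. The reward values and state fields are lifted straight from the list above and are assumptions, not a tested design:

import random

ACTIONS = ["move_left", "move_right", "jump", "fire"]  # the four controls

def reward(state):
    """Toy reward function mirroring the values suggested above."""
    if state.get("reached_goal"):
        return 1.0
    if state.get("died"):
        return -1.0
    if state.get("collected_powerup"):
        return 0.5
    return 0.0

def choose_action(q_values, epsilon=0.1):
    """Epsilon-greedy: mostly exploit the best known action, sometimes explore."""
    if random.random() < epsilon:
        return random.randrange(len(ACTIONS))  # explore
    return max(range(len(ACTIONS)), key=lambda i: q_values[i])  # exploit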


Considering these points, your approach seems well-suited for learning control patterns and combinations. By representing the game state in a meaningful way and using a reward function that encourages optimal behaviour, you can train an effective model.

How do you envision generating training data? Will you use a custom script, record human game play, or employ a different method?

Given what Llama has said, I assume the majority of the training data would be provided by the model playing the game. This sounds like it would take a long time to get going properly. It would probably be better to give the model a few YouTube videos of game play to start with.                                                    

Self-Play vs. Data-Augmented Training                                                                                                    

Training solely through self-play could be time-consuming and potentially slow progress. Incorporating pre-recorded gameplay data from YouTube videos or other sources can significantly speed up the training process. This approach is known as data-augmented training.


By providing the model with a dataset of labelled or unlabelled game states, you can:

  1. Reduce exploration time: The model can learn from the provided data and reduce the need for self-play, which can be computationally expensive.                                     
  2. Focus on learning patterns: With a dataset to work from, the model can focus on learning patterns and combinations of controls, rather than spending time exploring the game environment.

To incorporate pre-recorded game play data:

  1. Extract relevant frames: Use tools like OpenCV or Pillow to extract specific frames from the videos that show Mario's position, power-ups, obstacles, and other relevant features (see the sketch after this list).
  2. Create a dataset: Organise the extracted frames into a dataset with possible labels (e.g., actions taken, game state).
  3. Train the model: Use this dataset as input for your model, in addition to or instead of self-play.
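
As a rough illustration of step 1, frame extraction with OpenCV can be as simple as the following (the file names are placeholders):

import cv2

cap = cv2.VideoCapture("mario_speedrun.mp4")  # a downloaded gameplay video
frame_idx = 0
saved = 0
while True:
    ok, frame = cap.read()
    if not ok:              # end of video (or a read error)
        break
    if frame_idx % 6 == 0:  # keep every 6th frame (~5 fps from 30 fps footage)
        cv2.imwrite(f"frames/frame_{saved:06d}.png", frame)
        saved += 1
    frame_idx += 1
cap.release()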


Keep in mind that:

  1. Data quality is crucial: Ensure that the pre-recorded data is accurate and representative of the game play experience.
  2. Model architecture matters: Choose a suitable model architecture that can learn from the provided data and generalise well to new situations.


In your case, using YouTube videos or other sources for data-augmented training could be an effective way to speed up the learning process and provide a solid foundation for self-play.


What do you think about this approach? Would you like to explore other options or refine this idea further?


So far I like this approach. Give the model some videos of other people playing the first level of the game (I'm thinking something along the lines of a record-holding speed run) and allow it to use that to augment its own learning.


Now, how do I get the model to actually "see" the game to control Mario so that it can learn to play the game?


Observation Method: How Will the Model "See" the Game World?


With data-augmented training, you'll need a way for the model to observe and process the game environment. Here are some options:


1. Pixel-based observation: Treat each frame as a 2D array of pixels (e.g., RGB values) and feed them into your model. This approach is simple but might not provide rich information about the game state.

2. Game state extraction: Use libraries or tools to extract relevant game state features, such as:

  •  Mario's position (x, y)
  •  Power-ups available
  •  Obstacles or enemies present
  •  Velocity and direction of movement

3. Scene understanding: Employ a computer vision approach to understand the scene, using techniques like:

  • Object detection: identify specific objects like power-ups, coins, or enemies
  • Scene segmentation: separate game elements into distinct regions (e.g., platforms, pipes, etc.)

4. Text-based observation: If you're comfortable with text-based representations, you can describe the game state as a sequence of text tokens, such as:

  • "Mario at (x, y) with power-up A"
  • "Enemy B present at (x, y)"
  • "Platform C at (x, y)"


Choose an observation method that best fits your needs and model architecture. Pixel-based observation is relatively simple, but it might not provide rich information about the game state. 

Game state extraction or scene understanding approaches can provide more context, but may require additional processing and computation.
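
As an aside, pixel-based observation in practice usually means shrinking and greyscaling each frame before handing it to the model. A minimal sketch (the 84x84 size is a common convention from DQN work, not a requirement):

import cv2
import torch

def frame_to_tensor(frame_bgr):
    """Downscale a raw BGR frame to a small greyscale tensor for the model."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    small = cv2.resize(gray, (84, 84), interpolation=cv2.INTER_AREA)
    tensor = torch.from_numpy(small).float() / 255.0  # normalise to [0, 1]
    return tensor.unsqueeze(0)                        # shape (1, 84, 84)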


Which observation method do you think would be most effective for your project?


I'm trying to maintain realistic expectations, given the limitations of both my knowledge and the fact that I have a modest PC (Intel i5-11400, 16 GB of RAM, NVIDIA RTX 4060 Ti with 16 GB of VRAM).

I am aware that this will take a significant amount of time, and that my hardware will affect the probability of success when using these different methods.


PC Capabilities: Impact on Probability of Success


Considering your modest PC specifications and acknowledging that this project will require significant computational resources, let's assess the likelihood of success for each method:


  • Pixel-based observation: This approach is relatively lightweight in terms of computational requirements. Your RTX 4060TI GPU should be able to handle pixel-based processing efficiently. Probability of success: High
  • Game state extraction: This method requires more computation power, as you'll need to process and extract relevant game state features from the frame data. Your PC's i5 CPU and 16GB RAM might struggle with this approach, especially if you're dealing with high-resolution images or large datasets. Probability of success: Medium
  • Scene understanding: This approach is likely to be the most computationally intensive, as it requires complex computer vision techniques and object detection algorithms. Your PC's hardware might not be sufficient for handling these demands efficiently. Probability of success: Low
  • Text-based observation: This method should be relatively lightweight in terms of computational requirements, as you're only processing text data. However, the quality of your text-based representation will depend on how well you can describe the game state accurately and consistently. Probability of success: Medium


Keep in mind that these probability assessments are based on your PC's specifications and the complexity of each approach. The actual performance may vary depending on the specific implementation and the amount of data processed.


To increase the chances of success, consider:


  • Optimising your code for efficiency
  • Using lightweight libraries or frameworks
  • Reducing the size of your datasets or frames
  • Focusing on a specific aspect of the game state (e.g., object detection instead of scene understanding)
  • Considering cloud-based solutions or distributed computing approaches to offload computationally intensive tasks


How do you plan to address the computational challenges and optimise your approach for success?


I'm open to exploring any or all of these options, predominantly as I have little frame of reference for these. I am generally interested in making my code as efficient as possible, utilising lightweight libraries and frameworks that are fit for purpose.


I can already think of some aspects of the "game state" that we can disregard, like objects on screen that do not affect the game. I would particularly like to avoid cloud computing for this project, however.


Efficiency and Optimisation: Aiming for Success


1. Code optimisation: Use techniques like loop unrolling, cache-friendly memory access, and reducing redundant computations to minimise processing time.


2. Lightweight libraries and frameworks: Explore libraries that are designed for performance, such as:

  • Lightweight neural network frameworks like TensorFlow Lite or PyTorch Mobile
  • Optimized computer vision libraries like OpenCV or OpenVINO


3. Focusing on essential game state features: Identify the most critical aspects of the game state and prioritise those in your model. Discarding irrelevant information can significantly reduce computational demands.


4. Simplifying the problem: Consider breaking down the project into smaller, more manageable components. This might help you focus on specific challenges and optimise individual parts before moving forward.


To further optimise your code:


  • Profile your code: Use profiling tools (e.g., Intel VTune Amplifier or GPU PerfWorks) to identify performance bottlenecks and hotspots.
  • Use parallel processing: Leverage multi-threading or multi-processing techniques to distribute computational tasks across multiple cores, taking advantage of your PC's i5 CPU.
  • Optimize data structures: Choose data structures that minimize memory access and optimise cache utilization, reducing the load on your RAM.


By applying these strategies, you can significantly improve the performance of your code and make it more feasible for execution on your modest PC.


Before I start looking into that sort of thing, however, I want to set up the game emulation environment. I would prefer to use open-source applications where possible; for instance, I already have RetroArch installed, but I am open to options.

I think once an emulator has been decided on, this might narrow down which "hooks" can best be utilised for the model to control the game state.


Dutifully, Llama started listing options. It sounds like it got a little confused, or was unable to infer from context; to my knowledge, SMB1 was only officially released on the Nintendo Entertainment System (or Famicom), but it may have just thought of "Nintendo" as covering any of the consoles from that manufacturer. The rest of the advice was fairly solid, though, and I was able to go off and do my own research into what other people use in this space.


Setting up the Game Emulation Environment


  1. RetroArch: You already have it installed, and it's a popular choice among retro gaming enthusiasts. RetroArch is highly configurable, with support for various emulators and plugins.
  2. Mupen64Plus: A popular emulator for Nintendo 64 games, Mupen64Plus has a strong focus on accuracy and compatibility. It also supports plugins and scripting, which could be useful for your project.
  3. Dolphin: An open-source emulator for GameCube and Wii games, Dolphin is known for its high performance and accuracy. It also has a built-in scripting engine that could be helpful.


For this project, you might want to focus on using an emulator that:

  • Has good support for plugins or scripting
  • Allows for control over various aspects of the game state (e.g., position, speed, power-ups)
  • Is compatible with your target system (Windows, macOS, Linux)


Once you've chosen an emulator, you can start exploring its capabilities and identifying potential "hooks" that will allow your model to interact with the game state.


Some possible hooks to consider:


  • Controller inputs: Modify controller input values to manipulate game actions (e.g., movement, jumping, power-ups)
  • Cheats: Use built-in cheats or create custom ones to modify game variables (e.g., score, health, position)
  • Scripting engines: Utilise the emulator's scripting engine to write custom scripts that interact with the game state
  • Plugin interfaces: Leverage plugin interfaces to inject your model's outputs into the game state


Which emulator do you think you'll choose, and what kind of hooks are you most interested in exploring?


RetroArch, however, is a versatile emulator that supports a wide range of consoles and systems, including the NES. Its modular design allows it to run various core emulators, each with its own strengths and weaknesses.


Some benefits of using RetroArch as your emulator:

  • NES support: As you mentioned, RetroArch has built-in support for playing NES ROMs.
  • Multi-system support: RetroArch can emulate a variety of systems, including other retro consoles like the Super Nintendo Entertainment System (SNES), Sega Genesis, and more.
  • Core customization: You can choose from various core emulators, each with its own settings and features, to optimize performance and accuracy for your specific use case.
  • Plugin support: RetroArch has a wide range of plugins available, which can enhance gameplay, provide additional features, or even allow you to create custom scripts.


Open-Source NES Emulators: Scripting and Plugin Support

In addition to RetroArch, there are a few other open-source NES emulators that offer scripting or plugin support:

  1. FCEUX: A highly customizable emulator with built-in scripting capabilities using Lua. FCEUX has a strong focus on accuracy and compatibility, making it an excellent choice for those who want to script their game play.
  2. Nestopia: Another popular open-source NES emulator with a focus on accuracy and performance. Nestopia has a plugin system that allows you to extend its functionality with custom scripts or plugins.
  3. iNES: A lightweight, open-source NES emulator written in C++. iNES has a scripting interface using the Lua language and supports various plugins for things like cheat codes, savestates, and more.

All three of these emulators are capable of playing NES ROMs and have some level of scripting or plugin support. However, keep in mind that:

  • Scripting capabilities: FCEUX's Lua scripting is relatively advanced and flexible, while Nestopia's plugin system is more limited.
  • Plugin support: iNES has a more limited set of plugins compared to RetroArch or FCEUX.


If you're looking for a more straightforward scripting experience, FCEUX might be the best choice. If you prefer a more modular approach with plugins, Nestopia could be the way to go.

If you're interested in exploring Lua scripting and plugins, iNES is an excellent option.


I found that there is a fair amount of other research that uses FCEUX, so there does seem to be some credence to what it's saying here.


It was at this point I had run out of time and brain power, so I decided to put a pin in it. 

I think I have made some good progress, at least with project scoping and tool selection.

The main thing is that I learned a lot from the conversation I had with it, along with my resultant research.

I have many projects on the go at the moment (FFMPEG PiStream, surface-mount soldering proficiency and a home network layout rework, to name a few), but I will start the process of deciding on an emulator, installing and configuring.

I will document that at some point and create a part 2.


If you got this far, firstly, thanks for reading! Secondly, how did you find the form and format of this post? It's a bit different to how I usually write; this one took on a part-discussion/interview, part-documentation format, covering steps in research and deployment. I'd love to hear your feedback.

Thursday, 3 October 2024

Automating a Live Stream with Home Assistant Based on Sun Elevation

Until recently, I had been semi-manually starting and stopping the live stream.
Before logging off for the evening, I would calculate the number of seconds between that time and dawn, then use that to create this command:

sleep <seconds> && sh script.sh
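
For what it's worth, the calculation can be scripted rather than done by hand; a rough one-liner assuming GNU date and a dawn time of 06:45:

sleep $(( $(date -d "tomorrow 06:45" +%s) - $(date +%s) )) && sh script.sh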

I would monitor the stream throughout the day (for mod actions like bot whack-a-mole and to count as an extra viewer :) )

Just after sunset when the light had dimmed, I would then attach to the session and terminate the script.

I found that sometimes I would either forget to do the first command, run the command from the wrong path, or miscalculate the duration.


With this in mind, I worked on an automation project using Home Assistant to control the live stream based on the sun's position. The goal was to automate the start and stop of the stream depending on the sun's elevation.

The Plan

I wanted the stream to run from just before sunrise to just after sunset. To achieve this, I decided to start the stream when the sun’s elevation reaches -8.0 degrees (during dawn) and stop it again when the sun’s elevation drops to -8.0 degrees after sunset (during dusk). This would mean the stream ran as the sun travelled across the sky, creating a seamless integration between the time of day and the stream.

Step 1: Controlling the Stream Remotely via SSH

The stream is controlled by two scripts on a remote host, which I can access over SSH. These scripts use tmux to ensure the stream continues running even if the connection to the remote host is lost.

  • Start Script: Checks for the existence of a tmux session. If the session doesn’t exist, it creates one. It then sends the command to the session to start the stream script.
  • Stop Script: Sends a command to the tmux session to stop the stream.

I set up Home Assistant to SSH into the host and trigger these scripts.

Step 2: Creating a new switch and adding SSH Commands in Home Assistant

To start with, I had to do the normal key-pair exchange on the Home Assistant command line, which creates the key file. In order for Home Assistant to use those keys, you need to copy them to the config folder.

To trigger these scripts, I created a command_line switch in Home Assistant's configuration.yaml file:

command_line:
  - switch:
      name: Pi_Stream
      command_on: "ssh -q -i /config/id_rsa -o UserKnownHostsFile=/config/known_hosts user@host sh start.sh"
      command_off: "ssh -q -i /config/id_rsa -o UserKnownHostsFile=/config/known_hosts user@host sh stop.sh"
  • The command_on command connects to the remote host and runs the start script, which checks for the tmux session and starts the stream.
  • The command_off command connects to the remote host and runs the stop script, which sends the terminate command to the session, halting the stream.

Step 3: Automating Based on Sun Elevation

The next step was to automate the stream using the sun's elevation. Home Assistant's built-in sun integration tracks the position of the sun, which can be used to trigger automations.

I created an automation that stops the stream when the sun's elevation reaches -8.0 degrees after sunset:



- id: '1727953002868'
  alias: Stop Pi Stream
  description: When its dark
  triggers:
    - trigger: numeric_state
      entity_id:
        - sun.sun
      for:
        hours: 0
        minutes: 1
        seconds: 0
      attribute: elevation
      below: -8
  conditions: []
  actions:
    - action: switch.turn_off
      metadata: {}
      data: {}
      target:
        entity_id: switch.pi_stream
  mode: single

Once this automation is triggered, the stream shuts down as dusk deepens. As the stream is currently running at the time of writing, I need to wait to see if this is successful.
Once this is confirmed to be working, I plan to create another automation to start the stream when the sun's elevation rises to -8.0 degrees in the morning.
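
When I get to it, that morning automation should just be the mirror image of the one above; a sketch (untested, with a placeholder id) might look like:

- id: '<new_automation_id>'
  alias: Start Pi Stream
  description: When it gets light
  triggers:
    - trigger: numeric_state
      entity_id:
        - sun.sun
      for:
        hours: 0
        minutes: 1
        seconds: 0
      attribute: elevation
      above: -8
  conditions: []
  actions:
    - action: switch.turn_on
      metadata: {}
      data: {}
      target:
        entity_id: switch.pi_stream
  mode: single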

My next task is to reconfigure how the chat bot works.
Currently, the bot runs from a script, which is running hourly as a cron job.
The plan is to take this out of cron and utilise the SSH function to get Home Assistant to fire off the script hourly, but only if the stream is running.
This way it's not running all the time.

Summary

By integrating Home Assistant with SSH and tmux, I was able to fully automate the control of a video stream based on the sun's position (subject to successful testing in about 4.5 hours, according to the Met Office). This approach can be adapted to any scenario where a remote script needs to be triggered from Home Assistant. It opens the door to many possibilities, whether for controlling cameras, live streams, or other devices based on environmental factors like light and time of day.

If you're looking to combine the power of Home Assistant's automation with remote scripts, SSH and tmux are excellent tools to ensure your commands run reliably.

Wednesday, 2 October 2024

Automating Twitch Announcements Using Cron and Home Assistant

 

Recently, I integrated Twitch's API to automate sending announcements to my channel's chat. The goal was to set up a system where an hourly log, updated by Home Assistant's Sun integration, triggers a script that sends the last log entry as an announcement in my Twitch channel chat.

Setting Up the Twitch API

To begin, I created a Twitch application, which provided me with a Client ID and Client Secret. These are essential for making authenticated requests to the Twitch API. After creating the app, I needed to gain an Access Token with the appropriate scopes that would allow me to post chat messages.

Initially, I ran into an issue with the requested scope. After consulting Twitch's documentation, I learned that the correct scope for managing announcements had changed from channel:manage:announcements to moderator:manage:announcements. However, after further consideration, I decided to use the user:write:chat and user:bot scopes to simplify the integration.

Getting the Access Token

Using the Twitch OAuth flow, I generated an authorisation URL that included the necessary scopes. Once authorised, I exchanged the authorisation code for an access token using a simple curl command.
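
For reference, the code-for-token exchange is a single POST to Twitch's standard token endpoint; with placeholder values, the curl command looks roughly like this:

curl -X POST "https://id.twitch.tv/oauth2/token" \
  -d "client_id=<client_id>" \
  -d "client_secret=<client_secret>" \
  -d "code=<authorisation_code>" \
  -d "grant_type=authorization_code" \
  -d "redirect_uri=<redirect_uri>"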

The access token allowed my script to communicate with Twitch’s API. Additionally, I retrieved my Broadcaster ID and Sender ID—important parameters for sending chat messages.

Automating Announcements

With the access token and IDs in hand, I wrote a bash script that:

  • Reads the latest log entry from a file (SunElevation.txt), which Home Assistant updates hourly.
  • Sends that entry as an announcement to my Twitch chat using Twitch’s chat API.

I configured the script to run hourly via cron, ensuring my channel stays updated with automated messages based on the Sun elevation data collected by Home Assistant.
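
A stripped-down sketch of that script (paths, IDs and token are placeholders; the endpoint is Twitch's Helix "Send Chat Message" API, which is what the user:write:chat scope enables):

#!/bin/bash
# Grab the most recent log line written by Home Assistant
MESSAGE=$(tail -n 1 /path/to/SunElevation.txt)

# Post it to the channel chat via the Helix API
curl -s -X POST "https://api.twitch.tv/helix/chat/messages" \
  -H "Authorization: Bearer <access_token>" \
  -H "Client-Id: <client_id>" \
  -H "Content-Type: application/json" \
  -d "{\"broadcaster_id\": \"<broadcaster_id>\", \"sender_id\": \"<sender_id>\", \"message\": \"$MESSAGE\"}"

The hourly cron entry is then along the lines of: 0 * * * * /path/to/announce.sh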

Overcoming Common Issues

Throughout the setup, I encountered a few key hurdles:

  1. Authorisation Scope Mismatch: Initially, the scope for sending announcements was incorrect, but switching to user:write:chat and user:bot solved the issue.
  2. OAuth Flow and Redirect URI: I manually managed the OAuth flow, copying the authorisation code from the browser and exchanging it for the access token via the command line. Though the process works, I’ll explore automating this step in the future.
  3. Cron Job Automation: The final piece was setting up cron to run the announcement script hourly. With the SunElevation.txt being updated regularly, this ensures the announcements are always in sync with the current state of the sun.

Conclusion

This setup provides a seamless way to automate Twitch announcements based on data from Home Assistant. The ability to send messages via the Twitch API opens up countless possibilities for engaging viewers in a dynamic, automated way. Whether it’s updating viewers on the weather, system statuses, or other key data points, this method can be easily adapted to suit various needs.

Shout out to my good friend, best mod and overall tech wizard Peaeyeennkay, for helping me navigate the quagmire that is development documentation.

Please go check him out over at https://mastodon.social/@PeaEyeEnnKay

Stay tuned as I continue to refine and enhance this setup!

Monday, 23 September 2024

Streaming Setup: Integrating FFmpeg Overlays and Audio into a Picam feed

Lately, I’ve been setting up and refining a Raspberry Pi-based streaming setup, focusing on combining a video feed from a Raspberry Pi camera with overlay graphics and audio in real-time using ffmpeg. It’s been quite a journey, filled with trial and error as I worked through various technical challenges.

TL;DR Take me to:
The Twitch

I stumbled upon Restreamer (https://github.com/datarhei/restreamer) which runs in a container.
I deployed this to the Raspberry Pi and set about connecting everything up.

Initial Camera and Overlay Setup

I started by streaming a camera feed using rpicam-vid on a Raspberry Pi. The initial command streamed video at 1080p and 30 fps to a TCP connection:

rpicam-vid -t 0 --inline --listen -o tcp://0.0.0.0:8554 --level 4.2 --framerate 30 --width 1920 --height 1080 --denoise cdn_off -b 8000000

I was then able to add this to the Restreamer software, add a secondary audio stream, connect it to a Twitch account and stream live.
Unfortunately the software has no mechanism for adding overlays to the resultant stream.

With this in mind, I created another ffmpeg command that takes the TCP stream from the Pi, overlays an image and adds the contents of a text file (the current-track file described below).

ffmpeg -loglevel debug -i tcp://192.168.1.54:8554 -i StreamOverlay.png \
  -filter_complex "[0:v][1:v]overlay=0:0,drawtext=textfile='current_track.txt':x=(w-text_w)/2:y=h-50:fontcolor=green:fontsize=24:box=1:boxcolor=black@0.5:boxborderw=10" \
  -an -c:v libx264 -f mpegts tcp://<ip_address>:8556

It seems the Raspberry Pi 4 doesn't have sufficient resources to encode the camera feed with the overlay. I tried reducing the incoming camera resolution to 1280x720, but this was still too much for the Restreamer software to handle on the modest hardware. At this point I moved the heavy lifting over to a virtual machine on my home server, and this seemed to solve the problem.

ffmpeg -loglevel debug -i tcp://192.168.1.54:8554 -i StreamOverlay.png \
  -filter_complex "[0:v][1:v]overlay=0:0,drawtext=textfile='current_track.txt':x=(w-text_w)/2:y=h-50:fontcolor=green:fontsize=24:box=1:boxcolor=black@0.5:boxborderw=10" \
  -an -c:v h264 -b:v 8M -g 30 -preset veryfast -tune zerolatency -bufsize 16M -max_delay 500000 \
  -x264-params keyint=30:min-keyint=15:scenecut=0 -f mpegts tcp://0.0.0.0:8554?listen

Initially, I encountered stream quality and decoding errors.
After tweaking buffer sizes, bitrate, and keyframe intervals, things began to stabilise.

Integrating Audio

Next, I focused on integrating audio into the video stream. Initially, I used a separate ffmpeg process to stream MP3 files over TCP, but I faced an issue where audio stopped after the first track ended. The ffmpeg process didn’t crash but would stall on subsequent tracks. Here’s the basic script I used:

#!/bin/bash
# Loop over the MP3 folder forever: write the current track name to a text
# file (read by the drawtext overlay), then stream the file over TCP.
audio_folder="<folder where music resides>"
output_file="current_track.txt"
while true; do
  for file in "$audio_folder"/*.mp3; do
    echo "Now playing: $(basename "$file")" > "$output_file"
    cp "$output_file" "/home/rob/$output_file"  # copy for the overlay machine
    ffmpeg -re -i "$file" -acodec copy -f mulaw tcp://0.0.0.0:8555?listen
  done
done

After switching to a local setup, with both the video and audio on the same server, I modified the overlay command to iterate through the MP3s in a folder directly.

Putting it all together


I moved the individual commands to their respective scripts and added some logic that would restart the "service" if it dropped for any reason:

It seems that the Restreamer software doesn't like being on the Pi; with this in mind, I bypassed that extra software entirely.

That worked, but I still had issues with audio.

#!/bin/bash

# Define the folder containing the audio files
audio_folder="/home/rob/Music"

# Define the text file where the current track info will be written
output_file="current_track.txt"

# Define the playlist file
playlist_file="playlist.txt"

while true; do
    # Generate the playlist file
    rm -f "$playlist_file"
    for file in "$audio_folder"/*.mp3; do
        echo "file '$file'" >> "$playlist_file"
    done

    # Get the first track name to display as "Now playing"
    first_track=$(basename "$(head -n 1 "$playlist_file" | sed "s/file '//g" | sed "s/'//g")")
    echo "Now playing: $first_track" > "$output_file"

    # Run ffmpeg to combine the video, overlay, and audio from the playlist
    echo "Starting ffmpeg overlay with playlist..."
    ffmpeg -loglevel level+debug -i tcp://192.168.1.54:8554 \
            -i StreamOverlay.png \
            -f concat -safe 0 -i "$playlist_file" \
            -filter_complex "[0:v][1:v]overlay=0:0,drawtext=textfile='$output_file':x=(w-text_w)/2:y=h-50:fontcolor=green:fontsize=24:box=1:boxcolor=black@0.5:boxborderw=10" \
            -c:a aac -ac 2 -b:a 128k \
            -c:v h264 -b:v 6000k -g 60 -preset veryfast -tune zerolatency \
            -bufsize 12M -max_delay 500000 -x264-params keyint=60:scenecut=0 \
            -f flv rtmp://live.twitch.tv/app/live_<stream_key>

    # Check if ffmpeg encountered an error and restart
    if [ $? -ne 0 ]; then
        echo "ffmpeg stopped. Restarting in 5 seconds..."
        sleep 5
    fi
done

This seemed to work fine for a time, but then the audio would stop. I have yet to find the time to investigate.

Tidying up


I had the various scripts running in separate tmux sessions for visibility. To make this easier, I made a script that creates the sessions and runs the respective scripts:

#!/bin/bash

# Define script paths
camera_script="/path/to/your/camera_script.sh"
overlay_script="/path/to/your/overlay_script.sh"

# Define session names
overlay_script_session="Overlay"
camera_session="Camera"

# Start tmux session for Camera
tmux new-session -d -s "$camera_session" "bash $camera_script"
echo "Started tmux session: $camera_session"

# Start tmux session for Overlay
tmux new-session -d -s "$overlay_script_session" "bash $overlay_script"
echo "Started tmux session: $overlay_script_session"

This works great if I have to restart everything.
I'm also looking into a way of automating the start and stop of streams based on the sunrise and sunset in my location, but for the time being I am just calculating the time in seconds between now and sunrise and adding that to the command in one line:

sleep <seconds> && sh script.sh

Timelapse Creation

During all of this, I also worked on creating a timelapse from the resultant 13 hours of video. Using ffmpeg, I generated a 1-minute timelapse that was successfully uploaded to YouTube. The command was straightforward and effective:

ffmpeg -i input_video.mp4 -filter:v "setpts=PTS/802" -an -r 30 output_timelapse.mp4

This command sped the video up by a factor of 802 by adjusting the presentation timestamps (13 hours is 46,800 seconds, and 46,800 / 802 comes out at roughly 58 seconds), producing a smooth timelapse.

Final Thoughts

This project has been a learning experience in stream handling, ffmpeg configurations, and overcoming hardware limitations. I’ve moved most of the intensive processing off the Raspberry Pi to ensure smoother streaming and a better viewer experience.
Man, formatting ffmpeg commands correctly was a challenge, especially when taking multiple sources and overlaying them in the way I wanted.
While there are always more optimisations to be made, especially regarding audio stability, the progress has been rewarding. 

You can find:
The Twitch

Sunday, 22 October 2023

Add a Twitch randomised fact command

 

I recently had a conversation with a friend on stream about creating a Twitch chat command that would put a random fact into the chat.
After some poking around with Streamer.bot I came up with this.
This may not be the most efficient method for doing this, but it was something I was able to set up in a few minutes.

You essentially have a number of actions, which are your facts being entered into chat, and another action, triggered by a Twitch !command, that gets a random number between 1 and the number of facts you have and selects the corresponding action.

I thought I'd leave it here in case anyone finds a use for it.
Do you have an idea for Streamer.bot that you want to turn into a reality? Drop a comment and let me know!
If you haven't already, I'd suggest you also check out their discord, there's some very friendly and clever people over there. https://discord.gg/VmdKdmVya2

Add a randomised Foxy Facts chat command


go to actions tab

right-click in action list

click add



Give it a name

and you might want to add them to a group for ease of viewing later

click ok



For that action go right-click on sub-actions on the right

hover over "Twitch" and "chat"

click send message to channel



enter the fact

click ok



do that for all the quotes you have


right-click and add another action

do the same as before (add a name like “Foxy Facts”, group, click ok)

right-click in sub-action, hover over "core", "logic", click "Get Random Number"




Enter from “1” and the number of facts you have



Click OK


right click in sub-action again, hover over "core", "logic", click "If/Else"



enter the variable as "randomNumber"

enter the value as 1

click "do action" (the button will say <No Action Selected>)



scroll and select the first fact action



we need to add a sub-action for each fact we have

so, we right-click "core", "logic", then "If/Else"

enter the variable the same "randomNumber"

enter the value of 2 and so on

click "do action" (the button will say <No Action Selected>)


Once we've done all that, we need to set a trigger

click on the “Commands” tab

right-click on the list and click “Add”



Give it a name

give it a command like “!foxyfact”



if you want to set a cool-down, you can do that on the bottom right


click OK


Go back to “Actions” and click on “Foxy Facts”


right-click in “Triggers”

Hover over “Core”, “Commands” and “Command Triggered”



scroll and select the second fact action and so on



Now, whenever someone puts “!foxyfacts” (or whatever you called it) in chat, it will put a random foxy fact in the chat.

Saturday, 13 May 2023

A falling out, A genius idea and an idiot moment.

The modem and the router had an argument in the early hours of Thursday night.

They are no longer on speaking terms, so that's why the network has  been down.

Despite my attempts at mediation, the two remain inconsolable.


I thought about purchasing a replacement, but I didn't really want to spend £100-200 on a new Draytek.

That's when I came up with the genius idea of making my own router, with blackjack and hookers.


I started looking up OPNsense virtual appliance installation and configuration.

Thinking about the physical aspect, cables and such, my first priority was getting the internet over to the server, on the opposite side of the house.

The current solution was trunking from the master socket up to the office, where the Draytek router resided. From there, a patch lead essentially ran through the loft and down into the opposite bedroom, which is my wife's art studio.

Inside the cupboard, aside from some jackets and art supplies, resides my main server, which provides hypervisor and file services, along with my backup server (which is not entirely live) and a UniFi access point.

The first thing I thought of was a spare MikroTik router.

That's when it hit me. You idiot, just use that.


Cut to three hours later and I have it configured to give my PC an IP address and also take an IP address from the crappy NowTV modem/router.

I would have rather preserved my IP reservations, but I decided I didn't have the energy to connect back to the Draytek to note down all the reservations manually, so I plugged in the rest of the network.

I added a few NAT port-forwarding rules, but at the time of writing that traffic is still being blocked somewhere upstream, so it will require further investigation.


At least now the network has internet access again, so my wife can stream.

I still need to get port forwarding working so that I can start using "private cloud services" and soon my Minecraft mod can get back in the server when they want to.

I'm hoping that, as the LAN range is the same, the existing client leases will be honoured by the new DHCP server and clients will be given the same IPs. That being the case, setting the reservations will be a lot easier.

Edit: They were.

MikroTik's web GUI takes a bit of getting used to, but it is starting to make sense. I suppose that's the same with any router's GUI.


Around the outside, I replaced my PoE switch this week.

I had a really old D-Link web smart switch, and it really wasn't happy. After the third time of waking up to find no network access, I decided to replace it.

I got a tiny 5-port Tenda unmanaged PoE switch. It's smaller, quieter and cheaper to run. Yes, I lost VLANning, but the only reason I wanted that isn't a requirement any more.

It arrived and I realised my mistake: this device was PoE-powered. I hastened to start the returns process, but then remembered that I still had an old PoE injector.

It was only 10/100.

Ok, back to Amazon, cancel the return and grab a gigabit injector.


What I was actually hoping to sort out this weekend, aside from mowing the lawn and doing laundry, was maybe getting cabling down to the living room, or getting a spare Cisco AP installed in the dining room.

This would have been to improve bandwidth for my Raspberry Pi, which was trying to talk to the UniFi AP on the other side of the house. This resulted in a, let's say, changeable experience when streaming content.


That will have to be for future Rob.

Saturday, 20 August 2022

Website and affiliates

I finally got around to completing version 1.0 of my wife's website for her art and business.

I'm not a web developer or content manager, but we went over the requirements; the main ones were a portfolio gallery and an asset rotator on the front page.

It took a while to get to grips with working on this kind of thing again, going through several iterations based on different platforms before finally landing on WordPress.

If you want, you can check them out. I've added the hyperlinks.


Once this was complete and the "customer" was happy, I decided it was time I updated my own website.

For the longest time, it was just a static page with a blurb and contact information. Honestly, it was looking pretty amateur.

The intention, as with most websites, is to engage with customers. I always had the idea of driving traffic from places like Twitch, YouTube, Twitter etc. to a main site that had all the information in one place and showcased some of my other hobbies and services. I don't really have a particular niche, but I think I come up with, or at least stumble upon, some interesting stuff on occasion.

I've gone with a generic “tech services” template for now and, having removed a lot of the boilerplate stuff, I think it looks OK. There's basically no information on it presently, besides the contact information, but at least it now links out to this blog. In time, my intention is to customise the theme for more of a tech-geek style, with my central green colour scheme running through it.


Back along, my sister created a family WhatsApp group for easily communicating updates during a difficult time. She had posted a link to a shower head on Amazon, and it ended up being quite popular within the family; I believe at last count five of us have one. I joked that she should create an affiliate link for it so that she could get a small kickback.

I thought it might be fun to make a pseudo-shop, which showcases things that I've found or purchased that I think are cool or useful enough to show. I created an Amazon affiliate account that I could link out to, so if anyone ever thinks like me and wants to purchase an item, I'll be able to see. This would primarily be for the purpose of learning a bit about something new, I really don't expect to make any money from it.

From there, I also created an affiliate account with my chosen hosting provider, Eco Hosting, after chatting with a colleague in the market for a domain. So, I've been toying with CNAMEs and subdomains to try and make this a little more useable and professional-looking.


Lastly on the website front, I had an idea in my head for some time now for a website that has a bunch of how-to guides about things I've learned while streaming on Twitch.

Things like how to do certain things in Open Broadcast Studio, setup cameras and microphones, add transitions and more custom elements like counters using Streamer.bot.

The ultimate goal would be to have written articles outlining the steps, with an accompanying YouTube video.

The site's working title is "Streaming is Hard".

If you're reading this and have any interest, please do leave a comment to say so, and mention if there is anything in particular you wish to see on there.

There's not really a timeline for this as such. Back along, I went through my current OBS configuration and noted everything down, with the intention of one day ripping it all out and doing a stream where I set it all up from scratch.

This way, I can easily split it up into clips to add to the articles. What would be really cool and time-saving would be if I could get some sort of step recorder that I could use to create the framework of the written articles. I'll have to do some research; if I remember correctly, the built-in Windows Steps Recorder is not quite as clever as all that.


That's about it for this one. I know I say this every couple of years, but I am, once again, going to try and update this blog more frequently and I have a number of new posts in the pipe.

I'm going to try and find a balance between frequency and energy expenditure, so that I don't burn out too quickly.

Keep an eye out for the next one, which will be in a few days to a week. I'm not sure which topic I will choose for the next instalment, most of them being of a technical nature, outlining what I've set up or done with a particular service or piece of software, but I think they have the bones of something interesting.

Take care of yourself and keep an eye on your spoons.