Sony received 50 granted patents in H1 2026 across 12 categories: AI & Machine Learning (28), Audio (3), Game Engines (2), Platforms (1), VR & AR (2), Hardware (5), Graphics (2), UI/UX (1), Monetization (1), Esports (1), Cloud Gaming (3), and Streaming (1).
The AI & Machine Learning category dominates with technologies that span player matchmaking based on voice patterns and personality detection, real-time content filtering and personalization systems, automated game asset generation from photos and videos, and dynamic NPC behavior adaptation. Audio patents cover intelligent mixing that separates music vocals during dialogue, spatial voice recognition for multi-player commands, and indoor positioning using stereo analysis. Cloud Gaming and Streaming patents address bandwidth optimization through priority-based encoding and spare bandwidth utilization, while Hardware and VR & AR patents include accessible controllers with repositionable contact pads, emotion-sensing environmental adjustments, and hybrid motion tracking that switches between camera and IMU sensors.
The 28 AI and machine learning patents granted this period span nearly every layer of the gaming experience, from how players communicate to how games are built. Several patents focus on making player input smarter: one system uses game state data and machine learning inference to create time windows where voice and gesture commands are contextually appropriate, so the game only listens when a player actually intends to give a command rather than during casual conversation or incidental movement. A related patent handles multi-player voice recognition by combining microphone array spatial data with individual voice profiles, allowing multiple players to issue hands-free commands simultaneously without any manual switching. Another takes gesture recognition further by translating sign language and physical movements into chat messages that are automatically adjusted in tone and content to match the current game situation. On the matchmaking side, one patent analyzes voice patterns to infer personality and mood, pairing players based on social compatibility rather than skill alone, while a second predicts how long a player's session will last based on their history, letting teammates align on time commitments before a match begins. The personalization patents are particularly dense: one uses behavioral signals like controller inputs and time spent in menus to build a continuous preference profile across visual, audio, and gameplay dimensions without requiring the player to fill out any surveys, while another cross-references NPC interaction patterns across multiple sessions to tailor dialogue to how each individual player naturally engages with characters. A separate system applies a similar approach to music, generating personalized soundtracks in real time that evolve based on player behavior and game state rather than cycling through pre-recorded loops.
Content filtering also gets personalized treatment, with a patent describing a neural network system that detects objectionable content and obscures it according to each user's own specified sensitivities, including the option to replace content using deepfake techniques. A companion system filters in-game messages not for profanity but for contextual relevance, rewriting communications so they make sense to the recipient based on where they are in the game at that moment. Several patents address AI-assisted game development and asset creation. One uses a large language model to generate interconnected JSON schemas and corresponding JavaScript code for game objects simultaneously, reducing the manual work of wiring data structures together. Another converts 2D vector graphics into 3D game assets by feeding geometric metadata from vector files into machine learning models alongside raster data, giving the model richer structural information to work from. A third system takes real-world photos or video and processes them into usable in-game 3D assets, while a separate patent generates avatars from selfies that match the art style of whatever game the player is entering, whether realistic, cartoon, or anime, without any manual artist involvement. A more flexible avatar creation patent combines body movement captured from standard video with natural language descriptions to build fully rigged custom characters. For accessibility and narration, a patent describes an automated system that detects gaps in game dialogue, generates contextually appropriate audio descriptions using a language model and voice synthesis, and inserts them with timing and emotional tone matched to the surrounding content. For legacy games that lack modern data APIs, another system analyzes raw pixel output, audio, and controller inputs to infer and generate structured game context metadata, enabling platform-level features to function without developer cooperation or source code access.
Several patents address AI behavior in ways that involve human feedback loops. One enables game spectators to approve or disapprove of AI gameplay moves in real time, using those reactions as reinforcement learning signals to train the AI agent, effectively applying crowd-sourced human feedback to game bot training. A related patent turns viewer comments and votes on gameplay clips into labeled training data for an AI that can automatically commentate live matches, with a rewards system that incentivizes participation without requiring users to think of themselves as annotators. Another creates an AI streaming personality that adjusts its in-game strategy based on audience reactions, optimizing for viewer engagement rather than winning. On the guidance and recommendation side, a ghost player patent describes a trained AI model that watches live gameplay and generates real-time control inputs for a visible guide character that shows players exactly how to navigate challenges, with a natural language interface for asking the ghost specific questions. A crowd-sourced help system aggregates anonymized player behavior across many sessions to identify which strategies, items, and skills have the highest success rates for specific challenges, then surfaces personalized recommendations based on the individual player's current loadout and difficulty settings. A related item recommendation engine draws on community data and real-time game state to suggest relevant weapons and upgrades across multiple titles. Biometric data appears in several patents as well. One system captures physiological signals during gameplay to let spectators see when a player's stress or excitement peaked, enables emotional bookmarking of high-intensity moments, and feeds that data into NPC behavior systems that adjust difficulty or character responses based on detected player state.
An emotion-sensing VR patent takes a similar approach, using inferred emotional state as a passive environmental control input that adjusts the virtual world without requiring the player to press anything. For content streaming to smaller screens, a computer vision system identifies which game elements are gameplay-critical and prioritizes their clarity during the downscaling process rather than applying uniform resolution reduction. A video analysis patent applies machine learning to predict audience engagement for game trailers before they are released, allowing studios to adjust promotional materials based on learned patterns. Finally, a highlight generation patent combines a personalized model trained on each user's play style with an image composition AI to automatically produce shareable collage highlights from gaming sessions, with an option to mint them as NFTs. Rounding out the category, an adaptive content modification patent continuously analyzes gameplay signals to adjust dialogue, audio, visuals, and interface elements in real time based on accessibility needs or contextual triggers, and a separate patent auto-generates game context metadata from unstructured inputs to support platform features on games that predate modern data-sharing standards.
The 3 Audio patents each address a different layer of sound management in gaming environments. One describes a real-time source separation system that detects when in-game dialogue or voice chat is about to overlap with music that contains vocals, then isolates and modifies those vocal frequency ranges on the fly to prevent the two from clashing, all without any pre-processing of the music assets. A second patent goes further in treating audio as an input rather than just an output: it extracts sentiment and rhythmic characteristics from spoken voice and ambient sound, then uses those qualities to influence NPC behavior and game interactions continuously during play, so the emotional tone of your voice or the music in the room actively shapes what happens in the game. The third patent solves a spatial positioning problem using only two standard stereo speakers and a microphone, transmitting frequency-shifted signals in the 16 to 24 kilohertz range (above normal hearing) and analyzing arrival time differences and power ratios through a trained neural network to determine a user's 2D position in a room, enabling adaptive sound field control without any additional hardware.
Game development workflows are the focus of the 2 Game Engines patents. One describes a pipeline that takes photographs or video footage of real-world subjects and converts them into game-ready 3D assets through automated or semi-automated processing, reducing the manual labor traditionally involved in building art for games. The other operates on the back end after a game is already in players' hands, using server-side automated capture and analysis of gameplay footage across large numbers of players to identify bugs and areas of unusual difficulty for developers to address, while simultaneously flagging exceptional player performances to be packaged into shareable highlight content.
The single Platforms patent addresses ghost character functionality at the operating system level rather than within individual games. The system extracts recorded gameplay video and composites a translucent ghost overlay onto a live session, allowing players to race or compete against historical runs of any game without that game needing to natively support ghost playback. Because it operates through video extraction and compositing rather than by reading game engine control data, the feature works universally across titles regardless of whether the developer built it in.
The 2 VR and AR patents address tracking reliability and spatial anchoring. One describes an intelligent mode-switching mechanism for VR and AR controllers that monitors the reliability of camera-based SLAM tracking and seamlessly transitions to inertial measurement unit data when visual tracking degrades, preventing the abrupt position corrections that currently occur when a controller moves out of camera view. The other patent focuses on how mixed-reality systems establish reference points: rather than scanning an entire room to find anchors, it identifies the gaming console itself as the primary spatial reference, then spawns virtual content anchored to that stable, known object so that AR and VR overlays persist reliably relative to the physical gaming setup.
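The mode-switching idea can be sketched in a few lines. This is only an illustration, not the patented method: the confidence score, the two hysteresis thresholds, and the blending factor are all assumptions introduced here, and `HybridTracker` is a hypothetical name.

```python
class HybridTracker:
    """Pick camera or IMU positions based on a tracking-confidence score.

    Assumptions (not from the patent summary): confidence is a number in
    [0, 1], switching uses two thresholds (hysteresis), and the reported
    position is eased toward the active source to avoid sudden jumps.
    """

    def __init__(self, low=0.3, high=0.6, blend=0.2):
        self.low = low        # below this, fall back to the IMU
        self.high = high      # above this, return to the camera
        self.blend = blend    # fraction of the gap closed per update
        self.mode = "camera"
        self.position = None

    def update(self, confidence, camera_pos, imu_pos):
        # Hysteresis: the two thresholds stop rapid flip-flopping.
        if self.mode == "camera" and confidence < self.low:
            self.mode = "imu"
        elif self.mode == "imu" and confidence > self.high:
            self.mode = "camera"

        target = camera_pos if self.mode == "camera" else imu_pos
        if self.position is None:
            self.position = tuple(target)
        else:
            # Ease toward the active source instead of snapping to it.
            self.position = tuple(
                p + self.blend * (t - p)
                for p, t in zip(self.position, target)
            )
        return self.position
```

Using two thresholds rather than one is a design choice made for this sketch: it keeps the tracker from oscillating between sources when confidence hovers near a single cutoff.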
Five Hardware patents cover a range of accessibility and input customization solutions. One converts Braille character encodings into haptic vibration patterns on a standard game controller, allowing visually impaired players to read subtitles through tactile feedback, and it adapts the speed of video playback to match the pace at which the user reads each character. A second patent describes a wearable glove controller with contact pads that can be physically repositioned anywhere on the garment, with connectors that handle both mechanical attachment and electrical connectivity simultaneously, so players can place buttons wherever their hands can comfortably reach them. A third patent addresses analog stick drift by automatically detecting when a controller is idle and recalibrating the dead zone geometry during those windows, compensating for drift at the system level rather than requiring each game to handle it independently, and it can produce non-circular dead zones to target specific drift directions without sacrificing sensitivity elsewhere. The fourth patent uses optical sensors embedded in controller surfaces to detect a finger's approach before it makes contact, and allows the same surface to be configured as a d-pad, a joystick, or discrete buttons through software, with layouts adjustable to fit different hand sizes. The fifth replaces the traditional rigid controller form factor entirely with a flat surface on which players draw their own button layouts using conductive ink, and includes programmable keys that can be remapped to reduce repetitive strain or accommodate specific accessibility needs.
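To make the drift-compensation idea concrete, here is a minimal sketch of idle-time recalibration with a non-circular dead zone. It assumes stick readings in the range -1 to 1 and models the dead zone as an ellipse centered on the measured resting position; the ellipse model, the `margin` value, and all names are illustrative assumptions rather than details from the patent.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class DeadZone:
    cx: float  # resting-position offset on the x axis
    cy: float  # resting-position offset on the y axis
    rx: float  # dead-zone radius along x
    ry: float  # dead-zone radius along y

    def filter(self, x, y):
        """Return (0, 0) inside the dead zone, else the re-centered input."""
        dx, dy = x - self.cx, y - self.cy
        if (dx / self.rx) ** 2 + (dy / self.ry) ** 2 <= 1.0:
            return (0.0, 0.0)
        return (dx, dy)


def recalibrate(idle_samples, margin=0.02, min_radius=0.01):
    """Fit an elliptical dead zone to readings taken while the pad is idle."""
    xs = [s[0] for s in idle_samples]
    ys = [s[1] for s in idle_samples]
    cx, cy = mean(xs), mean(ys)
    # Each axis gets its own radius, so drift along one direction does not
    # widen the dead zone along the other.
    rx = max(max(abs(x - cx) for x in xs) + margin, min_radius)
    ry = max(max(abs(y - cy) for y in ys) + margin, min_radius)
    return DeadZone(cx, cy, rx, ry)
```

Calling `recalibrate` on samples gathered during an idle window and then passing every live reading through `DeadZone.filter` would apply the correction once for all games, which is the system-level behavior the summary describes.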
The 2 Graphics patents each address a different aspect of visual representation. One describes a system that progressively deforms a character's mesh geometry based on how frequently a player performs certain actions, so repeated behaviors visibly alter the avatar's appearance over time and other players can read a character's play style from its physical form, with deformations that can also be reversed if habits change. The other patent enables 2D to 3D conversion for video and games by using a trained object identification model to infer depth relationships purely from rendered 2D output, without requiring access to the original rendering pipeline's depth buffers, which makes the technique applicable to legacy content and streamed video where internal engine data is unavailable.
The single UI and UX patent describes a system for player-created annotations that are anchored to game state data rather than to timestamps in a recording. Because each annotation is tied to specific conditions in the game rather than a moment in time, a tip or walkthrough overlay appears precisely when a player reaches the relevant situation, regardless of how they arrived there. The system also supports granular sharing controls, social distribution features, and machine-learning-assisted automatic generation of annotations.
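The difference between state-anchored and time-anchored annotations is easy to show in code. The sketch below assumes game state is exposed as a flat dictionary and that an annotation fires when all of its conditions match; the field names (`level`, `boss_hp_below_half`) and the exact-match rule are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    text: str
    conditions: dict  # game-state keys and the values they must equal


def active_annotations(annotations, game_state):
    """Return annotations whose conditions all hold in the current state."""
    return [
        note
        for note in annotations
        if all(game_state.get(key) == value
               for key, value in note.conditions.items())
    ]


notes = [
    Annotation("Dodge left after the roar.",
               {"level": "castle", "boss_hp_below_half": True}),
    Annotation("Hidden chest behind the waterfall.",
               {"level": "forest"}),
]

state = {"level": "castle", "boss_hp_below_half": True, "elapsed_s": 912}
shown = [note.text for note in active_annotations(notes, state)]
```

Note that `elapsed_s` plays no part in the match: a player who reaches the boss after five minutes or fifty would see the same tip.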
Sony's single Monetization patent describes a blockchain architecture designed specifically for in-game item ownership that separates the record of who owns an item from the description of what that item is, distributing those functions across multiple blockchains. This structure allows players to trade and transfer items across platforms without a central publisher controlling the market, and it includes a weighted modulo system for determining item types at the moment of creation, providing a decentralized mechanism for managing drop and loot economies.
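One plausible reading of a weighted modulo scheme is sketched below: an integer seed is reduced modulo the total weight, and the result is mapped onto cumulative weight ranges. The drop table, the use of SHA-256 to derive the seed, and the function names are assumptions made for illustration; the summary does not say how the patent derives or maps its values.

```python
import hashlib

# Hypothetical drop table: (item type, integer weight). Not from the patent.
DROP_WEIGHTS = [("common", 70), ("rare", 25), ("legendary", 5)]


def item_type_from_seed(seed, weights=DROP_WEIGHTS):
    """Map an integer seed to an item type using a weighted modulo."""
    total = sum(weight for _, weight in weights)
    slot = seed % total  # always lands in [0, total)
    for name, weight in weights:
        if slot < weight:
            return name
        slot -= weight
    raise AssertionError("unreachable when all weights are positive")


def seed_from_bytes(data):
    """Derive a deterministic integer seed from arbitrary bytes."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")
```

Because the mapping is a pure function of the seed, any party holding the same creation record can recompute the item type and verify it, which suits a setting with no central authority deciding drops.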
The single Esports patent describes an automated spectator camera system that reads real-time game data, including player positions, actions, and in-game events, and uses that information to calculate optimal viewing angles algorithmically. The system removes the need for a human broadcast director to manually control camera placement during competitive events, generating coverage decisions from aggregated gameplay parameters rather than individual human judgment.
The 3 Cloud Gaming patents each target a different inefficiency in how game streams are delivered. The first describes a split-client architecture that separates rendering and streaming responsibilities between local and remote components, allowing the platform operator to update the interface and functionality of the service entirely on the server side without pushing software updates to the player's device. The second applies variable quality encoding at the object and region level within individual frames, allocating higher frame rates and resolution to gameplay-critical elements while reducing quality for less important areas, rather than encoding the entire frame uniformly. The third takes advantage of bandwidth that would otherwise go unused during paused or static scenes, sending progressive image refinements over a separate secondary channel so quality improves opportunistically without disrupting or reconfiguring the primary video stream.
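As a rough illustration of region-level prioritization, the sketch below splits a fixed bitrate budget across frame regions in proportion to area times priority. Bitrate is used here as a simple stand-in for the frame rate and resolution controls the patent describes, and the region names, priorities, and proportional rule are all assumptions.

```python
def allocate_bitrate(regions, total_kbps):
    """Split a bitrate budget across regions by area * priority.

    regions: list of (name, area_fraction, priority) tuples, where
    area_fraction is the share of the frame the region covers.
    """
    weights = {name: area * priority for name, area, priority in regions}
    total_weight = sum(weights.values())
    return {
        name: total_kbps * weight / total_weight
        for name, weight in weights.items()
    }


# Hypothetical frame layout: small but critical HUD, large background.
regions = [
    ("hud", 0.1, 4.0),
    ("player", 0.2, 3.0),
    ("background", 0.7, 1.0),
]
budget = allocate_bitrate(regions, total_kbps=8000)
```

Under this rule the HUD receives four times as many bits per unit of screen area as the background, while the whole frame still fits inside the same total budget.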
The single Streaming patent describes a highlight detection system that identifies exciting moments in gameplay streams by collecting heart rate and other physiological data from both players and spectators, then clustering those signals to find peaks in collective engagement. Rather than relying on manual bookmarking or game-state triggers, the system uses aggregated biometric data as its primary signal, and it can segment that data by audience demographics such as location, age, or skill level to provide additional context about which moments resonated with which groups.
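A simplified version of the aggregate-then-detect step might look like the following. It averages heart-rate samples across participants at each time step and flags steps that rise well above the session average; this threshold rule is a deliberately simple substitute for the clustering the patent describes, and the `k` parameter and input layout are assumptions.

```python
from statistics import mean, pstdev


def collective_peaks(series_by_user, k=1.5):
    """Return time indices where average heart rate spikes.

    series_by_user: dict mapping a user id to a list of heart-rate
    samples, all taken at the same time steps.
    """
    length = min(len(series) for series in series_by_user.values())
    averaged = [
        mean([series[i] for series in series_by_user.values()])
        for i in range(length)
    ]
    baseline, spread = mean(averaged), pstdev(averaged)
    if spread == 0:
        return []  # flat signal, nothing stands out
    return [i for i, value in enumerate(averaged)
            if value > baseline + k * spread]
```

Segmenting by demographic group, as the patent mentions, would amount to calling the same function on a filtered subset of `series_by_user` for each group.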
All data sourced from USPTO patent filings. Google Patents may take several weeks to index recent publications. If a link is unavailable, search for the patent number at USPTO Patent Public Search.