The input image is run through the Ball Location Model to estimate the ball positions on the table, which are then handed to the Shot Suggestion Model. First, we obtain the dot and ball detections, which we use to find the table lines and thus estimate a mapping from the image to a template. Then, we use the mapping to estimate the center point of each ball, which gives the ball positions for the environment. The Shot Suggestion Model sends the state to the agent, which suggests an action. During training, the environment evaluates the action, and the agent receives a reward.
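The image-to-template mapping can be illustrated with a planar homography. The sketch below is not the pipeline from the paper: it assumes that four table corners have already been recovered from the table lines, and the names `solve_linear`, `estimate_homography`, and `map_point` are hypothetical helpers introduced only for this example.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def estimate_homography(src, dst):
    """Estimate a 3x3 homography from exactly four (x, y) -> (u, v) pairs.

    Assumption: the bottom-right entry is fixed to 1, which holds for
    ordinary camera views of a table.
    """
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def map_point(h, point):
    """Map an image point (e.g. a detected ball center) to the template."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return (u, v)


# Hypothetical pixel coordinates of the four table corners in the image.
image_corners = [(100, 400), (700, 420), (560, 120), (220, 110)]
# Template corners for a 2:1 table, in arbitrary template units.
template_corners = [(0, 0), (2, 0), (2, 1), (0, 1)]

H = estimate_homography(image_corners, template_corners)
ball_on_template = map_point(H, (400, 260))  # a detected ball center
```

In practice, more than four correspondences and a robust estimator would likely be preferable, since detections in images in the wild are noisy.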
In the 16-ball environment, the task is to complete a full turn (pocketing all blue balls and then the black ball) without violating the rules, for example by pocketing the cue ball, pocketing the black ball prematurely, or failing to pocket a blue ball.
To include kick and bank shots, we mirror the balls and pockets into a separate space on each side of the table. Aiming for a ball or pocket in the mirrored space is equivalent to hitting a cushion and then the ball or pocket on the real table. The cyan points indicate the aiming points.
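The mirroring idea can be sketched as a reflection across a cushion line. This is a simplified illustration under stated assumptions, not the paper's implementation: the table is an axis-aligned rectangle from (0, 0) to (width, height), the ball radius and cushion physics are ignored, and `mirror_point` and `cushion_contact` are hypothetical names.

```python
def mirror_point(point, side, width, height):
    """Reflect a ball or pocket position across one cushion of the table."""
    x, y = point
    if side == "left":
        return (-x, y)
    if side == "right":
        return (2 * width - x, y)
    if side == "bottom":
        return (x, -y)
    if side == "top":
        return (x, 2 * height - y)
    raise ValueError(f"unknown side: {side}")


def cushion_contact(start, target, side, width, height):
    """Point on the cushion hit when aiming from start at the mirrored target.

    Assumes both points lie strictly inside the table, so the straight
    line to the mirrored target always crosses the chosen cushion.
    """
    mx, my = mirror_point(target, side, width, height)
    sx, sy = start
    if side in ("left", "right"):
        wall = 0.0 if side == "left" else width
        t = (wall - sx) / (mx - sx)
        return (wall, sy + t * (my - sy))
    wall = 0.0 if side == "bottom" else height
    t = (wall - sy) / (my - sy)
    return (sx + t * (mx - sx), wall)


# Kick shot off the top cushion on a 2 x 1 table (template units).
cue = (0.5, 0.5)
target = (1.5, 0.5)
aim_at = mirror_point(target, "top", 2.0, 1.0)
bounce = cushion_contact(cue, target, "top", 2.0, 1.0)
```

Because the reflected path has the same length and angles as the real one, a straight aiming line to `aim_at` corresponds to a path that touches the cushion at `bounce` and then continues to `target`.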
@misc{schiøtt2025pix2pocketsshotsuggestions8ball,
title={pix2pockets: Shot Suggestions in 8-Ball Pool from a Single Image in the Wild},
author={Jonas Myhre Schiøtt and Viktor Sebastian Petersen and Dimitrios P. Papadopoulos},
year={2025},
eprint={2504.12045},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2504.12045},
}