Landmark Tracker

Made by SeanvonB | Source

This was my final for Udacity's Computer Vision Nanodegree, which I completed in 2020. Unlike the other projects, this one doesn't really produce a standalone piece of software; rather, this notebook demonstrates a mathematical solution to the Simultaneous Localization and Mapping (SLAM) problem – or, how can an autonomous agent (i.e. robot) simultaneously explore an unknown environment and keep track of itself within that unknown space? I'll give you a hint: it will involve some real Greek letter math, and maybe even some LaTeX – how exciting!

Udacity originally divided this project into three sections:

  1. Moving and Sensing
  2. Omega and Xi
  3. Implement SLAM

Note: this project was created by Udacity in 2018 and completed by me in 2020, based on research from 2005; so, by the time you read this, there's no telling how outdated this approach may be.

1.0 Moving and Sensing

Computer vision and movement always have a degree of uncertainty to them – reality is simply too busy to account for everything while still maintaining reasonable computational performance. Udacity provided this excellent example:

Imagine a car driving uphill and downhill; the speedometer reading will likely overestimate the speed of the car going uphill and underestimate the speed of the car going downhill because it cannot perfectly account for gravity. Similarly, we cannot perfectly predict the motion of a robot. A robot is likely to slightly overshoot or undershoot a target location.

There will be a good example of this uncertainty after I define and initialize the robot class:

1.1 Creating the Robot

The next cell instantiates a copy of the robot object inside its own tiny 10x10 world. You might ask, "10x10 of what units?" – Udacity never specified, so I think we can decide. I like to imagine centimeters, so the robot can be tiny and cute. But you can imagine kilometers if you would rather live your mecha fantasy instead. Either way, this world will be sufficient for learning how the robot works, but future tests will use worlds that are at least 100x100 imaginary 2D space units.
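The notebook cell isn't reproduced in this README, but the call looks roughly like the sketch below; the exact constructor signature and the measurement_range value are assumptions on my part, so check robot_class.py before copying anything.

```python
from robot_class import robot

# World and robot parameters for the tiny 10x10 world
world_size        = 10.0   # width/height of the square world (units of your choosing)
measurement_range = 5.0    # how far the robot can sense in any direction (assumed value)
motion_noise      = 0.2    # uncertainty added to every movement
measurement_noise = 0.2    # uncertainty added to every measurement

# Instantiate the robot; it starts at the center of its world
r = robot(world_size, measurement_range, motion_noise, measurement_noise)
print(r)   # e.g. "Robot: [x=5.00000 y=5.00000]"
```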

Welcome to loaded memory, little guy! Anthropomorphising the data object is optional but recommended.

1.2 Visualizing the World

For our benefit, Udacity provided a display_world function in their helpers.py package. But, to be clear, the robot doesn't see the world as this function shows it. The robot can only see measurement_range units in any direction.

In the example below, the robot is in the middle of a 10x10 world at (x, y) = (5.0, 5.0), marked by the red O:
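A sketch of what that visualization call looks like, assuming display_world takes the world size and the robot's [x, y] position (the argument names here are mine):

```python
from helpers import display_world

# The robot starts at the center of its 10x10 world: (x, y) = (5.0, 5.0)
print(r)
display_world(int(world_size), [r.x, r.y])
```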

1.3 Moving the Robot

Moving the robot is easy with the .move() method, so I'll move the robot 1 unit right and 2 units up. Where is the robot now?

Well, it probably won't be at Robot: [x=6.00000 y=7.00000] due to the aforementioned uncertainty. But, with an assigned motion_noise of 0.2, the robot should be somewhere within 0.2 units of the desired position. That's not so bad. However, subsequent movements will only compound that uncertainty; and, pretty soon, the robot will have better luck navigating by bumping into boundaries than by using the tracking system – not a good look for a robot who dreams of someday becoming an autonomous vehicle.

The robot thinks it's at (6, 7), but the following shows the true location:
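A sketch of what that looks like, assuming the same display_world call as before:

```python
# Attempt to move 1 unit right and 2 units up
dx, dy = 1, 2
r.move(dx, dy)

# The commanded target was (6, 7); motion_noise nudges the true pose,
# so the printed coordinates should land within ~0.2 units of that target.
print(r)
display_world(int(world_size), [r.x, r.y])
```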

1.4 Creating Landmarks

In this instance, "landmarks" are anything visible and stationary within the robot's world. If you're imagining a giant robot, they might be buildings; if you're imagining a tiny robot, they might be the debris on your desk. Either way, each landmark will appear as a purple X on the grid for the purpose of visualization.

The same package that includes display_world also includes a make_landmarks function, which generates and randomly distributes the specified number of landmarks across the world of the robot on which it is called. They're stored on the robot in r.landmarks, but only because the robot and its world are the same object – the robot doesn't automatically know where they are and must still locate them.

The following will create those landmarks:
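A sketch of that step, assuming make_landmarks is called on the robot instance and display_world accepts an optional landmark list:

```python
# Scatter a few landmarks around the world and plot them as purple Xs
num_landmarks = 3
r.make_landmarks(num_landmarks)

print('Landmark locations [x, y]:', r.landmarks)
display_world(int(world_size), [r.x, r.y], r.landmarks)
```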

1.5 Sense the Landmarks

Now that there are a few landmarks for the robot to find, I needed to create the sense function, found in the robot_class.py file, for the robot to find them. The sense function is limited by the robot's measurement_range and perturbed by its measurement_noise value; within those limits, it gives the robot fairly accurate information about the distance between a landmark and itself. Each sensed landmark is stored as an [i, dx, dy] list, where i is the landmark's index within the landmark list and dx, dy are the measured distances.
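The full implementation lives in robot_class.py; a minimal sketch of such a sense method, based on the description above, might look like this:

```python
import random

def sense(self):
    """Return an [i, dx, dy] entry for every landmark within measurement_range.

    dx and dy are the distances from the robot to landmark i, each perturbed
    by up to +/- measurement_noise; landmarks out of range are skipped.
    """
    measurements = []
    for i, (lx, ly) in enumerate(self.landmarks):
        dx = lx - self.x + random.uniform(-1.0, 1.0) * self.measurement_noise
        dy = ly - self.y + random.uniform(-1.0, 1.0) * self.measurement_noise
        if abs(dx) <= self.measurement_range and abs(dy) <= self.measurement_range:
            measurements.append([i, dx, dy])
    return measurements
```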

In the following cell, the robot can only sense the second landmark, because the others are outside measurement_range:

1.6 Data

Most of what I've done so far feels like "data"... but data will specifically be an array that stores a pair of lists for each time step: the robot's measurements and its movements. Solving the problem of SLAM means the robot can successfully do what I've done so far, but backwards: it'll sense its surroundings, move, and repeat, then reconstruct the world from those records alone. If the robot can do that, it's a success. That's where the robot's data comes in. Over the next section, I'll use this array to form constraint matrices and vectors and implement SLAM.

Here is data being constructed, which you'll notice happens over a series of time steps:
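A condensed sketch of that loop (assuming move returns False whenever a step would leave the world):

```python
data = []
N = 20              # number of time steps for this sketch
dx, dy = 1.0, 2.0   # motion attempted at each step

for t in range(N):
    # 1. Sense whatever landmarks are currently within range
    measurements = r.sense()

    # 2. Attempt the move; if it would leave the world, reverse direction and retry
    if not r.move(dx, dy):
        dx, dy = -dx, -dy
        r.move(dx, dy)

    # 3. Record this time step as a [measurements, motion] pair
    data.append([measurements, [dx, dy]])

print('First time step:', data[0])
```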

2.0 Omega and Xi

Note: apparently, the robot's location ought to be called a pose ("and pose and pose and pose"), while location refers to landmark locations.

In Graph SLAM, the Greek letter Ω or omega represents a 2D matrix of robot poses ([Pxi, Pyi]) and landmarks ([Lxi, Lyi]), while the letter ξ or xi represents a 1D vector of relational data.

For the uninitiated, these objects look like this:

Now, here's another example that depicts an initial state with three poses, constraints provided:

But there are still plenty of unknown x values. Graph SLAM represents these as μ or mu – yup, another Greek letter – which is found by solving $\mu = \Omega^{-1}\xi$.

The following cell demonstrates this assertion if you don't wanna just be cool and believe me:
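A self-contained numpy version of that demonstration; the numbers are my own 1D example (x0 = -3, then moves of +5 and +3), not necessarily the ones pictured above:

```python
import numpy as np

# Constraints for three 1D poses: x0 = -3, x1 - x0 = 5, x2 - x1 = 3
omega = np.array([[ 2., -1.,  0.],
                  [-1.,  2., -1.],
                  [ 0., -1.,  1.]])
xi = np.array([[-8.],
               [ 2.],
               [ 3.]])

# mu = omega^-1 * xi recovers the most likely poses
mu = np.linalg.inv(omega) @ xi
print(mu)   # roughly [-3, 2, 5]
```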

2.1 Motion Constraints and Landmarks

But I can't just make up my own constraints; from now on, I'll have to calculate them myself. Let's wipe the slate:

This example now also includes landmarks. If the motion from x0 to x1 has a dx of 5, then that dx is the motion constraint relating x0 to x1.

But folding that single constraint into the system touches more than one cell: it adds values to every omega entry that relates x0 and x1, to themselves and to each other, as well as to the corresponding entries of the xi vector.
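In code, folding constraints into a blank omega and xi looks like the following sketch; the 1.0/motion_noise weighting is left out so the arithmetic stays easy to follow:

```python
import numpy as np

# Blank constraints for a 1D example: indices 0-2 are poses x0..x2, index 3 is landmark L0
omega = np.zeros((4, 4))
xi = np.zeros((4, 1))

# Motion constraint: x1 - x0 = 5
dx = 5
omega[0][0] += 1   # x0 relates to itself...
omega[1][1] += 1   # ...and so does x1
omega[0][1] -= 1   # x0 relates to x1
omega[1][0] -= 1   # x1 relates to x0
xi[0] -= dx        # x0 sits dx behind x1
xi[1] += dx        # x1 sits dx ahead of x0

# A measurement constraint works the same way: if x0 sees L0 at a distance of 2,
# indices 0 (the pose) and 3 (the landmark) get the same +1/-1 treatment.
dl = 2
omega[0][0] += 1
omega[3][3] += 1
omega[0][3] -= 1
omega[3][0] -= 1
xi[0] -= dl
xi[3] += dl
```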

Now, what about y? Yeah, I'll have to do this again for the y-dimension. See you in Part 3!

3.0 Implement SLAM

To reiterate what I said in the introduction, SLAM is the problem of how an autonomous agent (i.e. robot) can simultaneously explore an unknown environment while also keeping track of itself within that unfamiliar space. It's a chicken-and-egg scenario, but one that needs to be solved for fully-autonomous vehicles to become a reality. The solution for this problem will be a system for both localizing the robot and generating a map of its environment as the robot both moves and senses in real-time – simultaneously is the key word, here, because neither of the stated objectives can depend on the other already being complete.

That said, this problem has already been solved and yet remains an active area of research, as new solutions bring increasingly significant advantages. I will not be creating a fresh solution; I am simply implementing a solution known as Graph SLAM, an early approach created by Udacity founder and my Computer Vision instructor, Sebastian Thrun.

On the surface, the assignment sounds simple: "define a function, slam, which takes in six parameters as input and returns the vector mu."

No problem, right? Let's get into it...

3.1 Create the World

And in the 23rd cell, Sean said, "Let there be a world of size 100x100 unspecified units with 5 featureless landmarks"; and it was so.
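Only the 100x100 size and the 5 landmarks are stated above; every other value in this parameter sketch is an assumption, so adjust to taste:

```python
num_landmarks     = 5       # number of landmarks
N                 = 20      # time steps the robot will move and sense
world_size        = 100.0   # size of the (square) world
measurement_range = 50.0    # how far the robot can sense
motion_noise      = 2.0     # noise in robot motion
measurement_noise = 2.0     # noise in sensor measurements
distance          = 20.0    # distance the robot attempts to cover each step
```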

3.2 The make_data Function

The following is documentation on the make_data function provided by Udacity:

The function above, make_data, takes in so many world and robot motion/sensor parameters because it is responsible for:

  1. Instantiating a robot (using the robot class)
  2. Creating a grid world with landmarks in it

This function also prints out the true location of landmarks and the final robot location, which you should refer back to when you test your implementation of SLAM.

The data this returns is an array that holds information about robot sensor measurements and robot motion (dx, dy) collected over a number of time steps, N. You will have to use only these readings about motion and measurements to track the robot over time and determine the locations of the landmarks using SLAM. We only print out the true landmark locations for comparison, later.

Within data, the measurements and motion for a given time step can be accessed from the first and second index of that row of the array. See the following code for an example, where i is the time step:

```python
measurement = data[i][0]
motion      = data[i][1]
```
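My guess at the full call and loop is below; the parameter order of make_data is an assumption based on the documentation above, so verify it against helpers.py before running:

```python
from helpers import make_data

# Generate the movement and measurement data (prints true landmark locations too)
data = make_data(N, num_landmarks, world_size, measurement_range,
                 motion_noise, measurement_noise, distance)

# Walk through every recorded time step
for i in range(len(data)):
    measurement = data[i][0]   # list of [i, dx, dy] landmark sightings
    motion      = data[i][1]   # the (dx, dy) the robot attempted that step
```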

3.3 Initialize Omega and Xi

Solving this problem in the first place must have been difficult. But, in this project, the most difficult part will probably be maintaining and updating the constraints (the omega matrix and the xi vector) without anything getting flipped around or corrupted. There are a couple of different ways to tackle this, but I chose the following:
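The full cell is in the notebook, but one reasonable layout, interleaving x and y into a single matrix, looks like this sketch:

```python
import numpy as np

def initialize_constraints(N, num_landmarks, world_size):
    """Return blank omega and xi with the initial pose pinned to the world center.

    Rows/columns 0..2N-1 hold the interleaved pose coordinates (x0, y0, x1, y1, ...),
    and the remaining 2*num_landmarks rows/columns hold the landmark coordinates.
    """
    size = 2 * (N + num_landmarks)
    omega = np.zeros((size, size))
    xi = np.zeros((size, 1))

    # The robot starts, with complete certainty, at the center of the world
    omega[0][0] = 1.0              # x0
    omega[1][1] = 1.0              # y0
    xi[0] = world_size / 2.0
    xi[1] = world_size / 2.0

    return omega, xi
```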

3.4 Test the Initialization

Ultimately, I'll need a process for testing and visualizing the results of my SLAM implementation, so why not test my initialize_constraints function with Udacity's suggested solution, which uses seaborn for data visualization? At least I'll know whether or not the pipe leaks.
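A sketch of that test, using a deliberately small world so the seaborn heatmaps stay legible (the exact styling here is mine, not necessarily Udacity's snippet):

```python
import matplotlib.pyplot as plt
import seaborn as sns
from pandas import DataFrame

# A tiny test world: only the initial-pose cells should be non-zero
omega_test, xi_test = initialize_constraints(N=5, num_landmarks=2, world_size=10.0)

plt.rcParams['figure.figsize'] = (10, 7)
sns.heatmap(DataFrame(omega_test), cmap='Blues', annot=True, linewidths=.5)
plt.show()

sns.heatmap(DataFrame(xi_test), cmap='Oranges', annot=True, linewidths=.5)
plt.show()
```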

3.5 Implement Graph SLAM

Per the rubric:

In addition to data, your slam function will also take in:

  • N - The number of time steps that a robot will be moving and sensing
  • num_landmarks - The number of landmarks in the world
  • world_size - The size (w/h) of your world
  • motion_noise - The noise associated with motion; the update confidence for motion should be 1.0/motion_noise
  • measurement_noise - The noise associated with measurement/sensing; the update weight for measurement should be 1.0/measurement_noise

Plus, the only real goal is an accurate mu, calculated like so: $\mu = \Omega^{-1}\xi$

With these requirements in mind, I'm jumping into implementation with the following cell:
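The full cell is in the notebook; a condensed sketch of the approach, built on the initialize_constraints layout sketched earlier, looks like this:

```python
import numpy as np

def slam(data, N, num_landmarks, world_size, motion_noise, measurement_noise):
    """Fold every measurement and motion constraint into omega/xi, then solve for mu."""
    omega, xi = initialize_constraints(N, num_landmarks, world_size)

    # Noisier inputs contribute weaker constraints
    m_weight = 1.0 / measurement_noise
    o_weight = 1.0 / motion_noise

    for t in range(len(data)):
        measurements, motion = data[t][0], data[t][1]
        pose = 2 * t                            # index of x_t (y_t is pose + 1)

        # Measurement constraints: current pose <-> each sensed landmark
        for i, dx, dy in measurements:
            lm = 2 * (N + i)                    # index of landmark i's x (y is lm + 1)
            for d, delta in enumerate((dx, dy)):
                omega[pose + d][pose + d] += m_weight
                omega[lm + d][lm + d]     += m_weight
                omega[pose + d][lm + d]   -= m_weight
                omega[lm + d][pose + d]   -= m_weight
                xi[pose + d] -= delta * m_weight
                xi[lm + d]   += delta * m_weight

        # Motion constraint: current pose <-> next pose
        nxt = pose + 2
        for d, delta in enumerate(motion):
            omega[pose + d][pose + d] += o_weight
            omega[nxt + d][nxt + d]   += o_weight
            omega[pose + d][nxt + d]  -= o_weight
            omega[nxt + d][pose + d]  -= o_weight
            xi[pose + d] -= delta * o_weight
            xi[nxt + d]  += delta * o_weight

    # The payoff: mu = omega^-1 * xi
    return np.linalg.inv(omega) @ xi
```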

3.6 Helper Functions

Udacity then provided the following helper functions to display the estimated poses and landmark locations that my implementation produces:

3.7 Run SLAM

With the above helper functions in place, I can run the slam function that I defined and see the results as two lists:

  1. Estimated poses, a list of (x, y) pairs that is exactly N in length, since that's how many motions your robot has taken. The very first pose should be the center of your world, i.e. [50.000, 50.000] for a square world of size 100.0.
  2. Estimated landmarks, a list of landmark positions (x, y) that is exactly num_landmarks in length.

I'll know how I did by comparing the following values to the true values printed above, factoring in noise.

Here goes:
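Here's a sketch of the run; get_poses_landmarks is my own small helper for slicing mu back apart, not one of Udacity's:

```python
mu = slam(data, N, num_landmarks, world_size, motion_noise, measurement_noise)

def get_poses_landmarks(mu, N, num_landmarks):
    """Split the stacked mu vector back into pose and landmark [x, y] pairs."""
    poses = [[mu[2 * i].item(), mu[2 * i + 1].item()] for i in range(N)]
    landmarks = [[mu[2 * (N + i)].item(), mu[2 * (N + i) + 1].item()]
                 for i in range(num_landmarks)]
    return poses, landmarks

poses, landmarks = get_poses_landmarks(mu, N, num_landmarks)
print('Estimated poses:', poses)           # the first entry should be ~[50.0, 50.0]
print('Estimated landmarks:', landmarks)
```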

3.8 Visualize the World

Finally, I can visualize the world printed above using the display_world code from the helpers.py file provided by Udacity:
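Assuming display_world accepts the final pose and the landmark list, the call is a one-liner:

```python
# Plot the last estimated robot pose alongside the estimated landmarks
display_world(int(world_size), poses[-1], landmarks)
```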

3.9 Further Testing

Not bad! Although, in the final run, I'm not sure what caused the robot to finish out of bounds – given how its behavior is defined, that shouldn't be possible, so I wonder whether this might be a glitch in visualization rather than in the robot itself. Either way, the numbers look good, at least!

Udacity provided a couple more test sets for confirmation, which I'll run and leave for you to examine:

Thanks for reading!

Made by SeanvonB | Source