Computing a Camera's Spatial Position with Computer Vision

Preface

This post originated as an assignment for a robotics course, which is why the document itself is entirely in English. It uses functions and markers from the aruco library to estimate where a camera actually sits in space.

Question 1

  • What is the accuracy of your estimation?

Answer 1

theoretical_gsc =

    1.0000         0         0         0
         0    1.0000         0         0
         0         0    1.0000   -0.6920
         0         0         0    1.0000

computed_gsc =

    0.9870   -0.0178    0.0866    0.0315
    0.0065    0.9967    0.0061    0.0118
   -0.0856   -0.0067    0.9840   -0.9053
         0         0         0    1.0000

accuracy =

    0.0130    0.0178   -0.0866   -0.0315
   -0.0065    0.0033   -0.0061   -0.0118
    0.0856    0.0067    0.0160    0.2133
         0         0         0         0

As the data above show, we report the accuracy of the estimation as the element-wise difference between the theoretical gsc and the computed gsc: the rotation entries are off by less than 0.09, and the translation along z by about 0.21 m.
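
The element-wise difference mixes rotation terms (dimensionless) with translation terms (meters). As a complementary measure, the error can be reduced to two scalars, the relative rotation angle and the translation distance. Below is a minimal Python sketch (pose_error is our own hypothetical helper; the matrices are the values above):

import numpy as np

def pose_error(g_true, g_est):
    # Relative rotation between the two 4x4 transforms, and its angle
    # recovered from the trace; plus the translation distance.
    R_rel = g_true[:3, :3].T @ g_est[:3, :3]
    cos_theta = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    rot_err_deg = np.degrees(np.arccos(cos_theta))
    trans_err = np.linalg.norm(g_true[:3, 3] - g_est[:3, 3])
    return rot_err_deg, trans_err

theoretical_gsc = np.array([[1, 0, 0, 0],
                            [0, 1, 0, 0],
                            [0, 0, 1, -0.692],
                            [0, 0, 0, 1]], dtype=float)
computed_gsc = np.array([[ 0.9870, -0.0178,  0.0866,  0.0315],
                         [ 0.0065,  0.9967,  0.0061,  0.0118],
                         [-0.0856, -0.0067,  0.9840, -0.9053],
                         [ 0,       0,       0,       1.0000]])
print(pose_error(theoretical_gsc, computed_gsc))  # roughly (10.3 deg, 0.22 m)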

Question 2

  • Do you get better accuracy with more markers in step 7? Compare the localization results derived from an image with only 1 marker, 2 markers and 3 markers by averaging the exponential coordinates. Justify with experiment.

Answer 2

computed_gsc1 =

    0.9978   -0.0190    0.0634    0.0164
    0.0321    0.9767   -0.2120    0.2187
   -0.0579    0.2136    0.9752   -1.0359
         0         0         0    1.0000

accuracy1 =

    0.0022    0.0190   -0.0634   -0.0164
   -0.0321    0.0233    0.2120   -0.2187
    0.0579   -0.2136    0.0248    0.3439
         0         0         0         0

computed_gsc2 =

    0.9870   -0.0178    0.0866    0.0315
    0.0065    0.9967    0.0061    0.0118
   -0.0856   -0.0067    0.9840   -0.9053
         0         0         0    1.0000

accuracy2 =

    0.0130    0.0178   -0.0866   -0.0315
   -0.0065    0.0033   -0.0061   -0.0118
    0.0856    0.0067    0.0160    0.2133
         0         0         0         0

computed_gsc3 =

    0.9918   -0.0150    0.0853    0.0195
    0.0086    0.9954    0.0060    0.0076
   -0.0851   -0.0061    0.9875   -0.9382
         0         0         0    1.0000

accuracy3 =

    0.0082    0.0150   -0.0853   -0.0195
   -0.0086    0.0046   -0.0060   -0.0076
    0.0851    0.0061    0.0125    0.2462
         0         0         0         0

Attaching a second marker on a different plane of the 3D scene clearly improves the accuracy of the computed matrix: compare accuracy1 (one marker) with accuracy2 (two markers on different planes), where the rotation errors shrink markedly. Adding a third marker that lies on the same plane as an existing one (accuracy3), however, gives roughly the same accuracy as two markers, so the improvement comes from covering a new plane rather than from the marker count alone.
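
The question asks for averaging the exponential coordinates, while our MATLAB script averages the matrix entries directly; for poses this close the two agree well, but only the log-space average is guaranteed to return a valid rigid transform. Here is a minimal sketch of the exponential-coordinate version, assuming scipy is available (average_poses_exp is a hypothetical helper, not part of our submitted code):

import numpy as np
from scipy.linalg import logm, expm

def average_poses_exp(g_list):
    # Average in se(3): take the matrix log of every 4x4 transform,
    # average the resulting twist matrices, and map back with expm.
    # Reasonable when the poses are close together (rotation angle < pi).
    xis = [logm(g) for g in g_list]
    xi_mean = np.real(sum(xis) / len(xis))  # discard numerical imaginary residue
    return expm(xi_mean)

For example, the three-marker estimate would become computed_gsc3 = average_poses_exp([gsc_a, gsc_b, gsc_d]).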

Question 3

  • In step 7, do you get better accuracy if you combine the results from multiple photos taken at the same position?

Answer 3

computed_gsc1 =

    0.9856   -0.0194    0.1244    0.0006
    0.0109    0.9984    0.0502   -0.0390
   -0.1250   -0.0494    0.9843   -0.8955
         0         0         0    1.0000

computed_gsc2 =

    0.9857   -0.0152    0.1298   -0.0059
    0.0064    0.9985    0.0442   -0.0317
   -0.1301   -0.0436    0.9844   -0.8949
         0         0         0    1.0000

computed_gsc3 =

    0.9863   -0.0164    0.1275   -0.0044
    0.0081    0.9982    0.0584   -0.0475
   -0.1281   -0.0577    0.9847   -0.8919
         0         0         0    1.0000

computed_gsc =

    0.9859   -0.0170    0.1272   -0.0032
    0.0085    0.9984    0.0509   -0.0394
   -0.1277   -0.0503    0.9845   -0.8941
         0         0         0    1.0000

accuracy =

    0.0141    0.0170   -0.1272    0.0032
   -0.0085    0.0016   -0.0509    0.0394
    0.1277    0.0503    0.0155    0.2021
         0         0         0         0

Combining two images did not significantly increase the accuracy, presumably because the two images differ very little. When we introduced a third image, the accuracy of some elements of the matrix did improve, presumably because the additional image averages out errors from various sources. By extension, we conclude that, keeping the camera's spatial position constant, combining more images can improve certain elements of the matrix.
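
As a small sketch, the entry-wise combination from the MATLAB script can be reproduced in Python (the matrices are the per-photo estimates listed above):

import numpy as np

# Per-photo estimates computed_gsc1..3 from the output above,
# stacked and averaged entry-wise as in image1.m
g1 = np.array([[ 0.9856, -0.0194,  0.1244,  0.0006],
               [ 0.0109,  0.9984,  0.0502, -0.0390],
               [-0.1250, -0.0494,  0.9843, -0.8955],
               [ 0,       0,       0,       1.0000]])
g2 = np.array([[ 0.9857, -0.0152,  0.1298, -0.0059],
               [ 0.0064,  0.9985,  0.0442, -0.0317],
               [-0.1301, -0.0436,  0.9844, -0.8949],
               [ 0,       0,       0,       1.0000]])
g3 = np.array([[ 0.9863, -0.0164,  0.1275, -0.0044],
               [ 0.0081,  0.9982,  0.0584, -0.0475],
               [-0.1281, -0.0577,  0.9847, -0.8919],
               [ 0,       0,       0,       1.0000]])
g_mean = np.stack([g1, g2, g3]).mean(axis=0)
print(np.round(g_mean, 4))  # matches computed_gsc above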

Question 4

  • Assume you are given a mobile robot with camera on board in a relatively large environment (i.e. you may not capture marker from every position of the robot). One possible option, you can use the same technique but placing as many markers as possible so that you will detect at least one marker from any instance. However, this will come at higher cost. How would you solve this problem? How you will optimize the usage of markers? Elaborate the strategy.

Answer 4

We have summarized three methods through repeated measurements. The first is to attach markers on all three coordinate planes, so that the z-axis detected by the program is more accurate. The second, while keeping the first method, is to capture the scene from multiple camera angles at the same position. The third is to increase the image resolution: when a marker is far from the camera it becomes hard to recognize, or the recognition accuracy decreases, and a higher-resolution camera improves it.
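
The first method leans on the relation used throughout image1.m: any visible marker m with a known spatial pose g_sm gives the camera pose as g_sc = g_sm * g_cm^-1, so a single well-placed marker per plane already constrains every axis. A minimal Python sketch with the measured gsa and a detected gca taken from image1.m:

import numpy as np

# Marker a's pose in the spatial frame (measured by hand, from image1.m) ...
gsa = np.array([[1,  0,  0, 0.051],
                [0, -1,  0, 0.139],
                [0,  0, -1, 0],
                [0,  0,  0, 1]], dtype=float)
# ... and the same marker's pose in the camera frame (from find_g)
gca = np.array([[ 0.99854884, -0.02330514, -0.04854972, -0.02551754],
                [-0.02677398, -0.99704017, -0.07206973,  0.14063053],
                [-0.04672643,  0.07326501, -0.99621729,  1.02275981],
                [ 0,           0,           0,           1]])
# camera pose in the spatial frame: g_sc = g_sa * g_ca^-1
gsc_a = gsa @ np.linalg.inv(gca)
print(np.round(gsc_a, 4))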

The difficulties we faced & The strategies we adopted

  1. While coding, we found that the functions we used from the opencv.aruco module are packaged as black boxes, which meant we could not see how they work internally. We therefore had to read the documentation again and again to fully understand them.
  2. Because the tutorial code is written in C++, we had to translate it into Python, and the differences between the two languages raised some issues. For example, Python cannot leave parameters undefined the way the C++ examples do, so we had to test each function in Python individually.
  3. The code in the tutorials targets a “Charucoboard”, while we need individual “Markers”. To achieve this we had to change several calls, for example replacing the function “estimatePoseCharucoBoard” with “estimatePoseSingleMarkers”; the two functions take different kinds of parameters.
  4. In order to locate the spatial frame, we need to display the origin [0 0 0] in the camera image, which means drawing three axes representing x, y, and z. The problem is that the parameters of the “estimatePoseSingleMarkers” function must be of type numpy.ndarray, so we had to build matrices of that type, otherwise the program raises an error (see the sketch after this list).
  5. At first the program could not handle multiple markers; the issue was resolved once we called the “drawAxis” function in a loop over all detections.
  6. We could not tell the marker IDs apart visually, so we used the “cv2.putText” function to print each ID onto the output image.
  7. Light is an important factor: differences in lighting between calibration and estimation cause accuracy errors. When estimating, we tried to reproduce the lighting environment used during calibration.
  8. We used a MacBook camera, which is embedded in the laptop. So even after making the camera parallel to the spatial frame, it was hard to measure the exact distance from the camera to the spatial frame.
  9. We used the “Photo Booth” app to capture images, and found that frames grabbed with the “cv2.VideoCapture” function have a different size from the images the app outputs. Therefore, all our input images were taken with the same application.
  10. We did not have a tape measure, so we made one ourselves; the distances may therefore not be very accurate. We also had no protractor, so we could only make the camera approximately perpendicular to the spatial frame.
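
For item 4, here is a minimal sketch of drawing the spatial-frame axes with properly typed arrays. The rvec/tvec values are illustrative placeholders (zero rotation, origin 0.692 m straight ahead, matching our measured distance), not output of the real pipeline:

import numpy as np
import cv2
import cv2.aruco as aruco

# drawAxis rejects plain Python lists, so rvec/tvec must be numpy arrays
frame = cv2.imread("../images/image2.jpg")
reader = cv2.FileStorage("cameraParameters.xml", cv2.FileStorage_READ)
camera_matrix = reader.getNode("cameraMatrix").mat()
dist_coeffs = reader.getNode("dist_coeffs").mat()
reader.release()

rvec = np.array([[0.0, 0.0, 0.0]], dtype=np.float32)    # no rotation
tvec = np.array([[0.0, 0.0, 0.692]], dtype=np.float32)  # assumed origin position
frame = aruco.drawAxis(frame, camera_matrix, dist_coeffs, rvec, tvec, 0.1)
cv2.imwrite("axes.jpg", frame)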

“test.py”

import numpy as np
import cv2
import cv2.aruco as aruco
from calculate import *

# Load the input image (captured beforehand with Photo Booth)
cap = cv2.imread("../images/image2.jpg")


# The following three helpers read differently typed entries from a
# FileStorage document
def read_node_real(reader, name):
    node = reader.getNode(name)
    return node.real()

def read_node_string(reader, name):
    node = reader.getNode(name)
    return node.string()

def read_node_matrix(reader, name):
    node = reader.getNode(name)
    return node.mat()

# read defaultConfig.xml presented to opencv_interactive-calibration;
# FileStorage is OpenCV's data persistence module
config_reader = cv2.FileStorage()
config_reader.open("defaultConfig.xml", cv2.FileStorage_READ)

# create a cv2.aruco_DetectorParameters object and read the preset
# information from the config file
aruco_parameters = aruco.DetectorParameters_create()
# name of the dictionary that was used to generate the chAruco pattern
aruco_dict_num = int(read_node_real(config_reader, "charuco_dict"))
# fetch the specified dictionary
aruco_dict = aruco.Dictionary_get(aruco_dict_num)
# size of a square on the chAruco board, in pixels (the key really is
# spelled "charuco_square_lenght" in the XML the calibration tool writes)
charuco_square_length = int(read_node_real(config_reader, "charuco_square_lenght"))
# size of the Aruco markers on the chAruco board
charuco_marker_size = int(read_node_real(config_reader, "charuco_marker_size"))
# close the file handle
config_reader.release()

# TODO clean this up and use config parameters?
# look up definition of charuco_square_lenght, charuco_marker_size
# 5 was width (x) passed to interactive-calibration
# 7 was height (y) passed to interactive-calibration
# .034 is square_length (meters)
# .02 is marker_size (meters)

# read the cameraParameters.xml file generated by
# opencv_interactive-calibration
camera_reader = cv2.FileStorage()
camera_reader.open("cameraParameters.xml", cv2.FileStorage_READ)
camera_matrix = read_node_matrix(camera_reader, "cameraMatrix")
dist_coeffs = read_node_matrix(camera_reader, "dist_coeffs")
camera_reader.release()

# convert the image to grayscale
gray = cv2.cvtColor(cap, cv2.COLOR_BGR2GRAY)

# Detect the markers in the grayscale image; only markers belonging to
# the dictionary passed in are accepted
corners, ids, rejectedImgPoints = aruco.detectMarkers(gray, aruco_dict, parameters=aruco_parameters)

if ids is not None:  # at least one marker was detected
    rvec, tvec, _objPoints = aruco.estimatePoseSingleMarkers(corners, 0.062, camera_matrix, dist_coeffs)
    for i in range(len(rvec)):
        # draw the axes of every detected marker (see difficulty 5)
        frame = aruco.drawAxis(cap, camera_matrix, dist_coeffs, rvec[i], tvec[i], 0.03)
        # print the marker ID at its first corner (see difficulty 6)
        font = cv2.FONT_HERSHEY_SIMPLEX
        frame = cv2.putText(frame, str(ids[i][0]), tuple(corners[i][0][0].astype(int)), font, 1.2, (0, 0, 255), 3)
        print(ids[i])
        g = find_g(rvec[i], tvec[i])
        print(g)
        print()
    while True:
        cv2.imshow('frame', frame)
        if cv2.waitKey(100) == 113:  # press 'q' to quit
            break

cv2.destroyAllWindows()

“image1.m”

% Marker poses in the spatial frame, measured by hand (meters)
gsa = [1 0 0 0.051; 0 -1 0 0.139; 0 0 -1 0; 0 0 0 1];
gsb = [0 0 -1 0.155; -1 0 0 0.024; 0 1 0 -0.356; 0 0 0 1];
gsd = [1 0 0 -0.149; 0 -1 0 0.133; 0 0 -1 0; 0 0 0 1];
% Theoretical camera pose in the spatial frame
gsc = [1 0 0 0; 0 1 0 0; 0 0 1 -0.692; 0 0 0 1]

%% Question 1
gca = [[ 0.99854884 -0.02330514 -0.04854972 -0.02551754]
[-0.02677398 -0.99704017 -0.07206973 0.14063053]
[-0.04672643 0.07326501 -0.99621729 1.02275981]
[ 0. 0. 0. 1. ]];
gcb = [[ 0.01021082 -0.21982584 -0.97548569 0.12012036]
[-0.99629306 -0.08556718 0.00885396 0.03596653]
[-0.08541589 0.97177922 -0.21988467 0.4696211 ]
[ 0. 0. 0. 1. ]];

gsc_a = gsa * gca^-1;
gsc_b = gsb * gcb^-1;
computed_gsc = (gsc_a + gsc_b)/2
accuracy = gsc - computed_gsc

%% Question 2
% one marker
gca = [[ 0.99780721 -0.03211503 0.05787395 -0.02799517]
[-0.01904964 -0.97674173 -0.2135713 0.14271754]
[ 0.06338675 0.21200051 -0.97521172 1.02927619]
[ 0. 0. 0. 1. ]];
gsc_a = gsa * gca^-1;
computed_gsc1 = gsc_a
accuracy1 = gsc - computed_gsc1

% two markers
gca = [[ 0.99854884 -0.02330514 -0.04854972 -0.02551754]
[-0.02677398 -0.99704017 -0.07206973 0.14063053]
[-0.04672643 0.07326501 -0.99621729 1.02275981]
[ 0. 0. 0. 1. ]];
gcb = [[ 0.01021082 -0.21982584 -0.97548569 0.12012036]
[-0.99629306 -0.08556718 0.00885396 0.03596653]
[-0.08541589 0.97177922 -0.21988467 0.4696211 ]
[ 0. 0. 0. 1. ]];

gsc_a = gsa * gca^-1;
gsc_b = gsb * gcb^-1;
computed_gsc2 = (gsc_a + gsc_b)/2
accuracy2 = gsc - computed_gsc2

% three markers
gca = [[ 0.99908683 -0.02579108 0.03406361 -0.02550511]
[-0.02784908 -0.99772514 0.06139205 0.14009828]
[ 0.03240276 -0.06228463 -0.9975323 1.01798917]
[ 0. 0. 0. 1. ]];
gcd = [[ 0.99988767 -0.01394673 0.00549023 -0.22912949]
[-0.01314839 -0.99201848 -0.12540515 0.1297706 ]
[ 0.00719541 0.12531888 -0.99209042 1.01415894]
[ 0. 0. 0. 1. ]];
gcb = [[ 0.01382854 -0.2158119 -0.97633703 0.12000649]
[-0.99660632 -0.08221543 0.00405747 0.03598915]
[-0.08114561 0.97296755 -0.21621642 0.46882808]
[ 0. 0. 0. 1. ]];

gsc_a = gsa * gca^-1;
gsc_b = gsb * gcb^-1;
gsc_d = gsd * gcd^-1;
computed_gsc3 = (gsc_a + gsc_b + gsc_d)/3
accuracy3 = gsc - computed_gsc3

%% Question 3
% first picture
gca = [[ 0.99959482 -0.02535118 0.01294272 -0.02845354]
[-0.02573156 -0.99921498 0.03012158 0.14133341]
[ 0.01216894 -0.03044241 -0.99946244 1.01763545]
[ 0. 0. 0. 1. ]];
gcb = [[ 0.00352658 -0.23696025 -0.97151294 0.11782594]
[-0.99754813 -0.06873821 0.01314475 0.03733213]
[-0.06989484 0.96908457 -0.23662167 0.46667682]
[ 0. 0. 0. 1. ]];

gsc_a = gsa * gca^-1;
gsc_b = gsb * gcb^-1;
computed_gsc1 = (gsc_a + gsc_b)/2

% second picture
gca = [[ 0.99951179 -0.01861803 0.02509074 -0.02849498]
[-0.01901059 -0.99969914 0.01549919 0.14049457]
[ 0.02479463 -0.01596861 -0.99956502 1.01300827]
[ 0. 0. 0. 1. ]];
gcb = [[ 0.00576316 -0.23507061 -0.97196121 0.11774883]
[-0.99735577 -0.07176713 0.01144328 0.03712797]
[-0.07244485 0.96932517 -0.23486264 0.46739323]
[ 0. 0. 0. 1. ]];

gsc_a = gsa * gca^-1;
gsc_b = gsb * gcb^-1;
computed_gsc2 = (gsc_a + gsc_b)/2

% third picture
gca = [[ 0.99944167 -0.02066014 0.02625849 -0.02849319]
[-0.02195326 -0.99851036 0.04995111 0.14034252]
[ 0.02518738 -0.05049968 -0.99840642 1.01302279]
[ 0. 0. 0. 1. ]];
gcb = [[ 0.00447023 -0.23001903 -0.97317586 0.11738079]
[-0.99779231 -0.06551089 0.01090079 0.037154 ]
[-0.06626101 0.97097866 -0.22980407 0.46557739]
[ 0. 0. 0. 1. ]];

gsc_a = gsa * gca^-1;
gsc_b = gsb * gcb^-1;
computed_gsc3 = (gsc_a + gsc_b)/2


computed_gsc = (computed_gsc1 + computed_gsc2 + computed_gsc3)/3
accuracy = gsc - computed_gsc

“calculate.py”

import numpy as np
from scipy.linalg import expm

def norm(rvec):
    result = np.linalg.norm(rvec)
    return result

def find_g(rvec, tvec):
    """Build the 4x4 homogeneous transform g from the rotation vector and
    translation vector returned by estimatePoseSingleMarkers."""
    rvec = rvec.tolist()[0]
    tvec = tvec.tolist()[0]
    # skew-symmetric matrix of -rvec; its matrix exponential is the
    # transpose of the rotation, which the final g.T transposes back
    omega_hat = [[0, rvec[2], -rvec[1]],
                 [-rvec[2], 0, rvec[0]],
                 [rvec[1], -rvec[0], 0]]
    rotation_matrix = expm(np.array(omega_hat))
    # stack the translation as a fourth row, append [0 0 0 1] as a fourth
    # column, then transpose so the translation lands in the last column
    transfer_matrix = np.vstack((rotation_matrix, tvec))
    bottom = np.array([[0, 0, 0, 1]])
    g = np.insert(transfer_matrix, 3, values=bottom, axis=1)
    return g.T
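
As a sanity check on find_g (a sketch, assuming calculate.py is on the import path), the rotation block it produces should match cv2.Rodrigues applied to the same rotation vector, and the translation should land in the last column:

import numpy as np
import cv2
from calculate import find_g

rvec = np.array([[0.1, -0.2, 0.3]])   # arbitrary test rotation vector
tvec = np.array([[0.05, 0.0, 0.7]])   # arbitrary test translation
g = np.asarray(find_g(rvec, tvec))
R_expected, _ = cv2.Rodrigues(rvec)
print(np.allclose(g[:3, :3], R_expected))   # rotation blocks agree
print(np.allclose(g[:3, 3], tvec.ravel()))  # translation is the last column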

“cameraParameters.xml”

<?xml version="1.0"?>
<opencv_storage>
<calibrationDate>"Sun Nov 3 19:03:29 2019"</calibrationDate>
<framesCount>28</framesCount>
<cameraResolution>
1280 720</cameraResolution>
<cameraMatrix type_id="opencv-matrix">
<rows>3</rows>
<cols>3</cols>
<dt>d</dt>
<data>
1.3995684115870210e+03 0. 6.4042196433282515e+02 0.
1.3995684115870210e+03 3.2833469737463264e+02 0. 0. 1.</data></cameraMatrix>
<cameraMatrix_std_dev type_id="opencv-matrix">
<rows>4</rows>
<cols>1</cols>
<dt>d</dt>
<data>
0. 6.8886840740860379e+01 8.2270976171403483e+00
1.1045163357904878e+01</data></cameraMatrix_std_dev>
<dist_coeffs type_id="opencv-matrix">
<rows>1</rows>
<cols>5</cols>
<dt>d</dt>
<data>
-6.2986200034727241e-02 -7.2876457858523658e-01
-1.9906188024616939e-02 5.1892453909485681e-03
-1.8703042483566668e-01</data></dist_coeffs>
<dist_coeffs_std_dev type_id="opencv-matrix">
<rows>5</rows>
<cols>1</cols>
<dt>d</dt>
<data>
4.8185464427045591e-02 3.8077815287665395e-01 2.7947511854687353e-03
2.2180924065219663e-03 9.8407816225896971e-01</data></dist_coeffs_std_dev>
<avg_reprojection_error>9.6218256866092677e-01</avg_reprojection_error>
</opencv_storage>

“defaultConfig.xml”

<?xml version="1.0"?>
<opencv_storage>
<charuco_dict>10</charuco_dict>
<charuco_square_lenght>200</charuco_square_lenght>
<charuco_marker_size>100</charuco_marker_size>
<calibration_step>1</calibration_step>
<max_frames_num>30</max_frames_num>
<min_frames_num>10</min_frames_num>
<solver_eps>1e-7</solver_eps>
<solver_max_iters>30</solver_max_iters>
<fast_solver>0</fast_solver>
<frame_filter_conv_param>0.1</frame_filter_conv_param>
<camera_resolution>1280 720</camera_resolution>
</opencv_storage>