Abstract—Simultaneous Localization and Mapping (SLAM), a core module of autonomous driving and high-precision map production, has developed rapidly in recent years. SLAM refers to a mobile platform accomplishing real-time self-positioning and map construction of the perceived environment using only its own sensors. Surveying based on total stations is inefficient, and GPS-based surveying is easily disturbed by the environment and cannot be carried out in underground space, so the traditional surveying model cannot meet the needs of digital and intelligent mines, perception and three-dimensional representation of mines and underground spaces, underground positioning, and intelligent mining. SLAM requires no GPS signal and delivers high-precision positioning and mapping in occluded environments, so it can effectively acquire positioning and 3D models of mines and underground spaces and is a favorable means of realizing intelligent surveying. SLAM technology will therefore play a major role in intelligent surveying and mapping, surveying equipment and technology, geospatial information technology, and future mining.

Multi-source fusion SLAM schemes that combine vision and lidar are based mainly on either filtering or optimization; both are essentially maximum a posteriori (MAP) estimation of the state variables. Experiments show that both fusion methods achieve good positioning accuracy, but graph optimization makes it easier than filtering to add loop-closure constraints and to plug different sensors in and out of the system. The shortcoming of graph optimization relative to filtering lies in the iterative back-end solve, which demands more computing power and is less friendly to embedded platforms; as computer hardware develops, however, computing power will no longer be the bottleneck, which is one reason this paper selects graph optimization for multi-source fusion. This paper proposes a factor graph framework that fuses binocular vision, lidar, and IMU through factor graph optimization: factor constraints are constructed from the three sensors, and the states of all constraint factors are solved as one graph optimization problem (a minimal fusion sketch is given at the end of this section).

The simultaneous localization and mapping framework for lidar/vision/IMU multi-sensor fusion designed in this paper comprises five parts: sensor data pre-processing, state estimation, local sliding-window optimization, closed-loop detection, and global optimization. The function and implementation of each module are introduced in turn; short C++ sketches of the main computational steps follow this overview.

1) Sensor data pre-processing. An image pyramid is constructed for each camera frame, Harris feature points are extracted at every pyramid level, and a quadtree is used to distribute the feature points evenly; the tracked feature points are pushed to the image queue (see the feature-extraction sketch below). The IMU data are integrated to obtain the position, velocity, and rotation at the current moment (see the IMU propagation sketch below).
The pre-integration increments between adjacent image frames, which are used in the back-end optimization, are computed together with the Jacobian matrix and covariance of the pre-integration error term. The 3D lidar point cloud is de-distorted according to the IMU pre-integration result, and the points of each frame are unified into the coordinate system of the first lidar point (see the de-distortion sketch below).

2) State estimation. This part comprises the lidar-inertial odometry (LIO) produced by fusing the 3D lidar with the IMU and the visual-inertial odometry (VIO) produced by fusing vision with the IMU. The VIO assists the lidar odometry in de-warping the point cloud and provides initial values for matching, which effectively reduces the number of optimization iterations and improves computational efficiency.

3) Local sliding-window optimization. A nonlinear objective function matching the current frame's point cloud against the local map is constructed. To maintain real-time performance, a sliding-window optimization is used for pose calculation, and the optimized result is fed back to the state estimation; on this basis, high-precision position and attitude can be output at the IMU frequency (see the sliding-window sketch below).

4) Closed-loop detection. A closed-loop detection algorithm based on the 3D lidar and the vision-based bag-of-words (DBoW3) algorithm are run together, and a candidate is accepted as a closed loop only when both tests are satisfied simultaneously (see the dual-gate sketch below). The resulting closed-loop constraints are then added to the global optimization.

5) Global optimization. A separate thread is opened for the global optimization of the keyframe-based pose graph (see the threading sketch below).
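The paper does not commit to a particular solver library; the following minimal C++ sketch, written against GTSAM as an assumed back end, with illustrative noise values and placeholder measurements, shows how relative-pose constraints from different front ends enter one factor graph whose joint solution is exactly the MAP estimate referred to above.

// Minimal factor-graph fusion sketch (GTSAM assumed; noise sigmas and the
// two relative-pose "measurements" below are illustrative placeholders).
#include <gtsam/geometry/Pose3.h>
#include <gtsam/inference/Symbol.h>
#include <gtsam/nonlinear/LevenbergMarquardtOptimizer.h>
#include <gtsam/nonlinear/NonlinearFactorGraph.h>
#include <gtsam/nonlinear/Values.h>
#include <gtsam/slam/BetweenFactor.h>
#include <gtsam/slam/PriorFactor.h>

using namespace gtsam;

int main() {
  NonlinearFactorGraph graph;
  auto priorNoise = noiseModel::Diagonal::Sigmas(
      (Vector(6) << 0.01, 0.01, 0.01, 0.01, 0.01, 0.01).finished());
  auto lioNoise = noiseModel::Diagonal::Sigmas(
      (Vector(6) << 0.02, 0.02, 0.02, 0.05, 0.05, 0.05).finished());
  auto vioNoise = noiseModel::Diagonal::Sigmas(
      (Vector(6) << 0.05, 0.05, 0.05, 0.10, 0.10, 0.10).finished());

  // Anchor the first state.
  graph.add(PriorFactor<Pose3>(Symbol('x', 0), Pose3(), priorNoise));

  // Relative-pose constraints from the lidar-inertial and visual-inertial
  // front ends enter the same graph as factors between consecutive states.
  Pose3 lioDelta(Rot3(), Point3(1.00, 0.00, 0.0));  // placeholder measurement
  Pose3 vioDelta(Rot3(), Point3(0.98, 0.01, 0.0));  // placeholder measurement
  graph.add(BetweenFactor<Pose3>(Symbol('x', 0), Symbol('x', 1), lioDelta, lioNoise));
  graph.add(BetweenFactor<Pose3>(Symbol('x', 0), Symbol('x', 1), vioDelta, vioNoise));

  Values initial;
  initial.insert(Symbol('x', 0), Pose3());
  initial.insert(Symbol('x', 1), Pose3(Rot3(), Point3(0.9, 0.0, 0.0)));

  // Jointly solving all factors yields the MAP estimate of the states.
  Values result = LevenbergMarquardtOptimizer(graph, initial).optimize();
  result.print("fused states:\n");
}

In the full system, IMU pre-integration factors and lidar/visual loop-closure factors would be added to the same graph in exactly this way, which is what makes sensors easy to plug in and out.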
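A sketch of the feature step of pre-processing, assuming OpenCV. The paper's quadtree uniformization is approximated here by a fixed grid that keeps one corner per cell, a simpler stand-in with the same goal of even coverage; the pyramid depth, corner counts, and cell size are assumptions.

// Feature-extraction sketch (OpenCV assumed). Builds a pyramid, extracts
// Harris corners per level, and spreads them with a grid instead of the
// paper's quadtree.
#include <opencv2/imgproc.hpp>
#include <map>
#include <utility>
#include <vector>

std::vector<cv::Point2f> extractUniformHarris(const cv::Mat& gray,
                                              int levels = 4, int cell = 40) {
  std::vector<cv::Mat> pyramid;
  cv::buildPyramid(gray, pyramid, levels - 1);      // image pyramid

  std::map<std::pair<int, int>, cv::Point2f> grid;  // at most one corner/cell
  for (int l = 0; l < static_cast<int>(pyramid.size()); ++l) {
    std::vector<cv::Point2f> corners;
    // The last boolean selects the Harris response.
    cv::goodFeaturesToTrack(pyramid[l], corners, 300, 0.01, 10,
                            cv::noArray(), 3, true /*Harris*/);
    const float scale = static_cast<float>(1 << l);  // map back to level 0
    for (const auto& c : corners) {
      cv::Point2f p(c.x * scale, c.y * scale);
      auto key = std::make_pair(static_cast<int>(p.x) / cell,
                                static_cast<int>(p.y) / cell);
      grid.emplace(key, p);  // first corner to claim a cell wins
                             // (corners arrive quality-sorted per level)
    }
  }
  std::vector<cv::Point2f> uniform;
  for (const auto& kv : grid) uniform.push_back(kv.second);
  return uniform;  // pushed to the image queue in the real pipeline
}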
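A minimal Eigen-based sketch of the IMU integration used in pre-processing. Biases, noise terms, and the Jacobian and covariance propagation of the full pre-integration are omitted for brevity; only the basic discrete position/velocity/rotation update is shown.

// IMU propagation sketch (Eigen assumed; biases and noise omitted).
#include <Eigen/Dense>

struct ImuState {
  Eigen::Vector3d p = Eigen::Vector3d::Zero();            // position
  Eigen::Vector3d v = Eigen::Vector3d::Zero();            // velocity
  Eigen::Quaterniond q = Eigen::Quaterniond::Identity();  // rotation
};

// One Euler step with a raw accelerometer/gyroscope sample.
void propagate(ImuState& s, const Eigen::Vector3d& acc_body,
               const Eigen::Vector3d& gyr, double dt) {
  const Eigen::Vector3d g(0.0, 0.0, -9.81);      // gravity in world frame
  Eigen::Vector3d a_world = s.q * acc_body + g;  // rotate accel, add gravity
  s.p += s.v * dt + 0.5 * a_world * dt * dt;     // position update
  s.v += a_world * dt;                           // velocity update
  // Small-angle quaternion increment for the gyro measurement.
  Eigen::Quaterniond dq(1.0, 0.5 * gyr.x() * dt, 0.5 * gyr.y() * dt,
                        0.5 * gyr.z() * dt);
  s.q = (s.q * dq).normalized();                 // rotation update
}

Accumulating these increments between two image frames, in the frame of the first, gives the pre-integration quantities whose error Jacobian and covariance the back end consumes.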
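A de-distortion sketch, assuming Eigen and per-point relative timestamps. Each point is mapped into the coordinate system of the first point of the sweep using a pose interpolated across the sweep from the IMU pre-integration; the linear/slerp interpolation model is an assumption.

// Point-cloud de-distortion sketch (Eigen assumed).
#include <Eigen/Dense>
#include <vector>

struct TimedPoint {
  Eigen::Vector3d xyz;  // point in the lidar frame at its capture time
  double t;             // relative timestamp in [0, 1] over the sweep
};

void deskew(std::vector<TimedPoint>& sweep,
            const Eigen::Quaterniond& q_end,   // rotation, sweep start -> end
            const Eigen::Vector3d& t_end) {    // translation, start -> end
  const Eigen::Quaterniond q_start = Eigen::Quaterniond::Identity();
  for (auto& pt : sweep) {
    // Pose of the lidar at this point's timestamp, relative to sweep start.
    Eigen::Quaterniond q_i = q_start.slerp(pt.t, q_end);
    Eigen::Vector3d t_i = pt.t * t_end;        // linear interpolation
    // Express the point in the frame of the first point of the sweep.
    pt.xyz = q_i * pt.xyz + t_i;
  }
}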
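A conceptual sliding-window sketch. A real implementation marginalizes the departing state into a prior on the remaining states so that its information is not lost; the version below simply drops it, which is the key simplification to be aware of.

// Sliding-window sketch (conceptual; marginalization replaced by dropping).
#include <cstddef>
#include <deque>

struct Keyframe { /* pose, feature tracks, point cloud, factors ... */ };

class SlidingWindow {
 public:
  explicit SlidingWindow(std::size_t maxSize) : maxSize_(maxSize) {}

  void push(const Keyframe& kf) {
    window_.push_back(kf);
    if (window_.size() > maxSize_) {
      // A full implementation would marginalize the departing state into a
      // prior on the new oldest state; here it is simply discarded.
      window_.pop_front();
    }
    // optimize() over the states currently in the window, then feed the
    // result back to the state estimation ...
  }

 private:
  std::size_t maxSize_;
  std::deque<Keyframe> window_;
};

Bounding the number of states in this way is what keeps the per-frame optimization cost constant and the output real-time.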
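A dual-gate loop-closure sketch, assuming DBoW3 for the visual test and PCL's ICP for the lidar test; the score and fitness thresholds are illustrative assumptions, not values from the paper.

// Dual-gate loop closure: accept only if BOTH the visual bag-of-words score
// and the lidar ICP verification pass (DBoW3 and PCL assumed).
#include <DBoW3/DBoW3.h>
#include <opencv2/core.hpp>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <pcl/registration/icp.h>

bool isLoopClosure(DBoW3::Database& db, const cv::Mat& descriptors,
                   pcl::PointCloud<pcl::PointXYZ>::Ptr currentScan,
                   pcl::PointCloud<pcl::PointXYZ>::Ptr candidateScan) {
  // Visual gate: query the bag-of-words database built from past keyframes.
  DBoW3::QueryResults hits;
  db.query(descriptors, hits, 1);
  bool visualOk = !hits.empty() && hits[0].Score > 0.05;  // assumed threshold

  // Lidar gate: verify geometrically by aligning the two scans with ICP.
  pcl::IterativeClosestPoint<pcl::PointXYZ, pcl::PointXYZ> icp;
  icp.setInputSource(currentScan);
  icp.setInputTarget(candidateScan);
  pcl::PointCloud<pcl::PointXYZ> aligned;
  icp.align(aligned);
  bool lidarOk = icp.hasConverged() && icp.getFitnessScore() < 0.3;  // assumed

  return visualOk && lidarOk;  // accepted only when both gates agree
}

When the gate passes, the relative pose from icp.getFinalTransformation() can be turned into a closed-loop constraint and handed to the global optimization.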
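A sketch of the separate-thread arrangement for global optimization, using a standard producer-consumer queue; the constraint type and the optimization call are placeholders for the keyframe pose-graph machinery.

// Global-optimization thread sketch: loop constraints are queued by the
// detector and consumed by a dedicated back-end thread, so the front end
// keeps running at sensor rate.
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

struct LoopConstraint { int from, to; /* relative pose ... */ };

std::queue<LoopConstraint> pending;
std::mutex mtx;
std::condition_variable cvar;
std::atomic<bool> running{true};

void backendLoop() {
  while (running) {
    std::unique_lock<std::mutex> lock(mtx);
    cvar.wait(lock, [] { return !pending.empty() || !running; });
    while (!pending.empty()) {
      LoopConstraint c = pending.front();
      pending.pop();
      lock.unlock();
      // Add c to the keyframe pose graph and re-optimize globally ...
      lock.lock();
    }
  }
}

int main() {
  std::thread backend(backendLoop);  // global optimization on its own thread
  // ... front end pushes constraints into `pending` and calls cvar.notify_one()
  running = false;
  cvar.notify_all();
  backend.join();
}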