Computer vision

High-speed 3D geometrical modeling using Fast Level Set Method

The level set method, introduced by S. Osher and J. A. Sethian, has attracted significant attention as a topology-free approach to active contour modeling. It represents the tracked contour implicitly, so topological changes of the contour are handled automatically. Applications of the level set method include motion tracking, 3D geometric modeling, and simulations of crystallization or semiconductor growth. However, reinitializing and updating the implicit function costs considerably more computation than conventional active contour models such as "Snakes." We propose an efficient algorithm for the level set method, called the Fast Level Set Method (FLSM). The key features of the proposed FLSM are:

i) Use of an extension velocity, with rapid construction of the extension velocity field by the Fast Narrow Band Method.
ii) Frequent reinitialization of the implicit function at minimal computational cost.

The efficiency of the proposed method is validated through computer simulations and two typical applications: real-time tracking of moving objects in video images and fast 3D surface reconstruction from scattered point data.
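To make the implicit representation concrete, the following is a minimal sketch of a plain level set update, not the FLSM itself: it has no narrow band, no extension velocity, and no reinitialization. It assumes a constant, non-negative speed and a first-order upwind scheme on a unit grid; the function names and parameters are hypothetical and chosen only for illustration.

```python
import numpy as np

def signed_distance_circle(n, radius):
    """Signed distance to a circle centred in an n x n grid (negative inside)."""
    ys, xs = np.mgrid[0:n, 0:n]
    c = (n - 1) / 2.0
    return np.hypot(xs - c, ys - c) - radius

def evolve(phi, speed, dt, steps):
    """Advance phi_t + speed * |grad phi| = 0 for a constant speed >= 0.

    Uses the first-order upwind gradient for outward motion on a unit grid.
    This is an illustrative baseline, not the Fast Level Set Method.
    """
    for _ in range(steps):
        p = np.pad(phi, 1, mode="edge")
        dxm = p[1:-1, 1:-1] - p[1:-1, :-2]   # backward difference in x
        dxp = p[1:-1, 2:] - p[1:-1, 1:-1]    # forward difference in x
        dym = p[1:-1, 1:-1] - p[:-2, 1:-1]   # backward difference in y
        dyp = p[2:, 1:-1] - p[1:-1, 1:-1]    # forward difference in y
        grad = np.sqrt(np.maximum(dxm, 0) ** 2 + np.minimum(dxp, 0) ** 2 +
                       np.maximum(dym, 0) ** 2 + np.minimum(dyp, 0) ** 2)
        phi = phi - dt * speed * grad
    return phi

phi0 = signed_distance_circle(64, 10.0)
phi1 = evolve(phi0, speed=1.0, dt=0.25, steps=20)
# The contour is the zero level set; its enclosed area grows as the front expands.
print((phi0 < 0).sum(), (phi1 < 0).sum())
```

The contour is never stored explicitly: it is recovered as the zero crossing of `phi`, which is why merging or splitting needs no special handling. After such updates `phi` generally stops being a signed distance function, which is the reason level set methods reinitialize it.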

Fig. Bunny (Stanford Univ.)
Fig. Wired basket

Real-time tracking of multiple objects using Fast Level Set Method

Fig. Simultaneous tracking of moving objects
Fig. Fast detection of moving objects
Fig. Skeleton extraction
Fig. Labeling

Development of robust motion capture system using FLSM and stereo cameras

We are developing a new motion capture system using the Fast Level Set Method and multiple stereo cameras. This system can capture motion data of several people simultaneously, even when they occlude one another. Experiments have been conducted to capture Japanese traditional dancing and clothing in 3D.

Fig. Motion-captured data
Fig. After texture mapping
Fig. Motion capture

Papers

2D-3D alignment based on geometrical consistency

We have proposed a new registration algorithm for aligning 2D images with 3D geometrical models in order to reconstruct realistic 3D models of indoor scenes. A common technique for estimating the pose of a 3D model in a 2D image is to match 2D photometric edges with 3D geometrical edges projected onto the 2D image. In indoor settings, however, few features can be robustly extracted from 2D images, and the jump edges of geometrical models are also limited. This makes it difficult to find corresponding edges between the 2D image and the 3D model accurately, so the relative position often has to be set manually close to the correct position beforehand. To overcome this issue, the proposed method first estimates the relative pose roughly by using the geometric consistency of 2D photometric edges back-projected onto the 3D model. Once this rough estimate has converged, an edge-based method is applied for precise pose estimation. The performance of the proposed method is demonstrated through experiments using simulated models of indoor scenes and actual environments measured by range and image sensors.
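The edge-matching step described above relies on projecting 3D model edges into the image. As a rough illustration, the sketch below assumes an ideal pinhole camera with intrinsic matrix `K` and pose `(R, t)`, and scores a pose by the mean distance from each projected model edge point to its nearest image edge point. The function names and the cost are assumptions made for this example and are not taken from the proposed method.

```python
import numpy as np

def project_points(points_3d, R, t, K):
    """Project N x 3 world points to N x 2 pixel coordinates (pinhole model)."""
    cam = points_3d @ R.T + t          # world frame -> camera frame
    uvw = cam @ K.T                    # apply camera intrinsics
    return uvw[:, :2] / uvw[:, 2:3]    # perspective division

def mean_nearest_distance(projected, image_edges):
    """Mean distance from each projected point to its nearest image edge point."""
    d = np.linalg.norm(projected[:, None, :] - image_edges[None, :, :], axis=2)
    return d.min(axis=1).mean()

# Hypothetical camera and two model edge points, for illustration only.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
model_edges = np.array([[0.0, 0.0, 2.0], [1.0, 0.0, 2.0]])
uv = project_points(model_edges, np.eye(3), np.zeros(3), K)
print(uv)
```

A pose search would lower such a cost over `R` and `t`. Because the nearest-point pairing is only reliable near the correct pose, a rough initial estimate matters, which is the gap the geometric-consistency step is meant to fill.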

Fig. 2D-3D alignment
Fig. Alignment result

Papers

Visual servo of mobile manipulator using redundancy

We have proposed a new technique for visual servoing using the concept of "redundancy." The key idea is the use of a "virtual link" that connects the camera and the target positions. This virtual link can be treated as a virtual mechanical link, allowing the null-space operation, which was developed for controlling a redundant manipulator, to be applied in the same manner.

Fig. Tracking using redundancy
Fig. Visual servo using redundancy

Place recognition using RGB-D camera and laser scanner

Categorizing places in indoor and outdoor environments is crucial for service robots to work effectively and interact with humans. In this study, we present a method for categorizing different areas using a mobile robot equipped with an RGB-D camera (Microsoft Kinect) or a laser scanner (FARO/Velodyne). Our approach converts the depth and color images taken at each location into histograms of local binary patterns (LBPs), whose dimensionality is then reduced using the uniform-pattern criterion. These histograms are combined into a single feature vector, which is categorized using a supervised method. For indoor environments, our technique distinguishes between five place categories: corridors, laboratories, offices, kitchens, and study rooms. Experimental results show that our approach can accurately categorize these places. We also apply the proposed technique to outdoor environments such as parking areas, residential areas, and urban areas, where it could benefit autonomous driving.
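The feature extraction described above can be sketched as follows. This is an illustrative version only: it assumes an 8-neighbour LBP at radius 1, a grayscale version of the color image, and the common uniform-pattern reduction to 59 bins (58 uniform patterns plus one shared bin). The exact LBP variant, the color handling, and the classifier used in the study are not stated here, and all function names are hypothetical.

```python
import numpy as np

def uniform_lookup():
    """Map each 8-bit LBP code to one of 59 bins (58 uniform + 1 shared)."""
    table = np.zeros(256, dtype=np.int64)
    next_bin = 0
    for code in range(256):
        bits = [(code >> i) & 1 for i in range(8)]
        transitions = sum(bits[i] != bits[(i + 1) % 8] for i in range(8))
        if transitions <= 2:          # uniform pattern: at most two 0/1 changes
            table[code] = next_bin
            next_bin += 1
        else:                         # all non-uniform codes share the last bin
            table[code] = 58
    return table

def lbp_histogram(image, table):
    """Normalised 59-bin uniform LBP histogram of a 2D image."""
    img = np.asarray(image, dtype=np.float64)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= center).astype(np.int64) << bit
    hist = np.bincount(table[codes].ravel(), minlength=59).astype(np.float64)
    return hist / hist.sum()

def place_feature(depth, gray):
    """Concatenate the depth and grayscale histograms into one 118-d vector."""
    table = uniform_lookup()
    return np.concatenate([lbp_histogram(depth, table),
                           lbp_histogram(gray, table)])

rng = np.random.default_rng(0)
feature = place_feature(rng.random((48, 64)), rng.random((48, 64)))
print(feature.shape)
```

The resulting vector could be passed to any supervised classifier; which one to use is left open in this sketch.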

Fig. Place recognition using Kinect sensor

Database

corridors (255 MB, 5 categories)

genkiclub_f3_corridor_01, genkiclub_f4_corridor_01, w2_10f_corridor_01, w2_7f_corridor_01, w2_9f_corridor_02

kitchens (204 MB, 8 categories)

genkiclub_f3_kitchen_01, genkiclub_f3_kitchen_02, w2_10f_kitchen_01, w2_10f_kitchen_09, w2_9f_kitchen_01, w2_9f_kitchen_02, w2_9f_kitchen_10, w4_6f_kitchen_01

labs (583 MB, 4 categories)

hasegawa_lab, kurazume_lab, taniguchi_lab, uchida_lab

offices (95 MB, 3 categories)

hasegawa_office, kurazume_office, morooka_office

studyrooms (328 MB, 8 categories)

w2_2f_studyroom_01, w2_2f_studyroom_02, w2_2f_tatamiroom_01, w2_2f_tatamiroom_02, w4_2f_studyroom_01, w4_2f_studyroom_02, w4_2f_tatamiroom_01, w4_2f_tatamiroom_02

toilets (116 MB, 3 categories)

w2_10f_toilet_01, w2_2f_toilet_01, w2_9f_toilet_01

Papers

Previewed Reality - Near-future perception system -

This research develops a near-future perception system named "Previewed Reality." The system consists of an informationally structured environment (ISE), an immersive VR display, a stereo camera, an optical tracking system, and a dynamic simulator. In an ISE, numerous sensors are embedded to sense and store information about the position of furniture, objects, humans, and robots in a database. The position and orientation of the immersive VR display are also tracked by an optical tracking system. Consequently, the system can forecast the next possible events using a dynamic simulator and synthesize virtual images of what users will see in the near future from their own viewpoint. These synthesized images, overlaid on a real scene using augmented reality technology, are presented to the user. The proposed system can allow humans and robots to coexist more safely by intuitively showing possible hazardous situations to the human in advance.

Fig. Previewed Reality
Fig. Previewed Reality 1.0 and 2.0
Fig. Smart Previewed Reality

Papers

Fourth person sensing / Fourth person captioning

"Fourth person sensing" and "fourth person captioning" are concepts for accurately recognizing the circumstances surrounding a user by integrating multimodal information from various viewpoints. These concepts categorize information sources by n-person viewpoint: the first person (wearable camera), the second person (camera on a robot), and the third person (camera embedded in the environment). All of this information is combined to recognize the current situation correctly. By analogy, the reader of a novel knows everything about the story, including the emotions of the hero, the sub-characters, and the other characters. This is akin to a "god's viewpoint," and this research aims to achieve such a comprehensive perspective.

Fig. Fourth person sensing
Fig. Fourth person captioning

Papers