Computer vision

High-speed 3D geometrical modeling using Fast Level Set Method

The level set method, introduced by S. Osher and J. A. Sethian, has attracted much attention as a method for topology-free active contour modeling. It represents the tracked contour implicitly and therefore handles topological changes of the contour intrinsically. Various applications based on the level set method have been presented, including motion tracking, 3D geometric modeling, and simulation of crystallization or semiconductor growth. However, reinitializing and updating the implicit function is computationally far more expensive than conventional active contour models such as "Snakes". We propose an efficient algorithm for the level set method named the Fast Level Set Method (FLSM). The proposed FLSM has two characteristics: i) it uses an extension velocity and constructs the extension velocity field at high speed using the Fast Narrow Band Method, and ii) it reinitializes the implicit function frequently, at little computational cost. The efficiency of the proposed method is verified through computer simulations and two typical applications: real-time tracking of moving objects in video images and fast 3D surface reconstruction from scattered point data.
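As a rough illustration of why an implicit representation handles topological change, the following is a minimal sketch of a plain level set update on a 2D grid. It is not the FLSM: it has no narrow band, no extension velocity, and no reinitialization, and it assumes a constant outward speed. All function names and parameter values are hypothetical. Two separate circles grow until their fronts meet, and the region count is taken before and after.

```python
import numpy as np


def upwind_gradient_norm(phi, h):
    """Upwind approximation of |grad phi| for a front moving outward (speed > 0)."""
    p = np.pad(phi, 1, mode="edge")
    c = p[1:-1, 1:-1]
    d_left = (c - p[1:-1, :-2]) / h    # backward difference along x
    d_right = (p[1:-1, 2:] - c) / h    # forward difference along x
    d_up = (c - p[:-2, 1:-1]) / h      # backward difference along y
    d_down = (p[2:, 1:-1] - c) / h     # forward difference along y
    return np.sqrt(
        np.maximum(d_left, 0.0) ** 2 + np.minimum(d_right, 0.0) ** 2
        + np.maximum(d_up, 0.0) ** 2 + np.minimum(d_down, 0.0) ** 2
    )


def evolve(phi, speed, h, dt, steps):
    """Explicit time stepping of phi_t + speed * |grad phi| = 0."""
    for _ in range(steps):
        phi = phi - dt * speed * upwind_gradient_norm(phi, h)
    return phi


def count_regions(inside):
    """Count 4-connected regions of True cells with an iterative flood fill."""
    rows, cols = inside.shape
    seen = np.zeros((rows, cols), dtype=bool)
    regions = 0
    for r in range(rows):
        for c in range(cols):
            if not inside[r, c] or seen[r, c]:
                continue
            regions += 1
            seen[r, c] = True
            stack = [(r, c)]
            while stack:
                i, j = stack.pop()
                for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                    if (0 <= ni < rows and 0 <= nj < cols
                            and inside[ni, nj] and not seen[ni, nj]):
                        seen[ni, nj] = True
                        stack.append((ni, nj))
    return regions


def merge_demo(n=101, radius=0.15, speed=1.0, dt=0.005, steps=20):
    """Grow two circles (phi < 0 inside) and count regions before and after."""
    xs = np.linspace(0.0, 1.0, n)
    h = xs[1] - xs[0]
    X, Y = np.meshgrid(xs, xs)
    phi0 = np.minimum(np.hypot(X - 0.3, Y - 0.5), np.hypot(X - 0.7, Y - 0.5)) - radius
    phi1 = evolve(phi0, speed, h, dt, steps)
    return count_regions(phi0 < 0), count_regions(phi1 < 0)


print(merge_demo())
```

The contour is never stored explicitly, so the merge of the two fronts needs no special handling. The cost of updating every grid cell at every step is the expense that the narrow band and the cheap reinitialization described above are meant to reduce.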

fig. Bunny (Stanford Univ.) fig. Wired basket

Real-time tracking of multiple objects using Fast Level Set Method

fig. Simultaneous tracking of moving objects fig. Fast detection of moving objects
fig. Skeleton extraction fig. Labeling

Development of robust motion capture system using FLSM and stereo cameras

We are developing a new motion capture system using the Fast Level Set Method and multiple stereo cameras. This system can capture the motion of several people simultaneously, even when they occlude one another. Experiments on capturing traditional Japanese dance and clothing in 3D have been conducted.

fig. Motion-captured data
fig. After texture mapping fig. Motion capture


2D-3D alignment based on geometrical consistency

We have proposed a new algorithm for registering a 2D image to a 3D geometrical model, in order to reconstruct realistic 3D models of indoor scenes. A typical technique for estimating the pose of a 3D model in a 2D image relies on correspondences between 2D photometric edges and 3D geometrical edges projected onto the 2D image. However, in indoor settings, only a limited number of image features and jump edges of the geometrical model can be extracted robustly. It is therefore difficult to find correct edge correspondences between the 2D image and the 3D model, and in most cases the relative position has to be set manually close to the correct position beforehand. To overcome this problem, the proposed method first estimates the relative pose roughly by utilizing the geometrical consistency of 2D photometric edges back-projected onto the 3D model. After this rough estimation has converged, the edge-based method is applied for precise pose estimation. The performance of the proposed method is demonstrated in experiments using simulated models of indoor scenes and actual environments measured by range and image sensors.
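The edge-based refinement stage can be pictured with a minimal sketch, assuming a pinhole camera and a model given as points sampled along its 3D edges. The cost below is the mean distance from each projected model edge point to its nearest 2D edge point. The camera parameters, the cube model, and all function names are hypothetical, and the brute-force angle search is only a stand-in for an initial guess; it is not the geometric-consistency estimation described above.

```python
import numpy as np


def rot_y(angle):
    """Rotation matrix about the camera y axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def project(points, R, t, f=500.0, cx=320.0, cy=240.0):
    """Pinhole projection of Nx3 model points under pose (R, t); intrinsics are assumed."""
    cam = points @ R.T + t
    u = f * cam[:, 0] / cam[:, 2] + cx
    v = f * cam[:, 1] / cam[:, 2] + cy
    return np.stack([u, v], axis=1)


def edge_cost(model_pts, image_edges, R, t):
    """Mean distance (pixels) from projected model edge points to the nearest image edge point."""
    proj = project(model_pts, R, t)
    dists = np.linalg.norm(proj[:, None, :] - image_edges[None, :, :], axis=2)
    return float(dists.min(axis=1).mean())


def cube_edge_points(samples=10):
    """Points sampled along the 12 edges of a unit cube (a toy 3D model)."""
    corners = np.array([[x, y, z] for x in (-0.5, 0.5)
                        for y in (-0.5, 0.5) for z in (-0.5, 0.5)])
    s = np.linspace(0.0, 1.0, samples)[:, None]
    pts = []
    for i in range(8):
        for j in range(i + 1, 8):
            if np.sum(corners[i] != corners[j]) == 1:  # corners share an edge
                pts.append((1.0 - s) * corners[i] + s * corners[j])
    return np.vstack(pts)


def coarse_search(model_pts, image_edges, t, angles):
    """Pick the candidate rotation angle with the lowest edge cost."""
    costs = [edge_cost(model_pts, image_edges, rot_y(a), t) for a in angles]
    return float(angles[int(np.argmin(costs))])


model = cube_edge_points()
t_true = np.array([0.0, 0.0, 4.0])
image_edges = project(model, rot_y(0.3), t_true)   # synthetic "detected" 2D edges
best = coarse_search(model, image_edges, t_true, np.linspace(0.0, 0.6, 13))
print(best, edge_cost(model, image_edges, rot_y(best), t_true))
```

A cost of this kind has many local minima when few edges are available, which is consistent with the need for a good initial pose noted above.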

fig. 2D-3D alignment fig. Alignment result


Visual servo of mobile manipulator using redundancy

We proposed a new visual servo technique based on the concept of "redundancy". The key idea is a "virtual link" that connects the camera and the target positions. Because this virtual link can be treated as a mechanical link, the null-space operation developed for controlling redundant manipulators can be applied in the same manner.
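The null-space operation for a redundant manipulator is commonly written as q_dot = J^+ x_dot + (I - J^+ J) z, where J^+ is the pseudo-inverse of the task Jacobian and z is an arbitrary secondary joint velocity. The sketch below applies this formula to a hypothetical planar three-joint chain with a 2D position task. How the virtual link enters the Jacobian is not specified in the text, so the chain, its link lengths, and the function names are assumptions made only for illustration.

```python
import numpy as np


def planar_jacobian(q, lengths):
    """2xN position Jacobian of the end point of a planar serial chain."""
    q = np.asarray(q, dtype=float)
    lengths = np.asarray(lengths, dtype=float)
    cum = np.cumsum(q)                 # absolute angle of each link
    J = np.zeros((2, len(q)))
    for i in range(len(q)):
        J[0, i] = -np.sum(lengths[i:] * np.sin(cum[i:]))
        J[1, i] = np.sum(lengths[i:] * np.cos(cum[i:]))
    return J


def redundant_velocity(J, x_dot, z):
    """Joint velocity achieving x_dot, with z projected into the null space of J."""
    J_pinv = np.linalg.pinv(J)
    null_projector = np.eye(J.shape[1]) - J_pinv @ J
    return J_pinv @ x_dot + null_projector @ z


J = planar_jacobian([0.3, 0.5, -0.4], [1.0, 0.8, 0.5])
x_dot = np.array([0.1, -0.05])         # desired task-space velocity
z = np.array([1.0, 0.0, 0.0])          # secondary objective (hypothetical)
q_dot = redundant_velocity(J, x_dot, z)
print(q_dot, J @ q_dot)
```

Away from singular configurations, the projected term changes the joint motion without changing the task-space velocity, which is what leaves room for a secondary objective.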

fig. Tracking using redundancy fig. Visual servo using redundancy

Place recognition using RGB-D camera and laser scanner

The categorization of places in indoor and outdoor environments is an important capability for service robots working and interacting with humans. In this study, we present a method by which a mobile robot equipped with an RGB-D camera (Microsoft Kinect) or a laser scanner (FARO/Velodyne) categorizes different areas in indoor and outdoor environments. Our approach transforms depth and color images taken at each place into histograms of local binary patterns (LBPs), whose dimensionality is further reduced following a uniform criterion. The histograms are then combined into a single feature vector, which is categorized using a supervised method. For indoor environments, we apply our technique to distinguish five place categories: corridors, laboratories, offices, kitchens, and study rooms. Experimental results show that our approach categorizes these places with high accuracy. We also apply the proposed technique to outdoor environments such as parking areas, residential areas, and urban areas. The proposed technique is also useful for autonomous driving technology.
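The feature extraction step can be sketched as follows, assuming the basic 8-neighbor LBP and the usual "uniform" reduction, in which codes with at most two circular bit transitions keep their own bin and all other codes share one bin (59 bins in total). The exact LBP variant, the handling of color channels, and the classifier used in this work are not given in the text; the single grayscale channel and the function names here are assumptions.

```python
import numpy as np


def lbp_codes(img):
    """8-neighbor LBP code for every interior pixel of a 2D image."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # Neighbors visited in circular order around the center pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neigh = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes += (neigh >= center).astype(np.uint8) * np.uint8(1 << bit)
    return codes


def uniform_lookup():
    """Map each of the 256 codes to one of 59 bins (58 uniform + 1 shared)."""
    table = np.zeros(256, dtype=np.int64)
    next_bin = 0
    for code in range(256):
        bits = [(code >> i) & 1 for i in range(8)]
        transitions = sum(bits[i] != bits[(i + 1) % 8] for i in range(8))
        if transitions <= 2:
            table[code] = next_bin
            next_bin += 1
        else:
            table[code] = 58           # shared bin for non-uniform patterns
    return table


def lbp_histogram(img, table):
    """Normalized 59-bin histogram of uniform LBP codes."""
    bins = table[lbp_codes(img)]
    hist = np.bincount(bins.ravel(), minlength=59).astype(float)
    return hist / hist.sum()


def place_descriptor(gray, depth):
    """Concatenate the intensity and depth histograms into one feature vector."""
    table = uniform_lookup()
    return np.concatenate([lbp_histogram(gray, table), lbp_histogram(depth, table)])


rng = np.random.default_rng(0)
descriptor = place_descriptor(rng.random((32, 32)), rng.random((32, 32)))
print(descriptor.shape)
```

A vector of this form would then be passed to a supervised classifier trained on labeled places.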

Place recognition using Kinect sensor


corridors (255 Mbyte, 5 categories)

genkiclub_f3_corridor_01, genkiclub_f4_corridor_01, w2_10f_corridor_01, w2_7f_corridor_01, w2_9f_corridor_02

kitchens (204 Mbyte, 8 categories)

genkiclub_f3_kitchen_01, genkiclub_f3_kitchen_02, w2_10f_kitchen_01, w2_10f_kitchen_09, w2_9f_kitchen_01, w2_9f_kitchen_02, w2_9f_kitchen_10, w4_6f_kitchen_01

labs (583 Mbyte, 4 categories)

hasegawa_lab, kurazume_lab, taniguchi_lab, uchida_lab

offices (95 Mbyte, 3 categories)

hasegawa_office, kurazume_office, morooka_office

studyrooms (328 Mbyte, 8 categories)

w2_2f_studyroom_01, w2_2f_studyroom_02, w2_2f_tatamiroom_01, w2_2f_tatamiroom_02, w4_2f_studyroom_01, w4_2f_studyroom_02, w4_2f_tatamiroom_01, w4_2f_tatamiroom_02

toilets (116 Mbyte, 3 categories)

w2_10f_toilet_01, w2_2f_toilet_01, w2_9f_toilet_01


Previewed Reality - Near-future perception system -

This research develops a near-future perception system named "Previewed Reality". The system consists of an informationally structured environment (ISE), an immersive VR display, a stereo camera, an optical tracking system, and a dynamic simulator. In an ISE, a number of sensors are embedded, and information such as the positions of furniture, objects, humans, and robots is sensed and stored in a database. The position and orientation of the immersive VR display are also tracked by the optical tracking system. We can therefore forecast the next possible events using the dynamic simulator and synthesize virtual images of what users will see in the near future from their own viewpoint. The synthesized images are overlaid on the real scene using augmented reality technology and presented to the user. The proposed system allows a human and a robot to coexist more safely by intuitively showing potentially hazardous situations to the human in advance.
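The forecasting step can be illustrated with a deliberately simple sketch that replaces the dynamic simulator with constant-velocity extrapolation. It takes tracked positions and velocities of a robot and a human, predicts both over a short horizon, and returns the first future time at which they come closer than a safety distance. The motion model, the thresholds, and the function names are all assumptions and are not taken from the system described above.

```python
import numpy as np


def predict(position, velocity, horizon, dt):
    """Constant-velocity positions at times dt, 2*dt, ..., horizon."""
    times = np.arange(1, int(round(horizon / dt)) + 1)[:, None] * dt
    return np.asarray(position, dtype=float) + times * np.asarray(velocity, dtype=float)


def first_hazard_time(robot_pos, robot_vel, human_pos, human_vel,
                      safe_dist=0.45, horizon=3.0, dt=0.1):
    """First predicted time (s) at which the two come closer than safe_dist, else None."""
    robot = predict(robot_pos, robot_vel, horizon, dt)
    human = predict(human_pos, human_vel, horizon, dt)
    dist = np.linalg.norm(robot - human, axis=1)
    hits = np.flatnonzero(dist < safe_dist)
    return None if hits.size == 0 else float((hits[0] + 1) * dt)


# Robot approaching a standing human along the x axis.
print(first_hazard_time([0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [0.0, 0.0]))
# Robot moving away: no hazard within the horizon.
print(first_hazard_time([0.0, 0.0], [-1.0, 0.0], [2.0, 0.0], [0.0, 0.0]))
```

In a system of this kind, the predicted hazard time and the predicted poses would drive what is rendered in the overlaid preview.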

Previewed Reality Previewed Reality
Previewed Reality 1.0 and 2.0
Smart Previewed Reality


Fourth person sensing / Fourth person captioning

"Fourth person sensing" and "fourth person captioning" are new concepts for correctly recognizing the circumstances surrounding the user by combining multimodal information obtained from various viewpoints. In these concepts, the information sources are categorized by n-th person viewpoint, that is, the first-person (wearable camera), second-person (camera on a robot), and third-person (camera embedded in the environment) viewpoints, and all of the information is fused to recognize the current situation correctly. By analogy, the reader of a novel knows everything about the story, including the emotions of the hero, the supporting characters, and every other character. This is like a "god's viewpoint", and this research aims to realize it.

fig. Fourth person sensing fig. Fourth person captioning