ADD online demo for LoFTR.

Jiaming Sun authored 4 years ago, committed by Jiaming Sun
parent 63818b0287
commit 8a6eabeaa3
  1. .gitignore (6 changes)
  2. README.md (68 changes)
  3. demo/demo_loftr.py (240 changes)
  4. demo/run_demo.sh (34 changes)

.gitignore (vendored, 6 changes)

@@ -1,4 +1,5 @@
 .vscode/
+__pycache__/
 *.pyc
 *.DS_Store
 *.swp
@@ -9,4 +10,7 @@ tmp.*
 logs/
 weights/
 dump/
+demo/*.mp4
+demo/demo_images/
 src/loftr/utils/superglue.py
+demo/utils.py
README.md (68 changes)
@@ -15,12 +15,12 @@ In the meanwhile, discussions about the paper are welcomed in the [discussion pa
 - [x] Inference code and pretrained models (DS and OT) (2021-4-7)
 - [x] Code for reproducing the test-set results (2021-4-7)
-- [ ] Webcam demo to reproduce the result shown in the GIF above (expected 2021-4-13)
+- [x] Webcam demo to reproduce the result shown in the GIF above (2021-4-13)
 - [ ] Training code and training data preparation (expected 2021-6-10)
 ## Installation
 ```shell
-# For full pytorch-lightning trainer features
+# For full pytorch-lightning trainer features (recommended)
 conda env create -f environment.yaml
 conda activate loftr
@@ -33,7 +33,8 @@ We provide the [download link](https://drive.google.com/drive/folders/1DOcOPZb3-
 - the megadepth-1500-testset (~600MB).
 - 4 pretrained models of indoor-ds, indoor-ot, outdoor-ds and outdoor-ot (each ~45MB).
-By now, the LoFTR-DS model is ready to go!
+By now, the environment is all set and the LoFTR-DS model is ready to go!
+If you want to run LoFTR-OT, some extra steps are needed:
 <details>
 <summary>[Requirements for LoFTR-OT]</summary>
@@ -71,7 +72,55 @@ By now, the LoFTR-DS model is ready to go!
 </details>
-An example is in the `notebooks/demo_single_pair.ipynb`.
+An example is given in `notebooks/demo_single_pair.ipynb`.
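If you would rather match a single pair directly in a script than in the notebook, below is a minimal sketch following the same `LoFTR`/`default_cfg` API used by `demo/demo_loftr.py` in this commit; the image paths are placeholders, and resizing to dimensions divisible by 8 is an assumption about the model input rather than something stated in the README.

```python
# Minimal single-pair sketch (run from the repository root); paths are placeholders.
import cv2
import torch
from src.loftr import LoFTR, default_cfg

matcher = LoFTR(config=default_cfg)
matcher.load_state_dict(torch.load("weights/indoor_ds.ckpt")['state_dict'])
matcher = matcher.eval().cuda()

def load_gray(path, size=(640, 480)):
    # Grayscale input, resized so both dimensions are divisible by 8 (assumed requirement).
    img = cv2.resize(cv2.imread(path, cv2.IMREAD_GRAYSCALE), size)
    return torch.from_numpy(img)[None, None].float().cuda() / 255.

batch = {'image0': load_gray("path/to/img0.png"),
         'image1': load_gray("path/to/img1.png")}

with torch.no_grad():
    matcher(batch)  # LoFTR writes its predictions back into the input dict
mkpts0 = batch['mkpts0_f'].cpu().numpy()  # matched keypoints in image0, shape (N, 2)
mkpts1 = batch['mkpts1_f'].cpu().numpy()  # matched keypoints in image1, shape (N, 2)
mconf = batch['mconf'].cpu().numpy()      # per-match confidence, shape (N,)
```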
+### Online demo
+Run the online demo with a webcam to reproduce the result shown in the GIF above.
+```bash
+cd demo
+./run_demo.sh
+```
+<details>
+<summary>[run_demo.sh]</summary>
+
+```bash
+#!/bin/bash
+set -e
+# set -x
+if [ ! -f utils.py ]; then
+    echo "Downloading utils.py from the SuperGlue repo."
+    echo "We cannot provide this file directly due to its strict licence."
+    wget https://raw.githubusercontent.com/magicleap/SuperGluePretrainedNetwork/master/models/utils.py
+fi
+
+# Use webcam 0 as the input source.
+input=0
+# or use a pre-recorded video given its path.
+# input=/home/sunjiaming/Downloads/scannet_test/$scene_name.mp4
+
+# Toggle the indoor/outdoor model here.
+model_ckpt=../weights/indoor_ds.ckpt
+# model_ckpt=../weights/outdoor_ds.ckpt
+
+# Optionally assign the GPU ID.
+# export CUDA_VISIBLE_DEVICES=0
+
+echo "Running LoFTR demo..."
+eval "$(conda shell.bash hook)"
+conda activate loftr
+python demo_loftr.py --weight $model_ckpt --input $input
+# To save the input video and the output match visualizations:
+# python demo_loftr.py --weight $model_ckpt --input $input --save_video --save_input
+
+# Running on remote GPU servers with no GUI:
+# save images first,
+# python demo_loftr.py --weight $model_ckpt --input $input --no_display --output_dir="./demo_images/"
+# then convert them to a video.
+# ffmpeg -framerate 15 -pattern_type glob -i '*.png' -c:v libx264 -r 30 -pix_fmt yuv420p out.mp4
+```
+</details>
 ### Reproduce the testing results with pytorch-lightning
@@ -84,14 +133,12 @@ bash ./scripts/reproduce_test/indoor_ds.sh
 python test.py configs/data/scannet_test_1500.py configs/loftr/loftr_ds.py --ckpt_path weights/indoor_ds.ckpt --profiler_name inference --gpus=1 --accelerator="ddp"
 ```
-For visualizing the dump results, please refer to `notebooks/visualize_dump_results.ipynb`.
+For visualizing the results, please refer to `notebooks/visualize_dump_results.ipynb`.
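As a rough companion to that notebook, the sketch below draws matches given `mkpts0`, `mkpts1`, and `mconf` arrays like those produced by `demo_loftr.py`; the dump file layout itself is not shown in this commit, so loading it is left out.

```python
# Hedged sketch: assumes mkpts0/mkpts1 are (N, 2) pixel coordinates in two
# grayscale images and mconf is an (N,) confidence array.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm

def plot_matches(img0, img1, mkpts0, mkpts1, mconf):
    h0, w0 = img0.shape[:2]
    h1, w1 = img1.shape[:2]
    canvas = np.zeros((max(h0, h1), w0 + w1), dtype=img0.dtype)
    canvas[:h0, :w0] = img0   # left panel
    canvas[:h1, w0:] = img1   # right panel
    plt.figure(figsize=(12, 6))
    plt.imshow(canvas, cmap='gray')
    for (x0, y0), (x1, y1), c in zip(mkpts0, mkpts1, cm.jet(mconf)):
        plt.plot([x0, x1 + w0], [y0, y1], color=c, linewidth=0.5)  # color by confidence
    plt.axis('off')
    plt.show()
```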
 <br/>

 ## Citation
 If you find this code useful for your research, please use the following BibTeX entry.
@@ -100,16 +147,11 @@ If you find this code useful for your research, please use the following BibTeX
 @article{sun2021loftr,
   title={{LoFTR}: Detector-Free Local Feature Matching with Transformers},
   author={Sun, Jiaming and Shen, Zehong and Wang, Yuang and Bao, Hujun and Zhou, Xiaowei},
-  journal={CVPR},
+  journal={{CVPR}},
   year={2021}
 }
 ```
-<!-- ## Acknowledgment
-This repo is built based on the Mask R-CNN implementation from [maskrcnn-benchmark](https://github.com/facebookresearch/maskrcnn-benchmark), and we also use the pretrained Stereo R-CNN weight from [here](https://drive.google.com/file/d/1rZ5AsMms7-oO-VfoNTAmBFOr8O2L0-xt/view?usp=sharing) for initialization. -->
 ## Copyright
 This work is affiliated with ZJU-SenseTime Joint Lab of 3D Vision, and its intellectual property belongs to SenseTime Group Ltd.

demo/demo_loftr.py (new file, 240 lines)
@@ -0,0 +1,240 @@
front_matter = """
------------------------------------------------------------------------
Online demo for [LoFTR](https://zju3dv.github.io/loftr/).
This demo is heavily inspired by [SuperGlue](https://github.com/magicleap/SuperGluePretrainedNetwork/).
We thank the authors for their execellent work.
------------------------------------------------------------------------
"""
import os
import argparse
from pathlib import Path
import cv2
import torch
import numpy as np
import matplotlib.cm as cm
os.sys.path.append("../") # Add the project directory
from src.loftr import LoFTR, default_cfg
from src.config.default import get_cfg_defaults
try:
from demo.utils import (AverageTimer, VideoStreamer,
make_matching_plot_fast, make_matching_plot, frame2tensor)
except:
raise ImportError("This demo requires utils.py from SuperGlue, please use run_demo.sh to start this script.")
torch.set_grad_enabled(False)
if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description='LoFTR online demo',
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument('--weight', type=str, help="Path to the checkpoint.")
    parser.add_argument(
        '--input', type=str, default='0',
        help='ID of a USB webcam, URL of an IP camera, '
             'or path to an image directory or movie file')
    parser.add_argument(
        '--output_dir', type=str, default=None,
        help='Directory where to write output frames (If None, no output)')
    parser.add_argument(
        '--image_glob', type=str, nargs='+', default=['*.png', '*.jpg', '*.jpeg'],
        help='Glob if a directory of images is specified')
    parser.add_argument(
        '--skip', type=int, default=1,
        help='Images to skip if input is a movie or directory')
    parser.add_argument(
        '--max_length', type=int, default=1000000,
        help='Maximum length if input is a movie or directory')
    parser.add_argument(
        '--resize', type=int, nargs='+', default=[640, 480],
        help='Resize the input image before running inference. If two numbers, '
             'resize to the exact dimensions, if one number, resize the max '
             'dimension, if -1, do not resize')
    parser.add_argument(
        '--no_display', action='store_true',
        help='Do not display images to screen. Useful if running remotely')
    parser.add_argument(
        '--save_video', action='store_true',
        help='Save output (with match visualizations) to a video.')
    parser.add_argument(
        '--save_input', action='store_true',
        help='Save the input images to a video (for gathering a repeatable input source).')
    parser.add_argument(
        '--skip_frames', type=int, default=1,
        help="Skip frames from webcam input.")
    parser.add_argument(
        '--top_k', type=int, default=2000, help="The max vis_range (please refer to the code).")
    parser.add_argument(
        '--bottom_k', type=int, default=0, help="The min vis_range (please refer to the code).")
    opt = parser.parse_args()
    print(front_matter)
    parser.print_help()
    if len(opt.resize) == 2 and opt.resize[1] == -1:
        opt.resize = opt.resize[0:1]
    if len(opt.resize) == 2:
        print('Will resize to {}x{} (WxH)'.format(
            opt.resize[0], opt.resize[1]))
    elif len(opt.resize) == 1 and opt.resize[0] > 0:
        print('Will resize max dimension to {}'.format(opt.resize[0]))
    elif len(opt.resize) == 1:
        print('Will not resize images')
    else:
        raise ValueError('Cannot specify more than two integers for --resize')

    if torch.cuda.is_available():
        device = 'cuda'
    else:
        raise RuntimeError("GPU is required to run this demo.")
    # Initialize LoFTR
    matcher = LoFTR(config=default_cfg)
    matcher.load_state_dict(torch.load(opt.weight)['state_dict'])
    matcher = matcher.eval().to(device=device)

    # Configure I/O
    if opt.save_video:
        print('Writing video to loftr-matches.mp4...')
        writer = cv2.VideoWriter('loftr-matches.mp4', cv2.VideoWriter_fourcc(*'mp4v'), 15, (640*2 + 10, 480))
    if opt.save_input:
        print('Writing video to demo-input.mp4...')
        input_writer = cv2.VideoWriter('demo-input.mp4', cv2.VideoWriter_fourcc(*'mp4v'), 15, (640, 480))

    vs = VideoStreamer(opt.input, opt.resize, opt.skip,
                       opt.image_glob, opt.max_length)
    frame, ret = vs.next_frame()
    assert ret, 'Error when reading the first frame (try different --input?)'

    frame_id = 0
    last_image_id = 0
    frame_tensor = frame2tensor(frame, device)
    last_data = {'image0': frame_tensor}
    last_frame = frame

    if opt.output_dir is not None:
        print('==> Will write outputs to {}'.format(opt.output_dir))
        Path(opt.output_dir).mkdir(exist_ok=True)

    # Create a window to display the demo.
    if not opt.no_display:
        window_name = 'LoFTR Matches'
        cv2.namedWindow(window_name, cv2.WINDOW_NORMAL)
        cv2.resizeWindow(window_name, (640*2, 480))
    else:
        print('Skipping visualization, will not show a GUI.')

    # Print the keyboard help menu.
    print('==> Keyboard control:\n'
          '\tn: select the current frame as the reference image (left)\n'
          '\td/f: move the range of the matches (ranked by confidence) to visualize\n'
          '\tc/v: increase/decrease the length of the visualization range (i.e., total number of matches) to show\n'
          '\tq: quit')

    timer = AverageTimer()
    vis_range = [opt.bottom_k, opt.top_k]
    while True:
        frame_id += 1
        frame, ret = vs.next_frame()
        if not ret:
            # Stop as soon as the stream is exhausted.
            print('Finished demo_loftr.py')
            break
        if frame_id % opt.skip_frames != 0:
            # print("Skipping frame.")
            continue
        if opt.save_input:
            inp_rgb = cv2.cvtColor(frame, cv2.COLOR_GRAY2RGB)
            input_writer.write(inp_rgb)
        timer.update('data')
        stem0, stem1 = last_image_id, vs.i - 1

        frame_tensor = frame2tensor(frame, device)
        last_data = {**last_data, 'image1': frame_tensor}
        matcher(last_data)

        total_n_matches = len(last_data['mkpts0_f'])
        mkpts0 = last_data['mkpts0_f'].cpu().numpy()[vis_range[0]:vis_range[1]]
        mkpts1 = last_data['mkpts1_f'].cpu().numpy()[vis_range[0]:vis_range[1]]
        mconf = last_data['mconf'].cpu().numpy()[vis_range[0]:vis_range[1]]

        # Normalize confidence.
        if len(mconf) > 0:
            conf_vis_min = 0.
            conf_min = mconf.min()
            conf_max = mconf.max()
            mconf = (mconf - conf_vis_min) / (conf_max - conf_vis_min + 1e-5)
        else:
            conf_min, conf_max = 0., 0.

        timer.update('forward')

        alpha = 0
        color = cm.jet(mconf, alpha=alpha)
        text = [
            'LoFTR',
            '# Matches (showing/total): {}/{}'.format(len(mkpts0), total_n_matches),
        ]
        small_text = [
            f'Showing matches from {vis_range[0]}:{vis_range[1]}',
            f'Confidence Range: {conf_min:.2f}:{conf_max:.2f}',
            'Image Pair: {:06}:{:06}'.format(stem0, stem1),
        ]
        out = make_matching_plot_fast(
            last_frame, frame, mkpts0, mkpts1, mkpts0, mkpts1, color, text,
            path=None, show_keypoints=False, small_text=small_text)

        # Save high quality png, optionally with dynamic alpha support (unreleased yet).
        # save_path = 'demo_vid/{:06}'.format(frame_id)
        # make_matching_plot(
        #     last_frame, frame, mkpts0, mkpts1, mkpts0, mkpts1, color, text,
        #     path=save_path, show_keypoints=opt.show_keypoints, small_text=small_text)

        if not opt.no_display:
            if opt.save_video:
                writer.write(out)
            cv2.imshow('LoFTR Matches', out)
            key = chr(cv2.waitKey(1) & 0xFF)
            if key == 'q':
                if opt.save_video:
                    writer.release()
                if opt.save_input:
                    input_writer.release()
                vs.cleanup()
                print('Exiting...')
                break
            elif key == 'n':
                last_data['image0'] = frame_tensor
                last_frame = frame
                last_image_id = (vs.i - 1)
                frame_id_left = frame_id
            elif key in ['d', 'f']:
                if key == 'd':
                    if vis_range[0] >= 0:
                        vis_range[0] -= 200
                        vis_range[1] -= 200
                if key == 'f':
                    vis_range[0] += 200
                    vis_range[1] += 200
                print(f'\nChanged the vis_range to {vis_range[0]}:{vis_range[1]}')
            elif key in ['c', 'v']:
                if key == 'c':
                    vis_range[1] -= 50
                if key == 'v':
                    vis_range[1] += 50
                print(f'\nChanged the vis_range[1] to {vis_range[1]}')
        elif opt.output_dir is not None:
            stem = 'matches_{:06}_{:06}'.format(stem0, stem1)
            out_file = str(Path(opt.output_dir, stem + '.png'))
            print('\nWriting image to {}'.format(out_file))
            cv2.imwrite(out_file, out)
        else:
            raise ValueError("--output_dir is required when --no_display is set.")

        timer.update('viz')
        timer.print()

    cv2.destroyAllWindows()
    vs.cleanup()

demo/run_demo.sh (new file, 34 lines)
@@ -0,0 +1,34 @@
#!/bin/bash
set -e
# set -x

if [ ! -f utils.py ]; then
    echo "Downloading utils.py from the SuperGlue repo."
    echo "We cannot provide this file directly due to its strict licence."
    wget https://raw.githubusercontent.com/magicleap/SuperGluePretrainedNetwork/master/models/utils.py
fi

# Use webcam 0 as the input source.
input=0
# or use a pre-recorded video given its path.
# input=/home/sunjiaming/Downloads/scannet_test/$scene_name.mp4

# Toggle the indoor/outdoor model here.
model_ckpt=../weights/indoor_ds.ckpt
# model_ckpt=../weights/outdoor_ds.ckpt

# Optionally assign the GPU ID.
# export CUDA_VISIBLE_DEVICES=0

echo "Running LoFTR demo..."
eval "$(conda shell.bash hook)"
conda activate loftr
python demo_loftr.py --weight $model_ckpt --input $input
# To save the input video and the output match visualizations:
# python demo_loftr.py --weight $model_ckpt --input $input --save_video --save_input

# Running on remote GPU servers with no GUI:
# save images first,
# python demo_loftr.py --weight $model_ckpt --input $input --no_display --output_dir="./demo_images/"
# then convert them to a video.
# ffmpeg -framerate 15 -pattern_type glob -i '*.png' -c:v libx264 -r 30 -pix_fmt yuv420p out.mp4