Project 37: Visual chatting and Real-time acting Robot

Team members: Haozhe Chi, Jiatong Li, Minghua Yang, Zonghai Jing

Sponsor: Gaoang Wang

Documents: design_document1.pdf, design_document2.pdf, final_paper1.pdf, final_paper2.pdf, final_paper3.pdf, proposal1.pdf, proposal2.pdf, proposal3.pdf, video1.mp4
Group members:
Haozhe Chi, haozhe4
Minghua Yang, minghua3
Zonghai Jing, zonghai2
Jiatong Li, jl180
Problem:
With the rise of large language models (LLMs), large visual language models (LVLMs) have achieved great success in recent AI development. However, it is still a major challenge to configure an LVLM system for a robot and make all of the hardware work well around that system. We aim to design an LVLM-based robot that can react to multimodal inputs.
Solution overview:
We aim to deliver an LVLM system (software), a robot arm for actions such as grabbing objects (hardware), a mobile base for moving through the environment (hardware), a camera for real-time visual input (hardware), a laser tracker for indicating the target object (hardware), and audio equipment for audio input and output (hardware).
Solution components:
LVLM system:
We will deploy a BLIP-2-based AI model for visual language processing. We will incorporate the strengths of several recent visual-language models, including LLaVA, VideoChat, and Video-LLaMA, and design a better real-time visual language processing system. This system should support real-time visual chatting with less object hallucination.
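
As a rough illustration of the software side, a minimal sketch of single-image visual question answering with a BLIP-2 checkpoint from Hugging Face Transformers is shown below. The checkpoint name, prompt, and image file are placeholders; the real system would add the video and dialogue handling borrowed from LLaVA, VideoChat, and Video-LLaMA.

```python
# Minimal sketch: visual question answering with BLIP-2 via Hugging Face Transformers.
# Checkpoint name, prompt, and image path are illustrative placeholders.
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to("cuda")

image = Image.open("frame_from_camera.jpg")  # placeholder: one frame grabbed by the robot camera
prompt = "Question: what object is on the table? Answer:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda", torch.float16)
generated = model.generate(**inputs, max_new_tokens=30)
print(processor.batch_decode(generated, skip_special_tokens=True)[0].strip())
```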
Robot arm and wheels:
We will use the ROS environment to control robot movements. We will apply to use the robot arms from the ZJUI ECE470 labs and purchase wheels for locomotion; we may use a four-wheel design or a tracked design.
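
As a hedged example of what the ROS-based motion control could look like, the sketch below publishes velocity commands to a differential-drive base with rospy. The topic name /cmd_vel, the node name, and the speed values are assumptions for our setup; arm control would go through a separate driver and is not shown.

```python
#!/usr/bin/env python
# Sketch: drive the wheeled base forward for a short time with ROS 1 (rospy).
# Topic name, node name, and speeds are placeholders for our eventual setup.
import rospy
from geometry_msgs.msg import Twist

def drive_forward(speed=0.2, duration=2.0):
    rospy.init_node("simple_base_driver")
    pub = rospy.Publisher("/cmd_vel", Twist, queue_size=10)
    rospy.sleep(1.0)                      # give subscribers time to connect
    rate = rospy.Rate(10)                 # publish at 10 Hz
    cmd = Twist()
    cmd.linear.x = speed                  # forward velocity in m/s
    end = rospy.Time.now() + rospy.Duration(duration)
    while rospy.Time.now() < end and not rospy.is_shutdown():
        pub.publish(cmd)
        rate.sleep()
    pub.publish(Twist())                  # zero command to stop the base

if __name__ == "__main__":
    drive_forward()
```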
Camera:
We will configure cameras for real-time image input. 3D reconstruction may be needed, depending on our LVLM system design. If multi-view inputs are needed, we will design a more capable camera configuration.
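
A minimal sketch of the real-time capture loop with OpenCV is given below; the device index and frame size are assumptions, and any 3D reconstruction or multi-view fusion would sit on top of this loop.

```python
# Sketch: grab frames from a USB camera with OpenCV and hand them to the LVLM system.
# Camera index 0 and the 640x480 resolution are assumptions for illustration.
import cv2

cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

try:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # frame is a BGR numpy array; pass it (or a saved copy) to the LVLM system here
        cv2.imshow("robot camera", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
finally:
    cap.release()
    cv2.destroyAllWindows()
```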
Audio processing:
We will use two audio processing systems: voice recognition and text-to-speech generation. They are responsible for audio input and output, respectively. We will use speaker components so the robot can talk.
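
A minimal sketch of that input/output pair is shown below, assuming the SpeechRecognition and pyttsx3 Python packages as stand-ins for whatever engines we finally choose; in the final robot the recognized text would be routed through the LVLM before being spoken.

```python
# Sketch: microphone speech-to-text plus offline text-to-speech.
# SpeechRecognition and pyttsx3 are assumed placeholders for the final audio stack.
import speech_recognition as sr
import pyttsx3

recognizer = sr.Recognizer()
tts = pyttsx3.init()

with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)
    print("Listening...")
    audio = recognizer.listen(source)

try:
    text = recognizer.recognize_google(audio)   # cloud recognizer; offline engines also work
    print("Heard:", text)
    reply = "You said: " + text                 # placeholder; the LVLM would generate this reply
    tts.say(reply)
    tts.runAndWait()
except sr.UnknownValueError:
    print("Could not understand the audio")
```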
Criterion for success:
The robot should provide the following functions: voice recognition, laser tracking, real-time visual chatting, a multimodal processing system, identifying a specified object, moving to it and grabbing it, and a multi-view camera configuration. All hardware parts should cooperate well in the final demo: not only must every individual component function correctly, but the components must also combine to perform more advanced behaviors. For instance, the robot should be able to move toward a specified object while chatting with a human.

Augmented Reality and Virtual Reality for Electromagnetics Education

Zhanyu Feng, Zhewen Fu, Han Hua, Daosen Sun

Featured Project

# PROBLEM

Many students find electromagnetics a difficult subject to master, partly because electromagnetic waves cannot be visualized directly with our own eyes. The subject therefore becomes a mathematical abstraction that relies heavily on mathematical formulations.

# SOLUTION OVERVIEW

We focus on using AR/VR technology for large-scale, complex, and interactive visualization of electromagnetic waves. To speed up the calculation, we will compute the field responses and render the fields in real time, likely accelerated by GPU computing, cluster computation, and other advanced numerical algorithms. We also propose to provide public, immersive, and interactive education to users. We plan to use the existing VR equipment, the VR square in laboratory building D220, to present users with a wide field of view and high-resolution, high-quality 3D stereoscopic images, making the virtual environment closely comparable to the real world. Users can work together and interact with each other while manipulating the virtual objects. This project also sets up the basis for us to develop digital-twin technology for electromagnetics that effectively links the real world with digital space.

# COMPONENTS

1. Numerical computation component: responsible for computing the field lines via Maxwell's equations. We will try to offload the work to the GPU for better performance (see the sketch after this list).

2. Graphics rendering component: receives data from the numerical computation component and uses renderers to visualize it.

3. User interface component: processes users' actions and allows users to interact with objects in the virtual world.

4. Audio component: generates audio based on the electromagnetic fields acting on charged objects.

5. Haptic component: interacts with the controller to send vibration feedback to users based on the field strength.
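
To make item 1 concrete, here is a minimal NumPy sketch of the electrostatic case: superposing the Coulomb fields of point charges on a 2D grid. The real component would extend this to the full time-dependent Maxwell equations, and moving the array math onto the GPU (for example with CuPy) is an assumption rather than a settled design choice.

```python
# Sketch: electric field of point charges sampled on a 2D grid via Coulomb superposition.
# NumPy only; swapping numpy for cupy would be one way to push this work onto the GPU.
import numpy as np

K = 8.9875517923e9                      # Coulomb constant (N*m^2/C^2)

def e_field(charges, positions, grid_x, grid_y):
    """charges: (N,) in coulombs, positions: (N, 2) in meters; returns Ex, Ey on the grid."""
    ex = np.zeros_like(grid_x)
    ey = np.zeros_like(grid_y)
    for q, (px, py) in zip(charges, positions):
        dx, dy = grid_x - px, grid_y - py
        r2 = dx**2 + dy**2 + 1e-12      # small offset avoids division by zero at the charge
        r3 = r2 * np.sqrt(r2)
        ex += K * q * dx / r3
        ey += K * q * dy / r3
    return ex, ey

xs, ys = np.meshgrid(np.linspace(-1, 1, 200), np.linspace(-1, 1, 200))
Ex, Ey = e_field(np.array([1e-9, -1e-9]),
                 np.array([[-0.3, 0.0], [0.3, 0.0]]), xs, ys)
print(Ex.shape, Ey.shape)               # field samples the rendering component would turn into field lines
```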

# CRITERIA OF SUCCESS

Set up four distinct experiments to illustrate the concepts behind the four Maxwell equations. Students can work together, using controllers to set up different types of charged objects and to adjust their orientation and position. Students can see both static and real-time electromagnetic fields around charged objects through the VR devices. Achieve high frame rates in the virtual world and speed up the computation by using advanced algorithms to obtain smooth electromagnetic fields.

# WHAT MAKES OUR PROJECT UNIQUE

We will build four distinct scenarios based on the four Maxwell equations, rather than the single Gauss's law scenario built by the UIUC team. In these scenarios, we will render both electric and magnetic field lines around charged objects, as well as the forces between them.
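
For reference, the four equations (in differential, SI form) that the four scenarios would map onto are:

```latex
% The four Maxwell equations in differential (SI) form, one per scenario.
\begin{align}
  \nabla \cdot \mathbf{E}  &= \frac{\rho}{\varepsilon_0}               && \text{Gauss's law} \\
  \nabla \cdot \mathbf{B}  &= 0                                        && \text{Gauss's law for magnetism} \\
  \nabla \times \mathbf{E} &= -\frac{\partial \mathbf{B}}{\partial t}  && \text{Faraday's law} \\
  \nabla \times \mathbf{B} &= \mu_0 \mathbf{J}
      + \mu_0 \varepsilon_0 \frac{\partial \mathbf{E}}{\partial t}     && \text{Ampère--Maxwell law}
\end{align}
```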

The experiments allow multiple users to interact with objects simultaneously; in other words, users can cooperate with each other while conducting experiments. While the lab scene built by the UIUC team only allows one user to do the experiment alone, we make the experiment public and allow multiple users to take part.

We will use different hardware for the computation. Rather than relying on the CPU, we will parallelize the calculation on the GPU to improve performance and simulate large-scale field visualizations that meet the needs of multiple users.

Compared to the UIUC project, we will not only visualize the fields but also expand the ways users can perceive the phenomena, i.e., adding haptic feedback and audio feedback to give users a 4D experience.