- Install Docker on your machine and make sure the machine supports KVM. You can check for KVM support with the `kvm-ok` tool from the cpu-checker package:
sudo apt-get install cpu-checker
kvm-ok
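If you would rather not install an extra package, a rough equivalent check can be sketched in plain shell. This is only a sketch: it assumes a Linux host where CPU flags are listed in /proc/cpuinfo and the KVM device node is /dev/kvm, and the helper name `has_virt_flags` is illustrative.

```shell
# Succeeds if the given cpuinfo-style file lists Intel VT-x (vmx) or AMD-V (svm).
# Defaults to /proc/cpuinfo, which is an assumption about running on Linux.
has_virt_flags() {
    grep -Eqw '(vmx|svm)' "${1:-/proc/cpuinfo}" 2>/dev/null
}

if has_virt_flags && [ -e /dev/kvm ]; then
    echo "KVM appears to be available"
else
    echo "KVM not detected: check BIOS virtualization settings and the kvm kernel module"
fi
```

The flag test only shows that the CPU advertises virtualization; the /dev/kvm test is what indicates the kernel side is actually usable.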
Also make sure your user has permission to run Docker. You can grant it with:
sudo usermod -aG docker $USER
newgrp docker
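Group changes only apply to new sessions, so it can help to confirm membership before continuing. A minimal sketch, where the helper name `in_docker_group` is illustrative:

```shell
# Succeeds if the space-separated group list in $1 contains "docker".
in_docker_group() {
    case " $1 " in
        *" docker "*) return 0 ;;
        *) return 1 ;;
    esac
}

# id -nG prints the group names of the current session.
if in_docker_group "$(id -nG)"; then
    echo "this shell is in the docker group"
else
    echo "not in the docker group yet: log out and back in, or run 'newgrp docker'"
fi
```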
Download the related Docker files from: https://drive.google.com/file/d/1SJ79gdO7whgUod3HnuS87aOKihRk1i-U/view?usp=drive_link
To build the Docker image, run:
mkdir docker_file
cd docker_file
unzip /path/to/your/docker-file.zip
cd docker-file
docker build -t android_eval:latest .
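Once the build finishes, you may want to confirm that the image is registered locally under the expected tag. A small sketch using `docker image inspect`; the helper name `image_exists` is illustrative:

```shell
# Succeeds if a local image with the given name:tag exists.
image_exists() {
    docker image inspect "$1" >/dev/null 2>&1
}

if image_exists android_eval:latest; then
    echo "android_eval:latest is ready"
else
    echo "android_eval:latest not found: re-run docker build and check its output"
fi
```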
Note that the Dockerfile uses
RUN sed -i 's/deb.debian.org/mirrors.ustc.edu.cn/g' /etc/apt/sources.list
RUN pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
to speed up downloads. You can replace these mirrors with your own sources or simply delete both lines.
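If you prefer not to edit the file by hand, the two mirror lines can be filtered out before building. This is a sketch under the assumption that the build file is named Dockerfile; the helper name `strip_mirrors` and the output file name are illustrative.

```shell
# Print the given Dockerfile without the lines that reference the two mirrors.
strip_mirrors() {
    grep -vF -e 'mirrors.ustc.edu.cn' -e 'pypi.tuna.tsinghua.edu.cn' "$1"
}

# Example (file names are assumptions):
# strip_mirrors Dockerfile > Dockerfile.nomirror
# docker build -t android_eval:latest -f Dockerfile.nomirror .
```

Writing to a separate file keeps the original Dockerfile untouched, so switching back to the mirrors is just a matter of building without `-f`.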
- Create a YAML file in the project directory with the following content:
name: OpenAIAgent
api_key: sk-
api_base: ""
model_name: "gpt-4-1106-preview"
max_new_tokens: 512
class: TextOnlyMobileTask_AutoTest
save_dir: "./logs/evaluation"
max_rounds: 25
request_interval: 3
avd_name: Pixel_7_Pro_API_33
avd_log_dir: ./logs/evaluation
docker: True
image_name: android_eval:latest
port: 6060
name: The name of the agent being used, must be declared in .
- Type: String
- Example:
args: Arguments to configure the agent.
api_key: The API key for authenticating the agent.
- Type: String
api_base: The base URL for the API endpoint.
- Type: String
model_name: The name of the model to be used.
- Type: String
- Example:
max_new_tokens: The maximum number of new tokens the model can generate in one request.
- Type: Integer
- Example:
class: The class defining the type of task; it must be declared in . For basic evaluation, we use "TextOnlyMobileTask_AutoTest" for XML mode and "ScreenshotMobileTask_AutoTest" for SoM mode.
- Type: String
- Example:
args: Arguments to configure the task.
save_dir: The directory where the evaluation logs will be saved.
- Type: String
- Example:
max_rounds: The maximum number of rounds for the task. Defaults to 25.
- Type: Integer
- Example:
request_interval: The interval between requests, in seconds. Defaults to 3.
- Type: Integer
- Example:
mode: The mode of operation. Defaults to .
- Type: String
- Example:
avd_name: The name of the AVD being used.
- Type: String
- Example:
avd_log_dir: The directory where the AVD logs will be saved.
- Type: String
- Example:
docker: Flag indicating whether Docker is used for the evaluation. Must be set to True.
- Type: Boolean
- Example:
docker_args: Arguments for configuring Docker.
image_name: The name of the Docker image to be used.
- Type: String
- Example:
port: The starting port to use for the Docker container.
- Type: Integer
- Example:
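Before launching an evaluation, a quick check that the config file mentions every key from the sample above can catch typos early. This is a minimal sketch: it only looks for key names at the start of a line (with any indentation), it does not parse YAML, and treating every key from the sample as required is an assumption. The helper name `missing_keys` and the file name in the example are illustrative.

```shell
# Print each expected key that does not appear in the given YAML file.
# The key list mirrors the sample config; requiring all of them is an assumption.
missing_keys() {
    for key in name api_key api_base model_name max_new_tokens class save_dir \
               max_rounds request_interval avd_name avd_log_dir docker image_name port; do
        grep -Eq "^[[:space:]]*${key}:" "$1" || printf '%s\n' "$key"
    done
}

# Example (config.yaml is a placeholder file name):
# missing_keys config.yaml
```

An empty output means every key was found; otherwise the missing names are printed one per line.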