United States Patent Application: 20090177300
Kind Code: A1
Lee; Michael M.
July 9, 2009
Methods and apparatus for altering audio output signals
Abstract
Methods, systems and computer readable media for altering an audio output
are provided. In some embodiments, the system may change the original
frequency content of an audio data file to a second frequency content so
that a recorded audio track will sound as if a different person had
recorded it when it is played back. In other embodiments, the system may
receive an audio data file and a voice signature, and it may apply the
voice signature to the audio data file to alter the audio output of the
audio data file. In that instance, the audio data file may be a textual
representation of a recorded audio data file.
Inventors: Lee; Michael M.; (San Jose, CA)
Correspondence Name and Address:
KRAMER LEVIN NAFTALIS & FRANKEL LLP
1177 Avenue of the Americas
New York, NY 10036
US
Assignee Name and Address: Apple Inc.
Serial No.: 080523
Series Code: 12
Filed: April 2, 2008
U.S. Current Class: 700/94
U.S. Class at Publication: 700/94
Intern'l Class: G06F 17/20 20060101 G06F017/20; G06F 17/00 20060101 G06F017/00
Claims
1. A method for producing altered audio output signals, comprising:
providing an audio data file having a first frequency content; processing
an audio input having a second frequency content; changing the first
frequency content of the audio data file to the second frequency content;
and playing the altered audio file to produce altered audio output
signals.
2. The method of claim 1, wherein providing comprises: receiving a
selection by a user of an audio file for playback; and loading the
selected audio data file from storage into memory.
3. The method of claim 1, wherein processing comprises: recording an audio
input; and configuring the audio input such that it can be used to alter
the audio data file.
4. The method of claim 1, wherein changing comprises: receiving an
indication from a user that only a portion of the audio data file should
be altered; and altering only the selected portion of the audio data file
using the audio input.
5. A method for producing altered audio output signals, comprising:
providing a text representation of an audio data file; processing a voice
signature; applying the voice signature to the text representation of the
audio data file to produce an altered audio data file; and playing the
altered audio data file to produce altered audio output signals.
6. The method of claim 5, wherein providing comprises: receiving a
selection by a user of an audio file for playback; and loading a text
representation of the selected audio data file from storage into memory.
7. The method of claim 5, wherein processing comprises: receiving a
selection by a user of a voice signature; and configuring the selected
voice signature such that it can be applied to the text representation.
8. The method of claim 5, wherein applying comprises: receiving an
indication from a user that only a portion of the audio data file should
be altered; and altering only the selected portion of the audio data file
using the voice signature.
9. The method of claim 5, wherein processing comprises: receiving data
defining the voice signature.
10. The method of claim 9, wherein receiving data defining a voice
signature comprises: recording an audio input; and extracting data
defining the voice signature from the recorded audio input.
11. Apparatus for producing altered audio output signals, comprising:
storage circuitry for providing an audio data file having a first
frequency content; processing circuitry for processing an audio input
having a second frequency content; altering circuitry for changing the
first frequency content of the audio data file to the second frequency
content; and output circuitry for playing the audio data file using the
second frequency content.
12. The apparatus of claim 11, further comprising: display circuitry for
presenting a user a list of audio data files; and selection circuitry for
enabling the user to select a chosen audio data file from the list of
audio data files, and wherein the storage circuitry is configured to
provide the chosen audio data file.
13. The apparatus of claim 11, further comprising: display circuitry for
presenting a user a list of audio inputs; and selection circuitry for
enabling the user to select a chosen audio input from the list of audio
inputs, and wherein the processing circuitry is configured to process the
chosen audio input.
14. The apparatus of claim 11, further comprising: portion circuitry for
enabling a user to choose a portion of the audio data file that will be
altered.
15. Apparatus for producing altered audio output signals, comprising:
analysis circuitry for analyzing a text representation of a recorded
audio data file; processing circuitry for processing a voice signature;
application circuitry for applying the voice signature to the text
representation of the audio data file and producing an output; and output
circuitry for playing the output.
16. The apparatus of claim 15, further comprising: display circuitry for
presenting a user with a list of audio data files; and selection
circuitry for enabling a user to select a chosen audio data file from the
list of audio data files.
17. The apparatus of claim 15, further comprising: display circuitry for
presenting a user with a list of voice signatures; and selection
circuitry for enabling a user to select a chosen voice signature from the
list of voice signatures.
18. The apparatus of claim 15, further comprising: portion circuitry for
enabling a user to choose a portion of the audio data file that the
application circuitry will function on.
19. The apparatus of claim 15, further comprising: input circuitry for
receiving an audio input defining the voice signature.
20. The apparatus of claim 19, further comprising: recording circuitry for
recording the audio input; and extraction circuitry for extracting data
defining the voice signature from the recorded audio input.
21. A computer readable medium containing at least computer program code
for producing altered audio output signals, comprising: computer program
code for processing a text representation of an audio data file; computer
program code for processing a voice signature; and computer program code
for applying the voice signature to the text of the audio data file to
produce an output.
22. The computer readable medium of claim 21, further comprising: computer
program code for presenting a user with a list of audio data files; and
computer program code for enabling the user to select a chosen audio data
file from the list of audio data files.
23. The computer readable medium of claim 21, further comprising: computer
program code for presenting a user with a list of voice signatures; and
computer program code for enabling the user to select a chosen voice
signature from the list of voice signatures.
24. The computer readable medium of claim 21, further comprising: computer
program code for enabling a user to select a portion of the audio data
file that the voice signature will be applied to.
25. The computer readable medium of claim 21, further comprising: computer
program code for recording an audio input; and computer program code for
extracting data defining the voice signature from the recorded audio
input.
Description
CROSS REFERENCE TO RELATED APPLICATIONS
[0001]This application claims priority to U.S. Provisional Patent
Application No. 61/010,079 filed on Jan. 3, 2008, the entirety of which
is hereby incorporated by reference.
BACKGROUND OF THE INVENTION
[0002]This is directed to altering audio output signals. More
particularly, this invention relates to methods, systems and computer
readable media for simulating one or more voices when playing back an
audio file.
[0003]Media devices are widely used to play back various types of audio
files, such as audiobook files, podcast files and music files. When using
these devices, a user is limited to playing back the audio files as
recorded. For example, when a user plays an audiobook file, the user can
only listen to the originally recorded voice(s) of the narrator(s). Even
if different narrators are used for different characters in the book, the
voices of the narrators cannot be changed into different voices after the
recording has been made.
[0004]Despite the restrictions involved in playing back audio files, users
of media devices may wish to change the audio output of audio files. A
mother, for example, might wish to change the narrator's voice in a
pre-recorded, commercially available audiobook to her own voice, so that
her child can listen to the audiobook as narrated in the mother's voice
in her absence. In another scenario, a student listening to a lecture as
a podcast file might want to change the audio of certain sections of the
lecture to sound like someone else's voice, so as to emphasize important
parts of the lecture.
[0005]The present invention solves these problems and others.
SUMMARY OF THE INVENTION
[0006]Methods, systems and computer readable media are provided for
adjusting audio output signals. The system may include any suitable
electronic device for producing audio output signals. The audio output
signals produced may include vocal output signals.
[0007]In some embodiments, the system may receive an audio data file
containing voice signals with an original, or first, frequency content, as
well as an audio input with a second frequency content. The system may
change the first frequency content of the voice to the second frequency
content to produce an adjusted audio output.
[0008]In some embodiments, the system can utilize audio files containing
given content having a first frequency characteristic corresponding to
the spoken voice of one or more individuals. The system can also include
a microphone or other input device such that a different audio signal
characteristic of a different individual can be processed by the system.
The system can then take the second audio signal and apply its
characteristics to the first signal such that the given content can then
be played back using the individual's voice characteristics without that
individual ever having had to record the given content into the system.
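As a purely illustrative, non-limiting sketch, the change from a first frequency content to a second frequency content can be approximated by estimating the fundamental frequency of each signal and resampling the first signal by the ratio of the two. The function names (sine, estimate_f0, resample, shift_toward), the 8000 Hz sample rate and the zero-crossing estimator are assumptions made for this example and do not come from the specification. Resampling also changes the length of the track, which a practical voice conversion would need to avoid.

```python
import math

RATE = 8000  # assumed sample rate in Hz, chosen only for this example


def sine(freq_hz, seconds, rate=RATE):
    """Generate a test tone standing in for a recorded voice."""
    n = int(seconds * rate)
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


def estimate_f0(samples, rate=RATE):
    """Rough fundamental estimate: rising zero crossings per second."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings * rate / len(samples)


def resample(samples, ratio):
    """Read the input `ratio` times faster using linear interpolation.

    A ratio above 1 raises the pitch and shortens the signal.
    """
    out = []
    pos = 0.0
    while pos < len(samples) - 1:
        i = int(pos)
        frac = pos - i
        out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
        pos += ratio
    return out


def shift_toward(track, reference, rate=RATE):
    """Move the track's fundamental toward that of the reference input."""
    track_f0 = estimate_f0(track, rate)
    if track_f0 == 0:
        raise ValueError("could not estimate a fundamental for the track")
    return resample(track, estimate_f0(reference, rate) / track_f0)
```

In this sketch the "audio data file" and the "audio input" are both plain lists of samples; calling shift_toward(sine(200, 1.0), sine(300, 1.0)) yields a shorter signal whose estimated fundamental is close to that of the reference.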
[0009]In other embodiments, the system may receive a text of an audio data
file and a voice signature. The system may produce an adjusted audio
output of the audio data file by applying the voice signature to the text
of the audio data file.
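By way of illustration only, applying a voice signature to the text of an audio data file can be pictured as pairing each word of the text with rendering parameters taken from the signature. The specification does not define what a voice signature contains, so the VoiceSignature fields (name, pitch_hz, words_per_minute) and the apply_signature function below are hypothetical, and the output is a rendering plan rather than synthesized audio.

```python
from dataclasses import dataclass


@dataclass
class VoiceSignature:
    # Hypothetical fields; the source text does not enumerate them.
    name: str
    pitch_hz: float
    words_per_minute: float


def apply_signature(text, signature):
    """Return one (word, pitch_hz, duration_seconds) entry per word."""
    seconds_per_word = 60.0 / signature.words_per_minute
    return [(word, signature.pitch_hz, seconds_per_word) for word in text.split()]
```

A downstream speech synthesizer, which is outside the scope of this sketch, would consume such a plan to produce the adjusted audio output.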
[0010]Persons of ordinary skill in the art will appreciate that at least
some of the various embodiments described herein can be combined together
or they can be combined with other embodiments without departing from the
spirit of the present invention.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011]The above and other features of the invention, its nature and
various advantages will be more apparent upon consideration of the
following detailed description, taken in conjunction with the
accompanying drawings in which:
[0012]FIGS. 1 and 2 are illustrative systems that may be used in
accordance with embodiments of the present invention;
[0013]FIG. 3 is a simplified schematic block diagram of an illustrative
embodiment of circuitry in accordance with embodiments of the present
invention;
[0014]FIGS. 4-8 are schematic views of illustrative displays in accordance
with various embodiments of the present invention;
[0015]FIGS. 9A-9C are simplified logical flow diagrams of illustrative
methods in accordance with embodiments of the present invention;
[0016]FIG. 10 is an illustrative audio data file structure in accordance
with embodiments of the present invention; and
[0017]FIG. 11 is an illustrative metadata alter file structure in
accordance with embodiments of the present invention.
DETAILED DESCRIPTION
[0018]FIG. 1 shows system 100. In some embodiments, such as that shown in
FIG. 1, system 100 only includes handheld device 102. One skilled in the
art would appreciate that various accessory devices (such as, e.g.,
headsets, docking stations, speaker systems, and others) could also be
included in system 100.
[0019]Handheld device 102 can be any device that is capable of producing
audio output signals including, but not limited to, a portable media
player, an audiobook, an audio player, a video player, a cellular
telephone, a computer, a stereo system, a personal organizer, a hybrid of
such devices, or combinations thereof. Handheld device 102 may perform a
single function (e.g., a device that plays music, such as the earlier
versions of the iPod™ marketed by Apple Inc., or Apple's iPod™
Shuffle). Handheld device 102 may also perform multiple functions (e.g.,
a device that plays music, displays video, stores pictures, and receives
and transmits telephone calls, such as an iPhone™ marketed by Apple
Inc.).
[0020]Handheld device 102 is shown as including display component 104 and
user input component 106. Display component 104 is illustrated in FIG. 1
as a display that is integrated into handheld device 102. Display
component 104, like any other component discussed herein, does not have
to be integrated into handheld device 102 and can also be external to
handheld device 102. For example, display component 104 may be a computer
monitor, television screen, and/or any other graphical user interface,
textual user interface, or combination thereof. Display component 104 may
present graphical displays, such as those discussed below in connection
with FIGS. 4-8. Moreover, while device 102 is not shown with a speaker,
persons skilled in the art will appreciate that one or more speakers could
be provided as integrated components in device 102, or as accessory
components, such as headphones.
[0021]User input component 106 is illustrated in FIG. 1 as a click wheel.
One skilled in the art would appreciate that user input component 106
could be any type of user input device that is integrated into or located
external to handheld device 102. For example, user input component 106
could also be a mouse, keyboard, audio trackball, slider bar, one or more
buttons, media device pad, dial, keypad, click wheel, switch, touch
screen, any other input component or device, and/or a combination
thereof. User input component 106 may also include a multi-touch screen
such as that shown in FIG. 2 and described in commonly assigned Westerman
et al., U.S. Pat. No. 6,323,846, issued Nov. 27, 2001, entitled "Method
and Apparatus for Integrating Manual Input," which is incorporated by
reference herein in its entirety.
[0022]FIG. 2 shows computer system 200 which can also be used in
accordance with the present invention. Computer system 200 includes media
device 202. Media device 202 can be any device that is capable of
producing audio output signals including, but not limited to, a portable
media player, an audiobook, an audio player, a video player, a cellular
telephone, a computer, a stereo system, a personal organizer, a hybrid of
such devices, or combinations thereof. Media device 202 may perform a
single function (e.g., a device that plays music, such as some of the
earlier versions of the iPod™ marketed by Apple Inc. or Apple's
iPod™ Shuffle). Media device 202 may also perform multiple functions
(e.g., a device that plays music, displays video, stores pictures, and
receives and transmits telephone calls, such as Apple's current line of
iPod™ products and the iPhone™ marketed by Apple Inc.).
[0023]Media device 202 comprises user interface component 204. User
interface component 204 is shown in FIG. 2 as a multi-touch screen that
can function as both an integrated display and user input device. Media
device 202 can also include one or more other user interface components,
such as button 206, which can be used to supplement user interface
component 204.
[0024]Microphone 208 and audio output 210 are respective examples of input
and output components that can be integrated into media device 202. Audio
output 210 is shown as being a speaker integrated into media device 202,
but one skilled in the art would appreciate that an external device (such
as headphones or any other accessory device, including wireless devices
such as Bluetooth earpieces) or a connector can be used to facilitate the
playing back of audio files and/or the audio portion of video and other
multi-media files.
[0025]FIG. 3 illustrates a simplified schematic diagram of circuitry that
can be implemented in a media device or devices, such as those discussed
above, in accordance with embodiments of the present invention. Media
device 300 can include control processor 302, storage 304, memory 306,
communications circuitry 308, input/output circuitry 310, display
circuitry 312 and/or power supply 314. One skilled in the art would
appreciate that, in some embodiments, media device 300 can include more
than one of each component or circuitry, and that to avoid
over-complicating the drawing, only one of each is shown in FIG. 3. In
addition, one skilled in the art would appreciate that the functionality
of certain components and circuitry can be combined or omitted and that
additional components and circuitry, which are not shown in FIG. 3, can
be included in media device 300.
[0026]Processor 302 can be configured to perform any function. Processor
302 may be used to run operating system applications, firmware
applications, media playback applications, media editing applications,
and/or any other application.
[0027]Storage 304 can be, for example, one or more storage mediums,
including for example, a hard-drive, flash memory, permanent memory such
as ROM, any other suitable type of storage component, or any combination
thereof. Storage 304 may store, for example, media data (e.g., audio data
files), application data (e.g., for implementing functions on device
300), firmware, wireless connection information data (e.g., information
that may enable media device 300 to establish a wireless connection),
subscription information data (e.g., information that keeps track of
podcasts or audio broadcasts or other media a user subscribes to),
contact information data (e.g., telephone numbers and email addresses),
calendar information data, any other suitable data, or any combination
thereof. The data may be formatted and organized in one or more types of
data files.
[0028]Memory 306 can include cache memory, semi-permanent memory such as
RAM, and/or one or more different types of memory used for temporarily
storing data. Memory 306 can also be used for storing data used to
operate media device applications.
[0029]Communications circuitry 308 can permit device 300 to communicate
with one or more servers or other devices using any suitable
communications protocol. For example, communications circuitry 308 may
support Wi-Fi (e.g., an 802.11 protocol), Ethernet, Bluetooth™ (which
is a trademark owned by Bluetooth SIG, Inc.), high frequency systems
(e.g., 900 MHz, 2.4 GHz, and 5.6 GHz communication systems), infrared,
TCP/IP (e.g., any of the protocols used in each of the TCP/IP layers),
HTTP, BitTorrent, FTP, RTP, RTSP, SSH, any other communications protocol,
or any combination thereof.
[0030]Input/output circuitry 310 can convert (and encode/decode, if
necessary) analog signals and other signals (e.g., physical contact
inputs (from, e.g., a multi-touch screen), physical movements (from, e.g.,
a mouse), analog audio signals, etc.) into digital data. Input/output
circuitry 310 can also convert digital data into any other type of signal
or vice versa. The digital data can be provided to and received from
processor 302, storage 304, memory 306, or any other component of media
device 300. Although input/output circuitry 310 is illustrated in FIG. 3
as a single component of media device 300, a plurality of input/output
circuitry can be included in media device 300 (as discussed above).
Input/output circuitry 310 can be used to interface with any input or
output component, such as those discussed in connection with FIGS. 1 and
2. For example, media device 300 can include specialized input circuitry
associated with input devices such as, for example, one or more
microphones, cameras, proximity sensors, accelerometers, ambient light
detectors, etc. Media device 300 can also include specialized output
circuitry associated with output devices such as, for example, one or
more speakers, etc.
[0031]Display circuitry 312 can accept and/or generate signals for
presenting media information (textual and/or graphical) on a display such
as those discussed herein. For example, display circuitry 312 can include
a coder/decoder (CODEC) to convert digital media data into analog
signals. Display circuitry 312 also can include display driver circuitry
and/or circuitry for driving display driver(s). The display signals can
be generated by processor 302 or display circuitry 312. The display
signals can provide media information related to media data received from
communications circuitry 308 and/or any other component of media device
300. In some embodiments, display circuitry 312, like any other component
discussed herein, can be integrated into and/or electrically coupled to
media device 300.
[0032]Power supply 314 can provide power to the components of device 300.
In some embodiments, power supply 314 can be coupled to a power grid
(e.g., a wall outlet, automobile cigarette lighter, etc.). In some
embodiments, power supply 314 can include one or more batteries for
providing power to a portable media device. As another example, power
supply 314 can be configured to generate power in a portable media device
from a natural source (e.g., solar power using solar cells).
[0033]Bus 316 can provide a data transfer path for transferring data to,
from, or between control processor 302, storage 304, memory 306,
communications circuitry 308, and any other component included in media
device 300.
[0034]In some embodiments, media device 300 may be coupled to one or more
other devices (not shown) for performing any suitable operation that may
require media device 300 and any other device to be coupled together.
Media device 300 may be coupled to a host, slave, master and/or accessory
device. The other device may perform operations such as data transfers
and software or firmware updates. The other device may also execute one
or more operations in lieu of media device 300 when, for example, memory
306 does not have enough memory space, or processor 302 does not have
enough processing power to perform the operations efficiently. For
example, if media device 300 is required to alter the audio output of an
audio data file that is too large to be stored in memory 306, another
device that is coupled to media device 300 may execute the alteration.
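The offloading decision described above can be sketched, in a hedged and non-limiting way, as a simple comparison between the size of the audio data file and the memory available. The function name choose_executor, its parameters, the returned labels and the error raised when no coupled device is available are all assumptions introduced for this example.

```python
def choose_executor(file_size_bytes, free_memory_bytes, coupled_device_available):
    """Decide where the alteration of an audio data file should run.

    Returns "media_device" when the file fits in memory and
    "coupled_device" when it does not but another device is coupled.
    """
    if file_size_bytes <= free_memory_bytes:
        return "media_device"
    if coupled_device_available:
        return "coupled_device"
    # Assumed behavior: the source does not say what happens in this case.
    raise MemoryError("audio data file does not fit and no coupled device is available")
```

A fuller model could also weigh processing power, which the paragraph above names as a second reason to hand the operation to the other device.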
[0035]Alternatively, the other device may perform one or more operations
in conjunction with media device 300 so as to increase the efficiency of
media device 300. For example, if media device 300 needs to perform
several steps in a process, media device 300 may execute some of the
steps while the other device executes the rest.
[0036]The other device may be a device that is capable of functioning like
media device 300 (e.g., a device that is capable of altering and
producing audio outputs). In some embodiments, a plurality of media
devices may be coupled to another device, and may share data using the
other device as a server.
[0037]Media device 300 may be coupled with another device over a
communications link using any suitable approach. As an example, the
communications link may be any suitable wireless connection. The
communications link may support any suitable wireless protocol such as,
for example, Wi-Fi (e.g., an 802.11 protocol), Bluetooth®, infrared,
GSM, GSM plus EDGE, CDMA, quadband, or any other suitable wireless
protocol. Alternatively, the communications link may be a wired link that
is coupled to both media device 300 and the other device (e.g., a wire
with a USB connector or a 30-pin connector). A combination of wired and
wireless links may also be used to couple media device 300 with another
device.
[0038]FIGS. 4-8 are illustrative displays which may be presented to a user
by a media device, such as the media devices discussed above, in
accordance with various embodiments of the present invention. The
displays shown in FIGS. 4-8 may be presented to a user in response to,
for example, the media device being activated (e.g., turned ON or
awakened from a sleep mode), receiving a user selection of a display
option, receiving a signal from a remote device, and/or any other
stimuli.
[0039]When presented with displays, a user can provide inputs to the media
device using any suitable user input mechanism. As an example, a user may
use an input mechanism to move a highlight region over an option on a
display so as to send an instruction to the media device to select the
option. As another example, the user may use an input mechanism to move a
display icon over an option to choose the option.
[0040]The displays in FIGS. 4-8 may be divided into one or more regions.
For example, FIG. 4 shows display 400 which includes information region
402, header region 404 and options region 406. Different embodiments of
the invention may include more or fewer display regions.
[0041]Information region 402 may include information generated from data
stored in memory of the media device. In some embodiments, animated
and/or static icons can also be included in information region 402.
Header region 404 may include a display title that helps a user
understand the information available in options region 406. For example,
header region 404, as shown in FIG. 4, can include a display title that
identifies the information in region 406 as "Options". In this case, the
playback options available to the user when the media device is playing
back an audio track are "Play", "Stop", "Pause", "Rewind", etc.
[0042]As shown in FIG. 4, the currently selected option in region 406 is
Pause option 408. Other options that are relevant to the discussion of
one or more embodiments of the present invention can include alter an
audio track option 410 and select a voice option 412. One skilled in the
art would
appreciate that the options included in region 406, or any other region
of a display in the present invention, could be arranged and grouped in
any manner, including a vertical list or a two-dimensional table.
Information region 402 may be updated automatically as the user navigates
through the list of options. For example, options region 406 is shown
with pause option 408 highlighted and corresponding information (e.g., an
icon and the time the audio track playing was paused) being presented in
information region 402.
[0043]The media device may allow a user to select an audio track whose
audio output signals the user wishes to adjust. For example, FIG. 5 shows
display 500 that the media device may present to a user in response to
receiving a user selection of alter an audio track option 410. Display
500 may include selectable audio track options associated with alter an
audio track option 410. For example, options region 502 includes audio
track 1, audio track 2 and audio track 3. Any suitable number of audio
track options may be presented in options region 502. In addition,
display 500 may also include a region (as shown) that includes the start
and end times for a given selected track.
[0044]Audio track 1, audio track 2 and audio track 3 may be associated
with audio files that are accessible by the media device. The media
device may receive a user selection of an audio track option if a user
wishes to alter the audio output signals of the audio file that
corresponds to the audio track option. Audio track 1, audio track 2 and
audio track 3 may also be associated with one or more sections of an
audio file. As an example, an audio track option may be associated with a
section of an audio file that corresponds to a section of a recorded
lecture (e.g., a section that includes information on a particular
subject). As another example, an audio track option may be associated
with all the sections of an audiobook file that include a voice recording
for a character in the audiobook. The media device may receive a user
selection of an audio track option that is associated with one or more
sections of an audio file if a user wishes to alter the audio output of
the one or more sections of the audio file that corresponds to the audio
track option.
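As a non-limiting illustration of altering only the sections associated with an audio track option, the sketch below tags each section of an audio file with a speaker, selects the sections for one speaker, and applies a transform to those sample ranges only. The Section fields, the function names and the use of a sign flip as the stand-in transform are assumptions for this example; an actual alteration would change the voice rather than invert the samples.

```python
from dataclasses import dataclass


@dataclass
class Section:
    # Hypothetical description of one section of an audio file.
    start_s: float
    end_s: float
    speaker: str


def sections_for(sections, speaker):
    """Return the sections that belong to the given speaker or character."""
    return [s for s in sections if s.speaker == speaker]


def alter_selected(samples, rate, selected, transform):
    """Apply `transform` to samples inside the selected sections only."""
    out = list(samples)
    for section in selected:
        lo = int(section.start_s * rate)
        hi = min(int(section.end_s * rate), len(out))
        out[lo:hi] = [transform(x) for x in out[lo:hi]]
    return out
```

Samples outside the selected sections are copied through unchanged, which mirrors the case where a user alters one character's voice in an audiobook and leaves the narrator as recorded.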
[0045]Once the media device has received a user selection of an audio
track, the media device may also allow the user to select a voice option
that the media device can use to alter the audio output of the audio
track. For example, in response to a user selection of select a voice
option 412 of FIG. 4, the media device may present display 600. Display
600 may include one or more selectable options associated with select a
voice option 412. For example, options region 602 in display 600 may
include selectable options such as a "female voices" option 604, a "male
voices" option 606, a "celebrity voices" option 608, an "accents" option
610 (for voices with different accents), an "emotions" option 612 (for
voices with different emotions), a "use my voice" option 614, a "download
a voice" option 616 and/or other voice options that, when selected, can
alter the pre-recorded voice(s) when playing back a given prerecorded
audio file. Options region 602 may include any other suitable voice
options.
[0046]Each voice option listed in options region 602 may be associated
with one or more metadata alter files, which may contain information that
describes the voice option. For example, the metadata alter files may be
similar to the metadata alter file discussed below in connection with
FIG. 11. In response to receiving a user selection of a voice option, the
media device may establish that the voice option has been selected by the
user. For example, the media device may establish a tag in the
corresponding metadata alter file to indicate a user selection.
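One hedged way to picture the tag described above is a metadata alter file represented as a small record with a boolean "selected" field. The field names and the two functions below are hypothetical, and the rule that selecting one voice option clears the tag on all others is an assumption of this sketch rather than something the text states.

```python
def new_metadata_alter_file(voice_option, description):
    """Create an in-memory stand-in for a metadata alter file."""
    return {"voice_option": voice_option, "description": description, "selected": False}


def select_voice(files, voice_option):
    """Tag the files for the chosen voice option and untag the rest."""
    for f in files:
        f["selected"] = f["voice_option"] == voice_option
    return [f for f in files if f["selected"]]
```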
[0047]The media device may receive a user selection of a voice option
listed in options region 602 if a user wishes to use a voice associated
with the voice option to alter the audio output signals of an audio file.
In response to a user selection of any of the voice options listed in
options region 602, the media device may present the user with other
selectable options that are associated with the selected voice option. As
an example, after receiving a user selection of female voices option 604,
the media device may present selectable voice options that are associated
with different female voices (e.g., the voices of a baby, a girl, a
teenager, a young woman, an elderly woman, or any other suitable female
voice option). As another example, in response to receiving a user
selection of emotions option 612, the media device may present selectable
voice options that
are associated with voices that express different emotions (e.g., voices
that convey happiness, sadness, anger, or any other emotion).
[0048]The media device may also receive a user selection of "download a
voice" option 616, if a user wishes to access a voice from a database
remote from the media device. In response to receiving "download a voice"
option 616, the media device may communicate with another device that
the media device is coupled to so as to transfer the metadata alter files
associated with the user selected voice option from the other device to
the memory of the media device. In some embodiments, the media device may
ask for a user selection of a device the user wants to access a voice
option from if the media device is coupled to more than one other device.
The media device may communicate with other devices using any of the
communications techniques discussed above in connection with FIG. 3.
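The download flow above can be sketched, for illustration only, as follows: if exactly one other device is coupled it is used directly, and if several are coupled the user is asked to choose one before the metadata alter files are copied into memory. The function name download_voice, the dictionary layout and the choose_device callback are assumptions made for this example, and no actual communications protocol is modeled.

```python
def download_voice(voice_option, coupled_devices, memory, choose_device=None):
    """Copy the metadata alter files for `voice_option` into `memory`.

    coupled_devices maps a device name to {voice option: [file names]}.
    choose_device is a callback that receives the sorted device names and
    returns the user's pick; it is consulted only when more than one
    device is coupled.
    """
    if not coupled_devices:
        raise LookupError("no coupled device to download from")
    names = sorted(coupled_devices)
    device = names[0] if len(names) == 1 else choose_device(names)
    files = coupled_devices[device].get(voice_option, [])
    memory.setdefault(voice_option, []).extend(files)
    return device
```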
[0049]FIG. 7 shows display 700, which includes selectable celebrity voice
options that may be presented to a user in response to a user selection
of "celebrity voices" option 608. As shown in FIG. 7, options region 702
in display 700 may include selectable options such as a "Paul McCartney"
option 704, a "Madonna" option 706, and an "Oprah" option 708.
[0050]Instead of selecting one of the available celebrity voices listed in
options region 702, the user may decide to select a different voice
option. Choosing "different voice" option 710 may allow the user to
navigate from display 700 to a display with different voice options so
the user can select a different voice option. As an example, in response
to receiving a user selection of different voice option 710, the media
device may present the user with a display that includes other voice
options besides the voice options listed in display 700 (e.g., display
600, FIG. 6).
[0051]The media device may allow a user to record the user's voice, and to
use the recorded voice to alter audio data files. For example, in
response to receiving a user selection of the "my voice" option (e.g.,
"my voice" option 606, FIG. 6), the media device may present the user
with display 800 shown in FIG. 8. Display 800 may include several
selectable options for recording a user's voice. For example, options
region 802 may include selectable options such as "record" option 804,
"stop recording" option 806, "replay recording" option 808 and "delete
recording" option 810.
[0052]In response to receiving a user selection of "record" option 804,
the media device may execute a recording process associated with "record"
option 804. The recording process may include capturing the voice signals
produced by a user, and storing information describing the voice signal
in a metadata adjust file. For example, the media device may record a
user's voice signals using the process 900 discussed below in connection
with FIG. 9C.
[0053]After selecting record option 804, the user may select "stop
recording" option 806 at any time during the recording process. In some
embodiments, if the media device receives "stop recording" option 806
while executing a step associated with "record" option 804, the media
device may instantly terminate the execution of the step so as to
terminate the recording process. In other embodiments, if the media
device receives "stop recording" option 806 while executing a step
associated with "record" option 804, the media device may finish
performing the step before terminating the recording process.
[0054]When the media device has finished recording voice signals, the
media device may play back the voice signals in response to a user
selection of "replay recording" option 808. The media device may also
receive a user selection of "delete recording" option 810 if the user
wishes to delete a recorded voice. In response, the media device may
remove the voice recording information that is stored in a metadata
adjust file.
[0055]Instead of recording, replaying or deleting a voice, the user may
decide to select a different voice option. The user may select "different
voice" option 812 to do this. In response to receiving a user selection
of "different voice" option 812, the media device may present the user
with a display with different voice options (e.g., display 600, FIG. 6).
[0056]In accordance with some embodiments of the present invention, at
least two approaches may be used to alter the audio output signals of an
audio data file. In one approach, the media device may change the
original frequency of an audio data file to produce audio output signals
containing a second frequency value. In another approach, the media
device may apply a voice signature to a text of an audio data file to
produce altered audio output signals. The two approaches may be
generalized as shown in FIGS. 9A-B. Process 900 begins at step 902. At
step 904, the media device may be activated (e.g., turned ON or awakened
from a sleep mode) either automatically or in response to a user
interaction or a command from another device. For example, the media
device can be an iPod.TM. that is powered down until a user interacts
with it, for example, by depressing its click wheel. As another example, the
media device could be a cellular telephone that is activated in response
to receiving a wireless signal from a cellular telephone tower.
[0057]Once the media device is activated, the circuitry of the media
device may present a display to the user at step 906. The display
presented may include options available to the user which are related to
the function the media device is performing or is about to perform.
[0058]At step 908, the media device waits for a user interaction. The user
may interact with the media device using an input component, device or by
any other means. For example, the user may interact with the media device
using the input components discussed above in reference to FIGS. 1-3.
[0059]At step 910, the media device determines whether it has received an
indication of a user interaction. If there is no indication of a user
interaction, process 900 proceeds to step 912, and the media device
determines whether it has waited a predetermined amount of time. The
media device may be configured to wait a specified amount of time for a
user interaction. In some embodiments, the user may indicate the amount
of time the media device has to wait for a user interaction.
[0060]If the media device has not waited for the predetermined amount of
time, process 900 returns to step 908, and the media device
continues waiting for a user interaction. In some embodiments, the media
device may be configured to display how much time it has been waiting for
a user interaction.
[0061]If, however, the predetermined amount of time has elapsed, process
900 may end at step 914. For example, the media device may automatically
shut down, turn on a screen saver, enter a sleep mode or execute any
other suitable function to conserve battery power.
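The waiting loop of steps 908 through 914 can be illustrated with a
minimal sketch, offered only as an example and not as the implementation
of the media device. The names wait_for_interaction, poll, timeout_s,
interval_s, clock and sleep are hypothetical and do not appear in the
figures.

```python
import time
from typing import Callable, Optional


def wait_for_interaction(
    poll: Callable[[], Optional[str]],
    timeout_s: float,
    interval_s: float = 0.05,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Return the first interaction reported by poll(), or None on timeout.

    poll is a hypothetical callback that returns None while no user
    interaction has been received.
    """
    deadline = clock() + timeout_s
    while True:
        event = poll()           # step 910: was an interaction received?
        if event is not None:
            return event
        if clock() >= deadline:  # step 912: has the predetermined time elapsed?
            return None          # step 914: caller may sleep or shut down
        sleep(interval_s)        # step 908: keep waiting
```

A caller that receives None may then perform a power-saving function
such as entering a sleep mode.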
[0062]Alternatively, if the media device receives a user interaction at
step 910, the media device may verify whether the user wants to
deactivate the media device at step 916. If the user wants to deactivate
the media device, process 900 may end at step 914. This step may be
skipped in some embodiments, or it may be a user-selectable option.
[0063]Conversely, if the user does not wish to deactivate the media
device, process 900 advances to step 918, and the media device determines
whether the user interaction resulted in a command that requires
accessing an audio data file. For example, the user may have selected an
audio data file that is being played by the media device. As another
example, the user may have been presented with a list of available audio
data files, and may have selected one.
[0064]The media device may receive a command to access an audio data file
that is stored in memory or on another device. An audio data file
accessed by the media device may be any electronic file that contains
audio data. Audio data files accessed by the media device may be
formatted as *.m4p files, *.wav files, *.mp3 files, *.wma files, etc. Any
other suitable format may be used.
[0065]When the media device determines that it has to access an audio data
file in response to the user interaction, the media device accesses the
appropriate storage device and retrieves the audio data file at step 920.
If the audio data file to be retrieved is available on another device,
the media device may communicate with the other device using any of the
communication techniques discussed above in connection with FIG. 3 so as
to transfer the audio data file from the other device to the memory of
the media device.
[0066]At step 922, the media device establishes whether the audio data
file is associated with a metadata alter file. A metadata alter file in
accordance with embodiments of the present invention may be an electronic
file that contains dynamic information that can be used to alter an audio
data file. In some embodiments, a metadata alter file can contain
frequency value information that may be used to change the frequency
value of an audio data file. In other embodiments, a metadata alter file
can include data that describes a voice signature that may be applied to
the text of an audio data file. An audio data file accessed by the media
device may include pointers to any or all corresponding metadata alter
files. A metadata alter file can also contain pointers to any or all
corresponding audio data files.
[0067]If the media device determines that there are no metadata alter
files associated with the audio data file, process 900 proceeds to step
924. At step 924, the media device may generate audio output signals
based on the data in the audio data file. From step 924, process 900 ends
at step 914.
[0068]Conversely, if the media device determines at step 922 that there is
a metadata alter file associated with the audio data file, process 900
advances to step 926 where the media device retrieves the associated
metadata alter file. After retrieving the associated metadata alter file,
the media device may determine whether the metadata alter file includes
frequency value data at step 928.
[0069]In response to determining at step 928 that the metadata alter file
includes frequency value data, process 900 advances to step 930. At step
930, the media device may change the frequency of the audio data file to
the frequency value in the metadata alter file. The media device may
contain an oscillator which may execute step 930. After changing the
frequency of the audio data file, the media device may generate audio
output signals at step 932. Process 900 then ends at step 934.
[0070]Returning to step 928, if the media device determines that the
metadata alter file does not include frequency data, process 900 proceeds
to step 936. At step 936, the media device determines whether the
metadata alter file includes voice signature data. If the metadata alter
file does not include voice signature data, the media device generates a
display with an error message at step 938. The message displayed may
alert the user that the information needed to alter the audio data file
is not available. Process 900 then returns to step 906 and the display
generated at step 938 is presented to the user.
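The decisions made at steps 922, 928 and 936 can be summarized in a
short sketch. It assumes, purely for illustration, that a metadata alter
file has been loaded into a dictionary whose keys "frequency" and
"voice_signature" indicate which kind of data it holds; these key names
and the returned labels are hypothetical.

```python
from typing import Optional


def choose_playback_path(alter_file: Optional[dict]) -> str:
    """Pick the branch of process 900 that applies to an audio data file.

    alter_file is None when no metadata alter file is associated with
    the audio data file (the "no" branch of step 922).
    """
    if alter_file is None:
        return "play_unaltered"          # step 924
    if "frequency" in alter_file:        # step 928
        return "change_frequency"        # step 930
    if "voice_signature" in alter_file:  # step 936
        return "apply_voice_signature"   # steps 940 onward
    return "show_error"                  # step 938
```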
[0071]In response to determining at step 936 that the metadata alter file
includes voice signature data, process 900 advances to step 940. At step
940, the media device may normalize the text of the audio data file. In
some embodiments, normalizing the text may involve converting it from
written form into spoken form. For example, symbols and numbers in the
text may be converted into spoken form (e.g., "$500" may be converted to
"five hundred dollars"). As another example, abbreviations may be
converted into spoken form (e.g., "etc" may be converted to "et cetera").
Step 940 may also involve the removal of punctuation from the text, for
ease of normalization.
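A minimal sketch of the normalization of step 940 follows. It is an
illustration under stated assumptions, not a complete normalizer: it
only spells out whole numbers from 0 to 999, it only knows the
abbreviation "etc", and it removes punctuation last. The function names
and the ABBREVIATIONS table are hypothetical.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
ABBREVIATIONS = {"etc": "et cetera"}  # assumed, deliberately tiny table


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 999."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def _dollars(match: re.Match) -> str:
    n = int(match.group(1))
    return number_to_words(n) + (" dollar" if n == 1 else " dollars")


def normalize(text: str) -> str:
    """Convert written form into spoken form, then drop punctuation."""
    # Dollar amounts such as "$500" become "five hundred dollars".
    text = re.sub(r"\$(\d{1,3})\b", _dollars, text)
    # Remaining stand-alone numbers of up to three digits are spelled out.
    text = re.sub(r"\b\d{1,3}\b", lambda m: number_to_words(int(m.group())), text)
    # Known abbreviations, with an optional trailing period, are expanded.
    pattern = r"\b(" + "|".join(ABBREVIATIONS) + r")\b\.?"
    text = re.sub(pattern, lambda m: ABBREVIATIONS[m.group(1)], text)
    # Punctuation is removed; hyphens inside spelled-out numbers are kept.
    text = re.sub(r"[^\w\s-]", "", text)
    return " ".join(text.split())
```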
[0072]At step 942, the normalized text from step 940 may be assigned
phonetic transcriptions and converted into phonemes. At step 944, the
phonemes from step 942 may be divided into speech units. The speech units
may represent the speech units that will be included in the audio output
signals that are generated at step 932. As an example, a sentence such as
"Where does your grandmother live, Little Red Riding Hood?" may be
divided into two speech units comprising "Where does your grandmother
live" and "Little Red Riding Hood?" The speech units created at step 944
may be stored in a queue in the metadata alter file.
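The division into speech units and the queue of step 944 can be
sketched as follows. For simplicity, this illustration splits plain
text rather than phonemes, and it assumes that commas, semicolons and
sentence-ending punctuation mark the unit boundaries; the function name
is hypothetical.

```python
import re
from collections import deque


def split_into_speech_units(sentence: str) -> deque:
    """Split a sentence into speech units and return them as a queue.

    Commas and semicolons are dropped at the split point, while ".",
    "!" and "?" stay attached to the unit that they end.
    """
    parts = re.split(r"[,;]\s*|(?<=[.!?])\s+", sentence)
    return deque(part.strip() for part in parts if part.strip())
```

Applied to the sentence in the example above, this sketch yields the two
units "Where does your grandmother live" and "Little Red Riding Hood?",
which can then be taken from the left end of the queue one at a time.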
[0073]After dividing the normalized text into speech units, process 900
proceeds to step 946, and the media device selects the next speech unit
in the queue. At step 948, the media device may determine whether all the
speech units in the queue in the metadata alter file have been selected.
If all the speech units have been selected, process 900 ends at step 934.
If there is at least one speech unit that has not been selected, process
900 advances to step 950. At step 950, the media device may apply the
voice signature data in the metadata alter file to the selected speech
unit. After applying the voice signature data to a speech unit, audio
output signals of the speech unit may be produced at step 932. Process
900 then ends at step 934.
[0074]Various speech recognition and speech synthesis systems may be used
to execute process 900. For example, Apple Mac.TM. computers marketed by
Apple Inc. contain speech recognition and speech synthesis technologies
that may be used to perform process 900. Additionally, in some
embodiments, the media device may be coupled to another device, which may
perform one or more of the steps in process 900.
[0075]Returning to step 918, if the user interaction received at step 910
does not require data from an audio data file, process 900 proceeds to
step 952. At step 952, the media device determines whether the user
interaction includes a record request.
[0076]If the user interaction does not include a record request, the media
device may generate a display based on the user interaction at step 954.
As an example, the media device may generate a display that asks if the
user would like to provide another user interaction.
[0077]If the user interaction includes a record request (e.g., the user
selected my voice option 606, FIG. 6), process 900 advances to step 956,
and the media device may activate one or more input components or
devices. For example, the media device may activate the input components
and devices discussed above in reference to FIGS. 1-3.
[0078]After activating the input component(s) or device(s), the media
device may capture the voice signals the user wishes to record as an
analog signal at step 958. At step 960, the media device may convert the
analog signal into a digital signal and may store the digital signal as
an audio data file. The media device may also create and save one or more
metadata alter files in memory. A metadata alter file may include one or
more data fields which may store variable information. The variable
information may describe data that can be used to alter an audio data
file (e.g., data that describes a voice signature).
[0079]At step 962, the media device may select the next variable listed in
the corresponding metadata alter file. At step 964, the media device may
determine whether all the variables in the metadata alter file have been
selected. If all the variables have been selected, process 900 ends at
step 966.
[0080]If at least one variable has not been selected, process 900 advances
to step 968, and the media device generates a variable value for the
selected variable. The media device may include any suitable mechanism
for determining variable values. As an example, the media device may
include a counter (not shown) that may measure the speech rate of a
digital signal. Other suitable mechanisms may be used to measure other
variables. Next, the media device may store the variable value in the
appropriate data field in the metadata alter file at step 970. Process
900 then returns to step 962, and the media device selects the next
variable in the metadata alter file.
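The loop of steps 962 through 970 can be illustrated with a short
sketch. It assumes, for illustration only, that a recording is described
by a dictionary holding a word count and a duration in seconds, and that
each variable has a measuring function; the names used here are
hypothetical.

```python
from typing import Callable, Dict, Iterable


def speech_rate_wpm(recording: dict) -> float:
    """Speech rate in words per minute, as a counter might measure it."""
    return recording["word_count"] * 60 / recording["duration_s"]


def fill_alter_file(
    variables: Iterable[str],
    measure: Dict[str, Callable[[dict], float]],
    recording: dict,
) -> Dict[str, float]:
    """Generate and store a value for every variable in turn."""
    alter_file: Dict[str, float] = {}
    for name in variables:                 # steps 962 and 964
        value = measure[name](recording)   # step 968: generate the value
        alter_file[name] = value           # step 970: store it in its field
    return alter_file                      # step 966: all variables handled
```

For example, a recording of 56 words lasting 30 seconds gives a speech
rate of 112 words/minute under this sketch.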
[0081]The processes discussed above are intended to be illustrative and
not limiting. One skilled in the art would appreciate that steps of the
processes discussed herein may be omitted, modified, combined, and/or
rearranged, and any additional steps may be performed without departing
from the scope of the invention.
[0082]In the processes described above, the media device may store audio
data files in memory. FIG. 10 shows an example audio data file 1000 in
the form of an XML file. Any other suitable format may be used to define
audio data file 1000. Audio data file 1000 may be associated with
filename 1002, which may uniquely identify audio data file 1000. For
example, filename 1002 uniquely identifies audio data file 1000 as the
Little Red Riding Hood audio data file. Body 1004 of audio data file 1000
may include one or more tags that describe audio data file 1000. Examples
may include the "name" tag (specifying the audio data file name), the
"file size" tag (specifying the file size), and the "sampling frequency"
tag (specifying the sampling frequency).
[0083]The start and end times of the audio data file may also be specified
in "start" and "end" tags respectively. If sections of the audio data
file belong to a same category, tags may be used to identify these
sections. As an example, an audiobook audio data file may contain
categories that correspond to the characters in the audiobook. The
"Little Red Riding Hood" category might, for instance, define the
sections of the audio data file that contain the voice of the Little Red
Riding Hood character. Tags may be used to define the start and end times
of each section that belongs to a category. For example, "LRRH_start_n"
and "LRRH_end_n" tags may identify the start and end times of the nth
section of the "Little Red Riding Hood" category.
[0084]Body 1004 of audio data file 1000 may also include an indication of
the sections of the audio data file that a user wants to alter. As an
example, tags "alter_start_n" and "alter_end_n" may signify the start and
end times of the nth section of the audio data file that a user wants to
alter. If a user selects a section of the audio data file to alter, the
media device may create tags in audio data file 1000 to specify the
section that has been selected. Body 1004 may also include a pointer that
identifies a metadata alter file that is associated with the audio data
file. As an example, the tag "alterfile_section_n" may designate the
metadata alter file that is associated with section n of the audio data
file.
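The tags described above can be illustrated with a small sketch that
reads the sections to alter from an XML audio data file. The layout of
SAMPLE is an assumption and is not a reproduction of FIG. 10: the root
element name, the use of seconds for times and the file name
"my_voice.xml" are hypothetical. Only the tag names "alter_start_n",
"alter_end_n" and "alterfile_section_n" come from the description.

```python
import xml.etree.ElementTree as ET

# Hypothetical audio data file body with two user-selected sections.
SAMPLE = """\
<audio_data_file>
  <name>Little Red Riding Hood</name>
  <alter_start_1>12.0</alter_start_1>
  <alter_end_1>20.5</alter_end_1>
  <alterfile_section_1>my_voice.xml</alterfile_section_1>
  <alter_start_2>31.0</alter_start_2>
  <alter_end_2>40.0</alter_end_2>
</audio_data_file>
"""


def sections_to_alter(xml_text: str) -> list:
    """Return (start, end, alter_file) for each numbered section.

    alter_file is None when a section has no "alterfile_section_n" tag.
    """
    root = ET.fromstring(xml_text)
    sections = []
    n = 1
    while root.find(f"alter_start_{n}") is not None:
        start = float(root.findtext(f"alter_start_{n}"))
        end = float(root.findtext(f"alter_end_{n}"))
        alter_file = root.findtext(f"alterfile_section_{n}")
        sections.append((start, end, alter_file))
        n += 1
    return sections
```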
[0085]In some embodiments, a user may provide the input for both tags
"alter_start_n" and "alter_end_n". For example, a user may use the media
device to supply the media device with the start and end times of
selected audio data files. In other embodiments, an audio data file may
contain a first tag that labels the beginning of the audio data file as
the beginning of a user-selected audio data file (e.g., the
"alter_start_n" tag). While a user is playing the audio data file, the
user may instruct the media device to establish a second tag to indicate
the end of the user-selected audio data file (e.g., the "alter_end_n"
tag).
[0086]A user may use different techniques to select an audio data file. In
some embodiments, a user may be presented with a list of available audio
data files prior to playing an audio data file. The user may select an
audio data file by instructing the media device to establish one or more
tags in the audio data file. The media device may establish tags in an audio
data file to indicate the start and end times of a selected audio data
file (e.g., "alter_start_n" and "alter_end_n" discussed above). In other
embodiments, a user may select an audio data file as an audio output of
the audio data file is being produced. For example, while a user is
playing an audio data file, the user may instruct the media device to
establish one or more tags in the audio data file. The media device may
establish tags in an audio data file to indicate the start and end times
of a selected audio data file (e.g., "alter_start_n" and "alter_end_n"
discussed above). If a user decides not to select an audio data file, the
user may also instruct the media device to remove one or more tags
previously created in the audio data file.
[0087]In addition to audio data files, the media device may also store
metadata alter files in memory. FIG. 11 shows data structure 1100 of a
metadata alter file. Although data structure 1100 takes the form of a
table in the example of FIG. 11, any other suitable data structure may be
used in other embodiments. Data structure 1100 may include a tag column
1102 and one or more corresponding speech unit value columns. A speech
unit may be defined as a word, a phrase, a sentence or any other suitable
speech entity. As an example, data structure 1100 contains n speech unit
value columns corresponding to the n speech units stored in the metadata
alter file.
[0088]Tag column 1102 may contain suitable variables that may be used to
define the voice signature. For example, tag column 1102 may include
variables such as pitch, speech rate, tone, frequency, timbre and
intonation. Tag column 1102 may contain any other suitable variable that
may be used to characterize the voice signature. Data structure 1100 may
include speech unit 1 value column 1104, speech unit 2 value column 1106
and speech unit n value column 1108. The speech unit value columns may
include the corresponding unique values associated with the variables
listed in tag column 1102. As an example, speech unit 1 value column 1104,
speech unit 2 value column 1106 and speech unit n value column 1108 may
include the values 220 Hz, 302 Hz and 192 Hz for the pitch of speech unit
1, speech unit 2 and speech unit n respectively. As another example, the
speech rates of speech unit 1, speech unit 2 and speech unit n may be
recorded as 112 words/minute, 120 words/minute and 135 words/minute
respectively.
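One way to picture data structure 1100 is as a mapping from each tag to
a list of per-speech-unit values. The sketch below is illustrative only:
the key names are hypothetical, and speech unit n is shown as the third
entry. The numeric values are those given in the examples above.

```python
# Rows are tags (tag column 1102); list positions are speech units.
ALTER_TABLE = {
    "pitch_hz": [220, 302, 192],
    "speech_rate_wpm": [112, 120, 135],
}


def values_for_unit(table: dict, index: int) -> dict:
    """Collect the value of every tag for one speech unit (0-based index)."""
    return {tag: values[index] for tag, values in table.items()}
```

With this layout, reading one speech unit value column amounts to taking
the same position from every row.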
[0089]The above described embodiments of the present invention are
presented for purposes of illustration and not of limitation, and the
present invention is limited only by the claims which follow.
* * * * *